Does a search for “Ana” fail to find a document containing “s Anom”? MOLEX morphologically normalizes Croatian words (“Anom” -> “Ana”), so that a query matches the inflected forms of a word and your search engine performs better.
MOLEX (MOrphological LEXicon) is a morphological normalization module that enables morphologically aware search and thus improves search performance. This is particularly important for Croatian, a morphologically complex language. The normalization module uses a morphological lexicon to conflate the various inflectional variants of a word into a single representative form (the lemma). A wide-coverage lexicon has been acquired automatically from raw corpora based on a hand-crafted morphology model. The morphology model uses a representation framework that can be readily applied to other languages, which makes developing morphological normalization modules for other languages easy and cost-effective.
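The idea of conflating inflected forms into a lemma can be illustrated with a minimal Python sketch. It is not the MOLEX API: the `TOY_LEXICON` dictionary and the `normalize`, `normalize_text`, and `matches` functions are hypothetical names invented for this example, and the handful of entries are illustrative rather than taken from the MOLEX lexicon. A real lexicon would also have to deal with ambiguous word forms, which this sketch ignores.

```python
# Toy lexicon mapping inflected word forms to a lemma.
# Illustrative entries only; NOT taken from the MOLEX lexicon.
TOY_LEXICON = {
    "ana": "ana",   # nominative
    "ane": "ana",   # genitive
    "ani": "ana",   # dative / locative
    "anu": "ana",   # accusative
    "anom": "ana",  # instrumental, as in "s Anom"
}


def normalize(token, lexicon=TOY_LEXICON):
    """Return the lemma of a token, or the lowercased token if it is unknown."""
    key = token.lower()
    return lexicon.get(key, key)


def normalize_text(text, lexicon=TOY_LEXICON):
    """Split on whitespace and normalize every token."""
    return [normalize(tok, lexicon) for tok in text.split()]


def matches(query, document, lexicon=TOY_LEXICON):
    """True if every normalized query term occurs among the normalized document terms."""
    doc_terms = set(normalize_text(document, lexicon))
    return all(term in doc_terms for term in normalize_text(query, lexicon))
```

With this sketch, `matches("Ana", "s Anom")` succeeds because both “Ana” and “Anom” are reduced to the same lemma before comparison, whereas a plain string comparison would miss the document.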
- Morphological normalization of Croatian nouns, verbs, and adjectives
- High-coverage lexicon (covering over 3.5M word forms), constructed semi-automatically from a large representative corpus
- Produces a MULTEXT-East morphosyntactic description of each input word form, providing information about the word form's case, gender, number, etc.
- Modules implementing MOLEX for the popular open-source search platform Apache Solr and the open-source search engine Lucene
- Available as standalone .NET and Java libraries
- Soon available via a web service
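To give a sense of what a morphosyntactic description in the list above carries, here is a minimal Python sketch that decodes a MULTEXT-East-style positional tag for a noun. It assumes the usual noun layout (category, type, gender, number, case) and only those positions; the function `decode_noun_msd` and the table `NOUN_ATTRIBUTES` are hypothetical names for this example, and the output format of MOLEX itself may differ.

```python
# Assumed positional layout of a MULTEXT-East-style noun tag after the
# leading category letter "N": type, gender, number, case.
NOUN_ATTRIBUTES = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "v": "vocative", "l": "locative",
        "i": "instrumental",
    }),
]


def decode_noun_msd(msd):
    """Decode a noun tag such as 'Npfsi' into a feature dictionary."""
    if not msd or msd[0] != "N":
        raise ValueError("this sketch only handles noun tags starting with 'N'")
    features = {"category": "noun"}
    # Pair each position after the category letter with its attribute table.
    for code, (name, values) in zip(msd[1:], NOUN_ATTRIBUTES):
        if code != "-":  # '-' marks an attribute that is not specified
            features[name] = values.get(code, code)
    return features
```

Under these assumptions, a tag like `Npfsi` for the form “Anom” would decode to a proper, feminine, singular noun in the instrumental case.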
MOLEX modules for popular search engines (Lucene, Solr) are available for commercial purposes. If you are interested in using MOLEX, please drop us an email at info@takelab.hr.