If you are looking for a tool to extract domain-specific terminology from a domain-specific document collection, TermeX might be the solution.


TermeX is a tool for automated collocation extraction and construction of terminology lexica. Extraction is based on fourteen different association measures applicable to n-grams of up to four words. Built-in lemmatization and POS filtering help TermeX cope better with the morphological complexity of natural languages.
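To give a rough idea of what an association measure does, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is used here only as a familiar example: this page does not list the fourteen measures, so this should not be read as TermeX's implementation. The function name `bigram_pmi` and the `min_count` threshold are assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative sketch only; not taken from TermeX. `min_count` is a
    hypothetical frequency cut-off that drops rare, unreliable pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi          # probability of the pair
        p_x = unigrams[w1] / n_uni   # probability of the first word
        p_y = unigrams[w2] / n_uni   # probability of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs first: these are the collocation candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "machine translation of the text machine translation of the speech".split()
for pair, score in bigram_pmi(tokens):
    print(" ".join(pair), round(score, 2))
```

A high score means the two words occur together more often than their individual frequencies would suggest, which is the basic signal behind collocation candidates. In practice the tokens would first be lemmatized and filtered by part of speech, as described above.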

Specification & Features
  • Extraction of collocations from UTF-8 formatted text files
  • Generation of lists of possible collocations using one of 14 association measures
  • Processing of n-grams up to length four
  • Hand selection of candidate n-grams for terminology lexica
  • Viewing of concordances for extracted candidates
  • Exporting lists of collocations
  • Processing of multiple documents
  • Support for Windows and Linux operating systems
  • Fast and memory efficient processing of large corpora
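
The concordance feature listed above can be pictured as a keyword-in-context view: every occurrence of a candidate n-gram is shown with a few words on either side. The following is a minimal sketch under that assumption; the function name `concordance` and the `width` parameter are hypothetical and do not describe how TermeX itself is built.

```python
def concordance(tokens, ngram, width=3):
    """Return (left context, match, right context) for each occurrence of ngram.

    Illustrative sketch only; `width` is a hypothetical parameter giving the
    number of context words shown on each side.
    """
    target = tuple(ngram)
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(target), right))
    return lines


tokens = "we study machine translation and evaluate machine translation quality".split()
for left, match, right in concordance(tokens, ("machine", "translation"), width=2):
    print(f"{left:>15} [{match}] {right}")
```

Seeing a candidate in context in this way makes it easier to decide by hand whether it belongs in a terminology lexicon.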

TermeX is freely available for research purposes upon request. A demo version of the tool is available here. If you are interested in using TermeX commercially, please drop us an email at info@takelab.hr.