TermeX is a tool for automatic collocation extraction and terminology lexica construction. Extraction is based on fourteen
different association measures applicable to n-grams of up to four words. Built-in lemmatization and part-of-speech (POS)
filtering help TermeX cope with the morphological complexity of natural languages.
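The fourteen measures are not listed here. As an illustration of what an association measure does, the sketch below scores adjacent word pairs (bigrams) with two standard measures: pointwise mutual information (PMI) and the Dice coefficient. This is not TermeX's implementation. It assumes plain whitespace tokens with no lemmatization or POS filtering, and the function names `bigram_scores` and `rank` are hypothetical.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score every adjacent word pair with PMI and the Dice coefficient.

    Hypothetical sketch: not TermeX's code, and it skips the
    lemmatization and POS filtering that TermeX applies.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of bigram positions

    scores = {}
    for (w1, w2), f12 in bigrams.items():
        # Relative frequencies of the pair and of each word on its own.
        p12 = f12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        # PMI: how much more often the pair occurs than chance predicts.
        pmi = math.log2(p12 / (p1 * p2))
        # Dice: pair frequency relative to the two words' frequencies.
        dice = 2 * f12 / (unigrams[w1] + unigrams[w2])
        scores[(w1, w2)] = {"pmi": pmi, "dice": dice}
    return scores


def rank(scores, measure):
    """Return bigrams sorted from strongest to weakest association."""
    return sorted(scores, key=lambda bg: scores[bg][measure], reverse=True)


tokens = "the stock market fell and the stock market rose".split()
scores = bigram_scores(tokens)
print(rank(scores, "pmi")[:3])
```

In this toy text the single-occurrence pair ("fell", "and") gets the highest PMI, ahead of the repeated ("stock", "market"). PMI is known to overrate rare pairs, which is one reason extraction tools commonly let the user choose among several measures.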
The main features of TermeX are:
- Extraction of collocations from UTF-8 encoded text files
- Building lists of possible collocations using any of the 14 association measures
- Processing of n-grams up to length four
- Manual selection of candidate n-grams for terminology lexica
- Viewing of concordances for extracted candidates
- Exporting lists of collocations
- Processing of multiple documents
- Support for Windows and Linux operating systems
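A concordance shows each occurrence of a candidate n-gram together with its surrounding words (keyword in context, KWIC). The sketch below is a minimal illustration, not TermeX's interface. It assumes whitespace tokens and a fixed context window; the names `concordance`, `format_kwic`, `window` and `width` are hypothetical.

```python
def concordance(tokens, ngram, window=3):
    """Return (left context, match, right context) for each occurrence of ngram.

    Hypothetical sketch: exact token matching, no lemmatization.
    """
    n = len(ngram)
    target = tuple(ngram)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == target:
            left = " ".join(tokens[max(0, i - window):i])
            match = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + window])
            lines.append((left, match, right))
    return lines


def format_kwic(lines, width=30):
    """Right-align the left context so all matches line up in one column."""
    return [f"{left:>{width}} [{match}] {right}" for left, match, right in lines]


tokens = "the stock market fell and the stock market rose".split()
hits = concordance(tokens, ("stock", "market"), window=2)
for row in format_kwic(hits, width=10):
    print(row)
```

Aligning every match in a single column is the usual KWIC layout. It lets a user scan the contexts quickly when deciding whether a candidate belongs in a terminology lexicon.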
In addition, TermeX is designed for fast, memory-efficient processing of large corpora.
Authors:
Davor Delač
Zoran Krleža
Frane Šarić, dipl. ing.
Project coordinators:
Publications:
Acknowledgements:
This work has been jointly supported by the Ministry of Science, Education
and Sports of the Republic of Croatia and the Government of Flanders under
grants 036-1300646-1986 and KRO/009/06 (CADIAL).
Developed by TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, 2008.