TermeX

TermeX is a tool for automatic collocation extraction and the construction of terminology lexica. Extraction is based on fourteen different association measures applicable to n-grams of up to length four. Built-in lemmatization and POS filtering help TermeX cope better with the morphological complexity of natural languages.
The main features of TermeX are:

Extraction of collocations from UTF-8 formatted text files
Determining lists of possible collocations using one of 14 association measures
Processing of n-grams up to length four
Hand selection of candidate n-grams for terminology lexica
Viewing of concordances for extracted candidates
Exporting lists of collocations
Processing of multiple documents
Support for Windows and Linux operating systems

In addition, TermeX provides fast and memory-efficient processing of large corpora.
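
To illustrate what ranking candidates with an association measure means, here is a minimal Python sketch that scores bigrams by pointwise mutual information (PMI). It is not TermeX code: PMI is chosen only as a familiar example of an association measure, and the function name rank_bigrams_by_pmi, the min_count threshold, and the toy sentence are assumptions made for this illustration. Lemmatization and POS filtering are left out.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=2):
    """Return (bigram, PMI) pairs sorted from strongest to weakest association.

    Hypothetical helper for illustration only; not part of TermeX.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bigrams          # observed bigram probability
        p_x = unigrams[w1] / n_tokens     # probability of the first word
        p_y = unigrams[w2] / n_tokens     # probability of the second word
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))

    # Highest PMI first: these are the strongest collocation candidates.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy input, assumed for the example (already tokenized on whitespace).
text = "new york is big . new york is busy . the city is big"
ranking = rank_bigrams_by_pmi(text.split())
for bigram, score in ranking:
    print(" ".join(bigram), round(score, 3))
```

In this toy input the pair "new york" ranks first, because both words occur only together. A tool of this kind would then present such a ranked list for hand selection of terminology candidates.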

Authors:

Davor Delač
Zoran Krleža
Frane Šarić, dipl. ing.

Project coordinators:

dr. sc. Bojana Dalbelo Bašić
mr. sc. Jan Šnajder

Publications:

Delač, Davor; Krleža, Zoran; Dalbelo Bašić, Bojana; Šnajder, Jan; Šarić, Frane. TermeX: A Tool for Collocation Extraction. Lecture Notes in Computer Science (Computational Linguistics and Intelligent Text Processing). 5449 (2009); 149-157.
Petrović, Saša; Šnajder, Jan; Dalbelo Bašić, Bojana. Extending Lexical Association Measures for Collocation Extraction. Computer Speech and Language. (2009) doi:10.1016/j.csl.2009.06.001

Acknowledgements:

This work has been jointly supported by the Ministry of Science, Education and Sports, Republic of Croatia, and the Government of Flanders under the grants 036-1300646-1986 and KRO/009/06 (CADIAL).
Developed by TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, 2008.