TakeLab software and resources
This page contains a summary of freely available software and resources created by
TakeLab at the University of Zagreb, Croatia.
Tools and libraries
- Coral - Corpus aligner.
- CroNER - Croatian Named Entity Recognizer.
- DiaCRO - Diacritics restoration tool for Croatian.
- GPKEX - Genetically programmed keyphrase extraction.
- libsentences - library for sentence boundary detection.
- MINERAL - A tool for extracting disease mentions from clinical text.
- TakeLab STS - semantic textual similarity system from SemEval 2012 shared task.
- TermeX - terminology extraction tool.
- TweetingJay - a tool for recognizing semantically identical tweets (in English).
- Cross-domain Detection of Abusive Language Online - an implementation of our paper at the 2nd Workshop on Abusive Language Online.
Last revision: 25 July 2016
Resources
- argpremises - corpus of matched claims with implicit premises.
- ComArg - corpus of online user comments with arguments.
- Cro6WSD - a small Croatian word sense disambiguation dataset.
- Cro36WSD - medium multi-label Croatian word sense disambiguation datasets.
- CroCoref - corpus with manually annotated entity coreference.
- CroMWEsc - dataset annotated with semantic compositionality of Croatian Multiword Expressions.
- Cropinion - dataset for opinion mining from Croatian user reviews.
- CroSyn - synonym choice dataset for Croatian.
- CroSemRel450 - word semantic relatedness dataset for Croatian.
- CroWSI - graph-based induction of word senses in Croatian.
- DerivBase.hr - a large-coverage derivational morphology resource for Croatian.
- dm.hr - distributional memory for Croatian.
- Event-centered information retrieval evaluation collections - two collections of queries and documents in English for event-centered information retrieval.
- Event coreference dataset - a dataset with annotated event coreference (English).
- Factual event anchor extraction dataset - a dataset of 750 English newswire texts annotated for factual event mentions and a dataset of 105 manually annotated event graphs.
- FAQ retrieval dataset (Croatian) - a dataset with queries and relevance judgements for FAQ retrieval in Croatian.
- FAQ retrieval dataset - a dataset with queries and relevance judgements for FAQ retrieval in English (manually annotated data from Yahoo Answers).
- FAQ retrieval dataset (StackExchange) - a dataset with queries and relevance judgements for FAQ retrieval in English (semiautomatically annotated data from StackExchange).
- fHrWaC - a filtered version of the hrWaC corpus (Croatian).
- HeidelTime.hr - Croatian resources for the HeidelTime tagger.
- kex.hr - keyphrase extraction evaluation dataset for Croatian.
- NN13205 - indexed Croatian legislative collection.
- HOFM - higher-order functional morphology framework.
- MOLEX - a morphological lexicon for Croatian (temporarily offline; please check again later).
- Recognizing identical events dataset - a dataset with annotated identical and similar events (English).
- Semantic analogies dataset - a dataset with semantic analogies.
- Sentiment Lexica - prior sentiment lexica for English and Croatian.
- VerbCROcean - repository of fine-grained semantic verb relations for Croatian.
- WikiWarsHr - temporally tagged corpus of historical narratives (Croatian).