The synonym choice task is a standard task in distributional semantics. We provide a synonym choice dataset for Croatian. The construction of the dataset is described in:
Karan, M., Šnajder, J., Dalbelo Bašić, B. (2012). Distributional Semantics Approach to Detecting Synonyms in Croatian Language. In Proceedings of the Eighth Language Technologies Conference, Ljubljana. Information Society. 111-116.
The dataset provided below is a revised version, which we used in experiments with dependency-based semantic models, as described in:
Jan Šnajder, Sebastian Padó, Željko Agić (2013). Building and Evaluating a Distributional Memory for Croatian. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Sofia: Association for Computational Linguistics, 784-789.
If you use the synonym choice dataset for your own work, please cite the latter paper. The BibTeX citation is:
@InProceedings{snajder2013building,
title={Building and Evaluating a Distributional Memory for Croatian},
author={{\v S}najder, Jan and Pad{\'o}, Sebastian and Agi{\'c}, {\v Z}eljko},
booktitle={51st Annual Meeting of the Association for Computational Linguistics},
year={2013},
pages={784--789}
}
hr-synonym-choice/hr-synonym-choice-N.txt
hr-synonym-choice/hr-synonym-choice-A.txt
hr-synonym-choice/hr-synonym-choice-V.txt
These files contain question items for nouns, adjectives, and verbs, respectively. Each file contains 1000 question items, one item per line, in the following format:
targetWord:answerWord1:answerWord2:answerWord3:answerWord4:answerId
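The line format above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution. The names `SynonymItem`, `parse_item`, and `read_items` are hypothetical. The sketch assumes that the files are UTF-8 encoded, that the words themselves contain no colons, and that `answerId` is an integer. It stores `answerId` exactly as given and makes no assumption about whether it counts from 0 or 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SynonymItem:
    target: str            # targetWord
    candidates: List[str]  # answerWord1..answerWord4, in file order
    answer_id: int         # answerId as given in the file (indexing not assumed)


def parse_item(line: str) -> SynonymItem:
    """Parse one colon-separated question item (six fields expected)."""
    fields = line.strip().split(":")
    if len(fields) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(fields)}")
    # Assumption: the last field is an integer identifier of the correct answer.
    return SynonymItem(fields[0], fields[1:5], int(fields[5]))


def read_items(path: str, encoding: str = "utf-8") -> List[SynonymItem]:
    """Read all question items from a file, skipping blank lines."""
    # Assumption: UTF-8 encoding; adjust if the files use another encoding.
    with open(path, encoding=encoding) as f:
        return [parse_item(line) for line in f if line.strip()]
```

For example, `read_items("hr-synonym-choice/hr-synonym-choice-N.txt")` would return the noun items as a list, assuming the file is available at that relative path.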