Synonym Choice Dataset for Croatian

Version: 1.0
Release date: July 27, 2013

1 Description

The synonym choice task is a standard task in distributional semantics: given a target word, a system must pick its synonym from a set of candidate words. We provide a synonym choice dataset for Croatian. The construction of the dataset is described in:

Karan, M., Šnajder, J., Dalbelo Bašić, B. (2012). Distributional Semantics Approach to Detecting Synonyms in Croatian Language. In Proceedings of the Eighth Language Technologies Conference, Ljubljana. Information Society. 111-116.

The dataset provided here is a revised version, which we used in experiments with dependency-based semantic models, as described in:

Jan Šnajder, Sebastian Padó, Željko Agić (2013). Building and Evaluating a Distributional Memory for Croatian. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Sofia: Association for Computational Linguistics, 784-789.

If you use the synonym choice dataset for your own work, please cite the latter paper. The BibTeX citation is:

@InProceedings{snajder2013building,
  title={Building and Evaluating a Distributional Memory for Croatian},
  author={{\v S}najder, Jan and Pad{\'o}, Sebastian and Agi{\'c}, {\v Z}eljko},
  booktitle={51st Annual Meeting of the Association for Computational Linguistics},
  year={2013},
  pages={784--789}
}

2 Dataset

The dataset is distributed as the archive hr-synonym-choice.tar.gz. The archive contains three files:

hr-synonym-choice/hr-synonym-choice-N.txt
hr-synonym-choice/hr-synonym-choice-A.txt
hr-synonym-choice/hr-synonym-choice-V.txt

These files contain question items for nouns, adjectives, and verbs, respectively. Each file contains 1000 question items, one item per line, in the following format:

     targetWord:answerWord1:answerWord2:answerWord3:answerWord4:answerId
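
The line format above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution. It assumes that the files are UTF-8 encoded, that no word contains a colon, and that answerId is an integer. The names QuestionItem, parse_item, and load_items are hypothetical. This README does not state whether answerId counts from 0 or from 1, so the sketch stores the value unchanged and leaves its interpretation to the caller.

```python
from typing import List, NamedTuple


class QuestionItem(NamedTuple):
    target: str            # targetWord
    candidates: List[str]  # answerWord1 .. answerWord4, in file order
    answer_id: int         # answerId as stored; its indexing base is not stated here


def parse_item(line: str) -> QuestionItem:
    """Parse one line of the form target:a1:a2:a3:a4:answerId."""
    fields = line.strip().split(":")
    if len(fields) != 6:
        raise ValueError(
            f"expected 6 colon-separated fields, got {len(fields)}: {line!r}"
        )
    target, *candidates, answer_id = fields
    return QuestionItem(target, candidates, int(answer_id))


def load_items(path: str) -> List[QuestionItem]:
    """Read all question items from a file, skipping blank lines.

    UTF-8 is an assumption; adjust the encoding if the files differ.
    """
    with open(path, encoding="utf-8") as f:
        return [parse_item(line) for line in f if line.strip()]
```

For example, load_items("hr-synonym-choice/hr-synonym-choice-N.txt") would return the noun items as a list, provided the archive has been extracted into the current directory. Before evaluating a model, it is worth checking the range of answer_id values in the files to determine how they index the four candidates.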

3 License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.