Determining the semantic relatedness of words is a standard task in distributional semantics. We provide a dataset for the semantic relatedness task for the Croatian language. The dataset is described in:
Janković, V., Šnajder, J., Dalbelo Bašić, B. (2011). Random Indexing Distributional Semantic Models for Croatian Language. Lecture Notes in Artificial Intelligence (Third Int. Workshop on Balto-Slavonic Natural Language Processing), 6836, 411–418.
If you use this dataset for your own work, please cite the above paper. The BibTeX citation is:
@inproceedings{jankovic2011random,
  title={Random indexing distributional semantic models for Croatian language},
  author={Jankovi{\'c}, Vedrana and {\v{S}}najder, Jan and Ba{\v{s}}i{\'c}, Bojana Dalbelo},
  booktitle={Text, Speech and Dialogue},
  pages={411--418},
  year={2011},
  organization={Springer}
}
CroSemRel450-12.txt
CroSemRel450-6.txt
Both files contain a list of 450 word pairs and the average similarity scores assigned by the human annotators. The first file contains the scores averaged over 12 annotators. The second file contains the scores averaged over a subset of 6 annotators for which the observed agreement was higher. Consult the above-mentioned paper for details.
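A dataset of this kind is typically used by correlating a model's relatedness scores with the human averages. The following is a minimal sketch of that workflow, not a description of the actual file layout or of the evaluation used in the paper. It assumes each line holds two words and one score separated by whitespace, which should be checked against the files. The sample lines, the model scores, and the helper names (read_pairs, average_ranks, spearman) are invented for illustration. Spearman rank correlation is used here as one common choice.

```python
import io


def read_pairs(handle):
    """Parse lines of the ASSUMED form: word1 word2 score (whitespace-separated)."""
    pairs = []
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines or lines that do not match the assumed layout
        try:
            score = float(fields[2])
        except ValueError:
            continue  # skip lines whose third field is not a number (e.g. a header)
        pairs.append((fields[0], fields[1], score))
    return pairs


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented placeholder lines, NOT taken from the dataset.
sample = io.StringIO("pas mačka 3.2\nauto cesta 2.9\nknjiga more 0.4\n")
pairs = read_pairs(sample)

human_scores = [score for _, _, score in pairs]
model_scores = [0.71, 0.80, 0.05]  # hypothetical output of some relatedness model

print(spearman(human_scores, model_scores))
```

To run this on the real data, replace the in-memory sample with open("CroSemRel450-12.txt", encoding="utf-8") and adjust the parsing if the files use a different separator or column order.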