Representing words as high-dimensional real-valued vectors has been shown to capture syntactic and semantic regularities in language surprisingly well. These regularities are observed as constant vector offsets between pairs of words that share a particular relationship. Take the male/female relationship: the question "man is to woman as king is to __" can be answered by searching for the word vector that is closest to the vector king - man + woman, excluding the input words. Answering analogy questions of this type is called analogy reasoning. We provide two datasets for analogy reasoning in Croatian. The datasets are described in:
Zuanović, L., Karan, M., Šnajder, J. (2014). Experiments with Neural Word Embeddings for Croatian TODO (TODO), TODO, TODO.
If you use these datasets in your own work, please cite the above paper. The BibTeX citation is:
@inproceedings{zuanovic2014experiments,
  title={Experiments with Neural Word Embeddings for Croatian},
  author={Zuanovi{\'c}, Leo and Karan, Mladen and {\v{S}}najder, Jan},
  booktitle={TODO},
  pages={TODO},
  year={2014},
  organization={TODO}
}
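The vector-offset search described in the introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy embeddings are made up for the example (they are not taken from the datasets or from the paper), the function name solve_analogy is hypothetical, and "closest" is taken to mean highest cosine similarity, which is a common choice but is not specified above.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented purely for illustration.
# Real word vectors would be loaded from a trained embedding model.
TOY_EMBEDDINGS = {
    "man":   np.array([0.0,  1.0, 0.2]),
    "woman": np.array([0.0, -1.0, 0.2]),
    "king":  np.array([1.0,  1.0, 0.1]),
    "queen": np.array([1.0, -1.0, 0.1]),
    "apple": np.array([0.1,  0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to __' using the vector offset c - a + b.

    The three input words are excluded from the candidate answers.
    Assumption: 'closest' means highest cosine similarity.
    """
    target = embeddings[c] - embeddings[a] + embeddings[b]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


if __name__ == "__main__":
    # man is to woman as king is to __
    print(solve_analogy(TOY_EMBEDDINGS, "man", "woman", "king"))
```

Excluding the input words matters in practice: the offset vector often lies very close to one of the three inputs, which would otherwise be returned as the answer.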
Both datasets can be downloaded from here.
Two files are provided: