Distributional memory (Baroni and Lenci, 2010) is a general framework for corpus-based semantics that represents co-occurrence information as a third-order tensor (a three-way array) of weighted word-link-word tuples. Each tuple is associated with a score that reflects the strength of the association. Through matricization, the tensor can be unfolded into different matrices, each suited to a different family of semantic tasks.
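To make the tensor-and-matricization idea concrete, here is a minimal sketch that stores a sparse tensor as a Python dict keyed by (word, link, word) tuples and unfolds it in two ways: a W1 x LW2 matrix (one row per word, one column per link-word pair), which supports word-similarity tasks, and a W1W2 x L matrix (one row per word pair, one column per link), which supports relation-oriented tasks. The Croatian words, the link labels (sbj_intr, obj) and all scores below are hypothetical toy values, not data from dm.hr, and the function names are illustrative rather than part of any dm.hr tooling.

```python
import math
from collections import defaultdict

# Hypothetical toy tensor: (word1, link, word2) -> association score.
# Words, link labels and scores are made up for illustration; they are NOT from dm.hr.
tensor = {
    ("pas", "sbj_intr", "lajati"): 12.4,
    ("pas", "obj", "hraniti"): 5.1,
    ("mačka", "obj", "hraniti"): 4.7,
    ("mačka", "sbj_intr", "mijaukati"): 9.8,
}

def matricize_w1_lw2(tensor):
    """Unfold into W1 x (L x W2): one sparse row per first word,
    one column per (link, second word) pair."""
    matrix = defaultdict(dict)
    for (w1, link, w2), score in tensor.items():
        matrix[w1][(link, w2)] = score
    return matrix

def matricize_w1w2_l(tensor):
    """Unfold into (W1 x W2) x L: one sparse row per word pair, one column per link."""
    matrix = defaultdict(dict)
    for (w1, link, w2), score in tensor.items():
        matrix[(w1, w2)][link] = score
    return matrix

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

w1_lw2 = matricize_w1_lw2(tensor)
w1w2_l = matricize_w1w2_l(tensor)

# Word similarity from the W1 x LW2 view: "pas" and "mačka" share only the (obj, hraniti) column.
sim = cosine(w1_lw2["pas"], w1_lw2["mačka"])
print(f"cos(pas, mačka) = {sim:.3f}")
print(w1w2_l[("pas", "hraniti")])
```

Both views are built from the same tuples; only the grouping of the three tensor modes into rows and columns changes, which is the essence of matricization.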
dm.hr is a distributional memory for Croatian, compiled from HrWaC, a dependency-parsed Croatian web corpus, and covering about 2M lemmas. For details, please see the following paper:
Jan Šnajder, Sebastian Padó, Željko Agić (2013). Building and Evaluating a Distributional Memory for Croatian. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Sofia: Association for Computational Linguistics, 784-789.
If you use dm.hr, please cite this paper. The BibTeX entry is:
@InProceedings{snajder2013building,
title={Building and Evaluating a Distributional Memory for Croatian},
author={{\v S}najder, Jan and Pad{\'o}, Sebastian and Agi{\'c}, {\v Z}eljko},
booktitle={Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
address={Sofia},
publisher={Association for Computational Linguistics},
year={2013},
pages={784--789}
}
All dm.hr files made available by TakeLab, FER, are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
This work was supported by the Croatian Science Foundation under the grant "02.03/162: Derivational Semantic Models for Information Retrieval".