WikiWarsHr is a corpus of historical narratives taken from the Croatian Wikipedia and temporally tagged with TIMEX3. The corpus consists of 22 articles, with 59,915 non-punctuation tokens and 1,440 tagged temporal expressions. WikiWarsHr is inspired by WikiWars, a similar resource for English.
For details, please check the following paper:
Skukan, L., Glavaš, G., Šnajder, J. (2014). HeidelTime.Hr: Extracting and Normalizing Temporal Expressions in Croatian. In Proceedings of the Ninth Language Technologies Conference, Ljubljana. Information Society, 99-103. [paper]
If you use this dataset for your own work, please cite the above paper. The BibTeX citation is:
@inproceedings{skukan2014heideltimehr,
  title={HeidelTime.Hr: Extracting and Normalizing Temporal Expressions in Croatian},
  author={Skukan, Luka and Glava\v{s}, Goran and {\v{S}}najder, Jan},
  booktitle={Proceedings of the Ninth Language Technologies Conference},
  pages={99--103},
  year={2014},
  organization={Information Society}
}
The dataset is available from here: TakeLab-WikiWarsHr.tar.gz.
The archive contains two directories, each containing 22 files. The in/ directory contains the untagged articles formatted as .sgm files. The keyinline/ directory contains the TIMEX3-tagged versions of the same files.
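The directory layout described above can be traversed with a short script. The following is a minimal sketch, not part of the official distribution. It assumes that the archive has already been extracted, that each file in in/ has a counterpart with the same file name in keyinline/, that the files are UTF-8 encoded, and that the annotations appear as inline <TIMEX3 ...> opening tags. The function names pair_files and count_timex3 are hypothetical helpers introduced only for illustration.

```python
import re
from pathlib import Path

# Matches an opening TIMEX3 tag such as <TIMEX3 tid="t1" type="DATE" value="1991">.
# Closing tags (</TIMEX3>) are not matched, so each expression is counted once.
TIMEX3_OPEN = re.compile(r"<TIMEX3\b[^>]*>")


def pair_files(root):
    """Pair untagged and tagged files that share a file name.

    `root` is the directory that holds the in/ and keyinline/ subdirectories.
    Assumption: corresponding files have identical names in both directories.
    """
    root = Path(root)
    untagged = {p.name: p for p in (root / "in").iterdir() if p.is_file()}
    tagged = {p.name: p for p in (root / "keyinline").iterdir() if p.is_file()}
    shared = sorted(untagged.keys() & tagged.keys())
    return [(untagged[name], tagged[name]) for name in shared]


def count_timex3(path, encoding="utf-8"):
    """Count opening TIMEX3 tags in one file (the encoding is an assumption)."""
    text = Path(path).read_text(encoding=encoding)
    return len(TIMEX3_OPEN.findall(text))
```

For example, calling pair_files on the extracted directory and summing count_timex3 over the second element of every pair gives a total that can be compared against the number of tagged expressions reported for the corpus. If the files turn out to use a different encoding, pass it through the encoding parameter.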