DIACRO

Turn “cevapcici” into “ćevapčići” with DIACRO, a robust system for automatic diacritics restoration in Croatian texts.

Description

The absence of diacritics in digitally encoded text is a common problem for languages whose writing systems are not covered by the standard ASCII character set, and it seriously impedes automated text processing and information retrieval. DIACRO is a robust system for automatic diacritics restoration in Croatian texts that combines dictionary look-up with statistical language modelling. It achieves high accuracy using fairly simple and computationally inexpensive methods.
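
The dictionary look-up half of such a hybrid approach can be illustrated with a minimal sketch. It is not DIACRO's implementation. It assumes a hypothetical folding table that maps č, ć, ž, š and đ to c, c, z, s and d (real diacritic-free text sometimes writes đ as "dj", which this sketch ignores) and a tiny made-up lexicon. The function names `fold`, `build_index` and `candidates` are illustrative only. Every lexicon word is indexed under its diacritic-free form, so a stripped token maps back to all the words it could have come from.

```python
from collections import defaultdict

# Hypothetical folding table (an assumption, not DIACRO's): maps Croatian
# diacritic letters to the ASCII letters they collapse to when stripped.
FOLD = str.maketrans({
    "č": "c", "ć": "c", "ž": "z", "š": "s", "đ": "d",
    "Č": "C", "Ć": "C", "Ž": "Z", "Š": "S", "Đ": "D",
})


def fold(word: str) -> str:
    """Strip Croatian diacritics, e.g. 'ćevapčići' -> 'cevapcici'."""
    return word.translate(FOLD)


def build_index(lexicon):
    """Map each diacritic-free form to every lexicon word that folds to it."""
    index = defaultdict(set)
    for word in lexicon:
        index[fold(word)].add(word)
    return index


def candidates(index, token: str) -> list[str]:
    """Return sorted restoration candidates; unknown tokens are kept as-is."""
    # dict.get does not trigger the defaultdict factory, so the index is unchanged.
    return sorted(index.get(token, {token}))


# Toy lexicon, illustrative only (not DIACRO's dictionary).
lexicon = ["ćevapčići", "što", "sto", "kuća", "kuca", "želim"]
index = build_index(lexicon)
print(candidates(index, "cevapcici"))  # ['ćevapčići']
print(candidates(index, "sto"))        # ['sto', 'što']
print(candidates(index, "kompjuter"))  # ['kompjuter']
```

Tokens with a single candidate, such as "cevapcici", are restored by look-up alone. Ambiguous ones, such as "sto" (hundred) versus "što" (what) or "kuca" (knocks) versus "kuća" (house), are where a statistical language model has to pick the right form.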

Specification & Features
  • Hybrid diacritics restoration for Croatian
  • Employs dictionary lookups
  • Uses statistical language modelling
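
One common way to combine dictionary candidates with a statistical language model is Viterbi decoding over a bigram model, sketched below. This is a generic technique chosen for illustration, not a description of how DIACRO itself works. The candidate table `CANDIDATES` and the bigram probabilities in `BIGRAM_LOGP` are invented toy values, and `restore`, `logp` and `UNSEEN` are hypothetical names. For each token the sketch keeps the best-scoring path ending in each candidate form, then returns the overall best sequence.

```python
import math

# Hypothetical candidate table, as a dictionary look-up might produce it:
# diacritic-free token -> diacritized forms found in the lexicon.
CANDIDATES = {
    "sto": ["sto", "što"],
    "zelis": ["želiš"],
    "kuca": ["kuca", "kuća"],
    "je": ["je"],
    "nova": ["nova"],
}

# Toy bigram log-probabilities (made up for illustration). "<s>" marks the
# sentence start; unseen bigrams fall back to a fixed penalty.
BIGRAM_LOGP = {
    ("<s>", "što"): math.log(0.20),
    ("<s>", "sto"): math.log(0.01),
    ("što", "želiš"): math.log(0.30),
    ("sto", "želiš"): math.log(0.001),
    ("<s>", "kuća"): math.log(0.05),
    ("<s>", "kuca"): math.log(0.01),
    ("kuća", "je"): math.log(0.20),
    ("kuca", "je"): math.log(0.02),
    ("je", "nova"): math.log(0.10),
}
UNSEEN = math.log(1e-6)


def logp(prev: str, word: str) -> float:
    """Bigram log-probability with a constant fallback for unseen pairs."""
    return BIGRAM_LOGP.get((prev, word), UNSEEN)


def restore(tokens: list[str]) -> list[str]:
    """Pick the most probable diacritized sequence with Viterbi decoding."""
    # best[form] = (score of the best path ending in form, that path)
    best = {"<s>": (0.0, [])}
    for token in tokens:
        forms = CANDIDATES.get(token, [token])  # unknown tokens pass through
        best = {
            form: max(
                (score + logp(prev, form), path + [form])
                for prev, (score, path) in best.items()
            )
            for form in forms
        }
    return max(best.values())[1]


print(" ".join(restore("sto zelis".split())))     # što želiš
print(" ".join(restore("kuca je nova".split())))  # kuća je nova
```

Because each step only compares paths ending in the current candidates, the cost grows linearly with sentence length. This is one reason dictionary look-up plus a bigram model can remain computationally inexpensive.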

Availability

DIACRO is freely available for research purposes upon request. A demo version of DIACRO is also available. If you are interested in using DIACRO commercially, please email us at info@takelab.hr.