HINA Text Analytics

News agencies can greatly benefit from text analytics. In this project, we developed a series of semantic text analysis tools for the Croatian News Agency (HINA) to improve the quality and reduce the cost of its document processing services.

Description

A news agency depends heavily on text analytics. For the Croatian News Agency (HINA), TakeLab developed a package of text analytics tools that makes text analysis at HINA more cost-efficient. The package includes:

  • KTN – a system for automatic document classification
  • A package for training the classifiers – to further reduce human effort, the automatic classifiers are derived via active learning
  • KEX – a system for extracting keywords from documents, which can be used for indexing and clustering documents
  • An interface to existing HINA systems
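
The description does not say which active learning strategy the classifier training package uses. As an illustration only, the sketch below shows one common strategy, uncertainty sampling: the current classifier scores the unlabeled documents, and the ones it is least sure about are handed to a human annotator first. The function name, the categories, and the probability values are all hypothetical and are not taken from the HINA tools.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the documents whose top predicted class is least certain."""
    # Confidence of a prediction = probability of its most likely class.
    confidences = [(max(p), i) for i, p in enumerate(probabilities)]
    # Lowest confidence first; ties are broken by document index.
    confidences.sort()
    return [i for _, i in confidences[:batch_size]]


# Hypothetical class probabilities for four unlabeled news items
# over three made-up categories (politics, sport, economy).
unlabeled_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],
]

to_label = least_confident(unlabeled_probs, batch_size=2)
print(to_label)  # [3, 1]
```

Labeling the most uncertain documents first is what lets active learning reach a usable classifier with fewer manually annotated examples than labeling documents at random.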

Project fact sheet

Participants: HINA, TakeLab FER
Duration: 2 years