Knowledge Discovery in Textual Data

A national research project funded by the Ministry of Science, Education and Sports. The aim of this project was to develop language-independent statistical methods and models for natural language processing, as well as concrete and deployable text analysis systems and language technology tools.

Description

The development of the web has brought a rapid increase in the amount of digitally available data. However, while the quantity of information grows, the human ability to process and understand it remains the same. The aim of methods for knowledge discovery in textual data is to reduce the human effort required to process massive amounts of textual information, allowing users to focus on decision making based on automatically acquired knowledge.

Knowledge discovery in textual data builds on artificial intelligence (machine learning, natural language processing) and computational linguistics. The models used are mostly statistical. The research within this project focused on: (1) text preprocessing techniques, (2) advancing data clustering and latent semantic space induction methods, (3) automatic document classification and summarization, and (4) intelligent text search and information extraction. The results are language-independent methods and models, as well as concrete and deployable text analysis systems and language technology tools.

Project fact sheet

Participants: Croatian Ministry of Science, Education and Sports, TakeLab
Duration: 7 years (2007 – 2013)