CORAL

Looking to create parallel corpora for machine translation or other purposes? CORAL (CORpus ALigner) can facilitate the task for you.

Description

CORAL (CORpus ALigner) is a tool that makes aligning parallel corpora easier. It offers both fully automatic sentence and paragraph alignment and computer-aided manual alignment.

CORAL is written in Java, a platform-independent programming language, so it runs on virtually any computer with a Java Virtual Machine, regardless of the operating system. CORAL was modelled after existing alignment programs, with particular care taken to correct the flaws found in them. Its ergonomically designed user interface was developed in close cooperation with future users at the Faculty of Philosophy at the University of Zagreb.

In short, the program is used as follows. First, a text and its translation into another language are loaded into the program. Both texts are then segmented into sentences, either automatically (using the built-in algorithms) or manually (computer-aided). Next, the paragraphs and sentences of one text are linked to their translations in the other, again either with the built-in algorithms or manually. Finally, the resulting sentence and paragraph alignments are stored in an output file.
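The segment, align and store steps described above can be illustrated with a minimal Java sketch. It is not CORAL's code: the class and method names (NaiveAlignSketch, segment, alignOneToOne, toTmx) are hypothetical, a sentence boundary is assumed to be '.', '!' or '?' followed by whitespace, the alignment is the naïve one-to-one approach (sentence i pairs with sentence i), and the output is a simplified TMX 1.4 document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the workflow (not CORAL's actual code):
// segment both texts, pair sentences one-to-one, and serialize the pairs as TMX.
class NaiveAlignSketch {

    // Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    static List<String> segment(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    // Naive one-to-one alignment: sentence i of one text pairs with sentence i
    // of the other; surplus sentences in the longer text are left unaligned.
    static List<String[]> alignOneToOne(List<String> src, List<String> tgt) {
        List<String[]> pairs = new ArrayList<>();
        int n = Math.min(src.size(), tgt.size());
        for (int i = 0; i < n; i++) {
            pairs.add(new String[] {src.get(i), tgt.get(i)});
        }
        return pairs;
    }

    // Escapes XML special characters in text content ('&' must come first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes each aligned pair as one TMX translation unit (<tu>) with two <tuv>s.
    static String toTmx(List<String[]> pairs, String srcLang, String tgtLang) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<tmx version=\"1.4\">\n");
        sb.append("  <header creationtool=\"NaiveAlignSketch\" creationtoolversion=\"0.1\"")
          .append(" segtype=\"sentence\" o-tmf=\"none\" adminlang=\"en\"")
          .append(" srclang=\"").append(srcLang).append("\" datatype=\"plaintext\"/>\n");
        sb.append("  <body>\n");
        for (String[] p : pairs) {
            sb.append("    <tu>\n");
            sb.append("      <tuv xml:lang=\"").append(srcLang).append("\"><seg>")
              .append(escape(p[0])).append("</seg></tuv>\n");
            sb.append("      <tuv xml:lang=\"").append(tgtLang).append("\"><seg>")
              .append(escape(p[1])).append("</seg></tuv>\n");
            sb.append("    </tu>\n");
        }
        sb.append("  </body>\n</tmx>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> en = segment("The cat sleeps. The dog barks!");
        List<String> hr = segment("Mačka spava. Pas laje!");
        System.out.print(toTmx(alignOneToOne(en, hr), "en", "hr"));
    }
}
```

A real segmenter also has to cope with abbreviations, numbers and quotation marks, which this regular expression ignores, and the one-to-one approach only gives good results when both texts contain the same number of sentences in the same order.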

Specification & Features
  • Automatic segmentation of texts into sentences
  • Manual sentence segmentation editing
  • Automatic parallel text alignment using either the Gale-Church alignment method or a naïve one-to-one alignment approach
  • An easy-to-use interface for manual sentence alignment
  • Exports alignment results to a file in the standard TMX (Translation Memory eXchange) format
  • Runs on all operating systems that can run the Java Virtual Machine
  • Easy installation (a .zip archive is simply unpacked onto the user’s machine)
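The Gale-Church method named in the list above is a length-based dynamic-programming aligner published by Gale and Church (1993). The sketch below follows that published algorithm, not CORAL's implementation: it scores each candidate bead (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) by the prior probability of its type plus a penalty for how far the character lengths diverge, using the paper's parameters (c = 1, s² = 6.8), and keeps the cheapest path. The class and method names are hypothetical, and the input is assumed to be the character lengths of the sentences inside one already-aligned paragraph pair.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Sketch of length-based Gale-Church sentence alignment (hypothetical names,
// not CORAL's code). Inputs are sentence lengths in characters for one paragraph
// pair; the output lists beads, e.g. {2, 1} = two source sentences, one target.
class GaleChurchSketch {
    // Bead types and their prior probabilities as reported by Gale and Church.
    static final int[][] BEADS = {{1, 1}, {1, 0}, {0, 1}, {2, 1}, {1, 2}, {2, 2}};
    static final double[] PRIORS = {0.89, 0.0099, 0.0099, 0.089, 0.089, 0.011};
    static final double C = 1.0;   // expected target characters per source character
    static final double S2 = 6.8;  // variance of that ratio per character

    // Standard normal CDF for z >= 0 (Abramowitz & Stegun 26.2.17 approximation).
    static double pnorm(double z) {
        double t = 1.0 / (1.0 + 0.2316419 * z);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        return 1.0 - Math.exp(-z * z / 2.0) / Math.sqrt(2.0 * Math.PI) * poly;
    }

    // Cost = -log P(bead type) - log P(length difference | match).
    static double cost(int len1, int len2, double prior) {
        if (len1 == 0 && len2 == 0) return -Math.log(prior);
        double mean = (len1 + len2 / C) / 2.0;
        double z = Math.abs((C * len1 - len2) / Math.sqrt(S2 * mean));
        double pd = 2.0 * (1.0 - pnorm(z));
        return -Math.log(prior) - Math.log(Math.max(pd, 1e-300)); // clamp avoids log(0)
    }

    // Sum of a[from..to), i.e. the combined length of the sentences in a bead.
    static int sum(int[] a, int from, int to) {
        int s = 0;
        for (int k = from; k < to; k++) s += a[k];
        return s;
    }

    static List<int[]> align(int[] src, int[] tgt) {
        int n = src.length, m = tgt.length;
        double[][] d = new double[n + 1][m + 1]; // best cost for first i source / j target sentences
        int[][] back = new int[n + 1][m + 1];    // index of the bead that reached (i, j)
        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                for (int b = 0; b < BEADS.length; b++) {
                    int di = BEADS[b][0], dj = BEADS[b][1];
                    if (i < di || j < dj) continue;
                    double total = d[i - di][j - dj]
                            + cost(sum(src, i - di, i), sum(tgt, j - dj, j), PRIORS[b]);
                    if (total < d[i][j]) { d[i][j] = total; back[i][j] = b; }
                }
            }
        }
        // Trace the cheapest path back from (n, m) to (0, 0).
        LinkedList<int[]> beads = new LinkedList<>();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            int[] bead = BEADS[back[i][j]];
            beads.addFirst(bead.clone());
            i -= bead[0];
            j -= bead[1];
        }
        return beads;
    }

    public static void main(String[] args) {
        // Two short source sentences against one longer target sentence.
        for (int[] bead : align(new int[] {20, 22}, new int[] {42})) {
            System.out.println(bead[0] + "-" + bead[1]);
        }
    }
}
```

Relying on lengths alone keeps the method fast and independent of the language pair, but it can misalign passages where a translation expands or compresses the text unevenly; some aligners therefore add lexical cues on top of the length model.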

Availability

CORAL is freely available for research purposes and can be downloaded from here. If you are interested in using CORAL commercially, please drop us an email at info@takelab.hr.