DGT Translation Memory parallel corpus
DGT-Translation Memory is a database of aligned sentences from the European Union’s legislative documents (Acquis Communautaire) in 24 EU languages. Sketch Engine offers this database as parallel corpora which can be searched. Detailed information and how to cite the corpora can be found in the bibliography.
The DGT-Translation Memory covers 24 EU languages:
Bulgarian | German | Polish |
Czech | Greek | Portuguese |
Danish | Hungarian | Romanian |
Dutch | Irish | Croatian |
English | Italian | Slovak |
Estonian | Latvian | Slovenian |
Finnish | Lithuanian | Spanish |
French | Maltese | Swedish |
The aligned texts come from DGT-TM, a large translation memory published by the European Commission.
The individual corpora have been processed with the latest tools available in Sketch Engine.
Tools to work with the DGT Translation Memory parallel corpus
A complete set of Sketch Engine tools is available to work with this set of parallel corpora to generate:
- word sketch – collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
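To make two of the outputs above more concrete, here is a minimal Python sketch of an n-gram frequency list and a keyword-in-context concordance over aligned sentence pairs. It does not use Sketch Engine or its API; the sample sentences and the names `ALIGNED`, `ngram_frequencies` and `concordance` are hypothetical and chosen only for illustration, and tokenization is simplified to whitespace splitting.

```python
from collections import Counter

# Hypothetical aligned sentence pairs (English, French), invented for this sketch.
# They are NOT taken from the DGT-TM data.
ALIGNED = [
    ("the member states shall adopt the measures",
     "les états membres adoptent les mesures"),
    ("the member states shall inform the commission",
     "les états membres informent la commission"),
]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(sentences, n):
    """Count n-grams across sentences (whitespace tokenization, an assumption)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts


def concordance(pairs, keyword, window=3):
    """Find keyword hits in the source side and keep the aligned target sentence."""
    hits = []
    for source, target in pairs:
        tokens = source.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right, target))
    return hits


english = [source for source, _ in ALIGNED]
bigram_counts = ngram_frequencies(english, 2)
print(bigram_counts.most_common(3))

for left, keyword, right, target in concordance(ALIGNED, "states"):
    print(f"{left} [{keyword}] {right}  |  {target}")
```

In a parallel corpus, each concordance hit carries its aligned translation, which is what allows a search in one language to be read side by side with the other.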
Bibliographic references
For a more detailed description of the DGT-TM, including more statistics on the resource, see the following publication. When making reference to DGT-TM in scientific publications, please refer to:
Steinberger, R., Eisele, A., Klocek, S., Pilos, S., & Schlüter, P. (2013). DGT-TM: A freely available translation memory in 22 languages. arXiv preprint arXiv:1309.5226.
For a contrastive overview of DGT-TM and the other multilingual text resources offered for download on this site, you can read the following journal article:
Steinberger, R., Ebrahim, M., Poulis, A., Carrasco-Benitez, M., Schlüter, P., Przybyszewski, M., & Gilbro, S. (2014). An overview of the European Union’s highly multilingual parallel corpora. Language Resources and Evaluation, 48(4), 679-707.
Search the DGT Translation Memory
Sketch Engine offers a range of tools to work with the DGT Translation Memory parallel corpus.
Tip
Learn to work with multilingual and parallel corpora in Sketch Engine. Refer to the user guide.