Treebanks in translation studies - the (CroCo) Dependency Treebank

Čulo, O., & Hansen-Schirra, S.
The CroCo Dependency Treebank comprises a collection of parallel texts: English and German originals from eight different registers, together with their German and English translations respectively. In addition to the original multi-layer annotation and alignment of the CroCo Corpus (part-of-speech and phrase structure), we added treebank information (dependencies) to a sample of the parallel texts and aligned the nodes of the trees. This deep annotation and alignment allows us to query the corpus for crossing edges (e.g. aligned word pairs that realize different syntactic functions in the source and target text) as well as for dropped leaves and cut branches (e.g. words or phrases that have no aligned counterpart or only incomplete alignments). On this basis, translation shifts on various linguistic levels, and combinations thereof, can be extracted and classified automatically. We examine patterns of this kind and discuss the factors named so far as possible triggers of shifts: register, grammatical contrast and typical translation strategies. Commonalities and differences in valence between English and German are discussed as a possible dimension for the categorisation of shifts.
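
The two query types can be illustrated with a minimal sketch under simplifying assumptions. It does not reproduce the CroCo annotation format or query tools: the `Token` class, the function names, the dependency labels and the toy sentence pair are all hypothetical and chosen only for illustration. A crossing edge is taken here to be an aligned word pair whose dependency labels differ, and a dropped leaf to be a token that takes part in no alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # position of the word in its sentence
    form: str    # surface form
    deprel: str  # syntactic function (hypothetical label set, not CroCo's)


def crossing_edges(source, target, alignment):
    """Return aligned (source, target) pairs whose syntactic functions differ."""
    src = {t.idx: t for t in source}
    tgt = {t.idx: t for t in target}
    return [(src[s], tgt[t]) for s, t in alignment
            if src[s].deprel != tgt[t].deprel]


def dropped_leaves(tokens, aligned_indices):
    """Return tokens that have no aligned counterpart."""
    return [t for t in tokens if t.idx not in aligned_indices]


# Invented toy pair: "I really like swimming" / "Ich schwimme gern"
english = [Token(0, "I", "subj"), Token(1, "really", "adv"),
           Token(2, "like", "root"), Token(3, "swimming", "obj")]
german = [Token(0, "Ich", "subj"), Token(1, "schwimme", "root"),
          Token(2, "gern", "adv")]

# Word alignment as (English index, German index) pairs; "really" is unaligned.
alignment = [(0, 0), (2, 2), (3, 1)]

shifts = crossing_edges(english, german, alignment)
dropped_src = dropped_leaves(english, {s for s, _ in alignment})
dropped_tgt = dropped_leaves(german, {t for _, t in alignment})

for s, t in shifts:
    print(f"{s.form} ({s.deprel}) <-> {t.form} ({t.deprel})")
print("unaligned source words:", [t.form for t in dropped_src])
```

In this toy pair, "like" and "gern" are aligned but carry different functions, as are "swimming" and "schwimme", while "really" has no counterpart. Cut branches, i.e. phrases that are only partially aligned, would require phrase-level alignment and are not covered by this sketch.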


This work was partially funded by the Deutsche Forschungsgemeinschaft.

