Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/776> ?p ?o. }
Showing items 1 to 13 of 13 with 100 items per page.
- 776 creator heiki-jaan-kaalep.
- 776 creator mark-fishel.
- 776 type InProceedings.
- 776 label "Experiments on Processing Overlapping Parallel Corpora".
- 776 sameAs 776.
- 776 abstract "The number and sizes of parallel corpora keep growing, which makes it necessary to have automatic methods of processing them: combining, checking and improving corpora quality, etc. We here introduce a method which enables performing many of these by exploiting overlapping parallel corpora. The method finds the correspondence between sentence pairs in two corpora: first the corresponding language parts of the corpora are aligned and then the two resulting alignments are compared. The method takes into consideration slight differences in the source documents, different levels of segmentation of the input corpora, encoding differences and other aspects of the task. The paper describes two experiments conducted to test the method. In the first experiment, the Estonian-English part of the JRC-Acquis corpus was combined with another corpus of legislation texts. In the second experiment alternatively aligned versions of the JRC-Acquis are compared to each other with the example of all language pairs between English, Estonian and Latvian. Several additional conclusions about the corpora can be drawn from the results. The method proves to be effective for several parallel corpora processing tasks.".
- 776 hasAuthorList authorList.
- 776 hasTopic Linguistics.
- 776 isPartOf proceedings.
- 776 keyword "Corpus (creation, annotation, etc.)".
- 776 keyword "Tools, systems, applications".
- 776 keyword "Validation of LRs".
- 776 title "Experiments on Processing Overlapping Parallel Corpora".
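
The pattern at the top of this page, `{ <subject> ?p ?o. }`, matches every triple whose subject is the paper's IRI. As a minimal sketch, the same pattern can be written as a SPARQL SELECT query string. The function name `subject_pattern_query` and the default `limit` of 100 (chosen to mirror the page size shown above) are assumptions made for illustration. No endpoint is contacted, and none is assumed to exist at a particular address.

```python
def subject_pattern_query(subject_iri: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT query for the pattern { <subject> ?p ?o. }.

    Hypothetical helper: it only formats a query string and sends nothing.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in subject_iri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not a valid IRI: {subject_iri!r}")
    return f"SELECT ?p ?o WHERE {{ <{subject_iri}> ?p ?o . }} LIMIT {limit}"


# The subject IRI is taken from the pattern shown at the top of this page.
query = subject_pattern_query(
    "https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/776"
)
print(query)
```

The resulting string could be passed to any SPARQL client. Each returned `?p ?o` pair would correspond to one item in the list above.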