Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/493> ?p ?o. }
Showing items 1 to 16 of 16 with 100 items per page.
- 493 creator cam-tu-nguyen.
- 493 creator hong-phuong-le.
- 493 creator mathias-rossignol.
- 493 creator quang-thang-dinh.
- 493 creator thi-minh-huyen-nguyen.
- 493 creator xuan-luong-vu.
- 493 type InProceedings.
- 493 label "Word Segmentation of Vietnamese Texts: a Comparison of Approaches".
- 493 sameAs 493.
- 493 abstract "We present in this paper a comparison between three segmentation systems for the Vietnamese language. Indeed, the majority of Vietnamese words is built by semantic composition from about 7,000 syllables, which also have a meaning as isolated words. So the identification of word boundaries in a text is not a simple task, and ambiguities often appear. Beyond the presentation of the tested systems, we also propose a standard definition for word segmentation in Vietnamese, and introduce a reference corpus developed for the purpose of evaluating such a task. The results observed confirm that it can be relatively well treated by automatic means, although a solution needs to be found to take into account out-of-vocabulary words.".
- 493 hasAuthorList authorList.
- 493 hasTopic Linguistics.
- 493 isPartOf proceedings.
- 493 keyword "Corpus (creation, annotation, etc.)".
- 493 keyword "Other".
- 493 title "Word Segmentation of Vietnamese Texts: a Comparison of Approaches".
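
The pattern in the header, `{ <subject> ?p ?o. }`, fixes the subject and leaves the predicate and object as variables, which is why every line above shares the same subject. A minimal sketch of that matching over an in-memory list follows. The helper names `match` and `group_by_predicate` are hypothetical, and the predicates and objects are the shortened labels shown in the listing rather than full IRIs, which this page does not display.

```python
from collections import defaultdict

SUBJECT = "https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/493"

# A subset of the triples listed above. Predicate and object values are the
# shortened labels from the listing (assumption: full IRIs are not shown here).
TRIPLES = [
    (SUBJECT, "creator", "cam-tu-nguyen"),
    (SUBJECT, "creator", "hong-phuong-le"),
    (SUBJECT, "type", "InProceedings"),
    (SUBJECT, "hasTopic", "Linguistics"),
    (SUBJECT, "keyword", "Other"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def group_by_predicate(triples):
    """Collect object values under each predicate, preserving listing order."""
    grouped = defaultdict(list)
    for _, predicate, obj in triples:
        grouped[predicate].append(obj)
    return dict(grouped)


# Equivalent of { <SUBJECT> ?p ?o. } over the sample data.
matches = match(TRIPLES, s=SUBJECT)
by_predicate = group_by_predicate(matches)
```

Here `by_predicate["creator"]` holds the creator labels in the order they were listed, and passing `p="type"` instead of `s=SUBJECT` narrows the pattern to a single predicate.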