Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/847> ?p ?o. }
Showing items 1 to 19 of 19 with 100 items per page.
- 847 creator attila-almasi.
- 847 creator csaba-hatvani.
- 847 creator dora-szauter.
- 847 creator gyoergy-szarvas.
- 847 creator janos-csirik.
- 847 creator richard-farkas.
- 847 creator robert-ormandi.
- 847 creator veronika-vincze.
- 847 type InProceedings.
- 847 label "Hungarian Word-Sense Disambiguated Corpus".
- 847 sameAs 847.
- 847 abstract "To create the first Hungarian WSD corpus, 39 suitable word form samples were selected for the purpose of word sense disambiguation. Among others, selection criteria required the given word form to be frequent in Hungarian language usage, and to have more than one sense considered frequent in usage. HNC and its Heti Világgazdaság subcorpus provided the basis for corpus text selection. This way, each sample has a relevant context (whole article), and information on the lemma, POS-tagging and automatic tokenization is also available. When planning the corpus, 300-500 samples of each word form were to be annotated. This size makes it possible that the subcorpora prepared for the individual word forms can be compared to data available for other languages. However, the finalized database also contains unannotated samples and samples with single annotation, which were annotated only by one of the linguists. The corpus follows the ACL's SensEval/SemEval WSD tasks format. The first version of the corpus was developed within the scope of the project titled The construction Hungarian WordNet Ontology and its application in Information Extraction Systems (Hatvani et al., 2007). The corpus for research and educational purposes is available and can be downloaded free of charge.".
- 847 hasAuthorList authorList.
- 847 hasTopic Linguistics.
- 847 isPartOf proceedings.
- 847 keyword "Corpus (creation, annotation, etc.)".
- 847 keyword "Semantics".
- 847 keyword "Word Sense Disambiguation".
- 847 title "Hungarian Word-Sense Disambiguated Corpus".