Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/892> ?p ?o. }
Showing items 1 to 14 of 14 with 100 items per page.
- 892 creator amiya-nayak.
- 892 creator diana-inkpen.
- 892 creator leanne-spracklin.
- 892 type InProceedings.
- 892 label "Using the Complexity of the Distribution of Lexical Elements as a Feature in Authorship Attribution".
- 892 sameAs 892.
- 892 abstract "Traditional Authorship Attribution models extract normalized counts of lexical elements such as nouns, common words and punctuation and use these normalized counts or ratios as features for author fingerprinting. The text is viewed as a bag-of-words and the order of words and their position relative to other words is largely ignored. We propose a new method of feature extraction which quantifies the distribution of lexical elements within the text using Kolmogorov complexity estimates. Testing carried out on blog corpora indicates that such measures outperform ratios when used as features in an SVM authorship attribution model. Moreover, by adding complexity estimates to a model using ratios, we were able to increase the F-measure by 5.2-11.8%".
- 892 hasAuthorList authorList.
- 892 hasTopic Linguistics.
- 892 isPartOf proceedings.
- 892 keyword "Document Classification, Text categorisation".
- 892 keyword "Information Extraction, Information Retrieval".
- 892 keyword "Text mining".
- 892 title "Using the Complexity of the Distribution of Lexical Elements as a Feature in Authorship Attribution".
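
The triple pattern at the top of this listing, { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/892> ?p ?o. }, can also be written as a SPARQL SELECT query. The following is a minimal sketch, not part of the ScholarlyData interface: the helper name build_pattern_query and its limit parameter are hypothetical, chosen only for illustration, and no endpoint is contacted. The default limit of 100 mirrors the page size shown above.

```python
# Subject URI copied from the triple pattern at the top of the listing.
SUBJECT = "https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/892"


def build_pattern_query(subject: str, limit: int = 100) -> str:
    """Build a SPARQL query returning every predicate/object pair of `subject`.

    Hypothetical helper for illustration; it only assembles a query string.
    """
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{subject}> ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_pattern_query(SUBJECT)
print(query)
```

Run against a SPARQL endpoint that serves this dataset, a query of this shape would be expected to return the same predicate/object pairs as the 14 items listed above (creator, type, label, abstract, and so on).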