Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/407> ?p ?o. }
Showing items 1 to 14 of 14 with 100 items per page.
- 407 creator abdel-rahim-madany.
- 407 creator hossam-ibrahim.
- 407 creator kareem-darwish.
- 407 type InProceedings.
- 407 label "Automatic Extraction of Textual Elements from News Web Pages".
- 407 sameAs 407.
- 407 abstract "In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a supervised machine learning classification technique based on the use of a Support Vector Machine (SVM) classifier to extract the desired textual elements. The technique uses internal structural features of a webpage without relying on the Document Object Model to which many content authors fail to adhere. The classifier uses a set of features which rely on the length of text, the percentage of hypertext, etc. The resulting classifier is nearly perfect on previously unseen news pages from different sites. The proposed technique is successfully employed in Alzoa.com, which is the largest Arabic news aggregator on the web.".
- 407 hasAuthorList authorList.
- 407 hasTopic Linguistics.
- 407 isPartOf proceedings.
- 407 keyword "Corpus (creation, annotation, etc.)".
- 407 keyword "LR web services".
- 407 keyword "Text mining".
- 407 title "Automatic Extraction of Textual Elements from News Web Pages".
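
The listing above is the result of matching the triple pattern `<subject> ?p ?o`, where `?p` and `?o` are variables. A minimal sketch of that matching over an in-memory list follows. It assumes a few of the triples are copied by hand in the abbreviated form shown on the page (the full predicate and object IRIs are not given there), and the helper names `match` and `group_by_predicate` are hypothetical, not part of any ScholarlyData API.

```python
from collections import defaultdict

# Subject IRI taken from the triple pattern at the top of the listing.
SUBJECT = "https://w3id.org/scholarlydata/inproceedings/lrec2008/papers/407"

# A hand-copied subset of the listed triples. Predicates and objects keep
# the abbreviated labels shown on the page (assumption: full IRIs unknown).
TRIPLES = [
    (SUBJECT, "creator", "abdel-rahim-madany"),
    (SUBJECT, "creator", "hossam-ibrahim"),
    (SUBJECT, "creator", "kareem-darwish"),
    (SUBJECT, "type", "InProceedings"),
    (SUBJECT, "keyword", "Corpus (creation, annotation, etc.)"),
    (SUBJECT, "keyword", "LR web services"),
    (SUBJECT, "keyword", "Text mining"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def group_by_predicate(triples):
    """Collect the objects of each predicate, preserving listing order."""
    grouped = defaultdict(list)
    for _, predicate, obj in triples:
        grouped[predicate].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    # Equivalent of { <SUBJECT> ?p ?o. } over the local subset.
    for predicate, objects in group_by_predicate(match(TRIPLES, s=SUBJECT)).items():
        print(predicate, objects)
```

Fixing only the subject returns every triple about the paper, which is what the page displays; fixing the predicate as well (for example `p="keyword"`) narrows the result to a single property.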