Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2010/paper/main/254> ?p ?o. }
Showing items 1 to 15 of 15 with 100 items per page.
- 254 creator wenzhao-tan.
- 254 creator xiao-li.
- 254 creator xiaoxin-yin.
- 254 creator yi-chin-tu.
- 254 type InProceedings.
- 254 label "Automatic Extraction of Clickable Structured Web Contents for Name Entity Queries".
- 254 sameAs 254.
- 254 abstract "Today the major web search engines answer queries by showing ten result snippets, which need to be inspected by users for identifying relevant results. In this paper we investigate how to extract structured information from the web, in order to directly answer queries by showing the contents being searched for. We treat users’ search trails (i.e., post-search browsing behaviors) as implicit labels on the relevance between web contents and user queries. Based on such labels we use information extraction approach to build wrappers and extract structured information. An important observation is that many web sites contain pages for name entities of certain categories (e.g., AOL Music contains a page for each musician), and these pages have the same format. This makes it possible to build wrappers from a small amount of implicit labels, and use them to extract structured information from many web pages for different name entities. We propose STRUCLICK, a fully automated system for extracting structured information for queries containing name entities of certain categories. It can identify important web sites from web search logs, build wrappers from users’ search trails, filter out bad wrappers built from random user clicks, and combine structured information from different web sites for each query. Comparing with existing approaches on information extraction, STRUCLICK can assign semantics to extracted data without any human labeling or supervision. We perform comprehensive experiments, which show STRUCLICK achieves high accuracy and good scalability.".
- 254 hasAuthorList authorList.
- 254 isPartOf proceedings.
- 254 keyword "Semantic search".
- 254 keyword "entity retrieval".
- 254 keyword "geo/temporal search".
- 254 keyword "sub/super-documents".
- 254 title "Automatic Extraction of Clickable Structured Web Contents for Name Entity Queries".
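
The pattern in the first line fixes the subject and leaves `?p` and `?o` as variables, so every triple about paper 254 matches. A minimal in-memory sketch of that matching, using a few of the triples listed above, might look like the following. The `Triple` type, the `match` helper, and the `other-paper` row are hypothetical and only for illustration; predicates and objects are shortened to the local names shown in the listing rather than their full IRIs, and nothing here calls the ScholarlyData service.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


PAPER = "https://w3id.org/scholarlydata/inproceedings/www2010/paper/main/254"

# A handful of triples copied from the listing, with shortened names (assumption).
TRIPLES = [
    Triple(PAPER, "creator", "xiao-li"),
    Triple(PAPER, "type", "InProceedings"),
    Triple(PAPER, "keyword", "Semantic search"),
    Triple(PAPER, "keyword", "entity retrieval"),
    # Hypothetical triple about another subject, to show it is filtered out.
    Triple("other-paper", "keyword", "Semantic search"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the pattern; None plays the role of a variable."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Equivalent of { <PAPER> ?p ?o. }: subject fixed, predicate and object free.
matches = match(TRIPLES, subject=PAPER)

# Narrowing the same pattern to one predicate.
keywords = [t.obj for t in match(TRIPLES, subject=PAPER, predicate="keyword")]

for t in matches:
    print(f"- 254 {t.predicate} {t.obj}.")
```

Fixing more positions of the pattern narrows the result: with the predicate also set to `keyword`, only the keyword triples for this paper remain.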