Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2007/paper/main/790> ?p ?o. }
Showing items 1 to 12 of 12 with 100 items per page.
- 790 creator bernhard-kruepl.
- 790 creator bernhard-pollak.
- 790 creator marcus-herzog.
- 790 creator paul-bohunsky.
- 790 creator wolfgang-gatterbauer.
- 790 type InProceedings.
- 790 label "Domain Independent Information Extraction from Web Tables".
- 790 sameAs 790.
- 790 abstract "Traditionally, information extraction from web tables has focused on small more or less homogeneous corpora, often based on assumptions about the semantics and use of <table> tags. A multitude of implementation forms of tables render these approaches difficult to scale. In this paper, we address the problem of domain-independent information extraction from web tables by shifting the focus from the tree-based representation of web pages to the 2-dimensional representation as intended by human authors for human readers. This additional visual information would allow us to fill the gap between syntax and domain dependent semantic. We show that this approach gives us a new set of features, which allow each of the steps of table location, recognition and interpretation to work without any reliance on domain-specific knowledge or domain-specific table templates.".
- 790 hasAuthorList authorList.
- 790 isPartOf proceedings.
- 790 title "Domain Independent Information Extraction from Web Tables".
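
The listing above is the result of matching the triple pattern `{ <subject> ?p ?o. }`, where the subject is fixed and the predicate and object are left as variables. A minimal sketch of that kind of matching over an in-memory list is shown below. It assumes the abbreviated predicate and object labels exactly as displayed above (not their full IRIs), covers only a subset of the twelve triples, and uses a hypothetical helper named `match_pattern`; it is not how the ScholarlyData server itself is implemented.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

SUBJECT = "https://w3id.org/scholarlydata/inproceedings/www2007/paper/main/790"

# Subset of the triples listed above. Predicates and objects are the short
# labels as displayed in the listing, NOT full IRIs (an assumption made for
# illustration only).
TRIPLES: List[Triple] = [
    (SUBJECT, "creator", "bernhard-kruepl"),
    (SUBJECT, "creator", "bernhard-pollak"),
    (SUBJECT, "creator", "marcus-herzog"),
    (SUBJECT, "creator", "paul-bohunsky"),
    (SUBJECT, "creator", "wolfgang-gatterbauer"),
    (SUBJECT, "type", "InProceedings"),
    (SUBJECT, "title", "Domain Independent Information Extraction from Web Tables"),
]


def match_pattern(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a variable (?s, ?p, ?o)."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Equivalent of { <SUBJECT> ?p ?o. }: fix the subject, leave the rest open.
    for _, predicate, obj in match_pattern(TRIPLES, s=SUBJECT):
        print(f"- 790 {predicate} {obj}.")
```

Fixing an additional position narrows the result: for example, `match_pattern(TRIPLES, s=SUBJECT, p="creator")` keeps only the creator triples.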