Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2012/paper/773> ?p ?o. }
Showing items 1 to 16 of 16, with 100 items per page.
- 773 creator bharath-balakrishnan.
- 773 creator ganesh-ramakrishnan.
- 773 creator rohit-saraf.
- 773 creator sasidhar-kasturi.
- 773 creator soumen-chakrabarti.
- 773 type InProceedings.
- 773 label "Compressed Data Structures for Annotated Web Search".
- 773 sameAs 773.
- 773 abstract "Entity relationship search at Web scale depends on adding dozens of entity annotations to each of billions of crawled pages and indexing the annotations at rates comparable to regular text indexing. Even small entity search benchmarks from TREC and INEX suggest that the entity catalog support thousands of entity types and tens to hundreds of millions of entities. The above targets raise many challenges, major ones being the design of highly compressed data structures in RAM for spotting and disambiguating entity mentions, and highly compressed disk-based annotation indices. These data structures cannot be readily built upon standard inverted indices. Here we present the fastest known Web scale entity annotator. Using a new workload-sensitive compressed multilevel map, we fit statistical disambiguation models for millions of entities within 1.1GB of RAM, and spend about 0.6 core-milliseconds per disambiguation. In contrast, DBPedia Spotlight spends 158 milliseconds, Wikipedia Miner spends 21 milliseconds, and Zemanta spends 9.5 milliseconds. Our annotation indices use ideas from vertical databases to reduce storage by 30%. On 40x8 cores with 40x3 disk spindles, we can annotate and index a billion Web pages with Wikipedia's two million entities and over 200,000 types in about a day. Index decompression and scan speed is comparable to MG4J.".
- 773 hasAuthorList authorList.
- 773 isPartOf proceedings.
- 773 keyword "annotation index".
- 773 keyword "compressed data structures".
- 773 keyword "entity catalog".
- 773 keyword "semantic annotation".
- 773 title "Compressed Data Structures for Annotated Web Search".
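The listing above answers a single triple pattern: the subject is fixed to paper 773, and the predicate and object are variables (`?p ?o`). The following is a minimal sketch of that matching step. It makes three assumptions. First, triples are held in memory as plain tuples. Second, they use the shortened names displayed in the listing rather than full IRIs. Third, they cover only an excerpt of the 16 results. The helpers `match` and `group_by_predicate` are hypothetical illustrations, not part of any Linked Data Fragments client or server API.

```python
from collections import defaultdict
from typing import Optional

# A triple is (subject, predicate, object). Names are the shortened local
# names shown in the listing, not full IRIs (an assumption for brevity).
Triple = tuple[str, str, str]

# Excerpt of the 16 matching triples (abstract, label, sameAs, etc. omitted).
TRIPLES: list[Triple] = [
    ("773", "creator", "bharath-balakrishnan"),
    ("773", "creator", "ganesh-ramakrishnan"),
    ("773", "creator", "rohit-saraf"),
    ("773", "creator", "sasidhar-kasturi"),
    ("773", "creator", "soumen-chakrabarti"),
    ("773", "type", "InProceedings"),
    ("773", "keyword", "annotation index"),
    ("773", "keyword", "compressed data structures"),
    ("773", "keyword", "entity catalog"),
    ("773", "keyword", "semantic annotation"),
    ("773", "title", "Compressed Data Structures for Annotated Web Search"),
]


def match(triples: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Hypothetical helper: return triples matching a pattern.

    A None position acts as a variable, like ?p or ?o in the query.
    """
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def group_by_predicate(triples: list[Triple]) -> dict[str, list[str]]:
    """Hypothetical helper: collect object values under each predicate."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for _, pred, obj in triples:
        grouped[pred].append(obj)
    return dict(grouped)


# Equivalent of the pattern { 773 ?p ?o . } over the excerpt.
matches = match(TRIPLES, s="773")
by_pred = group_by_predicate(matches)
print(len(matches))                # 11 (this excerpt only, not all 16)
print(sorted(by_pred["keyword"]))
```

A real Triple Pattern Fragments server also pages its results (here, 100 items per page) and returns count metadata and hypermedia controls alongside the data. This sketch omits both and only shows the pattern-matching semantics.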