Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2010/paper/main/317> ?p ?o. }
Showing items 1 to 14 of 14 with 100 items per page.
- 317 creator josh-attenberg.
- 317 creator shuai-ding.
- 317 creator torsten-suel.
- 317 type InProceedings.
- 317 label "Scalable Techniques for Document Identifier Assignment in Inverted Indexes".
- 317 sameAs 317.
- 317 abstract "Web search engines are based on a full-text data structure called an inverted index. The size of the inverted index is a major performance bottleneck during query processing, and a large amount of research has focused on fast and effective techniques for compressing this structure. Several authors have recently proposed techniques for improving index compression by optimizing the assignment of document identifiers to the documents in the collection, leading to significant improvements in overall index size. In this paper, we propose improved techniques for document identifier assignment. Previous work includes simple and fast heuristics such as sorting by URL, as well as more involved approaches based on Travelling Salesman or graph partitioning problems that achieve good compression but do not scale to larger document collections. We propose a new framework based on performing a Travelling Salesman computation on a reduced sparse graph obtained using Locally Sensitive Hashing, which achieves improved compression while scaling to tens of millions of documents. Based on this framework, we describe a number of new algorithms, and perform a detailed evaluation on three large data sets showing improvements in index size.".
- 317 hasAuthorList authorList.
- 317 isPartOf proceedings.
- 317 keyword "Indexing".
- 317 keyword "caching".
- 317 keyword "distribution".
- 317 keyword "index compression".
- 317 title "Scalable Techniques for Document Identifier Assignment in Inverted Indexes".