Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2008/paper/103> ?p ?o. }
Showing items 1 to 14 of 14 with 100 items per page.
- 103 creator chuan-xiao.
- 103 creator jeffrey-xu-yu.
- 103 creator wei-wang.
- 103 creator xuemin-lin.
- 103 type InProceedings.
- 103 label "Efficient Similarity Joins for Near Duplicate Detection".
- 103 sameAs 103.
- 103 abstract "With the increasing amount of data and the need to integrate data from multiple data sources, one of the challenging issues is to find near duplicate records efficiently. In this paper, we focus on efficient algorithms to find pair of records such that their similarities are above a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques by exploiting the ordering information; they are integrated into the existing methods and drastically reduce the candidate sizes and hence improve the efficiency. Experimental results show our proposed algorithms can achieve up to 5x speed-up over previous algorithms on several real datasets and provide alternative solutions to the near duplicate Web page detection problem.".
- 103 hasAuthorList authorList.
- 103 hasTopic World_Wide_Web.
- 103 isPartOf proceedings.
- 103 keyword "Near Duplicate Detection".
- 103 keyword "Set Similarity Join".
- 103 title "Efficient Similarity Joins for Near Duplicate Detection".
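
The listing above is the result of matching the triple pattern { <subject> ?p ?o } with the subject fixed and the predicate and object left as variables. A minimal sketch of that kind of matching follows, assuming an in-memory list of triples. It copies a few of the rows above using the page's shortened labels rather than full URIs, and the helper names `match` and `group_by_predicate` are hypothetical, not part of any ScholarlyData API.

```python
from collections import defaultdict

SUBJECT = "https://w3id.org/scholarlydata/inproceedings/www2008/paper/103"

# A subset of the rows listed above, with the page's shortened labels
# standing in for the full predicate and object URIs (an assumption).
TRIPLES = [
    (SUBJECT, "creator", "chuan-xiao"),
    (SUBJECT, "creator", "jeffrey-xu-yu"),
    (SUBJECT, "creator", "wei-wang"),
    (SUBJECT, "creator", "xuemin-lin"),
    (SUBJECT, "type", "InProceedings"),
    (SUBJECT, "keyword", "Near Duplicate Detection"),
    (SUBJECT, "keyword", "Set Similarity Join"),
    (SUBJECT, "title", "Efficient Similarity Joins for Near Duplicate Detection"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def group_by_predicate(triples):
    """Collect the objects of each predicate, preserving input order."""
    grouped = defaultdict(list)
    for _, predicate, obj in triples:
        grouped[predicate].append(obj)
    return dict(grouped)


# Equivalent of { <SUBJECT> ?p ?o } over the sample data.
matches = match(TRIPLES, s=SUBJECT)
by_predicate = group_by_predicate(matches)
```

Grouping by predicate makes multi-valued properties such as creator and keyword easy to read as lists, while single-valued ones such as title become one-element lists.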