Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/wac7/data.semanticweb.org/workshop/wac7/2012/paper/7> ?p ?o. }
Showing items 1 to 13 of 13 with 100 items per page.
- 7 creator ian-auty.
- 7 creator oliver-charles.
- 7 creator paul-rayson.
- 7 type InProceedings.
- 7 label "Can Google count? Estimating search engine result consistency".
- 7 sameAs 7.
- 7 abstract "In the last ten years, corpus and computational linguists have begun to source language samples from the Web. A standard pipeline has emerged for compilation of the ‘web as corpus’ including crawling, filtering, de-duplication, tokenising, indexing etc. However, there are certain areas where building a large enough corpus even from the web is not feasible, and it is tempting to use result counts derived from search engines to overcome the sparse data problem. In this paper we explore the stability of these search engine result counts for both multiword expressions and single words. Commercial search engines employ a range of techniques to estimate the counts, and thus it is important that researchers understand the implications and how to minimize this instability. Through a variety of different experiments and analysis, we investigate exactly how this stability manifests itself, and conclude with a set of guidelines on how future projects can ensure they are using accurate frequency data from search engines. Search engine result reliability will also have impact on corpora sourced from the web using the web as corpus paradigm.".
- 7 hasAuthorList authorList.
- 7 isPartOf proceedings.
- 7 keyword "Web as corpus".
- 7 keyword "frequency".
- 7 keyword "search engines".
- 7 title "Can Google count? Estimating search engine result consistency".
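
The triple pattern in the header, `{ <subject> ?p ?o. }`, matches every predicate/object pair of the paper's resource. As a minimal sketch, the same pattern can be wrapped in a SPARQL `SELECT` query. The helper name `build_select_query` and the IRI sanity check are illustrative assumptions, not part of ScholarlyData; the sketch only builds the query string and does not assume any particular endpoint.

```python
# Subject IRI copied from the triple pattern in the header above.
SUBJECT = (
    "https://w3id.org/scholarlydata/inproceedings/wac7/"
    "data.semanticweb.org/workshop/wac7/2012/paper/7"
)

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\ ')


def build_select_query(subject_iri: str) -> str:
    """Return a SPARQL query listing every (?p, ?o) pair for subject_iri.

    Hypothetical helper for illustration: it mirrors the pattern
    { <subject> ?p ?o. } shown in the listing.
    """
    if any(ch in _FORBIDDEN for ch in subject_iri):
        raise ValueError(f"not a valid IRI reference: {subject_iri!r}")
    return f"SELECT ?p ?o WHERE {{ <{subject_iri}> ?p ?o . }}"


query = build_select_query(SUBJECT)
print(query)
```

Run against a SPARQL endpoint that serves this dataset, a query of this shape would be expected to return the same 13 predicate/object pairs listed above (creator, type, label, sameAs, abstract, hasAuthorList, isPartOf, keyword, title).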