Matches in DBpedia 2014 for { <http://dbpedia.org/resource/Focused_crawler> ?p ?o. }
Showing items 1 to 30 of 30 with 100 items per page.
- Focused_crawler abstract "A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic and surface properties. For example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages with large PageRank", or "crawl pages about baseball". An important page property pertains to topics, leading to topical crawlers. For example, a topical crawler may be deployed to collect pages about solar power, or swine flu, while minimizing resources spent fetching pages on other topics. Crawl frontier management may not be the only device used by focused crawlers; they may use a Web directory, a Web text index, backlinks, or any other Web artifact. A focused crawler must predict the probability that an unvisited page will be relevant before actually downloading the page. A possible predictor is the anchor text of links; this was the approach taken by Pinkerton in a crawler developed in the early days of the Web. Topical crawling was first introduced by Menczer. Chakrabarti et al. coined the term focused crawler and used a text classifier to prioritize the crawl frontier. Andrew McCallum and co-authors also used reinforcement learning to focus crawlers. Diligenti et al. traced the context graph leading up to relevant pages, and their text content, to train classifiers. A form of online reinforcement learning has been used along with features extracted from the DOM tree and text of linking pages, to continually train classifiers that guide the crawl. In a review of topical crawling algorithms, Menczer et al. show that such simple strategies are very effective for short crawls, while more sophisticated techniques such as reinforcement learning and evolutionary adaptation can give the best performance over longer crawls. Crawlers are also focused on page properties other than topics. Cho et al. study a variety of crawl prioritization policies and their effects on the link popularity of fetched pages. Najork and Wiener show that breadth-first crawling, starting from popular seed pages, leads to collecting large-PageRank pages early in the crawl. Refinements involving detection of stale (poorly maintained) pages have been reported by Eiron et al. The performance of a focused crawler depends on the richness of links in the specific topic being searched, and focused crawling usually relies on a general web search engine for providing starting points. Davison presented studies on Web links and text that explain why focused crawling succeeds on broad topics; similar studies were presented by Chakrabarti et al. Seed selection can be important for focused crawlers and significantly influence the crawling efficiency. A whitelist strategy is to start the focused crawl from a list of high-quality seed URLs and limit the crawling scope to the domains of these URLs. These high-quality seeds should be selected based on a list of URL candidates which are accumulated over a sufficiently long period of general web crawling. The whitelist should be updated periodically after it is created.".
- Focused_crawler wikiPageExternalLink the-url-frontier-1.html.
- Focused_crawler wikiPageID "11442799".
- Focused_crawler wikiPageRevisionID "584616280".
- Focused_crawler hasPhotoCollection Focused_crawler.
- Focused_crawler subject Category:Internet_search_algorithms.
- Focused_crawler subject Category:Web_crawlers.
- Focused_crawler subject Category:World_Wide_Web.
- Focused_crawler type CausalAgent100007347.
- Focused_crawler type Flatterer110095869.
- Focused_crawler type Follower110099375.
- Focused_crawler type LivingThing100004258.
- Focused_crawler type Object100002684.
- Focused_crawler type Organism100004475.
- Focused_crawler type Person100007846.
- Focused_crawler type PhysicalEntity100001930.
- Focused_crawler type Sycophant110684827.
- Focused_crawler type WebCrawlers.
- Focused_crawler type Whole100003553.
- Focused_crawler type YagoLegalActor.
- Focused_crawler type YagoLegalActorGeo.
- Focused_crawler comment "A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic and surface properties. For example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages with large PageRank", or "crawl pages about baseball".".
- Focused_crawler label "Focused crawler".
- Focused_crawler label "الزاحف المركز".
- Focused_crawler sameAs m.02rct99.
- Focused_crawler sameAs Q5463958.
- Focused_crawler sameAs Q5463958.
- Focused_crawler sameAs Focused_crawler.
- Focused_crawler wasDerivedFrom Focused_crawler?oldid=584616280.
- Focused_crawler isPrimaryTopicOf Focused_crawler.
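
The abstract above describes prioritizing the crawl frontier, using link anchor text as one possible relevance predictor, and a whitelist strategy that limits the crawl to the seed domains. The following is a minimal sketch of how those three ideas could fit together, not the method of any crawler named in the abstract. Everything in it is an assumption made for illustration: the in-memory `TOY_WEB` link graph (no pages are fetched), the `TOPIC_TERMS` set, and the helper names `anchor_score` and `focused_crawl`.

```python
import heapq
from urllib.parse import urlparse

# Hypothetical toy link graph: page URL -> list of (anchor text, target URL).
# A real crawler would download and parse pages instead.
TOY_WEB = {
    "http://a.example/": [
        ("solar power basics", "http://a.example/solar"),
        ("celebrity gossip", "http://a.example/gossip"),
        ("solar panels shop", "http://b.example/panels"),
    ],
    "http://a.example/solar": [
        ("solar power storage", "http://a.example/storage"),
    ],
    "http://a.example/gossip": [],
    "http://a.example/storage": [],
    "http://b.example/panels": [],
}

# Assumed topic vocabulary; a text classifier could replace this.
TOPIC_TERMS = {"solar", "power"}


def anchor_score(anchor_text):
    """Fraction of anchor-text words that are topic terms (a crude predictor)."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


def focused_crawl(seeds, max_pages):
    """Visit up to max_pages URLs, always expanding the best-scored frontier entry."""
    # Whitelist strategy: only follow links into the domains of the seed URLs.
    allowed_hosts = {urlparse(s).netloc for s in seeds}
    # heapq is a min-heap, so scores are negated; the counter keeps
    # equal-scored entries in insertion order.
    frontier = []
    counter = 0
    for seed in seeds:
        heapq.heappush(frontier, (-1.0, counter, seed))
        counter += 1
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in TOY_WEB.get(url, []):
            if target in seen or urlparse(target).netloc not in allowed_hosts:
                continue
            seen.add(target)
            # The score is predicted from the anchor text, before "downloading".
            heapq.heappush(frontier, (-anchor_score(anchor), counter, target))
            counter += 1
    return visited


if __name__ == "__main__":
    for page in focused_crawl(["http://a.example/"], max_pages=3):
        print(page)
```

With a budget of three pages, the sketch reaches the two on-topic pages before the off-topic one, and the link into `b.example` is never followed because that host is not among the seed domains. Replacing `anchor_score` with a trained classifier would move the sketch closer to the classifier-guided approaches the abstract mentions.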