Distributed and Collaborative Web Change Detection System

Use this link to cite
http://hdl.handle.net/2183/35047
Collections
- Investigación (FIC) [1635]
Metadata
Title
Distributed and Collaborative Web Change Detection System
Date
2015
Citation
V. M. Prieto, M. Alvarez, V. Carneiro, and F. Cacheda, “Distributed and collaborative web change detection system,” Computer Science and Information Systems, vol. 12, no. 1, pp. 91–114, 2015, Accessed: Jan. 22, 2024. [Online]. Available: DOI 10.2298/CSIS131120081P
Abstract
[Abstract]: Search engines use crawlers to traverse the Web in order to download
web pages and build their indexes. Keeping these indexes up to date is an
essential task to ensure the quality of search results. However, changes in web pages
are unpredictable. Detecting a change to a web page as soon as
possible and with minimal computational cost is a major challenge. In this article
we present the Web Change Detection system which, in the best-case scenario, is capable
of detecting, almost in real time, when a web page changes. In the worst-case scenario, it
will require, on average, 12 minutes to detect a change on a low-PageRank web site
and about one minute on a web site with high PageRank. Meanwhile, current search
engines require more than a day, on average, to detect a modification in a web page
(in both cases).
Keywords
Content refresh
Incremental crawling
Crawling systems and Search engines
ISSN
1820-0214
2406-1018