Distributed and Collaborative Web Change Detection System

Bibliographic citation

V. M. Prieto, M. Alvarez, V. Carneiro, and F. Cacheda, “Distributed and collaborative web change detection system,” Computer Science and Information Systems, vol. 12, no. 1, pp. 91–114, 2015, Accessed: Jan. 22, 2024. [Online]. Available: DOI 10.2298/CSIS131120081P

Abstract

[Abstract]: Search engines use crawlers to traverse the Web, download web pages, and build their indexes. Keeping these indexes up to date is essential to ensuring the quality of search results. However, changes in web pages are unpredictable. Identifying the moment a web page changes, as soon as possible and with minimal computational cost, is a major challenge. In this article we present the Web Change Detection system which, in the best-case scenario, is capable of detecting a change in a web page almost in real time. In the worst-case scenario, it requires, on average, 12 minutes to detect a change on a web site with low PageRank and about one minute on a web site with high PageRank. By contrast, current search engines require more than a day, on average, to detect a modification in a web page (in both the low and high PageRank cases).
