ITPilot: a toolkit for industrial-strength Web data extraction

Bibliographic citation

A. Pan, J. Raposo, M. Alvarez, P. Montoto, J. Losada, and J. Hidalgo, "ITPilot: A Toolkit for Industrial-Strength Web Data Extraction", in The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI’05), Compiègne, France: IEEE, 2005, pp. 798-801. doi: 10.1109/WI.2005.85

Type of academic work

Academic degree

Abstract

In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today's Web sources are "human-readable" but not "machine-readable", these systems must address a number of difficult challenges, such as dealing with complex navigation sequences, extracting data from HTML pages and reacting to source changes. Denodo Corporation has developed ITPilot, an industrial-strength solution that allows complex "wrappers" for Web sources to be graphically generated and automatically maintained. This paper presents the architecture and the basic ideas "behind the scenes" in ITPilot.

Description

© 2005 IEEE. This version of the paper has been accepted for publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Conference held from 19 to 22 September 2005 in Compiègne, France.

Rights

Copyright © 2005, IEEE