    • Distributed and Collaborative Web Change Detection System

      Prieto Álvarez, Víctor Manuel; Álvarez Díaz, Manuel; Carneiro, Víctor; Cacheda, Fidel (ComSIS Consortium, 2015)
      [Abstract]: Search engines use crawlers to traverse the Web in order to download web pages and build their indexes. Maintaining these indexes up-to-date is an essential task to ensure the quality of search results. ...
    • Soft-404 Pages, A Crawling Problem 

      Prieto Álvarez, Víctor Manuel; Álvarez Díaz, Manuel; Cacheda, Fidel (Society for Information Organization in India, 2014-04)
      [Abstract]: During its traversal of the Web, crawler systems have to deal with multiple challenges. Some of them are related with detecting garbage content to avoid wasting resources processing it. Soft-404 pages are ...
    • The Evolution of the (Hidden) Web and its Hidden Data 

      Álvarez Díaz, Manuel; Prieto Álvarez, Víctor Manuel; Cacheda, Fidel (IGI Global, 2015)
      [Abstract]: This paper presents an analysis of the most important features of the Web and its evolution and implications on the tools that traverse it to index its content to be searched later. It is important to remark that some of ...
    • Twitter: A Good Place to Detect Health Conditions 

      Prieto Álvarez, Víctor Manuel; Matos, Sergio; Álvarez Díaz, Manuel; Cacheda, Fidel; Oliveira, José Luís (PLoS, 2014-01)
      [Abstract]: With the proliferation of social networks and blogs, the Internet is increasingly being used to disseminate personal health information rather than just as a source of information. In this paper we exploit the ...