The wisdom of the rankers: a cost-effective method for building pooled test collections without participant systems

Bibliographic citation

David Otero, Javier Parapar, and Álvaro Barreiro. 2021. The Wisdom of the Rankers: A Cost-Effective Method for Building Pooled Test Collections without Participant Systems. In The 36th ACM/SIGAPP Symposium on Applied Computing (SAC ’21), March 22–26, 2021, Virtual Event, Republic of Korea. ACM, New York, NY, USA, 9 pages.

Abstract

Information Retrieval is an area where evaluation is crucial to validate newly proposed models. As the first step in the evaluation of models, researchers carry out offline experiments on specific datasets. While the field started around ad-hoc search, the number of new tasks is continuously growing. These tasks demand the development of new test collections (documents, information needs, and judgments). The construction of those datasets relies on expensive campaigns like TREC. Due to the size of modern collections, obtaining a relevance judgment for every document-topic pair is infeasible. To reduce this cost, organizers usually apply a technique called pooling. When building pooled test collections, assessors judge only a portion of the documents, selected from among the participants' results. Although the judgments will not be exhaustive, they will be sufficiently complete and unbiased if pooling is done correctly. Therefore, researchers may safely use pooled collections to evaluate new models. However, the application of pooling depends on the existence of participant systems. This need is a handicap for tasks that must release training data before the competition takes place, or for those with few participants. In this paper, we present a simple method for building pooled collections when such restrictions exist. Our proposal relies on two principles: the wisdom of the rankers and the application of pooling. By creating enough artificial participant systems, we can apply pooling to their results to select the documents that merit human assessment. Using an innovative approach to evaluate our method, we show that researchers may use it to produce high-quality collections in the absence of participant systems.
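
As an illustration of the pooling step mentioned in the abstract, the following minimal sketch builds a pool for one topic by taking the union of the top-k documents from several rankings. It assumes fixed-depth (depth-k) pooling, a common strategy that the abstract does not name as the one the authors use. The function name depth_k_pool, the ranker names, and the document identifiers are hypothetical; in the setting the abstract describes, the rankings would come from artificially created systems rather than from campaign participants.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the union of the top-k documents of every ranking for one topic.

    `runs` maps a (hypothetical) ranker name to its ranked list of document
    ids, best first. Only documents in the returned pool would be sent to
    human assessors; everything else stays unjudged.
    """
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


# Toy rankings from three artificial rankers for a single topic (assumed data).
runs = {
    "ranker_a": ["d1", "d2", "d3", "d4"],
    "ranker_b": ["d2", "d5", "d1", "d6"],
    "ranker_c": ["d7", "d1", "d2", "d8"],
}

pool = depth_k_pool(runs, k=2)
print(sorted(pool))  # ['d1', 'd2', 'd5', 'd7']
```

With k=2, four of the eight distinct documents are pooled, which shows how overlap among rankers reduces the number of documents that need assessment.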

Description

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing, https://doi.org/10.1145/3412841.3441947

Rights

© 2021 Owner/Author | ACM