Towards Reliable Testing for Multiple Information Retrieval System Comparisons

UDC.coleccion: Investigación (Research)
UDC.conferenceTitle: ECIR 2025
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.grupoInv: Information Retrieval Lab (IRlab)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
dc.contributor.author: Otero, David
dc.contributor.author: Parapar, Javier
dc.contributor.author: Barreiro, Álvaro
dc.date.accessioned: 2025-04-16T09:18:20Z
dc.date.embargoEndDate: 2026-04-07
dc.date.embargoLift: 2026-04-07
dc.date.issued: 2025
dc.description: Presented at: Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025
dc.description: This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-031-88711-6_27
dc.description.abstract: [Abstract]: Null Hypothesis Significance Testing is the de facto tool for assessing effectiveness differences between Information Retrieval systems. Researchers use statistical tests to check whether those differences will generalise to online settings or are just due to the samples observed in the laboratory. Much work has been devoted to studying which test is the most reliable when comparing a pair of systems, but most of the IR real-world experiments involve more than two. In the multiple comparisons scenario, testing several systems simultaneously may inflate the errors committed by the tests. In this paper, we use a new approach to assess the reliability of multiple comparison procedures using simulated and real TREC data. Experiments show that Wilcoxon plus the Benjamini-Hochberg correction yields Type I error rates according to the significance level for typical sample sizes while being the best test in terms of statistical power.
dc.description.sponsorship: The authors thank the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación supported by the European Regional Development Fund). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).
dc.description.sponsorship: Xunta de Galicia; ED431G/01
dc.description.sponsorship: Xunta de Galicia; ED431B 2022/33
dc.identifier.citation: Otero, D., Parapar, J., Barreiro, Á. (2025). Towards Reliable Testing for Multiple Information Retrieval System Comparisons. In: Hauff, C., et al. Advances in Information Retrieval. ECIR 2025. Lecture Notes in Computer Science, vol 15573. Springer, Cham. https://doi.org/10.1007/978-3-031-88711-6_27
dc.identifier.doi: 10.1007/978-3-031-88711-6_27
dc.identifier.isbn: 978-3-031-88711-6
dc.identifier.uri: http://hdl.handle.net/2183/41777
dc.language.iso: eng
dc.publisher: Springer
dc.relation.ispartofseries: Lecture Notes in Computer Science ; 15573
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PLEC2021-007662/ES/BIG-eRISK: PREDICCIÓN TEMPRANA DE RIESGOS PERSONALES EN CONJUNTOS DE DATOS MASIVOS
dc.relation.uri: https://doi.org/10.1007/978-3-031-88711-6_27
dc.rights: © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
dc.rights.accessRights: open access
dc.subject: Information Retrieval Evaluation
dc.subject: Null Hypothesis Significance Testing
dc.subject: Multiple Comparisons
dc.subject: Family-Wise Error Rate
dc.subject: False Discovery Rate
dc.title: Towards Reliable Testing for Multiple Information Retrieval System Comparisons
dc.type: conference output
dc.type.hasVersion: AM
dspace.entity.type: Publication
relation.isAuthorOfPublication: 00d04042-9b75-419e-9aab-33fd14b201af
relation.isAuthorOfPublication: fef1a9cb-e346-4e53-9811-192e144f09d0
relation.isAuthorOfPublication: a3e43020-ee28-428d-8087-2f3c1e20aa2c
relation.isAuthorOfPublication.latestForDiscovery: 00d04042-9b75-419e-9aab-33fd14b201af

Files

Original bundle

Name: Otero_David_2025_Towards_Reliable_Testing_for_Multiple_Information_Retrieval_System_Comparisons.pdf
Size: 673.22 KB
Format: Adobe Portable Document Format
Description: Accepted version