Towards Reliable Testing for Multiple Information Retrieval System Comparisons
| UDC.coleccion | Research | es_ES |
| UDC.conferenceTitle | ECIR 2025 | es_ES |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | es_ES |
| UDC.grupoInv | Information Retrieval Lab (IRlab) | es_ES |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | es_ES |
| dc.contributor.author | Otero, David | |
| dc.contributor.author | Parapar, Javier | |
| dc.contributor.author | Barreiro, Álvaro | |
| dc.date.accessioned | 2025-04-16T09:18:20Z | |
| dc.date.embargoEndDate | 2026-04-07 | es_ES |
| dc.date.embargoLift | 2026-04-07 | |
| dc.date.issued | 2025 | |
| dc.description | Presented at: Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025 | es_ES |
| dc.description | This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-031-88711-6_27 | es_ES |
| dc.description.abstract | [Abstract]: Null Hypothesis Significance Testing is the de facto tool for assessing effectiveness differences between Information Retrieval systems. Researchers use statistical tests to check whether those differences will generalise to online settings or are merely due to the samples observed in the laboratory. Much work has been devoted to studying which test is the most reliable when comparing a pair of systems, but most real-world IR experiments involve more than two. In the multiple comparisons scenario, testing several systems simultaneously may inflate the errors committed by the tests. In this paper, we use a new approach to assess the reliability of multiple comparison procedures using simulated and real TREC data. Experiments show that the Wilcoxon test with the Benjamini-Hochberg correction yields Type I error rates in line with the significance level for typical sample sizes while being the best test in terms of statistical power. | es_ES |
| dc.description.sponsorship | The authors thank the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación supported by the European Regional Development Fund). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU). | es_ES |
| dc.description.sponsorship | Xunta de Galicia; ED431G/01 | es_ES |
| dc.description.sponsorship | Xunta de Galicia; ED431B 2022/33 | es_ES |
| dc.identifier.citation | Otero, D., Parapar, J., Barreiro, Á. (2025). Towards Reliable Testing for Multiple Information Retrieval System Comparisons. In: Hauff, C., et al. Advances in Information Retrieval. ECIR 2025. Lecture Notes in Computer Science, vol 15573. Springer, Cham. https://doi.org/10.1007/978-3-031-88711-6_27 | es_ES |
| dc.identifier.doi | 10.1007/978-3-031-88711-6_27 | |
| dc.identifier.isbn | 978-3-031-88711-6 | |
| dc.identifier.uri | http://hdl.handle.net/2183/41777 | |
| dc.language.iso | eng | es_ES |
| dc.publisher | Springer | es_ES |
| dc.relation.ispartofseries | Lecture Notes in Computer Science ; 15573 | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PLEC2021-007662/ES/BIG-eRISK: PREDICCIÓN TEMPRANA DE RIESGOS PERSONALES EN CONJUNTOS DE DATOS MASIVOS | es_ES |
| dc.relation.uri | https://doi.org/10.1007/978-3-031-88711-6_27 | es_ES |
| dc.rights | © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.subject | Information Retrieval Evaluation | es_ES |
| dc.subject | Null Hypothesis Significance Testing | es_ES |
| dc.subject | Multiple Comparisons | es_ES |
| dc.subject | Family-Wise Error Rate | es_ES |
| dc.subject | False Discovery Rate | es_ES |
| dc.title | Towards Reliable Testing for Multiple Information Retrieval System Comparisons | es_ES |
| dc.type | conference output | es_ES |
| dc.type.hasVersion | AM | es_ES |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 00d04042-9b75-419e-9aab-33fd14b201af | |
| relation.isAuthorOfPublication | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
| relation.isAuthorOfPublication | a3e43020-ee28-428d-8087-2f3c1e20aa2c | |
| relation.isAuthorOfPublication.latestForDiscovery | 00d04042-9b75-419e-9aab-33fd14b201af |
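
The abstract names the Benjamini-Hochberg correction as the multiple comparisons procedure that performed best when paired with the Wilcoxon test. As a rough illustration of what that correction does, the sketch below computes Benjamini-Hochberg adjusted p-values in plain Python. It assumes the per-pair p-values have already been obtained (for example, from Wilcoxon signed-rank tests over per-topic scores); the function name `benjamini_hochberg` and the sample p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted_p_values, rejected_flags) in the input order.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure,
    which controls the False Discovery Rate at level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


# Hypothetical p-values from four pairwise system comparisons.
example_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p, rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up inputs, only the first comparison remains significant at the 0.05 level after correction, even though three of the raw p-values fall below 0.05.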
Files
Original bundle
- Name: Otero_David_2025_Towards_Reliable_Testing_for_Multiple_Information_Retrieval_System_Comparisons.pdf
- Size: 673.22 KB
- Format: Adobe Portable Document Format
- Description: Accepted version

