Using Score Distributions to Compare Statistical Significance Tests for Information Retrieval Evaluation
| UDC.coleccion | Research | es_ES |
| UDC.departamento | Computer Science and Information Technologies | es_ES |
| UDC.grupoInv | Information Retrieval Lab (IRlab) | es_ES |
| UDC.journalTitle | Journal of the Association for Information Science and Technology | es_ES |
| UDC.volume | 71 | |
| dc.contributor.author | Parapar, Javier | |
| dc.contributor.author | Losada, David E. | |
| dc.contributor.author | Presedo-Quindimil, Manuel-Antonio | |
| dc.contributor.author | Barreiro, Álvaro | |
| dc.date.accessioned | 2019-02-13T15:03:42Z | |
| dc.date.available | 2019-02-13T15:03:42Z | |
| dc.date.issued | 2019-01-11 | |
| dc.description | This is the peer reviewed version of the following article: Parapar, J., Losada, D.E., Presedo-Quindimil, M.A. and Barreiro, A. (2020), 'Using Score Distributions to Compare Statistical Significance Tests for Information Retrieval Evaluation', Journal of the Association for Information Science and Technology, 71: 98-113, which has been published in final form at https://doi.org/10.1002/asi.24203. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited. | es_ES |
| dc.description.abstract | [Abstract] Statistical significance tests can provide evidence that the observed difference in performance between two methods is not due to chance. In Information Retrieval, some studies have examined the validity and suitability of such tests for comparing search systems. We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in Information Retrieval evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and the Wilcoxon signed-rank test have more power than the permutation test and the t-test. The sign test and the Wilcoxon signed-rank test also behave well in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested. | es_ES |
| dc.description.sponsorship | Xunta de Galicia; GPC 2016/035 | |
| dc.description.sponsorship | Xunta de Galicia; ED431G/01 | |
| dc.description.sponsorship | Xunta de Galicia; ED431G/08 | |
| dc.description.sponsorship | This work has received financial support from the "Ministerio de Economía y Competitividad" of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R, Xunta de Galicia (project GPC 2016/035), and Xunta de Galicia – "Consellería de Cultura, Educación e Ordenación Universitaria" and the European Regional Development Fund (ERDF) through the following 2016-2019 accreditations: ED431G/01 ("Centro singular de investigación de Galicia") and ED431G/08. | |
| dc.identifier.citation | Parapar, J., Losada, D.E., Presedo-Quindimil, M.A. and Barreiro, A. (2020), 'Using Score Distributions to Compare Statistical Significance Tests for Information Retrieval Evaluation', Journal of the Association for Information Science and Technology, 71: 98-113, https://doi.org/10.1002/asi.24203. | |
| dc.identifier.doi | 10.1002/asi.24203 | |
| dc.identifier.issn | 2330-1643 | |
| dc.identifier.uri | http://hdl.handle.net/2183/21729 | |
| dc.language.iso | eng | es_ES |
| dc.publisher | Wiley | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2015-64282-R/ES/MODELOS DE LENGUAJE PROBABILISTICOS PARA RANKINGS PERSONALIZADOS EN SISTEMAS DE ACCESO A LA INFORMACION | |
| dc.relation.uri | https://doi.org/10.1002/asi.24203 | |
| dc.rights | © 2019 ASIS&T. This article may be used for non-commercial purposes in accordance with the Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley's version of record on Wiley Online Library, and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited. | |
| dc.rights.accessRights | open access | es_ES |
| dc.subject | Information retrieval | es_ES |
| dc.subject | Statistical test | es_ES |
| dc.subject | Significance testing | es_ES |
| dc.subject | Wilcoxon | es_ES |
| dc.subject | Permutation | es_ES |
| dc.subject | Sign | es_ES |
| dc.subject | Bootstrap | es_ES |
| dc.subject | T-test | es_ES |
| dc.title | Using Score Distributions to Compare Statistical Significance Tests for Information Retrieval Evaluation | es_ES |
| dc.title.alternative | Compare statistical significance tests for information retrieval evaluation | es_ES |
| dc.type | journal article | es_ES |
| dc.type.hasVersion | AM | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
| relation.isAuthorOfPublication | f0c0e95e-7cab-4486-8f96-e5ef248d6b27 | |
| relation.isAuthorOfPublication | a3e43020-ee28-428d-8087-2f3c1e20aa2c | |
| relation.isAuthorOfPublication.latestForDiscovery | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
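The abstract describes simulating search results from score distributions, so that the truth of the null hypothesis is known, and then counting how often each significance test rejects it. The following is a minimal Python sketch of that general idea, not the authors' implementation. It rests on several assumptions not stated in this record: relevant-document scores follow a Normal distribution and non-relevant scores an Exponential distribution (a classic score-distribution model chosen for illustration), average precision is the evaluation metric, the two systems are drawn independently for each topic, and only the sign test and a Monte Carlo paired permutation test are shown. All function names and parameter values (`simulate_ap`, `rejection_rate`, 20 relevant and 200 non-relevant documents, and so on) are hypothetical.

```python
import math
import random


def simulate_ap(rng, mu, sigma, lam, n_rel=20, n_nonrel=200):
    """Average precision of one simulated ranking for one topic.

    Assumption: relevant scores ~ Normal(mu, sigma) and non-relevant
    scores ~ Exponential(lam); an illustrative score-distribution model.
    """
    docs = [(rng.gauss(mu, sigma), True) for _ in range(n_rel)]
    docs += [(rng.expovariate(lam), False) for _ in range(n_nonrel)]
    docs.sort(key=lambda d: d[0], reverse=True)  # rank by descending score
    hits, precision_sum = 0, 0.0
    for rank, (_, relevant) in enumerate(docs, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant document
    return precision_sum / n_rel


def sign_test_p(diffs, rng=None):
    """Two-sided exact sign test; zero differences are discarded (rng unused)."""
    n = sum(d != 0 for d in diffs)
    if n == 0:
        return 1.0
    wins = sum(d > 0 for d in diffs)
    k = min(wins, n - wins)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def permutation_test_p(diffs, rng, n_perm=1000):
    """Two-sided paired permutation test on the summed difference (random sign flips)."""
    observed = abs(sum(diffs))
    extreme = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def rejection_rate(test, params_a, params_b, n_topics=25, n_trials=100,
                   alpha=0.05, seed=0):
    """Fraction of simulated comparisons in which `test` rejects H0 at `alpha`.

    params_* are (mu, sigma, lam). If params_a == params_b, H0 is true by
    construction and this estimates the type I error rate; otherwise it
    estimates power.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        diffs = [simulate_ap(rng, *params_a) - simulate_ap(rng, *params_b)
                 for _ in range(n_topics)]
        if test(diffs, rng) < alpha:
            rejections += 1
    return rejections / n_trials
```

For example, `rejection_rate(sign_test_p, (2.0, 1.0, 1.0), (2.0, 1.0, 1.0))` estimates the sign test's type I error rate, while passing two different parameter tuples estimates its power. Because the generating distributions are chosen by the experimenter, every rejection can be labelled as correct or as a type I error, which is the property the abstract highlights.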
Files
Original bundle
- Name: jasist2019(1).pdf
- Size: 720.4 KB
- Format: Adobe Portable Document Format
- Description: Preprint of JASIST article

