Beyond questions: Leveraging ColBERT for keyphrase search

UDC.coleccion: Investigación
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.grupoInv: Information Retrieval Lab (IRlab)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.issue: 2, Part B
UDC.journalTitle: Information Processing & Management
UDC.startPage: 104480
UDC.volume: 63
dc.contributor.author: Gabín, Jorge
dc.contributor.author: Parapar, Javier
dc.contributor.author: Macdonald, Craig
dc.date.accessioned: 2026-01-23T12:57:10Z
dc.date.available: 2026-01-23T12:57:10Z
dc.date.issued: 2026-03
dc.description: This study’s code and generated resources are available at https://github.com/JorgeGabin/ColBERTKP.
dc.description.abstract: [Abstract]: While question-like queries are gaining popularity, keyphrase search is still the cornerstone of web search and other specialised domains such as academic and professional search. However, current dense retrieval models often fail with keyphrase-like queries, primarily because they are mostly trained on question-like ones. This paper introduces a novel model that employs the ColBERT architecture to enhance document ranking for keyphrase queries. For that, given the lack of large keyphrase-based retrieval datasets, we first explore how Large Language Models can convert question-like queries into keyphrase format. Then, using those keyphrases, we train a keyphrase-based ColBERT ranker (ColBERTKP) to improve the performance when working with keyphrase queries. Furthermore, to make the model more flexible, allowing the use of both the question and keyphrase encoders depending on the query type, we investigate the feasibility of training only a keyphrase query encoder while keeping the document encoder weights static (ColBERTKP). We assess our proposals’ ranking performance using both automatically generated and manually annotated keyphrases. Our results reveal the potential of the late interaction architecture when working under the keyphrase search scenario.
dc.description.sponsorship: This work has received support from projects: PID2022-137061OB-C21 (MCIN/AEI/10.13039/501100011033/, Ministerio de Ciencia e Innovación, ERDF A way of making Europe, by the European Union); Consellería de Educación, Universidade e Formación Profesional, Spain (ED431G 2023/01 and ED431C 2025/49 GRC) and the European Regional Development Fund, which acknowledges the CITIC Research Center. The first author also acknowledges the support of grant DIN2020-011582 financed by the MCIN/AEI/10.13039/501100011033.
dc.description.sponsorship: Xunta de Galicia; ED431G 2023/01
dc.description.sponsorship: Xunta de Galicia; ED431C 2025/49
dc.identifier.citation: J. Gabín, J. Parapar, and C. Macdonald, "Beyond questions: Leveraging ColBERT for keyphrase search", Information Processing & Management, Vol. 63, Issue 2, Part B, March 2026, 104480, https://doi.org/10.1016/j.ipm.2025.104480
dc.identifier.doi: 10.1016/j.ipm.2025.104480
dc.identifier.issn: 1873-5371
dc.identifier.uri: https://hdl.handle.net/2183/47080
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/DIN2020-011582/ES/PERSONALIZACIÓN DE LA EXPERIENCIA DE USUARIO EN ESCENARIOS DE BÚSQUEDA COMPLEJA PARA BUSINESS INTELLIGENCE
dc.relation.uri: https://doi.org/10.1016/j.ipm.2025.104480
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Keyphrase search
dc.subject: Dense retrieval
dc.subject: Late interaction
dc.subject: Large Language Models
dc.subject: Keyphrase generation
dc.title: Beyond questions: Leveraging ColBERT for keyphrase search
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: fef1a9cb-e346-4e53-9811-192e144f09d0
relation.isAuthorOfPublication.latestForDiscovery: fef1a9cb-e346-4e53-9811-192e144f09d0

Files

Original bundle

Name: Parapar_Javier_2026_Beyond_questions.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format