On the Feasibility of Character n-Grams Pseudo-Translation for Cross-Language Information Retrieval Tasks

UDC.coleccion: Investigación [es_ES]
UDC.departamento: Letras [es_ES]
UDC.endPage: 164 [es_ES]
UDC.grupoInv: Lingua e Sociedade da Información (LYS) [es_ES]
UDC.journalTitle: Computer Speech & Language [es_ES]
UDC.startPage: 136 [es_ES]
UDC.volume: 36 [es_ES]
dc.contributor.author: Vilares, Jesús
dc.contributor.author: Vilares Ferro, Manuel
dc.contributor.author: Alonso, Miguel A.
dc.contributor.author: Oakes, Michael P.
dc.date.accessioned: 2017-07-17T15:02:14Z
dc.date.available: 2017-07-17T15:02:14Z
dc.date.issued: 2016-03
dc.description.abstract: [Abstract] The field of Cross-Language Information Retrieval relates techniques close to both the Machine Translation and Information Retrieval fields, although in a context involving characteristics of its own. The present study looks to widen our knowledge about the effectiveness and applicability to that field of non-classical translation mechanisms that work at character $n$-gram level. For the purpose of this study, an $n$-gram based system of this type has been developed. This system requires only a bilingual machine-readable dictionary of $n$-grams, automatically generated from parallel corpora, which serves to translate queries previously $n$-grammed in the source language. $n$-Gramming is then used as an approximate string matching technique to perform monolingual text retrieval on the set of $n$-grammed documents in the target language. The tests for this work have been performed on CLEF collections for seven European languages, taking English as the target language. The performance attained, close to the upper baseline, confirms the validity of character $n$-gram based approaches for Cross-Language Information Retrieval tasks, both for indexing--retrieval and translation purposes, these not being tied to a given implementation. [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; FFI2014-51978-C2-1-R [es_ES]
dc.description.sponsorship: Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/034 [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; FFI2014-51978-C2-2-R
dc.identifier.citation: Jesús Vilares, Manuel Vilares, Miguel A. Alonso and Michael P. Oakes, On the Feasibility of Character n-Grams Pseudo-Translation for Cross-Language Information Retrieval Tasks, Computer Speech and Language, 36:136-164, 2016. [es_ES]
dc.identifier.issn: 0885-2308
dc.identifier.uri: http://hdl.handle.net/2183/19291
dc.language.iso: eng [es_ES]
dc.relation.uri: http://www.sciencedirect.com/science/article/pii/S0885230815000935?via%3Dihub [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.subject: Cross-Language Information Retrieval [es_ES]
dc.subject: Character n-grams [es_ES]
dc.subject: Alignment algorithms for Machine Translation [es_ES]
dc.title: On the Feasibility of Character n-Grams Pseudo-Translation for Cross-Language Information Retrieval Tasks [es_ES]
dc.type: journal article [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 3313b723-2288-4d9d-b0e7-32732c9c78d5
relation.isAuthorOfPublication: 3d821e9c-de0b-47cc-a4e0-7c531569602e
relation.isAuthorOfPublication: 1318edb8-3967-465c-a267-146624c05837
relation.isAuthorOfPublication.latestForDiscovery: 3313b723-2288-4d9d-b0e7-32732c9c78d5
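
Illustrative note: the pipeline outlined in the abstract (split the query into character n-grams, translate them through a bilingual n-gram dictionary, then match against n-grammed target-language documents) can be sketched roughly as follows. This is a minimal sketch and not the authors' system. The function names, the n-gram size of 4, the hand-written `TOY_DICT`, and the overlap-count scoring are all assumptions made for illustration; the abstract states that the real dictionary is generated automatically from parallel corpora, and it does not specify the retrieval model.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Split each word of `text` into overlapping character n-grams.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def translate_ngrams(grams, dictionary):
    """Replace each source n-gram by its target n-grams; drop unknown ones."""
    translated = []
    for gram in grams:
        translated.extend(dictionary.get(gram, []))
    return translated


def overlap_score(query_grams, doc_text, n=4):
    """Count how often the query n-grams occur among the document n-grams.

    A stand-in for a real retrieval model, used here only to show matching.
    """
    doc_counts = Counter(char_ngrams(doc_text, n))
    return sum(doc_counts[gram] for gram in query_grams)


# Hypothetical Spanish-to-English n-gram dictionary, written by hand.
TOY_DICT = {"lech": ["milk"], "eche": ["milk"]}

DOCS = ["fresh milk prices", "potato harvest"]

query = translate_ngrams(char_ngrams("leche"), TOY_DICT)
ranking = sorted(DOCS, key=lambda d: overlap_score(query, d), reverse=True)
print(ranking[0])
```

Because matching happens on n-grams rather than whole words, inflected or misspelled variants that share most of their n-grams with a query term can still contribute to the score, which is the approximate string matching role the abstract describes.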

Files

Original bundle

Name: Vilares_Jesus_2016_On_the_Feasibility_of_Character_n-Grams_Pseudo-Translation_for_Cross-Language_Information_Retrieval_Tasks.pdf
Size: 666.15 KB
Format: Adobe Portable Document Format
Description: