On the Feasibility of Character n-Grams Pseudo-Translation for Cross-Language Information Retrieval Tasks

Vilares, Jesús; Vilares Ferro, Manuel; Alonso, Miguel A.; Oakes, Michael P.

UDC.coleccion	Investigación	es_ES
UDC.departamento	Letras	es_ES
UDC.endPage	164	es_ES
UDC.grupoInv	Lingua e Sociedade da Información (LYS)	es_ES
UDC.journalTitle	Computer Speech & Language	es_ES
UDC.startPage	136	es_ES
UDC.volume	36	es_ES
dc.contributor.author	Vilares, Jesús
dc.contributor.author	Vilares Ferro, Manuel
dc.contributor.author	Alonso, Miguel A.
dc.contributor.author	Oakes, Michael P.
dc.date.accessioned	2017-07-17T15:02:14Z
dc.date.available	2017-07-17T15:02:14Z
dc.date.issued	2016-03
dc.description.abstract	[Abstract] The field of Cross-Language Information Retrieval relates techniques close to both the Machine Translation and Information Retrieval fields, although in a context involving characteristics of its own. The present study looks to widen our knowledge about the effectiveness and applicability to that field of non-classical translation mechanisms that work at character n-gram level. For the purpose of this study, an n-gram based system of this type has been developed. This system requires only a bilingual machine-readable dictionary of n-grams, automatically generated from parallel corpora, which serves to translate queries previously n-grammed in the source language. n-Gramming is then used as an approximate string matching technique to perform monolingual text retrieval on the set of n-grammed documents in the target language. The tests for this work have been performed on CLEF collections for seven European languages, taking English as the target language. The performance attained, close to the upper baseline, confirms the validity of character n-gram based approaches for Cross-Language Information Retrieval tasks, both for indexing-retrieval and translation purposes, these not being tied to a given implementation.	es_ES
dc.description.sponsorship	Ministerio de Economía y Competitividad; FFI2014-51978-C2-1-R	es_ES
dc.description.sponsorship	Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/034	es_ES
dc.description.sponsorship	Ministerio de Economía y Competitividad; FFI2014-51978-C2-2-R
dc.identifier.citation	Jesús Vilares, Manuel Vilares, Miguel A. Alonso and Michael P. Oakes, On the Feasibility of Character n-Grams Pseudo-Translation for Cross-Language Information Retrieval Tasks, Computer Speech and Language, 36:136-164, 2016.	es_ES
dc.identifier.issn	0885-2308
dc.identifier.uri	http://hdl.handle.net/2183/19291
dc.language.iso	eng	es_ES
dc.relation.uri	http://www.sciencedirect.com/science/article/pii/S0885230815000935?via%3Dihub	es_ES
dc.rights.accessRights	open access	es_ES
dc.subject	Cross-Language Information Retrieval	es_ES
dc.subject	Character n-grams	es_ES
dc.subject	Alignment algorithms for Machine Translation	es_ES
dc.title	On the Feasibility of Character n-Grams Pseudo-Translation for Cross-Language Information Retrieval Tasks	es_ES
dc.type	journal article	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	3313b723-2288-4d9d-b0e7-32732c9c78d5
relation.isAuthorOfPublication	3d821e9c-de0b-47cc-a4e0-7c531569602e
relation.isAuthorOfPublication	1318edb8-3967-465c-a267-146624c05837
relation.isAuthorOfPublication.latestForDiscovery	3313b723-2288-4d9d-b0e7-32732c9c78d5
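
The pipeline summarised in the abstract (n-gram the source query, translate its n-grams through a bilingual n-gram dictionary, then match against n-grammed target documents) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' system: the n-gram size of 4, the function names, the hand-written toy dictionary (the paper generates its dictionary automatically from parallel corpora), and the simple overlap count used in place of a real retrieval engine.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Split each word of `text` into overlapping character n-grams.

    Words no longer than n are kept whole. n=4 is an assumed value.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def pseudo_translate(query, ngram_dict, n=4):
    """Replace every source n-gram by its target n-grams from the dictionary.

    Source n-grams missing from the dictionary are simply dropped.
    """
    translated = []
    for gram in char_ngrams(query, n):
        translated.extend(ngram_dict.get(gram, []))
    return translated


def overlap_score(query_grams, document, n=4):
    """Toy relevance score: how often the query n-grams occur in the document."""
    doc_counts = Counter(char_ngrams(document, n))
    return sum(doc_counts[g] for g in query_grams)


# Hypothetical Spanish-to-English n-gram dictionary, written by hand.
TOY_DICT = {
    "lech": ["milk"],
    "toma": ["toma"],
    "omat": ["omat"],
    "mate": ["mato"],
}

if __name__ == "__main__":
    target_query = pseudo_translate("leche tomate", TOY_DICT)
    for doc in ["fresh tomato soup", "cold milk"]:
        print(doc, overlap_score(target_query, doc))
```

Because matching happens on n-grams instead of whole words, a query n-gram can match a document word that shares only part of its spelling, which is the approximate string matching role the abstract assigns to n-gramming.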

Files

Original bundle

Name: Vilares_Jesus_2016_On_the_Feasibility_of_Character_n-Grams_Pseudo-Translation_for_Cross-Language_Information_Retrieval_Tasks.pdf
Size: 666.15 KB
Format: Adobe Portable Document Format

Collections

Investigación (FFIL)