
dc.contributor.advisor: González-Domínguez, Jorge
dc.contributor.advisor: Touriño, Juan
dc.contributor.author: Amatria Barral, Iñaki
dc.contributor.other: Computer Engineering, Degree in (es_ES)
dc.date.accessioned: 2022-10-04T15:22:48Z
dc.date.issued: 2021
dc.identifier.uri: http://hdl.handle.net/2183/31769
dc.description.abstract: [Abstract]: For a long time, it was a common and well-established belief that RNA's only role was to act as an intermediary between DNA and protein. However, during the last three decades, this long-held belief has been completely shattered. With the development of next-generation sequencing technologies, it has been found that most RNA in the human genome is not translated into protein. This is the so-called long noncoding RNA (lncRNA), whose discovery has drastically changed the way biologists approach genetics. Furthermore, studies show that, besides playing important roles in many biological processes, the dysfunction of many lncRNA sequences is associated with serious diseases, such as cancer or diabetes. Consequently, noncoding RNA biology is a hot research topic, and biologists are constantly trying to come up with new strategies to elucidate lncRNA functions, some of which include the computational prediction of interacting RNA and lncRNA pairs (lncRNA works by being assembled with other proteins or RNA). For this purpose, many application-specific bioinformatics tools have been developed. Examples include RIsearch2, ASSA and RIblast, the last of which is one of the fastest, yet accurate, tools currently available. However, even though it is up to 64 times faster than other predictors, RIblast still falls short when it is supplied with huge and significant lncRNA datasets, which limits further progress in the field. To address this problem, this thesis presents pRIblast: a highly efficient, parallel application for extensive and comprehensive RNA-RNA interaction analysis. Programmed with industry-standard parallel technologies (MPI and OpenMP), pRIblast brings the RIblast algorithm to high performance computing facilities (i.e. clusters of multicore systems joined together by an interconnection network).
Moreover, pRIblast has been optimized to reduce memory usage and input and output latencies to the bare minimum and, therefore, the novel application is ready to take on new challenges that could never have been faced with the former RIblast tool (i.e. the human genome). To ensure that pRIblast fulfills all quality criteria to be considered production ready, this thesis also presents comprehensive benchmarking on a 16-node computer cluster (64 GiB of main memory and 16 CPU cores per node, for a total of 256 CPU cores). The results not only show that the parallelization of RIblast is successful (101 days' worth of work was reduced to just 21 hours), but also confirm the importance of the optimizations applied to the tool (it was possible to analyze two datasets that RIblast could not process because of its intensive memory usage, and I/O times were reduced from 4000 to just 90 seconds with a dataset that produced 407 GiB of output data). (es_ES)
dc.description.abstract: [Summary]: For many years, RNA was thought to be a mere intermediary between DNA and proteins. However, the advent of next-generation sequencing technologies revealed that most of the human genome consists of long noncoding RNA (lncRNA) chains, that is, a type of RNA that does not synthesize proteins. In addition, recent studies have shown that the dysfunction of a large share of these RNA chains is related to diseases as serious as cancer or diabetes. To elucidate the function of lncRNAs, numerous software tools have emerged that try to predict interactions between RNA and lncRNA, since the latter work by assembling with other proteins or RNA chains. Some of these tools are RIsearch2, ASSA and, most notably, RIblast, which obtains results up to 64 times faster than other available applications without compromising the quality of the predictions. Despite this, RIblast is still too slow and cannot work with very large lncRNA sets without prediction times growing exponentially. This bachelor's thesis developed pRIblast, an improvement on the RIblast algorithm that allows it to run in high performance computing environments. To this end, standard parallel programming technologies (MPI and OpenMP) were used, which let pRIblast efficiently exploit any multi-node computing system with multicore nodes. The new tool was also optimized to minimize input/output latency and memory usage. As a result, the computation time of the RIblast algorithm was reduced by several orders of magnitude, and it became possible to run large datasets that the original tool could never analyze (i.e. the human genome).
To verify that the parallelization of the tool was effective, long and extensive performance tests were run on a cluster with 16 compute nodes, each with 64 GiB of memory and 16 cores (256 cores in total). The results were very satisfactory: large speedups were achieved, allowing a large genome that would take 101 days to process to be run in just 21 hours. Furthermore, the optimizations developed on top of the parallel algorithm proved very effective. For example, write times were reduced from 4000 to 90 seconds on a dataset that produces 407 GiB of results, and two datasets that the original algorithm could not process because of its intensive memory usage were analyzed. (es_ES)
dc.language.iso: eng (es_ES)
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain (es_ES)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ (*)
dc.title: pRIblast: a high efficient, parallel application for RNA-RNA interaction prediction (es_ES)
dc.type: info:eu-repo/semantics/bachelorThesis (es_ES)
dc.rights.access: info:eu-repo/semantics/openAccess (es_ES)
dc.date.embargoEndDate: 2023-01-25 (es_ES)
dc.date.embargoLift: 2023-01-25
dc.description.traballos: Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2021/2022 (es_ES)

