Prediction of missing sequences and branch lengths in phylogenomic data

Darriba, Diego; Weiß, Michael; Stamatakis, Alexandros

UDC.coleccion	Investigación	es_ES
UDC.departamento	Enxeñaría de Computadores	es_ES
UDC.endPage	1337	es_ES
UDC.grupoInv	Grupo de Arquitectura de Computadores (GAC)	es_ES
UDC.issue	9	es_ES
UDC.journalTitle	Bioinformatics	es_ES
UDC.startPage	1331	es_ES
UDC.volume	32	es_ES
dc.contributor.author	Darriba, Diego
dc.contributor.author	Weiß, Michael
dc.contributor.author	Stamatakis, Alexandros
dc.date.accessioned	2018-08-29T11:49:32Z
dc.date.available	2018-08-29T11:49:32Z
dc.date.issued	2016-01-05
dc.description	This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record, Diego Darriba, Michael Weiß, Alexandros Stamatakis; Prediction of missing sequences and branch lengths in phylogenomic data, Bioinformatics, Volume 32, Issue 9, 1 May 2016, Pages 1331–1337, is available online at: https://doi.org/10.1093/bioinformatics/btv768	es_ES
dc.description.abstract	[Abstract] Motivation: The presence of missing data in large-scale phylogenomic datasets has negative effects on the phylogenetic inference process. One effect of alignments with missing per-gene or per-partition sequences is that the inferred phylogenies may exhibit extremely long branch lengths. We investigate whether statistically predicting missing sequences for organisms, using information from genes/partitions that have data for these organisms, alleviates the problem and improves phylogenetic accuracy. Results: We present several algorithms for correcting excessively long branch lengths induced by missing data. We also present methods for predicting/imputing missing sequence data. We evaluate our algorithms by systematically removing sequence data from three empirical and 100 simulated alignments. We then compare the Maximum Likelihood trees inferred from the gappy alignments and from the alignments with predicted sequence data to the trees inferred from the original, complete datasets. The datasets with predicted sequences showed one to two orders of magnitude more accurate branch lengths than the trees inferred from the alignments with missing data. However, prediction did not affect the RF distances between the trees.	es_ES
dc.identifier.citation	Diego Darriba, Michael Weiß, Alexandros Stamatakis; Prediction of missing sequences and branch lengths in phylogenomic data, Bioinformatics, Volume 32, Issue 9, 1 May 2016, Pages 1331–1337, https://doi.org/10.1093/bioinformatics/btv768	es_ES
dc.identifier.doi	10.1093/bioinformatics/btv768
dc.identifier.issn	1367-4803
dc.identifier.issn	1367-4811
dc.identifier.uri	http://hdl.handle.net/2183/20982
dc.language.iso	eng	es_ES
dc.publisher	Oxford University Press	es_ES
dc.relation.uri	https://doi.org/10.1093/bioinformatics/btv768	es_ES
dc.rights.accessRights	open access	es_ES
dc.subject	Phylogenetics	es_ES
dc.title	Prediction of missing sequences and branch lengths in phylogenomic data	es_ES
dc.type	journal article	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	64f4176e-8f06-4807-b964-3c474b876a4d
relation.isAuthorOfPublication.latestForDiscovery	64f4176e-8f06-4807-b964-3c474b876a4d

Files

Original bundle

Name: D.Darriba_Prediction_of_Missing_Sequences_and_Branch_Lengths_in_Phylogenomic_Data_2016.pdf
Size: 282.04 KB
Format: Adobe Portable Document Format

Collections

Investigación (FIC)