Examining sequence alignments using a model-based approach
- Bogusz, Marcin (author)
  Uppsala universitet, Evolutionsbiologi, Whelan Lab
- Ali, Raja Hashim (author)
  Uppsala universitet, Evolutionsbiologi
- Whelan, Simon (author)
  Uppsala universitet, Evolutionsbiologi
- English.
- Related link:
https://urn.kb.se/re...
Abstract
- Multiple sequence alignment (MSA) is a commonly performed procedure required for a number of evolutionary and comparative analyses. The common two-step process of sequence alignment followed by statistical phylogenetic inference depends on MSA quality. MSA is computationally difficult, and as a result sequence alignments often contain regions of spurious homology. These alignment errors affect downstream results, so choosing an accurate MSA is critical. Researchers must often choose one aligner from among many multiple sequence alignment methods (MSAMs). This choice is often based on benchmark results, with various popular methods claiming high accuracy scores. These methods compete to obtain the highest scores in the commonly used sum-of-pairs benchmark, which accounts for the fraction of true homologies recovered but ignores the fraction of false positive homologies introduced. Furthermore, these benchmarks do not account for the fact that some homologies are more difficult to recover than others. We take a probabilistic model-based approach to examine the quality of pairwise homologies returned by four popular MSAMs. We use pair-hidden Markov models to break down alignment columns into pairs and obtain distributions of pairwise posterior scores for these aligners. Basing our results on a structural benchmark and a simulation study, we find that MSAMs appear to return a sample from a confidence set defined by high posterior probabilities. Furthermore, we find that the reference alignment contains a portion of pairwise homologies with low posterior probability, which no MSAM can be expected to recover. Finally, we look at several possible test statistics, with and without the need for reference alignments, and ultimately suggest using positive predictive value (PPV) and mean posterior probability for MSA evaluation.
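The difference between the sum-of-pairs score and PPV described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `homologous_pairs` and `sum_of_pairs_and_ppv`, the representation of an alignment as a list of equal-length gapped strings, and the toy sequences are all assumptions made for illustration. Posterior probabilities from a pair-hidden Markov model are not computed here.

```python
from itertools import combinations


def homologous_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share an alignment column.

    Each residue is identified as (sequence index, ungapped position),
    so the pairs are comparable across different alignments of the
    same sequences. (Hypothetical helper, for illustration only.)
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, char in enumerate(column):
            if char != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in one column is a claimed homology.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_and_ppv(test, reference):
    """Compare a test alignment against a reference alignment.

    sum-of-pairs = shared pairs / pairs in the reference (recall)
    PPV          = shared pairs / pairs in the test alignment (precision)
    """
    test_pairs = homologous_pairs(test)
    ref_pairs = homologous_pairs(reference)
    shared = len(test_pairs & ref_pairs)
    sp = shared / len(ref_pairs) if ref_pairs else 0.0
    ppv = shared / len(test_pairs) if test_pairs else 0.0
    return sp, ppv


# Toy data (assumed): the reference aligns only the last two residues,
# while the test alignment aligns all four positions.
reference = ["AC--GT", "--ACGT"]
test = ["ACGT", "ACGT"]
sp, ppv = sum_of_pairs_and_ppv(test, reference)
print(f"sum-of-pairs = {sp}, PPV = {ppv}")
```

In this toy case the test alignment recovers every reference homology, so its sum-of-pairs score is perfect, yet half of the homologies it claims are absent from the reference, which only PPV reflects.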
Subject terms
- NATURVETENSKAP -- Biologi -- Evolutionsbiologi (hsv//swe)
- NATURAL SCIENCES -- Biological Sciences -- Evolutionary Biology (hsv//eng)
Keywords
- Sequence alignment
- alignment accuracy
- alignment uncertainty
- pair hidden Markov models
Publication and content type
- vet (subject category)
- ovr (subject category)