Search: onr:"swepub:oai:DiVA.org:kth-176956"
ARK : Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition
- Koslicki, David (author)
- Chatterjee, Saikat (author), KTH, Communication Theory
- Shahrivar, Damon (author)
- Walker, Alan W. (author)
- Francis, Suzanna C. (author)
- Fraser, Louise J. (author)
- Vehkaperae, Mikko (author)
- Lan, Yueheng (author)
- Corander, Jukka (author)
- 2015-10-23
- 2015
- English.
Part of: PLOS ONE. Public Library of Science. ISSN 1932-6203. Vol. 10, no. 10.
Related links:
- https://doi.org/10.1...
- https://github.com/d...
- http://www.ee.kth.se...
- https://journals.plo...
- https://urn.kb.se/re...
- https://doi.org/10.1...
Abstract
Motivation
Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Because the sequence data from each sample typically consist of a large number of reads and are affected by varying levels of biological and technical noise, accurate analysis of such large datasets is challenging.

Results
There has been a recent surge of interest in using compressed-sensing-inspired and convex-optimization-based methods to solve the estimation problem for bacterial community composition. These methods typically summarize the sequence data by the frequencies of low-order k-mers and match this information statistically against a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach before the estimation phase. The aggregation of reads is a pre-processing step in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost, yielding several vectors of first-order statistics instead of a single summary in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method, called Aggregation of Reads by K-means (ARK), is based on a statistical argument via a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity.

Availability
An open-source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
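The pre-processing idea in the abstract (cluster reads by k-mer profile, estimate per cluster, then combine) can be sketched as below. This is a minimal, hypothetical Python illustration, not the authors' Julia or Matlab implementation. The function names (`kmer_frequencies`, `kmeans`, `ark_estimate`, `nearest_reference_estimator`), the choice k = 2, the farthest-first initialization, and the rule of weighting each cluster's estimate by its share of the reads (one reading of the mixture density formulation) are all assumptions. The toy nearest-reference estimator is a stand-in for the compressed-sensing or convex-optimization estimators the abstract refers to.

```python
from itertools import product

# Hypothetical sketch of ARK-style aggregation (assumed design, not the
# published code): cluster reads by k-mer profile, run an estimator on each
# cluster centroid, and combine estimates weighted by cluster size.


def kmer_frequencies(read, k=2):
    """Return the normalized k-mer frequency vector of a read over A, C, G, T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in index:  # skip k-mers containing ambiguous bases such as N
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        # Next initial centroid: the point farthest from all chosen centroids.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(n_clusters), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so centroids are unchanged too
        labels = new_labels
        # Update step: each centroid becomes the mean of its member points.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def ark_estimate(reads, n_clusters, estimator, k=2):
    """Combine per-cluster estimates, weighting each by its fraction of reads."""
    vectors = [kmer_frequencies(r, k) for r in reads]
    labels, centroids = kmeans(vectors, n_clusters)
    combined = {}
    for c, centroid in enumerate(centroids):
        weight = labels.count(c) / len(reads)
        if weight == 0:
            continue  # empty cluster contributes nothing
        for taxon, proportion in estimator(centroid).items():
            combined[taxon] = combined.get(taxon, 0.0) + weight * proportion
    return combined


def nearest_reference_estimator(references, k=2):
    """Toy stand-in estimator: put all mass on the closest reference profile."""
    profiles = {name: kmer_frequencies(seq, k) for name, seq in references.items()}

    def estimate(profile):
        best = min(profiles, key=lambda name: sq_dist(profile, profiles[name]))
        return {best: 1.0}

    return estimate


if __name__ == "__main__":
    refs = {"taxon_AT": "ATATATATAT", "taxon_GC": "GCGCGCGCGC"}
    reads = ["ATATATAT", "TATATATA", "ATATATAT", "GCGCGCGC"]
    print(ark_estimate(reads, 2, nearest_reference_estimator(refs)))
    # prints {'taxon_AT': 0.75, 'taxon_GC': 0.25}
```

In this toy run the three AT-rich reads form one cluster and the GC-rich read another, so the combined estimate follows the cluster sizes. With real amplicon data one would use a larger k and a proper estimator in place of the toy one; the point of the sketch is only that each cluster centroid, rather than a single sample-wide k-mer profile, is passed to the estimator.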
Subject terms
- ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering -- Signal Processing (hsv//eng)
Keywords
- Split Vector Quantization
- LSF Parameters
- Sequences
- Megan
Publication and content type
- ref (subject category)
- art (subject category)