Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
- Pell, J. (author)
- Hintze, Arend, Professor (author)
  Michigan State University, East Lansing, United States
- Canino-Koning, R. (author)
- Howe, A. (author)
- Tiedje, J. M. (author)
- Brown, C. T. (author)
- 2012-07-30
- 2012
- English.
- In: Proceedings of the National Academy of Sciences of the United States of America. Proceedings of the National Academy of Sciences. ISSN 0027-8424, 1091-6490. 109:33, pp. 13272-13277
- Related links:
  https://www.pnas.org...
  https://urn.kb.se/re...
  https://doi.org/10.1...
Abstract
- Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
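To illustrate the idea in the abstract, the following is a minimal Python sketch of a de Bruijn graph stored implicitly in a Bloom filter. It is not the authors' implementation: the class name `KmerBloomFilter`, the helper functions, the choice of salted BLAKE2b hashing, the example reads, and the filter size are all assumptions made for illustration, and reverse complements are ignored. Only k-mer membership is stored; edges are recovered by asking the filter about each possible neighboring k-mer, so a false positive can add a spurious node or edge, but a stored k-mer is never missed.

```python
import hashlib


class KmerBloomFilter:
    """Bloom filter over k-mers (hypothetical helper; queries may yield false positives)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting one hash function
        # (an illustrative choice, not taken from the paper).
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(sequence, k):
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def neighbors(bloom, kmer):
    """Implicit graph edges: k-mers overlapping by k-1 bases that the filter reports present."""
    found = set()
    for base in "ACGT":
        for candidate in (kmer[1:] + base, base + kmer[:-1]):
            if candidate in bloom:
                found.add(candidate)
    return found


# Assumed toy data: two short overlapping reads and an arbitrary filter size.
K = 5
reads = ["ACGTTGCATG", "TTGCATGGAC"]
bloom = KmerBloomFilter(num_bits=4096, num_hashes=4)
for read in reads:
    for kmer in kmers(read, K):
        bloom.add(kmer)

print(sorted(neighbors(bloom, "ACGTT")))
```

Partitioning, as described in the abstract, would then amount to traversing `neighbors` from a starting k-mer to collect a connected component. The memory cost is set by `num_bits` rather than by the k-mer length, which is what makes a few bits per k-mer possible at the price of some false connectivity.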
Subject headings
- NATURAL SCIENCES -- Biological Sciences -- Bioinformatics and Systems Biology (hsv//eng)
Keywords
- Compression
- Metagenomics
- article
- gene sequence
- mathematical analysis
- metagenome
- plots and curves
- priority journal
- probabilistic de Bruijn graph
- Base Pairing
- Chromosomes, Bacterial
- Computational Biology
- DNA, Circular
- Escherichia coli
- Genome, Bacterial
- Information Theory
- Nonlinear Dynamics
- Sequence Analysis, DNA
- Soil Microbiology
Publication and content type
- ref (subject category)
- art (subject category)