Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
- Pell, J. (author)
- Hintze, Arend, Professor (author); Michigan State University, East Lansing, United States
- Canino-Koning, R. (author)
- Howe, A. (author)
- Tiedje, J. M. (author)
- Brown, C. T. (author)
- 2012-07-30
- 2012
- English.
- In: Proceedings of the National Academy of Sciences of the United States of America. Proceedings of the National Academy of Sciences. ISSN 0027-8424, 1091-6490. 109:33, pp. 13272-13277
- Related links:
- https://www.pnas.org...
- https://urn.kb.se/re...
- https://doi.org/10.1...
Abstract
- Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
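The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `BloomFilter`, `kmers`, `neighbors`, and `component` are hypothetical, the hash scheme and filter size are arbitrary choices, and reverse complements are ignored for brevity. The sketch stores k-mers in a Bloom filter, treats edges as implicit (a neighbor exists if a one-base extension is reported present), and collects one connected component by breadth-first search. False positives can add spurious nodes, which is the inexactness the abstract mentions.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Fixed-size bit array; membership tests may give false positives,
    never false negatives. (Illustrative class, not a library API.)"""

    def __init__(self, num_bits, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Salt a single hash function to obtain several bit positions.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(bloom, kmer):
    """Implicit edges: test the 8 possible one-base extensions of kmer."""
    for base in "ACGT":
        for cand in (kmer[1:] + base, base + kmer[:-1]):
            if cand != kmer and cand in bloom:
                yield cand


def component(bloom, start):
    """Breadth-first search over the implicit graph from one k-mer."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(bloom, node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Assumed toy input: two short reads and k = 5.
bloom = BloomFilter(num_bits=1 << 16)
reads = ["ACGTACGGA", "TTTGGCCAA"]
for read in reads:
    for km in kmers(read, 5):
        bloom.add(km)

part = component(bloom, "ACGTA")
```

Here `part` contains at least all k-mers of the first read, because overlapping k-mers from one read are always reported present. Only the bit array is kept in memory, so the graph is never stored explicitly, and partitioning reads by component is what would let each component be assembled separately.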
Subject headings
- NATURVETENSKAP -- Biologi -- Bioinformatik och systembiologi (hsv//swe)
- NATURAL SCIENCES -- Biological Sciences -- Bioinformatics and Systems Biology (hsv//eng)
Keyword
- Compression
- Metagenomics
- article
- gene sequence
- mathematical analysis
- metagenome
- plots and curves
- priority journal
- probabilistic de Bruijn graph
- Base Pairing
- Chromosomes, Bacterial
- Computational Biology
- DNA, Circular
- Escherichia coli
- Genome, Bacterial
- Information Theory
- Nonlinear Dynamics
- Sequence Analysis, DNA
- Soil Microbiology
Publication and Content Type
- ref (subject category)
- art (subject category)