Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees.

Mahmud, Md Pavel (author)
Wiedenhoeft, John (author)
Schliep, Alexander, 1967 (author)
Gothenburg University (Göteborgs universitet), Department of Computer Science and Engineering, Computing Science (GU)
2012-09-03
2012
English.
In: Bioinformatics (Oxford, England). Oxford University Press (OUP). ISSN 1367-4811, 1367-4803; 28:18
  • Journal article (peer-reviewed)
Abstract
  • Mapping billions of reads from next-generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. For the first time, we adapt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it as nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times that match the state of the art. Additionally, running time and memory requirements are roughly constant for read lengths between 100 and 1000 bp. We provide a first proof of concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurately than many popular read mappers over a wide range of structural variants. TreQ will be released under the GNU General Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. Contact: pavelm@cs.rutgers.edu. Supplementary data are available at Bioinformatics online.
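
The embedding idea in the abstract can be illustrated with a minimal Python sketch. It is not TreQ's implementation: the function names (`qgram_vector`, `l1_distance`, `edit_distance_lower_bound`, `best_window`) are hypothetical, the kd-tree is replaced by a brute-force scan over reference windows, and the bound shown is the classical q-gram lemma for unit-cost edit distance (one edit changes the L1 distance by at most 2q), not the affine-gap bound used in the paper.

```python
from collections import Counter
from itertools import product
from math import ceil

Q = 3  # trinucleotides, as in the title
QGRAMS = ["".join(p) for p in product("ACGT", repeat=Q)]  # 4**3 = 64 dimensions


def qgram_vector(seq):
    """Embed a sequence as its vector of overlapping trinucleotide counts.

    q-grams containing characters outside ACGT (e.g. N) are ignored.
    """
    counts = Counter(seq[i:i + Q] for i in range(len(seq) - Q + 1))
    return [counts[g] for g in QGRAMS]


def l1_distance(u, v):
    """L1 (Manhattan) distance between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def edit_distance_lower_bound(read, window):
    """q-gram lemma: unit-cost edit distance >= L1 / (2 * Q).

    Assumption: this is the textbook unit-cost bound, used here only
    to show why L1 distance can filter candidate locations.
    """
    d = l1_distance(qgram_vector(read), qgram_vector(window))
    return ceil(d / (2 * Q))


def best_window(read, reference):
    """Brute-force stand-in for the kd-tree nearest neighbor query.

    Scans every reference window of the read's length and returns
    (offset, L1 distance) of the first closest window.
    """
    rv = qgram_vector(read)
    n = len(read)
    candidates = (
        (l1_distance(rv, qgram_vector(reference[i:i + n])), i)
        for i in range(len(reference) - n + 1)
    )
    dist, offset = min(candidates)
    return offset, dist


if __name__ == "__main__":
    window = "ACGTACGTAC"
    read = "ACGTCGTAC"  # the window with one base deleted
    print(l1_distance(qgram_vector(read), qgram_vector(window)))
    print(edit_distance_lower_bound(read, window))
    print(best_window("ACGTTGCA", "GGGACGTTGCATTT"))
```

A single deleted base changes only the few trinucleotides that overlap it, so the L1 distance stays small however long the read is. This locality is what makes a frequency-vector space tolerant of indels.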

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Bioinformatics (hsv//eng)

Keywords

Chromosome Mapping
Genetic Variation
Genome, Human
Genomics / methods
High-Throughput Nucleotide Sequencing / methods
Humans
INDEL Mutation
Nucleotides / chemistry
Sequence Analysis, DNA / methods

Publication and content type

ref (subject category)
art (subject category)
