Efficient mapping of accurate long reads in minimizer space with mapquik

Ekim, Baris (author)
Sahlin, Kristoffer (author)
Stockholms universitet, Matematiska institutionen, Science for Life Laboratory (SciLifeLab)
Medvedev, Paul (author)
Berger, Bonnie (author)
Chikhi, Rayan (author)
2023
English.
In: Genome Research. ISSN 1088-9051, 1549-5469. 33:7, pp. 1188-1197
  • Journal article (peer-reviewed)
Abstract
  • DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. We focus on the critical problem of mapping, or aligning, low-divergence sequences from long reads (e.g., Pacific Biosciences [PacBio] HiFi) to a reference genome, which poses challenges in terms of accuracy and computational resources when using cutting-edge read mapping approaches that are designed for all types of alignments. A natural idea would be to optimize efficiency with longer seeds to reduce the probability of extraneous matches; however, contiguous exact seeds quickly reach a sensitivity limit. We introduce mapquik, a novel strategy that creates accurate longer seeds by anchoring alignments through matches of k consecutively sampled minimizers (k-min-mers) and only indexing k-min-mers that occur once in the reference genome, thereby unlocking ultrafast mapping while retaining high sensitivity. We show that mapquik significantly accelerates the seeding and chaining steps (fundamental bottlenecks to read mapping) for both the human and maize genomes with >96% sensitivity and near-perfect specificity. On the human genome, for both real and simulated reads, mapquik achieves a 37× speedup over the state-of-the-art tool minimap2, and on the maize genome, mapquik achieves a 410× speedup over minimap2, making mapquik the fastest mapper to date. These accelerations are enabled not only by minimizer-space seeding but also by a novel heuristic O(n) pseudochaining algorithm, which improves upon the long-standing O(n log n) bound. Minimizer-space computation builds the foundation for achieving real-time analysis of long-read sequencing data.
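
The seeding idea in the abstract (seeds made of k consecutive minimizers, indexed only when they occur once in the reference) can be illustrated with a small sketch. The Python below is not mapquik's implementation. It assumes a simple hash-threshold minimizer scheme, ignores reverse complements, and omits chaining entirely. All function names (`select_minimizers`, `kminmers`, `index_unique`, `find_anchors`) and parameter values (`l`, `k`, `density`) are hypothetical and chosen for illustration only.

```python
import hashlib
from collections import Counter


def lmer_hash(lmer: str) -> int:
    """Deterministic 64-bit hash of an l-mer (assumed hash; any stable hash works)."""
    return int.from_bytes(hashlib.blake2b(lmer.encode(), digest_size=8).digest(), "big")


def select_minimizers(seq: str, l: int, density: float):
    """Keep the l-mers whose hash falls below density * 2**64.

    Selection depends only on the l-mer itself, so a read and the
    reference pick the same minimizers wherever they share sequence.
    Returns (position, l-mer) pairs in sequence order.
    """
    threshold = int(density * 2**64)
    return [
        (i, seq[i:i + l])
        for i in range(len(seq) - l + 1)
        if lmer_hash(seq[i:i + l]) < threshold
    ]


def kminmers(minimizers, k: int):
    """Group every k consecutive minimizers into one seed (a k-min-mer).

    Returns (start position of first minimizer, tuple of k l-mers).
    """
    return [
        (minimizers[i][0], tuple(m for _, m in minimizers[i:i + k]))
        for i in range(len(minimizers) - k + 1)
    ]


def index_unique(reference: str, k: int, l: int, density: float):
    """Index only the k-min-mers that occur exactly once in the reference."""
    seeds = kminmers(select_minimizers(reference, l, density), k)
    counts = Counter(key for _, key in seeds)
    return {key: pos for pos, key in seeds if counts[key] == 1}


def find_anchors(read: str, index, k: int, l: int, density: float):
    """Return (read position, reference position) for each read k-min-mer in the index."""
    seeds = kminmers(select_minimizers(read, l, density), k)
    return [(qpos, index[key]) for qpos, key in seeds if key in index]
```

As a toy usage example, setting `density=1.0` keeps every l-mer, which makes the behaviour easy to follow by hand: `index_unique("ACGAACGA", k=2, l=3, density=1.0)` drops the seed `("ACG", "CGA")` because it occurs twice, and `find_anchors("GAACG", index, k=2, l=3, density=1.0)` then yields the anchors `[(0, 2), (1, 3)]`. A real mapper would use a much lower density and longer l-mers, and would follow seeding with a chaining step.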

Subject headings

NATURVETENSKAP  -- Biologi -- Bioinformatik och systembiologi (hsv//swe)
NATURAL SCIENCES  -- Biological Sciences -- Bioinformatics and Systems Biology (hsv//eng)

Publication and Content Type

ref (subject category)
art (subject category)
