SwePub

  • Maier, Benjamin Dominik, 1997-, Stockholms universitet, Matematiska institutionen (author)

Entropy predicts sensitivity of pseudorandom seeds

  • Article/chapter, English, 2023

Publisher, publication year, extent ...

  • 2023
  • print (rdacarrier)

Numbers

  • LIBRIS-ID:oai:DiVA.org:su-225341
  • https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-225341 (URI)
  • https://doi.org/10.1101/gr.277645.123 (DOI)

Supplementary language notes

  • Language:English
  • Summary in:English

Classification

  • Subject category:ref swepub-contenttype
  • Subject category:art swepub-publicationtype

Notes

  • Seed design is important for sequence similarity search applications such as read mapping and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are likely the most well-known and widely used seeds, their sensitivity suffers at high error rates, particularly when indels are present. Recently, we developed a pseudorandom seeding construct, strobemers, which was empirically shown to have high sensitivity even at high indel rates. However, that study lacked a deeper understanding of why. In this study, we propose a model to estimate the entropy of a seed and find that seeds with high entropy, according to our model, in most cases have high match sensitivity. The seed randomness–sensitivity relationship we discovered explains why some seeds perform better than others and provides a framework for designing even more sensitive seeds. We also present three new strobemer seed constructs: mixedstrobes, altstrobes, and multistrobes. Using both simulated and biological data, we show that the new seed constructs improve sequence-matching sensitivity over other strobemers and that they are useful for read mapping and ANI estimation. For read mapping, we implement strobemers in minimap2 and observe 30% faster alignment time and 0.2% higher accuracy than with k-mers when mapping reads at high error rates. For ANI estimation, we find that higher-entropy seeds have a higher rank correlation between estimated and true ANI.
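
The idea of a pseudorandom seed and of measuring its randomness can be illustrated with a small sketch. This is not the entropy model or the seed implementation from the article: the function names (`kmer_hash`, `randstrobes`, `offset_entropy`), the parameters `w_min` and `w_max`, and the XOR-based choice of the second strobe are all assumptions made for illustration. The sketch builds order-2 seeds by linking a k-mer to a second k-mer picked pseudorandomly from a downstream window, then reports the Shannon entropy of the chosen offsets as a crude proxy for how evenly the seed spreads over its window.

```python
import hashlib
import math
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (stable across runs, unlike built-in hash())."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def randstrobes(seq: str, k: int, w_min: int, w_max: int):
    """Yield (position, offset, seed) for illustrative order-2 pseudorandom seeds.

    The second strobe starts somewhere in [i + w_min, i + w_max]; it is the
    candidate that minimises hash(first) XOR hash(candidate). This linking
    rule is an assumption of this sketch, not the article's method.
    """
    for i in range(len(seq) - k - w_max + 1):
        first = seq[i:i + k]
        h1 = kmer_hash(first)
        best_j = min(
            range(i + w_min, i + w_max + 1),
            key=lambda j: (h1 ^ kmer_hash(seq[j:j + k]), j),  # ties go to the leftmost candidate
        )
        yield i, best_j - i, first + seq[best_j:best_j + k]


def offset_entropy(offsets) -> float:
    """Shannon entropy (in bits) of the empirical distribution of offsets."""
    counts = Counter(offsets)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sequence = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCATGCA"  # made-up example sequence
    seeds = list(randstrobes(sequence, k=4, w_min=4, w_max=10))
    print(f"{len(seeds)} seeds, offset entropy = "
          f"{offset_entropy(off for _, off, _ in seeds):.3f} bits")
```

A plain k-mer corresponds to a fixed offset and therefore zero offset entropy in this toy measure, whereas a seed whose second strobe lands uniformly across the window reaches the maximum of log2(w_max - w_min + 1) bits.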

Added entries (persons, corporate bodies, meetings, titles ...)

  • Sahlin, Kristoffer, 1984-, Stockholms universitet, Matematiska institutionen, (Swepub:su)krsa3529 (author)
  • Stockholms universitet, Matematiska institutionen (creator_code:org_t)

Related titles

  • In: Genome Research 33:7, pp. 1162-1174. ISSN 1088-9051, 1549-5469
