Designing efficient randstrobes for sequence similarity analyses


Karami, Moein; Stockholms universitet, Matematiska institutionen, Science for Life Laboratory (SciLifeLab) (author)

Article/chapter, English, 2024

Publisher, year of publication, extent ...

2024
print (rdacarrier)

Identifiers

LIBRIS-ID: oai:DiVA.org:su-229044
URI: https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-229044
DOI: https://doi.org/10.1093/bioinformatics/btae187

Additional language information

Language: English
Summary in: English

Included in sub-database

SwePub

Classification

Subject category: ref (swepub-contenttype)
Subject category: art (swepub-publicationtype)

Notes

Motivation: Substrings of length k, commonly referred to as k-mers, play a vital role in sequence analysis. However, k-mers are limited to exact matches between sequences leading to alternative constructs. We recently introduced a class of new constructs, strobemers, that can match across substitutions and smaller insertions and deletions. Randstrobes, the most sensitive strobemer proposed in Sahlin (Effective sequence similarity detection with strobemers. Genome Res 2021a;31:2080–94. https://doi.org/10.1101/gr.275648.121), has been used in several bioinformatics applications such as read classification, short-read mapping, and read overlap detection. Recently, we showed that the more pseudo-random the behavior of the construction (measured in entropy), the more efficient the seeds for sequence similarity analysis. The level of pseudo-randomness depends on the construction operators, but no study has investigated the efficacy.

Results: In this study, we introduce novel construction methods, including a Binary Search Tree-based approach that improves time complexity over previous methods. To our knowledge, we are also the first to address biases in construction and design three metrics for measuring bias. Our evaluation shows that our methods have favorable speed and sampling uniformity compared to existing approaches. Lastly, guided by our results, we change the seed construction in strobealign, a short-read mapper, and find that the results change substantially. We suggest combining the two results to improve strobealign’s accuracy for the shortest reads in our evaluated datasets. Our evaluation highlights sampling biases that can occur and provides guidance on which operators to use when implementing randstrobes.

Availability and implementation: All methods and evaluation benchmarks are available in a public Github repository at https://github.com/Moein-Karami/RandStrobes. The scripts for running the strobealign analysis are found at https://github.com/NBISweden/strobealign-evaluation.

Subject headings and genre terms

Added entries (persons, institutions, conferences, titles ...)

Mohammadi, Aryan Soltani; Stockholms universitet, Science for Life Laboratory (SciLifeLab), Matematiska institutionen (author)
Martin, Marcel; Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Natl Bioinformat Infrastruct Sweden, SE-17121 Solna, Sweden (author)
Ekim, Baris (author)
Shen, Wei (author)
Guo, Lidong (author)
Xu, Mengyang (author)
Pibiri, Giulio Ermanno (author)
Patro, Rob (author)
Sahlin, Kristoffer, 1984-; Stockholms universitet, Matematiska institutionen, Science for Life Laboratory (SciLifeLab); (Swepub:su)krsa3529 (author)
Stockholms universitet, Matematiska institutionen (creator_code:org_t)

Related titles

Included in: Bioinformatics 40:4, ISSN 1367-4803, 1367-4811

About the subject

NATURAL SCIENCES; Data och informa ...; Bioinformatics

ENGINEERING AND TECHNOLOGY; Samhällsbyggnads ...; Construction Management
