SwePub

Search results for "WFRF:(Sahlin Kristoffer 1984 )"

  • Results 1-4 of 4
1.
  • Karami, Moein, et al. (authors)
  • Designing efficient randstrobes for sequence similarity analyses
  • 2024
  • In: Bioinformatics. - 1367-4803 .- 1367-4811. ; 40:4
  • Journal article (peer-reviewed)
    • Motivation: Substrings of length k, commonly referred to as k-mers, play a vital role in sequence analysis. However, k-mers are limited to exact matches between sequences, leading to alternative constructs. We recently introduced a class of new constructs, strobemers, that can match across substitutions and smaller insertions and deletions. Randstrobes, the most sensitive strobemer proposed in Sahlin (Effective sequence similarity detection with strobemers. Genome Res 2021a;31:2080–94. https://doi.org/10.1101/gr.275648.121), has been used in several bioinformatics applications such as read classification, short-read mapping, and read overlap detection. Recently, we showed that the more pseudo-random the behavior of the construction (measured in entropy), the more efficient the seeds for sequence similarity analysis. The level of pseudo-randomness depends on the construction operators, but no study has investigated the efficacy. Results: In this study, we introduce novel construction methods, including a Binary Search Tree-based approach that improves time complexity over previous methods. To our knowledge, we are also the first to address biases in construction and design three metrics for measuring bias. Our evaluation shows that our methods have favorable speed and sampling uniformity compared to existing approaches. Lastly, guided by our results, we change the seed construction in strobealign, a short-read mapper, and find that the results change substantially. We suggest combining the two results to improve strobealign’s accuracy for the shortest reads in our evaluated datasets. Our evaluation highlights sampling biases that can occur and provides guidance on which operators to use when implementing randstrobes. Availability and implementation: All methods and evaluation benchmarks are available in a public GitHub repository at https://github.com/Moein-Karami/RandStrobes. The scripts for running the strobealign analysis are found at https://github.com/NBISweden/strobealign-evaluation.
2.
  • Maier, Benjamin Dominik, 1997-, et al. (authors)
  • Entropy predicts sensitivity of pseudorandom seeds
  • 2023
  • In: Genome Research. - 1088-9051 .- 1549-5469. ; 33:7, pp. 1162-1174
  • Journal article (peer-reviewed)
    • Seed design is important for sequence similarity search applications such as read mapping and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are likely the most well-known and used seeds, sensitivity suffers at high error rates, particularly when indels are present. Recently, we developed a pseudorandom seeding construct, strobemers, which was empirically shown to have high sensitivity also at high indel rates. However, the study lacked a deeper understanding of why. In this study, we propose a model to estimate the entropy of a seed and find that seeds with high entropy, according to our model, in most cases have high match sensitivity. Our discovered seed randomness–sensitivity relationship explains why some seeds perform better than others, and the relationship provides a framework for designing even more sensitive seeds. We also present three new strobemer seed constructs: mixedstrobes, altstrobes, and multistrobes. We use both simulated and biological data to show that our new seed constructs improve sequence-matching sensitivity compared to other strobemers. We show that the three new seed constructs are useful for read mapping and ANI estimation. For read mapping, we implement strobemers into minimap2 and observe 30% faster alignment time and 0.2% higher accuracy than using k-mers when mapping reads at high error rates. As for ANI estimation, we find that higher entropy seeds have a higher rank correlation between estimated and true ANI.
3.
  • Namias, Alice, et al. (authors)
  • Nanopore sequencing of PCR products enables multicopy gene family reconstruction
  • 2023
  • In: Computational and Structural Biotechnology Journal. - 2001-0370. ; 21, pp. 3656-3664
  • Journal article (peer-reviewed)
    • The importance of gene amplifications in evolution is increasingly recognized. Yet, tools to study multi-copy gene families are still scarce, and many such families are overlooked using common sequencing methods. Haplotype reconstruction is even harder for polymorphic multi-copy gene families. Here, we show that all variants (or haplotypes) of a multi-copy gene family present in a single genome can be obtained using Oxford Nanopore Technologies sequencing of PCR products, followed by steps of mapping, SNP calling and haplotyping. As a proof of concept, we acquired the sequences of highly similar variants of the cidA and cidB genes present in the genome of the Wolbachia wPip, a bacterium infecting Culex pipiens mosquitoes. Our method relies on a wide database of cid genes, previously acquired by cloning and Sanger sequencing. We addressed problems commonly faced when using mapping approaches for multi-copy gene families with highly similar variants. In addition, we confirmed that PCR amplification causes frequent chimeras, which have to be carefully considered when working on families of recombinant genes. We tested the robustness of the method using a combination of bioinformatics (read simulations) and molecular biology approaches (sequence acquisitions through cloning and Sanger sequencing, specific PCRs and digital droplet PCR). When different haplotypes present within a single genome cannot be reconstructed from short-read sequencing, this pipeline provides high-throughput acquisition and gives reliable results as well as insights into the relative copy numbers of the different variants.
4.
  • Sahlin, Kristoffer, 1984-, et al. (authors)
  • Genome scaffolding with PE-contaminated mate-pair libraries
  • 2015
  • Other publication (other academic/artistic)
    • Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than what is expected. This contamination has been discussed in previous work on integrated scaffolders in end-to-end assemblers such as Allpaths-LG and MaSuRCA, but the methods rely on the fact that the orientation is observable, e.g., by finding the junction adapter sequence in the reads. This is not always the case, making orientation and insert size of a read pair stochastic. Furthermore, work on modeling PE-contamination has so far been disregarded in stand-alone scaffolders, and the effect that PE-contamination has on scaffolding quality has not been examined before. We have addressed PE-contamination in an update of our scaffolder BESST. We formulate the problem as an Integer Linear Program (ILP) and use characteristics of the problem, such as contig lengths and insert size, to efficiently solve the ILP using a linear number (with respect to the number of contigs) of Linear Programs. Our results show significant improvement over both integrated and stand-alone scaffolders. The impact of modeling PE-contamination is quantified by comparison with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding, and inflated assembly sizes. The model is implemented in BESST. Source code and usage instructions are found at https://github.com/ksahlin/BESST. BESST can also be downloaded from PyPI.
