SwePub

Result list for search "WFRF:(Snipen Lars)"

  • Results 1-7 of 7
1.
  •  
2.
  • Mehmood, Tahir, et al. (author)
  • A Partial Least Squares based algorithm for parsimonious variable selection
  • 2011
  • In: Algorithms for Molecular Biology. - : Springer Science and Business Media LLC. - 1748-7188. ; 6
  • Journal article (peer-reviewed)
    • Background: In genomics, a commonly encountered problem is to extract a subset of variables out of a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, as well as variation of model performance on test data, are issues to be addressed for such problems. Results: We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables. This significantly improves the understandability of the model. Within the approach we have tested and compared three different criteria commonly used in the Partial Least Squares modeling paradigm for variable selection: loading weights, regression coefficients and variable importance on projections. The algorithm is applied to a problem of identifying codon variations discriminating different bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, as well as Soft-threshold Partial Least Squares variable selection. Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared to standard approaches.
  •  
3.
  • Mehmood, Tahir, et al. (author)
  • Exploration of multivariate analysis in microbial coding sequence modeling.
  • 2012
  • In: BMC bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 13
  • Journal article (peer-reviewed)
    • ABSTRACT: BACKGROUND: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequence substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than both IMM with protein (p < 0.001) and with DNA (p < 0.001). Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.
  •  
4.
  • Mehmood, Tahir, et al. (author)
  • Improving stability and understandability of genotype-phenotype mapping in Saccharomyces using regularized variable selection in L-PLS regression.
  • 2012
  • In: BMC bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 13:1
  • Journal article (peer-reviewed)
    • ABSTRACT: BACKGROUND: Multivariate approaches have been successfully applied to genome wide association studies. Recently, a Partial Least Squares (PLS) based approach was introduced for mapping yeast genotype-phenotype relations, where background information such as gene function classification, gene dispensability, recent or ancient gene copy number variations and the presence of premature stop codons or frameshift mutations in reading frames was used post hoc to explain selected genes. One of the latest advancements in PLS, named L-Partial Least Squares (L-PLS), where 'L' denotes the data structure used, enables the use of background information at the modeling level. Here, a modification of L-PLS with variable importance on projection (VIP) was implemented using a stepwise regularized procedure for gene and background information selection. Results were compared to PLS-based procedures, where no background information was used. RESULTS: Applying the proposed methodology to yeast Saccharomyces cerevisiae data, we found the relationship between genotype-phenotype to have improved understandability. Phenotypic variations were explained by the variations of relatively stable genes and stable background variations. The suggested procedure provides an automatic way for genotype-phenotype mapping. The selected phenotype influencing genes were evolving 29% faster than non-influential genes, and the current results are supported by a recently conducted study. Further power analysis on simulated data verified that the proposed methodology selects relevant variables. CONCLUSIONS: A modification of L-PLS with VIP in a stepwise regularized elimination procedure can improve the understandability and stability of selected genes and background information. The approach is recommended for genome wide association studies where background information is available.
  •  
5.
  • Mehmood, Tahir, et al. (author)
  • Mining for genotype-phenotype relations in Saccharomyces using partial least squares.
  • 2011
  • In: BMC bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 12
  • Journal article (peer-reviewed)
    • ABSTRACT: Background: Multivariate approaches are important due to their versatility and applications in many fields, as they provide decisive advantages over univariate analysis in many ways. Genome wide association studies are rapidly emerging, but the approaches at hand pay less attention to multivariate relations between genotype and phenotype. We introduce a methodology based on a BLAST approach for extracting information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) for mapping genotype-phenotype relations. Results: Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the relationship between genotype-phenotype involves surprisingly few genes in the sense that an overwhelmingly large fraction of the phenotypic variation can be explained by variation in less than 1% of the full gene reference set containing 5791 genes. These phenotype influencing genes were evolving 20% faster than noninfluential genes and were unevenly distributed over cellular functions, with strong enrichments in functions such as cellular respiration and transposition. These genes were also enriched with known paralogs, stop codon variations and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on Saccharomyces yeasts' recent adaptation to environmental changes in their ecological niche. Conclusions: The BLAST and PLS based multivariate approach derived results that adhere to the known yeast phylogeny and gene ontology and thus verify that the methodology extracts a set of fast evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future investigations should be made to improve the computations of genotype signals as well as the variable selection procedure within the PLS framework.
  •  
6.
  • Rajan, Sukithar K, 1978- (author)
  • Metagenomic Characterization of the Gut Microbiome in Cohorts of Elderly
  • 2020
  • Doctoral thesis (other academic/artistic)
    • Human gut microbiota plays a vital role in maintaining host health. This thesis aims to investigate the gut microbial population and function using next generation sequencing (NGS) data from faecal samples. Paper I examines the influence of sequencing depth and analysis methods in microbiota profiling using NGS whole genome sequencing (WGS) data. By subsampling the metagenomic data, the influence of varying sequencing depths on different phylogenetic classification methods is investigated. This suggests that the necessary sequencing depth would be dependent on the individual research plan. This paper recommends a consensus approach and an informed choice of NGS analysis method for a reliable prediction. Paper II relates the gut microbiota to general health, nutrient intake, physical activity, medications, and psychological distress in community-dwelling older adults and senior orienteers. A higher abundance of F. prausnitzii in the faecal microbiota of senior orienteers confirms the hypothesis that senior orienteers can be seen as a model for healthy ageing in the perspective of the microbiota. Paper III focuses on assessing the validity of function prediction using LC-MS at multiple annotation levels. Predicted and quantified protein-pathway profiles were subjected to correlation analyses, which showed statistically significant association between predicted and quantified proteins as well as predicted and quantified pathways. This study also showed a direct relation between protein abundance and correlation for predicted and quantified proteins at higher function levels. Paper IV investigates the effects of faecal microbiota transfer (FMT) on functional microbiota profiles. This study showed that allogenic FMT did not alter the metabolite profiles, but it seems to disturb the gut microbiota-metabolite interactions when compared to autologous FMT. This thesis reiterates the need for carefully selecting prediction tools and methods for microbiome analysis. The findings of this thesis could stimulate more focused studies using NGS in medicine and aid in better understanding of host-microbe interactions.
  •  
7.
  • Snipen, Lars, et al. (author)
  • Detection of divergent genes in microbial aCGH experiments
  • 2006
  • In: BMC Bioinformatics. - London, UK : BioMed Central. - 1471-2105. ; 7
  • Journal article (peer-reviewed)
    • Background: Array-based comparative genome hybridization (aCGH) is a tool for rapid comparison of genomes from different bacterial strains. The purpose of such analysis is to detect highly divergent or absent genes in a sample strain compared to an index strain. Development of methods for analyzing aCGH data has primarily focused on copy number aberrations in cancer research. In microbial aCGH analyses, genes are typically ranked by log-ratios, and classification into divergent or present is done by choosing a cutoff log-ratio, either manually or by statistics calculated from the log-ratio distribution. As experimental settings vary considerably, it is not possible to develop a classical discriminant or statistical learning approach. Methods: We introduce a more efficient method for analyzing microbial aCGH data using a finite mixture model and a data rotation scheme. Using the average posterior probabilities from the model fitted to log-ratios before and after rotation, we get a score for each gene, and demonstrate its advantages for ranking and detecting divergent genes with increased specificity and sensitivity. Results: The procedure is tested and compared to other approaches on simulated data sets, as well as on four experimental validation data sets for aCGH analysis on fully sequenced strains of Staphylococcus aureus and Streptococcus pneumoniae. Conclusion: When tested on simulated data as well as on four different experimental validation data sets from experiments with only fully sequenced strains, our procedure out-competes the standard procedures of using a simple log-ratio cutoff for classification into present and divergent genes.
  •  