SwePub

Statistical modelling and analyses of DNA sequence data with applications to metagenomics

Buongermino Pereira, Mariana, 1982 (author)
Gothenburg University, Department of Mathematical Sciences, Chalmers University of Technology
ISBN 9789175976075
Göteborg : Chalmers University of Technology, 2017
English.
  • Doctoral thesis (other academic/artistic)
Abstract
Microorganisms are organised in complex communities and are ubiquitous in all ecosystems, including natural environments and the human gut. Metagenomics, the direct sequencing of DNA from a sample, makes it possible to study the collective genomes of the organisms present in that sample. However, the resulting data are highly variable, so statistical models are needed to ensure correct biological interpretation. This thesis aims to develop statistical models that improve our understanding of metagenomic data. In Paper I, we develop, implement and evaluate HattCI, a high-performance generalised hidden Markov model for identifying integron-associated attC sites in DNA sequence data. In Paper II, we incorporate HattCI and other bioinformatics tools into a computational method that identifies integron-mediated genes and characterises their biological functions. Applying this method to metagenomic data, we identify 13,397 integron-mediated genes. In Paper III, we provide a conceptual overview of the computational and statistical challenges involved in analysing gene abundance data. In Paper IV, we perform a comprehensive evaluation of nine normalisation methods for metagenomic gene abundance data. Our results show that choosing a suitable method is essential to avoid an unacceptably high rate of false positives. The methods presented in this thesis improve the analysis of metagenomic data and thereby the understanding of microbial communities. In particular, this thesis highlights the importance of statistical modelling for handling the large variability of high-dimensional biological data and ensuring its sound interpretation.

Subject headings

NATURAL SCIENCES  -- Mathematics -- Probability Theory and Statistics (hsv//eng)
NATURAL SCIENCES  -- Computer and Information Sciences -- Bioinformatics (hsv//eng)

Keywords

generalised hidden Markov models
gene abundance data
metagenomics
statistical modelling
bioinformatics
normalisation
DNA sequence data

Publication and content type

vet (subject category)
dok (subject category)
