SwePub
Elucidation of complex diseases by machine learning

Smolinska Garbulowska, Karolina (author)
Uppsala University, Computational Biology and Bioinformatics, Science for Life Laboratory, SciLifeLab
Komorowski, Jan, Professor (thesis supervisor)
Uppsala University, Science for Life Laboratory, SciLifeLab, Computational Biology and Bioinformatics, Swedish Collegium for Advanced Study (SCAS)
Wadelius, Claes, Professor, 1955- (thesis supervisor)
Uppsala University, Science for Life Laboratory, SciLifeLab, Medical Genetics and Genomics
Lappalainen, Tuuli, Professor (opponent)
SciLifeLab, KTH Royal Institute of Technology & New York Genome Center
ISBN 9789151313474
Uppsala : Acta Universitatis Upsaliensis, 2021
English, 64 pp.
Series: Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, 1651-6214 ; 2096
  • Doctoral thesis (other academic/artistic)
Abstract
Making models of complex health-related problems interpretable is a crucial task that is often neglected in machine learning (ML). The sheer amount of available data complicates the problem further. The focal point of my research was building and applying specialized tools that identify relevant descriptors (features and their values). These tools cover a spectrum of methods that originate in ML, statistics and network visualization.

In the first part of the thesis, we predicted regulatory elements with potential regulatory impact on gene expression by incorporating several annotation tracks. Then, we created the funMotifs framework, which enables the identification and analysis of functional transcription factor (TF) motifs in a tissue-specific manner (Paper I). The TF motifs were described by different chromatin signals from various genomics platforms. Afterwards, the data were merged into a functional score of the motif using logistic regression.

Subsequently, funMotifs was used to characterize a map of regulatory mutations and regulatory elements in 37 cancer types from 2,515 samples (Paper II). We identified 5,749 mutated regulatory elements containing 11,962 regulatory mutations. Additionally, we identified several dysregulated cancer-associated genes near the mutated elements. Finally, enrichment of cancer-related pathways was observed for the genes linked with the mutated elements.

In the second part, we focused on interpretable ML modeling with rule-based classifiers. A rule-based model (RBM) consists of a set of IF-THEN rules, which are legible and make it possible to identify combinations of descriptors. To analyze RBMs, we created the R.ROSETTA R package, which is a wrapper of ROSETTA (Paper III). As a result, R.ROSETTA gained several additional functionalities that simplify validation and interpretation of RBMs.

Visual inspection of RBMs is an essential step towards the identification of interesting descriptors of a classifier. To support the analysis of complex RBMs, we created the VisuNet R tool for rule network (RN) visualization (Paper IV). These networks are constructed from the IF-THEN rules that constitute an RBM; nodes are descriptors in rules, and an edge connects two nodes if the corresponding descriptors occur in the same rule. By creating an RN for an RBM, we are able to use network concepts to analyze complex health-related processes. We applied VisuNet to various datasets to illustrate the properties of the tool.

In our studies, we showed the importance of identifying relevant descriptors for biological problems. Moreover, our methods may contribute to a better understanding of complex diseases.
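
The rule-network construction described in the abstract (nodes are descriptors, and an edge joins two descriptors that occur in the same rule) can be illustrated with a minimal Python sketch. This is not the VisuNet implementation, which is an R tool; the rule representation, the gene names, and the function `build_rule_network` are hypothetical and chosen only for illustration. The sketch also ignores decision classes and rule-quality measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy rules (not taken from the thesis): each rule is a pair
# (conditions, decision), where conditions maps a feature to its value.
rules = [
    ({"geneA": "high", "geneB": "low"}, "case"),
    ({"geneA": "high", "geneC": "low"}, "case"),
    ({"geneB": "low", "geneC": "low", "geneD": "high"}, "control"),
]


def build_rule_network(rules):
    """Build a co-occurrence network from IF-THEN rules.

    Nodes are descriptors written as 'feature=value'. An edge connects two
    descriptors that appear in the same rule. Both are returned as Counters
    holding the number of rules that support each node or edge.
    """
    nodes = Counter()
    edges = Counter()
    for conditions, _decision in rules:
        # Sort so that each undirected edge has one canonical key.
        descriptors = sorted(f"{feat}={val}" for feat, val in conditions.items())
        nodes.update(descriptors)
        edges.update(combinations(descriptors, 2))
    return nodes, edges


nodes, edges = build_rule_network(rules)
for (left, right), support in sorted(edges.items()):
    print(f"{left} -- {right} (rules: {support})")
```

Descriptors that take part in many rules become highly connected nodes, which is the kind of network property such a visualization makes it possible to inspect.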

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Bioinformatics (hsv//eng)

Keywords

machine learning
complex disease
multi-omics
regulatory element
gene regulation
transcription factor motif
networks
rough sets
Bioinformatics

Publication and content type

vet (subject category)
dok (subject category)
