SwePub

Result list for search "WFRF:(Diamanti Klev)"

Search: WFRF:(Diamanti Klev)

  • Results 1-31 of 31
1.
  •  
2.
  • Atienza-Párraga, Alba, et al. (author)
  • Epigenomic re-configuration of primary multiple myeloma underlies the synergistic effect of combined DNMT and EZH2 inhibition.
  • Other publication (other academic/artistic)
    • Multiple myeloma (MM) is characterized by an overexpression of EZH2 and a subsequent increase in H3K27me3-mediated silencing. However, the genome-wide redistribution of this mark in context with other epigenetic tags remains largely unexplored. Here, we show that EZH2 physically interacts with DNMT1 and that combined inhibition leads to a reduced G2/M arrest and increased apoptosis in MM. In addition, we present a catalogue of the genomic regulatory regions in normal plasma cells (NPC) as defined by their individual combination of histone marks. We used ChIP-seq and ATAC-seq data to generate whole-genome NPC chromatin annotations which we further analysed using DNA methylation arrays and RNA-seq. Comparison between NPC and MM demonstrated that, despite the global hypomethylation, enhancers show a tendency towards higher DNA methylation levels in MM, whereas Polycomb and heterochromatic sites, highly methylated in NPC, show intermediate levels of the mark. Across all examined regulatory regions, 5-azacytidine treatment strongly reduced DNA methylation in MM. Furthermore, we find an extensive restructuring of the global histone patterns in MM. We noticed a widespread increase in H3K27me3 except at active TSSs/promoters and enhancers, where we found a selective gain of the mark, suggestive of a directed silencing. In contrast, poised TSSs lose H3K27me3 and gain the activation mark H3K27ac, reflecting potential activation. Taken together, we present a comprehensive map of the epigenomic changes in MM as compared to NPC and provide insights into the interplay between EZH2 and DNMT1 in MM.
  •  
3.
  • Campbell, PJ, et al. (author)
  • Pan-cancer analysis of whole genomes
  • 2020
  • In: Nature. - : Springer Science and Business Media LLC. - 1476-4687 .- 0028-0836. ; 578:7793, pp. 82-
  • Journal article (peer-reviewed)
    • Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1–3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10–18].
  •  
4.
  • Carlevaro-Fita, J, et al. (author)
  • Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis
  • 2020
  • In: Communications biology. - : Springer Science and Business Media LLC. - 2399-3642. ; 3:1, pp. 56-
  • Journal article (peer-reviewed)
    • Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.
  •  
5.
  • Cavalli, Marco, et al. (author)
  • A Multi-Omics Approach to Liver Diseases : Integration of Single Nuclei Transcriptomics with Proteomics and HiCap Bulk Data in Human Liver
  • 2020
  • In: Omics. - : Mary Ann Liebert Inc. - 1536-2310 .- 1557-8100. ; 24:4, pp. 180-194
  • Journal article (peer-reviewed)
    • The liver is the largest solid organ and a primary metabolic hub. In recent years, intact cell nuclei were used to perform single-nuclei RNA-seq (snRNA-seq) for tissues difficult to dissociate and for flash-frozen archived tissue samples to discover unknown and rare cell subpopulations. In this study, we performed snRNA-seq of a liver sample to identify subpopulations of cells based on nuclear transcriptomics. In 4282 single nuclei, we detected, on average, 1377 active genes and we identified seven major cell types. We integrated data from 94,286 distal interactions (p < 0.05) for 7682 promoters from a targeted chromosome conformation capture technique (HiCap) and mass spectrometry proteomics for the same liver sample. We observed a reasonable correlation between proteomics and in silico bulk snRNA-seq (r = 0.47) using tissue-independent gene-specific protein abundancy estimation factors. We specifically looked at genes of medical importance. The DPYD gene is involved in the pharmacogenetics of fluoropyrimidine toxicity and some of its variants are analyzed for clinical purposes. We identified a new putative polymorphic regulatory element, which may contribute to variation in toxicity. Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer and we investigated all known risk genes. We identified a complex regulatory landscape for the SLC2A2 gene with 16 candidate enhancers. Three of them harbor somatic motif breaking and other mutations in HCC in the Pan Cancer Analysis of Whole Genomes dataset and are candidates to contribute to malignancy. Our results highlight the potential of a multi-omics approach in the study of human diseases.
  •  
6.
  • Cavalli, Marco, et al. (author)
  • Single Nuclei Transcriptome Analysis of Human Liver with Integration of Proteomics and Capture Hi-C Bulk Tissue Data
  • Journal article (peer-reviewed)
    • The liver is the largest solid organ and a primary metabolic hub. In recent years, intact cell nuclei were used to perform single-nuclei RNA-seq (snRNA-seq) for tissues difficult to dissociate and for flash-frozen archived tissue samples to discover unknown and rare cell sub-populations. In this study, we performed snRNA-seq of a liver sample to identify sub-populations of cells based on nuclear transcriptomics. In 4,282 single nuclei we detected on average 1,377 active genes and we identified seven major cell types. We integrated data from 94,286 distal interactions (p < 0.05) for 7,682 promoters from a targeted chromosome conformation capture technique (HiCap) and mass spectrometry (MS) proteomics for the same liver sample. We observed a reasonable correlation between proteomics and in silico bulk snRNA-seq (r = 0.47) using tissue-independent gene-specific protein abundancy estimation factors. We specifically looked at genes of medical importance. The DPYD gene is involved in the pharmacogenetics of fluoropyrimidine toxicity and some of its variants are analyzed for clinical purposes. We identified a new putative polymorphic regulatory element, which may contribute to variation in toxicity. Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer and we investigated all known risk genes. We found a complex regulatory network for the SLC2A2 gene with 16 candidate enhancers. Three of them harbor somatic motif breaking and other mutations in HCC in the Pan Cancer Analysis of Whole Genomes dataset and are candidates to contribute to malignancy. Our results highlight the potential of a multi-omics approach in the study of human diseases.
  •  
7.
  • Cavalli, Marco, et al. (author)
  • The Thioesterase ACOT1 as a Regulator of Lipid Metabolism in Type 2 Diabetes Detected in a Multi-Omics Study of Human Liver
  • 2021
  • In: Omics. - : Mary Ann Liebert. - 1536-2310 .- 1557-8100. ; 25:10, pp. 652-659
  • Journal article (peer-reviewed)
    • Type 2 diabetes (T2D) is characterized by pathophysiological alterations in lipid metabolism. One strategy to understand the molecular mechanisms behind these abnormalities is to identify cis-regulatory elements (CREs) located in chromatin-accessible regions of the genome that regulate key genes. In this study we integrated assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) data, widely used to decode chromatin accessibility, with multi-omics data and publicly available CRE databases to identify candidate CREs associated with T2D for further experimental validations. We performed high-sensitivity ATAC-seq in nine human liver samples from normal and T2D donors, and identified a set of differentially accessible regions (DARs). We identified seven DARs including a candidate enhancer for the ACOT1 gene that regulates the balance of acyl-CoA and free fatty acids (FFAs) in the cytoplasm. The relevance of ACOT1 regulation in T2D was supported by the analysis of transcriptomics and proteomics data in liver tissue. Long-chain acyl-CoA thioesterases (ACOTs) are a group of enzymes that hydrolyze acyl-CoA esters to FFAs and coenzyme A. ACOTs have been associated with regulation of triglyceride levels, fatty acid oxidation, mitochondrial function, and insulin signaling, linking their regulation to the pathogenesis of T2D. Our strategy integrating chromatin accessibility with DNA binding and other types of omics provides novel insights into the role of genetic regulation in T2D and is extendable to other complex multifactorial diseases.
  •  
8.
  • Dabrowski, Michal J., et al. (author)
  • Unveiling new interdependencies between significant DNA methylation sites, gene expression profiles and glioma patients survival
  • 2018
  • In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 8
  • Journal article (peer-reviewed)
    • In order to find clinically useful prognostic markers for glioma patients' survival, we employed Monte Carlo Feature Selection and Interdependencies Discovery (MCFS-ID) algorithm on DNA methylation (HumanMethylation450 platform) and RNA-seq datasets from The Cancer Genome Atlas (TCGA) for 88 patients observed until death. The input features were ranked according to their importance in predicting patients' longer (400+ days) or shorter (<= 400 days) survival without prior classification of the patients. Interestingly, out of the 65 most important features found, 63 are methylation sites, and only two mRNAs. Moreover, 61 out of the 63 methylation sites are among those detected by the 450 k array technology, while being absent in the HumanMethylation27. The most important methylation feature (cg15072976) overlaps with the RE1 Silencing Transcription Factor (REST) binding site, and was confirmed to intersect with the REST binding motif in human U87 glioma cells. Six additional methylation sites from the top 63 overlap with REST sites. We found that the methylation status of the cg15072976 site affects transcription factor binding in U87 cells in gel shift assay. The cg15072976 methylation status discriminates <= 400 and 400+ patients in an independent dataset from TCGA and shows positive association with survival time as evidenced by Kaplan-Meier plots.
  •  
9.
  • Diamanti, Klev, 1987- (author)
  • Integrating multi-omics for type 2 diabetes : Data science and big data towards personalized medicine
  • 2019
  • Doctoral thesis (other academic/artistic)
    • Type 2 diabetes (T2D) is a complex metabolic disease characterized by multi-tissue insulin resistance and failure of the pancreatic β-cells to secrete sufficient amounts of insulin. Cells recruit transcription factors (TF) to specific genomic loci to regulate gene expression that consequently affects the protein and metabolite abundancies. Here we investigated the interplay of transcriptional and translational regulation, and its impact on metabolome and phenome for several insulin-resistant tissues from T2D donors. We implemented computational tools and multi-omics integrative approaches that can facilitate the selection of candidate combinatorial markers for T2D. We developed a data-driven approach to identify putative regulatory regions and TF-interaction complexes. The cell-specific sets of regulatory regions were enriched for disease-related single nucleotide polymorphisms (SNPs), highlighting the importance of such loci towards the genomic stability and the regulation of gene expression. We employed a similar principle in a second study where we integrated single nucleus ribonucleic acid sequencing (snRNA-seq) with bulk targeted chromosome-conformation-capture (HiCap) and mass spectrometry (MS) proteomics from liver. We identified a putatively polymorphic site that may contribute to variation in the pharmacogenetics of fluoropyrimidine toxicity for the DPYD gene. Additionally, we found a complex regulatory network between a group of 16 enhancers and the SLC2A2 gene that has been linked to increased risk for hepatocellular carcinoma (HCC). Moreover, three enhancers harbored motif-breaking mutations located in regulatory regions of a cohort of 314 HCC cases, and were candidate contributors to malignancy. In a cohort of 43 multi-organ donors we explored the alternating pattern of metabolites among visceral adipose tissue (VAT), pancreatic islets, skeletal muscle, liver and blood serum samples. A large fraction of lysophosphatidylcholines (LPC) decreased in muscle and serum of T2D donors, while a large number of carnitines increased in liver and blood of T2D donors, confirming that changes in metabolites occur in primary tissues, while their alterations in serum constitute a secondary event. Next, we associated metabolite abundancies from 42 subjects to glucose uptake, fat content and volume of various organs measured by positron emission tomography/magnetic resonance imaging (PET/MRI). The fat content of the liver was positively associated with the amino acid tyrosine, and negatively associated with LPC(P-16:0). The insulin sensitivity of VAT and subcutaneous adipose tissue was positively associated with several LPCs, while the opposite applied to branched-chain amino acids. Finally, we presented the network visualization of a rule-based machine learning model that predicted non-diabetes and T2D in an “unseen” dataset with 78% accuracy.
  •  
10.
  • Diamanti, Klev, 1987-, et al. (author)
  • Integration of whole-body [18F]FDG PET/MRI with non-targeted metabolomics can provide new insights on tissue-specific insulin resistance in type 2 diabetes
  • 2020
  • In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 10:1
  • Journal article (peer-reviewed)
    • Alteration of various metabolites has been linked to type 2 diabetes (T2D) and insulin resistance. However, identifying significant associations between metabolites and tissue-specific phenotypes requires a multi-omics approach. In a cohort of 42 subjects with different levels of glucose tolerance (normal, prediabetes and T2D) matched for age and body mass index, we calculated associations between parameters of whole-body positron emission tomography (PET)/magnetic resonance imaging (MRI) during hyperinsulinemic euglycemic clamp and non-targeted metabolomics profiling for subcutaneous adipose tissue (SAT) and plasma. Plasma metabolomics profiling revealed that hepatic fat content was positively associated with tyrosine, and negatively associated with lysoPC(P-16:0). Visceral adipose tissue (VAT) and SAT insulin sensitivity (Ki) were positively associated with several lysophospholipids, while the opposite applied to branched-chain amino acids. The adipose tissue metabolomics revealed a positive association between non-esterified fatty acids and both VAT and liver Ki. Bile acids and carnitines in adipose tissue were inversely associated with VAT Ki. Furthermore, we detected several metabolites that were significantly higher in T2D than normal/prediabetes. In this study we present novel associations between several metabolites from SAT and plasma with the fat fraction, volume and insulin sensitivity of various tissues throughout the body, demonstrating the benefit of an integrative multi-omics approach.
  •  
11.
  • Diamanti, Klev, 1987-, et al. (author)
  • Integration of whole-body PET/MRI with non-targeted metabolomics provides new insights into insulin sensitivity of various tissues
  • Other publication (other academic/artistic)
    • Background: Alteration of various metabolites has been linked to type 2 diabetes (T2D) and insulin resistance. However, identifying significant associations between metabolites and tissue-specific alterations is challenging and requires a multi-omics approach. In this study, we aimed at discovering associations of metabolites from subcutaneous adipose tissue (SAT) and plasma with the volume, the fat fraction (FF) and the insulin sensitivity (Ki) of specific tissues using [18F]FDG PET/MRI. Materials and Methods: In a cohort of 42 subjects with different levels of glucose tolerance (normal, prediabetes and T2D) matched for age and body-mass-index (BMI) we calculated associations between parameters of whole-body FDG PET/MRI during clamp and non-targeted metabolomics profiling for SAT and blood plasma. We also used a rule-based classifier to identify a large collection of prevalent patterns of co-dependent metabolites that characterize non-diabetes (ND) and T2D. Results: The plasma metabolomics profiling revealed that hepatic fat content was positively associated with tyrosine, and negatively associated with lysoPC(P-16:0). Ki in visceral adipose tissue (VAT) and SAT was positively associated with several species of lysophospholipids while the opposite applied to branched-chain amino acids (BCAA) and their intermediates. The adipose tissue metabolomics revealed a positive association between non-esterified fatty acids and both VAT and liver Ki. On the contrary, bile acids and carnitines in adipose tissue were inversely associated with VAT Ki. Finally, we presented a transparent machine-learning model that predicted ND or T2D in “unseen” data with an accuracy of 78%. Conclusions: Novel associations of several metabolites from SAT and plasma with the FF, volume and insulin sensitivity of various tissues throughout the body were discovered using PET/MRI and a new integrative multi-omics approach. A promising computational model that predicted ND and T2D with high certainty suggested novel non-linear interdependencies of metabolites.
  •  
12.
  • Diamanti, Klev, et al. (author)
  • Intra- and inter-individual metabolic profiling highlights carnitine and lysophosphatidylcholine pathways as key molecular defects in type 2 diabetes
  • 2019
  • In: Scientific reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 9:1, pp. 9653-
  • Journal article (peer-reviewed)
    • Type 2 diabetes (T2D) mellitus is a complex metabolic disease commonly caused by insulin resistance in several tissues. We performed a matched two-dimensional metabolic screening in tissue samples from 43 multi-organ donors. The intra-individual analysis was assessed across five key metabolic tissues (serum, visceral adipose tissue, liver, pancreatic islets and skeletal muscle), and the inter-individual across three different groups reflecting T2D progression. We identified 92 metabolites differing significantly between non-diabetes and T2D subjects. In diabetes cases, carnitines were significantly higher in liver, while lysophosphatidylcholines were significantly lower in muscle and serum. We tracked the primary tissue of origin for multiple metabolites whose alterations were reflected in serum. An investigation of three major stages spanning from controls, to pre-diabetes and to overt T2D indicated that a subset of lysophosphatidylcholines was significantly lower in the muscle of pre-diabetes subjects. Moreover, glycodeoxycholic acid was significantly higher in liver of pre-diabetes subjects while additional increase in T2D was insignificant. We confirmed many previously reported findings and substantially expanded on them with altered markers for early and overt T2D. Overall, the analysis of this unique dataset can increase the understanding of the metabolic interplay between organs in the development of T2D.
  •  
13.
  • Diamanti, Klev, et al. (author)
  • Maps of context-dependent putative regulatory regions and genomic signal interactions
  • 2016
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 44:19, pp. 9110-9120
  • Journal article (peer-reviewed)
    • Gene transcription is regulated mainly by transcription factors (TFs). ENCODE and Roadmap Epigenomics provide global binding profiles of TFs, which can be used to identify regulatory regions. To this end we implemented a method to systematically construct cell-type and species-specific maps of regulatory regions and TF-TF interactions. We illustrated the approach by developing maps for five human cell-lines and two other species. We detected ~144k putative regulatory regions among the human cell-lines, with the majority of them being ~300 bp. We found ~20k putative regulatory elements in the ENCODE heterochromatic domains suggesting a large regulatory potential in the regions presumed transcriptionally silent. Among the most significant TF interactions identified in the heterochromatic regions were CTCF and the cohesin complex, which is in agreement with previous reports. Finally, we investigated the enrichment of the obtained putative regulatory regions in the 3D chromatin domains. More than 90% of the regions were discovered in the 3D contacting domains. We found a significant enrichment of GWAS SNPs in the putative regulatory regions. These significant enrichments provide evidence that the regulatory regions play a crucial role in the genomic structural stability. Additionally, we generated maps of putative regulatory regions for prostate and colorectal cancer human cell-lines.
  •  
14.
  • Diamanti, Klev, 1987-, et al. (author)
  • Organ-specific metabolic pathways distinguish prediabetes, type 2 diabetes, and normal tissues
  • 2022
  • In: Cell Reports Medicine. - : Elsevier BV. - 2666-3791. ; 3:10
  • Journal article (peer-reviewed)
    • Environmental and genetic factors cause defects in pancreatic islets driving type 2 diabetes (T2D) together with the progression of multi-tissue insulin resistance. Mass spectrometry proteomics on samples from five key metabolic tissues of a cross-sectional cohort of 43 multi-organ donors provides deep coverage of their proteomes. Enrichment analysis of Gene Ontology terms provides a tissue-specific map of altered biological processes across healthy, prediabetes (PD), and T2D subjects. We find widespread alterations in several relevant biological pathways, including increase in hemostasis in pancreatic islets of PD, increase in the complement cascade in liver and pancreatic islets of PD, and elevation in cholesterol biosynthesis in liver of T2D. Our findings point to inflammatory, immune, and vascular alterations in pancreatic islets in PD that are hypotheses to be tested for potential contributions to hormonal perturbations such as impaired insulin and increased glucagon production. This multi-tissue proteomic map suggests tissue-specific metabolic dysregulations in T2D.
  •  
15.
  • Diamanti, Klev, 1987-, et al. (author)
  • Single nucleus transcriptomics data integration recapitulates the major cell types in human liver
  • 2021
  • In: Hepatology Research. - : Wiley. - 1386-6346 .- 1872-034X. ; 51:2, pp. 233-238
  • Journal article (peer-reviewed)
    • Aim: The aim of this study was to explore the benefits of data integration from different platforms for single nucleus transcriptomics profiling to characterize cell populations in human liver. Methods: We generated single-nucleus RNA sequencing data from Chromium 10X Genomics and Drop-seq for a human liver sample. We utilized state of the art bioinformatics tools to undertake a rigorous quality control and to integrate the data into a common space summarizing the gene expression variation from the respective platforms, while accounting for known and unknown confounding factors. Results: Analysis of single nuclei transcriptomes from both 10X and Drop-seq allowed identification of the major liver cell types, while the integrated set obtained enough statistical power to separate a small population of inactive hepatic stellate cells that was not characterized in either of the platforms. Conclusions: Integration of droplet-based single nucleus transcriptomics data enabled identification of a small cluster of inactive hepatic stellate cells that highlights the potential of our approach. We suggest single-nucleus RNA sequencing integrative approaches could be utilized to design larger and cost-effective studies.
  •  
16.
  • Dramiński, Michał, et al. (author)
  • Discovering Networks of Interdependent Features in High-Dimensional Problems
  • 2016
  • In: Big Data Analysis. - Cham : Springer. - 9783319269894 ; , pp. 285-304
  • Book chapter (peer-reviewed)
    • The availability of very large data sets in Life Sciences provided earlier by the technological breakthroughs such as microarrays and more recently by various forms of sequencing has created both challenges in analyzing these data as well as new opportunities. A promising, yet underdeveloped approach to Big Data, not limited to Life Sciences, is the use of feature selection and classification to discover interdependent features. Traditionally, classifiers have been developed for the best quality of supervised classification. In our experience, more often than not, rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying observations (objects, samples) into distinct classes and what the interdependencies are between the features that describe the observation. Our underlying hypothesis is that the interdependent features and rule networks do not only reflect some syntactical properties of the data and classifiers but also may convey meaningful clues about true interactions in the modeled biological system. In this chapter we develop further our method of Monte Carlo Feature Selection and Interdependency Discovery (MCFS and MCFS-ID, respectively), which are particularly well suited for high-dimensional problems, i.e., those where each observation is described by very many features, often many more features than the number of observations. Such problems are abundant in Life Science applications. Specifically, we define Inter-Dependency Graphs (termed, somewhat confusingly, ID Graphs) that are directed graphs of interactions between features extracted by aggregation of information from the classification trees constructed by the MCFS algorithm. We then proceed with modeling interactions on a finer level with rule networks. We discuss some of the properties of the ID graphs and make a first attempt at validating our hypothesis on a large gene expression data set for CD4+ T-cells. The MCFS-ID and ROSETTA including the Ciruvis approach offer a new methodology for analyzing Big Data from feature selection, through identification of feature interdependencies, to classification with rules according to decision classes, to construction of rule networks. Our preliminary results confirm that MCFS-ID is applicable to the identification of interacting features that are functionally relevant while rule networks offer a complementary picture with finer resolution of the interdependencies on the level of feature-value pairs.
  •  
17.
  • Garbulowski, Mateusz, et al. (author)
  • Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder
  • 2021
  • In: Frontiers in Genetics. - : Frontiers Media S.A.. - 1664-8021. ; 12
  • Journal article (peer-reviewed)
    • Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that allow explaining biological mechanisms and support analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets that we further visualized as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized a topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features that were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest a co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.
  •  
18.
  •  
19.
  • Garbulowski, Mateusz, et al. (författare)
  • Machine Learning-Based Analysis of Glioma Grades Reveals Co-Enrichment
  • 2022
  • In: Cancers. - : MDPI AG. - 2072-6694. ; 14:4
  • Journal article (peer-reviewed)
    • Simple Summary: Gliomas are heterogeneous types of cancer; therefore, therapy should be personalized and targeted toward specific pathways. We developed a methodology that corrected strong batch effects from The Cancer Genome Atlas datasets and estimated glioma grade-specific co-enrichment mechanisms using machine learning. Our findings created hypotheses for annotations, e.g., pathways, that should be considered as therapeutic targets. Abstract: Gliomas develop and grow in the brain and central nervous system. Examining glioma grading processes is valuable for addressing therapeutic challenges. One of the most extensive repositories storing transcriptomics data for gliomas is The Cancer Genome Atlas (TCGA). However, such big cohorts should be processed with caution and evaluated thoroughly as they can contain batch and other effects. Furthermore, biological mechanisms of cancer contain interactions among biomarkers. Thus, we applied an interpretable machine learning approach to discover such relationships. This type of transparent learning not only provides good predictability, but also reveals co-predictive mechanisms among features. In this study, we corrected the strong and confounded batch effect in the TCGA glioma data. We further used the corrected datasets to perform comprehensive machine learning analysis applied to single-sample gene set enrichment scores using collections from the Molecular Signature Database. Furthermore, using rule-based classifiers, we displayed networks of co-enrichment related to glioma grades. Moreover, we validated our results using external glioma cohorts. We believe that utilizing corrected glioma cohorts from TCGA may improve the application and validation of any future studies. Finally, the co-enrichment and survival analysis provided detailed explanations for glioma progression and, consequently, should support targeted treatment.
  •  
20.
  • Garbulowski, Mateusz, et al. (author)
  • R.ROSETTA : an interpretable machine learning framework
  • 2021
  • In: BMC Bioinformatics. - : BioMed Central (BMC). - 1471-2105. ; 22:1
  • Journal article (peer-reviewed)
    • Background: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, viz. in Life Sciences, it is often more important to understand how a prediction was obtained rather than knowing what prediction was made. To this end so-called interpretable machine learning has been recently advocated. In this study, we implemented an interpretable machine learning package based on the rough set theory. An important aim of our work was provision of statistical properties of the models and their components. Results: We present the R.ROSETTA package, which is an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case–control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes. Conclusions: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
  •  
21.
  •  
22.
  • Kutashev, Konstantin O., et al. (author)
  • Nucleolar rDNA folds into condensed foci with a specific combination of epigenetic marks
  • 2021
  • In: The Plant Journal. - : John Wiley & Sons. - 0960-7412 .- 1365-313X. ; 105:6, pp. 1534-1548
  • Journal article (peer-reviewed)
    • Arabidopsis thaliana 45S ribosomal genes (rDNA) are located in tandem arrays called nucleolus organizing regions on the termini of chromosomes 2 and 4 (NOR2 and NOR4) and encode rRNA, a crucial structural element of the ribosome. The current model of rDNA organization suggests that inactive rRNA genes accumulate in the condensed chromocenters in the nucleus and at the nucleolar periphery, while the nucleolus delineates active genes. We challenge the perspective that all intranucleolar rDNA is active by showing that a subset of nucleolar rDNA assembles into condensed foci marked by H3.1 and H3.3 histones that also contain the repressive H3K9me2 histone mark. By using plant lines containing a low number of rDNA copies, we further found that the condensed foci relate to the folding of rDNA, which appears to be a common mechanism of rDNA regulation inside the nucleolus. The H3K9me2 histone mark found in condensed foci represents a typical modification of bulk inactive rDNA, as we show by genome-wide approaches, similar to the H2A.W histone variant. The euchromatin histone marks H3K27me3 and H3K4me3, in contrast, do not colocalize with nucleolar foci and their overall levels in the nucleolus are very low. We further demonstrate that the rDNA promoter is an important regulatory region of the rDNA, where the distribution of histone variants and histone modifications are modulated in response to rDNA activity.
  •  
23.
  • Pan, Gang, et al. (author)
  • Multifaceted regulation of hepatic lipid metabolism by YY1
  • 2021
  • In: Life Science Alliance. - : LIFE SCIENCE ALLIANCE LLC. - 2575-1077. ; 4:7
  • Journal article (peer-reviewed)
    • Recent studies suggested that dysregulated YY1 plays a pivotal role in many liver diseases. To obtain a detailed view of genes and pathways regulated by YY1 in the liver, we carried out RNA sequencing in HepG2 cells after YY1 knockdown. A rigid set of 2,081 differentially expressed genes was identified by comparing the YY1-knockdown samples (n = 8) with the control samples (n = 14). YY1 knockdown significantly decreased the expression of several key transcription factors and their coactivators in lipid metabolism. This is illustrated by YY1 regulating PPARA expression through binding to its promoter and enhancer regions. Our study further suggests that down-regulation of the key transcription factors together with YY1 knockdown significantly decreased the cooperation between YY1 and these transcription factors at various regulatory regions, which are important in regulating the expression of genes in hepatic lipid metabolism. This was supported by the finding that the expression of SCD and ELOVL6, encoding key enzymes in lipogenesis, was regulated by the cooperation between YY1 and the PPARA/RXRA complex over their promoters.
  •  
24.
  • Rheinbay, E, et al. (author)
  • Analyses of non-coding somatic drivers in 2,658 cancer whole genomes
  • 2020
  • In: Nature. - : Springer Science and Business Media LLC. - 1476-4687 .- 0028-0836. ; 578:7793, pp. 102-
  • Journal article (peer-reviewed)
    • The discovery of drivers of cancer has traditionally focused on protein-coding genes [1–4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5′ region of TP53, in the 3′ untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
  •  
25.
  •  
26.
  •  
27.
  • Stepniak, Karolina, et al. (author)
  • Mapping chromatin accessibility and active regulatory elements reveals pathological mechanisms in human gliomas
  • 2021
  • In: Nature Communications. - : Springer Nature. - 2041-1723. ; 12:1
  • Journal article (peer-reviewed)
    • Chromatin structure and accessibility, and combinatorial binding of transcription factors to regulatory elements in genomic DNA control transcription. Genetic variations in genes encoding histones, epigenetics-related enzymes or modifiers affect chromatin structure/dynamics and result in alterations in gene expression contributing to cancer development or progression. Gliomas are brain tumors frequently associated with epigenetics-related gene deregulation. We perform whole-genome mapping of chromatin accessibility, histone modifications, DNA methylation patterns and transcriptome analysis simultaneously in multiple tumor samples to unravel epigenetic dysfunctions driving gliomagenesis. Based on the results of the integrative analysis of the acquired profiles, we create an atlas of active enhancers and promoters in benign and malignant gliomas. We explore these elements and intersect with Hi-C data to uncover molecular mechanisms instructing gene expression in gliomas. Gliomas are tumors often associated with epigenetics-related gene deregulation. Here the authors reveal an atlas of active enhancers and promoters in benign and malignant gliomas by performing whole-genome mapping of chromatin accessibility, histone modifications, DNA methylation patterns and transcriptome analysis simultaneously in multiple tumor samples.
  •  
28.
  • Umer, Husen M., et al. (author)
  • A Significant Regulatory Mutation Burden at a High-Affinity Position of the CTCF Motif in Gastrointestinal Cancers
  • 2016
  • In: Human Mutation. - : Hindawi Limited. - 1059-7794 .- 1098-1004. ; 37:9, pp. 904-913
  • Journal article (peer-reviewed)
    • Somatic mutations drive cancer and there are established ways to study those in coding sequences. It has been shown that some regulatory mutations are over-represented in cancer. We develop a new strategy to find putative regulatory mutations based on experimentally established motifs for transcription factors (TFs). In total, we find 1,552 candidate regulatory mutations predicted to significantly reduce binding affinity of many TFs in hepatocellular carcinoma and affecting binding of CTCF also in esophagus, gastric, and pancreatic cancers. Near mutated motifs, there is a significant enrichment of (1) genes mutated in cancer, (2) tumor-suppressor genes, (3) genes in KEGG cancer pathways, and (4) sets of genes previously associated with cancer. Experimental and functional validations support the findings. The strategy can be applied to identify regulatory mutations in any cell type with established TF motifs and will aid identification of genes contributing to cancer.
  •  
29.
  • Yones, Sara A., et al. (author)
  • Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data
  • 2022
  • In: Scientific Reports. - : Springer Nature. - 2045-2322. ; 12
  • Journal article (peer-reviewed)
    • Transcriptomic analyses are commonly used to identify differentially expressed genes between patients and controls, or within individuals across disease courses. These methods, whilst effective, cannot encompass the combinatorial effects of genes driving disease. We applied rule-based machine learning (RBML) models and rule networks (RN) to an existing paediatric Systemic Lupus Erythematosus (SLE) blood expression dataset, with the goal of developing gene networks to separate low and high disease activity (DA1 and DA3). The resultant model had an 81% accuracy to distinguish between DA1 and DA3, with unsupervised hierarchical clustering revealing additional subgroups indicative of the immune axis involved or state of disease flare. These subgroups correlated with clinical variables, suggesting that the gene sets identified may further the understanding of gene networks that act in concert to drive disease progression. This included roles for genes i) induced by interferons (IFI35 and OTOF), ii) key to SLE cell types (KLRB1 encoding CD161), or iii) with roles in autophagy and NF-κB pathway responses (CKAP4). As demonstrated here, RBML approaches have the potential to reveal novel gene patterns from within a heterogeneous disease, facilitating patient clinical and therapeutic stratification. 
  •  
30.
  • Yones, Sara A., et al. (author)
  • MetaFetcheR : An R Package for Complete Mapping of Small-Compound Data
  • 2021
  • In: Metabolites. - : MDPI. - 2218-1989 .- 2218-1989. ; 11:11
  • Journal article (peer-reviewed)
    • Small-compound databases contain a large amount of information on metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish means of data access at the early stages of a project might lead to mislabelled compounds, reduced statistical power, and large delays in delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies, and covers a variety of use-cases of data fetching. We showed that the performance of MetaFetcheR was superior to existing approaches and databases by benchmarking the performance of the algorithm in three independent case studies based on two published datasets.
  •  
31.
  • Yones, Sara A., et al. (author)
  • Supplementary material: Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data
  • 2021
  • Other publication
    • Transcriptomic analyses are commonly used to identify differentially expressed genes between patients and controls, or within individuals across disease courses. These methods, whilst effective, cannot encompass the combinatorial effects of genes driving disease. We applied rule-based machine learning (RBML) models and rule networks (RN) to an existing paediatric Systemic Lupus Erythematosus (SLE) blood expression dataset, with the goal of developing gene networks to separate low and high disease activity (DA1 and DA3). The resultant model had an 81% accuracy to distinguish between DA1 and DA3, with unsupervised hierarchical clustering revealing additional subgroups indicative of the immune axis involved or state of disease flare. These subgroups correlated with clinical variables, suggesting that the gene sets identified may further the understanding of gene networks that act in concert to drive disease progression. This included roles for genes i) induced by interferons (IFI35 and OTOF), ii) key to SLE cell types (KLRB1 encoding CD161), or iii) with roles in autophagy and NF-κB pathway responses (CKAP4). As demonstrated here, RBML approaches have the potential to reveal novel gene patterns from within a heterogeneous disease, facilitating patient clinical and therapeutic stratification. 
  •  