SwePub

Result list for search "WFRF:(Koronacki Jacek)"

  • Results 1-9 of 9
1.
  • Dabrowski, Michal J., et al. (author)
  • Unveiling new interdependencies between significant DNA methylation sites, gene expression profiles and glioma patients survival
  • 2018
  • In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 8
  • Journal article (peer-reviewed)
    • In order to find clinically useful prognostic markers for glioma patients' survival, we employed the Monte Carlo Feature Selection and Interdependencies Discovery (MCFS-ID) algorithm on DNA methylation (HumanMethylation450 platform) and RNA-seq datasets from The Cancer Genome Atlas (TCGA) for 88 patients observed until death. The input features were ranked according to their importance in predicting patients' longer (400+ days) or shorter (<= 400 days) survival without prior classification of the patients. Interestingly, out of the 65 most important features found, 63 are methylation sites and only two are mRNAs. Moreover, 61 out of the 63 methylation sites are among those detected by the 450k array technology while being absent from the HumanMethylation27 platform. The most important methylation feature (cg15072976) overlaps with the RE1 Silencing Transcription Factor (REST) binding site, and was confirmed to intersect with the REST binding motif in human U87 glioma cells. Six additional methylation sites from the top 63 overlap with REST sites. We found that the methylation status of the cg15072976 site affects transcription factor binding in U87 cells in a gel shift assay. The cg15072976 methylation status discriminates <= 400 and 400+ patients in an independent dataset from TCGA and shows a positive association with survival time, as evidenced by Kaplan-Meier plots.
2.
  • Dramiński, Michał, et al. (author)
  • Discovering Networks of Interdependent Features in High-Dimensional Problems
  • 2016
  • In: Big Data Analysis. - Cham : Springer. - 9783319269894 ; , pp. 285-304
  • Book chapter (peer-reviewed)
    • The availability of very large data sets in Life Sciences, provided earlier by technological breakthroughs such as microarrays and more recently by various forms of sequencing, has created both challenges in analyzing these data and new opportunities. A promising, yet underdeveloped approach to Big Data, not limited to Life Sciences, is the use of feature selection and classification to discover interdependent features. Traditionally, classifiers have been developed for the best quality of supervised classification. In our experience, more often than not, rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying observations (objects, samples) into distinct classes and what the interdependencies are between the features that describe the observations. Our underlying hypothesis is that the interdependent features and rule networks do not only reflect some syntactical properties of the data and classifiers but also may convey meaningful clues about true interactions in the modeled biological system. In this chapter we develop further our method of Monte Carlo Feature Selection and Interdependency Discovery (MCFS and MCFS-ID, respectively), which is particularly well suited for high-dimensional problems, i.e., those where each observation is described by very many features, often many more features than the number of observations. Such problems are abundant in Life Science applications. Specifically, we define Inter-Dependency Graphs (termed, somewhat confusingly, ID Graphs) that are directed graphs of interactions between features extracted by aggregation of information from the classification trees constructed by the MCFS algorithm. We then proceed with modeling interactions on a finer level with rule networks. We discuss some of the properties of the ID graphs and make a first attempt at validating our hypothesis on a large gene expression data set for CD4+ T-cells.
The MCFS-ID and ROSETTA including the Ciruvis approach offer a new methodology for analyzing Big Data from feature selection, through identification of feature interdependencies, to classification with rules according to decision classes, to construction of rule networks. Our preliminary results confirm that MCFS-ID is applicable to the identification of interacting features that are functionally relevant while rule networks offer a complementary picture with finer resolution of the interdependencies on the level of feature-value pairs.
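The aggregation step behind the ID graphs described above can be illustrated with a minimal sketch. It assumes each classification tree is given as a nested dictionary with a hypothetical "feature" key on split nodes and a "children" list, and it simply counts parent-to-child feature pairs. The published MCFS-ID method weights edges rather than counting them, so this is only an illustration of the idea, not the authors' implementation.

```python
from collections import Counter


def id_graph_edges(tree, edges=None):
    """Accumulate directed (parent feature, child feature) counts from one tree.

    `tree` is a nested dict; split nodes have a "feature" key and a
    "children" list, leaves have neither (hypothetical layout).
    """
    if edges is None:
        edges = Counter()
    for child in tree.get("children", []):
        if "feature" in child:  # leaves carry no split feature
            edges[(tree["feature"], child["feature"])] += 1
        id_graph_edges(child, edges)
    return edges


def aggregate_forest(forest):
    """Sum the edge counts over all trees to obtain one directed graph."""
    edges = Counter()
    for tree in forest:
        id_graph_edges(tree, edges)
    return edges
```

Calling `aggregate_forest` on a list of such trees returns a counter keyed by directed feature pairs; the most frequent pairs are the candidate interdependencies.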
3.
  • Dramiński, Michał, 1980-, et al. (author)
  • Monte Carlo feature selection and interdependency discovery in supervised classification
  • 2010
  • In: Advances in Machine Learning. - Heidelberg : Springer. - 9783642051784
  • Book chapter (other academic/artistic)
    • Applications of machine learning techniques in Life Sciences are the main applications forcing a paradigm shift in the way these techniques are used. Rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying distinct classes and what the interdependencies between the features are. To this end we significantly extend our earlier work [Dramiński et al. (2008)] that introduced an effective and reliable method for ranking features according to their importance for classification. We begin by adding a method for finding a cut-off between informative and non-informative features and then continue with the development of a methodology and an implementation of a procedure for determining interdependencies between informative features. The reliability of our approach rests on multiple construction of tree classifiers. Essentially, each classifier is trained on a randomly chosen subset of the original data using only a fraction of all of the observed features. This approach is conceptually simple yet computer-intensive. The methodology is validated on a large and difficult task of modelling HIV-1 reverse transcriptase resistance to drugs, which is a good example of the aforementioned paradigm shift. We construct a classifier, but the main interest is the identification of mutation points (i.e. features) and their combinations that model drug resistance.
4.
  • Draminski, Michal, et al. (author)
  • Monte Carlo feature selection for supervised classification
  • 2008
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:1, pp. 110-117
  • Journal article (peer-reviewed)
    • MOTIVATION: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features. RESULTS: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancers, which is the case for the other methods.
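The resampling scheme described in this abstract (many classifiers, each built on a random training set and a random fraction of the features) can be sketched roughly as follows. This is a simplified illustration and not the published algorithm: a one-split decision stump stands in for the tree classifier, labels are assumed to be 0/1, a feature is credited with the stump's test accuracy, and all function names and default parameters are assumptions made for the example.

```python
import random
from collections import defaultdict


def best_stump(rows, labels, features):
    """Best single-threshold rule over `features`: (feature, threshold, polarity)."""
    best, best_acc = None, -1.0
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            # Hits for the rule "predict class 1 when value > thr".
            hits = sum((row[f] > thr) == (y == 1) for row, y in zip(rows, labels))
            # Polarity 0 is the mirrored rule, so its hits are the complement.
            for polarity, h in ((1, hits), (0, len(rows) - hits)):
                if h / len(rows) > best_acc:
                    best, best_acc = (f, thr, polarity), h / len(rows)
    return best  # None when every candidate feature is constant


def stump_accuracy(stump, rows, labels):
    f, thr, polarity = stump
    correct = sum(((row[f] > thr) == (polarity == 1)) == (y == 1)
                  for row, y in zip(rows, labels))
    return correct / len(rows)


def mcfs_ranking(rows, labels, n_subsets=200, subset_size=3,
                 n_splits=5, train_frac=0.66, seed=0):
    """Rank feature indices by accumulated test accuracy of the stumps using them."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    score = defaultdict(float)
    indices = list(range(len(rows)))
    for _ in range(n_subsets):
        features = rng.sample(range(n_features), subset_size)  # random feature subset
        for _ in range(n_splits):
            rng.shuffle(indices)  # random train/test split
            cut = int(train_frac * len(indices))
            train, test = indices[:cut], indices[cut:]
            stump = best_stump([rows[i] for i in train],
                               [labels[i] for i in train], features)
            if stump is None:
                continue
            score[stump[0]] += stump_accuracy(stump, [rows[i] for i in test],
                                              [labels[i] for i in test])
    return sorted(range(n_features), key=lambda f: -score[f])
```

`mcfs_ranking(rows, labels)` returns feature indices from most to least important under this simplified score. Because every feature competes inside many small random subsets, the ranking does not depend on a single fitted model.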
5.
  • Kierczak, Marcin, 1981-, et al. (author)
  • A Monte Carlo approach to modeling post-translational modification sites using local physicochemical properties.
  • Other publication (popular science, debate, etc.)
    • Many proteins undergo various chemical modifications during or shortly after translation. Post-translational modifications (PTMs) greatly contribute to the diversity of protein functions and play a crucial role in many cellular processes. Therefore, understanding where and why a certain protein is modified is an important issue in biomedical research. Mechanisms underlying some types of PTMs have been elucidated but many still remain unknown, and a number of tools for predicting PTMs from short sequence fragments exist. While usually accurate at predicting modification sites, these tools are not designed to increase the understanding of modification mechanisms. Here we attempted to build easy-to-interpret models of PTMs and to identify the physicochemical properties significant for determining modification status. To this end we applied our Monte Carlo feature selection and interdependency discovery (MCFS-ID) method. Considering 9 aa-long sequence fragments that were represented in terms of their physicochemical properties, we analyzed 76 types of PTMs and for each type we identified the properties that played a significant (p ≤ 0.05) role in the classification process. For 17 types of modifications no significant property was found. For the remaining 59 types, we used the significant properties to construct random forest-based high quality predictive models. We also showed an example of how to interpret the models by analyzing interdependency networks of significant properties and how to complement the networks with decision rules inferred using rough set theory. The obtained results showed the necessity of applying feature selection prior to constructing a model that considers short sequence fragments. Interestingly, for some types of modifications we saw that models based on insignificant features can yield accurate results. This observation deserves further investigation.
Among the examined PTMs we observed groups that share similar patterns of significant properties. We also showed how to complement our models with decision rules that can guide life scientists in their research and shed light on the actual molecular mechanisms determining modification status.
6.
  • Kierczak, Marcin, 1981-, et al. (author)
  • A Rough Set-Based Model of HIV-1 Reverse Transcriptase Resistome
  • 2009
  • In: Bioinformatics and Biology Insights. - 1177-9322. ; 3, pp. 109-127
  • Journal article (peer-reviewed)
    • Reverse transcriptase (RT) is a viral enzyme crucial for HIV-1 replication. Currently, 12 drugs are targeted against the RT. The low fidelity of the RT-mediated transcription leads to the quick accumulation of drug-resistance mutations. The sequence-resistance relationship remains only partially understood. Using publicly available data collected from over 15 years of HIV proteome research, we have created a general and predictive rule-based model of HIV-1 resistance to eight RT inhibitors. Our rough set-based model considers changes in the physicochemical properties of a mutated sequence as compared to the wild-type strain. Thanks to the application of the Monte Carlo feature selection method, the model takes into account only the properties that significantly contribute to the resistance phenomenon. The obtained results show that drug resistance is determined in a more complex way than believed. We confirmed the importance of many resistance-associated sites, found some sites to be less relevant than formerly postulated and, more importantly, identified several previously neglected sites as potentially relevant. By mapping some of the newly discovered sites on the 3D structure of the RT, we were able to suggest possible molecular mechanisms of drug resistance. Importantly, our model has the ability to generalize predictions to previously unseen cases. The study is an example of how computational biology methods can increase our understanding of the HIV-1 resistome.
7.
  • Kierczak, Marcin, 1981-, et al. (author)
  • Analysis of local molecular interaction networks underlying HIV-1 resistance to reverse transcriptase inhibitors.
  • Other publication (peer-reviewed)
    • Rapid emergence of drug-resistant HIV-1 mutants is the major cause of many treatment failures. A number of individual drug resistance mutations are known, but the way they interact to create resistance often remains an open question. So far this question could be answered only experimentally. Here we apply a novel Monte Carlo feature selection-based approach to uncover molecular interaction networks that form the HIV-1 reverse transcriptase (RT) resistome. By considering mutation-induced changes in the physicochemical properties of mutating amino acids, we were able to elucidate interaction networks leading to resistance to six anti-viral drugs. We selected significant properties (p-value <= 0.05) and analyzed the networks of the 20% strongest interdependencies between them. The topology of each network was validated by mapping it onto the 3D structure of RT and by relating the findings to the existing knowledge. The method can be easily applied to a wide range of similar problems in the domain of proteomics.
8.
  • Kruczyk, Marcin, et al. (author)
  • Random Reducts : A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets
  • 2013
  • In: Fundamenta Informaticae. - 0169-2968 .- 1875-8681. ; 127:1-4, pp. 273-288
  • Journal article (peer-reviewed)
    • An important step prior to constructing a classifier for a very large data set is feature selection. With many problems it is possible to find a subset of attributes that have the same discriminative power as the full data set. There are many feature selection methods but in none of them are Rough Set models tied up with statistical argumentation. Moreover, known methods of feature selection usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR) - a feature selection method which precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features of which the user should be made aware, and to find interactions in the data set.
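The idea of combining Monte Carlo resampling of attributes with rough-set reducts can be illustrated with a small sketch. It is not the published RR algorithm. It assumes discrete attribute values, computes a reduct of each random attribute subset by dropping attributes in random order while the positive region (the objects whose attribute values determine the decision class uniquely) keeps its size, and scores each attribute by how often it survives. All names and parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def positive_region_size(rows, labels, attrs):
    """Count objects whose values on `attrs` map to exactly one decision class."""
    classes = defaultdict(set)
    counts = Counter()
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].add(y)
        counts[key] += 1
    return sum(n for key, n in counts.items() if len(classes[key]) == 1)


def reduct(rows, labels, attrs, rng):
    """Drop attributes in random order while the positive region is preserved."""
    target = positive_region_size(rows, labels, attrs)
    kept = list(attrs)
    for a in rng.sample(attrs, len(attrs)):
        trial = [b for b in kept if b != a]
        if positive_region_size(rows, labels, trial) == target:
            kept = trial
    return kept


def random_reducts_scores(rows, labels, n_subsets=300, subset_size=3, seed=0):
    """Fraction of draws in which each attribute ends up in the subset's reduct."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    hits, draws = Counter(), Counter()
    for _ in range(n_subsets):
        attrs = rng.sample(range(n_attrs), subset_size)
        draws.update(attrs)
        hits.update(reduct(rows, labels, attrs, rng))
    return {a: hits[a] / draws[a] for a in range(n_attrs) if draws[a]}
```

In this sketch an attribute that duplicates an informative one would still receive a non-zero score, because it is kept whenever it is drawn without its twin. This is one way to see how shadowed features can remain visible to the user.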
9.
  • Rudnicki, Witold R., et al. (author)
  • A Statistical Method for Determining Importance of Variables in an Information System
  • 2006
  • In: Lecture Notes in Computer Science: Rough Sets and Current Trends in Computing. - Berlin, Heidelberg : Springer Berlin Heidelberg. - 0302-9743. ; 4259/2006
  • Journal article (peer-reviewed)
    • A new method for estimation of attributes’ importance for supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, with each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: values of each attribute found unimportant at earlier steps are randomly permuted between objects. At each step, apparent importance of an attribute is calculated and the attribute is declared unimportant if its importance is not uniformly better than that of the attributes earlier found unimportant. The procedure is repeated until only attributes scoring better than the randomized ones are retained. Statistical significance of the results so obtained is verified. This method has been applied to 12 data sets of biological origin. The method was shown to be more reliable than that based on standard application of a random forest to assess attributes’ importance.