SwePub search: WFRF:(Spjuth Ola Docent)

1.	Herman, Stephanie (author) Towards an Earlier Detection of Progressive Multiple Sclerosis using Metabolomics and Machine Learning 2020 Doctoral thesis (other academic/artistic). Abstract: Decision-making guided by advanced analytics is becoming increasingly common in many fields. Implementing computationally driven healthcare solutions does, however, pose ethical dilemmas, as it involves human health. Therefore, augmenting clinical expertise with advanced analytical insights to support decision-making in healthcare is probably a more feasible strategy. Multiple sclerosis is a debilitating neurological disease with two subtypes: relapsing-remitting multiple sclerosis (RRMS) and the typically late-stage progressive subtype (PMS). Progressive multiple sclerosis is a neurodegenerative phenotype with a vague functional definition, which is currently diagnosed retrospectively. The challenge of diagnosing PMS earlier is a good example of where data-driven insights might prove useful. This thesis addresses the need for earlier detection of patients developing the progressive and neurodegenerative subtype of multiple sclerosis, primarily using metabolomics and machine learning approaches. In Paper I, the biochemical differences in cerebrospinal fluid (CSF) from RRMS and PMS patients were characterised, leading to the conclusion that it is possible to distinguish PMS patients based on biochemical alterations. In addition, pathway analysis revealed several metabolic pathways that were affected in the transition to PMS, including tryptophan metabolism and pyrimidine metabolism. In Papers II and III, the possibility of generating a concise PMS signature based solely on low-molecular-weight measurements (III) or on these in combination with radiological and protein measures (II) was explored. In both cases, it was concluded that it is plausible to generate a condensed set of highly informative markers that can distinguish PMS patients from RRMS patients.
In Paper III, the classifier was complemented with conformal prediction, which enabled an estimate of confidence in single-patient predictions and a personalised evaluation of the current disease state. Finally, in Paper IV, the extracted low-molecular-weight marker candidates were characterised in isolation, revealing that several metabolites were distinctively altered in the CSF of PMS patients, including increased levels of 4-acetamidobutanoate, 4-hydroxybenzoate and thymine. Overall, the results from this work indicate that it is possible to detect PMS at an earlier stage and that advanced analytical algorithms can support healthcare.
2.	Capuccini, Marco (author) Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science 2019 Doctoral thesis (other academic/artistic). Abstract: Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, substantial technical skills are needed to deal with Big Data; and second, infrastructure procurement becomes difficult. In connection with this second challenge, the computing model and business trend originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of buying computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable effort was made to enable the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes five papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using a microservice-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies.
In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well, and we provide high-level programming interfaces to improve accessibility. We have also made the developed software publicly available.
3.	Ahmed, Laeeq, et al. (author) Predicting target profiles with confidence as a service using docking scores 2020 In: Journal of Cheminformatics. Springer Nature. ISSN 1758-2946; 12:1 Journal article (peer-reviewed). Abstract: Background: Identifying and assessing ligand-target binding is a core component of early drug discovery, as one or more unwanted interactions may be associated with safety issues. Contributions: We present an open-source, extendable web service for predicting target profiles with confidence using machine learning for a panel of 7 targets, where models are trained on molecular docking scores from a large virtual library. The method uses conformal prediction to produce valid measures of prediction efficiency for a particular confidence level. The service also offers the possibility to dock chemical structures to the panel of targets with QuickVina on an individual compound basis. Results: The docking procedure and resulting models were validated by docking well-known inhibitors for each of the 7 targets using QuickVina. The model predictions showed comparable performance to molecular docking scores against an external validation set. The implementation as publicly available microservices on Kubernetes ensures resilience, scalability, and extensibility.
4.	Capuccini, Marco, et al. (author) MaRe: Processing Big Data with application containers on Apache Spark 2020 In: GigaScience. Oxford University Press. ISSN 2047-217X; 9:5 Journal article (peer-reviewed). Abstract: Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine with the largest open source communities; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
5.	Capuccini, Marco, et al. (author) On-demand virtual research environments using microservices 2019 In: PeerJ Computer Science. PeerJ. ISSN 2376-5992; 5 Journal article (peer-reviewed). Abstract: The computational demands of scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed orchestration platform as software containers, referred to as on-demand virtual research environments. The methodology is vendor agnostic, and we provide an open source implementation that supports the major cloud providers, offering scalable management of scientific pipelines. We demonstrate the applicability and scalability of our methodology in life science applications, but the methodology is general and can be applied to other scientific domains.
6.	Dahlö, Martin, et al. (author) Tracking the NGS revolution: managing life science research on shared high-performance computing clusters 2018 In: GigaScience. Oxford University Press. ISSN 2047-217X; 7:5 Journal article (peer-reviewed). Abstract: Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at the UPPMAX computing center at Uppsala University, Sweden, where core-hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by the entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
7.	Emami Khoonsari, Payam, et al. (author) Interoperable and scalable data analysis with microservices: Applications in metabolomics 2019 In: Bioinformatics. Oxford University Press (OUP). ISSN 1367-4803, 1367-4811; 35:19, pp. 3752-3760 Journal article (peer-reviewed). Abstract: Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and development of scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability through integration of the major software suites, resulting in a turn-key workflow encompassing all steps of mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. The microservice approach is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science.
8.	Exner, T. E., et al. (author) OpenRiskNet, an open e-infrastructure to support data sharing, knowledge integration and in silico analysis and modelling in risk assessment 2018 In: Toxicology Letters. Elsevier BV. ISSN 0378-4274, 1879-3169; 295, p. S104 Journal article (other academic/artistic)
9.	Gauraha, Niharika, et al. (author) Robust Knowledge Transfer in Learning Under Privileged Information Framework Other publication (other academic/artistic). Abstract: Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models, that is, data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm; it showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features; we refer to this as robust knowledge transfer. We conduct an empirical evaluation considering four different experimental setups using one synthetic and three real datasets. These experiments demonstrate that our approach yields improved accuracy compared to LUPI with standard knowledge transfer.
10.	Gauraha, Niharika, et al. (author) Split knowledge transfer in learning under privileged information framework 2019 In: Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications. PMLR. pp. 43-52 Conference paper (peer-reviewed). Abstract: Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models, that is, data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm; it showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features; we refer to this as split knowledge transfer. We evaluate the method using four different experimental setups comprising one synthetic and three real datasets. The results indicate that our approach leads to improved accuracy compared to LUPI with standard knowledge transfer.
11.	Gauraha, Niharika, et al. (author) Synergy Conformal Prediction Other publication (other academic/artistic). Abstract: Conformal prediction is a machine learning methodology that produces valid prediction regions under mild conditions. Ensembles of conformal predictors have been proposed to improve the informational efficiency of inductive conformal predictors by combining p-values; however, the validity of such methods has been an open problem. We introduce Synergy Conformal Prediction, an ensemble method that combines monotonic conformity scores and is capable of producing valid prediction intervals. We study its applicability in two scenarios: where data is partitioned in order to reduce the total model training time, and where an ensemble of different machine learning methods is used to improve the overall efficiency of predictions. We evaluate the method on 10 data sets and show that the synergy conformal predictor produces valid predictions and improves informational efficiency compared to inductive conformal prediction and existing ensemble methods. The results indicate that synergy conformal prediction has advantageous properties compared to contemporary approaches, and we envision that it will have an impact in Big Data and federated environments.
12.	Gauraha, Niharika, et al. (author) Synergy Conformal Prediction for Regression Other publication (other academic/artistic). Abstract: Large and distributed data sets pose many challenges for machine learning, including requirements for computational resources and training time. One approach is to train multiple models in parallel on subsets of data and aggregate the resulting predictions. Large data sets can then be partitioned into smaller chunks, and for distributed data the need for pooling can be avoided. Combining results from conformal predictors using synergy rules has been shown to have advantageous properties for classification problems. In this paper we extend the methodology to regression problems, and we show that it produces valid and efficient predictors compared to inductive conformal predictors and cross-conformal predictors for 10 different data sets from the UCI machine learning repository using three different machine learning methods. The approach offers a straightforward and compelling alternative to pooling data, such as when working in distributed environments.
13.	Georgieva, Polina, et al. (author) Exploring the usefulness of morphological profiling of cells to study toxicity mechanisms 2018 In: Toxicology Letters. Elsevier BV. ISSN 0378-4274, 1879-3169; 295, p. S203 Journal article (other academic/artistic)
14.	Gupta, Anindya, et al. (author) Deep Learning in Image Cytometry: A Review 2019 In: Cytometry Part A. Wiley. ISSN 1552-4922, 1552-4930; 95:6, pp. 366-380 Research review (peer-reviewed)
15.	Harrison, Philip John, 1977- (author) Deep learning approaches for image cytometry: assessing cellular morphological responses to drug perturbations 2023 Doctoral thesis (other academic/artistic). Abstract: Image cytometry is the analysis of cell properties from microscopy image data and is used ubiquitously in basic cell biology, medical diagnosis and drug development. In recent years deep learning has shown impressive results for many image cytometry tasks, including image processing, segmentation, classification and detection. Deep learning enables a more data-driven and end-to-end approach than was previously possible with conventional methods. This thesis investigates deep learning-based approaches for assessing cellular morphological responses to drug perturbations. In paper I we demonstrated the benefit of combining convolutional neural networks and transfer learning for predicting mechanism of action and nucleus translocation. In paper II we showed, using convolutional and recurrent neural networks applied to time-lapse microscopy data, that it is possible to predict whether mRNA delivery via nanoparticles has been effective based on cell morphology changes at time points before protein production provides evidence of successful delivery. In paper III we used convolutional neural networks, adversarial training and privileged information to faithfully generate fluorescence imaging channels of adipocyte cells from their corresponding z-stack of brightfield images. Our models were faithful both at the fluorescence image level and at the level of the features extracted from these images, features that are commonly used for downstream analysis, including the design of effective drug therapies.
In paper IV we showed that convolutional neural networks trained on brightfield image data provide similar, and in some cases superior, performance to models trained on fluorescence image data for predicting mechanism of action, due to the brightfield images possessing additional information not available in the fluorescence images. In paper V we applied deep learning models to brightfield time-lapse image data to explore the evolution of cellular morphological changes after drug administration for a diverse set of compounds, compounds that are often used as positive controls in image-based assays.
16.	Herman, Stephanie, et al. (author) A biochemical signature of progressive multiple sclerosis Other publication (other academic/artistic). Abstract: Currently, very few treatments are available for patients in the progressive phases of multiple sclerosis (PMS). To enable sensitive evaluation of future treatments, prognostic and predictive markers for therapeutic response are needed, as well as robust markers for early detection of PMS. We have previously demonstrated that a signature of 28 cerebrospinal fluid (CSF) biochemical markers could distinguish PMS from relapsing-remitting multiple sclerosis (RRMS) patients. Herein, we aimed to characterize the 28 previously extracted metabolites by assessing independent differences between 35 PMS and 35 RRMS patients as well as 49 healthy control subjects. Twenty-two of the PMS patients were part of a controlled clinical trial evaluating the effect of intrathecal rituximab for PMS. Using follow-up assessments, we related the metabolites to clinical outcomes of the trial and investigated whether they could predict a poor or beneficial treatment response. Finally, we investigated the metabolites’ associations with a panel of six CSF protein biomarkers of axonal, myelin and astrocyte damage as well as T- and B-cell activation and differentiation. The composite signature predominantly classified patients as having a poor treatment outcome, achieving an estimated area under the curve (AUC) of 0.63 (sensitivity = 0.90, specificity = 0.38). Univariately, C4H6N6O4 and phenolic phosphate were significantly (p-value < 0.05) increased in patients with a poor outcome. We also demonstrated that a majority (n = 22) of the metabolites showed PMS-distinctive alterations, including increased CSF levels of 4-acetamidobutanoate, 4-hydroxybenzoate and thymine.
4-Acetamidobutanoate also displayed significant associations with the results of the symbol digit modalities test (SDMT) and the 9-hole peg test (9-HPT) using the dominant hand, and with the protein markers myelin basic protein (MBP), neurofilament light (NFL) and glial fibrillary acidic protein (GFAp), whereas 4-hydroxybenzoate displayed significant associations with NFL. Only six metabolites showed significant differences between RRMS and healthy control subjects, suggesting that this is a PMS-specific signature. To summarize, most of the individual metabolites showed significantly distinctive CSF levels in the PMS patients, and some of them were also related to cognitive and motor performance. This suggests that some of the investigated metabolites might have potential as individual markers.
17.	Herman, Stephanie, et al. (author) Alterations in the tyrosine and phenylalanine pathways revealed by biochemical profiling in cerebrospinal fluid of Huntington's disease subjects 2019 In: Scientific Reports. Nature Publishing Group. ISSN 2045-2322; 9 Journal article (peer-reviewed). Abstract: Huntington's disease (HD) is a severe neurological disease leading to psychiatric symptoms, motor impairment and cognitive decline. The disease is caused by a CAG expansion in the huntingtin (HTT) gene, but how this translates into the clinical phenotype of HD remains elusive. Using liquid chromatography mass spectrometry, we analyzed the metabolome of cerebrospinal fluid (CSF) from premanifest and manifest HD subjects as well as control subjects. Inter-group differences revealed that tyrosine metabolism, including tyrosine, thyroxine, L-DOPA and dopamine, was significantly altered in manifest compared with premanifest HD. These metabolites demonstrated moderate to strong associations with measures of disease severity and symptoms. Thyroxine and dopamine also correlated with the five-year risk of onset in premanifest HD subjects. Phenylalanine and purine metabolism were also significantly altered, but were less associated with disease severity. Decreased levels of lumichrome were commonly found in mutated HTT carriers, and the levels correlated with the five-year risk of disease onset in premanifest carriers. These biochemical findings demonstrate that the CSF metabolome can be used to characterize molecular pathogenesis occurring in HD, which may be essential for future development of novel HD therapies.
18.	Herman, Stephanie, et al. (author) Biochemical Differences in Cerebrospinal Fluid between Secondary Progressive and Relapsing-Remitting Multiple Sclerosis 2019 In: Cells. MDPI AG. ISSN 2073-4409; 8:2 Journal article (peer-reviewed). Abstract: To better understand the pathophysiological differences between secondary progressive multiple sclerosis (SPMS) and relapsing-remitting multiple sclerosis (RRMS), and to identify potential biomarkers of disease progression, we applied high-resolution mass spectrometry (HRMS) to investigate the metabolome of cerebrospinal fluid (CSF). The biochemical differences were determined using partial least squares discriminant analysis (PLS-DA), connected to biochemical pathways, and associated with clinical and radiological measures. Tryptophan metabolism was significantly altered, with perturbed levels of kynurenate, 5-hydroxytryptophan, 5-hydroxyindoleacetate, and N-acetylserotonin in SPMS patients compared with RRMS and controls. SPMS patients had altered kynurenine compared with RRMS patients, and altered indole-3-acetate compared with controls. Regarding pyrimidine metabolism, SPMS patients had altered levels of uridine and deoxyuridine compared with RRMS and controls, and altered thymine and glutamine compared with RRMS patients. Metabolites from pyrimidine metabolism were significantly associated with disability, disease activity and brain atrophy, making them of particular interest for understanding the disease mechanisms and as markers of disease progression. Overall, these findings are of importance for the characterization of the molecular pathogenesis of SPMS and support the hypothesis that the CSF metabolome may be used to explore changes that occur in the transition between the RRMS and SPMS pathologies.
19.	Herman, Stephanie, et al. (author) Disease phenotype prediction in multiple sclerosis 2023 In: iScience. Cell Press. ISSN 2589-0042; 26:6 Journal article (peer-reviewed). Abstract: Progressive multiple sclerosis (PMS) is currently diagnosed retrospectively. Here, we work toward a set of biomarkers that could assist in early diagnosis of PMS. A selection of cerebrospinal fluid metabolites (n = 15) was shown to differentiate between PMS and its preceding phenotype in an independent cohort (AUC = 0.93). Complementing the classifier with conformal prediction showed that highly confident predictions could be made, and that three out of eight patients who developed PMS within three years of sample collection were predicted as PMS at that time point. Finally, this methodology was applied to PMS patients participating in a clinical trial of intrathecal treatment with rituximab. The methodology showed that 68% of the patients decreased their similarity to the PMS phenotype one year after treatment. In conclusion, the inclusion of confidence predictors contributes more information than traditional machine learning, and this information is relevant for disease monitoring.
20.	Herman, Stephanie, et al. (author) Integration of magnetic resonance imaging and protein and metabolite CSF measurements to enable early diagnosis of secondary progressive multiple sclerosis 2018 In: Theranostics. Ivyspring International Publisher. ISSN 1838-7640; 8:16, pp. 4477-4490 Journal article (peer-reviewed). Abstract: Molecular networks in neurological diseases are complex. Despite this fact, contemporary biomarkers are in most cases interpreted in isolation, leading to a significant loss of information and power. We present an analytical approach to scrutinize and combine information from biomarkers originating from multiple sources, with the aim of discovering a condensed set of biomarkers that in combination could distinguish the progressive degenerative phenotype of multiple sclerosis (SPMS) from the relapsing-remitting phenotype (RRMS). Methods: Clinical and magnetic resonance imaging (MRI) data were integrated with data from protein and metabolite measurements of cerebrospinal fluid, and a method was developed to sift through all the variables to establish a small set of highly informative measurements. This prospective study included 16 SPMS patients, 30 RRMS patients and 10 controls. Protein concentrations were quantitated with multiplexed fluorescent bead-based immunoassays and ELISA. The metabolome was recorded using liquid chromatography-mass spectrometry. Clinical follow-up data of the SPMS patients were used to assess disease progression and development of disability. Results: Eleven variables were in combination able to distinguish SPMS from RRMS patients with high confidence, superior to any single measurement.
The identified variables consisted of three MRI variables: the size of the spinal cord and the third ventricle and the total number of T1 hypointense lesions; six proteins: galectin-9, monocyte chemoattractant protein-1 (MCP-1), transforming growth factor alpha (TGF-alpha), tumor necrosis factor alpha (TNF-alpha), soluble CD40L (sCD40L) and platelet-derived growth factor AA (PDGF-AA); and two metabolites: 20 beta-dihydrocortisol (20 beta-DHF) and indolepyruvate. The proteins myelin basic protein (MBP) and macrophage-derived chemokine (MDC), as well as the metabolites 20 beta-DHF and 5,6-dihydroxyprostaglandin F1a (5,6-DH-PGF1), were identified as potential biomarkers of disability progression. Conclusion: Our study demonstrates, in a limited but well-defined and data-rich cohort, the importance and value of combining multiple biomarkers to aid diagnostics and track disease progression.
21.	Kensert, Alexander, et al. (author) Evaluating parameters for ligand-based modeling with random forest on sparse data sets 2018 In: Journal of Cheminformatics. BMC. ISSN 1758-2946; 10 Journal article (peer-reviewed). Abstract: Ligand-based predictive modeling is widely used to generate predictive models aiding decision making in, e.g., drug discovery projects. With growing data sets and requirements on low modeling time comes the necessity to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, utilizing Morgan fingerprints of different radii and hash sizes, and compared them with the molecular signatures descriptor of different heights. We specifically evaluated the effect these parameters had on modeling time, predictive performance, and memory requirements using two implementations of random forest: Scikit-learn and FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints (p ≤ 0.05), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high-dimensional sparse data. Support vector machines and random forest performed equally well, but results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.
22.	Kensert, Alexander, et al. (author) Transfer Learning with Deep Convolutional Neural Networks for Classifying Cellular Morphological Changes 2019 In: SLAS Discovery. Elsevier BV. ISSN 2472-5560, 2472-5552. Vol. 24:4, pp. 466-475. Journal article (peer-reviewed). Abstract: The quantification and identification of cellular phenotypes from high-content microscopy images has proven to be very useful for understanding biological activity in response to different drug treatments. The traditional approach has been to use classical image analysis to quantify changes in cell morphology, which requires several nontrivial and independent analysis steps. Recently, convolutional neural networks have emerged as a compelling alternative, offering good predictive performance and the possibility to replace traditional workflows with a single network architecture. In this study, we applied the pretrained deep convolutional neural networks ResNet50, InceptionV3, and InceptionResnetV2 to predict cell mechanisms of action in response to chemical perturbations for two cell profiling datasets from the Broad Bioimage Benchmark Collection. These networks were pretrained on ImageNet, enabling much quicker model training. We obtained higher predictive accuracy than previously reported, between 95% and 97%. The ability to quickly and accurately distinguish between different cell morphologies from a scarce amount of labeled data illustrates the combined benefit of transfer learning and deep convolutional neural networks for interrogating cell-based images.
23.	Lampa, Samuel, et al. (author) Predicting off-target binding profiles with confidence using Conformal Prediction 2018 In: Frontiers in Pharmacology. Frontiers Media SA. ISSN 1663-9812. Vol. 9. Journal article (peer-reviewed). Abstract: Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, allowing one to compare and search for compounds with similar profiles. Most contemporary methods and implementations, however, lack valid measures of confidence in their predictions and provide only point predictions. We here describe the use of conformal prediction for predicting off-target interactions with models trained on data from 31 targets in the ExCAPE dataset, selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor, and support vector machines were used as the underlying machine learning method. By using conformal prediction, predictions come in the form of confidence p-values for each class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.
24.	Lampa, Samuel, 1983- (author) Reproducible Data Analysis in Drug Discovery with Scientific Workflows and the Semantic Web 2018 Doctoral thesis (other academic/artistic). Abstract: The pharmaceutical industry is facing a research and development productivity crisis. At the same time, we have access to more biological data than ever, thanks to recent advances in high-throughput experimental methods. One suggested explanation for this apparent paradox is that a reproducibility crisis has also affected the reliability of the datasets that form the basis for drug development. Advanced computing infrastructures can to some extent help in this situation but also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve the understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding in drug discovery. We have also developed a set of practical tools for working with linked data in a collaborative way, and for publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases and used for publishing large-scale chemical data. 
It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.
25.	Lampa, Samuel, et al. (author) SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines 2019 In: GigaScience. Oxford University Press (OUP). ISSN 2047-217X. Vol. 8:5. Journal article (peer-reviewed). Abstract: Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as those with nested loops, dynamic scheduling, and parametrization, which are common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics pipeline, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
26.	Lampa, Samuel, et al. (author) SciPipe - Turning Scientific Workflows into Computer Programs 2019 In: Computing in Science & Engineering (Print). IEEE Computer Society. ISSN 1521-9615, 1558-366X. Vol. 21:3, pp. 109-113. Journal article (peer-reviewed).
27.	Norinder, Ulf, 1956-, et al. (author) Using Predicted Bioactivity Profiles to Improve Predictive Modeling 2020 In: Journal of Chemical Information and Modeling. American Chemical Society (ACS). ISSN 1549-9596, 1549-960X. Vol. 60:6, pp. 2830-2837. Journal article (peer-reviewed). Abstract: Predictive modeling is a cornerstone of early drug development. Using information from multiple domains or across prediction tasks has the potential to improve predictive performance. However, aggregating data often leads to incomplete data matrices that can limit modeling. In line with previous studies, we show that by generating predicted bioactivity profiles and using these as additional features, prediction accuracy for biological endpoints can be improved. Using conformal prediction, a type of confidence predictor, we present a robust framework for calculating these profiles and evaluating their impact. We report the outcomes of several approaches to generating the predicted profiles on 16 datasets in cytotoxicity and bioactivity and show that efficiency improves the most when the p-values from conformal prediction are included as bioactivity profiles.
28.	Oki, Noffisat, et al. (author) OpenRiskNet, an open e-infrastructure to support data sharing, knowledge integration, in silico analysis and modelling in risk assessment 2018 In: Abstracts of Papers of the American Chemical Society. ISSN 0065-7727. Vol. 255. Journal article (other academic/artistic).
29.	Peters, Kristian, et al. (author) PhenoMeNal: Processing and analysis of metabolomics data in the cloud 2019 In: GigaScience. Oxford University Press (OUP). ISSN 2047-217X. Vol. 8:2. Journal article (peer-reviewed).
30.	Schaduangrat, Nalini, et al. (author) Towards reproducible computational drug discovery 2020 In: Journal of Cheminformatics. Springer Science and Business Media LLC. ISSN 1758-2946. Vol. 12:1. Research review (peer-reviewed). Abstract: The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebooks, Jupyter notebooks, etc.), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, and (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share the data and code used for numerical calculations, not only to facilitate reproducibility but also to foster collaboration (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design will adopt an open approach towards the collection, curation and sharing of data and code.
31.	Spjuth, Ola, Docent, 1977-, et al. (author) Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets 2019 In: Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications. PMLR. pp. 53-65. Conference paper (peer-reviewed). Abstract: Conformal Prediction is a framework that produces prediction intervals based on the output of a machine learning algorithm. In this paper we explore the case where the training data consist of multiple parts held by different sources that cannot be pooled. We consider the regression case and propose a method in which a conformal predictor is trained on each data source independently and the resulting prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository, using support vector regression as the underlying machine learning algorithm, with varying numbers and sizes of data sources. The results show that the proposed method produces conservatively valid prediction intervals, and although it cannot retain the same efficiency as when all data are used, it improves efficiency compared with predicting from a single arbitrarily chosen source.
32.	Spjuth, Ola, Docent, 1977- (author) Novel applications of Machine Learning in cheminformatics 2018 In: Journal of Cheminformatics. BMC. ISSN 1758-2946. Vol. 10. Journal article (other academic/artistic).
33.	Spjuth, Ola, Docent, 1977- (author) Pharmaceutical Bioinformatics Primer. Other publication (other academic/artistic).
34.	Svensson, Fredrik, et al. (author) Conformal Regression for Quantitative Structure-Activity Relationship Modeling-Quantifying Prediction Uncertainty 2018 In: Journal of Chemical Information and Modeling. Washington DC: American Chemical Society (ACS). ISSN 1549-9596, 1549-960X. Vol. 58:5, pp. 1132-1140. Journal article (peer-reviewed). Abstract: Making predictions with an associated confidence is highly desirable, as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the output prediction intervals to create regressors that are as efficient (i.e. narrow) as possible. Different algorithms for estimating the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available datasets. Our results show that the most efficient conformal regressors are obtained when the natural exponential of the ensemble standard deviation from the underlying random forest is used to scale the prediction intervals. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range, with a difference of close to one log unit in bioactivity between the tightest and widest prediction ranges. Overall, conformal regression is a robust approach to generating bioactivity predictions with associated confidence.
35.	Toor, Salman, et al. (author) SNIC Science Cloud (SSC): A national-scale cloud infrastructure for Swedish academia 2017 In: Proceedings - 13th IEEE International Conference on eScience, eScience 2017. Los Alamitos, CA: IEEE Computer Society. pp. 219-227. Conference paper (peer-reviewed). Abstract: The cloud computing paradigm has fundamentally changed the way computational resources are offered. Although the number of large-scale providers in academia is still relatively small, interest in and adoption of cloud Infrastructure-as-a-Service are increasing rapidly in the scientific community. The added flexibility in how applications can be implemented, compared with traditional batch computing systems, is one of the key success factors for the paradigm, and scientific cloud computing promises to increase adoption of simulation and data analysis in scientific communities that are not traditional users of large-scale e-Infrastructure, the so-called long tail of science. In 2014, the Swedish National Infrastructure for Computing (SNIC) initiated a project to investigate the cost and constraints of offering cloud infrastructure for Swedish academia. The aim was to build a platform where academics could evaluate cloud computing for their use cases. SNIC Science Cloud (SSC) has since evolved into a national-scale cloud infrastructure based on three geographically distributed regions. In this article we present the SSC vision, architectural details and user stories. We summarize the experiences gained from running a national-scale cloud facility into ten simple rules for starting up a science cloud project based on OpenStack. We also highlight some key areas that require careful attention in order to offer cloud infrastructure for ubiquitous academic needs and, in particular, scientific workloads.
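Entry 21 compares unhashed Morgan fingerprints with hashed (fixed-length) ones. The sketch below illustrates one reason hashing can cost accuracy: distinct substructure identifiers can collide on the same bit. It assumes simple modulo folding, which is one common scheme and not necessarily the one used in the paper; the feature identifiers and counts are hypothetical stand-ins for Morgan environment identifiers.

```python
# Sketch: folding a sparse (unhashed) fingerprint into a fixed number of bits.
# Assumption: modulo folding; the integer feature ids below are hypothetical.

def fold(feature_counts, n_bits):
    """Map each feature identifier to bit (id % n_bits), summing counts on collisions."""
    folded = {}
    for feature_id, count in feature_counts.items():
        bit = feature_id % n_bits
        folded[bit] = folded.get(bit, 0) + count
    return folded

def collisions(feature_counts, n_bits):
    """How many distinct features were merged away into an already-used bit."""
    return len(feature_counts) - len(fold(feature_counts, n_bits))

if __name__ == "__main__":
    unhashed = {1021: 1, 2045: 2, 3: 1, 98765: 1}
    print(fold(unhashed, 1024))        # {1021: 3, 3: 1, 461: 1}
    print(collisions(unhashed, 1024))  # 1
```

With 1024 bits, features 1021 and 2045 land on the same bit and become indistinguishable to the learner; a larger hash size (or no hashing) keeps them apart at the cost of a wider, sparser feature space.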
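Entries 23 and 27 describe conformal predictors that output a p-value per class rather than a point prediction. A minimal sketch of how such p-values can be computed in a label-conditional (Mondrian) inductive conformal classifier follows. It assumes nonconformity scores have already been produced by some underlying model (for example, 1 minus the predicted probability of the label); the models, descriptors and numbers in the papers are not reproduced here, and all values are hypothetical.

```python
# Sketch of label-conditional (Mondrian) inductive conformal classification.
# Nonconformity scores are assumed to come from an underlying model;
# higher score = more unusual. All numbers are hypothetical.

def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration scores at least as nonconforming as the test score,
    counting the test example itself: (k + 1) / (n + 1)."""
    k = sum(1 for s in calibration_scores if s >= test_score)
    return (k + 1) / (len(calibration_scores) + 1)

def predict(cal_scores_by_label, test_scores_by_label, significance):
    """Return one p-value per class and the prediction set at the given significance."""
    p_values = {
        label: conformal_p_value(cal_scores_by_label[label], test_scores_by_label[label])
        for label in cal_scores_by_label
    }
    # A label stays in the prediction set only if its p-value exceeds the significance level.
    prediction_set = {label for label, p in p_values.items() if p > significance}
    return p_values, prediction_set

if __name__ == "__main__":
    cal = {"active": [0.1, 0.2, 0.4, 0.7], "inactive": [0.05, 0.3, 0.5, 0.9]}
    test = {"active": 0.3, "inactive": 0.6}
    p, region = predict(cal, test, significance=0.5)
    print(p)       # {'active': 0.6, 'inactive': 0.4}
    print(region)  # {'active'}
```

Calibrating each class only against examples of that class keeps the error rate controlled per class, which matters for imbalanced bioactivity data. The per-class p-values are also the kind of output entry 27 reports reusing as additional features.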
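Entry 34 reports that scaling conformal regression intervals by the natural exponential of the random-forest ensemble standard deviation gave the narrowest intervals. The sketch below shows a normalized inductive conformal regressor built on that idea. The exact nonconformity function, |y - y_hat| / exp(sigma), is an assumption modelled on the abstract's description; the residuals and sigmas are hypothetical calibration values.

```python
import math

# Sketch of normalized inductive conformal regression.
# Assumed nonconformity score: |y - y_hat| / exp(sigma), where sigma is the
# standard deviation of the per-tree predictions of a random forest.

def calibrate(residuals, sigmas, confidence):
    """Return the normalized nonconformity threshold for the requested confidence.

    residuals: absolute errors |y - y_hat| on a held-out calibration set
    sigmas: ensemble standard deviations for the same calibration examples
    """
    scores = sorted(r / math.exp(s) for r, s in zip(residuals, sigmas))
    n = len(scores)
    # Take the ceil((n + 1) * confidence)-th smallest score (1-based rank).
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf  # too few calibration examples for this confidence level
    return scores[k - 1]

def predict_interval(y_hat, sigma, threshold):
    """Scale the threshold by exp(sigma): more uncertain compounds get wider intervals."""
    half_width = threshold * math.exp(sigma)
    return y_hat - half_width, y_hat + half_width

if __name__ == "__main__":
    residuals = [0.2, 0.5, 0.1, 0.8, 0.3, 0.4, 0.6, 0.9, 0.05]
    threshold = calibrate(residuals, [0.0] * 9, confidence=0.8)
    print(threshold)                            # 0.8
    print(predict_interval(6.0, 0.0, threshold))
```

Because the threshold is fixed after calibration, the interval width for a new compound depends only on its own ensemble spread; this is how normalization trades uniform-width intervals for narrower ones on "easy" compounds while keeping the chosen confidence level.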
