SwePub

Search results for query: WFRF:(Spjuth Ola)

  • Results 1-50 of 165
1.
  • Alvarsson, Jonathan, et al. (authors)
  • Ligand-Based Target Prediction with Signature Fingerprints
  • 2014
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-9596 .- 1549-960X. ; 54:10, pp. 2647-2653
  • Journal article (peer-reviewed), abstract:
    • When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets, using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run, so its usage should probably be evaluated on a case-by-case basis. The NRI-based tests complemented the AUC-based ones and showed signs of higher power.
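The Tanimoto-based similarity searching this abstract describes can be sketched in a few lines of plain Python. This is a generic illustration, not the paper's implementation: the fingerprints below are toy bit sets with made-up bit positions, not actual signature fingerprints.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two bit fingerprints,
    each represented as the set of indices of its set bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Toy fingerprints with illustrative bit positions
query = {1, 5, 9, 12, 20}
library = {
    "mol_A": {1, 5, 9, 13, 21},   # shares 3 of its 5 bits with the query
    "mol_B": {2, 6, 33},          # shares none
}

# Similarity searching: rank library compounds against the query
ranked = sorted(library, key=lambda m: tanimoto(query, library[m]), reverse=True)
print(ranked)  # ['mol_A', 'mol_B']
```

A count-version fingerprint would replace the sets with multisets (bit counts) and a generalized Tanimoto formula, at the cost of the extra memory the abstract mentions.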
2.
  • Raykova, Doroteya, 1986-, et al. (authors)
  • A method for Boolean analysis of protein interactions at a molecular level
  • 2022
  • In: Nature Communications. - : Springer Nature. - 2041-1723. ; 13:1
  • Journal article (peer-reviewed), abstract:
    • Determining the levels of protein-protein interactions is essential for the analysis of signaling within the cell, characterization of mutation effects, protein function and activation in health and disease, among others. Herein, we describe MolBoolean - a method to detect interactions between endogenous proteins in various subcellular compartments, utilizing antibody-DNA conjugates for identification and signal amplification. In contrast to proximity ligation assays, MolBoolean simultaneously indicates the relative abundances of protein A and B not interacting with each other, as well as the pool of A and B proteins that are proximal enough to be considered an AB complex. MolBoolean is applicable both in fixed cells and tissue sections. The specific and quantifiable data that the method generates provide opportunities for both diagnostic use and medical research.
4.
  • Ahlberg, Ernst, et al. (authors)
  • Interpretation of Conformal Prediction Classification Models
  • 2015
  • In: STATISTICAL LEARNING AND DATA SCIENCES. - Cham : Springer International Publishing. - 9783319170916 - 9783319170909 ; , pp. 323-334
  • Conference paper (peer-reviewed), abstract:
    • We present a method for interpretation of conformal prediction models. The discrete gradient of the largest p-value is calculated with respect to object space. A criterion is applied to identify the most important component of the gradient, and the corresponding part of the object is visualized. The method is exemplified with data from drug discovery relating chemical compounds to mutagenicity. Furthermore, a comparison is made to already established subgraphs known to be important for mutagenicity, and this initial assessment shows very useful results for the interpretation of a conformal predictor.
5.
  • Ahmed, Laeeq, et al. (authors)
  • Efficient iterative virtual screening with Apache Spark and conformal prediction
  • 2018
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 10
  • Journal article (peer-reviewed), abstract:
    • Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average, with a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
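The iterative docking-and-pruning loop sketched in this abstract reduces to simple control flow. Below is a hedged sketch: the docking engine, model training, and prediction are caller-supplied stand-ins (the paper's actual implementation uses Apache Spark, SVM, and conformal prediction); the demo uses an oracle predictor so the pruning is exact.

```python
def iterative_screen(library, dock, train, predict, batch_size, cutoff):
    """Sketch of the iterative strategy: dock a batch, retrain a
    ligand-based model on everything docked so far, and prune ligands
    predicted to score below the cutoff."""
    scores = {}
    remaining = list(library)
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for mol in batch:
            scores[mol] = dock(mol)                     # the expensive step
        model = train(scores)                           # retrain on docked set
        remaining = [m for m in remaining if predict(model, m) >= cutoff]
    return scores

# Demo with toy stand-ins: "docking scores" are just molecule indices,
# and the predictor is an oracle (hypothetical, for illustration only).
library = [f"mol{i}" for i in range(10)]
true_score = {m: float(i) for i, m in enumerate(library)}
docked = iterative_screen(
    library,
    dock=true_score.get,
    train=lambda s: sum(s.values()) / len(s),
    predict=lambda model, m: true_score[m],
    batch_size=3,
    cutoff=5.0,
)
print(len(docked))  # 8 of 10 molecules docked; mol3 and mol4 were pruned
```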
6.
  • Ahmed, Laeeq, et al. (authors)
  • Predicting target profiles with confidence as a service using docking scores
  • 2020
  • In: Journal of Cheminformatics. - : Springer Nature. - 1758-2946. ; 12:1
  • Journal article (peer-reviewed), abstract:
    • Background: Identifying and assessing ligand-target binding is a core component in early drug discovery, as one or more unwanted interactions may be associated with safety issues. Contributions: We present an open-source, extendable web service for predicting target profiles with confidence using machine learning for a panel of 7 targets, where models are trained on molecular docking scores from a large virtual library. The method uses conformal prediction to produce valid measures of prediction efficiency for a particular confidence level. The service also offers the possibility to dock chemical structures to the panel of targets with QuickVina on an individual compound basis. Results: The docking procedure and resulting models were validated by docking well-known inhibitors for each of the 7 targets using QuickVina. The model predictions showed comparable performance to molecular docking scores against an external validation set. The implementation as publicly available microservices on Kubernetes ensures resilience, scalability, and extensibility.
7.
  • Ahmed, Laeeq (author)
  • Scalable Analysis of Large Datasets in Life Sciences
  • 2019
  • Doctoral thesis (other academic/artistic), abstract:
    • We are experiencing a deluge of data in all fields of scientific and business research, particularly in the life sciences, due to the development of better instrumentation and the rapid advancements that have occurred in information technology in recent times. There are major challenges when it comes to handling such large amounts of data, ranging from the practicalities of managing these large volumes to understanding the meaning and practical implications of the data. In this thesis, I present parallel methods to efficiently manage, process, analyse and visualize large sets of data from several life sciences fields at a rapid rate, while building and utilizing various machine learning techniques in a novel way. Most of the work is centred on applying the latest Big Data analytics frameworks to create efficient virtual screening strategies for large datasets. Virtual screening is a method in cheminformatics used in drug discovery to search large libraries of molecular structures. I also present a method for the analysis of large electroencephalography datasets in real time; electroencephalography is one of the main techniques used to measure the electrical activity of the brain. First, I evaluate the suitability of Spark, a parallel framework for large datasets, for performing parallel ligand-based virtual screening. As a case study, I classify a molecular library using prebuilt classification models to filter out the active molecules. I also demonstrate a strategy to create cloud-ready pipelines for structure-based virtual screening. The major advantages of this strategy are increased productivity and high throughput. In this work, I show that Spark can be applied to virtual screening and that it is, in general, an appropriate solution for large-scale parallel pipelining. Moreover, I illustrate how Big Data analytics are valuable when working with life science datasets. Secondly, I present a method to further reduce the overall time of the structure-based virtual screening strategy using machine learning and a conformal-prediction-based iterative modelling strategy. The idea is to dock only those molecules that have a better than average chance of being an inhibitor when searching for molecules that could potentially be used as drugs. Using machine learning models from this work, I built a web service to predict the target profile of multiple compounds against ready-made models for a list of targets where 3D structures are available. These target predictions can be used to understand off-target effects, for example in the early stages of drug discovery projects. Thirdly, I present a method to detect seizures in long-term electroencephalography readings; this method works in real time, taking the ongoing readings in as live data streams. The method involves tackling the challenges of real-time decision-making, storing large datasets in memory, and updating the prediction model with newly produced data at a rapid rate. The resulting algorithm not only classifies seizures in real time, it also learns the threshold in real time. I also present a new feature, the "top-k amplitude measure", for classifying which parts of the data correspond to seizures. Furthermore, this feature helps to reduce the amount of data that needs to be processed in subsequent steps.
8.
  • Alvarsson, Jonathan, et al. (authors)
  • Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
  • 2014
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-9596 .- 1549-960X. ; 54:11, pp. 3211-3217
  • Journal article (peer-reviewed), abstract:
    • QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version, combined with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
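The role of γ in this abstract can be illustrated with the RBF kernel formula itself. This is a generic sketch of the kernel, not code from the study; the example points and γ values are illustrative.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2);
    gamma controls the width of the kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two points at squared distance 25. A small gamma (wide kernel) still
# treats them as similar; a large gamma (narrow kernel) does not, which
# is why gamma must be tuned (or given a good default) alongside C.
x, z = (0.0, 0.0), (3.0, 4.0)
print(round(rbf_kernel(x, z, 0.001), 3))  # 0.975
print(round(rbf_kernel(x, z, 0.1), 3))    # 0.082
```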
9.
  • Alvarsson, Jonathan, et al. (authors)
  • Brunn : an open source laboratory information system for microplates with a graphical plate layout design process
  • 2011
  • In: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 12:1
  • Journal article (peer-reviewed), abstract:
    • Background: Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight, open system for handling plate design, validation, and preparation of data. Results: A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves. Conclusions: Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts, by starting out from an earlier step in the plate layout design process, saves time and cuts down on error sources.
10.
  • Alvarsson, Jonathan, et al. (authors)
  • Large-scale ligand-based predictive modelling using support vector machines
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed), abstract:
    • The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
11.
  • Alvarsson, Jonathan, 1981- (author)
  • Ligand-based Methods for Data Management and Modelling
  • 2015
  • Doctoral thesis (other academic/artistic), abstract:
    • Drug discovery is a complicated and expensive process in the billion dollar range. One way of making the drug development process more efficient is better information handling, modelling, and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large amounts of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface. The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available for users through the Bioclipse workbench. To this end, a series of molecular signature studies was carried out and various Bioclipse plugins were developed. An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. In Paper II, the laboratory information system Brunn, for supporting work with dose-response studies on microtiter plates, is described. In Paper III, the creation of a molecular fingerprint based on the molecular signature descriptor is presented, and the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. In Paper IV, the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel is explored, and reasonable default values are found. In Paper V, the performance of SVM-based QSAR using large datasets with the molecular signature descriptor is studied, and a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.
12.
  • Alvarsson, Jonathan, 1981-, et al. (authors)
  • Predicting With Confidence : Using Conformal Prediction in Drug Discovery
  • 2021
  • In: Journal of Pharmaceutical Sciences. - : Elsevier. - 0022-3549 .- 1520-6017. ; 110:1, pp. 42-49
  • Research review (peer-reviewed), abstract:
    • One of the challenges with predictive modeling is how to quantify the reliability of the models' predictions on new objects. In this work we give an introduction to conformal prediction, a framework that sits on top of traditional machine learning algorithms and which outputs valid confidence estimates to predictions from QSAR models in the form of prediction intervals that are specific to each predicted object. For regression, a prediction interval consists of an upper and a lower bound. For classification, a prediction interval is a set that contains none, one, or many of the potential classes. The size of the prediction interval is affected by a user-specified confidence/significance level, and by the nonconformity of the predicted object; i.e., the strangeness as defined by a nonconformity function. Conformal prediction provides a rigorous and mathematically proven framework for in silico modeling with guarantees on error rates as well as a consistent handling of the models' applicability domain intrinsically linked to the underlying machine learning model. Apart from introducing the concepts and types of conformal prediction, we also provide an example application for modeling ABC transporters using conformal prediction, as well as a discussion on general implications for drug discovery.
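The prediction-set construction described in this review can be sketched for classification in a few lines. This is a generic illustration of inductive conformal classification, not code from the paper: the calibration nonconformity scores below are made-up numbers, and in practice they would come from a nonconformity function on top of an underlying model.

```python
def p_value(calibration_scores, test_score):
    """Conformal p-value: the fraction of calibration nonconformity
    scores at least as 'strange' as the test object's (plus one for
    the test object itself)."""
    n_stranger = sum(1 for s in calibration_scores if s >= test_score)
    return (n_stranger + 1) / (len(calibration_scores) + 1)

def prediction_set(calibration, test_scores, significance):
    """Include every class whose p-value exceeds the significance
    level; the resulting set may contain none, one, or several classes."""
    return {
        label
        for label in calibration
        if p_value(calibration[label], test_scores[label]) > significance
    }

# Made-up per-class calibration nonconformity scores
calibration = {"active": [0.1, 0.2, 0.3, 0.9], "inactive": [0.2, 0.4, 0.5, 0.6]}
test_scores = {"active": 0.25, "inactive": 0.95}
print(prediction_set(calibration, test_scores, significance=0.20))  # {'active'}
```

Raising the significance level shrinks the set (possibly to empty), which is the validity/efficiency trade-off the review discusses.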
13.
  • Ameur, Adam, et al. (authors)
  • The LCB Data Warehouse
  • 2006
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 22:8, pp. 1024-1026
  • Journal article (peer-reviewed), abstract:
    • The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.
14.
  • Arvidsson McShane, Staffan, 1990- (author)
  • Confidence Predictions in Pharmaceutical Sciences
  • 2023
  • Doctoral thesis (other academic/artistic), abstract:
    • The main focus of this thesis has been on Quantitative Structure Activity Relationship (QSAR) modeling using methods producing valid measures of uncertainty. The goal of QSAR is to prospectively predict the outcome from assays, such as ADME (Absorption, Distribution, Metabolism, Excretion), toxicity, and on- and off-target interactions, for novel compounds. QSAR modeling offers an appealing alternative to laboratory work, which is both costly and time-consuming, and can be applied earlier in the development process, as candidate drugs can be tested in silico without having to synthesize them first. A common theme across the presented papers is the application of conformal and probabilistic prediction models, which are used to associate predictions with a measure of their reliability – a desirable property that is essential at the stage of decision making. In Paper I we studied approaches for utilizing biological assay data from legacy systems in order to improve predictive models. This is otherwise problematic, since mixing data from separate systems causes issues for most machine learning algorithms. We demonstrated that old data could be used to augment the proper training set of a conformal predictor to yield more efficient predictions while preserving model calibration. In Paper II we studied a new approach to predicting metabolic transformations of small molecules based on transformations encoded in SMIRKS format. In this work we used the probabilistic Cross-Venn-ABERS predictor, which overall worked well but had difficulty in modeling the minority class of imbalanced datasets. In Paper III we studied metabolomics data from patients diagnosed with multiple sclerosis and found a set of 15 discriminatory metabolites that could be used to classify patients from a validation cohort into one of two subtypes of the disease with high accuracy. We further demonstrated that conformal prediction could be useful for tracking the progression of the disease in individual patients, which we exemplified using data from a clinical trial. In Paper IV we introduced CPSign – a software for cheminformatics modeling using conformal and probabilistic methods. CPSign was compared against other regularly used methods for this task, using 32 benchmark datasets, demonstrating that CPSign produces predictive accuracy on par with the best performing methods.
15.
  • Arvidsson McShane, Staffan, 1990-, et al. (authors)
  • CPSign : Conformal Prediction for Cheminformatics Modeling
  • 2024
  • In: Journal of Cheminformatics. - : BioMed Central (BMC). - 1758-2946. ; 16
  • Journal article (peer-reviewed), abstract:
    • Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign, a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures, but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models, including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches, including random forest and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparable predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production use in multiple organizations. The ability to work directly with chemical input files and perform descriptor calculation and modeling with SVM in the conformal prediction framework, in a single software package with a low footprint and fast execution time, makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign.
16.
  • Arvidsson McShane, Staffan, et al. (authors)
  • Machine Learning Strategies When Transitioning between Biological Assays
  • 2021
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-9596 .- 1549-960X. ; 61:7, pp. 3722-3733
  • Journal article (peer-reviewed), abstract:
    • Machine learning is widely used in drug development to predict activity in biological assays based on chemical structure. However, the process of transitioning from one experimental setup to another for the same biological endpoint has not been extensively studied. In a retrospective study, we here explore different strategies for combining data from the old and new assays when training conformal prediction models, using data from hERG and Na-v assays. We suggest continuously monitoring the validity and efficiency of models as more data are accumulated from the new assay, and selecting a modeling strategy based on these metrics. In order to maximize the utility of data from the old assay, we propose a strategy that augments the proper training set of an inductive conformal predictor by adding data from the old assay but only having data from the new assay in the calibration set, which results in valid (well-calibrated) models with improved efficiency compared to other strategies. We study the results for varying sizes of new and old assays, allowing for discussion of different practical scenarios. We also conclude that our proposed assay transition strategy is more beneficial, and the value of data from the new assay is higher, for the harder case of regression compared to classification problems.
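The data split this abstract proposes can be sketched directly. This is a minimal illustration of the described strategy, not the paper's code; the calibration fraction is an arbitrary illustrative choice, not a value from the study.

```python
def assay_transition_split(old_assay, new_assay, calibration_fraction=0.3):
    """Sketch of the augmentation strategy described above: old-assay
    data go into the proper training set only, while the calibration
    set holds new-assay data exclusively, which is what keeps the
    conformal predictor calibrated against the new experimental setup."""
    n_cal = max(1, int(len(new_assay) * calibration_fraction))
    calibration = new_assay[:n_cal]                    # new assay only
    proper_training = new_assay[n_cal:] + old_assay    # augmented with old data
    return proper_training, calibration

# Toy records tagged with their assay of origin
old = [(f"c{i}", "old") for i in range(8)]
new = [(f"c{i}", "new") for i in range(8, 18)]
train, cal = assay_transition_split(old, new)
print(len(train), len(cal))  # 15 3
```

In practice the new-assay data would be shuffled before splitting; the slicing here just keeps the sketch deterministic.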
17.
  • Arvidsson, Staffan, et al. (authors)
  • Prediction of Metabolic Transformations using Cross Venn-ABERS Predictors
  • 2017
  • In: Conformal and Probabilistic Prediction with Applications (COPA) 2017. ; pp. 118-131
  • Conference paper (peer-reviewed), abstract:
    • Prediction of drug metabolism is an important topic in the drug discovery process, and we here present a study applying probabilistic predictions with Cross Venn-ABERS Predictors (CVAPs) to site-of-metabolism data. We used a dataset of 73,599 biotransformations, applied SMIRKS to define biotransformations of interest, and constructed five datasets where chemical structures were represented using signature descriptors. The results show that CVAP produces well-calibrated predictions for all datasets with good predictive capability, making CVAP an interesting method for further exploration in drug discovery applications.
18.
  • Ashrafian, Hutan, et al. (authors)
  • Metabolomics : The Stethoscope for the Twenty-First Century
  • 2021
  • In: Medical Principles and Practice. - : S. Karger. - 1011-7571 .- 1423-0151. ; 30:4, pp. 301-310
  • Journal article (peer-reviewed), abstract:
    • Metabolomics encompasses the systematic identification and quantification of all metabolic products in the human body. This field could provide clinicians with novel sets of diagnostic biomarkers for disease states in addition to quantifying treatment response to medications at an individualized level. This literature review aims to highlight the technology underpinning metabolic profiling, identify potential applications of metabolomics in clinical practice, and discuss the translational challenges that the field faces. We searched PubMed, MEDLINE, and EMBASE for primary and secondary research articles regarding clinical applications of metabolomics. Metabolic profiling can be performed using mass spectrometry and nuclear magnetic resonance-based techniques using a variety of biological samples. This is carried out in vivo or in vitro following careful sample collection, preparation, and analysis. The potential clinical applications constitute disruptive innovations in their respective specialities, particularly oncology and metabolic medicine. Outstanding issues currently preventing widespread clinical use are scalability of data interpretation, standardization of sample handling practice, and e-infrastructure. Routine utilization of metabolomics at a patient and population level will constitute an integral part of future healthcare provision.
19.
  • Blamey, Ben, et al. (authors)
  • Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit
  • 2021
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:3, pp. 1-14
  • Journal article (peer-reviewed), abstract:
    • BACKGROUND: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources. FINDINGS: In our pipeline model, an "interestingness function" assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a "policy" guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools to adopt this approach. We evaluate it with two microscopy imaging case studies. The first is a high-content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope. CONCLUSIONS: Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between the scientific concerns of data priority and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be "bolted on" to new and existing systems, and is intended for use with a range of technologies in different deployment scenarios.
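The interestingness-function/policy split described in this abstract can be sketched as a tiny tiering scheme. The thresholds, scores, and function names below are illustrative assumptions, not values or APIs from the HASTE Toolkit.

```python
def tier(interestingness, thresholds=(0.8, 0.5)):
    """Map an interestingness score to a tier in the data hierarchy:
    tier 0 is the most interesting and is processed first."""
    for t, cut in enumerate(thresholds):
        if interestingness >= cut:
            return t
    return len(thresholds)

def prioritize(stream, score):
    """A simple 'policy': order incoming objects by tier, so compute
    and storage are spent on the most interesting data first.
    `score` is the caller-supplied interestingness function."""
    return sorted(stream, key=lambda obj: tier(score(obj)))

# Mock stream of image identifiers with made-up interestingness scores
scores = {"img1": 0.9, "img2": 0.3, "img3": 0.6}
print(prioritize(scores, scores.get))  # ['img1', 'img3', 'img2']
```

A deployment-specific policy would then map each tier to an action (store full resolution, downsample, or drop), which is the separation of concerns the authors highlight.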
20.
  • Braeuning, Albert, et al. (authors)
  • Development of new approach methods for the identification and characterization of endocrine metabolic disruptors : a PARC project
  • 2023
  • In: Frontiers in Toxicology. - : Frontiers Media SA. - 2673-3080. ; 5
  • Journal article (peer-reviewed), abstract:
    • In the past, the analysis of endocrine disrupting properties of chemicals has mainly focused on (anti-)estrogenic or (anti-)androgenic properties, as well as on aspects of steroidogenesis and the modulation of thyroid signaling. More recently, disruption of energy metabolism and related signaling pathways by exogenous substances, so-called metabolism-disrupting chemicals (MDCs), has come into focus. While general effects such as body and organ weight changes are routinely monitored in animal studies, there is a clear lack of mechanistic test systems to determine and characterize the metabolism-disrupting potential of chemicals. In order to contribute to filling this gap, one of the projects within the EU-funded Partnership for the Assessment of Risks from Chemicals (PARC) aims at developing novel in vitro methods for the detection of endocrine metabolic disruptors. Efforts will comprise projects related to specific signaling pathways, for example involving mTOR or xenobiotic-sensing nuclear receptors, studies on hepatocytes, adipocytes and pancreatic beta cells covering metabolic and morphological endpoints, as well as metabolism-related zebrafish-based tests as an alternative to classic rodent bioassays. This paper provides an overview of the approaches and methods of these PARC projects and how they will contribute to improving the toxicological toolbox for identifying substances with endocrine disrupting properties and deciphering their mechanisms of action.
  •  
21.
  •  
22.
  • Capuccini, Marco (författare)
  • Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
  • 2019
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, there is a need for substantial technical skills for dealing with Big Data; and second, infrastructure procurement becomes difficult. In connection to this second challenge, the computing model and business trend that was originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of buying computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on-demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable efforts were made in enabling the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes 5 papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms, in cloud environments for two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using the microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well, and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.
  •  
23.
  • Capuccini, Marco, et al. (författare)
  • Large-scale virtual screening on public cloud resources with Apache Spark
  • 2017
  • Ingår i: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 9
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low failure rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against ∼2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries.
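The MapReduce pattern this abstract describes, partition the molecular library, dock each partition independently, then reduce to the top-scoring hits, can be sketched in plain Python as a stand-in for the paper's Apache Spark implementation. The `mock_dock` function below is a hypothetical placeholder for a real docking engine, and the partitioning scheme is illustrative only.

```python
# Sketch of the map/reduce structure behind docking-based virtual
# screening. In Spark each partition would run on a separate executor;
# here the phases run sequentially. mock_dock is a hypothetical
# stand-in for real docking software.
import heapq

def mock_dock(compound: str) -> float:
    """Placeholder scoring function (not a real docking engine)."""
    return sum(ord(c) for c in compound) % 100 / 10.0

def screen(library, n_partitions=4, top_k=3):
    # Map phase: dock each partition of the library independently.
    partitions = [library[i::n_partitions] for i in range(n_partitions)]
    partial = [[(mock_dock(c), c) for c in part] for part in partitions]
    # Reduce phase: merge per-partition results, keep the top-k hits.
    return heapq.nlargest(top_k, (hit for part in partial for hit in part))

hits = screen(["CCO", "c1ccccc1", "CC(=O)O", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"])
print(hits)
```

Because docking each compound is independent of all others, the map phase scales near-linearly with the number of executors, which is what makes the task "trivially parallelizable".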
  •  
24.
  • Capuccini, Marco, et al. (författare)
  • MaRe : Processing Big Data with application containers on Apache Spark
  • 2020
  • Ingår i: GigaScience. - : Oxford University Press. - 2047-217X. ; 9:5
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have attracted the largest open source communities; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
  •  
25.
  •  
26.
  • Capuccini, Marco, et al. (författare)
  • On-demand virtual research environments using microservices
  • 2019
  • Ingår i: PeerJ Computer Science. - : PeerJ. - 2376-5992. ; 5
  • Tidskriftsartikel (refereegranskat)abstract
    • The computational demands for scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed orchestration platform as software containers, referred to as on-demand, virtual research environments. The methodology is vendor agnostic and we provide an open source implementation that supports the major cloud providers, offering scalable management of scientific pipelines. We demonstrate applicability and scalability of our methodology in life science applications, but the methodology is general and can be applied to other scientific domains.
  •  
27.
  • Carlsson, Lars, et al. (författare)
  • Model building in Bioclipse Decision Support applied to open datasets
  • 2012
  • Ingår i: Toxicology Letters. - : Elsevier BV. - 0378-4274 .- 1879-3169. ; 211:Suppl., s. S62-
  • Tidskriftsartikel (refereegranskat)abstract
    • Bioclipse Decision Support (DS) is a system capable of building predictive models of any collection of SAR data, and making them available in a simple user interface based on Bioclipse (www.bioclipse.net). The method is fast and uses Faulon Signatures as chemical descriptors together with a Support Vector Machine algorithm for QSAR model building. A key feature is the capability to visualize and interpret results by highlighting the substructures which contributed most to the prediction. This, together with very fast predictions, allows for editing chemical structures with instantly updated results. We here present the results from applying Bioclipse Decision Support to several open QSAR data sets, including endpoints from OpenTox and PubChem. The results show how to extract data from the sources and to build models which can be integrated with user specific models.
  •  
28.
  • Carlsson, Lars, et al. (författare)
  • Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse
  • 2010
  • Ingår i: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 11, s. 362-
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists and most of the useful tools are quite expensive. Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented, which is based on annotated metabolic data, described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics. Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.
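The approach summarized above scores each atom position by how often its circular-environment fingerprint was observed at a site of metabolism in historic biotransformation data. A toy version of that occurrence-ratio idea might look like the following; the environment keys and counts are invented for illustration and are not MetaPrint2D's real descriptors or data.

```python
# Toy version of occurrence-ratio scoring over circular fingerprints,
# the idea behind MetaPrint2D as described above. Each atom-environment
# key records how often it was seen at a metabolic site vs. overall in
# historic data. Keys and counts here are invented for illustration.

historic_counts = {
    # environment_key: (times seen as metabolic site, times seen overall)
    "C(aromatic,OH)": (80, 100),
    "C(aliphatic,CH3)": (5, 200),
    "N(amine)": (40, 120),
}

def site_score(env_key: str) -> float:
    """Normalized occurrence ratio in [0, 1]; 0.0 if never observed."""
    site, total = historic_counts.get(env_key, (0, 1))
    return site / total

for key in historic_counts:
    print(f"{key}: {site_score(key):.2f}")
```

Because scoring reduces to dictionary lookups over precomputed counts, per-molecule prediction is fast enough for the interactive, real-time use the abstract describes.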
  •  
29.
  • Carreras-Puigvert, Jordi, et al. (författare)
  • Artificial intelligence for high content imaging in drug discovery
  • 2024
  • Ingår i: Current opinion in structural biology. - : Elsevier. - 0959-440X .- 1879-033X. ; 87
  • Tidskriftsartikel (refereegranskat)abstract
    • Artificial intelligence (AI) and high-content imaging (HCI) are contributing to advancements in drug discovery, propelled by the recent progress in deep neural networks. This review highlights AI's role in analysis of HCI data from fixed and live-cell imaging, enabling novel label-free and multi-channel fluorescent screening methods, and improving compound profiling. HCI experiments are rapid and cost-effective, facilitating large data set accumulation for AI model training. However, the success of AI in drug discovery also depends on high-quality data, reproducible experiments, and robust validation to ensure model performance. Despite challenges like the need for annotated compounds and managing vast image data, AI's potential in phenotypic screening and drug profiling is significant. Future improvements in AI, including increased interpretability and integration of multiple modalities, are expected to solidify AI and HCI's role in drug discovery.
  •  
30.
  • Claesson, Alf, et al. (författare)
  • On Mechanisms of Reactive Metabolite Formation from Drugs
  • 2013
  • Ingår i: Mini-Reviews in medical chemistry. - : Bentham Science Publishers Ltd.. - 1389-5575 .- 1875-5607. ; 13:5, s. 720-729
  • Tidskriftsartikel (refereegranskat)abstract
    • Idiosyncratic adverse drug reactions (IADRs) cause a broad range of clinically severe conditions of which drug induced liver injury (DILI) in particular is one of the most frequent causes of safety-related drug withdrawals. The underlying cause is almost invariably formation of reactive metabolites (RM) which by attacking macromolecules induce organ injuries. Attempts are being made in the pharmaceutical industry to lower the risk of selecting unfit compounds as clinical candidates. Approaches vary but do not seem to be overly successful at the initial design/synthesis stage. We review here the most frequent categories of mechanisms for RM formation and propose that many cases of RMs encountered within early ADME screening can be foreseen by applying chemical and metabolic knowledge. We also mention a web tool, SpotRM, which can be used for efficient look-up and learning about drugs that have recognized IADRs likely caused by RM formation.
  •  
31.
  • Dahlö, Martin (författare)
  • Approaches for Distributing Large Scale Bioinformatic Analyses
  • 2021
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • Ever since high-throughput DNA sequencing became economically feasible, the amount of biological data has grown exponentially. This has been one of the biggest drivers in introducing high-performance computing (HPC) to the field of biology. Unlike physics and mathematics, biology education has not had a strong focus on programming or algorithmic development. This has forced many biology researchers to start learning a whole new skill set, and introduced new challenges for those managing the HPC clusters. The aim of this thesis is to investigate the problems that arise when novice users are using an HPC cluster for bioinformatics data analysis, and to explore approaches for how these can be mitigated. In Paper 1 we quantify and visualise these problems and contrast them with the more computer-experienced user groups already using the HPC cluster. In Paper 2 we introduce a new workflow system (SciPipe), implemented as a Go library, as a way to organise and manage analysis steps. Paper 3 is aimed at cloud computing and how containerised tools can be used to run workflows without having to worry about software installations. In Paper 4 we demonstrate a fully automated cloud-based system for image-based cell profiling. Starting with a robotic arm in a lab, it covers all the steps from cell culture and microscope to having the cell profiling results stored in a database and visualised in a web interface.
  •  
32.
  •  
33.
  • Dahlö, Martin, et al. (författare)
  • BioImg.org : A catalog of virtual machine images for the life sciences
  • 2015
  • Ingår i: Bioinformatics and Biology Insights. - 1177-9322. ; 9, s. 125-128
  • Tidskriftsartikel (refereegranskat)abstract
    • Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education.
  •  
34.
  • Dahlö, Martin, et al. (författare)
  • Tracking the NGS revolution : managing life science research on shared high-performance computing clusters
  • 2018
  • Ingår i: GigaScience. - : Oxford University Press. - 2047-217X. ; 7:5
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
  •  
35.
  • Eklund, Martin, 1978-, et al. (författare)
  • An eScience-Bayes strategy for analyzing omics data
  • 2010
  • Ingår i: BMC Bioinformatics. - : BioMed Central. - 1471-2105. ; 11, s. 282-
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems. Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.
  •  
36.
  • Eklund, Martin, et al. (författare)
  • The C1C2 : a framework for simultaneous model selection and assessment
  • 2008
  • Ingår i: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 9, s. 360-
  • Tidskriftsartikel (refereegranskat)abstract
    • BACKGROUND: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of used data. Since the number of conceivable models in general is vast, it was also of interest to investigate the employment of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. RESULTS: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. 
Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice, however a lower accuracy of the generalization error estimates was observed. CONCLUSION: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assess its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the model choice and the model assessment in terms of data used for each task improves the estimates of the generalization error.
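The core idea of the C1C2 framework, keeping the data used for model choice strictly separate from the data used for model assessment, can be illustrated with a simple three-way split. This is a generic sketch of the principle under assumed split fractions, not the paper's exact partitioning scheme.

```python
# Generic sketch of separating model choice from model assessment by
# data partitioning, the principle behind the C1C2 framework above.
# Split fractions are illustrative, not the paper's actual scheme.
import random

def three_way_split(data, seed=0, frac_train=0.6, frac_select=0.2):
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_select = int(len(shuffled) * frac_select)
    train = shuffled[:n_train]                     # fit each candidate model
    select = shuffled[n_train:n_train + n_select]  # choose the best model
    assess = shuffled[n_train + n_select:]         # estimate generalization error
    return train, select, assess

train, select, assess = three_way_split(list(range(100)))
print(len(train), len(select), len(assess))  # 60 20 20
```

Because the assessment partition is never touched during variable selection or penalty tuning, the error estimated on it is not biased by the model choice, which is the separation the abstract argues improves generalization error estimates.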
  •  
37.
  • Ekmefjord, Morgan, et al. (författare)
  • Scalable federated machine learning with FEDn
  • 2022
  • Ingår i: 2022 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022). - : Institute of Electrical and Electronics Engineers (IEEE). - 9781665499569 - 9781665499576 ; , s. 555-564
  • Konferensbidrag (refereegranskat)abstract
    • Federated machine learning promises to overcome the input privacy challenge in machine learning. By iteratively updating a model on private clients and aggregating these local model updates into a global federated model, private data is incorporated in the federated model without needing to share and expose that data. Several open software projects for federated learning have appeared. Most of them focus on supporting flexible experimentation with different model aggregation schemes and with different privacy-enhancing technologies. However, there is a lack of open frameworks that focus on critical distributed computing aspects of the problem such as scalability and resilience. It is a big step for a data scientist to go from an experimental sandbox to testing their federated schemes at scale in real-world geographically distributed settings. To bridge this gap we have designed and developed a production-grade hierarchical federated learning framework, FEDn. The framework is specifically designed to make it easy to go from local development in pseudo-distributed mode to horizontally scalable distributed deployments. FEDn aims to be both production grade for industrial applications and a flexible research tool for exploring the real-world performance of novel federated algorithms, and the framework has been used in a number of industrial and academic R&D projects. In this paper we present the architecture and implementation of FEDn. We demonstrate the framework's scalability and efficiency in evaluations based on two case studies, representative of a cross-silo and a cross-device use case, respectively.
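The iterate-and-aggregate loop this abstract describes is, at its core, federated averaging: each client updates the model on its private data, and the server combines the updates weighted by local dataset size. The following is a minimal generic sketch of that aggregation step, not FEDn's actual API or its hierarchical architecture; the one-parameter "model" is a toy.

```python
# Minimal sketch of the federated-averaging loop behind frameworks
# like FEDn. Clients train locally on private data; the server
# averages their parameters weighted by dataset size. Generic
# illustration only, not FEDn's actual API.

def local_update(global_weights, client_data, lr=0.1):
    """Toy 'training': one gradient step of a 1-parameter mean model."""
    w = global_weights[0]
    grad = sum(w - x for x in client_data) / len(client_data)
    return [w - lr * grad]

def federated_average(updates, sizes):
    """Weighted average of client weight vectors (FedAvg)."""
    total = sum(sizes)
    n_params = len(updates[0])
    return [
        sum(u[i] * s for u, s in zip(updates, sizes)) / total
        for i in range(n_params)
    ]

global_w = [0.0]
clients = [[1.0, 2.0, 3.0], [10.0, 12.0]]  # private, never leaves the client
for _ in range(50):
    updates = [local_update(global_w, data) for data in clients]
    global_w = federated_average(updates, [len(d) for d in clients])
print(global_w)  # converges toward the weighted mean of all client data
```

Note that only model parameters cross the network; the raw client data stays local, which is the privacy property the abstract highlights.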
  •  
38.
  • Emami Khoonsari, Payam, et al. (författare)
  • Interoperable and scalable data analysis with microservices : Applications in metabolomics
  • 2019
  • Ingår i: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 35:19, s. 3752-3760
  • Tidskriftsartikel (refereegranskat)abstract
    • Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science.
  •  
39.
  •  
40.
  • Fagerholm, Urban, et al. (författare)
  • Advances in Predictions of Oral Bioavailability of Candidate Drugs in Man with New Machine Learning Methodology
  • 2021
  • Ingår i: Molecules. - : MDPI. - 1431-5157 .- 1420-3049. ; 26:9
  • Tidskriftsartikel (refereegranskat)abstract
    • Oral bioavailability (F) is an essential determinant for the systemic exposure and dosing regimens of drug candidates. F is determined by numerous processes, and computational predictions of human estimates have so far shown limited results. We describe a new methodology where F in humans is predicted directly from chemical structure using an integrated strategy combining 9 machine learning models, 3 sets of structural alerts, and 2 physiologically-based pharmacokinetic models. We evaluate the model on a benchmark dataset consisting of 184 compounds, obtaining a predictive accuracy (Q2) of 0.50, which is successful according to a pharmaceutical industry proposal. Twenty-seven compounds were found (beforehand) to be outside the main applicability domain for the model. We compare our results with interspecies correlations (rat, mouse and dog vs. human) using the same dataset, where animal vs. human-correlations (R2) were found to be 0.21 to 0.40 and maximum prediction errors were smaller than maximum interspecies differences. We conclude that our method has sufficient predictive accuracy to be practically useful with applications in human exposure and dose predictions, compound optimization and decision making, with potential to rationalize drug discovery and development and decrease failures and overexposures in early clinical trials with candidate drugs.
  •  
41.
  • Fagerholm, Urban, et al. (författare)
  • Comparison between lab variability and in silico prediction errors for the unbound fraction of drugs in human plasma
  • 2021
  • Ingår i: Xenobiotica. - : Taylor & Francis. - 0049-8254 .- 1366-5928. ; 51:10, s. 1095-1100
  • Tidskriftsartikel (refereegranskat)abstract
    • Variability of the unbound fraction in plasma (f(u)) between labs, methods and conditions is known to exist. Variability and uncertainty of this parameter influence predictions of the overall pharmacokinetics of drug candidates and might jeopardise safety in early clinical trials. Objectives of this study were to evaluate the variability of human in vitro f(u)-estimates between labs for a range of different drugs, and to develop and validate an in silico f(u)-prediction method and compare the results to the lab variability. A new in silico method with prediction accuracy (Q(2)) of 0.69 for log f(u) was developed. The median and maximum prediction errors were 1.9- and 92-fold, respectively. Corresponding estimates for lab variability (ratio between max and min f(u) for each compound) were 2.0- and 185-fold, respectively. Greater than 10-fold lab variability was found for 14 of 117 selected compounds. Comparisons demonstrate that in silico predictions were about as reliable as lab estimates when these have been generated under different conditions. The results suggest that the new validated in silico prediction method is valuable not only for predictions at the drug design stage, but also for reducing uncertainties of f(u)-estimations and improving safety of drug candidates entering the clinical phase.
  •  
42.
  • Fagerholm, Urban, et al. (författare)
  • In Silico Prediction of Human Clinical Pharmacokinetics with ANDROMEDA by Prosilico : Predictions for an Established Benchmarking Data Set, a Modern Small Drug Data Set, and a Comparison with Laboratory Methods
  • 2023
  • Ingår i: ATLA (Alternatives to Laboratory Animals). - : SAGE Publications. - 0261-1929. ; 51:1, s. 39-54
  • Tidskriftsartikel (refereegranskat)abstract
    • There is an ongoing aim to replace animal and in vitro laboratory models with in silico methods. Such replacement requires the successful validation and comparably good performance of the alternative methods. We have developed an in silico prediction system for human clinical pharmacokinetics, based on machine learning, conformal prediction and a new physiologically-based pharmacokinetic model, i.e. ANDROMEDA. The objectives of this study were: a) to evaluate how well ANDROMEDA predicts the human clinical pharmacokinetics of a previously proposed benchmarking data set comprising 24 physicochemically diverse drugs and 28 small drug molecules new to the market in 2021; b) to compare its predictive performance with that of laboratory methods; and c) to investigate and describe the pharmacokinetic characteristics of the modern drugs. Median and maximum prediction errors for the selected major parameters were ca 1.2 to 2.5-fold and 16-fold for both data sets, respectively. Prediction accuracy was on par with, or better than, the best laboratory-based prediction methods (superior performance for a vast majority of the comparisons), and the prediction range was considerably broader. The modern drugs have higher average molecular weight than those in the benchmarking set from 15 years earlier (ca 200 g/mol higher), and were predicted to (generally) have relatively complex pharmacokinetics, including permeability and dissolution limitations and significant renal, biliary and/or gut-wall elimination. In conclusion, the results were overall better than those obtained with laboratory methods, and thus serve to further validate the ANDROMEDA in silico system for the prediction of human clinical pharmacokinetics of modern and physicochemically diverse drugs.
  •  
43.
  • Fagerholm, Urban, et al. (författare)
  • In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models
  • 2021
  • Ingår i: Xenobiotica. - : Taylor & Francis. - 0049-8254 .- 1366-5928. ; 51:12, s. 1366-1371
  • Tidskriftsartikel (refereegranskat)abstract
    • Volume of distribution at steady state (Vss) is an important pharmacokinetic endpoint. In this study we apply machine learning and conformal prediction for human Vss prediction, and make a head-to-head comparison with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method on combined in silico and in vitro data, using a test set of 105 compounds with experimentally observed Vss. The mean prediction error and % with <2-fold prediction error for our method were 2.4-fold and 64%, respectively. 69% of test compounds had an observed Vss within the prediction interval at a 70% confidence level. In comparison, 2.2-, 2.9- and 3.1-fold mean errors and 69, 64 and 61% of predictions with <2-fold error were reached with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method, respectively. We conclude that our method has theoretically proven validity that was empirically confirmed, and shows predictive accuracy on par with animal models and superior to an alternative widely used in silico-based method. The option for the user to select the level of confidence in predictions offers better guidance on how to optimise Vss in drug discovery applications.
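Several entries above rely on conformal prediction to attach validity-guaranteed intervals to machine-learning predictions at a chosen confidence level. A minimal inductive conformal regressor over a held-out calibration set is sketched below; this is the generic textbook recipe, not the authors' actual models, and the constant-mean "model" and calibration values are invented.

```python
# Minimal inductive conformal regression, the technique referenced in
# several entries above: calibrate nonconformity scores on held-out
# data, then emit intervals valid at a chosen confidence level.
# Generic sketch; the model and calibration data are toys.
import math

def conformal_interval(predict, calibration, x, confidence=0.7):
    """Prediction interval for x, valid at the given confidence level."""
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - predict(xc)) for xc, y in calibration)
    # The ceil((n+1)*confidence)-th smallest score (standard ICP quantile).
    k = min(len(scores) - 1, math.ceil((len(scores) + 1) * confidence) - 1)
    q = scores[k]
    y_hat = predict(x)
    return (y_hat - q, y_hat + q)

def predict(x):
    """Toy model: always predicts the training mean."""
    return 2.0

calibration = [(None, 1.5), (None, 2.5), (None, 2.0), (None, 3.0), (None, 1.0)]
low, high = conformal_interval(predict, calibration, None, confidence=0.8)
print(low, high)
```

The validity guarantee is distribution-free: at confidence c, the true value falls inside the interval with probability at least c (under exchangeability), which is why raising the confidence level widens the interval.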
  •  
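The 70% prediction intervals in the abstract above come from conformal prediction, which can be illustrated with a minimal sketch of split (inductive) conformal regression. The mean predictor, the toy calibration data and the confidence level here are illustrative assumptions, not the authors' actual model:

```python
import math

def conformal_half_width(cal_residuals, confidence):
    # Split conformal regression: the interval half-width is the
    # ceil((n+1)*confidence)-th smallest absolute calibration residual,
    # which guarantees coverage >= confidence on exchangeable data.
    n = len(cal_residuals)
    k = min(math.ceil((n + 1) * confidence), n)
    return sorted(cal_residuals)[k - 1]

# Toy calibration set of observed values (hypothetical log Vss-like data)
cal_y = [2.0, 2.5, 1.8, 3.0, 2.2, 2.7, 1.9, 2.4, 2.6, 2.1]
pred = sum(cal_y) / len(cal_y)            # trivial mean predictor
residuals = [abs(y - pred) for y in cal_y]

half = conformal_half_width(residuals, 0.70)
interval = (pred - half, pred + half)     # valid at >= 70% coverage
```

Raising the confidence level widens the interval; this is the user-selectable trade-off the abstract refers to.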
44.
  • Fagerholm, Urban, et al. (authors)
  • In Silico Predictions of the Gastrointestinal Uptake of Macrocycles in Man Using Conformal Prediction Methodology
  • 2022
  • In: Journal of Pharmaceutical Sciences. - Elsevier. - ISSN 0022-3549, E-ISSN 1520-6017 ; 111:9, pp. 2614-2619
  • Journal article (peer-reviewed) abstract
    • The gastrointestinal uptake of macrocyclic compounds is not fully understood. Here we applied our previously validated integrated system based on machine learning and conformal prediction to predict the passive fraction absorbed (fa), maximum fraction dissolved (fdiss), substrate specificities for major efflux transporters, and total fraction absorbed (fa,tot) for a selected set of designed macrocyclic compounds (n = 37; MW 407-889 g/mol) and macrocyclic drugs (n = 16; MW 734-1203 g/mol) in vivo in man. Major aims were to increase the understanding of the oral absorption of macrocycles and to further validate our methodology. We predicted the designed macrocycles to have high fa and low to high fdiss and fa,tot, with average estimates higher than for the larger macrocyclic drugs. With few exceptions, compounds were predicted to be effluxed and well absorbed. A 2-fold median prediction error for fa,tot was achieved for the macrocycles (validation set). Advantages of our methodology include that it enables predictions for macrocycles with low permeability, Caco-2 recovery and solubility (BCS IV), provides prediction intervals, and guides optimization of absorption. The understanding of the oral absorption of macrocycles was increased, and the methodology was validated for predicting the uptake of macrocycles in man.
  •  
45.
  • Fagerholm, Urban, et al. (authors)
  • In silico predictions of the human pharmacokinetics/toxicokinetics of 65 chemicals from various classes using conformal prediction methodology
  • 2022
  • In: Xenobiotica. - Taylor & Francis Group. - ISSN 0049-8254, E-ISSN 1366-5928 ; 52:2, pp. 113-118
  • Journal article (peer-reviewed) abstract
    • Pharmacokinetic/toxicokinetic (PK/TK) information for chemicals in humans is generally lacking. Here we applied machine learning, conformal prediction and a new physiologically-based PK/TK model to predict the human PK/TK of 65 chemicals from different classes, including carcinogens, food constituents and preservatives, vitamins, sweeteners, dyes and colours, pesticides, alternative medicines, flame retardants, psychoactive drugs, dioxins, poisons, UV-absorbents, surfactants, solvents and cosmetics. About 80% of the main human PK/TK parameters (fraction absorbed, oral bioavailability, half-life, unbound fraction in plasma, clearance, volume of distribution, fraction excreted) for the selected chemicals were missing in the literature. This information was now added from in silico predictions. Median and mean prediction errors for these parameters were 1.3- to 2.7-fold and 1.4- to 4.8-fold, respectively. In total, 59 and 86% of predictions had errors <2- and <5-fold, respectively. Predicted and observed PK/TK for the chemicals was generally within the range for pharmaceutical drugs. The results validated the new integrated system for predicting the human PK/TK of different chemicals and added important missing information. No general difference in PK/TK characteristics was found between the selected chemicals and pharmaceutical drugs.
  •  
46.
  • Fagerholm, Urban, et al. (authors)
  • The Impact of Reference Data Selection for the Prediction Accuracy of Intrinsic Hepatic Metabolic Clearance
  • 2022
  • In: Journal of Pharmaceutical Sciences. - Elsevier. - ISSN 0022-3549, E-ISSN 1520-6017 ; 111:9, pp. 2645-2649
  • Journal article (peer-reviewed) abstract
    • In vitro-in vivo prediction results for hepatic metabolic clearance (CLH) and intrinsic CLH (CLint) vary widely among studies. The reasons are not fully investigated or understood. The possibility to select favorable reference data for in vivo CLH, CLint and unbound fraction in plasma (fu) is among the possible explanations. The main objective was to investigate how reference data selection influences log in vitro vs in vivo CLint correlations (r2). Another aim was to make a head-to-head comparison with an in silico prediction method. Human hepatocyte CLint data for 15 compounds from two studies were selected. These were correlated to in vivo CLint estimated using different reported CLH and fu estimates. Depending on the choice of reference data, r2 values from the two studies ranged from 0.07 to 0.86 and from 0.06 to 0.79. When using average reference estimates, an r2 of 0.62 was achieved. Inclusion of two outliers in one of the studies resulted in an r2 of 0.38, which was lower than the predictive accuracy (q2) of the in silico method (0.48). In conclusion, the selection of reference data appears to play a major role for demonstrated predictions, and the in silico method showed higher accuracy and a wider range than hepatocytes for human in vivo CLint predictions.
  •  
47.
  •  
48.
  • Francisco Rodríguez, María Andreína, 1984-, et al. (authors)
  • Designing microplate layouts using artificial intelligence
  • 2023
  • In: Artificial Intelligence in the Life Sciences. - Elsevier. - ISSN 2667-3185 ; 3
  • Journal article (peer-reviewed) abstract
    • Microplates are indispensable in large-scale biomedical experiments but the physical location of samples and controls on the microplate can significantly affect the resulting data and quality metric values. We introduce a new method based on constraint programming for designing microplate layouts that reduces unwanted bias and limits the impact of batch effects after error correction and normalisation. We demonstrate that our method applied to dose-response experiments leads to more accurate regression curves and lower errors when estimating IC50/EC50, and for drug screening leads to increased precision, when compared to random layouts. It also reduces the risk of inflated scores from common microplate quality assessment metrics such as Z' factor and SSMD. We make our method available via a suite of tools (PLAID) including a reference constraint model, a web application, and Python notebooks to evaluate and compare designs when planning microplate experiments.
  •  
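The layout problem described in the abstract above can be made concrete with a toy sketch of one of its design goals: keeping controls off the evaporation-prone plate edge while spreading them across the plate. This greedy heuristic is an illustrative assumption for exposition only, not the PLAID constraint model:

```python
ROWS, COLS = 8, 12  # standard 96-well plate

def inner_wells():
    # Exclude edge rows and columns, where evaporation bias is strongest.
    return [(r, c) for r in range(1, ROWS - 1) for c in range(1, COLS - 1)]

def spread_controls(n_controls):
    # Greedy max-min placement: repeatedly pick the inner well whose
    # nearest already-placed control is farthest away.
    candidates = inner_wells()
    placed = [candidates[0]]
    while len(placed) < n_controls:
        best = max(
            candidates,
            key=lambda w: min((w[0] - p[0]) ** 2 + (w[1] - p[1]) ** 2
                              for p in placed),
        )
        placed.append(best)
    return placed
```

A constraint-programming formulation, as in the paper, would instead state such spacing and edge rules declaratively and let a solver find a provably feasible layout.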
49.
  • Gauraha, Niharika, et al. (authors)
  • Robust Knowledge Transfer in Learning Under Privileged Information Framework
  • Other publication (other academic/artistic) abstract
    • Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models; data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm; it showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features; we refer to this as robust knowledge transfer. We conduct an empirical evaluation considering four different experimental setups using one synthetic and three real datasets. These experiments demonstrate that our approach yields improved accuracy compared to LUPI with standard knowledge transfer.
  •  
50.
  • Gauraha, Niharika, et al. (authors)
  • Split knowledge transfer in learning under privileged information framework
  • 2019
  • In: Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications. - PMLR ; pp. 43-52
  • Conference paper (peer-reviewed) abstract
    • Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models; data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm; it showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features; we refer to this as split knowledge transfer. We evaluate the method using four different experimental setups comprising one synthetic and three real datasets. The results indicate that our approach leads to improved accuracy compared to LUPI with standard knowledge transfer.
  •  
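The K-fold partitioning scheme described in the two abstracts above can be sketched as follows. The 1-D least-squares transfer function and the toy data are illustrative assumptions; the papers use richer regression models:

```python
def fit_linear(xs, zs):
    # Least-squares transfer function z ≈ a*x + b, mapping a standard
    # feature x to a privileged feature z (1-D for illustration).
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    a = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
         / sum((x - mx) ** 2 for x in xs))
    return a, mz - a * mx

def split_knowledge_transfer(x, z, K):
    # Partition indices into K folds; each fold learns a transfer
    # function, which is then used to approximate privileged features
    # for the remaining folds. Per-sample approximations are averaged.
    n = len(x)
    folds = [list(range(i, n, K)) for i in range(K)]
    approx = [[] for _ in range(n)]
    for fold in folds:
        a, b = fit_linear([x[i] for i in fold], [z[i] for i in fold])
        for i in range(n):
            if i not in fold:
                approx[i].append(a * x[i] + b)
    return [sum(v) / len(v) for v in approx]
```

Because a fold never approximates its own privileged features, each approximation comes from transfer functions trained on disjoint data, mirroring the cross-validation intuition in the abstracts.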
  • Results 1-50 of 165
Publication type
journal article (119)
conference paper (13)
doctoral thesis (12)
other publication (9)
research review (7)
book chapter (4)
book (1)
Type of content
peer-reviewed (128)
other academic/artistic (33)
popular science, debate, etc. (4)
Author/editor
Spjuth, Ola, Profess ... (53)
Spjuth, Ola, 1977- (52)
Spjuth, Ola, Docent, ... (27)
Spjuth, Ola (24)
Carlsson, Lars (16)
Eklund, Martin (14)
Alvarsson, Jonathan, ... (14)
Alvarsson, Jonathan (14)
Lampa, Samuel (14)
Willighagen, Egon (14)
Carreras-Puigvert, J ... (13)
Kultima, Kim (12)
Capuccini, Marco (12)
Lapins, Maris (12)
Dahlö, Martin (11)
Hellander, Andreas (10)
Schaal, Wesley, PhD (10)
Wikberg, Jarl E. S. (10)
Wikberg, Jarl (10)
Herman, Stephanie (10)
Harrison, Philip J (9)
Larsson, Anders (8)
Norinder, Ulf, 1956- (8)
Wählby, Carolina, pr ... (8)
Steinbeck, Christoph (8)
Gauraha, Niharika (8)
Emami Khoonsari, Pay ... (7)
Toor, Salman (7)
Berg, Arvid (7)
Wieslander, Håkan (7)
Fagerholm, Urban (7)
Jeliazkova, Nina (7)
Rietdijk, Jonne (7)
Eklund, Martin, 1978 ... (7)
Hellberg, Sven (7)
Laure, Erwin (6)
Ahlberg, Ernst (6)
Burman, Joachim, 197 ... (6)
Grafström, Roland (6)
Salek, Reza M (6)
Rocca-Serra, Philipp ... (6)
Spjuth, Ola, Docent (6)
Neumann, Steffen (5)
Arvidsson McShane, S ... (5)
Kale, Namrata (5)
Schober, Daniel (5)
Bender, Andreas (5)
Sintorn, Ida-Maria, ... (5)
Georgiev, Polina (5)
Pireddu, Luca (5)
University
Uppsala universitet (162)
Karolinska Institutet (16)
Örebro universitet (9)
Kungliga Tekniska Högskolan (8)
Stockholms universitet (7)
Sveriges Lantbruksuniversitet (3)
Umeå universitet (2)
Jönköping University (2)
Göteborgs universitet (1)
Lunds universitet (1)
Malmö universitet (1)
Chalmers tekniska högskola (1)
Blekinge Tekniska Högskola (1)
Language
English (163)
German (2)
Research subject (UKÄ/SCB)
Natural sciences (116)
Medicine and health sciences (58)
Engineering and technology (11)
Agricultural sciences (1)
Humanities (1)
