SwePub

Result list for search "WFRF:(Engkvist Ola)"

  • Results 1-43 of 43
1.
  • Alvarsson, Jonathan, et al. (author)
  • Ligand-Based Target Prediction with Signature Fingerprints
  • 2014
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-9596 .- 1549-960X. ; 54:10, pp. 2647-2653
  • Journal article (peer-reviewed)
    • When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristic curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
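The difference between the bit and count fingerprints mentioned in this abstract can be illustrated with a minimal sketch of Tanimoto similarity. The feature strings below are hypothetical placeholders for atom signatures, and the min/max form used for counts is one common generalisation of the Tanimoto coefficient; the abstract does not state which variant the authors used.

```python
from collections import Counter


def tanimoto_bits(a: set, b: set) -> float:
    """Tanimoto similarity between two sets of 'on' features (bit version)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Count generalisation: sum of per-feature minima over sum of maxima."""
    keys = set(a) | set(b)
    denom = sum(max(a[k], b[k]) for k in keys)
    if denom == 0:
        return 1.0
    return sum(min(a[k], b[k]) for k in keys) / denom


# Hypothetical feature counts; the strings only stand in for atom signatures.
query = Counter({"C(C)(O)": 2, "O(C)": 1, "C(C)(C)": 3})
known = Counter({"C(C)(O)": 1, "O(C)": 1, "N(C)": 1})

bit_sim = tanimoto_bits(set(query), set(known))   # ignores how often a feature occurs
count_sim = tanimoto_counts(query, known)         # penalises differing occurrence counts
print(bit_sim, count_sim)
```

The count version keeps one integer per feature instead of one bit, which is consistent with the abstract's remark that it needs more memory and computing time.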
2.
  •  
3.
  • Ahlberg, Ernst, et al. (author)
  • Using conformal prediction to prioritize compound synthesis in drug discovery
  • 2017
  • In: Proceedings of Machine Learning Research. - Stockholm : Machine Learning Research. ; , pp. 174-184
  • Conference paper (peer-reviewed)
    • The choice of how much money and resources to spend to understand certain problems is of high interest in many areas. This work illustrates how computational models can be more tightly coupled with experiments to generate decision data at lower cost without reducing the quality of the decision. Several different strategies are explored to illustrate the trade-off between lowering costs and quality in decisions. AUC is used as a performance metric and the number of objects that can be learnt from is constrained. Some of the strategies described reach AUC values over 0.9 and outperform strategies that are more random. The strategies that use conformal predictor p-values show varying results, although some are top performing. The application studied is taken from the drug discovery process. In the early stages of this process, compounds that could potentially become marketed drugs are routinely tested in experimental assays to understand their distribution and interactions in humans.
4.
  • Atance, Sara Romeo, et al. (author)
  • De Novo Drug Design Using Reinforcement Learning with Graph-Based Deep Generative Models
  • 2022
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-960X .- 1549-9596. ; 62:20, pp. 4863-4872
  • Journal article (peer-reviewed)
    • Machine learning provides effective computational tools for exploring the chemical space via deep generative models. Here, we propose a new reinforcement learning scheme to finetune graph-based deep generative models for de novo molecular design tasks. We show how our computational framework can successfully guide a pretrained generative model toward the generation of molecules with a specific property profile, even when such molecules are not present in the training set and unlikely to be generated by the pretrained model. We explored the following tasks: generating molecules of decreasing/increasing size, increasing drug-likeness, and increasing bioactivity. Using the proposed approach, we achieve a model which generates diverse compounds with predicted DRD2 activity for 95% of sampled molecules, outperforming previously reported methods on this metric.
5.
  • Bender, Andreas, et al. (author)
  • Evaluation guidelines for machine learning tools in the chemical sciences
  • 2022
  • In: Nature Reviews Chemistry. - : Springer Science and Business Media LLC. - 2397-3358. ; 6:6, pp. 428-442
  • Journal article (peer-reviewed)
    • Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
6.
  • Bonner, Stephen, et al. (author)
  • A review of biomedical datasets relating to drug discovery: a knowledge graph perspective
  • 2022
  • In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; In Press
  • Research review (peer-reviewed)
    • Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.
7.
  • Bonner, Stephen, et al. (author)
  • Implications of topological imbalance for representation learning on biomedical knowledge graphs
  • 2022
  • In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; In Press
  • Journal article (peer-reviewed)
    • Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, are promising as they provide a more intuitive representation and are suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modelling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.
8.
  • Bost, Jeremy P., et al. (author)
  • Novel endosomolytic compounds enable highly potent delivery of antisense oligonucleotides
  • 2022
  • In: Communications Biology. - : Springer Science and Business Media LLC. - 2399-3642. ; 5:1
  • Journal article (peer-reviewed)
    • The therapeutic and research potentials of oligonucleotides (ONs) have been hampered in part by their inability to effectively escape endosomal compartments to reach their cytosolic and nuclear targets. Splice-switching ONs (SSOs) can be used with endosomolytic small molecule compounds to increase functional delivery. So far, development of these compounds has been hindered by a lack of high-resolution methods that can correlate SSO trafficking with SSO activity. Here we present in-depth characterization of two novel endosomolytic compounds by using a combination of microscopic and functional assays with high spatiotemporal resolution. This system allows the visualization of SSO trafficking, evaluation of endosomal membrane rupture, and quantification of SSO functional activity on a protein level in the presence of endosomolytic compounds. We confirm that the leakage of SSO into the cytosol occurs in parallel with the physical engorgement of LAMP1-positive late endosomes and lysosomes. We conclude that the new compounds interfere with SSO trafficking to the LAMP1-positive endosomal compartments while inducing endosomal membrane rupture and concurrent ON escape into the cytosol. The efficacy of these compounds advocates their use as novel, potent, and quick-acting transfection reagents for antisense ONs.
9.
  • Buendia, Ruben, et al. (author)
  • Accurate Hit Estimation for Iterative Screening Using Venn-ABERS Predictors
  • 2019
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-9596 .- 1549-960X. ; 59:3, pp. 1230-1237
  • Journal article (peer-reviewed)
    • Iterative screening has emerged as a promising approach to increase the efficiency of high-throughput screening (HTS) campaigns in drug discovery. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models. One of the challenges of iterative screening is to decide how many iterations to perform. This is mainly related to difficulties in estimating the prospective hit rate in any given iteration. In this article, a novel method based on Venn-ABERS predictors is proposed. The method provides accurate estimates of the number of hits retrieved in any given iteration during an HTS campaign. The estimates provide the necessary information to support the decision on the number of iterations needed to maximize the screening outcome. Thus, this method offers a prospective screening strategy for early-stage drug discovery.
10.
  • Buonfiglio, Rosa, et al. (author)
  • Investigating Pharmacological Similarity by Charting Chemical Space
  • 2015
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-9596 .- 1549-960X. ; 55:11, pp. 2375-2390
  • Journal article (peer-reviewed)
    • In this study, biologically relevant areas of the chemical space were analyzed using ChemGPS-NP. This application enables comparing groups of ligands within a multidimensional space based on principal components derived from physicochemical descriptors. Also, 3D visualization of the ChemGPS-NP global map can be used to conveniently evaluate bioactive compound similarity and visually distinguish between different types or groups of compounds. To further establish ChemGPS-NP as a method to accurately represent the chemical space, a comparison with structure-based fingerprints has been performed. Interesting complementarities between the two descriptions of molecules were observed. It has been shown that the accuracy of describing molecules with physicochemical descriptors, as in ChemGPS-NP, is similar to the accuracy of structural fingerprints in retrieving bioactive molecules. Lastly, pharmacological similarity of structurally diverse compounds has been investigated in ChemGPS-NP space. These results further strengthen the case of using ChemGPS-NP as a tool to explore and visualize chemical space.
11.
  • Chen, Hongming, et al. (author)
  • In silico prediction of unbound brain-to-plasma concentration ratio using machine learning algorithms
  • 2011
  • In: Journal of Molecular Graphics and Modelling. - : Elsevier BV. - 1093-3263 .- 1873-4243. ; 29:8, pp. 985-995
  • Journal article (peer-reviewed)
    • Distribution over the blood-brain barrier (BBB) is an important parameter to consider for compounds that will be synthesized in a drug discovery project. Drugs that aim at targets in the central nervous system (CNS) must pass the BBB. In contrast, drugs that act peripherally are often optimised to minimize the risk of CNS side effects by restricting their potential to reach the brain. Historically, most prediction methods have focused on the total compound distribution between the blood plasma and the brain. However, recently it has been proposed that the unbound brain-to-plasma concentration ratio (Kp,uu,brain) is more relevant. In the current study, quantitative Kp,uu,brain prediction models have been built on a set of 173 in-house compounds by using various machine learning algorithms. The best model was shown to be reasonably predictive for the test set of 73 compounds (R² = 0.58). When used for qualitative prediction the model shows an accuracy of 0.85 (Kappa = 0.68). An additional external test set containing 111 marketed CNS active drugs was also classified with the model and 89% of these drugs were correctly predicted as having high brain exposure.
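The abstract reports both accuracy and Cohen's kappa for the qualitative (high/low brain exposure) predictions. As a minimal sketch of how the two relate, the function below computes both from a 2x2 confusion matrix. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def accuracy_and_kappa(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return observed, 0.0  # degenerate case: only one class present
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical counts (not from the paper): 100 compounds, balanced classes.
acc, kappa = accuracy_and_kappa(tp=40, fp=10, fn=10, tn=40)
print(acc, kappa)
```

Kappa discounts the agreement that a classifier would reach by chance, which is why it is lower than the raw accuracy whenever chance agreement is non-zero.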
12.
  • Conn, Jonathan G.M., et al. (author)
  • Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models
  • 2023
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-960X .- 1549-9596. ; 63:4, pp. 1099-1113
  • Journal article (peer-reviewed)
    • Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.
13.
  • Engkvist, Ola, et al. (author)
  • On the relation between retention indexes and the interaction between the solute and the column in gas-liquid chromatography
  • 1996
  • In: Journal of chemical information and computer sciences. - : American Chemical Society (ACS). - 0095-2338 .- 1520-5142. ; 36:6, pp. 1153-1161
  • Journal article (peer-reviewed)
    • Gas-liquid chromatography retention indexes for organic molecules are determined by the interaction between the molecule and the column liquid phase. In this article, a model for calculating the interaction energy between a molecule and a dielectric wall is developed. The model is, to our knowledge, the first attempt to predict retention indexes from the interaction between the molecules and the column. This approach to predicting retention indexes is radically different from methods proposed before. Earlier predictions of the retention indexes have been made using a large number of descriptors, which were linearly correlated to the retention indexes. The developed model has been tested for polycyclic aromatic hydrocarbons mainly with a molecular weight of 302. For the molecules with MW 302 the obtained correlation coefficient is 0.92. A somewhat simpler model is used to fit PAH with different MWs. A correlation coefficient of 0.998 is obtained if the retention indexes were fitted to the logarithm of the interaction energies between the PAHs and the column.
14.
  • Fialková, Vendy, et al. (author)
  • LibINVENT: Reaction-based Generative Scaffold Decoration for in Silico Library Design
  • 2022
  • In: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-960X .- 1549-9596. ; 62:9, pp. 2046-2063
  • Journal article (peer-reviewed)
    • Because of the strong relationship between the desired molecular activity and its structural core, the screening of focused, core-sharing chemical libraries is a key step in lead optimization. Despite the plethora of current research focused on in silico methods for molecule generation, to our knowledge, no tool capable of designing such libraries has been proposed. In this work, we present a novel tool for de novo drug design called LibINVENT. It is capable of rapidly proposing chemical libraries of compounds sharing the same core while maximizing a range of desirable properties. To further help the process of designing focused libraries, the user can list specific chemical reactions that can be used for the library creation. LibINVENT is therefore a flexible tool for generating virtual chemical libraries for lead optimization in a broad range of scenarios. Additionally, the shared core ensures that the compounds in the library are similar, possess desirable properties, and can also be synthesized under the same or similar conditions. The LibINVENT code is freely available in our public repository at https://github.com/MolecularAI/Lib-INVENT. The code necessary for data preprocessing is further available at: https://github.com/MolecularAI/Lib-INVENT-dataset.
15.
  • Genheden, Samuel, et al. (author)
  • Prediction of the Chemical Context for Buchwald-Hartwig Coupling Reactions
  • 2022
  • In: Molecular Informatics. - : Wiley. - 1868-1751 .- 1868-1743. ; 41:8
  • Journal article (peer-reviewed)
    • We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions, i.e., what chemicals to add to the reactants to give a productive reaction. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models show excellent top-3 accuracy of approximately 90%, which suggests strong predictivity. Furthermore, there seems to be an advantage of including multi-label data because the multi-label model shows higher accuracy and better sensitivity for the individual contexts than the single-label model. Although the models are performant, we also show that such models need to be re-trained periodically as there is a strong temporal characteristic to the usage of different contexts. Therefore, a model trained on historical data will decrease in usefulness with time as newer and better contexts emerge and replace older ones. We hypothesize that such significant transitions in the context-usage will likely affect any model predicting chemical contexts trained on historical data. Consequently, training context prediction models warrants careful planning of what data is used for training and how often the model needs to be re-trained.
16.
  • Geylan, Gökçe, 1996, et al. (author)
  • A methodology to correctly assess the applicability domain of cell membrane permeability predictors for cyclic peptides
  • 2024
  • In: Digital Discovery. - : Royal Society of Chemistry. - 2635-098X.
  • Journal article (peer-reviewed)
    • Being able to predict the cell permeability of cyclic peptides is essential for unlocking their potential as a drug modality for intracellular targets. With a wide range of studies of cell permeability but a limited number of data points, the reliability of the machine learning (ML) models to predict previously unexplored chemical spaces becomes a challenge. In this work, we systematically investigate the predictive capability of ML models from the perspective of their extrapolation to never-before-seen applicability domains, with a particular focus on the permeability task. Four predictive algorithms, namely Support-Vector Machine, Random Forest, LightGBM and XGBoost, jointly with a conformal prediction framework were employed to characterize and evaluate the applicability through uncertainty quantification. Efficiency and validity of the models' predictions with multiple calibration strategies were assessed with respect to several external datasets from different parts of the chemical space through a set of experiments. The experiments showed that predictors that generalize well to the applicability domain defined by the training data can fail to achieve similar model performance on other parts of the chemical space. Our study proposes an approach to overcome such limitations by means of improving the efficiency of models without sacrificing the validity. The trade-off between the reliability and informativeness was balanced when the models were calibrated with a subset of the data from the new targeted domain. This study outlines an approach to enable the extrapolation of predictive power and restore the models' reliability via a recalibration strategy without the need for retraining the underlying model. This work outlines peptide predictive model methodology with conformal prediction, focusing on the extrapolation task. Calibrating on the unseen chemical space recovers efficiency and validity, enabling reliable predictions without retraining the models.
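The recalibration idea in this abstract can be sketched with a class-conditional (Mondrian) inductive conformal predictor: the underlying model stays fixed and only the calibration scores are replaced by scores from the new domain. All numbers, label names and the choice of nonconformity score (for example, one minus the predicted probability of the label) are assumptions made for illustration; the abstract does not specify these details.

```python
def p_value(cal_scores, test_score):
    """Conformal p-value: share of calibration scores at least as nonconforming."""
    at_least = sum(1 for s in cal_scores if s >= test_score)
    return (at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal_by_label, test_by_label, significance):
    """Labels whose class-conditional p-value exceeds the significance level."""
    return {
        label
        for label, score in test_by_label.items()
        if p_value(cal_by_label[label], score) > significance
    }


# Hypothetical nonconformity scores; higher means "stranger" for that label.
cal_training_domain = {
    "permeable": [0.1, 0.2, 0.3, 0.4],
    "impermeable": [0.1, 0.2, 0.2, 0.3],
}
cal_new_domain = {
    "permeable": [0.3, 0.4, 0.5, 0.6],
    "impermeable": [0.5, 0.6, 0.7, 0.8],
}
# Scores the unchanged model assigns to one compound from the new domain.
test_compound = {"permeable": 0.45, "impermeable": 0.9}

before = prediction_set(cal_training_domain, test_compound, significance=0.25)
after = prediction_set(cal_new_domain, test_compound, significance=0.25)
print(before, after)
```

In this toy case the original calibration set rejects both labels (an empty, uninformative prediction), while calibrating on new-domain scores yields a single-label prediction. Validity would be checked separately, by comparing the observed error rate on held-out new-domain data against the chosen significance level.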
17.
  • Gummesson Svensson, Hampus, 1996, et al. (author)
  • Autonomous Drug Design with Multi-Armed Bandits
  • 2022
  • In: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. ; , pp. 5584-5592
  • Conference paper (peer-reviewed)
    • Recent developments in artificial intelligence and automation support a new drug design paradigm: autonomous drug design. Under this paradigm, generative models can provide suggestions on thousands of molecules with specific properties, and automated laboratories can potentially make, test and analyze molecules with minimal human supervision. However, since still only a limited number of molecules can be synthesized and tested, an obvious challenge is how to efficiently select among provided suggestions in a closed-loop system. We formulate this task as a stochastic multi-armed bandit problem with multiple plays, volatile arms and similarity information. To solve this task, we adapt previous work on multi-armed bandits to this setting, and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection strategies. According to our simulation results, our approach has the potential to perform better exploration and exploitation of the chemical space for autonomous drug design.
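One of the baselines named in this abstract, decaying-epsilon-greedy selection, can be sketched for the multiple-plays setting as follows. This is only an illustration of the baseline, not the authors' proposed bandit algorithm: the decay schedule, the parameter names and the molecule identifiers are assumptions. Volatile arms are handled simply by passing in whichever candidates are available in the current round.

```python
import random


def decaying_epsilon_greedy(estimates, k, step, rng, eps0=1.0, decay=0.1):
    """Select k distinct arms; each slot explores with probability eps."""
    eps = eps0 / (1.0 + decay * step)  # assumed schedule: shrinks as rounds pass
    remaining = sorted(estimates, key=estimates.get, reverse=True)
    chosen = []
    for _ in range(min(k, len(remaining))):
        if rng.random() < eps:
            pick = rng.choice(remaining)   # explore: any remaining candidate
        else:
            pick = remaining[0]            # exploit: best remaining estimate
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


# Hypothetical reward estimates for the molecules suggested in this round.
estimates = {"mol_a": 0.9, "mol_b": 0.2, "mol_c": 0.7, "mol_d": 0.4}
batch = decaying_epsilon_greedy(estimates, k=2, step=10, rng=random.Random(0))
print(batch)
```

Setting `eps0=0.0` reduces this to greedy selection, and keeping epsilon fixed at 1 gives random sampling, the other two baselines mentioned in the abstract.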
18.
  • Gummesson Svensson, Hampus, 1996, et al. (author)
  • Utilizing reinforcement learning for de novo drug design
  • 2024
  • In: Machine Learning. - 0885-6125 .- 1573-0565.
  • Journal article (peer-reviewed)
    • Deep learning-based approaches for generating novel drug molecules with specific properties have gained a lot of interest in the last few years. Recent studies have demonstrated promising performance for string-based generation of novel molecules utilizing reinforcement learning. In this paper, we develop a unified framework for using reinforcement learning for de novo drug design, wherein we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy to generate novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that it is advantageous to use at least both top-scoring and low-scoring molecules for updating the policy when structural diversity is essential. Using all generated molecules at an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high, intermediate, and low-scoring molecules, off-policy algorithms display the potential of improving the structural diversity and number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design.
19.
  • Guo, Jeff, et al. (author)
  • DockStream: a docking wrapper to enhance de novo molecular design
  • 2021
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream.
20.
  • Guo, Jeff, et al. (author)
  • Improving de novo molecular design with curriculum learning
  • 2022
  • In: Nature Machine Intelligence. - : Springer Science and Business Media LLC. - 2522-5839. ; 4:6, pp. 555-563
  • Journal article (peer-reviewed)
    • While reinforcement learning can be a powerful tool for complex design tasks such as molecular design, training can be slow when problems are either too hard or too easy, as little is learned in these cases. Jeff Guo and colleagues provide a curriculum learning extension to the REINVENT de novo molecular design framework that provides problems of increasing difficulty over epochs such that the training process is more efficient. Reinforcement learning is a powerful paradigm that has gained popularity across multiple domains. However, applying reinforcement learning may come at the cost of multiple interactions between the agent and the environment. This cost can be especially pronounced when the single feedback from the environment is slow or computationally expensive, causing extensive periods of non-productivity. Curriculum learning provides a suitable alternative by arranging a sequence of tasks of increasing complexity, with the aim of reducing the overall cost of learning. Here we demonstrate the application of curriculum learning for drug discovery. We implement curriculum learning in the de novo design platform REINVENT, and apply it to illustrative molecular design problems of different complexities. The results show both accelerated learning and a positive impact on the quality of the output when compared with standard policy-based reinforcement learning.
21.
  • Guo, Jeff, et al. (author)
  • Link-INVENT: generative linker design with reinforcement learning
  • 2023
  • In: Digital Discovery. - : Royal Society of Chemistry (RSC). - 2635-098X. ; 2:2, pp. 392-408
  • Journal article (peer-reviewed)
    • In this work, we present Link-INVENT as an extension to the existing de novo molecular design platform REINVENT. We provide illustrative examples on how Link-INVENT can be applied to fragment linking, scaffold hopping, and PROTAC design case studies where the desirable molecules should satisfy a combination of different criteria. With the help of reinforcement learning, the agent used by Link-INVENT learns to generate favourable linkers connecting molecular subunits that satisfy diverse objectives, facilitating practical application of the model for real-world drug discovery projects. We also introduce a range of linker-specific objectives in the Scoring Function of REINVENT. The code is freely available at https://github.com/MolecularAI/Reinvent.
  •  
22.
  • Hansson, Mari, et al. (författare)
  • On the Relationship between Molecular Hit Rates in High-Throughput Screening and Molecular Descriptors
  • 2014
  • Ingår i: Journal of Biomolecular Screening. - : Elsevier BV. - 1087-0571 .- 1552-454X. ; 19:5, s. 727-737
  • Tidskriftsartikel (refereegranskat)abstract
    • High-throughput screening (HTS) is widely used in the pharmaceutical industry to identify novel chemical starting points for drug discovery projects. The current study focuses on the relationship between molecular hit rate in recent in-house HTS and four common molecular descriptors: lipophilicity (ClogP), size (heavy atom count, HEV), fraction of sp3-hybridized carbons (Fsp3), and fraction of molecular framework (f(MF)). The molecular hit rate is defined as the fraction of times the molecule has been assigned as active in the HTS campaigns where it has been screened. Beta-binomial statistical models were built to model the molecular hit rate as a function of these descriptors. The advantage of the beta-binomial statistical models is that the correlation between the descriptors is taken into account. Higher degree polynomial terms of the descriptors were also added into the beta-binomial statistical model to improve the model quality. The relative influence of different molecular descriptors on molecular hit rate has been estimated, taking into account that the descriptors are correlated to each other through applying beta-binomial statistical modeling. The results show that ClogP has the largest influence on the molecular hit rate, followed by Fsp3 and HEV. f(MF) has only a minor influence besides its correlation with the other molecular descriptors.
  •  
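As a rough illustration of the quantities named in the abstract of entry 22, the sketch below computes a molecular hit rate and the log-probability of observing a given number of actives under a beta-binomial distribution. It is a minimal sketch only: the function names and the shape parameters `alpha` and `beta` are assumptions made for illustration, and the regression on descriptors (ClogP, HEV, Fsp3, f(MF)) used in the study is not reproduced.

```python
from math import lgamma, exp


def molecular_hit_rate(n_active, n_screened):
    """Fraction of HTS campaigns in which the molecule was assigned active."""
    if n_screened <= 0:
        raise ValueError("molecule must have been screened at least once")
    return n_active / n_screened


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_log_pmf(k, n, alpha, beta):
    """log P(k actives in n screens) for a beta-binomial with shapes alpha, beta.

    alpha and beta are illustrative parameters, not values from the paper.
    """
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


if __name__ == "__main__":
    print(molecular_hit_rate(3, 12))
    print(exp(beta_binomial_log_pmf(2, 4, 1.0, 1.0)))
```

With `alpha = beta = 1` the distribution is uniform over the possible counts, which gives a quick sanity check on the implementation.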
23.
  • He, Jiazhen, et al. (författare)
  • Evaluation of reinforcement learning in transformer-based molecular design
  • 2024
  • Ingår i: Journal of Cheminformatics. - 1758-2946 .- 1758-2946. ; 16:1
  • Tidskriftsartikel (refereegranskat)abstract
    • Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules. This provides a starting point for generating similar molecules to a given input molecule, but has limited flexibility regarding user-defined property profiles. Here, we evaluate the effect of reinforcement learning on transformer-based molecular generative models. The generative model can be considered as a pre-trained model with knowledge of the chemical space close to an input compound, while reinforcement learning can be viewed as a tuning phase, steering the model towards chemical space with user-specific desirable properties. The evaluation of two distinct tasks, molecular optimization and scaffold discovery, suggests that reinforcement learning could guide the transformer-based generative model towards the generation of more compounds of interest. Additionally, the impact of pre-trained models, learning steps and learning rates is investigated. Scientific contribution: Our study investigates the effect of reinforcement learning on a transformer-based generative model initially trained for generating molecules similar to starting molecules. The reinforcement learning framework is applied to facilitate multiparameter optimisation of starting molecules. This approach allows for more flexibility in optimizing user-specific property profiles and helps find more ideas of interest.
  •  
24.
  • He, Jiazhen, et al. (författare)
  • Molecular optimization by capturing chemist’s intuition using deep neural networks
  • 2021
  • Ingår i: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 13:1
  • Tidskriftsartikel (refereegranskat)abstract
    • A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist’s intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.
  •  
25.
  • He, Jiazhen, et al. (författare)
  • Transformer-based molecular optimization beyond matched molecular pairs
  • 2022
  • Ingår i: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 14:1
  • Tidskriftsartikel (refereegranskat)abstract
    • Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the MMP concept is widely used by medicinal chemists, it offers limited capability for exploring the space of structural modifications and therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant), respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.
  •  
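The Tanimoto-based pairing mentioned in the abstract of entry 25 can be sketched as follows. This is only an illustration under stated assumptions: fingerprints are represented as plain Python sets of "on" bit indices, the 0.5 threshold and the function names are hypothetical, and the actual fingerprint type and cut-off used to build the ChEMBL datasets are not given in the abstract.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bits."""
    fp_a, fp_b = set(fp_a), set(fp_b)
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similar_pairs(fingerprints, threshold=0.5):
    """Return name pairs whose fingerprints reach the (assumed) threshold."""
    names = sorted(fingerprints)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold
    ]


if __name__ == "__main__":
    toy = {"m1": {1, 2, 3, 4}, "m2": {2, 3, 4, 5}, "m3": {9}}
    print(similar_pairs(toy))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the pairing logic stays the same.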
26.
  • Johansson, Simon, 1994, et al. (författare)
  • De novo generated combinatorial library design
  • 2024
  • Ingår i: Digital Discovery. - 2635-098X. ; 3:1, s. 122-135
  • Tidskriftsartikel (refereegranskat)abstract
    • Artificial intelligence (AI) contributes new methods for designing compounds in drug discovery, ranging from de novo design models suggesting new molecular structures or optimizing existing leads to predictive models evaluating their toxicological properties. However, a limiting factor for the effectiveness of AI methods in drug discovery is the lack of access to high-quality data sets leading to a focus on approaches optimizing data generation. Combinatorial library design is a popular approach for bioactivity testing as a large number of molecules can be synthesized from a limited number of building blocks. We propose a framework for designing combinatorial libraries using a molecular generative model to generate building blocks de novo, followed by using k-determinantal point processes and Gibbs sampling to optimize a selection from the generated blocks. We explore optimization of biological activity, Quantitative Estimate of Drug-likeness (QED) and diversity and the trade-offs between them, both in single-objective and in multi-objective library design settings. Using retrosynthesis models to estimate building block availability, the proposed framework is able to explore the prospective benefit from expanding a stock of available building blocks by synthesis or by purchasing the preferred building blocks before designing a library. In simulation experiments with building block collections from all available commercial vendors near-optimal libraries could be found without synthesis of additional building blocks; in other simulation experiments we showed that even one synthesis step to increase the number of available building blocks could improve library designs when starting with an in-house building block collection of reasonable size.
  •  
27.
  • Johansson, Simon, 1994, et al. (författare)
  • Diverse Data Expansion with Semi-Supervised k-Determinantal Point Processes
  • 2023
  • Ingår i: Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023. ; , s. 5260-5265
  • Konferensbidrag (refereegranskat)abstract
    • Determinantal point processes (DPPs) have become prominent in data summarization and recommender system tasks for their ability to simultaneously model diversity as well as relevance. In practical applications, k-Determinantal point processes (k-DPPs) are used to yield a selection of k items from a set of size N that are the most representative of the set. In this paper, we study a special case of the diverse subset selection problem where a fixed set G0 is already given as a forced recommendation and the task is to determine the remainder of the recommendation G1. The standard k-DPP optimization objectives here can suggest items that are close to optimal when considering only items in G1, but are arbitrarily close to items in G0, i.e., they might not be sufficiently diverse w.r.t. G0. We explore a semi-supervised k-DPP objective that simultaneously considers G0 and G1 and compares the difference between the two recommendations. We demonstrate our findings using multiple examples where the diverse subset selection problem with forced recommendation is important in practice.
  •  
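To make the forced-recommendation setting of entry 27 concrete, the sketch below greedily picks items for G1 by maximizing the determinant of the kernel submatrix over G0 together with the chosen items. This is an assumption-laden illustration, not the objective from the paper: the joint determinant is one plausible reading of "simultaneously considers G0 and G1", greedy selection is swapped in for whatever optimization the authors use, and the RBF kernel on one-dimensional points and all function names are hypothetical.

```python
from math import exp


def rbf_kernel(points, length_scale=1.0):
    """Similarity kernel on 1-D points (illustrative choice of kernel)."""
    return [
        [exp(-((a - b) ** 2) / (2 * length_scale ** 2)) for b in points]
        for a in points
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def submatrix(kernel, indices):
    """Kernel restricted to the given item indices."""
    return [[kernel[i][j] for j in indices] for i in indices]


def greedy_select(kernel, fixed, candidates, k):
    """Greedily choose k candidates that keep det(L[fixed + chosen]) large."""
    chosen, remaining = [], list(candidates)
    for _ in range(k):
        best = max(
            remaining,
            key=lambda i: determinant(submatrix(kernel, fixed + chosen + [i])),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    kernel = rbf_kernel([0.0, 0.1, 5.0, 0.2])
    print(greedy_select(kernel, fixed=[0], candidates=[1, 2, 3], k=1))
```

In the toy example the item at 5.0 is preferred because it is far from the forced item at 0.0, whereas an objective that ignored G0 would see all three candidates as equally good single picks.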
28.
  • Johansson, Simon, 1994, et al. (författare)
  • Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction
  • 2022
  • Ingår i: Molecular Informatics. - : Wiley. - 1868-1743 .- 1868-1751. ; 41:12
  • Tidskriftsartikel (refereegranskat)abstract
    • Computer aided synthesis planning, suggesting synthetic routes for molecules of interest, is a rapidly growing field. The machine learning methods used are often dependent on access to large datasets for training, but finite experimental budgets limit how much data can be obtained from experiments. This suggests the use of schemes for data collection such as active learning, which identifies the data points of highest impact for model accuracy, and which has been used in recent studies with success. However, little has been done to explore the robustness of the methods predicting reaction yield when used together with active learning to reduce the amount of experimental data needed for training. This study aims to investigate the influence of machine learning algorithms and the number of initial data points on reaction yield prediction for two public high-throughput experimentation datasets. Our results show that active learning based on output margin reached a pre-defined AUROC faster than random sampling on both datasets. Analysis of feature importance of the trained machine learning models suggests active learning had a larger influence on the model accuracy when only a few features were important for the model prediction.
  •  
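The "active learning based on output margin" named in the abstract of entry 28 can be illustrated with a small selection routine. The sketch assumes a binary classifier that outputs a probability for the positive class and defines the margin as the absolute difference between the two class probabilities; the exact margin definition, batch size and reaction names are assumptions, since the abstract does not specify them.

```python
def output_margin(p_positive):
    """Absolute gap between the two class probabilities of a binary model."""
    return abs(p_positive - (1.0 - p_positive))


def select_batch(pool_probabilities, batch_size):
    """Pick the unlabeled items the model is least certain about.

    pool_probabilities maps an item name to the predicted probability of the
    positive class (for example, a high-yield reaction).
    """
    ranked = sorted(
        pool_probabilities,
        key=lambda name: output_margin(pool_probabilities[name]),
    )
    return ranked[:batch_size]


if __name__ == "__main__":
    pool = {"rxn_a": 0.95, "rxn_b": 0.52, "rxn_c": 0.10, "rxn_d": 0.40}
    print(select_batch(pool, 2))
```

The selected items would then be measured experimentally, added to the training set, and the model retrained before the next round; random sampling, the baseline in the abstract, would simply draw the batch without ranking.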
29.
  • Menke, Janosch, 1995, et al. (författare)
  • Metis: a python-based user interface to collect expert feedback for generative chemistry models
  • 2024
  • Ingår i: Journal of Cheminformatics. - 1758-2946 .- 1758-2946. ; 16:1
  • Tidskriftsartikel (refereegranskat)abstract
    • One challenge that current de novo drug design models face is a disparity between the user’s expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists’ implicit knowledge, expectation and preferences is key to overcoming this obstacle effectively. While interest in preference-based and human-in-the-loop machine learning in chemistry is continuously increasing, no tool currently exists that enables the collection of standardized and chemistry-specific feedback. Metis is a Python-based open-source graphical user interface (GUI), designed to solve this and enable the collection of chemists’ detailed feedback on molecular structures. The GUI enables chemists to explore and evaluate molecules, offering a user-friendly interface for annotating preferences and specifying desired or undesired structural features. By giving chemists the opportunity to provide detailed feedback, Metis allows researchers to capture the chemist’s implicit knowledge and preferences more efficiently. This knowledge is crucial to align the chemist’s idea with the de novo design agents. The GUI aims to enhance this collaboration between the human and the “machine” by providing an intuitive platform where chemists can interactively provide feedback on molecular structures, aiding in preference learning and refining de novo design strategies. Metis integrates with the existing de novo framework REINVENT, creating a closed-loop system where human expertise can continuously inform and refine the generative models. Scientific contribution: We introduce a novel graphical user interface that allows chemists/researchers to give detailed feedback on substructures and properties of small molecules. This tool can be used to learn the preferences of chemists in order to align de novo drug design models with the chemist’s ideas. The GUI can be customized to fit different needs and projects and enables direct integration into de novo REINVENT runs. We believe that Metis can facilitate the discussion and development of novel ways to integrate human feedback that goes beyond binary decisions of liking or disliking a molecule.
  •  
30.
  • Mercado, Rocio, 1992, et al. (författare)
  • Exploring Graph Traversal Algorithms in Graph-Based Molecular Generation
  • 2022
  • Ingår i: Journal of Chemical Information and Modeling. - : American Chemical Society (ACS). - 1549-960X .- 1549-9596. ; 62:9, s. 2093-2100
  • Tidskriftsartikel (refereegranskat)abstract
    • Here, we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined via either a breadth- or depth-first search algorithm. What we observe is that using a breadth-first traversal leads to better coverage of training data features compared to a depth-first traversal. We have quantified these differences using a variety of metrics on a data set of natural products. These metrics include percent validity, molecular coverage, and molecular shape. We also observe that by using either a breadth- or depth-first traversal it is possible to overtrain the generative models, at which point the results with either graph traversal algorithm are identical.
  •  
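The two node orderings compared in the abstract of entry 30 can be shown on a toy graph. The sketch below is a generic breadth-first and depth-first traversal over an adjacency list; the five-atom branched graph and the function names are made up for illustration, and nothing here reflects how the generative model itself consumes the ordering.

```python
from collections import deque


def bfs_order(adjacency, start):
    """Visit nodes level by level, starting from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def dfs_order(adjacency, start):
    """Follow one branch to its end before backtracking."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbour is visited first.
        for neighbour in reversed(adjacency[node]):
            if neighbour not in seen:
                stack.append(neighbour)
    return order


if __name__ == "__main__":
    # Hypothetical branched skeleton: atom 0 bonded to 1 and 2, 1 to 3, 2 to 4.
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
    print(bfs_order(graph, 0))
    print(dfs_order(graph, 0))
```

On this graph the breadth-first order adds both neighbours of atom 0 before moving outward, while the depth-first order completes the 0-1-3 branch first; the orderings differ only once the graph branches.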
31.
  • Mercado, Rocio, 1992, et al. (författare)
  • Graph networks for molecular design
  • 2021
  • Ingår i: Machine Learning: Science and Technology. - : IOP Publishing. - 2632-2153. ; 2:2
  • Tidskriftsartikel (refereegranskat)abstract
    • Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work compares six different GNN-based generative models in GraphINVENT, and shows that ultimately the gated-graph neural network performs best against the metrics considered here.
  •  
32.
  • Mercado, Rocio, 1992, et al. (författare)
  • Practical notes on building molecular graph generative models
  • 2020
  • Ingår i: Applied AI Letters. - : Wiley. - 2689-5595. ; 1:2
  • Tidskriftsartikel (övrigt vetenskapligt/konstnärligt)abstract
    • Here are presented technical notes and tips on developing graph generative models for molecular design. Although this work stems from the development of GraphINVENT, a Python platform for iterative molecular generation using graph neural networks, this work is relevant to researchers studying other architectures for graph-based molecular design. In this work, technical details that could be of interest to researchers developing their own molecular generative models are discussed, including an overview of previous work in graph-based molecular design and strategies for designing new models. Advice on development and debugging tools which are helpful during code development is also provided. Finally, methods that were tested but which ultimately did not lead to promising results in the development of GraphINVENT are described here in the hope that this will help other researchers avoid pitfalls in development and instead focus their efforts on more promising strategies for graph-based molecular generation.
  •  
33.
  • Mervin, Lewis H., et al. (författare)
  • Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty
  • 2021
  • Ingår i: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 13:1
  • Tidskriftsartikel (refereegranskat)abstract
    • Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4–0.6 log units and when ideal probability estimates were between 0.4–0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident. Finally, the PRF models trained with putative inactives decreased the performance compared to PRF models without putative inactives and this could be because putative inactives were not assigned an experimental pXC50 value, and therefore they were considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models in particular for data where class boundaries overlap with the measurement uncertainty, and where a substantial part of the training data is located close to the classification threshold.
  •  
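One way to read the "ideal probability estimates" in the abstract of entry 33 is as the probability that the true activity lies above the classification threshold, given a measured value and its experimental standard deviation σ. The sketch below assumes normally distributed measurement error and uses the normal cumulative distribution function; this modelling choice, the function name and the example threshold are assumptions for illustration, not details confirmed by the abstract.

```python
from math import erf, sqrt


def ideal_active_probability(pxc50, threshold, sigma):
    """P(true activity >= threshold) for a measurement with Gaussian error.

    pxc50:     measured activity on a log scale
    threshold: activity cut-off separating actives from inactives
    sigma:     experimental standard deviation in log units
    """
    if sigma <= 0:
        # No uncertainty: fall back to a hard binary label.
        return 1.0 if pxc50 >= threshold else 0.0
    z = (pxc50 - threshold) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    for value in (4.0, 4.9, 5.0, 5.1, 6.0):
        print(value, round(ideal_active_probability(value, 5.0, 0.5), 3))
```

Values measured close to the threshold receive probabilities near 0.5 rather than a hard 0 or 1, which is the regime where the abstract reports the largest benefit of the probabilistic treatment.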
34.
  • Mervin, Lewis H., et al. (författare)
  • QSARtuna: An Automated QSAR Modeling Platform for Molecular Property Prediction in Drug Design
  • 2024
  • Ingår i: Journal of Chemical Information and Modeling. - 1549-960X .- 1549-9596. ; 64:14, s. 5365-5374
  • Tidskriftsartikel (refereegranskat)abstract
    • Machine-learning (ML) and deep-learning (DL) approaches to predict the molecular properties of small molecules are increasingly deployed within the design-make-test-analyze (DMTA) drug design cycle to predict molecular properties of interest. Despite this uptake, there are only a few automated packages to aid their development and deployment that also support uncertainty estimation, model explainability, and other key aspects of model usage. This represents a key unmet need within the field, and the large number of molecular representations and algorithms (and associated parameters) means it is nontrivial to robustly optimize, evaluate, reproduce, and deploy models. Here, we present QSARtuna, a molecule property prediction modeling pipeline, written in Python and utilizing the Optuna, Scikit-learn, RDKit, and ChemProp packages, which enables the efficient and automated comparison between molecular representations and machine learning models. The platform was developed by considering the increasingly important aspect of model uncertainty quantification and explainability by design. We provide details for our framework and provide illustrative examples to demonstrate the capability of the software when applied to simple molecular property, reaction/reactivity prediction, and DNA encoded library enrichment classification. We hope that the release of QSARtuna will further spur innovation in automatic ML modeling and provide a platform for education of best practices in molecular property modeling. The code for the QSARtuna framework is made freely available via GitHub.
  •  
35.
  • Mervin, Lewis H., et al. (författare)
  • Uncertainty quantification in drug design
  • 2021
  • Ingår i: Drug Discovery Today. - : Elsevier BV. - 1878-5832 .- 1359-6446. ; 26:2, s. 474-489
  • Forskningsöversikt (refereegranskat)abstract
    • Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions. Uncertainty quantification will become a key component in autonomous decision making and will be crucial for integrating machine learning and chemistry automation to create an autonomous design–make–test–analyse cycle. This review covers the empirical, frequentist and Bayesian approaches to uncertainty quantification, and outlines how they can be used for drug design. We also outline the impact of uncertainty quantification on decision making.
  •  
36.
  • Moore, J. Harry, et al. (författare)
  • Icolos: a workflow manager for structure-based post-processing of de novo generated small molecules
  • 2022
  • Ingår i: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 38:21, s. 4951-4952
  • Tidskriftsartikel (refereegranskat)abstract
    • Summary: We present Icolos, a workflow manager written in Python as a tool for automating complex structure-based workflows for drug design. Icolos can be used as a standalone tool, for example in virtual screening campaigns, or can be used in conjunction with deep learning-based molecular generation facilitated for example by REINVENT, a previously published molecular de novo design package. In this publication, we focus on the internal structure and general capabilities of Icolos, using molecular docking experiments as an illustrative example.
  •  
37.
  • Oldenhof, Martijn, et al. (författare)
  • Industry-Scale Orchestrated Federated Learning for Drug Discovery
  • 2023
  • Ingår i: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023. ; 37, s. 15576-15584
  • Konferensbidrag (refereegranskat)abstract
    • To apply federated learning to drug discovery we developed a novel platform in the context of the European Innovative Medicines Initiative (IMI) project MELLODDY (grant n°831472), which comprised 10 pharmaceutical companies, academic research labs, large industrial companies and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a cryptographic, secure way following each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as different rights and permissions on the platform and administrated in a decentralized way. The MELLODDY platform generated new scientific discoveries which are described in a companion paper.
  •  
38.
  • Rydholm, Emma, 1994, et al. (författare)
  • Expanding the chemical space using a chemical reaction knowledge graph
  • 2024
  • Ingår i: Digital Discovery. - 2635-098X. ; 3:7, s. 1378-1388
  • Tidskriftsartikel (refereegranskat)abstract
    • In this work, we present a new molecular de novo design approach which utilizes a knowledge graph encoding chemical reactions, extracted from the publicly available USPTO (United States Patent and Trademark Office) dataset. Our proposed method can be used to expand the chemical space by performing forward synthesis prediction by finding new combinations of reactants in the knowledge graph and can in this way generate libraries of de novo compounds along with a valid synthetic route. The forward synthesis prediction of novel compounds involves two steps. In the first step, a graph neural network-based link prediction model is used to suggest pairs of existing reactant nodes in the graph that are likely to react. In the second step, product prediction is performed using a molecular transformer model to obtain the potential products for the suggested reactant pairs. We achieve a ROC-AUC score of 0.861 for link prediction in the knowledge graph and for the product prediction, a top-1 accuracy of 0.924. The method's utility is demonstrated by generating a set of de novo compounds by predicting high probability reactions in the USPTO. The generated compounds are diverse in nature and many exhibit drug-like properties. A brief comparison with a template-based library design is provided. Furthermore, evaluation of the potential activity using a quantitative structure-activity relationship (QSAR) model suggested the presence of potential dopamine receptor D2 (DRD2) modulators among the proposed compounds. In summary, our results suggest that the proposed method can expand the easily accessible chemical space, by combining known compounds, and identify novel drug-like compounds for a specific target.
  •  
39.
  • Shevtsov, Oleksii, 1988, et al. (författare)
  • A de novo molecular generation method using latent vector based generative adversarial network
  • 2019
  • Ingår i: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 11:1
  • Tidskriftsartikel (refereegranskat)abstract
    • Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.
  •  
40.
  • Sundin, Iiris, et al. (författare)
  • Human-in-the-loop assisted de novo molecular design
  • 2022
  • Ingår i: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 14:1
  • Tidskriftsartikel (refereegranskat)abstract
    • A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer’s implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user’s feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user’s idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: in the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
41.
  • Thakkar, Amol, et al. (author)
  • Artificial intelligence and automation in computer aided synthesis planning
  • 2021
  • In: Reaction Chemistry and Engineering. - : Royal Society of Chemistry (RSC). - 2058-9883. ; 6:1, pp. 27-51
  • Research review (peer-reviewed), abstract:
    • In this perspective we deal with questions pertaining to the development of synthesis planning technologies over the course of recent years. We first answer the question: what is computer assisted synthesis planning (CASP) and why is it relevant to drug discovery and development? We draw a distinction between discovery and development, focusing on their differing requirements. We highlight the need for an automated synthesis platform which chemists can use to augment their workflows and what it entails. The interaction between experimental and computational scientists is emphasized as a key driver in the development of such technologies. Advances in the development and application of algorithms are then covered, drawing a distinction between physics-based and statistical or data-driven modelling paradigms, their use in, and how they contribute to, augmented drug discovery and development. Finally, developments in the coupling of artificial intelligence and automation are discussed. Throughout, we emphasize the need for an inter-disciplinary approach, blurring the distinction between fields in the pursuit of artificial intelligence and automated platforms that can be integrated into chemical workflows.
42.
  • Viguera Diez, Juan, 1997, et al. (author)
  • Generation of conformational ensembles of small molecules via surrogate model-assisted molecular dynamics
  • 2024
  • In: Machine Learning: Science and Technology. - 2632-2153. ; 5:2
  • Journal article (peer-reviewed), abstract:
    • The accurate prediction of thermodynamic properties is crucial in various fields such as drug discovery and materials design. This task relies on sampling from the underlying Boltzmann distribution, which is challenging using conventional approaches such as simulations. In this work, we introduce surrogate model-assisted molecular dynamics (SMA-MD), a new procedure to sample the equilibrium ensemble of molecules. First, SMA-MD leverages deep generative models to enhance the sampling of slow degrees of freedom. Subsequently, the generated ensemble undergoes statistical reweighting, followed by short simulations. Our empirical results show that SMA-MD generates more diverse and lower energy ensembles than conventional MD simulations. Furthermore, we showcase the application of SMA-MD for the computation of thermodynamical properties by estimating implicit solvation free energies.
43.
  • Vigueras-Guillén, Juan P., et al. (author)
  • Parallel Capsule Networks for Classification of White Blood Cells
  • 2021
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 12907 LNCS, pp. 743-752
  • Conference paper (peer-reviewed), abstract:
    • Capsule Networks (CapsNets) are a machine learning architecture proposed to overcome some of the shortcomings of convolutional neural networks (CNNs). However, CapsNets have mainly outperformed CNNs in datasets where images are small and/or the objects to identify have minimal background noise. In this work, we present a new architecture, parallel CapsNets, which exploits the concept of branching the network to isolate certain capsules, allowing each branch to identify different entities. We applied our concept to the two current types of CapsNet architectures, studying the performance for networks with different layers of capsules. We tested our design on a public, highly unbalanced dataset of acute myeloid leukaemia images (15 classes). Our experiments showed that conventional CapsNets perform similarly to our baseline CNN (ResNeXt-50) but exhibit instability problems. In contrast, parallel CapsNets can outperform ResNeXt-50, are more stable, and show better rotational invariance than both conventional CapsNets and ResNeXt-50.
Publication type
journal article (34)
conference paper (5)
research review (3)
book (1)
Content type
peer-reviewed (41)
other academic/artistic (2)
Author/editor
Engkvist, Ola, 1967 (27)
Engkvist, Ola (16)
Haghir Chehreghani, ... (5)
Tyrchan, Christian (5)
Mercado, Rocio, 1992 (4)
Carlsson, Lars (3)
Voronov, Alexey (3)
Olsson, Simon, 1985 (3)
Bender, Andreas (3)
Johansson, Ulf (2)
Ahlberg, Ernst (2)
Winiwarter, Susanne (2)
Linusson, Henrik (2)
Norinder, Ulf, 1956- (2)
Alvarsson, Jonathan (2)
Spjuth, Ola, 1977- (2)
Viguera Diez, Juan, ... (2)
Schliep, Alexander, ... (2)
Kaski, Samuel (2)
Boström, Henrik (1)
Eklund, Martin (1)
Tyrchan, C (1)
Löfström, Tuve (1)
Hammar, Oscar (1)
Bendtsen, Claus (1)
Esbjörner Winters, E ... (1)
Stevens, Molly M. (1)
Lindh, Roland, 1958- (1)
Svensson, Emma (1)
Colmsjö, Anders (1)
Gallud, Audrey, 1988 (1)
Vilhelmsson Wesén, E ... (1)
Backlund, Anders (1)
Simm, Jaak (1)
Wikberg, Jarl E. S. (1)
Wikberg, Jarl (1)
Noeske, Tobias (1)
Gupta, Dhanu (1)
El-Andaloussi, Samir (1)
Gustafsson, Oskar (1)
Schliep, Alexander (1)
Sandström, Emil (1)
Atance, Sara Romeo (1)
Jorner, Kjell (1)
Vikeved, Elisabet (1)
Fridén, Markus (1)
Dahlén, Anders (1)
Holme, Margaret N. (1)
Andersson, Shalini (1)
Bemgård, Agneta (1)
University
Chalmers tekniska högskola (34)
Uppsala universitet (7)
Göteborgs universitet (5)
Umeå universitet (2)
Örebro universitet (2)
Jönköping University (2)
Kungliga Tekniska Högskolan (1)
Karolinska Institutet (1)
Language
English (43)
Research subject (UKÄ/SCB)
Natural sciences (41)
Engineering and technology (14)
Medical and health sciences (9)
Social sciences (3)