SwePub

Result list for search "L773:1758 2946"

Search: L773:1758 2946

  • Results 1-49 of 49
1.
  • O'Boyle, Noel, et al. (author)
  • Open Data, Open Source and Open Standards in chemistry : The Blue Obelisk five years on
  • 2011
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 3, p. 37-
  • Journal article (peer-reviewed)
    • Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.
  •  
2.
  • Samwald, Matthias, et al. (author)
  • Linked open drug data for pharmaceutical research and development
  • 2011
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 3, p. 19-
  • Journal article (peer-reviewed)
    • There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, over the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected together, which reduces the ease with which insights can be gained. Linking Open Drug Data (LODD) is a task force within the World Wide Web Consortium's (W3C) Health Care and Life Sciences Interest Group (HCLS IG). LODD has surveyed publicly available data about drugs, created Linked Data representations of the data sets, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task force provides recommendations for the best practices of exposing data in a Linked Data representation. In this paper, we present past and ongoing work of LODD and discuss the growing importance of Linked Data as a foundation for pharmaceutical R&D data sharing.
  •  
3.
  • Spjuth, Ola, 1977-, et al. (author)
  • Applications of the InChI in cheminformatics with the CDK and Bioclipse
  • 2013
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 5:14
  • Journal article (peer-reviewed)
    • Background: The InChI algorithms are written in C++ and not available as Java library. Integration into software written in Java therefore requires a bridge between C and Java libraries, provided by the Java Native Interface (JNI) technology. Results: We here describe how the InChI library is used in the Bioclipse workbench and the Chemistry Development Kit (CDK) cheminformatics library. To make this possible, a JNI bridge to the InChI library was developed, JNI-InChI, allowing Java software to access the InChI algorithms. By using this bridge, the CDK project packages the InChI binaries in a module and offers easy access from Java using the CDK API. The Bioclipse project packages and offers InChI as a dynamic OSGi bundle that can easily be used by any OSGi-compliant software, in addition to the regular Java Archive and Maven bundles. Bioclipse itself uses the InChI as a key component and calculates it on the fly when visualizing and editing chemical structures. We demonstrate the utility of InChI with various applications in CDK and Bioclipse, such as decision support for chemical liability assessment, tautomer generation, and for knowledge aggregation using a linked data approach. Conclusions: These results show that the InChI library can be used in a variety of Java library dependency solutions, making the functionality easily accessible by Java software, such as in the CDK. The applications show various ways the InChI has been used in Bioclipse, to enrich its functionality.
  •  
4.
  • Spjuth, Ola, 1977-, et al. (author)
  • Towards interoperable and reproducible QSAR analyses : Exchange of data sets
  • 2010
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 2
  • Journal article (peer-reviewed)
    • BACKGROUND: QSAR/QSPR is a widely used method to relate chemical structures and responses based on experimental observations. In QSAR, chemical structures are expressed as descriptors, which are mathematical representations like calculated properties or enumerated fragments. Many existing QSAR data sets are based on a combination of different software tools mixed with in-house developed solutions, with datasets manually assembled in spreadsheets. Currently there exists no agreed-upon definition of descriptors and no standard for exchanging data sets in QSAR, which together with numerous different descriptor implementations makes it a virtually impossible task to reproduce and validate analyses, and significantly hinders collaborations and re-use of data. RESULTS: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR/QSPR data sets, comprising an open XML format (QSAR-ML) and an open extensible descriptor ontology (Blue Obelisk Descriptor Ontology). The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a data set described by QSAR-ML makes its setup completely reproducible. We also provide an implementation as a set of plugins for Bioclipse that simplifies QSAR data set formation, and allows for exporting in QSAR-ML as well as traditional CSV formats. The implementation facilitates addition of new descriptor implementations, from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. CONCLUSIONS: Standardized QSAR data sets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible dataset formation, solving the problems of defining which software components were used, their versions, and the case of multiple names for the same descriptor. This makes it easy to join, extend, combine data sets and also to work collectively. The presented Bioclipse plugins equip scientists with intuitive tools that make QSAR-ML widely available for the community.
  •  
5.
  •  
6.
  • Willighagen, Egon L, et al. (author)
  • The ChEMBL database as linked open data
  • 2013
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 5:1, p. 23-
  • Journal article (peer-reviewed)
    • Background: Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. Results: This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferencable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying. Conclusions: We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support.
  •  
7.
  •  
8.
  • Öberg, Tomas, et al. (author)
  • Updating existing QSAR models: selection and weighting of new data
  • 2010
  • In: Journal of Cheminformatics. - 1758-2946. ; 2:Suppl 1, p. P19-
  • Journal article (other academic/artistic)
    • Computational chemistry and quantitative structure-activity relationships (QSAR) are foreseen to be extensively used in the implementation of the new REACH regulation for chemicals in Europe. However, for some compound groups the data are too few in number to permit both calibration and testing of a new model. Usage and previously developed or updated models are then viable alternatives. Perfluorocarboxylic acids (PFCAs) and fluorotelomer alcohols (FTOHs) are two groups of environmentally relevant compounds, with unique physical and chemical properties. The subcooled liquid vapour pressure (pL) is one such property, where experimental determinations are limited and far from consistent [1]. Updating is, however, challenging when the new compounds are far outside of the original calibration domain space. But by carefully selecting and weighting only three new compounds, we have been able to update a previously developed general QSAR model [2], to cover the new domain while maintaining predictive performance for the earlier calibration and test data. The optimal weighting scheme was determined from the sample leverages and residuals in the calibration phase [3]. The performance of this re-calibrated model greatly surpassed previous modelling attempts [4], when applied to an external test set of two PFCAs and four FTOHs with pL in the range 0.2-200 Pa; with Q2Ext = 0.994 and RMSEP = 0.190 units of log Pa. The domain coverage also increased from 1% to 51%, for 426 perfluoroalkylated compounds selected from the REACH registration list, the PhysProp database, and the OECD 2006 survey [5]. Selection and weighting of new calibration data can thus facilitate the extension and use of existing QSAR models. This investigation was supported by the EU FP7 project CADASTER (grant agreement no. 212668).
  •  
9.
  • Napolitano, F, et al. (author)
  • Drug repositioning: a machine-learning approach through data integration
  • 2013
  • In: Journal of cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 5:1, p. 30-
  • Journal article (peer-reviewed)
    • Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of the gene expression and the scarcity of genomic data for many diseases are important limitations to such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, not considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) on the distances of the drugs based on how similar are their chemical structures, ii) on how close are their targets within the protein-protein interaction network, and iii) on how correlated are the gene expression patterns after treatment. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinics of known compounds for novel therapeutic uses.
  •  
10.
  • Aghdam, Rosa, et al. (author)
  • Using informative features in machine learning based method for COVID-19 drug repurposing
  • 2021
  • In: Journal of Cheminformatics. - : Springer Nature. - 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • Coronavirus disease 2019 (COVID-19) is caused by a novel virus named Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). This virus induced a large number of deaths and millions of confirmed cases worldwide, creating a serious danger to public health. However, there are no specific therapies or drugs available for COVID-19 treatment. While new drug discovery is a long process, repurposing available drugs for COVID-19 can help recognize treatments with known clinical profiles. Computational drug repurposing methods can reduce the cost, time, and risk of drug toxicity. In this work, we build a graph as a COVID-19 related biological network. This network is related to virus targets or their associated biological processes. We select essential proteins in the constructed biological network that lead to a major disruption in the network. Our method from these essential proteins chooses 93 proteins related to COVID-19 pathology. Then, we propose multiple informative features based on drug-target and protein-protein interaction information. Through these informative features, we find five appropriate clusters of drugs that contain some candidates as potential COVID-19 treatments. To evaluate our results, we provide statistical and clinical evidence for our candidate drugs. From our proposed candidate drugs, 80% of them were studied in other studies and clinical trials.
  •  
11.
  • Ahmed, Laeeq, et al. (author)
  • Efficient iterative virtual screening with Apache Spark and conformal prediction
  • 2018
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 10
  • Journal article (peer-reviewed)
    • Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands are docked, the model is retrained and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
  •  
12.
  • Ahmed, Laeeq, et al. (author)
  • Predicting target profiles with confidence as a service using docking scores
  • 2020
  • In: Journal of Cheminformatics. - : Springer Nature. - 1758-2946. ; 12:1
  • Journal article (peer-reviewed)
    • Background: Identifying and assessing ligand-target binding is a core component in early drug discovery as one or more unwanted interactions may be associated with safety issues. Contributions: We present an open-source, extendable web service for predicting target profiles with confidence using machine learning for a panel of 7 targets, where models are trained on molecular docking scores from a large virtual library. The method uses conformal prediction to produce valid measures of prediction efficiency for a particular confidence level. The service also offers the possibility to dock chemical structures to the panel of targets with QuickVina on individual compound basis. Results: The docking procedure and resulting models were validated by docking well-known inhibitors for each of the 7 targets using QuickVina. The model predictions showed comparable performance to molecular docking scores against an external validation set. The implementation as publicly available microservices on Kubernetes ensures resilience, scalability, and extensibility.
  •  
13.
  • Alvarsson, Jonathan, et al. (author)
  • Large-scale ligand-based predictive modelling using support vector machines
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
  •  
14.
  • Arvidsson McShane, Staffan, et al. (author)
  • CPSign : conformal prediction for cheminformatics modeling
  • 2024
  • In: Journal of Cheminformatics. - : Springer Nature. - 1758-2946. ; 16:1
  • Journal article (peer-reviewed)
    • Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign. Scientific contribution: CPSign provides a single software that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance, showcased with a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.
  •  
15.
  • Bahl, A, et al. (author)
  • PROTEOMAS: a workflow enabling harmonized proteomic meta-analysis and proteomic signature mapping
  • 2023
  • In: Journal of cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 15:1, p. 34-
  • Journal article (peer-reviewed)
    • Toxicological evaluation of substances in regulation still often relies on animal experiments. Understanding the substances’ mode-of-action is crucial to develop alternative test strategies. Omics methods are promising tools to achieve this goal. Until now, most attention was focused on transcriptomics, while proteomics is not yet routinely applied in toxicology despite the large number of datasets available in public repositories. Exploiting the full potential of these datasets is hampered by differences in measurement procedures and follow-up data processing. Here we present the tool PROTEOMAS, which allows meta-analysis of proteomic data from public origin. The workflow was designed for analyzing proteomic studies in a harmonized way and to ensure transparency in the analysis of proteomic data for regulatory purposes. It agrees with the Omics Reporting Framework guidelines of the OECD with the intention to integrate proteomics to other omic methods in regulatory toxicology. The overarching aim is to contribute to the development of AOPs and to understand the mode of action of substances. To demonstrate the robustness and reliability of our workflow we compared our results to those of the original studies. As a case study, we performed a meta-analysis of 25 proteomic datasets to investigate the toxicological effects of nanomaterials at the lung level. PROTEOMAS is an important contribution to the development of alternative test strategies enabling robust meta-analysis of proteomic data. This workflow commits to the FAIR principles (Findable, Accessible, Interoperable and Reusable) of computational protocols.
  •  
16.
  • Bahl, A, et al. (author)
  • PROTEOMAS: a workflow enabling harmonized proteomic meta-analysis and proteomic signature mapping
  • 2023
  • In: Journal of cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 15:1, p. 34-
  • Journal article (peer-reviewed)
    • Toxicological evaluation of substances in regulation still often relies on animal experiments. Understanding the substances’ mode-of-action is crucial to develop alternative test strategies. Omics methods are promising tools to achieve this goal. Until now, most attention was focused on transcriptomics, while proteomics is not yet routinely applied in toxicology despite the large number of datasets available in public repositories. Exploiting the full potential of these datasets is hampered by differences in measurement procedures and follow-up data processing. Here we present the tool PROTEOMAS, which allows meta-analysis of proteomic data from public origin. The workflow was designed for analyzing proteomic studies in a harmonized way and to ensure transparency in the analysis of proteomic data for regulatory purposes. It agrees with the Omics Reporting Framework guidelines of the OECD with the intention to integrate proteomics to other omic methods in regulatory toxicology. The overarching aim is to contribute to the development of AOPs and to understand the mode of action of substances. To demonstrate the robustness and reliability of our workflow we compared our results to those of the original studies. As a case study, we performed a meta-analysis of 25 proteomic datasets to investigate the toxicological effects of nanomaterials at the lung level. PROTEOMAS is an important contribution to the development of alternative test strategies enabling robust meta-analysis of proteomic data. This workflow commits to the FAIR principles (Findable, Accessible, Interoperable and Reusable) of computational protocols.
  •  
17.
  • Bajorath, J., et al. (author)
  • Chemoinformatics and artificial intelligence colloquium: progress and challenges in developing bioactive compounds
  • 2022
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 14:1
  • Journal article (peer-reviewed)
    • We report the main conclusions of the first Chemoinformatics and Artificial Intelligence Colloquium, Mexico City, June 15–17, 2022. Fifteen lectures were presented during a virtual public event with speakers from industry, academia, and not-for-profit organizations. Twelve hundred and ninety students and academics from more than 60 countries participated. During the meeting, applications, challenges, and opportunities in drug discovery, de novo drug design, ADME-Tox (absorption, distribution, metabolism, excretion and toxicity) property predictions, organic chemistry, peptides, and antibiotic resistance were discussed. The program along with the recordings of all sessions are freely available at https://www.difacquim.com/english/events/2022-colloquium/.
  •  
18.
  • Bálint, Mónika, et al. (author)
  • Systematic exploration of multiple drug binding sites
  • 2017
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 9:65
  • Journal article (peer-reviewed)
    • Background: Targets with multiple (prerequisite or allosteric) binding sites have an increasing importance in drug design. Experimental determination of atomic resolution structures of ligands weakly bound to multiple binding sites is often challenging. Blind docking has been widely used for fast mapping of the entire target surface for multiple binding sites. Reliability of blind docking is limited by approximations of hydration models, simplified handling of molecular flexibility, and imperfect search algorithms. Results: To overcome such limitations, the present study introduces Wrap 'n' Shake (WnS), an atomic resolution method that systematically "wraps" the entire target into a monolayer of ligand molecules. Functional binding sites are extracted by a rapid molecular dynamics shaker. WnS is tested on biologically important systems such as mitogen-activated protein, tyrosine-protein kinases, key players of cellular signaling, and farnesyl pyrophosphate synthase, a target of antitumor agents.
  •  
19.
  • Bandini, Elena, et al. (author)
  • Physicochemical modelling of the retention mechanism of temperature-responsive polymeric columns for HPLC through machine learning algorithms
  • 2024
  • In: Journal of Cheminformatics. - : Springer Nature. - 1758-2946. ; 16:1
  • Journal article (peer-reviewed)
    • Temperature-responsive liquid chromatography (TRLC) offers a promising alternative to reversed-phase liquid chromatography (RPLC) for environmentally friendly analytical techniques by utilizing pure water as a mobile phase, eliminating the need for harmful organic solvents. TRLC columns, packed with temperature-responsive polymers coupled to silica particles, exhibit a unique retention mechanism influenced by temperature-induced polymer hydration. An investigation of the physicochemical parameters driving separation at high and low temperatures is crucial for better column manufacturing and selectivity control. Assessment of predictability using a dataset of 139 molecules analyzed at different temperatures elucidated the molecular descriptors (MDs) relevant to retention mechanisms. Linear regression, support vector regression (SVR), and tree-based ensemble models were evaluated, with no standout performer. The precision, accuracy, and robustness of models were validated through metrics, such as r and mean absolute error (MAE), and statistical analysis. At , logP predominantly influenced retention, akin to reversed-phase columns, while at , complex interactions with lipophilic and negative MDs, along with specific functional groups, dictated retention. These findings provide deeper insights into TRLC mechanisms, facilitating method development and maximizing column potential.
  •  
20.
  • Bohman, Björn (author)
  • Determining the parent and associated fragment formulae in mass spectrometry via the parent subformula graph
  • 2023
  • In: Journal of Cheminformatics. - 1758-2946. ; 15
  • Journal article (peer-reviewed)
    • Background: Identifying the molecular formula and fragmentation reactions of an unknown compound from its mass spectrum is crucial in areas such as natural product chemistry and metabolomics. We propose a method for identifying the correct candidate formula of an unidentified natural product from its mass spectrum. The method involves scoring the plausibility of parent candidate formulae based on a parent subformula graph (PSG), and two possible metrics relating to the number of edges in the PSG. This method is applicable to both electron-impact mass spectrometry (EI-MS) and tandem mass spectrometry (MS/MS) data. Additionally, this work introduces the two-dimensional fragmentation plot (2DFP) for visualizing PSGs. Results: Our results suggest that incorporating information regarding the edges of the PSG results in enhanced performance in correctly identifying parent formulae, in comparison to the more well-accepted "MS/MS score", on the 2016 Computational Assessment of Small Molecule Identification (CASMI 2016) data set (76.3 vs 58.9% correct formula identification) and the Research Centre for Toxic Compounds in the Environment (RECETOX) data set (66.2% vs 59.4% correct formula identification). In the extension of our method to identify the correct candidate formula from complex EI-MS data of semiochemicals, our method again performed better (correct formula appearing in the top 4 candidates in 20/23 vs 7/23 cases) than the MS/MS score, and enables the rapid identification of both the correct parent ion mass and the correct parent formula with minimal expert intervention. Conclusion: Our method reliably identifies the correct parent formula even when the mass information is ambiguous. Furthermore, should parent formula identification be successful, the majority of associated fragment formulae can also be correctly identified. Our method can also identify the parent ion and its associated fragments in EI-MS spectra where the identity of the parent ion is unclear due to low quantities and overlapping compounds. Finally, our method does not inherently require empirical fitting of parameters or statistical learning, meaning it is easy to implement and extend upon. Scientific contribution: Developed, implemented and tested new metrics for assessing plausibility of candidate molecular formulae obtained from HR-MS data.
  •  
21.
  • Capuccini, Marco, et al. (author)
  • Large-scale virtual screening on public cloud resources with Apache Spark
  • 2017
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 9
  • Journal article (peer-reviewed)
    • Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries.
  •  
22.
  • Guo, Jeff, et al. (author)
  • DockStream: a docking wrapper to enhance de novo molecular design
  • 2021
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream.
  •  
23.
  • He, Jiazhen, et al. (author)
  • Molecular optimization by capturing chemist’s intuition using deep neural networks
  • 2021
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist’s intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.
  •  
24.
  • He, Jiazhen, et al. (author)
  • Transformer-based molecular optimization beyond matched molecular pairs
  • 2022
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 14:1
  • Journal article (peer-reviewed)
    • Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although MMPs are a widely used strategy among medicinal chemists, they offer limited capability in terms of exploring the space of structural modifications and therefore do not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant), respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.
  •  
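The abstract of entry 24 mentions building molecular-pair datasets by Tanimoto similarity. As a rough illustration only, the sketch below computes the Tanimoto coefficient on fingerprints represented as sets of on-bit indices and keeps pairs above a cutoff. The toy fingerprints, the function names and the 0.5 threshold are assumptions made for this example; they are not taken from the paper, which works on ChEMBL structures.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_pairs(fingerprints, threshold):
    """Return (name_i, name_j, similarity) for every pair at or above the threshold."""
    pairs = []
    for (name_i, fp_i), (name_j, fp_j) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fp_i, fp_j)
        if sim >= threshold:
            pairs.append((name_i, name_j, sim))
    return pairs


# Hypothetical toy fingerprints (not real molecules).
toy = {"mol_a": {1, 2, 3}, "mol_b": {2, 3, 4}, "mol_c": {7, 8}}
pairs = similar_pairs(toy, threshold=0.5)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the pairing logic stays the same.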
25.
  • Jespers, Willem, et al. (author)
  • QligFEP : an automated workflow for small molecule free energy calculations in Q
  • 2019
  • In: Journal of Cheminformatics. - : BMC. - 1758-2946. ; 11
  • Journal article (peer-reviewed)
    • The process of ligand binding to a biological target can be represented as the equilibrium between the relevant solvated and bound states of the ligand. This is the basis of structure-based, rigorous methods such as the estimation of relative binding affinities by free energy perturbation (FEP). Despite the growing capacity of computing power and the development of more accurate force fields, a high throughput application of FEP is currently hampered by the need, in current schemes, for an expert user to define the alchemical transformations between molecules in the series explored. Here, we present QligFEP, a solution to this problem using an automated workflow for FEP calculations based on a dual topology approach. In this scheme, the starting poses of each of the two ligands, for which the relative affinity is to be calculated, are explicitly present in the MD simulations associated with the (dual topology) FEP transformation, making the perturbation pathway between the two ligands univocal. We show that this generalized method can be applied to accurately estimate solvation free energies for amino acid sidechain mimics, as well as the binding affinity shifts due to the chemical changes typical of lead optimization processes. This is illustrated in a number of protein systems extracted from other FEP studies in the literature: inhibitors of CDK2 kinase and a series of A2A adenosine G protein-coupled receptor antagonists, where the results obtained with QligFEP are in excellent agreement with experimental data. In addition, our protocol allows for scaffold hopping perturbations to identify the binding affinities between different core scaffolds, which we illustrate with a series of Chk1 kinase inhibitors. QligFEP is implemented in the open-source MD package Q, and works with the most common family of force fields: OPLS, CHARMM and AMBER.
  •  
26.
  • Joeres, R., et al. (author)
  • GlyLES: Grammar-based Parsing of Glycans from IUPAC-condensed to SMILES
  • 2023
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 15:1
  • Journal article (peer-reviewed)
    • Glycans are important polysaccharides on cellular surfaces that are bound to glycoproteins and glycolipids. These are one of the most common post-translational modifications of proteins in eukaryotic cells. They play important roles in protein folding, cell-cell interactions, and other extracellular processes. Changes in glycan structures may influence the course of different diseases, such as infections or cancer. Glycans are commonly represented using the IUPAC-condensed notation. IUPAC-condensed is a textual representation of glycans operating on the same topological level as the Symbol Nomenclature for Glycans (SNFG) that assigns colored, geometrical shapes to the main monomers. These symbols are then connected in tree-like structures, visualizing the glycan structure on a topological level. Yet for a representation on the atomic level, notations such as SMILES should be used. To our knowledge, there is no easy-to-use, general, open-source, and offline tool to convert the IUPAC-condensed notation to SMILES. Here, we present the open-access Python package GlyLES for the generalizable generation of SMILES representations out of IUPAC-condensed representations. GlyLES uses a grammar to read in the monomer tree from the IUPAC-condensed notation. From this tree, the tool can compute the atomic structures of each monomer based on their IUPAC-condensed descriptions. In the last step, it merges all monomers into the atomic structure of a glycan in the SMILES notation. GlyLES is the first package that allows conversion from the IUPAC-condensed notation of glycans to SMILES strings. This may have multiple applications, including straightforward visualization, substructure search, molecular modeling and docking, and a new featurization strategy for machine-learning algorithms. GlyLES is available at https://github.com/kalininalab/GlyLES.
  •  
27.
  • Johansson, Marcus, et al. (author)
  • Automatic procedure for generating symmetry adapted wavefunctions
  • 2017
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 9:1
  • Journal article (peer-reviewed)
    • Automatic detection of point groups as well as symmetrisation of molecular geometry and wavefunctions are useful tools in computational quantum chemistry. Algorithms for developing these tools as well as an implementation are presented. The symmetry detection algorithm is a clustering algorithm for symmetry invariant properties, combined with logical deduction of possible symmetry elements using the geometry of sets of symmetrically equivalent atoms. An algorithm for determining the symmetry adapted linear combinations (SALCs) of atomic orbitals is also presented. The SALCs are constructed with the use of projection operators for the irreducible representations, as well as subgroups for determining splitting fields for a canonical basis. The character tables for the point groups are auto generated, and the algorithm is described. Symmetrisation of molecules uses a projection into the totally symmetric space, whereas for wavefunctions projection as well as partner function determination and averaging is used. The software has been released as a stand-alone, open source library under the MIT license and integrated into both computational and molecular modelling software.
  •  
28.
  • Kensert, Alexander, et al. (author)
  • Evaluating parameters for ligand-based modeling with random forest on sparse data sets
  • 2018
  • In: Journal of Cheminformatics. - : BMC. - 1758-2946. ; 10
  • Journal article (peer-reviewed)
    • Ligand-based predictive modeling is widely used to generate predictive models aiding decision making in e.g. drug discovery projects. With growing data sets and requirements on low modeling time comes the necessity to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, utilizing Morgan fingerprints of different radii and hash sizes, and compared with molecular signature descriptors of different heights. We specifically evaluated the effect these parameters had on modeling time, predictive performance, and memory requirements using two implementations of random forest: Scikit-learn and FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints (p ≤ 0.05), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high dimensional sparse data. Both support vector machines and random forest performed equally well, but results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.
  •  
29.
  • Kovačević, Goran, et al. (author)
  • Luscus: molecular viewer and editor for MOLCAS.
  • 2015
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 7, s. 16-16
  • Journal article (peer-reviewed)
    • A novel program for graphical display and editing of molecular systems, luscus, is described. The program allows fast and easy building and/or editing of different molecular structures, up to several thousand atoms in size. Luscus is able to visualise dipole moments, normal modes, molecular orbitals, electron densities and electrostatic potentials. In addition, simple geometrical objects can be rendered in order to reveal a geometrical feature or a physical quantity. The program is developed as a graphical interface for the MOLCAS program package; however, its adaptive nature makes it possible to use luscus with other computational program packages and chemical formats. All data files are opened via simple plug-ins, which makes it easy to implement a new file format in luscus. The ease of editing molecular geometries makes luscus suitable for teaching students chemical concepts and molecular modelling. Graphical abstract: Screenshot of the luscus program showing a molecular orbital.
  •  
30.
  • Lampa, Samuel, et al. (author)
  • Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.
  •  
31.
  • Lapins, Maris, et al. (author)
  • A confidence predictor for logD using conformal regression and a support-vector machine
  • 2018
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 10:1
  • Journal article (peer-reviewed)
    • Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text], with the best performing nonconformity measure having a median prediction interval of [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as a URI resolver service.
  •  
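Entry 31 describes conformal regression that outputs prediction intervals at a chosen confidence level. A minimal sketch of the general split (inductive) conformal idea follows, assuming the simplest nonconformity score, the absolute residual on a held-out calibration set. The function names and toy numbers are hypothetical, and the paper's own nonconformity measures and SVM model are not reproduced here.

```python
import math


def conformal_half_width(cal_true, cal_pred, confidence):
    """Half-width of a split-conformal interval from calibration residuals.

    Nonconformity score (an assumption of this sketch): |y - y_hat|.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Rank of the calibration score to use; the small epsilon guards
    # against floating-point noise in (n + 1) * confidence.
    k = math.ceil((n + 1) * confidence - 1e-9)
    if k > n:
        return math.inf  # too few calibration points for this confidence level
    return scores[k - 1]


def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a point prediction."""
    return (point_prediction - half_width, point_prediction + half_width)


# Hypothetical calibration data (e.g. measured vs. predicted logD values).
cal_true = [1.0, 2.0, 3.0, 4.0]
cal_pred = [1.5, 2.25, 2.0, 4.125]

hw_80 = conformal_half_width(cal_true, cal_pred, confidence=0.8)
interval = prediction_interval(2.0, conformal_half_width(cal_true, cal_pred, 0.5))
```

With only four calibration points a 90% interval is unbounded, which is why real applications use large calibration sets.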
32.
  • Lukashina, Nina, et al. (author)
  • SimVec : predicting polypharmacy side effects for new drugs
  • 2022
  • In: Journal of Cheminformatics. - : Springer Nature. - 1758-2946. ; 14
  • Journal article (peer-reviewed)
    • Polypharmacy refers to the administration of multiple drugs on a daily basis. It has demonstrated effectiveness in treating many complex diseases, but it has a higher risk of adverse drug reactions. Hence, the prediction of polypharmacy side effects is an essential step in drug testing, especially for new drugs. This paper shows that the current knowledge graph (KG) based state-of-the-art approach to polypharmacy side effect prediction does not work well for new drugs, as they have a low number of known connections in the KG. We propose a new method, SimVec, that solves this problem by enhancing the KG structure with a structure-aware node initialization and weighted drug similarity edges. We also devise a new 3-step learning process, which iteratively updates node embeddings related to side effects edges, similarity edges, and drugs with limited knowledge. Our model significantly outperforms existing KG-based models. Additionally, we examine the problem of negative relations generation and show that the cache-based approach works best for polypharmacy tasks.
  •  
33.
  • Mervin, Lewis H., et al. (author)
  • Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty
  • 2021
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4 and 0.6 log units and when ideal probability estimates were between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident. Finally, the PRF models trained with putative inactives decreased the performance compared to PRF models without putative inactives, and this could be because putative inactives were not assigned an experimental pXC50 value, and therefore they were considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty, and where a substantial part of the training data is located close to the classification threshold.
  •  
34.
  • Morger, Andrea, et al. (author)
  • Assessing the calibration in toxicological in vitro models with conformal prediction
  • 2021
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data's descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, exchanging only the calibration data, is convenient as it does not require retraining of the underlying model.
  •  
35.
  • Norinder, Ulf, 1956-, et al. (author)
  • Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning
  • 2021
  • In: Journal of Cheminformatics. - : Chemistry Central. - 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
    • Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
  •  
36.
  • Schaduangrat, Nalini, et al. (author)
  • Towards reproducible computational drug discovery
  • 2020
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 12:1
  • Research review (peer-reviewed)
    • The reproducibility of experiments has been a long standing impediment for further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state-of-the-art on reproducible research, (2) research documentation (e.g. electronic laboratory notebook, Jupyter notebook, etc.), (3) science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues on model development and deployment, (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share data and programming codes used for numerical calculations, not only to facilitate reproducibility but also to foster collaborations (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design would adopt an open approach towards the collection, curation and sharing of data/code.
  •  
37.
  • Seal, Srijit, et al. (author)
  • Merging bioactivity predictions from cell morphology and chemical fingerprint models using similarity to training data
  • 2023
  • In: Journal of Cheminformatics. - : BMC. - 1758-2946. ; 15:1
  • Journal article (peer-reviewed)
    • The applicability domain of machine learning models trained on structural fingerprints for the prediction of biological endpoints is often limited by the lack of diversity of chemical space of the training data. In this work, we developed similarity-based merger models which combined the outputs of individual models trained on cell morphology (based on Cell Painting) and chemical structure (based on chemical fingerprints) and the structural and morphological similarities of the compounds in the test dataset to compounds in the training dataset. We applied these similarity-based merger models using logistic regression models on the predictions and similarities as features and predicted assay hit calls of 177 assays from ChEMBL, PubChem and the Broad Institute (where the required Cell Painting annotations were available). We found that the similarity-based merger models outperformed other models with an additional 20% of assays (79 out of 177 assays) with an AUC > 0.70, compared with 65 out of 177 assays using structural models and 50 out of 177 assays using Cell Painting models. Our results demonstrated that similarity-based merger models combining structure and cell morphology models can more accurately predict a wide range of biological assay outcomes and further expanded the applicability domain by better extrapolating to new structural and morphology spaces.
  •  
38.
  • Shevtsov, Oleksii, 1988, et al. (author)
  • A de novo molecular generation method using latent vector based generative adversarial network
  • 2019
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 11:1
  • Journal article (peer-reviewed)
    • Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.
  •  
39.
  • Simeon, Saw, et al. (author)
  • osFP : a web server for predicting the oligomeric states of fluorescent proteins
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • Background: Currently, monomeric fluorescent proteins (FP) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate protein engineering efforts to create monomeric FPs. To the best of our knowledge, this study represents the first computational model for predicting and analyzing FP oligomerization directly from the amino acid sequence. Results: After data curation, an exhaustive data set consisting of 397 non-redundant FP oligomeric states was compiled from the literature. Results from benchmarking of the protein descriptors revealed that the model built with amino acid composition descriptors was the top performing model, with accuracy, sensitivity and specificity in excess of 80% and MCC greater than 0.6 for all three data subsets (i.e. training, tenfold cross-validation and external sets). The model provided insights on the important residues governing the oligomerization of FP. To maximize the benefit of the generated predictive model, it was implemented as a web server under the R programming environment. Conclusion: osFP affords a user-friendly interface that can be used to predict the oligomeric state of FP using the protein sequence. The advantage of osFP is that it is platform-independent, meaning that it can be accessed via a web browser on any operating system and device. osFP is freely accessible at http://codes.bio/osfp/ while the source code and data set are provided on GitHub at https://github.com/chaninn/osFP/.
  •  
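Entry 39 reports that the best model was built on amino acid composition descriptors. As an illustration of what such a descriptor generally looks like, the sketch below computes the fraction of each of the 20 standard residues in a sequence. This is a generic Python sketch under that assumption, not the osFP implementation (which the abstract says runs under R), and the example sequence is made up.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Non-standard characters are ignored; the fractions sum to 1.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: count / total for aa, count in zip(AMINO_ACIDS, counts)}


# Hypothetical four-residue sequence, used only to show the output shape.
composition = amino_acid_composition("AACD")
```

The resulting 20-value vector can be fed to any classifier as a fixed-length feature representation of a variable-length sequence.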
40.
  •  
41.
  • Spjuth, Ola, 1977-, et al. (author)
  • XMetDB : an open access database for xenobiotic metabolism
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • Xenobiotic metabolism is an active research topic but the limited amount of openly available high-quality biotransformation data constrains predictive modeling. Current databases often default to commonly available information: which enzyme metabolizes a compound, but neither experimental conditions nor the atoms that undergo metabolization are captured. We present XMetDB, an open access database for drugs and other xenobiotics and their respective metabolites. The database contains chemical structures of xenobiotic biotransformations with substrate atoms annotated as reaction centra, the resulting product formed, and the catalyzing enzyme, type of experiment, and literature references. Associated with the database is a web interface for the submission and retrieval of experimental metabolite data for drugs and other xenobiotics in various formats, and a web API for programmatic access is also available. The database is open for data deposition, and a curation scheme is in place for quality control. An extensive guide on how to enter experimental data is available from the XMetDB wiki. XMetDB formalizes how biotransformation data should be reported, and the openly available systematically labeled data is a big step forward towards better models for predictive metabolism.
  •  
42.
  • Sreenivasan, Akshai P., et al. (author)
  • Predicting protein network topology clusters from chemical structure using deep learning
  • 2022
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 14:1
  • Journal article (peer-reviewed), abstract
    • Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity.
  •  
43.
  • Sundin, Iiris, et al. (author)
  • Human-in-the-loop assisted de novo molecular design
  • 2022
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 14:1
  • Journal article (peer-reviewed), abstract
    • A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer’s implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user’s feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user’s idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: in the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in fewer than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
  •  
44.
  • Svensson, Fredrik, et al. (author)
  • Maximizing gain in high-throughput screening using conformal prediction
  • 2018
  • In: Journal of Cheminformatics. - London : Chemistry Central. - 1758-2946. ; 10:1
  • Journal article (peer-reviewed), abstract
    • Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high throughput approaches. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models, resulting in more efficient screening. One way to evaluate screening is to consider the cost of screening compared to the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim to maximise gain in iterative screening. Using this setup we were able to show that by evaluating the predictions on the training data, very accurate predictions on what settings will produce the highest gain on the test data can be made. We evaluate the approach on 12 bioactivity datasets from PubChem training the models using 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8-10 out of the 12 datasets. Broadly, our approach can predict what strategy generates the highest gain based on the results of the cost-gain evaluation: to screen the compounds predicted to be active, to screen all the remaining data, or not to screen any additional compounds. When the algorithm indicates that the predicted active compounds should be screened, our approach also indicates what confidence level to apply in order to maximize gain. Hence, our approach facilitates decision-making and allocation of the resources where they deliver the most value by indicating in advance the likely outcome of a screening campaign.
  •  
45.
  • van Rijn, J, et al. (author)
  • European Registry of Materials: global, unique identifiers for (undisclosed) nanomaterials
  • 2022
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 14:1, s. 57-
  • Journal article (peer-reviewed), abstract
    • Management of nanomaterials and nanosafety data needs to operate under the FAIR (findability, accessibility, interoperability, and reusability) principles and this requires a unique, global identifier for each nanomaterial. Existing identifiers may not always be applicable or sufficient to definitively identify the specific nanomaterial used in a particular study, resulting in the use of textual descriptions in research project communications and reporting. To ensure that internal project documentation can later be linked to publicly released data and knowledge for the specific nanomaterials, or even to specific batches and variants of nanomaterials utilised in that project, a new identifier is proposed: the European Registry of Materials Identifier. We here describe the background to this new identifier, including FAIR interoperability as defined by FAIRSharing, identifiers.org, Bioregistry, and the CHEMINF ontology, and show how it complements other identifiers such as CAS numbers and the ongoing efforts to extend the InChI identifier to cover nanomaterials. We provide examples of its use in various H2020-funded nanosafety projects.
  •  
46.
  • Willighagen, Egon L., et al. (author)
  • The Chemistry Development Kit (CDK) v2.0 : atom typing, depiction, molecular formulas, and substructure searching
  • 2017
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 9
  • Journal article (peer-reviewed), abstract
    • Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvement to existing functionality that has led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.
  •  
47.
  • Zahoranszky-Kohalmi, G., et al. (author)
  • SmartGraph: a network pharmacology investigation platform
  • 2020
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 12:1
  • Journal article (peer-reviewed), abstract
    • Motivation: Drug discovery investigations need to incorporate network pharmacology concepts while navigating the complex landscape of drug-target and target-target interactions. This task requires solutions that integrate high-quality biomedical data, combined with analytic and predictive workflows as well as efficient visualization. SmartGraph is an innovative platform that utilizes state-of-the-art technologies such as a Neo4j graph-database, Angular web framework, RxJS asynchronous event library and D3 visualization to accomplish these goals. Results: The SmartGraph framework integrates high quality bioactivity data and biological pathway information resulting in a knowledgebase comprised of 420,526 unique compound-target interactions defined between 271,098 unique compounds and 2018 targets. SmartGraph then performs bioactivity predictions based on the 63,783 Bemis-Murcko scaffolds extracted from these compounds. Through several use-cases, we illustrate the use of SmartGraph to generate hypotheses for elucidating mechanism-of-action, drug-repurposing and off-target prediction. Availability: https://smartgraph.ncats.io/.
  •  
48.
  • Zhang, Yumeng, et al. (author)
  • Similarity-based pairing improves efficiency of siamese neural networks for regression tasks and uncertainty quantification
  • 2023
  • In: Journal of Cheminformatics. - : BioMed Central (BMC). - 1758-2946. ; 15
  • Journal article (peer-reviewed), abstract
    • Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with the conventional exhaustive pairing, it reduces the algorithm complexity from O(n^2) to O(n). It also consistently results in better prediction performance on the three physicochemical datasets, using a multilayer perceptron with the circular fingerprint as a proof of concept. We further incorporate the transformer-based Chemformer into a Siamese neural network, which extracts task-specific features from the simplified molecular-input line-entry system representation of compounds. Additionally, we propose a means to measure the prediction uncertainty by utilizing the variance in predictions from a set of reference compounds. Our results demonstrate that high prediction accuracy correlates with high confidence. Finally, we investigate implications of the similarity property principle in machine learning.
  •  
49.
  •  