SwePub
Search the SwePub database


Hit list for search "L773:1758 2946 srt2:(2015-2019)"

Search: L773:1758 2946 > (2015-2019)

  • Results 1-16 of 16
1.
  • Ahmed, Laeeq, et al. (author)
  • Efficient iterative virtual screening with Apache Spark and conformal prediction
  • 2018
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 10
  • Journal article (peer-reviewed)
    • Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining, on average, 94% accuracy for the top 30 hits, with a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
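The exclusion step above rests on conformal prediction. A minimal sketch of a label-conditional (Mondrian) inductive conformal classifier follows, assuming a hypothetical model score in [0, 1] for a ligand being 'high'-scoring and a simple nonconformity measure chosen for illustration only; the paper's SVM-based nonconformity measure is not reproduced here, and `should_dock` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Set


def p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Conformal p-value: fraction of calibration nonconformity scores at least
    as large as the test score (the +1 terms count the test example itself)."""
    n_ge = sum(1 for a in calibration_scores if a >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def nonconformity(score_high: float, label: str) -> float:
    """Hypothetical measure: how strange `label` looks given a model score in
    [0, 1] for the ligand being 'high'-scoring (larger = stranger)."""
    return 1.0 - score_high if label == "high" else score_high


def prediction_set(calibration: Dict[str, List[float]], score_high: float,
                   epsilon: float) -> Set[str]:
    """Labels whose p-value exceeds the significance level epsilon. Calibration
    scores are kept per label (Mondrian), so each class is calibrated separately."""
    return {
        label
        for label, cal_scores in calibration.items()
        if p_value(cal_scores, nonconformity(score_high, label)) > epsilon
    }


def should_dock(pred: Set[str]) -> bool:
    """Skip docking only when the ligand is confidently predicted 'low';
    uncertain ({'high', 'low'}) and empty sets are kept, to be conservative."""
    return pred != {"low"}


# Hypothetical calibration nonconformity scores from already-docked ligands.
calibration = {"high": [0.1, 0.2, 0.3, 0.4], "low": [0.05, 0.1, 0.15, 0.2]}
for score in (0.05, 0.5, 0.9):
    pred = prediction_set(calibration, score, epsilon=0.2)
    print(score, sorted(pred), "dock" if should_dock(pred) else "exclude")
```

In an iterative scheme of this kind, ligands for which `should_dock` returns False would be dropped from the pool, and the calibration scores would be refreshed each time the model is retrained.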
2.
  • Alvarsson, Jonathan, et al. (author)
  • Large-scale ligand-based predictive modelling using support vector machines
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. To investigate the effect of dataset size on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes, up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but required dramatically less time for model building, even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
3.
  • Bálint, Mónika, et al. (author)
  • Systematic exploration of multiple drug binding sites
  • 2017
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 9:65
  • Journal article (peer-reviewed)
    • Background: Targets with multiple (prerequisite or allosteric) binding sites are of increasing importance in drug design. Experimental determination of atomic-resolution structures of ligands weakly bound to multiple binding sites is often challenging. Blind docking has been widely used for fast mapping of the entire target surface for multiple binding sites. The reliability of blind docking is limited by approximations in hydration models, simplified handling of molecular flexibility, and imperfect search algorithms. Results: To overcome such limitations, the present study introduces Wrap 'n' Shake (WnS), an atomic-resolution method that systematically "wraps" the entire target in a monolayer of ligand molecules. Functional binding sites are then extracted by a rapid molecular dynamics shaker. WnS is tested on biologically important systems such as mitogen-activated protein kinases and tyrosine-protein kinases, key players in cellular signaling, and farnesyl pyrophosphate synthase, a target of antitumor agents.
4.
  • Capuccini, Marco, et al. (author)
  • Large-scale virtual screening on public cloud resources with Apache Spark
  • 2017
  • In: Journal of Cheminformatics. - : BioMed Central. - 1758-2946. ; 9
  • Journal article (peer-reviewed)
    • Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface (MPI), relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against ~2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows the method to be tried out on relatively small libraries first and then scaled to larger libraries.
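The MapReduce decomposition described above can be illustrated in plain Python. This is a sketch under stated assumptions, not the authors' Spark code: `dock` is a hypothetical placeholder returning a dummy deterministic score (a real docking program would be called there), higher scores are assumed better, and the goal is assumed to be the global top-k hits. In PySpark the map step would roughly correspond to `RDD.mapPartitions` and the merge to `RDD.reduce`.

```python
import heapq
from functools import reduce
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, molecule id); higher score = better in this sketch


def dock(molecule_id: str) -> Hit:
    """Hypothetical placeholder for running a docking program on one molecule;
    the 'score' is a deterministic dummy value, not a real docking score."""
    return (float(sum(map(ord, molecule_id)) % 97), molecule_id)


def map_partition(partition: Iterable[str], k: int) -> List[Hit]:
    """Map step: dock every molecule in one partition and keep the local top-k."""
    return heapq.nlargest(k, (dock(m) for m in partition))


def merge(a: List[Hit], b: List[Hit], k: int) -> List[Hit]:
    """Reduce step: the top-k of two top-k lists is the top-k of their union,
    so partial results can be combined in any order."""
    return heapq.nlargest(k, a + b)


def screen(partitions: List[List[str]], k: int) -> List[Hit]:
    # On a cluster, each map_partition call would run on a different worker.
    partial = [map_partition(p, k) for p in partitions]
    return reduce(lambda a, b: merge(a, b, k), partial, [])


library = [f"MOL{i:05d}" for i in range(1000)]  # hypothetical molecule ids
partitions = [library[i::8] for i in range(8)]   # 8 hypothetical partitions
print(screen(partitions, k=5))
```

Because only k hits per partition travel to the reducer, the reduce step stays cheap regardless of library size.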
5.
  • Jespers, Willem, et al. (author)
  • QligFEP : an automated workflow for small molecule free energy calculations in Q
  • 2019
  • In: Journal of Cheminformatics. - : BMC. - 1758-2946. ; 11
  • Journal article (peer-reviewed)
    • The process of ligand binding to a biological target can be represented as the equilibrium between the relevant solvated and bound states of the ligand. This is the basis of structure-based, rigorous methods such as the estimation of relative binding affinities by free energy perturbation (FEP). Despite the growing capacity of computing power and the development of more accurate force fields, high-throughput application of FEP is currently hampered by the need, in current schemes, for expert user definition of the alchemical transformations between molecules in the series explored. Here, we present QligFEP, a solution to this problem using an automated workflow for FEP calculations based on a dual topology approach. In this scheme, the starting poses of each of the two ligands for which the relative affinity is to be calculated are explicitly present in the MD simulations associated with the (dual topology) FEP transformation, making the perturbation pathway between the two ligands unambiguous. We show that this generalized method can be applied to accurately estimate solvation free energies for amino acid side-chain mimics, as well as the binding affinity shifts due to the chemical changes typical of lead optimization processes. This is illustrated in a number of protein systems extracted from other FEP studies in the literature: inhibitors of CDK2 kinase and a series of A2A adenosine G protein-coupled receptor antagonists, where the results obtained with QligFEP are in excellent agreement with experimental data. In addition, our protocol allows for scaffold-hopping perturbations to estimate the binding affinities between different core scaffolds, which we illustrate with a series of Chk1 kinase inhibitors. QligFEP is implemented in the open-source MD package Q and works with the most common families of force fields: OPLS, CHARMM and AMBER.
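To make the FEP idea above concrete, here is a sketch of one textbook estimator, Zwanzig exponential averaging, summed over lambda windows and combined through the thermodynamic cycle. It is an illustration only, not necessarily the analysis QligFEP or Q performs; the energy samples a user would pass in (dU = U_next - U_current per configuration, in kcal/mol) are assumed inputs.

```python
import math
from typing import Sequence

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def zwanzig(delta_u: Sequence[float], temperature: float = 298.15) -> float:
    """Free energy change between two adjacent lambda windows by exponential
    averaging: dG = -kT ln < exp(-dU / kT) >, where dU = U_next - U_current is
    evaluated on configurations sampled in the current window (kcal/mol)."""
    kt = K_B * temperature
    x = [-du / kt for du in delta_u]
    m = max(x)  # log-sum-exp shift avoids overflow in exp()
    mean_exp = sum(math.exp(v - m) for v in x) / len(x)
    return -kt * (m + math.log(mean_exp))


def transformation_free_energy(windows: Sequence[Sequence[float]],
                               temperature: float = 298.15) -> float:
    """Sum the window-to-window contributions along the lambda path."""
    return sum(zwanzig(w, temperature) for w in windows)


def relative_binding_free_energy(dg_bound: float, dg_solvated: float) -> float:
    """Thermodynamic cycle: ddG_bind(A -> B) = dG_bound(A -> B) - dG_solvated(A -> B)."""
    return dg_bound - dg_solvated


# Hypothetical dU samples for a two-window transformation in water.
water = transformation_free_energy([[0.4, 0.6, 0.5], [0.2, 0.3, 0.1]])
print(round(water, 3))
```

Running the same transformation once in the protein and once in water and subtracting gives the relative binding free energy between the two ligands.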
6.
  • Johansson, Marcus, et al. (author)
  • Automatic procedure for generating symmetry adapted wavefunctions
  • 2017
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 9:1
  • Journal article (peer-reviewed)
    • Automatic detection of point groups, as well as symmetrisation of molecular geometry and wavefunctions, are useful tools in computational quantum chemistry. Algorithms for developing these tools, as well as an implementation, are presented. The symmetry detection algorithm is a clustering algorithm for symmetry-invariant properties, combined with logical deduction of possible symmetry elements using the geometry of sets of symmetrically equivalent atoms. An algorithm for determining the symmetry adapted linear combinations (SALCs) of atomic orbitals is also presented. The SALCs are constructed with the use of projection operators for the irreducible representations, as well as subgroups for determining splitting fields for a canonical basis. The character tables for the point groups are auto-generated, and the algorithm is described. Symmetrisation of molecules uses a projection into the totally symmetric space, whereas for wavefunctions, projection as well as partner-function determination and averaging are used. The software has been released as a stand-alone, open source library under the MIT license and integrated into both computational and molecular modelling software.
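The projection-operator construction of SALCs mentioned above can be shown on a hand-coded textbook case: the two hydrogen 1s orbitals of water in C2v, with the molecule assumed to lie in the yz plane. This is only an illustration of the operator P = (d/h) Σ χ(R) R; the hard-coded character table and permutations are assumptions of the sketch, whereas the library described above generates character tables automatically.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# C2v operations acting on the two H 1s orbitals of water (molecule in the yz
# plane). Each operation is a permutation: basis index -> index of its image.
OPERATIONS: Dict[str, Tuple[int, int]] = {
    "E": (0, 1),
    "C2": (1, 0),
    "sigma_v(xz)": (1, 0),
    "sigma_v'(yz)": (0, 1),
}

# C2v character table, columns in the same order as OPERATIONS.
CHARACTERS: Dict[str, Tuple[int, int, int, int]] = {
    "A1": (1, 1, 1, 1),
    "A2": (1, 1, -1, -1),
    "B1": (1, -1, 1, -1),
    "B2": (1, -1, -1, 1),
}


def project(irrep: str, basis_index: int, n_basis: int = 2) -> List[Fraction]:
    """Apply P = (d/h) * sum_R chi(R) R to one basis function (d = 1 for all
    C2v irreps) and return the coefficients of the resulting combination."""
    h = len(OPERATIONS)
    coeffs = [Fraction(0)] * n_basis
    for chi, perm in zip(CHARACTERS[irrep], OPERATIONS.values()):
        coeffs[perm[basis_index]] += Fraction(chi, h)
    return coeffs


for irrep in CHARACTERS:
    print(irrep, project(irrep, 0))
```

Only A1 (the in-phase sum) and B2 (the out-of-phase difference) survive; A2 and B1 project to zero, as expected for this basis.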
7.
  • Kensert, Alexander, et al. (author)
  • Evaluating parameters for ligand-based modeling with random forest on sparse data sets
  • 2018
  • In: Journal of Cheminformatics. - : BMC. - 1758-2946. ; 10
  • Journal article (peer-reviewed)
    • Ligand-based predictive modeling is widely used to generate predictive models aiding decision making in, e.g., drug discovery projects. With growing data sets and requirements for low modeling time comes the need to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, utilizing Morgan fingerprints of different radii and hash sizes, and compared them with the molecular signatures descriptor of different heights. We specifically evaluated the effect these parameters had on modeling time, predictive performance, and memory requirements using two implementations of random forest: Scikit-learn and FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints (p <= 0.05), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high-dimensional sparse data. Both support vector machines and random forest performed equally well, but the results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.
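The hashed-versus-unhashed comparison above comes down to whether substructure identifiers are folded into a fixed number of bits. A small sketch, assuming modulo folding (one common scheme; the paper's exact hashing is not specified here) and made-up integer identifiers, shows how folding makes distinct features collide, whereas an unhashed representation uses each identifier as its own sparse column.

```python
from typing import Iterable, Set


def fold(feature_ids: Iterable[int], n_bits: int) -> Set[int]:
    """Hash (fold) unbounded substructure identifiers into n_bits positions."""
    return {fid % n_bits for fid in feature_ids}


def collisions(feature_ids: Iterable[int], n_bits: int) -> int:
    """Number of distinct features that no longer get a bit of their own."""
    unique = set(feature_ids)
    return len(unique) - len(fold(unique, n_bits))


# Hypothetical atom-environment identifiers for one molecule.
ids = [1024, 2048, 3, 1027, 98765]
print(sorted(fold(ids, 1024)), collisions(ids, 1024))
```

Here five distinct features collapse onto three bits at a hash size of 1024, so the model can no longer tell the colliding substructures apart.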
8.
  • Kovačević, Goran, et al. (author)
  • Luscus: molecular viewer and editor for MOLCAS
  • 2015
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 7, p. 16
  • Journal article (peer-reviewed)
    • A novel program for graphical display and editing of molecular systems, luscus, is described. The program allows fast and easy building and/or editing of different molecular structures, up to several thousand atoms in size. Luscus is able to visualise dipole moments, normal modes, molecular orbitals, electron densities and electrostatic potentials. In addition, simple geometrical objects can be rendered in order to reveal a geometrical feature or a physical quantity. The program is developed as a graphical interface for the MOLCAS program package; however, its adaptive nature makes it possible to use luscus with other computational program packages and chemical formats. All data files are opened via simple plug-ins, which makes it easy to implement a new file format in luscus. The ease of editing molecular geometries makes luscus suitable for teaching students chemical concepts and molecular modelling. [Graphical abstract: screenshot of the luscus program showing a molecular orbital.]
9.
  • Lampa, Samuel, et al. (author)
  • Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems lack the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss our experiences from using the approach when modelling a large set of biochemical interactions on a shared computer cluster.
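One flow-based programming idea referred to above is that components expose named ports and the network (which output feeds which input) is declared separately from the component logic. The sketch below illustrates that idea in generic Python; it is a hypothetical mini-framework, not SciLuigi's or Luigi's API, and `Component`, `PortRef` and `run_workflow` are names invented for this example.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict


@dataclass(frozen=True)
class PortRef:
    """Points at a named output port of another component."""
    component: str
    port: str


@dataclass
class Component:
    """A workflow step: a function from named inputs to a dict of named outputs."""
    name: str
    func: Callable[..., Dict[str, object]]
    inputs: Dict[str, PortRef] = field(default_factory=dict)


def run_workflow(components: Dict[str, Component]) -> Dict[str, Dict[str, object]]:
    """Execute components in dependency order, routing outputs to inputs via ports."""
    graph = {name: {ref.component for ref in c.inputs.values()}
             for name, c in components.items()}
    results: Dict[str, Dict[str, object]] = {}
    for name in TopologicalSorter(graph).static_order():
        comp = components[name]
        kwargs = {port: results[ref.component][ref.port] for port, ref in comp.inputs.items()}
        results[name] = comp.func(**kwargs)
    return results


# Network definition, kept separate from the component functions (hypothetical).
workflow = {
    "load": Component("load", lambda: {"values": [1.0, 2.0, 3.0, 4.0, 5.0]}),
    "split": Component("split", lambda values: {"train": values[:4], "test": values[4:]},
                       inputs={"values": PortRef("load", "values")}),
    "fit": Component("fit", lambda train: {"mean": sum(train) / len(train)},
                     inputs={"train": PortRef("split", "train")}),
    "evaluate": Component("evaluate", lambda mean, test: {"abs_error": abs(test[0] - mean)},
                          inputs={"mean": PortRef("fit", "mean"),
                                  "test": PortRef("split", "test")}),
}

results = run_workflow(workflow)
print(results["evaluate"])
```

Rewiring, for example inserting a cross-validation step between `split` and `fit`, only touches the network dictionary, not the component functions.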
10.
  • Lapins, Maris, et al. (author)
  • A confidence predictor for logD using conformal regression and a support-vector machine
  • 2018
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 10:1
  • Journal article (peer-reviewed)
    • Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models were created with a support-vector machine with a linear kernel and evaluated using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text], and the best-performing nonconformity measure gives a median prediction interval of [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor, and we also publish predicted values at the 90% confidence level for 91 M PubChem structures in RDF format, for download and via a URI resolver service.
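The prediction intervals described above come from conformal regression. A minimal sketch of inductive conformal regression with the plain absolute-residual nonconformity score follows; the residuals and the point prediction are hypothetical, and the paper compared several nonconformity measures (normalized variants scale the residual by a per-compound difficulty estimate), so this is not necessarily its best-performing one.

```python
import math
from typing import Sequence, Tuple


def conformal_interval(calibration_residuals: Sequence[float], y_hat: float,
                       confidence: float) -> Tuple[float, float]:
    """Inductive conformal regression with |y - y_hat| as nonconformity score:
    the half-width is the ceil((n + 1) * confidence)-th smallest calibration score."""
    alphas = sorted(abs(r) for r in calibration_residuals)
    n = len(alphas)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        # Too few calibration examples to guarantee this confidence level.
        return (-math.inf, math.inf)
    half_width = alphas[k - 1]
    return (y_hat - half_width, y_hat + half_width)


# Hypothetical residuals (observed minus predicted logD) on a calibration set.
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]
print(conformal_interval(residuals, y_hat=2.0, confidence=0.8))
print(conformal_interval(residuals, y_hat=2.0, confidence=0.9))
```

Raising the confidence level widens the interval, which is the trade-off the abstract reports between the 80% and 90% intervals.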
11.
  • Shevtsov, Oleksii, 1988, et al. (author)
  • A de novo molecular generation method using latent vector based generative adversarial network
  • 2019
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 11:1
  • Journal article (peer-reviewed)
    • Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Compounds sampled from the trained model can largely occupy the same chemical space as the training set, while a substantial fraction of them are novel. Moreover, the drug-likeness score of compounds sampled from LatentGAN is similar to that of the training set. Lastly, the generated compounds differ from those obtained with a recurrent neural network-based generative model, indicating that the two methods can be used complementarily.
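The "substantial fraction of novel compounds" above is usually quantified with simple set arithmetic over sampled and training structures. A sketch follows, assuming SMILES strings that are already canonicalized (so identical molecules compare equal as strings) and using the common uniqueness and novelty definitions; the paper's exact metric definitions and data are not reproduced, and the SMILES lists are made up.

```python
from fractions import Fraction
from typing import Dict, Iterable


def sampling_stats(sampled: Iterable[str], training: Iterable[str]) -> Dict[str, Fraction]:
    """Uniqueness = distinct samples / all samples;
    novelty = distinct samples absent from the training set / distinct samples.
    Assumes canonical SMILES, so equal molecules are equal strings."""
    sampled = list(sampled)
    if not sampled:
        raise ValueError("no sampled compounds")
    unique = set(sampled)
    novel = unique - set(training)
    return {
        "uniqueness": Fraction(len(unique), len(sampled)),
        "novelty": Fraction(len(novel), len(unique)),
    }


training_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
sampled_smiles = ["CCO", "CCN", "CCN", "c1ccccc1O", "CCO"]
print(sampling_stats(sampled_smiles, training_smiles))
```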
12.
  • Simeon, Saw, et al. (author)
  • osFP : a web server for predicting the oligomeric states of fluorescent proteins
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • Background: Currently, monomeric fluorescent proteins (FPs) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate protein engineering efforts to create monomeric FPs. To the best of our knowledge, this study represents the first computational model for predicting and analyzing FP oligomerization directly from the amino acid sequence. Results: After data curation, an exhaustive data set consisting of 397 non-redundant FP oligomeric states was compiled from the literature. Benchmarking of the protein descriptors revealed that the model built with amino acid composition descriptors was the top-performing model, with accuracy, sensitivity and specificity in excess of 80% and MCC greater than 0.6 for all three data subsets (i.e. training, tenfold cross-validation and external sets). The model provided insights into the important residues governing the oligomerization of FPs. To maximize the benefit of the generated predictive model, it was implemented as a web server under the R programming environment. Conclusion: osFP affords a user-friendly interface that can be used to predict the oligomeric state of an FP from its protein sequence. The advantage of osFP is that it is platform-independent, meaning that it can be accessed via a web browser on any operating system and device. osFP is freely accessible at http://codes.bio/osfp/, while the source code and data set are provided on GitHub at https://github.com/chaninn/osFP/.
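The best model above used amino acid composition descriptors and was assessed with, among other metrics, the Matthews correlation coefficient (MCC). Both are simple to compute; the sketch below uses the standard definitions, with an arbitrary illustrative sequence fragment and hypothetical confusion-matrix counts (osFP itself was implemented in R, so this is not its code).

```python
import math
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in a protein sequence (20 features);
    non-standard letters are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


composition = amino_acid_composition("MSKGEELFTG")  # arbitrary illustrative fragment
print(composition["G"], composition["E"])
print(mcc(tp=40, tn=40, fp=10, fn=10))
```

An MCC of 0.6, the threshold quoted in the abstract, corresponds for instance to a balanced classifier that is right 80% of the time on each class, as in the counts above.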
13.
14.
  • Spjuth, Ola, 1977-, et al. (author)
  • XMetDB : an open access database for xenobiotic metabolism
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • Xenobiotic metabolism is an active research topic, but the limited amount of openly available high-quality biotransformation data constrains predictive modeling. Current databases often default to commonly available information: which enzyme metabolizes a compound, but neither the experimental conditions nor the atoms that undergo metabolization are captured. We present XMetDB, an open access database for drugs and other xenobiotics and their respective metabolites. The database contains chemical structures of xenobiotic biotransformations with substrate atoms annotated as reaction centers, the resulting product formed, the catalyzing enzyme, the type of experiment, and literature references. Associated with the database is a web interface for the submission and retrieval of experimental metabolite data for drugs and other xenobiotics in various formats, and a web API for programmatic access is also available. The database is open for data deposition, and a curation scheme is in place for quality control. An extensive guide on how to enter experimental data into the database is available from the XMetDB wiki. XMetDB formalizes how biotransformation data should be reported, and the openly available, systematically labeled data is a big step forward towards better models for predictive metabolism.
15.
  • Svensson, Fredrik, et al. (author)
  • Maximizing gain in high-throughput screening using conformal prediction
  • 2018
  • In: Journal of Cheminformatics. - London : Chemistry Central. - 1758-2946. ; 10:1
  • Journal article (peer-reviewed)
    • Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high-throughput approaches. By learning from a subset of the compound library, predictive models can infer which compounds to screen next, resulting in more efficient screening. One way to evaluate screening is to consider the cost of screening compared to the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim of maximising gain in iterative screening. Using this setup, we were able to show that by evaluating the predictions on the training data, very accurate predictions can be made of which settings will produce the highest gain on the test data. We evaluate the approach on 12 bioactivity datasets from PubChem, training the models using 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8-10 of the 12 datasets. Broadly, our approach can predict which strategy generates the highest gain based on the results of the gain-cost evaluation: to screen the compounds predicted to be active, to screen all the remaining data, or not to screen any additional compounds. When the algorithm indicates that the predicted active compounds should be screened, our approach also indicates which confidence level to apply in order to maximize gain. Hence, our approach facilitates decision-making and the allocation of resources where they deliver the most value, by indicating in advance the likely outcome of a screening campaign.
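The three-way decision described above can be sketched with a simple linear gain-cost function: value per active found minus cost per compound screened. The functional form, parameter names and all numbers below are hypothetical; in the paper's setting the counts of actives would be estimated from conformal predictions evaluated on training data, not known in advance as they are here.

```python
from typing import Dict


def gain(n_screened: int, n_actives_found: int,
         gain_per_active: float, cost_per_compound: float) -> float:
    """Net gain of a screening step: value of actives found minus screening cost."""
    return n_actives_found * gain_per_active - n_screened * cost_per_compound


def compare_strategies(n_remaining: int, n_remaining_actives: int,
                       n_predicted_active: int, n_actives_in_predicted: int,
                       gain_per_active: float, cost_per_compound: float) -> Dict[str, float]:
    """Net gain of the three options named in the abstract."""
    return {
        "screen predicted actives": gain(n_predicted_active, n_actives_in_predicted,
                                         gain_per_active, cost_per_compound),
        "screen all remaining": gain(n_remaining, n_remaining_actives,
                                     gain_per_active, cost_per_compound),
        "screen nothing more": 0.0,
    }


def best_strategy(options: Dict[str, float]) -> str:
    return max(options, key=options.get)


# Hypothetical: 1000 unscreened compounds with 20 actives; the conformal
# predictor flags 100 compounds, 15 of which are truly active.
for g in (200.0, 100.0, 5.0):
    opts = compare_strategies(1000, 20, 100, 15, gain_per_active=g, cost_per_compound=1.0)
    print(g, best_strategy(opts))
```

With these numbers, a very valuable active favours screening everything, a moderate value favours screening only the predicted actives, and a low value favours stopping, mirroring the three outcomes the approach chooses between.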
16.
  • Willighagen, Egon L., et al. (author)
  • The Chemistry Development Kit (CDK) v2.0 : atom typing, depiction, molecular formulas, and substructure searching
  • 2017
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 9
  • Journal article (peer-reviewed)
    • Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms, ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software.
Publication type
journal article (16)
Content type
peer-reviewed (15)
other academic/artistic (1)
Author/editor
Spjuth, Ola (4)
Alvarsson, Jonathan, ... (3)
Spjuth, Ola, 1977- (3)
Lampa, Samuel (3)
Veryazov, Valera (2)
Laure, Erwin (2)
Norinder, Ulf, 1956- (2)
Ahmed, Laeeq (2)
Capuccini, Marco (2)
Schaal, Wesley, PhD (2)
Berg, Arvid (2)
Spjuth, Ola, Docent, ... (2)
Schaal, Wesley (2)
Alvarsson, Jonathan (2)
Wikberg, Jarl E. S. (2)
Jeliazkova, Nina (2)
Kovačević, Goran (1)
van Der Spoel, David (1)
Gutierrez-de-Teran, ... (1)
Carlsson, Lars (1)
Engkvist, Ola (1)
Georgiev, Valentin (1)
Toor, Salman (1)
Johansson, Marcus (1)
Andersson, Claes (1)
Åqvist, Johan (1)
Svensson, Fredrik (1)
Arvidsson, Staffan (1)
Prachayasittikul, Vi ... (1)
Steinbeck, Christoph (1)
Jespers, Willem (1)
Horváth, István (1)
Balint, Monika (1)
Hetenyi, Csaba (1)
Jeszenői, Norbert (1)
Esguerra, Mauricio (1)
Bender, Andreas (1)
Guha, Rajarshi (1)
Chen, Hongming (1)
Lapins, Maris (1)
Kuhn, Stefan (1)
Bjerrum, Esben Janni ... (1)
Rydberg, Patrik (1)
Pluskal, Tomáš (1)
Shevtsov, Oleksii, 1 ... (1)
Nantasenamat, Chanin (1)
Johansson, Simon, 19 ... (1)
Kensert, Alexander (1)
Simeon, Saw (1)
Anuwongcharoen, Nutt ... (1)
Institution
Uppsala universitet (12)
Kungliga Tekniska Högskolan (2)
Örebro universitet (2)
Lunds universitet (2)
Stockholms universitet (1)
Chalmers tekniska högskola (1)
Karolinska Institutet (1)
Language
English (16)
Research subject (UKÄ/SCB)
Natural sciences (15)
Engineering and technology (2)
Medical and health sciences (2)
