SwePub
Search the SwePub database

Result list for search "WFRF:(Lampa Samuel)"

  • Result 1-10 of 18
1.
  • Alvarsson, Jonathan, et al. (author)
  • Large-scale ligand-based predictive modelling using support vector machines
  • 2016
  • In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946. ; 8
  • Journal article (peer-reviewed)
    • The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
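
To make the comparison in the abstract above concrete, the sketch below trains a linear SVM (the approach LIBLINEAR implements) and an RBF-kernel SVM (as in libsvm) on synthetic regression data and times both. It uses scikit-learn's wrappers around those two libraries; the synthetic data and all parameter choices are illustrative stand-ins for the signature descriptors and settings used in the paper.

    # A minimal sketch, not the paper's pipeline: scikit-learn's LinearSVR
    # wraps LIBLINEAR and SVR wraps libsvm. Synthetic data stands in for
    # the signature molecular descriptors used in the study.
    import time

    from sklearn.datasets import make_regression
    from sklearn.svm import SVR, LinearSVR

    X, y = make_regression(n_samples=5000, n_features=500, noise=0.1,
                           random_state=42)

    for name, model in [("linear SVM (LIBLINEAR)", LinearSVR(max_iter=10000)),
                        ("RBF-kernel SVM (libsvm)", SVR(kernel="rbf"))]:
        t0 = time.time()
        model.fit(X, y)
        print(f"{name}: trained in {time.time() - t0:.1f} s")

Even at this modest size the linear model typically trains far faster, which is the effect the paper reports at the million-compound scale.
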
2.
  • Ameur, Adam, et al. (author)
  • SweGen : a whole-genome data resource of genetic variability in a cross-section of the Swedish population
  • 2017
  • In: European Journal of Human Genetics. - : Nature Publishing Group. - 1018-4813 .- 1476-5438. ; 25:11, p. 1253-1260
  • Journal article (peer-reviewed)
    • Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nationwide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.
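
The frequency-based filtering use case mentioned at the end of the abstract can be illustrated with a short sketch. Everything here is hypothetical: the TSV layout, the column names, and the 1% threshold are assumptions for illustration, not the actual SweGen/swefreq formats.

    # Hypothetical sketch of population-frequency filtering: drop candidate
    # variants that are common in a reference panel such as SweGen.
    # The TSV columns ("chrom", "pos", "ref", "alt", "freq") are assumed.
    import csv

    MAX_POP_FREQ = 0.01  # keep variants rarer than 1% in the panel

    def load_frequencies(path):
        """Map (chrom, pos, ref, alt) -> allele frequency from a TSV file."""
        freqs = {}
        with open(path) as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                key = (row["chrom"], row["pos"], row["ref"], row["alt"])
                freqs[key] = float(row["freq"])
        return freqs

    def filter_rare(candidates, freqs, max_freq=MAX_POP_FREQ):
        """Yield candidates that are absent from, or rare in, the panel."""
        for variant in candidates:
            if freqs.get(variant, 0.0) <= max_freq:
                yield variant
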
3.
  • Grüning, Björn A., et al. (author)
  • Software engineering for scientific big data analysis
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:5
  • Research review (peer-reviewed)
    • The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems imposes additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.
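
The conventions the abstract mentions (robustness, standard data input and output, logging, flow control) can be sketched as a small command-line tool. The ten actual guidelines are in the paper; the sketch below is a generic illustration of workflow-friendly conventions, not a restatement of them.

    #!/usr/bin/env python3
    # Generic illustration of workflow-friendly CLI conventions: read from
    # stdin or a file, write results to stdout, log to stderr, and report
    # failure through the exit code so the tool composes inside pipelines
    # and workflow systems.
    import argparse
    import logging
    import sys

    def main():
        parser = argparse.ArgumentParser(description="Uppercase input lines.")
        parser.add_argument("infile", nargs="?", type=argparse.FileType("r"),
                            default=sys.stdin, help="input file (default: stdin)")
        args = parser.parse_args()
        logging.basicConfig(stream=sys.stderr, level=logging.INFO)
        try:
            for line in args.infile:
                sys.stdout.write(line.upper())
        except (OSError, UnicodeDecodeError) as exc:
            logging.error("processing failed: %s", exc)
            sys.exit(1)
        logging.info("done")

    if __name__ == "__main__":
        main()
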
4.
  • Kierczak, Marcin, 1981-, et al. (author)
  • A Monte Carlo approach to modeling post-translational modification sites using local physicochemical properties.
  • Other publication (pop. science, debate, etc.)
    • Many proteins undergo various chemical modifications during or shortly after translation. Post-translational modifications (PTM) greatly contribute to the diversity of protein functions and play a crucial role in many cellular processes. Therefore, understanding where and why a certain protein is modified is an important issue in biomedical research. Mechanisms underlying some types of PTMs have been elucidated, but many remain unknown, and a number of tools for predicting PTMs from short sequence fragments exist. While usually accurate at predicting modification sites, these tools are not designed to increase the understanding of modification mechanisms. Here we attempted to build easy-to-interpret models of PTMs and to identify the physicochemical properties significant for determining modification status. To this end we applied our Monte Carlo feature selection and interdependency discovery (MCFS-ID) method. Considering 9 aa-long sequence fragments that were represented in terms of their physicochemical properties, we analyzed 76 types of PTMs, and for each type we identified the properties that played a significant (p ≤ 0.05) role in the classification process. For 17 types of modifications no significant property was found. For the remaining 59 types, we used the significant properties to construct high-quality random forest-based predictive models. We also showed an example of how to interpret the models by analyzing interdependency networks of significant properties and how to complement the networks with decision rules inferred using rough set theory. The obtained results showed the necessity of applying feature selection prior to constructing a model that considers short sequence fragments. Interestingly, for some types of modifications we saw that models based on insignificant features can yield accurate results. This observation deserves further investigation. Among the examined PTMs we observed groups that share similar patterns of significant properties. We also showed how to complement our models with decision rules that can guide life scientists in their research and shed light on the actual molecular mechanisms determining modification status.
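
The core Monte Carlo idea, scoring features by repeatedly training classifiers on random feature subsets, can be sketched as below. This toy version is not the MCFS-ID implementation used in the study, and it omits interdependency discovery and significance testing.

    # Toy Monte Carlo feature selection: train many random forests on random
    # feature subsets and accumulate an importance score per feature.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=500, n_features=40, n_informative=5,
                               random_state=0)

    n_iter, subset_size = 100, 10
    scores = np.zeros(X.shape[1])
    for _ in range(n_iter):
        cols = rng.choice(X.shape[1], size=subset_size, replace=False)
        rf = RandomForestClassifier(n_estimators=50, random_state=0)
        rf.fit(X[:, cols], y)
        scores[cols] += rf.feature_importances_  # credit the sampled features

    print("Top-ranked features:", np.argsort(scores)[::-1][:5])
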
5.
  • Lampa, Samuel, et al. (author)
  • Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data
  • 2013
  • In: GigaScience. - 2047-217X. ; 2:1, p. 1-10
  • Journal article (peer-reviewed)
    • Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden, serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. Over 300 projects utilize UPPNEX, including large undertakings such as the sequencing of the flycatcher and the Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.
6.
  • Lampa, Samuel, et al. (author)
  • Predicting off-target binding profiles with confidence using Conformal Prediction
  • 2018
  • In: Frontiers in Pharmacology. - : Frontiers Media SA. - 1663-9812. ; 9
  • Journal article (peer-reviewed)
    • Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, allowing one to compare and search for compounds with similar profiles. Most contemporary methods and implementations, however, lack valid measures of confidence in their predictions and provide only point predictions. We here describe the use of conformal prediction for predicting off-target interactions with models trained on data from 31 targets in the ExCAPE dataset, selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor and support vector machines were used as the underlying machine learning method. With conformal prediction, predictions come in the form of confidence p-values for each class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.
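
The per-class confidence p-values described in the abstract come from conformal prediction. The from-scratch sketch below shows the inductive (Mondrian) variant on synthetic data; the actual study used signature descriptors, SVMs, and the ExCAPE data, so every dataset and parameter here is a stand-in.

    # Inductive Mondrian conformal classification sketch: calibrate a
    # nonconformity score per class on held-out data, then report a
    # p-value for each class of a new example.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3,
                                                random_state=1)
    clf = SVC(probability=True, random_state=1).fit(X_tr, y_tr)
    cal_probs = clf.predict_proba(X_cal)

    def p_values(x):
        """Per-class p-value: the fraction of calibration examples of that
        class that are at least as nonconforming as x."""
        probs = clf.predict_proba(x.reshape(1, -1))[0]
        result = {}
        for label in np.unique(y_cal):
            cal_alpha = 1.0 - cal_probs[y_cal == label, label]  # nonconformity
            alpha = 1.0 - probs[label]
            result[label] = (np.sum(cal_alpha >= alpha) + 1) / (len(cal_alpha) + 1)
        return result

    print(p_values(X[0]))

A prediction region at confidence 1 - ε then contains every class whose p-value exceeds ε, which is what gives conformal prediction its validity guarantee.
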
7.
  • Lampa, Samuel, et al. (author)
  • RDFIO : extending Semantic MediaWiki for interoperable biomedical data management
  • 2017
  • In: Journal of Biomedical Semantics. - : Springer Science and Business Media LLC. - 2041-1480. ; 8
  • Journal article (peer-reviewed)
    • BACKGROUND: The biological sciences are characterised not only by an increasing amount of data but also by its extreme complexity. This stresses the need for efficient ways of integrating these data in a coherent description of biological systems. In many cases, biological data needs organization before integration. This is often a collaborative effort, and it is thus important that tools for data integration support a collaborative way of working. Wiki systems with support for structured semantic data authoring, such as Semantic MediaWiki, provide a powerful solution for collaborative editing of data combined with machine-readability, so that data can be handled in an automated fashion in any downstream analyses. Semantic MediaWiki lacks a built-in data import function, though, which hinders efficient round-tripping of data between interoperable Semantic Web formats such as RDF and the internal wiki format. RESULTS: To solve this deficiency, the RDFIO suite of tools is presented, which supports importing of RDF data into Semantic MediaWiki, with metadata needed to export it again in the same RDF format, or ontology. Additionally, the new functionality enables mash-ups of automated data imports combined with manually created data presentations. The application of the suite of tools is demonstrated by importing drug discovery related data about rare diseases from Orphanet and acid dissociation constants from Wikidata. The RDFIO suite of tools is freely available for download via pharmb.io/project/rdfio. CONCLUSIONS: Through a set of biomedical demonstrators, it is demonstrated how the new functionality enables a number of usage scenarios where the interoperability of SMW and the wider Semantic Web is leveraged for biomedical data sets, to create an easy to use and flexible platform for exploring and working with biomedical data.
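
RDFIO itself is a Semantic MediaWiki extension, so the sketch below does not use its API; it only illustrates the round-tripping idea from the abstract using rdflib: parse RDF, work with the individual triples (much as RDFIO maps triples to wiki pages and properties), and serialize back out in the same format. The example vocabulary and facts are invented.

    # Round-tripping RDF with rdflib: parse, inspect triples, re-serialize.
    # The ex: vocabulary below is made up for illustration.
    from rdflib import Graph

    TURTLE = """
    @prefix ex: <http://example.org/> .
    ex:aspirin ex:indicatedFor ex:pain ;
               ex:pKa "3.5" .
    """

    g = Graph()
    g.parse(data=TURTLE, format="turtle")

    for subj, pred, obj in g:
        print(f"{subj} -- {pred} --> {obj}")  # one fact per triple

    print(g.serialize(format="turtle"))  # export again in the same format
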
8.
  • Lampa, Samuel, 1983- (author)
  • Reproducible Data Analysis in Drug Discovery with Scientific Workflows and the Semantic Web
  • 2018
  • Doctoral thesis (other academic/artistic)
    • The pharmaceutical industry is facing a research and development productivity crisis. At the same time we have access to more biological data than ever from recent advancements in high-throughput experimental methods. One suggested explanation for this apparent paradox has been that a crisis in reproducibility has also affected the reliability of the datasets providing the basis for drug development. Advanced computing infrastructures can to some extent aid in this situation but also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve the understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding in drug discovery. We have also developed a set of practical tools for working with linked data in a collaborative way, and publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases, and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.
9.
  • Lampa, Samuel, et al. (author)
  • SciPipe : A workflow library for agile development of complex and dynamic bioinformatics pipelines
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:5
  • Journal article (peer-reviewed)
    • Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as those with nested loops, dynamic scheduling, and parametrization, which are common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics pipeline, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
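
SciPipe is a Go library, so the sketch below is not its API; it is a language-neutral toy, written in Python, of the flow-based programming principle the abstract builds on: self-contained processes connected by channels, each consuming inputs and emitting outputs independently. Threads and queues stand in for goroutines and channels.

    # Flow-based toy: three independent processes wired together by queues.
    import queue
    import threading

    DONE = object()  # sentinel marking the end of a stream

    def produce(out):
        for i in range(5):
            out.put(i)
        out.put(DONE)

    def square(inp, out):
        while (item := inp.get()) is not DONE:
            out.put(item * item)
        out.put(DONE)

    def consume(inp):
        while (item := inp.get()) is not DONE:
            print("result:", item)

    a, b = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=produce, args=(a,)),
               threading.Thread(target=square, args=(a, b)),
               threading.Thread(target=consume, args=(b,))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
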
10.
Type of publication
journal article (13)
research review (3)
other publication (1)
doctoral thesis (1)
Type of content
peer-reviewed (14)
other academic/artistic (3)
pop. science, debate, etc. (1)
Author/Editor
Lampa, Samuel (16)
Spjuth, Ola, Docent, ... (5)
Spjuth, Ola (4)
Alvarsson, Jonathan, ... (4)
Spjuth, Ola, 1977- (4)
Alvarsson, Jonathan (3)
Larsson, Anders (2)
Emami Khoonsari, Pay ... (2)
Kultima, Kim (2)
Capuccini, Marco (2)
Schaal, Wesley, PhD (2)
Berg, Arvid (2)
Gyllensten, Ulf B. (1)
Hellander, Andreas (1)
Lundeberg, Joakim (1)
Nilsson, Daniel (1)
Eklund, Martin (1)
Łabaj, Paweł P. (1)
Ahlberg, Ernst (1)
Johansson, Åsa (1)
Hankemeier, Thomas (1)
Magnusson, Patrik K ... (1)
Arvidsson Mc Shane, ... (1)
Lundin, Sverker (1)
Kähäri, Andreas (1)
Vezzi, Francesco (1)
Andersson, Gunnar (1)
Dahlberg, Johan (1)
Karlsson, Robert (1)
Andersson, Claes (1)
Ólason, Páll I. (1)
Nystedt, Björn, 1978 ... (1)
Liljedahl, Ulrika (1)
Syvänen, Ann-Christi ... (1)
Bongcam Rudloff, Eri ... (1)
Schaal, Wesley (1)
Neumann, Steffen (1)
Wikberg, Jarl E. S. (1)
Wikberg, Jarl (1)
Spjuth, Ola, Profess ... (1)
O'Donovan, Claire (1)
Ameur, Adam (1)
Che, Huiwen (1)
Martin, Marcel (1)
Olason, Pall (1)
Feuk, Lars (1)
Komorowski, Jan (1)
Viklund, Johan, 1982 ... (1)
Lundin, Pär (1)
Thutkawkorapin, Jess ... (1)
University
Uppsala University (18)
Stockholm University (3)
Karolinska Institutet (3)
Royal Institute of Technology (1)
Swedish University of Agricultural Sciences (1)
Language
English (18)
Research subject (UKÄ/SCB)
Natural sciences (16)
Medical and Health Sciences (6)
Engineering and Technology (2)
