SwePub

Result list for the search "WFRF:(Spjuth Ola Associate Professor)"

Search: WFRF:(Spjuth Ola Associate Professor)

  • Results 1-4 of 4
1.
  • Ahmed, Laeeq (author)
  • Scalable Analysis of Large Datasets in Life Sciences
  • 2019
  • Doctoral thesis (other academic/artistic), abstract:
    • We are experiencing a deluge of data in all fields of scientific and business research, particularly in the life sciences, due to the development of better instrumentation and the rapid advances in information technology in recent times. There are major challenges when it comes to handling such large amounts of data, ranging from the practicalities of managing these large volumes to understanding the meaning and practical implications of the data.
      In this thesis, I present parallel methods to efficiently manage, process, analyse and visualize large datasets from several life sciences fields at a rapid rate, while building and utilizing various machine learning techniques in a novel way. Most of the work is centred on applying the latest Big Data analytics frameworks to create efficient virtual screening strategies for large datasets. Virtual screening is a cheminformatics method used in drug discovery to search large libraries of molecular structures. I also present a method for analysing large electroencephalography (EEG) data in real time; EEG is one of the main techniques used to measure the brain's electrical activity.
      First, I evaluate the suitability of Spark, a parallel framework for large datasets, for performing parallel ligand-based virtual screening. As a case study, I classify a molecular library using prebuilt classification models to filter the library for active molecules. I also demonstrate a strategy for creating cloud-ready pipelines for structure-based virtual screening. The major advantages of this strategy are increased productivity and high throughput. In this work, I show that Spark can be applied to virtual screening and that it is, in general, an appropriate solution for large-scale parallel pipelining. Moreover, I illustrate how Big Data analytics is valuable when working with life sciences datasets. (A minimal code sketch of the parallel screening idea follows this record.)
      Secondly, I present a method to further reduce the overall time of the structure-based virtual screening strategy using machine learning and a conformal-prediction-based iterative modelling strategy. The idea is to dock only those molecules that have a better-than-average chance of being an inhibitor when searching for molecules that could potentially be used as drugs. Using machine learning models from this work, I built a web service to predict the target profile of multiple compounds against ready-made models for a list of targets where 3D structures are available. These target predictions can be used to understand off-target effects, for example in the early stages of drug discovery projects.
      Thirdly, I present a method to detect seizures in long-term electroencephalography readings; the method works in real time, consuming the ongoing readings as live data streams. The method tackles the challenges of real-time decision-making, storing large datasets in memory, and updating the prediction model with newly produced data at a rapid rate. The resulting algorithm not only classifies seizures in real time, it also learns the threshold in real time. I also present a new feature, the "top-k amplitude measure", for classifying which parts of the data correspond to seizures. Furthermore, this feature helps to reduce the amount of data that needs to be processed in subsequent steps.
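  A minimal, hypothetical sketch of the parallel ligand-based screening idea described above, using PySpark: a molecular library is read as one SMILES string per line and filtered in parallel. The file names and the trivial scoring rule are illustrative assumptions, not the prebuilt models or pipelines from the thesis.

      # Hypothetical sketch: parallel ligand-based virtual screening with Spark.
      # The input file (one SMILES string per line) and the dummy scoring rule
      # are stand-ins, not the prebuilt classification models from the thesis.
      from pyspark.sql import SparkSession

      def score_partition(smiles_lines):
          # Placeholder for a real ligand-activity classifier: assign a dummy
          # score and keep only molecules "predicted" to be active.
          for smiles in smiles_lines:
              score = min(len(smiles) / 60.0, 1.0)      # dummy activity score
              if score > 0.5:
                  yield f"{smiles}\t{score:.2f}"

      spark = SparkSession.builder.appName("vs-sketch").getOrCreate()
      library = spark.sparkContext.textFile("molecules.smi")   # assumed input path
      hits = library.mapPartitions(score_partition)            # classify in parallel
      hits.saveAsTextFile("predicted_actives")                 # filtered library
      spark.stop()

  The pattern (read, map a model over partitions, filter, write) is what makes the approach scale horizontally: Spark distributes partitions of the library across workers, so adding nodes shortens the screening time.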
2.
  • Dahlö, Martin (author)
  • Approaches for Distributing Large Scale Bioinformatic Analyses
  • 2021
  • Doctoral thesis (other academic/artistic), abstract:
    • Ever since high-throughput DNA sequencing became economically feasible, the amount of biological data has grown exponentially. This has been one of the biggest drivers behind introducing high-performance computing (HPC) to the field of biology. Unlike in physics and mathematics, biology education has not had a strong focus on programming or algorithm development. This has forced many biology researchers to learn a whole new skill set, and it has introduced new challenges for those managing the HPC clusters.
      The aim of this thesis is to investigate the problems that arise when novice users use an HPC cluster for bioinformatics data analysis, and to explore approaches for mitigating them. In paper 1 we quantify and visualise these problems and contrast them with those of the more computer-experienced user groups already using the HPC cluster. In paper 2 we introduce a new workflow system (SciPipe), implemented as a Go library, as a way to organise and manage analysis steps. Paper 3 is aimed at cloud computing and how containerised tools can be used to run workflows without having to worry about software installations (a toy sketch of this idea follows this record). In paper 4 we demonstrate a fully automated cloud-based system for image-based cell profiling. Starting with a robotic arm in a lab, it covers all the steps from cell culture and microscopy to having the cell profiling results stored in a database and visualised in a web interface.
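  A loose illustration of the containerised-workflow idea from paper 3: each analysis step runs a tool inside a container, so the host only needs a container runtime rather than the tools themselves. The container images, commands, and file names below are placeholders, not the pipelines from the thesis.

      # Hypothetical sketch: run each workflow step inside a container so no
      # bioinformatics software has to be installed on the host. Images,
      # commands and file names are placeholders chosen for illustration.
      import os
      import subprocess

      def run_step(image, command, workdir="/data"):
          # Mount the current directory into the container and run one tool.
          subprocess.run(
              ["docker", "run", "--rm",
               "-v", f"{os.getcwd()}:{workdir}", "-w", workdir, image] + command,
              check=True,
          )

      # A two-step toy pipeline: quality control, then indexing a reference.
      run_step("example/fastqc:latest", ["fastqc", "sample.fastq"])
      run_step("example/bwa:latest", ["bwa", "index", "reference.fa"])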
3.
  • Ekmefjord, Morgan, et al. (authors)
  • Scalable federated machine learning with FEDn
  • 2022
  • In: 2022 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022). Institute of Electrical and Electronics Engineers (IEEE). ISBN 9781665499569, ISBN 9781665499576. pp. 555-564.
  • Conference paper (peer-reviewed), abstract:
    • Federated machine learning promises to overcome the input privacy challenge in machine learning. By iteratively updating a model on private clients and aggregating these local model updates into a global federated model, private data is incorporated into the federated model without needing to share and expose that data. Several open software projects for federated learning have appeared. Most of them focus on supporting flexible experimentation with different model aggregation schemes and different privacy-enhancing technologies. However, there is a lack of open frameworks that focus on critical distributed computing aspects of the problem, such as scalability and resilience. It is a big step for a data scientist to go from an experimental sandbox to testing their federated schemes at scale in real-world, geographically distributed settings. To bridge this gap, we have designed and developed a production-grade hierarchical federated learning framework, FEDn. The framework is specifically designed to make it easy to go from local development in pseudo-distributed mode to horizontally scalable distributed deployments. FEDn aims to be both production-grade for industrial applications and a flexible research tool for exploring the real-world performance of novel federated algorithms, and the framework has been used in a number of industrial and academic R&D projects. In this paper we present the architecture and implementation of FEDn, and we demonstrate the framework's scalability and efficiency in evaluations based on two case studies, representative of a cross-silo and a cross-device use case, respectively. (A toy federated-averaging sketch follows this record.)
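  The core idea that FEDn builds on can be illustrated with a generic federated-averaging (FedAvg) loop: clients train locally on private data and a server aggregates their weighted model updates into a new global model. The toy NumPy sketch below uses a linear least-squares model; it is a generic illustration of the scheme, not the FEDn API or architecture.

      # Toy federated averaging (FedAvg): each client runs a few local gradient
      # steps on private data, and the server averages the resulting weights.
      # Generic illustration only; this is not the FEDn framework.
      import numpy as np

      def local_update(w_global, X, y, lr=0.05, epochs=5):
          # One client: local gradient descent for least squares y ~ X @ w.
          w = w_global.copy()
          for _ in range(epochs):
              grad = 2 * X.T @ (X @ w - y) / len(y)
              w -= lr * grad
          return w, len(y)

      def aggregate(updates):
          # Server: average client weights, weighted by local dataset size.
          total = sum(n for _, n in updates)
          return sum(w * (n / total) for w, n in updates)

      rng = np.random.default_rng(0)
      true_w = np.array([1.0, -2.0, 0.5])
      clients = []
      for _ in range(4):                                # four private datasets
          X = rng.normal(size=(50, 3))
          clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

      w_global = np.zeros(3)
      for _ in range(20):                               # federated rounds
          updates = [local_update(w_global, X, y) for X, y in clients]
          w_global = aggregate(updates)
      print(np.round(w_global, 2))                      # close to [1. -2. 0.5]

  In a hierarchical setup such as the one the paper describes, intermediate aggregators would combine updates from subsets of clients before a top-level step produces the global model; the averaging itself stays the same.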
4.
  • Zhang, Tianru, et al. (authors)
  • Data management of scientific applications in a reinforcement learning-based hierarchical storage system
  • 2024
  • In: Expert Systems with Applications. Elsevier. ISSN 0957-4174, E-ISSN 1873-6793. Vol. 237.
  • Journal article (peer-reviewed), abstract:
    • In many areas of data-driven science, large datasets are generated in which the individual data objects are images, matrices, or otherwise have a clear structure. However, these objects can be information-sparse, and a challenge is to efficiently find and work with the most interesting data as early as possible in an analysis pipeline. We have recently proposed a new model for big data management in which the internal structure and information of the data are associated with each data object (as opposed to simple metadata). This creates an opportunity for comprehensive data management solutions that account for data-specific internal structure as well as access patterns. In this article, we explore this idea together with our recently proposed hierarchical storage management framework, which uses reinforcement learning (RL) for autonomous and dynamic data placement across the tiers of a storage hierarchy. Our case study is based on four scientific datasets: protein translocation microscopy images, airfoil angle-of-attack meshes, 1000 Genomes sequences, and phenotypic screening images. The presented results highlight that our framework is optimal and can quickly adapt to new data access requirements. Overall, it reduces the data processing time, and the proposed autonomous data placement is superior to static or semi-static data placement policies. (A toy reinforcement-learning placement sketch follows this record.)
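  The reinforcement-learning placement idea can be caricatured with a small bandit-style learner that picks a storage tier for a data object based on how frequently the object is accessed. The tiers, access states, and rewards below are toy assumptions, not the framework, policies, or datasets evaluated in the article.

      # Toy bandit-style learner for data placement: choose a storage tier
      # given an object's access "heat". States, tiers and rewards are made up.
      import random
      from collections import defaultdict

      TIERS = ["ssd", "hdd", "archive"]        # actions: where to place an object
      HEAT = ["hot", "warm", "cold"]           # states: recent access frequency
      # Reward favours fast tiers for hot data and cheap tiers for cold data.
      REWARD = {("hot", "ssd"): 1.0,  ("hot", "hdd"): -0.5, ("hot", "archive"): -1.0,
                ("warm", "ssd"): 0.2, ("warm", "hdd"): 0.8, ("warm", "archive"): -0.2,
                ("cold", "ssd"): -0.8, ("cold", "hdd"): 0.2, ("cold", "archive"): 1.0}

      Q = defaultdict(float)
      alpha, epsilon = 0.1, 0.2                # learning rate, exploration rate

      for _ in range(5000):
          state = random.choice(HEAT)                          # observed access pattern
          if random.random() < epsilon:                        # explore
              action = random.choice(TIERS)
          else:                                                # exploit best known tier
              action = max(TIERS, key=lambda a: Q[(state, a)])
          reward = REWARD[(state, action)]
          Q[(state, action)] += alpha * (reward - Q[(state, action)])

      for s in HEAT:
          print(s, "->", max(TIERS, key=lambda a: Q[(s, a)]))  # learned placement

  A full RL formulation would also account for migration costs and for state transitions as access patterns change over time; the sketch only captures the reward-driven tier choice.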