Approaches for Distributing Large Scale Bioinformatic Analyses
- Dahlö, Martin (author)
- Uppsala University, Department of Pharmaceutical Biosciences, Pharmaceutical Bioinformatics
- Spjuth, Ola, Professor, 1977- (thesis advisor)
- Uppsala University, Department of Pharmaceutical Biosciences
- Peterson, Hedi, Associate Professor (opponent)
- University of Tartu, Faculty of Science and Technology, Institute of Computer Science
- ISBN 9789151313016
- Uppsala : Acta Universitatis Upsaliensis, 2021
- English, 56 pages
Series: Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, 1651-6192 ; 302
- Related links:
https://uu.diva-port... (primary) (Raw object)
https://uu.diva-port... (Preview)
https://urn.kb.se/re...
Abstract
- Ever since high-throughput DNA sequencing became economically feasible, the amount of biological data has grown exponentially. This has been one of the biggest drivers in introducing high-performance computing (HPC) to the field of biology. Unlike physics and mathematics, biology education has not had a strong focus on programming or algorithmic development. This has forced many biology researchers to learn a whole new skill set and has introduced new challenges for those managing HPC clusters. The aim of this thesis is to investigate the problems that arise when novice users use an HPC cluster for bioinformatics data analysis, and to explore approaches for mitigating them. In paper 1 we quantify and visualise these problems and contrast them with those of the more computer-experienced user groups already using the HPC cluster. In paper 2 we introduce a new workflow system, SciPipe, implemented as a Go library, as a way to organise and manage analysis steps. Paper 3 focuses on cloud computing and on how containerised tools can be used to run workflows without having to worry about software installations. In paper 4 we demonstrate a fully automated cloud-based system for image-based cell profiling. Starting with a robotic arm in a lab, it covers every step from cell culture and microscopy to storing the cell profiling results in a database and visualising them in a web interface.
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Bioinformatics (hsv//eng)
Keyword
- bioinformatics
- cloud computing
- HPC
- high-performance computing
- big data
- kubernetes
- spark
- Bioinformatics
Publication and Content Type
- vet (subject category)
- dok (subject category)