SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "L773:2047 217X OR L773:2047 217X ;hsvcat:2"

Sökning: L773:2047 217X OR L773:2047 217X > Teknik

  • Resultat 1-2 av 2
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Capuccini, Marco, et al. (författare)
  • MaRe : Processing Big Data with application containers on Apache Spark
  • 2020
  • Ingår i: GigaScience. - : Oxford University Press. - 2047-217X. ; 9:5
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have collected the largest open source community; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
  •  
2.
  • Dahlberg, Johan, et al. (författare)
  • Arteria : An automation system for a sequencing core facility
  • 2019
  • Ingår i: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:12
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities.Findings: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/.Conclusions: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-2 av 2

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy