SwePub
Search the SwePub database


Results for search "L773:2047 217X OR L773:2047 217X ;mspu:(article)"

Search: L773:2047 217X OR L773:2047 217X > Journal article

  • Results 1-10 of 42
1.
  • Bradnam, K. R., et al. (author)
  • Assemblathon 2 : Evaluating de novo methods of genome assembly in three vertebrate species
  • 2013
  • In: GigaScience. - : BioMed Central (BMC). - 2047-217X. ; 2:1
  • Journal article (peer-reviewed), abstract:
    • Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches that work well in assembling the genome of one species may not necessarily work well for another.
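The ten key measures are not enumerated in the abstract; one standard statistic in this kind of assembly evaluation is N50, sketched here as an illustration (not the authors' code):

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L together
    cover at least half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

# Total size 100, half 50; 40 + 25 = 65 >= 50, so N50 is 25.
print(n50([40, 25, 15, 10, 10]))  # -> 25
```

A larger N50 indicates a more contiguous assembly, which is one reason it is usually paired with correctness-oriented measures such as those derived from optical maps.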
2.
  • Lampa, Samuel, et al. (author)
  • Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data
  • 2013
  • In: GigaScience. - 2047-217X. ; 2:1, pp. 1-10
  • Journal article (peer-reviewed), abstract:
    • Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden, serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts, as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. Over 300 projects utilize UPPNEX, including large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and we illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.
3.
  • Li, Cai, et al. (author)
  • Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment
  • 2014
  • In: GigaScience. - 2047-217X. ; 3
  • Journal article (peer-reviewed), abstract:
    • Background: Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic-dwelling penguin species, the Adélie penguin (Pygoscelis adeliae) and the emperor penguin (Aptenodytes forsteri). Results: Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology. Conclusions: Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.
4.
  • Blamey, Ben, et al. (author)
  • Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit
  • 2021
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:3, pp. 1-14
  • Journal article (peer-reviewed), abstract:
    • BACKGROUND: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources. FINDINGS: In our pipeline model, an "interestingness function" assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a "policy" guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools to adopt this approach. We evaluate it with 2 microscopy imaging case studies. The first is a high-content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope. CONCLUSIONS: Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between scientific concerns of data priority and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be "bolted on" to new and existing systems, and is intended for use with a range of technologies in different deployment scenarios.
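The interestingness-function and policy concepts described in the abstract can be sketched in a few lines of Python (all names here are hypothetical illustrations, not the HASTE Toolkit API):

```python
def interestingness(image):
    """Hypothetical interestingness function: the fraction of bright
    pixels stands in for a real image-analysis score."""
    bright = sum(1 for px in image if px > 128)
    return bright / len(image)

def policy(score):
    """Map a score to a tier in the data hierarchy; the tier decides
    how much storage and compute the object receives."""
    if score >= 0.5:
        return "tier-1: store full resolution, analyse immediately"
    if score >= 0.1:
        return "tier-2: store compressed, analyse later"
    return "tier-3: discard or archive cheaply"

# A toy stream of two "images" (flat pixel lists).
stream = [[200, 210, 90, 30], [10, 20, 30, 40]]
for image in stream:
    print(policy(interestingness(image)))
```

Separating the scoring function from the policy is the point of the design pattern: the same scientific priority can be enforced differently in an on-premise cloud versus at the edge.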
5.
  • Boulund, Fredrik, 1985, et al. (author)
  • Tentacle: distributed quantification of genes in metagenomes
  • 2015
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4
  • Journal article (peer-reviewed), abstract:
    • Background: In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Findings: Here we present Tentacle, a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Conclusions: Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
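The dynamic master-worker pattern the abstract describes can be illustrated with Python's standard library (a toy sketch, not Tentacle's implementation; a substring check stands in for a real sequence aligner):

```python
import queue
import threading

def worker(tasks, results):
    """Worker node: consume streamed reads until a poison pill arrives."""
    while True:
        item = tasks.get()
        if item is None:  # poison pill: no more work
            break
        reference, read = item
        if read in reference:  # placeholder for a real alignment step
            results.put(read)

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for w in workers:
    w.start()

# Master: stream reads to the workers, then shut them down.
reference = "ACGTACGTGGTT"
for read in ["ACGT", "GGTT", "TTTT"]:
    tasks.put((reference, read))
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()

hits = sorted(results.get() for _ in range(results.qsize()))
print(hits)  # -> ['ACGT', 'GGTT']
```

In Tentacle itself the fragments are streamed over a network to separate worker nodes; the in-process queue here merely plays the role of that transport.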
6.
  • Capuccini, Marco, et al. (author)
  • MaRe : Processing Big Data with application containers on Apache Spark
  • 2020
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 9:5
  • Journal article (peer-reviewed), abstract:
    • Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have attracted the largest open source communities; thus, MaRe provides interoperability with a cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
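The MapReduce pattern that MaRe builds on can be illustrated in plain Python, without Spark or Docker (a toy sketch; in MaRe, the per-partition step would run inside a Docker container on a Spark executor):

```python
from functools import reduce

# Toy dataset: (sample, read_count) records split across partitions,
# standing in for data a containerised tool would process locally.
partitions = [
    [("sampleA", 3), ("sampleB", 5)],
    [("sampleA", 2), ("sampleC", 7)],
]

def map_partition(records):
    """Map step: aggregate one partition where the data lives."""
    counts = {}
    for sample, n in records:
        counts[sample] = counts.get(sample, 0) + n
    return counts

def merge(a, b):
    """Reduce step: combine partial results from two partitions."""
    out = dict(a)
    for sample, n in b.items():
        out[sample] = out.get(sample, 0) + n
    return out

total = reduce(merge, map(map_partition, partitions))
print(total)  # -> {'sampleA': 5, 'sampleB': 5, 'sampleC': 7}
```

The data locality the abstract mentions comes from running the map step on the node holding each partition, so only the small partial results travel over the network.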
7.
8.
  • Dahlberg, Johan, et al. (author)
  • Arteria : An automation system for a sequencing core facility
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:12
  • Journal article (peer-reviewed), abstract:
    • Background: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims to provide a solution to the data-related operational challenges that face sequencing core facilities. Findings: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/. Conclusions: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.
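The event-based orchestration model described above can be sketched as a small publish-subscribe dispatcher (event names and handlers are hypothetical illustrations, not Arteria's API):

```python
# Registry mapping event names to handler functions.
handlers = {}

def on(event):
    """Decorator: register a handler for an event type."""
    def register(fn):
        handlers.setdefault(event, []).append(fn)
        return fn
    return register

def emit(event, payload):
    """Dispatch an event to all registered handlers, in order."""
    return [fn(payload) for fn in handlers.get(event, [])]

@on("runfolder_ready")
def start_demultiplexing(runfolder):
    # In a real deployment this would call a RESTful micro-service.
    return f"demultiplexing {runfolder}"

@on("runfolder_ready")
def archive(runfolder):
    return f"archiving {runfolder}"

print(emit("runfolder_ready", "run_0001"))
# -> ['demultiplexing run_0001', 'archiving run_0001']
```

Decoupling event producers from handlers is what makes such a system pluggable: a new micro-service subscribes to an event without the orchestrator changing.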
9.
  • Dahlö, Martin, et al. (author)
  • Tracking the NGS revolution : managing life science research on shared high-performance computing clusters
  • 2018
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 7:5
  • Journal article (peer-reviewed), abstract:
    • Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at the UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
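The efficiency metrics are not defined in the abstract; one plausible measure of how efficiently booked resources are used (a hypothetical illustration, not necessarily the authors' metric) is the ratio of used to booked core-hours:

```python
def booking_efficiency(cores_booked, cores_used_avg, hours):
    """Hypothetical efficiency metric: fraction of booked core-hours
    that were actually kept busy by the job."""
    booked_core_hours = cores_booked * hours
    used_core_hours = cores_used_avg * hours
    return used_core_hours / booked_core_hours

# A job that books 16 cores for 10 h but averages only 4 busy cores:
print(booking_efficiency(16, 4, 10))  # -> 0.25
```

A low ratio flags jobs that block cores other projects could use, which is the kind of waste the abstract says active monitoring can reduce.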
10.
  • Dahn, Hollis A., et al. (author)
  • Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing
  • 2022
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 11
  • Journal article (peer-reviewed), abstract:
    • Background: Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types. Results: We find that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the sample preservation gold standard, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4 °C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4 °C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield. Conclusion: We provide sample preservation guidelines that ensure sufficient DNA integrity and amount required for use with long-read and long-range sequencing technologies across vertebrates. Our best practices generated the uHMW DNA needed for the high-quality reference genomes for phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all ~70,000 extant vertebrate species.
Type of publication
Type of content
peer-reviewed (42)
Author/editor
Wang, J. (2)
Hellander, Andreas (2)
Zhang, Yan (1)
Chen, X. (1)
Howard, J. (1)
Li, Y. (1)
Liu, B. (1)
Liu, Y. (1)
Zhang, H. (1)
Li, X. (1)
Zhou, Y. (1)
Liu, S. (1)
Yuan, J. (1)
Zhang, G (1)
Lee, J. S. (1)
Leo, S. (1)
Song, H. (1)
Zhou, S. (1)
Larsson, Anders (1)
Abalde, Samuel (1)
Zardoya, Rafael (1)
Tenorio, Manuel J. (1)
Afonso, Carlos M.L. (1)
Abarenkov, Kessy (1)
Kristiansson, Erik, ... (1)
Kõljalg, Urmas (1)
Nilsson, R. Henrik, ... (1)
Tedersoo, Leho (1)
Li, Z (1)
Huang, J. (1)
Chopra, S. (1)
Kumar, M (1)
Naguib, Mahmoud (1)
Boulund, Fredrik, 19 ... (1)
Ladenvall, Claes, Ph ... (1)
Lima, N (1)
Hammarstrom, L (1)
van der Laak, Jeroen (1)
Schuster, M. (1)
Green, Richard E. (1)
Ning, Z. (1)
Qin, X. (1)
Richards, S (1)
Moore, Edward R.B. 1 ... (1)
Bertilsson, Stefan (1)
Ghosh, SS (1)
Poline, JB (1)
Strother, SC (1)
Obst, Matthias, 1974 (1)
Duplouy, Anne (1)
Higher education institution
Uppsala universitet (22)
Karolinska Institutet (9)
Göteborgs universitet (5)
Stockholms universitet (4)
Kungliga Tekniska Högskolan (2)
Naturhistoriska riksmuseet (2)
visa fler...
Sveriges Lantbruksuniversitet (2)
Umeå universitet (1)
Linköpings universitet (1)
Lunds universitet (1)
Malmö universitet (1)
Chalmers tekniska högskola (1)
visa färre...
Language
English (42)
Research subject (UKÄ/SCB)
Natural sciences (30)
Medical and health sciences (6)
Agricultural sciences (4)
Engineering and technology (2)
Social sciences (1)

Year
