SwePub - search: L773:2047 217X

Number	Reference	Cover image	Find
1.	Bradnam, K. R., et al. (author) Assemblathon 2 : Evaluating de novo methods of genome assembly in three vertebrate species 2013 In: GigaScience. - : BioMed Central (BMC). - 2047-217X. ; 2:1 Journal article (peer-reviewed) abstract Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
2.	Lampa, Samuel, et al. (author) Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data 2013 In: GigaScience. - 2047-217X. ; 2:1, pp. 1-10 Journal article (peer-reviewed) abstract Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.
3.	Li, Cai, et al. (author) Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment 2014 In: GigaScience. - 2047-217X. ; 3 Journal article (peer-reviewed) abstract Background: Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic dwelling penguin species, the Adelie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri]. Results: Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology. Conclusions: Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.
4.	Blamey, Ben, et al. (author) Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit 2021 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:3, pp. 1-14 Journal article (peer-reviewed) abstract BACKGROUND: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources. FINDINGS: In our pipeline model, an "interestingness function" assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a "policy" guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools to adopt this approach. We evaluate with 2 microscopy imaging case studies. The first is a high content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope. CONCLUSIONS: Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between scientific concerns of data priority, and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be "bolted on" to new and existing systems - and is intended for use with a range of technologies in different deployment scenarios.
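The pipeline model summarized in entry 4 (an interestingness function scores each object, and a policy turns the score into a resource decision) can be illustrated with a minimal Python sketch. This is not the HASTE Toolkit API: the `DataObject` class, the `sharpness` feature, the tier names, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class DataObject:
    """Hypothetical stream item, e.g. one microscopy image with precomputed features."""
    name: str
    features: dict = field(default_factory=dict)


def interestingness(obj: DataObject) -> float:
    """Toy interestingness function returning a score clamped to [0, 1].

    Assumption: a precomputed 'sharpness' feature stands in for real image analysis.
    """
    return max(0.0, min(1.0, float(obj.features.get("sharpness", 0.0))))


def tier_policy(score: float) -> str:
    """Toy policy mapping a score to a storage tier (thresholds are arbitrary)."""
    if score >= 0.7:
        return "fast-storage"
    if score >= 0.3:
        return "cold-storage"
    return "discard"


def prioritize(
    stream: Iterable[DataObject],
    score_fn: Callable[[DataObject], float] = interestingness,
    policy: Callable[[float], str] = tier_policy,
) -> List[Tuple[str, float, str]]:
    """Score every object once, then let the policy decide what to do with it."""
    decisions = []
    for obj in stream:
        score = score_fn(obj)
        decisions.append((obj.name, score, policy(score)))
    return decisions
```

Keeping the score function and the policy as separate callables mirrors the separation the abstract describes between the scientific question of data priority and how that priority is acted on in a given deployment.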
5.	Boulund, Fredrik, 1985, et al. (author) Tentacle: distributed quantification of genes in metagenomes 2015 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4 Journal article (peer-reviewed) abstract Background: In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Findings: Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Conclusions: Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
6.	Capuccini, Marco, et al. (author) MaRe : Processing Big Data with application containers on Apache Spark 2020 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 9:5 Journal article (peer-reviewed) abstract Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have collected the largest open source community; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
7.	Craddock, RC, et al. (author) Brainhack: a collaborative workshop for the open neuroscience community 2016 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 5, p. 16- Journal article (peer-reviewed)
8.	Dahlberg, Johan, et al. (author) Arteria : An automation system for a sequencing core facility 2019 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:12 Journal article (peer-reviewed) abstract Background: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities. Findings: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/. Conclusions: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.
9.	Dahlö, Martin, et al. (author) Tracking the NGS revolution : managing life science research on shared high-performance computing clusters 2018 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 7:5 Journal article (peer-reviewed) abstract Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
10.	Dahn, Hollis A., et al. (author) Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing 2022 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 11 Journal article (peer-reviewed) abstract Background: Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types. Results: We find that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the sample preservation gold standard, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4°C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4°C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield. Conclusion: We provide sample preservation guidelines that ensure sufficient DNA integrity and amount required for use with long-read and long-range sequencing technologies across vertebrates. Our best practices generated the uHMW DNA needed for the high-quality reference genomes for phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all ~70,000 extant vertebrate species.
11.	Davies, Neil, et al. (author) The founding charter of the Genomic Observatories Network 2014 In: GigaScience. - 2047-217X. ; 3:2 Journal article (peer-reviewed) abstract The co-authors of this paper hereby state their intention to work together to launch the Genomic Observatories Network (GOs Network) for which this document will serve as its Founding Charter. We define a Genomic Observatory as an ecosystem and/or site subject to long-term scientific research, including (but not limited to) the sustained study of genomic biodiversity from single-celled microbes to multicellular organisms. An international group of 64 scientists first published the call for a global network of Genomic Observatories in January 2012. The vision for such a network was expanded in a subsequent paper and developed over a series of meetings in Bremen (Germany), Shenzhen (China), Moorea (French Polynesia), Oxford (UK), Pacific Grove (California, USA), Washington (DC, USA), and London (UK). While this community-building process continues, here we express our mutual intent to establish the GOs Network formally, and to describe our shared vision for its future. The views expressed here are ours alone as individual scientists, and do not necessarily represent those of the institutions with which we are affiliated.
12.	Emery, Samantha J., et al. (author) Differential protein expression and post-translational modifications in metronidazole-resistant Giardia duodenalis 2018 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 7:4 Journal article (peer-reviewed) abstract Background: Metronidazole (Mtz) is the frontline drug treatment for multiple anaerobic pathogens, including the gastrointestinal protist, Giardia duodenalis. However, treatment failure is common and linked to in vivo drug resistance. In Giardia, in vitro drug-resistant lines allow controlled experimental interrogation of resistance mechanisms in isogenic cultures. However, resistance-associated changes are inconsistent between lines, phenotypic data are incomplete, and resistance is rarely genetically fixed, highlighted by reversion to sensitivity after drug selection ceases or via passage through the life cycle. Comprehensive quantitative approaches are required to resolve isolate variability, fully define Mtz resistance phenotypes, and explore the role of post-translational modifications therein. Findings: We performed quantitative proteomics to describe differentially expressed proteins in 3 seminal Mtz-resistant lines compared to their isogenic, Mtz-susceptible, parental line. We also probed changes in post-translational modifications including protein acetylation, methylation, ubiquitination, and phosphorylation via immunoblotting. We quantified more than 1,000 proteins in each genotype, recording substantial genotypic variation in differentially expressed proteins between isotypes. Our data confirm substantial changes in the antioxidant network, glycolysis, and electron transport and indicate links between protein acetylation and Mtz resistance, including cross-resistance to deacetylase inhibitor trichostatin A in Mtz-resistant lines. Finally, we performed the first controlled, longitudinal study of Mtz resistance stability, monitoring lines after cessation of drug selection, revealing isolate-dependent phenotypic plasticity. Conclusions: Our data demonstrate that understanding of Mtz resistance must be broadened to post-transcriptional and post-translational responses and that Mtz resistance is polygenic, driven by isolate-dependent variation, and is correlated with changes in protein acetylation networks.
13.	Fang, C, et al. (author) Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing 2017 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 7:3, pp. 1-8 Journal article (peer-reviewed)
14.	Gan, H. M., et al. (author) Genomic evidence of neo-sex chromosomes in the eastern yellow robin 2019 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:9 Journal article (peer-reviewed) abstract Background: Understanding sex-biased natural selection can be enhanced by access to well-annotated chromosomes including ones inherited in sex-specific fashion. The eastern yellow robin (EYR) is an endemic Australian songbird inferred to have experienced climate-driven sex-biased selection and is a prominent model for studying mitochondrial-nuclear interactions in the wild. However, the lack of an EYR reference genome containing both sex chromosomes (in birds, a female bearing Z and W chromosomes) limits efforts to understand the mechanisms of these processes. Here, we assemble the genome for a female EYR and use low-depth (10x) genome resequencing data from 19 individuals of known sex to identify chromosome fragments with sex-specific inheritance. Findings: MaSuRCA hybrid assembly using Nanopore and Illumina reads generated a 1.22-Gb EYR genome in 20,702 scaffolds (94.2% BUSCO completeness). Scaffolds were tested for W-linked (female-only) inheritance using a k-mer approach, and for Z-linked inheritance using a median read-depth test in male and female reads (read-depths must indicate haploid female and diploid male representation). This resulted in 2,372 W-linked scaffolds (total length: 97,872,282 bp, N50: 81,931 bp) and 586 Z-linked scaffolds (total length: 121,817,358 bp, N50: 551,641 bp). Anchoring of the sex-linked EYR scaffolds to the reference genome of a female zebra finch revealed 2 categories of sex-linked genomic regions. First, 653 W-linked scaffolds (25.7 Mb) were anchored to the W sex chromosome and 215 Z-linked scaffolds (74.4 Mb) to the Z. Second, 1,138 W-linked scaffolds (70.9 Mb) and 179 Z-linked scaffolds (51.0 Mb) were anchored to a large section (coordinates ~5 to ~60 Mb) of zebra finch chromosome 1A. The first ~5 Mb and last ~14 Mb of the reference chromosome 1A had only autosomally behaving EYR scaffolds mapping to them. Conclusions: We report a female (W chromosome-containing) EYR genome and provide genomic evidence for a neo-sex (neo-W and neo-Z) chromosome system in the EYR, involving most of a large chromosome (1A) previously only reported to be autosomal in passerines.
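The median read-depth test for Z-linkage mentioned in entry 14 rests on a simple expectation: a Z-linked scaffold is haploid in females and diploid in males, so the female-to-male depth ratio should be near 0.5, whereas an autosomal scaffold should give a ratio near 1.0. The Python sketch below illustrates that reasoning only. It is not the authors' pipeline, it assumes depths have already been normalized for per-sample sequencing depth, and the function names and the tolerance of 0.15 are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def depth_ratio(female_depths: Sequence[float], male_depths: Sequence[float]) -> Optional[float]:
    """Return median female depth divided by median male depth, or None if undefined."""
    if not female_depths or not male_depths:
        return None
    male_median = median(male_depths)
    if male_median == 0:
        return None
    return median(female_depths) / male_median


def classify_scaffold(
    female_depths: Sequence[float],
    male_depths: Sequence[float],
    tolerance: float = 0.15,  # arbitrary illustrative cut-off, not from the paper
) -> str:
    """Label a scaffold from its normalized per-individual read depths."""
    ratio = depth_ratio(female_depths, male_depths)
    if ratio is None:
        return "undetermined"
    if abs(ratio - 0.5) <= tolerance:
        return "Z-linked candidate"  # haploid in females, diploid in males
    if abs(ratio - 1.0) <= tolerance:
        return "autosomal-like"      # equal copy number in both sexes
    return "undetermined"
```

W-linked scaffolds are not covered here, since the abstract states they were identified with a separate k-mer approach.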
15.	Gonzalez-Beltran, AN, et al. (author) Community standards for open cell migration data 2020 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 9:5 Journal article (peer-reviewed) abstract Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
16.	Grüning, Björn A., et al. (author) Software engineering for scientific big data analysis 2019 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:5 Research review (peer-reviewed) abstract The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.
17.	Jarvis, Erich D., et al. (author) Phylogenomic analyses data of the avian phylogenomics project 2015 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4 Journal article (peer-reviewed) abstract Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
18.	Johnson, David, et al. (author) ISA API : An open platform for interoperable life science experimental metadata 2021 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:9 Journal article (peer-reviewed) abstract Background. The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. Results. In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. Conclusions. The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
19.	Kuderna, Lukas F. K., et al. (author) A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0) 2017 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 6:11, pp. 1-6 Journal article (peer-reviewed) abstract The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan tro 2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan tro 2.1.4) by several metrics, such as increased contiguity by > 750% and 300% on contigs and scaffolds, respectively, and closure of 77% of gaps in the Pan tro 2.1.4 assembly gaps spanning > 850 Kbp of the novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions to human in Pan tro 2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
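Entry 19 reports contiguity as contig N50, which is also the metric quoted for the scaffold sets in entry 14. N50 is the length L such that contigs of length L or longer together contain at least half of the total assembled bases. A minimal Python sketch of that standard definition follows; the function name and the example lengths are illustrative and not taken from any of the listed papers.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Contigs are visited from longest to shortest. N50 is the length of the
    contig at which the running total first reaches half of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running total always reaches the total")


# Example with made-up lengths: the total is 54, so half is 27,
# and 10 + 9 + 8 = 27 reaches it at the contig of length 8.
example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about whether the sequence is assembled correctly.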
20.	Lampa, Samuel, et al. (author) SciPipe : A workflow library for agile development of complex and dynamic bioinformatics pipelines 2019 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:5 Journal article (peer-reviewed) abstract Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
21.	Levitis, E, et al. (author) Centering inclusivity in the design of online conferences: An OHBM-Open Science perspective 2021 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 10:8 Journal article (peer-reviewed) abstract As the global health crisis unfolded, many academic conferences moved online in 2020. This move has been hailed as a positive step towards inclusivity in its attenuation of economic, physical, and legal barriers and effectively enabled many individuals from groups that have traditionally been underrepresented to join and participate. A number of studies have outlined how moving online made it possible to gather a more global community and has increased opportunities for individuals with various constraints, e.g., caregiving responsibilities. Yet, the mere existence of online conferences is no guarantee that everyone can attend and participate meaningfully. In fact, many elements of an online conference are still significant barriers to truly diverse participation: the tools used can be inaccessible for some individuals; the scheduling choices can favour some geographical locations; the set-up of the conference can provide more visibility to well-established researchers and reduce opportunities for early-career researchers. While acknowledging the benefits of an online setting, especially for individuals who have traditionally been underrepresented or excluded, we recognize that fostering social justice requires inclusivity to actively be centered in every aspect of online conference design. Here, we draw from the literature and from our own experiences to identify practices that purposefully encourage a diverse community to attend, participate in, and lead online conferences. Reflecting on how to design more inclusive online events is especially important as multiple scientific organizations have announced that they will continue offering an online version of their event when in-person conferences can resume.
22.	Litjens, Geert, et al. (author) A Decade of GigaScience : The Challenges of Gigapixel Pathology Images 2022 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 11 Journal article (peer-reviewed) abstract In the last decade, the field of computational pathology has advanced at a rapid pace because of the availability of deep neural networks, which achieved their first successes in computer vision tasks in 2012. An important driver for the progress of the field were public competitions, so-called Grand Challenges, in which increasingly large data sets were offered to the public to solve clinically relevant tasks. Going from the first Pathology challenges, which had data obtained from 23 patients, to current challenges sharing data of thousands of patients, performance of developed deep learning solutions has reached (and sometimes surpassed) the level of experienced pathologists for specific tasks. We expect future challenges to broaden the horizon, for instance by combining data from radiology, pathology and tumor genetics, and to extract prognostic and predictive information independent of currently used grading schemes.
23.	Mueller, Ralf C., et al. (author) A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck) 2021 In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:12 Journal article (peer-reviewed) abstract Background: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such endeavour requires a high-quality genome assembly and transcriptome. Findings: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. Conclusions: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.
24.	Nowell, Reuben W., et al. (author) A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana 2017 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 6:7 Journal article (peer-reviewed) abstract The mycalesine butterfly Bicyclus anynana, the Squinting bush brown, is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (approximately 260x assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html).
25.	Olsen, Remi-Andre, et al. (author) De novo assembly of Dekkera bruxellensis : a multi-technology approach using short- and long-read sequencing and optical mapping 2015 In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4 Journal article (peer-reviewed) abstract Background: It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome. Methods: In this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. Results: We obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5% of the optical mapped genome not covered by the Illumina data.
