SwePub

Result list for search "L773:2047 217X"

Search: L773:2047 217X

  • Results 1-45 of 45
1.
  • Bradnam, K. R., et al. (author)
  • Assemblathon 2 : Evaluating de novo methods of genome assembly in three vertebrate species
  • 2013
  • In: GigaScience. - : BioMed Central (BMC). - 2047-217X. ; 2:1
  • Journal article (peer-reviewed)
    • Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
  •  
2.
  • Lampa, Samuel, et al. (author)
  • Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data
  • 2013
  • In: GigaScience. - 2047-217X. ; 2:1, pp. 1-10
  • Journal article (peer-reviewed)
    • Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.
  •  
3.
  • Li, Cai, et al. (author)
  • Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment
  • 2014
  • In: GigaScience. - 2047-217X. ; 3
  • Journal article (peer-reviewed)
    • Background: Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic dwelling penguin species, the Adelie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri]. Results: Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology. Conclusions: Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.
  •  
4.
  • Blamey, Ben, et al. (author)
  • Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit
  • 2021
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:3, pp. 1-14
  • Journal article (peer-reviewed)
    • BACKGROUND: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources. FINDINGS: In our pipeline model, an "interestingness function" assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a "policy" guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools to adopt this approach. We evaluate with 2 microscopy imaging case studies. The first is a high content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope. CONCLUSIONS: Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between scientific concerns of data priority, and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be "bolted on" to new and existing systems - and is intended for use with a range of technologies in different deployment scenarios.
  •  
5.
  • Boulund, Fredrik, 1985, et al. (author)
  • Tentacle: distributed quantification of genes in metagenomes
  • 2015
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4
  • Journal article (peer-reviewed)
    • Background: In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Findings: Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Conclusions: Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
  •  
6.
  • Capuccini, Marco, et al. (author)
  • MaRe : Processing Big Data with application containers on Apache Spark
  • 2020
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 9:5
  • Journal article (peer-reviewed)
    • Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have collected the largest open source community; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
  •  
7.
  •  
8.
  • Dahlberg, Johan, et al. (author)
  • Arteria : An automation system for a sequencing core facility
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:12
  • Journal article (peer-reviewed)
    • Background: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities. Findings: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/. Conclusions: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.
  •  
9.
  • Dahlö, Martin, et al. (author)
  • Tracking the NGS revolution : managing life science research on shared high-performance computing clusters
  • 2018
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 7:5
  • Journal article (peer-reviewed)
    • Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
  •  
10.
  • Dahn, Hollis A., et al. (author)
  • Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing
  • 2022
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 11
  • Journal article (peer-reviewed)
    • Background: Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types. Results: We find that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the sample preservation gold standard, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4°C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4°C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield. Conclusion: We provide sample preservation guidelines that ensure sufficient DNA integrity and amount required for use with long-read and long-range sequencing technologies across vertebrates. Our best practices generated the uHMW DNA needed for the high-quality reference genomes for phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all ~70,000 extant vertebrate species.
  •  
11.
  • Davies, Neil, et al. (author)
  • The founding charter of the Genomic Observatories Network
  • 2014
  • In: GigaScience. - 2047-217X. ; 3:2
  • Journal article (peer-reviewed)
    • The co-authors of this paper hereby state their intention to work together to launch the Genomic Observatories Network (GOs Network) for which this document will serve as its Founding Charter. We define a Genomic Observatory as an ecosystem and/or site subject to long-term scientific research, including (but not limited to) the sustained study of genomic biodiversity from single-celled microbes to multicellular organisms. An international group of 64 scientists first published the call for a global network of Genomic Observatories in January 2012. The vision for such a network was expanded in a subsequent paper and developed over a series of meetings in Bremen (Germany), Shenzhen (China), Moorea (French Polynesia), Oxford (UK), Pacific Grove (California, USA), Washington (DC, USA), and London (UK). While this community-building process continues, here we express our mutual intent to establish the GOs Network formally, and to describe our shared vision for its future. The views expressed here are ours alone as individual scientists, and do not necessarily represent those of the institutions with which we are affiliated.
  •  
12.
  • Emery, Samantha J., et al. (author)
  • Differential protein expression and post-translational modifications in metronidazole-resistant Giardia duodenalis
  • 2018
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 7:4
  • Journal article (peer-reviewed)
    • Background: Metronidazole (Mtz) is the frontline drug treatment for multiple anaerobic pathogens, including the gastrointestinal protist, Giardia duodenalis. However, treatment failure is common and linked to in vivo drug resistance. In Giardia, in vitro drug-resistant lines allow controlled experimental interrogation of resistance mechanisms in isogenic cultures. However, resistance-associated changes are inconsistent between lines, phenotypic data are incomplete, and resistance is rarely genetically fixed, highlighted by reversion to sensitivity after drug selection ceases or via passage through the life cycle. Comprehensive quantitative approaches are required to resolve isolate variability, fully define Mtz resistance phenotypes, and explore the role of post-translational modifications therein. Findings: We performed quantitative proteomics to describe differentially expressed proteins in 3 seminal Mtz-resistant lines compared to their isogenic, Mtz-susceptible, parental line. We also probed changes in post-translational modifications including protein acetylation, methylation, ubiquitination, and phosphorylation via immunoblotting. We quantified more than 1,000 proteins in each genotype, recording substantial genotypic variation in differentially expressed proteins between isotypes. Our data confirm substantial changes in the antioxidant network, glycolysis, and electron transport and indicate links between protein acetylation and Mtz resistance, including cross-resistance to deacetylase inhibitor trichostatin A in Mtz-resistant lines. Finally, we performed the first controlled, longitudinal study of Mtz resistance stability, monitoring lines after cessation of drug selection, revealing isolate-dependent phenotypic plasticity. Conclusions: Our data demonstrate that the understanding of Mtz resistance must be broadened to post-transcriptional and post-translational responses and that Mtz resistance is polygenic, driven by isolate-dependent variation, and is correlated with changes in protein acetylation networks.
  •  
13.
  •  
14.
  • Gan, H. M., et al. (author)
  • Genomic evidence of neo-sex chromosomes in the eastern yellow robin
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:9
  • Journal article (peer-reviewed)
    • Background: Understanding sex-biased natural selection can be enhanced by access to well-annotated chromosomes including ones inherited in sex-specific fashion. The eastern yellow robin (EYR) is an endemic Australian songbird inferred to have experienced climate-driven sex-biased selection and is a prominent model for studying mitochondrial-nuclear interactions in the wild. However, the lack of an EYR reference genome containing both sex chromosomes (in birds, a female bearing Z and W chromosomes) limits efforts to understand the mechanisms of these processes. Here, we assemble the genome for a female EYR and use low-depth (10x) genome resequencing data from 19 individuals of known sex to identify chromosome fragments with sex-specific inheritance. Findings: MaSuRCA hybrid assembly using Nanopore and Illumina reads generated a 1.22-Gb EYR genome in 20,702 scaffolds (94.2% BUSCO completeness). Scaffolds were tested for W-linked (female-only) inheritance using a k-mer approach, and for Z-linked inheritance using median read-depth test in male and female reads (read-depths must indicate haploid female and diploid male representation). This resulted in 2,372 W-linked scaffolds (total length: 97,872,282 bp, N50: 81,931 bp) and 586 Z-linked scaffolds (total length: 121,817,358 bp, N50: 551,641 bp). Anchoring of the sex-linked EYR scaffolds to the reference genome of a female zebra finch revealed 2 categories of sex-linked genomic regions. First, 653 W-linked scaffolds (25.7 Mb) were anchored to the W sex chromosome and 215 Z-linked scaffolds (74.4 Mb) to the Z. Second, 1,138 W-linked scaffolds (70.9 Mb) and 179 Z-linked scaffolds (51.0 Mb) were anchored to a large section (coordinates ~5 to ~60 Mb) of zebra finch chromosome 1A. The first ~5 Mb and last ~14 Mb of the reference chromosome 1A had only autosomally behaving EYR scaffolds mapping to them. Conclusions: We report a female (W chromosome-containing) EYR genome and provide genomic evidence for a neo-sex (neo-W and neo-Z) chromosome system in the EYR, involving most of a large chromosome (1A) previously only reported to be autosomal in passerines.
  •  
15.
  • Gaser, Christian, et al. (author)
  • CAT : a computational anatomy toolbox for the analysis of structural MRI data
  • 2024
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 13, pp. 1-13
  • Journal article (peer-reviewed)
    • A large range of sophisticated brain image analysis tools have been developed by the neuroscience community, greatly advancing the field of human brain mapping. Here we introduce the Computational Anatomy Toolbox (CAT), a powerful suite of tools for brain morphometric analyses with an intuitive graphical user interface but also usable as a shell script. CAT is suitable for beginners, casual users, experts, and developers alike, providing a comprehensive set of analysis options, workflows, and integrated pipelines. The available analysis streams, illustrated on an example dataset, allow for voxel-based, surface-based, and region-based morphometric analyses. Notably, CAT incorporates multiple quality control options and covers the entire analysis workflow, including the preprocessing of cross-sectional and longitudinal data, statistical analysis, and the visualization of results. The overarching aim of this article is to provide a complete description and evaluation of CAT while offering a citable standard for the neuroscience community.
  •  
16.
  • Gonzalez-Beltran, AN, et al. (author)
  • Community standards for open cell migration data
  • 2020
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 9:5
  • Journal article (peer-reviewed)
    • Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
  •  
17.
  • Grüning, Björn A., et al. (author)
  • Software engineering for scientific big data analysis
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:5
  • Research review (peer-reviewed)
    • The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.
  •  
18.
  • Jarvis, Erich D., et al. (author)
  • Phylogenomic analyses data of the avian phylogenomics project
  • 2015
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4
  • Journal article (peer-reviewed)
    • Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
  •  
19.
  • Johnson, David, et al. (author)
  • ISA API : An open platform for interoperable life science experimental metadata
  • 2021
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:9
  • Journal article (peer-reviewed)
    • Background. The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. Results. In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. Conclusions. The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
  •  
20.
  • Kuderna, Lukas F. K., et al. (author)
  • A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0)
  • 2017
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 6:11, pp. 1-6
  • Journal article (peer-reviewed)
    • The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan tro 2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan tro 2.1.4) by several metrics, such as increased contiguity by > 750% and 300% on contigs and scaffolds, respectively, and closure of 77% of gaps in the Pan tro 2.1.4 assembly gaps spanning > 850 Kbp of the novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions to human in Pan tro 2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
  •  
21.
  • Lampa, Samuel, et al. (author)
  • SciPipe : A workflow library for agile development of complex and dynamic bioinformatics pipelines
  • 2019
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 8:5
  • Journal article (peer-reviewed). Abstract:
    • Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
  •  
22.
  • Levitis, E, et al. (author)
  • Centering inclusivity in the design of online conferences: An OHBM-Open Science perspective
  • 2021
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 10:8
  • Journal article (peer-reviewed). Abstract:
    • As the global health crisis unfolded, many academic conferences moved online in 2020. This move has been hailed as a positive step towards inclusivity in its attenuation of economic, physical, and legal barriers and effectively enabled many individuals from groups that have traditionally been underrepresented to join and participate. A number of studies have outlined how moving online made it possible to gather a more global community and has increased opportunities for individuals with various constraints, e.g., caregiving responsibilities. Yet, the mere existence of online conferences is no guarantee that everyone can attend and participate meaningfully. In fact, many elements of an online conference are still significant barriers to truly diverse participation: the tools used can be inaccessible for some individuals; the scheduling choices can favour some geographical locations; the set-up of the conference can provide more visibility to well-established researchers and reduce opportunities for early-career researchers. While acknowledging the benefits of an online setting, especially for individuals who have traditionally been underrepresented or excluded, we recognize that fostering social justice requires inclusivity to actively be centered in every aspect of online conference design. Here, we draw from the literature and from our own experiences to identify practices that purposefully encourage a diverse community to attend, participate in, and lead online conferences. Reflecting on how to design more inclusive online events is especially important as multiple scientific organizations have announced that they will continue offering an online version of their event when in-person conferences can resume.
  •  
23.
  • Litjens, Geert, et al. (author)
  • A Decade of GigaScience : The Challenges of Gigapixel Pathology Images
  • 2022
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 11
  • Journal article (peer-reviewed). Abstract:
    • In the last decade, the field of computational pathology has advanced at a rapid pace because of the availability of deep neural networks, which achieved their first successes in computer vision tasks in 2012. Important drivers of the field's progress were public competitions, so-called Grand Challenges, in which increasingly large data sets were offered to the public to solve clinically relevant tasks. Going from the first pathology challenges, which had data obtained from 23 patients, to current challenges sharing data of thousands of patients, performance of developed deep learning solutions has reached (and sometimes surpassed) the level of experienced pathologists for specific tasks. We expect future challenges to broaden the horizon, for instance by combining data from radiology, pathology and tumor genetics, and to extract prognostic and predictive information independent of currently used grading schemes.
  •  
24.
  • Mueller, Ralf C., et al. (author)
  • A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)
  • 2021
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:12
  • Journal article (peer-reviewed). Abstract:
    • Background: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such endeavour requires a high-quality genome assembly and transcriptome. Findings: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. Conclusions: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.
  •  
25.
  • Nowell, Reuben W., et al. (author)
  • A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana
  • 2017
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 6:7
  • Journal article (peer-reviewed). Abstract:
    • The mycalesine butterfly Bicyclus anynana, the Squinting bush brown, is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (approximately 260× assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html).
  •  
26.
  • Olsen, Remi-Andre, et al. (author)
  • De novo assembly of Dekkera bruxellensis : a multi technology approach using short and long-read sequencing and optical mapping
  • 2015
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4
  • Journal article (peer-reviewed). Abstract:
    • Background: It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers), it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome. Methods: In this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. Results: We obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5% of the optical mapped genome not covered by the Illumina data.
  •  
27.
  • Opgenoorth, Lars, et al. (author)
  • The GenTree Platform : growth traits and tree-level environmental data in 12 European forest tree species
  • 2021
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 10:3
  • Journal article (peer-reviewed). Abstract:
    • Background: Progress in the field of evolutionary forest ecology has been hampered by the huge challenge of phenotyping trees across their ranges in their natural environments, and the limitation in high-resolution environmental information. Findings: The GenTree Platform contains phenotypic and environmental data from 4,959 trees from 12 ecologically and economically important European forest tree species: Abies alba Mill. (silver fir), Betula pendula Roth. (silver birch), Fagus sylvatica L. (European beech), Picea abies (L.) H. Karst (Norway spruce), Pinus cembra L. (Swiss stone pine), Pinus halepensis Mill. (Aleppo pine), Pinus nigra Arnold (European black pine), Pinus pinaster Aiton (maritime pine), Pinus sylvestris L. (Scots pine), Populus nigra L. (European black poplar), Taxus baccata L. (English yew), and Quercus petraea (Matt.) Liebl. (sessile oak). Phenotypic (height, diameter at breast height, crown size, bark thickness, biomass, straightness, forking, branch angle, fructification), regeneration, environmental in situ measurements (soil depth, vegetation cover, competition indices), and environmental modeling data extracted by using bilinear interpolation accounting for surrounding conditions of each tree (precipitation, temperature, insolation, drought indices) were obtained from trees in 194 sites covering the species' geographic ranges and reflecting local environmental gradients. Conclusion: The GenTree Platform is a new resource for investigating ecological and evolutionary processes in forest trees. The coherent phenotyping and environmental characterization across 12 species in their European ranges allow for a wide range of analyses from forest ecologists, conservationists, and macro-ecologists. Also, the data presented here can be linked to the GenTree Dendroecological collection, the GenTree Leaf Trait collection, and the GenTree Genomic collection presented elsewhere, which together build the largest evolutionary forest ecology data collection available.
  •  
28.
  • Pardos-Blas, José Ramón, et al. (author)
  • The genome of the venomous snail Lautoconus ventricosus sheds light on the origin of conotoxin diversity
  • 2021
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 10:5, pp. 1-15
  • Journal article (peer-reviewed). Abstract:
    • Background: Venoms are deadly weapons to subdue prey or deter predators that have evolved independently in many animal lineages. The genomes of venomous animals are essential to understand the evolutionary mechanisms involved in the origin and diversification of venoms. Results: Here, we report the chromosome-level genome of the venomous Mediterranean cone snail, Lautoconus ventricosus (Caenogastropoda: Conidae). The total size of the assembly is 3.59 Gb; it has high contiguity (N50 = 93.53 Mb) and 86.6 Mb of the genome assembled into the 35 largest scaffolds or pseudochromosomes. On the basis of venom gland transcriptomes, we annotated 262 complete genes encoding conotoxin precursors, hormones, and other venom-related proteins. These genes were scattered in the different pseudochromosomes and located within repetitive regions. The genes encoding conotoxin precursors were normally structured into 3 exons, which did not necessarily coincide with the 3 structural domains of the corresponding proteins. Additionally, we found evidence in the L. ventricosus genome for a past whole-genome duplication event by means of conserved gene synteny with the Pomacea canaliculata genome, the only one available at the chromosome level within Caenogastropoda. The whole-genome duplication event was further confirmed by the presence of a duplicated hox gene cluster. Key genes for gastropod biology including those encoding proteins related to development, shell formation, and sex were located in the genome. Conclusions: The new high-quality L. ventricosus genome should become a reference for assembling and analyzing new gastropod genomes and will contribute to future evolutionary genomic studies among venomous animals.
  •  
29.
  •  
30.
  •  
31.
  • Prost, Stefan, et al. (author)
  • Comparative analyses identify genomic features potentially involved in the evolution of birds-of-paradise
  • 2019
  • In: GigaScience. - : OXFORD UNIV PRESS. - 2047-217X. ; 8:5
  • Journal article (peer-reviewed). Abstract:
    • The diverse array of phenotypes and courtship displays exhibited by birds-of-paradise have long fascinated scientists and nonscientists alike. Remarkably, almost nothing is known about the genomics of this iconic radiation. There are 41 species in 16 genera currently recognized within the birds-of-paradise family (Paradisaeidae), most of which are endemic to the island of New Guinea. In this study, we sequenced genomes of representatives from all five major clades within this family to characterize genomic changes that may have played a role in the evolution of the group's extensive phenotypic diversity. We found genes important for coloration, morphology, and feather and eye development to be under positive selection. In birds-of-paradise with complex lekking systems and strong sexual dimorphism, the core birds-of-paradise, we found Gene Ontology categories for "startle response" and "olfactory receptor activity" to be enriched among the gene families expanding significantly faster compared to the other birds in our study. Furthermore, we found novel families of retrovirus-like retrotransposons active in all three de novo genomes since the early diversification of the birds-of-paradise group, which might have played a role in the evolution of this fascinating group of birds.
  •  
32.
  • Siretskiy, Alexey, et al. (author)
  • A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data
  • 2015
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4
  • Journal article (peer-reviewed). Abstract:
    • Background: New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. Results: In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. Conclusions: From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources. We also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
  •  
33.
  • Smolander, Olli Pekka, et al. (author)
  • Improved chromosome-level genome assembly of the Glanville fritillary butterfly (Melitaea cinxia) integrating Pacific Biosciences long reads and a high-density linkage map
  • 2022
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 11
  • Journal article (peer-reviewed). Abstract:
    • Background: The Glanville fritillary (Melitaea cinxia) butterfly is a model system for metapopulation dynamics research in fragmented landscapes. Here, we provide a chromosome-level assembly of the butterfly's genome produced from Pacific Biosciences sequencing of a pool of males, combined with a linkage map from population crosses. Results: The final assembly size of 484 Mb is an increase of 94 Mb on the previously published genome. Estimation of the completeness of the genome with BUSCO indicates that the genome contains 92-94% of the BUSCO genes in complete and single copies. We predicted 14,810 genes using the MAKER pipeline and manually curated 1,232 of these gene models. Conclusions: The genome and its annotated gene models are a valuable resource for future comparative genomics, molecular biology, transcriptome, and genetics studies on this species.
  •  
34.
  • Soranno, Patricia A., et al. (author)
  • LAGOS-NE : A multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of U.S. lakes
  • 2017
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 6:12, pp. 1-22
  • Journal article (peer-reviewed). Abstract:
    • Understanding the factors that affect water quality and the ecological services provided by freshwater ecosystems is an urgent global environmental issue. Predicting how water quality will respond to global changes not only requires water quality data, but also information about the ecological context of individual water bodies across broad spatial extents. Because lake water quality is usually sampled in limited geographic regions, often for limited time periods, assessing the environmental controls of water quality requires compilation of many data sets across broad regions and across time into an integrated database. LAGOS-NE accomplishes this goal for lakes in the northeastern-most 17 US states. LAGOS-NE contains data for 51 101 lakes and reservoirs larger than 4 ha in 17 lake-rich US states. The database includes 3 data modules for: lake location and physical characteristics for all lakes; ecological context (i.e., the land use, geologic, climatic, and hydrologic setting of lakes) for all lakes; and in situ measurements of lake water quality for a subset of the lakes from the past 3 decades for approximately 2600–12 000 lakes depending on the variable. The database contains approximately 150 000 measures of total phosphorus, 200 000 measures of chlorophyll, and 900 000 measures of Secchi depth. The water quality data were compiled from 87 lake water quality data sets from federal, state, tribal, and non-profit agencies, university researchers, and citizen scientists. This database is one of the largest and most comprehensive databases of its type because it includes both in situ measurements and ecological context data. Because ecological context can be used to study a variety of other questions about lakes, streams, and wetlands, this database can also be used as the foundation for other studies of freshwaters at broad spatial and ecological scales.
  •  
35.
  • Spjuth, Ola, et al. (author)
  • Recommendations on e-infrastructures for next-generation sequencing
  • 2016
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 5
  • Research review (peer-reviewed). Abstract:
    • With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.
  •  
36.
  • Sternes, Peter R., et al. (author)
  • A combined meta-barcoding and shotgun metagenomic analysis of spontaneous wine fermentation
  • 2017
  • In: GigaScience. - : OXFORD UNIV PRESS. - 2047-217X. ; 6:7, pp. 1-30
  • Journal article (peer-reviewed). Abstract:
    • Background: Wine is a complex beverage, comprising hundreds of metabolites produced through the action of yeasts and bacteria in fermenting grape must. Commercially, there is now a growing trend away from using wine yeast (Saccharomyces) starter cultures, towards the historic practice of uninoculated or "wild" fermentation, where the yeasts and bacteria associated with the grapes and/or winery perform the fermentation. It is the varied metabolic contributions of these numerous non-Saccharomyces species that are thought to impart complexity and desirable taste and aroma attributes to wild ferments in comparison to their inoculated counterparts. Results: To map the microflora of spontaneous fermentation, metagenomic techniques were employed to characterize and monitor the progression of fungal species in five different wild fermentations. Both amplicon-based ribosomal DNA internal transcribed spacer (ITS) phylotyping and shotgun metagenomics were used to assess community structure across different stages of fermentation. While providing a sensitive and highly accurate means of characterizing the wine microbiome, the shotgun metagenomic data also uncovered a significant over-abundance bias in the ITS phylotyping abundance estimations for the common non-Saccharomyces wine yeast genus Metschnikowia. Conclusions: By identifying biases such as that observed for Metschnikowia, abundance measurements from future ITS-phylotyping datasets can be corrected to provide more accurate species representation. Ultimately, as more shotgun metagenomic and single-strain de novo assemblies for key wine species become available, the accuracy of both ITS-amplicon and shotgun studies will greatly increase, providing a powerful methodology for deciphering the influence of the microbial community on the wine flavor and aroma.
  •  
37.
  • Tedersoo, Leho, et al. (author)
  • Standardizing metadata and taxonomic identification in metabarcoding studies
  • 2015
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 4
  • Journal article (peer-reviewed). Abstract:
    • High-throughput sequencing-based metabarcoding studies produce vast amounts of ecological data, but a lack of consensus on standardization of metadata and how to refer to the species recovered severely hampers reanalysis and comparisons among studies. Here we propose an automated workflow covering data submission, compression, storage and public access to allow easy data retrieval and inter-study communication. Such standardized and readily accessible datasets facilitate data management, taxonomic comparisons and compilation of global metastudies.
  •  
38.
  • Tesarova, Marketa, et al. (author)
  • Living in darkness : Exploring adaptation of Proteus anguinus in 3 dimensions by X-ray imaging
  • 2022
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 11
  • Journal article (peer-reviewed). Abstract:
    • Background: Lightless caves can harbour a wide range of living organisms. Cave animals have evolved a set of morphological, physiological, and behavioural adaptations known as troglomorphisms, enabling their survival in the perpetual darkness, narrow temperature and humidity ranges, and nutrient scarcity of the subterranean environment. In this study, we focused on adaptations of skull shape and sensory systems in the blind cave salamander, Proteus anguinus, also known as olm or simply proteus, the largest cave tetrapod and the only European amphibian living exclusively in subterranean environments. This extraordinary amphibian compensates for the loss of sight by enhanced non-visual sensory systems including mechanoreceptors, electroreceptors, and chemoreceptors. We compared developmental stages of P. anguinus with Ambystoma mexicanum, also known as axolotl, to make an exemplary comparison between cave- and surface-dwelling paedomorphic salamanders. Findings: We used contrast-enhanced X-ray computed microtomography for the 3D segmentation of the soft tissues in the head of P. anguinus and A. mexicanum. Sensory organs were visualized to elucidate how the animal is adapted to living in complete darkness. X-ray microCT datasets were provided along with 3D models for larval, juvenile, and adult specimens, showing the cartilage of the chondrocranium and the position, shape, and size of the brain, eyes, and olfactory epithelium. Conclusions: P. anguinus still keeps some of its secrets. Our high-resolution X-ray microCT scans together with 3D models of the anatomical structures in the head may help to elucidate the nature and origin of the mechanisms behind its adaptations to the subterranean environment, which led to a series of troglomorphisms.
  •  
39.
  • Tigabu, Mulualem (author)
  • The Manchurian Walnut Genome: Insights into Juglone and Lipid Biosynthesis
  • 2022
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 11
  • Journal article (peer-reviewed). Abstract:
    • Background: Manchurian walnut (Juglans mandshurica Maxim.) is a tree with multiple industrial uses and medicinal properties in the Juglandaceae family (walnuts and hickories). J. mandshurica produces juglone, which is a toxic allelopathic agent and has potential utilization value. Furthermore, the seed of J. mandshurica is rich in various unsaturated fatty acids and has high nutritive value. Findings: Here, we present a high-quality chromosome-scale reference genome assembly and annotation for J. mandshurica (n = 16) with a contig N50 of 21.4 Mb by combining PacBio high-fidelity reads with high-throughput chromosome conformation capture data. The assembled genome has an estimated sequence size of 548.7 Mb and consists of 657 contigs, 623 scaffolds, and 40,453 protein-coding genes. In total, 60.99% of the assembled genome consists of repetitive sequences. Sixteen super-scaffolds corresponding to the 16 chromosomes were assembled, with a scaffold N50 length of 33.7 Mb and a BUSCO complete gene percentage of 98.3%. J. mandshurica displays a close sequence relationship with Juglans cathayensis, with a divergence time of 13.8 million years ago. Combining the high-quality genome, transcriptome, and metabolomics data, we constructed a gene-to-metabolite network and identified 566 core and conserved differentially expressed genes, which may be involved in juglone biosynthesis. Five CYP450 genes were found that may contribute to juglone accumulation. NAC, bZip, NF-YA, and NF-YC are positively correlated with the juglone content. Some candidate regulators (e.g., FUS3, ABI3, LEC2, and WRI1 transcription factors) involved in the regulation of lipid biosynthesis were also identified. Conclusions: Our genomic data provide new insights into the evolution of the walnut genome and create a new platform for accelerating molecular breeding and improving the comprehensive utilization of these economically important tree species.
  •  
40.
  •  
41.
  • Trac, QT, et al. (author)
  • Discovery of druggable cancer-specific pathways with application in acute myeloid leukemia
  • 2022
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 11
  • Journal article (peer-reviewed). Abstract:
    • An individualized cancer therapy is ideally chosen to target the cancer’s driving biological pathways, but identifying such pathways is challenging because of their underlying heterogeneity and there is no guarantee that they are druggable. We hypothesize that a cancer with an activated druggable cancer-specific pathway (DCSP) is more likely to respond to the relevant drug. Here we develop and validate a systematic method to search for such DCSPs, by (i) introducing a pathway activation score (PAS) that integrates cancer-specific driver mutations and gene expression profile and drug-specific gene targets, (ii) applying the method to identify DCSPs from pan-cancer datasets, and (iii) analyzing the correlation between PAS and the response to relevant drugs. In total, 4,794 DCSPs from 23 different cancers have been discovered in the Genomics of Drug Sensitivity in Cancer database and validated in The Cancer Genome Atlas database. Supporting the hypothesis, for the DCSPs in acute myeloid leukemia, cancers with higher PASs are shown to have stronger drug response, and this is validated in the BeatAML cohort. All DCSPs are publicly available at https://www.meb.ki.se/shiny/truvu/DCSP/.
  •  
42.
  • Vasicek, Jakub, et al. (author)
  • Finding haplotypic signatures in proteins
  • 2023
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 12
  • Journal article (peer-reviewed)
    • Background: The nonrandom distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. Consequently, the impact of haplotypic variation on the results of proteomic searches and the discoverability of peptides specific to haplotypes remain unknown. Findings: Here, we study how common genetic haplotypes influence the proteomic search space and investigate the possibility to match peptides containing multiple amino acid substitutions to a publicly available data set of mass spectra. We found that for 12.42% of the discoverable amino acid substitutions encoded by common haplotypes, 2 or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 352 spectra that matched to such multivariant peptides, and out of the 4,582 amino acid substitutions identified, 6.37% were covered by multivariant peptides. However, the evaluation of the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches. Conclusions: As these procedures become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation, across tissues and time.
  •  
43.
  • Wu, L., et al. (author)
  • The global catalogue of microorganisms 10K type strain sequencing project: closing the genomic gaps for the validly published prokaryotic and fungi species
  • 2018
  • In: GigaScience. - : Oxford University Press (OUP). - 2047-217X. ; 7:5
  • Journal article (peer-reviewed)
    • Genomic information is essential for taxonomic, phylogenetic, and functional studies to comprehensively decipher the characteristics of microorganisms, to explore microbiomes through metagenomics, and to answer fundamental questions of nature and human life. However, large gaps remain in the available genomic sequencing information published for bacterial and archaeal species, and the gaps are even larger for fungal type strains. The Global Catalogue of Microorganisms (GCM) leads an internationally coordinated effort to sequence type strains and close gaps in the genomic maps of microorganisms. Hence, the GCM aims to promote research by deep-mining genomic data.
  •  
44.
  • Xu, Chao-Qun, et al. (author)
  • Genome sequence of Malania oleifera, a tree with great value for nervonic acid production
  • 2019
  • In: GigaScience. - : Oxford University Press. - 2047-217X. ; 8:2
  • Journal article (peer-reviewed)
    • Background: Malania oleifera, a member of the Olacaceae family, is an IUCN red listed tree, endemic and restricted to the Karst region of southwest China. This tree's seed is valued for its high content of precious fatty acids (especially nervonic acid). However, studies on its genetic makeup and fatty acid biogenesis are severely hampered by a lack of molecular and genetic tools. Findings: We generated 51 Gb and 135 Gb of raw DNA sequences, using Pacific Biosciences (PacBio) single-molecule real-time and 10x Genomics sequencing, respectively. A final genome assembly, with a scaffold N50 size of 4.65 Mb and a total length of 1.51 Gb, was obtained by primary assembly based on PacBio long reads plus scaffolding with 10x Genomics reads. Identified repeats constituted approximately 82% of the genome, and 24,064 protein-coding genes were predicted with high support. The genome has low heterozygosity and shows no evidence for recent whole genome duplication. Metabolic pathway genes relating to the accumulation of long-chain fatty acid were identified and studied in detail. Conclusions: Here, we provide the first genome assembly and gene annotation for M. oleifera. The availability of these resources will be of great importance for conservation biology and for the functional genomics of nervonic acid biosynthesis.
  •  
45.
  • Zhang, Zebin, et al. (author)
  • Whole-genome resequencing reveals signatures of selection and timing of duck domestication
  • 2018
  • In: GigaScience. - : OXFORD UNIV PRESS. - 2047-217X. ; 7:4
  • Journal article (peer-reviewed)
    • Background: The genetic basis of animal domestication remains poorly understood, and systems with substantial phenotypic differences between wild and domestic populations are useful for elucidating the genetic basis of adaptation to new environments as well as the genetic basis of rapid phenotypic change. Here, we sequenced the whole genome of 78 individual ducks, from two wild and seven domesticated populations, with an average sequencing depth of 6.42X per individual. Results: Our population and demographic analyses indicate a complex history of domestication, with early selection for separate meat and egg lineages. Genomic comparison of wild to domesticated populations suggests that genes that affect brain and neuronal development have undergone strong positive selection during domestication. Our FST analysis also indicates that the duck white plumage is the result of selection at the melanogenesis-associated transcription factor locus. Conclusions: Our results advance the understanding of animal domestication and selection for complex phenotypic traits.
  •  
Publication type
journal article (43)
research review (2)
Content type
peer-reviewed (45)
Author/editor
Wang, J. (2)
Hellander, Andreas (2)
Zhang, Yan (1)
Chen, X. (1)
Howard, J. (1)
Li, Y. (1)
Liu, B. (1)
Liu, Y. (1)
Zhang, H. (1)
Li, X. (1)
Zhou, Y. (1)
Liu, S. (1)
Yuan, J. (1)
Zhang, G (1)
Lee, J. S. (1)
Leo, S. (1)
Song, H. (1)
Zhou, S. (1)
Larsson, Anders (1)
Abalde, Samuel (1)
Zardoya, Rafael (1)
Tenorio, Manuel J. (1)
Afonso, Carlos M.L. (1)
Abarenkov, Kessy (1)
Kristiansson, Erik, ... (1)
Kõljalg, Urmas (1)
Nilsson, R. Henrik, ... (1)
Tedersoo, Leho (1)
Li, Z (1)
Huang, J. (1)
Chopra, S. (1)
Kumar, M (1)
Naguib, Mahmoud (1)
Boulund, Fredrik, 19 ... (1)
Ladenvall, Claes, Ph ... (1)
Lima, N (1)
Thompson, Paul M (1)
Hammarstrom, L (1)
van der Laak, Jeroen (1)
Schuster, M. (1)
Green, Richard E. (1)
Ning, Z. (1)
Qin, X. (1)
Richards, S (1)
Moore, Edward R.B. 1 ... (1)
Bertilsson, Stefan (1)
Ghosh, SS (1)
Poline, JB (1)
Strother, SC (1)
Obst, Matthias, 1974 (1)
Higher education institution
Uppsala universitet (25)
Karolinska Institutet (9)
Stockholms universitet (6)
Göteborgs universitet (5)
Sveriges Lantbruksuniversitet (3)
Kungliga Tekniska Högskolan (2)
Naturhistoriska riksmuseet (2)
Umeå universitet (1)
Linköpings universitet (1)
Lunds universitet (1)
Malmö universitet (1)
Chalmers tekniska högskola (1)
Language
English (45)
Research subject (UKÄ/SCB)
Natural sciences (33)
Medical and health sciences (7)
Agricultural sciences (4)
Engineering and technology (2)
Social sciences (1)