SwePub - search: L773:1477 4054 OR L773:1467 54...

No.	Reference	Cover image	Find
1.	Bonner, Stephen, et al. (author) A review of biomedical datasets relating to drug discovery: a knowledge graph perspective 2022 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; In Press Research review (peer-reviewed) Abstract: Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate KGs use in solving key and emerging questions in the drug discovery domain.
2.	Bonner, Stephen, et al. (author) Implications of topological imbalance for representation learning on biomedical knowledge graphs 2022 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; In Press Journal article (peer-reviewed) Abstract: Adoption of recently developed methods from machine learning has given rise to creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, are promising as they provide a more intuitive representation and are suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modelling choices, and emphasizes the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.
3.	Chattopadhyay, Subhayan, et al. (author) Tracing the evolution of aneuploid cancers by multiregional sequencing with CRUST 2021 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 22:6 Journal article (peer-reviewed) Abstract: Clonal deconvolution of mutational landscapes is crucial to understand the evolutionary dynamics of cancer. Two limiting factors for clonal deconvolution that have remained unresolved are variation in purity and chromosomal copy number across different samples of the same tumor. We developed a semi-supervised algorithm that tracks variant calls through multi-sample spatiotemporal tumor data. While normalizing allele frequencies based on purity, it also adjusts for copy number changes at clonal deconvolution. Absent à priori copy number data, it renders in silico copy number estimations from bulk sequences. Using published and simulated tumor sequences, we reliably segregated clonal/subclonal variants even at a low sequencing depth (~50×). Given at least one pure tumor sample (>70% purity), we could normalize and deconvolve paired samples down to a purity of 40%. This renders a reliable clonal reconstruction well adapted to multi-regionally sampled solid tumors, which are often aneuploid and contaminated by non-cancer cells.
4.	Chen, Mingshuai, et al. (author) Fuzzy kernel evidence Random Forest for identifying pseudouridine sites 2024 In: Briefings in Bioinformatics. - Oxford : Oxford University Press. - 1467-5463 .- 1477-4054. ; 25:3, pp. 1-14 Journal article (peer-reviewed) Abstract: Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. This method is called PseU-FKeERF, which demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selected four RNA feature coding schemes with relatively good performance for feature combination, and then input them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a certain reference for disease control and related drug development in the future. © The Author(s) 2024. Published by Oxford University Press.
5.	Cirenajwis, Helena, et al. (author) Performance of gene expression-based single sample predictors for assessment of clinicopathological subgroups and molecular subtypes in cancers : a case comparison study in non-small cell lung cancer 2019 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 21:2, pp. 729-740 Journal article (peer-reviewed) Abstract: The development of multigene classifiers for cancer prognosis, treatment prediction, molecular subtypes or clinicopathological groups has been a cornerstone in transcriptomic analyses of human malignancies for nearly two decades. However, many reported classifiers are critically limited by different preprocessing needs like normalization and data centering. In response, a new breed of classifiers, single sample predictors (SSPs), has emerged. SSPs classify samples in an N-of-1 fashion, relying on, e.g. gene rules comparing expression values within a sample. To date, several methods have been reported, but there is a lack of head-to-head performance comparison for typical cancer classification problems, representing an unmet methodological need in cancer bioinformatics. To resolve this need, we performed an evaluation of two SSPs [k-top-scoring pair classifier (kTSP) and absolute intrinsic molecular subtyping (AIMS)] for two case examples of different magnitude of difficulty in non-small cell lung cancer: gene expression–based classification of (i) tumor histology and (ii) molecular subtype. Through the analysis of ~2000 lung cancer samples for each case example (n = 1918 and n = 2106, respectively), we compared the performance of the methods for different sample compositions, training data set sizes, gene expression platforms and gene rule selections. Three main conclusions are drawn from the comparisons: both methods are platform independent, they select largely overlapping gene rules associated with actual underlying tumor biology and, for large training data sets, they behave interchangeably performance-wise. While SSPs like AIMS and kTSP offer new possibilities to move gene expression signatures/predictors closer to a clinical context, they are still importantly limited by the difficultness of the classification problem at hand.
6.	Danielsson, Frida, et al. (author) Assessing the consistency of public human tissue RNA-seq data sets 2015 In: Briefings in Bioinformatics. - : Oxford University Press. - 1467-5463 .- 1477-4054. ; 16:6, pp. 941-949 Journal article (peer-reviewed) Abstract: Sequencing-based gene expression methods like RNA-sequencing (RNA-seq) have become increasingly common, but it is often claimed that results obtained in different studies are not comparable owing to the influence of laboratory batch effects, differences in RNA extraction and sequencing library preparation methods and bioinformatics processing pipelines. It would be unfortunate if different experiments were in fact incomparable, as there is great promise in data fusion and meta-analysis applied to sequencing data sets. We therefore compared reported gene expression measurements for ostensibly similar samples (specifically, human brain, heart and kidney samples) in several different RNA-seq studies to assess their overall consistency and to examine the factors contributing most to systematic differences. The same comparisons were also performed after preprocessing all data in a consistent way, eliminating potential bias from bioinformatics pipelines. We conclude that published human tissue RNA-seq expression measurements appear relatively consistent in the sense that samples cluster by tissue rather than laboratory of origin given simple preprocessing transformations. The article is supplemented by a detailed walkthrough with embedded R code and figures.
7.	Doran, S., et al. (author) Multi-omics approaches for revealing the complexity of cardiovascular disease 2021 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 22:5 Journal article (peer-reviewed) Abstract: The development and progression of cardiovascular disease (CVD) can mainly be attributed to the narrowing of blood vessels caused by atherosclerosis and thrombosis, which induces organ damage that will result in end-organ dysfunction characterized by events such as myocardial infarction or stroke. It is also essential to consider other contributory factors to CVD, including cardiac remodelling caused by cardiomyopathies and co-morbidities with other diseases such as chronic kidney disease. Besides, there is a growing amount of evidence linking the gut microbiota to CVD through several metabolic pathways. Hence, it is of utmost importance to decipher the underlying molecular mechanisms associated with these disease states to elucidate the development and progression of CVD. A wide array of systems biology approaches incorporating multi-omics data have emerged as an invaluable tool in establishing alterations in specific cell types and identifying modifications in signalling events that promote disease development. Here, we review recent studies that apply multi-omics approaches to further understand the underlying causes of CVD and provide possible treatment strategies by identifying novel drug targets and biomarkers. We also discuss very recent advances in gut microbiota research with an emphasis on how diet and microbial composition can impact the development of CVD. Finally, we present various biological network analyses and other independent studies that have been employed for providing mechanistic explanation and developing treatment strategies for end-stage CVD, namely myocardial infarction and stroke.
8.	Emanuelsson, Olof (author) Predicting protein subcellular localisation from amino acid sequence information. 2002 In: Briefings in Bioinformatics. - 1467-5463 .- 1477-4054. ; 3:4, pp. 361-76 Journal article (peer-reviewed) Abstract: Predicting the subcellular localisation of proteins is an important part of the elucidation of their functions and interactions. Here, the amino acid sequence motifs that direct proteins to their proper subcellular compartment are surveyed, different methods for localisation prediction are discussed, and some benchmarks for the more commonly used predictors are presented.
9.	Fang, MY, et al. (author) VIPPID: a gene-specific single nucleotide variant pathogenicity prediction tool for primary immunodeficiency diseases 2022 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 23:5 Journal article (peer-reviewed) Abstract: Distinguishing pathogenic variants from non-pathogenic ones remains a major challenge in clinical genetic testing of primary immunodeficiency (PID) patients. Most of the existing mutation pathogenicity prediction tools treat all mutations as homogeneous entities, ignoring the differences in characteristics of different genes, and use the same model for genes in different diseases. In this study, we developed a single nucleotide variant (SNV) pathogenicity prediction tool, Variant Impact Predictor for PIDs (VIPPID; https://mylab.shinyapps.io/VIPPID/), which was tailored for PIDs genes and used a specific model for each of the most prevalent PID known genes. It employed a Conditional Inference Forest model and utilized information of 85 features of SNVs and scores from 20 existing prediction tools. Evaluation of VIPPID showed that it had superior performance (area under the curve = 0.91) over non-specific conventional tools. In addition, we also showed that the gene-specific model outperformed the non-gene-specific models. Our study demonstrated that disease-specific and gene-specific models can improve SNV pathogenicity prediction performance. This observation supports the notion that each feature of mutations in the model can be potentially used, in a new algorithm, to investigate the characteristics and function of the encoded proteins.
10.	Flores, Samuel, et al. (author) Multiscale modeling of macromolecular biosystems 2012 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 13:4, pp. 395-405 Journal article (peer-reviewed) Abstract: In this article, we review the recent progress in multiresolution modeling of structure and dynamics of protein, RNA and their complexes. Many approaches using both physics-based and knowledge-based potentials have been developed at multiple granularities to model both protein and RNA. Coarse graining can be achieved not only in the length, but also in the time domain using discrete time and discrete state kinetic network models. Models with different resolutions can be combined either in a sequential or parallel fashion. Similarly, the modeling of assemblies is also often achieved using multiple granularities. The progress shows that a multiresolution approach has considerable potential to continue extending the length and time scales of macromolecular modeling.
11.	Forslund, Kristoffer, et al. (author) OrthoDisease : tracking disease gene orthologs across 100 species 2011 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 12:5, pp. 463-473 Journal article (peer-reviewed) Abstract: Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.
12.	Ganna, Andrea, et al. (author) Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies 2015 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 16:4, pp. 563-575 Journal article (peer-reviewed) Abstract: It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance to observe false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as rediscovery rate (RDR). In high-throughput studies, the RDR is a function of false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation study has not yet been collected, the RDR can be used to decide the optimal combination between the proportion of findings taken to validation and the size of the validation study. Secondly, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed.
13.	Hong, Ye, et al. (author) PhosPiR : an automated phosphoproteomic pipeline in R 2022 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 23:1 Journal article (peer-reviewed) Abstract: Large-scale phosphoproteome profiling using mass spectrometry (MS) provides functional insight that is crucial for disease biology and drug discovery. However, extracting biological understanding from these data is an arduous task requiring multiple analysis platforms that are not adapted for automated high-dimensional data analysis. Here, we introduce an integrated pipeline that combines several R packages to extract high-level biological understanding from large-scale phosphoproteomic data by seamless integration with existing databases and knowledge resources. In a single run, PhosPiR provides data clean-up, fast data overview, multiple statistical testing, differential expression analysis, phosphosite annotation and translation across species, multilevel enrichment analyses, proteome-wide kinase activity and substrate mapping and network hub analysis. Data output includes graphical formats such as heatmap, box-, volcano- and circos-plots. This resource is designed to assist proteome-wide data mining of pathophysiological mechanism without a need for programming knowledge.
14.	Kannan, L, et al. (author) Public data and open source tools for multi-assay genomic investigation of disease 2016 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 17:4, pp. 603-615 Journal article (peer-reviewed)
15.	Klingström, Tomas, et al. (author) Protein-protein interaction and pathway databases, a graphical review 2011 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 12:6, pp. 702-713 Research review (peer-reviewed) Abstract: The amount of information regarding protein-protein interactions (PPI) at a proteomic scale is constantly increasing. This is paralleled with an increase of databases making information available. Consequently there are diverse ways of delivering information about not only PPIs but also regarding the databases themselves. This creates a time consuming obstacle for many researchers working in the field. Our survey provides a valuable tool for researchers to reduce the time necessary to gain a broad overview of PPI-databases and is supported by a graphical representation of data exchange. The graphical representation is made available in cooperation with the team maintaining www.pathguide.org and can be accessed at http://www.pathguide.org/interactions.php in a new Cytoscape web implementation. The local copy of Cytoscape cys file can be downloaded from http://bio.icm.edu.pl/~darman/ppi web page.
16.	Lam, S., et al. (author) Addressing the heterogeneity in liver diseases using biological networks 2021 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 22:2, pp. 1751-1766 Journal article (peer-reviewed) Abstract: The abnormalities in human metabolism have been implicated in the progression of several complex human diseases, including certain cancers. Hence, deciphering the underlying molecular mechanisms associated with metabolic reprogramming in a disease state can greatly assist in elucidating the disease aetiology. An invaluable tool for establishing connections between global metabolic reprogramming and disease development is the genome-scale metabolic model (GEM). Here, we review recent work on the reconstruction of cell/tissue-type and cancer-specific GEMs and their use in identifying metabolic changes occurring in response to liver disease development, stratification of the heterogeneous disease population and discovery of novel drug targets and biomarkers. We also discuss how GEMs can be integrated with other biological networks for generating more comprehensive cell/tissue models. In addition, we review the various biological network analyses that have been employed for the development of efficient treatment strategies. Finally, we present three case studies in which independent studies converged on conclusions underlying liver disease.
17.	Li, CX, et al. (author) Multiomics integration-based molecular characterizations of COVID-19 2022 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 23:1 Journal article (peer-reviewed) Abstract: The coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), rapidly became a global health challenge, leading to unprecedented social and economic consequences. The mechanisms behind the pathogenesis of SARS-CoV-2 are both unique and complex. Omics-scale studies are emerging rapidly and offer a tremendous potential to unravel the puzzle of SARS-CoV-2 pathobiology, as well as moving forward with diagnostics, potential drug targets, risk stratification, therapeutic responses, vaccine development and therapeutic innovation. This review summarizes various aspects of understanding multiomics integration-based molecular characterizations of COVID-19, which to date include the integration of transcriptomics, proteomics, genomics, lipidomics, immunomics and metabolomics to explore virus targets and developing suitable therapeutic solutions through systems biology tools. Furthermore, this review also covers an abridgment of omics investigations related to disease pathogenesis and virulence, the role of host genetic variation and a broad array of immune and inflammatory phenotypes contributing to understanding COVID-19 traits. Insights into this review, which combines existing strategies and multiomics integration profiling, may help further advance our knowledge of COVID-19.
18.	Li, L, et al. (author) Librator: a platform for the optimized analysis, design, and expression of mutable influenza viral antigens 2022 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 23:2 Journal article (peer-reviewed) Abstract: Artificial mutagenesis and protein engineering have laid the foundation for antigenic characterization and universal vaccine design for influenza viruses. However, many methods used in this process require manual sequence editing and protein expression, limiting their efficiency and utility in high-throughput applications. More streamlined in silico tools allowing researchers to properly analyze and visualize influenza viral protein sequences with accurate nomenclature are necessary to improve antigen design and productivity. To address this need, we developed Librator, a system for analyzing and designing custom protein sequences of influenza virus hemagglutinin (HA) and neuraminidase (NA) glycoproteins. Within Librator’s graphical interface, users can easily interrogate viral sequences and phylogenies, visualize antigen structures and conservation, mutate target residues and design custom antigens. Librator also provides optimized fragment design for Gibson Assembly of HA and NA expression constructs based on peptide conservation of all historical HA and NA sequences, ensuring fragments are reusable and compatible across related subtypes, thereby promoting reagent savings. Finally, the program facilitates single-cell immune profiling, epitope mapping of monoclonal antibodies and mosaic protein design. Using Librator-based antigen construction, we demonstrate that antigenicity can be readily transferred between HA molecules of H3, but not H1, lineage viruses. Altogether, Librator is a valuable tool for analyzing influenza virus HA and NA proteins and provides an efficient resource for optimizing recombinant influenza antigen synthesis.
19.	Lindvall, JM, et al. (author) In silico tools for signal transduction research 2003 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 4:4, pp. 315-24 Journal article (peer-reviewed)
20.	Marabita, F, et al. (author) Normalization of circulating microRNA expression data obtained by quantitative real-time RT-PCR 2016 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 17:2, pp. 204-212 Journal article (peer-reviewed)
21.	Marschall, Tobias, et al. (author) Computational pan-genomics : status, promises and challenges 2018 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 19:1, pp. 118-135 Journal article (peer-reviewed) Abstract: Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
22.	Martinez, David, et al. (author) NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures 2023 In: Briefings in Bioinformatics. - : OXFORD UNIV PRESS. - 1467-5463 .- 1477-4054. ; 24:5 Journal article (peer-reviewed) Abstract: Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
23.	Mehmood, Arfa, et al. (author) Systematic evaluation of differential splicing tools for RNA-seq studies 2020 In: Briefings in Bioinformatics. - : Oxford University Press. - 1467-5463 .- 1477-4054. ; 21:6, pp. 2052-2065 Journal article (peer-reviewed) Abstract: Differential splicing (DS) is a post-transcriptional biological process with critical, wide-ranging effects on a plethora of cellular activities and disease processes. To date, a number of computational approaches have been developed to identify and quantify differentially spliced genes from RNA-seq data, but a comprehensive intercomparison and appraisal of these approaches is currently lacking. In this study, we systematically evaluated 10 DS analysis tools for consistency and reproducibility, precision, recall and false discovery rate, agreement upon reported differentially spliced genes and functional enrichment. The tools were selected to represent the three different methodological categories: exon-based (DEXSeq, edgeR, JunctionSeq, limma), isoform-based (cuffdiff2, DiffSplice) and event-based methods (dSpliceType, MAJIQ, rMATS, SUPPA). Overall, all the exon-based methods and two event-based methods (MAJIQ and rMATS) scored well on the selected measures. Of the 10 tools tested, the exon-based methods performed generally better than the isoform-based and event-based methods. However, overall, the different data analysis tools performed strikingly differently across different data sets or numbers of samples.
24.	Nacer, Deborah F., et al. (author) Pan-cancer application of a lung-adenocarcinoma-derived gene-expression-based prognostic predictor 2021 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 22:6 Journal article (peer-reviewed) Abstract: Gene-expression profiling can be used to classify human tumors into molecular subtypes or risk groups, representing potential future clinical tools for treatment prediction and prognostication. However, it is less well-known how prognostic gene signatures derived in one malignancy perform in a pan-cancer context. In this study, a gene-rule-based single sample predictor (SSP) called classifier for lung adenocarcinoma molecular subtypes (CLAMS) associated with proliferation was tested in almost 15 000 samples from 32 cancer types to classify samples into better or worse prognosis. Of the 14 malignancies that presented both CLAMS classes in sufficient numbers, survival outcomes were significantly different for breast, brain, kidney and liver cancer. Patients with samples classified as better prognosis by CLAMS were generally of lower tumor grade and disease stage, and had improved prognosis according to other type-specific classifications (e.g. PAM50 for breast cancer). In all, 99.1% of non-lung cancer cases classified as better outcome by CLAMS were comprised within the range of proliferation scores of lung adenocarcinoma cases with a predicted better prognosis by CLAMS. This finding demonstrates the potential of tuning SSPs to identify specific levels of for instance tumor proliferation or other transcriptional programs through predictor training. Together, pan-cancer studies such as this may take us one step closer to understanding how gene-expression-based SSPs act, which gene-expression programs might be important in different malignancies, and how to derive tools useful for prognostication that are efficient across organs.
25.	Schmartz, Georges Pierre, et al. (author) Encyclopedia of tools for the analysis of miRNA isoforms 2021 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 22:4 Research review (peer-reviewed) Abstract: RNA sequencing data sets rapidly increase in quantity. For microRNAs (miRNAs), frequently dozens to hundreds of billion reads are generated per study. The quantification of annotated miRNAs and the prediction of new miRNAs are leading computational tasks. Now, the increased depth of coverage allows to gain deeper insights into the variability of miRNAs. The analysis of isoforms of miRNAs (isomiRs) is a trending topic, and a range of computational tools for the analysis of isomiRs has been developed. We provide an overview on 27 available computational solutions for the analysis of isomiRs. These include both stand-alone programs (17 tools) and web-based solutions (10 tools) and span a publication time range from 2010 to 2020. Seven of the tools were published in 2019 and 2020, confirming the rising importance of the topic. While most of the analyzed tools work for a broad range of organisms or are completely independent of a reference organism, several tools have been tailored for the analysis of human miRNA data or for plants. While 14 of the tools are general analysis tools of miRNAs, and isomiR analysis is one of their features, the remaining 13 tools have specifically been developed for isomiR analysis. A direct comparison on 20 deep sequencing data sets for selected tools provides insights into the heterogeneity of results. With our work, we provide users a comprehensive overview on the landscape of isomiR analysis tools and in that support the selection of the most appropriate tool for their respective research task.
26.	Schmitt, Thomas, et al. (author) Letter to the Editor : SeqXML and OrthoXML: standards for sequence and orthology information 2011 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 12:5, pp. 485-488 Journal article (peer-reviewed) Abstract: There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
27.	Schwende, Isabel, et al. (author) Pattern recognition and probabilistic measures in alignment-free sequence analysis 2013 In: Briefings in Bioinformatics. - : Oxford University Press. - 1467-5463 .- 1477-4054. ; 15:3, pp. 354-368 Journal article (peer-reviewed) Abstract: With the massive production of genomic and proteomic data, the number of available biological sequences in databases has reached a level that is not feasible anymore for exact alignments even when just a fraction of all sequences is used. To overcome this inevitable time complexity, ultrafast alignment-free methods are studied. Within the past two decades, a broad variety of nonalignment methods have been proposed including dissimilarity measures on classical representations of sequences like k-words or Markov models. Furthermore, articles were published that describe distance measures on alternative representations such as compression complexity, spectral time series or chaos game representation. However, alignments are still the standard method for real world applications in biological sequence analysis, and the time efficient alignment-free approaches are usually applied in cases when the accustomed algorithms turn out to fail or be too inconvenient.
28.	Sen, Partho, 1983-, et al. (author) Deep learning meets metabolomics : a methodological perspective 2021 In: Briefings in Bioinformatics. - : Oxford University Press. - 1467-5463 .- 1477-4054. ; 22:2, pp. 1531-1542 Research review (peer-reviewed) Abstract: Deep learning (DL), an emerging area of investigation in the fields of machine learning and artificial intelligence, has markedly advanced over the past years. DL techniques are being applied to assist medical professionals and researchers in improving clinical diagnosis, disease prediction and drug discovery. It is expected that DL will help to provide actionable knowledge from a variety of 'big data', including metabolomics data. In this review, we discuss the applicability of DL to metabolomics, while presenting and discussing several examples from recent research. We emphasize the use of DL in tackling bottlenecks in metabolomics data acquisition, processing, metabolite identification, as well as in metabolic phenotyping and biomarker discovery. Finally, we discuss how DL is used in genome-scale metabolic modelling and in interpretation of metabolomics data. The DL-based approaches discussed here may assist computational biologists with the integration, prediction and drawing of statistical inference about biological outcomes, based on metabolomics data.
29.	Stahlschmidt, Sören Richard, et al. (author) Multimodal deep learning for biomedical data fusion : a review 2022 In: Briefings in Bioinformatics. - : Oxford University Press. - 1467-5463 .- 1477-4054. ; 23:2 Research review (peer-reviewed) Abstract: Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
30.	Strömbäck, Lena, et al. (author) Representing, storing and accessing molecular interaction data: a review of models and tools 2006 In: Briefings in Bioinformatics. - : Oxford University Press. - 1467-5463 .- 1477-4054. ; 7:4, pp. 331-338 Journal article (peer-reviewed) Abstract: One important aim within systems biology is to integrate disparate pieces of information, leading to discovery of higher-level knowledge about important functionality within living organisms. This makes standards for representation of data and technology for exchange and integration of data important key points for development within the area. In this article, we focus on the recent developments within the field. We compare the recent updates to the three standard representations for exchange of data: SBML, PSI MI and BioPAX. In addition, we give an overview of available tools for these three standards and a discussion on how these developments support possibilities for data exchange and integration.
31.	Toro-Dominguez, D, et al. (author) A survey of gene expression meta-analysis: methods and applications 2021 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 22:2, pp. 1694-1705 Journal article (peer-reviewed) Abstract: The increasing use of high-throughput gene expression quantification technologies over the last two decades and the fact that most of the published studies are stored in public databases has triggered an explosion of studies available through public repositories. All this information offers an invaluable resource for reuse to generate new knowledge and scientific findings. In this context, great interest has been focused on meta-analysis methods to integrate and jointly analyze different gene expression datasets. In this work, we describe the main steps in the gene expression meta-analysis, from data preparation to the state-of-the-art statistical methods. We also analyze the main types of applications and problems that can be approached in gene expression meta-analysis studies and provide a comparative overview of the available software and bioinformatics tools. Moreover, a practical guide for choosing the most appropriate method in each case is also provided.
32.	Via, Allegra, et al. (author) Best practices in bioinformatics training for life scientists 2013 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 14:5, pp. 528-537 Journal article (peer-reviewed) Abstract: The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
33.	Yan, Jing, et al. (author) A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues 2016 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 17:1, pp. 88-105 Research review (peer-reviewed) Abstract: Motivated by the pressing need to characterize protein-DNA and protein-RNA interactions on large scale, we review a comprehensive set of 30 computational methods for high-throughput prediction of RNA- or DNA-binding residues from protein sequences. We summarize these predictors from several significant perspectives including their design, outputs and availability. We perform empirical assessment of methods that offer web servers using a new benchmark data set characterized by a more complete annotation that includes binding residues transferred from the same or similar proteins. We show that predictors of DNA-binding (RNA-binding) residues offer relatively strong predictive performance but they are unable to properly separate DNA- from RNA-binding residues. We design and empirically assess several types of consensuses and demonstrate that machine learning (ML)-based approaches provide improved predictive performance when compared with the individual predictors of DNA-binding residues or RNA-binding residues. We also formulate and execute first-of-its-kind study that targets combined prediction of DNA- and RNA-binding residues. We design and test three types of consensuses for this prediction and conclude that this novel approach that relies on ML design provides better predictive quality than individual predictors when tested on prediction of DNA- and RNA-binding residues individually. It also substantially improves discrimination between these two types of nucleic acids. Our results suggest that development of a new generation of predictors would benefit from using training data sets that combine both RNA- and DNA-binding proteins, designing new inputs that specifically target either DNA- or RNA-binding residues and pursuing combined prediction of DNA- and RNA-binding residues.
34.	Yuan, Le, 1994, et al. (author) HGTphyloDetect: facilitating the identification and phylogenetic analysis of horizontal gene transfer 2023 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 24:2 Journal article (peer-reviewed) Abstract: Horizontal gene transfer (HGT) is an important driver in genome evolution, gain-of-function, and metabolic adaptation to environmental niches. Genome-wide identification of putative HGT events has become increasingly practical, given the rapid growth of genomic data. However, existing HGT analysis toolboxes are not widely used, limited by their inability to perform phylogenetic reconstruction to explore potential donors, and the detection of HGT from both evolutionarily distant and closely related species. In this study, we have developed HGTphyloDetect, which is a versatile computational toolbox that combines high-throughput analysis with phylogenetic inference, to facilitate comprehensive investigation of HGT events. Two case studies with Saccharomyces cerevisiae and Candida versatilis demonstrate the ability of HGTphyloDetect to identify horizontally acquired genes with high accuracy. In addition, HGTphyloDetect enables phylogenetic analysis to illustrate a likely path of gene transmission among the evolutionarily distant or closely related species. The HGTphyloDetect computational toolbox is designed for ease of use and can accurately find HGT events with a very low false discovery rate in a high-throughput manner. The HGTphyloDetect toolbox and its related user tutorial are freely available at https://github.com/SysBioChalmers/HGTphyloDetect.
35.	Yuille, Martin, et al. (author) Biobanking for Europe 2008 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 9:1, pp. 14-24 Journal article (other academic/artistic) Abstract: Biobanks are well-organized resources comprising biological samples and associated information that are accessible to scientific investigation. Across Europe, millions of samples with related data are held in different types of collections. While individual collections can be well organized and accessible, the resources are subject to fragmentation, insecurity of funding and incompleteness. To address these issues, a Biobanking and BioMolecular Resources Infrastructure (BBMRI) is to be developed across Europe, thereby implementing a European roadmap for research infrastructures that was developed by a forum of EU member states and that has been received by the European Commission. In this review, we describe the work involved in preparing for the construction of BBMRI in a European and global context.
36.	Zaragoza-Infante, L, et al. (author) IgIDivA: immunoglobulin intraclonal diversification analysis 2022 In: Briefings in bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 23:5 Journal article (peer-reviewed) Abstract: Intraclonal diversification (ID) within the immunoglobulin (IG) genes expressed by B cell clones arises due to ongoing somatic hypermutation (SHM) in a context of continuous interactions with antigen(s). Defining the nature and order of appearance of SHMs in the IG genes can assist in improved understanding of the ID process, shedding light into the ontogeny and evolution of B cell clones in health and disease. Such endeavor is empowered thanks to the introduction of high-throughput sequencing in the study of IG gene repertoires. However, few existing tools allow the identification, quantification and characterization of SHMs related to ID, all of which have limitations in their analysis, highlighting the need for developing a purpose-built tool for the comprehensive analysis of the ID process. In this work, we present the immunoglobulin intraclonal diversification analysis (IgIDivA) tool, a novel methodology for the in-depth qualitative and quantitative analysis of the ID process from high-throughput sequencing data. IgIDivA identifies and characterizes SHMs that occur within the variable domain of the rearranged IG genes and studies in detail the connections between identified SHMs, establishing mutational pathways. Moreover, it combines established and new graph-based metrics for the objective determination of ID level, combined with statistical analysis for the comparison of ID level features for different groups of samples. Of importance, IgIDivA also provides detailed visualizations of ID through the generation of purpose-built graph networks. Beyond the method design, IgIDivA has been also implemented as an R Shiny web application. IgIDivA is freely available at https://bio.tools/igidiva
37.	Zhang, Jialin, et al. (author) FS-GBDT : identification multicancer-risk module via a feature selection algorithm by integrating Fisher score and GBDT 2021 In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1477-4054 .- 1467-5463. ; 22:3 Journal article (peer-reviewed) Abstract: Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework attributed to ensemble method named Fisher score and Gradient Boosting Decision Tree (FS-GBDT) to select robust and decisive feature genes in high-dimensional gene expression datasets. Joint analysis of 11 human cancers types was conducted to explore the key feature genes subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms by Support Vector Machine (SVM) classifier. The algorithm achieved highest indicators, outperforms other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and this subset were classified into several functional modules. Functional modules can be used as markers of disease to replace single gene which is difficult to be found repeatedly in applications of gene chip, and to study the core mechanisms of cancer.
38.	Gerault, MA, et al. (author) IMPRINTS.CETSA and IMPRINTS.CETSA.app: an R package and a Shiny application for the analysis and interpretation of IMPRINTS-CETSA data 2024 In: Briefings in bioinformatics. - 1477-4054. ; 25:3 Journal article (peer-reviewed)
39.	Li, CX, et al. (author) Consensus clustering with missing labels (ccml): a consensus clustering tool for multi-omics integrative prediction in cohorts with unequal sample coverage 2024 In: Briefings in bioinformatics. - 1477-4054. ; 25:1 Journal article (peer-reviewed)
40.	Li, Y, et al. (author) Diagnostic Prediction of portal vein thrombosis in chronic cirrhosis patients using data-driven precision medicine model 2024 In: Briefings in bioinformatics. - 1477-4054. ; 25:1 Journal article (peer-reviewed)
41.	Toro-Dominguez, D, et al. (author) Response to the letter 'testing the effectiveness of MyPROSLE in classifying patients with lupus nephritis' 2024 In: Briefings in bioinformatics. - 1477-4054. ; 25:1 Journal article (other academic/artistic)
