
Search results for "L773:1367 4811"

Query: L773:1367 4811

Results 1-50 of 277

No.	Reference
1.	Abdel-Rehim, Abbi, et al. (author) Protein-ligand binding affinity prediction exploiting sequence constituent homology 2023 In: Bioinformatics. - 1367-4803 .- 1367-4811. ; 39:8 Journal article (peer-reviewed). Abstract: MOTIVATION: Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed making use of some or all the spatial and categorical information available in these structures. The evaluation of such methods has mainly been carried out using datasets from PDBbind, particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets with dedicated test sets. This work demonstrates that only a small number of simple descriptors is necessary to efficiently estimate binding affinity for these complexes without the need to know the exact binding conformation of a ligand. RESULTS: The developed approach of using a small number of ligand and protein descriptors in conjunction with gradient boosting trees demonstrates high performance on the CASF datasets. This includes the commonly used benchmark CASF2016 where it appears to perform better than any other approach. This methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown as demonstrated using a large ChEMBL-derived dataset. AVAILABILITY AND IMPLEMENTATION: Code and data uploaded to https://github.com/abbiAR/PLBAffinity.
2.	Afkham, Heydar Maboudi, et al. (author) Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics 2017 In: Bioinformatics. - : OXFORD UNIV PRESS. - 1367-4803 .- 1367-4811. ; 33:4, pp. 508-513 Journal article (peer-reviewed). Abstract: Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
3.	Alexeyenko, Andrey, et al. (author) Automatic clustering of orthologs and inparalogs shared by multiple proteomes. 2006 In: Bioinformatics. - : Oxford University Press (OUP). - 1460-2059. ; 22:14, pp. E9-E15 Journal article (peer-reviewed)
4.	Ameur, Adam, et al. (author) The LCB Data Warehouse 2006 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 22:8, pp. 1024-1026 Journal article (peer-reviewed). Abstract: The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.
5.	Andersson, Anders, et al. (author) Dual-genome primer design for construction of DNA microarrays 2005 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 21:3, pp. 325-332 Journal article (peer-reviewed). Abstract: Motivation: Microarray experiments using probes covering a whole transcriptome are expensive to initiate, and a major part of the costs derives from synthesizing gene-specific PCR primers or hybridization probes. The high costs may force researchers to limit their studies to a single organism, although comparing gene expression in different species would yield valuable information. Results: We have developed a method, implemented in the software DualPrime, that reduces the number of primers required to amplify the genes of two different genomes. The software identifies regions of high sequence similarity, and from these regions selects PCR primers shared between the genomes, such that either one or, preferentially, both primers in a given PCR can be used for amplification from both genomes. To assure high microarray probe specificity, the software selects primer pairs that generate products of low sequence similarity to other genes within the same genome. We used the software to design PCR primers for 2182 and 1960 genes from the hyperthermophilic archaea Sulfolobus solfataricus and Sulfolobus acidocaldarius, respectively. Primer pairs were shared among 705 pairs of genes, and single primers were shared among 1184 pairs of genes, resulting in a saving of 31% compared to using only unique primers. We also present an alternative primer design method, in which each gene shares primers with two different genes of the other genome, enabling further savings.
6.	Andersson, Alma, et al. (author) sepal : identifying transcript profiles with spatial patterns by diffusion-based modeling 2021 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811 .- 1460-2059. ; 37:17, pp. 2644-2650 Journal article (peer-reviewed). Abstract: Motivation: Collection of spatial signals in large numbers has become a routine task in multiple omics-fields, but parsing of these rich datasets still poses certain challenges. In whole or near-full transcriptome spatial techniques, spurious expression profiles are intermixed with those exhibiting an organized structure. To distinguish profiles with spatial patterns from the background noise, a metric that enables quantification of spatial structure is desirable. Current methods designed for similar purposes tend to be built around a framework of statistical hypothesis testing, hence we were compelled to explore a fundamentally different strategy. Results: We propose an unexplored approach to analyze spatial transcriptomics data, simulating diffusion of individual transcripts to extract genes with spatial patterns. The method performed as expected when presented with synthetic data. When applied to real data, it identified genes with distinct spatial profiles, involved in key biological processes or characteristic for certain cell types. Compared to existing methods, ours seemed to be less informed by the genes' expression levels and showed better time performance when run with multiple cores.
7.	Andersson, Robin, et al. (author) A Segmental Maximum A Posteriori Approach to Genome-wide Copy Number Profiling 2008 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:6, pp. 751-758 Journal article (other academic/artistic). Abstract: MOTIVATION: Copy number profiling methods aim at assigning DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the proposed methods to this end, Hidden Markov Model (HMM)-based approaches seem promising since DNA copy number transitions are naturally captured in the model. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information regarding the genomic overlap between clones. Moreover, the majority of existing methods are restricted to chromosome-wise analysis. RESULTS: We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale to avoid overfitting of model parameters often resulting from chromosome-wise model inference. We report superior performances of SMAP on synthetic data when compared with two recent methods. When applied on our new experimental data, SMAP readily recognizes already known genetic aberrations including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the prediction of SMAP and the compared methods and show that SMAP accurately determines copy number changes and benefits from overlap consideration.
8.	Andersson, Siv G E, et al. (author) Comparative genomics of microbial pathogens and symbionts. 2002 In: Bioinformatics. - 1367-4803 .- 1367-4811. ; 18 Suppl 2, pp. S17- Journal article (peer-reviewed). Abstract: We are interested in quantifying the contribution of gene acquisition, loss, expansion and rearrangements to the evolution of microbial genomes. Here, we discuss factors influencing microbial genome divergence based on pair-wise genome comparisons of closely related strains and species with different lifestyles. A particular focus is on intracellular pathogens and symbionts of the genera Rickettsia, Bartonella and Buchnera. Extensive gene loss and restricted access to phage and plasmid pools may provide an explanation for why single host pathogens are normally less successful than multihost pathogens. We note that species-specific genes tend to be shorter than orthologous genes, suggesting that a fraction of these may represent fossil-orfs, as also supported by multiple sequence alignments among species. The results of our genome comparisons are placed in the context of phylogenomic analyses of alpha and gamma proteobacteria. We highlight artefacts caused by different rates and patterns of mutations, suggesting that atypical phylogenetic placements cannot a priori be taken as evidence for horizontal gene transfer events. The flexibility in genome structure among free-living microbes contrasts with the extreme stability observed for the small genomes of aphid endosymbionts, in which no rearrangements or inflow of genetic material have occurred during the past 50 million years (1). Taken together, the results suggest that genomic stability correlates with the content of repeated sequences and mobile genetic elements, and thereby indirectly with bacterial lifestyles.
9.	Anil, Anandashankar, et al. (author) HiCapTools : a software suite for probe design and proximity detection for targeted chromosome conformation capture applications 2018 In: Bioinformatics. - : OXFORD UNIV PRESS. - 1367-4803 .- 1367-4811. ; 34:4, pp. 675-677 Journal article (peer-reviewed). Abstract: Folding of eukaryotic genomes within nuclear space enables physical and functional contacts between regions that are otherwise kilobases away in sequence space. Targeted chromosome conformation capture methods (T2C, chi-C and HiCap) are capable of informing genomic contacts for a subset of regions targeted by probes. We here present HiCapTools, a software package that can design sequence capture probes for targeted chromosome capture applications and analyse sequencing output to detect proximities involving targeted fragments. Two probes are designed for each feature while avoiding repeat elements and non-unique regions. The data analysis suite processes alignment files to report genomic proximities for each feature at restriction fragment level and is isoform-aware for gene features. Statistical significance of contact frequencies is evaluated using an empirically derived background distribution. Targeted chromosome conformation capture applications are invaluable for locating target genes of disease-associated variants found by genome-wide association studies. Hence, we believe our software suite will prove to be useful for a wider user base within clinical and functional applications.
10.	Ardell, David H (author) SCANMS : adjusting for multiple comparisons in sliding window neutrality tests. 2004 In: Bioinformatics. - 1367-4803 .- 1367-4811. ; 20:12, pp. 1986-8 Journal article (other academic/artistic)
11.	Arvestad, Lars, et al. (author) Bayesian gene/species tree reconciliation and orthology analysis using MCMC 2003 In: Bioinformatics. - : Oxford Journals. - 1367-4803 .- 1367-4811. ; 19, pp. i7-i15 Journal article (peer-reviewed). Abstract: Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available. Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic methods, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet being in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.
12.	Ausmees, Kristiina, et al. (author) Achieving improved accuracy for imputation of ancient DNA 2023 In: Bioinformatics. - : Oxford University Press. - 1367-4803 .- 1367-4811. ; 39:1 Journal article (peer-reviewed). Abstract: Motivation: Genotype imputation has the potential to increase the amount of information that can be gained from the often limited biological material available in ancient samples. As many widely used tools have been developed with modern data in mind, their design is not necessarily reflective of the requirements in studies of ancient DNA. Here, we investigate if an imputation method based on the full probabilistic Li and Stephens model of haplotype frequencies might be beneficial for the particular challenges posed by ancient data. Results: We present an implementation called prophaser and compare imputation performance to two alternative pipelines that have been used in the ancient DNA community based on the Beagle software. Considering empirical ancient data downsampled to lower coverages as well as present-day samples with artificially thinned genotypes, we show that the proposed method is advantageous at lower coverages, where it yields improved accuracy and ability to capture rare variation. The software prophaser is optimized for running in a massively parallel manner and achieved reasonable runtimes on the experiments performed when executed on a GPU.
13.	Baldassarre, Federico, et al. (author) GraphQA: Protein Model Quality Assessment using Graph Convolutional Networks 2020 In: Bioinformatics. - : Oxford University Press. - 1367-4803 .- 1367-4811 .- 1460-2059. ; 37:3, pp. 360-366 Journal article (peer-reviewed). Abstract: Motivation: Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive, and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models, that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance, and computational efficiency. Results: GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated. Availability and implementation: PyTorch implementation, datasets, experiments, and link to an evaluation server are available through this GitHub repository: github.com/baldassarreFe/graphqa. Supplementary information: Supplementary material is available at Bioinformatics online.
14.	Basu, Sankar Chandra, et al. (author) Finding correct protein-protein docking models using ProQDock 2016 In: Bioinformatics. - : OXFORD UNIV PRESS. - 1367-4803 .- 1367-4811. ; 32:12, pp. 262-270 Journal article (peer-reviewed). Abstract: Motivation: Protein-protein interactions are a key in virtually all biological processes. For a detailed understanding of the biological processes, the structure of the protein complex is essential. Given the current experimental techniques for structure determination, the vast majority of all protein complexes will never be solved by experimental techniques. In lack of experimental data, computational docking methods can be used to predict the structure of the protein complex. A common strategy is to generate many alternative docking solutions (atomic models) and then use a scoring function to select the best. The success of the computational docking technique is, to a large degree, dependent on the ability of the scoring function to accurately rank and score the many alternative docking models. Results: Here, we present ProQDock, a scoring function that predicts the absolute quality of docking model measured by a novel protein docking quality score (DockQ). ProQDock uses support vector machines trained to predict the quality of protein docking models using features that can be calculated from the docking model itself. By combining different types of features describing both the protein-protein interface and the overall physical chemistry, it was possible to improve the correlation with DockQ from 0.25 for the best individual feature (electrostatic complementarity) to 0.49 for the final version of ProQDock. ProQDock performed better than the state-of-the-art methods ZRANK and ZRANK2 in terms of correlations, ranking and finding correct models on an independent test set. Finally, we also demonstrate that it is possible to combine ProQDock with ZRANK and ZRANK2 to improve performance even further.
15.	Bengtsson-Palme, Johan, 1985, et al. (author) Metaxa2 Database Builder: enabling taxonomic identification from metagenomic or metabarcoding data using any genetic marker 2018 In: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1367-4803. ; 34:23, pp. 4027-4033 Journal article (peer-reviewed). Abstract: Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no genetic marker gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. This has led to the adoption of a range of genetic markers for DNA metabarcoding. While many taxonomic classification software tools can be re-trained on these genetic markers, they are often designed with assumptions that impair their utility on genes other than the SSU and LSU rRNA. Here, we present an update to Metaxa2 that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data. We evaluated the Metaxa2 Database Builder on eleven commonly used barcoding regions and found that while there are wide differences in performance between different genetic markers, our software performs satisfactorily provided that the input taxonomy and sequence data are of high quality. Freely available on the web as part of the Metaxa2 package at http://microbiology.se/software/metaxa2/. Supplementary data are available at Bioinformatics online.
16.	Bernhem, Kristoffer, et al. (author) SMLocalizer, a GPU accelerated ImageJ plugin for single molecule localization microscopy 2018 In: Bioinformatics. - : Oxford University Press. - 1367-4803 .- 1367-4811. ; 34:1, pp. 137- Journal article (peer-reviewed). Abstract: SMLocalizer combines the availability of ImageJ with the power of GPU processing for fast and accurate analysis of single molecule localization microscopy data. Analysis of 2D and 3D data in multiple channels is supported.
17.	Birin, H., et al. (author) Inferring horizontal transfers in the presence of rearrangements by the minimum evolution criterion 2008 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:6, pp. 826-832 Journal article (peer-reviewed). Abstract: Motivation: The evolution of viruses is very rapid and in addition to local point mutations (insertion, deletion, substitution) it also includes frequent recombinations, genome rearrangements and horizontal transfer of genetic materials (HGTs). Evolutionary analysis of viral sequences is therefore a complicated matter for two main reasons: First, due to HGTs and recombinations, the right model of evolution is a network and not a tree. Second, due to genome rearrangements, an alignment of the input sequences is not guaranteed. These facts encourage developing methods for inferring phylogenetic networks that do not require aligned sequences as input. Results: In this work, we present the first computational approach which deals with both genome rearrangements and horizontal gene transfers and does not require a multiple alignment as input. We formalize a new set of computational problems which involve analyzing such complex models of evolution. We investigate their computational complexity, and devise algorithms for solving them. Moreover, we demonstrate the viability of our methods on several synthetic datasets as well as four biological datasets.
18.	Björkholm, Patrik, et al. (author) Comparative analysis and unification of domain-domain interaction networks 2009 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:22, pp. 3020-5 Journal article (peer-reviewed). Abstract: MOTIVATION: Certain protein domains are known to preferentially interact with other domains. Several approaches have been proposed to predict domain-domain interactions, and over nine datasets are available. Our aim is to analyse the coverage and quality of the existing resources, as well as the extent of their overlap. With this knowledge, we have the opportunity to merge individual domain interaction networks to construct a comprehensive and reliable database. RESULTS: In this article we introduce a new approach towards comparing domain-domain interaction networks. This approach is used to compare nine predicted domain and protein interaction networks. The networks were used to generate a database of unified domain interactions, UniDomInt. Each interaction in the dataset is scored according to the benchmarked reliability of the sources. The performance of UniDomInt is an improvement compared to the underlying source networks and to another composite resource, Domine. AVAILABILITY: http://sonnhammer.sbc.su.se/download/UniDomInt/
19.	Björkholm, Patrik, et al. (author) Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts 2009 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:10, pp. 1264-1270 Journal article (peer-reviewed). Abstract: Motivation: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. Results: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 · L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.
20.	Bonet, Jose, et al. (author) DeepMP : a deep learning tool to detect DNA base modifications on Nanopore sequencing data 2022 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 38:5, pp. 1235-1243 Journal article (peer-reviewed). Abstract: Motivation: DNA methylation plays a key role in a variety of biological processes. Recently, Nanopore long-read sequencing has enabled direct detection of these modifications. As a consequence, a range of computational methods have been developed to exploit Nanopore data for methylation detection. However, current approaches rely on a human-defined threshold to detect the methylation status of a genomic position and are not optimized to detect sites methylated at low frequency. Furthermore, most methods use either the Nanopore signals or the basecalling errors as the model input and do not take advantage of their combination. Results: Here, we present DeepMP, a convolutional neural network-based model that takes information from Nanopore signals and basecalling errors to detect whether a given motif in a read is methylated or not. Besides, DeepMP introduces a threshold-free position modification calling model sensitive to sites methylated at low frequency across cells. We comprehensively benchmarked DeepMP against state-of-the-art methods on Escherichia coli, human and pUC19 datasets. DeepMP outperforms current approaches at read-based and position-based methylation detection across sites methylated at different frequencies in the three datasets. Availability and implementation: DeepMP is implemented and freely available under MIT license at https://github.
21.	Bongcam Rudloff, Erik (author) The GOBLET training portal: a global repository of bioinformatics training materials, courses and trainers 2015 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 31, pp. 140-142 Journal article (peer-reviewed). Abstract: Summary: Rapid technological advances have led to an explosion of biomedical data in recent years. The pace of change has inspired new collaborative approaches for sharing materials and resources to help train life scientists both in the use of cutting-edge bioinformatics tools and databases and in how to analyse and interpret large datasets. A prototype platform for sharing such training resources was recently created by the Bioinformatics Training Network (BTN). Building on this work, we have created a centralized portal for sharing training materials and courses, including a catalogue of trainers and course organizers, and an announcement service for training events. For course organizers, the portal provides opportunities to promote their training events; for trainers, the portal offers an environment for sharing materials, for gaining visibility for their work and promoting their skills; for trainees, it offers a convenient one-stop shop for finding suitable training resources and identifying relevant training events and activities locally and worldwide.
22.	Brameier, Markus, et al. (author) NucPred - Predicting nuclear localization of proteins 2007 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811 .- 1460-2059. ; 23:9, pp. 1159-1160 Journal article (peer-reviewed). Abstract: NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression matching and multiple program classifiers induced by genetic programming. A likelihood score is derived from the programs for each input sequence and each residue position. Different forms of visualization are provided to assist the detection of nuclear localization signals (NLSs). The NucPred server also provides access to additional sources of biological information (real and predicted) for a better validation and interpretation of results.
23.	Brunius, Carl, 1974, et al. (author) Prediction and modeling of pre-analytical sampling errors as a strategy to improve plasma NMR metabolomics data 2017 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1460-2059 .- 1367-4811. ; 33:22, pp. 3567-3574 Journal article (peer-reviewed). Abstract: Biobanks are important infrastructures for life science research. Optimal sample handling regarding e.g. collection and processing of biological samples is highly complex, with many variables that could alter sample integrity and even more complex when considering multiple study centers or using legacy samples with limited documentation on sample management. Novel means to understand and take into account such variability would enable high-quality research on archived samples. This study investigated whether pre-analytical sample variability could be predicted and reduced by modeling alterations in the plasma metabolome, measured by NMR, as a function of pre-centrifugation conditions (1-36 h pre-centrifugation delay time at 4 °C and 22 °C) in 16 individuals. Pre-centrifugation temperature and delay times were predicted using random forest modeling and performance was validated on independent samples. Alterations in the metabolome were modeled at each temperature using a cluster-based approach, revealing reproducible effects of delay time on energy metabolism intermediates at both temperatures, but more pronounced at 22 °C. Moreover, pre-centrifugation delay at 4 °C resulted in large, specific variability at 3 h, predominantly of lipids. Pre-analytical sample handling error correction resulted in significant improvement of data quality, particularly at 22 °C. This approach offers the possibility to predict pre-centrifugation delay temperature and time in biobanked samples before use in costly downstream applications. Moreover, the results suggest potential to decrease the impact of undesired, delay-induced variability. However, these findings need to be validated in multiple, large sample sets and with analytical techniques covering a wider range of the metabolome, such as LC-MS.
24.	Brunnsåker, Daniel, 1992, et al. (authors) Interpreting protein abundance in Saccharomyces cerevisiae through relational learning 2024 In: Bioinformatics. - : Oxford University Press. - 1367-4803 .- 1367-4811. ; 40:2 Journal article (peer-reviewed) Abstract: Motivation: Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in databases such as the Saccharomyces Genome Database, and highly curated genome-scale metabolic models like Yeast8. These datasets, the result of decades of experiments, are abundant in information, and adhere to semantically meaningful ontologies. Results: By representing this knowledge in an expressive Datalog database we generated data descriptors using relational learning that, when combined with supervised machine learning, enable us to predict protein abundances in an explainable manner. We learnt predictive relationships between protein abundances, function and phenotype, such as α-amino acid accumulations and deviations in chronological lifespan. We further demonstrate the power of this methodology on the proteins His4 and Ilv2, connecting qualitative biological concepts to quantified abundances. Availability and implementation: All data and processing scripts are available at the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict.
25.	Bylesjö, Max, et al. (authors) MASQOT-GUI : spot quality assessment for the two-channel microarray platform 2006 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 22:20, pp. 2554-2555 Journal article (peer-reviewed) Abstract: MASQOT-GUI provides an open-source, platform-independent software pipeline for two-channel microarray spot quality control. This includes gridding, segmentation, quantification, quality assessment and data visualization. It hosts a set of independent applications, with interactions between the tools as well as import and export support for external software. The implementation of automated multivariate quality control assessment, which is a unique feature of MASQOT-GUI, is based on the previously documented and evaluated MASQOT methodology. Further abilities of the application are outlined and illustrated. AVAILABILITY: MASQOT-GUI is Java-based and licensed under the GNU LGPL. Source code and installation files are available for download at http://masqot-gui.sourceforge.net/
26.	Bystry, Vojtech, et al. (authors) ARResT/AssignSubsets : a novel application for robust subclassification of chronic lymphocytic leukemia based on B cell receptor IG stereotypy 2015 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 31:23, pp. 3844-3846 Journal article (peer-reviewed) Abstract: Motivation: An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ~30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. Results: We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution.
27.	Carlborg, Örjan, et al. (authors) Methodological aspects of the genetic dissection of gene expression. 2005 In: Bioinformatics. - 1367-4803 .- 1367-4811. ; 21:10 Journal article (peer-reviewed) Abstract: MOTIVATION: Dissection of the genetics underlying gene expression utilizes techniques from microarray analyses as well as quantitative trait loci (QTL) mapping. Available QTL mapping methods are not tailored for the highly automated analyses required to deal with the thousands of gene transcripts encountered in the mapping of QTL affecting gene expression (sometimes referred to as eQTL). This report focuses on the adaptation of QTL mapping methodology to perform automated mapping of QTL affecting gene expression. RESULTS: The analyses of expression data on > 12,000 gene transcripts in BXD recombinant inbred mice found, on average, 629 QTL exceeding the genome-wide 5% threshold. Using additional information on trait repeatabilities and QTL location, 168 of these were classified as 'high confidence' QTL. Current sample sizes of genetical genomics studies make it possible to detect a reasonable number of QTL using simple genetic models, but considerably larger studies are needed to evaluate more complex genetic models. After extensive analyses of real data and additional simulated data (altogether > 300,000 genome scans) we make the following recommendations for detection of QTL for gene expression: (1) For populations with an unbalanced number of replicates on each genotype, weighted least squares should be preferred above ordinary least squares. Weights can be based on repeatability of the trait and the number of replicates. (2) A genome scan based on multiple marker information but analysing only at marker locations is a good approximation to a full interval mapping procedure. (3) Significance testing should be based on empirical genome-wide significance thresholds that are derived for each trait separately. 
(4) The significant QTL can be separated into high and low confidence QTL using a false discovery rate that incorporates prior information such as transcript repeatabilities and co-localization of gene-transcripts and QTL. (5) Including observations on the founder lines in the QTL analysis should be avoided as it inflates the test statistic and increases the Type I error. (6) To increase the computational efficiency of the study, use of parallel computing is advised. These recommendations are summarized in a possible strategy for mapping of QTL in a least squares framework. AVAILABILITY: The software used for this study is available on request from the authors.
28.	Chagas, Vinicius S., et al. (authors) RTNduals : an R/Bioconductor package for analysis of co-regulation and inference of dual regulons 2019 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 35:24, pp. 5357-5358 Journal article (peer-reviewed) Abstract: MOTIVATION: Transcription factors (TFs) are key regulators of gene expression, and can activate or repress multiple target genes, forming regulatory units, or regulons. Understanding downstream effects of these regulators includes evaluating how TFs cooperate or compete within regulatory networks. Here we present RTNduals, an R/Bioconductor package that implements a general method for analyzing pairs of regulons. RESULTS: RTNduals identifies a dual regulon when the number of targets shared between a pair of regulators is statistically significant. The package extends the RTN (Reconstruction of Transcriptional Networks) package, and uses RTN transcriptional networks to identify significant co-regulatory associations between regulons. The Supplementary Information reports two case studies for TFs using the METABRIC and TCGA breast cancer cohorts. AVAILABILITY AND IMPLEMENTATION: RTNduals is written in the R language, and is available from the Bioconductor project at http://bioconductor.org/packages/RTNduals/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
29.	Chalk, Alistair M, et al. (authors) siRNA specificity searching incorporating mismatch tolerance data. 2008 In: Bioinformatics. - : Oxford University Press (OUP). - 1460-2059. ; 24:10, pp. 1316-1317 Journal article (peer-reviewed) Abstract: Artificially synthesized short interfering RNAs (siRNAs) are widely used in functional genomics to knock down specific target genes. One ongoing challenge is to guarantee that the siRNA does not elicit off-target effects. Initial reports suggested that siRNAs were highly sequence-specific; however, subsequent data indicate that this is not necessarily the case. It is still uncertain what level of similarity and other rules are required for an off-target effect to be observed, and scoring schemes have not been developed to look beyond simple measures such as the number of mismatches or the number of consecutive matching bases present. We created design rules for predicting the likelihood of a non-specific effect and present a web server that allows the user to check the specificity of a given siRNA in a flexible manner using a combination of methods. The server finds potential off-target matches in the corresponding RefSeq database and ranks them according to a scoring system based on experimental studies of specificity. AVAILABILITY: The server is available at http://informatics-eskitis.griffith.edu.au/SpecificityServer.
30.	Chatterjee, Saikat, et al. (authors) SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition 2014 In: Bioinformatics. - : Oxford University Press. - 1460-2059 .- 1367-4803 .- 1367-4811. ; 30:17, pp. 2423-2431 Journal article (peer-reviewed) Abstract: Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consuming in a desktop computing environment. Results: Using sparsity enforcing methods from the general sparse signal processing field (such as compressed sensing), we derive a solution to the community composition estimation problem by a simultaneous assignment of all sample reads to a pre-processed reference database. A general statistical model based on kernel density estimation techniques is introduced for the assignment task, and the model solution is obtained using convex optimization tools. Further, we design a greedy algorithm for a fast solution. Our approach offers a reasonably fast community composition estimation method, which is shown to be more robust to input data variation than a recently introduced related method. Availability and implementation: A platform-independent Matlab implementation of the method is freely available at http://www.ee.kth.se/ctsoftware; source code that does not require access to Matlab is currently being tested and will be made available later through the above Web site.
31.	Chen, Yu, 1990, et al. (authors) Systematic inference of functional phosphorylation events in yeast metabolism 2017 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 33:13, pp. 1995-2001 Journal article (peer-reviewed) Abstract: Motivation: Protein phosphorylation is a post-translational modification that affects proteins by changing their structure and conformation in a rapid and reversible way, and it is an important mechanism for metabolic regulation in cells. Phosphoproteomics enables high-throughput identification of phosphorylation events on metabolic enzymes, but identifying functional phosphorylation events still requires more detailed biochemical characterization. Therefore, development of computational methods for investigating unknown functions of a large number of phosphorylation events identified by phosphoproteomics has received increased attention. Results: We developed a mathematical framework that describes the relationship between phosphorylation level of a metabolic enzyme and the corresponding flux through the enzyme. Using this framework, it is possible to quantitatively estimate contribution of phosphorylation events to flux changes. We showed that phosphorylation regulation analysis, combined with a systematic workflow and correlation analysis, can be used for inference of functional phosphorylation events in steady and dynamic conditions, respectively. Using this analysis, we assigned functionality to phosphorylation events of 17 metabolic enzymes in the yeast Saccharomyces cerevisiae, among which 10 are novel. Phosphorylation regulation analysis can not only be extended for inference of other functional post-translational modifications but also be a promising scaffold for multi-omics data integration in systems biology.
32.	Climer, Sharlee, et al. (authors) How frugal is mother nature with haplotypes? 2009 In: Bioinformatics. - Oxford : Oxford University Press. - 1367-4803 .- 1367-4811. ; 25:1, pp. 68-74 Journal article (peer-reviewed) Abstract: Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical step, is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution. Results: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that there are relatively few unique haplotypes, but not always the least possible, for the datasets with known solutions. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, most of which are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
33.	Conley, Christopher J., et al. (authors) Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection 2014 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 30:18, pp. 2636-2643 Journal article (peer-reviewed) Abstract: Motivation: Isotope trace (IT) detection is a fundamental step for liquid or gas chromatography mass spectrometry (XC-MS) data analysis that faces a multitude of technical challenges on complex samples. The Kalman filter (KF) application to IT detection addresses some of these challenges; it discriminates closely eluting ITs in the m/z dimension, flexibly handles heteroscedastic m/z variances and does not bin the m/z axis. Yet, the behavior of this KF application has not been fully characterized, as no cost-free open-source implementation exists and incomplete evaluation standards for IT detection persist. Results: Massifquant is an open-source solution for KF IT detection that has been subjected to novel and rigorous methods of performance evaluation. The presented evaluation with accompanying annotations and optimization guide sets a new standard for comparative IT detection. Compared with centWave, matchedFilter and MZMine2 (alternative IT detection engines), Massifquant detected more true ITs in a real LC-MS complex sample, especially low-intensity ITs. It also offers competitive specificity and equally effective quantitation accuracy.
34.	Costa, Ivan G, et al. (authors) Constrained mixture estimation for analysis and robust classification of clinical time series. 2009 In: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1367-4803. ; 25:12 Journal article (peer-reviewed) Abstract: Personalized medicine based on molecular aspects of diseases, such as gene expression profiling, has become increasingly popular. However, one faces multiple challenges when analyzing clinical gene expression data; most of the well-known theoretical issues such as high dimension of feature spaces versus few examples, noise and missing data apply. Special care is needed when designing classification procedures that support personalized diagnosis and choice of treatment. Here, we particularly focus on classification of interferon-beta (IFNbeta) treatment response in Multiple Sclerosis (MS) patients, which has attracted substantial attention in the recent past. Half of the patients remain unaffected by IFNbeta treatment, which is still the standard. For them the treatment should be ceased in a timely manner to mitigate the side effects. We propose constrained estimation of mixtures of hidden Markov models as a methodology to classify patient response to IFNbeta treatment. The advantages of our approach are that it takes the temporal nature of the data into account and that it is robust with respect to noise, missing data and mislabeled samples. Moreover, mixture estimation makes it possible to explore the presence of response sub-groups of patients on the transcriptional level. We clearly outperformed all prior approaches in terms of prediction accuracy, raising it, for the first time, above 90%. Additionally, we were able to identify potentially mislabeled samples and to sub-divide the good responders into two sub-groups that exhibited different transcriptional response programs. 
This is supported by recent findings on MS pathology and therefore may raise interesting clinical follow-up questions. The method is implemented in the GQL framework and is available at http://www.ghmm.org/gql. Datasets are available at http://www.cin.ufpe.br/~igcf/MSConst. Supplementary data are available at Bioinformatics online.
35.	Costa, Ivan G, et al. (authors) Inferring differentiation pathways from gene expression. 2008 In: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1367-4803. ; 24:13 Journal article (peer-reviewed) Abstract: The regulation of proliferation and differentiation of embryonic and adult stem cells into mature cells is central to developmental biology. Gene expression measured in distinguishable developmental stages helps to elucidate underlying molecular processes. In previous work we showed that functional gene modules, which act distinctly in the course of development, can be represented by a mixture of trees. In general, the similarities in the gene expression programs of cell populations reflect the similarities in the differentiation path. We propose a novel model for gene expression profiles and an unsupervised learning method to estimate developmental similarity and infer differentiation pathways. We assess the performance of our model on simulated data and compare it with favorable results to related methods. We also infer differentiation pathways and predict functional modules in gene expression data of lymphoid development. We demonstrate for the first time how, in principle, the incorporation of structural knowledge about the dependence structure helps to reveal differentiation pathways and potentially relevant functional gene modules from microarray datasets. Our method applies in any area of developmental biology where it is possible to obtain cells of distinguishable differentiation stages. The implementation of our method (GPL license), data and additional results are available at http://algorithmics.molgen.mpg.de/Supplements/InfDif/. Supplementary data are available at Bioinformatics online.
36.	Da Silva, Vinicius, et al. (authors) CNVRanger: association analysis of CNVs with gene expression and quantitative phenotypes 2020 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 36, pp. 972-973 Journal article (peer-reviewed) Abstract: Summary: Copy number variation (CNV) is a major type of structural genomic variation that is increasingly studied across different species for association with diseases and production traits. Established protocols for experimental detection and computational inference of CNVs from SNP array and next-generation sequencing data are available. We present the CNVRanger R/Bioconductor package which implements a comprehensive toolbox for structured downstream analysis of CNVs. This includes functionality for summarizing individual CNV calls across a population, assessing overlap with functional genomic regions, and genome-wide association analysis with gene expression and quantitative phenotypes.
37.	Dalevi, Daniel, 1974, et al. (authors) Bayesian classifiers for detecting HGT using fixed and variable order Markov models of genomic signatures 2006 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 22:5, pp. 517-522 Journal article (peer-reviewed) Abstract: Analyses of genomic signatures are gaining attention as they allow studies of species-specific relationships without involving alignments of homologous sequences. A naïve Bayesian classifier was built to discriminate between different bacterial compositions of short oligomers, also known as DNA words. The classifier has proven successful in identifying foreign genes in Neisseria meningitidis. In this study we extend the classifier approach using either a fixed higher order Markov model (Mk) or a variable length Markov model (VLMk).
38.	Dalevi, Daniel, 1974, et al. (authors) Expected Gene Order Distances and Model Selection in Bacteria 2008 In: Bioinformatics. - Oxford, United Kingdom : Oxford University Press. - 1367-4803 .- 1367-4811. ; 24:11, pp. 1332-1338 Journal article (peer-reviewed) Abstract: Motivation: The evolutionary distance inferred from gene-order comparisons of related bacteria is dependent on the model. Therefore, it is highly important to establish reliable assumptions before inferring its magnitude. Results: We investigate the patterns of dotplots between species of bacteria with the purpose of model selection in gene-order problems. We find several categories of data which can be explained by carefully weighing the contributions of reversals, transpositions, symmetrical reversals, single gene transpositions and single gene reversals. We also derive method of moments distance estimates for some previously uncomputed cases, such as symmetrical reversals, single gene reversals and their combinations, as well as the single gene transpositions edit distance.
39.	Das, Sarbashis, et al. (authors) ABWGAT : anchor-based whole genome analysis tool. 2009 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:24, pp. 3319-20 Journal article (peer-reviewed) Abstract: SUMMARY: Large numbers of genomes are being sequenced regularly and the rate will go up in future due to the availability of new genome sequencing techniques. In order to understand genotype to phenotype relationships, it is necessary to identify sequence variations at the genomic level. Alignment of a pair of genomes and parsing the alignment data is an accepted approach for identification of variations. Though there are a number of tools available for whole-genome alignment, none of these allows automatic parsing of the alignment and identification of different kinds of genomic variants with a high degree of sensitivity. Here we present a simple web-based interface for whole genome comparison named ABWGAT (Anchor-Based Whole Genome Analysis Tool) that is simple to use. The output is a list of variations such as SNVs, indels, repeat expansion and inversion. AVAILABILITY: The web server is freely available to non-commercial users at the following address http://abwgc.jnu.ac.in/_sarba. Supplementary data are available at http://abwgc.jnu.ac.in/_sarba/cgi-bin/abwgc_retrival.cgi using job id 524, 526 and 528. CONTACT: dsarbashis@gmail.com; alok.bhattacharya@gmail.com
40.	Davila Lopez, Marcela, et al. (authors) eGOB: eukaryotic Gene Order Browser. 2011 In: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1460-2059 .- 1367-4803. ; 27:8, pp. 1150-1 Journal article (peer-reviewed) Abstract: A large number of genomes have been sequenced, allowing a range of comparative studies. Here, we present the eukaryotic Gene Order Browser with information on the order of protein and non-coding RNA (ncRNA) genes of 74 different eukaryotic species. The browser is able to display a gene of interest together with its genomic context in all species where that gene is present. Thereby, questions related to the evolution of gene organization and non-random gene order may be examined. The browser also provides access to data collected on pairs of adjacent genes that are evolutionarily conserved. AVAILABILITY: eGOB as well as underlying data are freely available at http://egob.biomedicine.gu.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: tore.samuelsson@medkem.gu.se.
41.	de Weerd, Hendrik A., et al. (authors) MODifieR : an ensemble R package for inference of disease modules from transcriptomics networks 2020 In: Bioinformatics. - : Oxford University Press. - 1367-4803 .- 1367-4811 .- 1460-2059. ; 36:12, pp. 3918-3919 Journal article (peer-reviewed) Abstract: MOTIVATION: Complex diseases are due to the dense interactions of many disease-associated factors that dysregulate genes that in turn form so-called disease modules, which have been shown to be a powerful concept for understanding pathological mechanisms. There exist many disease module inference methods that rely on somewhat different assumptions, but there is still no gold standard or best performing method. Hence, there is a need for combining these methods to generate robust disease modules. RESULTS: We developed MODule IdentiFIER (MODifieR), an ensemble R package of nine disease module inference methods from transcriptomics networks. MODifieR uses standardized input and output allowing the possibility to combine individual modules generated from these methods into more robust disease-specific modules, contributing to a better understanding of complex diseases. AVAILABILITY: MODifieR is available under the GNU GPL license and can be freely downloaded from https://gitlab.com/Gustafsson-lab/MODifieR and as a Docker image from https://hub.docker.com/r/ddeweerd/modifier. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
42.	Delhomme, Nicolas, et al. (authors) easyRNASeq : a bioconductor package for processing RNA-Seq data. 2012 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 28:19 Journal article (peer-reviewed) Abstract: MOTIVATION: RNA sequencing is becoming a standard for expression profiling experiments and many tools have been developed in the past few years to analyze RNA-Seq data. Numerous 'Bioconductor' packages are available for next-generation sequencing data loading in R, e.g. ShortRead and Rsamtools as well as to perform differential gene expression analyses, e.g. DESeq and edgeR. However, the processing tasks lying in between these require the precise interplay of many Bioconductor packages, e.g. Biostrings, IRanges or external solutions are to be sought. RESULTS: We developed 'easyRNASeq', an R package that simplifies the processing of RNA sequencing data, hiding the complex interplay of the required packages behind a single functionality. AVAILABILITY: The package is implemented in R (as of version 2.15) and is available from Bioconductor (as of version 2.10) at the URL: http://bioconductor.org/packages/release/bioc/html/easyRNASeq.html, where installation and usage instructions can be found. CONTACT: delhomme@embl.de.
43.	Demissie, Meaza, et al. (authors) Unequal group variances in microarray data analyses 2008 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:9, pp. 1168-1174 Journal article (peer-reviewed) Abstract: Motivation: In searching for differentially expressed (DE) genes in microarray data, we often observe a fraction of the genes to have unequal variability between groups. This is not an issue in large samples, where a valid test exists that uses individual variances separately. The problem arises in the small-sample setting, where the approximately valid Welch test lacks sensitivity, while the more sensitive moderated t-test assumes equal variance. Methods: We introduce a moderated Welch test (MWT) that allows unequal variance between groups. It is based on (i) weighting of pooled and unpooled standard errors and (ii) improved estimation of the gene-level variance that exploits the information from across the genes. Results: When a non-trivial proportion of genes has unequal variability, false discovery rate (FDR) estimates based on the standard t and moderated t-tests are often too optimistic, while the standard Welch test has low sensitivity. The MWT is shown to (i) perform better than the standard t, the standard Welch and the moderated t-tests when the variances are unequal between groups and (ii) perform similarly to the moderated t, and better than the standard t and Welch tests when the group variances are equal. These results mean that MWT is more reliable than other existing tests over a wider range of data conditions. Availability: R package to perform MWT is available at http://www.meb.ki.se/~yudpaw Contact: yudi.pawitan@ki.se Supplementary information: Supplementary data are available at Bioinformatics online.
44.	Dessimoz, Christophe, et al. (authors) Toward community standards in the quest for orthologs 2012 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 28:6, pp. 900-904 Journal article (other academic/artistic) Abstract: The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
45.	Desvignes, Thomas, et al. (authors) Unification of miRNA and isomiR research : the mirGFF3 format and the mirtop API 2020 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 36:3, pp. 698-703 Journal article (peer-reviewed) Abstract: Motivation: MicroRNAs (miRNAs) are small RNA molecules (~22 nucleotides long) involved in post-transcriptional gene regulation. Advances in high-throughput sequencing technologies led to the discovery of isomiRs, which are miRNA sequence variants. While many miRNA-seq analysis tools exist, the diversity of output formats hinders accurate comparisons between tools and precludes data sharing and the development of common downstream analysis methods. Results: To overcome this situation, we present here a community-based project, miRNA Transcriptomic Open Project (miRTOP) working towards the optimization of miRNA analyses. The aim of miRTOP is to promote the development of downstream isomiR analysis tools that are compatible with existing detection and quantification tools. Based on the existing GFF3 format, we first created a new standard format, mirGFF3, for the output of miRNA/isomiR detection and quantification results from small RNA-seq data. Additionally, we developed a command line Python tool, mirtop, to create and manage the mirGFF3 format. Currently, mirtop can convert into mirGFF3 the outputs of commonly used pipelines, such as seqbuster, isomiR-SEA, sRNAbench, Prost! as well as BAM files. Some tools have also incorporated the mirGFF3 format directly into their code, such as miRge2.0, IsoMIRmap and OptimiR. Its open architecture enables any tool or pipeline to output or convert results into mirGFF3. 
Collectively, this isomiR categorization system, along with the accompanying mirGFF3 and mirtop API, provide a comprehensive solution for the standardization of miRNA and isomiR annotation, enabling data sharing, reporting, comparative analyses and benchmarking, while promoting the development of common miRNA methods focusing on downstream steps of miRNA detection, annotation and quantification.
46.	Dib, L., et al. (author) Evolutionary footprint of coevolving positions in genes 2014 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 30:9, pp. 1241-1249 Journal article (peer-reviewed) Abstract: Motivation: The analysis of molecular coevolution provides information on the potential functional and structural implication of positions along DNA sequences, and several methods are available to identify coevolving positions using probabilistic or combinatorial approaches. The specific nucleotide or amino acid profile associated with the coevolution process is, however, not estimated; only known profiles, such as the Watson-Crick constraint, are usually considered a priori in current measures of coevolution. Results: Here, we propose a new probabilistic model, Coev, to identify coevolving positions and their associated profile in DNA sequences while incorporating the underlying phylogenetic relationships. The process of coevolution is modeled by a 16 × 16 instantaneous rate matrix that includes rates of transition as well as a profile of coevolution. We used simulated, empirical and illustrative data to evaluate our model and to compare it with a model of 'independent' evolution using the Akaike Information Criterion. We showed that the Coev model is able to discriminate between coevolving and non-coevolving positions and provides better specificity and sensitivity than other available approaches. We further demonstrate that the identification of the profile of coevolution can shed new light on the process of dependent substitution during lineage evolution.
47.	Dickinson, Q., et al. (author) Multi-omic integration by machine learning (MIMaL) 2022 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 38:21, pp. 4908-4918 Journal article (peer-reviewed) Abstract: Motivation: Cells respond to environments by regulating gene expression to exploit resources optimally. Recent advances in technologies allow for measuring the abundances of RNA, proteins, lipids and metabolites. These highly complex datasets reflect the states of the different layers in a biological system. Multi-omics is the integration of these disparate methods and data to gain a clearer picture of the biological state. Multi-omic studies of the proteome and metabolome are becoming more common as mass spectrometry technology continues to be democratized. However, knowledge extraction through the integration of these data remains challenging. Results: Connections between molecules in different omic layers were discovered through a combination of machine learning and model interpretation. Discovered connections reflected protein control (ProC) over metabolites. Proteins discovered to control citrate were mapped onto known genetic and metabolic networks, revealing that these protein regulators are novel. Further, clustering the magnitudes of ProC over all metabolites enabled the prediction of five gene functions, each of which was validated experimentally. Two uncharacterized genes, YJR120W and YDL157C, were accurately predicted to modulate mitochondrial translation. Functions for three incompletely characterized genes, SDH9, ISC1 and FMP52, were also predicted and validated. A website enables exploration of the results as well as MIMaL analysis of user-supplied multi-omic data.
48.	Dimou, Niki L., et al. (author) GWAR : robust analysis and meta-analysis of genome-wide association studies 2017 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 33:10, pp. 1521-1527 Journal article (peer-reviewed) Abstract: Motivation: In genome-wide association studies (GWAS), a variety of statistical techniques can be used to conduct the analysis, but the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. Results: The CATT under a recessive, additive and dominant model of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2, were implemented in Stata. For MAX and MIN2, we calculated their asymptotic null distributions by numerical integration, resulting in a great gain in computational time without loss of accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata.
49.	Draminski, Michal, et al. (author) Monte Carlo feature selection for supervised classification 2008 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:1, pp. 110-117 Journal article (peer-reviewed) Abstract: Motivation: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features. Results: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using the Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancer, which is the case for the other methods.
50.	Duchemin, Wandrille, et al. (author) RecPhyloXML : a format for reconciled gene trees 2018 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 34:21, pp. 3646-3652 Journal article (peer-reviewed) Abstract: Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer or loss), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but they often use different reconciliation formats that differ in the types of events considered or in whether the species tree is dated. This complicates comparison and communication between programs. Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities.

Refine results

Type of publication: journal article (277)

Type of content: peer-reviewed (273); other academic/artistic (4)

Author/editor: Sonnhammer, Erik L L (18); Lundeberg, Joakim (7); Orešič, Matej, 1967- (6); Arvestad, Lars (6); Käll, Lukas, 1969- (5); Nilsson, R. Henrik, ... (4); Menéndez Hurtado (, ... (4); Lagergren, Jens (4); van Der Spoel, David (4); Elf, Johan (4); Dalevi, Daniel, 1974 (3); Lindblad-Toh, Kersti ... (3); Nilsson, Björn (3); Sjödin, Andreas (3); Staaf, Johan (3); Larsson, Anders (2); Bengtsson-Palme, Joh ... (2); Kristiansson, Erik, ... (2); Lambrix, Patrick (2); Nielsen, Jens B, 196 ... (2); Abdel-Rehim, Abbi (2); King, Ross, 1962 (2); Uhlén, Mathias (2); Groop, Leif (2); Hellander, Andreas (2); Levander, Fredrik (2); Rydén, Tobias (2); Sonnhammer, Erik (2); Karlsson, Niclas G., ... (2); Enroth, Stefan (2); Niroula, Abhishek (2); Larsson, Per (2); Landberg, Rikard, 19 ... (2); Forsman, Mats (2); Sennblad, Bengt (2); Carlborg, Örjan (2); Höglund, Mattias (2); Delhomme, Nicolas (2); Stenius, U (2); Rögnvaldsson, Thorst ... (2); Fontes, Magnus (2); Tamas, Ivica (2); Tjärnberg, Andreas (2); Pawitan, Yudi (2); Di Palma, Federica (2); Mauceli, Evan (2); Korhonen, A (2); Whelan, Simon (2); Sjöstrand, Joel (2); Johansson, Mikael (2)

Institution: Stockholms universitet (72); Uppsala universitet (69); Kungliga Tekniska Högskolan (48); Göteborgs universitet (36); Karolinska Institutet (32); Linköpings universitet (23); Umeå universitet (20); Lunds universitet (20); Chalmers tekniska högskola (19); Örebro universitet (11); Sveriges Lantbruksuniversitet (9); Högskolan i Halmstad (3); Högskolan i Skövde (3); Mälardalens universitet (1); Mittuniversitetet (1); Södertörns högskola (1)

Language: English (277)

Research subject (UKÄ/SCB): Natural sciences (218); Medical and health sciences (29); Engineering and technology (22); Agricultural sciences (2); Social sciences (1)
