SwePub

Result list for search "WFRF:(Sonnhammer Erik L L)"

  • Results 1-50 of 97
1.
  • Berglund, Ann-Charlotte, et al. (author)
  • InParanoid 6 : eukaryotic ortholog clusters with inparalogs
  • 2008
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 36, pp. D263-D266
  • Journal article (peer-reviewed)
    • The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.
  •  
2.
  • Guala, Dimitri, et al. (author)
  • MaxLink : network-based prioritization of genes tightly linked to a disease seed set
  • 2014
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 30:18, pp. 2689-2690
  • Journal article (peer-reviewed)
    • Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.
  •  
3.
  • Hillerton, Thomas, et al. (author)
  • GeneSNAKE: a Python package for benchmarking and simulation of gene regulatory networks and expression data.
  • Other publication (other academic/artistic)
    • Understanding how genes interact with and regulate each other is a key challenge in systems biology. One of the primary methods to study this is through gene regulatory networks (GRNs). The field of GRN inference, however, faces many challenges, such as the complexity of gene regulation and high noise levels, which necessitates effective tools for evaluating inference methods. For this purpose, data that corresponds to a known GRN, from various conditions and experimental setups, is necessary, which is only possible to attain via simulation. Existing tools for simulating data for GRN inference have limitations either in the way networks are constructed or data is produced, and are often not flexible for adjusting the algorithm or parameters. To overcome these issues we present GeneSNAKE, a Python package designed to allow users to generate biologically realistic GRNs, and from a GRN simulate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties. GeneSNAKE improves on previous work in the field by adding a perturbation model that allows for a greater range of perturbation schemes along with the ability to control noise and modify the perturbation strength. For benchmarking, GeneSNAKE offers a number of functions both for comparing a true GRN to an inferred GRN, and to study properties in data and GRN models. These functions can in addition be used to study properties of biological data to produce simulated data with more realistic properties. GeneSNAKE is an open-source, comprehensive simulation and benchmarking package with powerful capabilities that are not combined in any other single package, and thanks to the Python implementation it is simple to extend and modify by a user.
  •  
4.
  • Zhivkoplias, Erik K., et al. (author)
  • Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops
  • 2022
  • In: Frontiers in Genetics. - : Frontiers Media SA. - 1664-8021. ; 13
  • Journal article (peer-reviewed)
    • The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.
  •  
5.
  •  
6.
  •  
7.
  •  
8.
  •  
9.
  • Alexeyenko, Andrey, et al. (author)
  • Comparative interactomics with Funcoup 2.0
  • 2012
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 40:D1, pp. D821-D828
  • Journal article (peer-reviewed)
    • FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.
  •  
10.
  • Alexeyenko, Andrey, et al. (author)
  • Dynamic Zebrafish Interactome Reveals Transcriptional Mechanisms of Dioxin Toxicity
  • 2010
  • In: PLOS ONE. - : Public Library of Science (PLoS). - 1932-6203. ; 5:5, p. e10465
  • Journal article (peer-reviewed)
    • Background: In order to generate hypotheses regarding the mechanisms by which 2,3,7,8-tetrachlorodibenzo-p-dioxin (dioxin) causes toxicity, we analyzed global gene expression changes in developing zebrafish embryos exposed to this potent toxicant in the context of a dynamic gene network. For this purpose, we also computationally inferred a zebrafish (Danio rerio) interactome based on orthologs and interaction data from other eukaryotes. Methodology/Principal Findings: Using novel computational tools to analyze this interactome, we distinguished between dioxin-dependent and dioxin-independent interactions between proteins, and tracked the temporal propagation of dioxin-dependent transcriptional changes from a few genes that were altered initially, to large groups of biologically coherent genes at later times. The most notable processes altered at later developmental stages were calcium and iron metabolism, embryonic morphogenesis including neuronal and retinal development, a variety of mitochondria-related functions, and generalized stress response (not including induction of antioxidant genes). Within the interactome, many of these responses were connected to cytochrome P4501A (cyp1a) as well as other genes that were dioxin-regulated one day after exposure. This suggests that cyp1a may play a key role initiating the toxic dysregulation of those processes, rather than serving simply as a passive marker of dioxin exposure, as suggested by earlier research. Conclusions/Significance: Thus, a powerful microarray experiment coupled with a flexible interactome and multi-pronged interactome tools (which are now made publicly available for microarray analysis and related work) suggest the hypothesis that dioxin, best known in fish as a potent cardioteratogen, has many other targets. Many of these types of toxicity have been observed in mammalian species and are potentially caused by alterations to cyp1a.
  •  
11.
  • Alexeyenko, Andrey, et al. (author)
  • Global networks of functional coupling in eukaryotes from comprehensive data integration
  • 2009
  • In: Genome Research. - : Cold Spring Harbor Laboratory. - 1088-9051 .- 1549-5469. ; 19:6, pp. 1107-16
  • Journal article (peer-reviewed)
    • No single experimental method can discover all connections in the interactome. A computational approach can help by integrating data from multiple, often unrelated, proteomics and genomics pipelines. Reconstructing global networks of functional coupling (FC) faces the challenges of scale and heterogeneity--how to efficiently integrate huge amounts of diverse data from multiple organisms, yet ensuring high accuracy. We developed FunCoup, an optimized Bayesian framework, to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex member, metabolic, or signaling link. This capability boosted overall accuracy. On the whole, the constructed framework was comprehensively tested to optimize the overall confidence and ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 data sets in seven organisms and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network, only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. We show how FunCoup can be used for discovering candidate members of the Parkinson and Alzheimer pathways. Cross-species pathway conservation analysis provided further support to these observations.
  •  
12.
  • Barrientos-Somarribas, Mauricio, et al. (author)
  • Discovering viral genomes in human metagenomic data by predicting unknown protein families
  • 2018
  • In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 8
  • Journal article (peer-reviewed)
    • Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows no or little homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family, hence it was named bacteriophage HFM.
  •  
13.
  • Björkholm, Patrik, et al. (author)
  • Comparative analysis and unification of domain-domain interaction networks
  • 2009
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:22, pp. 3020-5
  • Journal article (peer-reviewed)
    • MOTIVATION: Certain protein domains are known to preferentially interact with other domains. Several approaches have been proposed to predict domain-domain interactions, and over nine datasets are available. Our aim is to analyse the coverage and quality of the existing resources, as well as the extent of their overlap. With this knowledge, we have the opportunity to merge individual domain interaction networks to construct a comprehensive and reliable database. RESULTS: In this article we introduce a new approach towards comparing domain-domain interaction networks. This approach is used to compare nine predicted domain and protein interaction networks. The networks were used to generate a database of unified domain interactions, UniDomInt. Each interaction in the dataset is scored according to the benchmarked reliability of the sources. The performance of UniDomInt is an improvement compared to the underlying source networks and to another composite resource, Domine. AVAILABILITY: http://sonnhammer.sbc.su.se/download/UniDomInt/
  •  
14.
  • Buzzao, Davide, et al. (author)
  • TOPAS, a network-based approach to detect disease modules in a top-down fashion
  • 2022
  • In: NAR Genomics and Bioinformatics. - : Oxford University Press (OUP). - 2631-9268. ; 4:4
  • Journal article (peer-reviewed)
    • A vast scenario of potential disease mechanisms and remedies is yet to be discovered. The field of Network Medicine has grown thanks to the massive amount of high-throughput data and the emerging evidence that disease-related proteins form ‘disease modules’. Relying on prior disease knowledge, network-based disease module detection algorithms aim at connecting the list of known disease associated genes by exploiting interaction networks. Most existing methods extend disease modules by iteratively adding connector genes in a bottom-up fashion, while top-down approaches remain largely unexplored. We have created TOPAS, an iterative approach that aims at connecting the largest number of seed nodes in a top-down fashion through connectors that guarantee the highest flow of a Random Walk with Restart in a network of functional associations. We used a corpus of 382 manually selected functional gene sets to benchmark our algorithm against SCA, DIAMOnD, MaxLink and ROBUST across four interactomes. We demonstrate that TOPAS outperforms competing methods in terms of Seed Recovery Rate, Seed to Connector Ratio and consistency during module detection. We also show that TOPAS achieves competitive performance in terms of biological relevance of detected modules and scalability. 
  •  
15.
  • Carreras-Puigvert, Jordi, et al. (author)
  • A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family
  • 2017
  • In: Nature Communications. - : Nature Publishing Group. - 2041-1723. ; 8:1
  • Journal article (peer-reviewed)
    • The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.
  •  
16.
  • Castresana-Aguirre, Miguel, 1991-, et al. (author)
  • Benefits and Challenges of Pre-clustered Network-Based Pathway Analysis
  • 2022
  • In: Frontiers in Genetics. - : Frontiers Media SA. - 1664-8021. ; 13
  • Journal article (peer-reviewed)
    • Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogenous parts before performing pathway analysis on each module. We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering can be beneficial by increasing the sensitivity of pathway analysis methods and by providing deeper insights of biological mechanisms related to the phenotype under study. However, keeping a high specificity is a challenge. For ANUBIX, clustering caused a minor loss of specificity, while for BinoX and NEAT it caused an unacceptable loss of specificity. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We show examples of this approach and conclude that clustering can improve overall pathway annotation performance, but should only be used if the used enrichment method has a low false positive rate.
  •  
17.
  • Castresana-Aguirre, Miguel, et al. (author)
  • PathBIX—a web server for network-based pathway annotation with adaptive null models
  • 2021
  • In: Bioinformatics Advances. - : Oxford University Press (OUP). - 2635-0041. ; 1:1
  • Journal article (peer-reviewed)
    • Motivation: Pathway annotation is a vital tool for interpreting and giving meaning to experimental data in life sciences. Numerous tools exist for this task, where the most recent generation of pathway enrichment analysis tools, network-based methods, utilize biological networks to gain a richer source of information as a basis of the analysis than merely the gene content. Network-based methods use the network crosstalk between the query gene set and the genes in known pathways, and compare this to a null model of random expectation. Results: We developed PathBIX, a novel web application for network-based pathway analysis, based on the recently published ANUBIX algorithm which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species, and utilizes prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.
  •  
18.
  • Castresana-Aguirre, Miguel, et al. (author)
  • Pathway-specific model estimation for improved pathway annotation by network crosstalk
  • 2020
  • In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 10:1
  • Journal article (peer-reviewed)
    • Pathway enrichment analysis is the most common approach for understanding which biological processes are affected by altered gene activities under specific conditions. However, it has been challenging to find a method that efficiently avoids false positives while keeping a high sensitivity. We here present a new network-based method ANUBIX based on sampling random gene sets against intact pathway. Benchmarking shows that ANUBIX is considerably more accurate than previous network crosstalk based methods, which have the drawback of modelling pathways as random gene sets. We demonstrate that ANUBIX does not have a bias for finding certain pathways, which previous methods do, and show that ANUBIX finds biologically relevant pathways that are missed by other methods.
  •  
19.
  • Chalk, Alistair M, et al. (author)
  • siRNA specificity searching incorporating mismatch tolerance data.
  • 2008
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1460-2059. ; 24:10, pp. 1316-1317
  • Journal article (peer-reviewed)
    • Artificially synthesized short interfering RNAs (siRNAs) are widely used in functional genomics to knock down specific target genes. One ongoing challenge is to guarantee that the siRNA does not elicit off-target effects. Initial reports suggested that siRNAs were highly sequence-specific; however, subsequent data indicates that this is not necessarily the case. It is still uncertain what level of similarity and other rules are required for an off-target effect to be observed, and scoring schemes have not been developed to look beyond simple measures such as the number of mismatches or the number of consecutive matching bases present. We created design rules for predicting the likelihood of a non-specific effect and present a web server that allows the user to check the specificity of a given siRNA in a flexible manner using a combination of methods. The server finds potential off-target matches in the corresponding RefSeq database and ranks them according to a scoring system based on experimental studies of specificity. AVAILABILITY: The server is available at http://informatics-eskitis.griffith.edu.au/SpecificityServer.
  •  
20.
  • Dessimoz, Christophe, et al. (author)
  • Toward community standards in the quest for orthologs
  • 2012
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 28:6, pp. 900-904
  • Journal article (other academic/artistic)
    • The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
  •  
21.
  • El-Gebali, Sara, et al. (author)
  • The Pfam protein families database in 2019
  • 2019
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 47:D1, pp. D427-D432
  • Journal article (peer-reviewed)
    • The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.
  •  
22.
  • Finn, Robert D, et al. (author)
  • Pfam : clans, web tools and services.
  • 2006
  • In: Nucleic Acids Res. - : Oxford University Press (OUP). - 1362-4962 .- 0305-1048. ; 34:Database issue, pp. D247-51
  • Journal article (peer-reviewed)
  •  
23.
  • Finn, Robert D., et al. (author)
  • Pfam : the protein families database
  • 2014
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 42:D1, pp. D222-D230
  • Journal article (peer-reviewed)
    • Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
  •  
24.
  • Finn, Robert D., et al. (author)
  • The Pfam protein families database
  • 2010
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 38, pp. D211-D222
  • Journal article (peer-reviewed)
    • Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
  •  
25.
  • Forslund, Kristoffer, et al. (author)
  • Benchmarking homology detection procedures with low complexity filters
  • 2009
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:19, pp. 2500-2505
  • Journal article (peer-reviewed)
    • BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences. RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed. CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated. AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz
  •  
26.
  • Forslund, Kristoffer, et al. (author)
  • Domain architecture conservation in orthologs
  • 2011
  • In: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 12, p. 326
  • Journal article (peer-reviewed)
    • Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.
  •  
27.
  • Forslund, Kristoffer, et al. (author)
  • Domain tree-based analysis of protein architecture evolution
  • 2008
  • In: Molecular biology and evolution. - : Oxford University Press (OUP). - 0737-4038 .- 1537-1719. ; 25:2, pp. 254-264
  • Journal article (peer-reviewed). Abstract:
    • Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.
  •  
28.
  • Forslund, Kristoffer, et al. (author)
  • Evolution of Protein Domain Architectures
  • 2012
  • In: Evolutionary Genomics. - Totowa, NJ : Humana Press. - 9781617795848 ; , pp. 187-216
  • Book chapter (peer-reviewed). Abstract:
    • This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).
  •  
29.
  • Forslund, Kristoffer, et al. (author)
  • OrthoDisease : tracking disease gene orthologs across 100 species
  • 2011
  • In: Briefings in Bioinformatics. - : Oxford University Press (OUP). - 1467-5463 .- 1477-4054. ; 12:5, pp. 463-473
  • Journal article (peer-reviewed). Abstract:
    • Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.
  •  
30.
  • Forslund, Kristoffer, et al. (author)
  • Predicting protein function from domain content
  • 2008
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:15, pp. 1681-1687
  • Journal article (peer-reviewed). Abstract:
    • MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions. RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains. AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar
  •  
31.
  • Friedrich, Stefanie, et al. (author)
  • Fusion transcript detection using spatial transcriptomics
  • 2020
  • In: BMC Medical Genomics. - : Springer Science and Business Media LLC. - 1755-8794. ; 13:1
  • Journal article (peer-reviewed). Abstract:
    • Background: Fusion transcripts are involved in tumourigenesis and play a crucial role in tumour heterogeneity, tumour evolution and cancer treatment resistance. However, fusion transcripts have not been studied at high spatial resolution in tissue sections due to the lack of full-length transcripts with spatial information. New high-throughput technologies like spatial transcriptomics measure the transcriptome of tissue sections at almost single-cell level. While this technique does not allow for direct detection of fusion transcripts, we show that they can be inferred using the relative poly(A) tail abundance of the involved parental genes. Method: We present a new method, STfusion, which uses spatial transcriptomics to infer the presence and absence of poly(A) tails. A fusion transcript lacks a poly(A) tail for the 5' gene and has an elevated number of poly(A) tails for the 3' gene. Its expression level is defined by the upstream promoter of the 5' gene. STfusion measures the difference between the observed and expected number of poly(A) tails with a novel C-score. Results: We verified the ability of STfusion to predict fusion transcripts on HeLa cells with known fusions. STfusion and C-score applied to clinical prostate cancer data revealed the spatial distribution of the cis-SAGe SLC45A3-ELK4 in 12 tissue sections with almost single-cell resolution. The cis-SAGe occurred in disease areas, e.g. inflamed, prostatic intraepithelial neoplastic, or cancerous areas, and occasionally in normal glands. Conclusions: STfusion detects fusion transcripts in cancer cell line and clinical tissue data, and distinguishes chimeric transcripts from chimeras caused by trans-splicing events. With STfusion and the use of C-scores, fusion transcripts can be spatially localised in clinical tissue sections at almost single-cell level.
  •  
32.
  • Friedrich, Stefanie, et al. (author)
  • MetaCNV - a consensus approach to infer accurate copy numbers from low coverage data
  • 2020
  • In: BMC Medical Genomics. - : Springer Science and Business Media LLC. - 1755-8794. ; 13
  • Journal article (peer-reviewed). Abstract:
    • Background: The majority of copy number callers require high read coverage data that is often achieved with elevated material input, which increases the heterogeneity of tissue samples. However, to gain insights into smaller areas within a tissue sample, e.g. a cancerous area in a heterogeneous tissue sample, less material is used for sequencing, which results in lower read coverage. Therefore, more focus needs to be put on copy number calling that is sensitive enough for low coverage data. Results: We present MetaCNV, a copy number caller that infers reliable copy numbers for human genomes with a consensus approach. MetaCNV specializes in low coverage data, but also performs well on normal and high coverage data. MetaCNV integrates the results of multiple copy number callers and infers absolute and unbiased copy numbers for the entire genome. MetaCNV is based on a meta-model that bypasses the weaknesses of current calling models while combining the strengths of existing approaches. Here we apply MetaCNV based on ReadDepth, SVDetect, and CNVnator to real and simulated datasets in order to demonstrate how the approach improves copy number calling. Conclusions: MetaCNV, available at https://bitbucket.org/sonnhammergroup/metacnv, provides accurate copy number prediction on low coverage data and performs well on high coverage data.
  •  
33.
  • Frings, Oliver, 1982-, et al. (author)
  • MGclus : Network clustering employing shared neighbors
  • 2013
  • In: Molecular BioSystems. - : Royal Society of Chemistry (RSC). - 1742-206X .- 1742-2051. ; 9:7, pp. 1670-1675
  • Journal article (peer-reviewed). Abstract:
    • Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large-scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degrees of noise, and to perform equally well or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy-to-use command-line interface and is available for download from http://sonnhammer.sbc.su.se/download/software/MGclus/.
  •  
34.
  • Frings, Oliver, et al. (author)
  • Network Analysis of Functional Genomics Data : Application to Avian Sex-Biased Gene Expression
  • 2012
  • In: Scientific World Journal. - : Hindawi Limited. - 1537-744X. ; , p. 130491-
  • Journal article (peer-reviewed). Abstract:
    • Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.
  •  
35.
  • Frings, Oliver, et al. (author)
  • Prognostic Significance in Breast Cancer of a Gene Signature Capturing Stromal PDGF Signaling
  • 2013
  • In: American Journal of Pathology. - : Elsevier BV. - 0002-9440 .- 1525-2191. ; 182:6, pp. 2037-2047
  • Journal article (peer-reviewed). Abstract:
    • In this study, we describe a novel gene expression signature of platelet-derived growth factor (PDGF) activated fibroblasts, which is able to identify breast cancers with a PDGF-stimulated fibroblast stroma and displays an independent and strong prognostic significance. Global gene expression was compared between PDGF-stimulated human fibroblasts and cultured resting fibroblasts. The most differentially expressed genes were reduced to a gene expression signature of 113 genes. The biological significance and prognostic capacity of this signature were investigated using four independent clinical breast cancer data sets. Concomitant high expression of PDGF beta receptor and its cognate ligands is associated with a high PDGF signature score. This supports the notion that the signature detects tumors with PDGF-activated stroma. Subsequent analyses indicated significant associations between high PDGF signature score and clinical characteristics, including human epidermal growth factor receptor 2 positivity, estrogen receptor negativity, high tumor grade, and large tumor size. A high PDGF signature score is associated with shorter survival in univariate analysis. Furthermore, the high PDGF signature score acts as a significant marker of poor prognosis in multivariate survival analyses, including classic prognostic markers, Ki-67 status, a proliferation gene signature, or other recently described stroma-derived gene expression signatures.
  •  
36.
  • Guala, Dimitri, et al. (author)
  • A large-scale benchmark of gene prioritization methods
  • 2017
  • In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 7
  • Journal article (peer-reviewed). Abstract:
    • In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
  •  
37.
  • Guala, Dimitri, et al. (author)
  • Experimental validation of predicted cancer genes using FRET
  • Other publication (other academic/artistic). Abstract:
    • Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large scale in silico benchmark. An experimental validation of predictions made by MaxLink has however been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.
  •  
38.
  • Guala, Dimitri, et al. (author)
  • Experimental validation of predicted cancer genes using FRET
  • 2018
  • In: METHODS AND APPLICATIONS IN FLUORESCENCE. - : IOP PUBLISHING LTD. - 2050-6120. ; 6:3
  • Journal article (peer-reviewed). Abstract:
    • Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large scale in silico benchmark. An experimental validation of predictions made by MaxLink has however been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.
  •  
39.
  •  
40.
  • Guala, Dimitri, 1979- (author)
  • Functional association networks for disease gene prediction
  • 2017
  • Doctoral thesis (other academic/artistic). Abstract:
    • Mapping of the human genome has been instrumental in understanding diseases caused by changes in single genes. However, disease mechanisms involving multiple genes have proven to be much more elusive. Their complexity emerges from interactions of intracellular molecules and makes them immune to the traditional reductionist approach. Only by modelling this complex interaction pattern using networks is it possible to understand the emergent properties that give rise to diseases. The overarching term used to describe both physical and indirect interactions involved in the same functions is functional association. FunCoup is one of the most comprehensive networks of functional association. It uses a naïve Bayesian approach to integrate high-throughput experimental evidence of intracellular interactions in humans and multiple model organisms. In the first update, both the coverage and the quality of the interactions were increased and a feature for comparing interactions across species was added. The latest update involved a complete overhaul of all data sources, including a refinement of the training data and addition of new classes and sources of interactions as well as six new species. Disease-specific changes in genes can be identified using high-throughput genome-wide studies of patients and healthy individuals. To understand the underlying mechanisms that produce these changes, they can be mapped to collections of genes with known functions, such as pathways. BinoX was developed to map altered genes to pathways using the topology of FunCoup. This approach combined with a new random model for comparison enables BinoX to outperform traditional gene-overlap-based methods and other network-based techniques. Results from high-throughput experiments are challenged by noise and biases, resulting in many false positives. Statistical attempts to correct for these challenges have led to a reduction in coverage. Both limitations can be remedied using prioritisation tools such as MaxLink, which ranks genes using guilt by association in the context of a functional association network. MaxLink’s algorithm was generalised to work with any disease phenotype and its statistical foundation was strengthened. MaxLink’s predictions were validated experimentally using FRET. The availability of prioritisation tools without an appropriate way to compare them makes it difficult to select the correct tool for a problem domain. A benchmark to assess performance of prioritisation tools in terms of their ability to generalise to new data was developed. FunCoup was used for prioritisation while testing was done using cross-validation of terms derived from Gene Ontology. This resulted in a robust and unbiased benchmark for evaluation of current and future prioritisation tools. Surprisingly, previously superior tools based on global network structure were shown to be inferior to a local network-based tool when performance was analysed on the most relevant part of the output, i.e. the top ranked genes. This thesis demonstrates how a network that models the intricate biology of the cell can contribute with valuable insights for researchers that study diseases with complex genetic origins. The developed tools will help the research community to understand the underlying causes of such diseases and discover new treatment targets. The robust way to benchmark such tools will help researchers to select the proper tool for their problem domain.
  •  
41.
  • Guala, Dimitri, 1979-, et al. (author)
  • Network Crosstalk as a Basis for Drug Repurposing
  • 2022
  • In: Frontiers in Genetics. - : Frontiers Media SA. - 1664-8021. ; 13
  • Journal article (peer-reviewed). Abstract:
    • The need for systematic drug repurposing has seen a steady increase over the past decade and may be particularly valuable to quickly remedy unexpected pandemics. The abundance of functional interaction data has allowed mapping of substantial parts of the human interactome modeled using functional association networks, favoring network-based drug repurposing. Network crosstalk-based approaches have never been tested for drug repurposing despite their success in the related and more mature field of pathway enrichment analysis. We have, therefore, evaluated the top performing crosstalk-based approaches for drug repurposing. Additionally, the volume of new interaction data as well as more sophisticated network integration approaches compelled us to construct a new benchmark for performance assessment of network-based drug repurposing tools, which we used to compare network crosstalk-based methods with a state-of-the-art technique. We find that network crosstalk-based drug repurposing is able to rival the state-of-the-art method and in some cases outperform it.
  •  
42.
  • Haider, Christian, et al. (author)
  • TreeDom : a graphical web tool for analysing domain architecture evolution
  • 2016
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 32:15, pp. 2384-2385
  • Journal article (peer-reviewed). Abstract:
    • We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to N sequences that are most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence.
  •  
43.
  •  
44.
  • Henricson, Anna, et al. (author)
  • Orthology confers intron position conservation
  • 2010
  • In: BMC Genomics. - : Springer Science and Business Media LLC. - 1471-2164. ; 11:412
  • Journal article (peer-reviewed). Abstract:
    • Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence. Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, meaning sequences close in sequence space, yet falling just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC compared to ortholog-closest non-ortholog pairs. Also pairs of inparalogs were found to have a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences cannot simply be attributed to the generally higher sequence identity of the ortholog-ortholog and the inparalog-inparalog pairs. Furthermore, we analyzed the agreement between IPC score and the ortholog score assigned by the InParanoid algorithm, and found that it was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid score ranked inparalogs differently. These represent cases where sequence and intron position divergence are discordant. We further analyzed the discordant clusters to identify any possible preference for protein functions by looking for enriched GO terms and Pfam protein domains. They were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity. Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.
  •  
45.
  • Hillerton, Thomas, et al. (author)
  • Fast and accurate gene regulatory network inference by normalized least squares regression
  • 2022
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 38:8, pp. 2263-2268
  • Journal article (peer-reviewed). Abstract:
    • Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between the maximum number of true-positive and the minimum number of false-positive interactions. Another key feature is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria; however, as it is based on least squares, it is vulnerable to known issues of amplifying extreme values, small or large. In GRN inference this manifests itself as genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low-value fold changes. Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.
  •  
46.
  • Hillerton, Thomas, 1992- (author)
  • In silico modelling for refining gene regulatory network inference
  • 2023
  • Doctoral thesis (other academic/artistic). Abstract:
    • Gene regulation is at the centre of all cellular functions, regulating the cell's healthy and pathological responses. The interconnected system of regulatory interactions is known as the gene regulatory network (GRN), where genes influence each other to maintain strict and robust control. Today a large number of methods exist for inferring GRNs, which necessitates benchmarking to determine which method is most suitable for a specific goal. Paper I presents such a benchmark focusing on the effect of using known perturbations to infer GRNs. A further challenge when studying GRNs is that experimental data contains high levels of noise and that artefacts may be introduced by the experiment itself. The LSCON method was developed in paper II to reduce the effect of one such artefact that can occur if the expression of a gene shows no or minimal change across most or all experiments.  With few fully determined biological GRNs available, it is problematic to use these to evaluate an inference method's correctness. Instead, the GRN field relies on simulated data, using a known GRN and generating the corresponding data. When simulating GRNs, capturing the topological properties of the biological GRN is vital. The FFLatt algorithm was developed in paper III to create scale-free, feed-forward loop motif-enriched GRNs, capturing two of the most prominent topological features in biological GRNs.  Once a high-quality GRN is obtained, the next step is to simulate gene expression data corresponding to the GRN. In paper IV, building on the FFLatt method, an open-source Python simulation tool called GeneSNAKE was developed to generate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties and improves on previous tools by featuring a variety of perturbation schemes along with the ability to control noise and modify the perturbation strength.
  •  
47.
  • Hollich, Volker, et al. (author)
  • PfamAlyzer : domain-centric homology search
  • 2007
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811 .- 1460-2059. ; 23:24, pp. 3382-3383
  • Journal article (peer-reviewed). Abstract:
    • PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures using a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist identification of distant homologs with shared domain architecture.
  •  
48.
  • Hong, Junmei, et al. (author)
  • Focusing on RISC assembly in mammalian cells.
  • 2008
  • In: Biochem Biophys Res Commun. - : Elsevier BV. - 1090-2104 .- 0006-291X. ; 368:3, pp. 703-8
  • Journal article (other academic/artistic)
    • RISC (RNA-induced silencing complex) is a central protein complex in RNAi, into which a siRNA strand is assembled to become effective in gene silencing. By using an in vitro RNAi reaction based on Drosophila embryo extract, an asymmetric model was recently proposed for RISC assembly of siRNA strands, suggesting that the strand that is more loosely paired at its 5' end is selectively assembled into RISC and results in target gene silencing. However, in the present study, we were unable to establish such a correlation in cell-based RNAi assays, as well as in large-scale RNAi data analyses. This suggests that the thermodynamic stability of siRNA is not a major determinant of gene silencing in mammalian cells. Further studies on fork siRNAs showed that mismatch at the 5' end of the siRNA sense strand decreased RISC assembly of the antisense strand, but surprisingly did not increase RISC assembly of the sense strand. More interestingly, measurements of melting temperature showed that the terminal stability of fork siRNAs correlated with the positions of the mismatches, but not gene silencing efficacy. In summary, our data demonstrate that there is no definite correlation between siRNA stability and gene silencing in mammalian cells, which suggests that instead of thermodynamic stability, other features of the siRNA duplex contribute to RISC assembly in RNAi.
  •  
49.
  • Kaduk, Mateusz, et al. (author)
  • HieranoiDB : a database of orthologs inferred by Hieranoid
  • 2017
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 45:D1, pp. D687-D690
  • Journal article (peer-reviewed)
    • HieranoiDB (http://hieranoiDB.sbc.su.se) is a freely available on-line database for hierarchical groups of orthologs inferred by the Hieranoid algorithm. It infers orthologs at each node in a species guide tree with the InParanoid algorithm as it progresses from the leaves to the root. Here we present a database HieranoiDB with a web interface that makes it easy to search and visualize the output of Hieranoid, and to download it in various formats. Searching can be performed using protein description, identifier or sequence. In this first version, orthologs are available for the 66 Quest for Orthologs reference proteomes. The ortholog trees are shown graphically and interactively with marked speciation and duplication nodes that show the inferred evolutionary scenario, and allow for correct extraction of predicted orthologs from the Hieranoid trees.
  •  
50.
  •  