SwePub - search: WFRF:(Sonnhammer Erik L L)

1.	Berglund, Ann-Charlotte, et al. (authors) InParanoid 6 : eukaryotic ortholog clusters with inparalogs 2008 In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 36, pp. D263-D266 Journal article (peer-reviewed). Abstract: The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.
2.	Guala, Dimitri, et al. (authors) MaxLink : network-based prioritization of genes tightly linked to a disease seed set 2014 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 30:18, pp. 2689-2690 Journal article (peer-reviewed). Abstract: Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.
3.	Hillerton, Thomas, et al. (authors) GeneSNAKE: a Python package for benchmarking and simulation of gene regulatory networks and expression data. Other publication (other academic/artistic). Abstract: Understanding how genes interact with and regulate each other is a key challenge in systems biology. One of the primary methods to study this is through gene regulatory networks (GRNs). The field of GRN inference however faces many challenges, such as the complexity of gene regulation and high noise levels, which necessitates effective tools for evaluating inference methods. For this purpose, data that corresponds to a known GRN, from various conditions and experimental setups is necessary, which is only possible to attain via simulation. Existing tools for simulating data for GRN inference have limitations either in the way networks are constructed or data is produced, and are often not flexible for adjusting the algorithm or parameters. To overcome these issues we present GeneSNAKE, a Python package designed to allow users to generate biologically realistic GRNs, and from a GRN simulate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties. GeneSNAKE improves on previous work in the field by adding a perturbation model that allows for a greater range of perturbation schemes along with the ability to control noise and modify the perturbation strength. For benchmarking, GeneSNAKE offers a number of functions both for comparing a true GRN to an inferred GRN, and to study properties in data and GRN models. These functions can in addition be used to study properties of biological data to produce simulated data with more realistic properties. GeneSNAKE is an open-source, comprehensive simulation and benchmarking package with powerful capabilities that are not combined in any other single package, and thanks to the Python implementation it is simple to extend and modify by a user.
4.	Zhivkoplias, Erik K., et al. (authors) Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops 2022 In: Frontiers in Genetics. - : Frontiers Media SA. - 1664-8021. ; 13 Journal article (peer-reviewed). Abstract: The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.
5.	Abhiman, Saraswathi, et al. (authors) Large-scale prediction of function shift in protein families with a focus on enzymatic function. 2005 In: Proteins. - : Wiley. - 1097-0134. ; 60:4, pp. 758-68 Journal article (peer-reviewed)
6.	Abhiman, Saraswathi, et al. (authors) Prediction of function divergence in protein families using the substitution rate variation parameter alpha. 2006 In: Mol Biol Evol. - : Oxford University Press (OUP). - 0737-4038 .- 1537-1719. ; 23:7, pp. 1406-13 Journal article (peer-reviewed)
7.	Alexeyenko, Andrey, et al. (authors) Automatic clustering of orthologs and inparalogs shared by multiple proteomes. 2006 In: Bioinformatics. - : Oxford University Press (OUP). - 1460-2059. ; 22:14, pp. E9-E15 Journal article (peer-reviewed)
8.	Alexeyenko, Andrey, et al. (authors) Chromosomal clustering of nuclear genes encoding mitochondrial and chloroplast proteins in Arabidopsis. 2006 In: Trends Genet. - : Elsevier BV. - 0168-9525. ; 22:11, pp. 589-93 Journal article (peer-reviewed)
9.	Alexeyenko, Andrey, et al. (authors) Comparative interactomics with Funcoup 2.0 2012 In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 40:D1, pp. D821-D828 Journal article (peer-reviewed). Abstract: FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.
10.	Alexeyenko, Andrey, et al. (authors) Dynamic Zebrafish Interactome Reveals Transcriptional Mechanisms of Dioxin Toxicity 2010 In: PLOS ONE. - : Public Library of Science (PLoS). - 1932-6203. ; 5:5, p. e10465 Journal article (peer-reviewed). Abstract: Background: In order to generate hypotheses regarding the mechanisms by which 2,3,7,8-tetrachlorodibenzo-p-dioxin (dioxin) causes toxicity, we analyzed global gene expression changes in developing zebrafish embryos exposed to this potent toxicant in the context of a dynamic gene network. For this purpose, we also computationally inferred a zebrafish (Danio rerio) interactome based on orthologs and interaction data from other eukaryotes. Methodology/Principal Findings: Using novel computational tools to analyze this interactome, we distinguished between dioxin-dependent and dioxin-independent interactions between proteins, and tracked the temporal propagation of dioxin-dependent transcriptional changes from a few genes that were altered initially, to large groups of biologically coherent genes at later times. The most notable processes altered at later developmental stages were calcium and iron metabolism, embryonic morphogenesis including neuronal and retinal development, a variety of mitochondria-related functions, and generalized stress response (not including induction of antioxidant genes). Within the interactome, many of these responses were connected to cytochrome P4501A (cyp1a) as well as other genes that were dioxin-regulated one day after exposure. This suggests that cyp1a may play a key role initiating the toxic dysregulation of those processes, rather than serving simply as a passive marker of dioxin exposure, as suggested by earlier research. Conclusions/Significance: Thus, a powerful microarray experiment coupled with a flexible interactome and multi-pronged interactome tools (which are now made publicly available for microarray analysis and related work) suggest the hypothesis that dioxin, best known in fish as a potent cardioteratogen, has many other targets. Many of these types of toxicity have been observed in mammalian species and are potentially caused by alterations to cyp1a.
11.	Alexeyenko, Andrey, et al. (authors) Global networks of functional coupling in eukaryotes from comprehensive data integration 2009 In: Genome Research. - : Cold Spring Harbor Laboratory. - 1088-9051 .- 1549-5469. ; 19:6, pp. 1107-16 Journal article (peer-reviewed). Abstract: No single experimental method can discover all connections in the interactome. A computational approach can help by integrating data from multiple, often unrelated, proteomics and genomics pipelines. Reconstructing global networks of functional coupling (FC) faces the challenges of scale and heterogeneity--how to efficiently integrate huge amounts of diverse data from multiple organisms, yet ensuring high accuracy. We developed FunCoup, an optimized Bayesian framework, to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex member, metabolic, or signaling link. This capability boosted overall accuracy. On the whole, the constructed framework was comprehensively tested to optimize the overall confidence and ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 data sets in seven organisms and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network, only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. We show how FunCoup can be used for discovering candidate members of the Parkinson and Alzheimer pathways. Cross-species pathway conservation analysis provided further support to these observations.
12.	Barrientos-Somarribas, Mauricio, et al. (authors) Discovering viral genomes in human metagenomic data by predicting unknown protein families 2018 In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 8 Journal article (peer-reviewed). Abstract: Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows no or little homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family, hence it was named bacteriophage HFM.
13.	Björkholm, Patrik, et al. (authors) Comparative analysis and unification of domain-domain interaction networks 2009 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:22, pp. 3020-5 Journal article (peer-reviewed). Abstract: MOTIVATION: Certain protein domains are known to preferentially interact with other domains. Several approaches have been proposed to predict domain-domain interactions, and over nine datasets are available. Our aim is to analyse the coverage and quality of the existing resources, as well as the extent of their overlap. With this knowledge, we have the opportunity to merge individual domain interaction networks to construct a comprehensive and reliable database. RESULTS: In this article we introduce a new approach towards comparing domain-domain interaction networks. This approach is used to compare nine predicted domain and protein interaction networks. The networks were used to generate a database of unified domain interactions, UniDomInt. Each interaction in the dataset is scored according to the benchmarked reliability of the sources. The performance of UniDomInt is an improvement compared to the underlying source networks and to another composite resource, Domine. AVAILABILITY: http://sonnhammer.sbc.su.se/download/UniDomInt/
14.	Buzzao, Davide, et al. (authors) TOPAS, a network-based approach to detect disease modules in a top-down fashion 2022 In: NAR Genomics and Bioinformatics. - : Oxford University Press (OUP). - 2631-9268. ; 4:4 Journal article (peer-reviewed). Abstract: A vast scenario of potential disease mechanisms and remedies is yet to be discovered. The field of Network Medicine has grown thanks to the massive amount of high-throughput data and the emerging evidence that disease-related proteins form ‘disease modules’. Relying on prior disease knowledge, network-based disease module detection algorithms aim at connecting the list of known disease associated genes by exploiting interaction networks. Most existing methods extend disease modules by iteratively adding connector genes in a bottom-up fashion, while top-down approaches remain largely unexplored. We have created TOPAS, an iterative approach that aims at connecting the largest number of seed nodes in a top-down fashion through connectors that guarantee the highest flow of a Random Walk with Restart in a network of functional associations. We used a corpus of 382 manually selected functional gene sets to benchmark our algorithm against SCA, DIAMOnD, MaxLink and ROBUST across four interactomes. We demonstrate that TOPAS outperforms competing methods in terms of Seed Recovery Rate, Seed to Connector Ratio and consistency during module detection. We also show that TOPAS achieves competitive performance in terms of biological relevance of detected modules and scalability.
15.	Carreras-Puigvert, Jordi, et al. (authors) A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family 2017 In: Nature Communications. - : Nature Publishing Group. - 2041-1723. ; 8:1 Journal article (peer-reviewed). Abstract: The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.
16.	Castresana-Aguirre, Miguel, 1991-, et al. (authors) Benefits and Challenges of Pre-clustered Network-Based Pathway Analysis 2022 In: Frontiers in Genetics. - : Frontiers Media SA. - 1664-8021. ; 13 Journal article (peer-reviewed). Abstract: Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogeneous parts before performing pathway analysis on each module. We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering can be beneficial by increasing the sensitivity of pathway analysis methods and by providing deeper insights of biological mechanisms related to the phenotype under study. However, keeping a high specificity is a challenge. For ANUBIX, clustering caused a minor loss of specificity, while for BinoX and NEAT it caused an unacceptable loss of specificity. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We show examples of this approach and conclude that clustering can improve overall pathway annotation performance, but should only be used if the used enrichment method has a low false positive rate.
17.	Castresana-Aguirre, Miguel, et al. (authors) PathBIX—a web server for network-based pathway annotation with adaptive null models 2021 In: Bioinformatics Advances. - : Oxford University Press (OUP). - 2635-0041. ; 1:1 Journal article (peer-reviewed). Abstract: Motivation: Pathway annotation is a vital tool for interpreting and giving meaning to experimental data in life sciences. Numerous tools exist for this task, where the most recent generation of pathway enrichment analysis tools, network-based methods, utilize biological networks to gain a richer source of information as a basis of the analysis than merely the gene content. Network-based methods use the network crosstalk between the query gene set and the genes in known pathways, and compare this to a null model of random expectation. Results: We developed PathBIX, a novel web application for network-based pathway analysis, based on the recently published ANUBIX algorithm which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species, and utilizes prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.
18.	Castresana-Aguirre, Miguel, et al. (authors) Pathway-specific model estimation for improved pathway annotation by network crosstalk 2020 In: Scientific Reports. - : Springer Science and Business Media LLC. - 2045-2322. ; 10:1 Journal article (peer-reviewed). Abstract: Pathway enrichment analysis is the most common approach for understanding which biological processes are affected by altered gene activities under specific conditions. However, it has been challenging to find a method that efficiently avoids false positives while keeping a high sensitivity. We here present a new network-based method ANUBIX based on sampling random gene sets against intact pathway. Benchmarking shows that ANUBIX is considerably more accurate than previous network crosstalk based methods, which have the drawback of modelling pathways as random gene sets. We demonstrate that ANUBIX does not have a bias for finding certain pathways, which previous methods do, and show that ANUBIX finds biologically relevant pathways that are missed by other methods.
19.	Chalk, Alistair M, et al. (authors) siRNA specificity searching incorporating mismatch tolerance data. 2008 In: Bioinformatics. - : Oxford University Press (OUP). - 1460-2059. ; 24:10, pp. 1316-1317 Journal article (peer-reviewed). Abstract: Artificially synthesized short interfering RNAs (siRNAs) are widely used in functional genomics to knock down specific target genes. One ongoing challenge is to guarantee that the siRNA does not elicit off-target effects. Initial reports suggested that siRNAs were highly sequence-specific; however, subsequent data indicates that this is not necessarily the case. It is still uncertain what level of similarity and other rules are required for an off-target effect to be observed, and scoring schemes have not been developed to look beyond simple measures such as the number of mismatches or the number of consecutive matching bases present. We created design rules for predicting the likelihood of a non-specific effect and present a web server that allows the user to check the specificity of a given siRNA in a flexible manner using a combination of methods. The server finds potential off-target matches in the corresponding RefSeq database and ranks them according to a scoring system based on experimental studies of specificity. AVAILABILITY: The server is available at http://informatics-eskitis.griffith.edu.au/SpecificityServer.
20.	Dessimoz, Christophe, et al. (authors) Toward community standards in the quest for orthologs 2012 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 28:6, pp. 900-904 Journal article (other academic/artistic). Abstract: The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
21.	El-Gebali, Sara, et al. (authors) The Pfam protein families database in 2019 2019 In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 47:D1, pp. D427-D432 Journal article (peer-reviewed). Abstract: The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.
22.	Finn, Robert D, et al. (authors) Pfam : clans, web tools and services. 2006 In: Nucleic Acids Res. - : Oxford University Press (OUP). - 1362-4962 .- 0305-1048. ; 34:Database issue, pp. D247-51 Journal article (peer-reviewed)
23.	Finn, Robert D., et al. (authors) Pfam : the protein families database 2014 In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 42:D1, pp. D222-D230 Journal article (peer-reviewed). Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
24.	Finn, Robert D., et al. (authors) The Pfam protein families database 2010 In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 38, pp. D211-D222 Journal article (peer-reviewed). Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
25.	Forslund, Kristoffer, et al. (authors) Benchmarking homology detection procedures with low complexity filters 2009 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:19, pp. 2500-2505 Journal article (peer-reviewed). Abstract: BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences. RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed. CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated. AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz
