SwePub

Result list for search "WFRF:(Sonnhammer Erik Professor)"

  • Results 1-10 of 15
1.
  • Guala, Dimitri, 1979- (author)
  • Functional association networks for disease gene prediction
  • 2017
  • Doctoral thesis (other academic/artistic)
    • Mapping of the human genome has been instrumental in understanding diseases caused by changes in single genes. However, disease mechanisms involving multiple genes have proven to be much more elusive. Their complexity emerges from interactions of intracellular molecules and makes them immune to the traditional reductionist approach. Only by modelling this complex interaction pattern using networks is it possible to understand the emergent properties that give rise to diseases. The overarching term used to describe both physical and indirect interactions involved in the same functions is functional association. FunCoup is one of the most comprehensive networks of functional association. It uses a naïve Bayesian approach to integrate high-throughput experimental evidence of intracellular interactions in humans and multiple model organisms. In the first update, both the coverage and the quality of the interactions were increased and a feature for comparing interactions across species was added. The latest update involved a complete overhaul of all data sources, including a refinement of the training data and addition of new classes and sources of interactions as well as six new species. Disease-specific changes in genes can be identified using high-throughput genome-wide studies of patients and healthy individuals. To understand the underlying mechanisms that produce these changes, they can be mapped to collections of genes with known functions, such as pathways. BinoX was developed to map altered genes to pathways using the topology of FunCoup. This approach combined with a new random model for comparison enables BinoX to outperform traditional gene-overlap-based methods and other network-based techniques. Results from high-throughput experiments are challenged by noise and biases, resulting in many false positives. Statistical attempts to correct for these challenges have led to a reduction in coverage.
Both limitations can be remedied using prioritisation tools such as MaxLink, which ranks genes using guilt by association in the context of a functional association network. MaxLink's algorithm was generalised to work with any disease phenotype and its statistical foundation was strengthened. MaxLink's predictions were validated experimentally using FRET. The availability of prioritisation tools without an appropriate way to compare them makes it difficult to select the correct tool for a problem domain. A benchmark to assess performance of prioritisation tools in terms of their ability to generalise to new data was developed. FunCoup was used for prioritisation while testing was done using cross-validation of terms derived from Gene Ontology. This resulted in a robust and unbiased benchmark for evaluation of current and future prioritisation tools. Surprisingly, previously superior tools based on global network structure were shown to be inferior to a local network-based tool when performance was analysed on the most relevant part of the output, i.e. the top ranked genes. This thesis demonstrates how a network that models the intricate biology of the cell can contribute with valuable insights for researchers that study diseases with complex genetic origins. The developed tools will help the research community to understand the underlying causes of such diseases and discover new treatment targets. The robust way to benchmark such tools will help researchers to select the proper tool for their problem domain.
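
The guilt-by-association idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not MaxLink's actual algorithm or statistics, which the abstract does not specify; it only assumes that candidate genes are ranked by how many known query genes they are directly linked to. The function name `rank_by_guilt_by_association` and the toy network are hypothetical.

```python
from collections import Counter

def rank_by_guilt_by_association(network, query_genes):
    """Rank non-query genes by the number of query genes they are linked to.

    `network` maps each gene to the set of genes it is associated with.
    """
    query = set(query_genes)
    hits = Counter()
    for gene in query:
        for neighbour in network.get(gene, ()):
            if neighbour not in query:
                hits[neighbour] += 1
    # Sort by descending link count, then by name for a stable order.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy network: gene -> set of functionally associated genes.
toy_network = {
    "A": {"B", "X", "Y"},
    "B": {"A", "X"},
    "C": {"X", "Z"},
    "X": {"A", "B", "C"},
    "Y": {"A"},
    "Z": {"C"},
}
ranking = rank_by_guilt_by_association(toy_network, ["A", "B", "C"])
```

Here gene X is linked to all three query genes and therefore ranks first; a real tool would additionally assess whether such a link count is statistically significant.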
2.
  • Tångrot, Jeanette, 1974- (author)
  • Structural Information and Hidden Markov Models for Biological Sequence Analysis
  • 2008
  • Doctoral thesis (other academic/artistic)
    • Bioinformatics is a fast-developing field, which makes use of computational methods to analyse and structure biological data. An important branch of bioinformatics is structure and function prediction of proteins, which is often based on finding relationships to already characterized proteins. It is known that two proteins with very similar sequences also share the same 3D structure. However, there are many proteins with similar structures that have no clear sequence similarity, which makes it difficult to find these relationships. In this thesis, two methods for annotating protein domains are presented, one aiming at assigning the correct domain family or families to a protein sequence, and the other aiming at fold recognition. Both methods use hidden Markov models (HMMs) to find related proteins, and they both exploit the fact that structure is more conserved than sequence, but in two different ways. Most of the research presented in the thesis focuses on the structure-anchored HMMs, saHMMs. For each domain family, an saHMM is constructed from a multiple structure alignment of carefully selected representative domains, the saHMM-members. These saHMM-members are collected in the so-called "midnight ASTRAL set", and are chosen so that all saHMM-members within the same family have mutual sequence identities below a threshold of about 20%. In order to construct the midnight ASTRAL set and the saHMMs, a pipeline of software tools was developed. The saHMMs are shown to be able to detect the correct family relationships at very high accuracy, and perform better than the standard tool Pfam in assigning the correct domain families to new domain sequences. We also introduce the FI-score, which is used to measure the performance of the saHMMs, in order to select the optimal model for each domain family. The saHMMs are made available for searching through the FISH server, and can be used for assigning family relationships to protein sequences.
The other approach presented in the thesis is secondary structure HMMs (ssHMMs). These HMMs are designed to use both the sequence and the predicted secondary structure of a query protein when scoring it against the model. A rigorous benchmark is used, which shows that HMMs made from multiple sequences result in better fold recognition than those based on single sequences. Adding secondary structure information to the HMMs improves the ability of fold recognition further, both when using true and predicted secondary structures for the query sequence.
3.
  • Castresana Aguirre, Miguel, 1991- (author)
  • From networks to pathway analysis
  • 2021
  • Doctoral thesis (other academic/artistic)
    • Biological mechanisms stem from complex intracellular interactions spanning across different levels of regulation. Mapping these interactions is fundamental for the understanding of all types of biological conditions, including complex diseases. Each experimental approach carries its own bias and noise. Combining heterogeneous data sources reduces noise and gives a broader sense of the interactions between genes known as functional association, where both direct and indirect interactions are captured. FunCoup is one of the most comprehensive functional association databases, providing networks for 22 organisms in all domains of life. FunCoup uses a naïve Bayesian integration approach to combine 11 different data types and increases the coverage by transferring associations between species via orthologs. Additional insights into the mechanisms of a gene network are provided through tissue specificity filtering and directed regulatory links. Even though FunCoup provides a comprehensive map of the intracellular machinery, gaining insights into conditions such as diseases requires a functional level analysis rather than a gene level analysis. Thus, studying genes that are involved in a condition from a functional perspective requires the usage of pathway enrichment analysis. Several approaches exist, from basic gene overlap to more elaborate analyses that use functional association networks. ANUBIX is a novel network-based analysis (NBA) method that overcomes the high false positive rate issue that previous state-of-the-art NBA approaches have. Additionally, even with accurate methods, a commonly ignored problem is that gene sets derived from experiments are often noisy or contain multiple mechanisms, mixing different pathways which weakens their association to the condition under study.
To increase the sensitivity of pathway analysis, we developed a pipeline to cluster gene sets into more homogeneous parts with the aim of unraveling all the mechanisms activated in the studied condition. To facilitate the usage of these tools, we built a web server called PathBIX, a user-friendly platform that allows interactive analysis of all species in FunCoup against multiple pathway databases.
4.
  • Frings, Oliver, 1982- (author)
  • Network and gene expression analyses for understanding protein function
  • 2013
  • Doctoral thesis (other academic/artistic)
    • Biological function is the result of a complex network of functional associations between genes or their products. Modeling the dynamics underlying biological networks is one of the big challenges in bioinformatics. A first step towards solving this problem is to predict and study the networks of functional associations underlying various conditions. An improved version of the FunCoup network inference method that features networks for three new species and updated versions of the existing networks is presented. Network clustering, i.e. partitioning networks into highly connected components, is an important tool for network analysis. We developed MGclus, a clustering method for biological networks that scores shared network neighbors. We found MGclus to perform favorably compared to other methods popular in the field. Studying sets of experimentally derived genes in the context of biological networks is a common strategy to shed light on their underlying biology. The CrossTalkZ method presented in this work assesses the statistical significance of crosstalk enrichment, i.e. the extent of connectivity between or within groups of functionally coupled genes or proteins in biological networks. We further demonstrate that CrossTalkZ is a valuable method to functionally annotate experimentally derived gene sets. Males and females differ in the expression of an extensive number of genes. The methods developed in the first part of this work were applied to study sex-biased genes in chicken and several network properties related to the molecular mechanisms of sex-biased gene regulation in chicken were deduced. Cancer studies have shown that tumor progression is strongly determined by the tumor microenvironment. We derived a gene expression signature of PDGF-activated fibroblasts that shows a strong prognostic significance in breast cancer in univariate and multivariate survival analyses when compared to established markers for prognosis.
5.
  • Hillerton, Thomas, 1992- (author)
  • In silico modelling for refining gene regulatory network inference
  • 2023
  • Doctoral thesis (other academic/artistic)
    • Gene regulation is at the centre of all cellular functions, regulating the cell's healthy and pathological responses. The interconnected system of regulatory interactions is known as the gene regulatory network (GRN), where genes influence each other to maintain strict and robust control. Today a large number of methods exist for inferring GRNs, which necessitates benchmarking to determine which method is most suitable for a specific goal. Paper I presents such a benchmark focusing on the effect of using known perturbations to infer GRNs. A further challenge when studying GRNs is that experimental data contains high levels of noise and that artefacts may be introduced by the experiment itself. The LSCON method was developed in paper II to reduce the effect of one such artefact that can occur if the expression of a gene shows no or minimal change across most or all experiments. With few fully determined biological GRNs available, it is problematic to use these to evaluate an inference method's correctness. Instead, the GRN field relies on simulated data, using a known GRN and generating the corresponding data. When simulating GRNs, capturing the topological properties of the biological GRN is vital. The FFLatt algorithm was developed in paper III to create scale-free, feed-forward loop motif-enriched GRNs, capturing two of the most prominent topological features in biological GRNs. Once a high-quality GRN is obtained, the next step is to simulate gene expression data corresponding to the GRN. In paper IV, building on the FFLatt method, an open-source Python simulation tool called GeneSNAKE was developed to generate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties and improves on previous tools by featuring a variety of perturbation schemes along with the ability to control noise and modify the perturbation strength.
6.
  • Kaduk, Mateusz, 1985- (author)
  • Functional Inference from Orthology and Domain Architecture
  • 2018
  • Doctoral thesis (other academic/artistic)
    • Proteins are the basic building blocks of all living organisms. They play a central role in determining the structure of living beings and are required for essential chemical reactions. One of the main challenges in bioinformatics is to characterize the function of all proteins. The problem of understanding protein function can be approached by understanding their evolutionary history. Orthology analysis plays an important role in studying the evolutionary relation of proteins. Proteins are termed orthologs if they derive from a single gene in the species' last common ancestor, i.e. if they were separated by a speciation event. Orthologs are useful because they retain their function more often than other homologs. Inference of a complete set of orthologs for many species is computationally intensive. Currently, the fastest algorithms rely on graph-based approaches, which compare all-vs-all sequences and then cluster top hits into groups of orthologs. The initial step of performing all-vs-all comparisons is usually the primary computational challenge as it scales quadratically with the number of species. A new, more scalable and less computationally demanding method was developed to solve this problem without sacrificing accuracy. The Hieranoid 2 algorithm reduces computational complexity to almost linear by overcoming the necessity to perform all-vs-all similarity searches. The algorithm progresses along a known species tree, from leaves to root. Starting at the leaves, ortholog groups are predicted conventionally and then summarized at internal nodes to form pseudo-species. These pseudo-species are then re-used to search against other (pseudo-)species higher in the tree. This way the algorithm aggregates new ortholog groups hierarchically. The hierarchy is a natural structure to store and view large multi-species ortholog groups, and provides a complete picture of inferred evolutionary events. 
To facilitate explorative analysis of hierarchical groups of orthologs, a new online tool was created. The HieranoiDB website provides precomputed hierarchical groups of orthologs for a set of 66 species. It allows the user to search for orthology assignments using protein description, protein sequence, or species. Evolutionary events and meta information are added to the hierarchical groups of orthologs, which are shown graphically as interactive trees. This representation allows exploring, searching, and easier visual inspection of multi-species ortholog groups. The majority of orthology prediction methods focus on treating the whole protein sequence as a single evolutionary unit. However, proteins are often composed of individual units, called protein domains, that can have different evolutionary histories. To extend the full-sequence-based methodology to a domain-aware method, a new approach called Domainoid is proposed. Here, domains are extracted from full-length sequences and subjected to orthology inference. This allows Domainoid to find orthology that would be missed by a full sequence approach. Networks are a convenient graphical representation for showing a large number of functional associations between genes or proteins. They allow various analyses of graph properties, and can help visualize complex relationships. A framework for inferring comprehensive functional association networks was developed, called FunCoup. A major difference compared to other networks is FunCoup's extensive use of orthology relationships between species, which significantly boosts its coverage. Using naïve Bayesian classifiers to integrate 10 different evidence types and orthology transfer, FunCoup captures functional associations of many types, and provides comprehensive networks for 17 species across five gold standards.
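
The leaves-to-root traversal described in the abstract above can be sketched as a post-order walk over a species tree. This is only an illustration of why a hierarchical scheme needs fewer comparisons than all-vs-all, not the Hieranoid 2 implementation: the nested-tuple tree encoding, the function name `merge_order`, and the example species are assumptions, and the actual ortholog prediction step is replaced by simply recording which (pseudo-)species pair would be compared.

```python
def merge_order(tree):
    """Return the pairwise comparisons made in a leaves-to-root traversal.

    `tree` is a nested tuple: a leaf is a species name (str) and an
    internal node is a 2-tuple of subtrees.
    """
    comparisons = []

    def visit(node):
        if isinstance(node, str):
            return node  # a leaf is an ordinary species
        left, right = (visit(child) for child in node)
        comparisons.append((left, right))
        # Summarise the two children as one pseudo-species for reuse higher up.
        return f"({left}+{right})"

    visit(tree)
    return comparisons

# Hypothetical five-species tree.
species_tree = ((("human", "mouse"), "chicken"), ("fly", "worm"))
steps = merge_order(species_tree)

n_leaves = 5
all_vs_all = n_leaves * (n_leaves - 1) // 2  # species pairs in an all-vs-all scheme
```

For this binary tree the traversal performs one comparison per internal node (four in total), whereas comparing every species pair would require ten.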
7.
  • Morgan, Daniel, 1988- (author)
  • Towards Reliable Gene Regulatory Network Inference
  • 2019
  • Doctoral thesis (other academic/artistic)
    • Phenotypic traits are now known to stem from the interplay between genetic variables across many if not every level of biology. The field of gene regulatory network (GRN) inference is concerned with understanding the regulatory interactions between genes in a cell, in order to build a model that captures the behaviour of the system. Perturbation biology, whereby genes or RNAs are targeted and their activity altered, is of great value for the GRN field. By first systematically perturbing the system and then reading the system's reaction as a whole, we can feed this data into various methods to reverse engineer the key agents of change. The initial study sets the groundwork for the rest, and deals with finding common ground among the sundry methods in order to compare and rank performance in an unbiased setting. The GeneSPIDER (GS) MATLAB package is an inference benchmarking platform whereby methods can be added via a wrapper for testing in competition with one another. Synthetic datasets and networks spanning a wide range of conditions can be created for this purpose. The evaluation of methods across various conditions in the benchmark therein demonstrates which properties influence the accuracy of which methods, and thus which are more suitable for use under a given characterized condition. The second study introduces a novel framework, NestBoot, for increasing inference accuracy within the GS environment by independent, nested bootstraps, i.e. repeated inference trials. Under low to medium noise levels, this allows support to be gathered for links occurring most often while spurious links are discarded through comparison to an estimated null distribution of shuffled links.
While noise continues to plague every method, nested bootstrapping in this way is shown to increase the accuracy of several different methods. The third study applies NestBoot on real data to infer a reliable GRN from a small interfering RNA (siRNA) perturbation dataset covering 40 genes known or suspected to have a role in human cancers. Methods were developed to benchmark the accuracy of an inferred GRN in the absence of a true known GRN, by assessing how well it fits the data compared to a null model of shuffled topologies. A network of high confidence was recovered containing many regulatory links known in the literature, as well as a slew of novel links. The fourth study seeks to infer reliable networks at large scale, utilizing the high dimensional biological datasets of the LINCS L1000 project. This dataset has too much noise for accurate GRN inference as a whole, hence we developed a method to select a subset that is sufficiently informative to accurately infer GRNs. This is a first step in the direction of identifying probable submodules within a greater genome-scale GRN yet to be uncovered.
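
The idea of gathering support for frequently inferred links and discarding those that are no better supported than shuffled data can be sketched as follows. This is a simplified illustration, not the NestBoot procedure: it uses a single bootstrap level, takes the inferred link sets as given input, and applies a hypothetical cut-off (the highest support observed in the null runs) in place of the estimated null distribution described in the abstract. All names and data are assumptions.

```python
from collections import Counter

def link_support(runs):
    """Fraction of bootstrap runs in which each link was inferred."""
    counts = Counter(link for run in runs for link in set(run))
    return {link: n / len(runs) for link, n in counts.items()}

def filter_links(data_runs, null_runs):
    """Keep links whose support exceeds the highest support seen in the null runs."""
    null_support = link_support(null_runs)
    threshold = max(null_support.values(), default=0.0)
    support = link_support(data_runs)
    return {link: s for link, s in support.items() if s > threshold}

# Hypothetical link sets (regulator, target) from four runs on the measured data...
data_runs = [
    {("G1", "G2"), ("G2", "G3")},
    {("G1", "G2")},
    {("G1", "G2"), ("G3", "G1")},
    {("G1", "G2"), ("G2", "G3")},
]
# ...and from four runs on shuffled data, serving as the null model.
null_runs = [
    {("G2", "G3")},
    {("G3", "G1")},
    {("G2", "G3")},
    {("G1", "G3")},
]
kept = filter_links(data_runs, null_runs)
```

In this toy example only the link from G1 to G2 survives, because it appears in every data run while no link in the shuffled runs reaches more than half of them.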
8.
  • Ogris, Christoph, 1985- (author)
  • Global functional association network inference and crosstalk analysis for pathway annotation
  • 2017
  • Doctoral thesis (other academic/artistic)
    • Cell functions are steered by complex interactions of gene products, like forming a temporary or stable complex, altering gene expression or catalyzing a reaction. Mapping these interactions is key to understanding biological processes and therefore is the focus of numerous experiments and studies. Small-scale experiments deliver high quality data but lack coverage whereas high-throughput techniques cover thousands of interactions but can be error-prone. Unfortunately, all of these approaches can only focus on one type of interaction at a time. This makes experimental mapping of the genome-wide network a cost- and time-intensive procedure. However, to overcome these problems, different computational approaches have been suggested that integrate multiple data sets and/or different evidence types. This widens the stringent definition of an interaction and introduces a more general term: functional association. FunCoup is a database for genome-wide functional association networks of Homo sapiens and 16 model organisms. FunCoup distinguishes between five different functional associations: co-membership in a protein complex, physical interaction, participation in the same signaling cascade, participation in the same metabolic process and, for prokaryotic species, co-occurrence in the same operon. For each class, FunCoup applies naïve Bayesian integration of ten different evidence types of data, to predict novel interactions. It further uses orthologs to transfer interaction evidence between species. This considerably increases coverage, and allows inference of comprehensive networks even for not well studied organisms. BinoX is a novel method for pathway analysis and determining the relation between gene sets, using functional association networks. Traditionally, pathway annotation has been done using gene overlap only, but these methods only get a small part of the whole picture.
Placing the gene sets in the context of a network provides additional evidence for pathway analysis, revealing a global picture based on the whole genome. PathwAX is a web server based on the BinoX algorithm. A user can input a gene set and get online network-crosstalk-based pathway annotation. PathwAX uses the FunCoup networks and 280 pre-defined pathways. Most runs take just a few seconds and the results are summarized in an interactive chart the user can manipulate to gain further insights into the gene set's pathway associations.
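
The naïve Bayesian integration of evidence types mentioned in the abstract above can be illustrated in general terms. The sketch below is the textbook form of naïve Bayes evidence combination, not FunCoup's actual scoring scheme, which the abstract does not detail: it assumes each evidence type contributes an independent likelihood ratio, and the function name, ratios, and prior odds are hypothetical.

```python
import math

def integrate_evidence(likelihood_ratios, prior_odds):
    """Combine per-evidence likelihood ratios under the naive independence assumption.

    Each ratio is P(evidence | associated) / P(evidence | not associated).
    Returns the posterior probability that the gene pair is associated.
    """
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios from three evidence types for one gene pair.
example_lrs = [4.0, 2.5, 0.8]
posterior = integrate_evidence(example_lrs, prior_odds=0.01)
```

Working in log space turns the product of likelihood ratios into a sum, which is why each evidence type can be scored separately and then added; a ratio below 1 (here 0.8) counts as evidence against the association.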
9.
  • Persson, Emma, 1991- (author)
  • Big data networks and orthology analysis
  • 2023
  • Doctoral thesis (other academic/artistic)
    • Understanding biological systems in complex organisms is important in life science in order to comprehend the interplay of genes, proteins, and compounds causing complex diseases. As biological systems are intricate, bioinformatics tools, models, and algorithms are of the utmost importance to understand the bigger picture and decipher biological meaning from the vast amounts of information available from biological experiments and predictions. Bioinformatics programs and algorithms do not only depend on information from experiments, but also on information generated from other tools in order to draw accurate conclusions and make predictions. Predictions of orthologs, genes having a common ancestry separated by a speciation event, are important building blocks for a wide variety of tools and analysis pipelines, as they can be used to transfer gene function between species. Orthologs can for example be used to map genes of model organisms to genes in humans in studies of drug targets. They are extensively used in functional association networks in order to transfer information between species. Functional association networks are models of associations between genes or proteins, where associations can be derived from experimental evidence of different types, from the species itself, or transferred from other species using orthologs. The networks can be used to explore the context and neighbors of a gene, but also for a variety of higher-level analyses, e.g. network-based pathway enrichment analysis. In pathway enrichment analysis the networks can be utilized to contextualize experimental gene sets and annotate them with biological functions. As these tools depend on each other, it is of great importance that the networks used in pathway enrichment analysis are comprehensive and accurate, and that the orthologs used in the networks are relevant and significant.
In this thesis, the development and improvement of five bioinformatics tools within three areas of bioinformatics are presented. Despite the tools residing within slightly different areas, they all rely on each other, and can all on different levels improve our understanding of biological functions and biological meaning, from the level of orthology analysis to functional association networks to pathway enrichment analysis.
10.
  • Seçilmiş, Deniz, 1991- (author)
  • Improving the accuracy of gene regulatory network inference from noisy data
  • 2021
  • Doctoral thesis (other academic/artistic)
    • Gene regulatory networks (GRNs) control physiological and pathological processes in a living organism, and their accurate inference from measured gene expression can identify therapeutic mechanisms for complex diseases such as cancers. The biggest obstacle in achieving the accurate reconstruction of GRNs is called ‘noise’, which considerably alters the measured gene expression because the noise generally dominates the biological signal. This situation needs to be addressed carefully so that GRN inference methods do not estimate a fit to the noise instead of the underlying biological signal. Potential noise compensation approaches are a must if the goal is to reconstruct the true system. To this end, within the scope of this doctoral thesis, I developed two methods that, in different ways, overcome the obstacles introduced by noise in gene expression data. Method 1 allows the collection of more informative subsets of genes whose expression is not as highly affected as those which cause the system to be overall uninformative. Method 2 infers a perturbation design that is better suited to the gene expression data than the originally intended design, and therefore produces more accurate GRNs at high noise levels. Furthermore, a benchmark study was carried out which compares the methodological backgrounds of GRN inference methods in terms of whether they utilize knowledge of the perturbation design or not, which clearly shows that utilization of the perturbation design is essential for accurate inference of GRNs. Finally, a method is presented to improve GRN inference accuracy by selecting the GRN with the optimal sparsity based on information theoretical criteria. The three new methods (Papers I, II and IV) can also be used together, which is shown in this thesis to improve the GRN inference accuracy considerably more than the methods separately.
As inference of accurate GRNs is a major challenge in gene regulation, the methods presented in this thesis represent an important contribution to move the field forward.