SwePub

Result list for search "WFRF:(Lagergren Jens Professor)"


  • Result 1-15 of 15
1.
  • Aguilar, Xavier (author)
  • Performance Monitoring, Analysis, and Real-Time Introspection on Large-Scale Parallel Systems
  • 2020
  • Doctoral thesis (other academic/artistic). Abstract:
    • High-Performance Computing (HPC) has become an important scientific driver. A wide variety of research, ranging for example from drug design to climate modelling, is nowadays performed on HPC systems. Furthermore, the tremendous computing power of such HPC systems allows scientists to simulate problems that were unimaginable a few years ago. However, the continuous increase in size and complexity of HPC systems is turning the development of efficient parallel software into a difficult task. Therefore, the use of performance monitoring and analysis is a must in order to unveil inefficiencies in parallel software. Nevertheless, performance tools also face challenges as a result of the size of HPC systems, for example, coping with the huge amounts of performance data generated. In this thesis, we propose a new model for performance characterisation of MPI applications that tackles the challenge of big performance data sets. Our approach uses Event Flow Graphs to balance the scalability of profiling techniques (generating performance reports with aggregated metrics) with the richness of information of tracing methods (generating files with sequences of time-stamped events). In other words, graphs make it possible to encode ordered sequences of events without storing the whole sequence of such events, and therefore they need much less memory and disk space and are more scalable. We demonstrate in this thesis how our Event Flow Graph model can be used as a trace compression method. Furthermore, we propose a method to automatically detect the structure of MPI applications using our Event Flow Graphs. This knowledge can afterwards be used to collect performance data in a smarter way, reducing for example the amount of redundant data collected. Finally, we demonstrate that our graphs can be used beyond trace compression and automatic analysis of performance data.
We propose a new methodology to use Event Flow Graphs in the task of visual performance data exploration. In addition to the Event Flow Graph model, we also explore in this thesis the design and use of performance data introspection frameworks. Future HPC systems will be very dynamic environments providing extreme levels of parallelism, but with energy constraints, considerable resource sharing, and heterogeneous hardware. Thus, the use of real-time performance data to orchestrate program execution in such a complex and dynamic environment will be a necessity. This thesis presents two different performance data introspection frameworks that we have implemented. These introspection frameworks are easy to use, and provide performance data in real time with very low overhead. We demonstrate, among other things, how our approach can be used to reduce in real time the energy consumed by the system. The approaches proposed in this thesis have been validated in different HPC systems using multiple scientific kernels as well as real scientific applications. The experiments show that our approaches in performance characterisation and performance data introspection are not intrusive at all, and can be a valuable contribution to help in the performance monitoring of future HPC systems.
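The idea of encoding an event sequence as a graph rather than storing it in full can be illustrated with a minimal Python sketch. This is not the thesis's Event Flow Graph implementation: the function name `build_event_flow_graph` and the toy trace are hypothetical, and the sketch only keeps transition counts, which is a simplification that does not by itself preserve the full event order.

```python
from collections import Counter

def build_event_flow_graph(trace):
    """Collapse an ordered event trace into nodes and counted transitions.

    Hypothetical helper for illustration only: nodes are distinct events,
    edges are (previous event, next event) pairs with occurrence counts.
    """
    nodes = Counter(trace)                  # how often each event occurs
    edges = Counter(zip(trace, trace[1:]))  # how often each transition occurs
    return nodes, edges

# Toy trace (assumed): a loop of 1000 send/receive pairs between init and finalize.
trace = ["MPI_Init"] + ["MPI_Send", "MPI_Recv"] * 1000 + ["MPI_Finalize"]
nodes, edges = build_event_flow_graph(trace)

print(len(trace), "events in the trace")
print(len(nodes), "nodes and", len(edges), "edges in the graph")
```

For a loop-dominated trace like this one, the graph size depends on the number of distinct events and transitions rather than on the trace length, which is the intuition behind using such graphs for compression.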
2.
  • Frånberg, Mattias, 1985- (author)
  • Statistical methods for detecting gene-gene and gene-environment interactions in genome-wide association studies
  • 2019
  • Doctoral thesis (other academic/artistic). Abstract:
    • Despite considerable effort to elucidate the genetic architecture of multi-factorial traits and diseases, there remains a gap between the estimated heritability (e.g., from twin studies) and the heritability explained by discovered genetic variants. The existence of interactions between different genes, and between genes and the environment, has frequently been hypothesized as a likely cause of this discrepancy. However, the statistical inference of interactions is plagued by limited sample sizes, high computational requirements, and incomplete knowledge of how the measurement scale and parameterization affect the analysis. This thesis addresses the major statistical, computational, and modeling issues that hamper large-scale interaction studies today. Furthermore, it investigates whether gene-gene and gene-environment interactions are significantly involved in the development of diseases linked to atherosclerosis. Firstly, I develop two statistical methods that can be used to study gene-gene interactions: the first is tailored for limited sample size situations, and the second enables multiple analyses to be combined into large meta-analyses. I perform comprehensive simulation studies to determine that these methods have statistical power equal to or higher than that of contemporary methods, that scale-invariance is required to guard against false positives, and that saturated parameterizations perform well in terms of statistical power. In two studies, I apply the two proposed methods to case/control data from myocardial infarction and associated phenotypes. In both studies, we identify putative interactions for myocardial infarction but are unable to replicate the interactions in a separate cohort. In the second study, however, we identify and replicate a putative interaction involved in Lp(a) plasma levels between two variants, rs3103353 and rs9458157.
Secondly, I develop a multivariate statistical method that simultaneously estimates the effects of genetic variants, environmental variables, and their interactions. I show by extensive simulations that this method achieves statistical power close to the optimal oracle method. We use this method to study the involvement of gene-environment interactions in intima-media thickness, a phenotype relevant for coronary artery disease. We identify a putative interaction between a genetic variant in the KCTD8 gene and alcohol use, thus suggesting an influence on intima-media thickness. The methods developed to support the analyses in this thesis, as well as a selection of other prominent methods in the field, are implemented in a software package called besiq. In conclusion, this thesis presents statistical methods, and the associated software, that allow large-scale studies of gene-gene and gene-environment interactions to be effortlessly undertaken.
3.
  • Kang, Wenjing, 1988- (author)
  • microRNAs: from biogenesis to organismal tracing
  • 2020
  • Doctoral thesis (other academic/artistic). Abstract:
    • MicroRNAs (miRNAs) are short noncoding RNAs of around 22 nucleotides in length, which help to shape the expression of most mRNAs. Perturbation of miRNA expression has revealed a variety of defects in development, cell specification, physiology and behavior. This thesis focuses on two topics of miRNA: identification of structural features that influence miRNA biogenesis (Paper I) and application of taxonomical marker miRNAs to resolve the organismal origin of samples (Papers II and III). The current model of miRNA hairpin biogenesis has limited information content and appears to be incomplete. In Paper I, we apply a novel high-throughput screening method to profile the optimal structure of miRNA hairpins for efficient and precise miRNA biogenesis. The optimal structure consists of tight and loose local structures across the hairpin, which reflects the constraints of biogenesis proteins. We find that miRNA hairpins with a stable lower basal stem are more efficiently processed and have a higher expression level in tissues of 20 animal species. We argue that the structural features, which have been largely neglected in the current model, are in fact as important as the well-known sequence motifs. New miRNAs are continuously added over evolutionary time and are rarely secondarily lost, making them ideal taxonomical markers. In Paper II, we demonstrate as a proof-of-principle that miRNAs can be used to trace biological samples back to the lineage or even species of origin. Based on the marker miRNAs, we develop miRTrace, the first software to accurately trace miRNA sequences back to their taxonomical origin. The method can sensitively identify the origin of single cells and detect parasitic nematode RNA in mammalian host blood samples. In Paper III, we apply miRNA tracing to address a controversial question about the origin of the exogenous plant miRNAs (xenomiRs) that are found in human samples and have been proposed to regulate human gene expression.
Our computational and experimental results provide evidence that xenomiRs are derived from technical artifacts rather than dietary intake.
4.
  • Bergenstråhle, Ludvig (author)
  • Computational Models of Spatial Transcriptomes
  • 2024
  • Doctoral thesis (other academic/artistic). Abstract:
    • Spatial biology is a rapidly growing field that has seen tremendous progress over the last decade. We are now able to measure how the morphology, genome, transcriptome, and proteome of a tissue vary across space. Datasets generated by spatial technologies reflect the complexity of the systems they measure: They are multi-modal, high-dimensional, and layer an intricate web of dependencies between biological compartments at different length scales. To add to this complexity, measurements are often sparse and noisy, obfuscating the underlying biological signal and making the data difficult to interpret. In this thesis, we describe how data from spatial biology experiments can be analyzed with methods from deep learning and generative modeling to accelerate biological discovery. The thesis is divided into two parts. The first part provides an introduction to the fields of deep learning and spatial biology, and how the two can be combined to model spatial biology data. The second part consists of four papers describing methods that we have developed for this purpose. Paper I presents a method for inferring spatial gene expression from hematoxylin and eosin stains. The proposed method offers a data-driven approach to analyzing histopathology images without relying on expert annotations and could be a valuable tool for cancer screening and diagnosis in the clinics. Paper II introduces a method for jointly modeling spatial gene expression with histology images. We show that the method can predict super-resolved gene expression and transcriptionally characterize small-scale anatomical structures. Paper III proposes a method for learning flexible Markov kernels to model continuous and discrete data distributions. We demonstrate the method on various image synthesis tasks, including unconditional image generation and inpainting. Paper IV leverages the techniques introduced in Paper III to integrate data from different spatial biology experiments. 
The proposed method can be used for data imputation, super resolution, and cross-modality data transfer.
5.
  • Khan, Mehmood Alam (author)
  • Computational Problems in Modeling Evolution and Inferring Gene Families.
  • 2016
  • Doctoral thesis (other academic/artistic). Abstract:
    • Over the last few decades, phylogenetics has emerged as a very promising field, facilitating a comparative framework to explain the genetic relationships among all the living organisms on earth. These genetic relationships are typically represented by a bifurcating phylogenetic tree, the tree of life. Reconstructing a phylogenetic tree is one of the central tasks in evolutionary biology. The different evolutionary processes, such as gene duplications, gene losses, speciation, and lateral gene transfer events, make the phylogeny reconstruction task more difficult. However, the rapid developments in sequencing technologies and the availability of genome-scale sequencing data give us the opportunity to understand these evolutionary processes in a more informed manner and, ultimately, enable us to reconstruct gene and species phylogenies more accurately. This thesis is an attempt to provide computational methods for phylogenetic inference and give tools to conduct genome-scale comparative evolutionary studies, such as detecting homologous sequences and inferring gene families. In the first project, we present FastPhylo, a software package containing fast tools for reconstructing distance-based phylogenies. It implements previously published efficient algorithms for estimating a distance matrix from the input sequences and reconstructing an unrooted Neighbour Joining tree from a given distance matrix. Results on simulated datasets reveal that FastPhylo can handle hundreds of thousands of sequences in a time- and memory-efficient manner. The easy-to-use, well-defined interfaces and the modular structure of FastPhylo allow it to be used in very large bioinformatics pipelines. In the second project, we present a synteny-aware gene homology method, called GenFamClust (GFC), that uses gene content and gene order conservation to detect homology.
Results on simulated and biological datasets suggest that local synteny information combined with sequence similarity improves the detection of homologs. In the third project, we introduce a novel phylogeny-based clustering method, PhyloGenClust, which partitions a very large gene family into smaller subfamilies. ROC (receiver operating characteristics) analysis on synthetic datasets shows that PhyloGenClust identifies subfamilies more accurately. PhyloGenClust can be used as a middle-tier clustering method between raw clustering methods, such as sequence similarity methods, and more sophisticated Bayesian-based phylogeny methods. Finally, we introduce a novel probabilistic Bayesian method, based on the DLTRS model, to sample reconciliations of a gene tree inside a species tree. The method uses an MCMC framework to integrate LGTs, gene duplications, gene losses and sequence evolution under a relaxed molecular clock for substitution rates. The proposed sampling method estimates the posterior distribution of gene trees and provides the temporal information of LGT events over the lineages of a species tree. Analysis of simulated datasets reveals that our method performs well in identifying the true temporal estimates of LGT events. We applied our method to the genome-wide gene families for mollicutes and cyanobacteria, which gave an interesting insight into potential LGT highways.
6.
  • Koptagel, Hazal, 1991- (author)
  • Variational methods for phylogeny and single-cell genomics
  • 2023
  • Doctoral thesis (other academic/artistic). Abstract:
    • The investigation of the evolutionary history of organisms, both at the cellular level and at the species level, is a relevant research topic in computational biology. These investigations lead to a deeper understanding of developmental history, cancer progression, the genetic similarity of species, and more. One way to study the relations between single cells or species is to examine the differences in their genomes, including single nucleotide and copy number variations. The genetic materials need to be extracted and sequenced to be used in the analyses, but this data preparation is prone to errors. The development of sophisticated, probabilistic models is of the utmost importance in handling technological artifacts and including uncertainty in the analysis. In this compilation thesis, we studied various questions and presented four papers to address different challenges. First, we focused on single cells from healthy tissue and developed a probabilistic model to reconstruct the cell lineage tree. This task is challenging in several aspects: i) the healthy cells have a low mutation rate and, therefore, do not introduce many mutations at each cell division, ii) healthy cells usually do not have significant structural variations to improve the analysis, and iii) the sequencing technology introduces errors, and some of these errors are hard to distinguish from the mutations. With the experimental studies, we showed that our model is fast, robust, and accurately reconstructs lineage trees. Second, we focused on cancer cells. One research topic is identifying structural variations in the cancer cells' genomes and subsequently grouping the cells with similar genome profiles. This two-step process is vulnerable; the imperfections in the first step can irreversibly impact the analysis in the second step. To address this problem, we developed a variational inference-based model that simultaneously does copy number profiling and cell clustering.
In addition, we extended the model to incorporate single nucleotide variations to improve the performance. Third, we approached the phylogenetic tree inference problem and developed a variational inference-based model to make the inference. The tree topology space, which contains all possible phylogenetic tree structures, is enormous, and the consideration of each unique tree is intractable. Typically, the existing variational inference-based methods need to constrain their analysis to a much smaller subset of the tree space. Our proposed model does not require such constraints and can obtain similar performance while requiring significantly less time and memory. Finally, we addressed a challenge in variational inference. The variational inference methods target a complex, usually multimodal posterior distribution and try to approximate it using simpler, often unimodal distributions. This design choice causes the variational models to fit one out of many modes of the target distribution; hence they do not capture the overall pattern of the target distribution. We proposed a simple yet effective way to use separately trained variational models to capture the multimodality of the target distribution and demonstrated the approximation performance using several variational methods and data types. We addressed various challenges in computational biology with these four papers and contributed to the progress of the field by developing probabilistic models. 
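To give a sense of why considering each unique tree is intractable, the following Python sketch counts unrooted binary tree topologies using the standard formula (2n − 5)!! for n ≥ 3 labelled leaves. The function name is hypothetical and the sketch is only an illustration of the size of tree space; it is not part of the thesis's method.

```python
def num_unrooted_binary_trees(n_leaves):
    """Number of distinct unrooted binary topologies on n labelled leaves.

    Uses the double factorial (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5),
    valid for n >= 3.
    """
    if n_leaves < 3:
        raise ValueError("formula requires at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20, 50):
    print(n, "leaves:", num_unrooted_binary_trees(n), "topologies")
```

The count grows super-exponentially with the number of leaves, so exhaustive enumeration stops being practical after a few dozen taxa.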
7.
  • Mohaghegh Neyshabouri, Mohammadreza (author)
  • Inter and intra-tumor models of somatic evolution in cancer
  • 2023
  • Doctoral thesis (other academic/artistic). Abstract:
    • Cancer is a disease caused by the accumulation of somatic mutations in an evolutionary process. Mutations in so-called cancer driver genes provide the harboring cells with particular selective advantages and result in cancer progression. Identification of the driver genes and their interrelations is critical for a wide range of research and clinical applications. This thesis investigates the problem of modeling the cancer evolution dynamics using probabilistic cancer progression models. Such models aim to explain the mechanism of accumulation of mutations in the tumor cells and how specific mutations may exert promoting or inhibiting effects on each other. We introduce a set of computational methods to analyze cross-sectional data from a cohort of tumors and infer the interrelations among cancer driver genes, represented by a graphical structure over them. In our first two papers, following the typical setting in cancer progression model studies, we use a simple representation for the tumors in which a single genotype vector models each tumor. We introduce a pathway linear progression model in the first paper and a generalized tree-structured model in the second. Using novel dynamic programming procedures for calculating the likelihoods, we build Markov Chain Monte Carlo (MCMC) inference algorithms for our models in these papers. These fast and efficient MCMC algorithms enable us to study massive datasets that could not feasibly be investigated with previously introduced methods. In our third paper, we introduce a framework for taking a finer representation of the tumors into account for inferring progression models. With the rapid improvements in the amount and quality of available data, we can now work with vast numbers of reliably reconstructed tumor clonal trees.
In our third paper, we introduce a method that takes such clonal trees from cohorts of tumors as its input and identifies the interrelations among the driver genes within a single tumor or across different tumors. We propose an MCMC algorithm with guided proposal distributions, which substantially increase the algorithm's efficiency in exploring the high-probability regions. The rich input data and the computationally efficient algorithm introduced in this paper provide promising results on a set of synthetic and biological data experiments.
8.
  • Shahrabi Farahani, Hossein, 1976- (author)
  • Computational Modeling of Cancer Progression
  • 2013
  • Doctoral thesis (other academic/artistic). Abstract:
    • Cancer is a multi-stage process resulting from accumulation of genetic mutations. Data obtained from assaying a tumor only contains the set of mutations in the tumor and lacks information about their temporal order. Learning the chronological order of the genetic mutations is an important step towards understanding the disease. The probability of introduction of a mutation to a tumor increases if certain mutations that promote it have already happened. Such dependencies induce what we call the monotonicity property in cancer progression. A realistic model of cancer progression should take this property into account. In this thesis, we present two models for cancer progression and algorithms for learning them. In the first model, we propose Progression Networks (PNs), which are a special class of Bayesian networks. In learning PNs, the issue of monotonicity is taken into consideration. The problem of learning PNs is reduced to Mixed Integer Linear Programming (MILP), which is an NP-hard problem for which very good heuristics exist. We also developed a program, DiProg, for learning PNs. In the second model, the problem of noise in the biological experiments is addressed by introducing a hidden variable. We call this model Hidden variable Oncogenetic Network (HON). In a HON, there are two variables assigned to each node, a hidden variable that represents the progression of cancer to the node and an observable random variable that represents the observation of the mutation corresponding to the node. We devised a structural Expectation Maximization (EM) algorithm for learning HONs. In the M-step of the structural EM algorithm, we need to perform a considerable number of inference tasks. Because exact inference is tractable only on Bayesian networks with bounded treewidth, we also developed an algorithm for learning bounded treewidth Bayesian networks by reducing the problem to a MILP. Our algorithms performed well on synthetic data.
We also tested them on cytogenetic data from renal cell carcinoma. The learned progression networks from both algorithms are in agreement with the previously published results. MicroRNAs are short non-coding RNAs that are involved in post-transcriptional regulation. A-to-I editing of microRNAs converts adenosine to inosine in the double-stranded RNA. We developed a method for determining editing levels in mature microRNAs from high-throughput RNA sequencing data from the mouse brain. Here, for the first time, we showed that the level of editing increases with development.
9.
  • Sjöstrand, Joel, 1980- (author)
  • Reconciling gene family evolution and species evolution
  • 2013
  • Doctoral thesis (other academic/artistic). Abstract:
    • Species evolution can often be adequately described with a phylogenetic tree. Interestingly, this is the case also for the evolution of homologous genes; a gene in an ancestral species may, through gene duplication, gene loss, lateral gene transfer (LGT), and speciation events, give rise to a gene family distributed across contemporaneous species. However, molecular sequence evolution and genetic recombination make the history (the gene tree) non-trivial to reconstruct from present-day sequences. This history is of biological interest, e.g., for inferring potential functional equivalences of extant gene pairs. In this thesis, we present biologically sound probabilistic models for gene family evolution guided by species evolution, effectively yielding a gene-species tree reconciliation. Using Bayesian Markov-chain Monte Carlo (MCMC) inference techniques, we show that by taking advantage of the information provided by the species tree, our methods achieve more reliable gene tree estimates than traditional species tree-uninformed approaches. Specifically, we describe a comprehensive model that accounts for gene duplication, gene loss, a relaxed molecular clock, and sequence evolution, and we show that the method performs admirably on synthetic and biological data. Furthermore, we present two expansions of the inference procedure, enabling it to provide (i) refined gene tree estimates with timed duplications, and (ii) probabilistic orthology estimates, i.e., that the origin of a pair of extant genes is a speciation. Finally, we present a substantial development of the model to account also for LGT. A sophisticated algorithmic framework of dynamic programming and numerical methods for differential equations is used to resolve the computational hurdles that LGT brings about. We apply the method on two bacterial datasets where LGT is believed to be prominent, in order to estimate genome-wide LGT and duplication rates.
We further show that traditional methods – in which gene trees are reconstructed and reconciled with the species tree in separate stages – are prone to yield inferior gene tree estimates that will overestimate the number of LGT events.
10.
  • Tofigh, Ali, 1977- (author)
  • Using Trees to Capture Reticulate Evolution : Lateral Gene Transfers and Cancer Progression
  • 2009
  • Doctoral thesis (other academic/artistic). Abstract:
    • The historical relationships of species and genes are traditionally depicted using trees. However, not all evolutionary histories are adequately captured by bifurcating processes and an increasing amount of research is devoted towards using networks or network-like structures to capture evolutionary history. Lateral gene transfer (LGT) is a previously controversial mechanism responsible for non-tree-like evolutionary histories, and is today accepted as a major force of evolution, particularly in the prokaryotic domain. In this thesis, we present models of gene evolution incorporating both LGTs and duplications, together with efficient computational methods for various inference problems. Specifically, we define a biologically sound combinatorial model for reconciliation of species and gene trees that facilitates simultaneous consideration of duplications and LGTs. We prove that finding most parsimonious reconciliations is NP-hard, but that the problem can be solved efficiently if reconciliations are not required to be acyclic, a condition that is satisfied when analyzing most real-world datasets. We also provide a polynomial-time algorithm for parametric tree reconciliation, a problem analogous to parametric sequence alignment, that enables us to study the entire space of optimal reconciliations under all possible cost schemes. Going beyond combinatorial models, we define the first probabilistic model of gene evolution incorporating a birth-death process generating duplications, LGTs, and losses, together with a relaxed molecular clock model of sequence evolution. Algorithms based on Markov chain Monte Carlo (MCMC) techniques, methods from numerical analysis, and dynamic programming are presented for various probability and parameter inference problems. Finally, we develop methods for analysis of cancer progression, a biological process with many similarities to the process of evolution.
Cancer progresses by accumulation of harmful genetic aberrations whose patterns of emergence are graph-like. We develop a model of cancer progression based on trees, and mixtures thereof, that admits an efficient structural EM algorithm for finding Maximum Likelihood (ML) solutions from available cross-sectional data.
11.
  • Ullah, Ikram, 1984- (author)
  • Probabilistic Models for Species Tree Inference and Orthology Analysis
  • 2015
  • Doctoral thesis (other academic/artistic). Abstract:
    • A phylogenetic tree is used to model gene evolution and species evolution using molecular sequence data. For artifactual and biological reasons, a gene tree may differ from a species tree, a phenomenon known as gene tree-species tree incongruence. Assuming the presence of one or more evolutionary events, e.g., gene duplication, gene loss, and lateral gene transfer (LGT), the incongruence may be explained using a reconciliation of a gene tree inside a species tree. Such information has biological utility, e.g., inference of orthologous relationships between genes. In this thesis, we present probabilistic models and methods for orthology analysis and species tree inference, while accounting for evolutionary factors such as gene duplication, gene loss, and sequence evolution. Furthermore, we use a probabilistic LGT-aware model for inferring gene trees having temporal information for duplication and LGT events. In the first project, we present a Bayesian method, called DLRSOrthology, for estimating orthology probabilities using the DLRS model: a probabilistic model integrating gene evolution, a relaxed molecular clock for substitution rates, and sequence evolution. We devise a dynamic programming algorithm for efficiently summing orthology probabilities over all reconciliations of a gene tree inside a species tree. Furthermore, we present heuristics based on the receiver operating characteristics (ROC) curve to estimate suitable thresholds for deciding orthology events. Our method, as demonstrated by synthetic and biological results, outperforms existing probabilistic approaches in accuracy and is robust to incomplete taxon sampling artifacts. In the second project, we present a probabilistic method, based on a mixture model, for species tree inference. The method employs a two-phase approach, where in the first phase, a structural expectation maximization algorithm, based on a mixture model, is used to reconstruct a maximum likelihood set of candidate species trees.
In the second phase, in order to select the best species tree, each of the candidate species trees is evaluated using PrIME-DLRS: a method based on the DLRS model. The method is accurate, efficient, and scalable when compared to a recent probabilistic species tree inference method called PHYLDOG. We observe that, in most cases, the analysis constituted only by the first phase may also be used for selecting the target species tree, yielding a fast and accurate method for larger datasets. Finally, we devise a probabilistic method based on the DLTRS model: an extension of the DLRS model to include LGT events, for sampling reconciliations of a gene tree inside a species tree. The method enables us to estimate gene trees having temporal information for duplication and LGT events. To the best of our knowledge, this is the first probabilistic method that takes gene sequence data directly into account for sampling reconciliations that contain information about LGT events. Based on the synthetic data analysis, we believe that the method has the potential to identify LGT highways.
12.
  • Elias, Isaac, 1976- (author)
  • Computational problems in evolution : Multiple alignment, genome rearrangements, and tree reconstruction
  • 2006
  • Doctoral thesis (other academic/artistic). Abstract:
    • Reconstructing the evolutionary history of a set of species is a fundamental problem in biology. This thesis concerns computational problems that arise in different settings and stages of phylogenetic tree reconstruction, but also in other contexts. The contributions include:
      • A new distance-based tree reconstruction method with optimal reconstruction radius and optimal runtime complexity. Included in the result is a greatly simplified proof that the NJ algorithm also has optimal reconstruction radius. (co-author Jens Lagergren)
      • NP-hardness results for the most common variations of Multiple Alignment. In particular, it is shown that SP-score, Star Alignment, and Tree Alignment are NP-hard for all metric symbol distances over all binary or larger alphabets.
      • A 1.375-approximation algorithm for Sorting By Transpositions (SBT). SBT is the problem of sorting a permutation using as few block-transpositions as possible. The complexity of this problem is still open and it was a ten-year-old open problem to improve the best known 1.5-approximation ratio. The 1.375-approximation algorithm is based on a new upper bound on the diameter of 3-permutations. Moreover, a new lower bound on the transposition diameter of the symmetric group is presented and the exact transposition diameter of simple permutations is determined. (co-author Tzvika Hartman)
      • Approximation, fixed-parameter tractable, and fast heuristic algorithms for two variants of the Ancestral Maximum Likelihood (AML) problem: when the phylogenetic tree is known and when it is unknown. AML is the problem of reconstructing the most likely genetic sequences of extinct ancestors along with the most likely mutation probabilities on the edges, given the phylogenetic tree and sequences at the leaves. (co-author Tamir Tuller)
      • An algorithm for computing the number of mutational events between aligned DNA sequences which is several hundred times faster than the famous Phylip packages. Since pairwise distance estimation is a bottleneck in distance-based phylogeny reconstruction, the new algorithm improves the overall running time of many distance-based methods by a factor of several hundred. (co-author Jens Lagergren)
  •  
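The abstract above names pairwise distance estimation between aligned DNA sequences as the bottleneck of distance-based phylogeny reconstruction. As a rough illustration of what such an estimate computes, here is a minimal sketch. It is not the thesis's fast algorithm, and the choice of the Jukes-Cantor correction is an assumption made for illustration only; the abstract does not say which substitution model is used. The function names `p_distance` and `jukes_cantor_distance` are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return mismatches / compared


def jukes_cantor_distance(a: str, b: str) -> float:
    """Expected substitutions per site under the Jukes-Cantor model.

    Assumption: Jukes-Cantor is used here only as a simple, well-known
    correction; the thesis may use a different model.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A distance-based method such as NJ would call an estimator like this once for every pair of sequences, which is why its cost dominates the running time when sequences are long.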
13.
  • Mahmudi, Owais, 1982- (author)
  • Probabilistic Reconciliation Analysis for Genes and Pseudogenes
  • 2015
  • Doctoral thesis (other academic/artistic)
    • Phylogeneticists have long studied the evolution of life from single-celled organisms to the astonishing biodiversity around us. The relationship between species is often expressed as a binary tree - the tree of life. The availability of fully sequenced genomes across species gives us the opportunity to investigate and understand the evolutionary processes, and to reconstruct gene and species phylogenies in greater detail and more accurately. However, the effect of interacting evolutionary processes, such as gene duplications, gene losses, pseudogenizations, and lateral gene transfers, makes the inference of gene phylogenies challenging. In this thesis, probabilistic Bayesian methods are introduced to infer gene phylogenies under the guidance of a species phylogeny. The feature that distinguishes this work from earlier reconciliation-based methods is that evolutionary events are mapped to detailed time intervals on the evolutionary time-scale. The proposed probabilistic approach reconciles the evolutionary events to the species phylogeny by integrating gene duplications, gene losses, lateral gene transfers and sequence evolution under a relaxed molecular clock. Genome-wide gene families for vertebrates and prokaryotes are analyzed using this approach, which provides interesting insight into the evolutionary processes. Finally, a probabilistic model is introduced that models the evolution of genes and pseudogenes simultaneously. The model incorporates a birth-death process according to which genes are duplicated, pseudogenized and lost under a sequence evolution model with a relaxed molecular clock. To model the evolutionary scenarios realistically, the model employs two different sequence evolution models for the evolution of genes and pseudogenes. The reconciliation of evolutionary events to the species phylogenies enables us to infer the evolutionary scenario with a higher resolution. Some subfamilies of two interesting gene superfamilies, i.e. olfactory receptors and zinc fingers, are analyzed using this approach, which provides interesting insights.
  •  
14.
  • Muhammad, Sayyed Auwn, 1980- (author)
  • Probabilistic Modelling of Domain and Gene Evolution
  • 2016
  • Doctoral thesis (other academic/artistic)
    • Phylogenetic inference relies heavily on statistical models that have been extended and refined over the past years into complex hierarchical models to capture the intricacies of evolutionary processes. The wealth of information in the form of fully sequenced genomes has led to the development of methods that are used to reconstruct gene and species evolutionary histories in greater and more accurate detail. However, genes are composed of evolutionarily conserved sequence segments called domains, and domains can also be affected by duplications, losses, and bifurcations implied by gene or species evolution. This thesis proposes an extension of evolutionary models, such as duplication-loss, rate, and substitution, that have previously been used to model gene evolution, to model domain evolution. In this thesis, I propose DomainDLRS: a comprehensive, hierarchical Bayesian method, based on the DLRS model by Åkerborg et al., 2009, that models domain evolution as occurring inside the gene and species tree. The method incorporates a birth-death process to model domain duplications and losses, along with a domain sequence evolution model with a relaxed molecular clock assumption. The method employs a variant of the Markov Chain Monte Carlo technique called Grouped Independence Metropolis-Hastings to estimate the posterior distribution over domain and gene trees. Using this method, we performed analyses of the Zinc-Finger and PRDM9 gene families, which provide interesting insights into domain evolution. Finally, a synteny-aware approach for gene homology inference, called GenFamClust, is proposed that uses similarity and gene neighbourhood conservation to improve homology inference. We evaluated the accuracy of our method on synthetic datasets and on two biological datasets consisting of eukaryotic and fungal species. Our results show that combining synteny with similarity provides a significant improvement in homology inference.
  •  
15.
  • Åkerborg, Örjan, 1974- (author)
  • Taking advantage of phylogenetic trees in comparative genomics
  • 2008
  • Doctoral thesis (other academic/artistic)
    • Phylogenomics can be regarded as evolution and genomics in co-operation. Various kinds of evolutionary studies, gene family analysis among them, demand access to genome-scale datasets. But it is also clear that many genomics studies, such as assignment of gene function, are much improved by evolutionary analysis. The work leading to this thesis is a contribution to the phylogenomics field. We have used phylogenetic relationships between species in genome-scale searches for two intriguing genomic features, namely pseudogenes and A-to-I RNA editing. In the first case we used pairwise species comparisons, specifically human-mouse and human-chimpanzee, to infer the existence of functional mammalian pseudogenes. In the second case we profited from the rapid growth in the number of sequenced genomes in recent years, and used 17-species multiple sequence alignments. In both these studies we have used non-genomic data, gene expression data and synteny relations among these, to verify predictions. In the A-to-I editing project we used 454 sequencing for experimental verification. We have further contributed a maximum a posteriori (MAP) method for fast and accurate dating analysis of speciations and other evolutionary events. This work follows the recent trend of abandoning the strict molecular clock when performing phylogenetic inference. We discretised the time interval from the leaves to the root in the tree, and used a dynamic programming (DP) algorithm to optimally factorise branch lengths into substitution rates and divergence times. We analysed two biological datasets and compared our results with recent MCMC-based methodologies. The dating point estimates that our method delivers were found to be of high quality, while the gain in speed was dramatic. Finally, we applied the DP strategy in a new setting. This time we used a grid laid out on a species tree instead of on an interval. Together with speciation times, the discretisation gives a common timeframe for a gene tree and the corresponding species tree. This is the key to integrating the sequence evolution process and the gene evolution process. Out of several potential application areas we chose gene tree reconstruction. We performed genome-wide analysis of yeast gene families and found that our methodology performs very well.
  •  