SwePub

Result list for search "WFRF:(Lagergren Jens)"

  • Results 1-50 of 106
1.
  • Acevedo Gomez, Yasna, et al. (author)
  • PERFORMANCE RECOVERY FROM NO2 EXPOSURE IN PEM FUEL CELL
  • 2017
  • In: EFC 2017 - Proceedings of the 7th European Fuel Cell Piero Lunghi Conference. - : ENEA. ; , pp. 157-158
  • Conference paper (peer-reviewed)
    • The hydrogen fuel cell vehicle market is projected to increase in the coming years, and fuel cell vehicles will operate in an environment where they coexist with combustion engine vehicles. In this context, the PEM fuel cell will be exposed to significant amounts of contaminants on the roads that will decrease its performance and durability. In the present study the PEM fuel cell is exposed to 100 ppm of nitrogen dioxide in the airflow. Different methods for recovery of performance were tested: recovery during constant current load, and recovery by subjecting the cell to successive polarization curves. The results showed that the successive polarization curves are the best choice for recovery. However, recovery at low current density and high potential is also a good alternative.
2.
  • Addario-Berry, L, et al. (author)
  • Ancestral maximum likelihood of evolutionary trees is hard
  • 2004
  • In: Journal of Bioinformatics and Computational Biology. - 0219-7200 .- 1757-6334. ; 2:2, pp. 257-271
  • Journal article (peer-reviewed)
    • Maximum likelihood (ML) (Felsenstein, 1981) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task - in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such hardness result has been obtained so far for ML. In this work we make a first step in this direction by proving that ancestral maximum likelihood (AML) is NP-complete. The input to this problem is a set of aligned sequences of equal length and the goal is to find a tree and an assignment of ancestral sequences for all of that tree's internal vertices such that the likelihood of generating both the ancestral and contemporary sequences is maximized. Our NP-hardness proof follows that for MP given in (Day, Johnson and Sankoff, 1986) in that we use the same reduction from VERTEX COVER; however, the proof of correctness for this reduction relative to AML is different and substantially more involved.
3.
  • Aguilar, Xavier (author)
  • Performance Monitoring, Analysis, and Real-Time Introspection on Large-Scale Parallel Systems
  • 2020
  • Doctoral thesis (other academic/artistic)
    • High-Performance Computing (HPC) has become an important scientific driver. A wide variety of research ranging for example from drug design to climate modelling is nowadays performed in HPC systems. Furthermore, the tremendous computer power of such HPC systems allows scientists to simulate problems that were unimaginable a few years ago. However, the continuous increase in size and complexity of HPC systems is turning the development of efficient parallel software into a difficult task. Therefore, the use of performance monitoring and analysis is a must in order to unveil inefficiencies in parallel software. Nevertheless, performance tools also face challenges as a result of the size of HPC systems, for example, coping with huge amounts of performance data generated. In this thesis, we propose a new model for performance characterisation of MPI applications that tackles the challenge of big performance data sets. Our approach uses Event Flow Graphs to balance the scalability of profiling techniques (generating performance reports with aggregated metrics) with the richness of information of tracing methods (generating files with sequences of time-stamped events). In other words, graphs make it possible to encode ordered sequences of events without storing the whole sequence of such events, and therefore, they need much less memory and disk space, and are more scalable. We demonstrate in this thesis how our Event Flow Graph model can be used as a trace compression method. Furthermore, we propose a method to automatically detect the structure of MPI applications using our Event Flow Graphs. This knowledge can afterwards be used to collect performance data in a smarter way, reducing for example the amount of redundant data collected. Finally, we demonstrate that our graphs can be used beyond trace compression and automatic analysis of performance data. We propose a new methodology to use Event Flow Graphs in the task of visual performance data exploration. In addition to the Event Flow Graph model, we also explore in this thesis the design and use of performance data introspection frameworks. Future HPC systems will be very dynamic environments providing extreme levels of parallelism, but with energy constraints, considerable resource sharing, and heterogeneous hardware. Thus, the use of real-time performance data to orchestrate program execution in such a complex and dynamic environment will be a necessity. This thesis presents two different performance data introspection frameworks that we have implemented. These introspection frameworks are easy to use, and provide performance data in real time with very low overhead. We demonstrate, among other things, how our approach can be used to reduce in real time the energy consumed by the system. The approaches proposed in this thesis have been validated in different HPC systems using multiple scientific kernels as well as real scientific applications. The experiments show that our approaches in performance characterisation and performance data introspection are not intrusive at all, and can be a valuable contribution to help in the performance monitoring of future HPC systems.
4.
  • Alkema, W. B. L., et al. (author)
  • MSCAN : identification of functional clusters of transcription factor binding sites
  • 2004
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 32, pp. W195-W198
  • Journal article (peer-reviewed)
    • Identification of functional transcription factor binding sites in genomic sequences is notoriously difficult. The critical problem is the low specificity of predictions, which directly reflects the low target specificity of DNA binding proteins. To overcome the noise produced in predictions of individual binding sites, a new generation of algorithms achieves better predictive specificity by focusing on locally dense clusters of binding sites. MSCAN is a leading method for binding site cluster detection that determines the significance of observed sites while correcting for local compositional bias of sequences. The algorithm is highly flexible, applying any set of input binding models to the analysis of a user-specified sequence. From the user's perspective, a key feature of the system is that no reference data sets of regulatory sequences from co-regulated genes are required to train the algorithm. The output from MSCAN consists of an ordered list of sequence segments that contain potential regulatory modules. We have chosen the features in MSCAN such that sequence and matrix retrieval is highly facilitated, resulting in a web server that is intuitive to use. MSCAN is available at http://mscan.cgb.ki.se/cgi-bin/MSCAN.
5.
  • Andersson, Samuel A., et al. (author)
  • Motif Yggdrasil : Sampling from a tree mixture model
  • 2006
  • In: Research In Computational Molecular Biology, Proceedings. - Berlin, Heidelberg : Springer Berlin Heidelberg. - 3540332952 ; , pp. 458-472
  • Conference paper (peer-reviewed)
    • In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, consequently taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler; the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes.
6.
  • Andersson, Samuel A., et al. (author)
  • Motif Yggdrasil : Sampling sequence motifs from a tree mixture model
  • 2007
  • In: Journal of Computational Biology. - 1066-5277 .- 1557-8666. ; 14:5, pp. 682-697
  • Journal article (peer-reviewed)
    • In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, consequently taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler; the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but does not require aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that using random trees MY sampler still has a performance similar to the original version.
7.
  • Arvestad, Lars, et al. (author)
  • Bayesian gene/species tree reconciliation and orthology analysis using MCMC
  • 2003
  • In: Bioinformatics. - : Oxford Journals. - 1367-4803 .- 1367-4811. ; 19, pp. i7-i15
  • Journal article (peer-reviewed)
    • Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available. Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. 
      To the best of our knowledge, this is the first successful introduction of this type of probabilistic methods, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet being in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.
8.
  • Arvestad, Lars, et al. (author)
  • Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution.
  • 2004
  • In: Proceedings of the Annual International Conference on Computational Molecular Biology, RECOM. - New York, New York, USA : ACM Press. ; , pp. 326-335
  • Conference paper (peer-reviewed)
    • Gene tree and species tree reconstruction, orthology analysis and reconciliation, are problems important in multigenome-based comparative genomics and biology in general. In the present paper, we advance the frontier of these areas in several respects and provide important computational tools. First, exact algorithms are given for several probabilistic reconciliation problems with respect to the probabilistic gene evolution model, previously developed by the authors. Until now, those problems were solved by MCMC estimation algorithms. Second, we extend the gene evolution model to the gene sequence evolution model, by including sequence evolution. Third, we develop MCMC algorithms for the gene sequence evolution model that, given gene sequence data, allow: (1) orthology analysis, reconciliation analysis, and gene tree reconstruction, w.r.t. a species tree, that balances a likely/unlikely reconciliation and a likely/unlikely gene tree and (2) species tree reconstruction that balances a likely/unlikely reconciliation and likely/unlikely gene trees. These MCMC algorithms take advantage of the exact algorithms for the gene evolution model. We have successfully tested our dynamic programming algorithms on real data for a biogeography problem. The MCMC algorithms perform very well both on synthetic and biological data.
9.
  • Arvestad, Lars, et al. (author)
  • The Gene Evolution Model and Computing Its Associated Probabilities
  • 2009
  • In: Journal of the ACM. - : Association for Computing Machinery (ACM). - 0004-5411 .- 1557-735X. ; 56:2
  • Journal article (peer-reviewed)
    • Phylogeny is both a fundamental tool in biology and a rich source of fascinating modeling and algorithmic problems. Today's wealth of sequenced genomes makes it increasingly important to understand evolutionary events such as duplications, losses, transpositions, inversions, lateral transfers, and domain shuffling. We focus on the gene duplication event, which constitutes a major force in the creation of genes with new function [Ohno 1970; Lynch and Force 2000] and, thereby also, of biodiversity. We introduce the probabilistic gene evolution model, which describes how a gene tree evolves within a given species tree with respect to speciation, gene duplication, and gene loss. The actual relation between gene tree and species tree is captured by a reconciliation, a concept which we generalize for more expressiveness. The model is a canonical generalization of the classical linear birth-death process, obtained by replacing the interval where the process takes place by a tree. For the gene evolution model, we derive efficient algorithms for some associated probability distributions: the probability of a reconciled tree, the probability of a gene tree, the maximum probability reconciliation, the posterior probability of a reconciliation, and sampling reconciliations with respect to the posterior probability. These algorithms provide the basis for several applications, including species tree construction, reconciliation analysis, orthology analysis, biogeography, and host-parasite co-evolution.
10.
  •  
11.
  • Bergenstråhle, Ludvig (author)
  • Computational Models of Spatial Transcriptomes
  • 2024
  • Doctoral thesis (other academic/artistic)
    • Spatial biology is a rapidly growing field that has seen tremendous progress over the last decade. We are now able to measure how the morphology, genome, transcriptome, and proteome of a tissue vary across space. Datasets generated by spatial technologies reflect the complexity of the systems they measure: They are multi-modal, high-dimensional, and layer an intricate web of dependencies between biological compartments at different length scales. To add to this complexity, measurements are often sparse and noisy, obfuscating the underlying biological signal and making the data difficult to interpret. In this thesis, we describe how data from spatial biology experiments can be analyzed with methods from deep learning and generative modeling to accelerate biological discovery. The thesis is divided into two parts. The first part provides an introduction to the fields of deep learning and spatial biology, and how the two can be combined to model spatial biology data. The second part consists of four papers describing methods that we have developed for this purpose. Paper I presents a method for inferring spatial gene expression from hematoxylin and eosin stains. The proposed method offers a data-driven approach to analyzing histopathology images without relying on expert annotations and could be a valuable tool for cancer screening and diagnosis in the clinics. Paper II introduces a method for jointly modeling spatial gene expression with histology images. We show that the method can predict super-resolved gene expression and transcriptionally characterize small-scale anatomical structures. Paper III proposes a method for learning flexible Markov kernels to model continuous and discrete data distributions. We demonstrate the method on various image synthesis tasks, including unconditional image generation and inpainting. Paper IV leverages the techniques introduced in Paper III to integrate data from different spatial biology experiments. 
      The proposed method can be used for data imputation, super resolution, and cross-modality data transfer.
12.
  • Bergenstråhle, Ludvig, et al. (author)
  • Learning Stationary Markov Processes with Contrastive Adjustment
  • Other publication (other academic/artistic)
    • We introduce a new optimization algorithm, termed contrastive adjustment, for learning Markov transition kernels whose stationary distribution matches the data distribution. Contrastive adjustment is not restricted to a particular family of transition distributions and can be used to model data in both continuous and discrete state spaces. Inspired by recent work on noise-annealed sampling, we propose a particular transition operator, the noise kernel, that can trade mixing speed for sample fidelity. We show that contrastive adjustment is highly valuable in human-computer design processes, as the stationarity of the learned Markov chain enables local exploration of the data manifold and makes it possible to iteratively refine outputs by human feedback. We compare the performance of noise kernels trained with contrastive adjustment to current state-of-the-art generative models and demonstrate promising results on a variety of image synthesis tasks.
13.
  • Berglund, Emelie, et al. (author)
  • Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity
  • 2018
  • In: Nature Communications. - : Springer Science and Business Media LLC. - 2041-1723. ; 9
  • Journal article (peer-reviewed)
    • Intra-tumor heterogeneity is one of the biggest challenges in cancer treatment today. Here we investigate tissue-wide gene expression heterogeneity throughout a multifocal prostate cancer using the spatial transcriptomics (ST) technology. Utilizing a novel approach for deconvolution, we analyze the transcriptomes of nearly 6750 tissue regions and extract distinct expression profiles for the different tissue components, such as stroma, normal and PIN glands, immune cells and cancer. We distinguish healthy and diseased areas and thereby provide insight into gene expression changes during the progression of prostate cancer. Compared to pathologist annotations, we delineate the extent of cancer foci more accurately, interestingly without link to histological changes. We identify gene expression gradients in stroma adjacent to tumor regions that allow for re-stratification of the tumor microenvironment. The establishment of these profiles is the first step towards an unbiased view of prostate cancer and can serve as a dictionary for future studies.
14.
  • Bonet, Jose, et al. (author)
  • DeepMP : a deep learning tool to detect DNA base modifications on Nanopore sequencing data
  • 2022
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 38:5, pp. 1235-1243
  • Journal article (peer-reviewed)
    • Motivation: DNA methylation plays a key role in a variety of biological processes. Recently, Nanopore long-read sequencing has enabled direct detection of these modifications. As a consequence, a range of computational methods have been developed to exploit Nanopore data for methylation detection. However, current approaches rely on a human-defined threshold to detect the methylation status of a genomic position and are not optimized to detect sites methylated at low frequency. Furthermore, most methods use either the Nanopore signals or the basecalling errors as the model input and do not take advantage of their combination. Results: Here, we present DeepMP, a convolutional neural network-based model that takes information from Nanopore signals and basecalling errors to detect whether a given motif in a read is methylated or not. Besides, DeepMP introduces a threshold-free position modification calling model sensitive to sites methylated at low frequency across cells. We comprehensively benchmarked DeepMP against state-of-the-art methods on Escherichia coli, human and pUC19 datasets. DeepMP outperforms current approaches at read-based and position-based methylation detection across sites methylated at different frequencies in the three datasets. Availability and implementation: DeepMP is implemented and freely available under MIT license at https://github.
15.
  • Bryant, D., et al. (author)
  • Compatibility of unrooted phylogenetic trees is FPT
  • 2006
  • In: Theoretical Computer Science. - : Elsevier BV. - 0304-3975 .- 1879-2294. ; 351:3, pp. 296-302
  • Journal article (peer-reviewed)
    • A collection $T_1, T_2, \ldots, T_k$ of unrooted, leaf-labelled (phylogenetic) trees, all with different leaf sets, is said to be compatible if there exists a tree $T$ such that each tree $T_i$ can be obtained from $T$ by deleting leaves and contracting edges. Determining compatibility is NP-hard, and the fastest algorithm to date has worst case complexity of around $\Omega(n^k)$ time, $n$ being the number of leaves. Here, we present an $O(n f(k))$ algorithm, proving that compatibility of unrooted phylogenetic trees is fixed parameter tractable (FPT) with respect to the number $k$ of trees.
16.
  • Chen, Xinsong, et al. (author)
  • Breast cancer patient-derived whole-tumor cell culture model for efficient drug profiling and treatment response prediction
  • 2023
  • In: Proceedings of the National Academy of Sciences of the United States of America. - : Proceedings of the National Academy of Sciences. - 0027-8424 .- 1091-6490. ; 120:1
  • Journal article (peer-reviewed)
    • Breast cancer (BC) is a complex disease comprising multiple distinct subtypes with different genetic features and pathological characteristics. Although a large number of antineoplastic compounds have been approved for clinical use, patient-to-patient variability in drug response is frequently observed, highlighting the need for efficient treatment prediction for individualized therapy. Several patient-derived models have been established lately for the prediction of drug response. However, each of these models has its limitations that impede their clinical application. Here, we report that the whole-tumor cell culture (WTC) ex vivo model could be stably established from all breast tumors with a high success rate (98 out of 116), and it could reassemble the parental tumors with the endogenous microenvironment. We observed strong clinical associations and predictive values from the investigation of a broad range of BC therapies with WTCs derived from a patient cohort. The accuracy was further supported by the correlation between WTC-based test results and patients' clinical responses in a separate validation study, where the neoadjuvant treatment regimens of 15 BC patients were mimicked. Collectively, the WTC model allows us to accomplish personalized drug testing within 10 d, even for small-sized tumors, highlighting its potential for individualized BC therapy. Furthermore, coupled with genomic and transcriptomic analyses, WTC-based testing can also help to stratify specific patient groups for assignment into appropriate clinical trials, as well as validate potential biomarkers during drug development.
17.
  • Daniel, Chammiran, et al. (author)
  • RNA editing of non-coding RNA and its role in gene regulation
  • 2015
  • In: Biochimie. - : Elsevier BV. - 0300-9084 .- 1638-6183. ; 117, pp. 22-27
  • Research review (peer-reviewed)
    • It has for a long time been known that repetitive elements, particularly Alu sequences in human, are edited by the adenosine deaminases acting on RNA, ADAR, family. The functional interpretation of these events has been even more difficult than that of editing events in coding sequences, but today there is an emerging understanding of their downstream effects. A surprisingly large fraction of the human transcriptome contains inverted Alu repeats, often forming long double stranded structures in RNA transcripts, typically occurring in introns and UTRs of protein coding genes. Alu repeats are also common in other primates, and similar inverted repeats can frequently be found in non-primates, although the latter are less prone to duplex formation. In human, as many as 700,000 Alu elements have been identified as substrates for RNA editing, of which many are edited at several sites. In fact, recent advancements in transcriptome sequencing techniques and bioinformatics have revealed that the human editome comprises at least a hundred million adenosine to inosine (A-to-I) editing sites in Alu sequences. Although substantial additional efforts are required in order to map the editome, already present knowledge provides an excellent starting point for studying cis-regulation of editing. In this review, we will focus on editing of long stem loop structures in the human transcriptome and how it can affect gene expression.
18.
  • Ekdahl, Ylva, et al. (author)
  • A-to-I editing of microRNAs in the mammalian brain increases during development
  • 2012
  • In: Genome Research. - : Cold Spring Harbor Laboratory. - 1088-9051 .- 1549-5469. ; 22:8, pp. 1477-1487
  • Journal article (peer-reviewed)
    • Adenosine-to-inosine (A-to-I) RNA editing targets double-stranded RNA stem-loop structures in the mammalian brain. It has previously been shown that miRNAs are substrates for A-to-I editing. For the first time, we show that for several definitions of edited miRNA, the level of editing increases with development, thereby indicating a regulatory role for editing during brain maturation. We use high-throughput RNA sequencing to determine editing levels in mature miRNA, from the mouse transcriptome, and compare these with the levels of editing in pri-miRNA. We show that increased editing during development gradually changes the proportions of the two miR-376a isoforms, which previously have been shown to have different targets. Several other miRNAs that also are edited in the seed sequence show an increased level of editing through development. By comparing editing of pri-miRNA with editing and expression of the corresponding mature miRNA, we also show an editing-induced developmental regulation of miRNA expression. Taken together, our results imply that RNA editing influences the miRNA repertoire during brain maturation.
19.
  • Elias, Isaac, 1976- (author)
  • Computational problems in evolution : Multiple alignment, genome rearrangements, and tree reconstruction
  • 2006
  • Doctoral thesis (other academic/artistic)
    • Reconstructing the evolutionary history of a set of species is a fundamental problem in biology. This thesis concerns computational problems that arise in different settings and stages of phylogenetic tree reconstruction, but also in other contexts. The contributions include:
      • A new distance-based tree reconstruction method with optimal reconstruction radius and optimal runtime complexity. Included in the result is a greatly simplified proof that the NJ algorithm also has optimal reconstruction radius. (co-author Jens Lagergren)
      • NP-hardness results for the most common variations of Multiple Alignment. In particular, it is shown that SP-score, Star Alignment, and Tree Alignment are NP-hard for all metric symbol distances over all binary or larger alphabets.
      • A 1.375-approximation algorithm for Sorting By Transpositions (SBT). SBT is the problem of sorting a permutation using as few block-transpositions as possible. The complexity of this problem is still open and it was a ten-year-old open problem to improve the best known 1.5-approximation ratio. The 1.375-approximation algorithm is based on a new upper bound on the diameter of 3-permutations. Moreover, a new lower bound on the transposition diameter of the symmetric group is presented and the exact transposition diameter of simple permutations is determined. (co-author Tzvika Hartman)
      • Approximation, fixed-parameter tractable, and fast heuristic algorithms for two variants of the Ancestral Maximum Likelihood (AML) problem: when the phylogenetic tree is known and when it is unknown. AML is the problem of reconstructing the most likely genetic sequences of extinct ancestors along with the most likely mutation probabilities on the edges, given the phylogenetic tree and sequences at the leaves. (co-author Tamir Tuller)
      • An algorithm for computing the number of mutational events between aligned DNA sequences which is several hundred times faster than the famous Phylip packages. Since pairwise distance estimation is a bottleneck in distance-based phylogeny reconstruction, the new algorithm improves the overall running time of many distance-based methods by a factor of several hundred. (co-author Jens Lagergren)
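The Sorting By Transpositions problem named in the thesis abstract above can be illustrated with a small sketch. This is not the 1.375-approximation algorithm from the thesis; it is a brute-force breadth-first search, usable only on tiny permutations, and the function names `transpose` and `transposition_distance` are hypothetical names chosen for this illustration.

```python
from collections import deque


def transpose(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k].

    Requires 0 <= i < j < k <= len(perm). Works on lists and tuples.
    """
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def transposition_distance(perm):
    """Minimum number of block transpositions needed to sort perm.

    Exhaustive breadth-first search over all permutations; the state
    space grows factorially, so this is for toy inputs only.
    """
    start, goal = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        n = len(cur)
        # Try every choice of two adjacent, non-empty blocks.
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = transpose(cur, i, j, k)
                    if nxt not in dist:
                        dist[nxt] = dist[cur] + 1
                        queue.append(nxt)
    # Adjacent swaps are transpositions, so the goal is always reachable.
    raise AssertionError("sorted permutation not reached")
```

For example, `transpose([3, 4, 1, 2], 0, 2, 4)` moves the block `[1, 2]` in front of `[3, 4]` and sorts the permutation in one step, whereas `[3, 2, 1]` needs two transpositions.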
20.
  • Elias, Isaac, et al. (author)
  • Fast Computation of Distance Estimators
  • 2007
  • In: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 8, pp. 89-
  • Journal article (peer-reviewed)
    • Background: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of $n$ taxa in time $O(n^3)$. Unfortunately, the fastest practical algorithms known for computing the distance matrix, from $n$ sequences of length $l$, take time proportional to $l \cdot n^2$. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome wide applications. Results: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods. Conclusion: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
21.
  • Elias, Isaac, et al. (author)
  • Fast neighbor joining
  • 2005
  • In: AUTOMATA, LANGUAGES AND PROGRAMMING, PROCEEDINGS. - Berlin, Heidelberg : Springer Berlin Heidelberg. - 3540275800 ; , pp. 1263-1274
  • Conference paper (peer-reviewed). Abstract:
    • Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged based on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n^3) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2) and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
  •  
22.
  • Elias, Isaac, et al. (author)
  • Fast Neighbor Joining
  • 2009
  • In: Theoretical Computer Science. - : Elsevier BV. - 0304-3975 .- 1879-2294. ; 410:21-23, pp. 1993-2000
  • Journal article (peer-reviewed). Abstract:
    • Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged based on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n^3) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2) and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
  •  
23.
  • Engblom, Camilla, et al. (author)
  • Spatial transcriptomics of B cell and T cell receptors reveals lymphocyte clonal dynamics
  • 2023
  • In: Science. - : American Association for the Advancement of Science (AAAS). - 0036-8075 .- 1095-9203. ; 382:6675, p. 8486-
  • Journal article (peer-reviewed). Abstract:
    • The spatial distribution of lymphocyte clones within tissues is critical to their development, selection, and expansion. We have developed spatial transcriptomics of variable, diversity, and joining (VDJ) sequences (Spatial VDJ), a method that maps B cell and T cell receptor sequences in human tissue sections. Spatial VDJ captures lymphocyte clones that match canonical B and T cell distributions and amplifies clonal sequences confirmed by orthogonal methods. We found spatial congruency between paired receptor chains, developed a computational framework to predict receptor pairs, and linked the expansion of distinct B cell clones to different tumor-associated gene expression programs. Spatial VDJ delineates B cell clonal diversity and lineage trajectories within their anatomical niche. Thus, Spatial VDJ captures lymphocyte spatial clonal architecture across tissues, providing a platform to harness clonal sequences for therapy.
  •  
24.
  •  
25.
  • Ensterö, Mats, et al. (author)
  • A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins
  • 2010
  • In: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 11
  • Journal article (peer-reviewed). Abstract:
    • Background: Several bioinformatic approaches have previously been used to find novel sites of ADAR-mediated A-to-I RNA editing in humans. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in Alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA-edited substrates suggest, however, that site-selective A-to-I editing is particularly important for normal brain development in mammals. Results: We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the Alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence, since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics of known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing. Conclusions: Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.
  •  
26.
  • Fernandez-Baca, D., et al. (author)
  • A polynomial-time algorithm for near-perfect phylogeny
  • 2003
  • In: SIAM journal on computing (Print). - : Society for Industrial & Applied Mathematics (SIAM). - 0097-5397 .- 1095-7111. ; 32:5, pp. 1115-1127
  • Journal article (peer-reviewed). Abstract:
    • A parameterized version of the Steiner tree problem in phylogeny is defined, where the parameter measures the amount by which a phylogeny differs from perfection. This problem is shown to be solvable in polynomial time for any fixed value of the parameter.
  •  
27.
  • Flament, Maxime, et al. (author)
  • An Approach to Fourth Generation Wireless Infrastructures - Scenarios and Key Research Issues
  • 1999
  • In: Vehicular Technology Conference, 1999 IEEE 49th. - : IEEE. - 0780355652 ; , pp. 1742-1746
  • Conference paper (peer-reviewed). Abstract:
    • Studying the feasibility and viability of various future infrastructure architectures and potential road-maps of their deployment is the focus of the 4th Generation Wireless Infrastructures (4GW) project within the strategic Personal Computing and Communication (PCC) program (Molin 1998). In attempting to realize the PCC vision “mobile multimedia to all at today's prices for fixed telephony”, a difficult problem arises. In contrast to the process of solving engineering and business problems in current or imminent wireless systems, where system concepts, requirements and markets are reasonably well known, very little is known about these things over a 10-year horizon. The approach used in the project to tackle this problem is to use various scenario techniques. A very important element in these studies is a set of plausible scenarios which describe the telecommunication scene in 2010 and which are used to determine potential technological and other bottlenecks in order to find key areas for research in this field. Some of these scenarios are presented in this paper, together with some implications regarding bottlenecks and key research issues. The results are presented in terms of working assumptions (WAs) used within the project. The WAs are also proposed to provide a framework for interrelating different research activities within PCC.
  •  
28.
  • Flament, Maxime, et al. (author)
  • Key Research Issues in 4th Generation Wireless Infrastructures
  • 1998
  • Conference paper (peer-reviewed). Abstract:
    • The world of communication is now developing faster than ever. Telecommunication infrastructure deployment, in contrast, is a slow and costly process demanding a long-range strategic perspective in decision making. As a consequence, R&D efforts are concerned with problems on a time horizon of 10 years or more. Studying the feasibility and viability of various future infrastructure architectures and potential road-maps of their deployment is the focus for the 4th Generation Wireless Infrastructures project within PCC. Determining technological and other bottlenecks to find key areas for research in this area is a very important element in these studies. The methods used for this purpose are various scenario techniques. Plausible scenarios, describing the telecommunication scene in 2010, have been designed based on a number of global trends in technology, economy and politics. The scenario trends have also been verified by using a Delphi survey among leading industrialists and scientists in Sweden. Based on these trends, three vivid scenarios are built which implicitly describe the different trends that have been created, for instance, the Big Brother, the Anything Goes, and the Pocket Computing. At the end of the paper, the implications of the scenarios for the infrastructure research areas are discussed. In particular, the working assumptions and key research problems in each PCC/4GW work package are revisited and prioritized according to the scenarios. The scenarios are also proposed to provide a framework for inter-relating different research activities within PCC.
  •  
29.
  •  
30.
  • Flament, Maxime, et al. (author)
  • Scenarios - A tool for starting a research process
  • 1998
  • In: Proceedings PCC Workshop, Stockholm. ; , pp. 6-10
  • Conference paper (peer-reviewed). Abstract:
    • Scenarios can be used as a tool for starting a research process within a group. When putting together a research group of people with different backgrounds, different competencies and different preconceived ideas, special attention has to be paid to the problem of integration. Many different "tools" can be used to help merge the different approaches of the individuals constituting the group. One such tool is scenario work. Trends are created by analyzing and structuring the environment around the project. Combining and extrapolating the trends gives a basis for creating the scenarios. The scenarios are then used to help refine research issues. Both the process and the result are hence of importance. The process aims at revealing the different approaches and possible conflicts due to the different scientific and research traditions among the members of the group. The result forms a basis for refining and limiting the research question. The work within the 4th Generation Wireless Infrastructure project group of the PCC-project aims at finding important design issues for the infrastructure of a future wireless communication system. Using the scenarios as a starting point, the research questions originally posed have been successfully challenged and refined.
  •  
31.
  •  
32.
  • Flament, Maxime, et al. (author)
  • Telecom scenarios for the 4th Generation Wireless Infrastructures
  • 1998
  • In: Proceedings PCC Workshop, Stockholm. ; , pp. 11-15
  • Conference paper (peer-reviewed). Abstract:
    • Telecommunication infrastructure deployment, in contrast to the rest of the communication area, is a slow and costly process, demanding a long-range strategic perspective in decision making. Determining key issues for strategic research in this area is thus very important. This paper describes detailed work to that aim, within the PCC project. The aim was to find possible scenarios for the 4th Generation Wireless Infrastructures (4GW) around year 2010 and to determine their implications on the direction of research in wireless communications. In this scenario work, a number of trends were created based on the current state of technology, economy and politics. These trends are verified by using Delphi methods. Based on these trends and additional research, three vivid scenarios are built, which picture different ways the trends may develop. The scenarios are called "Big Brother Scenario", "Anything-Goes Scenario" and "Pocket Computing Scenario". At the end, the paper discusses the implications of the scenarios on the wireless communication research areas. In particular, the working assumptions and key research problems in each work package in the 4th Generation Wireless Infrastructure project are verified and prioritized according to the scenarios.
  •  
33.
  • Fritzell, Kajsa, et al. (author)
  • ADARs and editing : The role of A-to-I RNA modification in cancer progression
  • 2018
  • In: Seminars in Cell and Developmental Biology. - : ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD. - 1084-9521 .- 1096-3634. ; 79, pp. 123-130
  • Research review (peer-reviewed). Abstract:
    • Cancer arises when pathways that control cell functions such as proliferation and migration are dysregulated to such an extent that cells start to divide uncontrollably and eventually spread throughout the body, ultimately endangering the survival of an affected individual. It is well established that somatic mutations are important in cancer initiation and progression as well as in creation of tumor diversity. Now modifications of the transcriptome are also emerging as a significant force during the transition from normal cell to malignant tumor. Editing of adenosine (A) to inosine (I) in double-stranded RNA, catalyzed by adenosine deaminases acting on RNA (ADARs), is one dynamic modification that in a combinatorial manner can give rise to a very diverse transcriptome. Since the cell interprets inosine as guanosine (G), editing can result in non-synonymous codon changes in transcripts as well as yield alternative splicing, but also affect targeting and disrupt maturation of microRNA. ADAR editing is essential for survival in mammals but its dysregulation can lead to cancer. ADAR1 is, for instance, overexpressed in lung cancer, liver cancer, esophageal cancer and chronic myelogenous leukemia, which with few exceptions promotes cancer progression. In contrast, ADAR2 is lowly expressed in, e.g., glioblastoma, where the lower levels of ADAR2 editing lead to malignant phenotypes. Altogether, RNA editing by the ADAR enzymes is a powerful regulatory mechanism during tumorigenesis. Depending on the cell type, cancer progression seems to mainly be induced by ADAR1 upregulation or ADAR2 downregulation, although in a few cases ADAR1 is instead downregulated. In this review, we discuss how aberrant editing of specific substrates contributes to malignancy.
  •  
34.
  •  
35.
  •  
36.
  • Frånberg, Mattias, 1985-, et al. (author)
  • Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests
  • 2015
  • In: PLOS Genetics. - : Public Library of Science (PLoS). - 1553-7390 .- 1553-7404. ; 11:9
  • Journal article (peer-reviewed). Abstract:
    • Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology in general has an improved statistical power in comparison to seven other methods, and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PRO-CARDIS coronary artery disease case/control cohort and discover three distinct interactions. While analyses on simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we explore strategies for a priori selecting subsets of variant pairs to test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.
  •  
37.
  • Frånberg, Mattias, 1985-, et al. (author)
  • Fast and general tests of genetic interaction for genome-wide association studies
  • 2017
  • In: PloS Computational Biology. - : PUBLIC LIBRARY SCIENCE. - 1553-734X .- 1553-7358. ; 13:6
  • Journal article (peer-reviewed). Abstract:
    • A complex disease has, by definition, multiple genetic causes. In theory, these causes could be identified individually, but their identification will likely benefit from informed use of anticipated interactions between causes. In addition, characterizing and understanding interactions must be considered key to revealing the etiology of any complex disease. Large-scale collaborative efforts are now paving the way for comprehensive studies of interaction. As a consequence, there is a need for methods with a computational efficiency sufficient for modern data sets as well as for improvements of statistical accuracy and power. Another issue is that, currently, the relation between different methods for interaction inference is in many cases not transparent, complicating the comparison and interpretation of results between different interaction studies. In this paper we present computationally efficient tests of interaction for the complete family of generalized linear models (GLMs). The tests can be applied for inference of single or multiple interaction parameters, but we show, by simulation, that jointly testing the full set of interaction parameters yields superior power and control of false positive rate. Based on these tests we also describe how to combine results from multiple independent studies of interaction in a meta-analysis. We investigate the impact of several assumptions commonly made when modeling interactions. We also show that, across the important class of models with a full set of interaction parameters, jointly testing the interaction parameters yields identical results. Further, we apply our method to genetic data for cardiovascular disease. This allowed us to identify a putative interaction involved in Lp(a) plasma levels between two 'tag' variants in the LPA locus (p = 2.42 × 10^-9) as well as replicate the interaction (p = 6.97 × 10^-7). Finally, our meta-analysis method is used in a small (N = 16,181) study of interactions in myocardial infarction.
  •  
38.
  • Frånberg, Mattias, 1985- (author)
  • Statistical methods for detecting gene-gene and gene-environment interactions in genome-wide association studies
  • 2019
  • Doctoral thesis (other academic/artistic). Abstract:
    • Despite considerable effort to elucidate the genetic architecture of multi-factorial traits and diseases, there remains a gap between the estimated heritability (e.g., from twin studies) and the heritability explained by discovered genetic variants. The existence of interactions between different genes, and between genes and the environment, has frequently been hypothesized as a likely cause of this discrepancy. However, the statistical inference of interactions is plagued by limited sample sizes, high computational requirements, and incomplete knowledge of how the measurement scale and parameterization affect the analysis. This thesis addresses the major statistical, computational, and modeling issues that hamper large-scale interaction studies today. Furthermore, it investigates whether gene-gene and gene-environment interactions are significantly involved in the development of diseases linked to atherosclerosis. Firstly, I develop two statistical methods that can be used to study gene-gene interactions: the first is tailored for limited sample size situations, and the second enables multiple analyses to be combined into large meta-analyses. I perform comprehensive simulation studies to determine that these methods have higher or equal statistical power compared with contemporary methods, that scale-invariance is required to guard against false positives, and that saturated parameterizations perform well in terms of statistical power. In two studies, I apply the two proposed methods to case/control data from myocardial infarction and associated phenotypes. In both studies, we identify putative interactions for myocardial infarction but are unable to replicate the interactions in a separate cohort. In the second study, however, we identify and replicate a putative interaction involved in Lp(a) plasma levels between two variants, rs3103353 and rs9458157. Secondly, I develop a multivariate statistical method that simultaneously estimates the effects of genetic variants, environmental variables, and their interactions. I show by extensive simulations that this method achieves statistical power close to the optimal oracle method. We use this method to study the involvement of gene-environment interactions in intima-media thickness, a phenotype relevant for coronary artery disease. We identify a putative interaction between a genetic variant in the KCTD8 gene and alcohol use, thus suggesting an influence on intima-media thickness. The methods developed to support the analyses in this thesis, as well as a selection of other prominent methods in the field, are implemented in a software package called besiq. In conclusion, this thesis presents statistical methods, and the associated software, that allow large-scale studies of gene-gene and gene-environment interactions to be effortlessly undertaken.
  •  
39.
  • Geras, Agnieszka, et al. (author)
  • Celloscope : a probabilistic model for marker-gene-driven cell type deconvolution in spatial transcriptomics data
  • 2023
  • In: Genome Biology. - : Springer Nature. - 1465-6906 .- 1474-760X. ; 24:1
  • Journal article (peer-reviewed). Abstract:
    • Spatial transcriptomics maps gene expression across tissues, posing the challenge of determining the spatial arrangement of different cell types. However, spatial transcriptomics spots contain multiple cells. Therefore, the observed signal comes from mixtures of cells of different types. Here, we propose an innovative probabilistic model, Celloscope, that utilizes established prior knowledge on marker genes for cell type deconvolution from spatial transcriptomics data. Celloscope outperforms other methods on simulated data, successfully indicates known brain structures and spatially distinguishes between inhibitory and excitatory neuron types based in mouse brain tissue, and dissects large heterogeneity of immune infiltrate composition in prostate gland tissue.
  •  
40.
  • Hallett, M., et al. (author)
  • Simultaneous identification of duplications and lateral transfers
  • 2004
  • In: Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB. - New York, New York, USA : ACM Press. ; , pp. 347-356
  • Conference paper (peer-reviewed). Abstract:
    • This paper introduces a combinatorial model that incorporates duplication events as well as lateral gene transfer events (a.k.a. horizontal gene transfer events). To the best of our knowledge, this is the first such model containing both of these events. A so-called dt-scenario is used to explain differences between a gene tree T and a species tree S. The model is biologically as well as mathematically sound. Among other biological considerations, the model respects the partial order of evolution implied by S by demanding that the dt-scenarios are "acyclic". We present fixed-parameter tractable algorithms that count the minimum number of duplications and lateral transfers, and more generally can compute the set of pairs (t, d) where d is the minimum number of duplications required by any explanation that requires t lateral transfers. This allows us to also compute a weighted parsimony score. We also show how gene loss events can be incorporated into our model. We also give an NP-completeness proof which suggests that the intractability is due to the demand that the dt-scenarios be acyclic. When this condition is removed, we can show that the problem is computable in polynomial time via dynamic programming. By generating "synthetic" gene and species trees via a birth-death process, we explored the capacity of our algorithms to faithfully reconstruct the actual number of events that took place. The results are positive.
  •  
41.
  • Hjelm, M., et al. (author)
  • New Probabilistic network models and algorithms for oncogenesis
  • 2006
  • In: Journal of Computational Biology. - : Mary Ann Liebert Inc. - 1066-5277 .- 1557-8666. ; 13:4, pp. 853-865
  • Journal article (peer-reviewed). Abstract:
    • Chromosomal aberrations in solid tumors appear in complex patterns. It is important to understand how these patterns develop, the dynamics of the process, the temporal or even causal order between aberrations, and the involved pathways. Here we present network models for chromosomal aberrations and algorithms for training models based on observed data. Our models are generative probabilistic models that can be used to study dynamical aspects of chromosomal evolution in cancer cells. They are well suited for a graphical representation that conveys the pathways found in a dataset. By allowing only pairwise dependencies and partitioning aberrations into modules, in which all aberrations are restricted to have the same dependencies, we reduce the number of parameters so that dataset sizes relevant to cancer applications can be handled. We apply our framework to a dataset of colorectal cancer tumor karyotypes. The obtained model explains the data significantly better than a model where independence between the aberrations is assumed. In fact, the obtained model performs very well with respect to several measures of goodness of fit and is, with respect to repetition of the training, more or less unique.
  •  
42.
  • Hotti, Alexandra, et al. (author)
  • Benefits of Non-Linear Scale Parameterizations in Black Box Variational Inference through Smoothness Results and Gradient Variance Bounds
  • 2024
  • In: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024. - : ML Research Press. ; , pp. 3538-3546
  • Conference paper (peer-reviewed). Abstract:
    • Black box variational inference has consistently produced impressive empirical results. Convergence guarantees require that the variational objective exhibits specific structural properties and that the noise of the gradient estimator can be controlled. In this work we study the smoothness and the variance of the gradient estimator for location-scale variational families with non-linear covariance parameterizations. Specifically, we derive novel theoretical results for the popular exponential covariance parameterization and tighter gradient variance bounds for the softplus parameterization. These results reveal the benefits of using non-linear scale parameterizations on large-scale datasets. With a non-linear scale parameterization, the smoothness constant of the variational objective and the upper bound on the gradient variance decrease as the scale parameter becomes smaller. Learning posterior approximations with small scales is essential in Bayesian statistics with a sufficient amount of data, since under appropriate assumptions, the posterior distribution is known to contract around the parameter of interest as the sample size increases. We validate our theoretical findings through empirical analysis on several large-scale datasets, underscoring the importance of non-linear parameterizations.
  •  
43.
  • Håstad, Johan, et al. (author)
  • Fitting points on the real line and its application to RH mapping
  • 2003
  • In: Journal of Algorithms. - : Elsevier BV. - 0196-6774 .- 1090-2678. ; 49:1, pp. 42-62
  • Journal article (peer-reviewed). Abstract:
    • A natural problem is that of, given an n × n symmetric matrix D, finding an arrangement of n points on the real line such that the resulting distances agree as well as possible with the distances specified by D. We refer to the variation in which the difference in distance is measured in maximum norm as the MATRIX-TO-LINE problem. The MATRIX-TO-LINE problem has previously been shown to be NP-complete [J.B. Saxe, 17th Allerton Conference in Communication, Control, and Computing, 1979, pp. 480-489]. We show that it can be approximated within 2, but unless P = NP not within 7/5 - delta for any delta > 0. We also show a tight lower bound under a stronger assumption. We show that the MATRIX-TO-LINE problem cannot be approximated within 2 - delta unless 3-colorable graphs can be colored with ⌈4/delta⌉ colors in polynomial time. Currently, the best polynomial time algorithm colors a 3-colorable graph with Õ(n^(3/14)) colors [A. Blum, D. Karger, Inform. Process. Lett. 61 (1), (1997), 49-53]. We apply our MATRIX-TO-LINE algorithm to a problem in computational biology, namely the Radiation Hybrid (RH) problem, that is, the algorithmic part of a physical mapping method called RH mapping. This gives us the first algorithm with a guaranteed convergence for the general RH problem.
  •  
44.
  • Iglesias, Maria Jesus, et al. (author)
  • Combined Chromatin and Expression Analysis Reveals Specific Regulatory Mechanisms within Cytokine Genes in the Macrophage Early Immune Response
  • 2012
  • In: PLOS ONE. - : Public Library of Science (PLoS). - 1932-6203. ; 7:2, p. e32306
  • Journal article (peer-reviewed)
    • Macrophages play a critical role in innate immunity, and the expression of early response genes orchestrates much of the initial response of the immune system. Macrophages undergo extensive transcriptional reprogramming in response to inflammatory stimuli such as lipopolysaccharide (LPS). To identify gene transcription regulation patterns involved in early innate immune responses, we used two genome-wide approaches: gene expression profiling and chromatin immunoprecipitation-sequencing (ChIP-seq) analysis. We examined the effect of 2 h of LPS stimulation on early gene expression and its relation to chromatin remodeling (H3 acetylation; H3Ac) and promoter binding of Sp1 and RNA polymerase II phosphorylated at serine 5 (S5P RNAPII), which is a marker for transcriptional initiation. Our results indicate novel and alternative gene regulatory mechanisms for certain proinflammatory genes. We identified two groups of upregulated inflammatory genes with respect to chromatin modification and promoter features. One group, including highly upregulated genes such as tumor necrosis factor (TNF), was characterized by H3Ac, high CpG content and lack of TATA boxes. The second group, containing inflammatory mediators (interleukins and CCL chemokines), was upregulated upon LPS stimulation despite lacking H3Ac in their annotated promoters, which were low in CpG content but did contain TATA boxes. Genome-wide analysis showed that few H3Ac peaks were unique to either +/-LPS condition. However, within these, an unpacking/expansion of already existing H3Ac peaks was observed upon LPS stimulation. In contrast, a significant proportion of S5P RNAPII peaks (approx. 40%) was unique to either condition. Furthermore, data indicated a large portion of previously unannotated TSSs, particularly in LPS-stimulated macrophages, where only 28% of unique S5P RNAPII peaks overlap annotated promoters. The regulation of the inflammatory response appears to occur in a very specific manner at the chromatin level for specific genes, and this study highlights the level of fine-tuning that occurs in the immune response.
  •  
45.
  • Ivansson, L., et al. (author)
  • Algorithms for RH mapping : New ideas and improved analysis
  • 2004
  • In: SIAM journal on computing (Print). - : Society for Industrial & Applied Mathematics (SIAM). - 0097-5397 .- 1095-7111. ; 34:1, pp. 89-108
  • Journal article (peer-reviewed)
    • Radiation hybrid (RH) mapping is a technique for constructing a physical map describing the locations of n markers on a chromosome of an organism. In [J. Comput. Biol., 4 (1997), pp. 517-533], Ben-Dor and Chor presented new algorithms for the RH problem and gave the first performance guarantees for such algorithms. We improve the lower bounds on the number of experiments in a way that is sufficient for two of these algorithms to give a correct ordering of the markers with high probability. Not only are the new bounds tighter, but our analysis also captures to a much higher extent how the bounds depend on the actual arrangement of the markers. Furthermore, we modify the two algorithms to utilize RH mapping data produced with several radiation intensities. We show that the new algorithms are almost insensitive to the problem of using the correct intensity.
  •  
46.
  • Jun, Seong-Hwan, et al. (author)
  • Reconstructing clonal tree for phylo-phenotypic characterization of cancer using single-cell transcriptomics
  • 2023
  • In: Nature Communications. - : Springer Nature. - 2041-1723. ; 14:1
  • Journal article (peer-reviewed)
    • Functional characterization of the cancer clones can shed light on the evolutionary mechanisms driving cancer proliferation and relapse. Single-cell RNA sequencing data provide grounds for understanding the functional state of cancer as a whole; however, much research remains to identify and reconstruct clonal relationships toward characterizing the changes in functions of individual clones. We present PhylEx, which integrates bulk genomics data with co-occurrences of mutations from single-cell RNA sequencing data to reconstruct high-fidelity clonal trees. We evaluate PhylEx on synthetic and well-characterized high-grade serous ovarian cancer cell line datasets. PhylEx outperforms the state-of-the-art methods both when comparing capacity for clonal tree reconstruction and for identifying clones. We analyze high-grade serous ovarian cancer and breast cancer data to show that PhylEx exploits clonal expression profiles beyond what is possible with expression-based clustering methods and clears the way for accurate inference of clonal trees and robust phylo-phenotypic analysis of cancer. The functional changes of individual clones in single-cell RNA sequencing (scRNA-seq) data remain elusive. Here, the authors develop PhylEx, which integrates bulk genomics data with co-occurrences of mutations revealed by scRNA-seq data, and apply it to high-grade serous ovarian cancer cell line and breast cancer datasets.
  •  
47.
  • Kang, Wenjing, 1988- (author)
  • microRNAs: from biogenesis to organismal tracing
  • 2020
  • Doctoral thesis (other academic/artistic)
    • MicroRNAs (miRNAs) are short noncoding RNAs of around 22 nucleotides in length, which help to shape the expression of most mRNAs. Perturbation of miRNA expression has revealed a variety of defects in development, cell specification, physiology and behavior. This thesis focuses on two topics of miRNA: identification of structural features that influence miRNA biogenesis (Paper I) and application of taxonomical marker miRNAs to resolve the organismal origin of samples (Papers II and III). The current model of miRNA hairpin biogenesis has limited information content and appears to be incomplete. In Paper I, we apply a novel high-throughput screening method to profile the optimal structure of miRNA hairpins for efficient and precise miRNA biogenesis. The optimal structure consists of tight and loose local structures across the hairpin, which reflects the constraints of biogenesis proteins. We find that miRNA hairpins with a stable lower basal stem are more efficiently processed and have a higher expression level in tissues of 20 animal species. We point out that the structural features, which have been largely neglected in the current model, are in fact as important as the well-known sequence motifs. New miRNAs are continuously added over evolutionary time and are rarely secondarily lost, making them ideal taxonomical markers. In Paper II, we demonstrate as a proof of principle that miRNAs can be used to trace a biological sample back to the lineage or even species of origin. Based on the marker miRNAs, we develop miRTrace, the first software to accurately trace miRNA sequences back to their taxonomical origin. The method can sensitively identify the origin of single cells and detect parasitic nematode RNA in mammalian host blood samples. In Paper III, we apply miRNA tracing to address a controversial question about the origin of the exogenous plant miRNAs (xenomiRs) found in human samples, which have been proposed to regulate human gene expression. Our computational and experimental results provide evidence that xenomiRs are derived from technical artifacts rather than dietary intake.
  •  
48.
  • Khan, Mehmood Alam (author)
  • Computational Problems in Modeling Evolution and Inferring Gene Families.
  • 2016
  • Doctoral thesis (other academic/artistic)
    • Over the last few decades, phylogenetics has emerged as a very promising field, providing a comparative framework to explain the genetic relationships among all the living organisms on earth. These genetic relationships are typically represented by a bifurcating phylogenetic tree, the tree of life. Reconstructing a phylogenetic tree is one of the central tasks in evolutionary biology. The different evolutionary processes, such as gene duplications, gene losses, speciation, and lateral gene transfer events, make the phylogeny reconstruction task more difficult. However, the rapid developments in sequencing technologies and the availability of genome-scale sequencing data give us the opportunity to understand these evolutionary processes in a more informed manner, and ultimately enable us to reconstruct gene and species phylogenies more accurately. This thesis is an attempt to provide computational methods for phylogenetic inference and give tools to conduct genome-scale comparative evolutionary studies, such as detecting homologous sequences and inferring gene families. In the first project, we present FastPhylo, a software package containing fast tools for reconstructing distance-based phylogenies. It implements previously published efficient algorithms for estimating a distance matrix from the input sequences and reconstructing an unrooted Neighbour Joining tree from a given distance matrix. Results on simulated datasets reveal that FastPhylo can handle hundreds of thousands of sequences in a time- and memory-efficient manner. The easy-to-use, well-defined interfaces and the modular structure of FastPhylo allow it to be used in very large bioinformatic pipelines. In the second project, we present a synteny-aware gene homology method, called GenFamClust (GFC), that uses gene content and gene order conservation to detect homology. Results on simulated and biological datasets suggest that local synteny information combined with sequence similarity improves the detection of homologs. In the third project, we introduce a novel phylogeny-based clustering method, PhyloGenClust, which partitions a very large gene family into smaller subfamilies. ROC (receiver operating characteristic) analysis on synthetic datasets shows that PhyloGenClust identifies subfamilies more accurately. PhyloGenClust can be used as a middle-tier clustering method between raw clustering methods, such as sequence similarity methods, and more sophisticated Bayesian phylogeny methods. Finally, we introduce a novel probabilistic Bayesian method based on the DLTRS model to sample reconciliations of a gene tree inside a species tree. The method uses an MCMC framework to integrate LGTs, gene duplications, gene losses and sequence evolution under a relaxed molecular clock for substitution rates. The proposed sampling method estimates the posterior distribution of gene trees and provides the temporal information of LGT events over the lineages of a species tree. Analysis on simulated datasets reveals that our method performs well in identifying the true temporal estimates of LGT events. We applied our method to the genome-wide gene families for mollicutes and cyanobacteria, which gave interesting insight into potential LGT highways.
  •  
49.
  • Khan, Mehmood Alam, et al. (author)
  • fastphylo : Fast tools for phylogenetics
  • 2013
  • In: BMC Bioinformatics. - : BioMed Central. - 1471-2105. ; 14:1, p. 334
  • Journal article (peer-reviewed)
    • Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances. Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency. Conclusions: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
  •  
50.
  • Khan, Mehmood Alam, et al. (author)
  • Probabilistic inference of lateral gene transfer events
  • 2016
  • In: BMC Bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 17:Suppl 14
  • Journal article (peer-reviewed)
    • Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge. Results: In this paper, we propose a probabilistic method that samples reconciliations of the gene tree with a dated species tree and computes maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify highways of LGT. Conclusions: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.
  •  