SwePub - search: WFRF:(Corander Jukka)

Number	Reference
1.	Corander, Jukka, 1965-, et al. (author) Have I seen you before? : Principles of Bayesian predictive classification revisited 2013 In: Statistics and computing. - : Springer Berlin/Heidelberg. - 0960-3174 .- 1573-1375. ; 23:1, pp. 59-73 Journal article (peer-reviewed) Abstract: A general inductive Bayesian classification framework is considered using a simultaneous predictive distribution for test items. We introduce a principle of generative supervised and semi-supervised classification based on marginalizing the joint posterior distribution of labels for all test items. The simultaneous and marginalized classifiers arise under different loss functions, while both acknowledge jointly all uncertainty about the labels of test items and the generating probability measures of the classes. We illustrate for data from multiple finite alphabets that such classifiers achieve higher correct classification rates than a standard marginal predictive classifier which labels all test items independently, when training data are sparse. In the supervised case for multiple finite alphabets the simultaneous and the marginal classifiers are proven to become equal under generalized exchangeability when the amount of training data increases. Hence, the marginal classifier can be interpreted as an asymptotic approximation to the simultaneous classifier for finite sets of training data. It is also shown that such convergence is not guaranteed in the semi-supervised setting, where the marginal classifier does not provide a consistent approximation.
2.	Beaumont, Mark A, et al. (author) In defence of model-based inference in phylogeography. 2010 In: Molecular Ecology. - 0962-1083 .- 1365-294X. ; 19:3, pp. 436-446 Journal article (peer-reviewed) Abstract: Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, when used with either single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.
3.	Chatterjee, Saikat, et al. (author) SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition 2014 In: Bioinformatics. - : Oxford University Press. - 1460-2059 .- 1367-4803 .- 1367-4811. ; 30:17, pp. 2423-2431 Journal article (peer-reviewed) Abstract: Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consuming in a desktop computing environment. Results: Using sparsity enforcing methods from the general sparse signal processing field (such as compressed sensing), we derive a solution to the community composition estimation problem by a simultaneous assignment of all sample reads to a pre-processed reference database. A general statistical model based on kernel density estimation techniques is introduced for the assignment task, and the model solution is obtained using convex optimization tools. Further, we design a greedy algorithm solution for a fast solution. Our approach offers a reasonably fast community composition estimation method, which is shown to be more robust to input data variation than a recently introduced related method. Availability and implementation: A platform-independent Matlab implementation of the method is freely available at http://www.ee.kth.se/ctsoftware; source code that does not require access to Matlab is currently being tested and will be made available later through the above Web site.
4.	Corander, Jukka, et al. (author) A bayesian random fragment insertion model for de novo detection of DNA regulatory binding regions 2007 Other publication (other academic/artistic) Abstract: Identification of regulatory binding motifs within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of statistical approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information to build probabilistic a priori description of the motif classes. No method has been previously proposed for finding the number of putative de novo motif types and their positions within a set of DNA sequences. As the number of sequenced genomes from a wide variety of organisms is constantly increasing, there is a clear need for such methods. Here we introduce a Bayesian unsupervised approach for this purpose by using recent advances in the theory of predictive classification and Markov chain Monte Carlo computation. Our modelling framework enables formal statistical inference in a large-scale sequence screening and we illustrate it by a set of examples.
5.	Corander, Jukka, et al. (author) A tribute to Mats Gyllenberg, on the occasion of his 60th birthday 2016 In: Journal of Mathematical Biology. - : Springer Science and Business Media LLC. - 0303-6812 .- 1432-1416. ; 72:4, pp. 793-795 Journal article (peer-reviewed)
6.	Corander, Jukka, et al. (author) Bayesian Block-Diagonal Predictive Classifier for Gaussian Data 2013 In: Synergies of Soft Computing and Statistics for Intelligent Data Analysis. - Berlin, Heidelberg : Springer Berlin/Heidelberg. - 9783642330414 - 9783642330421 ; , pp. 543-551 Book chapter (peer-reviewed) Abstract: The paper presents a method for constructing a Bayesian predictive classifier in a high-dimensional setting. Given that classes are represented by Gaussian distributions with block-structured covariance matrix, a closed form expression for the posterior predictive distribution of the data is established. Due to factorization of this distribution, the resulting Bayesian predictive and marginal classifier provides an efficient solution to the high-dimensional problem by splitting it into smaller tractable problems. In a simulation study we show that the suggested classifier outperforms several alternative algorithms such as linear discriminant analysis based on block-wise inverse covariance estimators and the shrunken centroids regularized discriminant analysis.
7.	Corander, Jukka, et al. (author) Bayesian model learning based on a parallel MCMC strategy 2006 In: Statistics and computing. - : Springer Science and Business Media LLC. - 0960-3174 .- 1573-1375. ; 16:4, pp. 355-362 Journal article (peer-reviewed) Abstract: We introduce a novel Markov chain Monte Carlo algorithm for estimation of posterior probabilities over discrete model spaces. Our learning approach is applicable to families of models for which the marginal likelihood can be analytically calculated, either exactly or approximately, given any fixed structure. It is argued that for certain model neighborhood structures, the ordinary reversible Metropolis-Hastings algorithm does not yield an appropriate solution to the estimation problem. Therefore, we develop an alternative, non-reversible algorithm which can avoid the scaling effect of the neighborhood. To efficiently explore a model space, a finite number of interacting parallel stochastic processes is utilized. Our interaction scheme enables exploration of several local neighborhoods of a model space simultaneously, while it prevents the absorption of any particular process to a relatively inferior state. We illustrate the advantages of our method by an application to a classification model. In particular, we use an extensive bacterial database and compare our results with results obtained by different methods for the same data.
8.	Corander, Jukka, et al. (author) Bayesian unsupervised classification framework based on stochastic partitions of data and a parallel search strategy 2009 In: Advances in Data Analysis and Classification. - : Springer Berlin/Heidelberg. - 1862-5347 .- 1862-5355. ; 3:1, pp. 3-24 Journal article (peer-reviewed) Abstract: Advantages of statistical model-based unsupervised classification over heuristic alternatives have been widely demonstrated in the scientific literature. However, the existing model-based approaches are often both conceptually and numerically unstable for large and complex data sets. Here we consider a Bayesian model-based method for unsupervised classification of discrete valued vectors, that has certain advantages over standard solutions based on latent class models. Our theoretical formulation defines a posterior probability measure on the space of classification solutions corresponding to stochastic partitions of observed data. To efficiently explore the classification space we use a parallel search strategy based on non-reversible stochastic processes. A decision-theoretic approach is utilized to formalize the inferential process in the context of unsupervised classification. Both real and simulated data sets are used for the illustration of the discussed methods.
9.	Corander, Jukka, et al. (author) Bayesian Unsupervised Learning of DNA Regulatory Binding Regions 2009 In: Advances in Artificial Intelligence. - : Hindawi Publishing Corporation. - 1687-7470 .- 1687-7489. ; , pp. 219743- Journal article (peer-reviewed) Abstract: Identification of regulatory binding motifs, that is, short specific words, within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information to build probabilistic a priori description of the motif classes. Examples of attempts to do probabilistic unsupervised learning about the number of putative de novo motif types and their positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that targets to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which equals a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.
10.	Corander, Jukka, et al. (author) Inductive Inference and Partition Exchangeability in Classification 2013 In: Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. - Berlin, Heidelberg : Springer Berlin/Heidelberg. ; , pp. 91-105 Conference paper (peer-reviewed) Abstract: Inductive inference has been a subject of intensive research efforts over several decades. In particular, for classification problems substantial advances have been made and the field has matured into a wide range of powerful approaches to inductive inference. However, a considerable challenge arises when deriving principles for an inductive supervised classifier in the presence of unpredictable or unanticipated events corresponding to unknown alphabets of observable features. Bayesian inductive theories based on de Finetti type exchangeability which have become popular in supervised classification do not apply to such problems. Here we derive an inductive supervised classifier based on partition exchangeability due to John Kingman. It is proven that, in contrast to classifiers based on de Finetti type exchangeability which can optimally handle test items independently of each other in the presence of infinite amounts of training data, a classifier based on partition exchangeability still continues to benefit from a joint prediction of labels for the whole population of test items. Some remarks about the relation of this work to generic convergence results in predictive inference are also given.
11.	Corander, Jukka, et al. (author) Learning Genetic Population Structures Using Minimization of Stochastic Complexity 2010 In: Entropy. - : MDPI AG. - 1099-4300. ; 12:5, pp. 1102-1124 Journal article (peer-reviewed) Abstract: Considerable research efforts have been devoted to probabilistic modeling of genetic population structures within the past decade. In particular, a wide spectrum of Bayesian models have been proposed for unlinked molecular marker data from diploid organisms. Here we derive a theoretical framework for learning genetic population structure of a haploid organism from bi-allelic markers for which potential patterns of dependence are a priori unknown and to be explicitly incorporated in the model. Our framework is based on the principle of minimizing stochastic complexity of an unsupervised classification under tree augmented factorization of the predictive data distribution. We discuss a fast implementation of the learning framework using deterministic algorithms.
12.	Corander, Jukka, 1971- (author) On Bayesian graphical model determination 2000 Doctoral thesis (other academic/artistic) Abstract: A graphical model specifies a graph representation of the independence structure of a multivariate distribution, where nodes represent variables and edges association between variables. This thesis introduces methodology for determination of graphical models for multivariate distributions within the exponential family. Model determination is understood in the present context as quantification of the uncertainty about the association structure, given empirical observations. Only models with symmetric associations between variables are considered. The distributions investigated are multinomial, multinormal and conditional Gaussian (CG) distributions. Local graphical models which generalize the graphical loglinear models for multinomial distributions are introduced. These models allow conditional associations to be absent locally, in parts of the sample space. A unifying theme is that the models are represented in terms of affine restrictions to the parameters of a regular exponential model. All introduced methods are applicable to the complete class of graphical models, consisting of both decomposable and non-decomposable models. Various real data sets investigated earlier in the graphical modeling literature are used to illustrate the methods. Two different measures of model uncertainty are considered: the posterior probability and the relative expected utility of a model. Posterior probabilities are estimated by a Markov chain Monte Carlo sampling method. The other measure of model uncertainty is derived in a decision theoretic framework under reference priors for the model parameters. The expected logarithmic utility of a model is decomposed into predictive performance and relative cost.
The predictive performance is measured by posterior expectation of the negative entropy of the distribution induced by a graphical model. This expectation has an analytic expression for decomposable models, while a simulation consistent estimate can be obtained for non-decomposable models. The expected logarithmic utility is asymptotically equivalent to the Schwarz criterion under a certain cost function.
13.	Corander, Jukka, et al. (author) Optimal Viterbi Bayesian predictive classification for data from finite alphabets 2013 In: Journal of Statistical Planning and Inference. - : Elsevier BV. - 0378-3758 .- 1873-1171. ; 143:2, pp. 261-275 Journal article (peer-reviewed) Abstract: A family of Viterbi Bayesian predictive classifiers has been recently popularized for speech recognition applications with continuous acoustic signals modeled by finite mixture densities embedded in a hidden Markov framework. Here we generalize such classifiers to sequentially observed data from multiple finite alphabets and derive the optimal predictive classifier under exchangeability of the emitted symbols. We demonstrate that the optimal predictive classifier which learns from unlabelled test items improves considerably upon the marginal maximum a posteriori rule in the presence of sparse training data. It is shown that the learning process saturates when the amount of test data tends to infinity, such that no further gain in classification accuracy is possible upon arrival of new test items in the long run.
14.	Corander, Jukka, et al. (author) Parallell interacting MCMC for learning of topologies of graphical models 2008 In: Data mining and knowledge discovery. - : Springer Science and Business Media LLC. - 1384-5810 .- 1573-756X. ; 17:3, pp. 431-456 Journal article (peer-reviewed) Abstract: Automated statistical learning of graphical models from data has attained a considerable degree of interest in the machine learning and related literature. Many authors have discussed and/or demonstrated the need for consistent stochastic search methods that would not be as prone to yield locally optimal model structures as simple greedy methods. However, at the same time most of the stochastic search methods are based on a standard Metropolis-Hastings theory that necessitates the use of relatively simple random proposals and prevents the utilization of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for a novel type of creativity in the design of search operators. Also, the parallel aspect of our method illustrates well the advantages of the adaptive nature of search operators to avoid trapping states in the vicinity of locally optimal network topologies.
15.	Corander, Jukka, et al. (author) Random partition models and exchangeability for Bayesian identification of population structure 2007 In: Bulletin of Mathematical Biology. - : Springer Science and Business Media LLC. - 0092-8240 .- 1522-9602. ; 69:3, pp. 797-815 Journal article (peer-reviewed) Abstract: We introduce a Bayesian theoretical formulation of the statistical learning problem concerning the genetic structure of populations. The two key concepts in our derivation are exchangeability in its various forms and random allocation models. Implications of our results to empirical investigation of the population structure are discussed.
16.	Ekdahl, Magnus, 1979- (author) Approximations of Bayes Classifiers for Statistical Learning of Clusters 2006 Licentiate thesis (other academic/artistic) Abstract: It is rarely possible to use an optimal classifier. Often the classifier used for a specific problem is an approximation of the optimal classifier. Methods are presented for evaluating the performance of an approximation in the model class of Bayesian Networks. Specifically for the approximation of class conditional independence a bound for the performance is sharpened. The class conditional independence approximation is connected to the minimum description length principle (MDL), which is connected to Jeffreys’ prior through commonly used assumptions. One algorithm for unsupervised classification is presented and compared against other unsupervised classifiers on three data sets.
17.	Flahou, Bram, et al. (author) Evidence for a primate origin of zoonotic Helicobacter suis colonizing domesticated pigs. 2018 In: The ISME journal. - : Springer Science and Business Media LLC. - 1751-7370 .- 1751-7362. ; 12:1, pp. 77-86 Journal article (peer-reviewed) Abstract: Helicobacter suis is the second most prevalent Helicobacter species in the stomach of humans suffering from gastric disease. This bacterium mainly inhabits the stomach of domesticated pigs, in which it causes gastric disease, but it appears to be absent in wild boars. Interestingly, it also colonizes the stomach of asymptomatic rhesus and cynomolgus monkeys. The origin of modern human-, pig- or non-human primate-associated H. suis strains in these respective host populations was hitherto unknown. Here we show that H. suis in pigs possibly originates from non-human primates. Our data suggest that a host jump from macaques to pigs happened between 100000 and 15000 years ago and that pig domestication has had a significant impact on the spread of H. suis in the pig population, from where this pathogen occasionally infects humans. Thus, in contrast to our expectations, H. suis appears to have evolved in its main host in a completely different way than its close relative Helicobacter pylori in humans.
18.	Franzén, Jessica, 1975- (author) Bayesian Cluster Analysis : Some Extensions to Non-standard Situations 2008 Doctoral thesis (other academic/artistic) Abstract: The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean and variance. The method produces posterior distributions of all cluster parameters and proportions as well as associated cluster probabilities for all objects. We extend this method in several directions to some common but non-standard situations. The first extension covers the case with a few deviant observations not belonging to one of the normal clusters. An extra component/cluster is created for them, which has a larger variance or a different distribution, e.g. is uniform over the whole range. The second extension is clustering of longitudinal data. All units are clustered at all time points separately and the movements between time points are modeled by Markov transition matrices. This means that the clustering at one time point will be affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response. We impute the missing values iteratively in an extra step of the Gibbs sampler estimation algorithm. The Bayesian inference of mixture models has many advantages over the classical approach. However, it is not without computational difficulties. A software package, written in Matlab, for Bayesian inference of mixture models is introduced. The programs of the package handle the basic cases of clustering data that are assumed to arise from mixture models of multivariate normal distributions, as well as the non-standard situations.
19.	Jääskinen, Väinö, et al. (author) Sparse Markov Chains for Sequence Data. Scandinavian Journal of Statistics 2014 In: Scandinavian Journal of Statistics. - New York : John Wiley & Sons, Inc.. - 0303-6898 .- 1467-9469. ; 41:3, pp. 639-655 Journal article (peer-reviewed) Abstract: Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.
20.	Koslicki, David, et al. (author) ARK : Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition 2015 In: PLOS ONE. - : PUBLIC LIBRARY SCIENCE. - 1932-6203. ; 10:10 Journal article (peer-reviewed) Abstract: Motivation: Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. Results: There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity.
Availability: An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
21.	Laajala, Teemu D, et al. (author) Improved statistical modeling of tumor growth and treatment effect in preclinical animal studies with highly heterogeneous responses in vivo. 2012 In: Clinical cancer research : an official journal of the American Association for Cancer Research. - 1078-0432. ; 18:16, pp. 4385-96 Journal article (peer-reviewed) Abstract: Preclinical tumor growth experiments often result in heterogeneous datasets that include growing, regressing, or stable growth profiles in the treatment and control groups. Such confounding intertumor variability may mask the true treatment effects especially when less aggressive treatment alternatives are being evaluated. Experimental design: We developed a statistical modeling approach in which the growing and poorly growing tumor categories were automatically detected by means of an expectation-maximization algorithm coupled within a mixed-effects modeling framework. The framework is implemented and distributed as an R package, which enables model estimation and statistical inference, as well as statistical power and precision analyses.
22.	Mageiros, Leonardos, et al. (author) Genome evolution and the emergence of pathogenicity in avian Escherichia coli 2021 In: Nature Communications. - : Springer Nature. - 2041-1723. ; 12:1 Journal article (peer-reviewed) Abstract: Chickens are the most common birds on Earth and colibacillosis is among the most common diseases affecting them. This major threat to animal welfare and safe sustainable food production is difficult to combat because the etiological agent, avian pathogenic Escherichia coli (APEC), emerges from ubiquitous commensal gut bacteria, with no single virulence gene present in all disease-causing isolates. Here, we address the underlying evolutionary mechanisms of extraintestinal spread and systemic infection in poultry. Combining population scale comparative genomics and pangenome-wide association studies, we compare E. coli from commensal carriage and systemic infections. We identify phylogroup-specific and species-wide genetic elements that are enriched in APEC, including pathogenicity-associated variation in 143 genes that have diverse functions, including genes involved in metabolism, lipopolysaccharide synthesis, heat shock response, antimicrobial resistance and toxicity. We find that horizontal gene transfer spreads pathogenicity elements, allowing divergent clones to cause infection. Finally, a Random Forest model prediction of disease status (carriage vs. disease) identifies pathogenic strains in the emergent ST-117 poultry-associated lineage with 73% accuracy, demonstrating the potential for early identification of emergent APEC in healthy flocks.
23.	Nettelblad, Carl, 1985- (author) Two Optimization Problems in Genetics : Multi-dimensional QTL Analysis and Haplotype Inference 2012 Doctoral thesis (other academic/artistic) Abstract: The existence of new technologies, implemented in efficient platforms and workflows, has made massive genotyping available to all fields of biology and medicine. Genetic analyses are no longer dominated by experimental work in laboratories, but rather the interpretation of the resulting data. When billions of data points representing thousands of individuals are available, efficient computational tools are required. The focus of this thesis is on developing models, methods and implementations for such tools. The first theme of the thesis is multi-dimensional scans for quantitative trait loci (QTL) in experimental crosses. By mating individuals from different lines, it is possible to gather data that can be used to pinpoint the genetic variation that influences specific traits to specific genome loci. However, it is natural to expect multiple genes influencing a single trait to interact. The thesis discusses model structure and model selection, giving new insight regarding under what conditions orthogonal models can be devised. The thesis also presents a new optimization method for efficiently and accurately locating QTL, and performing the permuted data searches needed for significance testing. This method has been implemented in a software package that can seamlessly perform the searches on grid computing infrastructures. The other theme in the thesis is the development of adapted optimization schemes for using hidden Markov models in tracing allele inheritance pathways, and specifically inferring haplotypes. The advances presented form the basis for more accurate and non-biased line origin probabilities in experimental crosses, especially multi-generational ones.
We show that the new tools are able to reconstruct haplotypes and even genotypes in founder individuals and offspring alike, based on only unordered offspring genotypes. The tools can also handle larger populations than competing methods, resolving inheritance pathways and phase in much larger and more complex populations. Finally, the methods presented are also applicable to datasets where individual relationships are not known, which is frequently the case in human genetics studies. One immediate application for this would be improved accuracy for imputation of SNP markers within genome-wide association studies (GWAS).
24.	Nyman, Henrik, et al. (author) Context-specific independence in graphical log-linear models 2016 In: Computational statistics (Zeitschrift). - : Springer. - 0943-4062 .- 1613-9658. ; 31:4, pp. 1493-1512 Journal article (peer-reviewed) Abstract: Log-linear models are the popular workhorses of analyzing contingency tables. A log-linear parameterization of an interaction model can be more expressive than a direct parameterization based on probabilities, leading to a powerful way of defining restrictions derived from marginal, conditional and context-specific independence. However, parameter estimation is often simpler under a direct parameterization, provided that the model enjoys certain decomposability properties. Here we introduce a cyclical projection algorithm for obtaining maximum likelihood estimates of log-linear parameters under an arbitrary context-specific graphical log-linear model, which need not satisfy criteria of decomposability. We illustrate that lifting the restriction of decomposability makes the models more expressive, such that additional context-specific independencies embedded in real data can be identified. It is also shown how a context-specific graphical model can correspond to a non-hierarchical log-linear parameterization with a concise interpretation. This observation can pave the way to further development of non-hierarchical log-linear models, which have been largely neglected due to their believed lack of interpretability.
25.	Nyman, Henrik, et al. (author) Stratified Graphical Models : Context-Specific Independence in Graphical Models 2014 In: BAYESIAN ANAL. - 1931-6690. ; 9:4, pp. 883-908 Journal article (peer-reviewed) Abstract: Theory of graphical models has matured over more than three decades to provide the backbone for several classes of models that are used in a myriad of applications such as genetic mapping of diseases, credit risk evaluation, reliability and computer security. Despite their generic applicability and wide adoption, the constraints imposed by undirected graphical models and Bayesian networks have also been recognized to be unnecessarily stringent under certain circumstances. This observation has led to the proposal of several generalizations that aim at more relaxed constraints by which the models can impose local or context-specific dependence structures. Here we consider an additional class of such models, termed stratified graphical models. We develop a method for Bayesian learning of these models by deriving an analytical expression for the marginal likelihood of data under a specific subclass of decomposable stratified models. A non-reversible Markov chain Monte Carlo approach is further used to identify models that are highly supported by the posterior distribution over the model space. Our method is illustrated and compared with ordinary graphical models through application to several real and synthetic datasets.
