SwePub - search: WFRF:(Koski Timo)

No.	Reference	Cover image	Find
1.	Armerin, Fredrik, 1971-, et al. (author) Forecasting Ranking in Harness Racing Using Probabilities Induced by Expected Positions 2019 In: Applied Artificial Intelligence. - : TAYLOR & FRANCIS INC. - 0883-9514 .- 1087-6545. ; 33:2, pp. 171-189 Journal article (peer-reviewed). Abstract: Ranked events are pivotal in many important AI applications such as Question Answering and recommendation systems. This paper studies ranked events in the setting of harness racing. For each horse there exists a probability distribution over its possible rankings. In the paper, it is shown that a set of expected positions (and more generally, higher moments) for the horses induces this probability distribution. The main contribution of the paper is a method, which extracts this induced probability distribution from a set of expected positions. An algorithm is proposed where the extraction of the induced distribution is given by the estimated expectations. MATLAB code is provided for the methodology. This approach gives freedom to model the horses in many different ways without the restrictions imposed by, for instance, logistic regression. To illustrate this point, we employ a neural network and ordinary ridge regression. The method is applied to predicting the distribution of the finishing positions for horses in harness racing. It outperforms both multinomial logistic regression and the market odds. The ease of use combined with fine results from the suggested approach constitutes a relevant addition to the increasingly important field of ranked events.
2.	Austin, Brian, et al. (author) Sliding window discretization : A new method for multiple band matching of bacterial genotyping fingerprints 2004 In: Bulletin of Mathematical Biology. - : Springer Science and Business Media LLC. - 0092-8240 .- 1522-9602. ; 66:6, pp. 1575-1596 Journal article (peer-reviewed). Abstract: Microbiologists have traditionally applied hierarchical clustering algorithms as their mathematical tool of choice to unravel the taxonomic relationships between micro-organisms. However, the interpretation of such hierarchical classifications suffers from being subjective, in that a variety of ad hoc choices must be made during their construction. On the other hand, the application of more profound and objective mathematical methods - such as the minimization of stochastic complexity - for the classification of bacterial genotyping fingerprint data is hampered by the prerequisite that such methods only act upon vectorized data. In this paper we introduce a new method, coined sliding window discretization, for the transformation of genotypic fingerprint patterns into binary vector format. In the context of an extensive amplified fragment length polymorphism (AFLP) data set of 507 strains from the Vibrionaceae family that has previously been analysed, we demonstrate by comparison with a number of other discretization methods that this new discretization method results in minimal loss of the original information content captured in the banding patterns. Finally, we investigate the implications of the different discretization methods on the classification of bacterial genotyping fingerprints by minimization of stochastic complexity, as it is implemented in the BinClass software package for probabilistic clustering of binary vectors. The new taxonomic insights learned from the resulting classification of the AFLP patterns will prove the value of combining sliding window discretization with minimization of stochastic complexity, as an alternative classification algorithm for bacterial genotyping fingerprints.
3.	Berglund, Daniel, et al. (author) Measures of Additive Interaction and Effect Direction Other publication (other academic/artistic). Abstract: Measures for additive interaction are defined using risk ratios. These ratios need to be modeled so that all combinations of the exposures are harmful, as the scale between protective and harmful factors differs. This remodeling is referred to as recoding. Previously, recoding has been thought of as random. In this paper, we will examine and discuss the impact of recoding in studies with small effect sizes, such as genome wide association studies, and the impact recoding has on significance testing.
4.	Berglund, Daniel (author) Models for Additive and Sufficient Cause Interaction 2019 Licentiate thesis (other academic/artistic). Abstract: The aim of this thesis is to develop and explore models in, and related to, the sufficient cause framework, and additive interaction. Additive interaction is closely connected with public health interventions and can be used to make inferences about the sufficient causes in order to find the mechanisms behind an outcome, for instance a disease. In paper A we extend the additive interaction, and interventions, to include continuous exposures. We show that there does not exist a model that does not lead to inconsistent conclusions about the interaction. The sufficient cause framework can also be expressed using Boolean functions, which is expanded upon in paper B. In this paper we define a new model based on the multifactor potential outcome model (MFPO) and independence of causal influence models (ICI). In paper C we discuss the modeling and estimation of additive interaction in relation to whether the exposures are harmful or protective conditional on some other exposure. If there is uncertainty about the direction of the effects, there can be errors in the testing of the interaction effect.
5.	Berglund, Daniel, et al. (author) On Probabilistic Multifactor Potential Outcome Models Other publication (other academic/artistic). Abstract: The sufficient cause framework describes how sets of sufficient causes are responsible for causing some event or outcome. It is known that it is closely connected with Boolean functions. In this paper we define this relation formally, and show how it can be used together with Fourier expansion of the Boolean functions to lead to new insights. The main result is a probabilistic version of the multifactor potential outcome model based on independence of causal influence models and Bayesian networks.
6.	Berglund, Daniel, et al. (author) On the Existence of Suitable Models for Additive Interaction with Continuous Exposures Other publication (other academic/artistic). Abstract: Additive interaction can be of importance for public health interventions and it is commonly defined using binary exposures. There have been expansions of the models to also include continuous exposures, which could lead to better and more precise estimations of the effect of interventions. In this paper we define the intervention for a continuous exposure as a monotonic function. Based on this function for the interventions we prove that there is no model for estimating additive interactions with continuous exposures for which it holds that: (i) both exposures have marginal effects and no additive interaction on the exposure level for both exposures, (ii) neither exposure has marginal effect and there is additive interaction between the exposures. We also show that a logistic regression model for continuous exposures will always produce additive interaction if both exposures have marginal effects.
7.	Corander, Jukka, et al. (author) A Bayesian random fragment insertion model for de novo detection of DNA regulatory binding regions 2007 Other publication (other academic/artistic). Abstract: Identification of regulatory binding motifs within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of statistical approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information to build probabilistic a priori description of the motif classes. No method has been previously proposed for finding the number of putative de novo motif types and their positions within a set of DNA sequences. As the number of sequenced genomes from a wide variety of organisms is constantly increasing, there is a clear need for such methods. Here we introduce a Bayesian unsupervised approach for this purpose by using recent advances in the theory of predictive classification and Markov chain Monte Carlo computation. Our modelling framework enables formal statistical inference in a large-scale sequence screening and we illustrate it by a set of examples.
8.	Corander, Jukka, et al. (author) A tribute to Mats Gyllenberg, on the occasion of his 60th birthday 2016 In: Journal of Mathematical Biology. - : Springer Science and Business Media LLC. - 0303-6812 .- 1432-1416. ; 72:4, pp. 793-795 Journal article (peer-reviewed)
9.	Corander, Jukka, et al. (author) Bayesian Block-Diagonal Predictive Classifier for Gaussian Data 2013 In: Synergies of Soft Computing and Statistics for Intelligent Data Analysis. - Berlin, Heidelberg : Springer Berlin/Heidelberg. - 9783642330414 - 9783642330421 ; , pp. 543-551 Book chapter (peer-reviewed). Abstract: The paper presents a method for constructing a Bayesian predictive classifier in a high-dimensional setting. Given that classes are represented by Gaussian distributions with block-structured covariance matrix, a closed form expression for the posterior predictive distribution of the data is established. Due to factorization of this distribution, the resulting Bayesian predictive and marginal classifier provides an efficient solution to the high-dimensional problem by splitting it into smaller tractable problems. In a simulation study we show that the suggested classifier outperforms several alternative algorithms such as linear discriminant analysis based on block-wise inverse covariance estimators and the shrunken centroids regularized discriminant analysis.
10.	Corander, Jukka, et al. (author) Bayesian model learning based on a parallel MCMC strategy 2006 In: Statistics and computing. - : Springer Science and Business Media LLC. - 0960-3174 .- 1573-1375. ; 16:4, pp. 355-362 Journal article (peer-reviewed). Abstract: We introduce a novel Markov chain Monte Carlo algorithm for estimation of posterior probabilities over discrete model spaces. Our learning approach is applicable to families of models for which the marginal likelihood can be analytically calculated, either exactly or approximately, given any fixed structure. It is argued that for certain model neighborhood structures, the ordinary reversible Metropolis-Hastings algorithm does not yield an appropriate solution to the estimation problem. Therefore, we develop an alternative, non-reversible algorithm which can avoid the scaling effect of the neighborhood. To efficiently explore a model space, a finite number of interacting parallel stochastic processes is utilized. Our interaction scheme enables exploration of several local neighborhoods of a model space simultaneously, while it prevents the absorption of any particular process to a relatively inferior state. We illustrate the advantages of our method by an application to a classification model. In particular, we use an extensive bacterial database and compare our results with results obtained by different methods for the same data.
11.	Corander, Jukka, et al. (author) Bayesian unsupervised classification framework based on stochastic partitions of data and a parallel search strategy 2009 In: Advances in Data Analysis and Classification. - : Springer Berlin/Heidelberg. - 1862-5347 .- 1862-5355. ; 3:1, pp. 3-24 Journal article (peer-reviewed). Abstract: Advantages of statistical model-based unsupervised classification over heuristic alternatives have been widely demonstrated in the scientific literature. However, the existing model-based approaches are often both conceptually and numerically unstable for large and complex data sets. Here we consider a Bayesian model-based method for unsupervised classification of discrete valued vectors that has certain advantages over standard solutions based on latent class models. Our theoretical formulation defines a posterior probability measure on the space of classification solutions corresponding to stochastic partitions of observed data. To efficiently explore the classification space we use a parallel search strategy based on non-reversible stochastic processes. A decision-theoretic approach is utilized to formalize the inferential process in the context of unsupervised classification. Both real and simulated data sets are used for the illustration of the discussed methods.
12.	Corander, Jukka, et al. (author) Bayesian Unsupervised Learning of DNA Regulatory Binding Regions 2009 In: Advances in Artificial Intelligence. - : Hindawi Publishing Corporation. - 1687-7470 .- 1687-7489. ; , p. 219743- Journal article (peer-reviewed). Abstract: Identification of regulatory binding motifs, that is, short specific words, within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information to build probabilistic a priori description of the motif classes. Examples of attempts to do probabilistic unsupervised learning about the number of putative de novo motif types and their positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that aims to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which equals a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.
13.	Corander, Jukka, 1965-, et al. (author) Have I seen you before? : Principles of Bayesian predictive classification revisited 2013 In: Statistics and computing. - : Springer Berlin/Heidelberg. - 0960-3174 .- 1573-1375. ; 23:1, pp. 59-73 Journal article (peer-reviewed). Abstract: A general inductive Bayesian classification framework is considered using a simultaneous predictive distribution for test items. We introduce a principle of generative supervised and semi-supervised classification based on marginalizing the joint posterior distribution of labels for all test items. The simultaneous and marginalized classifiers arise under different loss functions, while both acknowledge jointly all uncertainty about the labels of test items and the generating probability measures of the classes. We illustrate for data from multiple finite alphabets that such classifiers achieve higher correct classification rates than a standard marginal predictive classifier which labels all test items independently, when training data are sparse. In the supervised case for multiple finite alphabets the simultaneous and the marginal classifiers are proven to become equal under generalized exchangeability when the amount of training data increases. Hence, the marginal classifier can be interpreted as an asymptotic approximation to the simultaneous classifier for finite sets of training data. It is also shown that such convergence is not guaranteed in the semi-supervised setting, where the marginal classifier does not provide a consistent approximation.
14.	Corander, Jukka, et al. (author) Inductive Inference and Partition Exchangeability in Classification 2013 In: Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. - Berlin, Heidelberg : Springer Berlin/Heidelberg. ; , pp. 91-105 Conference paper (peer-reviewed). Abstract: Inductive inference has been a subject of intensive research efforts over several decades. In particular, for classification problems substantial advances have been made and the field has matured into a wide range of powerful approaches to inductive inference. However, a considerable challenge arises when deriving principles for an inductive supervised classifier in the presence of unpredictable or unanticipated events corresponding to unknown alphabets of observable features. Bayesian inductive theories based on de Finetti type exchangeability, which have become popular in supervised classification, do not apply to such problems. Here we derive an inductive supervised classifier based on partition exchangeability due to John Kingman. It is proven that, in contrast to classifiers based on de Finetti type exchangeability which can optimally handle test items independently of each other in the presence of infinite amounts of training data, a classifier based on partition exchangeability still continues to benefit from a joint prediction of labels for the whole population of test items. Some remarks about the relation of this work to generic convergence results in predictive inference are also given.
15.	Corander, Jukka, et al. (author) Learning Genetic Population Structures Using Minimization of Stochastic Complexity 2010 In: Entropy. - : MDPI AG. - 1099-4300. ; 12:5, pp. 1102-1124 Journal article (peer-reviewed). Abstract: Considerable research efforts have been devoted to probabilistic modeling of genetic population structures within the past decade. In particular, a wide spectrum of Bayesian models have been proposed for unlinked molecular marker data from diploid organisms. Here we derive a theoretical framework for learning genetic population structure of a haploid organism from bi-allelic markers for which potential patterns of dependence are a priori unknown and to be explicitly incorporated in the model. Our framework is based on the principle of minimizing stochastic complexity of an unsupervised classification under tree augmented factorization of the predictive data distribution. We discuss a fast implementation of the learning framework using deterministic algorithms.
16.	Corander, Jukka, et al. (author) Optimal Viterbi Bayesian predictive classification for data from finite alphabets 2013 In: Journal of Statistical Planning and Inference. - : Elsevier BV. - 0378-3758 .- 1873-1171. ; 143:2, pp. 261-275 Journal article (peer-reviewed). Abstract: A family of Viterbi Bayesian predictive classifiers has been recently popularized for speech recognition applications with continuous acoustic signals modeled by finite mixture densities embedded in a hidden Markov framework. Here we generalize such classifiers to sequentially observed data from multiple finite alphabets and derive the optimal predictive classifier under exchangeability of the emitted symbols. We demonstrate that the optimal predictive classifier which learns from unlabelled test items improves considerably upon the marginal maximum a posteriori rule in the presence of sparse training data. It is shown that the learning process saturates when the amount of test data tends to infinity, such that no further gain in classification accuracy is possible upon arrival of new test items in the long run.
17.	Corander, Jukka, et al. (author) Parallell interacting MCMC for learning of topologies of graphical models 2008 In: Data mining and knowledge discovery. - : Springer Science and Business Media LLC. - 1384-5810 .- 1573-756X. ; 17:3, pp. 431-456 Journal article (peer-reviewed). Abstract: Automated statistical learning of graphical models from data has attained a considerable degree of interest in the machine learning and related literature. Many authors have discussed and/or demonstrated the need for consistent stochastic search methods that would not be as prone to yield locally optimal model structures as simple greedy methods. However, at the same time most of the stochastic search methods are based on a standard Metropolis-Hastings theory that necessitates the use of relatively simple random proposals and prevents the utilization of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for a novel type of creativity in the design of search operators. Also, the parallel aspect of our method illustrates well the advantages of the adaptive nature of search operators to avoid trapping states in the vicinity of locally optimal network topologies.
18.	Corander, Jukka, et al. (author) Random partition models and exchangeability for Bayesian identification of population structure 2007 In: Bulletin of Mathematical Biology. - : Springer Science and Business Media LLC. - 0092-8240 .- 1522-9602. ; 69:3, pp. 797-815 Journal article (peer-reviewed). Abstract: We introduce a Bayesian theoretical formulation of the statistical learning problem concerning the genetic structure of populations. The two key concepts in our derivation are exchangeability in its various forms and random allocation models. Implications of our results for empirical investigation of the population structure are discussed.
19.	Cui, Y., et al. (author) Simultaneous Predictive Gaussian Classifiers 2016 In: Journal of Classification. - : Springer-Verlag New York. - 0176-4268 .- 1432-1343. ; , pp. 1-30 Journal article (peer-reviewed). Abstract: The Gaussian distribution has for several decades been ubiquitous in the theory and practice of statistical classification. Despite the early proposals motivating the use of predictive inference to design a classifier, this approach has gained relatively little attention apart from certain specific applications, such as speech recognition where its optimality has been widely acknowledged. Here we examine statistical properties of different inductive classification rules under a generic Gaussian model and demonstrate the optimality of considering simultaneous classification of multiple samples under an attractive loss function. It is shown that the simpler independent classification of samples leads asymptotically to the same optimal rule as the simultaneous classifier when the amount of training data increases, if the dimensionality of the feature space is bounded in an appropriate manner. Numerical investigations suggest that the simultaneous predictive classifier can lead to higher classification accuracy than the independent rule in the low-dimensional case, whereas the simultaneous approach suffers more from noise when the dimensionality increases.
20.	Dawyndt, Peter, et al. (author) A complementary approach to systematics 2005 In: Microbiology Today. - 1464-0570. ; :February, p. 38 Journal article (other academic/artistic)
21.	Ekdahl, Magnus, 1979- (author) Approximations of Bayes Classifiers for Statistical Learning of Clusters 2006 Licentiate thesis (other academic/artistic). Abstract: It is rarely possible to use an optimal classifier. Often the classifier used for a specific problem is an approximation of the optimal classifier. Methods are presented for evaluating the performance of an approximation in the model class of Bayesian Networks. Specifically, for the approximation of class conditional independence, a bound for the performance is sharpened. The class conditional independence approximation is connected to the minimum description length principle (MDL), which is connected to Jeffreys’ prior through commonly used assumptions. One algorithm for unsupervised classification is presented and compared against other unsupervised classifiers on three data sets.
22.	Ekdahl, Magnus, et al. (author) Bounds for the loss in probability of correct classification under model based approximation 2006 In: Journal of machine learning research. - 1532-4435 .- 1533-7928. ; 7, pp. 2449-2480 Journal article (peer-reviewed). Abstract: In many pattern recognition/classification problems the true class conditional model and class probabilities are approximated for reasons of reducing complexity and/or of statistical estimation. The approximated classifier is expected to have worse performance, here measured by the probability of correct classification. We present an analysis valid in general, and easily computable formulas for estimating the degradation in probability of correct classification when compared to the optimal classifier. An example of an approximation is the Naive Bayes classifier. We show that the performance of the Naive Bayes depends on the degree of functional dependence between the features and labels. We provide a sufficient condition for zero loss of performance, too.
23.	Ekdahl, Magnus, et al. (author) Concentrated or non-concentrated discrete distributions are almost independent 2007 Other publication (other academic/artistic). Abstract: The task of approximating a simultaneous distribution with a product of distributions in a single variable is important in the theory and applications of classification and learning, probabilistic reasoning, and random algorithms. The evaluation of the goodness of this approximation by statistical independence amounts to bounding uniformly upwards the difference between a joint distribution and the product of the distributions (marginals). In this paper we develop a bound that uses information about the most probable state to find a sharp estimate, which is often as sharp as possible. We also examine the extreme cases of concentration and non-concentration, respectively, of the approximated distribution.
24.	Ekdahl, Magnus, 1979-, et al. (author) On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers 2007 In: Machine Learning and Data Mining in Pattern Recognition. - Berlin, Heidelberg : Springer Berlin/Heidelberg. - 9783540734987 - 9783540734994 - 3540734988 ; , pp. 2-16 Book chapter (peer-reviewed). Abstract: Computational procedures using independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data generating mechanism by independence. We also restrict the scope of new theoretical results by showing explicitly that, depending on context, independent ('Naïve') classifiers can be as bad as tossing coins. Regardless of this, independence may beat the generating model in learning supervised classification and we explicitly provide one such scenario.
25.	Ekdahl, Magnus, 1979-, et al. (author) On the Performance of Approximations of Bayesian Networks in Model- 2006 In: The Annual Workshop of the Swedish Artificial Intelligence Society,2006. - Umeå : SAIS. ; , p. 73- Conference paper (peer-reviewed). Abstract: When the true class conditional model and class probabilities are approximated in a pattern recognition/classification problem the performance of the optimal classifier is expected to deteriorate. But calculating this reduction is far from trivial in the general case. We present one generalization, and easily computable formulas for estimating the degradation in performance with respect to the optimal classifier. An example of an approximation is the Naive Bayes classifier. We generalize and sharpen results for evaluating this classifier.
26.	Eriksson, Håkan B., et al. (author) A genie-aided detector with a probabilistic description of the side information 1995 In: Proceedings. - Piscataway, NJ : IEEE Communications Society. - 0780324536 ; , p. 332- Conference paper (peer-reviewed). Abstract: Building on Forney's concept of the genie (1972), and introducing the idea of an explicit statistical description of the side information provided to the genie-aided detector, we develop a generic tool for derivation of lower bounds on the bit-error rate of any actual receiver. With this approach, the side information statistics become design parameters, which may be chosen to give the resulting bound a desired structure. To illustrate this, we choose statistics in order to obtain a special case: the lower bound derived by Mazo (1975). The statistical description of the side information makes the lower bounding a transparent application of Bayesian theory.
27.	Favero, Martina, et al. (author) A dual process for the coupled Wright-Fisher diffusion In: Journal of Mathematical Biology. - 0303-6812 .- 1432-1416. Journal article (peer-reviewed). Abstract: The coupled Wright-Fisher diffusion is a multi-dimensional Wright-Fisher diffusion for multi-locus and multi-allelic genetic frequencies, expressed as the strong solution to a system of stochastic differential equations that are coupled in the drift, where the pairwise interaction among loci is modelled by an inter-locus selection. In this paper, an ancestral process, which is dual to the coupled Wright-Fisher diffusion, is derived. The dual process corresponds to the block counting process of coupled ancestral selection graphs, one for each locus. Jumps of the dual process arise from coalescence, mutation, single-branching, which occur at one locus at a time, and double-branching, which occur simultaneously at two loci. The coalescence and mutation rates have the typical structure of the transition rates of the Kingman coalescent process. The single-branching rate not only contains the one-locus selection parameters in a form that generalises the rates of an ancestral selection graph, but it also contains the two-locus selection parameters to include the effect of the pairwise interaction on the single loci. The double-branching rate reflects the particular structure of pairwise selection interactions of the coupled Wright-Fisher diffusion. Moreover, in the special case of two loci, two alleles, with selection and parent independent mutation, the stationary density for the coupled Wright-Fisher diffusion and the transition rates of the dual process are obtained in an explicit form.
28.	Favero, Martina, et al. (author) A dual process for the coupled Wright-Fisher diffusion 2021 In: Journal of Mathematical Biology. - : Springer Nature. - 0303-6812 .- 1432-1416. ; 82:1-2 Journal article (peer-reviewed). Abstract: The coupled Wright-Fisher diffusion is a multi-dimensional Wright-Fisher diffusion for multi-locus and multi-allelic genetic frequencies, expressed as the strong solution to a system of stochastic differential equations that are coupled in the drift, where the pairwise interaction among loci is modelled by an inter-locus selection. In this paper, an ancestral process, which is dual to the coupled Wright-Fisher diffusion, is derived. The dual process corresponds to the block counting process of coupled ancestral selection graphs, one for each locus. Jumps of the dual process arise from coalescence, mutation, single-branching, which occur at one locus at a time, and double-branching, which occur simultaneously at two loci. The coalescence and mutation rates have the typical structure of the transition rates of the Kingman coalescent process. The single-branching rate not only contains the one-locus selection parameters in a form that generalises the rates of an ancestral selection graph, but it also contains the two-locus selection parameters to include the effect of the pairwise interaction on the single loci. The double-branching rate reflects the particular structure of pairwise selection interactions of the coupled Wright-Fisher diffusion. Moreover, in the special case of two loci, two alleles, with selection and parent independent mutation, the stationary density for the coupled Wright-Fisher diffusion and the transition rates of the dual process are obtained in an explicit form.
29.	Garcia-Pareja, Celia, et al. (author) Exact Simulation of Coupled Wright-Fisher Diffusions 2021 In: Advances in Applied Probability. - : Cambridge University Press (CUP). - 0001-8678 .- 1475-6064. ; 53:4, pp. 923-950 Journal article (peer-reviewed). Abstract: In this paper an exact rejection algorithm for simulating paths of the coupled Wright-Fisher diffusion is introduced. The coupled Wright-Fisher diffusion is a family of multivariate Wright-Fisher diffusions that have drifts depending on each other through a coupling term and that find applications in the study of networks of interacting genes. The proposed rejection algorithm uses independent neutral Wright-Fisher diffusions as candidate proposals, which are only needed at a finite number of points. Once a candidate is accepted, the remainder of the path can be recovered by sampling from neutral multivariate Wright-Fisher bridges, for which an exact sampling strategy is also provided. Finally, the algorithm's complexity is derived and its performance demonstrated in a simulation study.
30.	Geilhufe, R. Matthias, et al. (author) Materials Informatics for Dark Matter Detection 2018 In: Physica Status Solidi. Rapid Research Letters. - : Wiley. - 1862-6254 .- 1862-6270. ; 12:11 Journal article (peer-reviewed). Abstract: Dark Matter particles are commonly assumed to be weakly interacting massive particles (WIMPs) with a mass in the GeV to TeV range. However, recent interest has shifted toward lighter WIMPs, which are more difficult to probe experimentally. A detection of sub-GeV WIMPs will require the use of small gap materials in sensors. Using recent estimates of the WIMP mass, we identify the relevant target space toward small gap materials (100 to 10 meV). Dirac Materials, a class of small- or zero-gap materials, emerge as natural candidates for sensors for Dark Matter detection. We propose the use of informatics tools to rapidly assay materials band structures to search for small gap semiconductors and semimetals, rather than focusing on a few preselected compounds. As a specific example of the proposed strategy, we use the organic materials database () to identify organic candidates for sensors: the narrow band gap semiconductors BNQ-TTF and DEBTTT with gaps of 40 and 38 meV, and the Dirac-line semimetal (BEDT-TTF)·Br, which exhibits a tiny gap of ≈ 50 meV when spin-orbit coupling is included. We outline a novel and powerful approach to search for dark matter detection sensor materials by means of a rapid assay of materials using informatics tools.
31.	Gyllenberg, M., et al. (author) Bayesian predictiveness, exchangeability and sufficientness in bacterial taxonomy 2002 In: Mathematical Biosciences. - 0025-5564 .- 1879-3134. ; 177-178, pp. 161-184 Conference paper (other academic/artistic). Abstract: We present a theory of classification and predictive identification of bacteria. Bacterial strains are characterized by a binary vector and the taxonomy is specified by attaching a label to each vector. The theory is developed from only two basic assumptions, viz. that the sequence of pairs of feature vectors and the attached labels is judged (infinitely) exchangeable and predictively sufficient. We derive expressions for the training error and the probability of identification error and show that the latter is an affine function of the former. We prove the law of large numbers for identification matrices, which contain the fundamental information of bacterial data. We prove the Bayesian risk consistency of the predictive identification rule given by the theory and show that the training error is a consistent estimate of the generalization error. © 2002 Published by Elsevier Science Inc.
32.	Gyllenberg, M, et al. (author) New methods for the analysis of binarized BIOLOG GN data of vibrio species : Minimization of stochastic complexity and cumulative classification 2002 In: Systematic and Applied Microbiology. - : Elsevier BV. - 0723-2020 .- 1618-0984. ; 25:3, pp. 403-415 Journal article (peer-reviewed). Abstract: We apply minimization of stochastic complexity and the closely related method of cumulative classification to analyse the extensively studied BIOLOG GN data of Vibrio spp. Minimization of stochastic complexity provides an objective tool of bacterial taxonomy as it produces classifications that are optimal from the point of view of information theory. We compare the outcome of our results with previously published classifications of the same data set. Our results both confirm earlier detected relationships between species and discover new ones.
33.	Gyllenberg, Mats, et al. (author) Non-uniqueness in probabilistic numerical identification of bacteria 1994 In: Journal of Applied Probability. - : Cambridge University Press (CUP). - 0021-9002 .- 1475-6072. ; 31:2, pp. 542-548 Journal article (peer-reviewed). Abstract: In this note we point out an inherent difficulty in numerical identification of bacteria. The problem is that of uniqueness of the taxonomic structure or, in mathematical terms, the lack of statistical identifiability of finite mixtures of multivariate Bernoulli probability distributions shown here.
34.	Gyllenberg, Mats, et al. (author) Non-uniqueness of numerical taxonomic structures 1993 In: Binary Computing in Microbiology. - 0266-304X. ; 5:4, pp. 138-144 Journal article (peer-reviewed). Abstract: The most important methods of numerical taxonomy in microbiology are based on so-called reference matrices giving the frequencies of positive binary features of the different taxa. Microbiologists seem to have been tacitly assuming that every well-defined classification method, that is, every algorithm for constructing a reference matrix from data, leads to a unique classification (reference matrix). We use a mathematical result, namely that a finite mixture of multivariate Bernoulli distributions is always unidentifiable, to disprove this assumption. We show that the same classification method applied to the same data can always lead to different classifications. This result is of importance for the foundations of computational microbial taxonomy. It is illustrated by simple examples from the two main methods of classification and identification: the one where classification is performed first and then followed by identification, and cumulative classification where classification and identification are carried out simultaneously. The consequences of the non-uniqueness result for microbiological practice are discussed.
35.	Gyllenberg, Mats, et al. (author) Null recurrence in a stochastic Ricker model 1994 In: Analysis, algebra, and computers in mathematical research. - New York : Marcel Dekker Incorporated. - 0824792173 ; , pp. 147-164 Conference paper (peer-reviewed). Abstract: We consider a nonlinear first order stochastic difference equation which may be viewed as a stochastic perturbation of W. E. Ricker's [J. Fish. Res. Bd. Can. 11, 559-623 (1954)] deterministic model of population growth. Numerical experiments seem to suggest that the corresponding Markov process has a stationary probability distribution, but this is shown to be false by proving that the process is in fact null recurrent.
36.	Gyllenberg, Mats, et al. (author) Population models with environmental stochasticity 1994 In: Journal of Mathematical Biology. - 0303-6812 .- 1432-1416. ; 32:2, pp. 93-108 Journal article (peer-reviewed). Abstract: Two discrete population models, one with stochasticity in the carrying capacity and one with stochasticity in the per capita growth rate, are investigated. Conditions under which the corresponding Markov processes are null recurrent and positively recurrent are derived.
37.	Gyllenberg, M, et al. (author) Probabilistic models for bacterial taxonomy 2001 In: International Statistical Review. - 0306-7734 .- 1751-5823. ; 69:2, pp. 249-276 Research review (peer-reviewed). Abstract: We give a survey of different partitioning methods that have been applied to bacterial taxonomy. We introduce a theoretical framework, which makes it possible to treat the various models in a unified way. The key concepts of our approach are prediction and storing of microbiological information in a Bayesian forecasting setting. We show that there is a close connection between classification and probabilistic identification and that, in fact, our approach ties these two concepts together in a coherent way.
38.	Hallgren, Jonas (author) Continuous time Graphical Models and Decomposition Sampling 2015 Licentiate thesis (other academic/artistic). Abstract: Two topics in temporal graphical probabilistic models are studied. The topics are treated in separate papers, both with applications in finance. The first paper studies inference in dynamic Bayesian networks using Monte Carlo methods. A new method for sampling random variables is proposed. The method divides the sample space into subspaces. This allows the sampling to be done in parallel with independent and distinct sampling methods on the subspaces. The methodology is demonstrated on a volatility model and some toy examples with promising results. The second paper treats probabilistic graphical models in continuous time, a class of models with the ability to express causality. Tools for inference in these models are developed and employed in the design of a causality measure. The framework is used to analyze tick-by-tick data from the foreign exchange market.
39.	Hallgren, Jonas, et al. (author) Decomposition Sampling Applied to Parallelization of Metropolis-Hastings Other publication (other academic/artistic)
40.	Hallgren, Jonas (author) Inference in Temporal Graphical Models 2016 Doctoral thesis (other academic/artistic). Abstract: This thesis develops mathematical tools used to model and forecast different economic phenomena. The primary starting point is the temporal graphical model. Four main topics, all with applications in finance, are studied. The first two papers develop inference methods for networks of continuous time Markov processes, so-called Continuous Time Bayesian Networks. Methodology for learning the structure of the network and for doing inference and simulation is developed. Further, models are developed for high frequency foreign exchange data. The third paper models growth of gross domestic product (GDP), which is observed at a very low frequency. This application is special and has several difficulties which are dealt with in a novel way using a framework developed in the paper. The framework is motivated using a temporal graphical model. The method is evaluated on US GDP growth with good results. The fourth paper studies inference in dynamic Bayesian networks using Monte Carlo methods. A new method for sampling random variables is proposed. The method divides the sample space into subspaces. This allows the sampling to be done in parallel with independent and distinct sampling methods on the subspaces. The methodology is demonstrated on a volatility model for stock prices and some toy examples with promising results. The fifth paper develops an algorithm for learning the full distribution in a harness race, a ranked event. It is demonstrated that the proposed methodology outperforms logistic regression, which is the main competitor. It also outperforms the market odds in terms of accuracy.
41.	Hallgren, Jonas, et al. (author) Testing for Causality in Continuous time Bayesian Network Models of High-Frequency Data Other publication (other academic/artistic)
42.	Janzura, M., et al. (author) Minimum entropy of error principle in estimation 1994 In: Information Sciences. - : Elsevier BV. - 0020-0255 .- 1872-6291. ; 79:1-2, pp. 123-144 Journal article (peer-reviewed). Abstract: The principle of minimum error entropy estimation as found in the work of Weidemann and Stear is reformulated as a problem of finding optimum locations of probability densities in a given mixture such that the resulting (differential) entropy is minimized. New results concerning the entropy lower bound are derived. Continuity of the entropy and attaining the minimum entropy are proved in the case where the mixture is finite. Some other examples and situations, in particular that of symmetric unimodal densities, are studied in more detail.
43.	Janzura, M., et al. (author) On the structure and applications of minimum entropy of error estimation for binary random variables 1994 In: Proceedings, abstracts and summaries - Joint Conference on Information Sciences : [joint meeting of First Annual Computer Theory & Informatics Workshop on Mobile Computing and Third Annual International Conference on Fuzzy Theory & Technology]. - Durham, NC : Duke University Press. ; , pp. 291-294 Conference paper (peer-reviewed)
44.	Jääskinen, Väinö, et al. (author) Sparse Markov Chains for Sequence Data 2014 In: Scandinavian Journal of Statistics. - New York : John Wiley & Sons, Inc.. - 0303-6898 .- 1467-9469. ; 41:3, pp. 639-655 Journal article (peer-reviewed). Abstract: Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.
45.	Kanerva, Anna, et al. (author) Case-control estimation of the impact of oncolytic adenovirus on the survival of patients with refractory solid tumors. 2015 In: Molecular Therapy. - : Elsevier BV. - 1525-0024 .- 1525-0016. ; 23:2, pp. 321-329 Journal article (peer-reviewed). Abstract: Oncolytic immunotherapy with cytokine armed replication competent viruses is an emerging approach in cancer treatment. In a recent randomized trial an increase in response rate was seen but the effect on overall survival is not known with any virus. To facilitate randomized trials, we performed a case-control study assessing the survival of 270 patients treated in an Advanced Therapy Access Program (ATAP), in comparison to matched concurrent controls from the same hospital. The overall survival of all virus treated patients was not increased over controls. However, when analysis was restricted to GMCSF-sensitive tumor types treated with GMCSF-coding viruses, a significant improvement in median survival was present (from 170 to 208 days, P = 0.0012, N=148). An even larger difference was seen when analysis was restricted to good performance score patients (193 versus 292 days, P = 0.034, N=90). The survival of ovarian cancer patients was especially promising as median survival nearly quadrupled (P = 0.0003, N=37). These preliminary data lend support to initiation of randomized clinical trials with GMCSF-coding oncolytic adenoviruses. Molecular Therapy (2014); doi:10.1038/mt.2014.218.
46.	Koski, Pasi, et al. (author) Sports Club for Health (SCforH) : updated guidelines for health-enhancing sports activities in a club setting 2017 Book (popular science, debate, etc.)
47.	Koski, Pasi, et al. (author) Sports Club for Health (SCforH) : uppdaterade riktlinjer för hälsofrämjande idrottsaktiviteter i föreningsmiljö 2017 Book (popular science, debate, etc.). Abstract: SCforH is an expert-based concept that helps sports clubs, as well as national and regional sports organisations, to recognise the potential health benefits of their sports disciplines and to organise health-enhancing sports activities as part of their operations.
48.	Koski, Timo, et al. (author) A Bayesian molecular interaction library 2003 In: Journal of Computer-Aided Molecular Design. - 0920-654X .- 1573-4951. ; 17:7, pp. 435-461 Journal article (peer-reviewed). Abstract: We describe a library of molecular fragments designed to model and predict non-bonded interactions between atoms. We apply the Bayesian approach, whereby prior knowledge and uncertainty of the mathematical model are incorporated into the estimated model and its parameters. The molecular interaction data are strengthened by narrowing the atom classification to 14 atom types, focusing on independent molecular contacts that lie within a short cutoff distance, and symmetrizing the interaction data for the molecular fragments. Furthermore, the locations of atoms in contact with a molecular fragment are modeled by Gaussian mixture densities whose maximum a posteriori estimates are obtained by applying a version of the expectation-maximization algorithm that incorporates hyperparameters for the components of the Gaussian mixtures. A routine is introduced providing the hyperparameters and the initial values of the parameters of the Gaussian mixture densities. A model selection criterion, based on the concept of a 'minimum message length', is used to automatically select the optimal complexity of a mixture model and the most suitable orientation of a reference frame for a fragment in a coordinate system. The type of atom interacting with a molecular fragment is predicted by values of the posterior probability function and the accuracy of these predictions is evaluated by comparing the predicted atom type with the actual atom type seen in crystal structures. The fact that an atom will simultaneously interact with several molecular fragments forming a cohesive network of interactions is exploited by introducing two strategies that combine the predictions of atom types given by multiple fragments. The accuracy of these combined predictions is compared with those based on an individual fragment. Exhaustive validation analyses and qualitative examples (e.g., the ligand-binding domain of glutamate receptors) demonstrate that these improvements lead to effective modeling and prediction of molecular interactions.
49.	Koski, Timo (author) A diffusion approximation for the quantization error in delta modulation with a Gauss-Markov signal 1990 In: Limit theorems in probability and statistics. - Amsterdam : Elsevier. - 0444987584 ; , pp. 305-325 Book chapter (other academic/artistic). Abstract: A stochastic differential equation (SDE) for approximation in the sense of weak convergence of the error process in delta modulation for a Gauss-Markov process is established. The pertinent SDE has been previously proposed by E. N. Protonotarios. Here we use some results about the representation of the encoded process in terms of a stochastic integral and the weak convergence methods of H. Kushner.
50.	Koski, Timo, et al. (author) A dissimilarity matrix between protein atom classes based on Gaussian mixtures 2002 In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 18:9, pp. 1257-1263 Journal article (peer-reviewed). Abstract: Motivation: Previously, Rantanen et al. (2001; J. Mol. Biol., 313, 197-214) constructed a protein atom-ligand fragment interaction library embodying experimentally solved, high-resolution three-dimensional (3D) structural data from the Protein Data Bank (PDB). The spatial locations of protein atoms that surround ligand fragments were modeled with Gaussian mixture models, the parameters of which were estimated with the expectation-maximization (EM) algorithm. In the validation analysis of this library, there was strong indication that the protein atom classification, 24 classes, was too large and that a reduction in the classes would lead to improved predictions. Results: Here, a dissimilarity (distance) matrix that is suitable for comparison and fusion of 24 pre-defined protein atom classes has been derived. Jeffreys' distances between Gaussian mixture models are used as a basis to estimate dissimilarities between protein atom classes. The dissimilarity data are analyzed both with a hierarchical clustering method and independently by using multidimensional scaling analysis. The results provide additional insight into the relationships between different protein atom classes, giving us guidance on, for example, how to readjust protein atom classification and, thus, they will help us to improve protein-ligand interaction predictions.
