2. Brändén, Petter, et al. (authors)
- Negative dependence in sampling
- 2012
- In: Scandinavian Journal of Statistics. Wiley. ISSN 0303-6898, 1467-9469; 39:4, pp. 830-838
- Journal article (peer-reviewed)
- Abstract: The strong Rayleigh property is a new and robust negative dependence property that implies negative association; in fact, it implies conditional negative association closed under external fields (CNA+). Suppose that {X_i} and {Y_i} are two families of 0-1 random variables that satisfy the strong Rayleigh property, and let Z_i = X_i Y_i. We show that {Z_i} conditioned on ∑_i Z_i is also strongly Rayleigh; this turns out to be an easy consequence of the results on preservation of stability of polynomials of Borcea & Brändén (Invent. Math., 177, 2009, 521-569). This entails that a number of important pps sampling algorithms, including Sampford sampling and Pareto sampling, are CNA+. As a consequence, statistics based on such samples automatically satisfy a version of the Central Limit Theorem for triangular arrays.
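For concreteness, below is a minimal sketch of Sampford's classical rejective procedure, one of the pps designs the paper shows to be CNA+. The function name and the toy inclusion probabilities are illustrative; the paper proves properties of the design itself rather than prescribing an implementation.

```python
import numpy as np

def sampford_sample(pi, rng=None):
    """Draw one fixed-size pps sample by Sampford's rejective procedure.

    pi : 1-D array of inclusion probabilities with 0 < pi_i < 1 and
         sum(pi) equal to the integer sample size n.
    Returns a sorted array of n distinct unit indices.
    """
    rng = np.random.default_rng(rng)
    pi = np.asarray(pi, dtype=float)
    n = int(round(pi.sum()))
    p_first = pi / n                      # first draw: prob. proportional to pi
    lam = pi / (1.0 - pi)
    lam /= lam.sum()                      # remaining draws: prob. prop. to pi/(1-pi)
    while True:
        first = rng.choice(len(pi), p=p_first)
        rest = rng.choice(len(pi), size=n - 1, replace=True, p=lam)
        sample = np.concatenate(([first], rest))
        if len(np.unique(sample)) == n:   # accept only if all n units are distinct
            return np.sort(sample)

# Example: population of 6 units, target sample size 3
pi = np.array([0.2, 0.4, 0.4, 0.6, 0.6, 0.8])
print(sampford_sample(pi, rng=1))
```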
3. Jonasson, Johan, 1966 (author)
- Slow mixing for Latent Dirichlet Allocation
- 2017
- In: Statistics & Probability Letters. Elsevier BV. ISSN 0167-7152; 129, pp. 96-100
- Journal article (peer-reviewed)
- Abstract: Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in probability theory in general and in machine learning in particular. A Markov chain is devised so that its stationary distribution is some probability distribution of interest. One then samples from the given distribution by running the Markov chain for a "long time", until it appears to be stationary, and then collects the sample. However, these chains are often very complex, and there are no theoretical guarantees that stationarity is actually reached. In this paper we study the Gibbs sampler of the posterior distribution of a very simple case of Latent Dirichlet Allocation, an attractive Bayesian unsupervised learning model for text generation and text classification. It turns out that in some situations the mixing time of the Gibbs sampler is exponential in the length of the documents, so it is practically impossible to properly sample from the posterior when documents are sufficiently long.
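As a reference point for the kind of chain being analysed, here is a minimal collapsed Gibbs sampler for LDA topic assignments. This is the standard textbook construction; the function name, hyperparameters, and toy corpus are illustrative, and the paper's slow-mixing phenomenon concerns much longer documents.

```python
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.1, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.

    docs : list of lists of word ids in [0, V); K topics.
    Returns per-document topic counts after `iters` full sweeps.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))        # document-topic counts
    nkw = np.zeros((K, V))                # topic-word counts
    nk = np.zeros(K)                      # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):        # initialise counts from random z
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]               # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional for z_{d,i}
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k               # resample and restore counts
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk

# Two short "documents" over a 4-word vocabulary, 2 topics
docs = [[0, 0, 1, 1], [2, 3, 3, 2]]
print(lda_gibbs(docs, V=4, K=2))
```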
4. Magnusson, M., et al. (authors)
- Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data
- 2020
- In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Aug 26-28, 2020, pp. 341-350. ISSN 2640-3498
- Conference paper (peer-reviewed)
- Abstract: Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful for model comparison. We propose an efficient method for estimating differences in predictive performance by combining fast approximate LOO surrogates with exact LOO subsampling using the difference estimator, and we supply proofs regarding its scaling characteristics. The resulting approach can be orders of magnitude more efficient than previous approaches, as well as being better suited to model comparison.
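The difference estimator at the core of this approach can be sketched in a few lines: use a cheap surrogate value for every observation and correct it with exact computations on a small random subsample. Everything below (names, the synthetic surrogate) is illustrative; see the paper for the actual LOO surrogates and the scaling proofs.

```python
import numpy as np

def difference_estimator(approx_all, idx, exact_sub):
    """Unbiased estimate of sum(exact values over all N units), given cheap
    surrogate values for all N units and exact values for a simple random
    subsample of size m (indices `idx`)."""
    N, m = len(approx_all), len(idx)
    correction = exact_sub - approx_all[idx]        # exact minus surrogate
    return approx_all.sum() + N / m * correction.sum()

# Synthetic check: the estimator recovers the exact total, on average
rng = np.random.default_rng(0)
exact = rng.normal(size=10_000)                     # e.g. exact pointwise LOO terms
approx = exact + rng.normal(scale=0.1, size=10_000) # cheap surrogate, small error
idx = rng.choice(10_000, size=100, replace=False)
print(difference_estimator(approx, idx, exact[idx]), exact.sum())
```

Because the surrogate is cheap but correlated with the exact values, the correction term has small variance, which is what makes the estimator efficient at small subsample sizes.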
5. Allerbo, Oskar, 1985, et al. (authors)
- Elastic Gradient Descent, an Iterative Optimization Method Approximating the Solution Paths of the Elastic Net
- 2023
- In: Journal of Machine Learning Research. ISSN 1533-7928, 1532-4435; 24, pp. 1-35
- Journal article (peer-reviewed)
- Abstract: The elastic net combines lasso and ridge regression to fuse the sparsity property of lasso with the grouping property of ridge regression. The connections between ridge regression and gradient descent and between lasso and forward stagewise regression have previously been shown. Similar to how the elastic net generalizes lasso and ridge regression, we introduce elastic gradient descent, a generalization of gradient descent and forward stagewise regression. We theoretically analyze elastic gradient descent and compare it to the elastic net and forward stagewise regression. Parts of the analysis are based on elastic gradient flow, a piecewise analytical construction, obtained for elastic gradient descent with infinitesimal step size. We also compare elastic gradient descent to the elastic net on real and simulated data and show that it provides similar solution paths, but is several orders of magnitude faster. Compared to forward stagewise regression, elastic gradient descent selects a model that, although still sparse, provides considerably lower prediction and estimation errors.
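The relationship described above can be illustrated with a toy update rule for least squares that interpolates between full gradient descent and coordinate-wise forward stagewise regression. This interpolation is a hypothetical sketch for intuition only, not the exact update rule of the paper's elastic gradient descent; the function name, mixing parameter `alpha`, and toy data are all assumptions.

```python
import numpy as np

def elastic_gd_sketch(X, y, alpha=0.5, lr=1e-3, steps=5000):
    """Toy interpolation: alpha=1 gives plain gradient descent, alpha=0
    updates only the most correlated coordinate, mimicking forward
    stagewise regression in the small-step limit."""
    beta = np.zeros(X.shape[1])
    path = [beta.copy()]
    for _ in range(steps):
        g = X.T @ (X @ beta - y) / len(y)   # least-squares gradient
        stagewise = np.zeros_like(g)
        j = np.argmax(np.abs(g))            # coordinate with largest gradient
        stagewise[j] = g[j]
        beta = beta - lr * (alpha * g + (1 - alpha) * stagewise)
        path.append(beta.copy())
    return np.array(path)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(100)
print(elastic_gd_sketch(X, y)[-1])   # coefficients approach the ground truth
```

Storing the whole iterate sequence (`path`) mirrors how such methods are used: the iterations themselves trace out a regularization path, analogous to the elastic net's solution paths.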
6. Imberg, Henrik, 1991, et al. (authors)
- Optimal subsampling designs
- 2023
- Journal article (other academic/artistic)
- Abstract: Subsampling is commonly used to overcome computational and economic bottlenecks in the analysis of finite populations and massive datasets. Existing methods are often limited in scope and use optimality criteria (e.g., A-optimality) with well-known deficiencies, such as lack of invariance to the measurement scale of the data and the parameterisation of the model. A unified theory of optimal subsampling design is still lacking. We present a theory of optimal design for general data subsampling problems, including finite population inference, parametric density estimation, and regression modelling. Our theory encompasses and generalises most existing methods in the field of optimal subdata selection based on unequal probability sampling and inverse probability weighting. We derive optimality conditions for a general class of optimality criteria, and present corresponding algorithms for finding optimal sampling schemes under Poisson and multinomial sampling designs. We present a novel class of transformation- and parameterisation-invariant linear optimality criteria which enjoy the best of both worlds: the computational tractability of A-optimality and invariance properties similar to D-optimality. The methodology is illustrated on an application in the traffic safety domain. In our experiments, the proposed invariant linear optimality criteria achieve 92-99% D-efficiency with 90-95% lower computational demand. In contrast, the A-optimality criterion has only 46% and 60% D-efficiency on two of the examples.
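The basic machinery the paper builds on, unequal-probability Poisson subsampling combined with inverse-probability weighting, can be sketched as follows. The score function here (squared row norms as a crude leverage proxy) and all names are illustrative assumptions; the paper's contribution lies in how the optimal inclusion probabilities are derived.

```python
import numpy as np

def poisson_subsample(scores, m, rng=None):
    """Poisson subsampling: include unit i independently with probability
    p_i proportional to scores[i] (capped at 1), expected sample size m.
    Returns selected indices and inverse-probability weights 1/p_i."""
    rng = np.random.default_rng(rng)
    p = np.minimum(1.0, m * scores / scores.sum())
    idx = np.flatnonzero(rng.random(len(scores)) < p)
    return idx, 1.0 / p[idx]

# Weighted least squares on the subsample approximates the full-data fit
rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(100_000)
scores = (X ** 2).sum(axis=1)              # crude leverage proxy, illustrative
idx, w = poisson_subsample(scores, m=1_000)
sw = np.sqrt(w)                            # sqrt weights for weighted lstsq
beta = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)[0]
print(beta)                                # close to [1, -2, 0.5]
```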
10. Imberg, Henrik, 1991, et al. (authors)
- Optimal sampling in unbiased active learning
- 2020
- In: Proceedings of Machine Learning Research. ISSN 2640-3498; 108, pp. 559-569
- Conference paper (peer-reviewed)
- Abstract: A common belief in unbiased active learning is that, in order to capture the most informative instances, the sampling probabilities should be proportional to the uncertainty of the class labels. We argue that this produces suboptimal predictions, present sampling schemes for unbiased pool-based active learning that minimise the actual prediction error, and demonstrate better predictive performance than competing methods on a number of benchmark datasets. In contrast, both probabilistic and deterministic uncertainty sampling performed worse than simple random sampling on some of the datasets.
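A minimal sketch of the unbiased pool-based setup: sample pool instances with some probabilities, then train with weights proportional to inverse sampling probability so that the weighted loss remains an unbiased estimate of the pool loss. The uncertainty-proportional probabilities below are the baseline the paper argues against; its own schemes instead derive probabilities that minimise the anticipated prediction error. All names and the toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 2))
y = (X[:, 0] + X[:, 1] + 0.3 * rng.standard_normal(5_000) > 0).astype(int)

# Pilot model fitted on a small labelled seed set
pilot = LogisticRegression().fit(X[:50], y[:50])

# Baseline scheme: sampling probabilities proportional to label uncertainty
p_hat = pilot.predict_proba(X)[:, 1]
probs = p_hat * (1 - p_hat) + 1e-6        # keep all probabilities positive
probs /= probs.sum()

# Sample with replacement; inverse-probability weights keep the weighted
# training loss unbiased for the full-pool loss
idx = rng.choice(len(X), size=300, replace=True, p=probs)
weights = 1.0 / (len(X) * probs[idx])
model = LogisticRegression().fit(X[idx], y[idx], sample_weight=weights)
print(model.score(X, y))
```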