SwePub
Search the SwePub database

Hit list for the search "L773:0167 9473 OR L773:1872 7352"

Search: L773:0167 9473 OR L773:1872 7352

  • Result 1-50 of 59
1.
  • Antoch, Jaromír, et al. (author)
  • Recursive robust regression : computational aspects and comparison
  • 1995
  • In: Computational Statistics & Data Analysis. - 0167-9473 .- 1872-7352. ; 19:2, s. 115-128
  • Journal article (peer-reviewed), abstract:
    • The main objective of this paper is to show how algorithms for classical robust regression can be modified for recursive evaluation. It is shown that such a modification can be utilized to increase the algorithmic efficiency for convex objective functions. However, for the non-convex ones it is demonstrated that recursion gives little help in finding the optimal solution.
  •  
2.
  • Eldén, Lars, 1944- (author)
  • Partial least-squares vs. Lanczos bidiagonalization—I : analysis of a projection method for multiple regression
  • 2004
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 46:1, s. 11-31
  • Journal article (peer-reviewed), abstract:
    • Multiple linear regression is considered and the partial least-squares method (PLS) for computing a projection onto a lower-dimensional subspace is analyzed. The equivalence of PLS to Lanczos bidiagonalization is a basic part of the analysis. Singular value analysis, Krylov subspaces, and shrinkage factors are used to explain why, in many cases, PLS gives a faster reduction of the residual than standard principal components regression. It is also shown why in some cases the dimension of the subspace, given by PLS, is not as small as desired.
  •  
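The PLS1/Lanczos equivalence analyzed above can be made concrete. A minimal sketch, assuming centered data and plain NumPy (the function name and the lack of reorthogonalization are illustrative choices, not the paper's code): k steps of Golub-Kahan bidiagonalization started from y, followed by the small projected least-squares solve, reproduce the k-component PLS1 coefficients.

```python
import numpy as np

def pls1_lanczos(X, y, k):
    """Sketch: k-component PLS1 coefficients via Golub-Kahan (Lanczos)
    bidiagonalization of X started from y. Assumes centered X, y and a
    small k; no reorthogonalization is performed."""
    beta1 = np.linalg.norm(y)
    u = y / beta1                      # first left Lanczos vector
    v = X.T @ u
    alpha = np.linalg.norm(v)
    v = v / alpha                      # first right Lanczos vector
    V = np.zeros((X.shape[1], k))      # orthonormal basis of the Krylov subspace
    B = np.zeros((k + 1, k))           # lower-bidiagonal projection of X
    rhs = np.zeros(k + 1)
    rhs[0] = beta1                     # y = beta1 * u1 in the left basis
    for i in range(k):
        V[:, i] = v
        B[i, i] = alpha
        u = X @ v - alpha * u
        b = np.linalg.norm(u)
        u = u / b
        B[i + 1, i] = b
        if i + 1 < k:
            v = X.T @ u - b * v
            alpha = np.linalg.norm(v)
            v = v / alpha
    z, *_ = np.linalg.lstsq(B, rhs, rcond=None)  # min ||rhs - B z||
    return V @ z                       # coefficients in the original space
```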
3.
  • Karlsson, Sune, 1960-, et al. (author)
  • Computationally efficient double bootstrap variance estimation
  • 2000
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 33:3, s. 237-247
  • Journal article (peer-reviewed), abstract:
    • The double bootstrap provides a useful tool for bootstrapping approximately pivotal quantities by using an "inner" bootstrap loop to estimate the variance. When the estimators are computationally intensive, the double bootstrap may become infeasible. We propose the use of a new variance estimator for the nonparametric bootstrap which effectively removes the requirement to perform the inner loop of the double bootstrap. Simulation results indicate that the proposed estimator produces bootstrap-t confidence intervals whose coverage accuracy replicates that of the standard double bootstrap.
  •  
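For orientation, the baseline the proposed variance estimator removes can be sketched as follows: the plain double bootstrap-t, with an inner loop of C resamples estimating the variance of each outer statistic. The statistic, sample size and resample counts are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def double_bootstrap_t(x, stat, B=499, C=100, alpha=0.05):
    """Plain double bootstrap-t interval: the inner loop of C resamples
    estimates the variance of each outer statistic (the O(B*C) step the
    paper's estimator replaces)."""
    theta = stat(x)
    n = len(x)
    outer = np.empty(B)
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        outer[b] = stat(xb)
        inner = [stat(rng.choice(xb, size=n, replace=True)) for _ in range(C)]
        t_star[b] = (outer[b] - theta) / np.std(inner, ddof=1)
    se = np.std(outer, ddof=1)
    q_hi, q_lo = np.quantile(t_star, [1 - alpha / 2, alpha / 2])
    return theta - q_hi * se, theta - q_lo * se

print(double_bootstrap_t(rng.exponential(size=50), np.mean))
```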
4.
  • Oke, Thimothy, et al. (author)
  • Small-sample properties of some tests for unit root with data-based choice of the degree of augmentation
  • 1999
  • In: Computational Statistics & Data Analysis. - 0167-9473 .- 1872-7352. ; 30:4, s. 457-469
  • Journal article (peer-reviewed), abstract:
    • In the augmented Dickey-Fuller (ADF) regression one usually decides on the level of the "augmentation" prior to performing the unit root test. This is a purely data-dependent method that uses either some information criterion or some sequential test of significance on parameter estimates. Contrary to earlier beliefs, our analyses reveal that the presence and/or absence of a drift and a time trend in the data generating process has a remarkable effect on the behaviour of the subsequent tests for unit root.
  •  
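A small illustration of the setting, assuming the statsmodels implementation of the ADF test: autolag performs the data-based choice of the augmentation level, and the regression argument switches the drift/trend specification whose influence the abstract highlights. The simulated random walk is only for demonstration.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))      # random walk, i.e. a true unit root

# Data-based augmentation: autolag="AIC" chooses the lag order; the
# regression argument sets the deterministic part: "n" = none (older
# statsmodels spells it "nc"), "c" = drift, "ct" = drift plus trend.
for spec in ("n", "c", "ct"):
    stat, pval, usedlag, *_ = adfuller(y, regression=spec, autolag="AIC")
    print(f"regression={spec:>2}: ADF={stat:6.2f}, p={pval:.3f}, lags={usedlag}")
```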
5.
  • Orre, R., et al. (author)
  • Bayesian neural networks with confidence estimations applied to data mining
  • 2000
  • In: Computational Statistics & Data Analysis. - 0167-9473 .- 1872-7352. ; 34:4, s. 473-493
  • Journal article (peer-reviewed), abstract:
    • An international database of case reports, each one describing a possible case of adverse drug reactions (ADRs), is maintained by the Uppsala Monitoring Centre (UMC), for the WHO international program on drug safety monitoring. Each report can be seen as a row in a data matrix and consists of a number of variables, like drugs used, ADRs, and other patient data. The problem is to examine the database and find significant dependencies which might be signals of potentially important ADRs, to be investigated by clinical experts. We propose a method by which estimated frequencies of combinations of variables are compared with the frequencies that would be predicted assuming there were no dependencies. The estimates of significance are obtained with a Bayesian approach via the variance of posterior probability distributions. The posterior is obtained by fusing a prior distribution (Dirichlet of dimension 2(n-1)) with a batch of data, which is also the prior used when the next batch of data arrives. To decide whether the joint probabilities of events are different from what would follow from the independence assumption, the information component log(P_ij/(P_i P_j)) plays a crucial role, and one main technical contribution reported here is an efficient method to estimate this measure, as well as the variance of its posterior distribution, for large data matrices. The method we present is fundamentally an artificial neural network denoted Bayesian confidence propagation neural network (BCPNN). We also demonstrate an efficient way of finding complex dependencies. The method is now (autumn 1998) being routinely used to produce warning signals on new unexpected ADR associations.
  •  
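The information component at the centre of the abstract is easy to sketch. A minimal version for a 0/1 report matrix, where a single pseudo-count alpha stands in for the paper's Dirichlet prior and the posterior variance behind the confidence estimates is omitted:

```python
import numpy as np

def information_component(reports, i, j, alpha=1.0):
    """Sketch of IC = log2(P_ij / (P_i * P_j)) for columns i (a drug) and
    j (an ADR) of a 0/1 report matrix. The pseudo-count alpha is a crude
    stand-in for the paper's Dirichlet prior; the posterior variance used
    for the confidence estimates is not computed here."""
    n = reports.shape[0]
    p_i = (reports[:, i].sum() + alpha) / (n + 2 * alpha)
    p_j = (reports[:, j].sum() + alpha) / (n + 2 * alpha)
    p_ij = ((reports[:, i] * reports[:, j]).sum() + alpha) / (n + 4 * alpha)
    return np.log2(p_ij / (p_i * p_j))

reports = np.random.default_rng(4).integers(0, 2, size=(1000, 8))
print(information_component(reports, 0, 1))   # near 0 for independent columns
```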
6.
  • Abramowicz, Konrad, 1983-, et al. (author)
  • Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data
  • 2022
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 177
  • Journal article (peer-reviewed), abstract:
    • Nonparametric bagging clustering methods are studied and compared to identify latent structures from a sequence of dependent categorical data observed along a one-dimensional (discrete) time domain. The frequency of the observed categories is assumed to be generated by a (slowly varying) latent signal, according to latent state-specific probability distributions. The bagging clustering methods use random tessellations (partitions) of the time domain and clustering of the category frequencies of the observed data in the tessellation cells to recover the latent signal, within a bagging framework. New and existing ways of generating the tessellations and clustering are discussed and combined into different bagging clustering methods. Edge tessellations and adaptive tessellations are the new proposed ways of forming partitions. Composite methods are also introduced, which use (automated) decision rules based on entropy measures to choose among the proposed bagging clustering methods. The performance of all the methods is compared in a simulation study. From the simulation study it can be concluded that local and global entropy measures are powerful tools in improving the recovery of the latent signal, both via the adaptive tessellation strategies (local entropy) and in designing composite methods (global entropy). The composite methods are robust and overall improve performance, in particular the composite method using adaptive (edge) tessellations.
  •  
7.
  • Ahmad, M. Rauf (author)
  • A significance test of the RV coefficient in high dimensions
  • 2019
  • In: Computational Statistics & Data Analysis. - : Elsevier Science BV. - 0167-9473 .- 1872-7352. ; 131, s. 116-130
  • Journal article (peer-reviewed), abstract:
    • The RV coefficient is an important measure of linear dependence between two multivariate data vectors. Using unbiased and computationally efficient estimators of its components, a modification to the RV coefficient is proposed, and used to construct a test of significance for the true coefficient. The modified estimator improves the accuracy of the original and, along with the test, can be applied to data with arbitrarily large dimensions, possibly exceeding the sample size, and the underlying distribution need only have finite fourth moment. Exact and asymptotic properties are studied under fairly general conditions. The accuracy of the modified estimator and the test is shown through simulations under a variety of parameter settings. In comparisons against several existing methods, both the proposed estimator and the test exhibit similar performance to the distance correlation. Several real data applications are also provided.
  •  
8.
  • Andersson, Björn, et al. (author)
  • Fast estimation of multiple group generalized linear latent variable models for categorical observed variables
  • 2023
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 182
  • Journal article (peer-reviewed), abstract:
    • A computationally efficient method for marginal maximum likelihood estimation of multiple group generalized linear latent variable models for categorical data is introduced. The approach utilizes second-order Laplace approximations of the integrals in the likelihood function. It is demonstrated how second-order Laplace approximations can be utilized highly efficiently for generalized linear latent variable models by considering symmetries that exist for many types of model structures. In a simulation with binary observed variables and four correlated latent variables in four groups, the method has similar bias and mean squared error compared to adaptive Gauss-Hermite quadrature with five quadrature points while substantially improving computational efficiency. An empirical example from a large-scale educational assessment illustrates the accuracy and computational efficiency of the method when compared against adaptive Gauss-Hermite quadrature with three, five, and 13 quadrature points.
  •  
9.
  •  
10.
  • Broström, Göran, 1942-, et al. (author)
  • Generalized linear models with clustered data : fixed and random effects models
  • 2011
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 55:12, s. 3123-3134
  • Journal article (peer-reviewed), abstract:
    • The statistical analysis of mixed effects models for binary and count data is investigated. In the statistical computing environment R, there are a few packages that estimate models of this kind. The package lme4 is a de facto standard for mixed effects models. The package glmmML allows non-normal distributions in the specification of random intercepts. It also allows for the estimation of a fixed effects model, assuming that all cluster intercepts are distinct fixed parameters; moreover, a bootstrapping technique is implemented to replace asymptotic analysis. The random intercepts model is fitted using a maximum likelihood estimator with adaptive Gauss–Hermite and Laplace quadrature approximations of the likelihood function. The fixed effects model is fitted through a profiling approach, which is necessary when the number of clusters is large. In a simulation study, the two approaches are compared. The fixed effects model has severe bias when the mixed effects variance is positive and the number of clusters is large.
  •  
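To make the quadrature step concrete, here is a minimal sketch of the marginal likelihood contribution of one cluster in a random-intercept logistic model, with the normal intercept integrated out by (non-adaptive) Gauss-Hermite quadrature. It shows the kind of integral glmmML and lme4 approximate; the function is an illustrative stand-in, not code from either package.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit

def cluster_loglik(y, X, beta, sigma, nodes=15):
    """Log of the integral of p(y | u) against N(0, sigma^2), for one
    cluster of a random-intercept logistic model, by Gauss-Hermite
    quadrature. (Adaptive versions recenter the nodes per cluster.)"""
    xk, wk = hermgauss(nodes)               # rule for the weight exp(-t^2)
    u = np.sqrt(2.0) * sigma * xk           # change of variables to N(0, sigma^2)
    eta = (X @ beta)[:, None] + u[None, :]  # linear predictor at every node
    p = expit(eta)
    lik = np.prod(np.where(y[:, None] == 1, p, 1 - p), axis=0)
    return np.log(np.sum(wk * lik) / np.sqrt(np.pi))

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 2))
y = rng.integers(0, 2, size=8)
print(cluster_loglik(y, X, beta=np.array([0.5, -0.2]), sigma=1.0))
```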
11.
  • Cronie, Ottmar, 1979, et al. (author)
  • Some edge correction methods for marked spatio-temporal point process models
  • 2011
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 55:7, s. 2209-2220
  • Journal article (peer-reviewed), abstract:
    • Three edge correction methods for (marked) spatio-temporal point processes are proposed. They are all based on the idea of placing an approximated expected behaviour of the process at hand (simulated realisations) outside the study region which interacts with the data during the estimation. These methods are applied to the so-called growth-interaction model. The specific choices of growth function and interaction function made are purely motivated by the forestry applications considered. The parameters of the growth and interaction functions, i.e. the parameters related to the development of the marks, are estimated using the least-squares approach together with the proposed edge corrections. Finally, the edge corrected estimation methods are applied to a data set of Swedish Scots pine.
  •  
12.
  • De Gooijer, Jan G., et al. (author)
  • Kernel-based hidden Markov conditional densities
  • 2022
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 169
  • Journal article (peer-reviewed), abstract:
    • A natural way to obtain conditional density estimates for time series processes is to adopt a kernel-based (nonparametric) conditional density estimation (KCDE) method. To this end, the data generating process is commonly assumed to be Markovian of finite order. Markov processes, however, have limited memory range so that only the most recent observations are informative for estimating future observations, assuming the underlying model is known. Hidden Markov models (HMMs), on the other hand, can integrate information over arbitrary lengths of time and thus describe a wider variety of data generating processes. The KCDE and HMMs are combined into one method. The resulting KCDE-HMM method is described in detail, and an iterative algorithm is presented for estimating its transition probabilities, weights and bandwidths. Consistency and asymptotic normality of the resulting conditional density estimator are proved. The conditional forecast ability of the proposed conditional density method is examined and compared via a rolling forecasting window with three benchmark methods: HMM, autoregressive HMM, and KCDE-MM. Large-sample performance of the above conditional estimation methods as a function of training data size is explored. Finally, the methods are applied to the U.S. Industrial Production series and the S&P 500 index. The results indicate that KCDE-HMM outperforms the benchmark methods for moderate-to-large sample sizes, irrespective of the number of hidden states considered.
  •  
13.
  • Edlund, Ove, et al. (author)
  • Computing the constrained M-estimates for regression
  • 2005
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 49:1, s. 19-32
  • Journal article (peer-reviewed), abstract:
    • Constrained M-estimates for regression have been previously proposed as an alternative class of robust regression estimators with high breakdown point and high asymptotic efficiency. These are closely related to S-estimates, and it is shown that in some cases they will necessarily coincide. It has been difficult to use the CM-estimators in practice for two reasons. Adequate computational methods have been lacking and there has also been some confusion concerning the tuning parameters. Both of these problems are addressed; an updated table for choice of suitable parameter value is given, and an algorithm to compute CM-estimates for regression is presented. It can also be used to compute S-estimates. The computational problem to be solved is global optimization with an inequality constraint. The algorithm consists of two phases. The first phase is finding suitable starting values for the local optimization. The second phase, the efficient finding of a local minimum, is described in more detail. There is a MATLAB code generally available from the net. A Monte Carlo simulation is performed, using this code, to test the performance of the estimator as well as the algorithm.
  •  
14.
  • Elezovic, Suad, 1965- (author)
  • Functional modelling of volatility in the Swedish limit order book market
  • 2008
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352.
  • Journal article (peer-reviewed), abstract:
    • The publicly available electronic limit order book at the Stockholm Stock Exchange consists of five levels of prices and quantities of a given stock with a bid and ask side. All changes in the book during one day can be recorded with a time quote. Studying variation of the quoted price returns as a function of quantity is discussed. In particular, discovering and modelling dynamic behaviours in the volatility of prices and liquidity measures are considered. Applying a functional approach, estimation of the volatility dynamics of the spreads, created as differences between the ask and bid prices, is presented through a case study. For that purpose, two-step estimation of functional linear models is used, extending this method to a time series context.
  •  
15.
  • Ghorbani, Mohammad, et al. (author)
  • Testing the first-order separability hypothesis for spatio-temporal point patterns
  • 2021
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 161
  • Journal article (peer-reviewed), abstract:
    • First-order separability of a spatio-temporal point process plays a fundamental role in the analysis of spatio-temporal point pattern data. While it is often a convenient assumption that simplifies the analysis greatly, existing non-separable structures should be accounted for in the model construction. Three different tests are proposed to investigate this hypothesis as a step of preliminary data analysis. The first two tests are exact or asymptotically exact for Poisson processes. The first test, based on permutations and global envelopes, allows one to detect at which spatial and temporal locations or lags the data deviate from the null hypothesis. The second test is a simple and computationally cheap χ²-test. The third test is based on a stochastic reconstruction method and can be generally applied for non-Poisson processes. The performance of the first two tests is studied in a simulation study for Poisson and non-Poisson models. The third test is applied to real data from the UK 2001 foot-and-mouth disease epidemic.
  •  
16.
  • Hartley, Roger, et al. (author)
  • Heterogeneous demand responses to discrete price changes : an application to the purchase of lottery tickets
  • 2006
  • In: Computational Statistics & Data Analysis. - : Association for Computing Machinery. - 0167-9473 .- 1872-7352. ; 50:3, s. 859-877
  • Journal article (peer-reviewed), abstract:
    • During the survey period of any household expenditure survey price variations may occur. Such variation can be used to identify heterogeneous demand responses to price changes. This is feasible because expenditure surveys usually contain a large number of observations. The principal difficulty for estimation arises because of the sampling process which generates the data. An estimable model of individual purchase is presented and is estimated in the case of the demand for lottery tickets. This model allows identification of heterogeneous responses to changes in the rollover state (i.e. whether last week's jackpot has been added to the current jackpot). The distribution of heterogeneous responses, in the form of a bivariate mixing distribution, is shown to be identified from the available data. The EM algorithm is used to estimate the parameters of the mixing distribution. The estimates imply that there is substantial heterogeneity in the population both in the normal expenditure levels and in the reaction to a jackpot rolled over.
  •  
17.
  • Huang, Xiaolin, et al. (author)
  • Asymmetric nu-tube support vector regression
  • 2014
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 77, s. 371-382
  • Journal article (peer-reviewed), abstract:
    • Finding a tube of small width that covers a certain percentage of the training data samples is a robust way to estimate a location: the values of the data samples falling outside the tube have no direct influence on the estimate. The well-known nu-tube Support Vector Regression (nu-SVR) is an effective method for implementing this idea in the context of covariates. However, the nu-SVR considers only one possible location of this tube: it imposes that the numbers of data samples above and below the tube are equal. The method is generalized such that those outliers can be divided asymmetrically over both regions. This extension gives an effective way to deal with skewed noise in regression problems. Numerical experiments illustrate the computational efficacy of this extension to the nu-SVR.
  •  
18.
  • Häggström, Jenny (author)
  • Bandwidth selection for backfitting estimation of semiparametric additive models : a simulation study
  • 2013
  • In: Computational Statistics & Data Analysis. - Amsterdam : Elsevier. - 0167-9473 .- 1872-7352. ; 62, s. 136-148
  • Journal article (peer-reviewed), abstract:
    • A data-driven bandwidth selection method for backfitting estimation of semiparametric additive models, when the parametric part is of main interest, is proposed. The proposed method is a double smoothing estimator of the mean-squared error of the backfitting estimator of the parametric terms. The performance of the proposed method is evaluated and compared with existing bandwidth selectors by means of a simulation study.
  •  
19.
  • Höök, Lars Josef, et al. (author)
  • Efficient computation of the quasi likelihood function for discretely observed diffusion processes
  • 2016
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 103, s. 426-437
  • Journal article (peer-reviewed), abstract:
    • An efficient numerical method for nearly simultaneous computation of all conditional moments needed for quasi maximum likelihood estimation of parameters in discretely observed stochastic differential equations is presented. The method is not restricted to any particular dynamics of the stochastic differential equation and is virtually insensitive to the sampling interval. The key contribution is that computational complexity is sublinear in terms of expensive operations in the number of observations as all moments can be computed offline in a single operation. Simulations show that the bias of the method is small compared to the random error in the estimates, and to the bias of comparable methods. Furthermore, the computational cost is comparable to (and for moderate and large data sets actually lower than) that of the simple, but in some applications badly biased, Euler–Maruyama approximation.
  •  
20.
  • Jin, Shaobo, 1987-, et al. (author)
  • Standard error estimates in hierarchical generalized linear models
  • 2024
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 189
  • Journal article (peer-reviewed), abstract:
    • Hierarchical generalized linear models are often used to fit random effects models. However, attention is mostly paid to the estimation of fixed unknown parameters and inference for latent random effects. In contrast, standard error estimators receive less attention than they should. Currently, the standard error estimators are based on various approximations, even when the mean parameters may be estimated from a higher-order approximation of the likelihood and the dispersion parameters are estimated by restricted maximum likelihood. Existing standard error estimation procedures are reviewed. A numerical illustration shows that the current standard errors are not necessarily accurate. Alternative standard errors are also proposed. In particular, a sandwich estimator that accounts for the dependence between the mean parameters and the dispersion parameters greatly improves the current standard errors.
  •  
21.
  • Katsikatsou, Myrsini, 1981-, et al. (author)
  • Pairwise likelihood estimation for factor analysis models with ordinal data
  • 2012
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 56:12, s. 4243-4258
  • Journal article (peer-reviewed), abstract:
    • A pairwise maximum likelihood (PML) estimation method is developed for factor analysis models with ordinal data and fitted both in an exploratory and confirmatory set-up. The performance of the method is studied via simulations and comparisons with full information maximum likelihood (FIML) and three-stage limited information estimation methods, namely the robust unweighted least squares (3S-RULS) and robust diagonally weighted least squares (3S-RDWLS). The advantage of PML over FIML is mainly computational. Unlike PML estimation, the computational complexity of FIML estimation increases either with the number of factors or with the number of observed variables depending on the model formulation. Contrary to 3S-RULS and 3S-RDWLS estimation, PML estimates of all model parameters are obtained simultaneously and the PML method does not require the estimation of a weight matrix for the computation of correct standard errors. The simulation study on the performance of PML estimates and estimated asymptotic standard errors investigates the effect of different model and sample sizes. The bias and mean squared error of PML estimates and their standard errors are found to be small in all experimental conditions and decreasing with increasing sample size. Moreover, the PML estimates and their standard errors are found to be very close to those of FIML.
  •  
22.
  • Larsson, Rolf (author)
  • A likelihood ratio type test for invertibility in moving average processes
  • 2014
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 76:SI, s. 489-501
  • Journal article (peer-reviewed), abstract:
    • A new test for invertibility of moving average processes is proposed. The test is based on an explicit local approximation of the likelihood ratio. A simulation study compares the power with two previously suggested tests: a score type test and a numerical likelihood ratio test. Local to the null of noninvertibility, the proposed test is seen to have better power properties than the score type test and its power is only slightly below that of the numerical likelihood ratio test. Moreover, the test is extended to an ARMA(p, 1) framework, by using it on the estimated residuals of a fitted AR(p) model. A simulation study for ARMA(1, 1) shows that when varying the AR parameter, the test has better size properties than the score type test.
  •  
23.
  •  
24.
  • Nguyen, Hoang, 1989-, et al. (author)
  • Variational inference for high dimensional structured factor copulas
  • 2020
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 151
  • Journal article (peer-reviewed), abstract:
    • Factor copula models have been recently proposed for describing the joint distribution of a large number of variables in terms of a few common latent factors. A Bayesian procedure is employed in order to make fast inferences for multi-factor and structured factor copulas. To deal with the high dimensional structure, a Variational Inference (VI) algorithm is applied to estimate different specifications of factor copula models. Compared to the Markov Chain Monte Carlo (MCMC) approach, the variational approximation is much faster and could handle a sizeable problem in limited time. Another issue of factor copula models is that the bivariate copula functions connecting the variables are unknown in high dimensions. An automatic procedure is derived to recover the hidden dependence structure. By taking advantage of the posterior modes of the latent variables, the bivariate copula functions are selected by minimizing the Bayesian Information Criterion (BIC). Simulation studies in different contexts show that the procedure of bivariate copula selection could be very accurate in comparison to the true generated copula model. The proposed procedure is illustrated with two high dimensional real data sets.
  •  
25.
  • Palm, Bruna, et al. (author)
  • 2-D Rayleigh autoregressive moving average model for SAR image modeling
  • 2022
  • In: Computational Statistics & Data Analysis. - : Elsevier B.V.. - 0167-9473 .- 1872-7352. ; 171
  • Journal article (peer-reviewed), abstract:
    • Two-dimensional (2-D) autoregressive moving average (ARMA) models are commonly applied to describe real-world image data, usually assuming Gaussian or symmetric noise. However, real-world data often present non-Gaussian signals, with asymmetrical distributions and strictly positive values. In particular, SAR images are known to be well characterized by the Rayleigh distribution. In this context, the ARMA model tailored for 2-D Rayleigh-distributed data is introduced—the 2-D RARMA model. The 2-D RARMA model is derived and conditional likelihood inferences are discussed. The proposed model was submitted to extensive Monte Carlo simulations to evaluate the performance of the conditional maximum likelihood estimators. Moreover, in the context of SAR image processing, two comprehensive numerical experiments were performed comparing anomaly detection and image modeling results of the proposed model with traditional 2-D ARMA models and competing methods in the literature.
  •  
26.
  • Persson, Emma, 1981-, et al. (author)
  • Data-driven algorithms for dimension reduction in causal inference
  • 2017
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 105, s. 280-292
  • Journal article (peer-reviewed), abstract:
    • In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. In this paper, data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness we search for minimal subsets of the covariate vector. Based on the framework of sufficient dimension reduction or kernel smoothing, the algorithms perform a backward elimination procedure testing the significance of each covariate. Their performance is evaluated in simulations and an application using data from the Swedish Childhood Diabetes Register is also presented.
  •  
27.
  • Schelin, Lina, 1980-, et al. (author)
  • Spatial prediction in the presence of left-censoring
  • 2014
  • In: Computational Statistics & Data Analysis. - : Elsevier. - 0167-9473 .- 1872-7352. ; 74, s. 125-141
  • Journal article (other academic/artistic), abstract:
    • Environmental (spatial) monitoring of different variables often involves left-censored observations falling below the minimum detection limit (MDL) of the instruments used to quantify them. Several methods to predict the variables at new locations given left-censored observations of a stationary spatial process are compared. The methods use versions of kriging predictors, being the best linear unbiased predictors minimizing the mean squared prediction errors. A semi-naive method that determines imputed values at censored locations in an iterative algorithm together with variogram estimation is proposed. It is compared with a computationally intensive method relying on Gaussian assumptions, as well as with two distribution-free methods that impute the MDL or MDL divided by two at the locations with censored values. Their predictive performance is compared in a simulation study for both Gaussian and non-Gaussian processes and discussed in relation to the complexity of the methods from a user’s perspective. The method relying on Gaussian assumptions performs, as expected, best not only for Gaussian processes, but also for other processes with symmetric marginal distributions. Some of the (semi-)naive methods also work well for these cases. For processes with skewed marginal distributions (semi-)naive methods work better. The main differences in predictive performance arise for small true values. For large true values no difference between methods is apparent.
  •  
28.
  • Shang, Shulian, et al. (author)
  • Partially linear single index Cox regression model in nested case-control studies
  • 2013
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 67, s. 199-212
  • Journal article (peer-reviewed), abstract:
    • The nested case-control (NCC) design is widely used in epidemiologic studies as a cost-effective subcohort sampling method to study the association between a disease and its potential risk factors. NCC data are commonly analyzed using Thomas' partial likelihood approach under the Cox proportional hazards model assumption. However, the linear modeling form in the Cox model may be insufficient for practical applications, especially when there are a large number of risk factors under investigation. In this paper, we consider a partially linear single index proportional hazards model, which includes a linear component for covariates of interest to yield easily interpretable results and a nonparametric single index component to adjust for multiple confounders effectively. We propose to approximate the nonparametric single index function by polynomial splines and estimate the parameters of interest using an iterative algorithm based on the partial likelihood. Asymptotic properties of the resulting estimators are established. The proposed methods are evaluated using simulations and applied to an NCC study of ovarian cancer. 
  •  
29.
  • Sysoev, Oleg, et al. (author)
  • A segmentation-based algorithm for large-scale partially ordered monotonic regression
  • 2011
  • In: Computational Statistics & Data Analysis. - : Elsevier Science B.V., Amsterdam. - 0167-9473 .- 1872-7352. ; 55:8, s. 2463-2476
  • Journal article (peer-reviewed), abstract:
    • Monotonic regression (MR) is an efficient tool for estimating functions that are monotonic with respect to input variables. A fast and highly accurate approximate algorithm called the GPAV was recently developed for efficiently solving large-scale multivariate MR problems. When such problems are too large, the GPAV becomes too demanding in terms of computational time and memory. An approach that extends the application area of the GPAV to encompass much larger MR problems is presented. It is based on segmentation of a large-scale MR problem into a set of moderate-scale MR problems, each solved by the GPAV. The major contribution is the development of a computationally efficient strategy that produces a monotonic response using the local solutions. A theoretically motivated trend-following technique is introduced to ensure higher accuracy of the solution. The presented results of extensive simulations on very large data sets demonstrate the high efficiency of the new algorithm.
  •  
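The one-dimensional ancestor of the GPAV is compact enough to state in full. A sketch of the classical pool-adjacent-violators algorithm, which the GPAV generalizes to partially ordered predictors and which the segmentation scheme above applies piecewise; unweighted observations are assumed for brevity.

```python
import numpy as np

def pav(y):
    """Classical pool-adjacent-violators: the least-squares fit that is
    non-decreasing along a totally ordered predictor. Adjacent blocks
    are merged (averaged) as long as they violate monotonicity."""
    vals, wts = [], []
    for v in np.asarray(y, dtype=float):
        vals.append(v)
        wts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            wts[-2] = w
            vals.pop()
            wts.pop()
    return np.repeat(vals, wts)          # expand block means to fitted values

print(pav([1.0, 3.0, 2.0, 4.0]))         # -> [1.  2.5 2.5 4. ]
```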
30.
  • Thulin, Måns, 1986- (author)
  • A high-dimensional two-sample test for the mean using random subspaces
  • 2014
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 74, s. 26-38
  • Journal article (peer-reviewed), abstract:
    • A common problem in genetics is that of testing whether a set of highly dependent gene expressions differ between two populations, typically in a high-dimensional setting where the data dimension is larger than the sample size. Most high-dimensional tests for the equality of two mean vectors rely on naive diagonal or trace estimators of the covariance matrix, ignoring dependences between variables. A test using random subspaces is proposed, which offers higher power when the variables are dependent and is invariant under linear transformations of the marginal distributions. The p-values for the test are obtained using permutations. The test does not rely on assumptions about normality or the structure of the covariance matrix. It is shown by simulation that the new test has higher power than competing tests in realistic settings motivated by microarray gene expression data. Computational aspects of high-dimensional permutation tests are also discussed and an efficient R implementation of the proposed test is provided.
  •  
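A rough sketch of the random-subspaces idea under illustrative assumptions: a Hotelling-type statistic averaged over random k-variable subspaces, with a permutation p-value. The paper's exact statistic and its invariance construction are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

def rs_statistic(X, Y, k=5, B=100):
    """Average a Hotelling-type statistic over B random k-variable
    subspaces, so dependence between variables is exploited even when
    the full covariance matrix is singular (p > n). Assumes k smaller
    than both group sizes."""
    p = X.shape[1]
    nx, ny = len(X), len(Y)
    total = 0.0
    for _ in range(B):
        S = rng.choice(p, size=k, replace=False)
        d = X[:, S].mean(axis=0) - Y[:, S].mean(axis=0)
        C = ((nx - 1) * np.cov(X[:, S].T) + (ny - 1) * np.cov(Y[:, S].T)) / (nx + ny - 2)
        total += d @ np.linalg.solve(C, d)
    return total / B

def permutation_pvalue(X, Y, n_perm=200):
    """P-value by permuting group labels, as in the abstract."""
    Z = np.vstack([X, Y])
    n = len(X)
    obs = rs_statistic(X, Y)
    hits = sum(
        rs_statistic(Z[idx[:n]], Z[idx[n:]]) >= obs
        for idx in (rng.permutation(len(Z)) for _ in range(n_perm))
    )
    return (hits + 1) / (n_perm + 1)
```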
31.
  • Ul Hassan, Mahmood, et al. (author)
  • An exchange algorithm for optimal calibration of items in computerized achievement tests
  • 2021
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 157
  • Journal article (peer-reviewed), abstract:
    • The importance of large-scale achievement tests, like national tests in school, eligibility tests for university, or international assessments for evaluation of students, is increasing. Pretesting of questions for the above-mentioned tests is done to determine characteristic properties of the questions by adding them to an ordinary achievement test. If computerized tests are used, it has been shown using optimal experimental design methods that it is efficient to assign pretest questions to examinees based on their abilities. The specific distribution of abilities of the available examinees is considered and restricted optimal designs are applied. A new algorithm is developed which builds on an equivalence theorem. It discretizes the design space with the possibility to change the grid adaptively during the run, makes use of an exchange idea and filters computed designs. It is illustrated through some examples how the algorithm works, as well as how convergence can be checked. The new algorithm is flexible and can be used even if different models are assumed for different questions.
  •  
32.
  •  
33.
  • Werner Hartman, Linda, et al. (author)
  • Utilizing identity-by-descent probabilities for genetic fine-mapping in population based samples, via spatial smoothing of haplotype effects
  • 2009
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 53:5, s. 1802-1817
  • Journal article (peer-reviewed), abstract:
    • Genetic fine mapping can be performed by exploiting the notion that haplotypes that are structurally similar in the neighbourhood of a disease predisposing locus are more likely to harbour the same susceptibility allele. Within the framework of Generalized Linear Mixed Models this can be formalized using spatial smoothing models, i.e. inducing a covariance structure for the haplotype risk parameters, such that risks associated with structurally similar haplotypes are dependent. In a Bayesian procedure a local similarity measure is calculated for each update of the presumed disease locus. Thus, the disease locus is searched as the place where the similarity structure produces risk parameters that can best discriminate between cases and controls. From a population genetic perspective the use of an identity-by-descent based similarity metric is theoretically motivated. This approach is then compared to other more intuitively motivated models and other similarity measures based on identity-by-state, suggested in the literature.
  •  
34.
  • Wittek, Peter (author)
  • Two-way incremental seriation in the temporal domain with three-dimensional visualization : Making sense of evolving high-dimensional data sets
  • 2013
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473 .- 1872-7352. ; 66, s. 193-201
  • Journal article (peer-reviewed), abstract:
    • Two-way seriation is a popular technique to analyse groups of similar instances and their features, as well as the connections between the groups themselves. The two-way seriated data may be visualized as a two-dimensional heat map or as a three-dimensional landscape where colour codes or height correspond to the values in the matrix. To achieve a meaningful visualization of high-dimensional data, a compactly supported convolution kernel is introduced, which is similar to filter kernels used in image reconstruction and geostatistics. This filter populates the high-dimensional space with values that interpolate nearby elements, and provides insight into the clustering structure. Ordinary two-way seriation is also extended to deal with updates of both the row and column spaces. Combined with the convolution kernel, a three-dimensional visualization of dynamics is demonstrated on two data sets, a news collection and a set of microarray measurements.
  •  
35.
  • Lindgren, Anna (author)
  • Quantile regression with censored data using generalized L1 minimization
  • 1997
  • In: Computational Statistics & Data Analysis. - 0167-9473. ; 23:4, s. 509-524
  • Journal article (peer-reviewed), abstract:
    • We propose a way to estimate a parametric quantile function when the dependent variable, e.g. the survival time, is censored. We discuss one way to do this, transforming the problem of finding the p-quantile for the true, uncensored, survival times into a problem of finding the q-quantile for the observed, censored, times. The q-value involves the distribution of the censoring times, which is unknown. The estimation of the quantile function is done using the asymmetric L1 technique with weights involving local Kaplan-Meier estimates of the distribution of the censoring limit.
  •  
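The asymmetric L1 criterion is compact enough to sketch. Below is ordinary (uncensored) p-quantile regression by minimizing the check loss; in the paper the quantile level is replaced by a locally estimated q-value built from Kaplan-Meier estimates of the censoring distribution, a step omitted here.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(r, p):
    # asymmetric L1 ("check") loss: rho_p(r) = r * (p - 1{r < 0})
    return np.sum(r * (p - (r < 0)))

def quantile_regression(X, y, p):
    """Fit a linear p-quantile function by asymmetric L1 minimization.
    Nelder-Mead is used because the loss is nonsmooth; adequate for the
    small dimensions of a sketch like this."""
    beta0 = np.zeros(X.shape[1])
    res = minimize(lambda b: check_loss(y - X @ b, p), beta0, method="Nelder-Mead")
    return res.x

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(100), rng.uniform(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(100)
print(quantile_regression(X, y, p=0.5))      # roughly recovers (1, 2)
```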
36.
  •  
37.
  • Bolin, David, et al. (author)
  • A comparison between Markov approximations and other methods for large spatial data sets
  • 2013
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473. ; 61, s. 7-21
  • Journal article (peer-reviewed), abstract:
    • The Matern covariance function is a popular choice for modeling dependence in spatial environmental data. Standard Matern covariance models are, however, often computationally infeasible for large data sets. Recent results for Markov approximations of Gaussian Matern fields based on Hilbert space approximations are extended using wavelet basis functions. Using a simulation-based study, these Markov approximations are compared with two of the most popular methods for computationally efficient model approximations, covariance tapering and the process convolution method. The methods are compared with respect to their computational properties when used for spatial prediction (kriging), and the results show that, for a given computational cost, the Markov methods have a substantial gain in accuracy compared with the other methods.
  •  
38.
  • Bolin, David, et al. (author)
  • Fast estimation of spatially dependent temporal trends using Gaussian Markov Random fields
  • 2009
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473. ; 53:8, s. 2885-2896
  • Journal article (peer-reviewed), abstract:
    • There is a need for efficient methods for estimating trends in spatio-temporal Earth Observation data. A suitable model for such data is a space-varying regression model, where the regression coefficients for the spatial locations are dependent. A second order intrinsic Gaussian Markov Random Field prior is used to specify the spatial covariance structure. Model parameters are estimated using the Expectation Maximisation (EM) algorithm, which allows for feasible computation times for relatively large data sets. Results are illustrated with simulated data sets and real vegetation data from the Sahel area in northern Africa. The results indicate a substantial gain in accuracy compared with methods based on independent ordinary least squares regressions for the individual pixels in the data set. Use of the EM algorithm also gives a substantial performance gain over Markov Chain Monte Carlo-based estimation approaches.
  •  
39.
  • Bolin, David, 1983, et al. (author)
  • Fast estimation of spatially dependent temporal vegetation trends using Gaussian Markov random fields
  • 2009
  • In: Computational Statistics and Data Analysis. - : Elsevier BV. - 0167-9473. ; 53:8, s. 2885-2896
  • Journal article (peer-reviewed), abstract:
    • There is a need for efficient methods for estimating trends in spatio-temporal Earth Observation data. A suitable model for such data is a space-varying regression model, where the regression coefficients for the spatial locations are dependent. A second order intrinsic Gaussian Markov Random Field prior is used to specify the spatial covariance structure. Model parameters are estimated using the Expectation Maximisation (EM) algorithm, which allows for feasible computation times for relatively large data sets. Results are illustrated with simulated data sets and real vegetation data from the Sahel area in northern Africa. The results indicate a substantial gain in accuracy compared with methods based on independent ordinary least squares regressions for the individual pixels in the data set. Use of the EM algorithm also gives a substantial performance gain over Markov Chain Monte Carlo-based estimation approaches.
  •  
40.
  • Bolin, David, 1983, et al. (author)
  • Latent Gaussian random field mixture models
  • 2019
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473. ; 130, s. 80-93
  • Journal article (peer-reviewed), abstract:
    • For many problems in geostatistics, land cover classification, and brain imaging the classical Gaussian process models are unsuitable due to sudden, discontinuous, changes in the data. To handle data of this type, we introduce a new model class that combines discrete Markov random fields (MRFs) with Gaussian Markov random fields. The model is defined as a mixture of several, possibly multivariate, Gaussian Markov random fields. For each spatial location, the discrete MRF determines which of the Gaussian fields in the mixture that is observed. This allows for the desired discontinuous changes of the latent processes, and also gives a probabilistic representation of where the changes occur spatially. By combining stochastic gradient minimization with sparse matrix techniques we obtain computationally efficient methods for both likelihood-based parameter estimation and spatial interpolation. The model is compared to Gaussian models and standard MRF models using simulated data and in application to upscaling of soil permeability data.
  •  
41.
  • Brodin, Erik, 1975 (author)
  • On quantile estimation by bootstrap
  • 2006
  • In: Computational Statistics and Data Analysis. - : Elsevier BV. - 0167-9473. ; 50:6, s. 1398-1406
  • Journal article (peer-reviewed), abstract:
    • Exact bootstrap is used to optimize the weights of an L-estimator for quantiles with respect to the estimated MSE (mean square error). Performance of the new estimator is measured by comparing MSE with the sample quantile. The new estimator performs better than the sample quantiles in almost every case. However, the gain is only about 5%, in terms of decreased MSE.
  •  
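Here "exact bootstrap" means the bootstrap distribution of an order statistic is available in closed form, with no resampling, which is what makes optimizing the L-estimator weights against estimated MSE feasible. A minimal sketch of that device; the weight optimization itself is omitted.

```python
import numpy as np
from scipy.stats import binom

def exact_bootstrap_pmf(n, k):
    """Exact bootstrap pmf of the k-th order statistic over the sorted
    sample values x_(1) <= ... <= x_(n). Uses
    P(X*_(k) <= x_(j)) = P(Binomial(n, j/n) >= k), so no resampling."""
    j = np.arange(1, n + 1)
    cdf = binom.sf(k - 1, n, j / n)
    return np.diff(np.concatenate(([0.0], cdf)))

x = np.sort(np.random.default_rng(3).normal(size=11))
pmf = exact_bootstrap_pmf(11, 6)              # bootstrap law of the median
mean = pmf @ x
print(mean, pmf @ (x - mean) ** 2)            # exact bootstrap mean and variance
```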
42.
  • Burdakov, Oleg, et al. (author)
  • Hasse diagrams and the generalized PAV-algorithm for monotonic regression in several explanatory variables
  • 2005
  • In: Computational Statistics and Data Analysis. - 0167-9473.
  • Journal article (peer-reviewed), abstract:
    • Monotonic regression is a nonparametric method for estimation of models in which the expected value of a response variable y increases or decreases in all coordinates of a vector of explanatory variables x = (x1, …, xp). Here, we examine statistical and computational aspects of our recently proposed generalization of the pool-adjacent-violators (PAV) algorithm from one to several explanatory variables. In particular, we show how the goodness-of-fit and accuracy of obtained solutions can be enhanced by presorting observed data with respect to their level in a Hasse diagram of the partial order of the observed x-vectors, and we also demonstrate how these calculations can be carried out to save computer memory and computational time. Monte Carlo simulations illustrate how rapidly the mean square difference between fitted and expected response values tends to zero, and how quickly the mean square residual approaches the true variance of the random error, as the number of observations increases up to 10⁴.
  •  
43.
  • Carling, K, et al. (author)
  • An experimental comparison of gradient methods in econometric duration analysis
  • 1998
  • In: Computational Statistics & Data Analysis. - : Elsevier Science BV. - 0167-9473. ; 27:1, s. 83-97
  • Journal article (peer-reviewed), abstract:
    • Empirical duration analysis calls for numerical methods to search for a maximum of a non-linear likelihood function. The NEWTON method is probably a common choice. Yet, it might also be an unwise choice, as we demonstrate by fitting a large empirical data set.
  •  
44.
  • Carling, K (author)
  • Testing for independence in a competing risks model
  • 1996
  • In: Computational Statistics & Data Analysis. - : Elsevier Science BV. - 0167-9473. ; 22:5, s. 527-535
  • Journal article (peer-reviewed), abstract:
    • This paper examines the performance of the conventional likelihood ratio test when testing for independence between failure times in a competing risks framework. The dependence between failure times arises from stochastically related unobserved components.
  •  
45.
  •  
46.
  • Kettaneh, Nouna, et al. (author)
  • PCA and PLS with very large data sets
  • 2005
  • In: Computational Statistics & Data Analysis. - : Elsevier BV. - 0167-9473. ; 48:1, s. 69-85
  • Journal article (peer-reviewed), abstract:
    • Chemometrics was started around 30 years ago to cope with the rapidly increasing volumes of data produced in chemical laboratories. A multivariate approach based on projections—PCA and PLS—was developed that adequately solved many of the problems at hand. However, with the further increase in the size of our data sets seen today in all fields of science and technology, we start to see inadequacies in our multivariate methods, both in their efficiency and interpretability. Starting from a few examples of complicated problems seen in RD&P (research, development, and production), possible extensions and generalizations of the existing multivariate projection methods—PCA and PLS—will be discussed. Criteria such as scalability of methods to increasing size of problems and data, increasing sophistication in the handling of noise and non-linearities, interpretability of results, and relative simplicity of use, will be held as important. The discussion will be made from a perspective of the evolution of scientific methodology as (a) driven by new technology, e.g., computers and graphical displays, and the need to answer some always reoccurring and basic questions, and (b) constrained by the limitations of the human brain, i.e., our ability to understand and interpret scientific and data analytic results.
  •  
47.
  •  
48.
  •  
49.
  • Lindström, Erik, et al. (author)
  • Sequential Calibration of Options
  • 2008
  • In: Computational Statistics and Data Analysis. - : Elsevier BV. - 0167-9473. ; 52, s. 2877-2891
  • Journal article (peer-reviewed)
  •  
50.
  •  
Type of publication
journal article (59)
Type of content
peer-reviewed (51)
other academic/artistic (8)
Author/Editor
Werner Hartman, Lind ... (2)
Lee, S (2)
Lindgren, F (2)
Sjöstedt de Luna, Sa ... (2)
Carling, K (2)
Grimvall, Anders (2)
Lindström, Johan (2)
Rönnegård, Lars (2)
Lindgren, Finn (2)
Holst, Jan (2)
Pawitan, Y (2)
Sahlin, Ullrika (2)
Jin, Shaobo, 1987- (2)
Lindström, Erik (2)
Liu, Q. (1)
Larsson, Rolf (1)
Krogh, Vittorio (1)
Eklundh, Lars (1)
Eklundh, L. (1)
Bottai, M (1)
Strandberg, Johan (1)
Abramowicz, Konrad, ... (1)
Frumento, P (1)
Wittek, Peter (1)
Wallin, Jonas, 1981 (1)
HSIEH, CC (1)
Berglund, Anders (1)
Shi, Lei (1)
Lindström, J. (1)
Hallmans, Göran (1)
Ahmad, M. Rauf (1)
von Rosen, Dietrich (1)
Lee, Youngjo (1)
Henter, Gustav Eje, ... (1)
Fontes, Magnus (1)
Lyhagen, Johan (1)
Bergstrom, A (1)
Andersson, Björn (1)
Palm, Bruna (1)
Zeleniuch-Jacquotte, ... (1)
Anderson, Rachele (1)
Lang, Annika, 1980 (1)
Zhang, Maoxin (1)
Särkkä, Aila, 1962 (1)
Moustaki, Irini (1)
Karlsson, Sune, 1960 ... (1)
Wold, Svante (1)
Lindgren, Anna (1)
Holmgren, Sverker (1)
Holmberg, Henrik, 19 ... (1)
University
Lund University (14)
Uppsala University (13)
Umeå University (11)
Chalmers University of Technology (7)
University of Gothenburg (5)
Karolinska Institutet (5)
Linköping University (4)
Luleå University of Technology (3)
Royal Institute of Technology (2)
Stockholm University (2)
Örebro University (2)
Stockholm School of Economics (2)
Mälardalen University (1)
University of Borås (1)
Blekinge Institute of Technology (1)
Swedish University of Agricultural Sciences (1)
Language
English (59)
Research subject (UKÄ/SCB)
Natural sciences (46)
Engineering and Technology (4)
Social Sciences (3)
Agricultural Sciences (2)
Medical and Health Sciences (1)
