The Quest for Robust Model Selection Methods in Linear Regression
-
- Borpatra Gohain, Prakash (author)
- KTH, Information Science and Engineering, Sweden, ISE
-
- Jansson, Magnus, Professor (supervisor)
- KTH, Information Science and Engineering
-
- Hari, K.V.S., Professor (opponent)
- Indian Institute of Science, Bengaluru
-
- ISBN 9789180403689
- Stockholm : KTH Royal Institute of Technology, 2022
- English, 145 pp.
-
Series: TRITA-EECS-AVL ; 2022:61
- Related link:
-
https://kth.diva-por... (primary) (Raw object)
-
-
https://urn.kb.se/re...
-
Abstract
- A fundamental requirement in data analysis is fitting the data to a model that can be used for prediction and knowledge discovery. A typical and favored approach is to use a linear model that explains the relationship between the response and the independent variables. Linear models are simple, mathematically tractable, and highly interpretable, which makes them ubiquitous across many fields of application. Nonetheless, finding the best model (or the true model, if it exists) is a challenging task that requires meticulous attention. In this PhD thesis, we consider the problem of model selection (MS) in linear regression, with a particular focus on the high-dimensional setting in which the parameter dimension is large compared to the number of available observations. Most existing MS methods struggle in two major areas: consistency and scale-invariance. Consistency refers to the ability of an MS method to pick the true model as the sample size grows large and/or as the signal-to-noise ratio (SNR) increases. Scale-invariance means that the performance of the MS method is stable under any scaling of the data. Both properties are crucial for any MS method. Among information-criterion-based MS methods, the Bayesian Information Criterion (BIC) is undoubtedly the most popular and widely used. However, the newer BIC forms, including the extended versions designed for high-SNR scenarios, are not invariant to data scaling, and our results indicate that their performance is quite unstable under different scaling scenarios. To address this problem, we propose improved versions of the BIC criterion, viz., BICR and EBICR, where the subscript ‘R’ stands for robust.
BICR is based on the classical setting of order selection, whereas EBICR is the extension of BICR that handles MS in the high-dimensional setting, where the parameter dimension p may also grow with the sample size N. We analyze their performance as N grows large as well as when the noise variance diminishes towards zero, and provide detailed analytical proofs guaranteeing their consistency in both regimes. Simulation results indicate that the proposed MS criteria are robust to any data scaling and offer a significant improvement in correctly picking the true model. Additionally, we generalize EBICR to handle MS in block-sparse high-dimensional general linear regression. Block-sparsity arises in many applications, yet existing information-criterion-based MS methods are not designed to handle the block structure of the linear model. The proposed generalization handles the block structure naturally and can be employed for MS in any type of linear regression framework.
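To make the order-selection setting concrete, the following sketch implements the classical BIC rule, BIC(k) = N·log(RSS_k/N) + k·log(N), over nested candidate models; this is the baseline criterion the thesis builds on, not the proposed BICR/EBICR (whose penalty forms are not reproduced here). All variable names and the simulated data are illustrative assumptions.

```python
import numpy as np

def bic_order_selection(y, X, max_order):
    """Select a nested model order by minimizing the classical BIC:
    BIC(k) = N * log(RSS_k / N) + k * log(N),
    where RSS_k is the residual sum of squares of the least-squares
    fit using the first k columns of X."""
    N = len(y)
    scores = []
    for k in range(1, max_order + 1):
        Xk = X[:, :k]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        rss = np.sum((y - Xk @ beta) ** 2)
        scores.append(N * np.log(rss / N) + k * np.log(N))
    return int(np.argmin(scores)) + 1  # order with the smallest BIC

# Illustrative example (hypothetical data): true order is 3.
rng = np.random.default_rng(0)
N, p = 200, 8
X = rng.standard_normal((N, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(N)
print(bic_order_selection(y, X, p))
```

At this high SNR the classical BIC reliably recovers the true order; the scaling instability discussed in the abstract concerns the modified high-SNR BIC forms, whose penalties depend on the data scale.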
Subject terms
- TEKNIK OCH TEKNOLOGIER -- Elektroteknik och elektronik -- Signalbehandling (hsv//swe)
- ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering -- Signal Processing (hsv//eng)
Keywords
- Model selection
- information criterion
- linear regression
- sparsity
- high dimensional
- Electrical Engineering
- Elektro- och systemteknik
- Mathematical Statistics
- Matematisk statistik
Publication and content type
- vet (subject category)
- dok (subject category)
Find via library
To the institution's database