SwePub

An Information Theoretic Approach to Prevalence Estimation and Missing Data

Hössjer, Ola, 1964- (author)
Stockholms universitet,Matematiska institutionen
Díaz-Pachón, Daniel Andrés (author)
Zhao, Chen (author)
Rao, J. Sunil (author)
2024
English.
In: IEEE Transactions on Information Theory. ISSN 0018-9448, 1557-9654. 70:5, pp. 3567-3582
  • Journal article (peer-reviewed)
Abstract

Many data sources, ranging from the tracking of social behavior to election polling to testing studies for understanding disease spread, are subject to sampling bias whose implications are not yet fully understood. In this paper we study estimation of a given feature (such as disease, or behavior on social media platforms) from biased samples, treating non-respondent individuals as missing data. Prevalence of the feature among sampled individuals has an upward bias under the assumption of individuals’ willingness to be sampled. This can be viewed as a regression model with symptoms as covariates and the feature as outcome. It is assumed that the outcome is unknown at the time of sampling, and therefore the missingness mechanism only depends on the covariates. We show that data, in spite of this, is missing at random only when the sizes of symptom classes in the population are known; otherwise data is missing not at random. With an information theoretic viewpoint, we show that sampling bias corresponds to external information due to individuals in the population knowing their covariates, and we quantify this external information by active information. The reduction in prevalence, when sampling bias is adjusted for, similarly translates into active information due to bias correction, with opposite sign to active information due to testing bias. We develop unified results that show that prevalence and active information estimates are asymptotically normal under all missing data mechanisms, when testing errors are absent and present respectively. The asymptotic behavior of the estimators is illustrated through simulations.

Subject headings

NATURAL SCIENCES  -- Mathematics -- Probability Theory and Statistics (hsv//eng)

Keywords

Active information
asymptotic normality
biased estimate
missing data
testing errors

Publication and Content Type

ref (subject category)
art (subject category)
