R. Cao Abad, L. Borrajo López
This work considers the problem of nonparametric estimation of the mean in big data in the presence of sampling bias. The problem is studied both when the biasing weight function is known (an unrealistic setting) and when it is unknown (a realistic one). Two scenarios are considered to remedy the lack of knowledge of the weight function: (i) a small simple random sample from the real population is also available, and (ii) a sample from a doubly biased distribution has been observed. In both scenarios the problem is related to nonparametric density estimation, so kernel methods are used as auxiliary tools. Asymptotic expressions for the mean squared error of the proposed estimators are derived, which lead to asymptotic formulas for the optimal smoothing parameters involved. Simulations are carried out to illustrate the performance of the proposed nonparametric methods, which are also applied to a dataset on airline delay times.
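The abstract does not state the estimators explicitly. The following is a minimal sketch, assuming the usual multiplicative biasing model g(x) = w(x) f(x) / E_f[w(X)], where f is the population density and g the biased one. Under that model E_f[X] = E_g[X / w(X)] / E_g[1 / w(X)], so a ratio estimator works when w is known, and any constant factor in w cancels. For scenario (i), the sketch replaces 1 / w by a ratio of kernel density estimates f_hat / g_hat (small simple random sample vs. big biased sample). This is not necessarily the authors' estimator. The Gaussian kernel, Silverman's rule-of-thumb bandwidth (used instead of the asymptotically optimal bandwidths derived in the work), all function names, and the simulated setup (f = N(0, 1), w(x) = exp(x)) are hypothetical choices for illustration. Scenario (ii) is not covered.

```python
import math
import random
import statistics


def ratio_mean(sample, inv_weights):
    """Ratio estimator of the population mean from a biased sample.

    Assumes g(x) = w(x) f(x) / E_f[w(X)], hence
    E_f[X] = E_g[X / w(X)] / E_g[1 / w(X)]; constants in w cancel.
    """
    num = sum(x * iw for x, iw in zip(sample, inv_weights))
    return num / sum(inv_weights)


def mean_known_weight(biased_sample, w):
    """Known-weight case: plug 1 / w(x_i) directly into the ratio estimator."""
    return ratio_mean(biased_sample, [1.0 / w(x) for x in biased_sample])


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth (an assumption, not the paper's optimal h)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)


def gaussian_kde(data, h):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

    return density


def mean_with_small_srs(biased_sample, srs):
    """Hypothetical scenario (i) plug-in: estimate 1 / w (up to a constant)
    by f_hat / g_hat, with f_hat from the small simple random sample and
    g_hat from the big biased sample. Direct O(n^2) evaluation, so this is
    only suitable for illustration, not for genuinely big data.
    """
    f_hat = gaussian_kde(srs, silverman_bandwidth(srs))
    g_hat = gaussian_kde(biased_sample, silverman_bandwidth(biased_sample))
    inv_w = [f_hat(x) / g_hat(x) for x in biased_sample]
    return ratio_mean(biased_sample, inv_w)


if __name__ == "__main__":
    rng = random.Random(2018)
    # Hypothetical setup: f = N(0, 1) and w(x) = exp(x), so g = N(1, 1).
    biased = [rng.gauss(1.0, 1.0) for _ in range(1000)]
    srs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("naive mean of biased sample:", statistics.mean(biased))
    print("known-weight estimate:", mean_known_weight(biased, math.exp))
    print("kernel plug-in estimate:", mean_with_small_srs(biased, srs))
```

In this setup the naive sample mean targets 1 rather than the true mean 0, while both weighted estimates should land near 0. The kernel plug-in carries a smoothing bias because smoothing flattens the estimated ratio f_hat / g_hat, which is why the choice of smoothing parameter matters.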
Keywords: biased data, big data, kernel estimator, sampling bias, smoothing parameter
Scheduled
Session J03: Nonparametric Statistics
May 31, 2018, 10:20
Room 2