How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion. Psychological Methods (IF 10.929), Pub Date: 2022-02-03, DOI: 10.1037/met0000478. Heather J. Gunn, Panteha Hayati Rezvan, M. Isabel Fernández, W. Scott Comulada
Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods such as the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Relying on listwise deletion or an ad hoc imputation strategy with regularization methods can lead to loss of precision, substantial bias, and reduced predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss the implications of each approach and describe additional research that would help solidify recommendations for best practices.
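To make the general idea concrete, here is a minimal sketch of one common strategy for combining the LASSO with multiple imputation: fit a separate LASSO within each imputed dataset and retain predictors selected in a majority of imputations. This is an illustration under assumed data and settings (synthetic data, scikit-learn's `IterativeImputer` and `LassoCV`, m = 5 imputations, a 50% selection-frequency rule), not the specific procedures the tutorial evaluates.

```python
# Sketch: LASSO variable selection across multiply imputed datasets.
# Assumptions (not from the paper): synthetic data, sklearn imputer/LASSO,
# m = 5 imputations, majority-rule pooling of selection indicators.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # only x0, x1 matter

# Introduce ~10% missingness completely at random.
X_miss = X.copy()
X_miss[rng.random((n, p)) < 0.10] = np.nan

m = 5  # number of imputations
selected = np.zeros(p)
for i in range(m):
    # sample_posterior=True draws from the predictive distribution,
    # so the m imputed datasets differ from one another.
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    X_imp = imputer.fit_transform(X_miss)
    lasso = LassoCV(cv=5, random_state=i).fit(X_imp, y)
    selected += (lasso.coef_ != 0)

freq = selected / m                 # per-predictor selection frequency
keep = np.where(freq >= 0.5)[0]     # majority rule across imputations
print("selection frequencies:", freq)
print("retained predictors:", keep)
```

The pooling rule here (keep a predictor if it is selected in at least half the imputations) is only one of several possibilities; alternatives include stacking the imputed datasets before fitting or pooling penalized estimates directly, and the choice can change which predictors survive.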