%0 Journal Article
%K Lead
%K Machine learning
%K Lead poisoning
%K Environmental exposure
%K Aggregated data
%A G.P Lobo
%A B Kalyan
%A Ashok J Gadgil
%B International Journal of Hygiene and Environmental Health
%D 2021
%G eng
%R https://doi.org/10.1016/j.ijheh.2021.113862
%T Predicting childhood lead exposure at an aggregated level using machine learning
%U https://reader.elsevier.com/reader/sd/pii/S1438463921001772?token=2F789CA8B5FC1ACCD98208F21E80356CF2114405639324ED6186F1981A0C9D002CEA6B12DB272F4B06BF650F37D04E51&originRegion=us-east-1&originCreation=20220202171012
%V 238
%8 10/2021
%X <p>Childhood lead exposure affects over 500,000 children under 6 years old in the US; however, only 14 states recommend regular universal blood screening. Several studies have reported on the use of predictive models to estimate lead exposure of individual children, albeit with limited success: lead exposure can vary greatly among individuals, individual data is not easily accessible, and models trained in one location do not always perform well in another. We report on a novel approach that uses machine learning to accurately predict elevated Blood Lead Levels (BLLs) in large groups of children, using aggregated data. To that end, we used publicly available zip code and city/town BLL data from the states of New York (n = 1642, excluding New York City) and Massachusetts (n = 352), respectively. Five machine learning models were used to predict childhood lead exposure by using socioeconomic, housing, and water quality predictive features. The best-performing model was a Random Forest, with a 10-fold cross validation ROC AUC score of 0.91 and 0.85 for the Massachusetts and New York datasets, respectively. The model was then tested with New York City data and the results compared to measured BLLs at a borough level. The model yielded predictions in excellent agreement with measured data: at a city level it predicted elevated BLL rates of 1.72% for the children in New York City, which is close to the measured value of 1.73%. Predictive models, such as the one presented here, have the potential to help identify geographical hotspots with significantly large occurrence of elevated lead blood levels in children so that limited resources may be deployed to those who are most at risk.</p>