%0 Journal Article %K Environmental justice %K Machine learning %K Lead in school drinking water %K Lead leaching %K Public data mining %A G.P Lobo %A Johanna Laraway %A Ashok J Gadgil %B Science of the Total Environment %D 2021 %G eng %R https://doi.org/10.1016 %T Identifying schools at high-risk for elevated lead in drinking water using only publicly available data %U https://www.osti.gov/pages/biblio/1821159 %V 803 %8 08/2021 %X
Estimating the risk of lead contamination of schools' drinking water at the State level is a complex, important, and
unexplored challenge. Variable water quality among water systems and changes in water chemistry during
distribution affect lead dissolution rates from pipes and fittings. In addition, the locations of lead-bearing plumbing
materials are uncertain. We tested the capability of six machine learning models to predict the likelihood of lead
contamination of drinking water at the schools' taps using only publicly available datasets. The predictive
features used in the models correspond to those with a proven correlation to the dominant, but commonly
unavailable, factors that govern lead leaching: the presence of lead-bearing plumbing materials and water quality
conducive to lead corrosion. By combining water chemistry data from public reports, socioeconomic information
from the US census, and spatial features using Geographic Information Systems, we trained and tested models to
estimate the likelihood of lead-contaminated tap water in over 8,000 schools across California and Massachusetts.
Our best-performing model was a Random Forest, with a 10-fold cross-validation score of 0.88 for Massachusetts
and 0.78 for California using the average Area Under the Receiver Operating Characteristic Curve (ROC AUC)
metric. The model was then used to assign a lead leaching risk category to half of the schools across California (the
other half was used for training). There was good agreement between the modeled risk categories and the actual
lead leaching outcomes for every school; however, the model overestimated the lead leaching risk in up to 17% of
the schools. This model is the first of its kind to offer a tool to predict the risk of lead leaching in schools at the
State level. Further use of this model can help deploy limited resources more effectively to prevent childhood
lead exposure from school drinking water.
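
The score reported in the abstract is an average ROC AUC over cross-validation folds. As a minimal sketch of what that metric measures, and not the authors' implementation, the snippet below computes ROC AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher predicted score, then averages it over folds. The function names `roc_auc` and `mean_fold_auc` and the toy labels and scores are hypothetical and chosen only for illustration; they are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    one, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds):
    """Average ROC AUC over held-out folds, where each fold is a
    (labels, scores) pair produced by a model trained on the other folds."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)


# Hypothetical toy data: 1 = elevated lead detected, 0 = not detected.
toy_folds = [
    ([0, 1], [0.2, 0.9]),
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
]
print(mean_fold_auc(toy_folds))
```

A value of 0.5 corresponds to ranking schools at random and 1.0 to ranking every contaminated school above every uncontaminated one, which is the scale on which the 0.88 and 0.78 figures above should be read.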