Lead exposure in childhood may be even more dangerous for cognitive advancement and school performance than previously thought, according to a new analysis led by data scientist Joe Feldman.
Lead exposure in children most often comes from deteriorating lead-based paint, contaminated soil or old water pipes, hazards that remain in many U.S. communities.
High levels of lead in a child’s bloodstream have long been known to impair intellectual ability. But like many other real-world datasets, the data establishing the link between lead exposure and cognitive development are messy and incomplete.
“It’s clear that lead is dangerous,” said Feldman, an assistant professor of statistics and data science in Arts & Sciences at Washington University in St. Louis. “But the magnitude of that association has been hard to estimate as many children are never tested for exposure, which means many data points are missing.”
To better understand the risk, Feldman and colleagues Jerome Reiter, of Duke University, and WashU alum Daniel Kowal (AB ’12), now at Cornell University, analyzed data from 170,000 fourth-grade students in North Carolina, with the goal of linking lead exposure to end-of-grade standardized test scores. The findings are published in the journal Bayesian Analysis.
“Although standardized test scores are a flawed metric, they are vital proxies for child development and are strongly correlated to academic milestones in high school and beyond,” Feldman said.
New Statistical Approach Bridges Gaps in Data, Improves Health Insights
Researchers have developed a new statistical approach to address the common problem of missing data. The problem is particularly relevant in healthcare, where patients often stop tracking symptoms consistently or data points are incomplete. The method allows external information, such as results from clinical trials, to be integrated into the analysis to create a more comprehensive understanding and improve decision-making.
The challenge arises when individuals intermittently stop measuring or recording their symptoms, creating gaps in the data. However, a wealth of external data exists regarding the effectiveness of various treatments. “We’re trying to develop models that can integrate this external information to better understand the missing data,” explained a researcher involved in the study.
This approach isn’t limited to healthcare; it has the potential to clarify questions across various fields hampered by incomplete datasets.
“Statistical models should not be constrained by the lack of information in a particular dataset,” stated the researcher. “Our work allows users to easily integrate external information to improve decision-making and public health strategies.”
The research, published in Bayesian Analysis (2025), details the use of auxiliary marginal quantiles for Gaussian copula models with nonignorable missing data. DOI: 10.1214/25-BA1551
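For readers who want a feel for why an auxiliary marginal quantile helps, the toy simulation below may be useful. It is not the Gaussian copula model from the paper; it swaps in a much simpler reweighting (post-stratification) step, and everything in it is an assumption made for illustration: the synthetic "lead level" values, the rule that children with higher levels are more likely to be tested, and the function names `simulate` and `calibrated_mean`. The only external information used is the population median, which stands in for the kind of marginal quantile an outside survey might supply.

```python
import random


def simulate(n=20000, seed=0):
    """Draw synthetic 'lead levels' and drop some of them nonignorably.

    Assumption for illustration only: values at or above the population
    median are observed 80% of the time, values below it 20% of the time,
    so missingness depends on the value that goes missing.
    """
    rng = random.Random(seed)
    # Lognormal with mu=0 has population median exp(0) = 1.0.
    full = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]
    observed = [x for x in full if rng.random() < (0.8 if x >= 1.0 else 0.2)]
    return full, observed


def calibrated_mean(observed, known_median):
    """Reweight observed values so each side of the known median counts 50%.

    This is plain post-stratification on one external quantile, a stand-in
    for the idea of using auxiliary marginal quantiles, not the paper's model.
    """
    low = [x for x in observed if x < known_median]
    high = [x for x in observed if x >= known_median]
    return 0.5 * sum(low) / len(low) + 0.5 * sum(high) / len(high)


if __name__ == "__main__":
    full, observed = simulate()
    print("true mean (all children):  ", sum(full) / len(full))
    print("naive mean (tested only):  ", sum(observed) / len(observed))
    print("mean calibrated to median: ", calibrated_mean(observed, 1.0))
```

In this sketch the naive average over tested children overstates the true average, because high values are overrepresented among the observed data, while the average calibrated to the external median lands much closer to the truth. The correction works this cleanly only because the simulated missingness depends solely on which side of the median a value falls; real data would call for a fuller model such as the one the researchers describe.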