ML models predict obesity in India but show bias against marginalized groups

Algorithmic bias in obesity prediction models risks perpetuating health inequities among India's older adults: in a study of more than 55,000 people, the models systematically underidentified high-risk cases among scheduled tribes, scheduled castes, and the lowest-income groups.
A model that works perfectly for the average may be failing the most vulnerable.
The study reveals how machine learning models can achieve high overall accuracy while systematically underperforming for marginalized groups.

In a nation where hunger and obesity coexist across the same generations and geographies, researchers have discovered that artificial intelligence trained to identify who is at nutritional risk carries within it the very inequalities it was meant to help address. Analyzing data from more than 55,000 older Indian adults, scientists found that machine learning models performed impressively for the population as a whole, yet systematically failed scheduled tribes, scheduled castes, and the poorest income groups — the communities most in need of intervention. The study does not indict the technology so much as it reveals a deeper truth: algorithms trained on an unequal world learn to serve it unequally, and fairness must be designed in, not assumed.

  • India's older adults face a dual nutritional crisis — wasting and obesity occurring side by side — and AI was recruited to help identify who needs care, but the models arrived carrying hidden prejudices.
  • Tree-based models achieved strong overall accuracy, yet in the poorest fifth of the population they correctly identified only 37 percent of overweight cases, compared to 84 percent among the wealthiest — a gap wide enough to determine who receives help and who is left behind.
  • Scheduled tribes and scheduled castes saw the sharpest drops in model sensitivity, meaning the algorithm was most likely to miss high-risk individuals in precisely the communities that history has already most neglected.
  • Bias mitigation techniques — reweighting training data, adjusting decision thresholds, applying Reject Option Classification — did narrow the disparities, but each gain in equity came at a measurable cost to overall accuracy, forcing an explicit ethical trade-off.
  • The study lands not as a verdict against machine learning but as an urgent call: public health AI deployed without deliberate fairness design becomes a new mechanism for distributing old inequities.

India's older population faces a paradoxical health crisis — undernutrition and obesity coexisting within the same communities, sometimes the same households. Researchers from multiple institutions asked whether machine learning could help identify who was at risk, and whether that identification would reach everyone equally.

Drawing on data from 55,647 adults aged 45 and older across all Indian states, the team tested six machine learning models to predict three outcomes: underweight, overweight or obese, and dangerous abdominal fat accumulation. The top-performing models, LightGBM and gradient boosting, achieved overall discrimination scores between 0.79 and 0.84 on a scale where 1.0 is perfect, generally considered strong results.

But the aggregate numbers concealed a troubling pattern. Among scheduled tribes, scheduled castes, and the lowest income quintile, model sensitivity dropped sharply. The poorest fifth of the population had only 37 percent of their overweight cases correctly identified, while the wealthiest fifth saw 84 percent correctly flagged. The model was rarely wrong when it said someone was not overweight — but it was frequently blind to actual cases among the marginalized.

The researchers attempted to correct this through established fairness techniques: reweighting training data, adjusting decision thresholds, and applying post-processing methods like Reject Option Classification. Some approaches did reduce the gap between rich and poor, between upper castes and historically excluded groups. But each improvement in equity came at a cost — broader flagging, lower precision, a trade-off that could not be engineered away.
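To make the reweighting idea concrete, here is a minimal Python sketch of one common scheme, often called reweighing, which weights each training record by how under- or over-represented its combination of group and outcome is. It is an illustration under assumptions, not the study's code: the function name, the group labels, and the toy records are invented, and the article does not say which reweighting formula the researchers used.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each record by expected / observed frequency of its (group, label) pair.

    Expected frequency assumes group and label are independent:
    P(group) * P(label). Pairs that are rarer than expected get weights above 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy records, invented for illustration (1 = overweight, 0 = not overweight).
groups = ["rich"] * 6 + ["poor"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
print(weights[0])  # weight of a (rich, overweight) record
print(weights[6])  # weight of the single (poor, overweight) record
```

In this toy set the lone overweight case in the poorer group receives twice the weight of an ordinary record, while the total weight across all records stays equal to the number of records. Weights like these could then be passed to any learner that accepts per-sample weights.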

The features driving the disparities were themselves revealing. Grip strength was the strongest predictor, followed by gender and urban or rural residence. Yet these variables are not neutral biological facts — they reflect access to nutrition, healthcare, and economic opportunity. A scheduled tribe member with lower grip strength may be expressing the cumulative weight of structural disadvantage, not individual biology. The model was learning to predict obesity, but it was also learning to replicate the inequalities embedded in its training data.

The implications are concrete. A public health program relying on such a model to target interventions will systematically under-serve the communities most in need, directing resources toward those already better positioned to access care. The algorithm, absent deliberate correction, becomes a quiet instrument of inequity.

The study offers no simple resolution, but its message is clear: fairness in public health AI is not a technical afterthought — it requires explicit commitment from the earliest stages of design, honest accounting of trade-offs, and the recognition that a model performing well for the average may be failing the most vulnerable most of all.

India's older population faces a peculiar health crisis: some are wasting away while others gain weight, often within the same household or village. Researchers at multiple institutions set out to ask whether artificial intelligence could help identify who needs help—and whether that help would reach everyone equally.

They assembled data on 55,647 adults aged 45 and older, drawn from the Longitudinal Ageing Study in India, a nationally representative survey spanning all Indian states. The researchers tested six different machine learning models—some based on decision trees, others on neural networks—to predict three outcomes: who was underweight, who was overweight or obese, and who carried dangerous amounts of fat around their midsection. The best models, particularly one called LightGBM and another using gradient boosting, achieved impressive overall accuracy. Their ability to distinguish between cases and non-cases ranged from 0.79 to 0.84 on a standard scale where 1.0 is perfect.

But when the researchers looked closer, they found something troubling. The models worked well for the population as a whole, yet systematically failed certain groups. Adults from scheduled tribes—historically marginalized communities with legal protections under India's constitution—saw the model's sensitivity drop sharply. The same was true for scheduled castes and for people in the lowest income quintile. In the poorest fifth of the population, the model correctly identified only 37 percent of overweight cases, compared to 84 percent in the wealthiest fifth. The model was still confident in its negative predictions—it rarely said someone was overweight when they weren't—but it missed many actual cases among the poor.
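The kind of audit that surfaces such a gap is simple to express. The following Python sketch computes sensitivity, the share of actual cases a model catches, separately for each group. The helper name and the twelve toy records are assumptions made for illustration; they are not the study's data or code.

```python
from collections import defaultdict


def sensitivity_by_group(y_true, y_pred, groups):
    """Return {group: true positives / actual positives} for binary labels."""
    true_positives = defaultdict(int)
    actual_positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_positives[group] += 1
            if pred == 1:
                true_positives[group] += 1
    return {g: true_positives[g] / actual_positives[g] for g in actual_positives}


# Toy data, invented for illustration (1 = overweight, 0 = not overweight).
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
groups = ["richest"] * 6 + ["poorest"] * 6

rates = sensitivity_by_group(y_true, y_pred, groups)
print(rates)  # {'richest': 0.75, 'poorest': 0.25}
```

Note that the toy model raises no false alarms in either group, yet it catches three of four cases among the richest and only one of four among the poorest. A single pooled figure would hide that split.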

The researchers then asked: could they fix this? They tried several established techniques. Some involved reweighting the training data so the model paid equal attention to all groups. Others involved adjusting the decision threshold after training, essentially changing the bar for when the model would flag someone as high-risk. A few approaches, like Reject Option Classification, did narrow the gap between rich and poor, between upper castes and marginalized groups. But this fairness came at a cost. When the model became more sensitive to cases in poor communities, it became less precise overall—it flagged more people as overweight, including some who weren't.
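The trade-off described here can be reproduced on a handful of invented risk scores. This Python sketch, with a hypothetical helper and toy numbers that do not come from the study, lowers the decision threshold and reports what happens to sensitivity (cases caught) and precision (flags that were correct).

```python
def sensitivity_and_precision(y_true, scores, threshold):
    """Flag everyone whose score meets the threshold; return (sensitivity, precision)."""
    flagged = [score >= threshold for score in scores]
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    fp = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and not f)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Toy labels and model scores, invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.3, 0.45, 0.35, 0.2, 0.1]

strict = sensitivity_and_precision(y_true, scores, threshold=0.5)
lenient = sensitivity_and_precision(y_true, scores, threshold=0.3)
print(strict)   # half the cases caught, every flag correct
print(lenient)  # every case caught, but a third of the flags are wrong
```

With the stricter bar the toy model catches two of four cases and makes no false flags. With the lower bar it catches all four, at the price of two false flags among six.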

The researchers identified the features driving these disparities. Grip strength emerged as the single most important predictor, followed by gender and whether someone lived in a city or village. Rural residence was associated with underweight; urban residence with obesity. Women were more likely to be flagged as overweight or carrying excess belly fat. But here is the crucial finding: these features themselves reflect deeper inequalities. A scheduled tribe member in a rural area might have lower grip strength not because of individual biology but because of limited access to nutrition, healthcare, and economic opportunity. The model was learning to predict obesity, yes—but it was also learning and amplifying the structural inequalities baked into those features.

This is the core tension the study illuminates. Machine learning models are not neutral. They are trained on data that reflects the world as it is, not as it should be. When that world is deeply unequal, the models become very good at predicting outcomes for privileged groups and worse at predicting them for everyone else. The researchers' mitigation strategies—the technical fixes—could narrow these gaps, but only by accepting lower overall accuracy or by explicitly designing the system to treat different groups differently.
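Reject Option Classification is one example of treating groups differently by design. What follows is a minimal Python sketch of the technique's general idea, not the researchers' implementation: predictions far from the decision boundary are left alone, while uncertain ones inside a band around it are resolved in favour of the disadvantaged group. The function name, the band width, the group labels, and the choice to treat "flagged as high-risk" as the favourable outcome are all assumptions made for illustration.

```python
def reject_option_classify(scores, groups, disadvantaged, threshold=0.5, margin=0.1):
    """Return 0/1 decisions, overriding only uncertain scores near the threshold.

    Inside the band [threshold - margin, threshold + margin], members of a
    disadvantaged group are flagged (1) and everyone else is not (0).
    Outside the band, the ordinary threshold rule applies.
    """
    decisions = []
    for score, group in zip(scores, groups):
        if abs(score - threshold) <= margin:
            decisions.append(1 if group in disadvantaged else 0)
        else:
            decisions.append(1 if score >= threshold else 0)
    return decisions


# Toy scores, invented for illustration.
scores = [0.80, 0.55, 0.45, 0.20]
groups = ["richest", "richest", "poorest", "poorest"]

decisions = reject_option_classify(scores, groups, disadvantaged={"poorest"})
print(decisions)  # [1, 0, 1, 0]
```

The two confident predictions are unchanged. The two borderline ones swap: the person in the poorest group scoring 0.45 is now flagged, and the person in the richest group scoring 0.55 is not, which is exactly the kind of explicit, visible choice the study says cannot be avoided.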

The implications ripple outward. If a public health program uses such a model to identify people at risk of obesity and target them for intervention, it will systematically miss high-risk people in marginalized communities. Those communities will receive fewer resources, fewer screenings, fewer chances at prevention. The algorithm, in other words, becomes a mechanism for distributing health inequity.

The study offers no easy resolution. The researchers show that fairness in machine learning requires deliberate design—not just technical skill, but explicit commitment to equity from the start. It means building fairness audits into the development process. It means accepting trade-offs between overall accuracy and group fairness. It means, perhaps most importantly, recognizing that a model that works perfectly for the average may be failing the most vulnerable. As India's health system grows more reliant on algorithmic decision-making, this lesson becomes urgent.

Fair, trustworthy decision support systems in public health must be designed explicitly with equity objectives, rather than assuming that technical excellence alone will guarantee just outcomes.
— Study authors, PLOS Digital Health
The Hearth Conversation: Another perspective on the story
Inventor

Why does a model that scores as high as 0.84 overall suddenly perform so much worse for scheduled tribes?

Model

Because accuracy is an average. When you pool everyone together, the model's mistakes in one group get hidden by its successes in another. The wealthy, upper-caste population is larger and easier to predict, so the model learns their patterns very well. The marginalized groups are smaller, more diverse, and their patterns get drowned out.
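The arithmetic behind that answer can be sketched in a few lines of Python. The 84 and 37 percent detection rates are the study's figures for the wealthiest and poorest fifths; the group sizes are hypothetical, chosen only to show how a large, well-served group can dominate a pooled figure. The study's overall scores of 0.79 to 0.84 measure something different, so the pooled number here is illustrative only.

```python
def pooled_sensitivity(cases_by_group, sensitivity_by_group):
    """Overall share of cases caught, as a case-weighted average of group rates."""
    total_cases = sum(cases_by_group.values())
    cases_found = sum(
        cases_by_group[g] * sensitivity_by_group[g] for g in cases_by_group
    )
    return cases_found / total_cases


# Hypothetical counts of true overweight cases in each group.
cases = {"richest": 900, "poorest": 100}
# Detection rates reported in the study for these two income groups.
rates = {"richest": 0.84, "poorest": 0.37}

pooled = pooled_sensitivity(cases, rates)
print(round(pooled, 3))  # 0.793
```

Under these assumed group sizes the pooled rate looks close to the wealthy group's, even though nearly two in three cases among the poorest are missed.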

Inventor

But the researchers fed the model information about caste and income. Shouldn't that help it understand these groups better?

Model

It does help predict outcomes—adding socioeconomic variables improved accuracy. But here's the problem: those variables are proxies for inequality itself. When the model learns that scheduled tribe status predicts underweight, it's not learning biology. It's learning poverty, malnutrition, and limited healthcare access. The model becomes very good at encoding disadvantage.

Inventor

So the model is doing exactly what it was trained to do—predict accurately based on the data it sees.

Model

Precisely. And that's the trap. The data reflects reality, but reality is unjust. A model that perfectly mirrors an unequal world will perpetuate that inequality. It will tell a health program: focus resources on the wealthy, they're easier to help. Ignore the poorest—they're harder to predict.

Inventor

The researchers tried to fix this with bias mitigation. Did it work?

Model

Partially. Techniques like Reject Option Classification can narrow the gap between groups. But they do it by making the model less accurate overall, or by treating groups differently—essentially saying, we'll use a lower bar to flag someone as high-risk if they're poor. That's fairer, but it's also a visible trade-off.

Inventor

What should a health system do with these findings?

Model

Recognize that no algorithm is neutral. If you're going to use machine learning to identify people at risk, you have to audit it for fairness first. You have to decide: am I willing to accept lower overall accuracy to serve marginalized groups better? And you have to be honest about what the model is actually measuring—not just obesity risk, but the structural inequalities that shape health.
