Generative AI model predicts long-term health risks
A new generative AI model, developed by researchers from the European Molecular Biology Laboratory (EMBL), the German Cancer Research Center (DKFZ), and the University of Copenhagen, can estimate the timing and risk of more than 1,000 diseases decades into the future. The model was trained on the anonymized medical histories of 400,000 participants from the UK Biobank and then tested on 1.9 million patients in Denmark. Testing in a second country indicates that forecasting long-term health outcomes is technically feasible across different healthcare systems.
The model, called Delphi-2M, takes an approach similar to that of large language models (LLMs). It treats medical diagnoses and lifestyle factors as a sequence of "tokens" and learns the patterns and order in which diseases and comorbidities appear. By capturing not only which conditions occur but also their sequence and the time between them, the AI can generate plausible medical histories.
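As a rough illustration of the token idea, the sketch below turns one patient history into an ordered token sequence, together with the age at each event and the gaps between events. The Event class, the helper names, and the example codes are hypothetical and are not taken from the Delphi-2M code; the real model's vocabulary and preprocessing may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded item in a health history (hypothetical structure)."""
    age_years: float  # age at which the event was recorded
    code: str         # diagnosis or lifestyle label, e.g. an ICD-10 style code


def build_vocab(histories):
    """Map every distinct code to an integer token id, in sorted order."""
    codes = sorted({event.code for history in histories for event in history})
    return {code: token_id for token_id, code in enumerate(codes)}


def encode(history, vocab):
    """Return (tokens, ages, gaps) for one history, ordered by age."""
    ordered = sorted(history, key=lambda event: event.age_years)
    tokens = [vocab[event.code] for event in ordered]
    ages = [event.age_years for event in ordered]
    # Time between consecutive events, in years.
    gaps = [later - earlier for earlier, later in zip(ages, ages[1:])]
    return tokens, ages, gaps


# Illustrative history; the codes and ages are made up for this example.
history = [
    Event(61.5, "I21"),      # heart attack
    Event(45.0, "smoking"),  # lifestyle factor
    Event(58.2, "E11"),      # type 2 diabetes
]
vocab = build_vocab([history])
tokens, ages, gaps = encode(history, vocab)
```

Keeping the ages alongside the tokens is what lets a sequence model see how much time passed between events, not only their order.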
How the Model Works
The AI uses a generative transformer architecture with special modifications for medical data. These modifications include continuous age encodings to represent the time between medical events. According to Tom Fitzgerald, a staff scientist at EMBL, the model learns the predictable patterns that medical events often follow to forecast future health outcomes. It provides an estimate of potential risks, not a certainty.
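One common way to feed a continuous quantity such as age into a transformer is a sinusoidal encoding, analogous to the positional encodings used in language models. The sketch below assumes that scheme; the function name, the default dimension, and the max_period parameter are illustrative choices, not details confirmed by the researchers.

```python
import math


def age_encoding(age_years, dim=8, max_period=100.0):
    """Encode a continuous age as sine/cosine features (assumed scheme).

    Each pair of features uses a different frequency, so nearby ages
    get similar vectors and distant ages get dissimilar ones.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    features = []
    for i in range(dim // 2):
        # Frequencies decrease geometrically from 1 down toward 1/max_period.
        frequency = 1.0 / (max_period ** (2 * i / dim))
        features.append(math.sin(age_years * frequency))
        features.append(math.cos(age_years * frequency))
    return features


# Two events six months apart receive distinct but similar encodings.
encoding_at_60 = age_encoding(60.0)
encoding_at_60_5 = age_encoding(60.5)
```

Unlike an integer position index, this kind of encoding works for any real-valued age, so irregular gaps between medical events are represented directly.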
The model is most effective at predicting conditions with a consistent progression, such as certain cancers, heart attacks, and septicemia. However, it is less reliable for conditions influenced by unpredictable life events, such as mental health disorders or pregnancy-related complications.
The Role of Probability
The model’s outputs are probabilities, similar to a weather forecast. Its accuracy is higher for shorter time horizons, but the long-term estimates remain well calibrated at the population level, meaning that predicted risks match the disease rates actually observed across groups of people. For example, the annual risk of a heart attack for men aged 60 to 65 in the UK Biobank cohort ranged from 4 in 10,000 to 1 in 100, depending on their medical history and lifestyle. Estimates of this kind could support more personalized and preventive healthcare.
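To make the arithmetic behind such figures concrete, the sketch below converts an annual risk into a multi-year risk and checks calibration for a group. It assumes the annual risk stays constant and independent from year to year, which is a simplification rather than how Delphi-2M produces its estimates, and both function names are hypothetical.

```python
def cumulative_risk(annual_risk, years):
    """Chance of at least one event within `years`, assuming a constant,
    independent annual risk (a simplifying assumption)."""
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - annual_risk) ** years


def calibration_gap(predicted_risks, observed_events):
    """Mean predicted risk minus observed event rate for one group.

    A value near zero means the predictions are well calibrated for
    that group, even if individual outcomes remain uncertain.
    """
    if not predicted_risks or len(predicted_risks) != len(observed_events):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_predicted = sum(predicted_risks) / len(predicted_risks)
    observed_rate = sum(observed_events) / len(observed_events)
    return mean_predicted - observed_rate


# The two ends of the annual range quoted above, projected over five years.
low_five_year = cumulative_risk(4 / 10_000, 5)
high_five_year = cumulative_risk(1 / 100, 5)
```

Under this simplification, an annual risk of 1 in 100 compounds to a five-year risk of roughly 4.9%, slightly less than five times the annual figure.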
According to Ewan Birney, Interim Executive Director at EMBL, the model is a proof of concept that shows it’s possible for AI to learn long-term health patterns. By modeling how illnesses develop over time, it can help plan early interventions and move healthcare toward more personalized and preventive approaches.
Limitations and Future Potential
Like all models, Delphi-2M has its limitations. The data from the UK Biobank skews toward an older, whiter, and healthier demographic, with limited data on childhood and adolescent health. This raises questions about its generalizability to a more diverse population. While the Danish validation helps, further testing is needed before the model can be used clinically. For now, researchers believe its main utility lies in studying how diseases develop, simulating outcomes when real data is scarce, and helping healthcare systems plan for the future.
Despite these limitations, the generative approach opens new possibilities for healthcare. By treating health histories as structured narratives, such models offer new ways to anticipate the transition to multimorbidity in aging populations. With further validation in diverse groups and the integration of other data types, such as molecular or wearable data, these tools could become a cornerstone of a preventive healthcare system that looks forward rather than back.