GPT Model Could Serve as ‘Early Warning System’ of Disease Risk

By Deborah Borfitz

October 15, 2025 | An innovative GPT model trained on vast amounts of anonymized data from healthy volunteers in the UK Biobank has been shown to be impressively good at predicting an individual’s risk of developing over 1,200 diseases over the next decade. Since the model represents disease risks as rates, it can also quantify how quickly new cases occur within a population over time, according to Tomas Fitzgerald, senior scientist at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI).

Delphi-2M, as it is known, was validated using external data on about 2 million individuals in the Danish National Patient Registry without any retraining or fine-tuning, as covered in an article that was published recently in Nature (DOI: 10.1038/s41586-025-09529-3). “Although we had a very marginal loss in prediction across the board, the accuracy is still very high in that Danish cohort,” Fitzgerald says, adding that development of the model is “probably the best piece of research I’ve ever been involved in.”

It was part of a joint effort with colleagues at the EMBL-EBI and the German Cancer Research Centre as well as longtime collaborators at the University of Copenhagen. While many disease risk prediction algorithms exist, Delphi-2M uniquely forecasts the full spectrum of human diseases, as it has been trained using the International Classification of Diseases, Tenth Revision (ICD-10) coding system, he explains.

Thanks to innovative modifications made to the model’s generative transformer architecture, all these diseases can be mapped to continuous time within Delphi-2M. In addition to producing disease risk rates, this allows the model to estimate potential disease burden for up to 20 years, reports Fitzgerald.

The research teams applied explainable AI methods to reveal clusters of comorbidities within and across the same broad ICD-10 “disease chapters” and their time-dependent consequences on future health, Fitzgerald continues. Delphi-2M can be used to explore an individual patient’s healthcare journey, or why the model predicts a risk at a certain age for a certain condition, by looking back at the factors in that patient’s history that are contributing most strongly to that prediction. The same exercise can be done to understand shifting healthcare or risk patterns across a population.

The predictive accuracy of Delphi-2M was “similar to if not better” than currently implemented single disease risk scores, he reports. That high performance is due to its ability to catch complex relationships between competing morbidities and risks related to sex, body mass, smoking, and alcohol consumption, as well as their connections to documented causes of death—all of which serve as tokens in the model.

For conditions with clear and consistent progression patterns, including certain types of cancers and heart attacks, the model performs especially well. It was found to be less reliable for more variable conditions such as mental health disorders and pregnancy-related complications.

Risk predictions are expressed as rates over time, much like weather forecasts, says Fitzgerald. And as with those forecasts, disease predictions are easier to get right over the short term. The model is calibrated to produce accurate population-level risk estimates, forecasting how often certain conditions occur within groups of people, and consistently recapitulating patterns of disease occurrence as recorded in the UK Biobank.
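To make the idea of risk "expressed as rates" concrete, here is a minimal sketch of how a yearly rate translates into a probability over a time horizon. It assumes the rate stays constant over the period (a simplification, and not necessarily how Delphi-2M computes its outputs), and the function names `risk_from_rate` and `expected_cases` are hypothetical, chosen for illustration only.

```python
import math


def risk_from_rate(rate_per_year: float, years: float) -> float:
    """Probability of at least one event within `years`.

    Assumes a constant event rate (an illustrative simplification),
    which gives risk = 1 - exp(-rate * time).
    """
    if rate_per_year < 0 or years < 0:
        raise ValueError("rate and horizon must be non-negative")
    return 1.0 - math.exp(-rate_per_year * years)


def expected_cases(rate_per_year: float, population: int, years: float) -> float:
    """Expected number of people in a group with at least one event."""
    return population * risk_from_rate(rate_per_year, years)


if __name__ == "__main__":
    # A rate of 0.01 events per person-year over a 10-year horizon.
    print(f"10-year risk: {risk_from_rate(0.01, 10):.3f}")
    print(f"Expected cases per 1,000 people: {expected_cases(0.01, 1000, 10):.1f}")
```

Under this assumption, a rate of 0.01 per person-year works out to a 10-year risk of roughly 9.5%, slightly below the 10% that simple multiplication would suggest, because the formula counts each person at most once.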

Access to the model’s “attention-based weights,” referring to the level of attention it pays to different tokens in rendering a disease risk prediction, can be obtained via the data access control procedures of the UK Biobank. This was done to safeguard privacy, although the likelihood of individuals being identified through the derived datasets is “difficult to imagine,” Fitzgerald says.

Delphi-2M was trained using extensive information on a cohort of 400,000 participants from the UK Biobank. This included genetic and imaging data as well as information contained in their electronic health records.

Attention Maps

Many implementations of Delphi-2M can be imagined for clinical practice and resource planning optimization, says Fitzgerald. Although it may take another five years for the model to get through the required testing and evaluation to be available in the clinic, “the most obvious application would be a type of early warning system” alerting clinicians about the diseases most likely to be impactful for individual patients in the years ahead.

Healthcare systems might also find Delphi-2M helpful in making decisions about where to allocate resources in the coming year, he adds. For planning purposes, public health agencies might likewise want to use the model to get a better handle on likely disease burden and incidence rates among different sub-populations within a country.

From a research point of view, Fitzgerald says, the possibilities include modeling how diseases progress in populations across time and the biology underpinning those conditions. He and his colleagues are already actively collaborating with many other groups interested in exploring the model.

Being an attention-based transformer model, Delphi-2M can create an attention map for either an individual or groups of individuals. “It’s quite interesting to see how the model thinks about attending to different diseases across time,” says Fitzgerald, noting that it tends to pay strong attention to whether someone is male or female throughout the entire time course of any individual.

That makes a lot of sense, since health is quite different for males and females, he adds. When looking at heart conditions, the model tends to “forget” about them after a certain point (presumably after treatment or recovery), whereas with cancer it will forget about the disease far less quickly, meaning “it has a stronger impact on your likely disease rates going into the future.”

Attention maps have many intricacies that make them difficult to interpret on the individual level, says Fitzgerald. They offer valuable insight into which parts of the input data the model is prioritizing, but don't fully explain the reasoning behind its predictions.

Explainability and Biases

Researchers employed a few simple tactics to explain Delphi-2M predictions, including a “uniform manifold approximation and projection” (UMAP) representation of its embedding matrix, which showed disease codes clustering closely by their underlying chapter, described in the paper as “a property that the model has no direct knowledge of, and that purely reflects co-occurrence patterns in the data.”

ICD-10 codes are organized into 22 chapters that group diseases and conditions by body system or etiology, and, accordingly, Delphi-2M arranges diseases in the same chapter closer together within its internal space, interprets Fitzgerald. Delphi-2M also seems to recognize the impact of severe health conditions (e.g., heart attack) and old age on healthcare trajectories, as evidenced by the proximity of those diseases and risk features to the “death token” (point at which someone dies).

An assessment of biases turned up a few systemic distortions, he adds, including an “immortality bias” making the model resistant to “allowing anyone to die before the age of 40.” This is tied to the fact that the UK Biobank is a healthy volunteer cohort of people who enrolled at a median age of 40 and were alive when they did so.

Another striking finding was that Delphi-2M could somehow “snoop out” from which source (hospital records, self-reports, primary care, hospital admissions, or death registries) disease data was collated “although that information wasn’t presented to the model,” he continues. For individuals whose diagnoses came from primary care records only, the model assigns a higher predicted risk to conditions seen only, or mostly, in that setting.

While that might present a problem in terms of generalizability, it “actually helps the model to be better at predicting,” says Fitzgerald. It has learned that there is some relationship between co-occurring diseases that tend to be diagnosed in a hospital or, alternatively, in the clinic by a general practitioner.

Model Improvement

Fitzgerald and his team are now actively working on layering additional data into Delphi-2M to improve the accuracy of its predictions, including biomarker data, genotypes, and prescription records. “In the models where we’ve integrated biomarker data, overall, they provide a marginal increase [in performance] across most diseases, in particular metabolic conditions,” he says.

A lot of work remains to be done in terms of representing genetics in the model, adds Fitzgerald, but progress is being made. One interesting discovery is that the addition of a simple polygenic risk score improves the model’s performance a bit over time. The exciting possibility here is the ability to look at baseline genetic risk across an individual’s or a population’s life to see how and when it impacts their health, something that to date has been impossible to do with genetic risk models.

Also possible, but less interesting, he says, is the integration of a ChatGPT-style free text application enabling Delphi-2M to be applied to unstructured data. Delphi-2M is a very different model from clinical assistants designed simply to summarize patient information and transcribe conversations.