New AI Tool Uses Gene Sequencing Data To Predict Formation Site Of Cancer

By Deborah Borfitz 

October 31, 2023 | Researchers at Dana-Farber Cancer Institute and Massachusetts Institute of Technology have developed a new tool, based on artificial intelligence, that predicts the likely site of formation for cancer of unknown primary (CUP) tumors. As highlighted in a study published recently in Nature Medicine (DOI: 10.1038/s41591-023-02482-6), the OncoNPC algorithm “identifies patients who had the right predicted cancer and the right targetable mutation for which there is an approved therapy,” according to Dana-Farber researcher and senior author Alexander Gusev, Ph.D.

“Only roughly half of these patients actually received a targeted treatment, so these predictions could have shaped their clinical care,” he adds. “Indeed, oncologists are working with the best data and intuition they have, and CUPs are challenging tumors, so it is often the case that the best clinical option is hard to decipher.” 

CUPs account for between 3% and 5% of all cancer cases, and outcomes are dismal. Current treatments generally do not cure the cancer, so a clinical trial is often the best treatment choice.

Treatments approved by the U.S. Food and Drug Administration are designated for specific cancer types and targetable drivers (e.g., therapies for non-small cell lung cancer that target the epidermal growth factor receptor). “These treatments typically do not work in other cancers,” says Gusev.  

The hope is that OncoNPC can “take some of the guesswork out,” as he puts it. The algorithm considers mutational data derived from the sequence of about 400 genes that are often mutated in cancer to make an interpretable prediction that can be meaningfully acted on. 

“Both tumor sequencing and pathology slides are used routinely, so in principle no changes to the clinical workflow would be needed to provide the additional information from our algorithm,” says Gusev.   

Community clinics could be the big winners here, since they often do not have the resources or the staff to conduct a detailed workup or to generate the immunohistochemistry and other pathological data that would be useful for a complete diagnosis. “In these cases, we think our algorithm can provide early or orthogonal information to prioritize the most appropriate workup to conduct,” Gusev says. “The goal is not to replace pathologists, but to accelerate the path to a diagnosis with some initial hints based on the somatic evidence.”

Aggregating Mutations

For the published study, Gusev and his colleagues trained the machine learning model using the medical records from nearly 30,000 patients diagnosed with one of 22 known cancer types at three major cancer centers. The first step was to show, on a separate subset of about 7,000 tumors of known origin, that the model could correctly predict the origin of most of them (80%). For tumors with high-confidence predictions, which constituted about 65% of the total, the accuracy of OncoNPC rose to roughly 95%.
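
The jump from 80% to roughly 95% reflects a general pattern: when a classifier is scored only on the cases where it is most confident, accuracy tends to rise while coverage falls. The following is a minimal sketch of that calculation, not the OncoNPC implementation. The function name, the 0.5 threshold, and the five toy probability rows are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def accuracy_by_confidence(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float, float]:
    """Return (overall accuracy, high-confidence fraction, high-confidence accuracy).

    probs[i] holds the predicted probability of each cancer type for tumor i;
    labels[i] is the index of the true type. A prediction counts as
    "high confidence" when its top probability is at least `threshold`.
    """
    if not labels or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")

    correct_all = 0
    kept = 0
    correct_kept = 0
    for row, true_type in zip(probs, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        confidence = row[predicted]
        hit = predicted == true_type
        correct_all += hit
        if confidence >= threshold:
            kept += 1
            correct_kept += hit

    n = len(labels)
    high_conf_acc = correct_kept / kept if kept else float("nan")
    return correct_all / n, kept / n, high_conf_acc


# Toy data (made up): five tumors, three cancer types.
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],  # low confidence, and wrong
    [0.05, 0.15, 0.80],
    [0.45, 0.30, 0.25],  # low confidence, but right
]
toy_labels = [0, 1, 1, 2, 0]

overall, fraction, high_acc = accuracy_by_confidence(toy_probs, toy_labels, 0.5)
print(f"overall accuracy: {overall:.0%}")
print(f"high-confidence share: {fraction:.0%}")
print(f"high-confidence accuracy: {high_acc:.0%}")
```

Raising the threshold in this sketch trades coverage for accuracy: fewer tumors receive a high-confidence call, but the calls that remain are more often right.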

In their next step, Gusev reports, the research team applied OncoNPC to 971 CUP tumors and predicted the origin with high confidence for 400 (41.2%) of them. The kicker here is that there is no ground truth, so the predictions had to be validated by looking at inherited germline risks of cancer and checking if the predictions lined up with what they were seeing in terms of patients’ pathology results, history, and genetic mutations.

“The use of the AI algorithm here is in aggregating all of the observed mutations into a single prediction, especially as many of them exhibit complex interactions that are otherwise hard to spot by eye,” he explains. “Our retrospective analysis showed that patients who received matching therapy had longer survival than those who did not, so we believe there is real potential here to improve patient outcomes by helping guide the therapy choice.” 

As estimated by the team, the OncoNPC predictions would enable approximately 2.2 times as many CUP patients (about 15% of patients) to be matched to approved targeted medicines rather than given standard chemotherapy. One important caveat is that it remains unknown what would have happened had these patients received the treatment consistent with the algorithm instead, Gusev notes. Ultimately, OncoNPC will need to be evaluated in a clinical trial.
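
Taken together, the two figures in that estimate imply a baseline share of patients who could be matched without the tool. The short calculation below backs it out. It assumes the 15% is the matched share with OncoNPC and the 2.2 is the ratio to the share without it; the article does not state the baseline directly, so the result is an inference and not a reported number.

```python
# Figures quoted in the article.
share_with_tool = 0.15  # about 15% of CUP patients matched with OncoNPC
fold_increase = 2.2     # "approximately 2.2 times as many"

# Assumption: fold_increase = share_with_tool / baseline share.
implied_baseline = share_with_tool / fold_increase

print(f"implied share matched without the tool: {implied_baseline:.1%}")
```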

To that end, the researchers are working with collaborators at Dana-Farber to investigate how OncoNPC could be inserted into the clinic and whether it would improve the diagnostic picture for patients. They hope to work out the logistics of providing information in a “meaningful and interpretable way” and move to the trial phase, Gusev says. Collaborators outside of the institution are also being sought as deployment sites for the computational tool, including sites where the resources to do a full diagnostic or pathological workup are not available.

Other Data Sources 

Now that investigators see that predictions are possible and seem to be clinically relevant, Gusev says, they are working other available pieces of diagnostic information, including pathology data from slides using hematoxylin and eosin staining, into the algorithm. “In principle these images also hold richer data that can go beyond cancer type and identify the right treatment within the cancer type, based on aspects of the tumor morphology.” 

That is the current direction of their research, he adds. Another possibility is to expand the data feed to include radiology images. However, “analyzing images is a much harder task than analyzing mutational data, so we wanted to take this one step at a time.”