June 2, 2022 | Innovative Practices Award–Extracting high-quality data from clinical records can be time-consuming and costly, particularly when manual abstractors are tasked with mining complex, unstructured, and lengthy documents. Word choice, format, and symptom documentation also vary among healthcare settings. This arduous process led the scientists at Guardant Health and Linguamatics, an IQVIA company, to launch a real-world evidence database called GuardantINFORM. The platform was named one of the 2022 Bio-IT World Innovative Practices award winners.
GuardantINFORM operates as a text mining and natural language processing (NLP) precision oncology platform that extracts, transforms, and normalizes relevant information from clinical documents and genomic data. It integrates information from various sources, including Guardant360–the first FDA-approved liquid biopsy genomic profiling test for patients with advanced cancer. Guardant performs over 100,000 tests per year, and Guardant360 sequences up to 500 genes. “This is being done so that doctors can understand how to treat patients with the right therapy,” explained Irfan Shah, Director of Business Development, Guardant Health, in his presentation last month at the Bio-IT World Conference & Expo. “But as Guardant is going through all this testing, we’re amassing a large database of genomic information. We know that there is a need to use this data and generate insights through real-world evidence.”
As one of the most extensive clinical-genomic databases with over 207,000 patients, GuardantINFORM integrates phenotypic and genomic multi-modal datasets from in-house genetic studies, health claims, survival and mortality documents, and unstructured pathology reports. “We’re trying to develop a leading clinical genomic database to support research across the entire pharma spectrum of drug development: pre-clinical, clinical, post-market, and post-approval studies,” said Shah.
“The data is locked away in the medical records,” added Paul Milligan, Director of Product Strategy, IQVIA. Much of that data is difficult to extract without NLP. Whether data are structured, unstructured, or require sophisticated processing to unlock, NLP seems to be the key.
Natural language processing is the preferred solution, the team says, over cloud-sourced or black-box models because of its flexibility, finely tuned extraction parameters, and ability to accommodate specific challenges in multi-modal data. For example, it takes critical information extracted from medical records–such as tumor staging, biomarker profile, tumor histology, smoking history, and performance status–and normalizes it into structured data. GuardantINFORM, with Linguamatics as its NLP solution, produces structured data with 98% to 100% precision for certain variables.
Approximately 5,000 breast and lung cancer patient documents were used for initial NLP extraction training. An off-the-shelf optical character recognition (OCR) tool digitized PDF files and converted images into a text-based format.
Guardant used proprietary NLP to identify and extract pertinent patient information. This processing includes analyzing words and their relationships to other words, then matching them to specific terminologies and ontologies. The extraction also accounts for varied responses in patient narratives.
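To make the idea of matching free text to a controlled vocabulary concrete, here is a minimal sketch in Python. It is not Guardant's or Linguamatics' pipeline, which is proprietary and far more sophisticated. The vocabulary, the function names `normalize_smoking_status` and `extract_record`, and the output labels are all hypothetical, chosen only to illustrate how varied phrasing in a clinical note might be normalized into one structured field.

```python
import re

# Hypothetical mini-vocabulary (an assumption for illustration only):
# each regex covers several surface phrasings of one normalized label.
# Entries are checked in order, so "never" and "former" phrasings are
# tested before the more general "current" phrasings.
SMOKING_TERMS = [
    (r"\b(never smoked|never smoker|non-?smoker|denies (?:tobacco|smoking))\b", "never"),
    (r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b", "former"),
    (r"\b(current smoker|smokes|active tobacco use)\b", "current"),
]


def normalize_smoking_status(note: str) -> str:
    """Map free-text phrasing in a note to a normalized smoking status."""
    text = note.lower()
    for pattern, label in SMOKING_TERMS:
        if re.search(pattern, text):
            return label
    # No known phrasing found; report that explicitly instead of guessing.
    return "unknown"


def extract_record(patient_id: str, note: str) -> dict:
    """Build a structured record from one unstructured note."""
    return {
        "patient_id": patient_id,
        "smoking_status": normalize_smoking_status(note),
    }


if __name__ == "__main__":
    notes = {
        "P001": "Patient is a former smoker, quit smoking in 2010.",
        "P002": "Pt denies tobacco use.",
        "P003": "She smokes one pack per day.",
        "P004": "No social history documented.",
    }
    for pid, note in notes.items():
        print(extract_record(pid, note))
```

A production system would also need to handle negation, context, and misspellings, which simple pattern matching like this does not.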
“This pipeline has been built up over many years and can do an end-to-end process of taking a source document and extracting and exporting structured data into any format you want,” said Milligan. It even leaves room to insert preferred tools for a customized extraction and to integrate them with downstream machine learning predictors or classifiers.
After the initial training round, the Guardant team reviewed the output for false positives and false negatives and adjusted the NLP query for continual improvement.
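False positives and false negatives are the raw material for precision and recall, the standard measures of extraction quality. The sketch below shows how those figures could be computed by comparing extracted labels against manually abstracted ones. The function name `precision_recall` and the sample labels are hypothetical and do not come from the GuardantINFORM project.

```python
def precision_recall(predicted, gold, positive):
    """Compute precision and recall for one label.

    predicted: labels produced by the extraction step
    gold:      labels assigned by a human abstractor, in the same order
    positive:  the label being evaluated, for example "former"
    """
    pairs = list(zip(predicted, gold))
    true_pos = sum(1 for p, g in pairs if p == positive and g == positive)
    false_pos = sum(1 for p, g in pairs if p == positive and g != positive)
    false_neg = sum(1 for p, g in pairs if p != positive and g == positive)

    # Guard against division by zero when a label never appears.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels for four documents, for illustration only.
    predicted = ["former", "never", "former", "current"]
    gold = ["former", "former", "never", "current"]
    p, r = precision_recall(predicted, gold, positive="former")
    print(f"precision={p:.2f} recall={r:.2f}")
```

Precision falls as false positives rise, and recall falls as false negatives rise, so tracking both after each adjustment shows whether a change to the query helped.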
In conjunction with Kinnate Biopharma, Guardant Health presented a smoking status study at the 2022 American Association for Cancer Research Annual Meeting. Kinnate Biopharma used GuardantINFORM to study patients with BRAF alterations.
“There was a key finding, which essentially describes that patients with BRAF Class II and III alterations have worse lung cancer and melanoma outcomes,” said Shah. Thanks to GuardantINFORM’s integration of phenotypic and genotypic data, the researchers discovered that patients with the same genetic alterations were also more likely to have a history of tobacco use. Information like this helps drug developers and practitioners track tumor progression and anti-cancer therapy effectiveness throughout a patient’s lifespan.
Remarkably, GuardantINFORM produced such critical findings after just phase I of the study. As a result, the Guardant and Linguamatics teams expect to see a more targeted, higher-quality approach to drug development as the platform expands.
“Guardant is actively testing patients with its diagnostic capability,” said Shah. “For all those patients with this unstructured medical information, we can ingest that as we develop and grow our platform. As we continue to run this project, we will pull in new information and improve the quality of the information we’re already capturing.”