Next-Gen Phenotyping Through AI Beats Clinician Diagnoses


By Allison Proffitt

January 7, 2019 | Diagnostics holds many opportunities for artificial intelligence, and one company has been training its machine learning algorithm to scan photographs to propose diagnoses for genetic syndromes. In a paper published today in Nature Medicine, FDNA, an artificial intelligence and precision medicine company, reports results from three years of research using deep learning to assess facial phenotypes to diagnose genetic disorders (DOI: 10.1038/s41591-018-0279-0).

Along with colleagues from Tel Aviv University; Rabin Medical Center and Schneider Children’s Medical Center in Israel; Rady Children’s Hospital in San Diego; A. I. du Pont Hospital for Children; Rheinische-Friedrich-Wilhelms University in Germany; University Hospital Magdeburg in Germany; and the University of California, San Diego, FDNA researchers evaluated how well the company’s facial image analysis framework, DeepGestalt, performed compared with experts doing phenotypic evaluation. The company has trained its system on a global library of facial images of more than 150,000 patients through Face2Gene, an app that is free to healthcare professionals.

New Clinical Practice

“The common clinical practice is to describe the patient’s phenotype in discrete clinical terms, and to use semantic similarity search engines for syndrome suggestions,” the authors explain. “This approach is subjective and depends greatly on the clinician’s phenotyping experience.”

Instead, “Next-generation phenotyping is really what’s needed to fulfil the promise of precision medicine,” Karen Gripp, FDNA’s Chief Medical Officer, told Diagnostics World. Gripp and her co-authors argue that an automated facial analysis framework could supplement the clinical workflow to achieve better syndrome prioritization and diagnosis. They report four experiments supporting the proposal.

In the first pair of experiments, the researchers tested how well DeepGestalt could distinguish one particular syndrome from several others. DeepGestalt assessed images from published datasets, and its findings were compared with clinician diagnoses on the same datasets for Cornelia de Lange syndrome (CdLS) and Angelman syndrome. In both cases, DeepGestalt outperformed clinicians. Previously published clinician diagnoses reached 75%-87% accuracy for CdLS and 71% accuracy for Angelman syndrome. DeepGestalt achieved an accuracy of 96.88%, sensitivity of 95.67%, and specificity of 100% for CdLS, and an accuracy of 92%, sensitivity of 80%, and specificity of 100% for Angelman syndrome.
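The three figures reported for each syndrome follow from the counts of a binary confusion matrix. The sketch below is a minimal Python illustration that assumes the standard definitions of accuracy, sensitivity and specificity. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: affected cases correctly flagged as the target syndrome
    fn: affected cases that were missed
    tn: other-syndrome cases correctly rejected
    fp: other-syndrome cases wrongly flagged as the target syndrome
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,  # share of true cases detected
        "specificity": tn / negatives,  # share of non-cases correctly rejected
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
example = binary_metrics(tp=8, fp=0, tn=15, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```

With these illustrative counts, accuracy is 23/25, sensitivity is 8/10 and specificity is 15/15, which shows how a perfect specificity can coexist with a noticeably lower sensitivity.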

In a third experiment, the task was to distinguish between molecular subtypes of a heterogeneous syndrome resulting from different mutations affecting the same pathway. For this experiment there were no clinician results to compare against; in previously published studies, researchers had presented 81 images of Noonan syndrome patients with mutations in PTPN11, SOS1, RAF1 or KRAS to two dysmorphologists and concluded that facial phenotype alone was insufficient to predict the genotype.

Using a different, much smaller, test set, the Specialized Gestalt Model (a truncated version of DeepGestalt limited to Noonan syndrome) was able to predict associated mutations with an accuracy of 64%.

“DeepGestalt out-performed clinicians in three initial experiments,” the authors write. “It can be optimized for specific phenotypic subsets, as shown on a Specialized Gestalt Model focused on identifying the correct facial phenotype of five genes related to Noonan syndrome, allowing geneticists to investigate phenotype–genotype correlations,” they continue.

In the final experiment, a multiclass Gestalt model trained on a large database of 17,106 images of diagnosed cases spanning 216 distinct syndromes was evaluated on two test sets: a clinical test set of 502 patient images of cases submitted and solved over time by clinical experts, and a publications test set of 329 patient images from the London Medical Databases.

In this experiment, results were delivered in the same form Face2Gene delivers to its physician users: a ranked list of likely syndromes, intended to narrow the 216 distinct syndromes to a much more manageable list. Here, DeepGestalt’s performance was evaluated by measuring the top-1, top-5 and top-10 accuracy: whether the correct syndrome was DeepGestalt’s top choice, among its top five, or among its top ten likely syndromes.
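Top-k accuracy can be computed directly from ranked lists like these. The following is a minimal Python sketch of the general metric, not FDNA’s implementation. The function name `top_k_accuracy` and the placeholder syndrome labels are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true label appears among the first k ranked suggestions."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    if not true_labels:
        raise ValueError("need at least one case")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked outputs for four cases (placeholder labels, best guess first).
ranked = [
    ["A", "B", "C"],  # truth "A" is ranked first
    ["B", "A", "C"],  # truth "A" is ranked second
    ["C", "B", "A"],  # truth "A" is ranked third
    ["B", "C", "A"],  # truth "D" is not in the list at all
]
truth = ["A", "A", "A", "D"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, truth, k):.2f}")
```

Because a hit at rank 1 also counts as a hit at every larger k, top-k accuracy can only stay the same or rise as k grows, so on any given test set top-10 accuracy is at least as high as top-1 accuracy.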

“Top-10 accuracy evaluation emphasizes the clinical use of DeepGestalt as a reference tool, where all top syndromes are considered,” the authors write.

DeepGestalt achieved a top-10 accuracy of 91% on the clinical test set and 89% on the publications test set; in other words, for about 90% of cases the correct syndrome was among DeepGestalt’s top ten suggestions. The top-5 and top-1 accuracies were 85.4% and 61.3% for the clinical test set, and 83.2% and 68.7% for the publications test set.

Training Sets

Robust machine learning algorithms depend on fairly large training sets of data. FDNA has gathered those data through publications and public datasets, but also by releasing the Face2Gene app to clinicians.

Last fall, FDNA CEO Dekel Gelbman described Face2Gene for Diagnostics World. The app is actually a platform hosting several cloud applications accessible through mobile apps and the web, he explained. To use the app, a doctor photographs a patient, and the photo is uploaded to FDNA’s cloud-installed technology. DeepGestalt scans the photo and gives the doctor a ranked list of compatible diagnoses drawn from the company’s proprietary database. The app also walks physicians through questions to add well-characterized phenotypic data, further refining the results. Physicians close the loop by adding a final, molecular diagnosis if possible.

The data input by physicians has been training the deep learning framework in what Gelbman calls a virtuous cycle. “We benefit from the data that’s uploaded, and they benefit from the technology. So it’s a symbiotic relationship. We continue to provide technology to returning users… and they understand that if they give us feedback, the technology actually learns from that feedback and gets better,” he explained.

Face2Gene use has been growing, Gelbman said, and he reported that many users have come to rely on the app as their clinical warehouse or medical record for the patients in the genetic department. But the data presented in today’s paper will hopefully attract users without face-to-face interactions with patients, Gripp said.

“I think this paper will expand the use of the technology, not necessarily so much with clinical geneticists because they already use this tool, but it will make it more understandable to people like sequencing labs, how this additional information can be used in a meaningful way for their work.”

For example, Gripp imagines exome sequencing labs that can use next-generation phenotyping to more quickly narrow their variant searches. “Use of the lab portal in Face2Gene can obtain this information about the patients. That can help them be more cost-effective and more meaningful in the analysis of the exome data,” she said.

Caveats And Next Steps

Alongside the promising results and potential new users, the authors highlight a few caveats and areas of concern. The underlying assumption in each of these experiments is that the patient does have a syndrome. “The results in this report… are not transferable to a test set including unaffected individuals,” the authors write.

They also warn of the risks of misuse of the technology. “Phenotypic data are sensitive patient information and discrimination based thereon is prevented by the Genetic Information Nondiscrimination Act. Unlike genomic data, facial images are easily accessible,” they note. “Effective monitoring strategies mitigating abuse may include the addition of a digital footprint through blockchain technologies to applications using DeepGestalt.”

But for Gripp, the importance of the published work is to spark imagination more than anything.

“Really the most important thing to take away from this is, it is one example of how AI can be applied to the phenotyping of a patient. That is a tool that will become more and more valuable. This is looking at the facial features, but there’s a lot more to the patient. You can imagine using a similar tool to look at radiographs or at photographs of the retina,” she said. “This is only one example. There are many other capabilities that AI approaches can bring to next-generation phenotyping.”