By Deborah Borfitz
April 6, 2020 | A decade-long collaboration between the Center for Microbiome Innovation at the University of California at San Diego (UCSD) and one of China’s primary national research institutions for renewable energy and green materials has led to the development of a Microbiome Search Engine (MSE) that efficiently harnesses big data to detect and diagnose disease. The precision, sensitivity and speed of the novel strategy outperform model-based approaches that lack information on many conditions and rely on biomarkers that can apply to multiple diseases, says XU Jian, director of the Single-Cell Center at the Qingdao Institute of Bioenergy and Bioprocess Technology (QIBEBT) of the Chinese Academy of Sciences.
The international team of researchers shares the mission of developing novel computational tools to study microbiomes and is working under the umbrella of the crowdsourced Earth Microbiome Project that “aims to sample microbiomes from every corner of our biosphere,” says Xu. The “general-purpose” MSE can be used not only for diagnosing disease but also for predicting ecological disasters, understanding energy-producing microbial processes, assessing the efficacy of cures, and developing personal products and environmental remediation solutions.
“Currently, microbiome-based detection usually targets a specific status [e.g., a certain disease] by using biomarkers and machine learning models for status classification,” explains Xu. “Since the range of detection is restricted within the given status types in the model, it is difficult to broadly assess whether the sample is healthy or not.” The other problem is that single markers for microbiome-related diseases are associated with at least two different diseases, misleading the classification, he adds.
MSE instead searches at the level of the entire microbiome, starting from samples that stand out as outliers, so researchers can identify the microbiome state associated with the disease across different cohorts or sequencing platforms, Xu continues. It’s a two-step process: first, a Microbiome Novelty Score (MNS) measures how much a person’s microbiome differs from a baseline database of healthy individuals; then, samples flagged as unhealthy are classified among multiple diseases based on their nearest compositional matches.
Xu and his research colleagues introduced the MNS concept in 2018 to evaluate the uniqueness of a query sample against a database. In a survey of over 100,000 samples, they found “the MNS of human microbiomes from healthy subjects was becoming stable,” he notes. “Thus, our search-based approach starts by detecting disease samples based on their outlier MNS compared to a comprehensive healthy-sample database, without making any a priori hypothesis about any diseases.”
For the ensuing multiple-disease classification step, compositional matches are made via their “whole-microbiome-level similarity that does not rely on any specific disease markers,” says Xu.
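The two steps Xu describes can be illustrated with a minimal Python sketch. It is not the team's implementation: it swaps in a plain Bray-Curtis similarity for whatever whole-microbiome similarity measure MSE actually uses, it simplifies the novelty score to one minus the best match against the healthy baseline, and the function names, toy samples, cutoff of 0.5 and neighbor count of 3 are all hypothetical values chosen for illustration.

```python
from collections import Counter


def bray_curtis_similarity(a, b):
    """Similarity in [0, 1] between two profiles (dict: taxon -> relative abundance).

    Stand-in measure; an assumption, not the similarity used by MSE.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def novelty_score(query, healthy_db):
    """Simplified novelty: 1 minus the best similarity to any healthy sample."""
    return 1.0 - max(bray_curtis_similarity(query, h) for h in healthy_db)


def search_based_diagnosis(query, healthy_db, disease_db, novelty_cutoff=0.5, k=3):
    """Step 1: flag outliers against the healthy baseline.
    Step 2: label flagged samples by majority vote of their k nearest matches.

    novelty_cutoff and k are hypothetical parameters.
    """
    score = novelty_score(query, healthy_db)
    if score <= novelty_cutoff:
        return "healthy", score
    ranked = sorted(
        disease_db,
        key=lambda item: bray_curtis_similarity(query, item[1]),
        reverse=True,
    )
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0], score


# Toy data, invented for illustration only.
healthy_db = [
    {"Bacteroides": 0.5, "Faecalibacterium": 0.3, "Roseburia": 0.2},
    {"Bacteroides": 0.4, "Faecalibacterium": 0.4, "Roseburia": 0.2},
]
disease_db = [
    ("disease_A", {"Escherichia": 0.7, "Bacteroides": 0.3}),
    ("disease_A", {"Escherichia": 0.6, "Bacteroides": 0.4}),
    ("disease_B", {"Fusobacterium": 0.8, "Roseburia": 0.2}),
    ("disease_B", {"Fusobacterium": 0.7, "Roseburia": 0.3}),
]

typical_query = {"Bacteroides": 0.45, "Faecalibacterium": 0.35, "Roseburia": 0.2}
outlier_query = {"Escherichia": 0.65, "Bacteroides": 0.35}

typical_label, typical_score = search_based_diagnosis(typical_query, healthy_db, disease_db)
outlier_label, outlier_score = search_based_diagnosis(outlier_query, healthy_db, disease_db)
print(typical_label, outlier_label)
```

Note that no disease is named in the first step: a sample is flagged only because it sits far from every healthy sample, and a disease label enters only afterward, through its nearest matches.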
As recently described in mSystems, the search-based diagnosis was tested with 3,113 fecal samples from different studies and cohorts and achieved an overall accuracy of 80%, which is “quite high for cross-study/cohort analysis,” Xu says. “Such detection for model-based approaches is currently not possible since models always need specific targeted diseases.
“We also tested MSE [on the 3,113 samples] with sequences produced by Illumina and Roche 454 platforms, and both of them reached approximately 80% overall accuracy,” he continues. This is “significantly higher than model-based approaches … [that are] heavily dependent on sequencing platform variation, as suggested by the preference for Illumina samples over 454 samples.”
The accuracy of predictions made by big-data mining tools like MSE depends on the establishment of a global healthy baseline database, Xu says. Samples are needed from across different geographic locations to ensure microbiome diversity.
Previous international efforts, including the Human Microbiome Project of the National Institutes of Health, mostly focused on patients with clear clinical symptoms that correspond to “obviously abnormal” microbiomes, says Xu. “However, to ensure that the general public can benefit from microbiome research breakthroughs, the ‘less obviously abnormal’ microbiomes will need to be distinguished.”
The QIBEBT-UCSD team is in fact developing such tools and databases, Xu says. “One of the hurdles has been the lack of national or global coordination in sampling healthy populations. The other hurdle is the cost associated with microbiome sampling and sequencing—e.g., 16S [rRNA gene] sequencing is of relatively low taxonomical resolution, while WGS [whole genome sequencing] are unable to tackle low-biomass microbiome samples.
“The third hurdle is the limitations associated with sequencing technologies,” continues Xu. DNA sequencing, for example, can identify and quantify microbes but does not reveal their functional state, which correlates more closely with the host’s disease state. “Finally, we need dedicated funding for ‘microbiome data centers’ that are responsible for the collection, curation, and interpretation of the enormous volume of historical and emerging microbiome data.
“These data are like photos of all the microbial species [that have] ever lived… whose appearance and disappearance can reveal the destiny of mankind and the biosphere,” Xu says. “Just like saving wild animals and plants, the efforts of saving these invisible friends of mankind should start with urgent, globally coordinated efforts.”
Multiple Search Engines
MSE can support numerous efforts in the emerging field of microbiome-based diagnostics, including “techniques that exploit non-human, microorganism-derived molecules in the diagnosis of cancers” and a microbiome-based prevention strategy for early childhood caries, the most prevalent disease in children worldwide, he says. The prerequisites are healthy baseline samples as well as diseased samples.
“Specifically, we will develop ecosystem-specific [e.g., human body, building, ocean, soil, air] and disease-specific [e.g., caries, periodontal diseases, diabetes, cancer] MSE to support search-based diagnosis at higher precision and sensitivity,” Xu says. Such searches could become “as standard and enabling for new microbiome studies as performing a BLAST [Basic Local Alignment Search Tool] against your new DNA sequence is today.”