Machine Learning to Identify the Attributes Influencing Zoonotic Virus Emergence
Amesh A. Adalja, MD, FACP, FACEP, FIDSA, November 26, 2018
Predicting the trajectories, timing, and likelihood of emerging viral zoonoses is a task fraught with anecdotes, assumptions, and extrapolations. Because preparing for and responding to these events matters so much, there is great value in being able to gauge risk accurately. What is generally lacking is a rigorous, evidence-based approach that not only explains zoonotic risk causally but also provides a framework that can be applied in a forward-looking, proactive manner to enhance situational awareness. A new study published in PLoS One is an effort with this aim.
Machine Learning Employed
In this study, Walker and colleagues used predictive machine learning to identify viral species that may be capable of human-to-human transmission but are not currently recognized as threats. All 224 viral species known to infect humans were characterized using 19 variables, including viral particle size, organ tropism, cellular location of replication, and presence of a viral envelope. Each virus was then labeled according to whether it is known to transmit between humans, and a gradient boosted regression model was trained to predict that label from the viral attributes. The resulting model accurately separated viruses by human transmissibility.
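To make the modeling step concrete, here is a minimal from-scratch sketch of gradient boosting for a binary label (known human-to-human transmission or not), assuming each virus has been encoded as a numeric feature vector. It is not the authors' pipeline: the study's software, its 19 variables, and its tuning are not reproduced. The four features, the eight toy rows, their labels, and all function names are hypothetical and chosen only for illustration. Each round fits a one-split regression stump to the negative gradient of log loss (the label minus the current predicted probability) and adds a damped copy of it to the running score.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_stump(X, residuals):
    """Least-squares regression stump: one feature, one threshold, two leaf values."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    if best is None:  # no usable split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    # Start from the log-odds of the base rate (clamped to avoid log(0)).
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p / (1 - p))
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss with respect to the score is y - sigmoid(score).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * predict_stump(stump, row) for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(predict_stump(s, row) for s in stumps))

# Hypothetical toy rows: [enveloped (0/1), diameter_nm, genome_segments, primate_host (0/1)]
X = [
    [0, 30, 1, 1], [0, 60, 1, 0], [0, 45, 2, 1], [0, 70, 1, 1],
    [1, 120, 1, 0], [1, 150, 3, 0], [1, 90, 8, 0], [1, 200, 2, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = transmits between humans

model = fit_boosted_stumps(X, y)
for row, label in zip(X, y):
    print(row, label, round(predict_proba(model, row), 3))
```

Production implementations use deeper trees, Newton-style leaf values, and subsampling; the stump version keeps only the core idea that each new weak learner corrects the errors left by the ensemble so far.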
Several Factors Identified
The authors identified several attributes that may predict whether a zoonotic virus is capable of transmitting between humans:
- Non-human primate host
- Viral presence in the liver, central nervous system, and/or respiratory tract
- Absence of a lipid envelope
- Small viral size (<75 nm in diameter)
- Limited genome segmentation (≤2 segments)
All of these factors are biologically plausible except presence in the liver and central nervous system, which, as the authors note, may reflect how infrequently these tissues are sampled.
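The five attributes listed above can be written down as explicit feature rules, which makes the cutoffs unambiguous (strictly under 75 nm; at most two segments). The sketch below assumes a hypothetical record type, VirusTraits, whose field names are invented for illustration. A boosted model does not simply add these flags up; it learns thresholds and interactions from the data, so a count of matching attributes is only a descriptive summary, not the study's risk score.

```python
from dataclasses import dataclass, field

@dataclass
class VirusTraits:
    # Hypothetical record layout; field names are illustrative, not the study's variables.
    name: str
    nonhuman_primate_host: bool
    tissues: set = field(default_factory=set)  # e.g. {"liver", "cns", "respiratory"}
    enveloped: bool = True
    diameter_nm: float = 100.0
    genome_segments: int = 1

def attribute_flags(v: VirusTraits) -> dict:
    """Map a virus record onto the five attributes reported in the article."""
    return {
        "nonhuman_primate_host": v.nonhuman_primate_host,
        "liver_cns_or_respiratory": bool(v.tissues & {"liver", "cns", "respiratory"}),
        "no_lipid_envelope": not v.enveloped,
        "diameter_under_75nm": v.diameter_nm < 75,
        "at_most_2_segments": v.genome_segments <= 2,
    }

example = VirusTraits("hypothetical_virus_x", nonhuman_primate_host=True,
                      tissues={"respiratory"}, enveloped=False,
                      diameter_nm=30, genome_segments=1)
flags = attribute_flags(example)
print(flags, sum(flags.values()))
```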
Mapping to Viral Groups
Using this model, Walker and colleagues evaluated the 85 viral species that can infect humans but are not known to spread between humans. Of these 85, 47 were predicted to have greater human-to-human transmission potential than Crimean-Congo hemorrhagic fever virus, which is widely regarded as a threat and is a target for vaccine development.
Carnivore amdoparvovirus, Hendra virus, Cardiovirus A, Rosavirus A, HTLV-3, HTLV-4, and simian foamy virus had the highest predicted potential for human-to-human transmission.
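The comparison against Crimean-Congo hemorrhagic fever virus amounts to ranking each candidate's predicted probability against a reference virus's score. A minimal sketch, assuming the model's outputs are available as a name-to-probability mapping; the function name, virus names, and probabilities below are placeholders, not values from the study.

```python
def rank_above_reference(predicted: dict, reference: str) -> list:
    """Return candidates whose predicted probability exceeds the reference's, highest first."""
    threshold = predicted[reference]
    above = [name for name, p in predicted.items() if name != reference and p > threshold]
    return sorted(above, key=lambda name: predicted[name], reverse=True)

# Placeholder scores for illustration only.
predicted = {"virus_a": 0.81, "virus_b": 0.42, "virus_c": 0.67, "reference_virus": 0.55}
ranked = rank_above_reference(predicted, "reference_virus")
print(len(ranked), ranked)  # prints: 2 ['virus_a', 'virus_c']
```

Counting the returned list gives the "how many exceed the reference" figure, and its head gives the highest-ranked candidates.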
Active, Inductive, Attribute-based
This study is an important illustration of how machine learning focused on viral attributes can bring rigor to viral prediction. The approach is inductive and moves away from list-based approaches, which stultify thinking and always leave the field facing unanticipated viral emergence events, a point my colleagues and I made in our pandemic pathogens report. Indeed, of the viral species identified above as highest risk, only Hendra virus currently garners attention from the health security and pandemic preparedness field.
Hopefully, future refinements of this type of analysis will move from the species level to individual members of viral groups. As the authors note, the coarser level of analysis was likely responsible for misclassifications in viral families in which only a few members transmit between humans while most do not (eg, flaviviruses).
Walker JW, Han BA, Ott IM, Drake JM. Transmissibility of emerging viral zoonoses. PLoS One. 2018. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206926. Accessed November 21, 2018.