Imagine if one day in the future, doctors could diagnose throat cancer, Alzheimer’s, depression or other diseases based on the sound of a patient’s voice. To help make that a reality, Washington University School of Medicine in St. Louis is joining the National Institutes of Health (NIH) Bridge2AI program, an estimated $130 million initiative intended to expand the use of artificial intelligence (AI) in biomedical and behavioral research.
One of the first projects involves building a database of diverse human voices and harnessing the tools of AI and machine learning to train computers to identify diseases based on characteristics of the human voice. This effort — called Voice as a Biomarker of Health — will bring together researchers from 12 institutions in North America, including Washington University, to build the database, which will be ethically sourced and designed to protect patient privacy.
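The underlying idea — turning voice recordings into numeric features and training a model to flag disease — can be illustrated with a minimal sketch. Everything below is hypothetical: the synthetic "healthy" and "affected" waveforms, the three toy acoustic features, and the simple logistic-regression classifier are stand-ins for the far richer descriptors and models the project would actually use, not anything from the Bridge2AI effort itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def voice_features(waveform, sr=16_000):
    """Toy acoustic features: energy, zero-crossing rate, spectral centroid.
    Real studies use much richer descriptors (e.g. MFCCs, jitter, shimmer)."""
    energy = float(np.mean(waveform ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(waveform))) > 0))
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1 / sr)
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    return np.array([energy, zcr, centroid])

# Synthetic stand-in recordings: "healthy" voices as clean low-frequency
# tones, "affected" voices as noisier, higher-frequency signals.
t = np.linspace(0, 1, 16_000, endpoint=False)
healthy = [np.sin(2 * np.pi * 120 * t) + 0.05 * rng.standard_normal(t.size)
           for _ in range(20)]
affected = [np.sin(2 * np.pi * 300 * t) + 0.40 * rng.standard_normal(t.size)
            for _ in range(20)]

X = np.array([voice_features(w) for w in healthy + affected])
y = np.array([0] * 20 + [1] * 20)

# Standardize the features, then fit logistic regression by gradient descent.
X = (X - X.mean(axis=0)) / X.std(axis=0)
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * float(np.mean(p - y))

accuracy = float(np.mean(((X @ w + b) > 0) == y))
```

The two synthetic classes are trivially separable, so the classifier learns them easily; the hard part in practice — and the point of the project's large, diverse database — is that real disease signatures in voice are subtle and vary across speakers.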
“There is evidence that well-designed computer models can predict who has dementia or cancer, for example, based on voice recordings, which would then supplement additional methods of diagnosis,” said Philip R. O. Payne, PhD, the Janet and Bernard Becker Professor, chief data scientist and director of the Institute for Informatics. “We also will be leading new efforts in education and workforce development in the area of AI and its applications in biomedicine. As part of that, this project will help define a whole new way of producing these types of complex data sets and sharing them — in ethical ways that safeguard privacy — with a broad variety of scientists.”
Payne is leading the project at Washington University and collaborating with investigators across North America, including teams at the University of South Florida in Tampa and at Weill Cornell Medicine in New York, which are leading the project nationally.
In addition to building this unique data set, Washington University will co-lead a skills and workforce development core for the national project. The core, co-led with Oregon Health & Science University, will focus on training investigators — including scientists from academic institutions, industry, government and even citizen scientists — from all over the country to access and use the voice data for research. According to Payne, any researcher seeking to learn how to use the data set will receive an individualized education plan, with much of the learning delivered in a virtual format and then supported with one-on-one mentoring.
“Oftentimes, citizen scientists, or people we would include in that category, are patients who have the diseases themselves or specialists in private practices who help patients with specific conditions, such as people who stutter and the speech pathologists who work with them,” Payne said. “We are developing outreach efforts to engage with people in the community to participate in this research and also to help us gather a rich and diverse data set of human voices. This is vital to building an ethical and representative data set that eliminates potential bias.”
Based on the existing literature and ongoing research, the research team has identified five disease categories in which voice changes are associated with illness and for which there is a pressing need to improve early diagnosis. Data collected for this project will center on the following disease categories:
- Voice disorders (laryngeal cancers, vocal fold paralysis, benign laryngeal lesions).
- Neurological and neurodegenerative disorders (Alzheimer’s, Parkinson’s, stroke, amyotrophic lateral sclerosis).
- Mood and psychiatric disorders (depression, schizophrenia, bipolar disorders).
- Respiratory disorders (pneumonia, chronic obstructive pulmonary disease, heart failure).
- Pediatric voice and speech disorders (speech and language delays, autism).
Although preliminary work with voice data has been promising, efforts to integrate voice as a biomarker into clinical practice have been limited by small data sets, ethical concerns around data ownership and privacy, and bias and lack of diversity in the data. To address these limitations, the Voice as a Biomarker of Health project is creating a large, high-quality, multi-institutional and diverse voice database that is linked to identity-protected, de-identified biomarkers from other data sources, such as demographics, medical imaging and genomics. Federated learning technology — an AI framework that allows machine learning models to be trained on data without the data ever leaving its source — will be deployed across multiple research centers by the French-American AI biotech startup Owkin to demonstrate that cross-center AI research can be conducted while preserving the privacy and security of sensitive voice data.
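The federated learning idea described above can be sketched in a few lines. This is a minimal illustration of federated averaging under invented assumptions — three hypothetical "centers" with synthetic private data and a plain logistic-regression model — not Owkin's actual platform or the project's architecture; the key property shown is that only model weights travel between centers and coordinator, never the raw data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: three research centers, each holding private feature
# vectors and labels that never leave the center.
def make_center_data(n, shift):
    X = rng.standard_normal((n, 4)) + shift
    true_w = np.array([1.0, -2.0, 0.5, 0.0])
    y = (X @ true_w + 0.1 * rng.standard_normal(n) > 0).astype(float)
    return X, y

centers = [make_center_data(100, s) for s in (0.0, 0.3, -0.3)]

def local_update(w, X, y, lr=0.1, steps=20):
    """One round of local training: logistic-regression gradient steps
    computed entirely on a center's own private data."""
    w = w.copy()
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Federated averaging: each round, every center trains locally, then the
# coordinator averages the returned weights. The raw data stays put.
w_global = np.zeros(4)
for _round in range(10):
    local_ws = [local_update(w_global, X, y) for X, y in centers]
    w_global = np.mean(local_ws, axis=0)

# Each center evaluates the shared model on its own data.
accuracies = [float(np.mean(((X @ w_global) > 0) == y)) for X, y in centers]
```

In a real deployment the weight exchange itself would typically be further protected (e.g. with secure aggregation), since model updates can leak information about the underlying data; that machinery is omitted here.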
“Using voice to help diagnose disease becomes particularly interesting when you think about the proliferation of virtual care and telemedicine during the pandemic,” Payne said. “Doctors have gotten more accustomed to seeing people at a distance even though they can’t examine the patients physically. But what if, during a virtual visit, there is an AI algorithm that can identify high blood pressure, for example, based on the patient’s voice? We’re not there yet, but this could potentially make future telemedicine more useful, with higher quality, better safety and improved health outcomes, especially for people who live far away from health-care providers.”
Supported by AI experts, bioethicists and social scientists, the project aims to transform the fundamental understanding of diseases and introduce a new method of diagnosing and treating diseases into clinical settings. Because recordings of the human voice are low-cost, easy to store and readily available, diagnosing diseases through the voice using AI could be a transformative step in precision medicine and health-care accessibility, Payne added.