Interview with Artificial Intelligence and Speech Recognition Expert Prof. Frank Rudzicz

Dr. Frank Rudzicz is a scientist at the Toronto Rehabilitation Institute-UHN, where he is applying natural language processing and machine learning to various tasks in healthcare, including detecting dementia from speech. His work in natural language and speech processing is multidisciplinary and involves machine learning, human-computer interaction, artificial intelligence, computer vision, speech-language pathology, rehabilitation engineering, digital signal processing and linguistics.

Dr. Rudzicz’s long-term aim is to produce language software that can improve the quality of life of individuals with cognitive or physical disabilities. This research augments existing techniques by refining the statistical relationships between neural, articulatory and acoustic levels of speech within modern automatic speech recognition systems. These augmented speech systems can be used for several applications, including: i) automated human-computer dialog systems that include speech synthesis to help individuals complete daily tasks; and ii) prosthetic communication aids for human-human interaction that modify the acoustics of hard-to-understand speech to make it more understandable.

The following has been paraphrased from an interview with Dr. Frank Rudzicz on May 15th, 2018.



What are your thoughts on the recent Google Duplex demo?

When I saw the demo I was amazed. My first reaction was, ‘Why should anyone working in dialogue processing even bother anymore?’ But, thinking about it a bit, there is a lot of work behind the scenes that is not clear from the demo, particularly how much of what was shown was learnable (that is, how much came strictly from data) and how much was hand-crafted. One of the parts that stuck out to me was when the assistant paused and said ‘Mm-mmm’; to my ear that was a little too perfect. Also, given it was a demo, they picked the conversations carefully and didn’t show the failures or some of the more difficult cases.

Is it possible for us to perfect this kind of AI technology (known as Natural Language Processing (NLP)) and how far do you think we are from that?

I’m pretty optimistic about the potential of machine learning and artificial intelligence in the long term, but it’s really hard to predict when that potential will be realized. People have been heralding the imminent arrival of AI since the 1950s, and every time we researchers think we are close, new challenges emerge that bring us down to earth.

There is a lot of new work happening in NLP, and tasks that were once thought impossible, or not even considered, are now being solved efficiently and accurately. But defining what we want out of these dialogue systems is challenging. The people who work in NLP have gone back and forth over the past couple of years, from models that learn to converse solely from data, so-called end-to-end systems, back to more scripted interactions built around specific tasks to perform.

It is just a matter of time, though, until we get systems that can reliably pass the Turing test (machine intelligence indistinguishable from a human) and can act in general-purpose tasks. But there are still so many unknown unknowns in solving human dialogue that we have to be careful in saying when it will happen, because we don’t even know what our objective is.

When it does happen, what impact do you see it having on society?

Again, it’s really hard to see into the future; any number of calamities could befall our civilization and halt our progress before we get there. But I am pretty optimistic that we will get to a point where machines are indistinguishable from humans in terms of their intelligence, and maybe even surpass us. But to say it will happen by 2040, which a lot of futurists claim, is a bit much.

But in the next few years we will likely iron out the problems in things like Google’s assistant, enabling it to very reliably book appointments or automate customer service. Beyond that, I think there is good evidence that within the next five years this technology will be used in healthcare as a sort of ‘e-doctor’ that can ask and answer questions about your symptoms or your mood.

All of those things have science-fiction-like consequences for society. We might see more people getting laid off from work, from customer service people all the way up to primary caregivers in medicine. It’s too early to tell what effects all of that might have, but normally widespread unemployment doesn’t go over very well.

Are you generally optimistic or pessimistic?

I’m biased because I love AI and I think eventually this technology will exceed our current expectations. I’m also generally optimistic about humans; we are a very adaptable species that has faced and overcome tons of challenges in our past. But humans can also be erratic, irrational and emotional, so there is always a concern that people will turn AI into a political hot potato and have difficulties incorporating it into our daily lives, in particular if AI is controlled by a small group of technological elites.

Carl Sagan had a wonderful quote on this, “We’ve arranged a global civilization in which most crucial elements profoundly depend on science and technology. We have also arranged things so that almost no one understands science and technology. This is a prescription for disaster.”

Do you believe that language programs the mind?

My experience is that people who are very proud of their native language are more likely to claim that language has an effect on your thoughts. I think language is a response to things that are already built into the human mind, a lot of which are hard to quantify or even identify; they are based on things like social hierarchies and relationships. We even understand and conceive of the world in terms of objects (nouns) and relationships between objects (adjectives and verbs). These attributes are there across all languages.

How well can you understand the inner workings of someone’s mind by analyzing their speech?

I think that language and thought are really intimately tied. A lot of cognitive and neuropsychiatric aspects of thought are revealed through how you speak. From a person’s voice alone most people can tell if someone is angry or nervous, but there are a ton of subtle things that are not perceivable by the human ear that are also connected to your thoughts. In our work we measure thousands of aspects of speech and language, and many of them go beyond human hearing. We certainly can’t objectively measure them, but machines can, and those features are often highly correlated with one’s cognitive status and can indicate whether someone has Alzheimer’s, dementia, depression or anxiety. Despite the fact that we feel like a lot of our language is deliberate, much of it is actually subconscious or automatic behavior and is tied to physical parts of the brain that are spatially very close to the memory or executive decision-making centers.

How is this convergence of AI technologies allowing us to better diagnose and treat diseases like dementia and Alzheimer’s?

In Alzheimer’s or dementia, there are a bunch of linguistic features that are very highly correlated with memory loss. For example, difficulties with short-term memory limit the complexity of the sentences you can say. These features can be integrated with other data points to give an indication of a person’s mental health. We can also use social media data to assess how people feel about a particular topic or how they are doing emotionally.

Do you see these technologies being used outside of the clinic as well?

We are talking with some large consumer electronics companies that are interested in using this technology to monitor people’s health. These are products that people might be able to buy within the next few years that will periodically record and analyze a person’s voice. It does open up a lot of questions, though, about how comfortable people are with a microphone switching on occasionally and collecting data, and there are a lot of ethical concerns that go into this. The technology is basically there, but society hasn’t really thought about how it affects privacy, though there are efforts being made to look at the ethics of this.



