Making IVRs Sound More Natural: “Adapt-to-Me”


Interactive Voice Response (IVR) systems attempt something magical: mimicking how humans speak with each other. Since we use language every hour of every day, it is part of our identities, our culture, ourselves. When we hear something that doesn’t conform to how we expect language to sound, it immediately catches our attention, distracts us, and slows down the conversation. That’s the case when we hear a funny accent, a particularly charming voice, or someone pausing where we don’t expect it. And it is the case with IVR systems, as technology is still far from accurately simulating a human being (which shouldn’t even be necessary to get a task done!). Furthermore, IVR makers want to avoid making their callers believe they are talking to a human, as that is a recipe for frustration when the counterpart turns out to be a script, tailored to a particular self-service task and nothing more. However, there are things that can be done to make an IVR sound less robotic and more in tune with how “I” as the caller speak. In short: you can make the IVR adapt to me.

“Adapt-to-me” is a collection of techniques in Aspect CXP that improves the way your IVR sounds:

Adapt to Preferences: Let me pick the language, let me choose whether I want to interact via speech or DTMF touch-tone, etc.

Aspect CXP offers the Layer concept, which was explained in the previous post of this series: Making IVRs Act More Natural: Personalizing Your IVR through CRM Integration and Dynamic Menus. Using layers, the same application logic can be presented in different ways according to runtime decisions or customer preferences such as language, input mode, the “persona” the caller prefers (e.g. male vs. female voice), and more.
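To make the layering idea concrete, here is a minimal sketch in Python. It is not the Aspect CXP API: the layer names, prompt IDs, and the `resolve_prompt` function are all made up for illustration. It only shows the principle that one set of prompt IDs (the application logic) can be rendered differently depending on which layers are active for a caller.

```python
# Hypothetical sketch of layered prompt lookup; NOT the Aspect CXP API.
# The application logic only ever refers to prompt IDs such as "greeting".
BASE_PROMPTS = {
    "greeting": "Welcome to Prime Telecom.",
    "menu": "You can say billing or support.",
}

# Each layer overrides only the prompts it cares about (names are assumptions).
LAYERS = {
    "lang:es": {"greeting": "Bienvenido a Prime Telecom."},
    "mode:dtmf": {"menu": "Press 1 for billing, or 2 for support."},
}


def resolve_prompt(prompt_id, active_layers):
    """Return the text from the last active layer that defines prompt_id,
    falling back to the base text if no active layer overrides it."""
    text = BASE_PROMPTS[prompt_id]
    for layer in active_layers:
        text = LAYERS.get(layer, {}).get(prompt_id, text)
    return text
```

A caller who prefers touch-tone input would get `resolve_prompt("menu", ["mode:dtmf"])`, while the call flow itself stays unchanged.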

Adapt to Style: Mimic my way of speaking numbers, or use the words and synonyms I use instead of the company’s techno-speak.

There is something unique in how we memorize numbers, like our own telephone number. Some remember them digit by digit, some use single digits for some parts and number blocks for others, and some use number blocks throughout. For example, one caller might say “four-oh-seven-five-six-seven, double-oh, thirty-seven” while another says “four-oh-seven, five-six-seven, zero-zero-three-seven.”

Many of us have experienced the phenomenon of not recognizing our own phone number when it is spoken differently from how we have memorized it and would say it ourselves.

Aspect CXP has a feature that lets you build a speech recognition grammar that not only recognizes different ways of saying numbers, but also memorizes exactly how a number was spoken by the caller.
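The idea of remembering a caller's grouping and replaying it can be sketched in a few lines of Python. This is an illustration only and does not reflect how CXP grammars or pronunciation values are actually implemented. The `speak_digits` function and its parameters are hypothetical, and the sketch handles only digit-by-digit speech with pauses, not number blocks such as “thirty-seven”.

```python
# Hypothetical sketch; NOT how Aspect CXP stores pronunciation values.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_digits(digits, chunk_sizes, zero_word="zero"):
    """Spell out `digits` one at a time, with a pause (comma) between chunks.

    chunk_sizes: the grouping the caller used, e.g. [3, 3, 4].
    zero_word:   the caller's own word for 0, e.g. "oh" or "zero".
    """
    if sum(chunk_sizes) != len(digits):
        raise ValueError("chunk sizes must add up to the number of digits")
    chunks, pos = [], 0
    for size in chunk_sizes:
        words = [zero_word if d == "0" else DIGIT_WORDS[int(d)]
                 for d in digits[pos:pos + size]]
        chunks.append("-".join(words))
        pos += size
    return ", ".join(chunks)


# A caller who said their number in 3-3-4 blocks and says "oh" for zero:
print(speak_digits("4075670037", [3, 3, 4], zero_word="oh"))
# four-oh-seven, five-six-seven, oh-oh-three-seven
```

Storing the chunk sizes and the zero word alongside the recognized number is enough to read it back in the caller's own rhythm in this simplified model.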

In addition, it can make sure that if the caller uses a certain word in their response (e.g. “Internet”), the IVR uses the same word when speaking back, rather than an internal name for the same thing, such as “DSL”.
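As a rough illustration of echoing the caller's word choice, the Python sketch below keeps the caller's own word next to the internal name it maps to. The synonym table, the function names, and the prompt wording are assumptions for this example, not part of CXP, and the substring matching is deliberately crude compared to a real speech grammar.

```python
# Hypothetical synonym table: caller wording -> internal product name.
SYNONYMS = {
    "internet": "DSL",
    "broadband": "DSL",
    "dsl": "DSL",
}


def interpret(utterance):
    """Return (internal_name, caller_word) for the first known synonym
    found in the utterance, or (None, None) if nothing matches."""
    lowered = utterance.lower()
    for word, internal in SYNONYMS.items():
        if word in lowered:
            return internal, word
    return None, None


def confirm(utterance):
    """Build a confirmation prompt that reuses the caller's own word."""
    internal, caller_word = interpret(utterance)
    if internal is None:
        return "Sorry, which service are you calling about?"
    # The back end would be queried with `internal`;
    # the caller only ever hears their own word.
    return f"Okay, let's look at your {caller_word} service."
```

With this sketch, `confirm("My Internet is down")` answers with “internet”, while the application logic continues to work with the internal name “DSL”.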

If you’re interested in learning how to implement this, have a look at the section “About Pronunciation Values” in the CXP product documentation.

Adapt to Experience: Talk to me differently, based on how experienced I am with your system.

Again using CXP’s Layer concept, the IVR can change the verbosity of the system messages, the amount of contextual help it provides, whether it allows me to barge in on a message (i.e. interrupt it while it is still playing), how soon it offers me a transfer to a live agent, and even the speed of the messages, all based on whether I am calling for the first time ever or am a frequent user of the IVR system. The history of previous interactions can be stored either in the company’s CRM system or in Aspect CXP’s built-in Continuity Server with its Context Cookies, which we will cover in a later part of this series.
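A simple way to picture this is a lookup from call history to a bundle of prompt settings. The Python sketch below is hypothetical: the `PromptProfile` fields, the concrete values, and the threshold of three previous calls are assumptions chosen for illustration, and a real deployment would read the history from the CRM or from a context store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptProfile:
    """Hypothetical bundle of settings that vary with caller experience."""
    verbose: bool                   # play long, explanatory messages?
    barge_in: bool                  # may the caller interrupt a message?
    offer_agent_after_errors: int   # errors before offering a live agent
    speech_rate: float              # 1.0 = normal playback speed


# Assumed values for illustration only.
NOVICE = PromptProfile(verbose=True, barge_in=False,
                       offer_agent_after_errors=1, speech_rate=1.0)
EXPERT = PromptProfile(verbose=False, barge_in=True,
                       offer_agent_after_errors=3, speech_rate=1.15)


def profile_for(previous_calls, expert_threshold=3):
    """Pick a profile from the number of earlier calls on record."""
    return EXPERT if previous_calls >= expert_threshold else NOVICE
```

A first-time caller (`profile_for(0)`) hears full messages and is offered an agent after the first error, while a frequent caller gets shorter, interruptible prompts.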

Finally, the quality of your IVR experience depends to a large extent on the quality of the voice talent you use to record your messages, and the quality of the recordings themselves. Recording messages for an IVR system is an art and a science, and Aspect has been partnering with GM Voices for over a decade to provide high-quality voice recordings with a vast variety of speaker personas and languages.

Prerecorded messages should even be used when speaking back dynamic content such as dates, times, currency amounts, telephone numbers, etc. Using the same voice that you use for all your other messages will increase the naturalness of the overall listening experience. However, speaking back a number such as 451-321-7777 requires more than just recording one audio file for each digit. What you do NOT want, after all, is for your output to sound robotic, as it would if you used the same recording for every “7” in the number above. Instead, you would want a different intonation for each occurrence of “7”, with the voice going up after the second “7” and going down after the fourth. A formatting algorithm would then need to analyze your number and produce a sequence of audio file names that picks the right recording variation for each occurrence of a digit.
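Such a formatting algorithm can be sketched as follows. This is a simplified Python illustration, not one of the sample algorithms shipped with CXP. The file naming scheme (`7_rise.wav` and so on), the three intonation variants, and the rule of splitting four-digit blocks into two pairs are assumptions made for this example.

```python
def phone_number_to_audio(number):
    """Map a phone number like "451-321-7777" to a list of audio file names.

    Assumed recordings per digit: "<d>_flat.wav" (mid-group),
    "<d>_rise.wav" (end of a group, more to come) and
    "<d>_fall.wav" (very last digit of the number).
    """
    # Split on hyphens; break four-digit blocks into two pairs so that
    # "7777" is spoken as "seven-seven (up), seven-seven (down)".
    groups = []
    for part in number.split("-"):
        if len(part) == 4:
            groups += [part[:2], part[2:]]
        else:
            groups.append(part)

    files = []
    for g_index, group in enumerate(groups):
        is_last_group = g_index == len(groups) - 1
        for d_index, digit in enumerate(group):
            if d_index < len(group) - 1:
                contour = "flat"   # inside a group
            elif is_last_group:
                contour = "fall"   # end of the whole number
            else:
                contour = "rise"   # end of a group, number continues
            files.append(f"{digit}_{contour}.wav")
    return files
```

For 451-321-7777, the four sevens come out as `7_flat.wav`, `7_rise.wav`, `7_flat.wav`, `7_fall.wav`, so the voice goes up after the second “7” and down after the fourth.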

Aspect CXP features a “Formatting Bus” architecture to achieve this, which allows you to build your own “Text-to-Audio” (TTA) algorithms and add them to CXP Server. Some sample algorithms are provided with the Prime Telecom demo application. For coverage of more data types and languages, Aspect Professional Services can provide assistance tailored to your needs.

Other posts in this series:
Innovating with IVR
Making IVRs Act More Natural: Personalizing Your IVR through CRM Integration and Dynamic Menus
UP NEXT: Innovating with IVR: Let’s Get Visual


Tobias Goebel

Tobias is Director of Emerging Technologies at Aspect. He has over 14 years of experience in customer care technology and the contact center industry with roles spanning engineering, consulting, pre-sales engineering, program and product management, and product marketing. As part of Aspect's product management and marketing team today, he works on defining the future of the mobile customer experience, bringing together channels such as mobile apps, messaging, voice, and social. He is a frequent speaker and blogger on topics around customer service and, more recently, the (re-)emerging chatbot, NLP, and AI technologies. Tobias holds degrees in Computational Linguistics, Phonetics, and Computer Science from the universities of Bonn, Germany and Edinburgh, UK.

2 thoughts on “Making IVRs Sound More Natural: “Adapt-to-Me”

  1. For the Pronunciation Values, the grammars would have to be written specifically to handle that scenario, right?
