Advanced Speech Recognition Tuning in Via CX


When callers talk to an IVR application, their experience depends heavily on the quality of automatic speech recognition (ASR). Nothing is more off-putting than hearing prompts like “Sorry, I didn’t get that” over and over, which is why tuning speech recognition performance matters so much. Note that meaningful tuning is only possible after go-live, once you have data on how real callers actually interact with the IVR service.

Via CX supports IVR developers with a range of tools for optimizing speech recognition performance. First, there are reports that indicate the speech recognition success metrics for each input state in your service. For example, the per-input-state transition statistics report shows how many callers hung up in each input state, how many were understood right away versus how many encountered a non-understanding (NoMatch) event, and more. This helps you identify the input states where callers struggle, gives a first indication of why, and serves as a starting point for deeper inquiry.
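If you copy the per-state counts out of such a report, ranking states by how often they fail is simple arithmetic. The sketch below is a minimal illustration under stated assumptions: the `InputStateStats` structure, its field names, and the `min_entries` cut-off are hypothetical and are not a Via CX export format or API, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class InputStateStats:
    """Per-input-state counts as they might be copied out of a transition
    statistics report (hypothetical structure, not a Via CX export format)."""
    name: str
    entries: int         # times callers entered this input state
    hangups: int         # callers who hung up while in this state
    nomatch_events: int  # NoMatch events raised in this state

    @property
    def hangup_rate(self) -> float:
        return self.hangups / self.entries if self.entries else 0.0

    @property
    def nomatch_rate(self) -> float:
        # NoMatch events per entry; can exceed 1.0 if callers retry several times
        return self.nomatch_events / self.entries if self.entries else 0.0


def rank_problem_states(stats, min_entries=50):
    """Return states with enough traffic to be meaningful, sorted so the
    highest NoMatch rate (then highest hang-up rate) comes first."""
    busy = [s for s in stats if s.entries >= min_entries]
    return sorted(busy, key=lambda s: (s.nomatch_rate, s.hangup_rate), reverse=True)


if __name__ == "__main__":
    # Made-up numbers for illustration only
    stats = [
        InputStateStats("MainMenu", entries=1200, hangups=40, nomatch_events=180),
        InputStateStats("AccountNumber", entries=800, hangups=96, nomatch_events=260),
        InputStateStats("ConfirmPayment", entries=30, hangups=1, nomatch_events=12),
    ]
    for s in rank_problem_states(stats):
        print(f"{s.name}: NoMatch {s.nomatch_rate:.1%}, hang-ups {s.hangup_rate:.1%}")
```

With these illustrative numbers, AccountNumber comes first (32.5% NoMatch, 12.0% hang-ups), MainMenu second, and ConfirmPayment is left out because 30 entries are too few to draw conclusions from.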

Once you have identified an under-performing input state, you can move on to the Utterances by Input States report, which gives a detailed list of the transcribed utterances that the speech recognition system picked up in the selected input state. This helps identify problems with the voice grammar: utterances that are never matched, utterances that work well, and utterances that need tuning.
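Sorting a list of transcribed utterances into those three groups can be sketched as follows. This assumes a hypothetical export of `(transcript, matched)` pairs, where `matched` says whether the grammar accepted the utterance; the format, the function name, and the normalization rule are illustrative assumptions, not part of Via CX.

```python
from collections import defaultdict


def classify_utterances(records):
    """Group transcribed utterances by how the grammar handled them.

    records: iterable of (transcript, matched) pairs, where matched is True if
    the recognizer returned a grammar match (hypothetical export format).
    """
    # normalized transcript -> [times matched, times not matched]
    counts = defaultdict(lambda: [0, 0])
    for transcript, matched in records:
        key = " ".join(transcript.lower().split())  # normalize case and spacing
        counts[key][0 if matched else 1] += 1

    groups = {"never_matched": [], "works_well": [], "needs_tuning": []}
    for text, (hits, misses) in sorted(counts.items()):
        if hits == 0:
            groups["never_matched"].append(text)  # add to grammar if it is a valid answer
        elif misses == 0:
            groups["works_well"].append(text)
        else:
            groups["needs_tuning"].append(text)   # mixed results: inspect recordings
    return groups


if __name__ == "__main__":
    records = [
        ("Pay my bill", True), ("pay my  bill", True),
        ("billing", False), ("Billing", False),
        ("check balance", True), ("check balance", False),
    ]
    print(classify_utterances(records))
```

Here "billing" is never matched, "pay my bill" always works, and "check balance" gives mixed results, which makes it the first candidate for listening to the recordings.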

At this point, you need another tool. To verify whether the speech recognition engine rejected perfectly valid responses (false negatives) or wrongly accepted invalid responses as valid answers (false positives), you need to listen to the actual recordings of those utterances and compare them with the results of the corresponding speech recognition events. To support this, Via CX 18 lets you configure IVR services to record utterances. You can enable recording for the entire service or only for the input states that actually require tuning. See the documentation for how to do this; note that you also have to enable utterance recordings on the Service object.

Once a few thousand utterances have been collected from real callers in the production system, you can turn to the Session Input State Details report, which gives a detailed account of individual calls. For each input state, it shows the result of the speech recognition event along with a link to play the recorded utterance. Reviewing a number of calls this way gives you a good understanding of how callers experience the IVR as they move from input state to input state, and which exact utterances cause problems and need fixing. If you want to focus on a single input state, the Recordings by Input State report is a great way to listen to a large number of utterances in that state and compare them with the speech recognition results.
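While listening, it helps to record a verdict for each utterance and tally the results. The sketch below assumes a hypothetical review record: the recognizer's returned interpretation (`None` on NoMatch) next to what a human reviewer judged the caller to mean (`None` if the answer was not valid). It counts a misrecognition, where the engine accepted the wrong answer, as a false positive. Both the structure and that choice are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedUtterance:
    asr_result: Optional[str]      # interpretation the recognizer returned; None on NoMatch
    reviewer_label: Optional[str]  # what the caller meant, judged by listening; None if invalid


def score(u: ReviewedUtterance) -> str:
    """Compare the recognizer's result with the human judgment for one utterance."""
    if u.asr_result is None:
        # Rejected: correct if the answer was invalid, a false negative otherwise
        return "correct_reject" if u.reviewer_label is None else "false_negative"
    if u.asr_result == u.reviewer_label:
        return "correct_accept"
    # Accepted, but the answer was either invalid or misrecognized
    return "false_positive"


def summarize(utterances) -> Counter:
    return Counter(score(u) for u in utterances)


if __name__ == "__main__":
    reviewed = [
        ReviewedUtterance("checking", "checking"),
        ReviewedUtterance(None, "savings"),        # valid answer rejected
        ReviewedUtterance("savings", None),        # background noise accepted
        ReviewedUtterance("savings", "checking"),  # misrecognized
        ReviewedUtterance(None, None),
    ]
    print(summarize(reviewed))
```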

These tools help developers find false positives and false negatives, optimize voice grammars by removing utterances that callers never use and adding those they do, and fine-tune recognition-related properties in the Via CX script, such as the confidence level and the speed-vs-accuracy setting.
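Choosing a confidence level is a trade-off. A higher threshold rejects more utterances, which reduces false positives but increases false negatives. Given reviewed utterances with their confidence scores, you can sweep candidate thresholds and compare the error counts. This is a minimal sketch under stated assumptions: the confidence scale (0.0 to 1.0), the relative error costs, and how a chosen threshold maps to the confidence setting in a Via CX script are all assumptions, and the sample data is made up.

```python
def sweep_thresholds(samples, thresholds, fp_cost=1.0, fn_cost=1.0):
    """For each candidate threshold, count the errors it would have produced.

    samples: (confidence, hypothesis_correct) pairs from reviewed utterances,
    with confidence on an assumed 0.0-1.0 scale.
    Returns (threshold, false_positives, false_negatives, weighted_cost) tuples.
    """
    results = []
    for t in thresholds:
        # accepted (conf >= t) although the top hypothesis was wrong
        fp = sum(1 for conf, correct in samples if conf >= t and not correct)
        # rejected (conf < t) although the top hypothesis was right
        fn = sum(1 for conf, correct in samples if conf < t and correct)
        results.append((t, fp, fn, fp_cost * fp + fn_cost * fn))
    return results


def best_threshold(samples, thresholds, fp_cost=1.0, fn_cost=1.0):
    """Threshold with the lowest weighted cost; ties go to the earliest candidate."""
    rows = sweep_thresholds(samples, thresholds, fp_cost, fn_cost)
    return min(rows, key=lambda r: r[3])[0]


if __name__ == "__main__":
    # Made-up reviewed samples: (confidence, was the top hypothesis correct?)
    samples = [(0.9, True), (0.8, True), (0.6, False), (0.55, True), (0.4, False), (0.3, False)]
    candidates = [0.3, 0.5, 0.7]
    for row in sweep_thresholds(samples, candidates):
        print(row)
    print(best_threshold(samples, candidates))
    print(best_threshold(samples, candidates, fp_cost=2.0))
```

With equal costs, 0.5 and 0.7 tie at one error each, and the lower threshold wins. If a false positive is weighted twice as heavily, for example because accepting a wrong account number is worse than asking the caller again, the stricter 0.7 threshold comes out ahead.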

Going beyond these web-based tools, Aspect also provides access to specific speech tuning tools for the speech recognition engines deployed with Via. This opens up the door to even deeper analysis and tuning capabilities and will be covered in a future blog post.