In my last blog post, we talked about the huge gulf between the value of speech analytics and its actual use in contact centers. The penetration rate is only about 15%, but it is growing at a rapid 20% CAGR because two historical barriers to purchase have recently been removed: voice recognition accuracy, once low, is now comparable to the human ear, and the large upfront investment is now mitigated by the availability of cloud-based SaaS models.
So, if you are shopping for speech analytics because you know it can generate a great ROI in your contact center (see our latest eBook: 10 Reasons to Invest in Speech Analytics), you need to understand the two fundamental types of contact center speech analytics: Phonetics and LVCSR (Large Vocabulary Continuous Speech Recognition). When you pull back the covers, the differences are quite significant and can heavily influence the effectiveness of your new speech analytics system. Here’s the 10,000-foot comparative view:
A phonetic speech analytics solution preprocesses the audio into the possible sequences of sounds, or “phonemes,” and encodes the result in a lattice of possibilities. The search terms are then also translated into a sequence of phonemes, and the search determines whether this sequence appears somewhere in the lattice. This approach has two advantages. First, the initial processing is very fast, since the “vocabulary” is just the set of sounds in the language. The trade-off is that searches are much slower, since phoneme sequences cannot be indexed as efficiently as words. Second, even if the search term is totally new, such as a name that has only recently been introduced into the spoken language (like the drug “cialis”), the term may still be found if that sequence of phonemes exists (“S IY AH L IH S”). The disadvantage is that, since there are many possible sequences in the lattice, the term may be found in many places where it was never said (e.g., if the actual words were “see a list”).
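To make the “see a list” problem concrete, here is a minimal sketch of a phonetic search. It assumes a heavily simplified lattice (one set of candidate phonemes per time slot) and a tiny hand-written pronunciation table. The names PRONUNCIATIONS and find_term are hypothetical and are not taken from any vendor's product.

```python
from typing import Dict, List, Set

# Hypothetical pronunciation table, for illustration only.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "cialis": ["S", "IY", "AH", "L", "IH", "S"],
    "refund": ["R", "IY", "F", "AH", "N", "D"],
}

# Simplified lattice: each time slot holds the phonemes the engine
# considers possible at that point in the audio.
Lattice = List[Set[str]]


def find_term(lattice: Lattice, term: str) -> List[int]:
    """Return every slot index where the term's phonemes could start."""
    phonemes = PRONUNCIATIONS[term]
    hits = []
    # Slide the phoneme sequence across the lattice; there is no word index
    # to jump to, so every search has to scan the whole recording.
    for start in range(len(lattice) - len(phonemes) + 1):
        if all(p in lattice[start + i] for i, p in enumerate(phonemes)):
            hits.append(start)
    return hits


# Audio in which the caller actually said "see a list".
see_a_list: Lattice = [
    {"S"}, {"IY", "IH"}, {"AH"}, {"L"}, {"IH", "EH"}, {"S", "Z"}, {"T"},
]

print(find_term(see_a_list, "cialis"))  # [0]: a false positive
print(find_term(see_a_list, "refund"))  # []
```

The search reports a hit for “cialis” even though the word was never spoken, because its phonemes line up with the start of “see a list.” A production lattice also carries timing and probability scores, which this sketch leaves out.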
Transcription-based approaches (LVCSR) transcribe the audio into a sequence of words and then use standard text-based search methods to find the search terms. Since the transcription-based approach uses a dictionary of generally 50,000 to 100,000 words, along with statistical methods to confirm the likelihood of different word sequences (like “the side effects of cialis include” or “cialis pills”), its accuracy is much higher than the single-word lookup of a phonetic approach, so it is more likely that if the word is found, it was actually spoken. The disadvantage is that the words in the search terms must already be in the dictionary before the transcription engine processes the audio. The initial processing of the audio takes longer than with a phonetic approach because of the large vocabulary; searching, however, is practically instantaneous.
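The transcription-based flow can be sketched the same way. This toy example assumes the engine can only emit words from a fixed vocabulary and writes anything else as an unknown token. The vocabulary, the transcribe stub, and the drug name “zyntrex” are all made up for illustration, and a real LVCSR engine would also apply acoustic and language models, which are omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy dictionary; a real engine would hold tens of thousands of words.
VOCABULARY: Set[str] = {
    "the", "side", "effects", "of", "cialis", "include",
    "please", "see", "a", "list", "i", "take",
}


def transcribe(spoken: List[str], vocabulary: Set[str]) -> List[str]:
    """Stand-in for the recognizer: out-of-vocabulary words become <unk>."""
    return [w if w in vocabulary else "<unk>" for w in spoken]


def build_index(transcripts: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Inverted index: word -> list of (call id, word position)."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for call_id, words in transcripts.items():
        for position, word in enumerate(words):
            index[word].append((call_id, position))
    return index


calls = {
    "call-1": "the side effects of cialis include".split(),
    "call-2": "please see a list".split(),
    "call-3": "i take zyntrex".split(),  # hypothetical new drug name
}
index = build_index({cid: transcribe(w, VOCABULARY) for cid, w in calls.items()})

print(index.get("cialis", []))   # [('call-1', 4)]
print(index.get("zyntrex", []))  # []
```

The lookup is a single dictionary access, which is why searching is so fast once the audio has been transcribed. “cialis” is found only in the call where it was spoken, not in “see a list,” while the made-up name is missed entirely because it was not in the dictionary when the audio was processed.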
With that as background, the table below summarizes the best uses of Phonetics vs. LVCSR*:
If you would like a quick overview of the differences between phonetics and LVCSR, view our video on Choosing the Right Speech Analytics Technology. As the table above shows, the strengths of Transcription/LVCSR best match the typical use cases in a contact center.
For more information on Aspect’s powerful speech analytics solution, Aspect Engagement Analytics, contact us at 888-547-2481.
*Source: CallMiner, Choosing the Right Technology for Your Speech Analytics Project, Marie Meeter