10 Steps to Chatbot Creation – Part III


by Lisa Michaud, NLP Architect, Tobias Goebel, Director of Emerging Technologies, & Bill Gay, Director, Self Service, Agent & Desktop Experience

This is the third post in a three-part series on the essential steps necessary to build a successful and effective customer service chatbot. Part I is available here. Part II is available here.

7. Pick a platform and a development approach

Most chatbots treat natural language understanding as two key tasks to be performed on each sentence: (1) determining the intent of the sentence (what is the customer asking, or requesting? what use case does she wish to initiate?) and (2) extracting data from the sentence (what options has the customer requested? what data is he providing to you?). There are essentially two different approaches to these tasks: one based on explicitly creating rules from the top down, and one that uses machine learning algorithms to learn the tasks from a large corpus (a collection of written texts) of transcribed interactions.
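These two tasks can be sketched in a few lines of code. The intents, keywords, and entity patterns below are invented purely for illustration and do not reflect any particular platform's API:

```python
import re

# Hypothetical intents and their trigger keywords (illustrative only).
INTENT_KEYWORDS = {
    "book_flight": ["book", "reserve", "flight"],
    "check_baggage": ["baggage", "luggage", "bag"],
}

def detect_intent(message):
    """Task 1: return the intent whose keywords best match the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {intent: len(words & set(kws)) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def extract_entities(message):
    """Task 2: pull simple data points (here, only a date) out of the text."""
    entities = {}
    date = re.search(r"\b(today|tomorrow|\d{4}-\d{2}-\d{2})\b", message.lower())
    if date:
        entities["date"] = date.group(1)
    return entities

msg = "Can I book a flight for tomorrow?"
print(detect_intent(msg), extract_entities(msg))  # book_flight {'date': 'tomorrow'}
```

A real platform performs both tasks with far more sophistication, but the division of labor is the same.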

If you have been offering web chat for quite some time, you might already have collected tens of thousands of customer inquiries with corresponding answers. While a true dialog consists of more than one turn, and many customer service dialogs are not simple pairs of question & answer, you should be able to apply machine learning algorithms to this data set to learn the most common answers to the most common questions. Note that with this approach, you will have to start from scratch for every new language you want the bot to speak. Also, it is still a tremendous and largely manual effort to tag the data and analyze the outcome to ensure quality.
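As a toy illustration of the machine-learning route, here is a pure-Python nearest-match classifier "trained" on a handful of invented question/intent pairs. A real deployment would use a proper ML toolkit and thousands of logged interactions; this only shows the shape of the approach:

```python
from collections import Counter

# Invented training pairs standing in for logged chat data.
TRAINING_DATA = [
    ("how do i reset my password", "reset_password"),
    ("i forgot my password", "reset_password"),
    ("where is my order", "order_status"),
    ("track my package", "order_status"),
]

def tokenize(text):
    return Counter(text.lower().split())

def classify(message, data=TRAINING_DATA):
    """Return the intent of the training example most similar to the message."""
    msg = tokenize(message)
    def overlap(example):
        # Count shared word occurrences between message and example.
        return sum((msg & tokenize(example[0])).values())
    best = max(data, key=overlap)
    return best[1] if overlap(best) > 0 else "unknown"

print(classify("I think I forgot my password again"))  # reset_password
```

Note how the quality of the result depends entirely on the quality and coverage of the data — which is exactly why the tagging and analysis effort mentioned above matters.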

If you don’t already have such a corpus available, or what you have is not suitable for training an algorithm, you will have better luck with an approach based on writing explicit rules to extract meaning from messages. The simplest way to do this is to look for certain keywords in the customer’s message and act upon them. However, this approach risks misclassifying messages in which keywords do not appear in the expected form. Consider the difference between “Can I book a flight for tomorrow” and “Can I read an electronic book on my flight tomorrow?” for an airline chatbot: the two are asking for different things, yet both contain the keywords “book”, “flight”, and even “tomorrow”, which might lead a simplistic rule to conclude that both messages are about booking a flight for tomorrow. The better platforms out there extend the reach of a rules-based approach with built-in linguistic tools that can leverage the relationships between words (synonyms, hypernyms/hyponyms, domains) or common syntactic patterns, in many different languages. This makes it easier both to capture broad linguistic variation in concise rules and to distinguish between senses of a word like “book.”
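The “book a flight” ambiguity can be demonstrated concretely. Both rules below are simplified illustrations written for this post, not any vendor’s API:

```python
import re

# A naive keyword rule fires on both sentences, since both contain
# "book" and "flight" (function names are illustrative).
def naive_is_booking(message):
    words = message.lower().split()
    return "book" in words and "flight" in words

# A slightly more "linguistic" rule approximates the verb sense of "book"
# by requiring it to be followed by a determiner, as in "book a flight".
def linguistic_is_booking(message):
    return re.search(r"\bbook\b\s+(a|an|the|my)\b", message.lower()) is not None

print(naive_is_booking("Can I book a flight for tomorrow"))                           # True
print(naive_is_booking("Can I read an electronic book on my flight tomorrow?"))       # True -- misclassified
print(linguistic_is_booking("Can I book a flight for tomorrow"))                      # True
print(linguistic_is_booking("Can I read an electronic book on my flight tomorrow?"))  # False
```

Real linguistic platforms generalize this idea with proper part-of-speech and syntactic analysis rather than hand-written patterns.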

An advantage of the linguistic top-down approach is that you have full control over how a message is understood. A neural network created by a machine learning algorithm is often a black box that doesn’t let you go in and surgically change how one particular message is understood. Nuances in natural language – such as the fact that “I want to transfer my data” and “How do I move my files?” express the same intent, while “How do I move this to my file?” expresses a different one – are hard to learn with machine learning but easier to distinguish with a linguistics-based platform. ~Tobias and Lisa
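As an illustration of how an explicit rule can encode such a nuance — the word lists and patterns below are invented for this example — a rule might require both a movement verb and a data object, while excluding phrases where “file” is the destination:

```python
import re

# Illustrative synonym lists for a data-migration intent.
MOVE_VERBS = r"(transfer|move|migrate|copy)"
DATA_OBJECTS = r"(data|files|documents|photos)"

def is_data_migration(message):
    """Fire only when something data-like is being moved, not moved *into* a file."""
    m = message.lower()
    if re.search(r"\bto my file\b", m):  # "file" is the destination, not the object
        return False
    return re.search(MOVE_VERBS, m) is not None and \
           re.search(DATA_OBJECTS, m) is not None

print(is_data_migration("I want to transfer my data"))      # True
print(is_data_migration("How do I move my files?"))         # True
print(is_data_migration("How do I move this to my file?"))  # False
```

The point is not the specific pattern but the control: when a message is misunderstood, you can see exactly which rule fired and change it surgically.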

8. Implement the dialogue flow and engineer the NLU

It all comes together in this step: the conversational architecture, the dialogue flow and storyboard, the platform you have selected, and the data you have collected.  Your essential task is to use these to create a classifier that will map an incoming text to the system’s response.

If you selected a platform based on machine learning, you will provide this platform with your example sentences for each possible intent.  The more examples you provide, the better the algorithm will learn the variations of linguistic expressions that can be used for each intent, and the better it will learn how to distinguish between intents.  Note that you will want to reserve some of your example sentences for the next step (testing).
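A minimal sketch of reserving that held-out slice, assuming an invented labeled dataset and an 80/20 split:

```python
import random

# Invented labeled examples standing in for your collected sentences.
labeled = [("how do i reset my password", "reset_password"),
           ("i forgot my login", "reset_password"),
           ("where is my order", "order_status"),
           ("track my package", "order_status"),
           ("my package never arrived", "order_status"),
           ("change my password please", "reset_password")]

random.seed(0)                   # reproducible shuffle for this example
random.shuffle(labeled)          # avoid ordering bias before splitting
split = int(len(labeled) * 0.8)  # e.g. 80% for training, 20% held out
train_set, test_set = labeled[:split], labeled[split:]

print(len(train_set), len(test_set))  # 4 2
```

The held-out examples must never be shown to the learner; they are what makes the testing in step 9 honest.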

If you are working with a linguistic rules-based platform, you will use the sentences in a different way.  The rules you craft will explicitly represent the characteristics that determine that a given sentence belongs to intent A or intent B, leveraging the tools and abstractions mentioned earlier.
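For instance, a hand-written rule can lean on a synonym abstraction so that one concise rule covers many surface phrasings. The word lists below are illustrative, not drawn from any real platform:

```python
# Hypothetical synonym sets representing two abstract concepts.
SYNONYMS = {
    "cancel": {"cancel", "terminate", "end", "stop"},
    "subscription": {"subscription", "membership", "plan", "account"},
}

def matches_rule(message, concepts=("cancel", "subscription")):
    """The intent fires when the message contains a word for every concept."""
    words = set(message.lower().replace("?", "").split())
    return all(words & SYNONYMS[c] for c in concepts)

print(matches_rule("How do I terminate my membership?"))  # True
print(matches_rule("How do I cancel?"))                   # False -- no object named
```

One rule, two short word lists, and many combinations of phrasing are covered — which is what makes the rules-based route tractable without a training corpus.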

In either situation, it is critically important to have a diverse set of examples as close to real user utterances as possible. A single source, or a small set of sources, will not begin to capture the universe of variation your real users will display when they express themselves to your system. ~Lisa

9. Internal testing and revision of your use case detection

Now you’re ready for the second use of your corpus of example sentences: automated testing. You also want as many diverse human testers as possible for “real user” testing. Test and revise your NLU component as well as the bot flow until you reach an acceptable level of accuracy. Note that this step and the previous one are iterative and approximate; given the nature of human language and the infinite possible expressions of every intent, 100% accuracy is an unattainable goal. Each time you iterate through these steps, however, you get closer.
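An automated accuracy check against the held-out test set might look like this sketch, with a stand-in classifier and invented data:

```python
def classify(message):
    """Stand-in for your real NLU component."""
    return "order_status" if "order" in message.lower() else "unknown"

# Held-out examples reserved in the previous step (invented here).
test_set = [("where is my order", "order_status"),
            ("order status please", "order_status"),
            ("i forgot my password", "reset_password")]

correct = sum(classify(msg) == intent for msg, intent in test_set)
accuracy = correct / len(test_set)
print(f"accuracy: {accuracy:.0%}")  # accuracy: 67%
```

Running a check like this on every revision turns the iterate-and-improve loop into a measurable one: you can see whether each change to the rules or training data moved the number up or down.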

Furthermore, as mentioned before, make sure you get as close as possible to your real end-users. Any tester that isn’t a real user of your system with an honest need to chat with your bot will produce results that are slightly artificial in nature. You might want to consider a smaller rollout with a fraction of your target customer base to vet some of the design decisions you’ve made along the way. ~Lisa

10. Early deployment and revisions

Even though you’re ready to go live, the work is not done when the bot gets deployed. Even if your bot employs some kind of unsupervised or semi-supervised learning to adjust its own behavior over time, monitoring the first interactions with real users will yield very useful information and may signal that explicit adjustments should be made. Typical adjustments are to the wording of your bot’s responses, which might prompt follow-up clarification questions from your customers that wouldn’t be necessary if the bot’s answers were clearer. You may need to adjust the logic of your intent classification, either through explicit manipulation of the rules or by providing more example sentences. Finally, you may need to add new use cases if the designed ones do not cover the majority of user requests. If you truly started small as recommended, this is the time when you are collecting the vital information about which use cases are the key ones to cover. ~Lisa
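One simple way to mine those early logs for missing use cases is to count the words in messages the bot failed to classify. The log format here is invented for illustration:

```python
from collections import Counter

# Invented deployment log entries; "unknown" marks unclassified messages.
logs = [
    {"message": "where is my refund", "intent": "unknown"},
    {"message": "i want a refund", "intent": "unknown"},
    {"message": "reset my password", "intent": "reset_password"},
    {"message": "refund status", "intent": "unknown"},
]

unmatched = [entry["message"] for entry in logs if entry["intent"] == "unknown"]
word_counts = Counter(w for msg in unmatched for w in msg.split())
print(word_counts.most_common(1))  # [('refund', 3)]
```

Even this crude tally surfaces a clear signal — here, that a refund use case is missing — which is exactly the kind of insight a small initial rollout is meant to produce.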

To ensure a successful outcome of your chatbot deployment, view the creation as an iterative process: gather the data, review it, and apply it to your bot’s design.  Repeat.  Above all, log everything for the future.  The success of other projects could be driven by the lessons you learn from this one.