Observations of Edward in the Wild, Part 1


In Spring 2016, we completed work on a pilot Interactive Text Response (ITR) system called Edward, a chatbot for the Radisson Blu Edwardian hotel chain in London that helps with front desk, concierge, maintenance, and housekeeping inquiries. Edward responds to user texts over SMS, handling more than 180 different questions, requests, and humorous interactions. Real customers began interacting with Edward this May.

Edward conversed with 491 unique guests during the first two months of his gradual rollout; in that time they sent him 1,023 different texts. As a Data Scientist and one of Aspect’s Computational Linguists, I’ve spent a lot of time with these initial data, and they reveal some interesting facts about how real people interact with a concierge chatbot.

Not Everyone Uses “Text Speak”

One of the hard questions in chatbot design is: do you design your bot to handle concise, telegraphic expressions, or fully articulated sentences? How are people going to express themselves in this particular flavor of computer-mediated communication? Our data suggest that the answer is “both”: 28% of the sentences texted to Edward during these two months contained only one word (a large share of which were initial sentences saying “hello” and final ones saying “thanks”). But plenty of people were more expressive: 22% of the sentences contained eight or more words, and 12% of the customer texts contained more than one sentence. Only 18 sentences in total (1%) contained a word with non-standard spelling. This may be connected to the fact that guests at Radisson Blu Edwardian tend to be Gen X and older, and interaction styles can of course vary greatly between groups of users. But it underscores the possibility that a chatbot may need to understand the request “towels plz” from one user as easily as it understands “I notice we have only three bath towels and we could use two more. Thanks!” from another. The burden of adaptation should be on the system, not the user.
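The length statistics above are straightforward to compute. Here is a minimal sketch, assuming the guest sentences are available as a simple list of strings (the sample sentences and variable names are illustrative, not from the real pipeline):

```python
# Toy version of the sentence-length analysis: count one-word
# sentences and sentences with eight or more words.
sentences = [
    "towels plz",
    "Thanks!",
    "I notice we have only three bath towels and we could use two more.",
    "hello",
]

# Naive whitespace tokenization; a real pipeline would use a proper tokenizer.
word_counts = [len(s.split()) for s in sentences]

one_word = sum(1 for n in word_counts if n == 1) / len(word_counts)
eight_plus = sum(1 for n in word_counts if n >= 8) / len(word_counts)

print(f"one-word sentences: {one_word:.0%}")
print(f"8+ word sentences:  {eight_plus:.0%}")
```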


Choosing the Use Case, and Setting Expectations, is Critical

Guests were told to “ask Edward anything,” and they did. Only 58% of the sentences texted to Edward in his first two months on the job were covered by the initial set of 180+ use cases we had designed in consultation with subject matter experts from the hotel. Meanwhile, more than half of the use cases Edward was designed to answer never came up in those first two months. Clearly, predicting user behavior is no easy feat. A good initial approach to chatbot design is to look at the other channels through which customers reach out with questions and requests, and to study those data to answer: what small set of use cases accounts for a large share of historical contacts, so that you create the most efficient service for the least effort? In some domains, the majority of contacts may be concentrated in only 3-5 core questions; in others, the distribution is more spread out. Looking at real data, however, can yield unexpected insights about the most popular things users actually say. The most popular topic for Edward was: Thanks. The most frequently asked question was: Is breakfast included in my reservation?
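Both coverage numbers above (what share of incoming texts the bot could answer, and which designed use cases never fired) fall out of the same bookkeeping. A toy sketch, with an exact-lookup matcher standing in for a real intent classifier and invented sample data:

```python
# Illustrative coverage check: which incoming sentences match a designed
# use case, and which use cases are never triggered at all.
designed_intents = {
    "is breakfast included": "breakfast_included",
    "thanks": "thanks",
    "what time is checkout": "checkout_time",
}

incoming = ["thanks", "is breakfast included", "do you allow parrots"]

matched = [text for text in incoming if text in designed_intents]
triggered = {designed_intents[t] for t in matched}
unused = set(designed_intents.values()) - triggered

coverage = len(matched) / len(incoming)
print(f"sentence coverage: {coverage:.0%}")      # share of texts the bot could answer
print(f"never-triggered use cases: {sorted(unused)}")
```

In Edward’s case the first number came out to 58%, and the second set contained more than half of the designed use cases.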

It is important for a good customer experience to set guests’ expectations so that they do not assume a chatbot can answer “anything” without human help. No bot can answer a question or respond to a request that its designers did not anticipate. However, if the chatbot has the ability to hand the conversation over to a human, it can avoid saying “I don’t understand,” and instead say, “Hold on a moment; I need to get some help to answer that,” keeping the experience from becoming a frustrating one.
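That hand-off flow can be sketched in a few lines. Here `escalate_to_agent` is a hypothetical hook standing in for a real agent-routing call, and the intent table is a toy stand-in:

```python
# Sketch of the fallback behavior described above: when no intent
# matches, escalate to a human instead of replying "I don't understand."

def escalate_to_agent(text: str) -> None:
    # In a real system this would route the conversation to a person.
    print(f"[escalating to human agent]: {text!r}")

def reply(text: str, intents: dict) -> str:
    answer = intents.get(text.lower().strip())
    if answer is not None:
        return answer
    # Out of scope: fetch help rather than admitting defeat.
    escalate_to_agent(text)
    return "Hold on a moment; I need to get some help to answer that."

intents = {"is breakfast included?": "Yes, breakfast is included with your stay."}
print(reply("Is breakfast included?", intents))
print(reply("Can I borrow a parrot?", intents))
```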

False Positives May Be Worse Than False Negatives

This leads me to another important observation about what happens when a chatbot makes a mistake, and which kinds of mistakes are costlier than others. If we think of Edward as having a classifier that returns “positive” when he knows how to answer a text and “negative” when the text is out of scope, then a false positive means Edward gives the wrong answer to a text that should have been a no-match, and a false negative means Edward fails to realize that he does know the correct response, incorrectly replying “I don’t understand.”

Which of these is the greater evil depends on your objective. At Aspect we champion customer service chatbots that can pass the conversation over to a human agent when a human is needed or wanted, as mentioned above. In those situations, a false negative is not a terrible problem: the unknown text can be handed to a human agent, who answers the question, and the chatbot’s reasoning can later be adjusted to cover that way of expressing it. In many customer service domains, spending the human agent’s time because the chatbot did not recognize the question is far better than answering the wrong question or performing the wrong action. In other situations, where a human fallback is unavailable or too costly, it might be better for the bot to take a wild swing. The key is being able to structure your bot either way, to best address the needs of your particular application.
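One common way to structure the bot either way is a confidence threshold on the classifier’s score: raise it and the bot hands off more (risking false negatives), lower it and the bot answers more (risking false positives). A toy illustration, with invented scores:

```python
# Trading false positives against false negatives with a threshold.

def decide(score: float, threshold: float) -> str:
    """Answer only when the classifier is confident enough."""
    return "answer" if score >= threshold else "hand_off"

score = 0.62  # classifier confidence for some incoming text

print(decide(score, threshold=0.8))  # cautious bot: hand_off
print(decide(score, threshold=0.5))  # bold bot: answer
```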

We’re still studying these data, along with question sets from other customer service domains, to better understand how to help businesses create the best chatbots for their needs. I’ll continue this topic with more observations in my next post.