In his recent article on Venturebeat.com, Ted Livingston, CEO of Kik – one of the leading messaging apps determined to become the next ecosystem for our digital lives – laid out his vision for a world that will have us “instantly interact with the world around [us].” He envisions a world of bots associated with everything around us, from seats in a stadium to restaurant tables and beyond. His idea is that instead of downloading individual mobile apps for every enterprise we interact with, we will engage with automated scripts on messaging platforms to get things done.
One thing has already come true: the phenomenon of app fatigue. Livingston: “[There are] apps to order train tickets at stations; apps to order food at restaurants; and apps to order movie tickets at theatres. Everyone wants you to just ‘download our app!’ And yet, after spending millions of dollars developing them, how many people actually use them? My guess: not a lot.” No need to guess: one thing I’m hearing consistently from our own customers is that download rates for their mobile customer service apps are in the single-digit range. Ouch.
Livingston’s vision instantly reminded me of the one Michael Saylor of MicroStrategy laid out. Back in 2000, “Saylor matter-of-factly describes a future in which an intelligent wireless network will tell you which way to turn to avoid traffic, or if your flight has been changed, or if a doctor has prescribed medicine that’s incompatible with another drug you’re taking. It will be, he says, like ‘a guardian angel whispering in your ear.’” It was around that time that angel.com (sic!) was founded.
Angel’s vision? Ubiquitous Interactive Voice Response, where anything and everything would have a telephone number that you could dial to learn more or engage. Needless to say, that vision did not become a reality. In fact, the idea of a voice in your ear is still depicted as science fiction in recent Hollywood movies. So why should things work out differently now, some 16 years later? Could the idea of an instant connection to the “world around us” via bots be a bit far-fetched?
Messaging is among the easiest-to-use interfaces, and that is where I believe the major difference between Michael Saylor’s vision from 2000 and Ted Livingston’s from 2016 lies: speech recognition over the phone was never an easy or efficient interface. This was partly due to technical constraints, but more so because voice is a “loud channel”: you simply cannot engage with an application silently if voice is the only way to engage. Even if the speaker is in your ear, you still need to say things out loud. That is a no-go in public places, which is where we spend most of our days. The office, public transportation, the restaurant, or the stadium – to relate to Livingston’s story – are all environments where you cannot or do not want to use your voice.
With texting, or messaging, you are still using mankind’s favorite “communication API”: natural language. You are just using the silent version of it, written language. No need to convert speech into text for further analysis.
However, there are still challenges: teaching a computer to understand natural language is far from trivial. Entire academic fields, such as Artificial Intelligence and Computational Linguistics, employ tens of thousands of researchers, engineers, and liberal arts majors today. Major corporations spend billions of dollars developing software such as IBM Watson, which by itself constitutes what IBM calls a new era of “cognitive computing.”
Livingston explains, “I’d unlock my phone, open my chat app, and scan [a code that would be taking me into a messaging experience]. Instantly, I’d be chatting with the stadium bot, and it’d ask me how many beers I wanted: ‘1, 2, 3, or 4.’ It’d ask me what type: ‘Bud, Coors, or Corona.’” What he describes is actually an experience I would NOT like to have. I don’t want to be forced to follow rigid, directed dialogs. I want to freely express my needs in a quick and efficient manner: “Send 2 Bud Light and 1 Corona to seat 323 please.” Multi-slot recognition is what linguists and conversational UI designers call this dialog design approach, in which the system picks up several pieces of information (here: quantity, product, and seat) from a single utterance. Natural Language Understanding technology can accomplish this today, as long as you have professionals design and implement these techniques.
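To make the idea of multi-slot recognition a bit more tangible, here is a deliberately tiny Python sketch. It is only an illustration under loud assumptions: the menu, the regular expressions, and the `parse_order` function are all made up for this post, and real Natural Language Understanding systems typically rely on trained language models rather than hand-written patterns. The point is simply that one free-form message can fill several slots at once.

```python
import re

# Hypothetical menu, for illustration only. Longer names come first so that
# "Bud Light" is matched before the shorter "Bud".
MENU = ["Bud Light", "Bud", "Coors", "Corona"]

# A quantity followed by a product name, e.g. "2 Bud Light".
ITEM_PATTERN = re.compile(
    r"(\d+)\s+(" + "|".join(re.escape(name) for name in MENU) + r")\b",
    re.IGNORECASE,
)
# The word "seat" followed by a number, e.g. "seat 323".
SEAT_PATTERN = re.compile(r"\bseat\s+(\d+)\b", re.IGNORECASE)


def parse_order(utterance):
    """Fill the quantity, product, and seat slots from one free-form message."""
    canonical = {name.lower(): name for name in MENU}
    items = [
        (int(quantity), canonical[name.lower()])
        for quantity, name in ITEM_PATTERN.findall(utterance)
    ]
    seat_match = SEAT_PATTERN.search(utterance)
    seat = int(seat_match.group(1)) if seat_match else None
    return {"items": items, "seat": seat}


order = parse_order("Send 2 Bud Light and 1 Corona to seat 323 please.")
print(order)
```

If a slot stays empty (say, the customer forgot to mention the seat and `seat` comes back as `None`), the bot only needs to ask for that one missing piece instead of walking the customer through every question in turn.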
Kik’s CEO is obviously, and understandably, biased when he claims that the easiest way to engage with messaging bots is by scanning a code – in his own Kik app. I think the easiest way to engage with a bot is by sending a good old text message. To send an SMS, you don’t need a data connection (sometimes hard to come by in a crowded stadium), you don’t need an app, you don’t even need a smartphone! Any old cell phone from the ’90s lets you send a text. The phone number becomes the entry point to engaging with businesses – once again. Telcos, are you paying attention?
Livingston concludes by saying that “chat is going to be the world’s next great operating system: a Bot OS.” There is nothing wrong with bold visions. But I feel compelled to point out that the first operating systems were actually “chat-based” – the infamous command line is still the programmer’s and system administrator’s preferred way of engaging with computers today. History does indeed repeat itself, I guess…
By Tobias Goebel