The Future of ITR Part I: Adaptive Dialogue


Over the lifetime of the electronic computer, we have continually struggled with the fact that humans and computers process information and communicate in fundamentally different ways. When two conversational participants speak vastly different languages, communication occurs when one learns to speak the language of the other; and at the beginning of the history of computation, it was the human who learned to communicate in the language of the machine. Early programmers learned how to express themselves in binary, and the connection between human and computer occurred only because the human side was able to bend itself and learn to work (speak) like a machine.

With the invention of programming languages other than binary machine code, we slowly but steadily built interfaces where the machine was empowered to receive communication in a form closer and closer to how humans naturally interact. Binary was replaced with languages employing commands that took the form of words. Typed command-line interfaces gave way to graphical interfaces and abstractions of file systems that suggested physical space, with commands now executed through mouse gestures. And now, the maturation of natural language processing technologies means that we can “talk” to our phones, our cars, and our toys.

Cognitive ergonomics is a design philosophy that recognizes that just as an ergonomic keyboard might bend so that the user’s wrists do not have to, a system’s design should bend so that the user’s natural process for accomplishing a task does not have to. When we design a self-help interactive text response (ITR) app, we need to recognize that user acceptance may hinge on users not having to bend to communicate with the computer, but rather being able to converse as if the dialogue partner were another human.

If the future of human-computer interaction is a dialogue, then we must first recognize that a dialogue has two participants, and that the roles of these participants may shift. In some use cases, the system may maintain the dialogue initiative as it asks questions:

S: What is your name?
U: Diana Prince.
S: What is your account number?
U: 123456.
S: Do you want to make a deposit, or a withdrawal?
U: Withdrawal.

However, in many domains, it is natural for the user to sometimes take over the dialogue initiative:

S: How much would you like to withdraw?
U: What is my current balance?
S: You have $123.45 in that account.
U: Withdraw $75.

This is where natural language understanding (NLU) becomes the key to an adaptive dialogue, one that handles natural responses that are not answers. When users are asked to make a choice, they often have questions relevant to their selection. They need the freedom to answer a question with a question: not derailing the natural flow of the dialogue entirely, but entering into a short digression in which the system recognizes that the dialogue act of the user’s utterance was not an answer but a query for information, and then answers that query. The dialogue initiative can then return to the system, and if the user is informed enough to complete her choice, the dialogue can continue naturally. Simple keyword spotting or pattern matching may not be sufficient here; it will take a deeper, more robust form of NLU to recognize such a shift in dialogue initiative and to understand how to respond in a natural, “human-like” way. Successful and powerful interactions will come when it is the machine, and not the user, that bends.
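
The digression described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a real NLU system: `classify_dialogue_act` is a hypothetical stand-in that relies on exactly the kind of shallow surface cues that may not be sufficient in practice, and the `WithdrawalDialogue` class, its fields, and the balance value are invented for the example. The point is only the control flow: the system holds a pending question, steps aside to answer a user query, and then takes the initiative back.

```python
import re
from dataclasses import dataclass
from typing import Optional


def classify_dialogue_act(utterance: str) -> str:
    """Toy placeholder for an NLU component (assumption, not real NLU).

    Labels an utterance as a "query" or an "answer" from surface cues only.
    """
    text = utterance.strip().lower()
    if text.endswith("?") or text.startswith(("what", "how", "which")):
        return "query"
    return "answer"


@dataclass
class WithdrawalDialogue:
    """Hypothetical dialogue state for the withdrawal exchange."""

    balance: float
    pending_question: str = "How much would you like to withdraw?"
    amount: Optional[float] = None

    def respond(self, utterance: str) -> str:
        if classify_dialogue_act(utterance) == "query":
            # User took the initiative: answer the query, then return
            # the initiative to the system by re-asking the open question.
            return f"{self.answer_query(utterance)} {self.pending_question}"

        # Otherwise treat the utterance as an answer to the open question.
        match = re.search(r"\d+(?:\.\d+)?", utterance)
        if match is None:
            return f"Sorry, I did not catch an amount. {self.pending_question}"
        self.amount = float(match.group())
        return f"Withdrawing ${self.amount:.2f}."

    def answer_query(self, utterance: str) -> str:
        # Only one kind of query is supported in this sketch.
        if "balance" in utterance.lower():
            return f"You have ${self.balance:.2f} in that account."
        return "I am not able to answer that yet."


dialogue = WithdrawalDialogue(balance=123.45)
print(dialogue.respond("What is my current balance?"))
print(dialogue.respond("Withdraw $75."))
```

Re-asking the pending question after the digression is one design choice among several; a system could equally answer the query and wait, as in the transcript above. Either way, the open question stays in the dialogue state so that the user's next utterance can still be interpreted as its answer.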
