The Next Level of Human-Robot Cooperation

If you asked anyone who has ever had a bad experience with a chatbot what made the experience unpleasant, they would probably respond with something along the lines of “it just didn’t understand me”. While there are many deficiencies in common implementations of conversational AI, one type of understanding problem is both extremely prevalent and relentlessly detrimental to the conversational experience. The problem is that when most robots ask their human counterparts a yes or no question, for example, they only understand a yes or no answer. Natural language processing (NLP) has matured to the point where you can answer such questions with a “sure” or a “nope”, but anything other than a synonym of yes or no simply will not work. If you haven’t eavesdropped on many conversations in your life, you may think at this point that understanding synonyms is clever enough to make do in most situations, but in reality, seemingly unrelated yet implicit responses are ubiquitous in human-to-human conversations. Asked “Will you be paying by card today?”, a person might well answer “I left my wallet at home”. This is why many chatbots tend to limit the possible responses when asking such closed-ended questions. When using voice conversational AI in a call center, however, that is simply not an option.

The reason this is such a difficult problem is twofold. First, natural conversation is intuitive for humans, most of whom simply don’t have a good mental model for how robots handle conversations. The second reason is related to a popular conversational AI technique called “slot filling” (closely related to “semantic role labeling”). The slots in this technique are the pieces of information required to provide an answer. Filling the slots involves eagerly trying to extract them from the human’s utterances, then asking for the missing ones explicitly. The issue is that in its simplest form, a slot represents a specific type of answer: yes/no, phone number, location, and so on. When robots try to fill a slot by asking an explicit question, they often only consider the slot “filled” when the expected type of answer is found in the human’s response.
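To make the failure mode concrete, here is a minimal sketch of that naive slot-filling behavior for a single yes/no slot. The function name and the hand-written synonym lists are purely illustrative assumptions, not the API of any particular dialogue framework:

```python
from typing import Optional

# Illustrative synonym lists; a real system would use a trained classifier.
YES_SYNONYMS = {"yes", "yeah", "yep", "sure", "correct", "right"}
NO_SYNONYMS = {"no", "nope", "nah", "negative"}

def fill_yes_no_slot(utterance: str) -> Optional[str]:
    """Return 'yes', 'no', or None when the slot cannot be filled."""
    tokens = utterance.lower().replace(",", " ").replace(".", " ").split()
    if any(t in YES_SYNONYMS for t in tokens):
        return "yes"
    if any(t in NO_SYNONYMS for t in tokens):
        return "no"
    return None  # slot stays empty, so the robot asks the same question again

# Synonym matching handles a casual "sure"...
print(fill_yes_no_slot("Sure, go ahead"))            # -> yes
# ...but an implicit answer leaves the slot empty, and the conversation stalls.
print(fill_yes_no_slot("I left my wallet at home"))  # -> None
```

The second utterance clearly answers the question to any human listener, yet the slot never fills, which is exactly the loop of re-asked questions that frustrates callers.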

Many consider ELIZA, a computer program published by Joseph Weizenbaum in 1966, to be the very first instance of conversational AI. However, even Weizenbaum did not consider ELIZA intelligent. In fact, it wasn’t intelligent enough to demonstrate our problem, as it never asked specific questions; it only cleverly rephrased its users’ requests into open-ended inquiries. It wasn’t until the 2000s that new techniques and stronger computing power made meaningful conversations with robots a reality. So when Paul Grice, a famed philosopher of language, published his influential paper “Logic and Conversation” in 1975, he could not have considered its far-reaching implications for conversational AI. In this paper, in which Grice outlines how humans answer questions in conversation, he defines what he calls the “Cooperative Principle”. The principle prescribes how one should answer a question cooperatively in a conversation, and it consists of four maxims: the maxim of quality, of quantity, of relevance, and of manner. In short, it states that when answering a question, one should provide an answer that is perspicuous, concise, accurate, and relevant. A yes or no question should therefore be met with a cooperative yes or no answer. That is not to say, however, that the maxims are never breached. On the contrary, Grice goes on to describe how a breach of the cooperative principle generally means that the party in breach simply wishes to convey an implied answer, or an “implicature” as Grice calls it. In other words, an uncooperative answer relies on shared prior knowledge to imply the requested piece of information. It is a remarkably insightful observation from someone who had never encountered a dialogue in which one of the parties is incapable of understanding implicatures, such as a dialogue between a robot and a human.

If you’ve gotten this far, you probably already understand that it is of utmost importance for robots to comprehend implicatures if they are ever to carry out an effective open-ended conversation with a human. Unsurprisingly, this is no easy task, as it requires something more sophisticated than run-of-the-mill NLP solutions. It requires a new approach that adds context-sensitive understanding of natural conversation on top of the traditional natural language pipeline. This new layer, which provides a conversational interpretation of the human utterance, is a key component in achieving the next level of conversational AI. As you can hear in the conversation excerpt in the clip below, five implicatures in under a minute are not uncommon, and understanding them seamlessly produces a remarkably, yet inconspicuously, human-like interaction, one that is intuitive to other humans and represents the top echelon of conversational robots.
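One way to picture this layering is as a fallback chain: the traditional slot extractor runs first, and only when it fails does a context-sensitive interpreter look at the question and the reply together. The sketch below assumes a hypothetical `contextual_model` component standing in for whatever model resolves implicatures; the toy rule inside it is only there to keep the example self-contained, not to suggest how such a model actually works:

```python
from typing import Optional

def literal_yes_no(utterance: str) -> Optional[str]:
    """Traditional layer: only literal yes/no synonyms fill the slot."""
    tokens = set(utterance.lower().split())
    if tokens & {"yes", "yeah", "yep", "sure"}:
        return "yes"
    if tokens & {"no", "nope", "nah"}:
        return "no"
    return None

def contextual_model(question: str, utterance: str) -> Optional[str]:
    """Hypothetical context-sensitive interpreter.
    A real implementation would score the implied answer given the
    question and the reply; this toy rule merely illustrates the idea."""
    if "card" in question.lower() and "wallet at home" in utterance.lower():
        return "no"  # "I left my wallet at home" implies "no" to a payment question
    return None

def resolve_answer(question: str, utterance: str) -> Optional[str]:
    """Try the literal layer first, then fall back to the contextual layer."""
    return literal_yes_no(utterance) or contextual_model(question, utterance)

print(resolve_answer("Will you be paying by card today?",
                     "I left my wallet at home"))  # -> no
```

The design point is that the existing pipeline stays untouched; the conversational-interpretation layer only steps in when a cooperative, literal answer is missing, which is precisely when an implicature is most likely present.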