Learning to Interact with Humans by Lifelong Interaction with Humans
Current interactions between humans and computers are limited to brittle dialogues, where dialogue systems (a.k.a. chatbots and conversational agents) are either based on carefully hand-crafted rules or on machine learning systems trained on a large number of manually annotated sample dialogues. The development cost is considerable, both in building the representation of the knowledge for the target domain and in the dialogue management proper. One of the most important shortcomings is the variability of human language and the large amount of background knowledge that needs to be shared for effective dialogue. Moreover, most of the knowledge needs to be acquired nearly from scratch for each new dialogue task, including domain knowledge (learned using knowledge induction or existing knowledge bases) and the dialogue management module (adapted to the new domain). Interestingly, humans use dialogue to improve their knowledge in a given domain. That is, people interact with other people in order to confirm or revise their understanding.
The main motivation of Learning to Interact with Humans by Lifelong Interaction with Humans (LIHLITH) is to improve the quality of existing dialogue systems and to lower the cost of deployment in new domains.
The key insight is that our dialogue systems are designed to elicit a reward signal from the human (e.g. “OK, that’s great”), allowing the system to know whether the interaction was successful or not. A positive reward means that both the domain knowledge acquisition and the dialogue management were completed successfully. A negative reward can be more nuanced (e.g. “That is not what I expected”, “In fact, I was more interested in...”), indicating which part of the system was inaccurate. If necessary, the system will explicitly ask the human for feedback. Note that not all dialogue acts need to carry a reward; the system can safely ignore those that do not. LIHLITH will exploit this feedback when exposed to new dialogue domains, allowing the system to improve.
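The feedback loop described above can be sketched as follows. This is a minimal illustration, not the LIHLITH implementation: the cue lists, the reward values, and the per-domain accumulator are all hypothetical assumptions chosen to make the idea concrete.

```python
# Illustrative sketch: mapping user utterances to reward signals and
# accumulating them per domain for later lifelong-learning updates.
# Cue lists and the update rule are assumptions, not the project's method.

POSITIVE_CUES = ("great", "thanks", "perfect")
NEGATIVE_CUES = ("not what i expected", "wrong", "more interested in")


def extract_reward(utterance: str):
    """Map a user utterance to +1, -1, or None (no reward signal)."""
    text = utterance.lower()
    if any(cue in text for cue in NEGATIVE_CUES):
        return -1
    if any(cue in text for cue in POSITIVE_CUES):
        return 1
    return None  # dialogue acts without feedback are safely ignored


class LifelongLearner:
    """Accumulates per-domain reward to drive later system updates."""

    def __init__(self):
        self.domain_reward = {}

    def observe(self, domain: str, utterance: str):
        reward = extract_reward(utterance)
        if reward is not None:
            self.domain_reward[domain] = self.domain_reward.get(domain, 0) + reward
        return reward
```

In a full system, the accumulated reward would feed into updating both the domain knowledge and the dialogue management policy; here it is just stored per domain to show where the signal would enter.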
LIHLITH aims to address two main issues: (i) how to improve the state of the art in lifelong learning using feedback from users; (ii) how to apply these improvements to dialogue systems, leveraging the interaction with humans to improve the overall quality and the ability to cope with new domains. Although the main lessons apply to a wide range of information systems, multimodal systems (e.g. speech), and robotics, we focus the research on question-answering-based dialogue. The main targets are pursued through the following scientific and technological objectives:
Objective O1: Define a reproducible evaluation protocol and associated benchmarks for lifelong learning of dialogue systems.
Objective O2: Improve dialogue systems using lifelong learning through interaction.
Objective O3: Improve knowledge induction through interaction.
Objective O4: Build and evaluate a dialogue system for community question answering. The dialogue will involve questions in specific question-answering domains.
Objective O5: Build a real industrial use case for chatbot technology and develop the corresponding Chatbot system.