Master's Thesis
Title:
Building a dialogue system for question-answer forum websites
Author:
Jon Ander Campos
Summary:
Dialogue systems are automatic systems for assisting humans, and their main
characteristic is that they can communicate through natural language. They have recently
gained considerable momentum and can be found in everyday tools (Siri, Cortana, Alexa,
etc.). As the use of these systems has grown, interest in accessing Community Question
Answering (CQA) and Frequently Asked Questions (FAQ) content through dialogue has
increased sharply, especially in the business world. The conversational ability of
current dialogue systems, however, is very limited, because they are defined by
hand-written rules. This raises the cost of deploying them in a new domain, and of
monitoring and adapting them once they are running in production. Moreover, although
techniques such as deep learning have achieved very good results in various areas of
Language Processing, they suffer heavily from data scarcity, as they need enormous
amounts of data for training. The main goal of the project presented here is to mitigate
these limitations by implementing a system based on neural networks and by creating a
CQA dataset to promote and ease the future development of such systems.
Abstract:
Dialogue systems are automatic systems developed to help humans in their daily
routines. Their main characteristic is that they are able to communicate using natural
language. Dialogue agents have recently become increasingly popular and are already part
of our lives, as they are built into many tools (Siri, Cortana, Alexa...). This spread of
voice agents has increased interest in accessing Community Question Answering (CQA)
and Frequently Asked Questions (FAQ) information through dialogue, especially in
industry. Current dialogue systems, however, have very limited conversational ability
because they are defined by hand-crafted rules. This hand-crafted nature makes domain
adaptation an extremely costly and time-consuming task. On the other hand, deep
learning based techniques, which have achieved state-of-the-art results in many Natural
Language Processing (NLP) tasks, suffer from a lack of data, as they need huge amounts
of labelled records for training. The main aim of this project is therefore to develop a
neural system together with a CQA dataset, in order to enable future research in CQA
dialogue systems.
Tutor:
Eneko Agirre and Arantxa Otegi
Year:
2019