Master's Thesis

Title: 
Building a dialogue system for question-answer forum websites
Author: 
Jon Ander Campos
Summary: 
Dialogue systems are automatic systems for helping humans, and their main characteristic is that they can communicate through natural language. They have recently received a strong boost and can now be found in everyday tools (Siri, Cortana, Alexa, etc.). As the use of these systems has grown, interest in accessing Community Question Answering (CQA) or Frequently Asked Questions (FAQ) resources through dialogue has increased considerably, especially in the business world. The conversational ability of current dialogue systems, however, is very limited, because they are defined by hand-written rules. This raises the cost of deploying a system in a new domain, and of monitoring and adapting it once it is running in production. On the other hand, although techniques such as deep learning have achieved very good results in various areas of language processing, they suffer greatly from data scarcity, since they need huge amounts of data for training. The main goal of the project presented here is to ease these limitations by implementing a system based on neural networks and by creating a CQA dataset to encourage and facilitate the future development of such systems.
Abstract: 
Dialogue systems are automatic systems developed to help humans in their daily routines. Their main characteristic is that they can communicate using natural language. Dialogue agents have recently grown in popularity and are already part of our lives, as they are built into many tools (Siri, Cortana, Alexa...). The spread of voice agents has increased interest in accessing Community Question Answering (CQA) and Frequently Asked Questions (FAQ) information through dialogue, especially in industry. The conversational ability of current dialogue systems is very limited, however, because they are defined by hand-crafted rules. This hand-crafted nature makes domain adaptation an extremely costly and time-consuming task. On the other hand, deep learning techniques, which have achieved state-of-the-art results in many Natural Language Processing (NLP) tasks, suffer from a lack of data, as they need huge amounts of labelled records for training. The main aim of this project is therefore to develop a neural system together with a CQA dataset, in order to enable future research on CQA dialogue systems.
Tutor: 
Eneko Agirre and Arantxa Otegi
Year: 
2019