This is a list of the aims of the projects in the CODEFEST 2016 summer school and the results achieved by each of them. Further information can be found on the Codefest_dss2016 website.
Quiz Bowl: Multilingual question-answering for trivia games with Wikipedia
Aims: The question-answering trivia quiz project is in progress. For the first game prototype, the team is using some of the questions that were translated into Basque on Monday. The prototype matches the quiz questions or hints against Basque Wikipedia articles, so that the answer to each hint comes back as an article.
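As a rough illustration of the matching idea described above, here is a minimal sketch that picks the article sharing the most informative words with a hint. The TF-IDF style scoring, the function names and the toy articles are all assumptions made for this example; the team's actual approach is in the repository linked below and may work quite differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def best_article(hint, articles):
    """Return the title of the article whose text best overlaps the hint.

    `articles` maps title -> article text. The TF-IDF weighted overlap
    used here is an assumption for illustration, not the team's method.
    """
    docs = {title: Counter(tokenize(body)) for title, body in articles.items()}
    n_docs = len(docs)
    # Document frequency: in how many articles each word appears.
    df = Counter(word for counts in docs.values() for word in counts)
    hint_words = set(tokenize(hint))

    def score(counts):
        total = sum(counts.values()) or 1
        # Words found in every article get a weight of zero.
        return sum(
            (counts[w] / total) * math.log((1 + n_docs) / (1 + df[w]))
            for w in hint_words
            if w in counts
        )

    return max(docs, key=lambda title: score(docs[title]))


# Toy data, invented for this example (English used for readability).
articles = {
    "Donostia": "Donostia is a coastal city in the Basque Country known for its bay",
    "Bilbo": "Bilbo is the largest city in the Basque Country and has a famous museum",
}
print(best_article("Which city has a famous museum?", articles))
```

Words shared by every article (such as "city" here) contribute nothing to the score, so the choice is driven by the words that distinguish one article from another.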
Results: We had the chance to play a quiz based on Wikipedia trivia: Human vs. Computer. This time the humans won, but only by a very small margin.
The code is available here: github.com/dss2016eu/codefest/tree/master/quizbowl
References to all the code generated in #codefestdss2016 will also be posted there!
Create a morphological analyzer for your minority language
Aims: To develop the morphological analyzer for the Hungarian language, Ixa group members Iñaki Alegria and Montse Maritxalar have come along to offer their help with the programming tasks. After creating a list of the lexical roots of Hungarian, the team has made a selection based on verbs and adjectives, among other criteria. Next, they want to encode that selection in a program written in lexc format.
Results: They have explained several projects they have been developing over these days, all of them related to machine translation tools: for Hungarian, Buryat (a variety of Mongolian), Rif Berber (a language spoken mostly in Morocco) and Uyghur (a Turkic language spoken in western China), among others.
NLP for Literature Analysis and Creation
Aims: Members of the group have chosen the name Story buffet for their tools for the analysis and creation of literary texts. The team is made up of linguists, programmers and other experts who consider themselves “hybrids” of the two.
On the second day, we had a break so that people from the Ixa group (the ones in charge of this project) could explain their work to us. Manex Agirrezabal is an expert on metrical analysis in poetry; combined with his programming knowledge, he sees this as a great chance to semantically alter short stories. Itziar Gonzalez-Dios originally studied linguistics, but she has moved into programming in the last few years; she is interested in analysing the complexity and synthesis of texts.
Results: They have shown their webpage (Story buffet) for literature creation and analysis in quite a humorous way.
Behagunea
Aims: The team has continued developing the Behagunea project, making use of their different skills. Victor (programmer) has visualized the results of the Ixa-pipes, and he is working on designing an attractive interface. Dani (IT expert) is trying to translate the Ixa-pipes resources into Catalan. Sabrina (linguist), with the help of Iñaki (programmer), is starting an app based on tweets to study what countries think about each other. Finally, due to some problems, Kassandra has decided to put aside one of the projects: the one that aims to include social media in the website DSS2016EU Iritzien Behagunea (Opinion Observatory). Instead, she has chosen to examine the tweets about the DonostiCup football competition.
Results: They have accomplished their goals. Apart from adding new languages (Catalan, Italian) to the Behagunea project, they have managed to combine social media with geolocation.
Enriching ZureTTS platform with new languages
Aims: Several aspects of the ZureTTS project have been addressed. On the one hand, the members of Aholab have focused on extending the platform to include the dialect from Iparralde (the northern side of the Basque Country), and they have started both writing the questions for the voice donors and designing the new interface. As for the Android app, they have spent the day identifying errors and preparing everything required to install the new platform. Finally, the “Ireland team” has translated the webpage interface into Gaelic and contacted some Irish experts within their university to get hold of a good, reliable database.
Results: At the end of the week, apart from adding the Lapurtera (Basque dialect) version to the web, they have made huge progress in Gaelic, thanks especially to the help of the Irish participants.
SRL and Dockers
Aims: Members of the SRL (semantic role labeling) project have been structuring a database so that information can be added and handled later on. As Suhail Sarwan says, developments in SRL bring a direct benefit to the field of semantics, particularly if we want to promote and improve the e-learning model. Aided by Rodrigo Agerri, among others, they have worked on SRL, and Eleanor Dutton intends to develop a tool for linguistic analysis and to apply it to Moroccan Arabic.
Results: They showed us a tool they have developed that uses sequence tagging methods to identify the participants in the events described by the predicates of a sentence.
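To make the sequence tagging idea concrete, here is a minimal sketch of the final step of such a pipeline: turning per-token tags into labelled participant spans. It assumes a BIO tagging scheme with PropBank-style role names (ARG0, ARG1, V); the scheme, the labels, the function name and the example sentence are assumptions for illustration, and the tool shown at the summer school may use a different representation.

```python
def decode_arguments(tokens, tags):
    """Group per-token BIO tags into (role, text) spans.

    A "B-X" tag opens a span with role X, "I-X" continues it,
    and "O" (or an inconsistent I- tag) closes any open span.
    """
    spans = []
    current_role, current_words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_role is not None:
                spans.append((current_role, " ".join(current_words)))
            current_role, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_role == tag[2:]:
            current_words.append(token)
        else:
            if current_role is not None:
                spans.append((current_role, " ".join(current_words)))
            current_role, current_words = None, []
    # Flush a span that runs to the end of the sentence.
    if current_role is not None:
        spans.append((current_role, " ".join(current_words)))
    return spans


# Hand-written tags standing in for a tagger's output.
tokens = ["The", "team", "translated", "the", "interface", "yesterday"]
tags = ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1", "O"]
print(decode_arguments(tokens, tags))
```

In this toy sentence, "The team" would come out as the agent (ARG0) of the predicate "translated" and "the interface" as the thing translated (ARG1).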
Machine Translation for minority languages
Aims: Each member of the group is focusing on the pair of languages in which he/she is fluent. Using the program called Apertium, for example, they have started working on a translator for the French-Occitan language pair, so that they can later develop a linguistic analyzer for Occitan. They have also been working on a Tetum-Portuguese translator (the two official languages spoken on the island of Timor) with the same program. Others have started preparing lexical transfers (they will try to do the same with dependency transfers) for the English-Spanish pair using Matxin. Matxin also allows the creation of an English-Welsh translator, as well as a translator for English-Basque (one such translator already exists, but some errors must be identified and corrected). The latter will be applied in the field of medicine.
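The lexical transfer step mentioned above can be pictured as a bilingual dictionary lookup over analysed words. The sketch below is a simplified illustration only: the data structures, the function name and the `*` marker for unknown words are assumptions made for this example, and it does not reproduce the actual Apertium or Matxin formats.

```python
def lexical_transfer(analysed, bilingual):
    """Replace each source lemma with its target lemma, keeping the tag.

    `analysed` is a list of (lemma, tag) pairs and `bilingual` maps
    (source lemma, tag) -> target lemma. Unknown lemmas are passed
    through with a leading '*' (a convention assumed for this sketch).
    """
    out = []
    for lemma, tag in analysed:
        target = bilingual.get((lemma, tag))
        out.append((target if target is not None else "*" + lemma, tag))
    return out


# Tiny hand-made English-Spanish dictionary, invented for illustration.
bilingual = {("house", "n"): "casa", ("white", "adj"): "blanco"}
analysed = [("white", "adj"), ("house", "n"), ("gazebo", "n")]
print(lexical_transfer(analysed, bilingual))
```

Word order and agreement are left untouched here; in a full system those would be handled by later transfer and generation steps.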
Results: They have explained several projects they have been developing over these days, all of them related to machine translation tools: for Hungarian, Buryat (a variety of Mongolian), Rif Berber (a language spoken mostly in Morocco) and Uyghur (a Turkic language spoken in western China), among others.