some web applications and tools developed by IXA


I received my PhD in computer science from the University of the Basque Country with the dissertation entitled "Automatic Exercise Generation Based on Corpora and Natural Language Processing Techniques".

My PhD focused on the automatic generation of test-type questions using NLP tools and corpora. It was conducted under the supervision of Montse Maritxalar. The abstract is given below.

You can download the dissertation and the presentation slides.

Erratum: page 122 reads: "In contrast, the results of item discrimination and the evaluation of the distractors were obtained based on low-scoring and high-scoring students." It should read: "In contrast, the evaluation of the distractors was obtained based on low-scoring and high-scoring students."


I presented the doctoral thesis "Automatic Exercise Generation Based on Corpora and Natural Language Processing Techniques" at the University of the Basque Country. The thesis belongs to the doctoral programme Hizkuntzaren azterketa eta prozesamendua (Language Analysis and Processing).

Using NLP tools and corpora, we automatically generated test-type questions that can support learning. I carried out this thesis work under the supervision of Dr. Montse Maritxalar.

Available: the dissertation and the presentation slides.

Note: page 122 reads: "In contrast, the results of item discrimination and the evaluation of the distractors were obtained based on low-scoring and high-scoring students." It should read: "In contrast, the evaluation of the distractors was obtained based on low-scoring and high-scoring students."


Information and communication technologies (ICT) are widely used in different scenarios as media and methodologies. In this dissertation, we present ICT as an approach to help in the learning process of certain subjects.

The analysis of various available natural language processing (NLP) tools and corpora has demonstrated that it is possible to implement a system that helps experts and teachers create didactic material. Thus, we have designed and implemented a system called ArikIturri that, based on NLP and corpora, is able to produce items of a certain standard. ArikIturri is a multilingual system, and different question types have been tested in several scenarios. We have shown that the system is viable in three domains: Basque language learning, English language learning and science. The experiments have corroborated the feasibility of producing several types of question: error correction, fill-in-the-blank, word formation, multiple-choice and short answer questions. The items, together with the information relating to their generation process, are represented by means of a question model. This structured representation allows the items to be imported into and exported from independent applications.
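As a rough illustration of what a structured question model makes possible, the following minimal Python sketch stores one item together with its generation information and round-trips it through JSON. It is not ArikIturri's actual question model: the class name, the field names and the choice of JSON as the exchange format are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Item:
    """One generated question. All fields are hypothetical, not ArikIturri's schema."""
    question_type: str            # e.g. "multiple-choice" or "fill-in-the-blank"
    language: str                 # e.g. "eu" or "en"
    stem: str                     # text shown to the student
    answer: str                   # the correct answer
    distractors: List[str] = field(default_factory=list)
    source_sentence: str = ""     # corpus sentence the item was built from
    generation_info: Dict[str, str] = field(default_factory=dict)


def export_item(item: Item) -> str:
    """Serialise an item so that another application can read it."""
    return json.dumps(asdict(item), ensure_ascii=False, sort_keys=True)


def import_item(text: str) -> Item:
    """Rebuild an item from its serialised form."""
    return Item(**json.loads(text))
```

Keeping the generation information inside the same record means an item can be moved between applications without losing the trace of how it was produced.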

We have conducted various experiments in which distinct linguistic information has been utilised. In our experiments, the input for the system is always a corpus, from which sentences are selected to be part of the items based on diverse criteria. In addition, the grammatical and semantic information of these sentences enabled us to carry out experiments: (i) to prove the viability of the system designed to implement a complete automatic process to generate items; (ii) to apply different methods in the generation of distractors; and (iii) to modify some components of the source sentences when creating the stems. The results of these experiments were obtained from experts' opinions and students' answers. In this way, a qualitative analysis based on experts' knowledge gave us a way of measuring the correctness of the automatically generated questions. In addition, the quantitative analysis based on students' responses served to assess the quality of the items.
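The generation process described above (select sentences from a corpus, build a stem, attach distractors) can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used in the dissertation: the selection criteria (presence of the target word and a length limit), the function names and the externally supplied candidate list are all assumptions, and no real linguistic analysis is performed.

```python
import re
from typing import Dict, List, Optional

BLANK = "_____"


def _word_pattern(word: str) -> "re.Pattern[str]":
    # Match the target as a whole word, ignoring case.
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


def select_sentences(corpus: List[str], target: str, max_words: int = 20) -> List[str]:
    """Keep sentences that contain the target word and are not too long."""
    pattern = _word_pattern(target)
    return [s for s in corpus if pattern.search(s) and len(s.split()) <= max_words]


def make_item(sentence: str, target: str, candidates: List[str],
              n_distractors: int = 3) -> Optional[Dict[str, object]]:
    """Build a fill-in-the-blank item; return None if the target is absent."""
    pattern = _word_pattern(target)
    if not pattern.search(sentence):
        return None
    # The stem is the source sentence with the first occurrence blanked out.
    stem = pattern.sub(BLANK, sentence, count=1)
    # Distractors come from a candidate list, excluding the correct answer.
    distractors = [c for c in candidates if c.lower() != target.lower()][:n_distractors]
    return {"stem": stem, "answer": target, "distractors": distractors}
```

In a real system the candidate list would itself be produced from grammatical or semantic information, which is where the different distractor generation methods would plug in.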