Report on the 9th SaLTMiL Workshop (Reykjavik, 2014)
Report on the 9th SaLTMiL Workshop on “Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages”
by Mikel L. Forcada
Tuesday, 27 May 2014.
Reykjavik (Iceland)
Iñaki Alegria, Unai Cabezon, Unai Fernandez de Betoño, Gorka Labaka, Aingeru
Mayor, Kepa Sarasola and Arkaitz Zubiaga
Wikipedia and Machine Translation: killing two birds with one stone
Gideon Kotzé and Friedel Wolff
Experiments with syllable-based English-Zulu alignment
Inari Listenmaa and Kaarel Kaljurand
Computational Estonian Grammar in Grammatical Framework
Matthew Marting and Kevin Unhammer
FST Trimming: Ending Dictionary Redundancy in Apertium
Hrvoje Peradin, Filip Petkovski and Francis Tyers
Shallow-transfer rule-based machine translation for the Western group of South Slavic
Alex Rudnick, Annette Rios Gonzales and Michael Gasser
Enhancing a Rule-Based MT System with Cross-Lingual WSD
- Is research in minority-language machine translation already mainstream?
- What are the main difficulties in building or putting together free/open-source language resources for small languages, and how should they be addressed? Are we pooling these resources correctly?
Programme of the 9th SaLTMiL Workshop on “Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages”
A half-day workshop at LREC 2014
Tuesday, 27 May 2014.
Reykjavik (Iceland)
LREC 2014:
Workshop Programme
09:00 – 09:30 Welcoming address by Workshop co-chair Mikel L. Forcada
09:30 – 10:30 Oral papers
Iñaki Alegria, Unai Cabezon, Unai Fernandez de Betoño, Gorka Labaka, Aingeru
Mayor, Kepa Sarasola and Arkaitz Zubiaga
Wikipedia and Machine Translation: killing two birds with one stone
Gideon Kotzé and Friedel Wolff
Experiments with syllable-based English-Zulu alignment
10:30 – 11:00 Coffee break
11:00 – 13:00 Oral papers
Inari Listenmaa and Kaarel Kaljurand
Computational Estonian Grammar in Grammatical Framework
Matthew Marting and Kevin Unhammer
FST Trimming: Ending Dictionary Redundancy in Apertium
Hrvoje Peradin, Filip Petkovski and Francis Tyers
Shallow-transfer rule-based machine translation for the Western group of South Slavic
Alex Rudnick, Annette Rios Gonzales and Michael Gasser
Enhancing a Rule-Based MT System with Cross-Lingual WSD
13:00 – 13:30 General discussion
13:30 Closing
Workshop Organizers
Mikel L. Forcada, Universitat d’Alacant, Spain
Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain
Francis M. Tyers, UiT Norgga árktalaš universitehta, Norway
Workshop Programme Committee
Iñaki Alegria Euskal Herriko Unibertsitatea, Spain
Lars Borin Göteborgs Universitet, Sweden
Elaine Uí Dhonnchadha Trinity College Dublin, Ireland
Mikel L. Forcada Universitat d’Alacant, Spain
Michael Gasser Indiana University, USA
Måns Huldén Helsingin Yliopisto, Finland
Krister Lindén Helsingin Yliopisto, Finland
Nikola Ljubešić Sveučilište u Zagrebu, Croatia
Lluís Padró Universitat Politècnica de Catalunya, Spain
Juan Antonio Pérez-Ortiz Universitat d’Alacant, Spain
Felipe Sánchez-Martínez Universitat d’Alacant, Spain
Kepa Sarasola Euskal Herriko Unibertsitatea, Spain
Kevin P. Scannell Saint Louis University, USA
Antonio Toral Dublin City University, Ireland
Trond Trosterud UiT Norgga árktalaš universitehta, Norway
Francis M. Tyers UiT Norgga árktalaš universitehta, Norway
The 9th International Workshop of the Special Interest Group on Speech and Language Technology for Minority Languages (SaLTMiL) will be held in Reykjavík, Iceland, on 27th May 2014, as part of the 2014 International Language Resources and Evaluation Conference (LREC). It is also framed as one of the activities of the European project Abu-Matran. Entitled "Free/open-source language resources for the machine translation of less-resourced languages", the workshop is intended to continue the series of SALTMIL/LREC workshops on computational language resources for minority languages, held in Granada (1998), Athens (2000), Las Palmas de Gran Canaria (2002), Lisbon (2004), Genoa (2006), Marrakech (2008), Valletta (2010) and Istanbul (2012), and is also expected to attract the audience of the Free Rule-Based Machine Translation workshops (2009, 2011, 2012).
The workshop aims to share information on language resources, tools and best practice, to save isolated researchers from starting from scratch when building machine translation for a less-resourced language. An important aspect will be the strengthening of the free/open-source language resources community, which can minimize duplication of effort and optimize development and adoption, in line with the LREC 2014 hot topic ‘LRs in the Collaborative Age’.
Papers describe research and development in the following areas:
- Free/open-source language resources for rule-based machine translation (dictionaries, rule sets)
- Free/open-source language resources for statistical machine translation (corpora)
- Free/open-source tools to annotate, clean, preprocess, convert, etc. language resources for machine translation
- Machine translation as a tool for creating or enriching free/open-source language resources for less-resourced languages
Call for Papers: 9th SaLTMiL workshop on “Free/open-source language resources for the machine translation of less-resourced languages” at LREC 2014.
A full-day workshop at LREC 2014
Tuesday, 27 May 2014.
Reykjavik (Iceland)
LREC 2014:
Paper submission:
The 9th International Workshop of the Special Interest Group on Speech and Language Technology for Minority Languages (SaLTMiL) will be held in Reykjavík, Iceland, on May 27, 2014, as part of the 2014 International Language Resources and Evaluation Conference (LREC). It is also framed as one of the activities of the European project Abu-Matran. Entitled "Free/open-source language resources for the machine translation of less-resourced languages", the workshop is intended to continue the series of SALTMIL/LREC workshops on computational language resources for minority languages, held in Granada (1998), Athens (2000), Las Palmas de Gran Canaria (2002), Lisbon (2004), Genoa (2006), Marrakech (2008), Valletta (2010) and Istanbul (2012), and is also expected to attract the audience of the Free Rule-Based Machine Translation workshops (2009, 2011, 2012). The workshop aims to share information on language resources, tools and best practice, to save isolated researchers from starting from scratch when building machine translation for a less-resourced language. An important aspect will be the strengthening of the free/open-source language resources community, which can minimize duplication of effort and optimize development and adoption, in line with the LREC 2014 hot topic ‘LRs in the Collaborative Age’.
The whole-day workshop will consist of short oral papers, a poster session preceded by a poster-boaster session (2 minutes, 2 slides per poster), and a round table.
Papers are invited that describe research and development in the following areas:
- Free/open-source (FOS) language resources (LRs) for rule-based machine translation (dictionaries, rule sets)
- FOS LRs for statistical machine translation (corpora)
- FOS tools to annotate, clean, preprocess, convert, etc. LRs for machine translation
- Machine translation as a tool for creating or enriching FOS LRs for less-resourced languages
Position papers and (web based) demonstrations will also be considered for presentation.
The best papers, as evaluated by the programme committee, will be presented orally and the remaining papers will be presented in poster format.
We expect short papers of at most 6,000 words (up to 6 pages) describing research addressing one of the above topics, to be submitted as PDF documents using the LREC 2014 START conference management system.
Submissions should be anonymized. When submitting a paper through the START page, authors will be kindly asked to share the resources that have been used for the work described in their paper or that are the outcome of their research. For further information on this initiative, please refer to
Submissions of papers should follow the same style as the papers for the main LREC conference (an Author's Kit made of specific guidelines and downloadable templates will be published on the conference web site in due time). All contributions will be included in the workshop proceedings (CD). They will also be published on the SALTMIL website.
The registration fees will be duly announced at the LREC 2014 site. Registration for the workshop will include a coffee break and the Proceedings of the Workshop. Registration will be handled by the LREC 2014 Secretariat.
Important dates
Deadline for paper submission: February 17, 2014 (extended from February 10, 2014)
Notification of acceptance sent: March 14, 2014 (postponed from March 3 and March 10, 2014)
Camera-ready paper due: March 21, 2014
Organizing committee
(1) Dr Francis M Tyers
Institutt for språkvitskap
Det humanistiske fakultet,
N-9037 Universitetet i Tromsø
(2) Dr Kepa Sarasola
Computer Science Faculty
Dept. of Computer Languages
The University of the Basque Country
P.K. 649 20080 DONOSTIA
Basque Country, Spain
Tel: +34 943 01 81 54
Fax: +34 943 21 93 06
(3) Prof Mikel L. Forcada
Dept. Llenguatges i Sistemes informàtics
Universitat d’Alacant
E-03071 Alacant (Spain)
Tel: +34 96 590 9776
Fax: +34 96 590 9326
Programme Committee
Iñaki Alegria, Euskal Herriko Unibertsitatea, Spain
Lars Borin, Göteborgs Universitet, Sweden
Elaine Uí Dhonnchadha, Trinity College Dublin, Ireland
Mikel L. Forcada, Universitat d’Alacant, Spain
Michael Gasser, Indiana University, USA
Måns Huldén, Helsingin Yliopisto, Finland
Krister Lindén, Helsingin Yliopisto, Finland
Nikola Ljubešić, Sveučilište u Zagrebu, Croatia
Lluís Padró, Universitat Politècnica de Catalunya, Spain
Juan Antonio Pérez-Ortiz, Universitat d’Alacant, Spain
Felipe Sánchez-Martínez, Universitat d’Alacant, Spain
Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain
Kevin P. Scannell, Saint Louis University, USA
Antonio Toral, Dublin City University, Ireland
Trond Trosterud, Universitet i Tromsø, Norway
Francis M. Tyers, Universitet i Tromsø, Norway
Report on the 8th LREC workshop. 2012
On May 22nd 2012, SALTMIL held, in collaboration with AfLaT, a full-day workshop on "Language technology for normalisation of less-resourced languages". This was a satellite workshop preceding the biennial LREC (Language Resources and Evaluation Conference) in Istanbul, Turkey.
The program started with the invited talk presented by Sjur Moshagen Nørstebø. This was then followed by two sessions of four oral presentations and a poster session with eight contributed poster papers. All the presentations and posters stimulated many questions and discussions.
At 17:30, after a brief presentation by Francis Tyers and Guy De Pauw, an interesting discussion took place on "Language technology for normalisation of less-resourced languages". The workshop then closed with thanks to the audience for their participation throughout the day.
About forty-five people were present in total, from a wide range of countries, representing work on a variety of less-resourced languages.
Additional materials related to this workshop are available: