News

LREC10. SALTMIL workshop. CFP: "Creation and use of basic lexical resources for less-resourced languages"

7th SaLTMiL Workshop on
"Creation and use of basic lexical resources for less-resourced languages"

A half-day workshop at LREC 2010

Sunday May 23, 2010, 09.00-14.00.

Mediterranean Conference Center, Valletta, Malta

Context and focus

The 7th International Workshop of the ISCA Special Interest Group on Speech and Language Technology for Minority Languages (SaLTMiL: see http://ixa2.si.ehu.es/saltmil) will be held in Malta on Sunday, May 23, 2010, as part of the 2010 International Language Resources and Evaluation Conference (LREC). Entitled "Creation and use of basic lexical resources for less-resourced languages", the workshop is intended to continue the series of SALTMIL/LREC workshops on computational language resources for minority languages, held in Granada (1998), Athens (2000), Las Palmas de Gran Canaria (2002), Lisbon (2004), Genoa (2006) and Marrakech (2008). The Malta 2010 workshop aims to share information on tools and best practice, so that isolated researchers will not need to start from scratch. An important aspect will be the forming of personal contacts, which can minimize duplication of effort. There will be a balance between presentations of existing language resources and more general presentations designed to give background information needed by all researchers.

Programme

Proceedings (pdf)

09.00 Registration
09.30 Opening
09.45 Invited talk: Marc Kemps-Snijders. LAT team at the Max Planck Institute at Nijmegen. "ELAN and RELISH project"
10.30 Coffee break
11.00 Invited talk: Antton Gurrutxaga and Igor Leturia: Elhuyar Foundation. "Exploiting Internet to build language resources for less resourced languages"
11.45 Oral papers (20+5 min.):

Tommi A Pirinen and Krister Lindén: "Finite-State Spell-Checking with Weighted Language and Error Models–Building and Evaluating Spell-Checkers with Wikipedia as Corpus"

Aric Bills, Lori S. Levin, Lawrence D. Kaplan, and Edna Agheak MacLean: "Finite-State Morphology for Iñupiaq"
12.35 Poster session

Marco Passarotti: "Leaving Behind the Less-Resourced Status. The Case of Latin through the Experience of the Index Thomisticus Treebank"

Anna Björk Nikulásdóttir and Matthew Whelpton: "Extraction of Semantic Relations as a Basis for a Future Semantic Database for Icelandic"

Gábor Prószéky, Attila Novák, István Endrédy, Beatrix Oszkó, László Fejes, Sándor Szeverényi, Zsuzsa Várnai and Beáta Wagner-Nagy: "Nganasan – Computational Resources of a Language on the Verge of Extinction"

Géraldine Walther and Benoît Sagot: "Developing a large-scale lexicon for a less-resourced language: Sorani Kurdish"

Hrafn Loftsson, Jökull Yngvason, Sigrún Helgadóttir and Eiríkur Rögnvaldsson: "Developing a PoS-tagged corpus using existing tools"
13.20 Panel: Less resourced languages and Language technology. Short- and medium-term objectives (SaLTMiL)

14.00 Closing

Organisers

  • Mikel L. Forcada: Machine Translation Group, School of Computing, Dublin City University, Dublin, Ireland
  • Kepa Sarasola: Dept. of Computer Languages, University of the Basque Country
  • Francis M. Tyers: Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Spain

Programme committee

  • Mikel L. Forcada: Dublin City University, Ireland
  • Kepa Sarasola: University of the Basque Country
  • Francis M. Tyers: Universitat d'Alacant, Spain
  • Trond Trosterud: Universitetet i Tromsø, Norway
  • Núria Bel: Universitat Pompeu Fabra, Barcelona, Spain
  • Kevin Scannell: Saint Louis University, USA
  • Hrafn Loftsson: University of Reykjavik
  • Felipe Sánchez-Martínez: Universitat d'Alacant
  • Iñaki Alegria: University of the Basque Country
  • Lars Borin: Göteborgs universitet

Additional referees

  • Per Langgård: Oqaasileriffik (Language Secretariat, Nuuk, Greenland)
  • Paul Meurer: Universitetet i Bergen
  • Sjur Moshagen: Divvun (Norwegian Sámi Parliament)
  • Eva Navas: University of the Basque Country

Important dates

4 March 2010 (extended from 28 February 2010) Deadline for submission
22 March 2010    Notification
29 March 2010 Final version
23 May 2010      Workshop

Registration

The registration form is available on the LREC 2010 site.

 

LTC09: News. Getting Less-Resourced Languages On-Board !

Report on the Special joint LTC-FLaReNet session
« Getting Less-Resourced Languages On-Board ! »

LTC’09 Conference, Poznan (Poland), 6-8 November 2009


A special session on the way to develop Language Technologies for Less-Resourced Languages was organized by FLaReNet within the LTC’09 conference in Poznan (Poland). It had been decided within the FLaReNet Steering Committee to organize a joint satellite workshop in connection with the Language & Technology Conference (LTC’09) in Poznan (6-8 November 2009). The choice was finally to organize it as a special session during the conference, on November 6.

The final conclusions, extracted from the report of this event written by J. Mariani (LIMSI-CNRS & IMMI) and K. Choukri (ELDA-ELRA), were the following:

  • Generally speaking, a strong political will (more than only lip-service) to consider the language dimension and enough funds are necessary.
  • This must go with the awareness that Language Technologies and Language Resources are important.
  • There should be specialists in the processing of that language, reaching a critical mass, and young researchers should be trained.
  • An infrastructure must exist, including:
      • a writing system/a transcription code/an agreed orthography,
      • Language Resources (sufficient in quantity and quality),
      • tools (especially language-independent ones based on statistical training, if possible as Open Source),
      • metadata, annotation schemes, standards,
      • development platforms,
      • evaluation means adapted to the specificities of the language (for example, for Machine Translation of morphologically rich languages).
  • The effort should be sustained over the long term, so as to build the necessary strong foundation.
  • Dialectal variants and sociolinguistics should also be taken into account.
  • Addressing only the short-term development of a specific product or service for that language (as a kind of simple toy), should be avoided, whereas demonstrating applications based on a strong foundation should be favoured.
  • When a majority language also exists, both should be studied together, and it would save time and effort to consider a family of languages all together.
  • Bootstrapping approaches facilitate the coverage of a language.
  • Cooperation among countries or programs would greatly help by providing the less advanced ones with examples and Best Practices, such as the definition of a commonly agreed basic set of Language Resources which have already been proven necessary to correctly produce the corresponding technologies for a given language. The identification of gaps and roadmaps should also be aimed at.
  • Master keywords should be Interoperability and Sustainability.
   

SEPLN09. Added publications. IR-IE-LRL workshop

We have added the 8 papers and the invited talk presented last week at our workshop in San Sebastián to the "Publications" section of our website.

Four of the papers are specifically related to Information Retrieval, Information Extraction or Question Answering:

  • Ussishkin, Francom and Woudstra address the issues involved in the creation of corpora and lexica for Maltese and Hebrew,
  • Yimam and Libsie describe a question-answering system for Amharic (a Semitic language like Maltese and Hebrew, spoken in Ethiopia),
  • Fernandez, Alegria and Ezeiza deal with the translation of named entities, a basic task in cross-language information retrieval, and
  • Alegria and co-workers show how to build a question-answering system from existing resources for a language.


Five accepted papers describe minority-language projects currently under way:

  • Pereira-Varela and colleagues describe Babelium, a multimedia framework to learn minority languages,
  • Chan, Jones and East describe a system to automate the writing of school reports in Welsh and English,
  • Prys describes a special interest group for minority-language speech and language technologies,
  • Humphreys describes a project to automatically subtitle television programmes in Welsh, and
  • Florie Moulin, Laura Laluque and Geróid Ó Néill describe a shell to manipulate dictionaries.


The paper corresponding to the invited talk by Lars Borin, professor of linguistic computing at the University of Gothenburg in Sweden, "Linguistic diversity in the information society", has been added as well. Prof. Borin, after giving some background about the current situation of the languages of the world and reviewing the concept of density of a language, discusses the main issues encountered when trying to develop written-language technologies for lower-density languages, in particular information extraction.

On behalf of the organisers, thanks to everyone for an excellent workshop.

   

SEPLN09-SALTMIL Workshop Programme

Information Retrieval and Information Extraction for Less Resourced Languages
                                                                   IE-IR-LRL

                                      SEPLN 2009 pre-conference workshop
                               Donostia-San Sebastián. Monday 7th September 2009

DOWNLOAD THE PROCEEDINGS:  IE-IR-LRL.pdf



 09:00 Registration
 09:15 Opening
 09:30 Invited Talk. Lars Borin
 10:30 Papers (20+5 min.)
      1. Information retrieval and extraction in Maltese and Hebrew:
          Issues in creating web-based corpora and lexical tools for
          less-resourced languages.
          Adam Ussishkin, Jerid Francom, Dainon Woudstra
      2. TETEYEQ: Amharic question answering for factoid question.
          Seid Muhie Yimam, Mulugeta Libsie

 11:20 Coffee break

 11:40 Papers (20+5 min.)
      3. Using Wikipedia for Named Entities Translation
          Izaskun Fernandez, Iñaki Alegria, Nerea Ezeiza
      4. Ihardetsi: A Question Answering system for Basque built on
          reused linguistic processors.
          Iñaki Alegria, Olatz Ansa, Xabier Arregi, Arantza Otegi,
          Ander Soraluze

 12:30 Projects (10 min. each)
      1. Babelium Project. Promoting the Use and Learning of Minority
         Languages.
         Juan A. Pereira Varela, Silvia Sanz-Santamaría, Julián
         Gutiérrez Serrano.
      2. A web-based system for multilingual school reports
         David Chan, Dewi Jones, Oggy East
      3. The SALT Cymru Special Interest Group – European Funding
         Encouraging Collaboration Between Academia and Business in
         Wales within the field of Speech and Language Technology.
         Gruffudd Prys
      4. Automated English subtitling of Welsh TV Programmes
          Llio Humphreys
      5. A Dictionary Shell
         Florie Moulin, Laura Laluque, Geróid Ó Néill

 13:20 Panel
       "Less resourced languages and Language technology.
        Short- and medium-term objectives"
        SALTMIL

 13:45 Closing

   

SEPLN09. IR-IE-LRL workshop, Donostia. SALTMIL-SEPLN09

Information Retrieval and Information Extraction for less resourced languages

SEPLN 2009 pre-conference workshop
Donostia-San Sebastián. Monday 7th September 2009
Call for papers

Paper submission:   8 June 2009
Organised by the SALTMIL Special Interest Group of ISCA

   
