
ixa-pipes in FOSDEM

Rodrigo Agerri (Ixa Group) will present ixa-pipes tomorrow at FOSDEM, a free event where software developers meet, share ideas and collaborate:

IXA pipes: Easy and ready-to-use NLP tools for language communities.
Free NLP tools for several languages, including Basque, Galician, Spanis

IXA pipes (http://ixa2.si.ehu.es/ixa-pipes/) is a modular set of Natural Language Processing tools (or pipes) that provides easy access to NLP technology for several languages. It offers robust and efficient linguistic annotation to both researchers and non-NLP experts, with the aim of lowering the barriers to using NLP technology, whether for research purposes or for small industrial developers and SMEs. The IXA pipes can be used directly, or their modularity can be exploited to pick and change different components. Every IXA pipe can be up and running after two simple steps. The tools require Java 1.7+ to run and are designed to come with all batteries included, which means that no system configuration or third-party dependencies are required. The modules will run on any platform as long as a JVM 1.7+ is available.

IXA pipes are just a set of processes chained by their standard streams, so that the output of each process feeds directly into the next one as input. The Unix pipes metaphor has been applied to NLP tools by adopting a very simple and well-known data-centric architecture, in which every module/pipe is interchangeable with any other tool as long as it reads and writes the required data format via the standard streams.
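The chaining idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_pipeline`, `SPLIT_WORDS` and `UPPERCASE` are hypothetical names, and the two stages are stand-in one-liners, not actual IXA pipes modules. The sketch also runs each stage to completion before starting the next, whereas a shell pipeline streams data between concurrently running processes.

```python
import subprocess
import sys


def run_pipeline(stages, text):
    """Pass `text` through each command in turn via stdin/stdout."""
    data = text.encode("utf-8")
    for cmd in stages:
        # Each stage only sees standard input and produces standard output,
        # so any stage can be swapped for another tool with the same contract.
        result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data.decode("utf-8")


# Stand-in stages (NOT IXA pipes modules): small Python one-liners that
# read from stdin and write to stdout.
SPLIT_WORDS = [
    sys.executable, "-c",
    "import sys; print('\\n'.join(sys.stdin.read().split()))",
]
UPPERCASE = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    print(run_pipeline([SPLIT_WORDS, UPPERCASE], "ixa pipes chain tools"))
```

Because every stage is just a command line, replacing one of the stand-ins with a real tool would only require that the tool read and write the agreed data format on its standard streams.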

Both the input and the output of the modules must be formatted in NAF, the data format used to represent and pipe linguistic annotations. We currently cover tokenization, POS tagging, lemmatization, Named Entity Recognition and Classification, and probabilistic parsing, but further annotations and languages can be easily added. The tools are distributed under the Apache License 2.0.

I would prefer to keep the theoretical part as short as possible and do some practical work with the modules. To save time, it would be nice (although not compulsory) if attendees came with a laptop with the following components installed:

  • Java Development Kit 1.7+
  • Apache Maven 3.+
  • git
  • Datasets/Corpora such as:
      • Universal Dependencies http://universaldependencies.github.io/docs/
      • CoNLL 2002 NER data http://www.clips.uantwerpen.be/conll2002/ner/
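Before the session, the software prerequisites listed above could be checked with a short script such as the following. It is a minimal sketch: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, the executable names `java`, `mvn` and `git` are assumed to be the usual ones on your platform, and the script only checks that each command is on the `PATH`, not that its version is recent enough.

```python
import shutil

# Assumed executable names for the JDK, Apache Maven and git.
REQUIRED_TOOLS = ["java", "mvn", "git"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the commands from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required commands were found on PATH.")
```

Versions still need to be checked by hand, for example by asking each tool to report its own version.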

The idea is to download, compile, tag texts and train your own models in a very short time using IXA pipes.
Links: Ixa-pipes main site, Main developer, Submit feedback
