If you use the ixa pipes tools or the models, please cite this paper:
Rodrigo Agerri, Josu Bermudez and German Rigau (2014): "IXA pipeline: Efficient and Ready to Use Multilingual NLP tools", in: Proceedings of the 9th Language Resources and Evaluation Conference (LREC2014), 26-31 May, 2014, Reykjavik, Iceland.
ixa-pipe-tok: Tokenizer and Segmenter for several languages.
ixa-pipe-pos: Statistical POS tagging and Lemmatizer for Basque, Dutch, English, French, Galician, German, Italian and Spanish.
ixa-pipe-nerc: Named Entity Recognition tagger for Basque, Spanish, English, German, Dutch and Italian; Opinion Target Extraction (OTE) for English.
ixa-pipe-chunk: Probabilistic chunker for Basque and English.
ixa-pipe-parse: Probabilistic constituent parser for Spanish and English.
Every ixa pipe can be up and running after two simple steps: download and run. The tools require Java 1.7+ and are designed to come with all batteries included, meaning that no system configuration or third-party dependencies need to be installed. The modules will run on any platform as long as a JVM 1.7+ is available.
IXA pipes are simply a set of processes chained by their standard streams, so that the output of each process feeds directly into the input of the next one. The Unix pipes metaphor is applied to NLP tools through a very simple and well-known data-centric architecture, in which every module/pipe can be replaced by any other tool as long as it reads and writes the required data format via the standard streams.
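For instance, tokenization, POS tagging and named entity recognition could be chained as sketched below. The jar and model file names are illustrative placeholders; consult each module's README for the exact artifact names, subcommands and options.

```shell
# Chain three ixa pipes via their standard streams: the tokenizer reads
# raw text from stdin, and each subsequent module reads the NAF produced
# by the previous one and adds its own annotation layer.
# Jar and model file names below are illustrative, not exact.
cat guardian.txt \
  | java -jar ixa-pipe-tok.jar tok -l en \
  | java -jar ixa-pipe-pos.jar tag -m en-pos-model.bin \
  | java -jar ixa-pipe-nerc.jar tag -m en-nerc-model.bin \
  > guardian.naf
```

Because every module communicates only via stdin/stdout in NAF, any stage can be swapped for a different tool that speaks the same format.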
Both the input and the output of every module are formatted in NAF, the data format used to represent and pipe linguistic annotations. All our Java modules use the kaflib library for easy NAF integration.
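As a rough illustration, a NAF document is an XML file in which each module adds its own annotation layer on top of the previous ones: the tokenizer produces the word-form layer, the POS tagger adds a terms layer pointing back at those word forms, and so on. The fragment below is a hand-written sketch; exact attribute names and values depend on the NAF version and the models used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<NAF xml:lang="en" version="v3">
  <!-- word-form layer: one <wf> per token (added by the tokenizer) -->
  <text>
    <wf id="w1" sent="1" offset="0" length="5">Peter</wf>
    <wf id="w2" sent="1" offset="6" length="6">sleeps</wf>
  </text>
  <!-- terms layer: lemma/POS per token, spans refer to <wf> ids
       (added by the POS tagger; values here are illustrative) -->
  <terms>
    <term id="t1" lemma="Peter" pos="R">
      <span><target id="w1"/></span>
    </term>
    <term id="t2" lemma="sleep" pos="V">
      <span><target id="w2"/></span>
    </term>
  </terms>
</NAF>
```

Downstream modules never re-tokenize: they read the existing layers and append new ones, which is what makes the pipes interchangeable.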