ixa-pipe-pos-eu
ixa-pipe-pos-eu is a robust, wide-coverage morphological analyzer and PoS tagger for Basque. It is an adapted version of Eustagger, a lemmatizer and tagger for Basque, and it is implemented in C++.
It is the first module of the linguistic processing chain. The tool takes raw text as input and outputs, in NAF format, the lemma, the PoS tag and the morphological information for each token.
Download
You can download a pre-compiled binary package for the latest stable version from the following links:
Source code
The source code for the latest development version can be downloaded or cloned from the Eustagger Lite page on GitHub.
License
All the original code produced for ixa-pipe-pos-eu is licensed under the GNU General Public License v3 (GPL v3).
This software uses some external libraries that have their own licenses and copyright owners:
- PCRE++: Copyright (C) 2002-2003, Thomas Linden. GNU Lesser General Public License
- VISL CG-3: Copyright (C) 2007-2013, GrammarSoft ApS. GNU General Public License v3
- SWI Prolog: Copyright (C) 2008, University of Amsterdam. GNU General Public License v2
- Foma: Copyright (C) 2008-2012, Mans Hulden. GNU General Public License v2
- Freeling: Copyright (C) 2004, TALP Research Center, Universitat Politecnica de Catalunya. GNU General Public License v3
- Boost: Copyright (C) 2004-2006, Joe Coder. Boost Software License
How to cite
If you use ixa-pipe-pos-eu tool, please cite the following paper in your academic work:
Arantxa Otegi, Nerea Ezeiza, Iakes Goenaga and Gorka
Labaka. A Modular Chain of NLP Tools for Basque.
In Proceedings of the 19th International Conference on
Text, Speech and Dialogue - TSD 2016, Brno, Czech Republic,
volume 9924 of Lecture Notes in Artificial Intelligence,
pp. 93-100. 2016
Platform and requirements
Ready-to-use packages are available only for Linux.
To use the tool on other platforms, download the source code and compile it. Because the tool has several dependencies, some additional libraries and programs must be installed beforehand; follow the instructions in the INSTALL file.
Installation
Once you have downloaded the pre-compiled binary package, decompress the file; the executable is then ready to use, with no further installation.
If you want to compile the source code, follow the instructions in the INSTALL file.
How to use
The executable ixa-pipe-pos-eu.sh runs the ixa-pipe-pos-eu tool. It takes no arguments.
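Since the script takes no arguments and reads its input from standard input (as described in the next paragraph), it can also be driven from another program. The following Python sketch pipes a string into the script with the standard `subprocess.run`. It assumes the script is executable, sits in the current directory, and writes its NAF output to standard output; the helper name `run_pos_eu` and its `cmd` parameter are hypothetical and not part of the tool.

```python
import subprocess


def run_pos_eu(text, cmd=("./ixa-pipe-pos-eu.sh",)):
    """Send plain text to the tagger on stdin and return its stdout as a string.

    Hypothetical helper. Assumes the script is executable, is located in the
    current directory, and prints its NAF output to standard output.
    """
    completed = subprocess.run(
        list(cmd),
        input=text.encode("utf-8"),  # the tool expects UTF-8 encoded plain text
        stdout=subprocess.PIPE,      # capture the NAF document written by the tool
        check=True,                  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.decode("utf-8")
```

For example, `run_pos_eu("Donostiako Zinemaldiko sail ofizialean lehiatuko da Handia filma.")` would return the NAF document as a string. If the script is not marked executable, passing `cmd=("sh", "./ixa-pipe-pos-eu.sh")`, or an absolute path to wherever the package was decompressed, keeps the same helper usable.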
The tool reads UTF-8 encoded plain text from standard input. You can therefore obtain the lemmas, PoS tags and morphological information of a plain text file using the following command:

For example, for the input sentence "Donostiako Zinemaldiko sail ofizialean lehiatuko da Handia filma.", the output (abridged) is:
<text>
<wf id="w1" offset="0" length="10" sent="1" para="1">Donostiako</wf>
<wf id="w2" offset="11" length="11" sent="1" para="1">Zinemaldiko</wf>
<wf id="w3" offset="23" length="4" sent="1" para="1">sail</wf>
<wf id="w4" offset="28" length="10" sent="1" para="1">ofizialean</wf>
<wf id="w5" offset="39" length="9" sent="1" para="1">lehiatuko</wf>
<wf id="w6" offset="49" length="2" sent="1" para="1">da</wf>
<wf id="w7" offset="52" length="6" sent="1" para="1">Handia</wf>
<wf id="w8" offset="59" length="5" sent="1" para="1">filma</wf>
<wf id="w9" offset="64" length="1" sent="1" para="1">.</wf>
</text>
<terms>
<!-- Donostiako -->
<term id="t1" lemma="Donostia" morphofeat="NL0LS000" pos="R" case="IZE LIB PLU- GEL NUMS MUGM ZERO HAS_MAI @<IZLG @IZLG>">
<span>
<target id="w1"/>
</span>
</term>
<!-- Zinemaldiko -->
<term id="t2" lemma="zinemaldi" morphofeat="NC0LS000" pos="N" case="IZE ARR GEL NUMS MUGM ZERO HAS_MAI @<IZLG @IZLG>">
<span>
<target id="w2"/>
</span>
</term>
<!-- sail -->
<term id="t3" lemma="sail" morphofeat="NC000000" pos="N" case="IZE ARR BIZ- ZERO @KM>">
<span>
<target id="w3"/>
</span>
</term>
...
</terms>
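In this output, each term in the terms layer points back to its tokens in the text layer through `span`/`target` ids, and each token records its position in the input with `offset` and `length`. A minimal Python sketch using the standard `xml.etree.ElementTree` module can collect the word form, lemma, PoS tag and `morphofeat` value of every term, and check the token offsets against the original input. It assumes a complete, well-formed NAF document in which characters such as `<` inside attribute values (for example in the `case` attribute) are escaped as `&lt;`, and that offsets count characters; the function names `read_terms` and `check_offsets` are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_terms(naf_xml):
    """Return (word form, lemma, pos, morphofeat) tuples, one per <term>."""
    root = ET.fromstring(naf_xml)
    # Map each word-form id ("w1", "w2", ...) to its surface text.
    forms = {wf.get("id"): wf.text for wf in root.iter("wf")}
    rows = []
    for term in root.iter("term"):
        # A term points at its token(s) through <span><target id="..."/></span>.
        ids = [target.get("id") for target in term.iter("target")]
        surface = " ".join(forms[i] for i in ids)
        rows.append((surface, term.get("lemma"), term.get("pos"), term.get("morphofeat")))
    return rows


def check_offsets(naf_xml, text):
    """Return True if every <wf> equals text[offset:offset + length].

    Assumes offsets and lengths count characters of the original input.
    """
    root = ET.fromstring(naf_xml)
    for wf in root.iter("wf"):
        start = int(wf.get("offset"))
        end = start + int(wf.get("length"))
        if text[start:end] != wf.text:
            return False
    return True
```

For the sample above, the first tuple returned by `read_terms` would be `("Donostiako", "Donostia", "R", "NL0LS000")`.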
Contact
Arantxa Otegi, arantza.otegi@ehu.eus
Nerea Ezeiza, n.ezeiza@ehu.eus