ixa-pipe-dep-eu
ixa-pipe-dep-eu is a dependency parser for written Basque documents. Two versions of this tool are currently distributed. The first version (v1.0.0), which is simpler and faster, is based on the graph-based version of the Mate parser. The second version (v2.0.0) combines the analyses obtained by different parsers: the Mate and MaltParser parsers produce the analyses, and the MaltBlender tool chooses the best combination of them. Both versions are implemented in Java.
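The general idea behind v2.0.0, combining the output of several parsers, can be illustrated with a much simpler technique than MaltBlender: per-token majority voting over the (head, relation) pairs proposed by each parser. This is a minimal sketch of that swapped-in technique, not MaltBlender's algorithm. The input representation (one list of (head, relation) pairs per parser) and the function name vote are hypothetical. Plain voting, unlike a proper blending method, does not guarantee that the combined result is a well-formed dependency tree.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation: for each token (in sentence order), a pair
# (head, relation), where head is the 1-based index of the governing token
# and 0 marks the root.
Arc = Tuple[int, str]


def vote(analyses: Sequence[Sequence[Arc]]) -> List[Arc]:
    """Combine several parsers' analyses by per-token majority voting.

    Ties are broken in favour of the parser listed first, so the order of
    `analyses` acts as a simple priority among parsers. The result is NOT
    guaranteed to be a well-formed dependency tree.
    """
    if not analyses:
        raise ValueError("at least one analysis is required")
    n_tokens = len(analyses[0])
    if any(len(a) != n_tokens for a in analyses):
        raise ValueError("all analyses must cover the same tokens")
    combined: List[Arc] = []
    for i in range(n_tokens):
        candidates = [a[i] for a in analyses]  # in parser order
        counts = Counter(candidates)
        top = max(counts.values())
        # First candidate, in parser order, that reaches the top count.
        combined.append(next(c for c in candidates if counts[c] == top))
    return combined


if __name__ == "__main__":
    parser_a = [(2, "ncmod"), (0, "ROOT"), (2, "PUNC")]
    parser_b = [(2, "ncmod"), (0, "ROOT"), (1, "PUNC")]
    parser_c = [(3, "ncmod"), (0, "ROOT"), (2, "PUNC")]
    print(vote([parser_a, parser_b, parser_c]))
    # [(2, 'ncmod'), (0, 'ROOT'), (2, 'PUNC')]
```

Listing the parsers in a fixed order gives a deterministic tie-break; a real combiner would instead weight arcs (for example by each parser's accuracy) and enforce tree constraints.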
The tool takes as input a document in NAF format. This document must contain lemmas, PoS tags and morphological annotations; a NAF document with this information can be obtained from the output of ixa-pipe-pos-eu.
Download
You can download the package containing the executable file for each of the two stable versions from the following links:
[v2.0.0] ixa-pipe-dep-eu-v2.0.0.tar.gz
[v1.0.0] ixa-pipe-dep-eu-v1.0.0.tar.gz
Linguistic resources
This tool needs some other linguistic tools and resources, which you can download from the following links (each version needs its own resources):
[v2.0.0] dep-eu-resources-v2.0.0.tgz
[v1.0.0] dep-eu-resources-v1.0.0.tgz
Source code
The source code of the latest development version can be downloaded or cloned from this GitHub page.
License
All the original code produced for ixa-pipe-dep-eu is licensed under the GPL v3 free license.
This software uses some external tools, which are distributed with the source code and the resources. These tools have their own copyright owners and licenses:
[v2.0.0]
- mate-tools anna: GNU General Public License v2
- MaltParser: Copyright (C) 2007-2017, Johan Hall, Jens Nilsson and Joakim Nivre. Redistribution and use in source and binary forms, with or without modification, are permitted.
These tools also use other libraries; see the NOTICE file of each tool.
How to cite
If you use the ixa-pipe-dep-eu tool in your academic work, please cite one of the following papers, depending on the version you use:
[v2.0.0] Iakes Goenaga, Koldo Gojenola, Nerea Ezeiza. Combining Clustering Approaches for Semi-Supervised Parsing: the BASQUE TEAM system in the SPMRL 2014 Shared Task. Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2014 Shared Task), COLING Workshop, Dublin. 2014
[bibtex]
[v1.0.0] Arantxa Otegi, Nerea Ezeiza, Iakes Goenaga, Gorka Labaka. A Modular Chain of NLP Tools for Basque. In Proceedings of the 19th International Conference on Text, Speech and Dialogue (TSD 2016), Brno, Czech Republic, volume 9924 of Lecture Notes in Artificial Intelligence, pp. 93-100. 2016
[bibtex]
Installation
Once you have downloaded the package containing the executable file, decompress it. The executable is ready to use without any installation, but you have to follow these steps to make the required resources usable:
- Download the package of the resources you need from the following links:
[v2.0.0] dep-eu-resources-v2.0.0.tgz
[v1.0.0] dep-eu-resources-v1.0.0.tgz
- Decompress the package and update the run.sh executable file, changing the baliabideak variable to the path of the dep-eu-resources directory you have just obtained.
In addition, Java must be installed on your computer, as well as Perl, which is needed to use MaltBlender (v2.0.0 only).
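A quick way to confirm these prerequisites before running the tool is to look the interpreters up on the PATH. This is a minimal sketch assuming the executables are named java and perl and that versions are written as "1.0.0" or "2.0.0"; the function name missing_prerequisites is hypothetical. It only checks that the commands exist, not their versions.

```python
import shutil
from typing import Callable, List, Optional


def missing_prerequisites(
    version: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required executables that cannot be found on the PATH.

    Java is needed by every version; Perl is only needed by v2.0.0,
    where MaltBlender uses it.
    """
    required = ["java"]
    if version.startswith("2."):
        required.append("perl")
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites("2.0.0")
    print("missing:", ", ".join(missing) if missing else "none")
```

The which parameter defaults to shutil.which but can be replaced, which makes the check easy to exercise without touching the real PATH.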
How to use
The ixa-pipe-dep-eu-X.X.X.jar executable is used to run the ixa-pipe-dep-eu tool. The only required argument (-b) is the path of the resources directory available in the download section. The arguments accepted by ixa-pipe-dep-eu-X.X.X.jar are:
-h show this help message and exit
-b RESOURCES_DIR [Required] Specify the path of the downloaded resource directory.
-c CONLL_FILE [Optional] If you want to save the output also in CONLL format, specify the path of the output file.
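The arguments above can be assembled programmatically, for example to call the parser from a Python script. This is a minimal sketch assuming the jar is launched with a plain `java -jar` (no extra JVM options) and that the NAF result is written to standard output; only the -b and -c flags come from the list above. The helper names build_command and parse_naf, and any concrete jar or directory paths, are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_command(jar: str, resources_dir: str,
                  conll_file: Optional[str] = None) -> List[str]:
    """Assemble the command line: -b is required, -c is optional."""
    cmd = ["java", "-jar", jar, "-b", resources_dir]
    if conll_file is not None:
        cmd += ["-c", conll_file]  # also save the output in CoNLL format
    return cmd


def parse_naf(naf: str, jar: str, resources_dir: str,
              conll_file: Optional[str] = None) -> str:
    """Send a UTF-8 NAF document on standard input; return standard output."""
    result = subprocess.run(
        build_command(jar, resources_dir, conll_file),
        input=naf.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    print(build_command("ixa-pipe-dep-eu-2.0.0.jar",
                        "/path/to/dep-eu-resources", "output.conll"))
```

Building the argument list separately from running it keeps the flag handling testable without a Java installation.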
An executable script, run.sh, is provided to run the tool; it calls the ixa-pipe-dep-eu-X.X.X.jar executable with all the arguments explained above. Before running it, update the rootDir and baliabideak variables in this script, as described in the installation section.
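If you prefer to script this configuration step, the variables can be rewritten with a regular expression. This is a minimal sketch assuming that run.sh assigns rootDir and baliabideak on lines of the form `name=value` (without `export`); the helper names set_shell_variable and configure_run_sh are hypothetical. The new value is written double-quoted so paths containing spaces survive.

```python
import re
from pathlib import Path


def set_shell_variable(script: str, name: str, value: str) -> str:
    """Rewrite every `name=...` assignment line in a shell script."""
    pattern = re.compile(rf"^([ \t]*){re.escape(name)}=.*$", re.MULTILINE)
    # A function replacement avoids backslash escapes in `value` being
    # interpreted by re.
    new_script, count = pattern.subn(
        lambda m: f'{m.group(1)}{name}="{value}"', script)
    if count == 0:
        raise ValueError(f"no assignment to {name} found")
    return new_script


def configure_run_sh(path: str, root_dir: str, resources_dir: str) -> None:
    """Point run.sh at the tool directory and the resources directory."""
    script = Path(path).read_text(encoding="utf-8")
    script = set_shell_variable(script, "rootDir", root_dir)
    script = set_shell_variable(script, "baliabideak", resources_dir)
    Path(path).write_text(script, encoding="utf-8")
```

Usage, with placeholder paths: `configure_run_sh("run.sh", "/path/to/ixa-pipe-dep-eu", "/path/to/dep-eu-resources")`. Lines that merely reference a variable (such as `-b $baliabideak`) are left untouched, because only lines starting with the assignment are matched.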
The tool reads its input from standard input. The input must be a UTF-8 encoded NAF document containing lemmas, PoS tags and morphological annotations (the text and terms elements of NAF). A NAF document with this information can be obtained from the output of ixa-pipe-pos-eu.
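Before piping a document into the parser, it can help to check that the required layers are present. This is a minimal sketch assuming the usual NAF layout without an XML namespace: a NAF root element with text and terms layers, where each term carries lemma, pos and morphofeat attributes (morphofeat being assumed to hold the morphological annotation). The function name check_naf_input is hypothetical, and the check is much weaker than validating against the NAF schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def check_naf_input(naf: Union[str, bytes]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    root = ET.fromstring(naf)
    if root.tag != "NAF":
        problems.append(f"root element is <{root.tag}>, expected <NAF>")
    # The parser needs the text and terms layers.
    for layer in ("text", "terms"):
        if root.find(layer) is None:
            problems.append(f"missing <{layer}> layer")
    # Each term should carry a lemma, a PoS tag and morphological features.
    for term in root.iter("term"):
        for attr in ("lemma", "pos", "morphofeat"):
            if attr not in term.attrib:
                problems.append(f"term {term.get('id', '?')} has no {attr}")
    return problems
```

Passing bytes (for example the raw contents of a file) lets ElementTree honour any encoding declaration at the top of the document.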
Therefore, you can obtain the syntactic dependencies of a plain text file using the following command (in a single command line):
Excerpt of the dependency layer (deps element) of the resulting NAF output:
<deps>
  <!--ncmod(Zinemaldiko, Donostiako)-->
  <dep from="t2" to="t1" rfunc="ncmod" />
  <!--ncsubj(da, Zinemaldiko)-->
  <dep from="t6" to="t2" rfunc="ncsubj" />
  <!--ncmod(lehiatuko, sail)-->
  <dep from="t5" to="t3" rfunc="ncmod" />
  <!--ncmod(sail, ofizialean)-->
  <dep from="t3" to="t4" rfunc="ncmod" />
  <!--xpred(da, lehiatuko)-->
  <dep from="t6" to="t5" rfunc="xpred" />
  <!--ncpred(da, Handia)-->
  <dep from="t6" to="t7" rfunc="ncpred" />
  <!--ncmod(da, filma)-->
  <dep from="t6" to="t8" rfunc="ncmod" />
  <!--PUNC(filma, .)-->
  <dep from="t8" to="t9" rfunc="PUNC" />
</deps>
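To post-process the output, the deps layer can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch using the dependency layer above as data; as its comments show (for example ncmod(Zinemaldiko, Donostiako) for from="t2" to="t1"), the from attribute holds the head term id, to holds the dependent term id, and rfunc holds the relation. The function names read_dependencies and find_roots are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DEPS = """<deps>
  <!--ncmod(Zinemaldiko, Donostiako)-->
  <dep from="t2" to="t1" rfunc="ncmod" />
  <dep from="t6" to="t2" rfunc="ncsubj" />
  <dep from="t5" to="t3" rfunc="ncmod" />
  <dep from="t3" to="t4" rfunc="ncmod" />
  <dep from="t6" to="t5" rfunc="xpred" />
  <dep from="t6" to="t7" rfunc="ncpred" />
  <dep from="t6" to="t8" rfunc="ncmod" />
  <dep from="t8" to="t9" rfunc="PUNC" />
</deps>"""

Triple = Tuple[str, str, str]  # (relation, head term id, dependent term id)


def read_dependencies(xml_text: str) -> List[Triple]:
    """Collect every <dep> element; works on a deps layer or a whole NAF."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by the parser
    return [(d.get("rfunc"), d.get("from"), d.get("to"))
            for d in root.iter("dep")]


def find_roots(triples: List[Triple]) -> List[str]:
    """Terms that govern others but are never governed themselves."""
    heads = {head for _, head, _ in triples}
    dependents = {dep for _, _, dep in triples}
    return sorted(heads - dependents)


if __name__ == "__main__":
    triples = read_dependencies(DEPS)
    for rel, head, dep in triples:
        print(f"{rel}({head}, {dep})")
    print("root:", find_roots(triples))
```

For the example above, t6 (da) is the only term that is never a dependent, so it is reported as the root of the sentence.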
Contact
Arantxa Otegi, arantza.otegi@ehu.eus
Iakes Goenaga, iakes.goenaga@ehu.eus