RST Spanish-Chinese Treebank
The RST Spanish-Chinese Treebank is a corpus of specialized texts in Spanish and their parallel texts in Chinese. All texts are manually annotated with discourse relations within the framework of Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). The corpus was annotated with RSTTool (O’Donnell, 2000), and the annotation results are stored with rstWeb (Zeldes, 2016).
In total, the corpus contains 100 texts. The genres of these texts are: (a) scientific abstract; (b) advertisement; (c) news; and (d) announcement. The topics of the corpus are: (a) terminology; (b) culture; (c) language; (d) economy; (e) education; (f) art; and (g) international affairs.
On this website, you can find:
- The texts and a search tool for querying the corpus by part of speech (POS)
- The occurrences of each discourse relation
- The discourse structure of each text
- The linear segmentation of each text
How should this corpus be used?
If you use this corpus, please cite the following reference:
- Shuyuan Cao, Iria da Cunha, and Mikel Iruskieta. 2018. The RST Spanish-Chinese Treebank. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), 156-166.
Who are we?
Shuyuan Cao (Universitat Pompeu Fabra)
Mikel Iruskieta (University of the Basque Country UPV/EHU)
Iria da Cunha (Universidad Nacional de Educación a Distancia)
Nianwen Xue (Brandeis University)
Esther Miranda (University of the Basque Country UPV/EHU)
Kike Fernandez (University of the Basque Country UPV/EHU)