RST Treebank

Relation: list% (41)

Segments

Relation type

Relation name

Document

Tagger

Area

Notes

An open-ended multidisciplinary approach, developed by Bugarski (1996a; 1996b) and adapted for the purposes of this paper, is tested against the data coming from various scientific fields, such as computer science, quality control and quality management, linguistics, engineering, etc.

The analysis of the data at hand - international terms most of which have not yet been standardized in Serbian - indicates that a hierarchy of criteria for evaluating the terms, which are to be fully accepted in a given scientific register, should be organized in such a way as to give primacy to the parameter measuring the international value of terms, the shortness parameter and the monosemy parameter.

In all the instances analyzed herein the English borrowings are given primacy over translation and structural calques in the linguistic code which could be labeled as "modern scientific variety of Serbian".

list

N-N

TERM18_A1.rs3

TERM

Some of the difficulties faced will be discussed,

and ideas will be given for approaching this field in present day society.

list

N-N

TERM19_A1.rs3

TERM

Our paper will discuss the methodology used by both groups in term creation.

The paper will also address the difficulties encountered in encouraging the use of new and/or standardised terminology in the Irish language.

list

N-N

TERM23_A1.rs3

TERM

(how can a term be represented?

Is there a minimum representation?

How are terms to be classified?),

list

N-N

TERM29_A1.rs3

TERM

(How should terminological databases be structured?

What relationships should be covered?

What is a dictionary unit?).

list

N-N

TERM29_A1.rs3

TERM

because the unifying process of the language has not been completed,

research carried out is limited

and Basque is an agglutinative language.

list

N-N

TERM31_A1.rs3

TERM

2.1. Linguistic Techniques Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units.

2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined.

2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

list

N-N

TERM31_A1.rs3

TERM
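The statistical step and the recall/precision balance described in sections 2.2 and 2.3 of the segments above can be illustrated with a minimal Python sketch. It assumes that candidate terms have already been selected by some linguistic model; the function names, the toy candidate list and the reference term set are all hypothetical and do not come from the treebank or from the cited projects.

```python
from collections import Counter


def filter_by_min_frequency(candidates, min_freq=2):
    """Keep the candidate terms that occur at least min_freq times."""
    counts = Counter(candidates)
    return {term for term, n in counts.items() if n >= min_freq}


def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted terms against a reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data: candidate occurrences produced by a linguistic filter,
# and a hand-made reference list of true terms.
candidates = [
    "term bank", "term bank",
    "noun phrase", "noun phrase", "noun phrase",
    "the paper", "the paper",
    "data base",
]
reference = {"term bank", "noun phrase", "data base"}

# No frequency threshold: every candidate is kept, so recall is favoured.
loose = filter_by_min_frequency(candidates, min_freq=1)
loose_scores = precision_recall(loose, reference)

# Minimum absolute frequency of 2: the rare true term "data base" is lost.
strict = filter_by_min_frequency(candidates, min_freq=2)
strict_scores = precision_recall(strict, reference)

print(sorted(loose), loose_scores)
print(sorted(strict), strict_scores)
```

On this toy data the unfiltered list keeps all three reference terms plus one non-term, while the threshold of 2 drops a genuine but rare term. A person reviewing the output can discard non-terms more easily than recover missed ones, which is one way to read the preference for recall stated above.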

The morphological analyser is already being prepared (Alegria et al, 96),

the lemmatizer/labeller is almost completed (Aduriz et al, 96)

and work has been done on surface level syntax.

list

N-N

TERM31_A1.rs3

TERM

1. Introduction In recent years work has begun to develop instruments in several languages for automatic terminology extraction in technical texts, though human intervention is still required to make the final selection from the terms automatically chosen. As examples we can cite the following instruments: LEXTER (Bourigault, 92), Termight by AT&T (Church & Dagan, 94), TERMS by IBM (Justeson & Katz, 95) and NPtool (Arppe, 95). Their areas of application can be divided into two main groups: information indexing and the making-up of terminological glossaries. In areas where terminology is developing dynamically, such as computer science, it is almost impossible to carry out effective terminological work without an instrument of this type. If a similar instrument is to be developed for Basque we shall come up against more major drawbacks, because the unifying process of the language has not been completed, research carried out is limited and Basque is an agglutinative language.

2. Terminology extraction It is a hard task to obtain a formal, complete definition of a term, but that is precisely what a major part of this work consists of: defining the characteristics of terms. To obtain technical terms from the corpus a combination of NLP techniques (based on linguistic knowledge) and statistical techniques is usually used. 2.1. Linguistic Techniques Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units. 2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined. 2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

3. Application to Basque The IXA Group intends to develop a tool of this type for Basque. The morphological analyser is already being prepared (Alegria et al, 96), the lemmatizer/labeller is almost completed (Aduriz et al, 96) and work has been done on surface level syntax. While these tools are being prepared, we must work on the modelling of technical terms, i.e. we must reduce their characteristics. To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained. We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

list

N-N

TERM31_A1.rs3

TERM

In English Levi clearly expressed in 1978 the need to distinguish between two types of adjectives deriving from nouns, whose semantic and syntactic characteristics were different: "nominal nonpredicating adjectives" and "denominal adjectives". The latter are derived by means of suffixes such as -y, -ful and -ous, so that they take not only their grammatical category but also their predicative nature from their suffixes. On the other hand "nominal nonpredicating adjectives" are by category adjectives but behave similarly to nouns: linguistic difficulties / language difficulties.

In the case of Spanish, as defended by Bosque in 1989, there are some generative suffixes which are unlikely to derive to referential adjectives (-esco, -il, -oso, -ino) and others which frequently derive to such adjectives (-al, -ar, -ario, -ico).

In French, too, different suffixes are generally used to create referential and predicative adjectives, giving rise to pairs such as familial-familier and infantil-infantin.

list

N-N

TERM34_A1.rs3

TERM

on the one hand are those which modify nouns denoting activities or consequences, where the referential adjective is frequently the argument of the noun. Examples would be decisión presidencial ("presidential decision"), which must mean what is decided by the President, and extracción dental ("tooth extraction"), denoting the action of removing a tooth or molar. It is obvious that in the latter case the adjective dental is equivalent to the direct object of the verb extraer ("to extract"). In Basque compound nouns are prime candidates for this (hortz-ateratzea). However, we are unlikely to translate decisión presidencial as presidente-erabaki: we would almost certainly opt for presidentearen erabaki. Why is there this difference between the "subject nature" of the modifier of compound nouns and the "defined nature" of this element?

On the other hand, when referential adjectives modify the noun (an object with no argumental structure) the relationship between the noun and the adjective cannot be predicted: in this case we find a relationship of field or of ownership, which is shown by dictionary compilers by the periphrasis -ri dagokion. For instance hilo dental ("dental floss") is the floss used to clean teeth, and in Basque hortzetako haria would probably be preferred to hortz-haria. Finally, the group known in phonetics as a consonante dental would be kontsonante horzkaria and not hortz-kontsonante.

list

N-N

TERM34_A1.rs3

TERM

What types of adjective can be created through adjective generating suffixes?

Although many referential adjectives can be expressed in Basque through compound words, this method clearly does not offer forms for all such adjectives. Where then is the limit?

Where is the key? Is it in predicative/non predicative nature, in argumental, field or possessive relationships or is it elsewhere?

Does definite/indefinite, animate/inanimate, countable/uncountable nature have any influence?

Is impossibility influenced by whether the noun modified is an action, a consequence or an object?

Under what conditions could we be forced to use loans to translate referential adjectives from other languages into Basque?

When do we use noun complements in Basque to replace adjectives in other languages?

list

N-N

TERM34_A1.rs3

TERM

First of all the channel through which Internet terms are made known is the net itself. This means that they not only spread rapidly (information on the internet can be accessed almost immediately) but also reach vast areas (all over the world).

Furthermore, terms can be compiled, discussed and assessed anywhere: many Web sites can be found which give glossaries of Internet terms or propose names and even invite users to vote on them.

list

N-N

TERM38_A1.rs3

TERM

How do the receiving languages respond to this?

How do they deal with Internet terminology?

Are all those words which seem to be terms actually terms?

Do they meet actual needs for names or do sensationalist, ephemeral terms abound?

list

N-N

TERM38_A1.rs3

TERM

What type of terminology is being created?

What lexical creation systems predominate? There is a common denominator in all languages: terms are generated in English and come in as loanwords. How do the receiving languages respond to this? How do they deal with Internet terminology? Are all those words which seem to be terms actually terms? Do they meet actual needs for names or do sensationalist, ephemeral terms abound?

list

N-N

TERM38_A1.rs3

TERM

a) the scarcity of prefixes in Basque as compared to the abundance of suffixes;

and b) the fact that this imbalance is not shared by the Romance languages.

list

N-N

TERM50_A1.rs3

TERM

Firstly, derivation processes based on prefixes and suffixes have been analysed in all three languages, with special emphasis on those cases in which the basis for derivation is a verb and those in which the derivative is a verb. Two substantial differences have been found: one is between prefixes and suffixes within Basque and the other is between Basque derivatives and those of Romance origin.

Secondly, a theoretical exposition of these two differences has been sought and found.

Thirdly, we have attempted to consolidate the contribution of this theoretical exposition to the field of lexicography.

list

N-N

TERM50_A1.rs3

TERM

In the former case the prefix provides specificity for the core (the derivative predecir is a more specific version of the core decir, but to say before is, after all, still to say).

In the latter case, the core is made up of the prefix itself, and the core is the basis of the derivation, so that prehistoria is not a more specific version of the basic complement historia but something different altogether.

list

N-N

TERM50_A1.rs3

TERM

First of all, it has the prefix des-, which has both possibilities, as in the case of the Romance languages. In the derivative desegin it acts as a modifier of the basic core egin (the antonym of do), but when we seek an example of the prefix/core complement type (deshojar), desostatu, we find that it is not properly formed. Observe that the prefixes ber-/bir- 're-' and ez- 'in-/des-' also act in the same way. As regards lexicographic conclusions, the first point which must be stressed in this paper is the difficulty found in forming words such as desostatu.

Secondly, we must make it clear that the prefix-core/base-complement of the Romance languages and English has a corresponding feature in Basque in base-complement/suffix-core. This is an important contribution to modern lexicography. Beyond formations of the des1 hoja2 r ??hosto2 gabe1 tu type we must bear in mind the option hostoak2 galdu/kendu1 but especially the forms pozoin-du (en-venenar), bigun-du (re-blandecerse), lerro-ka-tu (a-linear), irin-ez-ta-tu (enharinar), lur-rera-tu (a-terrizar), which should be standardised as the common correspondents of the prefixes a-, des-, en-, es-, in- and re- so that more and better resources are made available.

list

N-N

TERM50_A1.rs3

TERM

1. Analysis of the problem from the viewpoint of users of geographical terms.

2. Importance of and need for a standardisation of geographical terms as part of work to standardise toponyms.

Reference will be made to the recommendations of the UN and of various specialists.

3. Summary of how geographical terms have been dealt with in toponymic standardisation work by the DEIKER institute at the University of Deusto.

4. Conclusions. Questions and answers

list

N-N

TERM51_A1.rs3

TERM

1. We do not know the exact meaning of many generic elements: "alto" (given here in Basque as "gaina") could mean "mountain pass", "hill", "peak" or various other things.

2. "Standardised" geographical terms may be allocated arbitrarily. For instance in the district of Deusto we find avenida ("avenue") used for two thoroughfares which should not both be described thus: "Avenida Ramón y Cajal" is an ordinary street, while "Avenida Lehendakari Agirre" is a true avenue.

3. There is a lack of standardised geographical terminology. For instance the Spanish term avenida is translated into Basque variously as ibilbidea, etorbidea or pasealekua, depending on which Town Hall allocated the names. Likewise the Spanish term pico ("peak") may appear as gallur, haitzorrotza, haizpunta, mokorra, mokoa, punta, or tontorra depending on the author or research body involved (DEIKER, Elhuyar, Government of Navarra, Euskaltzaindia, etc.).

list

N-S

TERM51_A1.rs3

TERM

PRESENTATIONAL RELATIONS

preparation

background*

Enablement and motivation

enablement*

motivation

Evidence and justify

evidence

justify

Antithesis and concession

antithesis

concession

Restatement and summary

restatement*

summary*

SUBJECT MATTER RELATIONS

Conditional subgroup

condition

otherwise*

unless

unconditional*

Evaluation and interpretation

Cause subgroup

MULTINUCLEAR

joint*

restatement-NN

same-unit

antítesis	TERM
antithesis	TERM
background	TERM
causa	TERM
cause	TERM
circumstance	TERM
circunstancia	TERM
concession	TERM
condition	TERM
conjunction	TERM
contrast	TERM
contraste	TERM
disjunction	TERM
elaboración	TERM
elaboration	TERM
evaluación	TERM
evaluation	TERM
evidence	TERM
evidencia	TERM
fondo	TERM
interpretación	TERM
interpretation	TERM
justify	TERM
list	TERM
lista	TERM
means	TERM
medio	TERM
motivación	TERM
motivation	TERM
preparation	TERM
propósito	TERM
purpose	TERM
restatement	TERM
result	TERM
resultado	TERM
resumen	TERM
same-unit	TERM
secuencia	TERM
sequence	TERM
solutionhood	TERM
summary	TERM
unless	TERM

Multilingual RST Treebank