SIMuLLDA: a Structured Interlingua Multilingual Lexical Database Application
SIMuLLDA is an attempt to create a framework for a Multilingual Lexical Database, capable of creating translations between any pair of languages present in the system.
The core feature of SIMuLLDA is an interlingua, structured by means of Formal Concept Analysis (FCA). In the SIMuLLDA setup, the formal objects of FCA are not objects, but words or, more precisely, language-independent meanings. These meanings are related to, but not identified with, the word-forms of the various languages, which are situated outside of the lattice. The relation is that every word-form can express one or more meanings.
The word-forms themselves are grouped into languages, where languages are little more than lists of word-forms. This gives us a setup as exemplified in figure 1, where languages are represented as boxes containing word-forms that refer to language-independent meanings in the interlingua. Notice that in this setup, the language-dependent disambiguated words are only implicitly present in the link between the word-forms and the meanings, and are not explicitly represented. In the final version of SIMuLLDA these disambiguated words will be present, however. The reason is not so much the semantic system itself, but that we also want to be able to incorporate other dictionary items, such as labels and example sentences, in the model; since some of these relate to the language-dependent disambiguated words, those words should be present in the system.
Figure 1. (Partial) Multilingual Setup of SIMuLLDA
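The setup of figure 1 can be sketched as a small data structure. The following is a minimal illustration, not the SIMuLLDA implementation: the names `LANGUAGES` and `translate` are hypothetical, and the links between word-forms and meanings are toy data assumed for the example.

```python
# Each language is little more than a list of word-forms; every word-form
# points to the language-independent meanings it can express.
# The entries below are illustrative assumptions, not data from SIMuLLDA.
LANGUAGES = {
    "English": {"foal": {"FOAL"}, "colt": {"COLT"}},
    "French": {"poulain": {"FOAL"}},
}


def translate(word_form, source, target):
    """Return the word-forms of `target` that share a meaning with `word_form`."""
    meanings = LANGUAGES[source].get(word_form, set())
    return sorted(
        form
        for form, expressed in LANGUAGES[target].items()
        if expressed & meanings  # at least one interlingual meaning in common
    )


print(translate("foal", "English", "French"))
print(translate("colt", "English", "French"))
```

In this toy data no French word-form is linked to COLT, so the second call returns an empty list: translation here goes only through shared interlingual meanings.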
Besides the formal objects (meanings), the interlingual lattice also contains the definitional attributes. These definitional attributes are, in a way, less obviously language-dependent: the definitional attribute young, in combination with the meaning FOAL, merely expresses the fact that foals are young. However, young itself is, of course, an English expression for this definitional attribute, which could equally well have been expressed by jeune in French, or mladý in Czech. So in the SIMuLLDA setup, the definitional attributes are also related to word-forms of the various languages, leaving the definitional attributes themselves language-independent.
In order to keep the different entities in SIMuLLDA apart, the following convention will be used: interlingual meanings will be indicated with CAPITALS and attributes with boldface; the 'words' are not fully specified yet, and will for the time being be indicated with roman letters. Of course, the language-independent items themselves (meanings and definitional attributes) do not have a written form. For clarity, they will be given the names of their English lexicalisations, indexed with a number where necessary. This naming is arbitrary, and has some exceptions: when various English lexicalisations exist (in case of synonymy), one of these is chosen, and when no English lexicalisation exists (in case of a lexical gap), the lexicalisation in an arbitrary other language will be chosen. We also want to be able to refer to the node under which the word is represented, which is, as shown in the previous chapter, the smallest common concept. We will refer to this formal concept in bold small caps, so the node for COLT is <COLT'', COLT'>, for which the shorthand COLT will be used.
Notice how in this multilingual setup, two of the three problems mentioned at the beginning of this section are naturally resolved. Words of different languages can now be clearly separated, since languages are explicitly represented as groups of word-forms. There is also a clear notion of ambiguity: ambiguity exists when the same word-form is related to more than one interlingual meaning. The third problem (lexical gap filling) will be addressed in the next section.
Another point is the following: in the setup in figure 1, it is possible for the same interlingual meaning to be expressed equivalently in various languages. This is not at all a trivial claim. Many philosophers, including Quine, Kuhn and Davidson, hold a thesis called incommensurability: the claim that the words (or concepts) of a language form an intricate network, in which the meaning of each one fully depends on the meaning of the others. On this view, words only have their meaning within the conceptual scheme of a language, and translating a single word into a different scheme is inherently impossible. This problem is almost invariably ignored by lexicographers, for practical reasons. For instance, in the words of the lexicographer Messelaar: "Practiciens, les lexicographes partent évidemment de la possibilité de passer d'une langue à une autre langue, tout en reconnaissant ses limites." (Being practitioners, lexicographers obviously start from the possibility of passing from one language to another, while recognizing its limits) (Messelaar, 1990, p.18).
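The node notation <COLT'', COLT'> uses the two derivation operators of Formal Concept Analysis: COLT' is the set of attributes of COLT, and COLT'' is the set of meanings that have all of those attributes. A minimal sketch follows, assuming a toy formal context; the attribute assignments and the helper names `intent`, `extent` and `object_concept` are hypothetical and serve only to illustrate the notation.

```python
# Toy formal context: meaning -> definitional attributes (assumed for illustration).
CONTEXT = {
    "HORSE": {"equine"},
    "FOAL": {"equine", "young"},
    "COLT": {"equine", "young", "male"},
    "STALLION": {"equine", "male"},
}


def intent(meanings):
    """Attributes shared by all given meanings (the ' operator on objects)."""
    attribute_sets = [CONTEXT[m] for m in meanings]
    if not attribute_sets:
        # By convention, the empty set of objects shares every attribute.
        return set().union(*CONTEXT.values())
    return set.intersection(*attribute_sets)


def extent(attributes):
    """Meanings that have all given attributes (the ' operator on attributes)."""
    return {m for m, attrs in CONTEXT.items() if attributes <= attrs}


def object_concept(meaning):
    """Return the pair (meaning'', meaning') for a single meaning."""
    shared = intent({meaning})
    return extent(shared), shared


print(object_concept("COLT"))
print(object_concept("FOAL"))
```

With this toy context, the node for FOAL has both FOAL and COLT in its extent, since COLT carries every attribute of FOAL.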
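The notion of ambiguity given above translates directly into a check on the links of a language. This is a small sketch under the assumption that a language is stored as a mapping from word-forms to sets of meanings; the entries, the indexed names BANK_1 and BANK_2, and the function name `ambiguous_forms` are made up for illustration.

```python
# A language as a mapping from word-forms to interlingual meanings (toy data).
ENGLISH = {
    "foal": {"FOAL"},
    "bank": {"BANK_1", "BANK_2"},
}


def ambiguous_forms(language):
    """Return the word-forms that are related to more than one meaning."""
    return sorted(form for form, meanings in language.items() if len(meanings) > 1)


print(ambiguous_forms(ENGLISH))
```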