FORMALIZATION OF THE WORD'S LEXICAL MEANING IN A PROBLEM OF RECOGNITION OF NATURAL LANGUAGE'S STATEMENTS'S SYNONYMY'S SITUATIONS¹
G. M. Emelyanov², D. V. Mikhailov²
² Yaroslav-the-Wise Novgorod State University, 173003, Russia, Velikii Novgorod, ul. Bol'shaya St. Petersburgskaya, 41, tel.: (8162) 627940, e-mail: [email protected] (G.M. Emelyanov), [email protected] (D.V. Mikhailov)

This paper presents an approach to formalizing the semantic correlations between a lexeme and its lexical correlates in the problem of recognizing synonymy situations. Synonymy situations are described on the basis of standard Lexical Functions. The paper also presents principles for generalizing independently constructed descriptions of the theory of a word's Lexical Meaning.

Introduction

Applying the apparatus of standard Lexical Functions (LFs) within the “Meaning ⇔ Text” framework makes it possible to prove the synonymy of Natural Language (NL) statements on the basis of a finite set Rls of correctly formalizable transformation rules for Deep Syntactic Structures (DSS) [4, 5]. However, a significant difficulty in implementing Rls is formalizing the applicability conditions of the rules. For $rl \in Rls$, the condition $r(rl)$ is a set of requirements on the syntactic and semantic properties of the lexical blocks replaced by $rl$.

Let us consider the problem of proving the LF-synonymy of statements as a classical Pattern Recognition (PR) problem. The set L of all pairs of NL statements between which LF-synonymy can be established (with respect to Rls) is the initial set of classified objects. The provability of LF-synonymy for pairs $l \in L$ with respect to a fixed $rl \in Rls$ is then the basis for grouping these pairs into one taxon, and $r(rl)$ acts as a precedent, i.e. a typical representative of the taxon $rl$. The PR problem is formulated as follows: a new pair $l \in L$ that did not participate in the taxonomy is presented.
It is required to analyse $l$ with respect to the conditions $r(rl)$ and to recognize the class pattern $rl \in Rls$ to which the object $l$ is most similar.

The problem statement: to develop a program-realizable representation of $r(rl)$ by revealing the character of the semantic correlations between a lexeme and its lexical correlates for the basic types of Lexical Functions. Since the redistribution of sense that matters for formalizing $r(rl)$ is characteristic of situations with parametric LFs, the PR problem above should be formulated as the recognition of the semantic relation expressed by a split value. Revealing and generalizing this relation is directly analogous to describing the semantics of noun phrases [1]. Thus, for the Lexical Meanings (LM) of the words replaced by $rl$, formalized descriptions are constructed in the form of theories, i.e. sets of meaning postulates connecting each replaced word with other words and concepts. However, when the theory of one word is constructed independently by different researchers, the knowledge obtained in this way has to be generalized. This problem is especially relevant when theories are built from NL definitions of LMs using standard conceptual languages [2].

¹ This work is financially supported by RFBR (project №06-01-00028) and by ESIC of NovSU.

Solution methods

Let, for $Lec_{ij} \in l_j$, $l_j \in L$, the theory of its Lexical Meaning be described by the Prolog compound object

lmth(Lec_j_i, Var_Smth, Rel_list)   (1)

which describes a set of binary relations Rel between concepts Cncpt1 and Cncpt2:

rel2(Rel, Cncpt1, Cncpt2),   (2)

and recursively defined relations of arbitrary arity:

rel2_complex(Rel, Cncpt, Rel_list)   (3)

rel_complex(Rel, Rel_list)   (4)

by means of a list Rel_list of structures of kinds (2), (3) and (4).
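To make the shape of structures (1)-(4) concrete, the following Python sketch mirrors them as immutable records. It is a hypothetical transcription, not the authors' Visual Prolog code: the class names, the convention that Prolog variables are strings beginning with an uppercase letter, and the toy theory (its concept names and the use of “or” as the relation of a kind (4) statement) are all assumptions made for illustration. The helper counts statements of kinds (2) and (3) at all nesting levels, the quantity used in the complexity estimate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Rel2:
    """rel2(Rel, Cncpt1, Cncpt2): a binary relation, kind (2)."""
    rel: str
    cncpt1: str
    cncpt2: str


@dataclass(frozen=True)
class Rel2Complex:
    """rel2_complex(Rel, Cncpt, Rel_list): kind (3)."""
    rel: str
    cncpt: str
    rel_list: Tuple["Statement", ...]


@dataclass(frozen=True)
class RelComplex:
    """rel_complex(Rel, Rel_list): kind (4)."""
    rel: str
    rel_list: Tuple["Statement", ...]


Statement = Union[Rel2, Rel2Complex, RelComplex]


@dataclass(frozen=True)
class Lmth:
    """lmth(Lec_j_i, Var_Smth, Rel_list): the theory of one LM, kind (1)."""
    lec: str
    var_smth: str
    rel_list: Tuple[Statement, ...]


def count_statements(rel_list: Tuple[Statement, ...]) -> int:
    """Count statements of kinds (2) and (3) at every nesting level."""
    total = 0
    for st in rel_list:
        if isinstance(st, Rel2):
            total += 1
        elif isinstance(st, Rel2Complex):
            total += 1 + count_statements(st.rel_list)
        else:  # RelComplex (kind (4)) only groups its members
            total += count_statements(st.rel_list)
    return total


# Hypothetical toy theory; the concept names are illustrative only.
theory = Lmth("агрессор", "X", (
    Rel2("agent", "X", "A"),
    Rel2Complex("action", "A", (
        Rel2("attack", "A", "country"),
        RelComplex("or", (Rel2("goal", "A", "seizure"),
                          Rel2("goal", "A", "subjugation"))),
    )),
))

print(count_statements(theory.rel_list))  # 5
```

Frozen dataclasses keep theories hashable, which is convenient if variants are later used as objects of a formal context.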
The LM given by (1) is a denotation, to which logic assigns an extension [3], i.e. the class of entities defined by (1). From the philosophical point of view, the sense (or intension [3]) of $Lec_{ij}$ is a network of relations between $Lec_{ij}$ and other words $Lec_{mk}$, $Lec_{ij} \neq Lec_{mk}$; hence the sense of a lexeme can be defined by a set of functions given by statements of kinds (2), (3) and (4) within the theories. These functions characterize the concepts designated by $Lec_{ij}$. Following the terminology of [3], we call them Characteristic Functions (ChF) for a set of Lexical Meanings. As we showed in [1], each of them can be given either by a single statement or by a group of statements.

When structure (1) is used to describe the theory of the LM Lec_j_i, the value of each of these functions equals the third argument Cncpt2_mng_fn of the relation in some statement of kind (2). Cncpt2_mng_fn must designate a concept known to the system (this concept is identified with the Semantic Class (SCl) of some word). The name of the ChF corresponds to the first argument Rel_name_fn of the first statement of kind (2) or (3) whose first argument designates a known SCl (this SCl must define a relational noun), encountered when the list Rel_list of statements of kind (2) for the given $Lec_{ij}$ is scanned backwards, starting from the statement that has the above Cncpt2_mng_fn as its third argument. Here the role of Rel_list may also be played by the list in the third argument of a statement of kind (3) whose first argument is Rel_name_fn. The list formed during this scan is denoted below as Rel_list_fn, $Rel\_list\_fn \subseteq Rel\_list$. The second argument of the statement containing Rel_name_fn must be the variable Var_Smth, which designates the word interpreted by (1).
Each subsequent statement in Rel_list_fn must share with the preceding statement at least one argument that designates a variable.

According to the definition of sense as intension formulated in [3], outwardly different descriptions (1) of theories of the same LM yield a common set of the ChFs described above. Together, these ChFs define the intension of the generalized theory of the LM under consideration. Proceeding from the definition of intension as a function from possible worlds to extensions [3], and from the recursive nature of meaning postulates, we state the task of constructing the generalized theory of a given LM from independently obtained variants of its theory as follows: restore the syntactic representation [3] of the extension from the known syntax of the expressions for the ChFs that make up the intension and are written as sets of statements of kinds (2), (3) and (4).

We have a ternary relation $I \subseteq G \times M \times W$ between: a set of objects G corresponding to the variants $lmth_i^{o_j}$ of the definition of the LM $Lec_{ij}$ in form (1), $G = \{lmth_i^{o_j}\}$; a set of attributes M corresponding to the values $Cncpt2\_mng\_fn_i^{po_j}$ of the Characteristic Functions for $lmth_i^{o_j}$; and a set W of attribute values. In our task each $w \in W$ is the name $Rel\_name\_fn_i^{po_j}$ of a ChF whose value belongs to M. The relation I can thus be regarded as a binary relation $I_1 \subseteq \{(lmth_i^{o_j}, Cncpt2\_mng\_fn_i^{po_j})\} \times W$. By the Basic Theorem of Formal Concept Analysis (FCA) [4], which rests on G. Birkhoff's result that a complete lattice can be constructed for any binary relation, the mathematical apparatus of FCA can be applied to our problem.
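The ChF extraction described above, together with the construction of the binary relation I1, can be sketched in Python. This is a simplified, hypothetical reading rather than the authors' implementation: statements are flat `(Rel, Arg1, Arg2)` tuples of kind (2) only (nested lists of kind (3) are not handled), the sets of known Semantic Classes and relational nouns are supplied by hand, and all definition and concept names are invented. For each statement whose third argument is a known concept, the list is scanned backwards to the nearest statement headed by a relational noun; that statement must have Var_Smth as its second argument, and consecutive statements of the resulting Rel_list_fn must share a variable.

```python
from typing import Dict, List, Set, Tuple

Stmt = Tuple[str, str, str]  # (Rel, Cncpt1, Cncpt2), a statement of kind (2)


def is_var(term: str) -> bool:
    """Prolog convention: variables start with an uppercase letter or '_'."""
    return term[:1].isupper() or term.startswith("_")


def shares_variable(a: Stmt, b: Stmt) -> bool:
    """True if two statements have at least one variable argument in common."""
    vars_a = {t for t in a[1:] if is_var(t)}
    vars_b = {t for t in b[1:] if is_var(t)}
    return bool(vars_a & vars_b)


def extract_chfs(var_smth: str, rel_list: List[Stmt], known_scl: Set[str],
                 relational_nouns: Set[str]) -> List[Tuple[str, str]]:
    """Return (Rel_name_fn, Cncpt2_mng_fn) pairs, i.e. ChF names and values."""
    chfs = []
    for end, (_, _, value) in enumerate(rel_list):
        if value not in known_scl:
            continue
        # Scan backwards (inclusive) for the nearest relational-noun statement.
        for start in range(end, -1, -1):
            name, arg2, _ = rel_list[start]
            if name in relational_nouns:
                break
        else:
            continue  # no relational noun before the value statement
        if arg2 != var_smth:
            continue
        chain = rel_list[start:end + 1]  # Rel_list_fn
        if all(shares_variable(p, q) for p, q in zip(chain, chain[1:])):
            chfs.append((name, value))
    return chfs


def binary_relation_i1(theories: Dict[str, Tuple[str, List[Stmt]]],
                       known_scl: Set[str],
                       relational_nouns: Set[str]) -> Dict[Tuple[str, str], str]:
    """Map (object, attribute = ChF value) to the attribute value (ChF name)."""
    i1 = {}
    for obj, (var_smth, rel_list) in theories.items():
        for name, value in extract_chfs(var_smth, rel_list, known_scl, relational_nouns):
            i1[(obj, value)] = name
    return i1


# Hypothetical data for illustration only.
KNOWN_SCL = {"territory", "force", "war"}
RELATIONAL_NOUNS = {"goal", "means"}
theories = {
    "Def1": ("X", [("goal", "X", "G"), ("seizure", "G", "territory"),
                   ("means", "X", "M"), ("application", "M", "force")]),
    "Def2": ("X", [("means", "X", "W"), ("unleashing", "W", "war")]),
    # Broken chain: G and H share nothing, so no ChF is extracted.
    "Def3": ("X", [("goal", "X", "G"), ("seizure", "H", "territory")]),
}

print(binary_relation_i1(theories, KNOWN_SCL, RELATIONAL_NOUNS))
```

Keying the result by (object, attribute) reflects the fact that in a many-valued context each object has at most one value per attribute.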
Given the complex nature of postulates of kinds (3) and (4), we extend the set M of formal attributes with the first arguments $Rel\_from\_Arg_i^{qpo_j}$ of statements of kind (2) that are elements of $Rel\_list\_fn_i^{po_j}$ for the given variant $lmth_i^{o_j}$ of the theory $lmth_i^j$; we denote the resulting set of attributes $M_1$ and the correspondingly extended set of Formal Concepts (FC) $G_1$. Like $Cncpt2\_mng\_fn_i^{po_j}$, $Rel\_from\_Arg_i^{qpo_j}$ must designate a Semantic Class known to the system. Moreover, $Rel\_from\_Arg_i^{qpo_j}$ is, as a rule, associated with some relation given by the noun that names it. Thus $Rel\_from\_Arg_i^{qpo_j}$ in effect characterizes the FC given by the pair $\langle lmth_i^{o_j}, \langle Cncpt2\_mng\_fn_i^{po_j}, Rel\_name\_fn_i^{po_j} \rangle \rangle$. The value of the attribute $Rel\_from\_Arg_i^{qpo_j}$ equals the third argument $Cncpt2\_mng\_fn_i^{qpo_j}$ of the first statement of kind (2) in the list $Rel\_list\_fn_i^{po_j}$ when this list is scanned forwards; $Cncpt2\_mng\_fn_i^{qpo_j}$ must also designate a Semantic Class known to the system. Such a statement is found, and the corresponding sublist of Rel_list_fn is formed, in the same way as Rel_list_fn itself.

Introducing the many-valued context

$K = (G_1, M_1, W, I)$   (5)

defines on the set G the relation known in FCA as the “subconcept-superconcept” relation [4]. Moreover, for any subset of objects from G, the Least Common Superconcept (LCS) and the Greatest Common Subconcept (GCS) can be determined. A set of objects connected by the “subconcept-superconcept” relation with a single GCS and/or a single LCS is therefore regarded as an area; the roles of LCS and GCS may be played, respectively, by the top and bottom concepts of the lattice [4]. In this paper we require every area to have both a unique GCS and a unique LCS. The context (5) can be visualized (Fig. 1) with the specialized software ToscanaJ (http://toscanaj.sourceforge.net), which implements FCA methods.

Fig. 1. LM definitions for the Russian word “агрессор” (“aggressor”)

Using the definition of a lattice area introduced above, applied to the elements of the LM definitions of a given $Lec_{ij}$, we now formally define the key rules for generalizing statements of theories (1).

Two compared statements of the form rel2_complex(Rel, Cncpt, Rel_list1) and rel2_complex(Rel, Cncpt, Rel_list2), with coinciding first and second arguments, become in the resulting theory a single statement of kind (3) whose third argument includes a statement of kind (4) that joins the statements from Rel_list1 and Rel_list2 by an “or” relation, provided the following condition holds: the sets of FCs obtained from Rel_list1 and Rel_list2 form, in the lattice for (5), areas whose LCS has Rel as an attribute value. In the example in Fig. 1, the “or” relation corresponds to the following pairs of FCs: (“Definition2_of_aggressor”, “Definition3_of_aggressor”), and “Definition1_of_aggressor” together with the LCS of the pair (“Definition2_of_aggressor”, “Definition3_of_aggressor”).

Two compared statements of the form rel2_complex(Rel, Cncpt, Rel_list1) and rel2_complex(Rel, Cncpt, Rel_list2) become in the resulting theory a single statement of kind (3) whose third argument includes a statement of kind (4) that joins the statements from Rel_list1 and Rel_list2 by an “and” relation, provided the following condition holds: the statements of Rel_list1 and Rel_list2 describe the same FC of (5), but by means of different ChFs. In the example in Fig. 1, this applies to the intent (the set of formal attributes [2]) of the FC “Definition1_of_aggressor” and to the intent of the LCS of the pair (“Definition2_of_aggressor”, “Definition3_of_aggressor”).
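For a small context, the LCS and GCS used in the definition of an area, together with the condition of the “or” rule, can be computed directly with the FCA derivation operators. The Python sketch below is hypothetical. It assumes the many-valued context (5) has been nominally scaled so that each formal attribute is a pair (ChF value, ChF name). It reads “the LCS has Rel as an attribute value” as “some attribute in the intent of the LCS of the union of both object sets carries the name Rel”. Its three definitions and their attributes are invented, not taken from Fig. 1. The LCS of an object set A is the concept (A'', A'); the GCS is the meet of the object concepts, whose extent is the intersection of the sets g''.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Attr = Tuple[str, str]  # nominally scaled attribute: (ChF value, ChF name)
Context = Dict[str, Set[Attr]]
Concept = Tuple[FrozenSet[str], FrozenSet[Attr]]


def intent_of(objs: Iterable[str], ctx: Context) -> FrozenSet[Attr]:
    """A': attributes shared by all objects in A (all attributes if A is empty)."""
    common = set().union(*ctx.values())
    for g in objs:
        common &= ctx[g]
    return frozenset(common)


def extent_of(attrs: Iterable[Attr], ctx: Context) -> FrozenSet[str]:
    """B': objects that have every attribute in B."""
    required = set(attrs)
    return frozenset(g for g, a in ctx.items() if required <= a)


def lcs(objs: Iterable[str], ctx: Context) -> Concept:
    """Least common superconcept of the given objects: (A'', A')."""
    intent = intent_of(objs, ctx)
    return extent_of(intent, ctx), intent


def gcs(objs: Iterable[str], ctx: Context) -> Concept:
    """Greatest common subconcept: meet of the object concepts (g'', g')."""
    extent = frozenset(ctx)
    for g in objs:
        extent &= extent_of(ctx[g], ctx)  # g''
    return extent, intent_of(extent, ctx)


def or_rule_condition(objs1: Set[str], objs2: Set[str], rel: str, ctx: Context) -> bool:
    """Hypothetical reading of the "or" rule: the LCS carries `rel` as a value."""
    _, intent = lcs(objs1 | objs2, ctx)
    return any(name == rel for _, name in intent)


# Invented toy context (not the data of Fig. 1).
ctx: Context = {
    "Def1": {("territory", "goal"), ("force", "means")},
    "Def2": {("territory", "goal"), ("war", "means")},
    "Def3": {("territory", "goal"), ("war", "means"), ("state", "agent")},
}

print(sorted(lcs({"Def2", "Def3"}, ctx)[0]))  # ['Def2', 'Def3']
print(sorted(gcs({"Def2", "Def3"}, ctx)[0]))  # ['Def3']
print(or_rule_condition({"Def1"}, {"Def2", "Def3"}, "goal", ctx))  # True
print(or_rule_condition({"Def1"}, {"Def2"}, "means", ctx))  # False
```

Computing concepts on demand through the derivation operators avoids building the whole lattice, which is sufficient for checks on a handful of definition variants.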
These principles for generalizing statements of kind (3) apply to statements of any complexity that occur in the third argument of statements (3) and are defined recursively on the basis of (3) and (4). Thus, whereas the cardinality n of the set of ChFs corresponding to the required extension does not depend on the number k of generalized theories, the computational complexity of generalizing the theories of a given LM depends only on n and k and amounts to $O\left((n/k)^k\right)$ (in the worst case, $n/k$ equals the number of statements of kinds (2) and (3) at all levels of the description of the LM by means of (1)). Since $k \in \{1, \ldots, n\}$, the complexity is $O(n)$ for $k = 1$ and $O(1)$ for $k = n$.

Experimental validation

The proposed technique for generalizing theories (1) has been tested in the Visual Prolog 5.2 environment on independent lexicographic definitions of the LM of the Russian word “агрессор” (“aggressor”). The definition variants were taken from the Great Soviet Encyclopedia, from the thematic dictionary “War and Peace” at http://slovari.yandex.ru, and from [4]. The generalized theory of the LM for “агрессор” is shown in Fig. 2.

Fig. 2. The generalized theory of the LM for the Russian word “агрессор” (“aggressor”)

Further research will combine the approach proposed in this paper with methods for generalizing predicates on the basis of their truth sets [1].

References
1. Mikhailov D.V., Emelyanov G.M. Model of language's sorts's system in a problem of a statement's semantic pattern's construction at a level of deep syntax // Taurian Herald for Computer Science and Mathematics. - 2006. - №1. - P. 79-90 (in Russian).
2. Emelyanov G.M., Kornyshov A.N., Mikhailov D.V. Conceptually-situational modeling of process of synonymic transformation of the Natural Language statements as machine learning on the basis of precedents // Scientific-theoretical magazine “Artificial intelligence”. - 2006. - №2. - P. 72-75 (in Russian).
3. Gerasimova I.A. Formal grammar and intensional logic // Moscow: Russian Academy of Sciences, Institute of Philosophy, 2000 (in Russian).
4. Ganter B., Wille R. Formal Concept Analysis: Mathematical Foundations // Berlin: Springer-Verlag, 1999.
5. Mel'čuk I.A., Zholkovsky A.K. Explanatory Combinatorial Dictionary of Modern Russian. Semantico-Syntactic Studies of Russian Vocabulary // Wiener Slawistischer Almanach, Sonderband 14, Vienna, 1984.