Emelyanov G., Mikhailov D.

FORMALIZATION OF THE WORD'S LEXICAL MEANING IN A
PROBLEM OF RECOGNITION OF NATURAL LANGUAGE'S
STATEMENTS'S SYNONYMY'S SITUATIONS¹
G. M. Emelyanov², D. V. Mikhailov²
² Yaroslav-the-Wise Novgorod State University,
173003, Russia, Velikii Novgorod, ul. Bol'shaya St. Petersburgskaya, 41, tel.: (8162)627940,
e-mail: [email protected] (G.M. Emelyanov), [email protected] (D.V. Mikhailov)
An approach to formalizing the semantic correlations between a lexeme and its
lexical correlates in the problem of recognizing synonymy situations is
presented. Synonymy situations are described on the basis of standard Lexical
Functions. The paper presents principles for generalizing independent
descriptions of the theory of a word's Lexical Meaning.
Introduction
Applying the apparatus of standard Lexical Functions (LF) within the
framework of the “Meaning⇔Text” approach makes it possible to prove the
synonymy of Natural Language (NL) statements on the basis of a finite set
$Rls$ of correctly formalizable rules for transforming Deep Syntactic
Structures (DSS) [4, 5]. Nevertheless, a significant difficulty in
implementing $Rls$ is formalizing the conditions under which the rules
apply. For $rl \in Rls$, the condition $r(rl)$ is a set of requirements on
the syntactic and semantic properties of the lexical blocks replaced by $rl$.
Let us treat the proof of LF-synonymy of statements as a classical Pattern
Recognition (PR) problem. The set $L$ of all pairs of NL statements between
which LF-synonymy can be established (with respect to $Rls$) is the initial
set of objects to be classified. The provability of LF-synonymy for pairs
$l \in L$ with respect to a fixed $rl \in Rls$ is then the basis for grouping
them into one taxon. Thus $r(rl)$ acts as a precedent, a typical
representative of the taxon $rl$. The PR problem is formulated as follows: a
new pair $l \in L$ that did not take part in the taxonomy is presented. It is
required to analyse $rl(l)$ and to recognize the class pattern $rl \in Rls$
to which the object $l$ is most similar. The problem statement: to develop a
program-realizable representation of $r(rl)$ by revealing the character of
the semantic correlations between a lexeme and its lexical correlates for the
basic types of Lexical Functions.
Because the redistribution of sense that matters for formalizing $r(rl)$ is
characteristic of situations with parametric LFs, the PR problem given above
should be formulated as recognition of the semantic relation specified by a
split value. Revealing and generalizing this relation is directly analogous
to describing the semantics of Noun Phrases [1]. For the Lexical Meanings
(LM) of the words replaced by $rl$, formalized descriptions are therefore
constructed in the form of theories: sets of meaning postulates connecting
each replaced word with other words and concepts. However, when different
researchers construct the theory of one word independently, the problem
arises of generalizing the knowledge obtained in this way. This problem is
especially pressing when theories are constructed from NL definitions of LMs
using standard conceptual languages [2].
_______________________________________________________________________
¹ This work is financially supported by RFBR (project No. 06-01-00028) and by ESIC of NovSU.
Solution methods
Let for Lecij  l j , l j  L we have the
description of its Lexical Meaning's theory by
means of compound object of Prolog
language:
lmthLec _ j _ i,Var _ Smth, Re l _ list 
(1)
which describes a set of binary relations Re l
between concepts Cncpt1 and Cncpt 2 :
rel 2Re l, Cncpt1, Cncpt 2 , and
(2)
recursively defined relations of arbitrary arity :
rel 2 _ complexRe l , Cncpt, Re l _ list  and
rel _ complexRe l, Re l _ list 
(3)
(4)
by means of a list Re l _ list of structures of a
kind (2), (3) and (4)).
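To make the nesting of structures (1)-(4) concrete, the following minimal
sketch mirrors them in Python rather than Prolog. The class names, the field
names and the toy theory for "aggressor" are assumptions made for
illustration only; they are not taken from the authors' implementation. The
helper flatten walks every nesting level, which is the traversal needed to
count statements "at all levels" of a description.

```python
from typing import NamedTuple

# Hypothetical Python mirrors of the Prolog structures (1)-(4).
class Rel2(NamedTuple):          # kind (2): rel2(Rel, Cncpt1, Cncpt2)
    rel: str
    cncpt1: str
    cncpt2: str

class Rel2Complex(NamedTuple):   # kind (3): rel2_complex(Rel, Cncpt, Rel_list)
    rel: str
    cncpt: str
    rel_list: tuple

class RelComplex(NamedTuple):    # kind (4): rel_complex(Rel, Rel_list)
    rel: str
    rel_list: tuple

class Lmth(NamedTuple):          # kind (1): lmth(Lec_j_i, Var_Smth, Rel_list)
    lexeme: str
    var_smth: str
    rel_list: tuple

def flatten(rel_list):
    """Yield every statement at every nesting level, depth-first."""
    for stmt in rel_list:
        yield stmt
        if isinstance(stmt, (Rel2Complex, RelComplex)):
            yield from flatten(stmt.rel_list)

# Invented toy theory; relation and concept names are placeholders.
theory = Lmth(
    lexeme="aggressor",
    var_smth="X",
    rel_list=(
        Rel2("subject", "X", "Y"),
        Rel2Complex("action", "Y", (
            RelComplex("or", (
                Rel2("kind", "Y", "attack"),
                Rel2("kind", "Y", "invasion"),
            )),
        )),
    ),
)

kinds = [type(stmt).__name__ for stmt in flatten(theory.rel_list)]
print(kinds)
```

Immutable tuples are used so that a theory can be compared and hashed as a
whole, which is convenient when several variants of one theory are collected
in a set.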
An LM given by (1) is a denotation to which logic assigns an extension [3],
that is, the class of entities defined by (1). Since, from the philosophical
point of view, the sense (or intension [3]) of $Lec_i^j$ is a network of
relations between $Lec_i^j$ and other words $Lec_m^k$:
$Lec_i^j \leftrightarrow Lec_m^k$, the sense of a lexeme can be defined by a
set of functions specified by statements of kinds (2), (3) and (4) within
theories. These functions characterize the concepts designated by $Lec_i^j$.
Following the terminology of [3], we call such functions Characteristic
Functions (ChF) for a set of Lexical Meanings. As we showed in [1], each of
them can be specified either by a single statement or by a group of
statements.
When structure (1) is used to describe the theory of the LM $Lec\_j\_i$, the
value of each of these functions equals the third argument
$Cncpt2\_mng\_fn$ of the relation in some statement of kind (2).
$Cncpt2\_mng\_fn$ must designate a concept known to the system (this concept
is identified with the Semantic Class (SCl) of some word). The name of a ChF
is the first argument $Rel\_name\_fn$ of the first statement of kind (2) or
(3) that designates a known SCl (this SCl must define a relational noun),
found by scanning the list $Rel\_list$ of statements of kind (2) for the
given $Lec_i^j$ backwards, starting from the statement whose third argument
is the $Cncpt2\_mng\_fn$ mentioned above. Here $Rel\_list$ may also be the
list that forms the third argument of a statement of kind (3) whose first
argument is $Rel\_name\_fn$. The list formed by this scan of $Rel\_list$ is
denoted below by $Rel\_list\_fn$, $Rel\_list\_fn \subseteq Rel\_list$.
The second argument of the statement with $Rel\_name\_fn$ must be the
variable $Var\_Smth$, which designates the word interpreted by means of (1).
Each subsequent statement in the list $Rel\_list\_fn$ must share at least one
argument, designating some variable, with the previous statement.
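The backward scan described above can be sketched as follows. This is one
possible reading of the procedure, restricted to a flat list of statements
of kind (2) written as plain tuples (rel, cncpt1, cncpt2). The function
names, the two sets of known Semantic Classes and the sample statements are
assumptions introduced for illustration, not the authors' code.

```python
def is_var(term):
    """Prolog convention: a variable starts with an upper-case letter or '_'."""
    return term[:1].isupper() or term.startswith("_")

def shares_variable(prev, cur):
    """True if two statements have at least one common variable argument."""
    return any(is_var(arg) and arg in prev[1:] for arg in cur[1:])

def characteristic_functions(var_smth, rel_list, known_scl, relational_scl):
    """Return (name, value, rel_list_fn) triples for a flat list of rel2 tuples."""
    found = []
    for end, (_, _, cncpt2) in enumerate(rel_list):
        if cncpt2 not in known_scl:
            continue  # a ChF value must be a known Semantic Class
        # Scan backwards for the nearest statement that can name the ChF:
        # its relation is a relational-noun SCl and its second argument
        # is the variable standing for the interpreted word.
        for start in range(end, -1, -1):
            rel, cncpt1, _ = rel_list[start]
            if rel in relational_scl and cncpt1 == var_smth:
                chain = rel_list[start:end + 1]
                # Neighbouring statements must be linked by a variable.
                if all(shares_variable(p, c) for p, c in zip(chain, chain[1:])):
                    found.append((rel, cncpt2, chain))
                break
    return found

# Invented sample data (assumption): X stands for the interpreted word.
rel_list = [
    ("initiator", "X", "Y"),
    ("kind", "Y", "attack"),
    ("status", "X", "state"),
]
chfs = characteristic_functions(
    "X", rel_list,
    known_scl={"attack", "state"},
    relational_scl={"initiator", "status"},
)
for name, value, chain in chfs:
    print(name, "->", value, "via", len(chain), "statement(s)")
```

In this sample the first ChF is formed by a group of two statements and the
second by a single statement, matching the remark that a ChF can be
specified either way.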
According to the definition of sense as intension formulated in [3],
outwardly different descriptions (1) of theories of the same LM yield a
common set of the ChFs mentioned above. Together they define the intension
for the generalized theory of the LM under consideration.
Proceeding from the definition of intension as a function from possible
worlds to extensions [3], and from the recursive nature of meaning
postulates, we pose the task of constructing the generalized theory of a
given LM from independently obtained variants of its theory as the
restoration of the syntactic representation [3] of the extension from the
known syntax of the expressions for the ChFs that make up the intension and
are written as sets of statements of kinds (2), (3) and (4). We have a
ternary relation $I \subseteq G \times M \times W$ between:
• a set of objects $G$, which correspond to the variants $lmth_i^{oj}$ of
the definition of the LM $Lec_i^j$ in the form (1), $G = \{lmth_i^{oj}\}$;
• a set of attributes $M$, which correspond to the values
$Cncpt2\_mng\_fn_i^{poj}$ of the Characteristic Functions for
$\{lmth_i^{oj}\}$;
• a set $W$ of attribute values. In our task each $w \in W$ is the name
$Rel\_name\_fn_i^{poj}$ of a ChF whose value belongs to $M$.
The relation $I$ can be considered as a binary relation
$I_1 \subseteq \{(lmth_i^{oj},\ Cncpt2\_mng\_fn_i^{poj})\} \times W$.
According to the Basic Theorem of Formal Concept Analysis (FCA) [4], proved
by G. Birkhoff, a complete lattice can be constructed for any binary
relation. This makes it possible to apply the mathematical apparatus of FCA
to our problem.
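How a lattice of formal concepts arises from a binary relation can be shown
on a toy context. The sketch below is a brute-force enumeration suitable
only for very small contexts; the object names follow the labels used in
the paper's figure, while the attributes assigned to them are invented
placeholders and the function names are assumptions.

```python
from itertools import combinations

def common_attributes(objects, context):
    """Attributes shared by all given objects (first derivation operator)."""
    all_attrs = set().union(*context.values())
    return frozenset(a for a in all_attrs
                     if all(a in context[g] for g in objects))

def common_objects(attrs, context):
    """Objects that have all given attributes (second derivation operator)."""
    return frozenset(g for g, has in context.items() if attrs <= has)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects; acceptable for toy contexts only.
    """
    concepts = set()
    objs = list(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts

# Binary context: object -> set of attributes (attribute names are invented).
context = {
    "Definition1_of_aggressor": {"attack", "state"},
    "Definition2_of_aggressor": {"attack", "invasion"},
    "Definition3_of_aggressor": {"attack", "war"},
}
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting pairs by inclusion of extents gives the complete
lattice: here the top concept contains all three definitions and the bottom
concept has an empty extent.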
Given the complex character of postulates of kinds (3) and (4), we extend
the set $M$ of formal attributes with the first arguments
$Rel\_from\_Arg_i^{qpoj}$ of statements of kind (2) that are elements of
$Rel\_list\_fn_i^{poj}$ for the given variant $lmth_i^{oj}$ of the theory
$lmth_i^j$ (we denote the resulting set by $M1$, and the correspondingly
extended set of Formal Concepts (FC) by $G1$). Like
$Cncpt2\_mng\_fn_i^{poj}$, each $Rel\_from\_Arg_i^{qpoj}$ must designate a
Semantic Class known to the system. Moreover, as a rule,
$Rel\_from\_Arg_i^{qpoj}$ is associated with some relation specified by the
noun that names $Rel\_from\_Arg_i^{qpoj}$. Thus each
$Rel\_from\_Arg_i^{qpoj}$ will (in effect) characterize the FC specified by
the pair
$((lmth_i^{oj},\ Cncpt2\_mng\_fn_i^{poj}),\ Rel\_name\_fn_i^{poj})$. The
value of the attribute $Rel\_from\_Arg_i^{qpoj}$ equals the third argument
$Cncpt2\_mng\_fn_i^{qpoj}$ of the first statement of kind (2) in the list
$Rel\_list\_fn_i^{poj}$ (when this list is scanned forwards);
$Cncpt2\_mng\_fn_i^{qpoj}$ must designate a Semantic Class known to the
system. The search for such a statement and the formation of the
corresponding sublist of $Rel\_list\_fn$ proceed by analogy with the
formation of $Rel\_list\_fn$ itself.
By introducing the multi-valued context
    $K = (G1,\ M1,\ W,\ I)$    (5)
we define on the set $G$ the relation known in FCA as the
“subconcept-superconcept” relation [4]. Moreover, for any subset of objects
from $G$ the Least Common Superconcept (LCS) and the Greatest Common
Subconcept (GCS) can be determined. A set of objects connected by the
“subconcept-superconcept” relation with one GCS and/or one LCS is treated as
an area. The top concept and the bottom concept of the lattice [4] can serve
as the LCS and the GCS, respectively. In this paper we require that both the
GCS and the LCS of an area be unique. The context (5) can be visualized
(Fig. 1) with the specialized software ToscanaJ
(http://toscanaj.sourceforge.net), which implements FCA methods.
Fig. 1. Definitions of the LM of the Russian word “агрессор” (“aggressor”)
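The LCS and GCS of a group of formal concepts can be computed directly from
their intents and extents. The sketch below assumes that the multi-valued
context has been turned into a binary one by treating every
(attribute, value) pair as a single binary attribute, which is a common way
of handling many-valued contexts in FCA but is not stated in the paper. The
pairs shown are invented sample data and the function names are
hypothetical.

```python
def extent_of(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(g for g, has in context.items() if attrs <= has)

def intent_of(objects, context):
    """Attributes shared by every object in objects."""
    all_attrs = set().union(*context.values())
    return frozenset(a for a in all_attrs
                     if all(a in context[g] for g in objects))

def object_concept(g, context):
    """The smallest concept whose extent contains object g."""
    intent = frozenset(context[g])
    return extent_of(intent, context), intent

def lcs(concepts, context):
    """Least Common Superconcept: intersect the intents, then derive the extent."""
    shared = frozenset.intersection(*(intent for _, intent in concepts))
    return extent_of(shared, context), shared

def gcs(concepts, context):
    """Greatest Common Subconcept: intersect the extents, then derive the intent."""
    shared = frozenset.intersection(*(extent for extent, _ in concepts))
    return shared, intent_of(shared, context)

# Each binary attribute is an (attribute, value) pair; the data are invented.
context = {
    "Definition1_of_aggressor": {("state", "subject"), ("attack", "action")},
    "Definition2_of_aggressor": {("state", "subject"), ("invasion", "action")},
    "Definition3_of_aggressor": {("state", "subject"), ("invasion", "action"),
                                 ("territory", "object")},
}
c1 = object_concept("Definition1_of_aggressor", context)
c2 = object_concept("Definition2_of_aggressor", context)
c3 = object_concept("Definition3_of_aggressor", context)

print(lcs([c2, c3], context) == c2)   # c3 is a subconcept of c2
print(sorted(lcs([c1, c2], context)[1]))
print(sorted(gcs([c1, c2], context)[0]))
```

In this toy context the LCS of the first two definitions is the top concept
and their GCS is the bottom concept with an empty extent, which illustrates
the remark that the top and bottom of the lattice can play these roles.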
Using the definition of a lattice area introduced above, applied to the
elements of the LM definition of a given $Lec_i^j$, we formally define the
key rule for generalizing the statements of theories (1).
Two compared statements of the kind
$rel2\_complex(Rel,\ Cncpt,\ Rel\_list1)$ and
$rel2\_complex(Rel,\ Cncpt,\ Rel\_list2)$ with coincident first and second
arguments become, in the resulting theory, one statement of kind (3) whose
third argument includes a statement (4) uniting the statements from the
lists $Rel\_list1$ and $Rel\_list2$ by the “or” relation if the following
condition holds. The sets of FCs obtained from $Rel\_list1$ and
$Rel\_list2$ must form, in the lattice for (5), areas with an LCS that has
$Rel$ as an attribute value. In the example in Fig. 1, the “or” relation
corresponds to the following pairs of FCs:
• (“Definition2_of_aggressor”, “Definition3_of_aggressor”);
• “Definition1_of_aggressor” and the LCS for the pair
(“Definition2_of_aggressor”, “Definition3_of_aggressor”).
Two compared statements of the kind
$rel2\_complex(Rel,\ Cncpt,\ Rel\_list1)$ and
$rel2\_complex(Rel,\ Cncpt,\ Rel\_list2)$ become, in the resulting theory,
one statement of kind (3) whose third argument includes a statement (4)
uniting the statements from the lists $Rel\_list1$ and $Rel\_list2$ by the
“and” relation if the following condition holds. The statements of the lists
$Rel\_list1$ and $Rel\_list2$ describe the same FC of (5), but by means of
different ChFs. In the example in Fig. 1, this applies to the intent (the
set of formal attributes, [2]) of the FC “Definition1_of_aggressor” and to
the intent of the LCS for the pair (“Definition2_of_aggressor”,
“Definition3_of_aggressor”).
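The shape of the merged statement produced by either rule can be sketched as
follows. The sketch deliberately leaves out the lattice test that decides
between “or” and “and”: the connective is passed in by the caller. Writing
statement (4) as rel_complex with the connective as its first argument, and
uniting the two lists by concatenation without duplicates, are assumptions
made for illustration; the paper does not fix these details.

```python
def merge_rel2_complex(stmt1, stmt2, connective):
    """Merge two rel2_complex tuples that agree on Rel and Cncpt.

    Statements are tuples ("rel2_complex", rel, cncpt, rel_list).
    Returns None when the first and second arguments do not coincide.
    """
    kind_a, rel_a, cncpt_a, list_a = stmt1
    kind_b, rel_b, cncpt_b, list_b = stmt2
    if kind_a != "rel2_complex" or kind_b != "rel2_complex":
        raise ValueError("both statements must be of kind (3)")
    if connective not in ("or", "and"):
        raise ValueError("connective must be 'or' or 'and'")
    if (rel_a, cncpt_a) != (rel_b, cncpt_b):
        return None  # the rule applies only to coincident Rel and Cncpt
    # Unite both lists, keeping the original order and dropping duplicates.
    united = list(list_a) + [s for s in list_b if s not in list_a]
    # Assumed encoding of statement (4): rel_complex(Connective, Rel_list).
    return ("rel2_complex", rel_a, cncpt_a,
            [("rel_complex", connective, united)])

# Invented sample statements.
s1 = ("rel2_complex", "action", "Y", [("rel2", "kind", "Y", "attack")])
s2 = ("rel2_complex", "action", "Y", [("rel2", "kind", "Y", "invasion")])
merged = merge_rel2_complex(s1, s2, "or")
print(merged)
```

In a fuller implementation the connective would be chosen by inspecting the
lattice for context (5), as the two conditions above describe.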
The stated principles for generalizing statements of kind (3) apply to
statements of any complexity that occur in the third argument of statements
(3) and are defined recursively on the basis of (3) and (4). Since the
cardinality $n$ of the set of ChFs corresponding to the required extension
does not depend on the number $k$ of generalized theories, the computational
complexity of generalizing the theories of a given LM depends only on $n$
and $k$ and amounts to $O\left(\frac{n}{k}\right)$ (in the worst case $n$
equals the number of statements of kinds (2) and (3) at all levels of the
description of the LM by means of (1)). As $k = 1, \ldots, n$,
$O\left(\frac{n}{k}\right) = n$ for $k = 1$ and
$O\left(\frac{n}{k}\right) = 1$ for $k = n$.
Fig. 2. The generalized theory of the LM of the Russian word “агрессор” (“aggressor”)
Experimental approbation
The proposed technique for generalizing theories (1) was tested in the
Visual Prolog 5.2 environment on independent lexicographic definitions of
the LM of the Russian word “агрессор”. The variants of the definitions were
taken from the Great Soviet Encyclopedia and the thematic dictionary “War
and Peace” at http://slovari.yandex.ru, and also from [4]. The generalized
theory of the LM of “агрессор” is shown in Fig. 2.
Prospects for further research involve combining the approach proposed in
this paper with methods for generalizing predicates on the basis of truth
sets [1].
References
1. Mikhailov D.V., Emelyanov G.M. Model of
language's sorts's system in a problem of a
statement's semantic pattern's construction at a level
of deep syntax // Taurian Herald for Computer
Science and Mathematics. - 2006. - №1. - P.79-90
(in Russian).
2. Emelyanov G.M., Kornyshov A.N., Mikhailov D.V.
Conceptually-situational modeling of process of
synonymic transformation of the Natural Language
statements as machine learning on the basis of
precedents // Scientific-theoretical magazine
“Artificial intelligence”. - 2006. - №2. - P.72-75 (in
Russian).
3. Gerasimova I.A. Formal grammar and intensional logic // Moscow: Russian
Academy of Science, Institute of Philosophy, 2000 (in Russian).
4. Ganter B., Wille R. Formal Concept Analysis: Mathematical Foundations //
Berlin: Springer-Verlag, 1999.
5. Mel'čuk I.A., Zholkovsky A.K. Explanatory Combinatorial Dictionary of
Modern Russian. Semantico-Syntactic Studies of Russian Vocabulary // Wiener
Slawistischer Almanach, Sonderband 14, Wien, 1984.