United States Patent [19]
Tsuzuki

[11] Patent Number: 5,469,355
[45] Date of Patent: Nov. 21, 1995

[54] NEAR-SYNONYM GENERATING METHOD
[75] Inventor: Kouichi Tsuzuki, Kawasaki, Japan
[73] Assignee: Fujitsu Limited, Kawasaki, Japan
[21] Appl. No.: 115,327
[22] Filed: Sep. 2, 1993
[30] Foreign Application Priority Data
     Nov. 24, 1992  Japan .......... 4-312531
[51] Int. Cl.6 .......... G06F 19/00
[52] U.S. Cl. and [58] Field of Search: partly illegible in the source; legible entries include 364/419.04 and 364/419.14

[56] References Cited

U.S. PATENT DOCUMENTS
4,384,329   5/1983   [name illegible] et al. .......... 364/419.13
4,773,039   9/1989   [name illegible] .......... 364/419.13
4,839,853   6/1989   Deerwester et al. .......... 364/419.13
5,168,533   12/1992  Kato et al. .......... 382/54
5,297,039   3/1994   Kanaegami et al. .......... 364/419.13

FOREIGN PATENT DOCUMENTS
2-129756   5/1990   Japan
3-015980   1/1991   Japan

Primary Examiner: Gopal C. Ray
Assistant Examiner: Xuong M. Chung-Trans
Attorney, Agent, or Firm: Staas & Halsey

[57] ABSTRACT

A near-synonym generating method generates near-synonyms of a target character string by retrieving a near-synonym file based on the target character string, where the near-synonym file defines near-synonyms for one or a plurality of words. The near-synonym generating method includes the steps of (a) retrieving the near-synonym file using words which form the target character string as keys, and extracting near-synonyms which are defined for each of the words used as the keys as the near-synonyms for each of the words forming the target character string, (b) forming a near-synonym group from each of the words forming the target character string and the corresponding near-synonyms so as to form a plurality of such near-synonym groups, and selecting the words or near-synonyms from each of the near-synonym groups, and (c) generating the near-synonyms of the target character string by combining the selected words or near-synonyms obtained by the step (b) in an order which is different from the order of the words forming the target character string.

12 Claims, 18 Drawing Sheets
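Steps (a) to (c) of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary NEAR_SYNONYM_FILE, the function name generate_near_synonyms, and the "/" separator are assumptions made for illustration, and the sample entries are borrowed from the holiday/application/member/number example used in the drawings.

```python
from itertools import permutations, product

# Hypothetical stand-in for the near-synonym file: key word -> defined near-synonyms.
NEAR_SYNONYM_FILE = {
    "holiday": ["vacation"],
    "application": ["report", "notice"],
    "member": ["employee"],
    "number": ["numeral", "No."],
}


def generate_near_synonyms(target, sep="/"):
    """Return reordered near-synonym strings for a target such as 'member/number'."""
    words = target.split(sep)
    # (a) look up each word of the target as a key in the near-synonym file.
    # (b) form one group per word: the word itself plus its near-synonyms.
    groups = [[word] + NEAR_SYNONYM_FILE.get(word, []) for word in words]
    original_order = tuple(range(len(words)))
    results, seen = [], set()
    # (b) select one entry from each group ...
    for selection in product(*groups):
        # (c) ... and combine the selection in every order except the original one.
        for order in permutations(original_order):
            if order == original_order:
                continue
            candidate = sep.join(selection[i] for i in order)
            if candidate not in seen:
                seen.add(candidate)
                results.append(candidate)
    return results


print(generate_near_synonyms("member/number"))
```

The sketch enumerates every reordering, so the number of candidates grows factorially with the number of words; a practical system would presumably prune the candidates, which the abstract does not describe.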
[Front-page drawing: block diagram of the NEAR SYNONYM GENERATOR. Legible block labels: CHARACTER STRING DIVIDER, NEAR-SYNONYM RETRIEVAL PROCESSOR, REPLACING CHARACTER NEAR-SYNONYM PROCESSOR, MISSING CHARACTER NEAR-SYNONYM PROCESSOR, ADDING CHARACTER NEAR-SYNONYM PROCESSOR, NEAR-SYNONYM HIERARCHICAL DEFINITION PROCESSOR, NEAR-SYNONYM FILE, TARGET CHARACTER STRING.]
U.S. Patent    Nov. 21, 1995    Sheet 4 of 18    5,469,355

[Drawing on Sheet 4: block diagram of a retrieval system. Legible block labels: DATA LIBRARY, DATA RETRIEVAL FILE, PROCESSING UNIT, NEAR-SYNONYM GENERATOR, CHARACTER STRING DIVIDER, RETRIEVAL PROCESSOR, REPLACING NEAR-SYNONYM PROCESSOR, MISSING CHARACTER NEAR-SYNONYM PROCESSOR, ADDING NEAR-SYNONYM PROCESSOR, NEAR-SYNONYM FILE, TARGET CHARACTER STRING, GENERATED RESULT, PROCESSING UNIT (CPU/MEMORY), RETRIEVAL RESULT.]
Sheet 5 of 18

FIG. 5A
DATA ITEM MANAGEMENT FILE
ITEM NAME
HOLIDAY/APPLICATION/MEMBER/NUMBER
HOLIDAY/REPORT/EMPLOYEE/NUMERAL
VACATION/NOTICE/MEMBER/NUMBER
VACATION/REPORT/MEMBER/NO.
EMPLOYEE/NUMBER
EMPLOYEE/NAME

FIG. 5B
DIFFERENT SOUNDING SYNONYM CANDIDATE LIST (14)
(1) HOLIDAY/APPLICATION/MEMBER/NUMBER
    HOLIDAY/REPORT/EMPLOYEE/NUMERAL
    VACATION/NOTICE/MEMBER/NUMBER
    VACATION/REPORT/MEMBER/NO.
    EMPLOYEE/NUMBER
    MEMBER/HOLIDAY/APPLICATION/NUMBER
(2) HOLIDAY/REPORT/EMPLOYEE/NUMERAL
    HOLIDAY/APPLICATION/MEMBER/NUMBER
    VACATION/NOTICE/MEMBER/NUMBER
    VACATION/REPORT/MEMBER/NO.
    EMPLOYEE/NUMBER
    MEMBER/HOLIDAY/APPLICATION/NUMBER
Sheet 6 of 18

[FIG. 6A, FIG. 6B, FIG. 6C and FIG. 6D: diagrams whose only legible labels are "PERMITTED" and "NOT PERMITTED".]
Sheet 7 of 18

FIG. 7 (flowchart; legible boxes in reading order)
SPECIFY TARGET CHARACTER STRING
[decision box, text not recovered; one branch is labeled NO]
DIVIDE TARGET CHARACTER STRING INTO KEYWORDS DEPENDING ON END SYMBOLS
S4: RETRIEVE KEYWORDS DEFINED IN NEAR SYNONYM FILE
DIVIDE GIVEN TARGET CHARACTER STRING BY KEYWORDS RETRIEVED FROM NEAR SYNONYM FILE
S6: RETRIEVE NEAR SYNONYMS FROM NEAR SYNONYM FILE FOR EACH OF DIVIDED KEYWORDS
REGARD DIVIDED KEYWORDS & RETRIEVED NEAR-SYNONYMS AS 1 NEAR SYNONYM GROUP
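The division and grouping steps of FIG. 7 can be sketched as follows. This is an illustration under stated assumptions rather than the method of the patent: the flowchart's decision box is not legible, so the sketch assumes it tests whether end symbols are present; the greedy longest-match scan used when they are absent, the names divide and build_groups, and the contents of NEAR_SYNONYM_FILE are all hypothetical.

```python
# Hypothetical stand-in for the near-synonym file: key word -> defined near-synonyms.
NEAR_SYNONYM_FILE = {
    "holiday": ["vacation"],
    "application": ["report", "notice"],
    "member": ["employee"],
    "number": ["numeral", "No."],
}


def divide(target, end_symbol="/"):
    """Divide a target character string into keywords."""
    # Branch 1: the operator inserted end symbols, so split on them.
    if end_symbol in target:
        return [word for word in target.split(end_symbol) if word]
    # Branch 2 (assumed): scan the string and cut it at keywords that are
    # defined in the near-synonym file, preferring the longest keyword.
    keys = sorted(NEAR_SYNONYM_FILE, key=len, reverse=True)
    words, pos = [], 0
    while pos < len(target):
        for key in keys:
            if target.startswith(key, pos):
                words.append(key)
                pos += len(key)
                break
        else:
            pos += 1  # skip a character that no keyword covers
    return words


def build_groups(words):
    """Regard each keyword and its retrieved near-synonyms as one group."""
    return [[word] + NEAR_SYNONYM_FILE.get(word, []) for word in words]


print(build_groups(divide("holidayapplicationnumber")))
```

The second branch is what would let an operator enter a string without end symbols, at the cost of only recognizing keywords that already appear in the file.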
Sheet 8 of 18

FIG. 8 (flowchart; legible boxes in reading order)
EXTRACT WORD FROM EACH NEAR-SYNONYM GROUP AS 1 CANDIDATE & COMBINE SUCH WORDS
S9: COMBINATION SHOULD HAVE DISTANCE?
S10, S11 (two branches; the assignment of step numbers to branches is not legible):
  RETRIEVE COMBINED CHARACTER STRING AS CHARACTER STRING IN WHICH KEYWORDS OR THEIR NEAR-SYNONYMS CONTINUE
  RETRIEVE COMBINED CHARACTER STRING AS [partly illegible] THEIR NEAR SYNONYMS & OTHER WORDS IN-BETWEEN
PICK UP NEAR-SYNONYMS WITH RESPECT TO TARGET CHARACTER STRING
S13: DISPLAY IN ORDER FROM NEAR-SYNONYM MOST SIMILAR TO TARGET CHARACTER STRING
END
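The two retrieval branches of FIG. 8 (keywords that continue without a gap, or keywords with other words in between) and the final ordering by similarity can be sketched as below. The flowchart does not say how similarity is measured, so the score used here (the number of positions at which a candidate keeps the word of the target) is an assumption, as are the function names and the whitespace tokenization.

```python
from itertools import product


def matches(candidate, text, allow_distance):
    """Check whether the words of a candidate occur in a text, in order."""
    words = text.split()
    if allow_distance:
        # Other words may lie in between: test for an ordered subsequence.
        remaining = iter(words)
        return all(word in remaining for word in candidate)
    # No distance: the candidate words must continue without a gap.
    n = len(candidate)
    return any(words[i:i + n] == candidate for i in range(len(words) - n + 1))


def similarity(candidate, target_words):
    """Assumed score: positions where the candidate keeps the target's word."""
    return sum(c == t for c, t in zip(candidate, target_words))


def retrieve(groups, target_words, texts, allow_distance):
    """Return matching texts, most similar to the target first."""
    best = {}
    for selection in product(*groups):  # one word from each group
        candidate = list(selection)
        score = similarity(candidate, target_words)
        for text in texts:
            if matches(candidate, text, allow_distance):
                best[text] = max(score, best.get(text, -1))
    return sorted(best, key=lambda text: -best[text])


groups = [["holiday", "vacation"], ["number", "numeral"]]
texts = ["vacation numeral", "holiday request number", "holiday number"]
print(retrieve(groups, ["holiday", "number"], texts, allow_distance=True))
```

With allow_distance set to False, the text "holiday request number" is no longer retrieved, because "request" separates the two keywords.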
Sheet 9 of 18

FIG. 9A
TARGET CHARACTER STRING
U.S./PRESIDENT/CANDIDATE
(Japanese Kanji character for 'rice')

FIG. 9B
NEAR SYNONYM FILE (8A)
RICE -> SASANISHIKI, KOSHIHIKARI
AMERICA -> U.S., U.S.A., UNITED STATES, STATE OF TEXAS
U.S. PRESIDENT -> REAGAN, BUSH
CANDIDATE -> RECOMMENDED, RECOMMENDATION, SELF-RECOMMENDATION, ELECTION, RUN, ETC.
PRESIDENTIAL CANDIDATE -> BUSH, CLINTON

FIG. 10
RETRIEVAL RESULT
CANDIDATE RECOMMENDED BY PRIME MINISTER MIYAZAWA, APPEALED STIMULATING DEMANDS FOR KOSHIHIKARI, AND VIEWING THE ELECTION OF THE UPPER HOUSE FOR THE NEXT TERM
Sheet 11 of 18

TARGET CHARACTER STRING
SEASONINGS/SET

NEAR SYNONYM FILE (8B)
SEASONING -> SALAD OIL/FLOUR/DRIED BONITO/SOY SAUCE
SET -> COMBINATION/PACKAGE/GIFT/PACKED/BAG

FIG. 12C
RETRIEVAL OF SIMILAR GOODS (9B)
COMBINATION OF SEASONINGS
SET OF SALAD OILS
GIFT PACKAGE OF FLOURS
PACKAGE OF DRIED BONITO
COMBINATION OF SOY SAUCES
Sheet 13 of 18

FIG. 14A
TARGET CHARACTER STRING
YY COMPANY/C BUILDING

FIG. 14B
DESTINATION INFORMATION DEFINITION FILE
YY COMPANY -> YY COMPANY LIMITED
1 MIN. WALK FROM STATION A -> 1ST BRANCH
OPPOSITE B BUILDING -> 2ND BRANCH
NEXT TO B BUILDING -> 3RD BRANCH
NEAR B BUILDING -> 2ND/3RD BRANCH
5TH FLOOR OF C BUILDING -> 3RD BRANCH
C BUILDING -> 3RD BRANCH
NEAR SCHOOL D -> 4TH BRANCH

FIG. 14C
GENERATED RESULT
YY COMPANY LIMITED 3RD BRANCH

BUILDING MANAGEMENT DATA BASE
YY COMPANY LIMITED
3RD BRANCH
A CITY C AVENUE
NEXT TO B BUILDING
5TH FLOOR OF C BUILDING
5 MIN. WALK FROM STATION
TEL. No.
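The example of FIG. 14A to FIG. 14C, in which the target "YY COMPANY/C BUILDING" yields "YY COMPANY LIMITED 3RD BRANCH", can be read as a per-word lookup in the definition file followed by a combination of the results. The sketch below shows that reading only; the names DESTINATION_FILE and resolve are hypothetical, only three entries of the file are reproduced, and the drawings do not confirm that the patent resolves the entries in exactly this way.

```python
from itertools import product

# A few entries from the destination information definition file of FIG. 14B.
DESTINATION_FILE = {
    "YY COMPANY": ["YY COMPANY LIMITED"],
    "C BUILDING": ["3RD BRANCH"],
    "NEAR B BUILDING": ["2ND BRANCH", "3RD BRANCH"],
}


def resolve(target, sep="/"):
    """Replace each part of the target by its defined entries and combine them."""
    # A part with no definition is kept unchanged (an assumption of this sketch).
    parts = [DESTINATION_FILE.get(part, [part]) for part in target.split(sep)]
    return [" ".join(choice) for choice in product(*parts)]


print(resolve("YY COMPANY/C BUILDING"))
```

An ambiguous description such as "NEAR B BUILDING" produces one generated string per defined branch, which matches the "2ND/3RD BRANCH" entry of the file.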
Sheet 15 of 18

FIG. 16A
TARGET CHARACTER STRING
DATA COMMUNICATION

FIG. 16B
SIMILAR DOCUMENT DEFINITION FILE (8D)
DATA -> PERSONAL COMPUTER, ANALOG, COMMAND, AUDIO, ...
COMMUNICATION -> TRANSMIT, NETWORK, PROTOCOL, MAIL, ...

FIG. 16C
DOCUMENT LIST (14D)
QUESTIONS & ANSWERS TO DATA TRANSMISSION
INTRODUCTION TO TELEPHONES
DATA COMMUNICATION HANDBOOK
DIGITAL SWITCHING
DATA SWITCHING NETWORK CATALOG
DATA COMMUNICATION PROTOCOL
RS-232C COMMUNICATION HANDBOOK
OSI
MULTI-HYPER
AT COMMANDS & APPLICATIONS
INFORMATION COMMUNICATION PROTOCOL
ACCOUNTING SYSTEM OF VAN
AUDIO MAIL SERVICE
COMPUTER COMMUNICATION
DATA COMMUNICATION TECHNIQUE SEMINAR
Sheet 16 of 18

FIG. 17A
TARGET CHARACTER STRING
JU GYOU IN BANGOU

FIG. 17B
NEAR SYNONYM DEFINITION TABLE
[table entries not recovered; the only legible entry is "No."]

FIG. 17C
GENERATED RESULT
[entries not recovered]
Sheet 17 of 18

FIG. 18A
TARGET CHARACTER STRING
Atsugi-Shi glass repair

FIG. 18B
NEAR-SYNONYM DEFINITION FILE (8F)
Atsugi-Shi -> Atsugi
Glass -> stained glass, pane, window frame, window
Repair -> work, industry, factory, material, [illegible]

FIG. 18C
LIST OF RETRIEVED SHOP NAMES (14F)
[Columns: shop name, (TEL. NO.), (ADDRESS). The telephone numbers and addresses are masked placeholders in the drawing and are omitted here.]
ATSUGI GLASS
ATSUGI STAINED GLASS STUDIO
ATSUGI WINDOW FRAME INC.
ATSUGI AA GLASS
AA GLASS SHOP
AAA PANE SHOP
AA PANE MATERIALS
AA PANE SHOP (LTD.)
AA SPECIAL GLASS INDUSTRIES (A)
AA SPECIAL GLASS INDUSTRIES (B)
WINDOW FRAME INDUSTRIES AAA
NEAR-SYNONYM GENERATING METHOD

BACKGROUND OF THE INVENTION

The present invention generally relates to near-synonym generating methods, and more particularly to a near-synonym generating method which divides a character string which is to be retrieved into words and generates near-synonyms of the character string by combining near-synonyms which are extracted for each of the words.

The generation of near-synonyms is essential when retrieving various electronic documents with a high accuracy. The "near-synonym" is sometimes also referred to as a "quasi-synonym". The near-synonym of a certain word refers to a word which has the same or similar meaning as the certain word. The generation of near-synonyms is particularly effective when matters related to a certain theme are to be retrieved from a large scale database without omission.

An example of a conventional document retrieval using near-synonyms will be described with reference to FIG. 1. In FIG. 1, it is assumed for the sake of convenience that a character string which is to be retrieved (hereinafter simply referred to as a "target character string") is "holiday/application/member/number". A predetermined electronic document is retrieved using this target character string, and near-synonyms of the target character string within the electronic document are extracted. In the electronic document, a plurality of near-synonyms are included in the target character string as shown in FIG. 1.

In this case, the near-synonyms which are extracted as a result of the retrieval were conventionally the same character string as the target character string and the character string "holiday application member number" having a head which matches that of the target character string.

There is also another known document retrieval method which carries out the retrieval as follows. That is, the target character string "holiday application member number" is divided into words "holiday", "application", "member" and "number" which form this target character string, and the near-synonyms are extracted for each of these words. For example, near-synonyms "report", "employee" and "numeral" are respectively extracted as the near-synonyms of the words "application", "member" and "number". Such near-synonyms are defined in advance for each word. The electronic document is retrieved using a character string "holiday report employee numeral" which is obtained by combining the extracted near-synonyms, and a character string which is the same as this character string is extracted as the target character string of the near-synonyms.

Each element forming the target character string is called a "word", and a character string which is made up of a plurality of words is called a "compound word".

According to the conventional document retrieval methods described above, a character string which is the same as the target character string and a character string having a head which matches that of the target character string were extractable as near-synonyms.

However, there was a problem in that it was impossible to extract a character string having a head (or a part of the head) which does not match that of the target character string, a character string (different sounding synonyms) having completely different words and phrases (sounds) from those of the target character string but having the same meaning as the target character string and the like. For example, in the example shown in FIG. 1, it was impossible to extract "vacation notice member number" and "vacation report member No.".

On the other hand, according to the conventional document retrieval method which uses a combination of the near-synonyms that are extracted for each of the words obtained by dividing the target character string, the extraction was satisfactory to a certain extent depending on the definition of the near-synonyms for each of the words. However, there was a problem in that it was impossible to extract a character string in which the words and a part of their near-synonyms of the target character string are missing, a character string which is added with one or more words unrelated to the target character string, a character string having the words or near-synonyms arranged in a different order from that of the target character string and the like. For example, in the example shown in FIG. 1, it was impossible to extract "employee number" which is missing a part corresponding to "holiday" and "application", and "member holiday application number" in which a part corresponding to "holiday", "application" and "member" is ordered differently from the target character string.

In addition, according to the conventional document retrieval method which uses the combination of the near-synonyms of the words, there was a problem in that the operator must manually insert in the target character string an end symbol (character) "/" at the end of each word when the target character string is divided into the words. In other words, the operator must have a knowledge related to the words and the near-synonyms. In addition, if the target character string is long and contains a large number of words or, a large number of target character strings need to be input, the process of inserting the end symbol is troublesome and a big burden on the operator.

Therefore, the conventional document retrieval method realized a satisfactory document retrieval only to a certain extent, and the result of the extraction often omitted the necessary near-synonyms, as may be seen from the examples given above. For this reason, a highly accurate document retrieval could not be achieved by the conventional document retrieval methods. In other words, the conventional generation of the near-synonyms was unsuited or insufficient for the purposes of carrying out a retrieval with a high accuracy.

SUMMARY OF THE INVENTION

Accordingly, it is a general object of the present invention to provide a novel and useful near-synonym generating method in which the problems described above are eliminated.

Another and more specific object of the present invention is to provide a near-synonym generating method for generating near-synonyms of a target character string by retrieving a near-synonym file based on the target character string, where the near-synonym file defines near-synonyms for one or a plurality of words and the near-synonym generating method comprises the steps of (a) retrieving the near-synonym file using words which form the target character string as keys, and extracting near-synonyms which are defined for each of the words used as the keys as the near-synonyms for each of the words forming the target character string, (b) forming a near-synonym group from each of the words forming the target character string and the corresponding near-synonyms so as to form a plurality of such near-synonym groups, and selecting the words or near-synonyms from each of the near-synonym groups, and (c) generating the near-synonyms of the target character