US005761666A

United States Patent [19]
Sakai et al.

[11] Patent Number: 5,761,666
[45] Date of Patent: Jun. 2, 1998

[54] DOCUMENT RETRIEVAL SYSTEM

[75] Inventors: Tetsuya Sakai, Tokyo; Seiji Miike; Kazuo Sumita, both of Yokohama, all of Japan

[73] Assignee: Kabushiki Kaisha Toshiba, Kawasaki, Japan

[21] Appl. No.: [illegible]

[22] Filed: Mar. 4, 1996

[30] Foreign Application Priority Data
Mar. 16, 1995 [JP] Japan .................................. 7-083458

[52] U.S. Cl. ................................. 707/5

[58] Field of Search ............................... 395/605, 601, 395/759; 707/1, 3, 5, 100

[56] References Cited

U.S. PATENT DOCUMENTS
4,290,115   9/1981   Pin et al. ............................ 395/605
4,382,277   5/1983   Glaser et al. ......................... 395/605
5,123,103   6/1992   Ohtaki et al. ......................... 395/605

Primary Examiner: Paul R. Lintz
Attorney, Agent, or Firm: Oblon, Spivak, McClelland, Maier & Neustadt, P.C.

[57] ABSTRACT

A document retrieval system is provided with an original sentence processing unit which sets a plurality of sentence types for identifying the contents of sentences, such as "OPINION" and "PROPOSAL," prepares sentence-unit excerpt sentence data classified according to the sentence types from an original sentence database storing original sentence data constituting documents, and stores the excerpt sentence data as an excerpt sentence database. The original sentence processing unit comprises a type determination section for finding excerpt sentence data corresponding to a designated sentence type, and a shaping section for shaping the excerpt sentence data in a predetermined format, e.g., in such a format that a conjunctive is deleted.
15 Claims, 29 Drawing Sheets

[Representative drawing: block diagram. Legible labels: INTERFACE UNIT with a retrieval request interface and an EXCERPT INTERFACE (type/pattern request, retrieved original sentence, excerpt sentence); RETRIEVAL UNIT; EXCERPT UNIT; EXCERPT SENTENCE DATA BASE; ORIGINAL SENTENCE DATA BASE; WORD INDEX 7; PATTERN DETERMINATION DICTIONARY; ORIGINAL SENTENCE PROCESSING UNIT 20.]
U.S. Patent    Jun. 2, 1998    Sheet 1 of 29    5,761,666

FIG. 1 (block diagram). Legible labels: CPU; DISPLAY CONTROLLER; DISPLAY UNIT; INPUT CONTROLLER; INPUT UNIT (KEYBOARD 11a, MOUSE 11b); STORAGE UNIT containing EXCERPT SENTENCE DATA BASE 5, ORIGINAL SENTENCE DATA BASE 6, WORD INDEX 7, TYPE DETERMINATION DICTIONARY 8, and CONJUNCTIVE DICTIONARY 9.
Sheet 3 of 29

[Drawing not legible in this copy.]
Sheet 4 of 29

FIG. 4. ORIGINAL SENTENCE PROCESSING (flowchart)
S1: Read each sentence of document from original sentence data base.
S2: Pattern determination section determines pattern corresponding to each sentence and adds tag to each sentence (refers to PATTERN DETERMINATION DICTIONARY 8).
S3: Shaping section deletes conjunctive of each sentence (refers to CONJUNCTIVE DICTIONARY 9).
S4: Store the result as excerpt sentence data.
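The flow of FIG. 4 can be illustrated with a minimal sketch. It is not the patented implementation: the dictionary contents, the substring matching rule, the record layout, and the names `TYPE_PATTERNS`, `CONJUNCTIVES`, `determine_types`, `delete_conjunctive`, and `process_document` are all assumptions made for illustration.

```python
# Hypothetical miniature dictionaries; the patent's real dictionaries are not reproduced here.
TYPE_PATTERNS = {
    "OPINION/PROPOSAL": ["it is considered that", "should it not be that",
                         "it is proposed that", "i think that"],
    "EXPECTATION/GUESS": ["maybe", "i think that"],
}
CONJUNCTIVES = ["however,", "therefore,", "rather,"]  # assumed entries


def determine_types(sentence):
    """Step S2 (assumed rule): a type applies if any of its patterns occurs in the sentence."""
    lowered = sentence.lower()
    return [t for t, patterns in TYPE_PATTERNS.items()
            if any(p in lowered for p in patterns)]


def delete_conjunctive(sentence):
    """Step S3 (assumed rule): strip one leading conjunctive and re-capitalize."""
    lowered = sentence.lower()
    for conj in CONJUNCTIVES:
        if lowered.startswith(conj):
            rest = sentence[len(conj):].lstrip()
            return rest[:1].upper() + rest[1:]
    return sentence


def process_document(sentences):
    """Steps S1 to S4: tag each sentence, shape it, and return excerpt records."""
    records = []
    for number, sentence in enumerate(sentences, start=1):
        records.append({
            "no": number,                          # sentence number in the document
            "types": determine_types(sentence),    # tags added in S2
            "text": delete_conjunctive(sentence),  # shaped text from S3
        })
    return records
```

Each record keeps the sentence number so that a later step can map an excerpt back to its original sentence.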
Sheet 5 of 29

FIG. 5A
(TYPE)               (PATTERN)
OPINION/PROPOSAL     "It is considered that ..."
                     "Should it not be that ..."
                     "It is proposed that ..."
                     "I think that ..."
POSING OF PROBLEM    "The ... is a problem ..."
                     "Consideration will now be given to ..."
EXPECTATION/GUESS    "Maybe ..."
                     "I think that ..."

FIG. 5B
TYPE OF EXCERPT (PLURAL EXCERPT TYPES MAY BE DESIGNATED)
[x] OPINION/PROPOSAL
[ ] POSING OF PROBLEM
[x] EXPECTATION/GUESS
[ ] PAST/BACKGROUND
[ ] STATUS QUO
[ ] CONCLUSION

[Further diagram on this sheet, figure label not legible. Legible labels: ALL SENTENCES; OPINION/PROPOSAL; SENTENCE NOT INCLUDED IN DESIGNATED TYPE.]
Sheet 7 of 29

FIG. 7. EXCERPT PROCESS (flowchart)
S10: Read out excerpt sentence data corresponding to selected document (from EXCERPT SENTENCE DATA BASE 5).
S11: Extract from excerpt sentence data all sentences provided with tag of selected type.
S12: Store sentence No. of each extracted sentence.
S13: Display all excerpts on excerpt sentence screen in itemized format.
S14: Read out original sentence data of selected document (from ORIGINAL SENTENCE DATA BASE).
S15: Display original sentence on original sentence screen and emphasize sentence corresponding to stored sentence No.
END
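As a rough illustration of the excerpt process of FIG. 7, the sketch below filters tagged records by the selected types and marks the matching sentences in the original text. The record layout (`no`, `types`, `text`), the function name `excerpt_process`, and the `>> ` emphasis marker are hypothetical choices, not taken from the patent.

```python
def excerpt_process(excerpt_records, original_sentences, selected_types):
    """Return (excerpt screen lines, original screen lines) for one document."""
    selected = set(selected_types)
    # S11: keep every sentence tagged with at least one selected type.
    hits = [r for r in excerpt_records if selected & set(r["types"])]
    # S12: remember the sentence numbers of the extracted sentences.
    hit_numbers = {r["no"] for r in hits}
    # S13: itemized excerpt screen.
    excerpt_screen = ["- " + r["text"] for r in hits]
    # S14 and S15: original sentences, with extracted ones emphasized.
    original_screen = [
        (">> " + s) if no in hit_numbers else s
        for no, s in enumerate(original_sentences, start=1)
    ]
    return excerpt_screen, original_screen
```

Storing sentence numbers rather than text is what lets the original sentence screen emphasize the right sentence even though the excerpt text has been shaped.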
Sheet 8 of 29

FIG. 8A. PATTERN DISPLAY (flowchart)
S20: Select type.
S21: Extract all patterns corresponding to selected type (from TYPE DETERMINATION DICTIONARY 8).
S22: Display patterns corresponding to selected type in table format.
END

FIG. 8B
TYPE: OPINION/PROPOSAL
PATTERN:
"It is considered that ..."
"Should it not be that ..."
"It is proposed that ..."
"I think that ..."
Sheet 9 of 29

FIG. 9 (flowchart; title not legible)
Read out excerpt sentence data of selected document (from EXCERPT SENTENCE DATA BASE 5).
Extract all sentences provided with tag of selected type.
S32: Extract important sentence including retrieval keyword.
S33: Display all excerpts on excerpt screen in itemized format.
END
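A minimal sketch of the keyword step in FIG. 9 follows. It assumes that an "important sentence" is simply a tagged sentence containing the retrieval keyword as a case-insensitive substring; the patent text available here does not define the test, and the function name `extract_important` and the record layout are hypothetical.

```python
def extract_important(excerpt_records, selected_type, keyword):
    """Keep sentences of the selected type that also contain the keyword."""
    # Sentences provided with the tag of the selected type.
    tagged = [r for r in excerpt_records if selected_type in r["types"]]
    # S32 (assumed rule): case-insensitive substring match on the keyword.
    needle = keyword.lower()
    return [r["text"] for r in tagged if needle in r["text"].lower()]
```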
Sheet 11 of 29

FIG. 11. DISPLAY OF SPECIFIC SENTENCE (flowchart)
S40: Select type.
S41: Read out all patterns corresponding to selected type (from TYPE DETERMINATION DICTIONARY 8).
S42: Display each pattern and number of sentences matching with each pattern.
S43: Select pattern.
S44: Read out excerpt sentence data corresponding to selected type and pattern (from EXCERPT SENTENCE DATA BASE 5).
S45: Display excerpt sentence data in excerpt screen in itemized format.
END
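Steps S42 and S44 of FIG. 11 amount to counting and then filtering sentences per pattern. The sketch below shows one way to do that, assuming substring matching; the function names `count_by_pattern` and `sentences_for_pattern` are hypothetical.

```python
def count_by_pattern(sentences, patterns):
    """S42: number of sentences matching each pattern (assumed substring match)."""
    counts = {p: 0 for p in patterns}
    for sentence in sentences:
        lowered = sentence.lower()
        for p in patterns:
            if p.lower() in lowered:
                counts[p] += 1
    return counts


def sentences_for_pattern(sentences, pattern):
    """S44: sentences matching the single pattern the user selected."""
    needle = pattern.lower()
    return [s for s in sentences if needle in s.lower()]
```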
Sheet 13 of 29

CONTROL OF AMOUNT (flowchart; figure label not legible)
S50: Designate amount n of sentences (number of sentences) extracted and displayed.
Extract patterns corresponding to selected type and priority data (from TYPE DETERMINATION DICTIONARY).
S52: I = 1
S53: Compare amount of sentences matching with patterns up to priority [remainder not legible].
S56: [partly legible] ... with patterns of priority ...
END
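The comparison step of this flowchart is not legible, so the following is only one plausible reading, offered as a sketch: patterns are taken in priority order, and a priority level is added only while the running total of matching sentences stays within the designated amount n. The stopping rule, the substring matching, and the name `select_within_amount` are assumptions, not the patent's stated method.

```python
def select_within_amount(sentences, prioritized_patterns, n):
    """prioritized_patterns lists patterns with the highest priority first."""
    chosen = []  # indices of selected sentences
    for pattern in prioritized_patterns:
        needle = pattern.lower()
        matches = [i for i, s in enumerate(sentences)
                   if needle in s.lower() and i not in chosen]
        # Assumed rule: stop before the total would exceed the amount n.
        if len(chosen) + len(matches) > n:
            break
        chosen.extend(matches)
    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

Under this reading, lowering n removes the lowest-priority patterns first, which matches the idea of a priority order stored with each pattern.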
Sheet 14 of 29

FIG. 14A. TYPE DETERMINATION DICTIONARY
(TYPE)             (PRIORITY ORDER)   (PATTERN)
OPINION/PROPOSAL   1                  "It is considered that ..."
                   2                  "Should it not be that ..."
                   3                  "It is proposed that ..."
                   4                  "I think that ..."

FIG. 14B. NUMBER OF SENTENCES MATCHING WITH PATTERN IN DOCUMENT
"It is considered that ..."     [count not legible]
"Should it not be that ..."     [count not legible]
"It is proposed that ..."       [count not legible]
"I think that ..."              [count not legible]

FIG. 14C. EXCERPT SENTENCE SCREEN
NUMBER OF SENTENCES [partly legible]: WITHIN [value not legible]
- The intention of AAA is considered to be ...
- Rather, ... should resign.
- As for OOO, it is considered that ...
Sheet 15 of 29

FIG. 15. UPDATING PROCESS (flowchart; step text not legible)
END

FIG. 18. CORRECTION OF DEGREE OF PRIORITY (flowchart)
S70: Display excerpt sentence data accompanied with priority order.
[Remaining steps not legible.]
END
Sheet 16 of 29

FIG. 16A
TYPE: OPINION/PROPOSAL
ADDITIONAL PATTERNS: should ...
PATTERN:
"It is considered that ..."
"Should it not be that ..." [overprinted in the drawing]
"It is proposed that ..."
"I think that ..."

FIG. 16B. UPDATING OF TYPE DETERMINATION DICTIONARY
PATTERN:
"It is considered that ..."
"Should it not be that ..."
[hatched entry, text not legible]
"It is proposed that ..."
"I think that ..."
Sheet 17 of 29

FIG. 17A. ORIGINAL SENTENCE SCREEN
[Three lines labeled SENTENCE; one word, apparently "should", is hatched.]

FIG. 17B. DICTIONARY REGISTRATION SCREEN
REGISTERED WORD: "should ..."
IN WHICH TYPE OF PATTERN WILL THE WORD BE REGISTERED?
[x] OPINION/PROPOSAL
[ ] POSING OF PROBLEM
[ ] EXPECTATION/GUESS
Sheet 18 of 29

FIG. 19A
(TYPE)             (PRIORITY ORDER)   (PATTERN)
OPINION/PROPOSAL   1                  "It is considered that ..."
                   2                  "Should it not be that ..."
                   3                  "It is proposed that ..."
                   4                  "I think that ..."

FIG. 19B
- The intention of AAA is considered to be ...
- Rather, ... should resign.
- As for OOO, it is considered that ...

FIG. 19C
(TYPE)             (PRIORITY ORDER)   (PATTERN)
OPINION/PROPOSAL   2                  "It is considered that ..."
                   1                  "Should it not be that ..."
                   3                  "It is proposed that ..."
                   4                  "I think that ..."
Sheet 19 of 29

FIG. 20. DOCUMENT RETRIEVAL (flowchart)
S80: Select type and pattern.
S81: Retrieve plural documents.
S82: Sort retrieval result in the order beginning with one including many relevant sentences on the basis of selected type and pattern.
S83: Display retrieved result in the sorted order.
END
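The sorting step S82 of FIG. 20 can be sketched as a ranking by the number of relevant sentences per document. The sketch assumes that a sentence is "relevant" when it contains the selected pattern as a substring; the data layout and the name `rank_documents` are hypothetical.

```python
def rank_documents(documents, pattern):
    """documents maps a document name to its list of sentences.

    Returns the names ordered so that documents with more sentences
    matching the pattern come first.
    """
    needle = pattern.lower()

    def relevant_count(name):
        # Count sentences in this document that contain the pattern.
        return sum(needle in s.lower() for s in documents[name])

    # sorted() is stable, so documents with equal counts keep their input order.
    return sorted(documents, key=relevant_count, reverse=True)
```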
FIG. 23. SELECTION OF ALL SENTENCES (flowchart)
S90: Select type.
[Decision step; condition not legible. A YES label is visible.]
S92: Extract excerpt sentence data.
S93: Extract all sentences with tag of type.
S94: Display in itemized format.
S95: Extract original data.
S96: Display data.
END
[Two data base symbols, numbered 5 and 6, appear beside the flow.]