Syntactic Patterns of Spatial Relations in Text

Shaonan Zhu
Rule-based Extraction of Spatial Relations in Natural Language Text
Nanjing city is in Jiangsu province.
Nanjing city (GNE) is in (SIGNAL) Jiangsu province (GNE).
GNE: geographical named entity
SIGNAL: spatial relation term (expresses a topological or directional relation)
• Define <left, GNE, middle, GNE, right> as the structure of an
instance. A spatial relation can be abstracted as a binary relation
between two geographical entities, with a clear target and
reference.
• Precondition: extraction is confined to a single sentence.
▫ In Chinese text, a spatial relation is usually expressed within one sentence.
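The five-part instance structure can be illustrated with a minimal Python sketch. The names `Instance` and `split_instance`, the `(word, tag)` token format, and the `PUNCT` tag are assumptions made for illustration; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag), e.g. ("Nanjing city", "GNE")


@dataclass
class Instance:
    """Hypothetical container for <left, GNE, middle, GNE, right>."""
    left: List[Token]
    gne1: Token
    middle: List[Token]
    gne2: Token
    right: List[Token]


def split_instance(tokens: List[Token]) -> Instance:
    """Split one tagged sentence around its first two GNE tokens."""
    gne_positions = [i for i, (_, tag) in enumerate(tokens) if tag == "GNE"]
    if len(gne_positions) < 2:
        raise ValueError("need at least two GNE tokens in the sentence")
    i, j = gne_positions[0], gne_positions[1]
    return Instance(
        left=tokens[:i],
        gne1=tokens[i],
        middle=tokens[i + 1:j],
        gne2=tokens[j],
        right=tokens[j + 1:],
    )


# The example sentence, already tagged (PUNCT is an assumed tag name).
sentence = [("Nanjing city", "GNE"), ("is", "V"), ("in", "SIGNAL"),
            ("Jiangsu province", "GNE"), (".", "PUNCT")]
inst = split_instance(sentence)
```

Here `inst.middle` holds the verb and the signal term, which is where most of the pattern information sits in this example.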
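[Figure: converting a sentence into a tag sequence. Diagram labels: POS (part of speech), GNE, SIGNAL; tools named: ICTCLAS, CRF, vocabulary. Example sentence: "Nanjing city is in Jiangsu province." Resulting sequence: GNE V SIGNAL GNE.]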
• The most important step: use the syntactic patterns to match the sequence.
Pattern: [GNE] [V] [SIGNAL] [GNE]
Sentence: Nanjing city is in Jiangsu province.
The extracted binary relation: GNE(Nanjing city), GNE(Jiangsu province), SIGNAL(in)
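The matching step might look like the following Python sketch. It assumes the sentence has already been converted to `(word, tag)` pairs. Treating the first GNE as the target and the last as the reference is an assumption that fits this example, not a rule stated in the slides. The function name `match_pattern` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag)

PATTERN = ["GNE", "V", "SIGNAL", "GNE"]


def match_pattern(tokens: List[Token],
                  pattern: List[str]) -> Optional[Dict[str, Optional[str]]]:
    """Slide the pattern over the tag sequence and return the first match."""
    tags = [tag for _, tag in tokens]
    n = len(pattern)
    for start in range(len(tags) - n + 1):
        if tags[start:start + n] == pattern:
            window = tokens[start:start + n]
            gnes = [word for word, tag in window if tag == "GNE"]
            signals = [word for word, tag in window if tag == "SIGNAL"]
            if len(gnes) < 2:
                continue  # a binary relation needs two entities
            return {
                "target": gnes[0],       # assumed: first GNE is the target
                "reference": gnes[-1],   # assumed: last GNE is the reference
                "signal": signals[0] if signals else None,
            }
    return None


sentence = [("Nanjing city", "GNE"), ("is", "V"), ("in", "SIGNAL"),
            ("Jiangsu province", "GNE")]
relation = match_pattern(sentence, PATTERN)
no_match = match_pattern([("Nanjing city", "GNE"), ("is", "V")], PATTERN)
```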
• The key to rule-based extraction of spatial relations is the
syntactic pattern.
…
How to get the syntactic pattern
• Summarized by experts
▫ In general, the syntactic patterns of spatial
relations are summarized manually.
• We introduce a useful way to find syntactic patterns:
▫ Start from a corpus annotated with spatial relations.
▫ Use a sequence alignment algorithm to calculate the
similarity between instances of spatial relations.
▫ Group instances of high similarity.
▫ Generalize each group to generate the syntactic patterns of
spatial relations.
Corpus
• The richness of the corpus directly affects information
extraction. We chose the Encyclopedia of China (Geography section) as
the original data. It contains thousands of spatial relation instances.
Similarity
• We extend the sequence alignment algorithm to handle the language
units in a sequence.
▫ For example, take two sequences:
【GNE1】【V】【GNE】【GNE2】 and 【GNE1】【V】【GNE2】.
[Figure: the process of sequence alignment]
[Figure: the result of sequence alignment]
• The similarity between two sequences is quantified by the formula
SIM = SUM / LENGTH
SUM is the score of the whole sequence: each position where the language
unit in the target sequence is the same as in the reference sequence adds
one point and is called a match. LENGTH is the sequence length, and SIM is
the resulting similarity. For the example above, the score is 0.75
(3 matches over a length of 4).
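As a rough illustration, the alignment and the similarity score can be sketched in Python. The slides do not name the alignment algorithm, so this sketch swaps in a standard Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -1), the `GAP` symbol, and the reading of LENGTH as the length of the aligned sequence including gaps are all assumptions.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol


def align(a: List[str], b: List[str],
          match: int = 1, mismatch: int = -1, gap: int = -1
          ) -> Tuple[List[str], List[str]]:
    """Needleman-Wunsch global alignment over language units (assumed scoring)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]


def similarity(a: List[str], b: List[str]) -> float:
    """SIM = SUM / LENGTH, with LENGTH taken as the aligned length (assumption)."""
    aligned_a, aligned_b = align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))  # SUM
    return matches / len(aligned_a)


seq1 = ["GNE1", "V", "GNE", "GNE2"]
seq2 = ["GNE1", "V", "GNE2"]
aligned1, aligned2 = align(seq1, seq2)
sim = similarity(seq1, seq2)  # 3 matches over aligned length 4
```

Under these assumptions, the second sequence receives a single gap opposite the extra `GNE`, which gives three matches over four positions and reproduces the 0.75 quoted in the slides.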
The similarity matrix
• Because similarity is computed between pairs of instances, the
results can be collected in a matrix. The similarity matrix for the
spatial relation instances shows the similarity between every pair of instances.
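Building the matrix could be sketched as follows. The builder takes any pairwise similarity function. The `positional_sim` function used in the demo is only a simple stand-in for an alignment-based score, not the measure from the slides. The sketch also assumes the measure is symmetric and that an instance has similarity 1.0 with itself.

```python
from typing import Callable, List, Sequence

Seq = Sequence[str]


def similarity_matrix(instances: List[Seq],
                      sim: Callable[[Seq, Seq], float]) -> List[List[float]]:
    """Fill an n x n matrix; only the upper triangle is computed (symmetry assumed)."""
    n = len(instances)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = sim(instances[i], instances[j])
    return matrix


def positional_sim(a: Seq, b: Seq) -> float:
    """Stand-in measure: position-wise matches over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


instances = [
    ["GNE", "V", "SIGNAL", "GNE"],
    ["GNE", "V", "GNE"],
    ["GNE", "SIGNAL", "GNE"],
]
matrix = similarity_matrix(instances, positional_sim)
```

Row `i` of the matrix is the list of similarities between instance `i` and every other instance.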
Generalization of syntactic patterns
1. Traverse the instances; for each instance, pick up the
list that corresponds to it in the similarity
matrix.
2. Get the most similar instances; the number of
instances is variable (1, 2, …, n).
3. Generalize the instances from step 2. If the result
contains language features, go back to step 2;
otherwise, go back to step 1.
4. Loop until all the instances have been traversed in step 1.
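One possible reading of the steps above is sketched below in Python. Several points are assumptions and not taken from the slides: the instances are taken to be already aligned to equal length with `-` as the gap symbol, generalization is done column by column with `*` as a wildcard, and "contains language features" is interpreted as "still contains a concrete unit other than GNE". All function names are hypothetical.

```python
from typing import List

WILDCARD = "*"  # assumed placeholder for positions that differ


def generalize(aligned: List[List[str]]) -> List[str]:
    """Keep a unit shared by every sequence in a column, else use the wildcard."""
    return [col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*aligned)]


def has_language_feature(pattern: List[str]) -> bool:
    """Assumed test: some concrete unit other than GNE survives."""
    return any(unit not in (WILDCARD, "GNE") for unit in pattern)


def induce_patterns(instances: List[List[str]],
                    matrix: List[List[float]]) -> List[List[str]]:
    patterns: List[List[str]] = []
    for i, seed in enumerate(instances):                 # step 1: take row i
        ranked = sorted((j for j in range(len(instances)) if j != i),
                        key=lambda j: matrix[i][j], reverse=True)
        best = None
        for k in range(1, len(ranked) + 1):              # step 2: top 1, 2, ..., n
            group = [seed] + [instances[j] for j in ranked[:k]]
            candidate = generalize(group)                # step 3
            if not has_language_feature(candidate):
                break                                    # back to step 1
            best = candidate                             # back to step 2
        if best is not None and best not in patterns:
            patterns.append(best)
    return patterns                                      # step 4: all rows visited


# Toy, pre-aligned instances and a hand-written similarity matrix.
instances = [
    ["GNE", "V", "SIGNAL", "GNE"],
    ["GNE", "V", "-", "GNE"],
    ["GNE", "-", "SIGNAL", "GNE"],
]
matrix = [
    [1.0, 0.75, 0.75],
    [0.75, 1.0, 0.5],
    [0.75, 0.5, 1.0],
]
patterns = induce_patterns(instances, matrix)
```

In this toy run, widening a group stops as soon as the generalized result would keep only the two GNE slots, so each seed keeps the last pattern that still carried a verb or a signal term.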
Result
• This method overcomes the shortcomings of manual induction and
uncovers hidden rules about spatial relations.
▫ More patterns
▫ Regularities in the Chinese language
Thank you very much.