Syntactic Patterns of Spatial Relations in Text
Shaonan Zhu

Rule-based Extraction of Spatial Relations in Natural Language Text
Nanjing city is in Jiangsu province.
Nanjing city (GNE) is in (SIGNAL) Jiangsu province (GNE).
GNE: geographical named entity
SIGNAL: spatial relation term (expresses a topological or directional relation)
• Define <left, GNE, middle, GNE, right> as the structure of an instance. A spatial relation can be abstracted as a binary relation between two geographical entities, with an apparent target and reference.
• Precondition: the extraction region is a single sentence.
  ▫ In Chinese text, a spatial relation is usually expressed within one sentence.

POS (part of speech) | GNE | SIGNAL
ICTCLAS | CRF | Vocabulary

Sentence: Nanjing city is in Jiangsu province.
Sequence: GNE V SIGNAL GNE.

• The most important step: use the syntactic patterns to match the sequence.
  [GNE] [V] [SIGNAL] [GNE]
  Nanjing city is in Jiangsu province.
  The binary relation can be shown as: GNE(Nanjing city), GNE(Jiangsu province), SIGNAL(in)
• The key to rule-based extraction of spatial relations is the syntactic pattern …

How to Get the Syntactic Pattern
• Summarized by experts
  ▫ In general, the syntactic patterns of spatial relations are summarized manually.
• We introduce a useful way to find syntactic patterns:
  ▫ Annotate a corpus with spatial relations;
  ▫ Use a sequence alignment algorithm to calculate the similarity between instances of spatial relations;
  ▫ Group instances of high similarity;
  ▫ Generalize each group to generate the syntactic patterns of spatial relations.

Corpus
• The richness of the corpus has a direct effect on the information extraction.
We choose the Encyclopedia of China (Geography section) as the original data. It contains thousands of spatial relation instances.

Similarity
• Extend the alignment algorithm to handle language units in the sequence.
  ▫ For example, take two sequences: 【GNE1】【V】【GNE】【GNE2】 and 【GNE1】【V】【GNE2】.
[Figure: the process of sequence alignment]
[Figure: the result of sequence alignment]
• The similarity between the sequences is quantified by a formula in which SUM is the score of the whole sequence, LENGTH is the sequence length, and SIM is the result. Each position where the language unit in the target sequence is the same as in the reference sequence adds one point to the score and is called a match. The example above scores 0.75, which is consistent with three matches over an aligned length of four (SIM = SUM / LENGTH).

The Similarity Matrix
• Because similarity is computed between pairs of instances, a matrix can be built. The similarity matrix for instances of spatial relations shows the similarity between all pairs of instances.

Generalization of Syntactic Patterns
1. Traverse the instances; pick an instance and the list that corresponds to it in the similarity matrix.
2. Get the most similar instances; the number of instances is variable (1, 2, …, n).
3. Generalize the instances from step 2. If the result contains a language feature, go back to step 2. Otherwise, go back to step 1.
4. Loop until all the instances have been traversed in step 1.

Result
• This method overcomes the shortcomings of manual induction and finds hidden rules about spatial relations.
  ▫ More patterns
  ▫ Regularities in the Chinese language

Thank you very much.
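As a supplement to the Similarity and similarity-matrix slides, the following is a minimal sketch of how the score could be computed. The slides do not name the alignment algorithm or its scoring parameters, so this sketch assumes a standard Needleman-Wunsch global alignment with match = +1 and mismatch = gap = -1, and assumes SIM = SUM / LENGTH with LENGTH taken as the aligned length. The function names `align`, `similarity`, and `similarity_matrix` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

GAP = "-"  # placeholder for a gap; assumed never to occur as a real tag


def align(a: List[str], b: List[str],
          match: int = 1, mismatch: int = -1, gap: int = -1
          ) -> Tuple[List[str], List[str]]:
    """Globally align two tag sequences (Needleman-Wunsch, assumed scoring)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


def similarity(a: List[str], b: List[str]) -> float:
    """SIM = SUM / LENGTH: matching positions over aligned length (assumed)."""
    al_a, al_b = align(a, b)
    if not al_a:
        return 0.0
    matches = sum(1 for x, y in zip(al_a, al_b) if x == y)
    return matches / len(al_a)


def similarity_matrix(instances: List[List[str]]) -> List[List[float]]:
    """Pairwise similarity between all instances."""
    return [[similarity(p, q) for q in instances] for p in instances]


target = ["GNE1", "V", "GNE", "GNE2"]
reference = ["GNE1", "V", "GNE2"]
print(align(target, reference))
print(similarity(target, reference))  # 0.75
```

Under these assumptions the example from the slides aligns as GNE1 V GNE GNE2 against GNE1 V - GNE2, giving three matches over a length of four. Other scoring choices could change the alignment, so the parameters should be treated as tunable rather than as the values used in the original work.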