Research on Semantic-Based Passive Transformation
in Chinese-English Machine Translation
Wenfei Chang, Zhiying Liu, and Yaohong Jin
Institute of Chinese Information Processing, Beijing Normal University, Beijing, China
[email protected], {liuzhy,jinyaohong}@bnu.edu.cn
Abstract. Passive voice is widely used in English, and even more so in patent
documents, while it is used far less in Chinese. This difference requires the voice
to be transformed in Chinese-English machine translation so that the output reads
smoothly and naturally. Previous studies in this field are based on statistics, but
their results are not very good. In this paper we propose a strategy for
Chinese-English passive voice transformation from a semantic perspective. By
analyzing sentences, we summarize a series of transformation rules and then test
them in our system. Experimental results show that the transformation rules achieve
an overall precision of 89.1%.
Keywords: passive voice, patent documents, machine translation, transformation
rules.
1 Introduction
Voice refers to the way a language expresses the relationship between a verb and a
noun phrase [1]. There are two types: active voice and passive voice. Active voice
indicates that the subject is the agent of the action; passive voice indicates that
the subject is the patient of the action. Both Chinese and English have passive
sentences, but they differ considerably in grammatical concept, structural form,
typical usage, and semantic roles. In English, the passive voice is used when the
agent is unknown, is inconvenient to state, or can be inferred from the context. It
is also adopted when the sentence emphasizes the event or action itself rather than
the agent. In Chinese, however, the active voice is used in most cases, except when
the sentence expresses unhappiness or dissatisfaction. As a result, passive voice is
widely used in English but much less in Chinese. These differences require us to
transform the voice in order to make the translation smoother and more natural.
With the rapid development of the world economy, technical knowledge is updated
faster than ever. According to the World Intellectual Property Organization (WIPO),
patent applications have increased year by year and reached 1.8 million in 2010.
Most applications come from China or Europe and take effect in these regions. To
better protect the interests of applicants, several major intellectual property
offices are actively exploring how to improve machine translation.
G. Zhou et al. (Eds.): NLPCC 2013, CCIS 400, pp. 346–354, 2013.
© Springer-Verlag Berlin Heidelberg 2013
As official and juridical documents, patent documents tend to follow fixed formats,
which makes them suitable for machine translation (MT). However, present MT systems
lack a good strategy for passive transformation, which greatly degrades overall MT
quality.
Statistics compiled by the Writing Center of the University of Delaware show that
the passive form accounts for 65% of all predicate verbs in scientific and technical
writing [2]. According to [3], passive voice is one of the most important
characteristics of English: one third, or even more than one half, of the verbs in
scientific and technical texts appear in the passive voice. An analysis of 500
Chinese-English bilingual patent abstracts in [4] found that only 22 of the English
abstracts contained no passive voice, which means that more than 95% of English
patent abstracts use it. It is therefore essential to explore passive translation
methods for Chinese-English patent machine translation.
The remainder of this paper is organized as follows. Section 2 discusses related
work. Section 3 presents a semantic analysis of the passive voice. Section 4
describes the transformation process. Section 5 presents the experiments and
discussion. Finally, Section 6 gives conclusions and outlines future work.
2 Related Work
Research on passive transformation falls mainly into two fields: traditional
linguistics and information processing.
In traditional linguistics, many papers have recognized that passive voice is
widely used in English, especially in science and technology. Some researchers
[5][6] have observed that only transitive verbs can be used in the passive voice;
moreover, the verb must express an action and be followed by an object. The
differences between English and Chinese are analyzed in [2], whose authors propose
following the habits of the target language and transforming the voice wherever
possible. They also present six methods for transforming voice, but they focus
mainly on translation from English into Chinese. The similarities and differences
between the constituents of Chinese and English passive sentences are discussed in
[7], which describes the situations in which the voice should be transformed by
analyzing the features of the subject, object, predicate, and passive preposition in
the sentence.
Although these studies examine passive transformation in depth, most of them take
the perspective of the human translator rather than the machine, so they do not
apply directly to machine translation.
In the information processing field, some researchers have put forward translation
methods from the perspective of lexical semantics and syntactic structure [8][9],
and [10] presents a method for handling passive transformation based on Case
Grammar. However, related work in this field is still limited.
Moreover, most present MT systems are statistical. Among them, Google Translator
(Google for short) is the best. We therefore selected some sentences from patent
documents and submitted them to Google to check its output.
Example 1
根据各齿轮的旋转,夹持光盘,并装载托盘12。
Reference¹: In accordance with the revolutions of the combined gears, an optical
disk [is chucked], and a tray [is loaded].
Google: According to the rotation of the gears, the [clamping] disc, and the
[loading] tray 12.
Example 2
这个字通过在光学领域内执行逐个比特的布尔“与”运算来识别。
Reference: The word [is recognized] by carrying out in the optical domain a
bit-wise Boolean “AND” operation.
Google: The word through the implementation of the bit by bit in the optical field
within the Boolean “and” operation to [identify].
The subject is omitted in Example 1, and the object is omitted in Example 2. In
these cases, the words marked in brackets should be rendered in the passive voice
according to English usage, but the results show that Google failed to do so. After
testing several kinds of sentences, we found that its passive transformation
accuracy is low. Thus, although the statistical method is the mainstream, it
currently lacks a good strategy for passive transformation. The results suggest
that it is difficult to translate long patent sentences well without syntactic and
semantic analysis.
Hence, in this paper we propose, from a semantic perspective, a systematic
processing strategy composed of a series of rules tailored to the features of
patent documents, which greatly improves MT output.
3 Semantic Analysis of the Passive Voice
In English, the structure “be + V-ed” marks a sentence as passive. In Chinese,
however, many passive meanings are expressed in active form, so deciding whether a
sentence should be translated as a passive in a Chinese-English MT system cannot
rely on the passive mark alone; the semantics of the sentence must also be
examined. Sentences with a passive mark are only one kind of sentence that should
be transformed; many kinds of sentences without a passive mark should be
transformed as well. All of them should use the passive voice when translated into
English. Different transformation methods are adopted depending on whether a
passive mark can be found in the sentence.
3.1 Sentences with Passive Mark in Chinese
In Chinese, the prepositions BEI (被) and SUO (所) are used to mark the passive
voice, but they differ in usage.
¹ The bilingual corpus is provided by the China Patent Information Center.
• Passive Mark BEI
BEI is an unconditional transformation mark: whenever BEI appears before a verb in
the sentence, whether or not it is immediately adjacent to the verb, the passive
voice is used in the English translation.
1) Patient + BEI + Verb: In this kind of sentence, BEI immediately precedes the
verb with nothing between them. The order of the language blocks in the sentence
remains unchanged when translated into English.
Example 3
因此提交订单的交易者将被通知成交。
(Thereby the trader that sent in the order will be informed about the deal.)
2) Patient + BEI + … + Verb: In this kind of sentence, an agent, an adverb, or
other components may stand between BEI and the verb. The order of the language
blocks remains unchanged here as well.
Example 4
如图中可见的,排列单元被匹配单元分离并连接到输入机构3。
(As can be seen in the figure the ranking unit is separated by the matching unit and
connected to the input mechanism 3.)
• Passive Mark SUO
SUO is also a mark of the passive voice. Unlike BEI, it allows nothing between
itself and the verb. Therefore, if SUO is located immediately before a verb in
Chinese, the verb should be transformed into the passive form in the English
translation.
Example 5
因此,它不需要处理在第一排列单元所接收的并且不是最优排列的订单。
(Hence, it does not need to handle the order that was received at the first
ranking unit and which was not top ranked.)
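The difference between the two marks can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sentence is given as hand-segmented (word, tag) pairs with "v" marking verbs, and the function name `find_passive_mark` is hypothetical. The authors' system works on HNC semantic chunks rather than part-of-speech tags.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag); "v" marks a verb (assumed tag set)


def find_passive_mark(tokens: List[Token]) -> Optional[int]:
    """Return the index of the verb governed by a passive mark, or None.

    BEI (被) counts for the first verb after it, even if other
    components intervene; SUO (所) counts only when it stands
    immediately before the verb.
    """
    for i, (word, _) in enumerate(tokens):
        if word == "被":
            # BEI: an agent or adverb may stand between mark and verb.
            for j in range(i + 1, len(tokens)):
                if tokens[j][1] == "v":
                    return j
        elif word == "所":
            # SUO: nothing may intervene between mark and verb.
            if i + 1 < len(tokens) and tokens[i + 1][1] == "v":
                return i + 1
    return None


# Fragment of Example 4, segmented by hand for illustration.
example4 = [("排列单元", "n"), ("被", "p"), ("匹配单元", "n"), ("分离", "v")]
# Fragment of Example 5, segmented by hand for illustration.
example5 = [("第一排列单元", "n"), ("所", "u"), ("接收", "v"), ("的", "u"), ("订单", "n")]

print(find_passive_mark(example4), find_passive_mark(example5))
```

In the Example 4 fragment the agent 匹配单元 stands between BEI and the verb and the verb is still found, while a SUO that is not directly followed by a verb yields no match.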
3.2 Sentences without Passive Mark in Chinese
A statistical analysis of 1000 sentences shows that sentences that lack a passive
mark but should be translated into the passive voice account for as much as 61% of
them. The data are shown in Table 1.
Table 1. Classification of Passive Sentence
Type                              Sentence number   Proportion
Sentences with passive mark       390               39%
Sentences without passive mark    610               61%
Table 1 shows that most passive sentences carry no passive mark in Chinese, so it
is difficult for MT systems to recognize the passive meaning and render the verb in
the passive voice. Although these sentences are difficult to identify, handling
them is important for improving transformation accuracy. Consequently, they are the
focus of our research.
Our research is based on the Hierarchical Network of Concepts theory (HNC theory)
[11], a theory of natural language understanding from a semantic perspective. HNC
views language processing as a mapping from the natural language space to the
language concept space. Language concepts are divided into two categories: action
concepts (denoted GX) and effect concepts (denoted GY), where the action is the
cause and the effect is the result [12]. According to the concept category of the
main verb, sentences are classified into two categories: global action sentences
and global effect sentences. The work in this section is based on this division.
• Action Sentence
The verb in a global action sentence mainly expresses that one participant exerts
force on another. Generally, sentences of this category need not be transformed
into the passive voice if their components are complete. However, when a component
is omitted or a preposition stands immediately next to the main verb, the sentence
should be transformed into the passive voice.
Component ellipsis in sentence. The complete sentence structure is SVO in both
Chinese and English. However, sentences without a subject or object are frequent in
Chinese, in which case the structure becomes “V+NP” or “NP+V”. In these structures,
NP acts as the patient of the action, so the sentence should be transformed into
the passive voice when translated into English.
“Verb+Prep” structure in sentence. The compound structure formed by the main verb
and an immediately adjacent preposition describes an objective phenomenon. The
subject of such a sentence is no longer the agent but the patient of the action, so
the sentence should be transformed into the passive voice in translation.
• Effect Sentence
Unlike action sentences, effect sentences have no agent or patient; they describe
an objective phenomenon. However, when the verb expresses a strong resultative
meaning, the word itself implies an agent, so it should also be translated into the
passive voice. To handle this case, we add the property “ALL_PASS” to the knowledge
base to provide information for the MT system. Whenever the main verb has the
property “ALL_PASS[Y]”, it is transformed into the passive voice during
translation.
4 Transformation Rules and Algorithm
Based on the situations described above, we draw up a series of rules for passive
transformation in the MT system.
4.1 Transformation Rules
• Transformation with Passive Mark in Chinese
There are two main rules in this part, following [13].
Rule 1:
(b){(-1)CHN[被]}+(0)LC_CHK[E,EG,EP]=>DEL_NODE(-1)+COPY[-1,0]+(0){VOI=P}$
Rule 2:
(-1)CHN[所]&LC_CHK[QE]+(0)LC_CHK[E,EG,EP]=>DEL_NODE(-1)+(0){VOI=P}$
Rule 1 means that if the preposition BEI (被) is found by the search (b)² before E,
EG, or EP³, regardless of whether it is immediately adjacent to node 0, then BEI
(被) is deleted, the components between BEI (被) and node 0 are copied, and node 0
is transformed into the passive voice.
Example 6
一条指定水平线的像素数据的扫描级被有次序地存储在一个地址存储器中。
(A scanning level of pixel data for a given horizontal line is regularly
stored in an address memory.)
Rule 2 means that if SUO (所) acts as QE⁴ and is immediately adjacent to node 0,
then SUO (所) is deleted and node 0 is transformed into the passive voice.
Example 7
图像传感器装置所测定的色彩范围取决于光源的色彩。
(The range of colors measured by an image sensor device depends on the color of the
illuminant.)
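The effect of Rules 1 and 2 on a node sequence can be sketched as follows. The `Node` class, its fields, and the function names are hypothetical stand-ins for the system's internal representation, which the paper does not describe; the sketch only mirrors the stated behavior (delete the mark, keep the intervening nodes, set the verb's voice to passive).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    text: str
    chunk: str = ""   # simplified HNC chunk label, e.g. "E", "EG", "EP", "QE"
    voice: str = "A"  # "A" = active, "P" = passive


VERB_CHUNKS = {"E", "EG", "EP"}


def apply_rule1(nodes: List[Node]) -> List[Node]:
    """Rule 1 sketch: BEI anywhere before the verb. Delete BEI, keep the
    nodes between BEI and the verb, and mark the verb passive."""
    for v, node in enumerate(nodes):
        if node.chunk in VERB_CHUNKS:
            for b in range(v - 1, -1, -1):  # search toward the sentence start
                if nodes[b].text == "被":
                    node.voice = "P"
                    return nodes[:b] + nodes[b + 1:]
    return nodes


def apply_rule2(nodes: List[Node]) -> List[Node]:
    """Rule 2 sketch: SUO labeled QE immediately before the verb. Delete
    SUO and mark the verb passive."""
    for v in range(1, len(nodes)):
        prev, node = nodes[v - 1], nodes[v]
        if node.chunk in VERB_CHUNKS and prev.text == "所" and prev.chunk == "QE":
            node.voice = "P"
            return nodes[:v - 1] + nodes[v:]
    return nodes


# Fragment of Example 6, segmented and labeled by hand for illustration.
ex6 = [Node("扫描级"), Node("被"), Node("有次序地"), Node("存储", "E"),
       Node("在一个地址存储器中")]
out6 = apply_rule1(ex6)

# Fragment of Example 7, segmented and labeled by hand for illustration.
ex7 = [Node("图像传感器装置"), Node("所", "QE"), Node("测定", "E"),
       Node("的"), Node("色彩范围")]
out7 = apply_rule2(ex7)
```

The chunk labels in the two fragments are assigned by hand here; in the real system they would come from the HNC analysis step.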
• Transformation without Passive Mark in Chinese
For action sentences, we give different transformation rules for different
situations. Several examples are given below.
Rule 3:
(-1){BEGIN%}+(b){!LC_CHK[GBK]}+(0){LC_CHK[E,EG,EP]&LC_SC_KEY[GX]&!CHN[使,具有,使得]}+(1)LC_CHK[GBK]=>(-1)+COPY[-1,0]+(1)+(0){VOI=P}$
Rule 3 means that if the verb belongs to [GX]⁵, is not one of the words “使”,
“具有”, or “使得”, and no GBK⁶ can be found before it, then node (1) is moved in
front of the verb and the verb is transformed into the passive voice in
translation.
Example 8
在外壳118中在叶片120的径向向内的位置处形成环形凹槽122。
(An annular recess 122 is formed in housing 118 radially inward of blade 120.)
Rule 4:
(b){(-1)BEGIN%}+(b){!LC_CHK[L0]}+(0)LC_CHK[E,EG,EP]&LC_SC_KEY[GX]+(1){END%}=>(-1)+COPY[-1,0]+(0){VOI=P}+(1)$
Rule 4 means that if the verb belongs to [GX], no L0⁷ can be found before it, and
it is located at the end of the sentence, then the verb is transformed into the
passive voice.
² (b) means looking for something forward.
³ E, EG, and EP are HNC terms for the verb in the sentence.
⁴ QE is an HNC term for the modifier of E.
⁵ GX means action concept.
⁶ GBK is short for general object chunk.
⁷ L0 is an HNC term for the mark of the main semantic chunk.
Rule 5:
(0)LC_CHK[E]+(1)CHN[至,到,给,于,成]&LC_CHK[HV]=>(0){VOI=P}+DEL_NODE(1)+ADD_NODE(ENG=[to])$
Rule 5 means that if a preposition immediately follows E and acts as HV⁸, the verb
is transformed into the passive voice and HV is replaced by the English word “to”
in translation.
Example 9
在步骤505中,已标准化的像素数据子集投射到色空间子集中。
(In step 505, the normalized pixel data subset is projected into the color space subset.)
For effect sentences, we use the information in the knowledge base to determine
whether to transform the voice. One rule is used to invoke this information.
Rule 6:
(0)LC_CHK[E,EG,EP]&LC_SC_KEY[ALL_PASSIVE]=>(0){VOI=P}$
Example 10
具有预定形状的反光板形成于一下壳体中。
(A reflection plate with a predetermined shape is formed inside a lower casing.)
Rule 6 means that if the verb is tagged “ALL_PASSIVE” in the knowledge base, it is
transformed into the passive voice.
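Rules 5 and 6 can be illustrated with a small Python sketch. Nodes are modeled as plain dictionaries, and the function names, the dictionary keys, and the one-entry knowledge base are all assumptions made for illustration; the actual rule engine and knowledge base format are not given in the paper.

```python
from typing import Dict, List, Set

Node = Dict[str, str]

HV_WORDS = {"至", "到", "给", "于", "成"}  # characters listed in Rule 5
VERB_CHUNKS = {"E", "EG", "EP"}


def apply_rule5(nodes: List[Node]) -> List[Node]:
    """Rule 5 sketch: verb (E) followed directly by an HV suffix. The verb
    becomes passive and the HV node is replaced by English 'to'."""
    out: List[Node] = []
    i = 0
    while i < len(nodes):
        node = dict(nodes[i])  # copy so the input is left untouched
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (node.get("chunk") == "E" and nxt is not None
                and nxt.get("chunk") == "HV" and nxt["text"] in HV_WORDS):
            node["voice"] = "P"
            out.append(node)
            out.append({"text": "to", "lang": "ENG"})  # stands in for HV
            i += 2
            continue
        out.append(node)
        i += 1
    return out


def apply_rule6(nodes: List[Node], all_passive_verbs: Set[str]) -> List[Node]:
    """Rule 6 sketch: a verb tagged ALL_PASSIVE in the knowledge base
    becomes passive."""
    out: List[Node] = []
    for original in nodes:
        node = dict(original)
        if node.get("chunk") in VERB_CHUNKS and node["text"] in all_passive_verbs:
            node["voice"] = "P"
        out.append(node)
    return out


# Fragment of Example 9, segmented and labeled by hand for illustration.
ex9 = [{"text": "像素数据子集"}, {"text": "投射", "chunk": "E"},
       {"text": "到", "chunk": "HV"}, {"text": "色空间子集"}]
out9 = apply_rule5(ex9)

# Fragment of Example 10; the knowledge base entry is assumed, not quoted.
ex10 = [{"text": "反光板"}, {"text": "形成", "chunk": "E"}]
out10 = apply_rule6(ex10, all_passive_verbs={"形成"})
```

Keeping the knowledge base as a separate argument reflects the paper's point that Rule 6 depends entirely on lexical information, so a wrong tag there leads directly to a wrong transformation.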
4.2 Algorithm
Based on the features of the transformation rules, we design the following semantic
procedure for passive transformation in the MT system:
Step 1: Determine whether there is a passive mark in the Chinese sentence. If yes,
go to Step 6; if no, go to Step 2.
Step 2: Determine the concept category of the predicate verb. If GX, go to Step 3;
if GY, go to Step 5.
Step 3: Determine whether there is a component ellipsis in the sentence. If yes, go
to Step 6; if no, go to Step 4.
Step 4: Determine whether the sentence contains the “Verb + Prep” structure. If
yes, go to Step 6; if no, go to end.
Step 5: Determine whether the main verb has the property ALL_PASS[Y]. If yes, go to
Step 6; if no, go to end.
Step 6: Transform the verb into the passive voice.
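The six steps amount to a small decision procedure, sketched below in Python. The `SentenceInfo` class and its field names are hypothetical; they assume that the earlier analysis stages have already produced one boolean or label per step.

```python
from dataclasses import dataclass


@dataclass
class SentenceInfo:
    has_passive_mark: bool       # BEI or SUO found (Step 1)
    verb_category: str           # "GX" (action) or "GY" (effect) (Step 2)
    has_ellipsis: bool = False   # subject or object omitted (Step 3)
    has_verb_prep: bool = False  # "Verb + Prep" structure (Step 4)
    all_pass: bool = False       # ALL_PASS[Y] in the knowledge base (Step 5)


def should_passivize(s: SentenceInfo) -> bool:
    """Return True when the procedure reaches Step 6 (transform the verb)."""
    if s.has_passive_mark:                  # Step 1
        return True
    if s.verb_category == "GX":             # Step 2, then Steps 3 and 4
        return s.has_ellipsis or s.has_verb_prep
    if s.verb_category == "GY":             # Step 2, then Step 5
        return s.all_pass
    return False                            # unknown category: leave active


# An action sentence with an omitted subject, as in Example 1.
print(should_passivize(SentenceInfo(False, "GX", has_ellipsis=True)))
```

An action sentence with all components present and no adjacent preposition falls through every test and is left in the active voice, which matches the "go to end" branches of Steps 4 and 5.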
5 Experiments and Result Analysis
5.1 Experiments
In this experiment, we randomly selected 1000 sentences and submitted them to our
rule-based system (RB for short) to test the transformation. We also tested them in
Google. Three types of data were counted; the detailed figures are shown in
Table 2.
⁸ HV is an HNC term for the verb suffix.
Table 2. Types of data
Type     Total number   Should be transformed   Transformed   Correctly transformed
RB       1000           632                     540           481
Google   1000           632                     515           430
Precision (P) and Recall (R) were then calculated; the results are shown in
Table 3.
Table 3. Result of transformation
System   Precision   Recall
RB       89.1%       76.1%
Google   83.4%       68.1%
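The paper does not spell out the formulas, so the following sketch assumes the standard definitions: precision is the number of correctly transformed sentences divided by the number transformed, and recall is the number correctly transformed divided by the number that should be transformed. The function name `precision_recall` is illustrative. Under these assumptions the counts in Table 2 reproduce the RB figures in Table 3, while the recomputed Google figures differ from the reported ones by about 0.1 percentage point, presumably because of rounding.

```python
from typing import Dict, Tuple


def precision_recall(should: int, transformed: int, correct: int) -> Tuple[float, float]:
    """Standard precision and recall (assumed definitions, see lead-in)."""
    precision = correct / transformed  # correct among those transformed
    recall = correct / should          # correct among those that should be
    return precision, recall


# Counts from Table 2: (should be transformed, transformed, correctly transformed)
counts: Dict[str, Tuple[int, int, int]] = {
    "RB": (632, 540, 481),
    "Google": (632, 515, 430),
}

for name, (should, transformed, correct) in counts.items():
    p, r = precision_recall(should, transformed, correct)
    print(f"{name}: P = {p:.1%}, R = {r:.1%}")
```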
Table 3 shows that our system achieves higher Precision and Recall than Google,
with Precision reaching 89.1% overall. The result indicates that our method can
effectively improve translation performance in a Chinese-English machine
translation system.
5.2 Result Analysis
Although our system achieves good results, there is still room for improvement.
Analyzing the errors reveals four main causes: a) The rules do not cover all kinds
of linguistic phenomena. b) In effect sentences, passive transformation relies
mainly on the information in the knowledge base, so a verb wrongly tagged
“ALL_PASS[Y]” will be wrongly transformed. c) Our work is verb-based; if the verb
in the sentence is wrongly recognized, the right transformation rule will not be
matched. This is the main cause of the low Recall. d) The system may leave some
sentences unanalyzed, so the transformation cannot proceed.
6 Conclusions and Future Work
Passive voice is widely used in English patent documents but much less in Chinese,
which makes it an important problem in Chinese-English machine translation. In this
paper, guided by HNC, we first classify sentences into two types, those with a
passive mark in Chinese and those without, and then analyze them in detail.
Sentences without a passive mark are our focus; for these we further analyze, for
action sentences and effect sentences respectively, which sentences should be
transformed in translation. By analyzing a number of bilingual sentences, we
derived the transformation rules and tested them in our system. The results show
that our system achieves a precision of 89.1%.
In the future, in view of the causes of error, we will investigate more sentences
in order to supplement and refine the existing rules. We will also further improve
the related information in the knowledge base.
Acknowledgements. This work was supported by the Hi-Tech Research and
Development Program of China (2012AA011104), and the Fundamental Research
Funds for the Central Universities.
References
1. Richards, J.C., Schmidt, R.W.: Longman Dictionary of Language Teaching and Applied
Linguistics, 3rd edn. Foreign Language Teaching and Research Press, Beijing (2005)
2. Man, B., Zijuan, S., Shengtao, Z.: A method of translating English passive voice into
Chinese. Journal of Guangdong Mechanical Institute 14(2) (June 1996)
3. Bin, L.: The comparative approach to the translation of English typical patterns in MT
software. Southwest Jiaotong University, 5 (2004)
4. Zhiying, L., Yaohong, J.: Passive sentence transformation in Chinese-English patent
machine translation. The Journal of China Universities of Posts and
Telecommunications 19(suppl. 2), 135–139 (2012)
5. Baoyu, B.: A discussion on English voice transformation. Journal of Daqing College 16(3)
(August 1996)
6. Yongxin, Z.: Comparison of Chinese and English passive structure. Foreign Language
Teaching (February 1983)
7. Wenhua, X.: Comparison of passive sentences in Chinese and English. Language Teaching
and Linguistic Studies (April 1983)
8. Yaohong, J.I.N., Zhiying, L.I.U.: Improving Chinese-English patent machine translation
using sentence segmentation. In: IEEE 7th International Conference on Natural Language
Processing and Knowledge Engineering (NLP-KE 2011), Tokushima, Japan, pp. 620–625
(2011)
9. Nunberg, G.: The Linguistics of Punctuation. CSLI Lecture Notes, No. 18, Stanford CA
(1990) (July 2012); Bai, X., Zhan, W.: Constraints of BEI and process of English passive in
machine translation, New expansion of Chinese passive expression, 1–17 (2006)
10. Jian, L., Bingxi, W., Yonghui, G.: Rule-Based Converter and Generation in English-Chinese
MT System. In: The 2nd National Conference on Computational Linguistics for Students,
pp. 390–393 (2004)
11. Zengyang, H.: Hierarchical Network of Concepts (HNC) Theory. Tsinghua University Press
(1998)
12. Chuanjiang, M.: HNC (hierarchical network of concepts) theory introduction. Tsinghua
University Press, Beijing (2005)
13. Yun, Z., Yaohong, J.: A Chinese-English patent machine translation system based on the
theory of hierarchical network of concepts. The Journal of China Universities of Posts and
Telecommunications 19(suppl. 2), 140–146 (2012)