
ECML PKDD 2008, Antwerp
A Joint Segmenting and Labeling
Approach for Chinese Lexical Analysis
Xinhao Wang, Jiazhong Nie, Dingsheng Luo, and Xihong Wu
Speech and Hearing Research Center,
Department of Machine Intelligence, Peking University
September 18th, 2008
Cascaded Subtasks in NLP
Word Segmentation and Named Entity Recognition → POS Tagging → Chunking and Parsing → Word Sense Disambiguation
Drawbacks:
 Errors introduced by earlier subtasks propagate through the pipeline and can never be recovered in downstream subtasks.
 Information sharing among different subtasks is prohibited by the pipeline manner.
Researchers’ Efforts on Joint Processing
 Reranking (Shi, 2007; Sutton, 2005; Zhang, 2003)
As an approximation of joint processing, reranking may miss the true optimal result, which often lies outside the k-best list.
 Taking multiple subtasks as a single one (Luo, 2003; Miller, 2000; Yi, 2005; Nakagawa, 2007; Ng, 2004)
The obstacle is the requirement of a corpus annotated with multi-level information.
 Unified probabilistic models (Sutton, 2004; Duh, 2005)
Dynamic Conditional Random Fields (DCRFs) and Factorial Hidden Markov Models (FHMMs) are trained jointly and perform the subtasks all at once.
Both DCRFs and FHMMs suffer from the absence of multi-level data annotation.
A Unified Framework for Joint Processing
 A WFST-based approach is presented to jointly perform a cascade of segmentation and labeling tasks; it has two notable features:
WFSTs offer a unified framework that can represent many widely used models, such as lexical constraints, n-gram language models, and Hidden Markov Models (HMMs); thus a unified transducer representation for modeling multiple knowledge sources can be achieved.
Multiple WFSTs can be integrated into a fully composed single WFST, which makes it possible to perform a cascade of subtasks with one-pass decoding.
Weighted Finite State Transducers (WFSTs)
 The WFST is a generalization of the finite-state automaton, capable of realizing a weighted relation between strings.
 Composition operation
[Figure: two toy WFSTs, (a) and (b), and their composition (c); transitions are labeled in(t):out(t)/weight(t). See the caption below.]
Example of WFST composition. Two simple WFSTs are shown in (a) and (b), in which states are represented by circles labeled with their unique numbers; bold circles denote initial states and double circles final states. The input and output labels and the weight of a transition t are marked as in(t):out(t)/weight(t). (c) illustrates the composition of (a) and (b).
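
To make the composition operation concrete, here is a minimal Python sketch in the tropical semiring (path weights add; epsilon handling is omitted). The dict-of-arcs encoding and the toy transducers are illustrative assumptions, not the authors' implementation.

    # Minimal WFST composition sketch (tropical semiring, no epsilon handling).
    # A transducer is a hypothetical dict: state -> [(in, out, weight, next_state)].
    def compose(A, B, start_a, start_b):
        # An arc exists in A∘B when A's output label equals B's input label;
        # its weight is the sum of the two arc weights.
        start = (start_a, start_b)
        arcs = {}
        stack, seen = [start], {start}
        while stack:
            qa, qb = stack.pop()
            for i1, o1, w1, na in A.get(qa, []):
                for i2, o2, w2, nb in B.get(qb, []):
                    if o1 == i2:
                        nxt = (na, nb)
                        arcs.setdefault((qa, qb), []).append((i1, o2, w1 + w2, nxt))
                        if nxt not in seen:
                            seen.add(nxt)
                            stack.append(nxt)
        return start, arcs

    # Toy transducers: A maps "ab" to "ba" (weight 0.7), B maps "ba" to "cc"
    # (weight 0.7), so A∘B maps "ab" to "cc" with weight 1.4.
    A = {0: [("a", "b", 0.1, 1)], 1: [("b", "a", 0.6, 2)]}
    B = {0: [("b", "c", 0.3, 1)], 1: [("a", "c", 0.4, 2)]}
    print(compose(A, B, 0, 0))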
Joint Chinese Lexical Analysis
 The WFST-based approach
Uniform Representation for Multiple Subtask Models.
Integration of Multiple Models.
 Tasks
Word segmentation, part-of-speech tagging, and person and location name recognition.
Multiple Subtasks Modeling
 An n-gram language model based on word classes is adopted for word segmentation.
 Hidden Markov Models (HMMs) are adopted for both name recognition and POS tagging (a minimal decoding sketch follows below).
 In name recognition, both Chinese characters and words are taken as model units, and recognition is performed simultaneously with word segmentation.
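
As a concrete illustration of the HMM decoding involved, below is a minimal Viterbi sketch in Python; the toy tags and probability tables are assumptions for the example, not the models trained on the corpus.

    import math

    # Minimal Viterbi sketch for HMM tagging in log space; start_p, trans_p,
    # emit_p are hypothetical toy tables, not the paper's trained models.
    def viterbi(words, tags, start_p, trans_p, emit_p):
        # delta[t] = (best log-prob of a tag path ending in t, that path)
        delta = {t: (math.log(start_p[t] * emit_p[t].get(words[0], 1e-9)), [t])
                 for t in tags}
        for w in words[1:]:
            new = {}
            for t in tags:
                score, path = max((delta[p][0] + math.log(trans_p[p][t]), delta[p][1])
                                  for p in tags)
                new[t] = (score + math.log(emit_p[t].get(w, 1e-9)), path + [t])
            delta = new
        return max(delta.values(), key=lambda sp: sp[0])[1]

    tags = ["N", "V"]
    start_p = {"N": 0.6, "V": 0.4}
    trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    emit_p = {"N": {"研究": 0.4, "生命": 0.5}, "V": {"研究": 0.3, "生命": 0.01}}
    print(viterbi(["研究", "生命"], tags, start_p, trans_p, emit_p))  # -> ['V', 'N']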
The Pipeline System vs. The Joint System
Pipeline Baseline: FSA_input ∘ FST_dict ∘ WFST_ne ∘ WFST_n-gram → decode → the best segmentation → compose with WFST_pos → decode → output.
Integrated Analyzer: FSA_input ∘ FST_dict ∘ WFST_ne ∘ WFST_n-gram ∘ WFST_pos → one-pass decode → output (a decoding sketch follows below).
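
One-pass decoding then amounts to finding the minimum-weight accepting path of the fully composed WFST. A minimal Dijkstra-style sketch, assuming non-negative tropical weights and the same hypothetical dict-of-arcs encoding as the composition sketch above:

    import heapq

    # One-pass decoding sketch: the best analysis is the minimum-weight accepting
    # path of the composed WFST (Dijkstra search; weights must be non-negative).
    # arcs: state -> [(in, out, weight, next_state)]; finals: set of final states.
    def best_path(arcs, start, finals):
        heap = [(0.0, 0, start, [])]   # (weight, tie-breaker, state, output labels)
        counter = 1
        settled = set()
        while heap:
            w, _, q, out = heapq.heappop(heap)
            if q in finals:
                return w, out
            if q in settled:
                continue
            settled.add(q)
            for _i, o, aw, nq in arcs.get(q, []):
                heapq.heappush(heap, (w + aw, counter, nq, out + [o]))
                counter += 1
        return None   # no accepting path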
Simulation Setup
 Corpus: People’s Daily of China, annotated by the Institute of Computational Linguistics of Peking University
January to May 1998 is used as the training set.
June 1998 is the test set.
The first 2,000 sentences of the test set are taken as the development set.
System               Word Segmentation   POS Tagging   Person Names        Place Names
                     F1(%)               F1(%)         Recognition F1(%)   Recognition F1(%)
Pipeline Baseline    95.94               91.06         83.31               89.90
Integrated Analyzer  96.77               91.81         88.51               90.91
The Statistical Significance Test
 The approximate randomization approach (Yeh, 2000) is adopted to test the performance improvement produced by joint processing (a sketch follows below).
The F1-value of word segmentation is the evaluated metric.
The responses for each sentence produced by the two systems are shuffled and randomly reassigned to the two systems, and the significance level is then computed from the shuffled results.
10 sets of 500 sentences each are randomly selected and tested.
For all 10 selected sets, the significance level p-values are far smaller than 0.001.
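
A minimal sketch of the test, assuming hypothetical per-sentence response lists and an f1_of scoring function (neither is from the paper):

    import random

    # Approximate randomization sketch (Yeh, 2000): per-sentence responses of the
    # two systems are randomly swapped; the p-value is the fraction of shuffles
    # whose score gap is at least the observed gap.
    def approx_randomization(resp_a, resp_b, f1_of, shuffles=2**20):
        observed = abs(f1_of(resp_a) - f1_of(resp_b))
        hits = 0
        for _ in range(shuffles):
            xa, xb = [], []
            for a, b in zip(resp_a, resp_b):
                if random.random() < 0.5:   # reassign this sentence's responses
                    a, b = b, a
                xa.append(a)
                xb.append(b)
            if abs(f1_of(xa) - f1_of(xb)) >= observed:
                hits += 1
        return (hits + 1) / (shuffles + 1)   # add-one smoothed p-value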
Discussions
 This approach retains the full search space and chooses the optimal result based on multi-level knowledge sources, rather than reranking k-best candidates.
 The models for each subtask level are trained separately, while decoding is conducted jointly. Accordingly, it avoids the need for a corpus annotated with multi-level information.
 When a segmentation task precedes a labeling task, the WFST-based approach naturally enforces the consistency restriction imposed by the segmentation.
 The unified framework of WFSTs makes it easy to apply the presented analyzer in other natural language applications that are also based on WFSTs, such as speech recognition and machine translation.
Conclusion
 In this research, within the unified framework of WFSTs, a
joint processing approach is presented to perform a
cascade of segmentation and labeling subtasks.
 It has been demonstrated that joint processing is superior to the traditional pipeline manner.
 The findings suggest two directions for future research:
More linguistic knowledge will be integrated into the analyzer, such as organization name recognition and shallow parsing.
Since rich linguistic knowledge plays an important role in tough tasks such as ASR and MT, incorporating our integrated analyzer may lead to promising performance improvements.
Thank you for your attention!
Uniform Representation (1)
[Figure: (a) the FSA for the input character string 合成分子时; (b) the lexicon FST mapping character sequences to dictionary words such as 合, 合成, 成分, 分子, 子时, and 时.]
Lexicon WFSTs. (a) is the FSA representing an input example; (b) is
the FST representing a toy dictionary.
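
Composing the input FSA with the lexicon FST yields every tiling of the character string by dictionary words. A direct recursive sketch of that set; the toy dictionary is read off panel (b) and should be treated as an assumption:

    # All dictionary tilings of a character string, i.e. the word sequences the
    # composed input∘lexicon transducer accepts. toy_dict is assumed from (b).
    def segmentations(chars, dictionary):
        if not chars:
            return [[]]
        results = []
        for end in range(1, len(chars) + 1):
            word = chars[:end]
            if word in dictionary:
                results += [[word] + rest
                            for rest in segmentations(chars[end:], dictionary)]
        return results

    toy_dict = {"合", "成", "分", "子", "时", "合成", "成分", "分子", "子时"}
    print(segmentations("合成分子时", toy_dict))
    # includes e.g. ['合成', '分子', '时'] and ['合', '成分', '子时']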
Uniform Representation (2)
[Figure (c): a WFSA for a toy bigram language model over words w1, w2, w3, with unigram arcs wi/un(wi), bigram arcs wi/bi(wj, wi), and backoff arcs ε/back(wj).]

Classes   Description
wi        The i-th word listed in the dictionary
CNAME     Chinese person names
TNAME     Translated person names
LOC       Location names
NUM       Number expressions
LETTER    Letter strings
NON       Other non-Chinese-character strings
BEGIN     Beginnings of sentences
END       Ends of sentences
The WFSA represents a toy bigram language model, where un(w1) denotes the unigram probability of w1, bi(w1, w2) denotes the bigram probability of w2 given the word history w1, and back(w1) denotes the backoff weight of w1.
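
The arc weights above encode standard backoff bigram scoring: take the bigram arc when it exists, otherwise the backoff arc followed by the unigram arc. A minimal sketch with hypothetical log-probability tables:

    # Backoff bigram scoring as encoded by the WFSA arcs: follow the bigram arc
    # if it exists, otherwise the backoff arc plus the unigram arc.
    # un, bi, back are hypothetical log-probability tables.
    def bigram_logprob(v, w, un, bi, back):
        if (v, w) in bi:
            return bi[(v, w)]
        return back.get(v, 0.0) + un[w]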
Uniform Representation (3)
[Figure: (a) a word-to-POS WFST with arcs of the form word:pos/p(word|pos); (b) a toy POS bigram WFSA with arcs pos_i/un(pos_i) and pos_i/bi(pos_j, pos_i). Residue of the CNAME figure shows the person-name topology: surname, then the first and second characters of the given name.]
POS WFSTs. (a) is the WFST representing the relationship between a word and its POS; (b) is the WFSA representing a toy POS bigram model.
The Statistical Significance Test
 The approximate randomization approach (Yeh, 2000):
The responses for each sentence produced by the two systems are shuffled and randomly reassigned to the two systems, and the significance level is then computed from the shuffled results.
The number of shuffles is fixed as follows (also stated as code below):

Shuffle_Times = 2^n                if n ≤ 20
                2^20 = 1,048,576   if n > 20

where n is the number of sentences being shuffled.
Since our test set contains more than 21,000 sentences, using 2^20 shuffles to approximate the 2^21000 possible shuffles is no longer reasonable. Thus, 10 sets of 500 sentences each are randomly selected and tested. For all 10 selected sets, the significance level p-values are far smaller than 0.001.
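
The same shuffle budget, stated as code (n is the number of sentences):

    # Shuffle budget from the formula above: exhaustive (2^n) for n <= 20,
    # capped at 2^20 = 1,048,576 otherwise.
    def shuffle_times(n):
        return 2 ** n if n <= 20 else 2 ** 20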