A Study on Detection Based Automatic Speech Recognition

Authors: Chengyuan Ma, Yu Tsao
Professor: 陳嘉平
Reporter: 許峰閤
Outline
• Introduction
• Word detector design
• Hypotheses combination
• Experiment
Introduction
• Current ASR systems are top-down, whereas this
is a bottom-up system.
• It includes:
1. Word detectors.
2. Word hypothesis verification and false-alarm
pruning.
3. Hypothesis combination.
Word detector design
• A separate detector is built for each lexical
item in the vocabulary.
• HMMs are used to design the detectors.
• The key issue is how to choose an
appropriate grammar network.
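The slides do not say how a detector scores a candidate segment. As a minimal sketch, assume a left-to-right word HMM whose per-frame emission log-likelihoods are already computed: the Viterbi algorithm gives the best-path score, and the segment is accepted when the frame-averaged score exceeds a threshold. The function names, the threshold rule, and the input layout are hypothetical, and the grammar network itself is not modeled here.

```python
import math

NEG_INF = float("-inf")

def viterbi_left_to_right(emit_ll, log_self, log_next):
    """Best-path log-likelihood of a left-to-right HMM over one segment.

    emit_ll[t][s]: log p(o_t | state s), assumed precomputed (e.g. from GMMs).
    log_self[s] / log_next[s]: log probability of staying in / leaving state s.
    The path must start in state 0 and end in the last state.
    """
    num_frames, num_states = len(emit_ll), len(emit_ll[0])
    delta = [NEG_INF] * num_states
    delta[0] = emit_ll[0][0]
    for t in range(1, num_frames):
        new = [NEG_INF] * num_states
        for s in range(num_states):
            stay = delta[s] + log_self[s]
            move = delta[s - 1] + log_next[s - 1] if s > 0 else NEG_INF
            new[s] = max(stay, move) + emit_ll[t][s]
        delta = new
    return delta[-1]

def detect(emit_ll, log_self, log_next, threshold):
    """Hypothetical decision rule: accept the segment as an instance of the word
    if its frame-averaged Viterbi log-likelihood exceeds a tuned threshold."""
    score = viterbi_left_to_right(emit_ll, log_self, log_next) / len(emit_ll)
    return score > threshold, score
```

A segment with fewer frames than the HMM has states cannot reach the final state, so it scores negative infinity and is rejected.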
Word verification and pruning
• As expected, these detectors generate
many false alarms.
• Three pruning strategies are presented
below.
• Temporal-information-based pruning:
For example, the duration of the word “one” should be
greater than 150 ms.
• Attribute-model-based pruning:
Each word has its own attribute sequence pattern.
• Signal-based pruning:
Pruning based on signal features.
For example, the energy of a nasal sound is usually
concentrated in the low-frequency region.
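For the temporal-information-based strategy, a minimal sketch: each hypothesis is dropped if it is not longer than a per-word minimum duration. Only the 150 ms limit for "one" comes from the slides; the 10 ms frame shift, the default limit for other words, and the tuple layout are assumptions.

```python
FRAME_SHIFT_MS = 10.0  # assumed front-end frame shift

# Only the 150 ms limit for "one" comes from the slides; the default
# used for other words is a hypothetical placeholder.
MIN_DURATION_MS = {"one": 150.0}
DEFAULT_MIN_DURATION_MS = 100.0

def passes_duration(word, start_frame, end_frame):
    """True if the hypothesis lasts longer than the word's minimum duration.
    end_frame is exclusive, so the hypothesis covers end_frame - start_frame frames."""
    duration_ms = (end_frame - start_frame) * FRAME_SHIFT_MS
    return duration_ms > MIN_DURATION_MS.get(word, DEFAULT_MIN_DURATION_MS)

def prune_by_duration(hyps):
    """hyps: (word, start_frame, end_frame, score) tuples; drop too-short ones."""
    return [h for h in hyps if passes_duration(h[0], h[1], h[2])]
```

With a 10 ms shift, a 12-frame "one" (120 ms) is pruned while a 20-frame "one" (200 ms) is kept.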
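For the attribute-model-based strategy, one way to compare a hypothesis against its word's attribute sequence pattern is sketched below: frame-level attribute labels are collapsed into segments and matched against the expected pattern with an edit distance. The attribute inventory, the per-word patterns, the edit-distance criterion, and the error tolerance are all hypothetical, since the slides give none of them.

```python
from itertools import groupby

# Hypothetical manner-of-articulation patterns; the slides do not list the
# attribute inventory or the per-word patterns.
ATTRIBUTE_PATTERNS = {
    "one": ["glide", "vowel", "nasal"],
    "nine": ["nasal", "vowel", "nasal"],
    "six": ["fricative", "vowel", "stop", "fricative"],
}

def collapse(frame_labels):
    """Merge runs of identical frame-level attribute labels into segments."""
    return [label for label, _ in groupby(frame_labels)]

def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def passes_attributes(word, frame_labels, max_errors=1):
    """Keep a hypothesis whose attribute sequence is close to the word's pattern."""
    pattern = ATTRIBUTE_PATTERNS.get(word)
    if pattern is None:
        return True  # no pattern available: do not prune
    return edit_distance(collapse(frame_labels), pattern) <= max_errors
```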
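For the signal-based strategy, the nasal example can be turned into a simple check: measure the fraction of each frame's spectral energy below a cutoff and keep a nasal hypothesis only if that fraction is high on average. The 500 Hz cutoff, the 0.6 threshold, and the function names are assumptions; a naive DFT is used so the sketch needs only the standard library.

```python
import math

def low_band_energy_ratio(frame, sample_rate, cutoff_hz=500.0):
    """Fraction of a frame's spectral energy below cutoff_hz (naive DFT, Hann window)."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    low = total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        total += power
        if k * sample_rate / n < cutoff_hz:
            low += power
    return low / total if total > 0 else 0.0

def passes_nasal_check(frames, sample_rate, min_ratio=0.6):
    """Keep a nasal hypothesis only if its frames are, on average, low-frequency dominated."""
    ratios = [low_band_energy_ratio(f, sample_rate) for f in frames]
    return sum(ratios) / len(ratios) >= min_ratio
```

A 200 Hz tone passes this check and a 3000 Hz tone fails it; real nasal murmurs are less clean, so the threshold would have to be tuned on data.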
Hypotheses combination
• We investigate hypothesis combination
strategies that use the outputs of all detectors
to generate a word string.
• A weighted directed graph is one
method that can be used to combine the
detector outputs into a digit string.
• Each node in the graph is a detected digit
boundary.
• The number in each node is its time stamp.
• The number beside each edge is the
frame-averaged log-likelihood.
• Dijkstra’s algorithm can then be used to find
the shortest path.
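The graph search above can be sketched as follows, assuming each detector hypothesis is a (begin time, end time, digit, frame-averaged log-likelihood) tuple and that the edge cost is the negated log-likelihood, so the shortest path is the most likely digit string. Dijkstra's algorithm needs non-negative costs, so the sketch assumes log-likelihoods are at most zero and raises an error otherwise. The tuple layout and function name are hypothetical.

```python
import heapq
import math
from collections import defaultdict

def best_digit_string(hyps, start, end):
    """Shortest path through the detection graph with Dijkstra's algorithm.

    hyps: (t_begin, t_end, digit, avg_loglik) tuples; nodes are time stamps.
    Returns (digit_string, total_cost), or None if end is unreachable.
    """
    graph = defaultdict(list)
    for t_begin, t_end, digit, avg_loglik in hyps:
        cost = -avg_loglik  # higher likelihood -> cheaper edge
        if cost < 0:
            raise ValueError("Dijkstra requires non-negative edge costs")
        graph[t_begin].append((t_end, cost, digit))

    dist = {start: 0.0}
    back = {}  # node -> (previous node, digit on the connecting edge)
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break
        if d > dist[node]:
            continue  # stale queue entry
        for nxt, cost, digit in graph[node]:
            if d + cost < dist.get(nxt, math.inf):
                dist[nxt] = d + cost
                back[nxt] = (node, digit)
                heapq.heappush(heap, (d + cost, nxt))

    if end not in dist:
        return None
    digits, node = [], end
    while node != start:
        node, digit = back[node]
        digits.append(digit)
    return "".join(reversed(digits)), dist[end]
```

Because every edge points forward in time, the graph is acyclic; a dynamic program over nodes in time order would find the same path and would also tolerate positive log-likelihoods.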
Experiment
• Experiments are conducted on the TIDIGITS corpus.
• The vocabulary consists of 11 digits: one
to nine, plus “oh” and “zero”.
• 12-dimensional MFCCs are used for front-end processing.
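The slides give only the feature dimension. A common MFCC recipe is sketched below with NumPy; the frame length, frame shift, pre-emphasis coefficient, FFT size, and number of mel filters are assumptions. Reading "12-dimensional" as cepstral coefficients 1 to 12, with c0, energy, and delta features omitted, is likewise an assumption.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_mfcc=12, frame_ms=25.0, shift_ms=10.0, n_fft=512, n_mels=26):
    """Return an (n_frames, n_mfcc) MFCC matrix; signal must span at least one frame."""
    frame_len = int(sr * frame_ms / 1000)
    shift = int(sr * shift_ms / 1000)
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft

    # Triangular filters spaced evenly on the mel scale up to sr / 2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    log_mel = np.log(power @ fbank.T + 1e-10)

    # DCT-II of the log filterbank energies, keeping coefficients 1..n_mfcc.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(1, n_mfcc + 1)[:, None])
    return log_mel @ dct.T
```

For one second of audio at 8 kHz this yields 98 frames of 12 coefficients each.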