DBN_3

Case Study 1
Semantic Analysis of Soccer Video Using Dynamic
Bayesian Network
C.-L Huang, et al.
IEEE Transactions on Multimedia, vol. 8, no. 4, 2006
Outline
• Introduction
• Low-Level Evidence Extraction
• Semantic Analysis Using BN and DBN
– Training Phase
– BN and DBN Model
– Propagation in Bayesian Network
– Temporal Intervening Network
• Experimental Results
• Conclusions
Introduction
• Recently, a large amount of digital media data has been delivered
– A flexible and scalable way to manage this rich media is needed
• Several studies have used domain knowledge to facilitate extraction of high-level concepts directly from features
– Use of HMM for automatic learning capabilities to derive knowledge
– Sports video analysis and summarization using low-level video processing
algorithms
– Use of TIME (time interval maximum entropy) to classify events
• This paper presents automatic interpretation of highlights in soccer game
video using BN/DBN
• Different from previous semantic analysis approaches
– It uses frame-based instead of shot-based analysis
– The BN and DBN are generated automatically in the training process
– A temporal intervening network (TIN) is introduced to improve the accuracy
of the semantic analysis
Low-Level Evidence Extraction (1)
• Dominant color region
– Normally, the dominant color region indicates the soccer field
– The dominant color is described by the peak value of each color
component
• Short term motion
– The camera motion between two consecutive frames
• Texture intensity
– Audience region and grass field region can be differentiated by the texture
density information
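The dominant color cue above can be sketched in a few lines. This is only an illustration, assuming a frame is given as a list of (r, g, b) tuples; the function names and the tolerance `tol` are hypothetical and not taken from the paper.

```python
from collections import Counter

def dominant_color(pixels):
    """Return the per-channel histogram peak (most frequent value) as (r, g, b)."""
    return tuple(
        Counter(p[ch] for p in pixels).most_common(1)[0][0]
        for ch in range(3)
    )

def dominant_ratio(pixels, tol=20):
    """Fraction of pixels whose every channel lies within `tol` of the peak.

    `tol` is an assumed tolerance, not a value from the paper.
    """
    peak = dominant_color(pixels)
    close = sum(
        1 for p in pixels
        if all(abs(p[ch] - peak[ch]) <= tol for ch in range(3))
    )
    return close / len(pixels)

# Toy frame: 8 grass-like pixels and 2 bright pixels.
frame = [(30, 160, 40)] * 8 + [(200, 200, 200)] * 2
print(dominant_color(frame), dominant_ratio(frame))
```

A high ratio would suggest that the frame is mostly soccer field; the cut-off for "mostly" is left to the caller.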
Low-Level Evidence Extraction (2)
• Logo
– Logos sandwich the replay,
which is useful for navigation, indexing, and summarization of the sports
video
• Parallel lines
– These can be used to indicate the occurrence of the gate
– Edge detector and Hough Transform are used
• Score board
– A caption region distinguished from the surrounding region
– Provides the score information
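The parallel-line cue can be illustrated with a minimal point-based Hough transform. This sketch assumes the edge detector has already produced a list of (x, y) edge points; the 1-degree, 1-pixel accumulator resolution and the function names are assumptions made for the example, not details from the paper.

```python
import math
from collections import Counter

def hough_lines(points, min_votes):
    """Vote in (theta in degrees, rho in pixels) space; return cells with >= min_votes."""
    votes = Counter()
    for x, y in points:
        for theta in range(180):
            rad = math.radians(theta)
            rho = round(x * math.cos(rad) + y * math.sin(rad))
            votes[(theta, rho)] += 1
    return [cell for cell, n in votes.items() if n >= min_votes]

def parallel_groups(points, min_votes):
    """Group detected lines by angle; keep angles with two or more distinct lines."""
    by_theta = {}
    for theta, rho in hough_lines(points, min_votes):
        by_theta.setdefault(theta, []).append(rho)
    return {t: sorted(r) for t, r in by_theta.items() if len(r) >= 2}

# Toy edge map: two horizontal lines at y = 2 and y = 7.
edges = [(x, 2) for x in range(10)] + [(x, 7) for x in range(10)]
groups = parallel_groups(edges, min_votes=10)
print(groups[90])
```

Two or more lines sharing an angle are reported as a parallel group; neighboring angles may also pass the vote threshold, so a real detector would merge nearby peaks.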
Low-Level Evidence Extraction (3)
• Black object
– Referee information is useful for event detection,
e.g., card events
• Audio energy
– Audio conveys crucial information about the game, e.g., goal events
• Long-term static scene
– The camera keeps tracing the ball, so that the continuous camera panning
motion stops only when a particular event occurs
Semantic Analysis Using BN and DBN
• BN and DBN are powerful semantic analysis tools
which have been applied to model the high-level semantic information
• In sports, the high-level semantics are the highlight events
containing recurring temporal structure
• We use BN/DBN to model the semantic highlights of a soccer game
such as goal, corner kick, penalty kick, and card events
• The BN/DBN is generated automatically by the following training process
Training Phase
Semantic Analysis
• Based on extractable features and the causality in the soccer video,
this paper defined three types of nodes
– The event nodes
• goal, corner, penalty, and card
– The hidden nodes
• replay, board, close-up, audio, audience, gate, panning, static camera,
and referee
– The evidence nodes
• energy, logo, texture, motion, parallel lines, and dominant color
• Training phase can be categorized into two kinds:
qualitative (structural) training & quantitative (parameter) training
Training: Parameter Training
Semantic Analysis
• In the 1st phase, all conditional probabilities between event-hidden and
hidden-hidden node pairs are computed
by counting the number of times the joint appearance of the node pair is true
and the number of times the appearance of the event node is true
• The 2nd phase computes the temporal dependencies for each event-event pair,
event-hidden pair, or hidden-hidden pair at two consecutive
time slices
• The 3rd phase generates the conditional probability of each existing
link between the evidence nodes and the hidden nodes
– The appearance of a hidden node is labeled by a human observer, and
the appearance of an evidence node is obtained by the feature extraction
process
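The counting step in the 1st phase can be sketched as a ratio of two counts. This is a minimal illustration, assuming each training frame is a dictionary of boolean node labels; the function name and data layout are hypothetical.

```python
def conditional_probability(frames, effect, cause):
    """Estimate p(effect=True | cause=True) as count(effect and cause) / count(cause)."""
    cause_count = sum(1 for f in frames if f[cause])
    joint_count = sum(1 for f in frames if f[cause] and f[effect])
    # Return 0.0 when the cause never appears (an assumed convention).
    return joint_count / cause_count if cause_count else 0.0

# Toy labels: the goal event is true in 3 frames, and replay co-occurs in 2 of them.
frames = [
    {"goal": True, "replay": True},
    {"goal": True, "replay": True},
    {"goal": True, "replay": False},
    {"goal": False, "replay": False},
]
p = conditional_probability(frames, effect="replay", cause="goal")
print(p)  # 2 joint appearances out of 3 goal frames
```

The same counting could be applied to node pairs taken from two consecutive time slices to estimate the temporal dependencies of the 2nd phase.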
Training: Structural Training
Semantic Analysis
• After parameter training, every pair of nodes in the network is related
to some degree
• Assuming
– p(ne_i|nc_j): the conditional probability relating the cause node nc_j to the effect node ne_i
– U = {(ne_i, nc_j)}: the universe of configurations over the linkages between
every pair of nodes
– {P(x)} = {p(ne_i|nc_j)}: the original distribution after training
– M*: the candidate network with {P*(x)} = {p*(ne_i|nc_j)}, the distribution after thresholding
• Defining
– Size(M*): the number of entries in P* with p*(ne_i|nc_j) > t
– Dist(P, P*): the cross-entropy distance, Σ_{x∈U} P(x) log(P(x)/P*(x))
– Acc(P, M*): the acceptance measure, Size(M*) + k·Dist(P, P*)
• The Lagrange method is used to choose the Lagrange multiplier k and the
threshold t that minimize Acc(P, M*)
• 500,000 frames are used to generate a reliable DBN
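The acceptance measure can be illustrated numerically. The sketch below is not the Lagrange method from the slides: it fixes k and simply scans a few candidate thresholds. It also assumes that a thresholded entry is replaced by a small `floor` value so the logarithm stays finite; that floor, the function names, and the toy probabilities are all assumptions.

```python
import math

def threshold(p, t, floor=1e-6):
    """Keep entries above t; replace the rest with a small floor (assumption)."""
    return {x: (v if v > t else floor) for x, v in p.items()}

def size(p_star, t):
    """Number of entries of P* that exceed the threshold t."""
    return sum(1 for v in p_star.values() if v > t)

def dist(p, p_star):
    """Cross-entropy distance: sum over x of P(x) * log(P(x) / P*(x))."""
    return sum(v * math.log(v / p_star[x]) for x, v in p.items() if v > 0)

def acc(p, t, k):
    """Acceptance measure Size(M*) + k * Dist(P, P*)."""
    p_star = threshold(p, t)
    return size(p_star, t) + k * dist(p, p_star)

def best_threshold(p, candidates, k):
    """Grid search over candidate thresholds for a fixed k."""
    return min(candidates, key=lambda t: acc(p, t, k))

# Toy link strengths p(effect | cause); the values are made up for illustration.
links = {("replay", "goal"): 0.9, ("audio", "goal"): 0.7, ("referee", "goal"): 0.05}
print(best_threshold(links, [0.0, 0.1, 0.8], k=1.0))
```

With these toy values the weak referee link is pruned while the two strong links are kept, which is the trade-off the measure is meant to capture: fewer links against a larger distance from the trained distribution.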
BN and DBN Model
Semantic Analysis
• After the training process, we generate the BN/DBN,
which can infer high-level semantics given the evidence as input
BN and DBN Model
Semantic Analysis
• For each event, a corresponding DBN is developed
• Some hidden nodes appear in the BN, but not in the corresponding DBN
• Example: the “board” node in BN (a). After parameter training, the
temporal causality between the score board and the goal is stronger than its
spatial causality
Propagation in Bayesian Network
Semantic Analysis
• We apply the probability updating algorithm for BNs
• The algorithm does not work directly on the BN but works on a junction tree
• With a complete set of evidence,
the final inference propagation of the DBN leads to the action decision,
which can be divided into intervening actions and non-intervening actions
– Intervening actions change the posterior probability distribution of the
model during the inference
– Non-intervening actions have no impact on the model
• This paper introduces the temporal intervening network (TIN),
which improves the detection rate of high-level semantics
in the soccer video
Temporal Intervening Network (1)
Semantic Analysis
• The events in soccer video have a certain regularity
– Goal event: occurrences of gate, close-up, and replay follow certain rules
of causality
– Once the regularity is found, the TIN is activated
• Ex) when the close-up disappears, the replay appears in less than 20 frames
• Ex) if the close-up-to-replay (Cl-Re) distance < 20 frames, the TIN can be
used to increase P(Goal=Y|Replay=Y)
• For different events,
the TIN can likewise change the posterior probabilities
of some linkages in the DBN
to improve the accuracy of the final inference result
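The activation rule for the close-up and replay example can be sketched as follows. It assumes per-frame boolean sequences for the two hidden nodes; the function names are hypothetical, and the sketch only decides whether the TIN fires, leaving the updated probabilities to the TIN table.

```python
def closeup_replay_gap(closeup, replay):
    """Frames between the close-up disappearing and the replay appearing, or None."""
    for i in range(1, len(closeup)):
        if closeup[i - 1] and not closeup[i]:  # close-up just disappeared at frame i
            for j in range(i, len(replay)):
                if replay[j]:
                    return j - i
            return None  # the replay never appears afterwards
    return None  # the close-up never disappears

def tin_activated(closeup, replay, max_gap=20):
    """True when the Cl-Re distance is below max_gap frames (20 in the slides)."""
    gap = closeup_replay_gap(closeup, replay)
    return gap is not None and gap < max_gap

# Toy sequences: the close-up ends at frame 5 and the replay starts at frame 12.
closeup = [True] * 5 + [False] * 15
replay = [False] * 12 + [True] * 8
print(closeup_replay_gap(closeup, replay), tin_activated(closeup, replay))
```

When the rule fires, the corresponding posterior, e.g. P(Goal=Y|Replay=Y), would be raised according to the trained TIN table.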
Temporal Intervening Network (2)
Semantic Analysis
TIN: Goal Event Example
Semantic Analysis
• Based on the training data, the following TIN table can be generated
• Now, at an arbitrary instant in the test video,
suppose the close-up appears, and the probability of a goal
is P(Goal=Y|Close-up=Y)=0.6
• If the gate-to-close-up (Ga-Cl) distance is < 20 frames,
we obtain a new posterior probability P’(Go=Y|Cl=Y) as follows
• If instead the Ga-Cl distance is > 20 frames,
the goal event probability is reduced from 0.6 to 0.159
Dataset
Experimental Results
• 7 soccer game videos totaling more than 11 hours, from two TV
• The video source is MPEG-1 clips in 320 x 240 resolution at 30 frames/s
• Audio is sampled at 44 kHz with 16 bits per sample
• Experiments were conducted
for both frame-based and shot-based event detection
Experimental Results
Frame-Based Event Detection
• The TIN improves the detection rate slightly but reduces the false alarm rate
greatly
Shot-Based Event Detection
Experimental Results
Conclusions
• This paper proposed a video program understanding system
• Given an input sequence,
the system collects the low-level evidence and
applies the BN/DBN inference engine to infer high-level semantic concepts
that interpret the semantic content of the sports video program
• The main contribution of the paper is
adding the temporal intervening network to the DBN
to improve the semantic interpretation accuracy