An Invariant Model of the Significance of Different
Body Parts in Recognizing Different Actions
Yuping Shen and Hassan Foroosh
Abstract
In this paper, we show that different body parts do not play equally important roles in recognizing a human action in video
data. We investigate to what extent a body part plays a role in recognition of different actions and hence propose a generic
method of assigning weights to different body points. The approach is inspired by the strong evidence in the applied perception
community that humans perform recognition in a foveated manner, that is, they recognize events or objects by focusing only on
visually significant aspects. An important contribution of our method is that the computation of the weights assigned to body parts
is invariant to viewing directions and camera parameters in the input data. We have performed extensive experiments to validate
the proposed approach and demonstrate its significance. In particular, results show that considerable improvement in performance
is gained by taking into account the relative importance of different body parts as defined by our approach.
Index Terms
Invariants, Action Recognition, Body Parts
I. INTRODUCTION
Human action recognition from video data has a wide range of applications in areas such as surveillance and image
retrieval [10], [13], [14], [58]–[61], [67], [119], [121], [122], [124], image annotation [125]–[129], video post-production and
editing [5], [6], [20], [26], [50], [84], and self-localization [62], [63], [68]–[70], to name a few.
The literature on human action recognition from video data includes both monocular and multiple view methods [9], [12],
[15], [16], [35], [112]–[117], [120], [123]. Often, multiple view methods are designed to tackle viewpoint invariant recognition
[9], [12], [15], [16], [113]–[115], [117], although such methods may require calibration across views [8], [25], [56], [57],
[64], [65], [71]–[74], image registration [7], [17]–[19], [21]–[24], [27], [28], [43]–[49], [95], [103], [104], or tracking across
views [79]–[81], [107], [118]. There are also methods that rely on human-object interaction [87], [135], [136], which often
require identifying image contents other than humans [2]–[4], [36]–[38], [40], [42], [76], [78], [85], [134]. Other preprocessing
steps that may be needed include image restoration [29]–[31], [53], [77], [92]–[94], [96]–[102], [105], [106], [108], or scene
modeling [11], [32], [54], [66].
In this paper, we look at a very specific problem: determining which body parts play a role in recognizing an action, and to what extent. This is crucial information that may serve as a prior for any method in the literature, whether video-based or single-image.
II. RELATED WORK
The literature on human action recognition has been extremely active in the past two decades and significant progress has
been made in this area [51], [82], [83], [131]. Human action recognition methods start by assuming a model of the human
body, e.g. silhouette, body points, stick model, etc., and build algorithms that use the adopted model to recognize body pose
and its motion over time. While there have been many different ways of classifying existing action recognition methods in the
literature [51], [82], [83], [131], we present herein a different perspective. We consider action recognition at three different
levels: (i) the subject level, which requires studying the kinematics of the articulated human body, (ii) the image level, which
requires studying the process of imaging the 3D human body both in terms of the geometry and lighting effects, and (iii)
the observer level, which requires studying how an observer would interpret visual data and recognize an action. This way of
categorizing action recognition is rather uncommon but offers a different point of view of how one can approach this problem.
Herein we are interested in investigating observer-level issues from a new point of view. It has long been argued in the
applied perception community [90] that human observers use a foveated strategy, which may be interpreted as an approach
using “importance sampling”. What this implies is that humans sample only the most significant aspects of an event or action
for recognition, and do not give equal importance to every observed data point. Most existing action recognition methods in
the literature have primarily focused on subject level [33], [34], [86], [132], [133], [137], [138] and image level [41], [75], [89],
[130], [140] issues. A number of methods have focused on the observer level [1], [89], but mostly from a machine learning
point of view. A recent and interesting work by Sheikh et al. [91] has perhaps a close connection to what we aim to address
Yuping Shen was with the Department of Computer Science, University of Central Florida, Orlando, FL, 32816 USA at the time this project was conducted.
(e-mail: [email protected]).
Hassan Foroosh is with the Department of Computer Science, University of Central Florida, Orlando, FL, 32816 USA (e-mail: [email protected]).
in this paper. However, they investigate a holistic approach in an abstract framework of action space. Our goal is to propose
an observer level approach with more direct connection to image level features, which we believe would be more “intuitive”.
In other words, we are interested in exploring an idea similar to importance sampling, where image level features are assigned different importance weights derived directly from the extent to which each feature affects the recognition performance for a given action.
Fig. 1. An example of human poses, with the points used for body representation: head, shoulders, elbows, hands, knees, and feet. A body point triplet is
shown.
One particular issue that has attracted attention among many of these groups is invariance [39], [52], [55], [86], [88], [110], [111]. Invariance is an important issue in any recognition problem from image or video data, since appearance is
substantially affected by the imaging level, i.e. the viewing angle, camera parameters, lighting, etc. In action recognition the
issue of invariant recognition is compounded by the additional complexity introduced by the high degrees of freedom of the human body [139].
Our focus in this paper is twofold: to investigate action recognition at the observer level which takes into account how
humans recognize actions by paying attention to only the most significant visual cues, and to introduce this approach within
an invariant framework.
III. OUR BODY MODEL
We use a point-based model similar to [110], [111], which consists of a set of 11 body points (see Figure 1). We also adopt the idea of decomposing the non-rigid motion of the human body into rigid motions of planes given by triplets of body points [110], [111]. This implies that the non-rigid motion of the human body is described by a set of homographies induced by the planes associated with the body point triplets. As a result, the complex problem of estimating the human body motion is reduced to a set of linear problems associated with the motions of planes.
With the triplet representation of human pose and action, we may consider the relative importance of body point triplets
for different actions. For instance, the triplets composed of shoulders and hips have similar motion in walking and jogging, and thus make a trivial contribution to distinguishing them, while other triplets that consist of shoulder, knee and foot joints carry more essential information about the differences between walking and jogging (see Figure 2). Understanding the roles of body
point triplets in human motion/action could help us retrieve more accurate information on human motion, and thus improve
the performance of our recognition methods.
Our problem can be described as follows. We have a database of reference actions. Given a target sequence, we are interested
in finding out which reference sequence in the database best matches our target sequence. We are interested in performing
this task in a manner that is invariant to viewing directions and the camera parameters. Our first step is to align the target
sequences to be recognized to the reference sequences in our database, as described in the next section.
A. Sequence Alignment
The body motion perceived in two different frames is often referred to as a body pose transition. Our sequence alignment method is based on finding corresponding pose transitions in two action sequences viewed by two different cameras. Our starting point is the view-invariant similarity measure that was proposed in [110], [111]: any triplet of body points in one camera image and the corresponding triplet of points in a second camera define a homography $H_1$ between the two cameras. After a pose transition the triplets move to new positions, defining a new homography $H_2$. If the triplet motions are similar, then the two homographies become consistent with the fundamental matrix, and as a result the homography defined by $H = H_1 H_2^{-1}$
Fig. 2. Roles of different triplets in action recognition: (a) examples of walking (upper sequence) and jogging (lower sequence); (b) comparison of the head-shoulders triplet in walking and jogging; (c) comparison of the head-hand-knee triplet in walking and jogging.
will become a homology, two of whose eigenvalues are equal. The equality of two eigenvalues of the homography $H$ thus provides a measure of similarity for the motion of the two point triplets, which is invariant to viewing angles and camera parameters [110]. Given the 11 body points shown in Figure 1, there are 165 such triplets, all of which are used to provide a measure of similarity for the two actions being compared. We use the following measure over all possible triplets for evaluating the similarity of two pose transitions $T_1$ and $T_2$:
$$S(T_1, T_2) = \sum_i \frac{|a_i - b_i|}{|a_i + b_i|}, \qquad (1)$$
where $a_i$ and $b_i$ are the two closest eigenvalues of the homography defined by the $i$-th triplet, and $i = 1, \dots, 165$ for 11 body points.
Given a target sequence of $m$ pose transitions $A_j$, $j = 1, \dots, m$, and a reference sequence of $n$ pose transitions $B_{j'}$, $j' = 1, \dots, n$, in order to find the optimal alignment $\psi : A \to B$ between the two actions $A$ and $B$, we build a matching error matrix using (1), and use dynamic programming to find the optimal mapping. Once two sequences are aligned, the main issue is to determine how similar the two actions are. In the next section, we describe our weighting-based action recognition method.
IV. WEIGHTING-BASED HUMAN ACTION RECOGNITION
To study the roles of body-point triplets in action recognition, we select two different sequences of walking action $WA = \{I_{1\dots l}\}$ and $WB = \{J_{1\dots m}\}$, and a sequence of running action $R = \{K_{1\dots n}\}$. We then align sequences $WB$ and $R$ to $WA$ using the alignment method described in Section III-A, and obtain the corresponding alignments/mappings $\psi : WA \to WB$ and $\psi' : WA \to R$. As discussed in Section III-A, the similarity of two poses is computed based on the error scores of the motions of all body-point triplets. For each pair of matched poses $\langle I_i, J_{\psi(i)} \rangle$, we stack the error scores of all triplets as a vector $V_e(i)$:

$$V_e(i) = \begin{bmatrix} E(\Delta_1) \\ E(\Delta_2) \\ \vdots \\ E(\Delta_T) \end{bmatrix}, \qquad (2)$$

where $E(\Delta_i) = \frac{|a_i - b_i|}{|a_i + b_i|}$.
We then build an error score matrix $M_e$ for the alignment $\psi_{WA \to WB}$:

$$M_e = \begin{bmatrix} V_e(1) & V_e(2) & \dots & V_e(l) \end{bmatrix}. \qquad (3)$$
Each row $i$ of $M_e$ traces the dissimilarity scores of triplet $i$ across the sequence, and the expected value of each column $j$ of $M_e$ is the dissimilarity score of poses $I_j$ and $J_{\psi_{WA \to WB}(j)}$. Similarly, we build an error score matrix $M'_e$ for the alignment $\psi_{WA \to R}$. $M_e$ and $M'_e$ are illustrated visually in Figure 3.
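The construction in (2)-(3), and the row-wise comparison used below, can be sketched as follows; rank_triplets is a hypothetical helper illustrating one way of ranking the rows of $M_e$ against $M'_e$, not our exact criterion:

import numpy as np

def error_matrix(triplet_errors):
    # triplet_errors: one length-T vector V_e(i) (eq. 2) per matched pose
    # pair, with T = 165 for the 11-point model. Stacking the vectors as
    # columns gives M_e (eq. 3); row i traces triplet i across the sequence.
    return np.stack(triplet_errors, axis=1)

def rank_triplets(M_e, M_prime):
    # Triplets whose mean error is much larger in the walk-run alignment
    # (M_prime) than in the walk-walk alignment (M_e) are candidates for
    # "significant" triplets; most discriminative indices come first.
    gap = M_prime.mean(axis=1) - M_e.mean(axis=1)
    return np.argsort(gap)[::-1]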
Fig. 3. Visual illustration of $M_e$ (left) and $M'_e$ (right).
To study the role of a triplet $i$ in distinguishing walking and running, we compare the $i$-th rows of $M_e$ and $M'_e$, as plotted in Figure 4(a)-(f). We found that some triplets, such as triplets 1, 21 and 90, have similar error scores in both cases, which means that the motion of these triplets is similar in walking and running. On the other hand, triplets 55, 94 and 116 have high error scores in $M'_e$ and low error scores in $M_e$; that is, the motion of these triplets in a running sequence is different from their motion in a walking sequence. Triplets 55, 94 and 116 reflect the variation between the actions of walking and running, and are thus more informative than triplets 1, 21 and 90 for the task of distinguishing the two actions.
In the following experiments, we compare sequences of different individuals performing the same action, and study the roles of triplets in categorizing them into the same action group: we select four sequences G0, G1, G2, and G3 of the golf-swing action, align G1, G2, and G3 to G0 using the alignment method described in Section III-A, and then build the error score matrices $M_e^1$, $M_e^2$, $M_e^3$ as in the above experiments; they are illustrated in Figure 5(a), (b) and (c). The dissimilarity scores of some triplets, such as triplet 120 (see Figure 5(f)), are very consistent across individuals. Some other triplets, such as triplets 20 (Figure 5(d)) and 162 (Figure 5(e)), have varying error score patterns across individuals; that is, these triplets represent the variations of individuals performing the same action.
Definition 1: If a triplet reflects the essential differences between an action A and other actions, we call it a significant
triplet of action A. All triplets other than significant triplets are referred to as trivial triplets of action A.
Fig. 4. Roles of different triplets in action recognition. (a)-(f) are the plots of dissimilarity scores of triplets 1, 21, 90, 55, 94, and 116 across frames in the walk-walk and walk-run alignments.
A typical significant triplet should (1) convey the variations between actions and/or (2) tolerate the individual variations of
the same action. For example, triplets 55, 94 and 116 are significant triplets for walking action, and triplet 20 is a significant
triplet for the golf-swing action.
Intuitively, in the task of action recognition, we should place more focus on the significant triplets while reducing the negative impact of trivial triplets, that is, assign an appropriate influence factor to each body-point triplet. In our approach to action recognition, this can be achieved by assigning appropriate weights to the similarity errors of the body point triplets in equation (1). That is, equation (1) can be rewritten as:
$$E(I_1 \to I_2, J_i \to J_j) = \sum_{\text{all } \Delta_i} \lambda_i \cdot E(\Delta_i), \qquad (4)$$

where $\lambda_1 + \lambda_2 + \dots + \lambda_T = 1$, $T = \binom{n}{3}$, and $n$ is the number of body points in the human body model.
The next question is how to determine the optimal set of weights $\lambda_i$ for different actions. Manual assignment of weights can be biased and is difficult for a large database of actions, and it is inefficient when new actions are added. Automatic assignment of weight values is desired for a robust and efficient action recognition system. To achieve this goal, we propose to use a fixed-size dataset of training sequences to learn the weight values. Suppose we are given a training dataset $\mathcal{T}$ which consists of $K \times J$ action sequences for $J$ different actions, each with $K$ pre-aligned sequences performed by various individuals. Let $\lambda_i^j$ be the weight value of the body joint with label $i$ ($i = 1 \dots n$) for action $j$ ($j = 1 \dots J$). Our goal is to find the optimal assignment of $\lambda_i^j$ which maximizes the similarity error between sequences of different actions and minimizes it for sequences of the same action. Since the size of the dataset and the alignments of the sequences are fixed, this turns out to be an optimization problem on $\lambda_i^j$. Our task is to define a good objective function $f(\lambda_1^j, \lambda_2^j, \dots, \lambda_n^j)$ for this purpose, and to apply optimization to solve the problem.
A. Weights on Triplets versus Weights on Body Points
Given a human body model of $n$ points, we can obtain at most $\binom{n}{3}$ triplets, and would need to solve a $\binom{n}{3}$-dimensional optimization problem for weight assignment. Even with a simplified human body model of 11 points, this yields an extremely high dimensional ($\binom{11}{3} = 165$ dimensions) problem. On the other hand, the body point triplets are not independent of each other. In fact, adjacent triplets are correlated by their common body points, and the importance of a triplet is also determined by the importance of its three corners (body points).
Fig. 5. Roles of different triplets in action recognition. (a)-(c) are visual illustrations of $M_e^1$, $M_e^2$, and $M_e^3$; (d)-(f) are the plots of dissimilarity scores of triplets 20, 162, and 120 across frames for the golf-swing alignments (Golf-swing 0-1, 0-2, and 0-3).
Therefore, instead of using $\binom{n}{3}$ variables for the weights of the $\binom{n}{3}$ triplets, we assign $n$ weights $\omega_{1\dots n}$ to the body points $P_{1\dots n}$, where

$$\omega_1 + \omega_2 + \dots + \omega_n = 1. \qquad (5)$$

The weight of a triplet $\Delta = \langle P_i, P_j, P_k \rangle$ is then computed as

$$\lambda_\Delta = \frac{\omega_i + \omega_j + \omega_k}{\binom{n-1}{2}} = \frac{2(\omega_i + \omega_j + \omega_k)}{(n-1)(n-2)}. \qquad (6)$$
Note that the definition of $\lambda$ in (6) ensures that $\lambda_1 + \lambda_2 + \dots + \lambda_T = 1$. Using (6), equation (4) is rewritten as:

$$E(I_1 \to I_2, J_i \to J_j) = \frac{2}{(n-1)(n-2)} \operatorname*{Median}_{1 \le i < j < k \le n} \big( (\omega_i + \omega_j + \omega_k) \cdot E(\Delta_{i,j,k}) \big). \qquad (7)$$
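The mapping (6) from point weights to triplet weights is straightforward to implement; a small sketch, which also checks the normalization property just noted:

import numpy as np
from itertools import combinations

def triplet_weights(omega):
    # omega: length-n point weights with omega.sum() == 1 (eq. 5).
    # Every point appears in exactly C(n-1, 2) triplets, so dividing by
    # that count makes all the triplet weights of eq. (6) sum to 1.
    n = len(omega)
    denom = (n - 1) * (n - 2) / 2.0          # C(n-1, 2)
    lam = {t: (omega[t[0]] + omega[t[1]] + omega[t[2]]) / denom
           for t in combinations(range(n), 3)}
    assert np.isclose(sum(lam.values()), 1.0)
    return lam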
By introducing the weights $\{\omega_{1\dots n}\}$ on body points, we reduce the high-dimensional optimization problem to a lower-dimensional, more tractable one.
B. Automatic Adjustment of Weights
Before moving on to the automatic adjustment of weights, we first discuss the similarity score of two pre-aligned sequences.
Given two sequences $A = \{I_{1\dots N}\}$ and $B = \{J_{1\dots M}\}$, and the known alignment $\psi : A \to B$, the similarity of $A$ and $B$ is:

$$S(A, B) = \sum_{l=1}^{N} S(l, \psi(l)) = N\tau - \sum_{l=1}^{N} E(I_{l \to r_1}, J_{\psi(l) \to r_2}), \qquad (8)$$
where $r_1$ and $r_2$ are computed reference poses, and $\tau$ is a threshold, which we set as suggested in [110], [111]. Therefore, the approximate similarity score of $A$ and $B$ is:

$$\bar{S}(A, B) = N\tau - \frac{2 \cdot N}{(n-1)(n-2)} \sum_{l=1}^{N} \sum_{1 \le i < j < k \le n} (\omega_i + \omega_j + \omega_k) \cdot E^{l,\psi(l)}(\Delta_{i,j,k}). \qquad (9)$$
Considering that $N$, $\tau$, $n$ and $E^{l,\psi(l)}(\Delta_{i,j,k})$ are constants given the alignment $\psi$, equation (9) can be further rewritten in a simpler form:

$$\bar{S}(A, B) = a_0 - \sum_{i=1}^{n-1} a_i \cdot \omega_i, \qquad (10)$$

where $\{a_i\}$ are constants computed from (9).
Now let us return to our problem of automatic weight assignment for action recognition. As discussed earlier, a good objective function should reflect the intuition that significant triplets are assigned higher weights, while trivial triplets are assigned lower weights. Suppose we have a training dataset $\mathcal{T}$ which consists of $K \times J$ action sequences for $J$ different actions, each with $K$ pre-aligned sequences performed by various individuals. $T_k^j$ is the $k$-th sequence in the group of action $j$, and $R^j$ is the reference sequence of action $j$. To find the optimal weight assignment for action $j$, we define the objective function as:

$$f^j(\omega_1, \omega_2, \dots, \omega_{n-1}) = Q_1 + \alpha Q_2 - \beta Q_3, \qquad (11)$$
where $\alpha$ and $\beta$ are non-negative constants and

$$Q_1 = \frac{1}{K} \sum_{k=1}^{K} \bar{S}(R^j, T_k^j), \qquad (12)$$

$$Q_2 = \frac{1}{K} \sum_{k=1}^{K} \bar{S}(R^j, T_k^j)^2 - \frac{1}{K^2} \left( \sum_{k=1}^{K} \bar{S}(R^j, T_k^j) \right)^2, \qquad (13)$$

$$Q_3 = \frac{1}{K(J-1)} \sum_{1 \le i \le J,\, i \ne j} \; \sum_{k=1}^{K} \bar{S}(R^j, T_k^i). \qquad (14)$$
The optimal weights for action $j$ are then computed using:

$$\langle \omega_1, \omega_2, \dots, \omega_{n-1} \rangle = \operatorname*{argmax}_{\omega_1, \omega_2, \dots, \omega_{n-1}} f^j(\omega_1, \omega_2, \dots, \omega_{n-1}). \qquad (15)$$
In this objective function, we use $T_1^j$ as the reference sequence for action $j$; the terms $Q_1$ and $Q_2$ are the mean and variance of the similarity scores between $T_1^j$ and the other sequences of the same action, and $Q_3$ is the mean of the similarity scores between $T_1^j$ and all sequences of the other actions. Hence $f^j(\omega_1, \omega_2, \dots, \omega_{n-1})$ rewards high similarity scores for all sequences of the same action $j$ and low similarity scores for sequences of different actions. The term $Q_2$ may be interpreted as a regularization term that ensures the consistency of sequences in the same group.
As $Q_1$ and $Q_3$ are linear functions and $Q_2$ is a quadratic polynomial, our objective function $f^j(\omega_1, \omega_2, \dots, \omega_{n-1})$ is a quadratic polynomial, and the optimization problem becomes a quadratic programming (QP) problem. There are a variety of methods for solving a QP problem, including interior point, active set, conjugate gradient, etc. In our problem, we adopted the conjugate gradient method, with the initial weight values set to $\left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right)$.
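A sketch of this optimization using the linear form (10) to evaluate $\bar{S}$: each pre-aligned training sequence is summarized by its coefficients $(a_0, a_1, \dots, a_{n-1})$, and the values of $\alpha$ and $\beta$ are placeholders. We substitute scipy's SLSQP solver here, since a plain conjugate gradient routine does not enforce the constraints of (16) below:

import numpy as np
from scipy.optimize import minimize

def fit_weights(a0_same, A_same, a0_other, A_other,
                alpha=0.1, beta=0.1, n=11):
    # a0_same (K,), A_same (K, n-1): coefficients of S_bar = a0 - a . w
    # (eq. 10) for the K sequences of the same action; a0_other, A_other:
    # the same for sequences of all other actions.
    def neg_f(w):
        s_same = a0_same - A_same @ w        # S_bar(R^j, T_k^j)
        s_other = a0_other - A_other @ w     # S_bar(R^j, T_k^i), i != j
        q1 = s_same.mean()                   # eq. (12)
        q2 = s_same.var()                    # eq. (13)
        q3 = s_other.mean()                  # eq. (14)
        return -(q1 + alpha * q2 - beta * q3)  # maximize f^j of eq. (11)

    w0 = np.full(n - 1, 1.0 / n)             # initialize at (1/n, ..., 1/n)
    res = minimize(neg_f, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * (n - 1),
                   constraints=[{"type": "ineq",
                                 "fun": lambda w: 1.0 - w.sum()}])
    w = res.x
    return np.append(w, 1.0 - w.sum())       # recover omega_n from eq. (5)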
V. EXPERIMENTS
In this section, we apply the proposed weighting-based approach to the action recognition problem, and compare its performance with the unweighted methods proposed in [109]-[111]. For comparison, we use the same MoCap testing data as in [109]-[111], and build a MoCap training dataset which consists of a total of $2 \times 17 \times 5 = 170$ sequences for 5 actions (walk, jump, golf swing, run, and climb): each action is performed by 2 subjects, and each instance of an action is observed by 17 cameras set up at different random locations. The same 11-point human body model of Figure 1 is adopted in the training data. We use the same set of reference sequences for the 5 actions, and align the sequences in the training set against the reference sequences.
To obtain the optimal weighting for each action $j$, we first aligned all testing sequences against the reference sequence $R^j$, and stored the similarity scores of the triplets for each pair of matched poses. The objective function $f^j(\omega_1, \omega_2, \dots, \omega_{10})$ is then built based on equation (11) and the computed similarity scores of the triplets in the alignments. $f^j(\cdot)$ is a 10-dimensional function, and the weights $\omega_i$ are constrained by
$$0 \le \omega_i \le 1, \quad i = 1 \dots 10, \qquad \sum_{i=1}^{10} \omega_i \le 1. \qquad (16)$$
The optimal weights $\langle \omega_1, \omega_2, \dots, \omega_{10} \rangle$ are then searched for to maximize $f^j(\cdot)$, with the initialization at $\left(\frac{1}{11}, \frac{1}{11}, \dots, \frac{1}{11}\right)$. The conjugate gradient method is then applied to solve this optimization problem.
After performing the above steps for all the actions, we obtained a set of weights $W^j$ for each action $j$ in our database. In order to compare our results with existing unweighted methods, we then carried out full comparisons with two methods proposed recently in the literature: action recognition using the fundamental ratios [109], and the method proposed in [110], [111].
Fig. 6. Examples of computed weights. (1) and (2) are the computed weights for walking and jumping, respectively, based on the fundamental ratio invariant; (3) and (4) are the computed weights for walking and jumping, respectively, based on the eigenvalue equality invariant.
TABLE I
CONFUSION MATRIX USING EIGENVALUE EQUALITY INVARIANT: LARGE VALUES ON THE DIAGONAL ENTRIES INDICATE ACCURACY.

                            Recognized as
Ground-truth   Walk   Jump   Golf Swing   Run   Climb
Walk            46     1         1         -      2
Jump             1    48         -         -      1
Golf Swing       1     -        48         -      1
Run              -     -         2        48      -
Climb            4     -         1         1     44
As these methods behave differently, the objective functions we obtain for estimating the weights may contain slightly different sets of coefficients. Figure 6 shows the computed weights for walking and jumping for the two methods. Although the weights are slightly different, as shown in the figure, similar patterns emerge in terms of significant and trivial triplets: the same triplets have relatively high weights in both results.
We repeated all the action recognition experiments reported in [109] and [110], [111] using the CMU MoCap data and compared our performance. The results are summarized in Tables I and II for the two methods. The overall recognition rate is 93.6% using the weighted eigenvalue method, and 90.4% using the weighted fundamental ratios method, improvements of 2% and 8.8% over the unweighted cases, respectively.
VI. CONCLUSION
We propose a generic method of assigning weights to body points using a small training set, in order to improve action recognition from video data. Our method is motivated by studies in the applied perception community on selective processing of visual data for recognition. Our experimental results strongly support our hypothesis that weighting body points differently for different actions leads to significant improvement in performance. Furthermore, since our formulation is based on invariant features, our method performs well in the presence of varying camera orientations and parameters. Finally, we believe similar frameworks can be applied to other body models such as silhouettes, motion flow, and stick models.
TABLE II
CONFUSION MATRIX USING FUNDAMENTAL RATIOS INVARIANT: LARGE VALUES ON THE DIAGONAL ENTRIES INDICATE ACCURACY.

                            Recognized as
Ground-truth   Walk   Jump   Golf Swing   Run   Climb
Walk            45     1         1         2      1
Jump             2    47         1         -      -
Golf Swing       -     1        47         1      1
Run              3     1         -        45      1
Climb            4     1         1         2     42
REFERENCES
[1] M. Ahmad and S.W. Lee. HMM-based Human Action Recognition Using Multiview Image Sequences. ICPR’06, 1:263–266, 2006.
[2] Muhamad Ali and Hassan Foroosh. Natural scene character recognition without dependency on specific features. In Proc. International Conference on
Computer Vision Theory and Applications, 2015.
[3] Muhamad Ali and Hassan Foroosh. A holistic method to recognize characters in natural scenes. In Proc. International Conference on Computer Vision
Theory and Applications, 2016.
[4] Muhammad Ali and Hassan Foroosh. Character recognition in natural scene images using rank-1 tensor decomposition. In Proc. of International
Conference on Image Processing (ICIP), pages 2891–2895, 2016.
[5] Mais Alnasser and Hassan Foroosh. Image-based rendering of synthetic diffuse objects in natural scenes. In Proc. IAPR Int. Conference on Pattern
Recognition, volume 4, pages 787–790, 2006.
[6] Mais Alnasser and Hassan Foroosh. Rendering synthetic objects in natural scenes. In Proc. of IEEE International Conference on Image Processing
(ICIP), pages 493–496, 2006.
[7] Mais Alnasser and Hassan Foroosh. Phase shifting for non-separable 2d haar wavelets. IEEE Transactions on Image Processing, 16:1061–1068, 2008.
[8] Nazim Ashraf and Hassan Foroosh. Robust auto-calibration of a ptz camera with non-overlapping fov. In Proc. International Conference on Pattern
Recognition (ICPR), 2008.
[9] Nazim Ashraf and Hassan Foroosh. Human action recognition in video data using invariant characteristic vectors. In Proc. of IEEE Int. Conf. on Image
Processing (ICIP), pages 1385–1388, 2012.
[10] Nazim Ashraf and Hassan Foroosh. Motion retrieval using consistency of epipolar geometry. In Proceedings of IEEE International Conference on
Image Processing (ICIP), pages 4219–4223, 2015.
[11] Nazim Ashraf, Imran Junejo, and Hassan Foroosh. Near-optimal mosaic selection for rotating and zooming video cameras. Proc. of Asian Conf. on
Computer Vision, pages 63–72, 2007.
[12] Nazim Ashraf, Yuping Shen, and Hassan Foroosh. View-invariant action recognition using rank constraint. In Proc. of IAPR Int. Conf. Pattern
Recognition (ICPR), pages 3611–3614, 2010.
[13] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. Motion retrieval using low-rank decomposition of fundamental ratios. In Proc. IEEE International
Conference on Image Processing (ICIP), pages 1905–1908, 2012.
[14] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. Motion retrival using low-rank decomposition of fundamental ratios. In Image Processing (ICIP),
2012 19th IEEE International Conference on, pages 1905–1908, 2012.
[15] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. View-invariant action recognition using projective depth. Journal of Computer Vision and Image
Understanding (CVIU), 123:41–52, 2014.
[16] Nazim Ashraf, Chuan Sun, and Hassan Foroosh. View invariant action recognition using projective depth. Computer Vision and Image Understanding,
123:41–52, 2014.
[17] Vildan Atalay and Hassan Foroosh. In-band sub-pixel registration of wavelet-encoded images from sparse coefficients. Signal, Image and Video
Processing, 2017.
[18] Vildan A. Aydin and Hassan Foroosh. Motion compensation using critically sampled dwt subbands for low-bitrate video coding. In Proc. IEEE
International Conference on Image Processing (ICIP), 2017.
[19] Murat Balci, Mais Alnasser, and Hassan Foroosh. Alignment of maxillofacial ct scans to stone-cast models using 3d symmetry for backscattering
artifact reduction. In Proceedings of Medical Image Understanding and Analysis Conference, 2006.
[20] Murat Balci, Mais Alnasser, and Hassan Foroosh. Image-based simulation of gaseous material. In Proc. of IEEE International Conference on Image
Processing (ICIP), pages 489–492, 2006.
[21] Murat Balci, Mais Alnasser, and Hassan Foroosh. Subpixel alignment of mri data under cartesian and log-polar sampling. In Proc. of IAPR Int. Conf.
Pattern Recognition, volume 3, pages 607–610, 2006.
[22] Murat Balci and Hassan Foroosh. Estimating sub-pixel shifts directly from phase difference. In Proc. of IEEE International Conference on Image
Processing (ICIP), pages 1057–1060, 2005.
[23] Murat Balci and Hassan Foroosh. Estimating sub-pixel shifts directly from the phase difference. In Proc. of IEEE Int. Conf. Image Processing (ICIP),
volume 1, pages I–1057, 2005.
[24] Murat Balci and Hassan Foroosh. Inferring motion from the rank constraint of the phase matrix. In Proc. IEEE Conf. on Acoustics, Speech, and Signal
Processing, volume 2, pages ii–925, 2005.
[25] Murat Balci and Hassan Foroosh. Metrology in uncalibrated images given one vanishing point. In Proc. of IEEE International Conference on Image
Processing (ICIP), pages 361–364, 2005.
[26] Murat Balci and Hassan Foroosh. Real-time 3d fire simulation using a spring-mass model. In Proc. of Int. Multi-Media Modelling Conference, pages
8–pp, 2006.
[27] Murat Balci and Hassan Foroosh. Sub-pixel estimation of shifts directly in the fourier domain. IEEE Trans. on Image Processing, 15(7):1965–1972,
2006.
[28] Murat Balci and Hassan Foroosh. Sub-pixel registration directly from phase difference. Journal of Applied Signal Processing-special issue on Superresolution Imaging, 2006:1–11, 2006.
[29] M Berthod, M Werman, H Shekarforoush, and J Zerubia. Refining depth and luminance information using super-resolution. In Proc. of IEEE Conf.
Computer Vision and Pattern Recognition (CVPR), pages 654–657, 1994.
[30] Marc Berthod, Hassan Shekarforoush, Michael Werman, and Josiane Zerubia. Reconstruction of high resolution 3d visual information. In IEEE Conf.
Computer Vision and Pattern Recognition (CVPR), pages 654–657, 1994.
[31] Adeel Bhutta and Hassan Foroosh. Blind blur estimation using low rank approximation of cepstrum. Image Analysis and Recognition, pages 94–103,
2006.
[32] Adeel A Bhutta, Imran N Junejo, and Hassan Foroosh. Selective subtraction when the scene cannot be learned. In Proc. of IEEE International
Conference on Image Processing (ICIP), pages 3273–3276, 2011.
[33] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In Proc. ICCV, volume 2, pages 1395– 1402, 2005.
[34] AF Bobick and JW Davis. The recognition of human movement using temporal templates. TPAMI, 23(3):257–267, 2001.
[35] Hakan Boyraz, Syed Zain Masood, Baoyuan Liu, Marshall Tappen, and Hassan Foroosh. Action recognition by weakly-supervised discriminative region
localization.
[36] Ozan Cakmakci, Gregory E. Fasshauer, Hassan Foroosh, Kevin P. Thompson, and Jannick P. Rolland. Meshfree approximation methods for free-form
surface representation in optical design with applications to head-worn displays. In Proc. SPIE Conf. on Novel Optical Systems Design and Optimization
XI, volume 7061, 2008.
[37] Ozan Cakmakci, Brendan Moore, Hassan Foroosh, and Jannick Rolland. Optimal local shape description for rotationally non-symmetric optical surface
design and analysis. Optics Express, 16(3):1583–1589, 2008.
[38] Ozan Cakmakci, Sophie Vo, Hassan Foroosh, and Jannick Rolland. Application of radial basis functions to shape description in a dual-element off-axis
magnifier. Optics Letters, 33(11):1237–1239, 2008.
[39] LW Campbell, DA Becker, A. Azarbayejani, AF Bobick, and A. Pentland. Invariant features for 3-D gesture recognition. Automatic Face and Gesture
Recognition, 1996., Proceedings of the Second International Conference on, pages 157–162, 1996.
[40] Kristian L Damkjer and Hassan Foroosh. Mesh-free sparse representation of multidimensional LIDAR data. In Proc. of International Conference on
Image Processing (ICIP), pages 4682–4686, 2014.
[41] AA Efros, AC Berg, G. Mori, and J. Malik. Recognizing action at a distance. ICCV’03, pages 726–733, 2003.
[42] Farshideh Einsele and Hassan Foroosh. Recognition of grocery products in images captured by cellular phones. In Proc. International Conference on
Computer Vision and Image Processing (ICCVIP), 2015.
[43] H Foroosh. Adaptive estimation of motion using generalized cross validation. In 3rd International (IEEE) Workshop on Statistical and Computational
Theories of Vision, 2003.
[44] Hassan Foroosh. A closed-form solution for optical flow by imposing temporal constraints. In Proc. of IEEE International Conf. on Image Processing
(ICIP), volume 3, pages 656–659, 2001.
[45] Hassan Foroosh. An adaptive scheme for estimating motion. In Proc. of IEEE International Conf. on Image Processing (ICIP), volume 3, pages
1831–1834, 2004.
[46] Hassan Foroosh. Pixelwise adaptive dense optical flow assuming non-stationary statistics. IEEE Trans. on Image Processing, 14(2):222–230, 2005.
[47] Hassan Foroosh and Murat Balci. Sub-pixel registration and estimation of local shifts directly in the fourier domain. In Proc. International Conference
on Image Processing (ICIP), volume 3, pages 1915–1918, 2004.
[48] Hassan Foroosh and Murat Balci. Subpixel registration and estimation of local shifts directly in the fourier domain. In Proc. of IEEE International
Conference on Image Processing (ICIP), volume 3, pages 1915–1918, 2004.
[49] Hassan Foroosh, Josiane Zerubia, and Marc Berthod. Extension of phase correlation to subpixel registration. IEEE Trans. on Image Processing,
11(3):188–200, 2002.
[50] Tao Fu and Hassan Foroosh. Expression morphing from distant viewpoints. In Proc. of IEEE International Conference on Image Processing (ICIP),
volume 5, pages 3519–3522, 2004.
[51] DM Gavrila. Visual analysis of human movement: A survey. CVIU, 73(1):82–98, 1999.
[52] A. Gritai, Y. Sheikh, and M. Shah. On the use of anthropometry in the invariant analysis of human actions. ICPR’004, 2, 2004.
[53] Apurva Jain, Supraja Murali, Nicolene Papp, Kevin Thompson, Kye-sung Lee, Panomsak Meemon, Hassan Foroosh, and Jannick P. Rolland. Super-resolution imaging combining the design of an optical coherence microscope objective with liquid-lens based dynamic focusing capability and
computational methods. In Optical Engineering & Applications, pages 70610C–70610C. International Society for Optics and Photonics, 2008.
[54] I Junejo, A Bhutta, and Hassan Foroosh. Dynamic scene modeling for object detection using single-class svm. In Proc. of IEEE International Conference
on Image Processing (ICIP), volume 1, pages 1541–1544, 2010.
[55] I. Junejo, E. Dexter, I. Laptev, and P. Perez. Cross-view action recognition from temporal self-similarities. In European Conference on Computer
Vision, volume 12, 2008.
[56] Imran Junejo and Hassan Foroosh. Dissecting the image of the absolute conic. In Proc. of IEEE Int. Conf. on Video and Signal Based Surveillance,
pages 77–77, 2006.
[57] Imran Junejo and Hassan Foroosh. Robust auto-calibration from pedestrians. In Proc. IEEE International Conference on Video and Signal Based
Surveillance, pages 92–92, 2006.
[58] Imran Junejo and Hassan Foroosh. Euclidean path modeling from ground and aerial views. In Proc. International Conference on Computer Vision
(ICCV), pages 1–7, 2007.
[59] Imran Junejo and Hassan Foroosh. Trajectory rectification and path modeling for surveillance. In Proc. International Conference on Computer Vision
(ICCV), pages 1–7, 2007.
[60] Imran Junejo and Hassan Foroosh. Using calibrated camera for euclidean path modeling. In Proceedings of IEEE International Conference on Image
Processing (ICIP), pages 205–208, 2007.
[61] Imran Junejo and Hassan Foroosh. Euclidean path modeling for video surveillance. Image and Vision Computing (IVC), 26(4):512–528, 2008.
[62] Imran Junejo and Hassan Foroosh. Camera calibration and geo-location estimation from two shadow trajectories. Computer Vision and Image
Understanding (CVIU), 114:915–927, 2010.
[63] Imran Junejo and Hassan Foroosh. Gps coordinates estimation and camera calibration from solar shadows. Computer Vision and Image Understanding
(CVIU), 114(9):991–1003, 2010.
[64] Imran Junejo and Hassan Foroosh. Optimizing ptz camera calibration from two images. Machine Vision and Applications (MVA), pages 1–15, 2011.
[65] Imran N Junejo, Nazim Ashraf, Yuping Shen, and Hassan Foroosh. Robust auto-calibration using fundamental matrices induced by pedestrians. In
Proc. International Conf. on Image Processing (ICIP), volume 3, pages III–201, 2007.
[66] Imran N. Junejo, Adeel Bhutta, and Hassan Foroosh. Single-class svm for dynamic scene modeling. Signal Image and Video Processing, 7(1):45–52,
2013.
[67] Imran N. Junejo and Hassan Foroosh. Trajectory rectification and path modeling for video surveillance. In Proc. International Conference on Computer
Vision (ICCV), pages 1–7, 2007.
[68] Imran N. Junejo and Hassan Foroosh. Estimating geo-temporal location of stationary cameras using shadow trajectories. In Proc. European Conference
on Computer Vision (ECCV), 2008.
[69] Imran N. Junejo and Hassan Foroosh. Gps coordinate estimation from calibrated cameras. In Proc. International Conference on Pattern Recognition
(ICPR), 2008.
[70] Imran N Junejo and Hassan Foroosh. Gps coordinate estimation from calibrated cameras. In Proc. International Conference on Pattern Recognition
(ICPR), pages 1–4, 2008.
[71] Imran N. Junejo and Hassan Foroosh. Practical ptz camera calibration using givens rotations. In Proc. IEEE International Conference on Image
Processing (ICIP), 2008.
[72] Imran N. Junejo and Hassan Foroosh. Practical pure pan and pure tilt camera calibration. In Proc. International Conference on Pattern Recognition
(ICPR), 2008.
[73] Imran N. Junejo and Hassan Foroosh. Refining ptz camera calibration. In Proc. International Conference on Pattern Recognition (ICPR), 2008.
[74] Imran N. Junejo and Hassan Foroosh. Using solar shadow trajectories for camera calibration. In Proc. IEEE International Conference on Image
Processing (ICIP), 2008.
[75] I. Laptev, S.J. Belongie, P. Perez, and J. Wills. Periodic Motion Detection and Segmentation via Approximate
Sequence Alignment. ICCV’05, 1:816–823, 2005.
[76] Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. Sparse convolutional neural networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pages 806–814, 2015.
[77] Anne Lorette, Hassan Shekarforoush, and Josiane Zerubia. Super-resolution with adaptive regularization. In Proc. International Conf. on Image
Processing (ICIP), volume 1, pages 169–172, 1997.
[78] Sina Lotfian and Hassan Foroosh. View-invariant object recognition using homography constraints. In Proc. IEEE International Conference on Image
Processing (ICIP), 2017.
[79] Brian Milikan, Aritra Dutta, Qiyu Sun, and Hassan Foroosh. Compressed infrared target detection using stochastically trained least squares. IEEE
Transactions on Aerospace and Electronics Systems, page accepted, 2017.
[80] Brian Millikan, Aritra Dutta, Nazanin Rahnavard, Qiyu Sun, and Hassan Foroosh. Initialized iterative reweighted least squares for automatic target
recognition. In Military Communications Conference, MILCOM, IEEE, pages 506–510, 2015.
[81] Brian A. Millikan, Aritra Dutta, Nazanin Rahnavard, Qiyu Sun, and Hassan Foroosh. Initialized iterative reweighted least squares for automatic target
recognition. In Proc. of MILICOM, 2015.
[82] T.B. Moeslund and E. Granum. A survey of computer vision-based human motion capture. CVIU, 81(3):231–268, 2001.
[83] T.B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. CVIU, 104(2-3):90–126, 2006.
[84] Brendan Moore, Marshall Tappen, and Hassan Foroosh. Learning face appearance under different lighting conditions. In Proc. IEEE Int. Conf. on
Biometrics: Theory, Applications and Systems, pages 1–8, 2008.
[85] Dustin Morley and Hassan Foroosh. Improving ransac-based segmentation through cnn encapsulation. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2017.
[86] V. Parameswaran and R. Chellappa. View invariants for human action recognition. CVPR’03, 2, 2003.
[87] Alessandro Prest, Cordelia Schmid, and Vittorio Ferrari. Weakly supervised learning of interactions between humans and objects. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 34(3):601–614, 2012.
[88] C. Rao, A. Yilmaz, and M. Shah. View-Invariant Representation and Recognition of Actions. IJCV, 50(2):203–226, 2002.
[89] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. ICPR’04, 3, 2004.
[90] A.C. Schütz, D.I. Brauna, and K.R. Gegenfurtnera. Object recognition during foveating eye movements. Vision Research, 49(18):2241–2253, 2009.
[91] Y.S. Sheikh and M.M. Shah. Exploring the Space of a Human Action. ICCV’05, 1, 2005.
[92] H Shekarforoush. Super-Resolution in Computer Vision. PhD thesis, PhD Thesis, University of Nice, 1996.
[93] H Shekarforoush, M Berthod, and J Zerubia. Sub-pixel reconstruction of a variable albedo lambertian surface. In Proceedings of the British Machine
Vision Conference (BMVC), volume 1, pages 307–316.
[94] H. Shekarforoush and R. Chellappa. Adaptive super-resolution for Predator video sequences.
[95] H Shekarforoush and R Chellappa. A multifractal formalism for stabilization and activity detection in flir sequences. In Proceedings, ARL Federated
Laboratory 4th Annual Symposium, pages 305–309, 2000.
[96] H Shekarforoush, R Chellappa, H Niemann, H Seidel, and B Girod. Multi-channel superresolution for images sequences with applications to airborne
video data. Proc. of IEEE Image and Multidimensional Digital Signal Processing, pages 207–210, 1998.
[97] Hassan Shekarforoush. Conditioning bounds for multi-frame super-resolution algorithms. Computer Vision Laboratory, Center for Automation Research,
University of Maryland, 1999.
[98] Hassan Shekarforoush. Noise suppression by removing singularities. IEEE Trans. Signal Processing, 48(7):2175–2179, 2000.
[99] Hassan Shekarforoush. Noise suppression by removing singularities. IEEE transactions on signal processing, 48(7):2175–2179, 2000.
[100] Hassan Shekarforoush, Amit Banerjee, and Rama Chellappa. Super resolution for fopen sar data. In Proc. AeroSense, pages 123–129. International
Society for Optics and Photonics, 1999.
[101] Hassan Shekarforoush, Marc Berthod, Michael Werman, and Josiane Zerubia. Subpixel bayesian estimation of albedo and height. International Journal
of Computer Vision, 19(3):289–300, 1996.
[102] Hassan Shekarforoush, Marc Berthod, and Josiane Zerubia. 3d super-resolution using generalized sampling expansion. In Proc. International Conf. on
Image Processing (ICIP), volume 2, pages 300–303, 1995.
[103] Hassan Shekarforoush, Marc Berthod, and Josiane Zerubia. Subpixel image registration by estimating the polyphase decomposition of the cross power
spectrum. PhD thesis, INRIA-Technical Report, 1995.
[104] Hassan Shekarforoush, Marc Berthod, and Josiane Zerubia. Subpixel image registration by estimating the polyphase decomposition of cross power
spectrum. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 532–537, 1996.
[105] Hassan Shekarforoush and Rama Chellappa. Blind estimation of psf for out of focus video data. In Image Processing, 1998. ICIP 98. Proceedings.
1998 International Conference on, pages 742–745, 1998.
[106] Hassan Shekarforoush and Rama Chellappa. Data-driven multi-channel super-resolution with application to video sequences. Journal of Optical Society
of America-A, 16(3):481–492, 1999.
[107] Hassan Shekarforoush and Rama Chellappa. A multi-fractal formalism for stabilization, object detection and tracking in flir sequences. In Proc. of
International Conference on Image Processing (ICIP), volume 3, pages 78–81, 2000.
[108] Hassan Shekarforoush, Josiane Zerubia, and Marc Berthod. Denoising by extracting fractional order singularities. In Proc. of IEEE International Conf.
on Acoustics, Speech and Signal Processing (ICASSP), volume 5, pages 2889–2892, 1998.
[109] Y. Shen and H. Foroosh. View-invariant action recognition using fundamental ratios. In Proc. of CVPR, 2008.
[110] Y. Shen and H. Foroosh. View-invariant recognition of body pose from space-time templates. In Proc. of CVPR, 2008.
[111] Y. Shen and H. Foroosh. View-invariant action recognition from point triplets. IEEE Trans. Pattern Anal. Mach. Intell., 31(10):1898–1905, 2009.
[112] Yuping Shen, Nazim Ashraf, and Hassan Foroosh. Action recognition based on homography constraints. In Proc. of IAPR Int. Conf. Pattern Recognition
(ICPR), pages 1–4, 2008.
[113] Yuping Shen and Hassan Foroosh. View-invariant action recognition using fundamental ratios. In Proc. IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pages 1–6, 2008.
[114] Yuping Shen and Hassan Foroosh. View invariant action recognition using fundamental ratios. In Proc. IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2008.
[115] Yuping Shen and Hassan Foroosh. View-invariant recognition of body pose from space-time templates. In Proc. of IEEE Conf. on Computer Vision
and Pattern Recognition, pages 1–6, 2008.
[116] Yuping Shen and Hassan Foroosh. View invariant recognition of body pose from space-time templates. In Proc. IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 2008.
[117] Yuping Shen and Hassan Foroosh. View-invariant action recognition from point triplets. IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI), 31(10):1898–1905, 2009.
[118] Chen Shu, Luming Liang, Wenzhang Liang, and Hassan Foroosh. 3d pose tracking with multitemplate warping and sift correspondences. IEEE Trans.
on Circuits and Systems for Video Technology, 26(11):2043–2055, 2016.
[119] Chuan Sun and Hassan Foroosh. Should we discard sparse or incomplete videos? In Proceedings of IEEE International Conference on Image Processing
(ICIP), pages 2502–2506, 2014.
[120] Chuan Sun, Imran Junejo, and Hassan Foroosh. Action recognition using rank-1 approximation of joint self-similarity volume. In Proc. IEEE
International Conference on Computer Vision (ICCV), pages 1007–1012, 2011.
[121] Chuan Sun, Imran Junejo, and Hassan Foroosh. Motion retrieval using low-rank subspace decomposition of motion volume. In Computer Graphics
Forum, volume 30, pages 1953–1962. Wiley, 2011.
[122] Chuan Sun, Imran Junejo, and Hassan Foroosh. Motion sequence volume based retrieval for 3d captured data. Computer Graphics Forum, 30(7):1953–
1962, 2012.
[123] Chuan Sun, Imran Junejo, Marshall Tappen, and Hassan Foroosh. Exploring sparseness and self-similarity for action recognition. IEEE Transactions
on Image Processing, 24(8):2488–2501, 2015.
[124] Chuan Sun, Marshall Tappen, and Hassan Foroosh. Feature-independent action spotting without human localization, segmentation or frame-wise
tracking. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2689–2696, 2014.
[125] Amara Tariq and Hassan Foroosh. Scene-based automatic image annotation. In Proc. of IEEE International Conference on Image Processing (ICIP),
pages 3047–3051, 2014.
[126] Amara Tariq and Hassan Foroosh. Feature-independent context estimation for automatic image annotation. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pages 1958–1965, 2015.
[127] Amara Tariq, Asim Karim, and Hassan Foroosh. A context-driven extractive framework for generating realistic image descriptions. IEEE Transactions
on Image Processing, 26(2):619–632, 2017.
[128] Amara Tariq, Asim Karim, and Hassan Foroosh. Nelasso: Building named entity relationship networks using sparse structured learning. IEEE Trans.
on on Pattern Analysis and Machine Intelligence, 2017.
[129] Amara Tariq, Asim Karim, Fernando Gomez, and Hassan Foroosh. Exploiting topical perceptions over multi-lingual text for hashtag suggestion on
twitter. In The Twenty-Sixth International FLAIRS Conference, 2013.
[130] L. Wang. Abnormal Walking Gait Analysis Using Silhouette-Masked Flow Histograms. ICPR’06, 3:473–476, 2006.
[131] L. Wang, W. Hu, T. Tan, et al. Recent developments in human motion analysis. Pattern Recognition, 36(3):585–601, 2003.
[132] L. Wang and D. Suter. Recognizing human activities from silhouettes: Motion subspace and factorial discriminative graphical model. In CVPR’07,
pages 1–8, 2007.
[133] L. Wang, T. Tan, H. Ning, and W. Hu. Silhouette analysis-based gait recognition for human identification. IEEE Trans. PAMI, 25(12):1505–1518,
2003.
[134] Min Wang, Baoyuan Liu, and Hassan Foroosh. Factorized convolutional neural networks. arXiv preprint arXiv:1608.04337, 2016.
[135] Bangpeng Yao and Li Fei-Fei. Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9):1691–1703, 2012.
[136] Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei. Human action recognition by learning bases of action
attributes and parts. In Proc. International Conference on Computer Vision (ICCV), pages 1331–1338. IEEE, 2011.
[137] A. Yilmaz and M. Shah. Actions sketch: a novel action representation. CVPR’05, 1, 2005.
[138] A. Yilmaz and M. Shah. Matching actions in presence of camera motion. CVIU, 104(2-3):221–231, 2006.
[139] V.M. Zatsiorsky. Kinematics of Human Motion. Human Kinetics, 2002.
[140] G. Zhu, C. Xu, W. Gao, and Q. Huang. Action Recognition in Broadcast Tennis Video Using Optical Flow and Support Vector Machine. LNCS,
3979:89, 2006.