Transfer Learning to Account for Idiosyncrasy in
Face and Body Expressions
Bernardino Romera-Paredes
Min S. H. Aung
Massimiliano Pontil
Department of Computer Science
University College London
London, UK, WC1E 6BT
{ucabbro, m.aung, m.pontil}@ucl.ac.uk
Nadia Bianchi-Berthouze
Amanda C. de C. Williams
Division of Psychology
and Language Sciences
University College London
London, UK, WC1E 6BT
{n.berthouze, amanda.williams}@ucl.ac.uk
Abstract—In this paper we investigate the use of the Transfer
Learning (TL) framework to extract the commonalities across
a set of subjects and also to learn the way each individual
instantiates these commonalities to model idiosyncrasy. To implement this we apply three variants of Multi Task Learning,
namely: Regularized Multi Task Learning (RMTL), Multi Task
Feature Learning (MTFL) and Composite Multi Task Feature
Learning (CMTFL). Two datasets are used; the first is a set
of point based facial expressions with annotated discrete levels
of pain. The second consists of full body motion capture data
taken from subjects diagnosed with chronic lower back pain. A
synchronized electromyographic signal from the lumbar paraspinal muscles is taken as a pain-related behavioural indicator.
We compare our approaches with Ridge Regression which is a
comparable model without the Transfer Learning property; as
well as with a subtractive method for removing idiosyncrasy. The
TL based methods show statistically significant improvements in
correlation coefficients between predicted model outcomes and
the target values compared to baseline models. In particular, RMTL consistently outperforms all other methods; a paired t-test between RMTL and the best performing baseline method returned a maximum p-value of 2.3 × 10^−4.
I. INTRODUCTION
Face and body movements are highly informative channels for communicating and inferring emotional state and intensity. However, these movement patterns often inherently contain idiosyncrasies, which are further compounded by artifacts stemming from variable environmental influences. These factors pose particular challenges in developing recognition models which seek to identify behaviours and expressions attributed to psychological states and affect in people. This confound is particularly salient if available datasets do not contain a high number or a wide enough variety of subjects. In such cases the biases caused by idiosyncratic behaviours can be significant if they are ignored or not treated carefully. This is
particularly important for researchers focusing on naturalistic
expressions. In many affect recognition studies this issue is not
taken as a central consideration. (For a comprehensive survey
which includes affect recognition from the face see [26] and
specifically for body motion, see [13]. Additionally, Gao et al. review methods to automatically recognise affect from touch behaviour [11].)

Paul Watson
Department of Health Sciences
University of Leicester
Leicester, UK, LE5 4PW
[email protected]

Appearing in Proceedings of the 10th International Conference on Automatic Face and Gesture Recognition (FG) 2013, Shanghai, China.
Bernhardt & Robinson [4] showed that removal of idiosyncrasy improved the performance of Support Vector Machine
based models in classifying affect from hand and arm motions.
In that study, idiosyncrasy was considered to be an additive
constant and individuals’ mean feature values were subtracted
as a way to remove this bias. In [24] the authors apply affine
based registration obtained using calibration samples to remove
inter-personal shape differences for the recognition of temporal
phases in facial expression. A different approach was proposed
in [16] which decouples the contribution caused by identity and
expression using bilinear models [23]. This strategy is interesting for analyzing and synthesizing different combinations of
factors (identity-affect) but at present in supervised learning
this is constrained to classification and does not generalize to
regression.
The aim in this study is not to decouple and subtract
identity salient information and apply models to the remainder.
Rather the view is taken that there exists a commonality within
a group of subjects expressing the same affective state, and that
each subject will provide an instance of this commonality in a
different way. It is the differences in these instantiations that
represents idiosyncrasy and we seek to use this information
rather than subtract it. For example, during periods of intense
pain, the shrinking of the eye can be considered to be a
commonality but each individual will give an instance of
this in their own idiosyncratic way. To this end we exploit
methodologies that can determine the commonalities but are
also adaptive to new instances. One such paradigm which has
this capacity is Transfer Learning (TL) [17] (also known as
Learning to Learn [14] or Inductive Transfer [8], [19], [21]).
In general TL extracts knowledge from a set of transfer tasks
to be applied in the learning process of one or more target
tasks. It is inspired by the fact that human beings are able to
use knowledge from previously learned tasks in order to face a new one [3]. Work in psychology [5] supports the hypothesis that the knowledge acquired by human beings in learning physical movements facilitates the learning of new movements.
To implement this “transfer” of learning we use the Multi-
Task Learning framework (MTL). TL can be seen as an
asymmetric modification of MTL where there is an explicit
distinction between transfer and target tasks. In general MTL
is a supervised learning framework which considers several
learning tasks together so that by taking advantage of the
commonalities it can achieve a better performance in each of
the tasks compared to solving them separately. Indeed, many
learning problems are not usually isolated in real life but are
intrinsically related to others. Previous MTL studies show that
it is possible to use commonalities among functions and obtain
more accurate models [7], [6], [22], [2]. In [20] the authors
showed that MTL can also be used for the aforementioned
decoupling approach to account for idiosyncrasy by assuming
that affect related tasks are orthogonal to identity tasks. To
the best of our knowledge TL approaches have not been
exhaustively explored for the recognition of non-verbal
expressions of behaviour and affect.
In this study, we investigate the implementation of three
variants of MTL which we will refer to as 1) Regularized
Multi Task Learning (RMTL), 2) Multi Task Feature Learning (MTFL) and 3) Composite Multi Task Feature Learning
(CMTFL). Section II further describes these three methodologies, outlining their differences and assumptions. Section III details the acquisition methods and content of the two datasets used in this study. Section IV reports correlation coefficients between the target outcomes and the predicted outputs of each of the proposed models as well as of three baseline models. Lastly, we conclude and discuss the findings in section V.
II. METHODS FOR TRANSFER LEARNING ON FACE AND BODY MOTION
In this study three variants of MTL are implemented which
are based on optimization problems that use regularizers on
the task parameters to encode the relationships among the
tasks. For all three cases, learning is applied over two stages: a
transfer stage and a calibration stage. In the transfer stage, data
from a group of subjects (which we will refer to as transfer
subjects) performing actions which can convey information
about a ground truth are gathered. A supervised learning model
is trained to extract all of the information from these transfer
subjects regarding the desired ground truth (e.g. levels of pain
intensity). During the calibration stage, this model will be
adapted for a target subject. In order for this system to perform
well on the target subject, some labeled instances are taken.
Also, these samples do not need to represent the full outcome
range (i.e. all the levels of pain). However it must be noted
that the number of target labeled instances can be far fewer
than the number from the transfer subjects. In essence, we
use the model trained at the transfer stage and the new target
instances to create a personalized recognition model for the
target subject which takes into account commonalities and the
idiosyncrasies.
A. Notation
We assume the following setting: the model we want to build has to be able to learn a set of T transfer linear tasks f_t(x) = ⟨w_t, x⟩, with x, w_t ∈ R^d, ∀t ∈ {1, . . . , T}, where d is the dimensionality of the data. In order to learn these tasks, a set of labeled instances is provided: {X_t, Y_t}, X_t ∈ R^{d×m_t}, Y_t ∈ R^{m_t}, where X_t is the matrix composed of all the m_t instances provided for task t as columns and Y_t contains the labels. With a slight abuse of notation, we denote X = {X_1, X_2, . . . , X_T} and Y = {Y_1, Y_2, . . . , Y_T}. Let W be the matrix composed of the T weight vectors w_t as columns, that is, W = [w_1, . . . , w_T]. Then in the transfer stage the following optimization problem is solved:

min_{W,C} ∑_{t=1}^{T} ‖X_t^⊤ w_t − Y_t‖_2^2 + γ Ω(W, C),    (1)
where W contains the estimators for the T transfer tasks
and C contains information about the commonalities among
the tasks. Ω is called the regularizer and provides an intuitive mechanism to incorporate similarities among task weight vectors; γ > 0 is a hyperparameter which needs to be tuned beforehand and weighs the importance of the regularizer against the empirical loss. As noted above, we will consider
three different functions Ω, which implement three different
sets of assumptions about the relationships between the tasks.
We focus on regression problems but the generalization to
classification problems is straightforward by changing the loss
function.
We denote by X_C and Y_C the instances and labels provided by the target subject for the calibration stage. We take the commonalities C learned in the transfer stage and optimize the following problem:

min_{w_C} ‖X_C^⊤ w_C − Y_C‖_2^2 + γ Ω(w_C, C).    (2)
Note that in this case C is fixed and the optimization is
performed only on the weight vector of the target task wC .
Once this weight vector wC is learned, it can be used to perform
inference on new instances. In the following we will study
three MTL approaches which can be described within this
framework and use different regularizers Ω and different ways
to encode the common information C learned in the transfer
stage.
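To make the two-stage procedure concrete, the following sketch (a minimal illustration in Python with NumPy, not the authors' implementation) solves problem (1) with a simple mean-referenced regularizer in the spirit of the Regularized MTL approach of section II-B (with λ = 0), and then solves problem (2) for a target subject. The alternating update and the choice of regularizer are assumptions made for illustration only.

```python
import numpy as np

def ridge_toward(X, y, gamma, w0):
    """Solve min_w ||X^T w - y||^2 + gamma * ||w - w0||^2 in closed form.
    Columns of X are instances, matching the paper's notation."""
    d = X.shape[0]
    A = X @ X.T + gamma * np.eye(d)
    b = X @ y + gamma * w0
    return np.linalg.solve(A, b)

def transfer_stage(Xs, Ys, gamma, n_iter=50):
    """Alternating minimization for a mean-referenced regularizer:
    each task weight w_t is pulled toward a shared reference vector w0,
    which is in turn updated as the mean of the task weights."""
    d = Xs[0].shape[0]
    w0 = np.zeros(d)
    for _ in range(n_iter):
        W = [ridge_toward(X, y, gamma, w0) for X, y in zip(Xs, Ys)]
        w0 = np.mean(W, axis=0)  # closed-form update of the reference (lambda = 0)
    return w0

def calibration_stage(XC, YC, gamma, w0):
    """Adapt to the target subject: fit its few labeled instances while
    staying close to the commonality w0 learned in the transfer stage."""
    return ridge_toward(XC, YC, gamma, w0)
```

With very few calibration instances, the target weight vector is dominated by the shared reference w0 and deviates from it only as far as the target subject's data warrant, which is the intended modelling of idiosyncrasy.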
B. Regularized MTL
One way of defining relations among a set of tasks is by
looking at similarities between the parameters. Given that the
tasks are defined by linear functions, one may assume that
their weight vectors are close to each other; therefore, one
could penalize the distances among these weight vectors. One way of formulating this idea is by assuming that all task weight vectors are close to a reference vector w_0 which also needs to be learned. This idea was proposed in [9], where the authors assume that w_t = w_0 + v_t, ∀t ∈ {1, . . . , T}, where v_t is what makes task t different from the others and is therefore assumed to be small. This assumption is taken into account by adding regularization terms which penalize the squared distance between each w_t and w_0 (in other words, penalizing the norm of v_t, ∀t ∈ {1, . . . , T}).
The resultant approach is based on the following regularizer:

Ω(W, w_0) = (1/T) ∑_{t=1}^{T} ‖w_t − w_0‖_2^2 + λ ‖w_0‖_2^2,    (3)

where λ ≥ 0 is a regularization parameter which needs to be tuned a priori. It controls the prior information we have about both how close w_0 is to 0, and how close all weight vectors w_t are to w_0. The first condition is encouraged for high values of λ, whereas the second condition becomes more important for values of λ close to 0. Note that the reference vector w_0 corresponds to the commonalities, C in problem (1), learned in the transfer stage. Consequently, by applying this model at the calibration stage, we encourage the weight vector of the target subject to be close to this reference vector.

C. Multi-Task Feature Learning

A different way of characterizing how tasks are related is by assuming that they share a low dimensional representation of the data. In other words, one assumes that the tasks' vectors w_t are linear combinations of a few common basis vectors which need to be estimated from the data. This assumption is studied in [1], where the authors propose the following optimization problem:

min_{U,A} ∑_{t=1}^{T} ‖X_t^⊤ U a_t − Y_t‖_2^2 + γ ‖A‖_{2,1}^2
s.t. A ∈ R^{d×T}, U ∈ R^{d×d}, U^⊤ U = I.    (4)

That is, the matrix U is used to rotate the data, so that there are some projections which can be useful for all tasks. Consequently, U is constrained to be an orthonormal matrix. The matrix A contains, in each column t, the weights of task t for the components learned in U. Since the original assumption was that all tasks use the same low dimensional representation of the data, an ℓ_{2,1}-norm based regularization term is added so that A is encouraged to have only a few non-zero rows.

For a better understanding, we can explore the similarities between this approach and the unsupervised method Principal Component Analysis (PCA) [18]. The latter obtains a set of components (linear combinations of the attributes) with the largest possible variance, whereas in eq. (4) the components are obtained so that they are as useful as possible for all tasks. The product U a_t makes problem (4) non-convex, yet the authors show that it is equivalent to the following convex problem:

min_{W,D} ∑_{t=1}^{T} ‖X_t^⊤ w_t − Y_t‖_2^2 + γ ∑_{t=1}^{T} w_t^⊤ D^{−1} w_t
s.t. W ∈ R^{d×T}, D ∈ R^{d×d}, D ⪰ 0, tr(D) ≤ 1,    (5)

where Ŵ = Û Â. Note that eq. (5) is an instantiation of problem (1) where

Ω(W, D) = ∑_{t=1}^{T} w_t^⊤ D^{−1} w_t  if D ⪰ 0, tr(D) ≤ 1;  ∞ otherwise.

Consequently, D encodes the commonalities learned in the transfer stage which will be transmitted to the calibration stage, as it implicitly conveys information about the common features which are used by all transfer tasks W.

D. Composite Multi-Task Feature Learning

This approach is studied in [10] and makes the assumption that the divergence among tasks can be expressed in low dimensionality, that is, the tasks are a low rank perturbation of a common "mean task". It can be seen as a combination of the previous two approaches, as the tasks are assumed to be near a reference task w_0, as in eq. (3), and the differences between the tasks and w_0 can be expressed as linear combinations of a few components, as in eq. (5). The problem to minimize is as follows:

min_{W,w_0,D} ∑_{t=1}^{T} ‖X_t^⊤ w_t − Y_t‖_2^2 + γ ( ∑_{t=1}^{T} (w_t − w_0)^⊤ D^{−1} (w_t − w_0) + λ ‖w_0‖_2^2 )
s.t. W ∈ R^{d×T}, D ∈ R^{d×d}, D ⪰ 0, tr(D) ≤ 1.    (6)

It is easy to check that this problem has the form of eq. (1), where the commonalities learned in the transfer stage (C in eq. (1)) are defined in both elements D and w_0, and the regularizer takes the form

Ω(W, w_0, D) = λ ‖w_0‖_2^2 + ∑_{t=1}^{T} (w_t − w_0)^⊤ D^{−1} (w_t − w_0)  if D ⪰ 0, tr(D) ≤ 1;  ∞ otherwise.

III. DATA DESCRIPTION
Two different datasets were selected to verify the generality
of our approach to different modalities and types of tasks:
recognition of levels of pain expressions, and recognition of
level of muscle activation during physical exercise.
1) UNBC-McMaster Shoulder Pain Expression Archive - This is a publicly distributed database (for a detailed description see [15]) which contains facial expressions from 25 subjects who suffer from shoulder pain, recorded while they performed active and passive exercises. We seek to predict the level of pain from non-acted facial expressions. We use all 66 tracked landmark points extracted from video by way of Active Appearance Models [15]. From these, the distance values of 147 edges were used as the initial feature set, see Fig. 1. The target outcome is the expressed pain level, rated using the Prkachin and Solomon Pain Intensity (PSPI) scale, a metric for defining pain based on facial action units.
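The edge-distance feature computation can be sketched as follows; the specific set of 147 edges used in the paper is not enumerated here, so the edge list in the example is an illustrative assumption.

```python
import numpy as np

def edge_lengths(landmarks, edges):
    """Distance-based descriptors for one video frame.

    landmarks: (66, 2) array of tracked facial points.
    edges: list of (i, j) landmark index pairs; the paper uses 147 such
    edges, whose exact definition is not reproduced here.
    """
    pts = np.asarray(landmarks, dtype=float)
    return np.array([np.linalg.norm(pts[i] - pts[j]) for i, j in edges])

# Illustrative usage with a toy edge set:
frame = np.zeros((66, 2))
frame[1] = [3.0, 4.0]
features = edge_lengths(frame, edges=[(0, 1), (0, 2)])  # -> [5.0, 0.0]
```

Using inter-point distances rather than raw coordinates makes the descriptors invariant to translation of the face within the frame.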
2) Multi-Modal Chronic Lower Back Pain Dataset - This dataset contains motion capture data and synchronized muscle activity signals acquired using wireless surface electromyography (sEMG). From this dataset, we aim to predict the lumbar paraspinal muscle activity from a range of full body positions. Since this activity has been shown to be an indicator of chronic pain-related behaviour [12], [25], creating robust motion based predictors of sEMG could lead to non-intrusive methods for pain assessment in markerless motion capture rehabilitation technology. The sEMG probes were adhered to the skin over the lumbar paraspinal (LPS) muscles. Body movement information was acquired using a Velcro strapped suit containing Inertial Measurement Units (IMU) (Animazoo IGS-180). This suit consisted of 18 IMU devices attached to each body segment, capturing motion data at 60 Hz. The subjects in this dataset were diagnosed chronic lower back pain patients, each of whom undertook a prescribed set of simple exercises: one leg stand, repeated sit to stand, reaching forwards, bending and a short walk. The data at each time instance is translated to a matrix of triplets indicating the location of 26 anatomical nodes in a global co-ordinate frame. In order to define the body configuration, the relative distances between each of the following nodes are calculated: crown of the head, hands, segmental midpoint of the spine, hip, upper arms, thighs and shanks. The target outcome is the smooth upper boundary of the rectified sEMG signal, which characterizes the degree of LPS muscle activity. A delay of 100 frames is applied to the motion capture stream since the onset and offset of muscle activity usually precede the manifest movement.
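The target construction can be sketched as below. The moving-maximum window size and the direction of the 100-frame alignment are illustrative assumptions, since the paper does not specify the exact envelope method.

```python
import numpy as np

def semg_envelope(emg, win=61):
    """Smooth upper boundary of the rectified sEMG signal: full-wave
    rectification followed by a centred moving maximum (the window
    size here is an illustrative choice, not the paper's)."""
    rect = np.abs(np.asarray(emg, dtype=float))
    half = win // 2
    padded = np.pad(rect, half, mode="edge")
    return np.array([padded[i:i + win].max() for i in range(len(rect))])

def align_streams(motion, envelope, delay=100):
    """Pair each motion-capture frame with the sEMG envelope `delay`
    frames earlier, reflecting that muscle activity precedes the
    manifest movement (one plausible reading of the 100-frame delay)."""
    return motion[delay:], envelope[:-delay]
```

The envelope is, by construction, an upper bound on the rectified signal at every frame, which matches the description of a smooth upper boundary.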
IV. RESULTS
The three aforementioned TL based approaches, RMTL, MTFL and CMTFL, are compared with three baseline methods on both datasets. The baseline methods are based on Ridge Regression, which is derived from a variant of the minimization expression in eq. (1) where the regularization term is replaced by a Frobenius norm. Ridge Regression can be viewed as a special case of the MTL model without a term to capture commonality information and thus has no capacity for TL. The motivation for this comparison is to test the effect of TL directly. All experiments have the following settings in common.
If T is the total number of subjects within a dataset, then T − 2
randomly chosen subjects will form the transfer set. From the
remaining two subjects, one will be used for validation and the
other as a target subject. The validation subject is used to tune
the hyperparameters within the six approaches and the target
subject is used to assess their accuracy. This follows a leave-one-subject-out (LOSO) validation process. Overall,
the whole process is repeated R times and the corresponding
averaged results are shown for each experiment. For each of
the transfer subjects, m instances are sampled for the transfer
stage. From the validation and test subject, a set of n instances
are sampled to be used as training set in the calibration stage.
In the validation procedure, we considered the values 10k , with
k ∈ {−6, −4, . . . , 6, 8} for all hyperparameters. We compare
three baseline methods and the three TL methods applied to
all experiments.
• Ridge Regression (RR): In this basic model, the parameters are not trained on any transfer subject information; only the instances of the target subject are used. This is implemented simply to measure the improvement when transfer subject information is used.
• Grouped Ridge Regression (GRR): The parameters in this model are learned by grouping the training instances of all tasks from all subjects (both transfer and target).
• Grouped Ridge Regression with mean subtracted bias (GRR-b): The method for removing idiosyncratic bias outlined in [4] is applied to the instances prior to the GRR model. The mean over all instances belonging to one subject is subtracted.
• Regularized MTL (RMTL): It applies the approach explained in subsection II-B to perform TL.
• Multi-Task Feature Learning (MTFL): TL is carried out as proposed in subsection II-C.
• Composite Multi-Task Feature Learning (CMTFL): It performs TL by applying the approach described in subsection II-D.
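The shared experimental protocol can be sketched as follows; the subject IDs are placeholders and the split helper is a simplified illustration of one repetition.

```python
import numpy as np

# Hyperparameter grid used in validation: 10^k for k in {-6, -4, ..., 6, 8}.
GRID = [10.0 ** k for k in range(-6, 9, 2)]

def split_subjects(subject_ids, rng):
    """One repetition of the protocol: T - 2 random transfer subjects,
    one validation subject (for tuning hyperparameters) and one
    held-out target subject (for assessing accuracy)."""
    ids = np.asarray(subject_ids)
    shuffled = ids[rng.permutation(len(ids))]
    return list(shuffled[:-2]), shuffled[-2], shuffled[-1]

rng = np.random.default_rng(0)  # seed chosen for the example only
transfer, validation, target = split_subjects(range(24), rng)
```

The whole procedure is then repeated R times with fresh random splits, and results are averaged over repetitions.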
A. Pain Level Recognition from Faces
The objective is to build a model able to predict the degree of pain from frame by frame facial expression. Since the level of pain is an integer valued scale between 0 and 15 (the PSPI score scale), we employ correlation as a performance measure. The absolute distances between facial points are used as the descriptors in each time frame (see Fig. 1). We have not used patient 15 since all of that subject's instances have a PSPI level of 0; therefore, T = 24 subjects. Since the dataset is highly unbalanced in terms of PSPI label distribution (82% of the instances have PSPI level 0), the training instances for each patient have been selected randomly so that half of them have PSPI > 0.
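The balanced selection of training instances can be sketched with a hypothetical helper (not the authors' code):

```python
import numpy as np

def balanced_sample(pspi_labels, n, rng):
    """Randomly choose n training indices so that half of them have
    PSPI > 0, countering the dominance of PSPI-0 frames (~82%)."""
    labels = np.asarray(pspi_labels)
    painful = np.flatnonzero(labels > 0)
    neutral = np.flatnonzero(labels == 0)
    k = n // 2
    idx = np.concatenate([rng.choice(painful, k, replace=False),
                          rng.choice(neutral, n - k, replace=False)])
    rng.shuffle(idx)
    return idx
```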
Fig. 1. Face edges used as descriptors of the instances of the UNBC-McMaster Shoulder Pain Expression Archive.
Two experiments are carried out to test the effect of varying
m with a fixed n and also varying n with a fixed m. For the
first experiment, the number of instances in the training set
of the target (and validation) task has been set to n = 6 and
the training set size for the transfer tasks varies in the range
m = {10, 30, 50, 70, 90}. The process is repeated R = 75 times
and the averaged results are presented in figure 2 (top). In the
second experiment, we have fixed the training set size of the
transfer tasks and vary the number of instances of the target
tasks n = {2, 4, 6, 8, 10} in order to study the learning curve.
The average results over R = 50 trials are presented in figure
2 (bottom).
B. LPS Muscle Activity Prediction using Motion Capture
Similar to the experiments described in section IV.A, the
two m and n variation tests are carried out on the body motion
data. For these experiments T = 6 subjects. The number of
instances in the training set of the target (and validation)
task has been set to n = 15 and the training set size for the
transfer tasks varies in the range m = {10, 30, 50, 70, 90}. The
process is repeated R = 150 times and the averaged results are
presented in figure 3 (top).
For the second experiment we vary n = {4, 8, 12, 16, 20}.
Overall the process is repeated R = 200 times and the averaged
results can be seen in figure 3 (bottom).
Fig. 2. Mean correlation coefficients over all LOSO validation folds. The scores show the correlations between model predictions from facial expressions and the labelled PSPI pain level. (Top) The number of randomly sampled training instances m taken from the transfer subjects is shown on the horizontal axis; the number of calibration instances n is fixed at 6. (Bottom) The number of randomly sampled calibration instances n taken from the validation and test subjects is shown on the horizontal axis; the number of transfer instances m is fixed at 90.

Fig. 3. Mean correlation coefficients over all LOSO validation folds. The scores show the correlations between model predictions from body position and LPS muscle activity. (Top) The number of randomly sampled training instances m taken from the transfer subjects is shown on the horizontal axis; the number of calibration instances n taken from the target subject is fixed at 15. (Bottom) The number of randomly sampled calibration instances n taken from the validation and test subjects is shown on the horizontal axis; the number of transfer instances m is fixed at 90.

V. DISCUSSION AND CONCLUSION
In both comparative experiments described in sections IV-A and IV-B, a twofold trend can be seen. Firstly, among the non TL based methods, the application of the subtractive bias removal delivers a distinct improvement: the correlation coefficient for GRR-b is consistently higher than the scores for GRR and RR, as can be seen in figures 2 (top) and 3 (top). Secondly, within the TL based methods, RMTL consistently outperforms CMTFL and MTFL. Comparing the TL based methods with the non TL based methods, GRR-b returns a very similar performance to MTFL and CMTFL for muscle activity prediction but is outperformed by RMTL; for PSPI prediction, GRR-b is outperformed by all TL based methods. Figure 2 (bottom) shows a consistent monotonic improvement given an increase in n for the TL based methods. Figure 3 (bottom) repeats this
for RMTL and MTFL. With the exception of RR, the non TL
based methods show no improvement with the inclusion of a
small number of instances n in their training. The improvement
in RR is expected since this model is entirely trained on the n instances. A notable difference between the TL based methods (RMTL, MTFL and CMTFL) and GRR and GRR-b is the effect
of incrementing the number of calibration instances n. This
can be seen in figures 2 (bottom) and 3 (bottom), where GRR
and GRR-b are flat. These methods are based on a traditional
supervised learning procedure where the n instances will only
serve as a few further instances of training samples. Therefore
this relatively small increment in the training set size will
have little effect. In contrast, an increase in n has a far more
significant impact on the results produced by the TL-based
methods. This demonstrates the power of a few calibration
instances in the TL framework and lends itself to modeling
idiosyncrasy in an effective way. To test if the results between
the various methods were statistically different, we applied
a paired t-test. The improvement of RMTL was statistically
significant with respect to GRR-b, obtaining p = 1.4 × 10−15
for the face results and p = 2.3 × 10−4 for the body movement
outputs.
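The significance test can be sketched as a paired t statistic over per-repetition scores (NumPy only; the p-value is then obtained from a Student's t distribution with n − 1 degrees of freedom, e.g. via scipy.stats, which is omitted here):

```python
import numpy as np

def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test on matched per-fold scores,
    e.g. correlation coefficients from two methods on the same folds."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = len(d)
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))
```

Pairing by fold controls for fold-to-fold difficulty, which is why the paired test is appropriate when two methods are evaluated on identical splits.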
Fig. 4. Example of the predicted sEMG values from RMTL (light solid line) and GRR-b (dashed line) with the ground truth value (dark solid line). The vertical axis shows EMG level; the horizontal axis shows time sampled at 60 Hz.

The superiority of GRR-b over GRR and RR supports the findings in [4] showing that idiosyncratic bias removal
improves the effectiveness of recognition models. However,
it can be seen that RMTL correlates with the ground truth
values from both datasets consistently better compared to all
other models including GRR-b, see figure 4 for a sample of
this from the sEMG predictions. This could be due to the
appropriateness of the weight vector assumptions regarding
the commonality information in RMTL. Finally it should be
noted that both GRR and GRR-b have an advantage compared
to the TL methods, since the former are able to access the transfer instances in the calibration stage whereas the latter only use the learned model from the transfer
stage. Generally, the findings in this work demonstrate that
the utilisation of idiosyncratic information by TL improves
recognition within this method set. An expanded study which
makes comparisons with other baseline regression models
could follow.
ACKNOWLEDGEMENTS
The authors of this study are funded by the Engineering and Physical Sciences
Research Council, UK (EP/H017178/1), Leicester: EP/H017194/1; part of the Pain
rehabilitation: E/Motion-based automated coaching project (www.emo-pain.ac.uk).
REFERENCES

[1] A. Argyriou, T. Evgeniou, and M. Pontil. "Convex multi-task feature learning". Machine Learning, 73(3):243–272, 2008.
[2] J. Baxter. "A Model of Inductive Bias Learning". Journal of Artificial Intelligence Research, 12:149–198, 2000.
[3] S. Ben-David and R. Schuller. "Exploiting Task Relatedness for Multiple Task Learning". Conference on Learning Theory, COLT, 2003.
[4] D. Bernhardt and P. Robinson. "Detecting affect from non-stylised body motions". Proceedings of the 2nd International Conference on Affective Computing and Intelligent Interaction, pages 59–70, 2007.
[5] B. D. Blume, J. K. Ford, T. T. Baldwin, and J. L. Huang. "Transfer of Training: A Meta-Analytic Review". Journal of Management, 36(4):1065–1105, December 2009.
[6] R. Caruana. "Algorithms and Applications for Multitask Learning". Proceedings of the Thirteenth International Conference on Machine Learning, 13:87–95, 1996.
[7] R. Caruana. "Multitask Learning". Machine Learning, 28(1):41–75, 1997.
[8] T. Croonenborghs, K. Driessens, and M. Bruynooghe. "Learning Relational Options for Inductive Transfer in Relational Reinforcement Learning". Proceedings of the 17th International Conference on Inductive Logic Programming, ILP, pages 88–97, 2007.
[9] T. Evgeniou and M. Pontil. "Regularized multi-task learning". Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 109–117, 2004.
[10] T. Evgeniou, M. Pontil, and O. Toubia. "A Convex Optimization Approach to Modeling Consumer Heterogeneity in Conjoint Estimation". Marketing Science, pages 805–818, 2007.
[11] Y. Gao, N. Bianchi-Berthouze, and H. Meng. "What does touch tell us about emotions in touchscreen-based gameplay?". ACM Transactions on Computer-Human Interaction, 2012.
[12] M. E. Geisser, A. J. Haig, A. S. Wallbom, and E. A. Wiggert. "Pain-related fear, lumbar flexion, and dynamic EMG among persons with chronic musculoskeletal low back pain". Clinical Journal of Pain, 20(2):61–69, Mar–Apr 2004.
[13] A. Kleinsmith and N. Bianchi-Berthouze. "Affective Body Expression Perception and Recognition: A Survey". IEEE Transactions on Affective Computing, (99), 2012.
[14] N. D. Lawrence and J. C. Platt. "Learning to learn with the informative vector machine". Proceedings of the 21st International Conference on Machine Learning, ICML, pages 65–72, 2004.
[15] P. Lucey, J. F. Cohn, K. M. Prkachin, P. E. Solomon, and I. Matthews. "Painful data: The UNBC-McMaster Shoulder Pain Expression Archive Database". IEEE International Conference on Automatic Face and Gesture Recognition, FG, pages 57–64, 2011.
[16] I. Mpiperis, S. Malassiotis, and M. G. Strintzis. "Bilinear Models for 3-D Face and Facial Expression Recognition". IEEE Transactions on Information Forensics and Security, 3(3):498–511, September 2008.
[17] S. J. Pan and Q. Yang. "A Survey on Transfer Learning". IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[18] K. Pearson. "On lines and planes of closest fit to systems of points in space". Philosophical Magazine, 2:559–572, 1901.
[19] V. C. Raykar. "Bayesian Multiple Instance Learning: Automatic Feature Selection and Inductive Transfer". Proceedings of the 25th International Conference on Machine Learning, ICML, 2008.
[20] B. Romera-Paredes, A. Argyriou, N. Bianchi-Berthouze, and M. Pontil. "Exploiting Unrelated Tasks in Multi-Task Learning". Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, AISTATS, 2012.
[21] U. Ruckert and S. Kramer. "Kernel-Based Inductive Transfer". Machine Learning and Knowledge Discovery in Databases, volume 5212 of Lecture Notes in Computer Science, pages 220–233. Springer Berlin Heidelberg, 2008.
[22] D. L. Silver and R. E. Mercer. "The Parallel Transfer of Task Knowledge Using Dynamic Learning Rates Based on a Measure of Relatedness". Connection Science Special Issue: Transfer in Inductive Systems, pages 277–294, 1996.
[23] J. B. Tenenbaum and W. T. Freeman. "Separating style and content with bilinear models". Neural Computation, 12(6):1247–1283, June 2000.
[24] M. F. Valstar and M. Pantic. "Fully automatic recognition of the temporal phases of facial actions". IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(1):28–43, February 2012.
[25] M. van der Hulst, M. M. Vollenbroek-Hutten, J. S. Rietman, L. Schaake, K. G. Groothuis-Oudshoorn, and H. J. Hermens. "Back Muscle Activation Patterns in Chronic Low Back Pain During Walking: A 'Guarding' Hypothesis". Clinical Journal of Pain, 26(1):30–37, January 2010.
[26] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang. "A survey of affect recognition methods: audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39–58, January 2009.