
Chemometrics and Intelligent Laboratory Systems 58 (2001) 261–273
www.elsevier.com/locate/chemometrics
Multiple factor analysis combined with PLS path modelling.
Application to the analysis of relationships between
physicochemical variables, sensory profiles and
hedonic judgements
Jérôme Pagès a, Michel Tenenhaus b,*
a ENSA–INSFA, Rennes, France
b HEC School of Management, Jouy-en-Josas, France
Abstract
Multiple Factor Analysis (MFA) highlights the structures common to a set of J groups (or blocks) of variables observed for the same individuals. PLS path modelling allows a search for latent variables that summarise, as far as possible, one-dimensional blocks of manifest variables while taking account of causal links between the blocks. These two methods can be combined: MFA, as an exploratory analysis, helps to define blocks that are both one-dimensional and as well correlated as possible, on which PLS path modelling is then performed.
In this paper, we present MFA in detail and PLS path modelling more briefly. We also mention some links between MFA, PLS path modelling and PLS regression. A detailed presentation of a sensory analysis example illustrates the proposed methodology. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Generalised canonical analysis; Multiple factor analysis; Hierarchical factor analysis; PLS path modelling; PLS regression;
Structural equation modelling
1. Introduction
Multiple factor analysis (MFA) was proposed by Escofier and Pagès [3–5] to highlight the structures common to a set of J groups (or blocks) of variables observed for the same individuals. This method allows a graphical display of these common structures with respect to variables and individuals.
When each data table represents a set of manifest (or observable) variables relating to one (unobservable) latent variable, and when there are explicit causal relationships between the latent variables, it is worthwhile to use the multi-block Partial Least Squares (PLS) path modelling approach proposed by Wold [21–23]. This approach has been adopted and extended by Lohmöller [9,10] at both the theoretical and the software level. In France, PLS path modelling has been studied closely by Valette-Florence [17–19] and Tenenhaus [15,16].

* Corresponding author. Tel.: +33-1-3967-7249; fax: +33-1-3967-7109.
E-mail address: [email protected] (M. Tenenhaus).
PLS path modelling can still be adopted when there are no causal relationships between the blocks. Wold [22] proposed forming a supplementary block by juxtaposing all the blocks and connecting each initial block to this supplementary block. PLS path modelling then makes it possible to recover various methods, such as the generalised canonical correlation analyses of Horst [8] and of Carroll [1], principal component analysis and multiple factor analysis. The links between PLS path modelling and other methods dealing with multiple tables are presented in Guinot et al. [7].

0169-7439/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(01)00165-4
PLS regression [24] can be considered as a development of PLS path modelling specifically dedicated to relating a set of responses Y to a set of predictors X. The link between PLS regression and PLS path modelling is studied in Tenenhaus [16].
A detailed comparison between PLS path modelling and MFA can be found in Pagès and Tenenhaus [13]. In the present paper, we describe the application of a methodology combining MFA and PLS path modelling to a significant problem for the food industry: predicting hedonic judgements on the basis of the sensory and physicochemical characteristics of a set of products.
2. The orange juice example
2.1. Data
Six pure orange juices (P1 to P6) were selected from the main brands on the French market. These juices were pasteurised in two ways: three of them must be stored in refrigerated conditions, while the others can be stored at room temperature. The six orange juices are: Pampryl at room temperature (P1), Tropicana at room temperature (P2), refrigerated Fruivita (P3), Joker at room temperature (P4), refrigerated Tropicana (P5) and refrigerated Pampryl (P6).
Ninety-six students of food science, both trained to evaluate foodstuffs and consumers of orange juice, described each of these six products on the basis of seven attributes: intensity and typical nature of its smell, intensity of the taste, pulp content, sweetness, sourness and bitterness. Moreover, they expressed an overall hedonic judgement. By using this particular panel, it was possible to collect descriptive and hedonic data from the same judges. This characteristic of the data has not been exploited here, in order to simulate the usual application context, in which the sensory properties are assessed by trained panelists while the hedonic judgements are given by naive consumers (in our data, the students play both roles). This gives the methodology presented a wider scope.
The serving order design was a juxtaposition of Latin squares balanced for carry-over effects [12].
In addition to the sensory investigation, the following chemical measurements were carried out: pH, citric acid, overall acidity, saccharose, fructose, glucose, vitamin C and sweetening power (defined as saccharose + 0.6 × glucose + 1.4 × fructose).
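The sweetening power defined above is a weighted sum of the three sugar concentrations, with saccharose as the reference (coefficient 1). A minimal sketch, assuming the three concentrations are expressed in the same unit; the function name `sweetening_power` and the example values are hypothetical, not data from the study:

```python
def sweetening_power(saccharose: float, glucose: float, fructose: float) -> float:
    """Sweetening power = saccharose + 0.6 * glucose + 1.4 * fructose.

    The three concentrations are assumed to be in the same unit (e.g. g/L);
    the result is then expressed in that unit.
    """
    return saccharose + 0.6 * glucose + 1.4 * fructose


# Hypothetical concentrations, for illustration only.
print(sweetening_power(saccharose=40.0, glucose=20.0, fructose=25.0))
```

Fructose counts more than saccharose and glucose less, so two juices with the same total sugar content can differ in sweetening power.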
The data are gathered in a table using the format
shown in Table 1.
2.2. Problems
The analysis of relationships between physicochemical variables, sensory profiles (according to a trained panel) and hedonic judgements (by consumers) is a classic problem for the food industry. The objective is to predict hedonic judgements on the basis of the sensory and physicochemical characteristics of the products.

Table 1
Source data
For product i: $x_{ik}$ (panel average of the sensory attribute k), $y_{il}$ (instrumental measurement l), $z_{ij}$ (hedonic score of judge j).
This problem, usually called preference mapping, has been widely studied (see, for example, the synthesis given by Greenhoff and MacFie [6]). We show here how the methodology combining MFA and PLS path modelling brings a new point of view to this classical problem.
Hedonic judgements can be considered in various ways. When all the hedonic judgements correlate positively, there is a consensus and it is legitimate to summarise them by a compromise judgement, the average judgement being the simplest form. Otherwise, in the absence of a consensus, a natural approach is to divide the consumers into homogeneous clusters (from the point of view of their judgements) and to relate each cluster, which can now legitimately be summarised by a compromise, to the sensory and physicochemical data.
A priori, one might think of dividing consumers on the basis of hedonic judgements alone. But, in this case, nothing ensures that the clusters obtained will be predictable from the variables characterising the products. Hence the idea of building clusters of consumers using the hedonic data, of course, but also taking the other data into account, so that from one cluster to another the differences in liking correspond to physicochemical and/or sensory differences. Poulsen et al. [14] and Courcoux [2] have already implemented a similar idea.
To summarise, the aim of the analysis is to obtain clusters of consumers with similar patterns of hedonic assessments, whose averaged judgements are also well predicted by the sensory and physicochemical properties of the products.
From the point of view of prediction, the difficulty lies in defining a summary for each cluster, a one-dimensional summary being reasonable because each cluster is homogeneous. The simplest way to prepare this summary is to give the same weight to each individual, in which case the summary is the usual average. But, given the aim of predictability, it is also possible, when constructing the summary, to give preference to the most predictable individuals. This is what we do in this paper.
Summary
• The initial assumption is that hedonic judgements depend on the physicochemical and sensory characteristics of the products. It would have been possible to use the sensory data alone. In fact, the presence of physicochemical data can be seen as contributing to the robustness of product characterisation: (1) physicochemical variables related to the sensory variables reinforce the latter, and (2) physicochemical variables not dependent on the sensory variables protect against possible shortcomings of the attribute list.
• The objective is to obtain clusters of consumers that are homogeneous in their hedonic judgements, these clusters being predictable on the basis of product characterisation.
2.3. Methodology
2.3.1. Descriptive approach of the three data sets
First, we apply MFA to Table 1, with each of the three groups playing an active role. This analysis highlights the structures common to the three groups of variables. Given the limited number of products, special attention must be paid to the percentages of variability expressed by the common structures as well as to their interpretability. In particular, relationships between physicochemical and sensory variables will be analysed.
2.3.2. Construction of consumer clusters
MFA provides a graphical display of the consumers (here the ninety-six students), which shows the main variability of these consumers and also of the physicochemical and sensory variables.
This graphical display is thus suitable for building clusters of consumers suited to our problem. Several options can be considered, depending on:
• the number of clusters that is reasonable to manage, taking into account the number of consumers (much smaller in sensory investigations than in public opinion polls);
• the structure of the data: to obtain clusters of hedonic judgements related to variables of the other groups, we use the common factors of MFA that are related to the hedonic judgements.
Fig. 1. Definition of clusters of hedonic judgements based on two common factors of MFA ("Axis 1" and "Axis 2"). Each judgement is a vector inside the correlation circle. The two bisecting lines (dotted lines) define four zones.
To ensure the predictability of the clusters, we propose the following process:
• each hedonic judgement is first associated with the common factor with which it is most correlated;
• each common factor defines two clusters of related hedonic judgements: those which are positively correlated with it, and the others.
Fig. 1 illustrates this process in the case of two common factors, that is to say, when the first two common factors provided by MFA are closely related to each block of variables.
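The two-step rule above can be sketched in a few lines. The sketch assumes that the correlations between each standardised hedonic judgement and the first S common factors of MFA are already available in an array (the function name and the array layout are hypothetical). "Most correlated" is read here as the largest absolute correlation, which is what makes the bisecting lines of Fig. 1 the cluster boundaries.

```python
import numpy as np


def assign_clusters(corr: np.ndarray) -> list[tuple[int, str]]:
    """Assign each hedonic judgement to one of 2*S clusters.

    corr has shape (n_judges, S): corr[j, s] is the correlation between the
    (standardised) scores of judge j and common factor s of MFA.
    Returns, for each judge, (index of the factor, "+" or "-").
    """
    # Step 1: factor with which the judgement is most correlated (absolute value).
    best = np.argmax(np.abs(corr), axis=1)
    clusters = []
    for j, s in enumerate(best):
        # Step 2: positively correlated judgements vs. the others.
        sign = "+" if corr[j, s] > 0 else "-"
        clusters.append((int(s), sign))
    return clusters


# Hypothetical correlations of four judges with "Axis 1" and "Axis 2".
corr = np.array([[0.9, 0.1], [-0.7, 0.3], [0.2, 0.8], [0.1, -0.6]])
print(assign_clusters(corr))  # [(0, '+'), (0, '-'), (1, '+'), (1, '-')]
```

With two factors this yields the four zones of Fig. 1; with S factors it yields 2S clusters.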
2.3.3. Use of PLS path modelling
In PLS path modelling, it is advantageous for each group of variables to be essentially one-dimensional, because each group should reflect one latent variable and these latent variables can be related by regression equations. To obtain this property, we use the results of MFA. With each common factor of MFA, we associate four groups of variables that are more correlated with this factor than with the others: the most positively correlated hedonic judgements, the most negatively correlated hedonic judgements, the most correlated physicochemical variables, and the most correlated sensory variables.
For each common factor, s, we prepare the arrow
diagram shown in Fig. 2.
It would be possible to establish a separate model for each cluster of hedonic judgements, which would improve the quality of fit for each one. The choice of explaining simultaneously two clusters with opposite judgements is, however, consistent with the process adopted in building these groups and makes interpretation easier. This approach does not take into account non-linear relationships between hedonic data and sensory or physicochemical data.
2.3.4. Standardisation of the variables
The physicochemical data, being expressed in different units, are systematically standardised. The descriptive sensory data are also standardised. We have
also checked that each attribute presented a significant product effect in the analysis of variance performed on individual data.
Fig. 2. Causal model for hedonic judgements.

The hedonic judgements may or may not be centred or standardised. This depends on the meaning we give to the average and to the standard deviation of one consumer's judgements. Most users of sensory analysis consider that the hedonic judgements given by a consumer for a series of products are primarily relative; they therefore measure the relationship between two consumers by the coefficient of correlation between their judgements. We adopt this point of view by standardising these variables. The clusters of consumers are thus built on this basis.
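A short check of why standardisation and correlation go together: once each consumer's scores are centred and divided by their standard deviation (computed with weights 1/I), the mean product of two consumers' standardised scores is exactly the correlation coefficient between their judgements. The scores below are hypothetical, not data from the study.

```python
import numpy as np


def standardise(scores: np.ndarray) -> np.ndarray:
    """Centre one consumer's scores and divide by their s.d. (weights 1/I)."""
    return (scores - scores.mean()) / scores.std()


# Hypothetical hedonic scores of two consumers for the six products.
a = np.array([6.0, 3.0, 7.0, 5.0, 2.0, 8.0])
b = np.array([5.0, 4.0, 8.0, 4.0, 3.0, 9.0])

za, zb = standardise(a), standardise(b)
# With weights 1/I, the mean product of standardised scores is the correlation.
print(np.mean(za * zb), np.corrcoef(a, b)[0, 1])
```

A consumer who uses a narrow part of the scale and one who uses all of it therefore weigh equally once standardised; only the pattern of their preferences matters.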
3. Multiple factor analysis
3.1. Data and notations
The present data set has a classic structure: several groups of variables are measured on the same set of individuals (in the statistical sense). These data can be presented in a single table (cf. Fig. 3) using a sub-table structure.
We denote by:
$X$ the complete data table
$I$ the set of individuals
$K$ the set of variables (including all groups) or the set of indices for the variables
$J$ the set of sub-tables (or groups of variables)
$K_j$ the set of variables in group $j$ or the set of indices for the variables in group $j$
$X_j$ the sub-table associated with group $j$.
Moreover, the symbols $I$, $J$, $K$ and $K_j$ designate both the set and its cardinality. A variable of group $K_j$ is denoted by $v_k$ ($k \in K_j$).
To simplify the presentation, without any loss of generality, the variables are assumed to be standardised and to have the same a priori weights. In the same way, the individuals are assumed to have the same weight $1/I$.
3.2. Problems
Many problems are associated with this kind of data. They include many aspects that all derive, more or less directly, from the following question: which structures are common to the various groups?
For this, we summarise each group using linear combinations of its variables. These linear combinations have various names: canonical variables, components, latent variables, factors, dimensions or constructs. A common structure is highlighted by a $J$-tuple of canonical variables (one per group) that are correlated.
Highlighting several common structures involves searching in each group $j$ for a succession of $S$ linear combinations of the variables of this group, $\{F_s^j;\ s = 1, \dots, S\}$, so that, between the groups, the combinations having the same rank $s$, $\{F_s^j;\ j = 1, \dots, J\}$, are as closely correlated as possible.
We define $F_s^j$ as a linear combination of the input variables $v_k$ of group $K_j$ by writing:
$$F_s^j = \sum_{k \in K_j} a_{ks}\, v_k.$$
Various methods deal with these problems, each from its own point of view: Carroll's [1] generalised canonical correlation analysis, Horst's [8] generalised canonical correlation analysis, hierarchical factor analysis [11], PLS path modelling and, finally, MFA.
MFA [3–5] derives both from principal component analysis (search for directions of maximum inertia) and from canonical correlation analysis (search for common factors). Here we use this second point of view to present MFA.
Fig. 3. Data table.
3.3. A priori balancing of the input variable blocks
In MFA, each input variable $v_k$, $k \in K_j$, of group $K_j$ has the weight $m_k = 1/\lambda_1^j$, where $\lambda_1^j$ is the first eigenvalue of the separate principal component analysis of group $K_j$. This is equivalent to replacing each data table $X_j$ by the table:
$$\frac{1}{\sqrt{\lambda_1^j}}\, X_j.$$
Thus, if we perform a separate principal component analysis of group $K_j$ with these weights $m_k$, the first eigenvalue is 1. In other words, with this weighting scheme, for each group $j$, the maximum axial inertia of the cloud of individuals is 1. This weighting balances the various blocks $j$ ($j = 1, \dots, J$). It leads, for example, to down-weighting the variables of a block where 25 variables were used to express one real dimension, compared with another block where only 10 variables were used for one dimension.
Notations: $M_j$ is the diagonal matrix of the weights $1/\lambda_1^j$ for the variables of group $K_j$, and $M$ is the diagonal matrix of the weights of all variables (including all groups).

3.4. Measurement of the relationship between a variable $z$ and a group $K_j$
In MFA, the relationship between any variable $z$ and a group $K_j$ is measured in the following way when the variables $v_k$ are standardised:
$$L_g(z, K_j) = \sum_{k \in K_j} m_k \operatorname{cor}^2(z, v_k) = \frac{1}{\lambda_1^j} \sum_{k \in K_j} \operatorname{cor}^2(z, v_k).$$
This measurement is related to the redundancy index [20], which, with our notations, is written as:
$$R_d(z, K_j) = \frac{1}{K_j} \sum_{k \in K_j} \operatorname{cor}^2(z, v_k).$$
These two measurements vary between 0 and 1. They are zero when the variable $z$ does not correlate with any of the variables of group $K_j$.
The value of 1 corresponds to a maximum for the two measurements, but this maximum does not have the same meaning for $R_d$ and $L_g$. For $R_d$,
$$R_d(z, K_j) = 1 \iff \operatorname{cor}^2(z, v_k) = 1 \ \text{for all}\ k \in K_j.$$
This case, where all the variables of group $K_j$ correlate perfectly, does not, of course, correspond to any real situation. For $L_g$,
$$L_g(z, K_j) = 1 \iff z \ \text{is the first principal component of}\ K_j.$$
We find here the basic idea of PCA, according to which the first principal component is a synthetic variable related as far as possible to the initial variables.
3.5. Identifying general variables
Like Carroll's generalised canonical correlation analysis, MFA provides a series of latent "general variables" $z_1, z_2, \dots$ related as far as possible to the various groups of input variables $K_j$, $j = 1, \dots, J$.
It is natural to measure the relationship between a variable $z$ and the set of groups $K_j$ by:
$$\operatorname{Relationship}\Big(z, \bigcup_j K_j\Big) = \sum_j L_g(z, K_j).$$
In MFA, the general variable $z_s$ is thus defined by:
$$\sum_j L_g(z_s, K_j) \ \text{maximum},$$
with the constraints $\operatorname{Var}[z_s] = 1$ and $\operatorname{cor}(z_s, z_t) = 0$ for any $t < s$.
By expressing
$$L_g(z_s, K_j) = \frac{1}{I^2}\, z_s' X_j M_j X_j' z_s,$$
the quantity to be maximised turns out to be:
$$\sum_j L_g(z_s, K_j) = \frac{1}{I^2} \sum_j z_s' X_j M_j X_j' z_s = \frac{1}{I^2}\, z_s' X M X' z_s.$$
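Because the criterion reduces to $\frac{1}{I^2} z_s' X M X' z_s$ under the constraint $\operatorname{Var}[z_s] = 1$, it is maximised by an eigenvector of $\frac{1}{I} X M X'$. The sketch below illustrates this on random stand-in data rather than the orange juice data, with hypothetical function names. It builds $X M^{1/2}$ by dividing each standardised group by $\sqrt{\lambda_1^j}$, solves the eigenproblem with NumPy, and checks that $\sum_j L_g(z_1, K_j) = \lambda_1$.

```python
import numpy as np


def standardise_columns(X):
    """Centre each column and scale it to unit variance (weights 1/I)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def mfa_general_variables(groups):
    """Weighted PCA of the juxtaposed groups (a sketch of Section 3.5).

    groups: list of (I x K_j) arrays measured on the same I individuals.
    Returns the eigenvalues lambda_s (descending), the general variables z_s
    (columns, variance 1) and the principal components F_s = sqrt(lambda_s) z_s.
    """
    blocks = []
    for Xj in groups:
        Xj = standardise_columns(Xj)
        lam1 = np.linalg.eigvalsh(Xj.T @ Xj / Xj.shape[0])[-1]
        blocks.append(Xj / np.sqrt(lam1))       # X_j M_j^{1/2}
    XM = np.hstack(blocks)                       # X M^{1/2}
    I = XM.shape[0]
    lam, U = np.linalg.eigh(XM @ XM.T / I)       # (1/I) X M X'
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    Z = U * np.sqrt(I)                           # z_s'z_s = I, i.e. Var[z_s] = 1
    F = Z * np.sqrt(np.clip(lam, 0, None))       # F_s = sqrt(lambda_s) z_s
    return lam, Z, F


def total_lg(z, groups):
    """Criterion of Section 3.5: sum over groups of L_g(z, K_j)."""
    total = 0.0
    for Xj in groups:
        Xj = standardise_columns(Xj)
        lam1 = np.linalg.eigvalsh(Xj.T @ Xj / Xj.shape[0])[-1]
        r = np.array([np.corrcoef(z, v)[0, 1] for v in Xj.T])
        total += np.sum(r ** 2) / lam1
    return total


rng = np.random.default_rng(1)
groups = [rng.normal(size=(6, k)) for k in (9, 7, 12)]  # random stand-in data
lam, Z, F = mfa_general_variables(groups)
print(lam[0], total_lg(Z[:, 0], groups))  # the two values coincide
```

Any other standardised direction gives a smaller value of the criterion, which is the numerical counterpart of the eigenvector argument.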
From this equation, we deduce that the variables $z_s$ are the standardised principal components of the complete table $X$, the variables $v_k$, $k \in K_j$, $j = 1, \dots, J$, being weighted according to the diagonal matrix $M$. Finally, MFA can be viewed both as a particular canonical correlation analysis and as a weighted PCA of the complete data matrix (with the weights $1/\lambda_1^j$). The associated principal components $F_s$ are obtained by multiplying the general variable $z_s$ by the square root of the eigenvalue $\lambda_s$:
$$F_s = \sqrt{\lambda_s}\, z_s.$$
This process presents two features in comparison with Carroll's generalised canonical correlation analysis:
• it leads to general variables related to the initial variables while ensuring a balance between the groups of variables due to the weights $m_k$;
• deriving from a weighted PCA, the general variables have all the properties of principal components.

3.6. Identifying canonical variables
We associate with each general variable $z_s$ a canonical variable $F_s^j$ in each group $j$. In MFA, this variable is defined in the following way. We deduce from
$$z_s = \frac{1}{\lambda_s}\, \frac{1}{I}\, X M X' z_s$$
the equation
$$F_s = \frac{1}{\lambda_s}\, \frac{1}{I} \sum_{j=1}^{J} \frac{1}{\lambda_1^j}\, X_j X_j' F_s.$$
Each canonical variable $F_s^j$ is then defined as the fragment of $F_s$ corresponding to group $j$:
$$F_s^j = \frac{1}{\lambda_s}\, \frac{1}{\lambda_1^j}\, \frac{1}{I}\, X_j X_j' F_s,$$
so that the following decomposition of $F_s$ is obtained:
$$F_s = \sum_{j=1}^{J} F_s^j.$$
Up to a multiplicative coefficient, the canonical variable $F_s^j$ coincides with the first PLS component in the PLS regression between the principal component $F_s$ and the data table $X_j$. This convergence between the two approaches reinforces them.
This process presents two features in comparison with Carroll's method:
• here again, due to the properties of PLS regression, it leads to canonical variables that are more closely related to the initial variables;
• each $F_s^j$ provides a representation of the individuals, which can be superimposed on that of the PCA ($F_s$) and has the two properties described below.
Property 1: up to the coefficient $1/J$, the value $F_s(i)$ of the principal component $F_s$ for individual $i$ is the mean of the values $F_s^j(i)$ of the canonical variables $F_s^j$ for individual $i$. In practice, the canonical variables $F_s^j$ are dilated by the coefficient $J$, so that $F_s(i)$ is an exact mean. That is to say,
$$F_s(i) = \frac{1}{J} \sum_{j} J F_s^j(i).$$
In MFA terminology, we say that the global image (i.e. from the point of view of the set of groups) of an individual is at the centre of gravity of its partial images (i.e. from the point of view of the various groups).
Property 2: from the definition of the canonical variable, we obtain
$$F_s^j(i) = \frac{1}{\sqrt{\lambda_s}}\, \frac{1}{\lambda_1^j} \sum_{k \in K_j} \operatorname{cor}(v_k, F_s)\, v_k(i).$$
Thus, the partial individual $i^j$ (individual $i$ described by the variables $v_k$ of block $j$) is attracted by the variables $v_k$ for which $\operatorname{cor}(v_k, F_s)\, v_k(i)$ has a high positive value and, conversely, repelled by those for which it has a large negative value.

3.7. Link between MFA and the PLS path modelling
The components of MFA can be obtained by performing a PLS path modelling according to the arrow diagram of Fig. 4.
It can be shown that, by using PLS path modelling with Mode A for the outer estimate of the latent variables and the path weighting scheme for the inner estimate of the latent variables, we obtain the general component $z_1$ and the standardised canonical components $F_1^j$ ([16], section 4.6, and [7]). The next component, $z_2$, and the standardised canonical components $F_2^j$ are obtained by replacing table $XM^{1/2}$ in Fig. 4 by the residuals of the regression of $XM^{1/2}$ on $z_1$. More generally, we obtain the component $z_s$ and the standardised canonical components $F_s^j$ by replacing table $XM^{1/2}$ in Fig. 4 by the residuals of the regression of $XM^{1/2}$ on the general components $z_1, \dots, z_{s-1}$.

Fig. 4. MFA arrow diagram.

4. Application to the orange juice example
4.1. Results from separate analyses
Some results from the separate analyses are presented in Table 2. For the physicochemical measurements and the sensory description, they highlight a main direction of inertia, though these groups cannot be considered as one-dimensional. The hedonic judgements group is clearly multidimensional, with, however, a well-individualised first factor.
The principal components (of the same rank) of the three separate PCAs are closely correlated (Table 3). This correlation is remarkable in the case of hedonic and sensory data (−0.94) and is unusual in practice, even when the number of products is small.
The marked differences between the first eigenvalues of the separate analyses (6.21, 4.74 and 34.03) mean that it is essential to weight the groups within MFA.

Table 3
Correlations between separate PCA factors
                            Physico-chemistry     Sensory attributes
                            F1        F2          F1        F2
Sensory attributes    F1    −0.78     −0.25
                      F2     0.08     −0.74
Hedonic judgements    F1     0.74      0.35       −0.94     −0.01
                      F2    −0.31      0.86        0.09     −0.92
4.2. Results from MFA
4.2.1. Global indicators
The correlations in Table 4 show that the first two factors of MFA correspond to structures common to the three groups ($F_1$ and $F_2$ are highly correlated with the corresponding canonical variables of each group, $F_1^1, F_1^2, F_1^3$ and $F_2^1, F_2^2, F_2^3$). The analysis that follows will be based on them. Moreover, according to the $L_g$ measurements, the first axis corresponds to a direction of very significant inertia for each group.
Table 2
Eigenvalues (= inertia) associated with separate PCA
        Physico-chemistry       Sensory attributes      Hedonic judgements
Axes    Eigenvalue    (%)       Eigenvalue    (%)       Eigenvalue    (%)
1       6.2135        69.04     4.7437        67.77     34.0281       35.45
2       1.4102        15.67     1.3333        19.05     19.3692       20.18
3       1.0457        11.62     0.8198        11.71     15.8922       16.55
4       0.3173         3.53     0.0840         1.20     13.8795       14.46
5       0.0133         0.15     0.0192         0.27     12.8311       13.37
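For standardised variables, the total inertia of a separate PCA equals the number of variables, and with six products at most five eigenvalues are non-zero. The eigenvalues of Table 2 accordingly sum to 9, 7 and 96 (the number of judges). The sketch below reproduces the percentages of Table 2 and the MFA weights $1/\lambda_1^j$ of Section 3.3 from the tabulated eigenvalues. The number of physicochemical variables is read off the sum, not from a separate count in the text.

```python
import numpy as np

# Eigenvalues of the three separate PCAs, copied from Table 2.
eigenvalues = {
    "Physico-chemistry": [6.2135, 1.4102, 1.0457, 0.3173, 0.0133],
    "Sensory attributes": [4.7437, 1.3333, 0.8198, 0.0840, 0.0192],
    "Hedonic judgements": [34.0281, 19.3692, 15.8922, 13.8795, 12.8311],
}

percentages = {}
for group, values in eigenvalues.items():
    lam = np.array(values)
    # With standardised variables, the total inertia (sum of eigenvalues)
    # equals the number of variables in the group.
    percentages[group] = 100 * lam / lam.sum()
    # MFA weight applied to every variable of the group (Section 3.3).
    weight = 1 / lam[0]
    print(f"{group}: total inertia = {lam.sum():.2f}, "
          f"% = {np.round(percentages[group], 2)}, weight 1/lambda_1 = {weight:.4f}")
```

Without the weights, the hedonic group (first eigenvalue about 34) would dominate the first axis of the global analysis; with them, each group's first eigenvalue becomes 1.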
Table 4
Correlations and $L_g$ measurements of relationship between the canonical variables of each group and the general variable of the same rank
                     Physicochemistry   Sensory attributes   Hedonic judgements
Correlation   F1     0.93               0.97                 0.99
              F2     0.82               0.89                 0.99
              F3     0.62               0.41                 0.96
Lg            F1     0.84               0.93                 0.94
              F2     0.24               0.23                 0.58
              F3     0.22               0.07                 0.49
4.2.2. Representation of individuals (= products) and variables
This MFA builds a product space starting from factors common to the sensory, instrumental and hedonic data, in which the influences of these three groups of variables are a priori balanced. These MFA representations (of products and variables) can be read like those of a PCA: the co-ordinates of a product are its values for the common factors; the co-ordinates of a variable are its correlations with these factors.
From the representations in Fig. 5, it follows that:
• the products P1 and P4 have a high level of acidity (and a low pH) and a rather low sugar content with a high (glucose + fructose)/saccharose ratio; they were perceived as sour, bitter, not very sweet, and as not containing very much pulp;
• the products P3, P5 and P6 have a low level of acidity (and a high pH) and a rather high sugar content with a low (glucose + fructose)/saccharose ratio; they were perceived as being not very sour or bitter, sweet and having a rich pulp content;
• the product P2 shows roughly the same characteristics as the three above, except for a small quantity of sugars and a quantity of pulp considered to be very limited.

Fig. 5. First map from MFA of Table 1. (A) Representation of physicochemical and sensory variables. (B) Representation of the 96 hedonic judgements. (C) Representation of the products (where refr. stands for refrigerated and r.t. for at room temperature); P2, P3 and P5 come from Florida.
The individual hedonic judgements are widely scattered, showing a total absence of consensus about the products, and thus the need for a segmented approach to these judgements. In particular, there is no preference for refrigerated juices, which are appreciably more expensive (their "soft" pasteurisation is more difficult).
Let us also point out:
• the opposition between fructose and glucose on the one hand and saccharose and pH on the other, connected with the hydrolysis of saccharose, which is facilitated in an acid medium;
• the correlation between acidity and sourness;
• the absence of correlation between sweetening power and sweetness: a high level of sweetness is associated with a low level of acidity (this refers to the concept of gustatory balance). Thus, the strong correlation between saccharose and sweetness is not due to the direct influence of saccharose but to a high pH.
4.2.3. Preparation for PLS modelling
Since two factors are common to the three groups, the hedonic judgements are now divided into four clusters according to the rule presented in the Methodology (cf. Fig. 1). The physicochemical and sensory variables are subdivided into two groups according to their correlations with the first two factors.
4.3. Results of PLS path modelling
Using the exploratory results of MFA, we now
want to relate the hedonic judgements to the
physicochemical and sensory variables.
4.3.1. Causality models
The first axis of MFA suggests a "correlation" between the physicochemical block (acidity, pH before processing, pH after centrifugation, saccharose, citric acid), the sensory block (typical smell, sweetness, bitterness, sourness), the block of the 16 hedonic judgements positively correlated with $F_1$, and the block of the 44 hedonic judgements negatively correlated with $F_1$. The causality links between these blocks are described in Fig. 6.
Fig. 6. Model for the first two clusters of hedonic judgements.
Fig. 7. Model for the third and fourth clusters of hedonic judgements.

To estimate the coefficients of this model, we used the program LVPLS1.8 proposed by Lohmöller [9], with the following options: variables are standardised, Mode A for the outer estimates of the latent variables, and the factor-weighting scheme for the inner estimates of the latent variables. In this diagram, the numbers located on the arrows connecting the observable variables to the latent variables are the correlations. The numbers appearing between the latent variables are the regression coefficients in the regressions relating one dependent variable to the independent variable(s). The numbers appearing under the endogenous latent variables are the $R^2$ of the regression, simple or multiple, as applicable.
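The estimation run with LVPLS1.8 is not reproduced here. To illustrate what the options above mean, the sketch below implements a textbook version of the iterative PLS path modelling algorithm: Mode A outer estimates, the factorial (factor-weighting) inner scheme, then ordinary least squares between latent variables for the path coefficients and $R^2$. It is a sketch under stated assumptions, not LVPLS. The function name `pls_path`, the convergence rule, the four-block structure and the random data in the usage example are all hypothetical, and the example does not reproduce Figs. 6 and 7.

```python
import numpy as np


def _std(a):
    """Centre and scale (weights 1/I): a vector, or each column of a matrix."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def pls_path(blocks, paths, tol=1e-10, max_iter=500):
    """Sketch of PLS path modelling (Mode A, factorial inner scheme).

    blocks: list of (I x K_b) arrays of manifest variables, same I rows.
    paths:  (B x B) 0/1 array with paths[a, b] = 1 for an arrow a -> b.
    Returns latent scores Y (I x B), loadings (correlations between each
    manifest variable and its latent variable), path coefficients and R^2.
    """
    X = [_std(np.asarray(b, dtype=float)) for b in blocks]
    I = X[0].shape[0]
    adj = (paths + paths.T) > 0                 # linked blocks, either direction
    W = [np.ones(x.shape[1]) for x in X]
    Y = np.column_stack([_std(x @ w) for x, w in zip(X, W)])
    for _ in range(max_iter):
        R = np.corrcoef(Y, rowvar=False)
        # Inner estimate, factorial scheme: adjacent latent variables
        # weighted by their correlation with the current one.
        Z = Y @ (R * adj)
        W_new = []
        for b, x in enumerate(X):
            w = x.T @ Z[:, b] / I               # Mode A: covariances with inner estimate
            W_new.append(w / (x @ w).std())     # rescale so that Var[X_b w_b] = 1
        done = all(np.max(np.abs(wn - wo)) < tol for wn, wo in zip(W_new, W))
        W = W_new
        Y = np.column_stack([x @ w for x, w in zip(X, W)])
        if done:
            break

    coefs, r2 = {}, {}
    for b in range(len(X)):
        pred = np.flatnonzero(paths[:, b])
        if pred.size == 0:
            continue                            # exogenous latent variable
        beta, *_ = np.linalg.lstsq(Y[:, pred], Y[:, b], rcond=None)
        coefs[b] = dict(zip(pred.tolist(), beta.tolist()))
        r2[b] = 1 - (Y[:, b] - Y[:, pred] @ beta).var() / Y[:, b].var()
    loadings = [np.array([np.corrcoef(v, Y[:, b])[0, 1] for v in x.T])
                for b, x in enumerate(X)]
    return Y, loadings, coefs, r2


# Hypothetical data: one common factor t for 6 products, four blocks.
rng = np.random.default_rng(2)
t = rng.normal(size=6)


def make_block(signs):
    """Manifest variables = +/- t plus noise (illustrative only)."""
    return np.outer(t, signs) + 0.3 * rng.normal(size=(6, len(signs)))


blocks = [make_block([1, -1, 1]), make_block([1, 1]),
          make_block([1, 1, 1]), make_block([-1, -1])]
paths = np.zeros((4, 4), dtype=int)
paths[0, 1] = paths[1, 2] = paths[1, 3] = 1    # assumed structure, not Fig. 6
Y, loadings, coefs, r2 = pls_path(blocks, paths)
print(coefs, r2)
```

The returned loadings, path coefficients and $R^2$ correspond to the three kinds of numbers described for the diagrams: correlations on the outer arrows, regression coefficients between latent variables, and $R^2$ under the endogenous latent variables.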
The second axis of the MFA suggests a relationship between the physicochemical block (glucose, fructose, sweetening power as defined above), the sensory block (intensity of smell, intensity of taste, pulp content), the block of the 17 hedonic judgements correlated positively with $F_2$, and the block of the 19 hedonic judgements correlated negatively with $F_2$. The causality links between these blocks are described in Fig. 7. The numbers which appear in this diagram have the same definition as in Fig. 6.
4.3.2. Interpretation of latent variables summarising the clusters of hedonic judgements
Fig. 8 illustrates the latent variables summarising the four clusters. The characterisation of these clusters is quite clear, namely:
• cluster 2 (44 panel members) preferred the Florida orange juices (P2, P3, P5), which were not very acidic and were generally perceived to be sweet and not very sour or bitter; on the other hand, cluster 1 (16 panel members) rejected these Florida juices and preferred the products with a low pH, which were generally perceived to be more sour, more bitter and less sweet;
• cluster 3 (17 panel members) preferred the refrigerated juices, with more pulp, whatever their origin; on the other hand, cluster 4 (19 panel members) preferred the orange juices at room temperature (containing less pulp).

Fig. 8. Latent variables summarising the four clusters of hedonic judgements (where refr. stands for refrigerated and r.t. for at room temperature).

5. Conclusion
Simultaneous analysis of several groups of variables defined for the same individuals should always be organised around the concept of structures common to the groups of variables. These structures are highlighted using linear combinations of the variables of each group, called canonical variables or latent variables.
One can look for these canonical variables using a criterion depending only on the correlations between them; this is the case for canonical correlation analysis. Alternatively, one can look for these variables by including in the criterion the amount of variance each one explains within its own group: this is the case for MFA and PLS path modelling.
The main difference between these two approaches lies in the existence of a causality model
taken into account with PLS path modelling.
PLS path modelling:
• looks for common structures while taking the causal links between the blocks into account, whereas all the groups play the same role in MFA;
• leads to prediction equations, whereas MFA is a purely descriptive method.
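As an illustration of how the causal links enter the computation and how prediction equations follow, here is a minimal sketch of one common variant of the PLS path modelling algorithm, assuming Mode A outer estimation and the centroid inner scheme, followed by an ordinary least-squares regression of an endogenous latent variable on its antecedents. The names `pls_path` and `path_coefficients`, the iteration limits and the synthetic blocks are hypothetical; the sketch also assumes every block is linked to at least one other block.

```python
import numpy as np

def standardise(X):
    """Centre each column and scale it to unit variance (divisor n)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pls_path(blocks, links, max_iter=300, tol=1e-10):
    """Minimal PLS path modelling sketch (Mode A, centroid scheme).

    blocks: list of (n x p_j) arrays of manifest variables.
    links: set of (i, j) pairs meaning block i is an antecedent of block j.
    Returns standardised latent variable scores (n x J) and outer weights.
    """
    Xs = [standardise(X) for X in blocks]
    n, J = Xs[0].shape[0], len(Xs)
    adj = np.zeros((J, J), dtype=bool)
    for i, j in links:
        adj[i, j] = adj[j, i] = True  # the centroid scheme ignores arrow direction
    w = [np.ones(X.shape[1]) for X in Xs]
    for _ in range(max_iter):
        # Outer estimation: each latent variable is a standardised combination of its block.
        Y = np.column_stack([standardise(X @ wj) for X, wj in zip(Xs, w)])
        R = np.corrcoef(Y, rowvar=False)
        # Inner estimation: sum of linked latent variables, signed by their correlation.
        Z_inner = Y @ (np.sign(R) * adj)
        # Mode A: new weights are covariances between manifest variables and the inner estimate.
        w_new = [X.T @ Z_inner[:, j] / n for j, X in enumerate(Xs)]
        change = max(np.max(np.abs(a / np.linalg.norm(a) - b / np.linalg.norm(b)))
                     for a, b in zip(w_new, w))
        w = w_new
        if change < tol:
            break
    Y = np.column_stack([standardise(X @ wj) for X, wj in zip(Xs, w)])
    return Y, w

def path_coefficients(Y, predictors, target):
    """OLS regression of one latent variable on its antecedents (the prediction equation)."""
    beta, *_ = np.linalg.lstsq(Y[:, predictors], Y[:, target], rcond=None)
    return beta

# Usage on synthetic data: physicochemical -> sensory -> hedonic (hypothetical blocks).
rng = np.random.default_rng(1)
f = rng.normal(size=(50, 1))
phys = f + 0.3 * rng.normal(size=(50, 3))
senso = f + 0.3 * rng.normal(size=(50, 4))
hedo = f + 0.5 * rng.normal(size=(50, 5))
Y, w = pls_path([phys, senso, hedo], links={(0, 1), (1, 2)})
beta = path_coefficients(Y, predictors=[1], target=2)
print(beta)
```

Since the latent variables are standardised, no intercept is needed, and with a single antecedent the path coefficient is simply the correlation between the two latent variables. In MFA, by contrast, there is no `links` argument: every group is treated symmetrically.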
Because of these similarities and differences, the two methods complement each other well when a problem calls for both a descriptive and a modelling approach. We have proposed an application that offers an original solution to the classical problem of relating physicochemical variables, sensory attributes and hedonic judgements.
The process consists of:
• using the common factors of MFA to subdivide the groups of variables into one-dimensional subgroups;
• linking the new subgroups through a causal model.
This leads to clusters of hedonic judgements that can be explained very simply by physicochemical and/or sensory variables.
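The first step of this process can be sketched as follows, under two assumptions that are ours rather than the paper's: each variable is assigned to the MFA common factor with which it has the largest absolute correlation, and a subgroup is called one-dimensional when the first eigenvalue of its correlation matrix exceeds 1 while the second does not (a Kaiser-type rule). The function names `split_by_factors` and `is_one_dimensional` and the synthetic data are hypothetical.

```python
import numpy as np

def split_by_factors(X, factors):
    """Assign each column of X to the common factor it is most correlated with (hypothetical rule)."""
    X = np.asarray(X, dtype=float)
    F = np.asarray(factors, dtype=float)
    p, k = X.shape[1], F.shape[1]
    # Correlations between variables (rows) and factors (columns).
    C = np.corrcoef(X, F, rowvar=False)[:p, p:]
    labels = np.argmax(np.abs(C), axis=1)
    return {f: np.flatnonzero(labels == f) for f in range(k) if np.any(labels == f)}

def is_one_dimensional(X):
    """Kaiser-type check (an assumption): first eigenvalue > 1 and second <= 1."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] == 1:
        return True
    eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    return eig[0] > 1 and eig[1] <= 1

# Usage on synthetic data: one group of six variables driven by two factors.
rng = np.random.default_rng(2)
F = rng.normal(size=(40, 2))
X = np.column_stack([F[:, [0]] + 0.3 * rng.normal(size=(40, 3)),
                     F[:, [1]] + 0.3 * rng.normal(size=(40, 3))])
subgroups = split_by_factors(X, F)
print({f: idx.tolist() for f, idx in subgroups.items()})
print(is_one_dimensional(X), [is_one_dimensional(X[:, idx]) for idx in subgroups.values()])
```

Here the full group fails the check because two eigenvalues exceed 1, while each subgroup passes; the subgroups are then the blocks that a causal model can link.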
Acknowledgements
The authors are very grateful to the reviewers, whose suggestions have been most useful. They would also like to thank the students Cécile Lavanant, of the University of Rennes 2, and Sébastien Le Dien, of the University of Paris (ISUP), who, as part of their training at ENSAR, carried out all the data processing used in this paper.
References
[1] J.D. Carroll, A generalization of canonical correlation analysis to three or more sets of variables, Proc. 76th Conv. Am. Psychol. Assoc., 1968, pp. 227–228.
[2] P. Courcoux, Un modèle en classes latentes en cartographie des préférences, 6e Journées Européennes Agro-industries et Méthodes Statistiques, Pau, 2000.
[3] B. Escofier, J. Pagès, Méthode pour l'analyse de plusieurs groupes de variables. Application à la caractérisation des vins rouges du Val de Loire, Revue de Statistique Appliquée XXXI (2) (1983) 43–59.
[4] B. Escofier, J. Pagès, Multiple factor analysis (AFMULT package), Computational Statistics and Data Analysis 18 (1994) 121–140.
[5] B. Escofier, J. Pagès, Analyses factorielles simples et multiples; objectifs, méthodes et interprétation, 3rd edn., Dunod, Paris, 1998, 284 pp.
[6] K. Greenhoff, H.J.H. MacFie, Preference mapping in practice, in: H.J.H. MacFie, D.M.H. Thompson (Eds.), Measurement of Food Preferences, Blackie Academic & Professional, Glasgow, 1994, pp. 137–166.
[7] C. Guinot, J. Latreille, M. Tenenhaus, PLS approach and multiple table analysis. Application to the study of cosmetic habits of women in Ile de France, Chemometrics and Intelligent Laboratory Systems (2001) this issue.
[8] P. Horst, Relations among m sets of variables, Psychometrika 26 (1961) 129–149.
[9] J.-B. Lohmöller, LVPLS Program Manual, Version 1.8, Zentralarchiv für Empirische Sozialforschung, Köln, 1987.
[10] J.-B. Lohmöller, Latent Variables Path Modeling with Partial Least Squares, Physica-Verlag, Heidelberg, 1989.
[11] R.P. McDonald, Factor Analysis and Related Methods, Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[12] H.J. MacFie, N. Bratchell, K. Greenhoff, L.V. Vallis, Designs to balance the effect of order of presentation and first-order carry-over effects in hall tests, Journal of Sensory Studies 4 (1989) 129–148.
[13] J. Pagès, M. Tenenhaus, Analyse factorielle multiple et approche PLS, Revue de Statistique Appliquée (2001) (in press).
[14] C.S. Poulsen, P.M.B. Brockhoff, L. Erichsen, Heterogeneity in consumer preference data—a combined approach, Food Quality and Preference 8 (5/6) (1997) 409–417.
[15] M. Tenenhaus, La régression PLS, Théorie et Pratique, Technip, Paris, 1998.
[16] M. Tenenhaus, L'Approche PLS, Revue de Statistique Appliquée XLVII (2) (1999) 5–40.
[17] P. Valette-Florence, Analyse structurelle comparative des composantes des systèmes de valeurs selon Kahle et Rokeach, Recherche et Applications en Marketing III (1) (1988) 15–34.
[18] P. Valette-Florence, Spécificité et apports des méthodes d'analyse multivariée de la deuxième génération, Recherche et Applications en Marketing III (4) (1988) 23–56.
[19] P. Valette-Florence, Analyse structurelle et analyse typologique: illustration d'une démarche complémentaire, Recherche et Applications en Marketing V (1) (1990) 73–91.
[20] A.L. Van den Wollenberg, Redundancy analysis: an alternative for canonical correlation, Psychometrika 42 (1977) 207–219.
[21] H. Wold, Modeling in complex situations with soft information, Third World Congress of Econometric Society, August 21–26, Toronto, Canada, 1975.
[22] H. Wold, Soft modeling: the basic design and some extensions, in: K.G. Jöreskog, H. Wold (Eds.), Systems Under Indirect Observation, vol. 2, North-Holland, Amsterdam, 1982, pp. 1–54.
[23] H. Wold, Partial least squares, in: S. Kotz, N.L. Johnson (Eds.), Encyclopedia of Statistical Sciences, vol. 6, Wiley, New York, 1985, pp. 581–591.
[24] S. Wold, H. Martens, H. Wold, The multivariate calibration problem in chemistry solved by the PLS method, in: A. Ruhe, B. Kågström (Eds.), Proc. Conf. Matrix Pencils, March 1982, Lecture Notes in Mathematics, Springer-Verlag, Heidelberg, 1983, pp. 286–293.