Multivariate Statistics
Thomas Asendorf, Steffen Unkel
Study sheet 8
Summer term 2017
Exercise 1:
Given is a mean-centred data matrix $X \in \mathbb{R}^{n \times p}$ with $n > p$ and $\operatorname{rank}(X) = r < p$.
Before carrying out independent component analysis (ICA), the data are typically sphered
(whitened) and the dimensionality of the data is reduced from $p$ to $r$ dimensions, that is, $X$
is transformed into a sphered data matrix $\tilde{X} \in \mathbb{R}^{n \times r}$ such that $S_{\tilde{X}} = I_r$, where $S_{\tilde{X}}$ denotes
the covariance matrix of $\tilde{X}$. Find an appropriate sphering transformation of $X$. Hint: You
may have a look at slide 18 of the set of lecture slides “Introduction and Visualisation of
Multivariate Data”.
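A candidate transformation can be checked numerically. The following R sketch shows one possible choice, not necessarily the one intended on the slide: it assumes the covariance matrix uses the divisor n − 1 (as R's cov() does) and builds the sphered matrix from the singular value decomposition of X, keeping only the r non-zero singular values. The simulated data, the seed, the rank tolerance and all variable names are illustrative assumptions.

```r
set.seed(1)
n <- 50; p <- 5; r <- 3

# Illustrative rank-deficient data: p columns generated from r latent columns
Z <- matrix(rnorm(n * r), n, r)
A <- matrix(rnorm(r * p), r, p)
X <- scale(Z %*% A, center = TRUE, scale = FALSE)   # mean-centred, rank r < p

# Singular value decomposition X = U D V'
sv <- svd(X)

# Numerical rank: count singular values above a small tolerance (assumed rule)
tol   <- max(dim(X)) * max(sv$d) * .Machine$double.eps
r_hat <- sum(sv$d > tol)

# Keep the r_hat leading right singular vectors and singular values
V_r <- sv$v[, seq_len(r_hat), drop = FALSE]
d_r <- sv$d[seq_len(r_hat)]

# Candidate sphering: X_tilde = sqrt(n - 1) * X V_r D_r^{-1}
X_tilde <- sqrt(n - 1) * X %*% V_r %*% diag(1 / d_r, nrow = r_hat)

# The sample covariance of X_tilde should be the identity matrix of order r_hat
S_tilde <- cov(X_tilde)
print(round(S_tilde, 6))
```

If the covariance matrix is defined with the divisor n instead, the factor sqrt(n - 1) would have to be replaced by sqrt(n).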
Exercise 2:
Suppose that 1, 2, ..., 7 are regions (enclosed by unbroken lines) in a country arranged as in
the following figure.
Let the distance matrix be constructed by counting the minimum number of boundaries
crossed to pass from region i to region j.
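(a) Verify that the distance matrix is given by
\[
D = \begin{pmatrix}
0 & 1 & 2 & 2 & 2 & 1 & 1 \\
  & 0 & 1 & 2 & 2 & 2 & 1 \\
  &   & 0 & 1 & 2 & 2 & 1 \\
  &   &   & 0 & 1 & 2 & 1 \\
  &   &   &   & 0 & 1 & 1 \\
  &   &   &   &   & 0 & 1 \\
  &   &   &   &   &   & 0
\end{pmatrix} .
\]
(Only the upper triangle is shown; $D$ is symmetric.)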
(b) Show that the distances constructed in this way obey the triangle inequality
$d_{ik} \le d_{ij} + d_{jk}$.
(c) By showing that the eigenvalues of the matrix $B$ (as defined on slide 16 in the lecture
on MDS) are
\[
\lambda_1 = \lambda_2 = 7/2, \quad \lambda_3 = \lambda_4 = 1/2, \quad \lambda_5 = 0, \quad \lambda_6 = -1/7, \quad \lambda_7 = -1,
\]
deduce that this metric is non-Euclidean.
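Parts (b) and (c) can be cross-checked numerically. The R sketch below enters the distance matrix from (a), tests the triangle inequality over all triples, and computes the eigenvalues of B. Since the slide is not reproduced on this sheet, B is assumed to be the usual doubly centred matrix of classical scaling, B = HAH with A = (−d_ij²/2) and H = I − (1/n)11ᵀ. A numerical check does not replace the argument asked for in the exercise.

```r
# Distance matrix from part (a), written out in full (symmetric)
D <- rbind(
  c(0, 1, 2, 2, 2, 1, 1),
  c(1, 0, 1, 2, 2, 2, 1),
  c(2, 1, 0, 1, 2, 2, 1),
  c(2, 2, 1, 0, 1, 2, 1),
  c(2, 2, 2, 1, 0, 1, 1),
  c(1, 2, 2, 2, 1, 0, 1),
  c(1, 1, 1, 1, 1, 1, 0)
)
n <- nrow(D)

# (b) Brute-force check of d_ik <= d_ij + d_jk for every triple (i, j, k)
triangle_ok <- TRUE
for (i in 1:n) for (j in 1:n) for (k in 1:n) {
  if (D[i, k] > D[i, j] + D[j, k]) triangle_ok <- FALSE
}
print(triangle_ok)

# (c) Assumed definition: B = H A H with A = -0.5 * D^2 (elementwise square)
H <- diag(n) - matrix(1 / n, n, n)   # centring matrix
B <- H %*% (-0.5 * D^2) %*% H

# Eigenvalues in decreasing order; compare with 7/2, 7/2, 1/2, 1/2, 0, -1/7, -1
lambda <- eigen(B, symmetric = TRUE)$values
print(round(lambda, 4))
```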
Exercise 3:
Consider 51 objects $O_1, \ldots, O_{51}$ assumed to be arranged along a straight line with the $j$th
object being located at a point with coordinate $j$. Define the similarity $s_{ij}$ between object $i$
and object $j$ as
\[
s_{ij} =
\begin{cases}
9 & \text{if } i = j \\
8 & \text{if } 1 \le |i - j| \le 3 \\
7 & \text{if } 4 \le |i - j| \le 6 \\
\vdots & \\
1 & \text{if } 22 \le |i - j| \le 24 \\
0 & \text{if } |i - j| \ge 25 .
\end{cases}
\]
Convert these similarities into dissimilarities $(\delta_{ij})$ by using $\delta_{ij} = \sqrt{s_{ii} + s_{jj} - 2 s_{ij}}$ and then
apply classical multidimensional scaling (MDS) to the resulting dissimilarity matrix. The
configuration should clearly show the horseshoe effect in MDS solutions. How might such
effects occur?
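A minimal R sketch of the construction follows. It assumes that the elided cases of the similarity continue the visible pattern (the similarity drops by one for every further three units of |i − j|) and takes the dissimilarity as the square root of s_ii + s_jj − 2 s_ij. Classical MDS is carried out with cmdscale() from the stats package; the variable names are illustrative.

```r
n   <- 51
gap <- abs(outer(1:n, 1:n, "-"))          # |i - j| for all pairs

# Similarities: 9 on the diagonal, minus 1 per block of three units, floored at 0
# (assumes the elided cases follow the same step pattern)
S <- pmax(0, 9 - ceiling(gap / 3))

# Dissimilarities: delta_ij = sqrt(s_ii + s_jj - 2 s_ij)
delta <- sqrt(outer(diag(S), diag(S), "+") - 2 * S)

# Classical MDS in two dimensions
fit <- cmdscale(delta, k = 2, eig = TRUE)

# Plot the configuration, joining the objects in their order along the line
plot(fit$points, type = "b", asp = 1,
     xlab = "Dimension 1", ylab = "Dimension 2")
```

Joining the points in the order 1, ..., 51 makes it easy to compare the recovered configuration with the original straight line.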
Exercise 4:
The morse data in the R package smacof consists of confusion percentages between Morse
code signals. The scores are derived from confusion rates on 36 Morse code signals (26 for
the alphabet; 10 for the numbers 0, ..., 9). Each Morse code signal is a sequence of up to five
‘beeps’. The beeps can be short (0.05 sec) or long (0.15 sec), and, when there are two or more
beeps in a signal, they are separated by periods of silence (0.05 sec). More information about
the Morse code signals is contained in the list morsescales.
Rothkopf (1957)¹ asked 598 subjects to judge whether two signals, presented acoustically one
after another, were the same or not. The values are the average percentages with which the
answer “same” was given in each combination of row stimulus i and column stimulus j, where
either i or j was the first signal presented. The values are 1 minus the symmetrized confusion
rates and are thus dissimilarities.
(a) Use the function smacofSym() in the R package smacof to perform non-metric
(ordinal) MDS of the morse data and obtain the plot of the fitted configuration in
two dimensions.
(b) Judge the adequacy of the obtained MDS solution by means of a Shepard diagram and
a scree plot.
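The calls involved might look as follows. This is a sketch that assumes the smacof package is installed in a version where smacofSym() takes a type argument and the plot method accepts plot.type = "confplot" and "Shepard". The scree plot is built by refitting the model for one to six dimensions; the upper limit of six is an arbitrary illustrative choice.

```r
library(smacof)
data(morse)   # dissimilarities between the 36 Morse code signals

# (a) Ordinal (non-metric) MDS in two dimensions
fit <- smacofSym(morse, ndim = 2, type = "ordinal")
plot(fit, plot.type = "confplot")    # fitted configuration

# (b) Shepard diagram: dissimilarities against fitted distances and disparities
plot(fit, plot.type = "Shepard")

# (b) Scree plot: stress of the ordinal solution for 1 to 6 dimensions
dims   <- 1:6
stress <- sapply(dims, function(k) {
  smacofSym(morse, ndim = k, type = "ordinal")$stress
})
plot(dims, stress, type = "b",
     xlab = "Number of dimensions", ylab = "Stress")
```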
¹ Rothkopf, E. Z. (1957): A measure of stimulus similarity and errors in some paired-associate learning tasks,
Journal of Experimental Psychology, Vol. 53, pp. 94-101.
Exercise 5:
The file colas.RData, which is available on the course website, contains the list colas. This
list consists of 10 dissimilarity matrices that are a result of an exploratory experiment to
determine if there is enough difference between cola drinks to map their taste qualities by
three-way MDS. Each of the 10 dissimilarity matrices has arisen from a pair-wise comparison
of ten colas by a subject. The 10 subjects, aged 18-21 years, were all university students – 5
were male, 5 were female. The variables (stimuli) are: Diet Pepsi, RC Cola, Yukon, Dr. Pepper,
Shasta, Coca-Cola, Diet Dr. Pepper, Tab, Pepsi-Cola, Diet Rite. A detailed description of the
experiment and the data is given in Schiffman et al. (1981).² The dissimilarity judgements
were transcribed on a scale from 0 to 100, 0 representing “same” and 100 representing
“different”. The ten dissimilarity matrices are divided by 100.
(a) Load the object colas into your current R workspace.
(b) Fit an INDSCAL model with three dimensions to the cola data. Use the function
smacofIndDiff(colas, ndim = 3, constraint = "indscal") in the R package
smacof. Assess the appropriateness of the fitted model for this data set. Compare the
fit with the one obtained by fitting a three-way MDS model with no individual differences
(smacofIndDiff(colas, ndim = 3, constraint = "identity")).
(c) For the INDSCAL model fitted in (b), visualise and interpret the results of (i) the group
stimulus space, (ii) the subject weights (also named saliences) and (iii) the individual
stimulus spaces.
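The workflow for (a) to (c) might be organised as in the following sketch. The helper fit_threeway() is a hypothetical convenience wrapper, not part of smacof. The code assumes that colas.RData lies in the working directory and that the fitted object exposes the components stress, gspace, cweights and conf, as smacofIndDiff() objects do in recent smacof versions.

```r
library(smacof)

# Hypothetical helper: fit the INDSCAL model and the identity-constrained
# model (no individual differences) to a list of dissimilarity matrices
fit_threeway <- function(diss_list, ndim = 3) {
  list(
    indscal  = smacofIndDiff(diss_list, ndim = ndim, constraint = "indscal"),
    identity = smacofIndDiff(diss_list, ndim = ndim, constraint = "identity")
  )
}

if (file.exists("colas.RData")) {
  load("colas.RData")                        # (a) creates the list `colas`
  fits <- fit_threeway(colas, ndim = 3)      # (b) both three-way models

  # (b) Compare the stress values of the two fits
  print(sapply(fits, function(f) f$stress))

  # (c) Group stimulus space (first two dimensions by default)
  plot(fits$indscal)

  # (c) Subject weights: one diagonal weight matrix per subject
  print(fits$indscal$cweights)

  # (c) Individual stimulus spaces: one configuration per subject
  str(fits$indscal$conf, max.level = 1)
}
```

Wrapping the two calls in one function keeps the comparison in (b) to a single line and lets the same code be tried on other lists of dissimilarity matrices.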
² Schiffman, S. S., Reynolds, M. L. and Young, F. W. (1981): Introduction to Multidimensional Scaling:
Theory, Methods and Applications, Academic Press, Section 3.
Date: 23 June 2017