Multivariate Statistics
Thomas Asendorf, Steffen Unkel
Study sheet 8, Summer term 2017
Date: 23 June 2017

Exercise 1: Given is a mean-centred data matrix $X \in \mathbb{R}^{n \times p}$ with $n > p$ and $\operatorname{rank}(X) = r < p$. Before carrying out independent component analysis (ICA), the data are typically sphered (whitened) and the dimensionality of the data is reduced from $p$ to $r$ dimensions; that is, $X$ is transformed into a sphered data matrix $\tilde{X} \in \mathbb{R}^{n \times r}$ such that $S_{\tilde{X}} = I_r$, where $S_{\tilde{X}}$ denotes the covariance matrix of $\tilde{X}$. Find an appropriate sphering transformation of $X$.
Hint: You may have a look at slide 18 of the set of lecture slides "Introduction and Visualisation of Multivariate Data".

Exercise 2: Suppose that 1, 2, ..., 7 are regions (enclosed by unbroken lines) in a country arranged as in the following figure. Let the distance matrix be constructed by counting the minimum number of boundaries crossed to pass from region i to region j.
(a) Verify that the distance matrix is given by (only the lower triangle of the symmetric matrix is shown)

$$
D = \begin{pmatrix}
0 &   &   &   &   &   &   \\
1 & 0 &   &   &   &   &   \\
2 & 1 & 0 &   &   &   &   \\
2 & 2 & 1 & 0 &   &   &   \\
2 & 2 & 2 & 1 & 0 &   &   \\
1 & 2 & 2 & 2 & 1 & 0 &   \\
1 & 1 & 1 & 1 & 1 & 1 & 0
\end{pmatrix}.
$$

(b) Show that the distances constructed in this way obey the triangle inequality $d_{ik} \le d_{ij} + d_{jk}$.
(c) By showing that the eigenvalues of the matrix $B$ (as defined on slide 16 in the lecture on MDS) are $\lambda_1 = \lambda_2 = 7/2$, $\lambda_3 = \lambda_4 = 1/2$, $\lambda_5 = 0$, $\lambda_6 = -1/7$, $\lambda_7 = -1$, deduce that this metric is non-Euclidean.

Exercise 3: Consider 51 objects $O_1, \dots, O_{51}$ assumed to be arranged along a straight line, with the $j$th object located at the point with coordinate $j$. Define the similarity $s_{ij}$ between object $i$ and object $j$ as

$$
s_{ij} = \begin{cases}
9 & \text{if } i = j, \\
8 & \text{if } 1 \le |i-j| \le 3, \\
7 & \text{if } 4 \le |i-j| \le 6, \\
\ \vdots & \qquad \vdots \\
1 & \text{if } 22 \le |i-j| \le 24, \\
0 & \text{if } |i-j| \ge 25.
\end{cases}
$$

Convert these similarities into dissimilarities $\delta_{ij}$ by using $\delta_{ij} = \sqrt{s_{ii} + s_{jj} - 2 s_{ij}}$ and then apply classical multidimensional scaling (MDS) to the resulting dissimilarity matrix. The configuration should clearly show the horseshoe effect in MDS solutions. How might such effects occur?
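For Exercise 1, a minimal NumPy sketch of one possible sphering transformation (PCA whitening). With the eigendecomposition $S_X = V \Lambda V^\top$, keep the $r$ eigenvectors $V_r$ belonging to the non-zero eigenvalues $\Lambda_r$ and set $\tilde{X} = X V_r \Lambda_r^{-1/2}$; then $S_{\tilde{X}} = \Lambda_r^{-1/2} V_r^\top S_X V_r \Lambda_r^{-1/2} = I_r$. The divisor $n-1$ in the covariance matrix and the tolerance used to decide which eigenvalues are "zero" are assumptions; the slide referenced in the hint may use a different convention. The solution is not unique: $\tilde{X} Q$ is also sphered for any orthogonal $r \times r$ matrix $Q$.

```python
import numpy as np

def sphere(X, tol=1e-10):
    """PCA-based sphering of a mean-centred n x p matrix X of rank r < p.

    Assumes the covariance convention S = X^T X / (n - 1); with the 1/n
    convention, replace n - 1 by n. Returns the n x r matrix
    X_tilde = X V_r Lambda_r^{-1/2}, whose covariance matrix is I_r.
    """
    n = X.shape[0]
    S = X.T @ X / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    keep = eigvals > tol * eigvals.max()          # drop the p - r (numerically) zero eigenvalues
    lam_r, V_r = eigvals[keep], eigvecs[:, keep]
    return X @ V_r / np.sqrt(lam_r)               # scale column a by lambda_a^{-1/2}

# Example with synthetic rank-deficient data (p = 4, r = 2).
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
X = Z @ rng.standard_normal((2, 4))
X -= X.mean(axis=0)                               # mean-centre the columns
X_tilde = sphere(X)
S_tilde = X_tilde.T @ X_tilde / (X.shape[0] - 1)
print(X_tilde.shape)                    # (100, 2)
print(np.allclose(S_tilde, np.eye(2)))  # True
```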
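For Exercise 2, the following NumPy sketch types in $D$ from part (a), checks the triangle inequality by brute force over all triples, and computes the eigenvalues of $B = -\tfrac{1}{2} J A J$ with $A = (d_{ij}^2)$ and centring matrix $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$. That this is exactly the definition on slide 16 is an assumption (it is the standard classical-MDS construction). A numerical check is not a proof, but it is a quick way to confirm the values stated in (c).

```python
import itertools
import numpy as np

# Lower triangle of D from part (a); row/column i corresponds to region i + 1.
lower = [
    [0],
    [1, 0],
    [2, 1, 0],
    [2, 2, 1, 0],
    [2, 2, 2, 1, 0],
    [1, 2, 2, 2, 1, 0],
    [1, 1, 1, 1, 1, 1, 0],
]
n = len(lower)
D = np.zeros((n, n))
for i, row in enumerate(lower):
    D[i, : len(row)] = row
D = D + D.T  # symmetrise; the diagonal is zero, so nothing is doubled

# (b) Brute-force check of d_ik <= d_ij + d_jk over all ordered triples (i, j, k).
triangle_ok = all(
    D[i, k] <= D[i, j] + D[j, k]
    for i, j, k in itertools.product(range(n), repeat=3)
)

# (c) B = -1/2 * J A J with A = (d_ij^2) and centring matrix J = I - 11^T / n.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigenvalues = np.sort(np.linalg.eigvalsh(B))[::-1]  # descending order

print(triangle_ok)               # True
print(np.round(eigenvalues, 4))  # expected: 3.5, 3.5, 0.5, 0.5, 0, -0.1429, -1 (up to rounding)
```

The negative eigenvalues mean that $B$ is not positive semi-definite, so no Euclidean configuration reproduces $D$ exactly. For (b), note also that every off-diagonal distance is 1 or 2, so for $j \notin \{i, k\}$ we have $d_{ij} + d_{jk} \ge 2 \ge d_{ik}$.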
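For Exercise 3, a NumPy sketch of classical MDS on these dissimilarities. It uses the closed form $s_{ij} = \max\{0,\ 9 - \lceil |i-j|/3 \rceil\}$, which reproduces the similarity table above (9 on the diagonal, one unit less for every further block of three positions, and 0 from $|i-j| = 25$ on). Classical MDS double-centres $A = (-\tfrac{1}{2}\delta_{ij}^2)$ to $B = JAJ$ and takes the two leading eigenvectors scaled by the square roots of their eigenvalues. Plotting is left out; a scatter plot of `coords[:, 0]` against `coords[:, 1]` is expected to show the points bending into a horseshoe rather than lying on a straight line.

```python
import numpy as np

n_obj = 51
pos = np.arange(1, n_obj + 1)                  # coordinates 1, ..., 51
gap = np.abs(pos[:, None] - pos[None, :])      # |i - j|

# Similarities: 9 on the diagonal, one unit less per further block of three, 0 from |i - j| = 25 on.
S = np.maximum(0, 9 - np.ceil(gap / 3))

# Dissimilarities delta_ij = sqrt(s_ii + s_jj - 2 s_ij).
s_diag = np.diag(S)
delta = np.sqrt(s_diag[:, None] + s_diag[None, :] - 2 * S)

# Classical MDS: double-centre A = -1/2 * delta^2 and take the two leading eigenpairs.
J = np.eye(n_obj) - np.ones((n_obj, n_obj)) / n_obj
B = J @ (-0.5 * delta ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]              # sort eigenpairs in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
coords = eigvecs[:, :2] * np.sqrt(eigvals[:2])  # 51 x 2 configuration

print(coords[:5].round(3))
```

Note that $\delta_{ij} = \sqrt{18}$ for every pair with $|i-j| \ge 25$: the dissimilarities saturate, so all sufficiently distant objects look equally far apart. This loss of information about large distances is a commonly cited source of horseshoe-shaped configurations.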
Exercise 4: The morse data in the R package smacof consist of confusion percentages between Morse code signals. The scores are derived from confusion rates on 36 Morse code signals (26 for the letters of the alphabet and 10 for the digits 0, ..., 9). Each Morse code signal is a sequence of up to five "beeps". The beeps can be short (0.05 sec) or long (0.15 sec), and, when there are two or more beeps in a signal, they are separated by periods of silence (0.05 sec). More information about the Morse code signals is contained in the list morsescales. Rothkopf (1957) [1] asked 598 subjects to judge whether two signals, presented acoustically one after another, were the same or not. The confusion rates are the average percentages with which the answer "same" was given for each combination of row stimulus i and column stimulus j, where either i or j was the first signal presented. The values in morse are 1 minus the symmetrized confusion rates and are thus dissimilarities.
(a) Use the function smacofSym() in the R package smacof to perform non-metric (ordinal) MDS of the morse data and obtain the plot of the fitted configuration in two dimensions.
(b) Judge the adequacy of the obtained MDS solution by means of a Shepard diagram and a scree plot.

[1] Rothkopf, E. Z. (1957): A measure of stimulus similarity and errors in some paired-associate learning, Journal of Experimental Psychology, Vol. 53, pp. 94-101.

Exercise 5: The file colas.RData, which is available on the course website, contains the list colas. This list consists of 10 dissimilarity matrices resulting from an exploratory experiment to determine whether cola drinks differ enough to map their taste qualities by three-way MDS. Each of the 10 dissimilarity matrices arose from a pairwise comparison of ten colas by one subject. The 10 subjects, aged 18 to 21 years, were all university students; 5 were male and 5 were female. The variables (stimuli) are: Diet Pepsi, RC Cola, Yukon, Dr. Pepper, Shasta, Coca-Cola, Diet Dr. Pepper, Tab, Pepsi-Cola, Diet Rite. A detailed description of the experiment and the data is given in Schiffman et al. (1981) [2]. The dissimilarity judgements were transcribed on a scale from 0 to 100, with 0 representing "same" and 100 representing "different". The ten dissimilarity matrices have been divided by 100.
(a) Load the object colas into your current R workspace.
(b) Fit an INDSCAL model with three dimensions to the cola data. Use the function smacofIndDiff(colas, ndim = 3, constraint = "indscal") in the R package smacof. Assess the appropriateness of the fitted model for this data set. Compare the fit with the one obtained by fitting a three-way MDS model with no individual differences (smacofIndDiff(colas, ndim = 3, constraint = "identity")).
(c) For the INDSCAL model fitted in (b), visualise and interpret
i. the group stimulus space,
ii. the subject weights (also called saliences), and
iii. the individual stimulus spaces.

[2] Schiffman, S. S., Reynolds, M. L. and Young, F. W. (1981): Introduction to Multidimensional Scaling: Theory, Methods and Applications, Academic Press, Section 3.
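For Exercise 4(b), the Shepard diagram is obtained in R from the smacof fit; the NumPy sketch below only illustrates the quantities such a diagram displays for an ordinal solution. The fitted distances $d_{ij}$ are regressed monotonically on the dissimilarities $\delta_{ij}$ with the pool-adjacent-violators algorithm, giving disparities $\hat{d}_{ij}$, and Kruskal's stress-1, $\sqrt{\sum (d_{ij} - \hat{d}_{ij})^2 / \sum d_{ij}^2}$, measures the misfit. Tied dissimilarities are handled with the primary approach (ties ordered by distance). The helper names are hypothetical, and smacof's own stress normalisation may differ in detail.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []  # each block: [block mean, block size]
    for value in y:
        blocks.append([float(value), 1])
        # Merge with the previous block while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(w, m) for m, w in blocks])

def shepard_and_stress(delta, X):
    """Hypothetical helper: Shepard-diagram data and Kruskal's stress-1 for configuration X.

    Returns (dissimilarities, distances, disparities, stress-1) over pairs i < j.
    """
    iu = np.triu_indices(delta.shape[0], k=1)
    dis = delta[iu]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)[iu]
    # Primary approach to ties: sort by dissimilarity, then by distance within ties.
    order = np.lexsort((dist, dis))
    dhat = np.empty_like(dist)
    dhat[order] = pava(dist[order])
    stress1 = np.sqrt(np.sum((dist - dhat) ** 2) / np.sum(dist ** 2))
    return dis, dist, dhat, stress1

# Toy example: dissimilarities that are a monotone transform of the distances,
# so the ordinal fit is perfect.
X = np.array([[0.0], [1.0], [3.0], [6.0]])
delta = (X - X.T) ** 2
dis, dist, dhat, s1 = shepard_and_stress(delta, X)
print(s1)  # 0.0
```

In a Shepard diagram, the points $(\delta_{ij}, d_{ij})$ are plotted together with the step function through $(\delta_{ij}, \hat{d}_{ij})$; a scree plot shows the stress of solutions with 1, 2, 3, ... dimensions, where an elbow suggests an adequate dimensionality.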
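For Exercise 5, a small NumPy sketch of the INDSCAL model structure, using made-up numbers rather than the cola data. It assumes the parametrisation $X_k = Z C_k$, with group stimulus space $Z$ and a diagonal weight matrix $C_k$ for subject $k$ (in the classical Carroll-Chang notation the saliences are $w_{ka} = c_{ka}^2$). Each individual stimulus space is then the group space with its axes stretched or shrunk by that subject's weights, and the corresponding distances are weighted Euclidean distances. The `identity` constraint in (b) corresponds to $C_k = I$ for every subject, so all subjects share the group space.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stimuli, n_dim, n_subjects = 10, 3, 10
Z = rng.standard_normal((n_stimuli, n_dim))          # hypothetical group stimulus space
C = rng.uniform(0.2, 1.5, size=(n_subjects, n_dim))  # hypothetical weights c_ka (one row per subject)

def individual_space(Z, c):
    """Individual configuration X_k = Z C_k with C_k = diag(c)."""
    return Z * c                                     # scales column a of Z by c[a]

def weighted_distances(Z, c):
    """Weighted Euclidean distances d_ij = sqrt(sum_a c_a^2 (z_ia - z_ja)^2)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt(np.sum((c ** 2) * diff ** 2, axis=-1))

# Distances in subject 1's individual space equal the weighted distances in the group space.
X_1 = individual_space(Z, C[0])
d_direct = np.linalg.norm(X_1[:, None, :] - X_1[None, :, :], axis=-1)
print(np.allclose(d_direct, weighted_distances(Z, C[0])))  # True
```

A subject with a large weight on dimension $a$ perceives stimuli as more different along that dimension, which is what the subject-weight plot in (c) ii is meant to reveal.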