Evaluation of Card Sorting Experiments Using Factor Analysis

Master Thesis

Evaluation of Card Sorting Experiments Using Factor Analysis

Author: Nishchal Narula
Matriculation Number: 6643461
Email: [email protected]
Supervisor: Prof. Dr. Gerd Szwillus
First Reviewer: Prof. Dr. Gerd Szwillus
Second Reviewer: Prof. Dr.-Ing. Reinhard Keil

A thesis submitted in fulfillment of the requirements
for the degree of Master of Science
in the
Faculty of Electrical Engineering, Computer Science and Mathematics
Institute of Computer Science
Universität Paderborn

October 2014
Declaration of Authorship

I, Nishchal Narula, declare that this thesis titled, "Evaluation of Card Sorting Experiments Using Factor Analysis" and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:
Universität Paderborn
Abstract
Faculty of Electrical Engineering, Computer Science and Mathematics
Institute of Computer Science
Master of Science
Evaluation of Card Sorting Experiments Using Factor Analysis
by Nishchal Narula
This is an era of science and technology. We live in a world where everyone expects the best user experience from the gadgets available today. People want information delivered to their smartphones, laptops, etc. without any delay. It is hard to organize and manage such huge amounts of data. This is the task of information architects, who use insights from end users to structure websites into usable classifications. It is regarded as a difficult task because every user's mental model can be different. In this thesis, a method known as card sorting is described in detail. Card sorting is a reliable method used to collect data from users. Once the data is collected, it is analyzed with the help of different techniques. Hierarchical cluster analysis is one such technique; the groupings of cards depend on the similarities or dissimilarities between them. Factor analysis is the second technique discussed in this thesis. The results from hierarchical cluster analysis and factor analysis are evaluated, analyzed and compared. Furthermore, appropriate suggestions for tackling problematic cards are given, and productive ways in which a researcher can conduct card sorting experiments are discussed. To summarize, it is important to design a website according to the perception of an end user. The result is a well-categorized and user-friendly website.
Keywords: Card Sorting, Hierarchical Cluster Analysis, Factor Analysis
Acknowledgements
First and foremost, I would like to sincerely express my gratitude to Prof. Dr. Gerd Szwillus for giving me an interesting thesis topic in the field of Human-Machine Interaction. His support, good guidance and feedback were of great help to me. I would like to thank the students of German universities and my friends, who took time out of their busy schedules to perform card sorting experiments. Their suggestions helped me a lot.

Last but not least, I would like to thank my family. Their love and support helped me to overcome all the difficulties during my Master thesis.

My work was in parallel with the work of Mr. Bhavesh Talreja, who was also a Master's (in Informatik) student at the University of Paderborn. He conducted and evaluated card sorting experiments with the help of Multi-Dimensional Scaling, whereas I evaluated the experiments with the help of Factor Analysis. The headings of the first two chapters and the last two sections of the fifth chapter of this thesis are similar to the headings of his thesis because they form a relevant part of any card sorting experiment. I have also tried to highlight some extra information that is not included in his thesis. I would also like to extend my gratitude to Mr. Bhavesh Talreja, as his thesis was also a good support to me.
Contents

Declaration of Authorship
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Abbreviations
1 An Introduction to Card Sorting
  1.1 Motivation
  1.2 Structure of a Card
  1.3 Types of Card Sorting
  1.4 Accomplish Card Sorting
  1.5 Online Card Sorting
  1.6 Advantages and Disadvantages of Card Sorting
      1.6.1 Advantages
      1.6.2 Disadvantages
  1.7 Structure of the Thesis

2 Evaluation of Card Sorting Experiments
  2.1 Cluster
  2.2 Data Analysis
      2.2.1 Pattern Matrix
      2.2.2 Proximity Matrix
  2.3 Selecting Attributes
  2.4 Cluster Analysis - Classification
      2.4.1 Distinctions
      2.4.2 Hierarchical Clustering
      2.4.3 How to interpret a Dendrogram
      2.4.4 K-means Clustering
  2.5 Summary
3 An Insight to Factor Analysis
  3.1 An Introduction to Factor Analysis
  3.2 Types of Factor Analysis
      3.2.1 Exploratory Factor Analysis
      3.2.2 Confirmatory Factor Analysis
  3.3 Factor Analysis Model
  3.4 Types of Factoring
  3.5 Data Modes of Factor Analysis
  3.6 Factor Analysis Protocol
  3.7 Summary
4 Evaluation of Card Sorting with Factor Analysis
  4.1 IBM SPSS Guide
      4.1.1 Defining a variable
      4.1.2 Conduct Factor Analysis
  4.2 Card Sorting Experiments
      4.2.1 Eatables Website
      4.2.2 Entertainment Website
      4.2.3 Automobile Website
      4.2.4 Health Website
  4.3 Summary
5 Comparison of the Results
  5.1 Comparison of Results
      5.1.1 Eatables
      5.1.2 Entertainment
      5.1.3 Automobile
      5.1.4 Health
  5.2 How to tackle Problematic Cards
  5.3 General recommendations for researchers
  5.4 Summary
6 Conclusion
  6.1 Future Work

A Analysis Tools
  A.1 Orange
  A.2 IBM SPSS

Bibliography
List of Figures

1.1 Representation of cards in a card sorting experiment
1.2 A Sample Card
1.3 An Open Card Sort in progress on WeCaSo

2.1 Different Types of Clusters
2.2 Euclidean Distance between the points xi and xj
2.3 Cluster Analysis
2.4 Merging nearest clusters
2.5 Different types of distances between clusters
2.6 Clustering in Single, Complete and Group Average Linkage
2.7 Dendrogram - Group Average Linkage Hierarchical Clustering
2.8 K-means Clustering

3.1 A Two-factor model
3.2 Exploratory Factor Analysis Protocol
3.3 View of pillars (unrotated solution)
3.4 View of pillars (rotated solution)

4.1 Data View in SPSS with no variables
4.2 Value Labels Dialog Box
4.3 Missing Values Dialog Box
4.4 Measure options
4.5 Variable view after definition of one variable
4.6 Conduct Reliability Analysis
4.7 Reliability Analysis Output
4.8 Conduct Factor Analysis
4.9 Factor Analysis Dialog Box
4.10 Factor Analysis: Descriptives Dialog Box
4.11 Factor Analysis: Extraction Dialog Box
4.12 Factor Analysis: Rotation Dialog Box
4.13 Factor Analysis: Scores Dialog Box
4.14 Factor Analysis: Options Dialog Box
4.15 Descriptives Statistics Output
4.16 Correlation Matrix Output
4.17 Communalities Output
4.18 Total Variance Explained Output
4.19 Scree Plot Output
4.20 Unrotated Component Matrix Output
4.21 Rotated Component Matrix Output
4.22 Component Plot in Rotated Space Output
4.23 Reliability Analysis Output
4.24 Correlation Matrix Output
4.25 Communalities Output
4.26 Total Variance Explained Output
4.27 Scree Plot Output
4.28 Rotated Component Matrix Output
4.29 Reliability Analysis Output
4.30 1st Half of Correlation Matrix Output
4.31 2nd Half of Correlation Matrix Output
4.32 Communalities Output
4.33 Total Variance Explained Output
4.34 Scree Plot Output
4.35 Rotated Component Matrix Output - 4 Extracted Factors
4.36 Rotated Component Matrix Output - 7 Extracted Factors
4.37 Reliability Analysis Output
4.38 1st Half of Correlation Matrix Output
4.39 2nd Half of Correlation Matrix Output
4.40 Communalities Output
4.41 Total Variance Explained Output
4.42 Scree Plot Output
4.43 Rotated Component Matrix Output - 4 Extracted Factors

5.1 Single Linkage Hierarchical Clustering for 30 cards
5.2 Group Average Linkage Hierarchical Clustering for 30 cards
5.3 Complete Linkage Hierarchical Clustering for 30 cards
5.4 Complete Linkage Hierarchical Clustering for 24 cards
5.5 Complete Linkage Hierarchical Clustering for 32 cards
5.6 Complete Linkage Hierarchical Clustering for 28 cards
List of Tables

2.1 Pattern Matrix for four cars with their types and power
2.2 Proximity Matrix with dissimilarities between 6 fruits

4.1 List of cards - eatables
4.2 List of cards - entertainment genres
4.3 List of cards - Automobiles Website
4.4 List of cards - Health Website
Abbreviations

HCA     Hierarchical Cluster Analysis
IBM     International Business Machines Corporation
SPSS    Statistical Package for the Social Sciences
WeCaSo  Web-based Card Sorting
UPGMA   Unweighted Pair Group Method with Arithmetic mean
etc.    et cetera
This thesis is dedicated to my parents for their endless love and
support. They have always encouraged me to follow the right path
in life and also motivated me to chase my dreams.
Chapter 1
An Introduction to Card Sorting
“Expect the best.
Prepare for the worst.
Capitalize on what comes.”
ZIG ZIGLAR
In today's world, it is hard to believe the accuracy behind the management and organization of huge amounts of data on networking websites such as Google+, Facebook, LinkedIn, etc. One of the prime reasons behind this achievement is the immense technological advancement in the field of Computer Science. Software designers and data experts strive hard to accomplish challenging tasks. The challenge is to search for and filter out the meaningful and required patterns in data. It becomes extremely difficult for users to find the information they want if data sets are not organized in a proper manner. Therefore, it is of utmost importance to organize data sets into appropriate categories, which is usually done after understanding the relationships between items [1].
An example will help here. We all check our emails, visit our accounts on social networking websites, visit new websites, search for information on Google, etc. Now, consider the website of the University of Paderborn. If a user is keen to know about the courses offered at the University of Paderborn, he refers to the Studies page. He will not navigate to the Research or Faculties page. Similarly, if he wants to know about the University's profile, he will visit the About the University page. It is important for data items to be organized into synchronized and appropriate categories. This can be efficiently managed by professionals such as domain experts; however, the perception of an end user can be different. The classifications of an expert and the end user can differ because their mental models are not identical [1][2].
According to Donna Spencer et al., "Card Sorting is a great, reliable, inexpensive method for finding patterns in how users would expect to find content or functionality" [1]. In Figure 1.1, one can see the general representation of cards after sorting. The figure has been taken from [3]. The cards which are grouped together (whether two, three, four or five) belong to one category. Here, the cards have been sorted by hand. However, one can also perform card sorting experiments with the help of online sorting software [1]. This will be discussed in one of the later sections. It is important for every website to be properly structured. The result we get from card sorting experiments can be used as an input for a website design process. A well-categorized website represents its own usage. If it is not convenient for a user to navigate through a website, then it might be the case that he or she stops using it [1].
Figure 1.1: Representation of cards in a card sorting experiment.
1.1 Motivation
As we have just discussed, the outcome of card sorting experiments can be used as an input for a website design process; it can also assist us in resolving many questions which might arise during the design phase. While designing the structure of a website, information architects are the best judges because they are trained professionals in that area and they know how things work. But even then, there can be cases where the website is not user-friendly because some users may not like the overall structure of the website. They also might disagree with the contents and groupings of some elements. In such cases, card sorting can clarify doubts regarding various scenarios, such as: the number of potential categories, how users want the information in a website to be structured, what their exact requirements are, and how these needs are similar among different users [1] [4].
Card sorting helps people to categorize cards or items semantically based on their own perspectives. It is also reliable when it includes, as participants in the card sorting experiments, those users who will eventually use the product or the website. In this case, we can come to know exactly the expectations and requirements of users. So, card sorting is easy to understand and simple to operate. It is quite cheap to use because the only valuable thing you devote to it is the preparation time. Once everything is prepared, it is easy to apply, and we can increase or decrease the number of users depending upon the needs. It also avoids questionnaires, as a user can do whatever he thinks [5].
1.2 Structure of a Card
The cards used in card sorting experiments should have some structure. The things written on a card should be clear and easily understandable. Figure 1.2 shows the general representation of a card. The idea of the figure has been taken from [3]. The description of the card is as follows [6] [7]:
Figure 1.2: A Sample Card.
• Card Number
It is the unique identity of a card. In some cases, text is also used as an identification.

• Card Title
The title is like a main heading and should describe the essence of a card. It is usually written in one or two words. It should be self-explanatory.

• Card Description
While setting up the cards for an experiment, if any title is difficult to understand, then some explanation about the card can be added to help a user understand the card better.

• Pictures
In some cases, cards may contain only text, only pictures, or a combination of both. Having a picture helps a user to understand the card without reading its description. When the participants are from different nationalities, it becomes quite easy to conduct the experiments because pictures can explain almost everything irrespective of language. Also, people with some learning disorders, as well as children, can be included in such experiments.
1.3 Types of Card Sorting
Data experts usually decide the type of card sorting to conduct. It depends on the
requirements and limitations of the product. Users are provided with information based
on these requirements and are then asked to perform the respective experiments. The
explanation of the variants of card sorting is as follows [7]:
1. Open Card Sorting
In this type of card sorting, cards are given to participants and they are asked to categorize these cards into appropriate groups. The names of the groups are not pre-defined; participants have the liberty to decide the names. It is quite interesting to see the naming of the groups because every participant decides it based on his or her mental model. Eventually, data experts can pick these group names to decide the final categories, if they want. This technique is useful when we have to design a new structure for a website [1].
2. Closed Card Sorting
In this type of card sorting, cards are given to participants along with the predefined categories and then they are asked to sort the cards under the given categories. The names of the categories are decided by data experts according to
requirements. This technique helps to add new information to a structure (e.g.
website) or to evaluate the existing information in a structure. Additionally, it
helps to improve the usability of a website [1].
3. Hybrid Card Sorting
We can call it a closed card sorting with the benefits of an open one. In this type,
cards are given to participants along with the pre-defined categories and then they
are asked to sort the cards under the given categories. But, if the participants are
not satisfied with any of the given categories, they have the freedom to change
them [8].
Open and closed card sorting are the main types discussed in this thesis; hybrid card sorting is not used in the conducted experiments.
1.4 Accomplish Card Sorting
There are certain key things one should take care of before performing card sorting experiments. They are as follows [1] [7]:

• Content Selection
Determining what to put on the cards is one of the main tasks. The naming criteria should satisfy all of the requirements. Research about content can be done on the internet and by discussing business processes with the people who will actually use the product. If proper planning is done, we can also include some future content, so that the product will be able to meet future requirements as well.

• Participant Selection
The number of people involved in card sorting experiments depends on the complexity of the product. A decent number is considered to be 30. Experiments can be carried out either in groups or individually. Normally, we get richer data if card sorting is performed in groups; on the other hand, arranging individuals is much easier than scheduling groups. So, both types have their own plus points and disadvantages.

• Number and preparation of cards
The number of cards involved in card sorting experiments again depends on the complexity of the product. Normally, a number between 30 and 100 is acceptable. The labels on the cards should be precise enough for a participant to understand. In some cases, a small description about a card is added, and if necessary, a picture can also be added which describes the card. Every card has a unique identity known as the card number.

• Sort Selection
The type of card sorting to perform depends on the needs of the product. A good plan would be to conduct both sorts at different stages of a product's life cycle. An open sort is appropriate at the stage when the final categories are not yet decided. Once you are sure about the content, a closed sort can be conducted to check the correctness of all categories. At this point, a hybrid sort [8] is also helpful.
Card sorting experiments are generally performed as follows [1] [7]:

1. Invite the participants, and while inviting them, tell them that they just have to perform some simple tasks.

2. Hand over the same set of cards to every participant and ask them to sort the cards into appropriate categories.

3. In case of an open sort, the subjects should create their own categories according to their mental models.

4. If they cannot find a suitable category for any card, they should leave that card and proceed further.

5. In the end, they should hand over the data to the administrator so that he/she can analyze it.
1.5 Online Card Sorting
Some years ago, card sorting used to be done with the help of physical paper cards. Now, online card sorting has gained much popularity. Online sorting tools like OptimalSort [9], WebSort [10], CardZort (for Windows OS) [11], etc. are used for card sorting. The University of Paderborn has its own tool for card sorting, known as WeCaSo [12]. This tool is user-friendly, and one can easily create his/her own sorting projects. After creating the projects, links can be distributed among participants. Once subjects have finished their experiments, this tool gives them the option of giving feedback to the administrator. Participants can also provide their email addresses if they want to get information about the results. After they have finished the experiments, the data can be exported and used to analyze the results [13].

Figure 1.3 shows an open card sort in progress on the WeCaSo tool. Some cards are already sorted into the categories.
1.6 Advantages and Disadvantages of Card Sorting

1.6.1 Advantages
The plus points of card sorting are as follows [1]:
1. Card Sorting is simple to set up and easy to perform.
2. It costs almost nothing. If it is a physical paper sort, then one has to arrange some
cards, pens and notes. On the other hand, if it is an online sort, then you just
have to set up and perform an experiment online. Time is the only valuable thing
that one will invest in preparation.
Figure 1.3: An Open Card Sort in progress on WeCaSo [12].
3. This technique is trusted and has been in use for some years now.

4. It doesn't take much time to perform the experiments, and in little time we get considerable data.

5. People who perform the experiments get a sense of involvement in the design process. It also establishes a feeling of enthusiasm among the subjects.

6. Card sorting provides a good basis or platform for structuring the majority of products (websites).
1.6.2 Disadvantages
The negative points of card sorting are as follows [1] [14]:

1. It can be difficult for participants to perform the sorting experiments if they don't know much about the cards and the whole process. In such cases, the result will not be effective, usable or helpful.

2. One should always set up the experiments after knowing the content requirements; otherwise the sorting will not be successful.

3. One should not expect too much from card sorting, as it doesn't guarantee usable results every time. There are many factors on which the result depends.

4. Card sorting is highly subjective in nature, and there is no right or wrong answer for the experiments.

5. When the category of a card is not clear to a participant, he or she may sort the cards randomly, and the results can be inconsistent.

6. Lastly, if the results are inconsistent, it becomes tedious to analyze the data.
As already discussed in section 1.6, card sorting is easy to prepare and perform; the main challenge lies in analyzing the exported data. Data experts have to search for and filter out the meaningful and required patterns in the data. This is quite a hard task. Data can be analyzed with the help of different techniques, to name a few: cluster analysis, factor analysis, multidimensional scaling, etc. [1]. Multidimensional scaling will not be discussed in this thesis; interested readers can refer to this paper [15] to know more about it.
1.7 Structure of the Thesis
The rest of the thesis is organized in the following way:
• Chapter 2
This section discusses the first technique to analyze card sorting experiments. The
technique, cluster analysis, is discussed along with its variants. It also shows how
one can interpret the results of cluster analysis, with the help of an example.
• Chapter 3
This section gives an insight into the technique of factor analysis and its variants. It also throws light on some of the extraction and rotation techniques. Towards the end, it shows a five-step protocol which one should know before performing factor analysis.
• Chapter 4
This section describes the step-by-step guide to conduct factor analysis with the
help of the tool IBM SPSS. Four different experiments, (from the field of eatables,
entertainment, cars and health) are evaluated.
• Chapter 5
This section discusses and compares the results of the experiments from both techniques (cluster analysis and factor analysis). It describes how one can deal with the scenarios of problematic cards. Additionally, some recommendations for a researcher who performs card sorting are also given.

• Chapter 6
This section provides a summary of the whole research and discusses prospective future work in the area of card sorting.
Chapter 2

Evaluation of Card Sorting Experiments
“The beginning of all understanding is classification.”
HAYDEN WHITE
We have already discussed card sorting in Chapter 1. It is a user-centered design method, which is used to organize data into appropriate categories. The role of a user ends with the card sorting experiments; then the main challenge starts for data experts, who must analyze the data. Generally, one should have a deep and clear understanding of certain concepts for analyzing data. In this chapter, one such technique, known as cluster analysis, is discussed. It is a method used to discover groups or similar patterns in data, and these patterns depend on some relationships among the data items [16]. In the coming sections, we will discuss what a cluster is and how exactly the distance between data items is calculated.
2.1 Cluster
A cluster is a bunch or a set of objects grouped together in a similar way or pattern. We can see clusters of different shapes in Figure 2.1. From the figure, it is clear that the items belonging to one cluster are more alike to each other [17]. Accordingly, we can say that the cards belonging to one cluster are more alike to each other than to the cards belonging to other clusters. According to Everitt et al., "a single definition is not enough for many situations" [16], as in some cases it is easy to understand a cluster without a formal definition.
Figure 2.1: Different Types of Clusters [18].
2.2 Data Analysis
It is vital to know how one would like to carry out the analysis of the data. Data must be arranged in a particular pattern so that it can be analyzed. For instance, tools like Orange [19] accept data in the form of a dissimilarity matrix for hierarchical cluster analysis. Generally, to carry out cluster analysis, data in the form of either a pattern matrix or a proximity matrix is used as input. Let us discuss these matrices in the following sub-sections.
2.2.1 Pattern Matrix
This is the type of matrix in which the rows are objects and the columns are attributes of these objects. This will become clear with the help of an example. Assume we have to cluster x objects or items, and y represents the number of attributes of these objects. Then we can arrange them in an x*y pattern matrix. This type of matrix is also known as a two-mode matrix because the entities in the rows and columns are different [20]. Consider four different cars, where each car has its own attributes, namely type and power. From Table 2.1, it can be seen that every car in a row is a pattern, which has its own attribute values.
Serial No.   Car        Type        Power
1            AUDI A4    Limousine   150 PS
2            AUDI Q3    SUV         140 PS
3            AUDI A1    Hatchback   122 PS
4            AUDI R8    Coupe       525 PS

Table 2.1: Pattern Matrix for four cars with their types and power.

The matrix of x objects with y attributes is represented as follows [20]:

N_{x,y} =
\begin{pmatrix}
n_{1,1} & n_{1,2} & \cdots & n_{1,y} \\
n_{2,1} & n_{2,2} & \cdots & n_{2,y} \\
\vdots  & \vdots  & \ddots & \vdots  \\
n_{x,1} & n_{x,2} & \cdots & n_{x,y}
\end{pmatrix}
As we discussed the dissimilarity matrix in Section 2.2, to know the dissimilarity or similarity among data items in a pattern matrix, we have to calculate the distance between any two data items. The distance metrics used to calculate the distances between items are discussed as follows:
1. Euclidean Distance
The Euclidean distance between the points (x_1, y_1) and (x_2, y_2) is given by the Pythagorean theorem [21]:

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}    (2.1)
See Figure 2.2 for a better understanding. In the same way, the Euclidean distance between any two items y_i and y_j having n attributes in a pattern matrix is given by [20]:

d_{y_i, y_j} = \sqrt{(y_{i1} - y_{j1})^2 + (y_{i2} - y_{j2})^2 + \dots + (y_{in} - y_{jn})^2}    (2.2)
2. Manhattan Distance
This distance metric is quite interesting, as it got its name from the borough of Manhattan. The streets in Manhattan are perpendicular to each other; they run either north to south or east to west. Consider a city whose streets are laid out as in Figure 2.2. Then, if you want to travel from the point i to the point j, you will have to cover a distance equal to |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}|. In the same way, the Manhattan distance between any two items y_i and y_j having n attributes is given by [20]:

d_{y_i, y_j} = |y_{i1} - y_{j1}| + |y_{i2} - y_{j2}| + \dots + |y_{in} - y_{jn}|    (2.3)
Figure 2.2: Euclidean Distance between the points x_i and x_j [20].
3. Minkowski Distance
We get the Minkowski distance if we generalize both the Manhattan and Euclidean distance metrics. The equation is as follows [20]:

d_{y_i, y_j} = (|y_{i1} - y_{j1}|^z + |y_{i2} - y_{j2}|^z + \dots + |y_{in} - y_{jn}|^z)^{1/z}    (2.4)

We can verify the correctness by inserting a value of z, where z is a real number greater than or equal to 1, into the above equation. If we set z = 1, we get the Manhattan distance metric; if we set z = 2, we get the Euclidean distance metric. Interestingly, it is also possible to construct further distance functions. To learn more about this, we refer the readers to [20].
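The three metrics can be sketched in a few lines; the following is a direct transcription of Equations 2.2-2.4, with two arbitrary example points:

```python
def minkowski(a, b, z):
    """Minkowski distance between two equal-length attribute vectors (Eq. 2.4)."""
    return sum(abs(ai - bi) ** z for ai, bi in zip(a, b)) ** (1.0 / z)

def euclidean(a, b):
    """Special case z = 2 (Eq. 2.2)."""
    return minkowski(a, b, 2)

def manhattan(a, b):
    """Special case z = 1 (Eq. 2.3)."""
    return minkowski(a, b, 1)

p, q = (1.0, 2.0), (4.0, 6.0)   # two items with n = 2 attributes
print(euclidean(p, q))          # 5.0 (a 3-4-5 right triangle)
print(manhattan(p, q))          # 7.0 (|4-1| + |6-2|)
```

As the text notes, setting z = 1 or z = 2 in `minkowski` reproduces the other two metrics exactly.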
2.2.2 Proximity Matrix
A proximity matrix tells us about the similarity or dissimilarity between a pair of items or objects. Table 2.2 shows a proximity matrix with dissimilarities between 6 fruits. A proximity matrix is symmetric, which means that the proximity from lemon to plum is the same as the proximity from plum to lemon. The diagonal elements in the given proximity matrix are zero. The elements on the left side of the diagonal are the same as the elements on the right side; likewise, the elements in the first row are the same as the elements in the first column, and so on [22]. As we have already discussed the pattern matrix in Section 2.2.1, we can create a proximity matrix with the help of the Manhattan or Euclidean distance. We use a proximity matrix as input to a tool or a clustering algorithm in order to analyze the data [20]. Proximity matrices are of two types:
             Lemon  Plum  Mango  Cranberry  Banana  Apple
Lemon          0     15     17       17       16      23
Plum          15      0     26       26       25      25
Mango         17     26      0       14       15      26
Cranberry     17     26     14        0       20      26
Banana        16     25     15       20        0      25
Apple         23     25     26       26       25       0

Table 2.2: Proximity Matrix with dissimilarities between 6 fruits.
• Dissimilarity Matrix
A proximity matrix of size m*m, where m is the number of items, is known as a dissimilarity matrix if the distance function expresses the dissimilarity between two items. The values in the matrix can be, for example, Euclidean or Manhattan distances between pairs of items. It is also known as a distance matrix. Two items are more dissimilar to each other if the distance between them is high, and more similar to each other if the distance between them is low. Generally, the diagonal elements in this matrix are zero because the distance between an item and itself is always zero [23]. Let m_{i,j} denote a proximity index, which shows the dissimilarity between the items i and j of the dissimilarity matrix M. Then m_{i,j} satisfies the following points [20]:
1. m_{i,j} = m_{j,i}, the distance function is symmetric.
2. m_{i,j} ≥ 0, the distance function is non-negative.
3. m_{i,i} = 0, the distance between an item and itself is zero.
• Similarity Matrix
A proximity matrix is known as a similarity matrix if it expresses the similarity between two items. The values can be derived, for example, from the Euclidean or Manhattan distance. Two items are more similar to each other if the similarity value between them is high, and dissimilar to each other if the value is low. A correlation matrix is also considered a similarity matrix because in a similarity matrix we measure the similarities of items pairwise [24]. Let m_{i,j} denote a proximity index, which shows the similarity between the items i and j of the similarity matrix M, with values between 0 and 1. Then m_{i,j} satisfies the following points [20]:
1. m_{i,j} = m_{j,i}, the similarity function is symmetric.
2. m_{i,j} = 0 means that i and j are completely unlike each other.
3. m_{i,j} = 1 means that i and j are completely alike.
4. m_{i,i} = 1, the similarity between an item and itself is one.
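Both matrix types can be sketched directly from a pattern matrix. In the sketch below, the attribute vectors are made up, and rescaling distances into [0, 1] is just one common way to obtain similarities, not a method prescribed by this thesis:

```python
def manhattan(a, b):
    """Manhattan distance between two attribute vectors (Eq. 2.3)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# hypothetical pattern matrix: 4 items with 2 numeric attributes each
pattern = [(150, 4), (140, 5), (122, 3), (525, 2)]
n = len(pattern)

# dissimilarity (distance) matrix: m_ij = m_ji, m_ij >= 0, m_ii = 0
dist = [[manhattan(a, b) for b in pattern] for a in pattern]

# similarity matrix obtained by rescaling into [0, 1]: m_ii = 1
d_max = max(max(row) for row in dist)
sim = [[1.0 - d / d_max for d in row] for row in dist]

for i in range(n):
    assert dist[i][i] == 0 and sim[i][i] == 1.0   # diagonal properties
    for j in range(n):
        assert dist[i][j] == dist[j][i] >= 0      # symmetric, non-negative
```

The assertions simply re-check the properties listed above for both matrix types.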
2.3 Selecting Attributes
Filtering the attributes needed for any type of data analysis is a task that requires intelligence, because there are always some attributes that are neither important nor useful for the analysis. This is one of the challenging tasks for software designers and data analysts before carrying out any kind of data analysis. One point to note here is that if such attributes are used as input, they may lead to the formation of irrelevant clusters and might hinder the process of analysis. Assume we want to conduct a survey of some of the most powerful cars in the world; then the price and colour can be considered irrelevant attributes. Also, we do not need any personal information about the people participating in the survey. As a result, one must take utmost care in choosing the right attributes for the analysis [20].
2.4 Cluster Analysis - Classification
Before we proceed to the classification of cluster analysis, it should be clear what exactly cluster analysis is and what we try to achieve with it. From Figure 2.3, one can see that in cluster analysis we try to reduce intra-cluster distances and increase inter-cluster distances, so that alike items are grouped together in the same cluster. The quality of a cluster is considered good when the items in one cluster are more alike to each other than to the items in other clusters. Cluster analysis finds its usage in many multidisciplinary fields, to name a few: medical science, marketing, geography, insurance and architecture. For example, in the field of marketing, it helps people to identify and study similar groups or classes within their established customer bases. In the field of geography, it helps in identifying similar land usage and earthquake epicenters. In the field of medical science, it helps to recognize similar patterns in CT scans. It also offers a great helping hand in image processing, which falls under the field of computer graphics [25].
2.4.1 Distinctions
There are some distinctions with which one can differentiate between the sets of clusters.
They are as follows [25]:
Figure 2.3: Cluster Analysis [25].
• Exclusive and Non-exclusive
Points belonging to multiple clusters or groups fall under the category of non-exclusive clustering, whereas in exclusive clustering, points belong to only one group. Suppose we want to group cars into categories on the basis of their names and colours. This comes under non-exclusive clustering, as many cars can have the same names and colours. If we group the same cars on the basis of their owners' names and registration numbers, then this comes under exclusive clustering, because the numbers and owners of every car will be different. It should be noted that the experiments conducted for this thesis belong to exclusive clustering, as participants could put a card under only a single category.
• Fuzzy and Non-Fuzzy
In fuzzy clustering, there is a concept of a weight related to a point: a point belongs to every cluster with some weight, ranging from 0 to 1, and the weights of a point over all clusters sum to 1. In non-fuzzy clustering, by contrast, a point does not belong to every cluster.
• Partial and Complete
In complete clustering, the entire data set is clustered, whereas in partial clustering only some of the data is clustered.
• Homogeneous and Heterogeneous
Under these clustering distinctions, we deal with clusters of various shapes, sizes and densities.
Cluster analysis can be classified as follows:
2.4.2 Hierarchical Clustering
In this type of cluster analysis, the main aim is to sort and group together those items that are near to each other. First, the distance between the items is repeatedly calculated. After that, once the items start to get assigned to clusters, the distance between the clusters is repeatedly calculated. After the clustering, the output can be seen in the form of graphical tree-like structures known as dendrograms [26][27]. One of the main advantages of hierarchical clustering is that while interpreting dendrograms, we can take into account any number of clusters, as there is the possibility to cut a dendrogram at a particular level. This type of clustering is particularly useful in the field of the biological sciences (the plant and animal kingdoms, phylogenetics, etc.) [25]. The interpretation of dendrograms will be explained in the next sub-section. Hierarchical clustering can be further classified into the following categories:
1. Agglomerative Hierarchical Clustering
In agglomerative clustering, the given items initially form individual clusters. After that, the clusters that are nearest to each other (having minimum distance) are merged. This process continues until a single cluster is left. This technique is used more often than the divisive hierarchical clustering technique, which will be discussed in the next sub-section. It is also known as the bottom-up approach [28]. As we have already discussed in Section 2.2.2, a proximity matrix is used as input to the clustering algorithm; given this input, the algorithm works as follows [25][29]:
1. Compute the proximity matrix.
2. Given x items, assign each item to its own individual cluster.
3. Merge the two nearest items or clusters to form a bigger cluster.
4. Update the proximity matrix.
5. Repeat steps 3 and 4 until a single cluster is left.
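The steps above can be sketched as a short loop. The version below is a minimal single-linkage variant in plain Python (the input dissimilarity matrix is invented for illustration):

```python
def agglomerate(dist):
    """Agglomerative clustering with single (minimum) linkage.
    dist is a symmetric dissimilarity matrix; returns the merge history."""
    clusters = [[i] for i in range(len(dist))]   # each item starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest inter-cluster distance
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))   # record the merge
        clusters[a] = clusters[a] + clusters[b]        # merge, then drop cluster b
        del clusters[b]
    return merges

dist = [[0, 2, 6, 10],
        [2, 0, 5, 9],
        [6, 5, 0, 4],
        [10, 9, 4, 0]]
history = agglomerate(dist)   # items 0 and 1 merge first (distance 2), then 2 and 3
```

The merge history corresponds to reading a dendrogram from the bottom up: each entry records which two clusters were joined and at what distance.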
Figure 2.4 shows how the nearest two clusters are merged and the proximity matrix is updated. We can see that the clusters C2 and C5 are merged and the rows and columns corresponding to these clusters are updated.
The two closest clusters are merged according to the distance between them, and there are various methods by which we can define this distance and calculate the dissimilarities between clusters. These methods are explained as follows [25]:
Figure 2.4: Merging nearest clusters [25].
(a) Single Linkage Clustering
In this type of clustering, the shortest distance between two clusters is taken into account. As soon as the two closest clusters are detected, they are merged into one cluster. This process continues until all the items are in one cluster. For this reason, it is also known as nearest neighbour clustering. In Figure 2.5, one can see the single linkage in blue. The chaining phenomenon is one of the disadvantages of this method: as more and more items are combined with each other, the size of the overall cluster increases, and this gradually leads to the formation of diverse clusters, which are later hard to analyze [30].
Figure 2.5: Different types of distances between clusters [25].
(b) Complete Linkage Clustering
In this type of clustering, the farthest distance between two clusters is taken into account: considering all item pairs, the distance between two clusters is equal to the distance between the two farthest items in those clusters. At each step, the two clusters with the shortest such link are merged into one cluster. For this reason, it is also known as farthest neighbour clustering. As a result, the problem of the chaining phenomenon no longer exists in complete linkage clustering. In Figure 2.5, one can see the complete linkage in green [31].
(c) Group Average Linkage Clustering
In this type of clustering, the average of the distances between all pairs of items or objects is taken into account. This average distance can also be called the mean distance between the items of the two clusters. It is also known as the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). Generally, this is the preferred technique for clustering, as it leads to the formation of good dendrograms that reflect the pairwise behaviour of the dissimilarity matrix. In Figure 2.5, one can see the group average linkage in yellow. This technique is widely used in the field of ecology (species composition) and in the area of bioinformatics (phenograms) [32].
In Figure 2.6, one can see how exactly the clustering of items takes place. MIN refers to single linkage clustering, MAX refers to complete linkage clustering and Group Average refers to group average linkage clustering. Let us consider the case of MIN clustering. In the figure, the black dots with numbers are the items or objects to be clustered and the red digits show the number of a cluster. At first, the items 3 and 6 are merged into cluster number 1. Then, the items 2 and 5 are merged into cluster number 2. After that, cluster 1 and cluster 2 are merged into cluster number 3. In the next step, item 4 and cluster 3 are merged into cluster number 4. Finally, item 1 and cluster 4 are merged into cluster number 5. The clustering in the other two cases takes place similarly, according to their own algorithms.
One should take utmost care with the choice of clustering method: if the data set is small, there will hardly be any difference between the dendrograms formed by the three techniques mentioned above. On the other hand, if the data set is large, the dendrograms formed by each technique will be quite different [33].
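The three inter-cluster distance definitions (MIN, MAX and group average) differ only in how they combine the pairwise item distances, which can be written down directly; the item distances below are hypothetical:

```python
def linkage_distances(cluster_a, cluster_b, dist):
    """Single (MIN), complete (MAX) and group average distances between two clusters,
    given the pairwise item distance matrix dist."""
    pairs = [dist[i][j] for i in cluster_a for j in cluster_b]
    return min(pairs), max(pairs), sum(pairs) / len(pairs)

dist = [[0, 1, 4, 5],
        [1, 0, 3, 6],
        [4, 3, 0, 2],
        [5, 6, 2, 0]]
single, complete, average = linkage_distances([0, 1], [2, 3], dist)
print(single, complete, average)   # 3 6 4.5
```

The pair distances considered are d(0,2)=4, d(0,3)=5, d(1,2)=3 and d(1,3)=6, so the three definitions yield 3, 6 and 4.5 respectively.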
2. Divisive Hierarchical Clustering
Figure 2.6: Clustering in Single, Complete and Group Average Linkage [25].
In divisive clustering, all items are initially assigned to a single cluster. After that, this cluster is split into sub-clusters, and the process continues until only elementary clusters are left. We can also say that its algorithm works opposite to the algorithm of agglomerative clustering [25]. It is also known as the top-down approach [28]. Initially, at the time of splitting, the all-inclusive cluster holds all the required data. The algorithm then splits the data into the two best possible clusters. Sometimes this can lead to complexities if we have to deal with large data sets. On the other hand, during the process of splitting big clusters into smaller sub-clusters, we can stop once the number of clusters suits our requirements. This technique is used less often than its counterpart, agglomerative clustering, because as it follows the top-down approach, the initial big cluster (holding many items) has to be split into two clusters, which at times may be tedious for an algorithm to compute [20][34][35].
One of the limitations of hierarchical clustering is that once two clusters or items have been merged, the decision cannot be reversed or undone. Sometimes, in the case of large and complex data sets, we get clusters of a wide variety of shapes and sizes, and handling these types of clusters also becomes a challenge [25]. Towards the end, let us discuss some applications of clustering [35]:
• In the area of astronomy, there is a project named SkyCat, in which 2×10^9 sky objects were clustered into stars, galaxies, etc.
• In the field of computer graphics, image segmentation is done with the help of clustering, which allows us to highlight or focus on those parts of an image in which we are interested.
• In image database organization, it helps in searching for the desired results efficiently.
• During the search for information, we can filter the results according to our requirements.
• In biological applications, gene expression profile clustering helps us in identifying similar functions and expressions.
• Profiling of web users can be done to categorize them and improve web usability.
• Last but not least, one should tune the clustering algorithm according to the application in order to get the best possible results.
2.4.3 How to Interpret a Dendrogram
The results of hierarchical clustering can be viewed in the form of dendrograms. A dendrogram is a tree-like structure, which helps us to see how the clusters are arranged [36]. The clusters in dendrograms are represented as nodes, and the distances at which clusters merge can be read from the horizontal scale [20]. See Figure 2.7 for a better understanding. As discussed in the previous sub-section, there are three methods by which the distance between two clusters can be calculated: single linkage, group average linkage and complete linkage.
For this thesis, a closed card sorting experiment with 28 cards was conducted, where the categories were pre-defined. It was about healthy living. Figure 2.7 is the result of that experiment, evaluated with hierarchical cluster analysis (group average linkage). The data mining tool Orange was used for the analysis. The group average distance between two clusters starts at 0.0 and goes up to 23.42. The cut-off line is set just a little above 17.56 so that we can see five clusters. We can move this cut-off line in order to see the different clusters at a particular distance value. The nodes starting at the value 0.0 are known as terminal nodes, and the point where two terminal nodes intersect is known as an internal node. In Figure 2.7, we can see the names of the cards at the terminal nodes; they are also known as labels. A point to note here is that we obtained these dendrograms by using the dissimilarity matrix as input to the Orange tool. The matrix uses as its distance function the number of participants who did not put the two cards in the same category. Also, an item is not used for clustering anymore once it has been assigned to a cluster [20][25].
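The distance function described above, the number of participants who did not place two cards in the same category, can be computed in a few lines; the three example sorts below are invented, not data from the thesis's experiment:

```python
def card_sort_dissimilarity(cards, sorts):
    """Dissimilarity between two cards = number of participants who did
    NOT place both cards in the same category."""
    n = len(cards)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d[i][j] = sum(1 for sort in sorts
                              if sort[cards[i]] != sort[cards[j]])
    return d

cards = ["Apple", "Banana", "Jogging"]
# each participant's sort maps a card to a category (made-up data)
sorts = [
    {"Apple": "Food", "Banana": "Food", "Jogging": "Sport"},
    {"Apple": "Food", "Banana": "Food", "Jogging": "Sport"},
    {"Apple": "Food", "Banana": "Snack", "Jogging": "Sport"},
]
d = card_sort_dissimilarity(cards, sorts)
print(d[0][1])   # 1: one participant separated Apple and Banana
print(d[0][2])   # 3: no one grouped Apple with Jogging
```

A matrix built this way is symmetric with a zero diagonal, so it can be fed directly to a hierarchical clustering tool such as Orange.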
Figure 2.7: Dendrogram - Group Average Linkage Hierarchical Clustering.
2.4.4 K-means Clustering
K-means was one of the first partitional algorithms suggested for clustering. Although it was proposed around 50 years ago, it is still used because of its simple implementation, high efficiency and empirical success. In this type of clustering, every cluster is associated with a centroid, and an item is grouped into the cluster with the nearest centroid. The k-means algorithm works as follows [26][25]:
1. Select k items as starting centroids.
2. Assign every item to its closest centroid so as to form k clusters.
3. Recompute the centroid of every cluster.
4. Repeat steps 2 and 3 until the centroids no longer change.
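The four steps above translate into a compact loop. The following is a minimal one-dimensional sketch with made-up points, not a production implementation:

```python
def kmeans(points, centroids):
    """Plain k-means on 1-D points; iterate until the centroids stop changing."""
    while True:
        # assignment step: every point goes to its closest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update step: recompute each centroid as its cluster's mean
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:          # converged: centroids no longer change
            return centroids, clusters
        centroids = new

points = [1.0, 2.0, 9.0, 10.0]
centroids, clusters = kmeans(points, [1.0, 9.0])   # k = 2 starting centroids
print(centroids)   # [1.5, 9.5]
```

With two well-separated groups the loop converges after a single update, which reflects the empirical efficiency mentioned above.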
Figure 2.8 illustrates the working of the k-means clustering algorithm. Part (a) shows the input data with three clusters (k=3). Part (b) shows the 3 selected seed points as centroids and the assignment of items to clusters. Parts (c) and (d) show the iterations in which the centroids are updated. Part (e) shows the final resulting clusters [26].
Figure 2.8: K-means Clustering [26].
It should be noted that the k-means algorithm needs three parameters from the user: the number of clusters K, the cluster initialization, and the distance metric. The selection of K is the most important, and there are some heuristic methods that ease the decision of choosing K. Generally, the Euclidean distance metric is used in this type of clustering; for this, the points have to be placed in a plane. The resulting clusters we usually get from this clustering are round-shaped [26]. One of the limitations of this clustering is that it is not very efficient in handling clusters of different sizes and densities. Also, when the data contains outliers, the algorithm finds it difficult to compute the resulting clusters [25]. We end the discussion of k-means clustering here, as this thesis does not use this type for the result evaluation. To learn more about its usage and applications, we refer the readers to [26][37].
2.5 Summary
This chapter explained one of the methods used for analyzing the results of card sorting experiments. It described the two types of matrices that are generally used as input for cluster analysis: the proximity matrix and the pattern matrix. It discussed some of the distance metrics, namely the Euclidean, Manhattan and Minkowski distances, which are used to calculate the distance between items. Before coming to the classification of cluster analysis, it presented some of the distinctions by which we can differentiate between sets of clusters. Cluster analysis can be classified into hierarchical clustering and k-means partitional clustering; hierarchical clustering can be further sub-classified into agglomerative clustering and divisive clustering. We used agglomerative hierarchical clustering for the evaluation of results. This chapter also threw light on the effective interpretation of a dendrogram, which is the output of hierarchical clustering. The second technique used for analyzing the results of card sorting experiments, factor analysis, is explained in the following chapter.
Chapter 3
An Insight to Factor Analysis
“The ultimate authority must always rest with the individual’s own reason and
critical analysis.”
DALAI LAMA
Card sorting is an approach for investigating users' perspectives on the categorization of information. Moreover, it also helps us to clarify many questions that might arise during the design phase of a website or any software product [2]. The results of such an experiment can be analyzed with the help of various statistical techniques. One such technique, cluster analysis, has already been discussed in Chapter 2. The other technique, known as factor analysis, is discussed in this chapter. It is one of those methods which, if applied properly, can give us quite useful results.
3.1 An Introduction to Factor Analysis
Factor analysis is used to uncover the hidden meaning in multivariate data. It can also be said that it is used to study the hidden nature of a set of variables; the items or cards in card sorting are referred to as variables in factor analysis. This chapter describes the concepts of factor analysis along with its methodology. Since its advent, factor analysis has been applied to data from a wide range of areas such as the biological sciences, the social sciences, education, etc. [38].
The first thing to note in factor analysis theory is the domain: the area of the data on which we apply factor analysis. We can also refer to the domain as the range of research in which one is interested. After that, the population of interest is identified; the entities in which we are interested are known as the population. For example, if psychology is the domain, then people can be considered the population. Once population and domain are identified, a researcher chooses the most suitable variables to be measured. Such variables are known as surface attributes. If people are the population, then one surface attribute could be height, which can be observed and measured. In this way, one can recognize various surface attributes in any domain. When dealing with a large amount of data, it is possible that one sees some variation in the surface attributes, because attributes differ between people most of the time. It is also possible that there are correlations among different surface attributes; these correlations can be high as well as low. As a result, the variations in the surface attributes (observed variance) and the correlations among the surface attributes become tedious to understand. Factor analysis helps us to understand these variations and correlations in an easy way [38].
Internal attributes form an important part of factor analysis theory. One can refer to the unobserved characteristics of people as internal attributes, because there is a lot of variation among people on the basis of unobserved characteristics. For example, if mental ability is the domain, then logic ability, analysis ability, reasoning ability, etc. can be internal attributes. It should be noted that one cannot measure these attributes directly. These internal attributes are known as factors or latent variables, and they have a linear impact on the surface attributes. For example, the score of a person in a logic exam (surface attribute) is affected by his/her logic ability (internal attribute) [38].
Factors are of two types, namely common factors and unique factors. If more than one of the surface attributes (in the selected set of surface attributes) is affected by a factor, then that factor is termed a common factor. On the other hand, a unique factor is one that impacts only one of the surface attributes in the selected set. It should be noted that the common factors are responsible for the correlations among surface attributes; a unique factor cannot produce such correlations, as it impacts only a single surface attribute. During the observation of a surface attribute, some errors of measurement originate from unsystematic events. They are termed additional factors, and they have an impact on the surface attributes as well [38].
Factor analysis was invented in the year 1904 by Charles Spearman. He initially developed factor analysis for the field of psychology, where it entered the area of intelligence research known as g theory. Raymond Cattell continued the research on Spearman's foundation of a two-factor theory and extended it to a multi-factor theory for explaining intelligence [39]. Factor analysis is often known as a data reduction tool, as it helps in removing redundancy from a large set of correlated variables, so that they can be represented by a smaller set of variables, known as factors. One thing to note here is that the factors which are extracted are generally independent of each other. These factors are known as latent variables. There are many other uses of factor analysis, which are described as follows [40][41][42]:
• Large number of variables can be reduced to a smaller number of factors and then
they can be used for modelling purposes.
• A subset of variables can be extracted from a large set of variables based on the
correlations with principal component factors and later this subset can be used
according to the requirements.
• Identification of clusters based on the factor loadings as we can assign objects or
items into categories according to their factor scores.
• Identification of network groups based on which subjects can be grouped together.
• Managing different tests after linking them to one factor.
According to Kline (1994), also cited in Pett et al. [43], "With the advent of powerful computers and the dreaded statistical packages which go with them, factor analysis and other multivariate methods are available to those who have never been trained to understand them" [44].
Let us discuss a non-technical analogy to make the concept of factor analysis clearer [42]. Imagine a baby sleeping on a bed with a bed-sheet over her. We can easily see some bumps and shapes on the bed-sheet. If the baby starts crawling towards one side, all the bumps will move in the same direction (the bumps of arms, head, legs, etc.). Factor analysis treats different measures and tests like these bumps and shapes: we can label all the measures that move together as a factor, and the joint movement, as with the baby here, corresponds to correlation. While conducting factor analysis, one can check the correlations among variables and then decide whether to include them in the analysis or not [41][42].
3.2 Types of Factor Analysis
3.2.1 Exploratory Factor Analysis
This technique reveals the hidden relations in a large set of variables. The process of analysis is not based on any prior theory; rather, the overall goal is to investigate and identify the relationships between the observed variables. Generally, factor loadings are used to understand and group the variables into categories. A loading value shows how closely a variable is associated with a factor. For instance, if we have 20 variables, then exploratory factor analysis attempts to reduce them to, say, 3 or 4 underlying factors [45][41][46][42]. It should be noted that this is the technique adopted in this thesis to evaluate the experiments. We also refer the readers to [38], a good book on this technique.
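As a small illustration of how loadings group variables into factors, each variable can be assigned to the factor on which it loads most strongly. The loading matrix below is invented for illustration and is not taken from the thesis's experiments:

```python
def group_by_loadings(loadings, names):
    """Assign each variable to the factor with its highest absolute loading."""
    groups = {}
    for name, row in zip(names, loadings):
        factor = max(range(len(row)), key=lambda f: abs(row[f]))
        groups.setdefault(factor, []).append(name)
    return groups

# hypothetical loadings of 4 variables (cards) on 2 extracted factors
names = ["Apple", "Banana", "Jogging", "Cycling"]
loadings = [[0.81, 0.10],
            [0.76, 0.05],
            [0.12, 0.88],
            [0.09, 0.79]]
print(group_by_loadings(loadings, names))
# {0: ['Apple', 'Banana'], 1: ['Jogging', 'Cycling']}
```

In practice the loadings themselves come from a statistical package (e.g. IBM SPSS, as used later in this thesis); the sketch only shows the grouping step.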
3.2.2 Confirmatory Factor Analysis
In this technique, assumptions about the factors and their loadings are made first, and this hypothesis is then checked against a pre-established theory. For example, a researcher puts forward the hypothesis that there are 3 underlying factors for a group of 15 variables, and then checks whether this supposition holds or not. One can follow either of the following two approaches when conducting confirmatory factor analysis [41][46][42]:
• Traditional Method
One can perform confirmatory factor analysis traditionally with the help of any statistical package that supports factor analysis. A researcher inspects the factor loadings of the observed variables, checking whether they load on the factors predicted by his model. This helps in gaining a precise and deep understanding of the measurement (factor) model. As a result, this method is used more often than the SEM approach whenever the measurement model needs to be studied in detail.
• Structural Equation Modelling Approach
This approach analyzes measurement (factor) models with the help of structural
equation modelling (SEM) packages such as AMOS and LISREL. Generally, this approach
models causal relationships among factors (latent variables); however, it also helps
in investigating CFA factor models. This can be done by including covariances among
all pairs of factors. This thesis does not elaborate further on confirmatory factor
analysis, as we do not use this approach to evaluate the card sorting experiments.
3.3 Factor Analysis Model
We have already discussed the observed variables, common factors and unique
factors in Section 3.1. Let Y1, Y2, ..., Yn be the observed variables, F1, F2, ..., Fm be
the common factors and N1, N2, ..., Nn be the unique factors. Then the variables can be
denoted as linear functions of the factors, as follows [47]:
Y1 = x11 F1 + x12 F2 + x13 F3 + ... + x1m Fm + x1 N1        (3.1)
Y2 = x21 F1 + x22 F2 + x23 F3 + ... + x2m Fm + x2 N2        (3.2)
...
Yn = xn1 F1 + xn2 F2 + xn3 F3 + ... + xnm Fm + xn Nn        (3.3)
In all the above equations, the value of Y depends on the coefficients x, which is why
these equations are also known as regression equations. The aim of factor analysis is
to determine these coefficients so that the observed variables can be linked to the
common factors. These coefficients are referred to as loadings, as a variable loads
onto a factor. To make it clear: in the above equations, x11 is the loading of variable Y1 on the factor F1,
x22 is the loading of variable Y2 on the factor F2, and so on. If the factors are
uncorrelated, then we can calculate the share of the variance of a variable explained
by the common factors by summing up the squares of the loadings for that variable.
This is termed the communality. The solution of a factor analysis is regarded as good if the
value of the communality for each variable is large [47]. For example, the communality
for the variable Y1 is given by

x11² + x12² + ... + x1m²        (3.4)
This is the extracted communality, which we obtain as an output in IBM SPSS while
conducting factor analysis. This can be seen in the next chapter, where we evaluate the
results of our experiments. We show in the evaluation of the first experiment (Section
4.2.1) that adding up all the squared loadings corresponding to a variable indeed yields
the value of the extracted communality.
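To make equation (3.4) concrete, the sketch below computes extracted communalities from a small, hypothetical loading matrix (the numbers are illustrative, not taken from our experiments):

```python
import numpy as np

# Hypothetical loadings of 3 variables on 2 factors (not from the experiments).
loadings = np.array([
    [0.8, 0.3],   # Y1
    [0.7, 0.4],   # Y2
    [0.2, 0.9],   # Y3
])

# Communality of a variable = sum of its squared loadings (equation 3.4).
communalities = (loadings ** 2).sum(axis=1)
print(communalities[0])  # 0.8^2 + 0.3^2 = 0.73
```

The row-wise sum mirrors what the communalities table in the SPSS output reports per variable.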
Figure 3.1 shows a two-factor model. Ovals represent factors, squares represent
observed variables, and arrows represent regression relationships.
Figure 3.1: A Two-factor model [48].
3.4 Types of Factoring
There are various techniques with which one can extract factors from the data set. The
results can vary depending upon the selected technique. They are as follows [42][49][50]:
• Principal Component Analysis
It is the most widely used form of factoring to extract factors. It accounts for the
maximum or total variance among the variables. It can also be said that we reduce
data in order to explain the total variance associated with each variable included
in the analysis. The communalities here are assumed to be 1 and as a result there
is no room for error variance. That’s why the diagonal elements of a correlation
matrix are 1.
• Principal Axis Factoring
This technique tries to find the least number of factors that account for the common
variance among a set of variables. In other words, the only variance included in the
model is the variance that a variable shares with the other variables in the
analysis. It is also known as principal
factor analysis and common factor analysis. There is a need to estimate the
communalities and hence error variance has to be taken into account.
The above two methods are used most often; however, the following extraction methods are also available, though used less frequently [42][49][50]:
• Image Factoring
It works on the principle of a correlation matrix of the variables that are predicted
from actual variables with the help of multiple regression.
• Maximum Likelihood Factoring
The factors are formed on the basis of a linear combination of the variables. The
correlations are checked against every variable's uniqueness (its variability minus
its communality). This method allows a researcher to adjust the number of factors
on the basis of a chi-square goodness-of-fit test until the requirements are met.
• Alpha Factoring
It works on the assumption of random variables, which maximizes the reliability
of factors.
• Unweighted Least Squares (ULS) Factoring
It works as follows: take the differences between the observed and the estimated
(reproduced) correlation matrices, square all the differences and add them up. The
aim of this factoring is to minimize this sum. It should be noted that the diagonal
elements of the correlation matrices are not considered here.
• Generalized Least Squares (GLS) Factoring
It is a slight modification of ULS factoring: the correlations are weighted
inversely by the variables' uniqueness, so a more unique variable carries less
weight. This method also allows a researcher to adjust the number of factors on the
basis of a chi-square goodness-of-fit test until the requirements are met.
It should be noted that we have used the Principal Component Analysis technique to
evaluate the experiments. According to Wilkinson et al. [51], most of the datasets give
similar results with Principal Component Analysis and Principal Axis Factoring.
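As an illustration of how principal component extraction obtains loadings from a correlation matrix (a sketch on synthetic data, not the procedure SPSS runs verbatim):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 cases, 4 synthetic variables
R = np.corrcoef(X, rowvar=False)         # correlation matrix, diagonal 1's

# Eigendecomposition; np.linalg.eigh returns ascending order, so reverse it.
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Component loadings: each eigenvector scaled by the root of its eigenvalue.
loadings = eigvecs * np.sqrt(eigvals)

# Since communalities are assumed to be 1, the eigenvalues sum to the
# number of variables (the trace of the correlation matrix).
print(round(eigvals.sum(), 6))  # 4.0
```

The sum of squared loadings down each column equals that component's eigenvalue, which is what the "total variance explained" table in SPSS reports.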
3.5 Data Modes of Factor Analysis
There are some (not very well-known) data modes in which factor analysis is carried
out. They are as follows [42]:
• R-mode factor analysis
It is the most widely used data mode. In this mode, the rows are cases
(participants) and the columns are variables. The cell entries are the scores of the
cases on the variables. The factors form clusters of variables across the set of
participants.
• Q-mode factor analysis
In this mode, the rows are variables and the columns are cases (participants). The
cell entries are again the scores of the cases on the variables, but the factors now
form clusters of cases. This mode is also known as inverse factor analysis.
The above two modes are used most of the time; the following modes are used less often [42]:
• O-mode factor analysis
It is one of the oldest modes, in which years are placed in columns and variables in
rows. It is particularly used to gather historical information about an entity. The
factors help to cluster the years together, and as a result entities can be compared
against certain requirements.
• T-mode factor analysis
In this mode, the years are placed in columns (as in O-mode) and cases are placed in
rows. This mode aims at collecting data for a single variable. The factors help to
cluster the years together on that particular variable.
• S-mode factor analysis
In this mode, the entities are columns and the years are rows. The factors help to
cluster the entities together based on a variable.
3.6 Factor Analysis Protocol
Although exploratory factor analysis may seem complicated, the following guide
provides a very good starting point for beginners. It helps researchers to start and
finish the analysis confidently and to evaluate the results cleanly.
Figure 3.2 shows the different steps involved in the protocol. The explanation of the
steps is as follows [44]:
Figure 3.2: Exploratory Factor Analysis Protocol [44].
1. Checking Data
The first step is to check the data on which factor analysis is to be performed: we
check whether it makes sense to perform factor analysis on this kind of data at all.
The first thing that comes here is Sample size. The sample size (number of cases)
plays a vital role in factor analysis. Different authors have their own opinions
regarding the sample size. Hair et al. [52] wrote that the sample size should be at
least 100. Comrey and Lee stated in their work [53] that a sample size of 100 is
regarded as poor, 200 as fair, 300 as good, 500 as very good and 1000 as excellent.
Authors like MacCallum, Widaman, Zhang, and Hong (1999) argued that such rules are
not always right, as they ignore the technical characteristics of factor analysis.
As their work is cited in Henson and
Roberts [54]: “They illustrated that when communalities are high (greater than .60)
and each factor is defined by several items, sample sizes can actually be relatively
small.” According to Hogarty et al. [55] “disparate [sample size] recommendations
have not served researchers well.”
In practice, the sample size depends on how many participants one can recruit for
the experiments. Researchers are also guided regarding the N:p ratio, where
N is the number of participants and p is the number of items or variables [55].
In the end, what really matters is to understand the role or meaning of a variable
in factor analysis. According to Hogarty et al. [55] “our results show that there
was not a minimum level of N or N:p ratio to achieve good factor recovery across
conditions examined.” In case the readers are interested to know more about small
sample size in exploratory factor analysis, we refer them to this paper [56].
Suppose you want to conduct a survey in an office regarding job satisfaction. In
the survey, there are six questions (six variables), and we want to check whether
these variables are reliable. In other words, we want to check whether all the
questions measure the same factor (job satisfaction). This can be checked with the
reliability analysis option of the tool IBM SPSS (Appendix A), which runs
Cronbach's alpha test to check the consistency of the variables.
The value of Cronbach’s alpha coefficient ranges between 0 and 1. The value of 0.7
is regarded as acceptable, 0.8 as good and 0.9 as excellent. If the value is closer to
1.0, it is regarded as more reliable [57].
If we conduct reliability analysis on, say, 10 variables, then we check whether
these 10 variables are closely related. Cronbach's alpha can be defined as a function
of the number of items under test and the average inter-correlation among these items. The
Cronbach’s alpha is computed as follows [58]:
α = N·c / (v + (N − 1)·c)        (3.5)
where N is the total number of items, c is the average covariance among the items
and v is the average variance [58].
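Equation (3.5) can be verified with a short sketch; the item scores below are simulated, and the function name cronbach_alpha is our own choice:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha from a cases x items score matrix (equation 3.5)."""
    cov = np.cov(items, rowvar=False)
    n = cov.shape[0]                              # number of items N
    v_bar = np.diag(cov).mean()                   # average item variance v
    # average inter-item covariance c (off-diagonal entries only)
    c_bar = (cov.sum() - np.trace(cov)) / (n * (n - 1))
    return (n * c_bar) / (v_bar + (n - 1) * c_bar)

# Six simulated items that all measure one underlying trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=(50, 1))
items = trait + 0.5 * rng.normal(size=(50, 6))
alpha = cronbach_alpha(items)
print(0.7 < alpha < 1.0)  # True: the items are internally consistent
```

Because the six items share a common trait with only modest noise, the computed alpha lands in the "acceptable" range discussed above.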
IBM SPSS offers one more check to measure the adequacy of the data, known as the
Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy together with Bartlett's
Test of Sphericity. The KMO measure ranges between 0 and 1; a value of 0.5 or more
is considered good. The significance value of Bartlett's Test of Sphericity should
be less than 0.05, i.e. p < 0.05. It tells us how strongly the variables are related
to each other [59]. This test also turns out to be useful in conducting partial
confirmatory factor analysis [44].
We will now discuss how it is calculated. Let Scor be the sum of the squared
correlations of all the variables included in the analysis; the diagonal values of
the correlation matrix (which are 1's) are not considered here. Let Spcor be the sum
of the squared partial correlations of every variable. The KMO measure is then given
by [60]:
KMO = Scor / (Scor + Spcor)        (3.6)
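Equation (3.6) is likewise easy to reproduce. In the sketch below, the partial correlations are derived from the inverse of the correlation matrix (a standard identity); the data is simulated:

```python
import numpy as np

def kmo(R: np.ndarray) -> float:
    """KMO measure (equation 3.6) from a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)         # skip the diagonal 1's
    s_cor = (R[off] ** 2).sum()                   # Scor
    s_pcor = (partial[off] ** 2).sum()            # Spcor
    return s_cor / (s_cor + s_pcor)

# Five simulated variables sharing one common factor.
rng = np.random.default_rng(2)
common = rng.normal(size=(200, 1))
X = common + rng.normal(size=(200, 5))
value = kmo(np.corrcoef(X, rowvar=False))
print(value > 0.5)  # True: the data is adequate for factor analysis
```

Because all five variables load on a single shared factor, the off-diagonal correlations are large while the partial correlations stay small, so the KMO value clears the 0.5 threshold.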
2. How to extract factors
There are many techniques that help us in the extraction of factors. We have already
discussed them in Section 3.4; as a reminder, they are as follows [44]:
• Principal components analysis
• Principal axis factoring
• Maximum likelihood
• Generalised least squares
• Unweighted least squares
• Image factoring
• Alpha factoring
Principal component analysis (PCA) and Principal axis factoring (PAF) are the
two techniques that are used most often. In his research, Thompson observed that
PCA is used more than PAF to extract factors, as it is the default technique in
various statistical programs [61]. In their research, Pett et al. [43]
have also supported PCA by stating that “it helps in establishing preliminary
solutions in Exploratory factor analysis.”
3. Decide the number of factors
One of the techniques for determining the number of factors is the Kaiser criterion.
This is the default criterion in the IBM SPSS software. It works by retaining only
those factors that have eigenvalues greater than 1. It is already discussed that the
diagonal elements of a correlation matrix are 1’s. A factor having an eigenvalue
over 1 thus has a variance larger than a single original variable [62]. All factors
with an eigenvalue less than 1 are dropped, since such factors contribute very
little to explaining the variance among the variables. A factor always captures some
information regarding the input data. We
can also say that an eigenvalue represents that particular amount of information
[48][42].
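The Kaiser criterion can be reproduced outside SPSS. The sketch below applies it to an assumed toy correlation matrix with two blocks of correlated variables:

```python
import numpy as np

# Assumed toy correlation matrix: two blocks of three variables,
# correlating 0.6 within a block and 0.1 across blocks.
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
n_factors = int((eigvals > 1.0).sum())   # keep eigenvalues greater than 1
print(n_factors)  # 2: one factor per block survives the criterion
```

Here the two block eigenvalues (2.5 and 1.9) exceed 1, while the four remaining eigenvalues (0.4 each) fall below it and are dropped.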
The next criterion is percentage of variance. It retains those factors that
together account for 60-80 percent of the variance. One more criterion is cumulative
percentage of variance. It has no fixed percentage cut-off; the cut-off depends on
the discipline in which factor analysis is carried out. Both criteria are shown in
the IBM SPSS output [44].
The scree plot also helps in determining the number of factors to retain. It plots the
component number along the X-axis and the eigenvalues along the Y-axis. The
eigenvalues drop as we move from left to right. At the point where an elbow forms
(the point from which a roughly straight line can be drawn), we stop and drop all
the factors after that point [48][42].
Parallel analysis is a further technique; it is not used much in the extraction
of factors because it is not available in statistical programs like SPSS and
SAS. In this technique we obtain two sets of eigenvalues, namely the actual
eigenvalues and random-order eigenvalues. The two sets are compared, and if an
actual eigenvalue is greater than the corresponding random-order eigenvalue, that
factor is retained [48][42].
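Since parallel analysis is not built into SPSS, a minimal sketch of it is shown below (the function name and the simulated two-factor data are our own assumptions):

```python
import numpy as np

def parallel_analysis(X: np.ndarray, n_sims: int = 50, seed: int = 0) -> int:
    """Retain leading factors whose actual eigenvalues exceed the average
    eigenvalues of equally sized random data (a simple sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    actual = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_mean = np.zeros(p)
    for _ in range(n_sims):
        Z = rng.normal(size=(n, p))
        random_mean += np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    random_mean /= n_sims
    k = 0
    for a, r in zip(actual, random_mean):   # stop at the first factor that
        if a <= r:                          # falls below its random benchmark
            break
        k += 1
    return k

# Simulated data with two clear underlying factors.
rng = np.random.default_rng(4)
scores = rng.normal(size=(300, 2))
L = np.array([[0.9, 0.8, 0.9, 0.0, 0.1, 0.1],
              [0.0, 0.1, 0.1, 0.9, 0.8, 0.9]])
X = scores @ L + 0.4 * rng.normal(size=(300, 6))
print(parallel_analysis(X))  # 2
```

The two genuine factors produce eigenvalues well above the random benchmarks, while the remaining components fall below them and are dropped.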
In the end, researchers can judge for themselves which technique suits them best. It
is also suggested that one should not rely on a single technique alone; rather,
different approaches should be combined when deciding on the number of factors. One
should always verify the correlation of a factor with its dependent variable before
dropping that factor, because a small factor can sometimes have a high correlation
with a variable [42]. Hair et al. [52] also highlighted that
many researchers use different approaches for finalizing the number of factors.
4. Rotation Techniques
It should be noted that rotation does not change the solution; it only makes the
solution easier to interpret. One can understand and analyze the rotated output
better than the unrotated output. Before we go into the types of rotation, let us
understand the concept of rotation with the help of pictures. In Figure 3.3, we see
an orderly arrangement of pillars from a viewpoint that is not very revealing; it is
even hard to say how many pillars there are [48][42].
Figure 3.3: View of pillars(unrotated solution) [48].
Now see Figure 3.4. One can see the same orderly arrangement of pillars from a
viewpoint that is clear and easy to interpret. This is how rotation
works. Rotation does not alter the overall sum of the eigenvalues; it only
redistributes the eigenvalues and percentages of variance among the factors, and
this change is reflected in the factor loadings (purely to make the output easier to
interpret) [42].
Figure 3.4: View of pillars(rotated solution) [48].
There are two types of rotation: orthogonal rotation and oblique rotation. Varimax,
Equamax and Quartimax are orthogonal rotations, whereas Direct oblimin and
Promax are oblique rotations. All these five rotations are offered in the IBM
SPSS software. The aim of orthogonal rotation is to generate uncorrelated factor
structures, however oblique rotation generates correlated factor structures [63].
Varimax is the most widely used variant of rotation [61]. In the end, it is up to
the researcher which technique he or she chooses for rotation. The choice also
depends on the type and genre of the data. While interpreting the results, even
after rotation has been applied, one can come across three cases: a variable loads
on multiple factors, a variable does not load on any of the factors, or a variable
does not fit in any of the factor structures [44]. These cases will
be discussed in the next chapter 4, when we evaluate the experiments with factor
analysis.
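A compact sketch of varimax rotation (the standard SVD-based formulation, not SPSS's exact implementation) illustrates that rotation leaves the communalities of a hypothetical loading matrix unchanged:

```python
import numpy as np

def varimax(L: np.ndarray, n_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Orthogonal varimax rotation of a variables x factors loading matrix
    (a sketch of the SVD-based formulation)."""
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(n_iter):
        rotated = L @ R
        col_ss = (rotated ** 2).sum(axis=0)       # column sums of squares
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3 - rotated @ np.diag(col_ss) / p)
        )
        R = u @ vt                                 # orthogonal rotation matrix
        crit = s.sum()
        if crit - crit_old < tol:
            break
        crit_old = crit
    return L @ R

# Hypothetical unrotated loadings of three variables on two factors.
loadings = np.array([[0.7, 0.5], [0.6, 0.6], [0.4, 0.7]])
rotated = varimax(loadings)

# Rotation preserves each variable's communality (row sum of squared loadings).
print(np.allclose((loadings ** 2).sum(axis=1), (rotated ** 2).sum(axis=1)))  # True
```

Because the rotation matrix is orthogonal, row norms (and hence communalities) are preserved; only the distribution of loading size across the factors changes.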
5. Interpretation
It is important to interpret the results correctly. A researcher has to identify the
variables that load well onto a factor. Then, a name is assigned to that factor. The
naming depends on the genre of the experiments and it should
reflect the theoretical essence of the work. This is how one can categorize the
items into groups. For a meaningful analysis, at least three variables should load
onto a factor. By default, IBM SPSS gives a component matrix as an unrotated
output, however it also gives a rotated component matrix as an output. One has to
analyze this matrix and decide the resulting factors according to the requirements
[44]. Examples regarding the interpretation will be seen in the next chapter 4,
when we evaluate the experiments with factor analysis.
3.7 Summary
This chapter explains the second method for analyzing the results of card sorting experiments. It explains the concepts of factor analysis along with its methodology. It describes
the types of factor analysis, which are Exploratory factor analysis and Confirmatory
factor analysis. It discusses the factor analysis model, which gives a general
understanding of the relationship between variables and factors. Then it highlights
the types of factoring, which help in the extraction of factors from a data set.
After that, it describes some of the less well-known data modes in which factor
analysis is carried out. Towards the end, it describes a five-step protocol that
provides a solid foundation for novices to conduct factor analysis confidently. The next
chapter explains the step-by-step process of conducting factor analysis with IBM SPSS.
The four experiments are evaluated and results are shown.
Chapter 4
Evaluation of Card Sorting with Factor Analysis
“To think or not to think? That is the new question.”
NADINA BOUN
In Chapters 2 and 3, data analysis methods using cluster analysis and factor analysis
are discussed, respectively. Hierarchical clustering is a widely used clustering
technique, and in this thesis the experiments are evaluated with agglomerative
hierarchical clustering. Exploratory factor analysis is likewise a well-established
and widely used method, so the experiments are also evaluated with this technique.
For factor analysis, the IBM SPSS software (Appendix A) is used to evaluate the card
sorting experiments.
Before we start evaluating the experiments, step-by-step guidance is provided on
how to prepare the data for input and how to conduct factor analysis with IBM SPSS.
4.1 IBM SPSS Guide
SPSS is a widely used statistical tool for analyzing data. It has an easy user
interface that provides a good platform for beginners and convenience for experts.
SPSS can be used to analyze data with many techniques; however, this guide only
covers factor analysis. The data editor in SPSS has two separate views, namely data
view and variable view. The data view looks similar to a Microsoft Excel sheet,
while the variable view is a list of all variables, i.e. all the concepts that were
turned into variables in order to be measurable. Figure 4.1 shows the data view when
no variables are defined. One can switch between the views by
clicking on Data View or Variable View buttons present on the bottom left of the screen
[64] [65]. IBM SPSS Statistics Version 22 is used for the evaluation of experiments.
Figure 4.1: Data View in SPSS with no variables.
4.1.1 Defining a variable
Whenever there is a need to import data into SPSS, one should start making a list of
variables and define them in the variable view. First, we have to assign a name to a
variable in the Name column. For example, suppose the name of the variable is sex;
as soon as we press Enter after typing the name, SPSS fills out the other columns
automatically. After this, we have to assign a type to the variable in the Type
column. Most of the time, numeric values are used and assigned to a theoretical
concept, so we also assign a numeric type. Then comes the Width column, which
specifies how many digits the values of a variable may have; 8 is the default
setting. The fourth column is Decimal. It specifies how many digits are displayed
after the decimal point in the data view. In our example, sex can either be 1
(female) or 2 (male); it cannot be a float number such as 1.2. Therefore, we assign
0 in the Decimal column. The fifth column is Label. The name of a variable should be
short so that it can be identified in a list of variables; in the Label column, one
can include an explanation of the variable or a note on a specific
content of a variable [64] [65].
The sixth field is the Values column. It allows us to specify what the numeric values
mean or what are the theoretical concepts they belong to. When we click on the values
field, a value labels window pops up. In the Value field, write 1 and in the Label field,
write female and then click Add button. To enter the second value, write 2 in the Value
field and male in the Label field, then click Add button. Once we are done with adding
all the values, click ok. One can see the Value Labels dialog box in the Figure 4.2. In this
manner, variables can be defined according to one's requirements. After this, we
have the Missing column. One should be careful in defining the missing values for
every variable, as they are important and can at times cause trouble. A missing
value means that if we do not have an answer from a participant, one can make an
entry so that SPSS knows
that a specific value is missing, and that participant or entry can be excluded from
the analysis. The choice of missing value depends on the kind of data set; we should
pick a value that is not used as a regular value by any variable. For example, if
one wants to specify 9,999 as the missing value, he or she has to click on the
Missing field. The Missing Values window pops up. Click the Discrete missing values
radio button, enter 9,999 and click OK. See Figure 4.3 [64] [65].
Figure 4.2: Value Labels Dialog Box.
Once the missing value is added, we can assign a label to that value in the same way
as was done earlier for male and female. In our example, the entry is (None), as we
do not have any missing values. The next field is the Columns column. It controls
how the data editor window displays the column. After this, we have the Align
column. It gives the options (left, right and center) for how one would like the
values of that variable to be aligned in the data view. The second-to-last column is
Measure. It has three options, namely scale, ordinal and nominal; scale is SPSS's
way of denoting a metric measurement. Figure 4.4 shows a pictorial representation of
the three measure options. The last column is Role. It tells us what that particular
variable is used for. In our example, the role is defined as an input. After the
full definition of the variable, the variable view in SPSS looks like Figure 4.5
[64] [65].
4.1.2 Conduct Factor Analysis
Once all variables are defined in the variable view, one can enter the values corresponding
to those variables in the data view. Data can also be imported into SPSS. It can be done
by clicking on File, Open and then Data. The defined variables are along the columns
in the data view [65]. Before conducting factor analysis, if one wants to run reliability
analysis, then it can be performed by clicking on Analyze, Scale and then Reliability
Figure 4.3: Missing Values Dialog Box.
Figure 4.4: Measure options [64].
Figure 4.5: Variable view after definition of one variable.
Analysis. See Figure 4.6. We then get the Reliability Analysis dialog box, where we
have to select the variables on which we want to carry out the analysis. In this
box, select the variables from the list and transfer them to the Items list, then
click OK. The output appears in a separate output window; see Figure 4.7. This
output is from one of our experiments, in which we had 20 variables and 26
participants. The alpha value is 0.753, which is acceptable [57].
After performing reliability analysis, it is time to conduct factor analysis. Start
the analysis by clicking on Analyze, Dimension Reduction and then Factor. See the
procedure in Figure 4.8. As soon as one clicks on Factor, a Factor Analysis window
Figure 4.6: Conduct Reliability Analysis.
Figure 4.7: Reliability Analysis Output.
pops up. Here, we have to select those variables on which we want to carry out the
analysis. Select the variables from the list and transfer them to the Variables list by
clicking on the right arrow. If any of the variables are problematic due to any reason,
then do not include them in the analysis. See Figure 4.9. This dialog box gives us
various options. The first one is Descriptives. Clicking on it brings up the Factor
Analysis: Descriptives dialog box, as shown in Figure 4.10. By selecting the
Univariate descriptives option, SPSS includes a descriptive statistics table in the
output, which shows the mean, the standard deviation and the number of cases (N)
included in the analysis. If you select the Initial solution option, it gives the
communalities table, the total variance explained table and the component matrix
(unrotated solution) table [66].
Figure 4.8: Conduct Factor Analysis.
If we select the Coefficients option, we get an R-matrix, and if the Significance
levels option is selected, a matrix is generated containing the significance value
of every correlation in the R-matrix. One also has the option to obtain the
determinant of the matrix by selecting the Determinant check box. It is quite
helpful in checking whether the variables are very highly correlated
(multicollinearity) or perfectly correlated (singularity). The determinant should be
larger than 0.00001; if it is not, one has to look through the correlation matrix
and find the variables with correlations greater than 0.8 (R > 0.8). Those variables
are highly correlated, and one can delete them from the analysis. The choice of
which variables to delete
Figure 4.9: Factor Analysis Dialog Box.
Figure 4.10: Factor Analysis: Descriptives Dialog Box.
is somewhat arbitrary and depends on one's requirements. Then comes the KMO and
Bartlett's test of sphericity option, which is used to check the sampling adequacy;
the KMO value should be greater than 0.5. The last three options, namely Inverse,
Reproduced and Anti-image, are advanced concepts and are not used in the evaluation
of our experiments [66].
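The determinant check can be mimicked outside SPSS; the sketch below constructs two nearly collinear simulated variables and shows that both the determinant rule and the R > 0.8 rule flag them:

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.normal(size=500)
# Simulated data: the second variable is almost a copy of the first.
X = np.column_stack([a, a + 0.001 * rng.normal(size=500), rng.normal(size=500)])

R = np.corrcoef(X, rowvar=False)
det = np.linalg.det(R)
print(det < 0.00001)   # True: multicollinearity is flagged, as in the SPSS check
print(abs(R[0, 1]) > 0.8)  # True: this pair is a candidate for removal
```

Dropping one of the two near-duplicate variables restores a well-conditioned correlation matrix.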
The second option on the Factor Analysis Dialog Box is Extraction. Clicking on the
Extraction button brings up the Factor Analysis: Extraction dialog box, as shown in
Figure 4.11. First, one can select the technique with which to extract factors. We
have used the principal components method to extract the factors. Other methods can
be selected from the drop-down menu; these techniques are discussed in Section 3.4.
The Analyze box gives the option to choose between the correlation matrix and the
covariance matrix, i.e. the matrix on which one wants to conduct factor analysis. In
the Display box, there are two options, namely unrotated factor solution and scree
plot. The scree plot is a plot of eigenvalues versus component number that
helps us to determine the number of factors to retain. It is generally recommended
to interpret a rotated solution; an unrotated solution is mainly used to check
whether the rotation improves the results. One can limit the number of factors to be
retained with the Extract box: either assign an eigenvalue threshold or directly
force SPSS to extract a particular number of factors. If we enter 1 in the
Eigenvalues over field, SPSS will retain all factors with an eigenvalue over 1; if
we enter 4 in the Number of factors field, SPSS will retain exactly 4 factors. With
the Maximum iterations for convergence option, one can specify the number of steps
the algorithm in SPSS may take to compute a solution [66] [67].
Figure 4.11: Factor Analysis: Extraction Dialog Box.
The third option on the Factor Analysis Dialog Box is Rotation. By clicking on it,
we get the Factor Analysis: Rotation dialog box, as shown in Figure 4.12. By
default, SPSS produces an unrotated solution, as the None option is selected under
the Method box. When any rotation option is selected, the Rotated solution and
Loading plot(s) options under the Display box become available, as does the Maximum
iterations for convergence option, with which we can specify the number of steps the
algorithm in SPSS may take to compute a solution. The default value is 25 [66] [67].
The fourth option on the Factor Analysis Dialog Box is Scores. By choosing it, we get
Factor Analysis: Scores dialog box as shown in the Figure 4.13. With the help of this
option, the factor scores of every participant or subject can be saved in the data editor.
A new column is created for every extracted factor and after that these factor scores
(of every participant) are put in that column. Once we have these scores in the data
editor, a researcher can use them for further analysis. The scores also help in
identifying groups of participants who have scored highly on specific factors. To
use this option, one has to select the Save as variables check box. SPSS provides three
Figure 4.12: Factor Analysis: Rotation Dialog Box.
methods with which one can obtain these scores: Regression, Bartlett and
Anderson-Rubin. Andy P. Field explains these methods in more detail in [68]. There
is also an option to display the factor score coefficient matrix, together with the
correlations among the factor scores, in the SPSS output window; for this, a user
has to select the Display factor score coefficient matrix check box [66] [67]. The
factor score option has not been used in the evaluation of our experiments, as it
was not relevant to our type of data and results.
Figure 4.13: Factor Analysis: Scores Dialog Box.
The last option on the Factor Analysis Dialog Box is Options. By selecting it, we get
the Factor Analysis: Options dialog box, as shown in the Figure 4.14. The first option here
controls how SPSS handles missing values: exclude cases listwise,
exclude cases pairwise or replace with mean. This option is not used in our experiments
as we do not have any missing values [67]. Under the Coefficient Display Format box, there
are two options, namely sorted by size and suppress small coefficients. The sorted by size
option lists the variables by the size of their factor loadings, which can be quite
helpful in factor interpretation. SPSS also gives an option to suppress the absolute values
less than a specified limit. If one writes 0.5 in the absolute value below: field,
the factor loading values between -0.5 and +0.5 are not displayed.
This value can be increased or decreased according to one's data and results, and it
makes the output considerably easier to interpret [66].
Figure 4.14: Factor Analysis: Options Dialog Box.
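The effect of the suppress small coefficients option can be illustrated with a few lines of numpy; the loading values here are invented, and 0.5 is the example cut-off from above.

```python
import numpy as np

# Invented rotated loadings (rows = variables, columns = factors).
loadings = np.array([
    [ 0.12,  0.81],
    [-0.45,  0.63],
    [ 0.91,  0.08],
])

# Like SPSS, blank out loadings whose absolute value is below the
# cut-off, so only each variable's dominant loadings remain visible.
suppressed = np.where(np.abs(loadings) >= 0.5, loadings, np.nan)
```

After suppression, each row shows at a glance which factor a variable belongs to, which is precisely why the option helps interpretation.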
This is how one can use IBM SPSS to perform factor analysis. In the next section, we
evaluate our four experiments with factor analysis and provide the results.
4.2 Card Sorting Experiments
This section evaluates four experiments from different domains with the help of factor
analysis. The participants used an online card sorting tool, namely WeCaSo, to perform
the experiments (open and closed). WeCaSo was developed at Paderborn University. The
tool is easy to use: subjects perform card sorting experiments via a drag and drop
feature, which means that a participant just has to select a card, drag it and drop it
under a suitable category. The experiments were conducted in the following order:
• The genres of the experiments were decided. Then, the number and names of the
cards were finalized, as were the category names for the closed card sorting
experiment.
• All the experiments were prepared with WeCaSo.
• Once the participants for the experiments were decided, the online links of the
experiments were distributed among them.
• The participants who performed the experiments were the Master’s students of
Paderborn University and the employees of some multinational companies in India.
1. Apple, 2. Orange, 3. Mango, 4. Pineapple, 5. Cherry, 6. Pomegranate, 7. Tomato, 8. Onion, 9. Capsicum, 10. Carrot, 11. Radish, 12. Lemon, 13. Cabbage, 14. Cauliflower, 15. Strawberry, 16. Plum, 17. Cranberry, 18. Spinach, 19. Brinjal, 20. Brocolli, 21. Grapes, 22. Banana, 23. Avocado, 24. Himbeere, 25. Salt, 26. Black Pepper, 27. Cloves, 28. Cardamom, 29. Mint, 30. Red Chilli
Table 4.1: List of cards - eatables.
• They were provided with the instructions to carry out the experiments. All the
subjects were familiar with the usage of internet and web tools.
• The participants were given two weeks' time to perform the experiments and thereafter, the results were exported and analyzed.
4.2.1 Eatables Website
A website has to be designed for health purposes, which has some names of food items.
These food items have to be categorized under some headings on the website. To determine the potential categories, we conducted a card sorting experiment. This experiment
was an open one, which means the participants had the freedom to decide the categories.
The difficulty level of this open experiment is set to easy, meaning the categories are
clear and everybody should be able to sort the cards easily. The list of cards
can be seen in the Table 4.1. As discussed earlier, this experiment was conducted with
WeCaSo [12]. For this experiment, we got 26 participants. Once the experiment was
over, the data was exported and studied thoroughly. We noticed that the subjects created
6 different categories in total (though not every participant made 6 categories).
The values were entered into the data editor in the same manner as we have
discussed for the values in the Figure 4.2. The naming of the cards is taken from various
websites such as [69].
All the cards, treated as variables, are entered in columns and the subjects are entered in
rows of the data editor of SPSS. We also noticed that all the participants have put these
10 items, namely spinach, plum, mango, apple, banana, pineapple, grapes, strawberry,
cherry, and orange, in the same category. For example, all the subjects have put mango
under the fruit category and spinach under the vegetable category. As a result, we had to
remove these 10 variables from the analysis, because it makes no sense to conduct factor
analysis on items that always end up in the same category. Moreover, SPSS reports an error
that more than one variable has no variance, and no output is produced.
Therefore, this experiment has been conducted with 20 cards (variables).
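Screening out such zero-variance cards can be automated before the analysis. A small numpy sketch with made-up category codes (the real data editor layout is described above):

```python
import numpy as np

# Made-up data editor contents: rows are participants, columns are
# cards, values are numeric category codes (as in Figure 4.2).
data = np.array([
    [1, 2, 3, 1],
    [1, 3, 2, 1],
    [1, 2, 2, 2],
])

# A card that every participant sorted identically has zero variance
# and would make factor analysis fail, so it is removed beforehand.
keep = data.std(axis=0) > 0
reduced = data[:, keep]  # the first column (all 1s) is dropped
```

Here the first card was sorted the same way by everyone, so only three of the four columns survive the screening.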
Interpretation of Results
Before conducting factor analysis, we conducted reliability analysis on this data. The
output has been shown earlier in the Figure 4.7. The Cronbach’s Alpha value is 0.753,
which is acceptable. In the Factor Analysis: Descriptives dialog box, every option
is selected except inverse, reproduced and anti-image. In the Extraction dialog box,
the selected method is principal components and the matrix to be analyzed is the correlation
matrix. The unrotated factor solution and scree plot are displayed. The factors are
extracted on the basis of eigenvalues over 1. In the Rotation dialog box, the varimax
method is selected and the rotated solution is displayed. If one selects the loading plot
option, a component plot in rotated space appears in the output window, showing
the items plotted in a 3D sample space. Initially, when we ran factor analysis, seven
factors were extracted on the basis of eigenvalues over 1. This solution was not good, as
we could not have seven categories for this experiment (in the end, factors correspond to
categories). We can also force SPSS to extract a particular number of factors by selecting the
fixed number of factors option in the Extraction window. We got the best solution
when SPSS was forced to extract 4 factors. Factor analysis is an exploratory technique
[66]; therefore, one should experiment to find the best solution according to one's
requirements.
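The Cronbach's Alpha used in this reliability check follows a simple formula and can be reproduced directly. A sketch of the standard definition (rows = subjects, columns = items), applied to invented data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances /
    variance of the summed scale); rows = subjects, columns = items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Perfectly consistent items give an alpha of exactly 1.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alpha = cronbach_alpha(np.column_stack([base, base, base]))
```

Values around 0.75, as in this experiment, indicate acceptable internal consistency; values above 0.9, as seen later, indicate very good consistency.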
The first output in SPSS is Descriptive Statistics, shown in the Figure 4.15. It lists
the mean, standard deviation and number of subjects (i.e. N). The variable with the
highest mean is regarded as the most important variable; in our results, Cardamom has
the highest mean of 2.96. Generally, this mean matters when the variables are survey
questions and factor analysis is run to evaluate that survey: the question with the
highest mean can then be regarded as an important variable [70]. This table is not
important for our results and will therefore not be shown for the next experiments.
The second output in SPSS is the Correlation Matrix. It can be seen in the Figure 4.16.
In this matrix, one can see the correlation coefficients between each variable and all the
other variables. The principal diagonal of this matrix is always 1, because the correlation
of a variable with itself is 1. The correlation values below and above the principal
diagonal mirror each other, since the matrix is symmetric. One can look through all the
values for those greater than 0.9; if such values appear, singularity can be a problem in
the data. The determinant is listed at the bottom of the matrix. In our case it is 0.000,
which is less than 0.00001, meaning that multicollinearity can also be a problem in this
data. In these cases, one may or may not exclude such variables (those with values greater
than 0.9) from the analysis; this decision is mainly relevant for surveys with
questionnaires. In our data, the correlation between cloves and pomegranate
is 0.947. We excluded the two variables one by one and checked whether the results
improved. There was no improvement, so we included them back in the analysis
[63] [66] [70] [71].

Figure 4.15: Descriptive Statistics Output.
Below the determinant, SPSS notes that this matrix is not positive
definite, which means that there are some negative eigenvalues. It can be seen in the
Figure 4.18 that the last three eigenvalues are negative. This is the reason why we did
not get the KMO and Bartlett's test output, which measures sampling adequacy. A matrix
is non-positive definite if the variables are linearly dependent on one another. A matrix
will also be non-positive definite if there are more variables than cases (participants);
in this example, the gap between variables and subjects is small (they differ by 6). The
significance values of correlations, the inverse of the correlation matrix and the
anti-image correlation matrix will also not be shown in the SPSS output if a matrix is
non-positive definite [72]. These options can be seen in the Figure 4.10.
Figure 4.16: Correlation Matrix Output.
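The checks discussed here (values above 0.9, the determinant, and positive definiteness) are easy to script. A numpy sketch over randomly generated stand-in data:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(26, 20))        # stand-in for 26 subjects x 20 cards

corr = np.corrcoef(data, rowvar=False)  # 20 x 20, with 1s on the diagonal

# Pairs correlating above 0.9 (checked above the diagonal only)
# point to possible singularity.
rows, cols = np.where(np.triu(np.abs(corr), k=1) > 0.9)
suspect_pairs = list(zip(rows.tolist(), cols.tolist()))

# A determinant below 0.00001 points to possible multicollinearity.
multicollinear = np.linalg.det(corr) < 0.00001

# Negative eigenvalues mean the matrix is not positive definite, in
# which case SPSS withholds the KMO and Bartlett's test output.
eigenvalues = np.linalg.eigvalsh(corr)
positive_definite = bool((eigenvalues > 0).all())
```

On the real card sorting data these checks would reproduce the determinant of 0.000 and the negative eigenvalues reported by SPSS.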
The third output in SPSS is Communalities. It can be seen in the Figure 4.17. The
initial values are before extraction; the other column is for after extraction. As we
selected the principal components method for extraction, which assumes the total variance
to be common, all the initial values are 1. The values in the extraction column tell us
about the common variance in the entire data. For example, the extraction value of
cranberry is 0.903, which means that 90.3% of the variance in cranberry is shared with
the remaining variables. One can consider dropping items that have an extracted
communality less than 0.4. Again, the decision to drop variables from the analysis
depends on the type of data and the requirements [63] [66] [70] [71].
Figure 4.17: Communalities Output.
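The relationship between loadings and extracted communalities described here can be computed in one line; the loading values below are invented for illustration.

```python
import numpy as np

# Invented rotated loadings for three variables across four factors.
loadings = np.array([
    [ 0.82,  0.10, -0.05,  0.21],
    [ 0.15,  0.88,  0.02, -0.11],
    [-0.02,  0.00,  0.93,  0.12],
])

# The extracted communality of a variable is the sum of its squared
# loadings over the retained factors: the share of its variance that
# the common factors reproduce.
communalities = (loadings ** 2).sum(axis=1)
```

A variable with a communality near 1 is almost fully explained by the retained factors; one below 0.4 is a candidate for removal, as noted above.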
The fourth output in SPSS is Total Variance Explained. It can be seen in the Figure
4.18. This table shows the eigenvalues in three different parts. First, eigenvalues before
extraction (Initial Eigenvalues); second, eigenvalues after extraction (Extraction Sums
of Squared Loadings) and last, eigenvalues after rotation (Rotation Sums of Squared
Loadings). The number of factors is equal to the number of variables (i.e. 20 in our
case), as can be seen in the component column. The initial factors always explain the most
variance, as can be seen in the percent of variance column. For example, 23.67% of the
total variance is explained by the first factor. This amount gradually decreases as we
go down the table. As discussed earlier, SPSS extracted 7 factors initially, which can
also be seen in the total column: here, the first seven factors have eigenvalues over 1.
It should be noted that if all the eigenvalues
are added up, we get the number of variables. Looking at the percent of variance column,
one can see that the first four factors (each with above 10% of the variance) account
for most of the variance: by the fourth factor, 65.6% of the total variance has been
explained, as shown in the cumulative % column. This was one of the reasons why
we decided to extract four factors. The values in the first two parts, namely Initial
Eigenvalues and Extraction Sums of Squared Loadings, are the same. The second
part is after extraction, which is why no values appear after the fourth factor, as only
four factors were extracted. The values in the last column change because of the rotation:
now 18.44% of the total variance is explained by the first factor, compared to 23.67%
before rotation [63] [66] [70] [71].
Figure 4.18: Total Variance Explained Output.
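The arithmetic behind this table can be sketched with numpy: eigenvalues of the correlation matrix, their percent of variance, and the Kaiser criterion. The data here is a randomly generated stand-in, not the experiment's data.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(26, 20))        # stand-in for 26 subjects x 20 cards

corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending order

# The eigenvalues sum to the number of variables, so dividing by that
# number yields each factor's percent of the total variance.
percent_variance = 100 * eigenvalues / len(eigenvalues)
cumulative = np.cumsum(percent_variance)

# Kaiser's criterion: keep factors with eigenvalues over 1.
n_retained = int((eigenvalues > 1).sum())
```

The cumulative column always ends at 100%, which is why the first few rows tell us how much of the data a small number of factors can summarize.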
The fifth output in SPSS is the Scree Plot. It can be seen in the Figure 4.19. It helps us
decide how many factors to retain. It can be seen that there are seven factors with
eigenvalues over 1. As discussed before, we retain the factors up to the point of an
elbow. In our case, one can see three elbows: the first at the second factor, the second
at the fifth factor and the third at the eleventh factor. We disregard the third elbow
because its eigenvalue is below 1, took the second one into consideration and retained
four factors [63] [66] [70] [71].
The sixth output in SPSS is Unrotated Component Matrix. It can be seen in the Figure
4.20. This matrix shows how strongly each variable loads onto a particular factor, as
reflected by its loading value. The unrotated component matrix is not well suited for
interpreting the results. If one looks at the loadings of all the variables, actually none
of the variables loads well onto a single factor. That is why a rotated solution is
preferred over the unrotated one. This output will not be shown for the next experiments,
because we do not use it for the interpretation [63] [66] [70] [71].

Figure 4.19: Scree Plot Output.
Figure 4.20: Unrotated Component Matrix Output.
The seventh output in SPSS is Rotated Component Matrix. It can be seen in the Figure
4.21. The loadings marked in yellow colour indicate that the corresponding variables load
well onto a single factor. Negative and positive values are treated the same; the sign
merely indicates the type of relationship, negative for a negative relationship and
positive for a positive one [73]. Brocolli, Avocado, Cabbage and Himbeere load well onto
the first factor. Cranberry and Cauliflower load well onto the second factor. Salt,
Cardamom, Black pepper, Red Chilli and Mint load well onto the third factor. Pomegranate,
Cloves and Capsicum load well onto the fourth factor. The loadings marked in light yellow
colour with red outlines indicate that the corresponding variables are problematic, as
they load on more than one factor. Whenever a subject is doubtful about the category of a
card, three things can happen. First, he or she leaves the card out and does not put it
under any category. Second, in hybrid card sorting, a subject puts the card under a newly
created category if the predefined categories do not seem suitable for it. Third, he or
she creates a new category specifically for that single card. In such cases, cards can
become problematic. Onion and Lemon load on the first, second and fourth factors. Carrot,
Radish and Tomato load on the first and second factors. Egg plant (listed as Brinjal in
Table 4.1) loads on the first and fourth factors [63] [66] [70] [71].
If we square all the loadings of Salt from the Figure 4.21 and add them up, the result
equals its extracted communality from the Figure 4.17 [47]:

(−0.022)² + (0.004)² + (0.929)² + (0.117)² ≈ 0.877    (4.1)
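The check in equation (4.1) can be carried out directly; the four values are Salt's rotated loadings as quoted above.

```python
# Salt's four rotated loadings from Figure 4.21.
salt_loadings = [-0.022, 0.004, 0.929, 0.117]

# Squaring and summing them reproduces Salt's extracted communality
# from Figure 4.17, as stated in equation (4.1).
communality = sum(value ** 2 for value in salt_loadings)
print(round(communality, 3))  # 0.877
```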
Now we have to label the factors. We label the first factor as Vegetables as most of the
variables that load onto this factor are vegetables. Avocado and Himbeere are exceptions.
We label the second factor as Salads as all of them can be eaten as a salad. The third
factor can be labelled as Spices as all of them are used as spices. The fourth factor does
not have a clear categorization of variables as Pomegranate is a fruit, Cloves is a spice
and Capsicum and Egg plant are vegetables. The fourth factor cannot be discarded
either, as three variables load quite well onto it, so we can label this factor as Mix or
Favorites. One should not forget the initial ten variables, which were dropped from the
analysis because the categories for these items were clear. As discussed earlier, SPSS
gives an option to suppress the loadings to make the solution easier to interpret. If we
had suppressed the values below 0.4, then Onion would not have been problematic, as it
would have loaded only onto the first factor; Lemon and Mint would not have loaded at
all; and Tomato would also not have been problematic, as it would have loaded only onto
the second factor [63] [66] [70] [71]. Hence, the final categorization is as follows:
• Vegetables: Brocolli, Avocado, Cabbage, Himbeere, Onion and Spinach
• Salads: Cranberry, Cauliflower and Tomato
• Spices: Salt, Cardamom, Black pepper and Red Chilli
• Mix: Pomegranate, Cloves and Capsicum
• Problematic: Lemon, Carrot, Radish, Mint and Egg plant
• Fruits: Plum, Mango, Apple, Banana, Pineapple, Grapes, Strawberry, Cherry
and Orange
It should be noted that all the variables under the Fruits category, as well as Spinach under the Vegetables category, were not included in the analysis.
Figure 4.21: Rotated Component Matrix Output.
The last output in SPSS is Component Plot in Rotated Space. It can be seen in the
Figure 4.22. It just shows all the variables plotted in 3-D sample space. It is not so
important for the interpretation and hence will not be shown in the further experiments.
4.2.2 Entertainment Website
A website has to be designed for entertainment purposes, which has some genre names
of leisure time activities. These items have to be categorized under some headings on the
website. To determine the potential categories, we conducted a card sorting experiment.
This experiment was an open one, which means the participants had the freedom to
Figure 4.22: Component Plot in Rotated Space Output.
1. Rock, 2. Pop, 3. Classic, 4. Blue, 5. Country, 6. Lounge, 7. Hip Hop, 8. Action, 9. Romance, 10. Comedy, 11. Thriller, 12. Salsa, 13. Rumba, 14. Tango, 15. Flamenco, 16. Jazz, 17. Reggae, 18. Latin, 19. Modern Jive, 20. Kathak, 21. Tap, 22. Odissi, 23. Bhangra, 24. Robot
Table 4.2: List of cards - entertainment genres.
decide the categories. The difficulty level of this open experiment is set to medium,
meaning the categories are less clear and not everybody will sort the cards easily. The
list of cards can be seen in the Table 4.2. As discussed
earlier, this experiment has been conducted with WeCaSo [12]. For this experiment,
we got 25 participants. Once the experiment was over, the data was exported and
studied thoroughly. We noticed that the subjects created 7 different categories in total
(though not every participant made 7 categories). Some subjects sorted the items
according to country, as if a genre belongs to a particular country. The
values were entered into the data editor in the same manner as we have discussed for the
values in the Figure 4.2. The naming of the cards has been taken from various websites
such as [74].
All the cards, treated as variables, are entered in columns and the subjects are entered
in rows of the data editor of SPSS.
Interpretation of Results
Before conducting factor analysis, we conducted reliability analysis on this data. The
output is shown in the Figure 4.23. The Cronbach’s Alpha value is 0.982, which is
regarded as very good. In the Factor Analysis: Descriptives dialog box, every option
is selected except inverse, reproduced and anti-image. In the Extraction dialog box,
selected method is principal components and the matrix to be analyzed is correlation
matrix. The scree plot is also displayed. The factors are extracted on the basis of
eigenvalues over 1. In the Rotation dialog box, varimax method is selected and rotated
solution is displayed. When we ran factor analysis, three factors were extracted on the
basis of eigenvalues over 1. In the Options dialog box, sorted by size option is selected
and we suppressed the coefficients’ values below 0.6. This was the best solution.
Figure 4.23: Reliability Analysis Output.
The first output in SPSS is Correlation Matrix. It can be seen in the Figure 4.24. In this
matrix, one can see the correlation coefficients between one variable and all the other
variables. The principal diagonal of this matrix is always 1 because the correlation of
a variable with itself is 1. One can also see the correlation values below and above the
principal diagonal, they are similar. In this matrix, we have many values greater than
0.9, which means that singularity can be the problem in the data. The determinant is
listed at the bottom of the matrix. In our case it is 0.000, which is less than 0.00001. It
means that multicollinearity can also be a problem in this data. In these cases, one may
or may not exclude such variables (having values greater than 0.9) from the analysis.
This decision is helpful in the cases of survey, where we have questionnaires. We did
not exclude any of the variables from the analysis because of two reasons. First, this
data was not from a survey and our variables were not questions. Second, if we deleted all
such variables, we would be left with just a few items, and it does not make sense
to conduct factor analysis on so few variables [63] [66] [70] [71].
Below the determinant, SPSS notes that this matrix is not positive
definite, which means that there are some negative eigenvalues. It can be seen in the
Figure 4.26 that the last six eigenvalues are negative. This is the reason why we did
not get the KMO and Bartlett's test output, which measures sampling adequacy. A matrix
is non-positive definite if the variables are linearly dependent on one another. A matrix
will also be non-positive definite if there are more variables than cases (participants);
in this example, the two numbers are almost equal. The significance values of correlations,
inverse of correlation matrix, and anti-image correlation matrix will also not be shown
in SPSS output if a matrix is non-positive definite [72]. These options are in Figure
4.10.
The second output in SPSS is Communalities. It can be seen in the Figure 4.25. The
initial values are before extraction; the other column is for after extraction. As
we selected the principal components method for extraction, which assumes the total
variance to be common, all the initial values are 1. The values in the extraction
column tell us about the common variance in the entire data. For example, the extraction
value of Blue is 0.954, which means that 95.4% of the variance in Blue is shared with
the remaining variables. One can consider dropping items that have an extracted
communality less than 0.4; however, we did not have any such variables. Again, the
decision to drop variables depends on the type of data and one's requirements
[63] [66] [70] [71].
The third output in SPSS is Total Variance Explained. It can be seen in the Figure
4.26. This table shows the eigenvalues in three different parts. First, eigenvalues before
extraction (Initial Eigenvalues); second, eigenvalues after extraction (Extraction Sums
of Squared Loadings), and third, eigenvalues after rotation (Rotation Sums of Squared
Loadings). The number of factors is equal to the number of variables (i.e. 24 in this
case), as can be seen in the component column. The initial factors always explain the most
variance, as can be seen in the percent of variance column. For example, 73.28%
of the total variance is explained by the first factor. This amount gradually decreases as
we go down the table. In this example, the first three factors have eigenvalues
over 1. It should be noted that if all the eigenvalues are added up, we get the number
of variables. Looking at the percent of variance column, one can see that the first
three factors account for most of the variance: by the third factor, 88.5% of the
total variance has been explained, as shown in the cumulative % column. The
values in the first two parts, namely Initial Eigenvalues and Extraction Sums of Squared
Loadings, are the same. The second part is after extraction, which is why no values appear
after the third factor, as only three factors were extracted. The values in the last
column change because of the rotation: now 50.45% of the total variance is explained by
the first factor, compared to 73.28% before rotation [63] [66] [70] [71].
Figure 4.24: Correlation Matrix Output.
Figure 4.25: Communalities Output.
The fourth output in SPSS is the Scree Plot, as shown in the Figure 4.27. It helps us
decide how many factors to retain. It can be seen that there are three factors with
eigenvalues over 1. As discussed before, we retain the factors up to the point of an
elbow. In this case, one can see two elbows: the first at the second factor and the
second at the fourth factor. We disregard the first elbow because the third factor still
has an eigenvalue over 1, took the second one into consideration and retained three
factors. SPSS also retained three factors [63] [66] [70] [71].
The last output we have is the Rotated Component Matrix. It can be seen in the Figure
4.28. This time we suppressed the values below 0.6 in the component matrix to make
the solution easier to interpret. We have already seen in the Total Variance Explained
output that 73.28% of the total variance is explained by the first factor, which is why
most of the variables load well onto it. Country, Pop, Tango, Flamenco, Salsa, Jazz,
Rock, Rumba, Reggae, Blue, Classic, Latin, Hip Hop, Lounge and Robot load well onto the
first factor. Thriller, Comedy, Action and Romance load well onto the second factor.
Bhangra, Kathak and Tap load well onto the third factor. Modern Jive and Odissi are
problematic, as
they do not load cleanly onto a single factor: Modern Jive loads on both the first and
third factors, while Odissi does not load on any factor at all [63] [66] [70] [71].
Now we have to label the factors. We label the first factor as Music with Western Dance
as most of the variables that load onto this factor are music genres along with some
typical western dance types. We label the second factor as Movies as all of them are
movie genres. The third factor can be labelled as Dances to Learn as all of them are
dance styles [63] [66] [70] [71]. Therefore, the final categorization is as follows:
• Music with Western Dance: Country, Pop, Tango, Flamenco, Salsa, Jazz,
Rock, Rumba, Reggae, Blue, Classic, Latin, Hip Hop, Lounge and Robot
• Movies: Thriller, Comedy, Action and Romance
• Dances to Learn: Bhangra, Kathak and Tap
• Problematic: Modern Jive and Odissi
Figure 4.26: Total Variance Explained Output.
4.2.3 Automobile Website
A website has to be designed for an automobile company. We have some items that
need to be categorized under some headings on the website. To determine the potential
Figure 4.27: Scree Plot Output.
Figure 4.28: Rotated Component Matrix Output.
1. Register Now, 2. Configure, 3. View Brochure, 4. All Models, 5. Service, 6. Genuine Accessories, 7. Extended Warranty, 8. Insurance, 9. Service Plan, 10. Road Assist, 11. Technology, 12. Multimedia Experience, 13. Social Web, 14. Dealership, 15. Used Cars, 16. Corporate Sales, 17. History, 18. News, 19. Virtual Factory Tour, 20. Book Online, 21. Test Drive, 22. Environmental Protection, 23. Production Plants, 24. Contact Us, 25. Finance, 26. Dealer Locator, 27. Glossary, 28. Help, 29. Privacy Policy, 30. Terms and Conditions, 31. Price Calculator, 32. Technical Data
Table 4.3: List of cards - Automobiles Website.
categories, we conducted a card sorting experiment. This experiment was an open one,
which means the participants had the freedom to decide the categories. The difficulty
level of this open experiment is set to hard, meaning the categories are not clear and
everybody is expected to sort the cards with difficulty. The list of cards
can be seen in the Table 4.3. As discussed earlier, this experiment has been conducted
with WeCaSo [12]. For this experiment, we got 23 participants. Once the experiment
was over, the data was exported and studied thoroughly. We noticed that the subjects
created 7 different categories in total (though not every participant made 7 categories).
The values were entered into the data editor in the same manner as we
have discussed for the values in the Figure 4.2. The naming of the cards has been taken
from various websites such as [75][76].
All the cards, treated as variables, are entered in columns and the subjects are entered
in rows of the data editor of SPSS.
Interpretation of Results
Before conducting factor analysis, we conducted reliability analysis on this data. The
output is shown in the Figure 4.29. The Cronbach’s Alpha value is 0.775, which is
regarded as acceptable. In the Factor Analysis: Descriptives dialog box, every option
is selected except inverse, reproduced and anti-image. In the Extraction dialog box,
selected method is principal components and the matrix to be analyzed is correlation
matrix. The scree plot is also displayed. The factors are extracted on the basis of
eigenvalues over 1. In the Rotation dialog box, varimax method is selected and rotated
solution is displayed. Initially, when we ran factor analysis, seven factors were extracted
on the basis of eigenvalues over 1. This solution was not good, as we could not have
seven categories for this experiment (in the end, factors correspond to categories). We
can also force SPSS to extract a particular number of factors by selecting the fixed number
of factors option in the Extraction window. We got the best solution when SPSS was
forced to extract 4 factors. In the Options dialog box, the sorted by size option is
selected and we suppressed coefficient values below 0.4.
Figure 4.29: Reliability Analysis Output.
The first output in SPSS is the Correlation Matrix. The correlation matrix is large and
does not fit on one page, so we divided it into two parts, shown in the
Figures 4.30 and 4.31 respectively. In this matrix, one can see the correlation coefficients
between each variable and all the other variables. The principal diagonal of this matrix is
always 1, because the correlation of a variable with itself is always 1. The
correlation values below and above the principal diagonal mirror each other. In this
matrix, we have one correlation value of 0.988, between Terms and Conditions and Privacy
Policy. The determinant is listed at the bottom of the matrix. In this example, it is
0.000, which is less than 0.00001, meaning that multicollinearity can also be a problem in
this data. The decision to exclude variables (with values above 0.9) is mainly relevant
for surveys with questionnaires. We excluded the two variables one by
one and checked whether the results improved. There was no improvement, so we included
them back in the analysis [63] [66] [70] [71].
Additionally, as noted below the determinant, this matrix is not positive definite, which
means that there are some negative eigenvalues. It is clear from the Figure 4.33 that the
last six eigenvalues are negative. This is the reason why we did not get the KMO and
Bartlett's test output, which measures sampling adequacy. A matrix is non-positive
definite if the variables are linearly dependent on one another. A matrix will also be
non-positive definite if there are more variables than cases (participants); in this
example, the variables indeed outnumber the subjects. The significance values of
correlations, the inverse of the correlation matrix,
and anti-image correlation matrix will also not be shown in SPSS output if a matrix is
non-positive definite [72]. These options are in Figure 4.10.
The next output in SPSS is Communalities. It can be seen in the Figure 4.32. The initial values are before extraction and the other column is for after extraction. As we selected the principal components method for extraction, it assumes the total variance to be common, which is why all the initial values are 1. The values in the extraction column tell us about the common variance in the entire data. For example, the extraction value of Service Plan is 0.609, which means that 60.9% of the variance in Service Plan is shared with the remaining variables. One can consider dropping items that have an extracted communality of less than 0.4. Again, the decision of dropping those variables depends on the type of data and one's requirements [63] [66] [70] [71].
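How the extraction communalities arise under the principal components method can be sketched as follows (synthetic data; NumPy assumed):

```python
# Sketch (synthetic data) of where extraction communalities come from under
# principal components: loadings are eigenvectors of the correlation matrix
# scaled by the square root of their eigenvalues, and a variable's communality
# is the sum of its squared loadings over the retained components.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 6))
data[:, 1] += data[:, 0]                     # give two variables shared variance

R = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # components in decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                        # retain two components
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
communalities = (loadings ** 2).sum(axis=1)  # share of each variable's variance
print(np.round(communalities, 3))

# Retaining ALL components reproduces the "initial" communality of 1 for
# every variable, matching the Initial column in the SPSS output.
all_loadings = eigvecs * np.sqrt(eigvals)
assert np.allclose((all_loadings ** 2).sum(axis=1), 1.0)
```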
The third output in SPSS is Total Variance Explained, as shown in the Figure 4.33. This table shows the eigenvalues in three different parts: first, the eigenvalues before extraction (Initial Eigenvalues); second, the eigenvalues after extraction (Extraction Sums of Squared Loadings); and last, the eigenvalues after rotation (Rotation Sums of Squared Loadings). The number of factors is equal to the number of variables, i.e. 32 in this case, and it can be seen in the component column. The initial factors always explain the most variance, which can be seen in the percent of variance column. For example, 26.80% of the total variance is explained by the first factor. This amount gradually decreases as we go down the table. It has been discussed earlier that SPSS extracted 7 factors initially, and this can also be seen in the total column: the first seven factors have eigenvalues over 1. It should be noted that if all the eigenvalues are added up, we get the number of variables. If we look at the percent of variance column, one can find that the first four factors account for most of the variance (each explains above 10%, with the fourth close to 10%), and up to the fourth factor 68.37% of the total variance has been explained, as shown in the cumulative % column. This was one of the reasons why we decided to extract four factors. The values in the first two parts, Initial Eigenvalues and Extraction Sums of Squared Loadings, are the same; the second part is shown after extraction, which is why no values are visible beyond the fourth factor, as only four factors were extracted. The values in the last column change because of the rotation: now 20.32% of the total variance is explained by the first factor, compared to 26.80% before rotation [63] [66] [70] [71].
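The bookkeeping in this table, namely eigenvalues summing to the number of variables, the percent of variance column, and the eigenvalues-over-1 (Kaiser) criterion, can be sketched as follows (synthetic data; NumPy assumed):

```python
# Sketch: the eigenvalues of a correlation matrix sum to the number of
# variables, because their sum equals the trace and every diagonal entry is 1.
# This is why "% of variance" is simply eigenvalue / p * 100.
import numpy as np

rng = np.random.default_rng(3)
p = 7
R = np.corrcoef(rng.normal(size=(40, p)), rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # decreasing, like the table

assert np.isclose(eigvals.sum(), p)              # eigenvalues add up to p
pct_variance = 100.0 * eigvals / p               # the "% of variance" column
cumulative = np.cumsum(pct_variance)             # the "cumulative %" column
kaiser = int((eigvals > 1).sum())                # factors retained by SPSS's default
print(np.round(pct_variance, 2), np.round(cumulative[-1], 2), kaiser)
```

The cumulative column necessarily ends at 100%, since all components together reproduce the total variance.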
Figure 4.30: 1st Half of Correlation Matrix Output.
Chapter 4. Evaluation of Card Sorting with Factor Analysis
Figure 4.31: 2nd Half of Correlation Matrix Output.
Figure 4.32: Communalities Output.
The next output in SPSS is the Scree Plot, as shown in the Figure 4.34. It helps in deciding how many factors to retain. It can be seen that there are seven factors with eigenvalues over 1. As discussed before, we retain the factors up to the point of an elbow. In this case, one can identify four possible elbows: the 1st at the second factor, the 2nd at the fourth factor, the 3rd at the sixth factor and the last at the eighth factor. SPSS took the last one into consideration and retained 7 factors (eigenvalues above 1). We took the 2nd one into consideration and retained four factors [63] [66] [70] [71].
The last output we have is the Rotated Component Matrix, as shown in the Figure 4.35. This time we suppressed the values below 0.4 in the component matrix so that it becomes easier for us to interpret the solution. Negative and positive loadings are treated the same; only their absolute size matters. News, Production Plants, Social Web, Contact Us, Register Now, Dealership and Dealer Locator load well onto the first factor. Configure, Technology, Technical Data, Price Calculator, Genuine Accessories, Extended Warranty and All Models load well onto the second factor. Glossary, Terms and Conditions, Privacy Policy, Help, Virtual Factory Tour, History and Book Online load well onto the third factor. Test Drive, Environmental Protection and Finance load well onto the last, i.e. fourth, factor. Corporate Sales, Multimedia Experience, Insurance, Service, Used Cars, View Brochure,
Figure 4.33: Total Variance Explained Output.
Figure 4.34: Scree Plot Output.
Service Plan and Road Assist are problematic as they load on more than one factor [63] [66] [70] [71].
Figure 4.35: Rotated Component Matrix Output - 4 Extracted Factors.
We have also shown the Rotated Component Matrix output with 7 extracted factors, so that one can compare both solutions and judge that this solution is no better than the one with 4 extracted factors. It can be seen in the Figure 4.36 that only two variables load onto each of the 4th, 5th, 6th and 7th factors. For a factor to be reliable, at least three variables should load onto it. This is the reason we discarded this solution [63] [66] [70] [71].
Now we have to label the factors. We label the first factor as About Us or Company because most of the variables that load onto this factor provide information about the automobile company. We label the second factor as About Cars or All You Want to Know because the variables provide essential information about the cars. The third factor can be labelled as Experience because the variables it contains are semantically quite diverse. The last factor can be labelled as Services because the variables describe the services that the company provides [63] [66] [70] [71]. Therefore, the final categorization is as follows:
Figure 4.36: Rotated Component Matrix Output - 7 Extracted Factors.
• About Us/Company: News, Production Plants, Social Web, Contact Us, Register Now, Dealership and Dealer Locator
• About Cars/All You Want to Know: Configure, Technology, Technical Data,
Price Calculator, Genuine Accessories, Extended Warranty and All Models
• Experience: Glossary, Terms and Conditions, Privacy Policy, Help, Virtual Factory Tour, History and Book Online
• Services: Test Drive, Environmental Protection and Finance
• Problematic: Corporate Sales, Multimedia Experience, Insurance, Service, Used
Cars, View Brochure, Service Plan and Road Assist
4.2.4
Health Website
A website has to be designed for health and fitness purposes. We have some items that need to be categorized under the headings decided for the website. To determine the suitable categories, we conducted a card sorting experiment. This experiment was a closed one, which means the participants already had the pre-defined (by experts)
1. Weight Loss
2. Skin Care
3. Hair Care
4. Go Green
5. Nutrition
6. Clean Living
7. Workout Regime
8. Different Workouts
9. Financial Tips
10. Massage
11. Stress Therapies
12. Planning Your Meals
13. Recipes
14. Laughter Therapy
15. Disease Risks
16. Yoga
17. Cardio
18. Running
19. Food Facts
20. Tips From Celebrities
21. Family Planning
22. Mind, Body and Soul
23. Videos
24. Health News
25. Maintaining Relationships
26. Contact Us
27. Subscribe Newsletter
28. Become An Elite Member
Table 4.4: List of cards - Health Website.
categories and they could not add any new categories on their own. The six categories are: Health, Happiness, Food, Diet, Connect and Fitness. The difficulty level of this closed experiment is set to medium, meaning that the categories are not entirely clear and not everybody will sort the cards easily. The list of cards can
be seen in the Table 4.4. As discussed earlier, this experiment has been conducted with
WeCaSo [12]. For this experiment, we got 24 participants. Once the experiment was
over, the data was exported and studied thoroughly. The values were entered into the
data editor in the same manner as we have discussed for the values in the Figure 4.2.
The naming of the cards has been taken from various websites such as [77][78].
All the cards, treated as variables, are entered in columns and the subjects are entered
in rows of the data editor of SPSS.
Interpretation of Results
Before conducting factor analysis, we conducted reliability analysis on this data. The
output is shown in the Figure 4.37. The Cronbach’s Alpha value is 0.120, which is not
acceptable. This value does not matter to us because our data is not for a survey and
the variables are not questions [57]. In the Factor Analysis: Descriptives dialog box,
every option is selected except inverse, reproduced and anti-image. In the Extraction dialog box, the selected method is principal components and the matrix to be analyzed is the correlation matrix. The scree plot is also displayed. The factors are extracted on the basis of eigenvalues over 1. In the Rotation dialog box, the varimax method is selected and the rotated solution is displayed. Initially, when we ran factor analysis, nine factors were extracted on the basis of eigenvalues over 1. This solution was not suitable because the categories were predefined for this experiment (in the end, the factors correspond to the categories). We can also force SPSS to extract a particular number of factors by selecting the fixed number of factors option in the Extraction window; we got the best solution when SPSS was forced to extract 6 factors. In the Options dialog box, the sorted by size option is selected and we suppressed coefficient values below 0.5.
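The Cronbach's Alpha statistic reported in the reliability analysis follows a standard formula based on the item variances and the variance of the summed score; a minimal sketch (synthetic data, with our own helper function `cronbach_alpha`; NumPy assumed) is:

```python
# Sketch of the Cronbach's alpha statistic SPSS reports in reliability
# analysis. Unrelated items give a low alpha (as for the card sorting data,
# alpha = 0.120); items sharing a common component give a high alpha (as for
# a typical survey scale). Synthetic data, not the experiment data.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = subjects, columns = items/variables."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(4)
unrelated = rng.normal(size=(200, 10))                        # independent items
common = rng.normal(size=(200, 1)) + 0.5 * rng.normal(size=(200, 10))

assert cronbach_alpha(unrelated) < 0.5
assert cronbach_alpha(common) > 0.7
```

This illustrates why a low alpha is unproblematic here: the cards are not meant to measure one underlying construct, so no internal consistency is expected.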
Figure 4.37: Reliability Analysis Output.
The first output in SPSS is the Correlation Matrix. Because the matrix is too large to fit on one page, we divided it into two parts, shown in the Figures 4.38 and 4.39 respectively. In this matrix, one can see the correlation coefficients between each variable and every other variable. The principal diagonal of this matrix is always 1 because the correlation of a variable with itself is always 1, and the values below and above the principal diagonal mirror each other because the matrix is symmetric. In this matrix, we have one correlation value of 0.912, between Subscribe Newsletter and Workout Regime. The determinant is listed at the bottom of the matrix. In this example, it is 0.000, i.e. less than 0.00001, which means that multicollinearity may also be a problem in this data. Excluding variables with correlations above 0.9 is mainly helpful for survey data based on questionnaires. We nevertheless excluded the two variables one by one and checked whether the results improved. They did not, and hence we included them back in the analysis [63] [66] [70] [71].
Additionally, as noted below the matrix, this matrix is not positive definite, which means that some of its eigenvalues are negative. From the Figure 4.41, it is clear that the last two eigenvalues are negative. This is the reason that we did not get the KMO and Bartlett's test output, which measures sampling adequacy. A matrix is non-positive definite if the variables are linearly dependent on each other; it is also non-positive definite whenever there are more variables than cases (participants), which is the situation in this example. The significance values of the correlations, the inverse of the correlation matrix, and the anti-image correlation matrix are likewise not shown in the SPSS output if a matrix is non-positive definite [72]. These options are in Figure 4.10.
The next output in SPSS is Communalities, see Figure 4.40. The initial values are before extraction and the other column is for after extraction. As we selected the principal components method for extraction, it assumes the total variance to be common, which is why all the initial values are 1. The values in the extraction column tell us about the common variance in the entire data. For example, the extraction value of Go Green is 0.712, which means that 71.2% of the variance in Go Green is shared with the remaining variables. One can consider dropping items that have an extracted communality of less than 0.4. Again, the decision of dropping those variables depends on the type of data and one's requirements [63] [66] [70] [71].
The third output in SPSS is Total Variance Explained, as shown in the Figure 4.41. This table shows the eigenvalues in three different parts: first, the eigenvalues before extraction (Initial Eigenvalues); second, the eigenvalues after extraction (Extraction Sums of Squared Loadings); and third, the eigenvalues after rotation (Rotation Sums of Squared Loadings). The number of factors is equal to the number of variables, i.e. 28 in this case, and it can be seen in the component column. The initial factors always explain the most variance, which can be seen in the percent of variance column. For example, 15.11% of the total variance is explained by the first factor. This amount gradually decreases as we go down the table. It has been discussed earlier that SPSS extracted 9 factors initially, and this can also be seen in the total column: the first nine factors have eigenvalues over 1. It should be noted that if all the eigenvalues are added up, we get the number of variables. If we look at the percent of variance column, one can find that the first six factors account for most of the variance (the first four explain above 10% each, the 5th close to 10% and the 6th about 7.5%), and up to the sixth factor 67.51% of the total variance has been explained. This can be seen in the cumulative % column. The six categories were also predefined for this experiment. These were the reasons why we decided to extract six factors. The values in the first two parts, Initial Eigenvalues and Extraction Sums of Squared Loadings, are the same; the second part is shown after extraction, which is why no values are visible beyond the sixth factor, as only six factors were extracted. The values in the last column change because of the rotation: now 12.36% of the total variance is explained by the first factor, compared to 15.11% before rotation [63] [66] [70] [71].
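The varimax rotation selected in the Rotation dialog box redistributes variance across factors while leaving the communalities untouched. A compact NumPy sketch of Kaiser's classical varimax algorithm follows; this is an illustrative implementation on random loadings, and SPSS's version may differ in details such as Kaiser normalization:

```python
# Sketch of varimax rotation: an orthogonal rotation of the loading matrix
# that redistributes variance across the factors while the communalities
# (row sums of squared loadings) stay exactly the same.
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Kaiser's varimax; rows = variables, columns = factors."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD step maximizing the varimax criterion (Kaiser, 1958).
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        rotation = u @ vt
        if s.sum() < var * (1 + tol):    # stop when the criterion stalls
            break
        var = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(5)
L = rng.normal(size=(8, 3))              # unrotated loadings: 8 variables, 3 factors
L_rot = varimax(L)

# The rotation is orthogonal, so communalities are unchanged...
assert np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1))
# ...but the variance explained per factor is redistributed, which is why the
# Rotation Sums of Squared Loadings column differs from the extraction column.
print(np.round((L ** 2).sum(axis=0), 2), np.round((L_rot ** 2).sum(axis=0), 2))
```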
Figure 4.38: 1st Half of Correlation Matrix Output.
Figure 4.39: 2nd Half of Correlation Matrix Output.
Figure 4.40: Communalities Output.
The next output in SPSS is the Scree Plot, as shown in the Figure 4.42. It helps in deciding how many factors to retain. It can be seen that there are nine factors with eigenvalues over 1. As discussed before, we retain the factors up to the point of an elbow. In this case, one can see three elbows: the 1st at the third factor, the 2nd at the sixth factor and the 3rd at the eighth factor. SPSS took the last one into consideration and retained 9 factors, because the ninth factor still has an eigenvalue over 1. We took the 2nd one into consideration and retained six factors. The curve flattens into a straight line quite late, where the factors have eigenvalues below 1 [63] [66] [70] [71].
The last output we have is the Rotated Component Matrix, as shown in the Figure 4.43. This time we suppressed the values below 0.5 in the component matrix so that it becomes easier for us to interpret the solution. Negative and positive loadings are treated the same; only their absolute size matters. Running, Become An Elite Member and Hair Care load well onto the first factor. Subscribe Newsletter, Contact Us and Workout Regime load well onto the second factor. Videos, Maintaining Relationships, Cardio and Mind, Body and Soul load well onto the third factor. Family Planning, Skin Care, Clean Living and Weight Loss load well onto the fourth factor. Recipes, Disease Risks and Health News load well onto the fifth factor. Laughter Therapy, Planning Your Meals and Go Green load well onto the last, i.e. sixth, factor. Tips From Celebrities, Financial Tips, Yoga, Nutrition, Massage,
Figure 4.41: Total Variance Explained Output.
Figure 4.42: Scree Plot Output.
Different Workouts, Stress Therapies and Food Facts are problematic because they either load on more than one factor or do not load at all [63] [66] [70] [71].
Figure 4.43: Rotated Component Matrix Output - 6 Extracted Factors.
Now we have to label the factors. We label the first factor as Fitness, the second as Connect, the third as Happiness, the fourth as Health, the fifth as Food and the last one as Diet [63] [66] [70] [71]. Therefore, the final categorization is as follows:
• Health: Family Planning, Skin Care, Clean Living and Weight Loss
• Happiness: Videos, Maintaining Relationships, Cardio and Mind, Body and Soul
• Food: Recipes, Disease Risks and Health News
• Diet: Laughter Therapy, Planning Your Meals and Go Green
• Connect: Subscribe Newsletter, Contact Us and Workout Regime
• Fitness: Running, Become An Elite Member and Hair Care
• Problematic: Tips From Celebrities, Financial Tips, Yoga, Nutrition, Massage,
Different Workouts, Stress Therapies and Food Facts
One can see that some of the variables do not fit that well into the categories. If the majority of the variables in a factor (e.g. 2 out of 3) are semantically similar, an appropriate label has been assigned to it.
4.3
Summary
This chapter explains the step-by-step procedure for preparing the data before one can conduct factor analysis. After the data preparation, it describes how to perform factor analysis. We conducted four card sorting experiments, of which three were open and one was closed. All the experiments have been evaluated with factor analysis. Some of the examples shown do not yield the best possible results: in a few cases, there were cards that were not semantically well related to their clusters. It was also possible to identify problematic cards even in easy experiments. Despite the small number of subjects, factor analysis provided good and satisfying results. In the next chapter, the results from factor analysis will be compared with the results of cluster analysis. Towards the end, it explains how to deal with problematic cards and gives suggestions for a researcher conducting card sorting.
Chapter 5
Comparison of the results
“Not everything that can be counted counts, and not everything that counts can
be counted.”
ALBERT EINSTEIN
We talked about cluster analysis, in Chapter 2, as one of the methods for analyzing card sorting experiments. In Chapter 3, we discussed the second such technique, namely factor analysis. In Chapter 4, four experiments (3 open and 1 closed) were evaluated with the help of exploratory factor analysis (Section 3.2.1). This chapter analyzes and evaluates the same experiments with the help of agglomerative hierarchical clustering (Section 2.4.2). After that, the results from both techniques are compared. Moreover, the scenarios involving problematic cards and the ways to tackle them are also discussed.
5.1
Comparison of Results
This section compares the results of agglomerative hierarchical clustering with the results of exploratory factor analysis. The experiments themselves have already been described in the section 4.2. The comparison of the four experiments is as follows:
5.1.1
Eatables
The same experiment has been considered here as in the section 4.2.1. This time it is evaluated with the Orange tool rather than SPSS. It has also been discussed in the section 2.4.2 that merging of the clusters depends on the distance between them and that this distance can be calculated with three methods: single linkage, group-average linkage and complete linkage. We will analyze this experiment with all three methods and then proceed with the best one for the other experiments. Figure 5.1 shows the single linkage clustering for 30 items. We can see that the third cluster (as seen from top to bottom) is an elongated one and that Tomato is problematic as it merges late into the third cluster. Lemon is also problematic as it merges really late (not highlighted because of the cut-off line) into the second cluster. In some cases, this technique leads to the formation of diverse clusters and, as a result, it becomes tedious to analyze such clusters [30]. One should also check whether all the items in a cluster are semantically alike. Hence, the resulting clusters are as follows:
• Cluster 1: Lemon
• Cluster 2: Spinach, Cauliflower, Cabbage, Egg Plant, Onion, Brocolli, Carrot,
Radish and Capsicum
• Cluster 3: Plum, Mango, Cranberry, Apple, Banana, Pineapple, Grapes, Strawberry, Cherry, Orange, Himbeere, Pomegranate, Avocado and Tomato
• Cluster 4: Cardamom, Salt, Black Pepper, Cloves, Red Chilli and Mint
Figure 5.1: Single Linkage Hierarchical Clustering for 30 cards.
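The three linkage criteria can be compared on a toy data set with SciPy (a stand-in for the Orange tool; the points and the outlier below are synthetic, not the card sorting data):

```python
# Sketch of the three linkage criteria for agglomerative hierarchical
# clustering, on synthetic 2D points: two tight groups plus one outlier that
# merges very late, mirroring cards such as Lemon in the dendrogram above.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
group_a = rng.normal(loc=0.0, scale=0.3, size=(5, 2))
group_b = rng.normal(loc=5.0, scale=0.3, size=(5, 2))
outlier = np.array([[2.5, 10.0]])          # far from both groups
points = np.vstack([group_a, group_b, outlier])

for method in ("single", "average", "complete"):
    merges = linkage(points, method=method)              # dendrogram structure
    print(method, fcluster(merges, t=3, criterion="maxclust"))

# Cutting the complete-linkage dendrogram into 3 clusters leaves the outlier
# in a cluster of its own -- a "problematic card".
merges = linkage(points, method="complete")
labels = fcluster(merges, t=3, criterion="maxclust")
assert len(set(labels)) == 3
assert list(labels).count(labels[-1]) == 1               # the outlier is alone
```

Changing `t` in `fcluster` plays the role of moving the cut-off line in the dendrogram.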
In the Figures 5.2 and 5.3, one can see the group-average linkage and complete linkage clustering for 30 items respectively. It has already been discussed in the section 2.4.3 that one can move the cut-off line in order to see different resulting clusters depending upon one's requirements. Although we get almost the same results with group-average linkage and complete linkage, we selected complete linkage as the preferred method because it does not include outliers [20]. It can be seen in the Figure 5.3 that the participants did not have any difficulty in sorting the cards Plum, Mango, Cranberry, Apple, Banana, Pineapple, Grapes, Strawberry and Cherry, because they all start from one cluster and their categories were clear to them. So the final resulting clusters are as follows:
• Cluster 1: Lemon
• Cluster 2: Spinach, Cauliflower, Cabbage, Egg Plant, Onion, Brocolli, Carrot,
Radish and Capsicum
• Cluster 3: Cardamom, Salt, Black Pepper, Cloves, Red Chilli and Mint
• Cluster 4: Plum, Mango, Cranberry, Apple, Banana, Pineapple, Grapes, Strawberry, Cherry, Orange, Himbeere and Pomegranate
• Cluster 5: Avocado
• Cluster 6: Tomato
We have already discussed problematic cards in the previous chapter; however, we will briefly recapitulate here. Whenever a subject is doubtful about the category of a card, three things can happen. First, he or she leaves that card and does not put it under any category. Second, a subject makes a new category and puts that card under the new category if he thinks that the already defined categories are not suitable for it (hybrid card sorting). Third, he or she makes a new category specially for that card. In such cases, cards can be problematic.
The items in the clusters 2, 3 and 4 (as seen from top to bottom) are semantically
related to each other and the clusters can be given the names as Vegetables, Spices and
Fruits respectively. Now, if we compare this result with the result of factor analysis in
the Figure 4.21, we have the conclusion as follows:
1. In case of factor analysis, there are five problematic cards, namely Lemon, Carrot, Radish, Mint and Egg Plant. In case of cluster analysis, there are three problematic cards, namely Lemon, Avocado and Tomato.
2. It was possible to get more potential categories with factor analysis than cluster
analysis.
3. Some of the cards like Carrot, Egg Plant, etc. that are problematic in factor
analysis are a part of the potential clusters in cluster analysis.
4. It was easier to interpret meaningful results with factor analysis rather than cluster
analysis.
Figure 5.2: Group Average Linkage Hierarchical Clustering for 30 cards.
5.1.2
Entertainment
The same experiment has been considered here as in the section 4.2.2. This time it is
evaluated with the Orange tool rather than SPSS. We will analyze this experiment with
complete linkage agglomerative hierarchical clustering. Figure 5.4 shows the complete
linkage clustering for 24 items. We can see that the first, third and fifth clusters (as
seen from top to bottom) are good resulting clusters as the items in these clusters are
semantically related to each other. Latin is problematic as it merges late into the first
cluster. Hip Hop is also problematic as it merges late into the third cluster. Hence, the
resulting clusters are as follows:
• Cluster 1: Robot, Tap, Modern Jive, Flamenco, Salsa, Tango, Rumba, Kathak,
Bhangra and Odissi
Figure 5.3: Complete Linkage Hierarchical Clustering for 30 cards.
• Cluster 2: Latin
• Cluster 3: Jazz, Classic, Rock, Reggae, Blue, Lounge, Country and Pop
• Cluster 4: Hip Hop
• Cluster 5: Action, Thriller, Romance and Comedy
The items in the clusters 1, 3 and 5 (as seen from top to bottom) are semantically
related to each other and the clusters can be given the names as Dance, Music and
Movies respectively. Now, if we compare this result with the result of factor analysis in
the Figure 4.28, we have the conclusion as follows:
1. In case of factor analysis, there are two problematic cards, namely Modern Jive
and Odissi. In case of cluster analysis, there are also two cards, namely Latin
and Hip Hop.
2. It was possible to get more diverse categories with factor analysis than cluster
analysis.
3. We got different problematic cards from the two techniques; however, the genre of the cards is similar, as all of them belong to the Dance category.
Figure 5.4: Complete Linkage Hierarchical Clustering for 24 cards.
4. It was easier to interpret meaningful results with factor analysis rather than cluster
analysis.
5.1.3
Automobile
The same experiment has been considered here as in the section 4.2.3. This time it is
evaluated with the Orange tool rather than SPSS. As discussed before, this experiment is
analyzed with complete linkage agglomerative hierarchical clustering. Figure 5.5 shows
the complete linkage clustering for 32 items. We can see that the first, third, fourth
and sixth clusters (as seen from top to bottom) are good resulting clusters as the items
in these clusters are semantically related to each other. Dealership and Dealer Locator are somewhat semantically related to the third cluster; however, they are problematic as they merge late into it. Book Online is also problematic as it merges late into the fourth cluster. Hence, the resulting clusters are as follows:
• Cluster 1: Virtual Factory Tour, Privacy Policy, Terms and Conditions, Glossary,
Production Plants, History and Corporate Sales
• Cluster 2: Dealership and Dealer Locator
• Cluster 3: Contact Us, Register Now, Social Web, News and Help
• Cluster 4: Road Assist, Service Plan, Service, Insurance, Extended Warranty
and Finance
• Cluster 5: Book Online
• Cluster 6: All Models, Used Cars, Multimedia Experience, Configure, Technical
Data, Technology, Genuine Accessories, Price Calculator, View Brochure, Environmental Protection and Test Drive
Figure 5.5: Complete Linkage Hierarchical Clustering for 32 cards.
The items in the clusters 1, 3, 4 and 6 (as seen from top to bottom) are semantically related to each other. The first cluster can be named About Us, the third cluster Member Area, the fourth cluster Elite Services and the last, i.e. sixth, cluster About Cars. Now, if we compare this result with the result of factor analysis in the Figure 4.35, we have the conclusion as follows:
1. In case of factor analysis, there are eight problematic cards, namely Corporate Sales, Multimedia Experience, Insurance, Service, Used Cars, View Brochure, Service Plan and Road Assist. In case of cluster analysis, there are only three problematic cards, namely Dealership, Dealer Locator and Book Online.
2. As the difficulty level of this experiment was hard, we noticed that the categories could be labelled more diversely with factor analysis than with cluster analysis.
3. We actually got more problematic cards in factor analysis than in cluster analysis. The number of problematic cards confirmed that the experiment was difficult and that it was hard for the subjects to sort the 32 cards.
4. It was easier to interpret meaningful results with factor analysis rather than cluster
analysis.
5.1.4
Health
The same experiment has been considered here as in the section 4.2.4. This time it is evaluated with the Orange tool rather than SPSS. As discussed before, this experiment is analyzed with complete linkage agglomerative hierarchical clustering. Figure 5.6 shows the complete linkage clustering for 28 items. We can see that the first, third, fourth, sixth, eighth and tenth clusters (as seen from top to bottom) are good resulting clusters as the items in these clusters are semantically related to each other. Tips From Celebrities, Massage, Planning Your Meals and Go Green are problematic as they merge late into the first, fourth and eighth clusters.
This is a closed experiment and the categories are already fixed. The six categories are: Health, Happiness, Food, Diet, Connect and Fitness. Hence, the resulting clusters and their labellings are as follows:
• Cluster 1/Connect: Subscribe Newsletter, Contact Us, Become An Elite Member, Videos and Health News
• Cluster 2: Tips From Celebrities
• Cluster 3/Health: Hair Care, Skin Care, Disease Risks, Clean Living and Weight
Loss
• Cluster 4: Laughter Therapy and Stress Therapies
• Cluster 5: Massage
• Cluster 6/Happiness: Maintaining Relationships, Financial Tips, Mind, Body and Soul and Family Planning
• Cluster 7: Planning Your Meals
• Cluster 8/Food: Recipes, Food Facts and Nutrition
• Cluster 9: Go Green
• Cluster 10/Fitness: Different Workouts, Workout Regime, Running, Cardio and
Yoga
The clusters 2, 5, 7 and 9 are problematic. Now we are left with one category, namely Diet, and we could not assign this name to cluster 4 because Laughter Therapy and Stress Therapies are not semantically related to the category Diet. So, cluster 4 also becomes problematic.
Figure 5.6: Complete Linkage Hierarchical Clustering for 28 cards.
Now, if we compare this result with the result of factor analysis in the Figure 4.43, we
have the conclusion as follows:
1. In case of factor analysis, there are eight problematic cards, namely Tips From Celebrities, Financial Tips, Yoga, Nutrition, Massage, Different Workouts, Stress Therapies and Food Facts. In case of cluster analysis, there are six problematic cards, namely Tips From Celebrities, Laughter Therapy, Stress Therapies, Massage, Planning Your Meals and Go Green.
2. As the difficulty level of this experiment was medium, we noticed that the categories could be labelled more diversely with factor analysis than with cluster analysis.
3. We actually got more problematic cards in factor analysis than in cluster analysis. The number of problematic cards confirmed that the experiment was not that easy and that it was a little difficult for the subjects to sort the 28 cards.
4. It was easier to interpret meaningful results with factor analysis rather than cluster
analysis.
5. We can see that with cluster analysis, the category Diet is left over and it cannot
be assigned to any of the resulting clusters. On the other hand, we were able to
label a cluster with Diet in factor analysis.
It should be noted that we obtained different problematic cards with factor analysis than with cluster analysis. Exploratory factor analysis is an iterative technique in the sense that one repeats it (by removing some variables, by changing the number of factors to be extracted, etc.) to find the best solution according to the requirements. Therefore, every time factor analysis is conducted, it can give different results and hence different problematic cards. Because we tried to find an optimum solution, we obtained different problematic cards with factor analysis [73]. The number of clusters and of problematic cards can also be increased or decreased by moving the cut-off line in cluster analysis. If this line is moved further towards the left, then some of the problematic cards merge into the clusters, and the resulting clusters may no longer be considered good [20]. Therefore, in both cases, the results can vary according to the requirements.
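The effect of moving the cut-off line can be sketched with a small SciPy example. The distance matrix and card names below are hypothetical toy data, not values from the experiments; the point is only that raising the cut-off merges previously separate (problematic) items into larger clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

cards = ["Recipes", "Food Facts", "Nutrition", "Yoga", "Cardio", "Go Green"]
# Hypothetical dissimilarities (0 = always sorted together, 1 = never):
D = np.array([
    [0.0, 0.2, 0.2, 0.7, 0.7, 0.9],
    [0.2, 0.0, 0.2, 0.7, 0.7, 0.9],
    [0.2, 0.2, 0.0, 0.7, 0.7, 0.9],
    [0.7, 0.7, 0.7, 0.0, 0.2, 0.9],
    [0.7, 0.7, 0.7, 0.2, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.9, 0.0],
])
# Agglomerative hierarchical clustering with complete linkage.
Z = linkage(squareform(D), method="complete")
# Cutting the dendrogram at different heights changes the cluster count:
for cutoff in (0.5, 0.8, 1.0):
    labels = fcluster(Z, t=cutoff, criterion="distance")
    print(cutoff, len(set(labels)))
```

At a cut-off of 0.5 the isolated card Go Green stays in a cluster of its own; at 0.8 the two content clusters merge; at 1.0 everything collapses into one cluster, mirroring how the choice of cut-off line decides which cards count as problematic.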
In the next two sections, we will discuss how one can deal with the scenarios of problematic cards and suggest some recommendations for a researcher carrying out card sorting
experiments.
5.2 How to Tackle Problematic Cards
As already discussed, whenever a subject is doubtful about the category of a card, he or she acts in one of these possible ways. First, he or she leaves the card out and does not put it under any category. Second, the subject creates a new category for that card (and that card only) if he or she thinks the predefined categories are not suitable for it (hybrid card sorting 1.3). Third, he or she makes a new category specifically for that card (open card sorting 1.3). It is then the task of the researcher to deal with such problematic cards in an appropriate way. One cannot simply drop these cards and move ahead in the design process (of a website), because they were included in the analysis for a reason. Therefore, a researcher can tackle problematic cards in the following ways:
1. An analyst can create a new category for the problematic cards (keeping the semantic meaning of the cards in mind), or the cards can be assigned to one of the categories defined by the participants. This is possible only in open and hybrid card sorting; in closed card sorting, a new category cannot be created because the potential categories are predefined by the researcher. For example, if we consider the result of the first (open) experiment with factor analysis 4.2.1, there are five problematic cards, namely Lemon, Carrot, Radish, Mint and Egg Plant. If the cards are sorted keeping the perception of the participants in mind, then Lemon, Carrot, Radish and Egg Plant can be assigned to the Vegetables category and Mint to the Spices category. Lemon, Carrot and Radish could also be placed under the Salads category, depending on the requirements of the website.
2. A problematic card can be placed under two categories if it is a requirement of
a website. Again, the card has to be semantically relevant to those two clusters
[79]. For example, in the first (open) experiment 4.2.1, Lemon, Carrot and Radish
can be assigned to the Vegetables as well as to the Salads category. Now, if we
consider the fourth (closed) experiment 4.2.4, the problematic cards like Nutrition
and Food Facts can be placed under the Food category and also under the Diet
category. Similarly, the problematic cards like Yoga and Different Workouts can
be placed under the Health category and also under the Fitness category.
3. An analyst can conduct the experiment again, subject to the deadlines of the project (if time and other resources allow). The second time, the experiment can be conducted either with the same set of cards or with just the problematic cards. The analyst may also decide to change the participants. Likewise, an open card sorting experiment can be changed into a closed or a hybrid one, depending on the requirements and the satisfaction of the analyst.
4. If the resources (time, money, etc.) of a project are scarce, one can follow this approach instead of the third one. When the experiment is conducted a second time, the cards can be sorted by data experts or analysts. As they have experience and a general idea of how a website is structured, the overall usability of the final product can improve considerably.
In all the approaches mentioned above, the labels of the categories can be decided by experts, keeping in mind the perception of participants and end users.
5.3 General Recommendations for Researchers
For this thesis, four experiments were conducted, of which three were open and one was closed. While conducting and evaluating these experiments, we observed some points regarding the sorting behaviour of the participants. This section presents some recommendations based on those observations, which an analyst can follow when evaluating card sorting experiments. They are as follows:
1. Finalize the technique(s) with which the card sorting experiments will be analyzed, then decide the type (open, closed or hybrid) of the experiments.
2. Decide the number and the type of target participants. For example, if a company's intranet website has to be improved, it makes sense to include the employees of that company as participants, because they are the ones who will use the final product.
3. Analysts should name the cards with the target subjects who will perform the experiments in mind. The card names should be clear and easy to understand; if they are not, the subjects may sort the cards randomly and the results will be poor.
4. If analysts have time, they should sit with the participants and observe how they sort the cards. Afterwards, the subjects can be asked about their impression of the experiment and whether it was easy for them to sort the cards.
5. By observing the subjects, one can see whether they completed the experiments quickly or took time thinking and rearranging the cards. This observation helps an analyst considerably in deciding the final categories of a website.
6. For factor analysis, the csv output files were studied thoroughly to decide the potential categories. A general trend was noticed among the participants: after sorting one card (say A), most participants sorted the next card in the queue (say B) into the same category as the previous card, even when card B was not semantically very relevant to that category.
7. To avoid this type of behaviour from subjects, one should not include too many cards in an experiment. The recommended number is between 30 and 40. If there are more than 40 cards, we recommend conducting another round of card sorting for the remaining cards.
8. We also recommend that the labels of the cards be easy to understand. If a name is complex, some tools (such as Orange) provide the facility to include a line of explanation for that name. This way, if a name is difficult for a subject to understand, he or she can refer to the explanation.
9. Before finalizing a website's structure, it should be tested with the target users. Their feedback helps improve the overall usability of the website.
5.4 Summary
This section compares the results of card sorting experiments evaluated with two techniques, namely hierarchical cluster analysis (HCA) and factor analysis. In HCA, one analyzes the results with dendrograms, whereas in factor analysis the results are analyzed with the help of the component matrix. Problematic cards are easier to interpret in factor analysis than in HCA. If there are hundreds of cards in the analysis, it becomes tedious to read the huge dendrograms produced by HCA; this is not the case with factor analysis, where one only has to see on which factor a card loads well (from the component matrix). In HCA, once an item is assigned to a cluster, it is not considered again for clustering. In factor analysis, by contrast, an item may load on another factor after rotation. The loading of an item also varies if the number of extracted factors is changed. Given the type of data in our experiments, the appropriate type of HCA is agglomerative and the appropriate type of factor analysis is exploratory. Towards the end, this section gives some suggestions on how one can deal with problematic cards, along with some recommendations for researchers carrying out card sorting experiments.
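Reading the component matrix can be sketched as follows. The loading values, card names and the 0.5 cut-off below are hypothetical choices for illustration; they are not the loadings obtained in the experiments:

```python
import numpy as np

cards = ["Recipes", "Nutrition", "Yoga", "Cardio", "Tips From Celebrities"]
# Hypothetical rotated component matrix (rows = cards, columns = 2 factors):
loadings = np.array([
    [0.82, 0.10],
    [0.76, 0.25],
    [0.15, 0.79],
    [0.08, 0.85],
    [0.38, 0.41],  # no strong loading on either factor
])
THRESHOLD = 0.5  # assumed cut-off for a "good" loading

assignment = {}
for card, row in zip(cards, loadings):
    factor = int(np.argmax(np.abs(row)))
    # A card whose best loading stays below the threshold is flagged
    # as problematic (it belongs clearly to no factor).
    assignment[card] = factor if abs(row[factor]) >= THRESHOLD else None
print(assignment)
```

Each card is assigned to the factor it loads on most strongly, while a card like the last one, with only middling loadings on every factor, is exactly the kind of problematic card discussed above.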
Chapter 6
Conclusion
“I have come to the conclusion that the most important element in human life is
faith”
ROSE KENNEDY
The main purpose of this thesis, “Evaluation of Card Sorting Experiments Using Factor Analysis”, is to recommend factor analysis as an evaluation technique for card sorting experiments. While conducting card sorting experiments, it is quite important for a researcher to observe the sorting behaviour of the participants. This observation and their feedback play a vital role in improving the overall design of a website. It makes sense to include the target users as participants in card sorting experiments, as they are the users of the final product. In this way, the final product is designed according to the needs of the target users, and they will feel motivated to use it. Online card sorting tools provide a good, reliable and easy platform for researchers to carry out such experiments.
The results of the card sorting experiments (open and closed) were evaluated with the help of two techniques, namely agglomerative hierarchical clustering and exploratory factor analysis. While evaluating the experiments, the main focus was on problematic cards, and in particular on whether factor analysis helps in such problematic situations. The open card sorting experiments were more problematic to evaluate than the closed one. Factor analysis helped us find better and more diverse categories than cluster analysis, and it gave us more information about the relationships between cards. Factor analysis is also the technique to choose in case there are hundreds of cards in an experiment.
Regarding the role or meaning of a variable, we had more than one variable that was always answered similarly, i.e. two cards that were placed into the categories in almost the same way. With the help of factor analysis, it can be seen that two such variables behave practically the same, and as a result we can combine them. For example, in the first experiment 4.2.1, Mango and Orange were always placed together (in the same category); after combining them we have one variable less. This illustrates that factor analysis helps in the reduction of variables.
According to the results, there are more problematic cards in factor analysis than cluster
analysis. Therefore, one can say that factor analysis reflects a user’s sorting behaviour
in a better way than cluster analysis.
In conclusion, factor analysis is a better technique than cluster analysis if one wants to analyze the relationships among cards deeply and effectively. It also produces several different outputs with whose help one can evaluate card sorting experiments in a comprehensible way. It is an exploratory technique [66], so one should explore different scenarios while evaluating the card sorting results and choose the best solution according to the requirements.
6.1 Future Work
This section recommends some directions for future work that can help users improve their card sorting experiences. They are as follows:
• IBM SPSS is one of the most powerful analytical tools available today. As discussed in Chapter 4, Section 4.2.1, some of the outputs were not shown because there were fewer cases (participants) than variables; the tool could be enhanced so that all outputs can be produced even when cases are few.
• More online card sorting tools should be made available free of cost, so that people (users and researchers) can try out card sorting experiments and analyze the results.
• This thesis can be taken as a starting point for implementing a dedicated tool, so that conducting factor analysis becomes easier than with a general-purpose analytic tool.
Appendix A
Analysis Tools
A.1 Orange
Orange is an open-source data mining tool that provides a good platform for analyzing data, for both novices and experts. Orange is built around the concept of widgets (also known as building blocks), which provide a convenient way to analyze data according to one's requirements. A user can build his or her own workflow (a group of widgets) and perform the data mining. Orange provides numerous widgets for data input, to name a few: File, Data Table, Select Attributes, Select Data, etc. In the same way, it provides a number of widgets for visualizing, classifying, evaluating and associating data, as well as for regression. It also has unsupervised options such as Distance File, Distance Map, MDS, k-Means Clustering, Principal Component Analysis, etc. Overall, it gives the user a good visual environment and an easy user interface to work with [80].
A.2 IBM SPSS
IBM SPSS (Statistical Package for the Social Sciences) Statistics is a powerful tool for analyzing data. It takes data input (in many formats) and produces results in the form of charts, tabular reports, distribution plots, etc. The tool was originally made by SPSS Inc. in 1968; IBM acquired SPSS in 2009. It is used in various fields such as the health sciences and social sciences. Data management and data documentation are also among the elite features of this software. SPSS also offers the option to add new features with the help of its 4GL (fourth-generation programming language) command syntax language, which is helpful for programming complex applications. Its graphical user interface has two views, namely Data View and Variable View. The output file from SPSS has the .spv extension and can be exported to many formats such as PDF, Excel and text. Overall, it provides a very good platform and easy usability for novices as well as experts [81].
Bibliography
[1] Donna Spencer and Todd Warfel. Card sorting: A definitive guide. 2004. URL http://boxesandarrows.com/card-sorting-a-definitive-guide/.
[2] Stefano Bussolon. Card sorting, category validity, and contextual navigation. Journal of Information Architecture, pages 5–30, 2009.
[3] Peter McGeorge and Gordon Rugg. The sorting techniques: a tutorial paper on card sorts, picture sorts and item sorts. Expert Systems, pages 80–93, 1997.
[4] Kayla Knight. Usability testing with card sorting. Useful Information for Web Developers and Designers, April 2011. URL http://sixrevisions.com/usabilityaccessibility/card-sorting/.
[5] James Robertson. Information design using card sorting. Step Two DESIGNS, 19th February 2001. URL http://www.steptwo.com.au/papers/cardsorting/index.html.
[6] Card sorting - introduction. Syntagm - Design for Usability. Accessed on 22.06.2014. URL http://www.syntagm.co.uk/design/cardsortintro.shtml.
[7] Adam Dimmick. Your guide to card sorting and how to use it. Bunnyfoot, 24th August 2011. URL http://www.bunnyfoot.com/blog/2011/08/your-guide-to-card-sorting-and-how-to-use-it/.
[8] Card sorting. usabiliTEST. Accessed on 22.06.2014. URL http://www.usabilitest.com/CardSorting.
[9] Accessed on 24.06.2014. URL http://www.optimalworkshop.com/optimalsort.htm.
[10] Websort. Accessed on 24.06.2014. URL http://uxpunk.com/websort/.
[11] Cardzort. Accessed on 24.06.2014. URL http://www.cardzort.com/cardzort/index.htm.
[12] Wecaso. Accessed on 24.06.2014. URL http://wecaso.de/.
[13] Online sorting. Syntagm - Design for Usability. Accessed on 24.06.2014. URL http://www.syntagm.co.uk/design/cardsortonline.shtml.
[14] Sam Ng. Card sorting: Mistakes made and lessons learned. UXmatters, 10th September 2007. Accessed on 24.06.2014. URL http://www.uxmatters.com/mt/archives/2007/09/card-sorting-mistakes-made-and-lessons-learned.php.
[15] Andreas Buja, Deborah F. Swayne, Michael L. Littman, Nathaniel Dean, Heike Hofmann, and Lisha Chen. Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics, 17(2):444–472, 2008.
[16] Brian S. Everitt, Sabine Landau, Morven Leese, and Daniel Stahl. Hierarchical clustering. Cluster Analysis, 5th Edition, pages 71–110, 2001.
[17] Cluster analysis. Accessed on 30.06.2014. URL http://en.wikipedia.org/wiki/Cluster_analysis.
[18] Accessed on 27.06.2014. URL http://www.intechopen.com/source/html/38548/media/image4.jpeg.
[19] Orange. Accessed on 30.06.2014. URL http://orange.biolab.si/.
[20] Leonard Kaufman and Peter J. Rousseeuw. Finding groups in data: an introduction to cluster analysis. 344, 2009.
[21] Accessed on 13.07.2014. URL http://mathworld.wolfram.com/Distance.html.
[22] Accessed on 15.07.2014. URL http://www.analytictech.com/borgatti/proximit.htm.
[23] Accessed on 22.07.2014. URL http://www.statistics.com/glossary&term_id=512.
[24] Accessed on 22.07.2014. URL http://www.statistics.com/index.php?page=glossary&term_id=355.
[25] Mirek Riedewald. Data mining techniques: Cluster analysis. April 10, 2013. Accessed on 03.08.2014. URL http://www.ccs.neu.edu/home/mirek/classes/2012-S-CS6220/Slides/Lecture4-Clustering.pdf.
[26] Anil K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.
[27] Accessed on 03.08.2014. URL http://www.statistics.com/glossary&term_id=773.
[28] Hierarchical clustering. Accessed on 06.08.2014. URL http://en.wikipedia.org/wiki/Hierarchical_clustering.
[29] Anil K. Jain and Richard C. Dubes. Algorithms for clustering data. Prentice Hall, Inc., 1988.
[30] Single linkage clustering. Accessed on 04.08.2014. URL http://en.wikipedia.org/wiki/Single_linkage.
[31] Complete linkage clustering. Accessed on 04.08.2014. URL http://en.wikipedia.org/wiki/Complete-linkage_clustering.
[32] UPGMA. Accessed on 04.08.2014. URL http://en.wikipedia.org/wiki/UPGMA.
[33] Pierre Legendre and Louis Legendre. Numerical ecology. Volume 20. Elsevier, 2012.
[34] N. Rajalingam and K. Ranjini. Hierarchical clustering algorithm - a comparative study. International Journal of Computer Applications, 19(3), 2011.
[35] Olga Veksler. Pattern recognition. Accessed on 06.08.2014. URL http://www.csd.uwo.ca/~olga/Courses/CS434a_541a/Lecture16.pdf.
[36] Dendrogram. Accessed on 09.08.2014. URL http://en.wikipedia.org/wiki/Dendrogram.
[37] James M. Keller, Michael R. Gray, and James A. Givens. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man and Cybernetics, (4):580–585, 1985.
[38] Ledyard R. Tucker and Robert C. MacCallum. Exploratory factor analysis. 1997. Accessed on 19.08.2014. URL http://www.unc.edu/~rcm/book/factor.pdf.
[39] Factor analysis. Accessed on 15.08.2014. URL http://en.wikipedia.org/wiki/Factor_analysis.
[40] Elizabeth Garrett-Mayer. Statistics in psychosocial research, lecture 8: Factor analysis I. Johns Hopkins University, 2006. Accessed on 15.08.2014. URL http://ocw.jhsph.edu/courses/statisticspsychosocialresearch/pdfs/lecture8.pdf.
[41] Jae-On Kim and Charles W. Mueller. Introduction to factor analysis: What it is and how to do it. Number 13. Sage, 1978.
[42] David Garson. Factor analysis. NC State University. Accessed on 31.05.2014. URL http://www2.chass.ncsu.edu/garson/pa765/factor.htm.
[43] M. A. Pett, N. R. Lackey, and J. J. Sullivan. Making sense of factor analysis: The use of factor analysis for instrument development in health care research. California: Sage Publications Inc., 2003.
[44] Brett Williams, Ted Brown, and Andrys Onsman. Exploratory factor analysis: A five-step guide for novices. Australasian Journal of Paramedicine, 8(3):1, 2012.
[45] Exploratory factor analysis. Accessed on 15.08.2014. URL http://en.wikipedia.org/wiki/Exploratory_factor_analysis.
[46] Jae-On Kim and Charles W. Mueller. Factor analysis: Statistical methods and practical issues, volume 14. Sage, 1978.
[47] Alan Taylor. A brief introduction to factor analysis. 2004. Accessed on 15.06.2014.
[48] Chapter 6: Factor analysis. Accessed on 19.08.2014. URL http://www.sagepub.com/upm-data/41164_6.pdf.
[49] A. Field. Discovering statistics using SPSS for Windows. London-Thousand Oaks-New Delhi, 2000.
[50] T. Rietveld and R. Van Hout. Statistical techniques for the study of language and language behaviour. Berlin-New York: Mouton de Gruyter, 1993.
[51] L. Wilkinson, G. Blank, and C. Gruber. Desktop data analysis with SYSTAT. Upper Saddle River, NJ: Prentice Hall, 1996.
[52] J. Hair, R. E. Anderson, R. L. Tatham, and W. C. Black. Multivariate data analysis. New Jersey: Prentice-Hall Inc., 4th ed., 1995.
[53] A. L. Comrey. A first course in factor analysis. New York: Academic Press Inc., 1973.
[54] Robin K. Henson and J. Kyle Roberts. Use of exploratory factor analysis in published research: common errors and some comment on improved practice. Educational and Psychological Measurement, 66(3):393–416, 2006.
[55] Kristine Y. Hogarty, Constance V. Hines, Jeffrey D. Kromrey, John M. Ferron, and Karen R. Mumford. The quality of factor solutions in exploratory factor analysis: The influence of sample size, communality, and overdetermination. Educational and Psychological Measurement, 65(2):202–226, 2005.
[56] J. C. F. de Winter, D. Dodou, and P. A. Wieringa. Exploratory factor analysis with small sample sizes. Multivariate Behavioral Research, 44(2):147–181, 2009.
[57] Indiana University. Cronbach's alpha in SPSS. Accessed on 31.08.2014. URL https://kb.iu.edu/d/bctl.
[58] SPSS FAQ: What does Cronbach's alpha mean? Accessed on 23.09.2014. URL http://www.ats.ucla.edu/stat/spss/faq/alpha.html.
[59] The multivariate social scientist: Introductory statistics using generalized linear models. Accessed on 23.09.2014. URL http://evolutionarymedia.com/cgi-bin/wiki.cgi?StatisticalMethods.
[60] Graeme Hutcheson and Nick Sofroniou. The multivariate social scientist: Introductory statistics using generalized linear models. Thousand Oaks, 1999. Accessed on 23.09.2014. URL http://faculty.chass.ncsu.edu/garson/PA765/hutcheson.htm.
[61] B. Thompson. Exploratory and confirmatory factor analysis: understanding concepts and applications. Washington, DC: American Psychological Association, 2004.
[62] IDRE UCLA. Accessed on 23.09.2014. URL http://www.ats.ucla.edu/stat/sas/library/factor_ut.htm.
[63] A. B. Costello and J. W. Osborne. Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7), 2005. URL http://pareonline.net/getvn.asp?v=10&n=7.
[64] IBM SPSS Statistics 20 brief guide. Accessed on 08.09.2014. URL https://www.csun.edu/sites/default/files/statistics20-briefguide-64bit.pdf.
[65] IBM SPSS Statistics 22 core system user's guide. Accessed on 09.09.2014. URL http://www.sussex.ac.uk/its/pdfs/SPSS_Statistics_Core_System_User_Guide_22.pdf.
[66] Andy Field. Factor analysis using SPSS. 2005. Accessed on 09.09.2014. URL http://www.statisticshell.com/docs/factor.pdf.
[67] Chapter 7 - Factor analysis - SPSS. Accessed on 09.09.2014. URL http://www.cs.uu.nl/docs/vakken/arm/SPSS/spss7.pdf.
[68] Andy P. Field. Discovering statistics using SPSS (2nd edition). SAGE Publications, London, 2005.
[69] Category: Fruit vegetables. Accessed on 09.09.2014. URL http://en.wikipedia.org/wiki/Category:Fruit_vegetables.
[70] Lecture 11: Factor analysis using SPSS. Accessed on 09.09.2014. URL http://staff.neu.edu.tr/~ngunsel/files/Lecture%2011.pdf.
[71] James M. Conway and Allen I. Huffcutt. A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6(2):147–168, 2003.
[72] IBM. FACTOR does not print KMO or Bartlett test for nonpositive definite matrices. Accessed on 09.09.2014. URL http://www-01.ibm.com/support/docview.wss?uid=swg21476768.
[73] Principal component analysis (PrincipalComponentAnalysis.ppt), SW388R7 Data Analysis and Computers II, University of Texas. Accessed on 30.9.2014.
[74] Entertainment. Accessed on 09.09.2014. URL http://en.wikipedia.org/wiki/Entertainment.
[75] Audi India. Accessed on 09.09.2014. URL http://www.audi.in/sea/brand/in.html.
[76] BMW India. Accessed on 09.09.2014. URL http://www.bmw.in/in/en/.
[77] Shape. Accessed on 09.09.2014. URL http://www.shape.com/lifestyle/mind-and-body/best-health-and-fitness-sites-women.
[78] Women's Health and Fitness magazine. Accessed on 09.09.2014. URL http://www.womenshealthandfitness.com.au/.
[79] Christian M. Richard, James A. Kleiss, and Alvah C. Bittner. Comparison of cluster analysis and structural analysis methods for identifying user mental models: An integrated in-vehicle telematics systems illustration. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 48, pages 2471–2475. SAGE Publications, 2004.
[80] Orange documentation. Accessed on 30.9.2014. URL http://orange.biolab.si/docs/latest/widgets/rst/.
[81] SPSS. Accessed on 30.9.2014. URL http://en.wikipedia.org/wiki/SPSS.