
Assessing User Satisfaction in the Era of User Experience: Comparison of the SUS, UMUX
and UMUX-LITE as a Function of Product Experience
Simone Borsci (1)(*), Stefano Federici (2), Silvia Bacci (3), Francesco Bartolucci (3), Michela Gnaldi (4)
(1) National Institute for Health Research, Diagnostic Evidence Cooperative, Imperial College London, United Kingdom
(2) Department of Philosophy, Social & Human Sciences and Education, University of Perugia, Italy
(3) Department of Economics, University of Perugia, Italy
(4) Department of Political Sciences, University of Perugia, Italy
* Corresponding author: [email protected]
Abstract
Nowadays, practitioners extensively apply quick and reliable scales of user satisfaction as part of
their user experience (UX) analyses to obtain well-founded measures of user satisfaction within time
and budget constraints. However, in the human-computer interaction (HCI) literature the relationship
between the outcomes of standardized satisfaction scales and the amount of product usage has been
only marginally explored. The few studies that have investigated this relationship have typically
shown that users who have interacted more with a product have higher satisfaction. The purpose of
this paper was to systematically analyze the variation in outcomes of three standardized user
satisfaction scales (SUS, UMUX and UMUX-LITE) when completed by users who had spent
different amounts of time with a website. In two studies, the amount of interaction was manipulated
to assess its effect on user satisfaction. Measurements of the three scales were strongly correlated and
their outcomes were significantly affected by the amount of interaction time. Notably, the SUS acted
as a unidimensional scale when administered to people who had less product experience, but was
bidimensional when administered to users with more experience. We replicated previous findings of
similar magnitudes for the SUS and UMUX-LITE (after adjustment), but did not observe the
previously reported similarities of magnitude for the SUS and the UMUX. Our results strongly
encourage further research to analyze the relationships of the three scales with levels of product
exposure. We also provide recommendations for practitioners and researchers in the use of the
questionnaires.
1. Introduction
The assessment of website user satisfaction is a fascinating topic in the field of human–
computer interaction (HCI). Since the late 1980s, practitioners have applied standardized usability
questionnaires to the measurement of user satisfaction, which is one of the three main components of
usability (ISO, 1998). Satisfaction analysis is a strategic way to collect information – before or after
the release of a product – about the user experience (UX), defined as the “person’s perceptions and
responses resulting from the use and/or anticipated use of a product” (ISO, 2010, p. 3).
1.1. Amount of experience and perceived usability
UX is a relatively new and broad concept which includes and goes beyond traditional usability
(Petrie & Bevan, 2009). As recently outlined by Lallemand, Gronier and Koenig (2015), experts have
different points of view on UX; however, there is wide agreement in the HCI community that the
UX concept has a temporal component – i.e., the amount of user interaction with a product affects
people's overall experience. In addition, experts generally agree that the interactive experience of a
user is affected by the perceived usability and aesthetics of the interface (Borsci, Kuljis, Barnett, &
Pecchia, In press; Hassenzahl, 2005; Hassenzahl & Tractinsky, 2006; Lee & Koubek, 2012;
Tractinsky, 1997), and the extent to which user needs are met (Hassenzahl, this issue). Accordingly,
to fully model the perceived experience of a user (Borsci, Kurosu, Federici, & Mele, 2013; Lindgaard
& Dudek, 2003; McLellan, Muddimer, & Peres, 2012), practitioners should include a set of repeated
objective and subjective measures in their evaluation protocols to enable satisfaction analysis as a
“subjective sum of the interactive experience” (Lindgaard & Dudek, 2003, p. 430).
Several studies have found that the magnitude of user satisfaction is associated with a user's
amount of experience with the product or system under evaluation (Lindgaard & Dudek, 2003;
McLellan, et al., 2012). For instance, Sauro (2011) reported that System Usability Scale (SUS; Brooke, 1996)
scores differed as a function of different levels of product experience. In other words, people with
long-term experience in the use of a product tended to rate their satisfaction with higher (better) scores
than users with shorter terms of experience.
In summary, researchers and practitioners assess user satisfaction by means of questionnaires,
but there are only a few empirical studies that have systematically analyzed the variation of the
outcomes of satisfaction scales when filled out by users with different amounts of experience in the
use of a product (Kortum & Johnson, 2013; Lindgaard & Dudek, 2003; McLellan, et al., 2012; Sauro,
2011).
1.2. The System Usability Scale (SUS)
Several standardized tools are available in the literature to measure satisfaction (for a review,
Borsci, et al., 2013). An increasing trend favors the use of short scales due to their speed and ease of
administration, either as online surveys for customers or after a usability test. One of the most popular
is the SUS (Lewis, 2006; Sauro & Lewis, 2011; Zviran, Glezer, & Avni, 2006), which has been cited
in more than 600 publications (Sauro, 2011) and is considered an industry standard. Its popularity
among HCI experts is due to several factors, such as its desirable psychometric properties (high
reliability and demonstrated validity), relatively short length (10 items), and low cost (free) (Bangor,
Kortum, & Miller, 2008; McLellan, et al., 2012).
The ten items of the SUS were designed to form a unidimensional measure of perceived
usability (Brooke, 1996). The standard version of the questionnaire has a mix of positive and negative
tone items, with the odd-numbered items having a positive tone and the even-numbered items having
a negative tone. Respondents rate the magnitude of their agreement with each item using a five-point
scale (1: Strongly disagree; 5: Strongly agree). To compute the overall SUS score, (1) each item is
converted to a 0-4 scale for which higher numbers indicated a greater amount of perceived usability,
(2) the converted scores are summed, and (3) the sum is multiplied by 2.5. This process produces
scores that can range from 0 to 100.
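For illustration, this scoring procedure can be expressed in a few lines of R (the language used for our LC IRT analyses); this is a minimal sketch, and the function name sus_score is ours:

# Compute the overall SUS score from ten 1-5 ratings.
# Odd (positive-tone) items are converted as (rating - 1) and even
# (negative-tone) items as (5 - rating); the sum of the converted
# 0-4 scores is multiplied by 2.5 to give a 0-100 score.
sus_score <- function(ratings) {
  stopifnot(length(ratings) == 10, all(ratings %in% 1:5))
  converted <- ifelse(seq_along(ratings) %% 2 == 1,
                      ratings - 1,  # odd, positive-tone items
                      5 - ratings)  # even, negative-tone items
  sum(converted) * 2.5
}

# Example: agreeing strongly with every positive item and disagreeing
# strongly with every negative item yields the maximum score of 100.
sus_score(c(5, 1, 5, 1, 5, 1, 5, 1, 5, 1))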
As Lewis (2014) recently stated, there are still lessons that have to be learned about SUS, in
particular about its dimensionality. Despite the SUS having been designed to be unidimensional,
several researchers have recently shown that the items of the SUS might load on two dimensions: usability
and learnability (Bangor, et al., 2008; Borsci, Federici, & Lauriola, 2009; Lewis & Sauro, 2009;
Lewis, Utesch, & Maher, 2013; Sauro & Lewis, 2012). Since 2009, however, there have been reports
of large-sample SUS datasets for which two-factor structures did not have the expected item-factor
alignment (Items 4 and 10 with Learnable, all others with Usable), indicating a need for further
research to clarify its dimensional structure and the variables that might affect it (Lewis, 2014).
In recent years, the growing availability of SUS data from a large number of studies (Bangor,
et al., 2008; Kortum & Bangor, 2012) has led to the production of norms for the interpretation of
mean SUS scores, e.g., the Curved Grading Scale (CGS) (Sauro & Lewis, 2012). Using data from
446 studies and over 5,000 individual SUS responses, Sauro & Lewis (2012) found the overall mean
score of the SUS to be 68 with a standard deviation of 12.5. The Sauro and Lewis CGS assigned
grades as a function of SUS scores ranging from ‘F’ (absolutely unsatisfactory) to ‘A+’ (absolutely
satisfactory), as follows:

- Grade F (0–51.7)
- Grade D (51.8–62.6)
- Grade C- (62.7–64.9)
- Grade C (65.0–71.0)
- Grade C+ (71.1–72.5)
- Grade B- (72.6–74.0)
- Grade B (74.1–77.1)
- Grade B+ (77.2–78.8)
- Grade A- (78.9–80.7)
- Grade A (80.8–84.0)
- Grade A+ (84.1–100)
Although they should be interpreted with caution, the grades from the CGS provide an initial
basis for determining if a mean SUS score is below average, average, or above average (Sauro &
Lewis, 2012).
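As an illustration, the CGS cutoffs listed above can be encoded directly; the following R sketch (the function name cgs_grade is ours) maps a mean SUS score to its grade:

# Map a mean SUS score (0-100) to its Sauro-Lewis CGS grade,
# using the published range boundaries listed above.
cgs_grade <- function(score) {
  cuts   <- c(0, 51.8, 62.7, 65.0, 71.1, 72.6, 74.1, 77.2, 78.9, 80.8, 84.1, 100)
  grades <- c("F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+")
  grades[findInterval(score, cuts, rightmost.closed = TRUE)]
}

cgs_grade(68)  # "C": the overall SUS mean reported by Sauro and Lewis (2012)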
1.3. Ultra-short scales: The UMUX and UMUX-LITE
Although the SUS is a quick scale, practitioners sometimes need to use reliable scales that are
even shorter than the SUS to minimize time, cost, and user effort. “This need is most pressing when
standardized usability measurement is one part of a larger post-study or online questionnaire” (Lewis,
2014, p. 676). As a consequence, quite recently, two new scales have been proposed as shorter proxies
of the SUS: the Usability Metric for User Experience (UMUX, see Table 1), a four-item tool
developed and validated by Finstad (2010, 2013); and the UMUX-LITE composed of only the two
positive-tone questions from the UMUX (Lewis, et al., 2013). The scale for the UMUX items has
seven points (1: Strongly disagree; 7: Strongly agree).
______________________________________________________________________
Insert Table 1 about here. Items of the UMUX and UMUX-LITE
______________________________________________________________________
Some findings show the UMUX to be bidimensional as a function of item tone (positive
vs. negative) (Lewis, 2013; Lewis, et al., 2013), despite the intention to develop a unidimensional
scale. The UMUX’s statistical structure might be an artifact of the mixed positive/negative tone of
the items and, in practice, might not matter much. In light of this, both the UMUX and its reduced
version, the UMUX-LITE, are usually interpreted as unidimensional measures.
By design (using a method similar to but not exactly the same as the SUS), the overall UMUX
and UMUX-LITE scores can range from 0 to 100. Their scoring procedures are:
- UMUX: The odd items are scored as [score − 1] and the even items as [7 − score]. The sum of the item scores is then divided by 24 and multiplied by 100 (Finstad, 2010).
- UMUX-LITE: The two items are scored as [score − 1], and their sum is divided by 12 and multiplied by 100 (Lewis, et al., 2013). For correspondence with SUS scores, this value is entered into a regression equation to produce the final UMUX-LITE score. The equation below combines the initial computation and the regression to show how to compute the recommended UMUX-LITE score from the ratings of its two items:

UMUX-LITE = .65 × (([Item 1 score] + [Item 2 score] − 2) × 100/12) + 22.9.    (1)
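For illustration, both scoring procedures, including the regression adjustment of formula (1), can be written as a short R sketch (the function names are ours):

# UMUX: four 1-7 ratings; odd items are scored as (rating - 1) and even
# items as (7 - rating); the sum is divided by 24 and multiplied by 100.
umux_score <- function(ratings) {
  stopifnot(length(ratings) == 4, all(ratings %in% 1:7))
  scored <- ifelse(seq_along(ratings) %% 2 == 1, ratings - 1, 7 - ratings)
  100 * sum(scored) / 24
}

# UMUX-LITE: the two positive-tone UMUX items, rescaled to 0-100 and then
# adjusted with formula (1) for correspondence with SUS scores.
umux_lite_score <- function(item1, item2) {
  raw <- 100 * (item1 + item2 - 2) / 12
  0.65 * raw + 22.9
}

umux_lite_score(7, 7)  # maximum possible adjusted score: 87.9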
Prior research (Finstad, 2010, 2013; Lewis, et al., 2013) has shown that the SUS, UMUX and
UMUX-LITE are reliable (Cronbach’s α between .80 and .95) and correlate significantly (p < .001).
In the research reported to date, UMUX scores have not only correlated with the SUS, they have also
had a similar magnitude. However, for the UMUX-LITE, it is necessary to use the preceding formula
(1) to adjust its scores to achieve correspondence with the SUS (Lewis, et al., 2013). For the rest of
this paper, reported UMUX-LITE values will be those computed using formula (1).
Thus, the literature on the UMUX and UMUX-LITE (Finstad, 2010; Lewis, et al., 2013)
suggests that these two new short scales can be used as surrogates for the SUS.
Currently, three studies (Kortum & Johnson, 2013; McLellan, et al., 2012; Sauro, 2011) have investigated the
relationship between SUS scores and amount of product experience. The results of these studies have
consistently indicated that more experienced users had higher satisfaction outcomes (SUS scores).
Notably, researchers have not yet studied this effect on the outcomes of quick scales such as the
UMUX and UMUX-LITE. The comparative analyses of these instruments were performed mainly to
validate the questionnaires, without considering the effect of the different levels of experience in the
use of a website (Finstad, 2010; Lewis, et al., 2013).
1.4. Research goals
The use of short scales as part of UX evaluation protocols could considerably reduce the costs of
assessment, as well as users’ time and effort to complete the questionnaires. Currently, few studies
have investigated the relationship among the SUS, UMUX and UMUX-LITE, and none have
analyzed their reliabilities as a function of different amounts of interaction with a product.
The primary goal of this paper was to analyze the variation of SUS, UMUX and UMUX-LITE
outcomes when completed concurrently by users with different levels of experience in the use of a
website. To reach this goal, we pursued three main objectives. First, we aimed to explore the variation
of UMUX and UMUX-LITE outcomes when administered to users with two different levels of
product experience. Second, we aimed to observe whether, at different levels of product experience,
the correlations among the SUS, UMUX and UMUX-LITE were stable, with particular interest in the
generalizability of formula (1). Finally, we checked whether the levels of respondents' product
experience affected the dimensional structure of the SUS. It may be that the Learnable component does
not emerge until respondents have sufficient experience with the product they are rating. To achieve
these aims, we performed two studies with the three standardized usability metrics to measure the
self-report satisfaction of end-users with different levels of experience with an e-learning web
platform known as CLab (www.cognitivelab.it).
2. Methodology
Students enrolled in the bachelor’s degree program in psychology at the University of Perugia
are strongly encouraged to use the CLab target platform as an e-learning tool. Commonly, students
access it at least once a week for several reasons: for instance, to look for information about courses
and exam timetables; to sign in for mandatory attendance classes; to download course materials; to
book a test/exam; or to post a question and discuss issues with the professor.
For each study, a sample of volunteer students was asked to assess the interface after different
times of usage (based on their date of subscription to the platform) by filling out the SUS, UMUX
and UMUX-LITE questionnaires, presented in a random order. The Italian version of the SUS used
in Borsci et al. (2009) was administered. Additionally, translations and retranslations were made by
an independent group of linguistic experts to produce Italian versions of the UMUX and UMUX-LITE.
Participants of the two studies were invited to fill out the scales two months (Study 1) or six
months (Study 2) after they first accessed CLab. In these studies, participants received the same
instruction before the presentation of the questionnaires: ‘Please, rate your satisfaction in the use of
the platform on the basis of your current experience of use’ [Per favore, in base alla tua attuale
esperienza d’uso, valuta la tua soddisfazione nell’utilizzo della piattaforma].
The two studies were organized to measure different times of participants’ exposure to CLab,
thus measuring different moments of UX acquisition, as follows:
- Study 1, carried out two months after the students first accessed CLab. The participants’ number of access times and interaction with the platform (time exposure) ranged from 8 (once a week) to 56 (once a day).
- Study 2, carried out six months after the students first accessed CLab. The participants’ number of access times and interaction with the platform (time exposure) ranged from 24 (once a week) to 168 (once a day).
The two studies were reviewed and approved by the Institutional Review Board of the
Department of Philosophy, Social & Human Sciences and Education, University of Perugia. All
participants provided their written informed consent to participate in this study. No minors/children
were enrolled in this study. The study presented no potential risks.
2.1. Hypotheses
Concerning the effect of usage levels on the SUS’ dimensional structure, we expected that:
- The SUS dimensionality would be affected by the level of experience acquired by the participants with the product before the administration of the scale (Hypothesis 1).
The second hypothesis concerns the correlations among the tools (SUS, UMUX, UMUX-LITE).
Recently (Finstad, 2010; Lewis, et al., 2013), researchers have reported strong correlations
among the SUS, UMUX and UMUX-LITE. There are as yet, however, no data on the extent to which
the correlations might be affected by the users’ levels of acquired product experience. Thus, we
expected:
- Significant correlations among the overall scores of the SUS, UMUX and UMUX-LITE, for all studies independent of the administration conditions, i.e., different amounts of interaction time with the target website (Hypothesis 2).
Finally, the third hypothesis concerns the relationship between scale outcomes and users’
levels of product experience. As noted above, user satisfaction may vary depending on both the level
of product experience and the time of exposure to a product (Lindgaard & Dudek, 2003; McLellan,
et al., 2012; Sauro, 2011). In particular, experts tend to provide higher satisfaction scores than
novices (Sauro, 2011). In light of this, we expected that:
- User satisfaction measured through the SUS, UMUX and UMUX-LITE would be affected by the different conditions of time exposure to the target website (two or six months) as well as by different levels of frequency of use of the website. Therefore, we expected students in Study 2 (cumulative UX condition) to rate the CLab with all three scales as more satisfactory than users in the first study due to their greater product exposure (six months). Concurrently, we expected that participants with higher frequency of product use would rate the CLab with all the scales as more satisfactory than participants with lower levels of use (Hypothesis 3).
2.2. Data analysis
For each study, principal components analyses were performed to assess the SUS’
dimensionality, focusing on whether the item alignment of the resulting two-component structure
was consistent with the emergence of Learnable and Usable components. Only if this expected pattern
did not emerge did we plan to follow up with a multidimensional latent class item response theory
model (LC IRT) to more deeply test the dimensional structure of the scale (Bacci, Bartolucci, &
Gnaldi, 2014; Bartolucci, 2007; Bartolucci, Bacci, & Gnaldi, 2014). The primary purpose of this
additional analysis would be to confirm if a unidimensional structure was a better fit to the data than
the expected bidimensional structure – an assessment that is not possible with standard principal
components analysis because it is impossible to rotate a unidimensional solution (Cliff, 1987).
Descriptive statistics (mean [M], standard deviation [SD]) and Pearson correlation analyses
among the scales were performed to both compare the outcomes of the three scales and observe their
relationships. Moreover, a one-way ANOVA was carried out to assess the effect of experience on
user satisfaction as measured by the SUS, UMUX and UMUX-LITE.
Finally, an independent sample t-test analysis with 1,000 bootstrap resampling iterations was
performed to test the differences among the satisfaction levels of participants gathered through each
scale, and a final comprehensive ANOVA was conducted to enable comparison of results between the
two studies. The MultiLCIRT package of the R software by Bartolucci et al. (2014) was used to estimate
the multidimensional LC IRT models. All other analyses were performed using IBM® SPSS 22.
3. The studies
3.1 Study 1
Participants
One hundred and eighty-six first-year students of psychology (31 male, 17%; mean age = 21.97
years, SD = 5.63) voluntarily participated in the study two months after their subscription to and first
use of the platform.
Procedure
All participants used an online form to fill out the questionnaires (SUS, UMUX and UMUX-LITE) and also indicated their weekly use of the platform (from 1 – once per week to 5 – once a day).
Participants were asked to rate their satisfaction in the use of CLab on the basis of their current
experience of use.
Results of Study 1
SUS dimensionality
As shown in Table 2, principal components analysis with Varimax rotation suggested a
unidimensional solution was appropriate for this set of SUS data. The table shows the item loadings
for one- and two-component solutions.
______________________________________________________________________
Insert Table 2 about here. Principal components analysis of the SUS in Study 1 showing
one- and two-component solutions (bidimensional loadings > .4 in bold)
______________________________________________________________________
The expected two-component solution would have shown Items 4 and 10 aligning with one
component and the other eight items aligning on the other. Instead, the apparent pattern of alignment
was positive tone (odd-numbered) items versus negative tone (even-numbered) items, similar to the
pattern observed in previous research for the UMUX. Another indicator of the inappropriateness of
this two-component solution was the presence of items with relatively large loadings on both components
(Items 2, 3, 5 and 7).
To verify the appropriateness of the one-component solution, we conducted additional
analyses with a special class of statistical models known as LC IRT models (Bacci, et al., 2014;
Bartolucci, 2007; Bartolucci, et al., 2014). This class of models extends traditional IRT models
(Nering & Ostini, 2010; van der Linden & Hambleton, 1997) in two main directions. First, they allow
the analysis of item responses in cases of questionnaires which measure more than one factor (also
called, in the context of IRT, latent trait or latent variable or ability). Second, multidimensional LC
IRT models assume that the population is composed of homogeneous groups of individuals sharing
unobserved but common characteristics (so-called latent classes; Goodman, 1974; Lazarsfeld &
Henry, 1968). IRT models are considered a powerful alternative to principal components analysis,
especially when questionnaires consist of binary or (ordered) polytomously-scored items rather than
quantitative items.
Our analysis proceeded in two steps. The first was the selection of the number of latent
classes (C). To compare the unidimensional and bidimensional assumptions through the class of models
at issue, we first needed to detect the optimal number of latent classes, i.e., the number of groups
which ensures a satisfactory goodness of fit of the statistical model at issue to the observed
data. For this aim, we selected the number of latent classes relying on the Bayesian Information
Criterion (BIC) index (Schwarz, 1978). More specifically, we estimated unidimensional LC IRT
models for increasing values of C (C = 1; 2; 3; 4), keeping constant all the other elements characterizing
the class of models, and took the value of C just before the first increase of the BIC. We repeated the
analysis under the assumption of bidimensionality.
Table 3 shows that the minimum value of the BIC index is observed for C = 3, both in the
unidimensional case and in the bidimensional case, suggesting that the sample of individuals comes
from a population composed of three latent classes. As smaller values of BIC are better than larger
values, a comparison between the unidimensional and the bidimensional models with C = 3 based on
the BIC index gave evidence of a better goodness of fit for the model with unidimensional structure
(BIC = 2924.805) than for that with bidimensional structure (BIC = 2941.496; both values bolded in Table 3).
______________________________________________________________________
Insert Table 3 about here. Unidimensional and bidimensional LC IRT models for SUS data:
number of latent classes (C), estimated maximum log-likelihood (ℓ), number of parameters (#par),
BIC index
______________________________________________________________________
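The stopping rule, i.e., take the value of C just before the first increase of the BIC, is easy to express; a minimal R sketch, applied to the unidimensional BIC values of Table 3, follows (the function name select_C is ours):

# Select the number of latent classes as the value of C just before
# the first increase of the BIC (Schwarz, 1978).
select_C <- function(bic) {
  first_rise <- which(diff(bic) > 0)[1]
  if (is.na(first_rise)) length(bic) else first_rise
}

bic_uni <- c(3147.714, 2934.725, 2924.805, 2934.602)  # Table 3, C = 1..4
select_C(bic_uni)  # returns 3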
The unidimensionality assumption was also verified through a likelihood-ratio (LR) test as
follows. Given the number of latent classes selected in the previous step, a LR test was used to
compare models which differ in terms of the dimensional structure, that is, bidimensional versus
unidimensional structure. This type of statistical test allowed us to evaluate the similarity between a
general model and a restricted model, i.e., a model obtained from the general one by imposing
a constraint, so that the restricted model is nested in the general model. More precisely, an LR test
evaluates, at a given significance level, the null hypothesis of equivalence between the two nested
models at issue: if the null hypothesis is not rejected, the restricted model is preferred in the interests
of parsimony; if the null hypothesis is rejected, the general model is preferred. In our framework, the
general model represents a bidimensional structure, where Items 4 and 10 of the SUS questionnaire
contribute to a different latent trait with respect to the remaining ones, whereas the restricted model
assumes that all items belong to the same dimension. The LR test is based on the difference between
the maximum log-likelihoods of the two models, and it evaluates whether this difference is statistically
significantly different from zero. Higher values of the log-likelihood difference denote that the
hypothesis of unidimensionality is unlikely and should be discarded in favor of bidimensionality.
In particular, the LR test statistic is given by −2 times the difference between the maximum
log-likelihoods and, under certain regularity conditions, it is distributed as a Chi-square with degrees
of freedom equal to the difference in the number of free parameters of the two compared models.
The LR test statistic equalled 3.9918 with 4 degrees of freedom. The resulting p-value, based
on the Chi-square distribution, is equal to 0.4071 and, therefore, the null hypothesis of SUS
unidimensionality cannot be rejected for Study 1. To conclude, both the BIC index and the LR test
provided evidence in favor of the unidimensionality assumption.
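The arithmetic of this test can be reproduced directly from the Table 3 entries; a short R sketch follows:

# LR test of the unidimensional (restricted) against the bidimensional
# (general) LC IRT model with C = 3; log-likelihoods and #par from Table 3.
ll_uni <- -1395.186; par_uni <- 26
ll_bi  <- -1393.190; par_bi  <- 30

lr <- -2 * (ll_uni - ll_bi)         # 3.992
df <- par_bi - par_uni              # 4
pchisq(lr, df, lower.tail = FALSE)  # ~0.407: unidimensionality not rejected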
Correlations among the tools
The overall SUS, UMUX and UMUX-LITE scale results significantly correlated (p < .001,
Table 4). There was considerable overlap in the 95% confidence intervals for the correlations between
the SUS and the other scales, so it appeared that the UMUX and UMUX-LITE had similar magnitudes
of association with the SUS.
______________________________________________________________________
Insert Table 4 about here. Correlations among SUS, UMUX and UMUX-LITE in Study 1
(with 95% confidence intervals)
______________________________________________________________________
Table 5 shows that, on average, participants rated the platform as satisfactory (scores higher
than 70 out of 100) for all three questionnaires – i.e., with SUS scores greater than or equal to ‘C’ on
the Sauro-Lewis CGS (Sauro & Lewis, 2012). As demonstrated by comparison of the confidence
intervals, there were significant differences in the magnitudes of the three metrics, with the SUS and
UMUX-LITE having a closer correspondence than the SUS and UMUX.
______________________________________________________________________
Insert Table 5 about here. Means and standard deviations of overall scores of the SUS,
UMUX and UMUX-LITE for Study 1 with 95% confidence intervals (and associated CGS grades)
______________________________________________________________________
User satisfaction and frequency of use
Most participants (61.9%) declared that they used the platform from more than three days per week
up to every day, while 38.1% stated a lower rate of usage. A one-way ANOVA showed a significant
difference in user satisfaction among participants with low (1–2 days per week),
medium (3–4 days per week) and high (≥ 5 days per week) self-reported levels of use of CLab for all
the questionnaires: SUS (F(2, 184) = 4.39, p = .014), UMUX (F(2, 184) = 8.71, p = .001) and
UMUX-LITE (F(2, 184) = 6.76, p = .002). In particular, least significant difference (LSD) post-hoc
analyses showed that students who interacted with the platform from more than three
days per week up to every day (those who used the website more frequently) tended to judge the product
as more satisfactory than people with less exposure (Figure 1).
______________________________________________________________________
Insert Figure 1 about here. Interaction between scale and frequency of use for Study 1
______________________________________________________________________
3.2 Study 2
Participants and procedure
Ninety-three students of psychology (17 male, 18%; mean age = 22.03 years, SD = 1.44) voluntarily
participated in the study six months after their subscription to and first use of CLab. Participants
followed the same procedure as in Study 1.
Results of Study 2
SUS dimensionality
To check the dimensionality of the SUS after six months of usage, we performed a principal
components analysis with Varimax rotation. Table 6 shows that the SUS, under the test conditions of
Study 2, was composed of two dimensions, in line with the previous studies that reported alignment of
Items 4 and 10 separate from the other items (Borsci, et al., 2009; Lewis & Sauro, 2009; Lewis, et
al., 2013).
______________________________________________________________________
Insert Table 6 about here. Principal components analysis of the SUS in Study 2 (loadings >
.4 in bold)
______________________________________________________________________
To further confirm the bidimensional structure of the SUS in Study 2, we performed an LC IRT
analysis (Table 7). We estimated models for increasing values of C (C = 1; 2; 3; 4; 5) under both the
unidimensional and the bidimensional assumptions, keeping constant all the other elements characterizing
the class of models, and again took the value of C just before the first increase of the BIC. Table 7
shows that the minimum value of the BIC index is observed at C = 4 for the unidimensional model
and at C = 5 for the bidimensional model. In Study 2, the smallest BIC overall is that of the
bidimensional model with C = 5. The BIC index thus gave evidence of a better goodness of fit for the
model with bidimensional structure (BIC = 1912.636) than for that with unidimensional structure
(BIC = 2059.4; both values bolded in Table 7).
Finally, the LR test statistic equalled 164.206 with 3 degrees of freedom at the number of
classes selected under the bidimensional model (C = 5), and 148.5492 with 2 degrees of freedom at
the number selected under the unidimensional model (C = 4). In both cases the resulting p-value,
based on the Chi-square distribution, is smaller than 0.001. Therefore, the null hypothesis of SUS
unidimensionality can be rejected for Study 2. To conclude, both the BIC index and the LR test
provided evidence in favor of the bidimensionality assumption.
______________________________________________________________________
Insert Table 7 about here. Unidimensional and bidimensional LC IRT models for SUS data: number
of latent classes (C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC
index
______________________________________________________________________
Correlations among the tools
As in Study 1, all three scales were strongly correlated (p < .001, see Table 8). Comparison
of the 95% confidence intervals indicated that the magnitudes of association among the three scales
were similar between the studies (p > .05).
Table 9 shows that participants judged the platform as satisfactory – i.e., with mean scores
higher than 75 out of 100 – for all three questionnaires. For this set of data, there was substantial
overlap in the confidence intervals for the SUS and UMUX-LITE, with identical CGS grades for the
SUS/UMUX-LITE means and interval limits. Consistent with the findings from Study 1, the
magnitude of difference between the mean SUS and UMUX scores was significant.
______________________________________________________________________
Insert Table 8 about here. Correlations among SUS, UMUX and UMUX-LITE in Study 2
(with 95% confidence intervals)
______________________________________________________________________
______________________________________________________________________
Insert Table 9 about here. Means and standard deviations of overall scores of the SUS,
UMUX and UMUX-LITE for Study 2 with 95% confidence intervals (and associated CGS grades)
______________________________________________________________________
User satisfaction and frequency of use
About half (51%) of the participants declared that they used the platform from more than three days
per week up to every day, while 49% declared a lower rate of usage. One-way ANOVAs confirmed
that for all three scales there was a significant difference among the satisfaction ratings of
participants (SUS: F(2, 91) = 18.37, p = .001; UMUX: F(2, 91) = 12.11, p = .001; UMUX-LITE:
F(2, 91) = 11.57, p = .001), at least between students with low and high levels of exposure to the
platform. Again, our results indicated a significant relationship between frequency of use and
satisfaction (Figure 2).
______________________________________________________________________
Insert Figure 2 about here. Interaction between scale and frequency of use for Study 2
______________________________________________________________________
4. General results
Although the questionnaire results strongly correlated independent of the administration
conditions, in line with Hypothesis 3 we performed ANOVAs to analyze the differences among the
ratings as a function of duration of exposure (two or six months) and, independently, as a function of
different frequencies of use. These analyses indicated significant main effects and interactions for all
three scales (see Table 10). Figure 3 provides a graphic depiction of the significant interactions.
______________________________________________________________________
Insert Figure 3 about here. Combined interaction between scale and frequency of use for
Study 1 and 2
______________________________________________________________________
LSD post-hoc analyses revealed significant differences (p < .001) for all comparisons of the
different frequencies of use (low, medium, or high).
______________________________________________________________________
Insert Table 10 about here. Main effects and interactions for combined ANOVA
______________________________________________________________________
5. Discussion
Table 11 summarizes the testing outcomes for the hypotheses.
______________________________________________________________________
Insert Table 11 about here. Summary of study outcomes
______________________________________________________________________
The outcomes of the studies show that the learnability dimension of the SUS, as suggested,
might emerge only under certain conditions – that is, when it is administered to users after a long
enough period of exposure to the interface. This variability of the SUS dimensionality may be due to
its original development as a unidimensional scale of perceived usability (Brooke, 1996). Items 4 and
10 of the SUS comprise the learnability dimension of this scale (Tables 2 and 6). Item 4 pertains
to the need for support in use of the system (‘I think I would need the support of a technical person
to be able to use this system’), and Item 10 pertains to the perceived complexity of learning the
system (‘I needed to learn a lot of things before I could get going with this system’). These two items
are strongly related to the ability of users to quickly understand how to use the product without help.
Consistent with this, the learnability dimension is probably sensitive to the level of confidence acquired
by users in using the product functions and in anticipating system reactions. Our results are consistent
with the definition of learnability as “the ease with which new users can begin effective interaction and achieve
maximal performance” (Dix, Finlay, Abowd, & Beale, 2003, p. 260). In fact, the second dimension of the SUS might emerge only
when users perceive themselves as effective in the use of the product, making this an interesting
topic for future research.
All the satisfaction scale results strongly correlated under each condition of scale
administration, demonstrating convergent validity of the construct they purport to measure. The
Italian versions of the UMUX and UMUX-LITE, as with the version of the SUS previously validated
by other studies (Borsci, et al., 2009), matched the reliability coefficients of the English versions,
with a Cronbach’s α between .80 and .90 for the UMUX, and between .71 and .85 for UMUX-LITE.
Moreover, independent of the condition of scale administration, formula (1) functioned properly by
bringing UMUX-LITE scores into reasonable correspondence with concurrently collected SUS
scores.
Unlike previous studies (Finstad, 2010; Lewis, et al., 2013), where the magnitudes of UMUX
averages were quite close to corresponding SUS averages, in the present study the average UMUX
scores were significantly higher than the SUS and UMUX-LITE means (Tables 5 and 9). For
instance, for our cohort of participants with two months of product experience, Table 5 shows that
while the interval limits around the UMUX had CGS grade ranges from ‘A’ to ‘A+’, the SUS and
UMUX-LITE were more aligned with grade ranges from ‘C’ to ‘B’. This difference with previous
outcomes could be a peculiarity of this research. However, the differences among the results of
UMUX and the other two scales were large enough to lead practitioners to make different decisions
about CLab. For instance, by relying only on UMUX outcomes, a practitioner may report to designers
that CLab is a very satisfactory interface and no further usability analyses are needed. Alternatively,
on the basis of the SUS or UMUX-LITE outcomes, practitioners would likely report to designers that
CLab is reasonably satisfactory, but further usability analysis and redesign could improve the overall
interaction experience of end-users (the UX).
Finally, as Tables 9 and 10 show, the outcomes of all three questionnaires were affected by
the levels of experience of the respondents, both by the duration of use across months and the weekly
frequency of use. In tune with previous studies (McLellan, et al., 2012; Sauro, 2011), users with a
greater amount of product experience were more satisfied than users with less experience. This was
also confirmed by the ANOVAs performed for Studies 1 and 2: greater product experience was
associated with a higher level of user satisfaction.
Limitations of the study
Even though the outcomes of this study were generally in line with previous research, the
representativeness of these results is limited due to the characteristics of the cohorts involved in the study.
Our results concern satisfaction in the use of an e-learning web interface rated by students with similar
characteristics (age, education, country, etc.). Therefore, we cannot assume that the outcomes will
generalize to other kinds of interfaces. To exhaustively explore the relationship between the amount
of experience gained through time and satisfaction gathered by means of short scales, future studies
should include users with various individual differences and divergent characteristics (e.g.,
experience with the product, age, education, individual functioning and disability) in the use of
different types of websites.
Our results, obtained through the variation of the amount of exposure of users to the interface,
showed that people with more exposure to a product were likely to rate the interface as more
satisfactory. However, because this is correlational evidence, it is also possible that users who
experience higher satisfaction during early use of a product might choose to use it more frequently,
thus gaining high levels of product experience. Future longitudinal studies should investigate this
relationship, measuring, through SUS, UMUX and UMUX-LITE, whether users who perceive a
product as satisfactory tend to spend more time using it. It would also be of value to conduct a
designed experiment with random assignment of participants to experience conditions to overcome
the ambiguity of correlations.
6. Conclusions
Prior product experience was associated with user satisfaction as measured by the SUS and
its proposed alternate questionnaires, the UMUX and UMUX-LITE. Therefore, consistent with
previous research (Kortum & Johnson, 2013; McLellan, et al., 2012; Sauro, 2011), to obtain an
exhaustive picture of user satisfaction, researchers and practitioners should take into consideration
each user’s amount of exposure to the product under evaluation (duration and frequency of use).
All of the scales we analyzed were strongly correlated and can be used as quick tools to assess
user satisfaction. Therefore, practitioners who plan to use one or all of these scales should carefully
consider their administration for the proper management of satisfaction outcomes. Based on our
results, we offer the following points of advice:
- When administered to users after a short period of product use, it is safest to consider the SUS to be a unidimensional scale, so we recommend against partitioning it into Usable and Learnable components in that context. Moreover, practitioners should anticipate that the satisfaction scores of newer users will be significantly lower than the scores of more experienced people.
- When the SUS is administered to more experienced users, the scale appears to have bidimensional properties, making it suitable to compute both an overall SUS score and its Learnable and Usable components. The overall level of satisfaction will be higher than that among less experienced users.
- Due to their high correlation with the SUS, the UMUX and UMUX-LITE overall scores showed similar behaviors.
- If using one of the ultra-short questionnaires as a proxy for the SUS, the UMUX-LITE (with its adjustment formula) appears to provide results that are closer in magnitude to the SUS than the UMUX, making it the more desirable proxy.
The UMUX and UMUX-LITE are both reliable and valid proxies of the SUS. Nevertheless,
Lewis et al. (2013) suggested using them in addition to the SUS rather than instead of the SUS for
critical usability work, due to their recent development and still limited employment. In particular,
on the basis of our results, we recommend that researchers avoid using only the UMUX for their
analysis of user satisfaction because, at least in the current study, this scale seemed too optimistic.
In the formative phase of design or in agile development, the UMUX-LITE could be adopted as a
preliminary and quick tool to test users’ reactions to a prototype. Then, in advanced design phases
or in summative evaluation phases, we recommend using a combination of the SUS and UMUX-LITE (or UMUX) to assess user satisfaction with usability (note that because the UMUX-LITE was
derived from the UMUX, when you collect the UMUX you also collect the data needed to compute
the UMUX-LITE). Over time this could lead to a database of concurrently collected SUS, UMUX,
and UMUX-LITE scores that would allow more detailed investigation of their relationships and
psychometric properties.
Acknowledgment
We would like to thank Dr. James R. Lewis, senior human factors engineer at IBM Software Group
and guest editor of this special issue, for his generous feedback during the preparation of this paper.
References
Bacci, S., Bartolucci, F., & Gnaldi, M. (2014). A Class of Multidimensional Latent Class IRT
Models for Ordinal Polytomous Item Responses. Communications in Statistics – Theory and
Methods, 43(4), 787–800. doi:10.1080/03610926.2013.827718
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An Empirical Evaluation of the System Usability
Scale. International Journal of Human-Computer Interaction, 24(6), 574–594.
doi:10.1080/10447310802205776
Bartolucci, F. (2007). A Class of Multidimensional IRT Models for Testing Unidimensionality and
Clustering Items. Psychometrika, 72(2), 141–157. doi:10.1007/s11336-005-1376-9
Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional
latent class item response models. Computational Statistics & Data Analysis, 71(0), 971–
985. doi:10.1016/j.csda.2013.05.018
Borsci, S., Federici, S., & Lauriola, M. (2009). On the Dimensionality of the System Usability
Scale (SUS): A Test of Alternative Measurement Models. Cognitive Processing, 10(3),
193–197. doi:10.1007/s10339-009-0268-9
Borsci, S., Kuljis, J., Barnett, J., & Pecchia, L. (In press). Beyond the user preferences: Aligning the
prototype design to the users’ expectations. Human Factors and Ergonomics in
Manufacturing & Service Industries.
Borsci, S., Kurosu, M., Federici, S., & Mele, M. L. (2013). Computer Systems Experiences of Users
with and without Disabilities: An Evaluation Guide for Professionals. Boca Raton, FL: CRC
Press. doi:10.1201/b15619-1
Brooke, J. (1996). SUS: A “quick and dirty” usability scale. In P. W. Jordan, B. Thomas, B. A.
Weerdmeester & I. L. McClelland (Eds.), Usability evaluation in industry (pp. 189–194).
London, UK: Taylor & Francis.
Cliff, N. (1987). Analyzing multivariate data. San Diego, CA: Harcourt Brace Jovanovich.
Dix, A., Finlay, J., Abowd, G. D., & Beale, R. (2003). Human-Computer Interaction. Harlow, UK:
Pearson Education.
Finstad, K. (2010). The Usability Metric for User Experience. Interacting with Computers, 22(5),
323–327. doi:10.1016/j.intcom.2010.04.004
Finstad, K. (2013). Response to commentaries on ‘The Usability Metric for User Experience’.
Interacting with Computers, 25(4), 327–330. doi:10.1093/iwc/iwt005
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and
unidentifiable models. Biometrika, 61(2), 215–231. doi:10.1093/biomet/61.2.215
Hassenzahl, M. (2005). The Thing and I: Understanding the Relationship Between User and
Product. In M. Blythe, K. Overbeeke, A. Monk & P. Wright (Eds.), Funology: From
Usability to Enjoyment (Vol. 3, pp. 31–42). Berlin, DE: Springer. doi:10.1007/1-4020-2967-5_4
Hassenzahl, M., & Tractinsky, N. (2006). User experience – a research agenda. Behaviour &
Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331
ISO 9241-11:1998 Ergonomic requirements for office work with visual display terminals – Part 11:
Guidance on usability.
ISO 9241-210:2010 Ergonomics of human-system interaction – Part 210: Human-centred design for
interactive systems.
Kortum, P. T., & Bangor, A. (2012). Usability Ratings for Everyday Products Measured With the
System Usability Scale. International Journal of Human-Computer Interaction, 29(2), 67–
76. doi:10.1080/10447318.2012.681221
Kortum, P. T., & Johnson, M. (2013). The Relationship Between Levels of User Experience with a
Product and Perceived System Usability. Proceedings of the Human Factors and
Ergonomics Society Annual Meeting, 57(1), 197–201. doi:10.1177/1541931213571044
Lallemand, C., Gronier, G., & Koenig, V. (2015). User experience: A concept without consensus?
Exploring practitioners’ perspectives through an international survey. Computers in human
behavior, 43, 35–48. doi:10.1016/j.chb.2014.10.048
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Lee, S., & Koubek, R. J. (2012). Users’ perceptions of usability and aesthetics as criteria of pre- and
post-use preferences. European Journal of Industrial Engineering, 6(1), 87–117.
doi:10.1504/EJIE.2012.044812
Lewis, J. R. (2006). Usability testing. In G. Salvendy (Ed.), Handbook of Human Factors and
Ergonomics (pp. 1275–1316). New York, NY: John Wiley & Sons.
Lewis, J. R. (2013). Critical Review of ‘The Usability Metric for User Experience’. Interacting with
Computers, 25(4), 320–324. doi:10.1093/iwc/iwt013
Lewis, J. R. (2014). Usability: Lessons Learned … and Yet to Be Learned. International Journal of
Human-Computer Interaction, 30(9), 663–684. doi:10.1080/10447318.2014.930311
Lewis, J. R., & Sauro, J. (2009). The Factor Structure of the System Usability Scale. In M. Kurosu
(Ed.), Human Centered Design (Vol. 5619, pp. 94–103). Berlin, DE: Springer.
doi:10.1007/978-3-642-02806-9_12
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013, Apr 27–May 2). UMUX-LITE: when there’s no
time for the SUS. Paper presented at the Conference on Human Factors in Computing
Systems: CHI ’13, Paris, FR. doi:10.1145/2470654.2481287
Lindgaard, G., & Dudek, C. (2003). What is this evasive beast we call user satisfaction? Interacting
with Computers, 15(3), 429–452. doi:10.1016/S0953-5438(02)00063-2
McLellan, S., Muddimer, A., & Peres, S. C. (2012). The Effect of Experience on System Usability
Scale Ratings. Journal of Usability Studies, 7(2), 56–67.
Nering, M. L., & Ostini, R. (2010). Handbook of Polytomous Item Response Theory Models. New
York, NY: Taylor and Francis.
Petrie, H., & Bevan, N. (2009). The Evaluation of Accessibility, Usability, and User Experience. In
C. Stephanidis (Ed.), The Universal Access Handbook (pp. 299–314). Boca Raton, FL: CRC
Press.
Sauro, J. (2011). Does Prior Experience Affect Perceptions Of Usability? Retrieved May 15, 2014,
from http://www.measuringusability.com/blog/prior-exposure.php
Sauro, J., & Lewis, J. R. (2011). When Designing Usability Questionnaires, Does It Hurt To Be
Positive? Paper presented at the Conference on Human Factors in Computing Systems: CHI
’11, Vancouver, BC, CA. doi:10.1145/1978942.1979266
Sauro, J., & Lewis, J. R. (2012). Quantifying the User Experience: Practical statistics for user
research. Burlington, MA: Morgan Kaufmann.
Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2), 461–464.
doi:10.2307/2958889
Tractinsky, N. (1997, Mar 22–27). Aesthetics and Apparent Usability: Empirically Assessing
Cultural and Methodological Issues. Paper presented at the Conference on Human Factors
in Computing Systems: CHI ’97, Atlanta, GA. doi:10.1145/258549.258626
van der Linden, W., & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. New
York, NY: Springer.
Zviran, M., Glezer, C., & Avni, I. (2006). User satisfaction from commercial web sites: the effect of
design and use. Information & Management, 43(2), 157–178. doi:10.1016/j.im.2005.04.002
Table 1. Items of the UMUX and UMUX-LITE
Item 1 – UMUX (also Item 1 – UMUX-LITE): [This system’s] capabilities meet my requirements.
Item 2 – UMUX: Using [this system] is a frustrating experience.
Item 3 – UMUX (also Item 2 – UMUX-LITE): [This system] is easy to use.
Item 4 – UMUX: I have to spend too much time correcting things with [this system].
Table 2. Principal components analysis of the SUS in Study 1 showing one- and two-component
solutions (bidimensional loadings > .4 in bold). Each item is followed by its loadings (one
component / bidimensional Component 1 / bidimensional Component 2):
9. I felt very confident using this website. (.936 / .198 / .803)
7. I would imagine that most people would learn to use this website very quickly. (.932 / .455 / .505)
1. I would like to use this website frequently. (.867 / .113 / .762)
2. I found this website unnecessarily complex. (.865 / .509 / .418)
8. I found this website very cumbersome/awkward to use. (.839 / .822 / −.019)
3. I thought this website was easy to use. (.814 / .471 / .503)
6. I thought there was too much inconsistency in this website. (.815 / .513 / .266)
5. I found the various functions in this website were well integrated. (.823 / .500 / .626)
10. I needed to learn a lot of things before I could get going with this website. (.753 / .789 / .055)
4. I think that I would need assistance to be able to use this website. (.637 / .517 / .360)
Table 3. Unidimensional and bidimensional LC IRT models for SUS data: number of latent classes
(C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC index

Unidimensional model:
C = 1: ℓ = −1470.447, #par = 40, BIC = 3147.714
C = 2: ℓ = −1405.317, #par = 24, BIC = 2934.725
C = 3: ℓ = −1395.186, #par = 26, BIC = 2924.805
C = 4: ℓ = −1394.914, #par = 28, BIC = 2934.602

Bidimensional model:
C = 1: ℓ = −1470.447, #par = 40, BIC = 3147.714
C = 2: ℓ = −1405.057, #par = 27, BIC = 2949.717
C = 3: ℓ = −1393.190, #par = 30, BIC = 2941.496
C = 4: ℓ = −1386.052, #par = 33, BIC = 2942.729
Table 4. Correlations among SUS, UMUX and UMUX-LITE in Study 1 (with 95% confidence intervals)
SUS vs. UMUX: r = .554** (95% CI: .430, .679)
SUS vs. UMUX-LITE: r = .447** (95% CI: .313, .581)
UMUX vs. UMUX-LITE: r = .838** (95% CI: .756, .920)
** Correlation is significant at the 0.01 level (2-tailed).
Table 5. Means and standard deviations of overall scores of the SUS, UMUX and UMUX-LITE for
Study 1 with 95% confidence intervals (and associated CGS grades)
SUS: M = 70.88 (C), SD = 6.703, 95% CI: 69.9 (C) to 71.8 (C+)
UMUX: M = 84.66 (A+), SD = 12.838, 95% CI: 82.8 (A) to 86.5 (A+)
UMUX-LITE: M = 73.83 (B-), SD = 9.994, 95% CI: 72.4 (C+) to 75.3 (B)
Table 6. Principal components analysis of the SUS in Study 2 (loadings > .4 in bold). Each item is
followed by its loadings (Usability / Learnability):
8. I found this website very cumbersome/awkward to use. (.910 / .121)
1. I would like to use this website frequently. (.869 / .105)
5. I found the various functions in this website were well integrated. (.800 / .122)
3. I thought this website was easy to use. (.769 / .226)
7. I would imagine that most people would learn to use this website very quickly. (.754 / .258)
2. I found this website unnecessarily complex. (.739 / .273)
6. I thought there was too much inconsistency in this website. (.708 / .183)
9. I felt very confident using this website. (.683 / .106)
10. I needed to learn a lot of things before I could get going with this website. (.133 / .841)
4. I think that I would need assistance to be able to use this website. (.206 / .753)
Table 7. Unidimensional and bidimensional LC IRT models for SUS data: number of latent classes
(C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC index

Unidimensional model:
C = 1: ℓ = −1050.62, #par = 40, BIC = 2282.965
C = 2: ℓ = −996.09, #par = 24, BIC = 2101.22
C = 3: ℓ = −975.599, #par = 26, BIC = 2069.323
C = 4: ℓ = −966.094, #par = 28, BIC = 2059.4
C = 5: ℓ = −963.457, #par = 30, BIC = 2063.212
C = 6: ℓ = −963.457, #par = 32, BIC = 2072.299

Bidimensional model:
C = 1: ℓ = −1050.62, #par = 40, BIC = 2282.965
C = 2: ℓ = −964.826, #par = 24, BIC = 2038.692
C = 3: ℓ = −910.835, #par = 27, BIC = 1944.338
C = 4: ℓ = −891.82, #par = 30, BIC = 1919.938
C = 5: ℓ = −881.354, #par = 33, BIC = 1912.636
C = 6: ℓ = −878.343, #par = 36, BIC = 1920.244
Table 8. Correlations among SUS, UMUX and UMUX-LITE in Study 2 (with 95% confidence intervals)
SUS vs. UMUX: r = .716** (95% CI: .571, .861)
SUS vs. UMUX-LITE: r = .658** (95% CI: .502, .815)
UMUX vs. UMUX-LITE: r = .879** (95% CI: .717, .953)
** Correlation is significant at the 0.01 level (2-tailed).
Table 9. Means and standard deviations of overall scores of the SUS, UMUX and UMUX-LITE for
Study 2 with 95% confidence intervals (and associated CGS grades)
SUS: M = 75.24 (B), SD = 13.037, 95% CI: 72.6 (B) to 77.0 (B+)
UMUX: M = 87.69 (A+), SD = 10.291, 95% CI: 85.6 (A+) to 89.8 (A+)
UMUX-LITE: M = 76.45 (B), SD = 9.943, 95% CI: 74.4 (B) to 78.5 (B+)
Table 10. Main effects and interactions for combined ANOVA
SUS: main effect of duration, F(1, 263) = 10.7, p = .001; main effect of frequency of use,
F(2, 263) = 30.9, p < .0001; duration × frequency interaction, F(2, 263) = 15.8, p < .0001
UMUX: main effect of duration, F(1, 263) = 17.4, p < .0001; main effect of frequency of use,
F(2, 263) = 22.2, p < .0001; duration × frequency interaction, F(2, 263) = 3.4, p = .035
UMUX-LITE: main effect of duration, F(1, 263) = 4.7, p = .03; main effect of frequency of use,
F(2, 263) = 16.8, p < .0001; duration × frequency interaction, F(2, 263) = 4.7, p = .01
Table 11. Summary of study outcomes by hypothesis
Hypothesis 1 (supported): SUS dimensionality was affected by the different levels of product
experience. When administered to users with less product experience, the SUS had a unidimensional
structure, whereas it had a bidimensional structure for respondents with more product experience.
Hypothesis 2 (supported): All three scales were strongly correlated, independent of the
administration conditions.
Hypothesis 3 (supported): Participants with more product experience were more satisfied than those
with less product experience, regardless of whether that experience was gained over a duration of
exposure or by frequency of use.
List of Figures
1. Interaction between scale and frequency of use for Study 1
2. Interaction between scale and frequency of use for Study 2
3. Interaction between scale and frequency of use for Studies 1 and 2
Figure 1. Interaction between scale and frequency of use for Study 1
Figure 2. Interaction between scale and frequency of use for Study 2
Figure 3. Interaction between scale and frequency of use for Studies 1 and 2
BIOGRAPHIES
Simone Borsci is a Research Fellow in Human Factors at the Imperial College London NIHR
Diagnostic Evidence Cooperative group. He has over 10 years of experience as a Human Factors and
HCI expert working on the interaction between people and technology in both industry and academia.
He has worked as UX lead of the Italian Government working group on usability, and as a researcher
at the University of Perugia, the School of Computing of Brunel University, and the Human Factors
Research Group of the University of Nottingham. Simone holds a PhD in Cognitive Psychology from
the University of Rome “La Sapienza”. He has over 30 publications (journal articles, books, book
chapters, and conference papers), many of which relate to the interaction of people with and without
disabilities with complex systems, and to human error and risk reduction in the use of technologies.
Stefano Federici is currently Associate Professor of General Psychology at the University
of Perugia. He is the coordinator of the CognitiveLab research team at the University of Perugia
(www.cognitivelab.it). His research is focused on assistive technology assessment processes,
disability, and cognitive and human interaction factors.
Michela Gnaldi is currently Assistant Professor of Social Statistics at the Department of
Political Sciences of the University of Perugia. Her main research interest concerns measurement in
education. On this topic, she has participated in several research projects of national interest, in Italy
and in the UK, where she worked as a statistician and researcher at the National Foundation for
Educational Research (NFER). She actively collaborates with several research institutions and has
been responsible for a number of contracts signed between INVALSI (Istituto Nazionale di
Valutazione del Sistema dell’Istruzione) and the Università di Perugia. Among her publications, she
is the author, jointly with F. Bartolucci and S. Bacci, of a book concerning item response theory models.
Silvia Bacci earned her PhD in Applied Statistics in 2007 and has been Assistant Professor of
Statistics at the University of Perugia since 2009. Her research interests concern latent variable
models, with a special focus on models for categorical and longitudinal/multilevel data, latent class
models, and item response theory models. The applications range from the education field to the
health field. On these topics, she has participated in several research projects of national interest and
has earned research fellowships and research contracts. She currently participates in a FIRB project
funded by the Italian government concerning “Mixture and latent variable models for causal inference
and analysis of socio-economic data”. Among her publications, she is the author, jointly with
F. Bartolucci and M. Gnaldi, of a book concerning item response theory models.
Francesco Bartolucci is a Full Professor of Statistics in the Department of Economics at
the University of Perugia, where he coordinated the PhD program in “Mathematical and Statistical
Methods for the Economic and Social Sciences”. He was a member of the Group of Evaluation
Experts (GEV) that managed the Evaluation of the Quality of Research (VQR) for the period
2004–2010. He is the Principal Investigator of the research project “Mixture and latent variable models
for causal inference and analysis of socio-economic data” (FIRB 2012, “Futuro in ricerca”, Italian
Government). He was the Principal Investigator of the research project “Advances in non-linear panel
models with socioeconomic applications”, funded by the Einaudi Institute for Economics and Finance
(Rome) for the years 2009–2011. He has introduced several methodological advances on these topics,
which have been published in top international journals and presented at many international
conferences, often in the form of invited talks. These methodological advances have been accompanied
by the development of relevant applications concerning labor market, education, health, and criminal
data, often with a perspective of causal analysis and evaluation of policies and treatments.