Assessing User Satisfaction in the Era of User Experience: Comparison of the SUS, UMUX and UMUX-LITE as a Function of Product Experience

Simone Borsci1*, Stefano Federici2, Silvia Bacci3, Francesco Bartolucci3, Michela Gnaldi4

1 National Institute for Health Research, Diagnostic Evidence Cooperative, Imperial College London, United Kingdom
2 Department of Philosophy, Social & Human Sciences and Education, University of Perugia, Italy
3 Department of Economics, University of Perugia, Italy
4 Department of Political Sciences, University of Perugia, Italy

*Corresponding author: [email protected]

Abstract

Nowadays, practitioners extensively apply quick and reliable scales of user satisfaction as part of their user experience (UX) analyses to obtain well-founded measures of user satisfaction within time and budget constraints. However, in the human-computer interaction (HCI) literature the relationship between the outcomes of standardized satisfaction scales and the amount of product usage has been only marginally explored. The few studies that have investigated this relationship have typically shown that users who have interacted more with a product have higher satisfaction. The purpose of this paper was to systematically analyze the variation in outcomes of three standardized user satisfaction scales (SUS, UMUX and UMUX-LITE) when completed by users who had spent different amounts of time with a website. In two studies, the amount of interaction was manipulated to assess its effect on user satisfaction. Measurements of the three scales were strongly correlated and their outcomes were significantly affected by the amount of interaction time. Notably, the SUS acted as a unidimensional scale when administered to people who had less product experience, but was bidimensional when administered to users with more experience. We replicated previous findings of similar magnitudes for the SUS and UMUX-LITE (after adjustment), but did not observe the previously reported similarities of magnitude for the SUS and the UMUX. Our results strongly encourage further research to analyze the relationships of the three scales with levels of product exposure. We also provide recommendations for practitioners and researchers in the use of the questionnaires.

1. Introduction

The assessment of website user satisfaction is a fascinating topic in the field of human–computer interaction (HCI). Since the late 1980s, practitioners have applied standardized usability questionnaires to the measurement of user satisfaction, which is one of the three main components of usability (ISO, 1998). Satisfaction analysis is a strategic way to collect information – before or after the release of a product – about the user experience (UX), defined as the "person's perceptions and responses resulting from the use and/or anticipated use of a product" (ISO, 2010, p. 3).

1.1. Amount of experience and perceived usability

UX is a relatively new and broad concept which includes and goes beyond traditional usability (Petrie & Bevan, 2009). As recently outlined by Lallemand, Gronier and Koenig (2015), experts have different points of view on UX; however, there is wide agreement in the HCI community that the UX concept has a temporal component – i.e., the amount of user interaction with a product affects people's overall experience.
In addition, experts generally agree that the interactive experience of a user is affected by the perceived usability and aesthetics of the interface (Borsci, Kuljis, Barnett, & Pecchia, in press; Hassenzahl, 2005; Hassenzahl & Tractinsky, 2006; Lee & Koubek, 2012; Tractinsky, 1997), and by the extent to which user needs are met (Hassenzahl, this issue). Accordingly, to fully model the perceived experience of a user (Borsci, Kurosu, Federici, & Mele, 2013; Lindgaard & Dudek, 2003; McLellan, Muddimer, & Peres, 2012), practitioners should include a set of repeated objective and subjective measures in their evaluation protocols to enable satisfaction analysis as a "subjective sum of the interactive experience" (Lindgaard & Dudek, 2003, p. 430). Several studies have found that the magnitude of user satisfaction is associated with a user's amount of experience with the product or system under evaluation (Lindgaard & Dudek, 2003; McLellan, et al., 2012). For instance, Sauro (2011) reported that System Usability Scale (SUS; Brooke, 1996) scores differed as a function of different levels of product experience. In other words, people with long-term experience in the use of a product tended to rate their satisfaction with higher (better) scores than users with shorter terms of experience. In summary, researchers and practitioners assess user satisfaction by means of questionnaires, but there are only a few empirical studies that have systematically analyzed the variation of the outcomes of satisfaction scales when filled out by users with different amounts of experience in the use of a product (Kortum & Johnson, 2013; Lindgaard & Dudek, 2003; McLellan, et al., 2012; Sauro, 2011).

1.2. The System Usability Scale (SUS)

Several standardized tools are available in the literature to measure satisfaction (for a review, see Borsci, et al., 2013). An increasing trend favors the use of short scales due to their speed and ease of administration, either as online surveys for customers or after a usability test. One of the most popular is the SUS (Lewis, 2006; Sauro & Lewis, 2011; Zviran, Glezer, & Avni, 2006), which has been cited in more than 600 publications (Sauro, 2011) and is considered an industry standard. Its popularity among HCI experts is due to several factors, such as its desirable psychometric properties (high reliability and demonstrated validity), relatively short length (10 items), and low cost (free) (Bangor, Kortum, & Miller, 2008; McLellan, et al., 2012). The ten items of the SUS were designed to form a unidimensional measure of perceived usability (Brooke, 1996). The standard version of the questionnaire has a mix of positive and negative tone items, with the odd-numbered items having a positive tone and the even-numbered items having a negative tone. Respondents rate the magnitude of their agreement with each item using a five-point scale (1: Strongly disagree; 5: Strongly agree). To compute the overall SUS score, (1) each item is converted to a 0–4 scale on which higher numbers indicate a greater amount of perceived usability, (2) the converted scores are summed, and (3) the sum is multiplied by 2.5. This process produces scores that can range from 0 to 100. As Lewis (2014) recently stated, there are still lessons to be learned about the SUS, in particular about its dimensionality.
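As an illustration of this scoring procedure, the following is a minimal Python sketch; the function name and list-based input format are our own choices, not part of any published implementation.

```python
def sus_score(ratings):
    """Overall SUS score from the ten item ratings (1-5), in item order.

    Odd-numbered (positive tone) items contribute (rating - 1); even-numbered
    (negative tone) items contribute (5 - rating). Each contribution is thus
    on a 0-4 scale, and the sum of the ten contributions times 2.5 is 0-100.
    """
    if len(ratings) != 10:
        raise ValueError("The SUS has exactly ten items")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(ratings, start=1))
    return total * 2.5

# Example with a hypothetical response pattern:
print(sus_score([4, 2, 4, 1, 3, 2, 5, 2, 4, 2]))  # 77.5
```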
Despite the SUS having been designed to be unidimensional, several researchers have recently shown that the items of the SUS might load on two dimensions: usability and learnability (Bangor, et al., 2008; Borsci, Federici, & Lauriola, 2009; Lewis & Sauro, 2009; Lewis, Utesch, & Maher, 2013; Sauro & Lewis, 2012). Since 2009, however, there have been reports of large-sample SUS datasets for which two-factor structures did not have the expected item-factor alignment (Items 4 and 10 with Learnable, all others with Usable), indicating a need for further research to clarify its dimensional structure and the variables that might affect it (Lewis, 2014). In recent years, the growing availability of SUS data from a large number of studies (Bangor, et al., 2008; Kortum & Bangor, 2012) has led to the production of norms for the interpretation of mean SUS scores, e.g., the Curved Grading Scale (CGS) (Sauro & Lewis, 2012). Using data from 446 studies and over 5,000 individual SUS responses, Sauro and Lewis (2012) found the overall mean score of the SUS to be 68 with a standard deviation of 12.5. The Sauro-Lewis CGS assigns grades as a function of SUS scores ranging from 'F' (absolutely unsatisfactory) to 'A+' (absolutely satisfactory), as follows: Grade F (0–51.7); Grade D (51.8–62.6); Grade C- (62.7–64.9); Grade C (65.0–71.0); Grade C+ (71.1–72.5); Grade B- (72.6–74.0); Grade B (74.1–77.1); Grade B+ (77.2–78.8); Grade A- (78.9–80.7); Grade A (80.8–84.0); Grade A+ (84.1–100). Although they should be interpreted with caution, the grades from the CGS provide an initial basis for determining whether a mean SUS score is below average, average, or above average (Sauro & Lewis, 2012).

1.3. Ultra-short scales: The UMUX and UMUX-LITE

Although the SUS is a quick scale, practitioners sometimes need reliable scales that are even shorter than the SUS to minimize time, cost, and user effort. "This need is most pressing when standardized usability measurement is one part of a larger post-study or online questionnaire" (Lewis, 2014, p. 676). As a consequence, quite recently, two new scales have been proposed as shorter proxies of the SUS: the Usability Metric for User Experience (UMUX, see Table 1), a four-item tool developed and validated by Finstad (2010, 2013); and the UMUX-LITE, composed of only the two positive-tone questions from the UMUX (Lewis, et al., 2013). The scale for the UMUX items has seven points (1: Strongly disagree; 7: Strongly agree).

______________________________________________________________________
Insert Table 1 about here. Items of the UMUX and UMUX-LITE
______________________________________________________________________

Some findings show the UMUX to be bidimensional as a function of item tone (positive vs. negative) (Lewis, 2013; Lewis, et al., 2013), despite the intention to develop a unidimensional scale. The UMUX's statistical structure might be an artifact of the mixed positive/negative tone of the items, and in practice might not matter much. In light of this, both the UMUX and its reduced version, the UMUX-LITE, are usually interpreted as unidimensional measures. By design (using a method similar to but not exactly the same as the SUS), the overall UMUX and UMUX-LITE scores can range from 0 to 100. Their scoring procedures are:

UMUX: The odd items are scored as [score − 1] and the even items as [7 − score]. The sum of the item scores is then divided by 24 and multiplied by 100 (Finstad, 2010).

UMUX-LITE: The two items are scored as [score − 1], and the sum of these is divided by 12 and multiplied by 100 (Lewis, et al., 2013). For correspondence with SUS scores, this sum is entered into a regression equation to produce the final UMUX-LITE score. The equation below combines the initial computation and the regression to show how to compute the recommended UMUX-LITE score from the ratings of its two items:

UMUX-LITE = 0.65 × (([Item 1 score] + [Item 2 score] − 2) × 100/12) + 22.9. (1)
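To make the two scoring procedures and formula (1) concrete, here is a minimal Python sketch; the function names are ours, and the grade lookup simply encodes the CGS boundaries listed in Section 1.2.

```python
def umux_score(ratings):
    """Overall UMUX score from the four item ratings (1-7), in item order.

    Odd (positive tone) items contribute (rating - 1); even (negative tone)
    items contribute (7 - rating). The sum is divided by 24 and scaled to 100.
    """
    total = sum((r - 1) if i % 2 == 1 else (7 - r)
                for i, r in enumerate(ratings, start=1))
    return total / 24 * 100

def umux_lite_score(item1, item2):
    """Adjusted UMUX-LITE score per formula (1)."""
    raw = (item1 + item2 - 2) * 100 / 12
    return 0.65 * raw + 22.9

def cgs_grade(score):
    """Sauro-Lewis Curved Grading Scale grade for a mean SUS-like score."""
    boundaries = [(51.8, "F"), (62.7, "D"), (65.0, "C-"), (71.1, "C"),
                  (72.6, "C+"), (74.1, "B-"), (77.2, "B"), (78.9, "B+"),
                  (80.8, "A-"), (84.1, "A"), (float("inf"), "A+")]
    for upper, grade in boundaries:
        if score < upper:
            return grade

print(umux_score([6, 2, 6, 2]))  # 83.33...
print(umux_lite_score(6, 6))     # 77.07 (adjusted toward the SUS scale)
print(cgs_grade(68.0))           # 'C' -- the overall SUS mean is an average product
```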
Prior research (Finstad, 2010, 2013; Lewis, et al., 2013) has shown that the SUS, UMUX and UMUX-LITE are reliable (Cronbach's α between .80 and .95) and correlate significantly (p < .001). In the research reported to date, UMUX scores have not only correlated with the SUS, they have also had a similar magnitude. However, for the UMUX-LITE, it is necessary to use formula (1) to adjust its scores to achieve correspondence with the SUS (Lewis, et al., 2013). For the rest of this paper, reported UMUX-LITE values will be those computed using formula (1). Thus, the literature on the UMUX and UMUX-LITE (Finstad, 2010; Lewis, et al., 2013) suggests that these two new short scales can be used as surrogates for the SUS. Currently, three studies (Kortum & Johnson, 2013; McLellan, et al., 2012; Sauro, 2011) have investigated the relationship between SUS scores and amount of product experience. The results of these studies have consistently indicated that more experienced users had higher satisfaction outcomes (SUS scores). Notably, researchers have not yet studied this effect on the outcomes of quick scales such as the UMUX and UMUX-LITE. The comparative analyses of these instruments were performed mainly to validate the questionnaires, without considering the effect of different levels of experience in the use of a website (Finstad, 2010; Lewis, et al., 2013).

1.4. Research goals

The use of short scales as part of UX evaluation protocols could substantially reduce the costs of assessment, as well as users' time and effort to complete the questionnaires. Currently, few studies have investigated the relationship among the SUS, UMUX and UMUX-LITE, and none have analyzed their reliabilities as a function of different amounts of interaction with a product. The primary goal of this paper was to analyze the variation of SUS, UMUX and UMUX-LITE outcomes when completed concurrently by users with different levels of experience in the use of a website. To reach this goal, we pursued three main objectives. First, we aimed to explore the variation of UMUX and UMUX-LITE outcomes when administered to users with two different levels of product experience. Second, we aimed to observe whether, at different levels of product experience, the correlations among the SUS, UMUX and UMUX-LITE were stable, with particular interest in the generalizability of formula (1). Finally, we checked whether the levels of respondents' product experience affected the dimensional structure of the SUS; the Learnable component might not emerge until respondents have sufficient experience with the product they are rating. To achieve these aims, we performed two studies with the three standardized usability metrics to measure the self-reported satisfaction of end-users with different levels of experience with an e-learning web platform known as CLab (www.cognitivelab.it).

2. Methodology

Students enrolled in the bachelor's degree program in psychology at the University of Perugia are strongly encouraged to use the CLab target platform as an e-learning tool.
Commonly, students access it at least once a week for several reasons: for instance, to look for information about courses and exam timetables; to sign in for mandatory-attendance classes; to download course materials; to book a test/exam; or to post a question and discuss issues with the professor. For each study, a sample of volunteer students was asked to assess the interface after different amounts of usage (based on their date of subscription to the platform) by filling out the SUS, UMUX and UMUX-LITE questionnaires, presented in a random order. The Italian version of the SUS used in Borsci et al. (2009) was administered. Additionally, translations and retranslations were made by an independent group of linguistic experts to produce Italian versions of the UMUX and UMUX-LITE. Participants of the two studies were invited to fill out the scales two months (Study 1) or six months (Study 2) after they first accessed CLab. In these studies, participants received the same instruction before the presentation of the questionnaires: 'Please, rate your satisfaction in the use of the platform on the basis of your current experience of use' [Per favore, in base alla tua attuale esperienza d'uso, valuta la tua soddisfazione nell'utilizzo della piattaforma]. The two studies were organized to measure different durations of participants' exposure to CLab, thus measuring different moments of UX acquisition, as follows:

Study 1, carried out two months after the students first accessed CLab. The participants' number of accesses to and interactions with the platform (time exposure) ranged from 8 (once a week) to 56 (once a day).

Study 2, carried out six months after the students first accessed CLab. The participants' number of accesses to and interactions with the platform (time exposure) ranged from 24 (once a week) to 168 (once a day).

The two studies were reviewed and approved by the Institutional Review Board of the Department of Philosophy, Social & Human Sciences and Education, University of Perugia. All participants provided their written informed consent to participate in this study. No minors/children were enrolled in this study. The study presented no potential risks.

2.1. Hypotheses

Concerning the effect of usage levels on the SUS's dimensional structure, we expected that:

The SUS dimensionality would be affected by the level of experience acquired by the participants with the product before the administration of the scale (Hypothesis 1).

The second hypothesis concerns the correlations among the tools (SUS, UMUX, UMUX-LITE). Recently (Finstad, 2010; Lewis, et al., 2013), researchers have reported strong correlations among the SUS, UMUX and UMUX-LITE. There are as yet, however, no data on the extent to which the correlations might be affected by the users' levels of acquired product experience. Thus, we expected:

Significant correlations among the overall scores of the SUS, UMUX and UMUX-LITE for all studies, independent of the administration conditions, i.e., different amounts of interaction time with the target website (Hypothesis 2).

Finally, the third hypothesis concerns the relationship between scale outcomes and users' levels of product experience. As noted above, user satisfaction may vary depending on both the level of product experience and the time of exposure to a product (Lindgaard & Dudek, 2003; McLellan, et al., 2012; Sauro, 2011). In particular, experts tend to provide higher satisfaction scores compared to novices (Sauro, 2011).
In the light of this, we expected that:

User satisfaction measured through the SUS, UMUX and UMUX-LITE would be affected by the different conditions of time exposure to the target website (two or six months), as well as by different levels of frequency of use of the website. Therefore, we expected students in Study 2 (cumulative UX condition) to rate CLab on all three scales as more satisfactory than users in the first study, due to their greater product exposure (six months). Concurrently, we expected that participants with higher frequencies of use would rate CLab on all the scales as more satisfactory than participants with lower levels of use (Hypothesis 3).

2.2. Data analysis

For each study, principal components analyses were performed to assess the dimensionality of the SUS, focusing on whether the item alignment of the resulting two-component structure was consistent with the emergence of Learnable and Usable components. Only if this expected pattern did not emerge did we plan to follow up with a multidimensional latent class item response theory (LC IRT) model to more deeply test the dimensional structure of the scale (Bacci, Bartolucci, & Gnaldi, 2014; Bartolucci, 2007; Bartolucci, Bacci, & Gnaldi, 2014). The primary purpose of this additional analysis would be to confirm whether a unidimensional structure was a better fit to the data than the expected bidimensional structure – an assessment that is not possible with standard principal components analysis because it is impossible to rotate a unidimensional solution (Cliff, 1987). Descriptive statistics (mean [M], standard deviation [SD]) and Pearson correlation analyses among the scales were performed both to compare the outcomes of the three scales and to observe their relationships. Moreover, a one-way ANOVA was carried out to assess the effect of experience on user satisfaction as measured by the SUS, UMUX and UMUX-LITE. Finally, an independent-samples t-test with 1,000 bootstrap resampling iterations was performed to test the differences among the satisfaction levels of participants gathered through each scale, and a final comprehensive ANOVA was conducted to enable comparison of results between the two studies. The MultiLCIRT package for R by Bartolucci et al. (2014) was used to estimate the multidimensional LC IRT models. All other analyses were performed using IBM® SPSS 22.

3. The studies

3.1 Study 1

Participants

One hundred and eighty-six first-year students of psychology (31 male, 17%; mean age 21.97 years, SD 5.63) voluntarily participated in the study two months after their subscription to and first use of the platform.

Procedure

All participants used an online form to fill out the questionnaires (SUS, UMUX and UMUX-LITE) and also indicated their weekly use of the platform (from 1 – once per week – to 5 – once a day). Participants were asked to rate their satisfaction in the use of CLab on the basis of their current experience of use.

Results of Study 1

SUS dimensionality

As shown in Table 2, principal components analysis with Varimax rotation suggested a unidimensional solution was appropriate for this set of SUS data. The table shows the item loadings for one- and two-component solutions; a sketch of this kind of analysis appears below.
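The sketch below shows one self-contained way to perform this kind of analysis in Python: principal component loadings extracted from the item correlation matrix, followed by a standard Varimax rotation. It assumes a hypothetical respondents-by-items array of SUS ratings; it is an illustration, not the SPSS routine used in the study.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Principal component loadings from the correlation matrix of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    top = np.argsort(eigvals)[::-1][:n_components]   # largest components first
    return eigvecs[:, top] * np.sqrt(eigvals[top])   # scale vectors to loadings

def varimax(L, max_iter=100, tol=1e-6):
    """Standard Varimax rotation of a loading matrix L (p items x k factors)."""
    p, k = L.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3
                   - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() - prev < tol:   # criterion stopped improving
            break
        prev = s.sum()
    return L @ R

# Hypothetical data: 186 respondents x 10 SUS items on a 1-5 scale.
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(186, 10)).astype(float)
print(np.round(varimax(pca_loadings(X, 2)), 3))
```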
______________________________________________________________________
Insert Table 2 about here. Principal components analysis of the SUS in Study 1 showing one- and two-component solutions (bidimensional loadings > .4 in bold)
______________________________________________________________________

The expected two-component solution would have shown Items 4 and 10 aligning with one component and the other eight items aligning with the other. Instead, the apparent pattern of alignment was positive tone (odd-numbered) items versus negative tone (even-numbered) items, similar to the pattern observed in previous research for the UMUX. Another indicator of the inappropriateness of this two-component solution was the set of items that had relatively large loadings on both components (Items 2, 3, 5 and 7). To verify the appropriateness of the one-component solution, we conducted additional analyses with a special class of statistical models known as LC IRT models (Bacci, et al., 2014; Bartolucci, 2007; Bartolucci, et al., 2014). This class of models extends traditional IRT models (Nering & Ostini, 2010; van der Linden & Hambleton, 1997) in two main directions. First, they allow the analysis of item responses for questionnaires that measure more than one factor (also called, in the context of IRT, a latent trait, latent variable, or ability). Second, multidimensional LC IRT models assume that the population is composed of homogeneous groups of individuals sharing unobserved but common characteristics (so-called latent classes; Goodman, 1974; Lazarsfeld & Henry, 1968). IRT models are considered a powerful alternative to principal components analysis, especially when questionnaires consist of binary or (ordered) polytomously-scored items rather than quantitative items. Our analysis proceeded in two steps. The first was the selection of the number of latent classes (C). To compare unidimensional and bidimensional assumptions through the class of models at issue, we first needed to detect the optimal number of latent classes, i.e., the number of groups which ensures a satisfactory level of goodness of fit of the statistical model to the observed data. For this aim, we selected the number of latent classes relying on the Bayesian Information Criterion (BIC) index (Schwarz, 1978). More specifically, we estimated unidimensional LC IRT models for increasing values of C (C = 1, 2, 3, 4), keeping constant all the other elements characterizing the class of models, and took the value of C just before the first increase in BIC (a sketch of this selection rule appears below). We repeated the analysis under the assumption of bidimensionality. Table 3 shows that the minimum value of the BIC index is observed for C = 3, both in the unidimensional case and in the bidimensional case, suggesting that the sample of individuals comes from a population composed of three latent classes. As smaller values of BIC are better than larger values, a comparison between the unidimensional and the bidimensional models with C = 3 based on the BIC index gave evidence of a better goodness of fit for the model with unidimensional structure (BIC = 2924.805) than for that with bidimensional structure (BIC = 2941.496; both bolded in Table 3).

______________________________________________________________________
Insert Table 3 about here. Unidimensional and bidimensional LC IRT models for SUS data: number of latent classes (C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC index
______________________________________________________________________
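The selection rule is straightforward to express in code. The following sketch recomputes BIC with the textbook formula −2ℓ + #par·log(n); the exact values MultiLCIRT reports in Table 3 may differ slightly from this formula, but the stopping rule (take the value of C just before the first increase in BIC) selects C = 3 either way.

```python
import math

def bic(loglik, n_par, n_obs):
    """Bayesian Information Criterion; smaller values indicate better fit."""
    return -2 * loglik + n_par * math.log(n_obs)

def select_classes(fits, n_obs):
    """Return C just before the first increase in BIC.

    `fits` maps each candidate number of classes C to a tuple
    (maximum log-likelihood, number of free parameters).
    """
    chosen, prev = None, float("inf")
    for c in sorted(fits):
        b = bic(*fits[c], n_obs)
        if b > prev:          # first increase in BIC: stop
            break
        chosen, prev = c, b
    return chosen

# Log-likelihoods and parameter counts from Table 3 (unidimensional models).
unidimensional = {1: (-1470.447, 40), 2: (-1405.317, 24),
                  3: (-1395.186, 26), 4: (-1394.914, 28)}
print(select_classes(unidimensional, n_obs=186))  # 3
```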
The unidimensionality assumption was also verified through a likelihood-ratio (LR) test, as follows. Given the number of latent classes selected in the previous step, an LR test was used to compare models that differ in terms of their dimensional structure, that is, bidimensional versus unidimensional. This type of statistical test allowed us to evaluate the similarity between a general model and a restricted model, i.e., a model obtained from the general one by imposing a constraint, so that the restricted model is nested in the general model. More precisely, an LR test evaluates, at a given significance level, the null hypothesis of equivalence between the two nested models: if the null hypothesis is not rejected, the restricted model is preferred, in the interests of parsimony; if the null hypothesis is rejected, the general model is preferred. In our framework, the general model represents a bidimensional structure, where Items 4 and 10 of the SUS questionnaire contribute to a different latent trait with respect to the remaining ones, whereas the restricted model assumes that all items belong to the same dimension. The LR test is based on the difference between the maximum log-likelihoods of the two models and evaluates whether this difference is statistically significantly different from zero. Higher values of the log-likelihood difference denote that the hypothesis of unidimensionality is unlikely and should be discarded in favor of bidimensionality. In particular, the LR test statistic is given by −2 times the difference between the maximum log-likelihoods of the restricted and general models and, under certain regularity conditions, it is distributed as a Chi-square with degrees of freedom equal to the difference in the number of free parameters of the two compared models. The LR test statistic equalled 3.9918 with 4 degrees of freedom. The resulting p-value, based on the Chi-square distribution, is equal to 0.4071 and, therefore, the null hypothesis of SUS unidimensionality cannot be rejected for Study 1. To conclude, both the BIC index and the LR test provided evidence in favor of the unidimensionality assumption.

Correlations among the tools

The overall SUS, UMUX and UMUX-LITE scale scores correlated significantly (p < .001, Table 4). There was considerable overlap in the 95% confidence intervals for the correlations between the SUS and the other scales, so it appeared that the UMUX and UMUX-LITE had similar magnitudes of association with the SUS.

______________________________________________________________________
Insert Table 4 about here. Correlations among SUS, UMUX and UMUX-LITE in Study 1 (with 95% confidence intervals)
______________________________________________________________________

Table 5 shows that, on average, participants rated the platform as satisfactory (scores higher than 70 out of 100) on all three questionnaires – i.e., with SUS scores greater than or equal to 'C' on the Sauro-Lewis CGS (Sauro & Lewis, 2012). As demonstrated by comparison of the confidence intervals (a sketch of their computation appears below), there were significant differences in the magnitudes of the three metrics, with the SUS and UMUX-LITE having a closer correspondence than the SUS and UMUX.
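As a check on the tabled intervals, a standard t-based confidence interval for a mean can be computed from the reported summary statistics. This sketch uses the Study 1 values from Table 5 and reproduces the tabled limits to within rounding.

```python
import math
from scipy import stats

def mean_ci(mean, sd, n, confidence=0.95):
    """Two-sided t-based confidence interval for a sample mean."""
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics for Study 1 (n = 186), as reported in Table 5.
for scale, m, sd in [("SUS", 70.88, 6.703),
                     ("UMUX", 84.66, 12.838),
                     ("UMUX-LITE", 73.83, 9.994)]:
    low, high = mean_ci(m, sd, 186)
    print(f"{scale}: [{low:.1f}, {high:.1f}]")
```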
______________________________________________________________________
Insert Table 5 about here. Means and standard deviations of overall scores of the SUS, UMUX and UMUX-LITE for Study 1 with 95% confidence intervals (and associated CGS grades)
______________________________________________________________________

User satisfaction and frequency of use

61.9% of participants declared that they used the platform from more than three days per week up to every day, while 38.1% stated a lower rate of usage. A one-way ANOVA showed a significant difference in user satisfaction levels between participants with low (1–2 days per week), medium (3–4 days per week) and high (≥ 5 days per week) self-reported levels of use of CLab for all the questionnaires: SUS (F(2, 184) = 4.39, p = .014), UMUX (F(2, 184) = 8.71, p = .001) and UMUX-LITE (F(2, 184) = 6.76, p = .002). In particular, least significant difference (LSD) post-hoc analyses showed that students who were used to interacting with the platform from more than three days per week to every day (those who used the website more frequently) tended to judge the product as more satisfactory than people with less exposure (Figure 1).

______________________________________________________________________
Insert Figure 1 about here. Interaction between scale and frequency of use for Study 1
______________________________________________________________________

3.2 Study 2

Participants and procedure

Ninety-three students of psychology (17 male, 18%; mean age 22.03 years, SD 1.44) voluntarily participated in the study six months after their subscription to and first use of CLab. Participants followed the same procedure as in Study 1.

Results of Study 2

SUS dimensionality

To check the dimensionality of the SUS after six months of usage, we performed a principal components analysis with Varimax rotation. Table 6 shows that the SUS, under the test conditions of Study 2, was composed of two dimensions, in line with the previous studies that reported alignment of Items 4 and 10 separate from the other items (Borsci, et al., 2009; Lewis & Sauro, 2009; Lewis, et al., 2013).

______________________________________________________________________
Insert Table 6 about here. Principal components analysis of the SUS in Study 2 (loadings > .4 in bold)
______________________________________________________________________

To further confirm the bidimensional structure of the SUS in Study 2, we performed an LC IRT analysis (Table 7). We estimated models for increasing values of C for both the unidimensional and bidimensional structures, keeping constant all the other elements characterizing the class of models, and took the value of C just before the first increase in BIC. Table 7 shows that the minimum value of the BIC index occurs at C = 4 for the unidimensional model and at C = 5 for the bidimensional model. In Study 2, the smallest value of BIC overall belongs to the bidimensional model with C = 5. The BIC index thus gave evidence of a better goodness of fit for the model with bidimensional structure (BIC = 1912.636) than for that with unidimensional structure (BIC = 2059.4; both bolded in Table 7). Finally, the LR test statistic equalled 164.206 with 3 degrees of freedom for the bidimensional model, and 148.5492 with 2 degrees of freedom for the unidimensional model. For both models the resulting p-value, based on the Chi-square distribution, is below 0.001. Therefore, the null hypothesis of SUS unidimensionality can be rejected for Study 2. To conclude, both the BIC index and the LR test provided evidence in favor of the bidimensionality assumption; a sketch of the LR computation appears below.
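The LR comparisons in both studies can be reproduced from the tabulated log-likelihoods; for example, for Study 1 with C = 3, the general (bidimensional) and restricted (unidimensional) models differ by 30 − 26 = 4 free parameters (Table 3). This sketch uses scipy's chi-square survival function for the p-value.

```python
from scipy.stats import chi2

def lr_test(loglik_general, loglik_restricted, df):
    """LR test of a restricted model nested within a general model."""
    statistic = 2 * (loglik_general - loglik_restricted)
    return statistic, chi2.sf(statistic, df)

# Study 1, C = 3: bidimensional (general) vs. unidimensional (restricted).
stat, p = lr_test(-1393.190, -1395.186, df=4)
print(round(stat, 4), round(p, 4))  # 3.992 0.4071 -> unidimensionality retained
```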
______________________________________________________________________
Insert Table 7 about here. Unidimensional and bidimensional LC IRT models for SUS data: number of latent classes (C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC index
______________________________________________________________________

Correlations among the tools

As in Study 1, all three scales were strongly correlated (p < .001, see Table 8). Comparison of the 95% confidence intervals indicated that the magnitudes of association among the three scales were similar between the studies (p > .05). Table 9 shows that participants judged the platform as satisfactory – i.e., with mean scores higher than 75 out of 100 – on all three questionnaires. For this set of data, there was substantial overlap in the confidence intervals for the SUS and UMUX-LITE, with identical CGS grades for the SUS/UMUX-LITE means and interval limits. Consistent with the findings from Study 1, the magnitude of the difference between the mean SUS and UMUX scores was significant.

______________________________________________________________________
Insert Table 8 about here. Correlations among SUS, UMUX and UMUX-LITE in Study 2 (with 95% confidence intervals)
______________________________________________________________________

______________________________________________________________________
Insert Table 9 about here. Means and standard deviations of overall scores of the SUS, UMUX and UMUX-LITE for Study 2 with 95% confidence intervals (and associated CGS grades)
______________________________________________________________________

User satisfaction and frequency of use

51% of participants declared that they used the platform from more than three days per week up to every day, while 49% declared a lower rate of usage. A one-way ANOVA confirmed that for all three scales there was a significant difference among the satisfaction ratings of participants (at least between students with low and high levels of exposure to the platform): SUS (F(2, 91) = 18.37, p = .001), UMUX (F(2, 91) = 12.11, p = .001) and UMUX-LITE (F(2, 91) = 11.57, p = .001). Again, our results indicated a significant relationship between frequency of use and satisfaction (Figure 2).

______________________________________________________________________
Insert Figure 2 about here. Interaction between scale and frequency of use for Study 2
______________________________________________________________________

4. General results

Although the questionnaire results strongly correlated independent of the administration conditions, in line with Hypothesis 3 we performed ANOVAs to analyze the differences among the ratings as a function of duration of exposure (two or six months) and, independently, as a function of different frequencies of use. These analyses indicated significant main effects and interactions for all three scales (see Table 10); a sketch of this kind of two-way analysis appears below. Figure 3 provides a graphic depiction of the significant interactions.

______________________________________________________________________
Insert Figure 3 about here. Combined interaction between scale and frequency of use for Studies 1 and 2
______________________________________________________________________

LSD post-hoc analyses revealed significant differences (p < .001) for all comparisons of the different frequencies of use (low, medium, or high).
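As a sketch of how such a two-way analysis can be run outside SPSS, the following uses the statsmodels formula interface on a toy data frame; the column names (score, duration, frequency) and the values are our stand-ins for the real combined dataset.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy stand-in for the combined data: one row per participant with the
# scale score, study duration condition, and self-reported frequency of use.
df = pd.DataFrame({
    "score":     [66, 70, 72, 75, 78, 80, 70, 74, 76, 79, 84, 88],
    "duration":  ["2 months"] * 6 + ["6 months"] * 6,
    "frequency": ["low", "low", "medium", "medium", "high", "high"] * 2,
})

# Two-way ANOVA: main effects of duration and frequency plus their interaction.
model = smf.ols("score ~ C(duration) * C(frequency)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```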
______________________________________________________________________
Insert Table 10 about here. Main effects and interactions for combined ANOVA
______________________________________________________________________

5. Discussion

Table 11 summarizes the testing outcomes for the hypotheses.

______________________________________________________________________
Insert Table 11 about here. Summary of study outcomes
______________________________________________________________________

The outcomes of the studies show that the learnability dimension of the SUS, as suggested, might emerge only under certain conditions – that is, when the scale is administered to users after a long enough period of exposure to the interface. This variability of the SUS dimensionality may be due to its original development as a unidimensional scale of perceived usability (Brooke, 1996). Items 4 and 10 of the SUS comprise the learnability dimension of this scale (Tables 2 and 6). Item 4 pertains to the need for support in use of the system ('I think I would need the support of a technical person to be able to use this system'), and Item 10 pertains to the perceived complexity of learning the system ('I needed to learn a lot of things before I could get going with this system'). These two items are strongly related to the ability of users to quickly understand how to use the product without help. Consistent with this, the learnability dimension is probably sensitive to the level of confidence acquired by users in using the product functions and in anticipating system reactions. Our results are consistent with the definition of learnability as "the ease with which new users can begin effective interaction and achieve maximal performance" (Dix, Finlay, Abowd, & Beale, 2003, p. 260). In fact, the second dimension of the SUS might emerge only when users perceive themselves as effective in the use of the product, making this an interesting topic for future research. All the satisfaction scale results strongly correlated under each condition of scale administration, demonstrating convergent validity for the construct they purport to measure. The Italian versions of the UMUX and UMUX-LITE, like the version of the SUS previously validated in other studies (Borsci, et al., 2009), matched the reliability coefficients of the English versions, with a Cronbach's α between .80 and .90 for the UMUX, and between .71 and .85 for the UMUX-LITE. Moreover, independent of the condition of scale administration, formula (1) functioned properly by bringing UMUX-LITE scores into reasonable correspondence with concurrently collected SUS scores. Unlike previous studies (Finstad, 2010; Lewis, et al., 2013), where the magnitudes of UMUX averages were quite close to corresponding SUS averages, in the present study the average UMUX scores were significantly higher than the SUS and UMUX-LITE means (Tables 5 and 9). For instance, for our cohort of participants with two months of product experience, Table 5 shows that while the interval limits around the UMUX mean had CGS grades ranging from 'A' to 'A+', the SUS and UMUX-LITE were more aligned, with grades ranging from 'C' to 'B'. This difference from previous outcomes could be a peculiarity of this research. However, the differences between the results of the UMUX and the other two scales were large enough to lead practitioners to make different decisions about CLab. For instance, by relying only on UMUX outcomes, a practitioner may report to designers that CLab is a very satisfactory interface and no further usability analyses are needed.
Alternatively, on the basis of the SUS or UMUX-LITE outcomes, practitioners would likely report to designers that CLab is reasonably satisfactory, but that further usability analysis and redesign could improve the overall interaction experience of end-users (the UX). Finally, as Tables 9 and 10 show, the outcomes of all three questionnaires were affected by the levels of experience of the respondents, both by the duration of use across months and by the weekly frequency of use. In tune with previous studies (McLellan, et al., 2012; Sauro, 2011), users with a greater amount of product experience were more satisfied than users with less experience. This was also confirmed by the ANOVAs performed for Studies 1 and 2: greater product experience was associated with a higher level of user satisfaction.

Limitations of the study

Even though the outcomes of this study were generally in line with previous research, the representativeness of these results is limited due to the characteristics of the cohorts involved in the study. Our results concern satisfaction in the use of an e-learning web interface rated by students with similar characteristics (age, education, country, etc.). Therefore, we cannot assume that the outcomes will generalize to other kinds of interfaces. To exhaustively explore the relationship between the amount of experience gained through time and satisfaction gathered by means of short scales, future studies should include users with various individual differences and divergent characteristics (e.g., experience with the product, age, education, individual functioning and disability) in the use of different types of websites. Our results, obtained through the variation of the amount of exposure of users to the interface, showed that people with more exposure to a product were likely to rate the interface as more satisfactory. However, because this is correlational evidence, it is also possible that users who experience higher satisfaction during early use of a product might choose to use it more frequently, thus gaining high levels of product experience. Future longitudinal studies should investigate this relationship, measuring, through the SUS, UMUX and UMUX-LITE, whether users who perceive a product as satisfactory tend to spend more time using it. It would also be of value to conduct a designed experiment with random assignment of participants to experience conditions to overcome the ambiguity of correlations.

6. Conclusions

Prior product experience was associated with user satisfaction as measured by the SUS and its proposed alternate questionnaires, the UMUX and UMUX-LITE. Therefore, consistent with previous research (Kortum & Johnson, 2013; McLellan, et al., 2012; Sauro, 2011), to obtain an exhaustive picture of user satisfaction, researchers and practitioners should take into consideration each user's amount of exposure to the product under evaluation (duration and frequency of use). All of the scales we analyzed were strongly correlated and can be used as quick tools to assess user satisfaction. Practitioners who plan to use one or all of these scales should carefully consider their administration for the proper management of satisfaction outcomes. Based on our results, we offer the following points of advice:

When administered to users after a short period of product use, it is safest to consider the SUS to be a unidimensional scale, so we recommend against partitioning it into Usable and Learnable components in that context.
Moreover, practitioners should anticipate that the satisfaction scores of newer users will be significantly lower than the scores of more experienced people.

When the SUS is administered to more experienced users, the scale appears to have bidimensional properties, making it suitable to compute both an overall SUS score and its Learnable and Usable components. The overall level of satisfaction will be higher than that among less experienced users.

In particular, due to their high correlations with the SUS, the UMUX and UMUX-LITE overall scores showed similar behaviors. If using one of the ultra-short questionnaires as a proxy for the SUS, the UMUX-LITE (with its adjustment formula) appears to provide results that are closer in magnitude to the SUS than the UMUX, making it the more desirable proxy.

The UMUX and UMUX-LITE are both reliable and valid proxies of the SUS. Nevertheless, Lewis et al. (2013) suggested using them in addition to the SUS, rather than instead of the SUS, for critical usability work, due to their recent development and still limited use. In particular, on the basis of our results, we recommend that researchers avoid using only the UMUX for their analysis of user satisfaction because, at least in the current study, this scale seemed too optimistic. In the formative phase of design or in agile development, the UMUX-LITE could be adopted as a preliminary and quick tool to test users' reactions to a prototype. Then, in advanced design phases or in summative evaluation phases, we recommend using a combination of the SUS and UMUX-LITE (or UMUX) to assess user satisfaction with usability (note that because the UMUX-LITE was derived from the UMUX, when you collect the UMUX you also collect the data needed to compute the UMUX-LITE). Over time this could lead to a database of concurrently collected SUS, UMUX, and UMUX-LITE scores that would allow more detailed investigation of their relationships and psychometric properties.

Acknowledgment

We would like to thank Dr. James R. Lewis, senior human factors engineer at IBM Software Group and guest editor of this special issue, for his generous feedback during the preparation of this paper.

References

Bacci, S., Bartolucci, F., & Gnaldi, M. (2014). A Class of Multidimensional Latent Class IRT Models for Ordinal Polytomous Item Responses. Communications in Statistics – Theory and Methods, 43(4), 787–800. doi:10.1080/03610926.2013.827718
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An Empirical Evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24(6), 574–594. doi:10.1080/10447310802205776
Bartolucci, F. (2007). A Class of Multidimensional IRT Models for Testing Unidimensionality and Clustering Items. Psychometrika, 72(2), 141–157. doi:10.1007/s11336-005-1376-9
Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. Computational Statistics & Data Analysis, 71, 971–985. doi:10.1016/j.csda.2013.05.018
Borsci, S., Federici, S., & Lauriola, M. (2009). On the Dimensionality of the System Usability Scale (SUS): A Test of Alternative Measurement Models. Cognitive Processing, 10(3), 193–197. doi:10.1007/s10339-009-0268-9
Borsci, S., Kuljis, J., Barnett, J., & Pecchia, L. (in press). Beyond the user preferences: Aligning the prototype design to the users' expectations. Human Factors and Ergonomics in Manufacturing & Service Industries.
Borsci, S., Kurosu, M., Federici, S., & Mele, M. L. (2013). Computer Systems Experiences of Users with and without Disabilities: An Evaluation Guide for Professionals. Boca Raton, FL: CRC Press. doi:10.1201/b15619-1
Brooke, J. (1996). SUS: A "quick and dirty" usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester & I. L. McClelland (Eds.), Usability evaluation in industry (pp. 189–194). London, UK: Taylor & Francis.
Cliff, N. (1987). Analyzing multivariate data. San Diego, CA: Harcourt Brace Jovanovich.
Dix, A., Finlay, J., Abowd, G. D., & Beale, R. (2003). Human-Computer Interaction. Harlow, UK: Pearson Education.
Finstad, K. (2010). The Usability Metric for User Experience. Interacting with Computers, 22(5), 323–327. doi:10.1016/j.intcom.2010.04.004
Finstad, K. (2013). Response to commentaries on 'The Usability Metric for User Experience'. Interacting with Computers, 25(4), 327–330. doi:10.1093/iwc/iwt005
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231. doi:10.1093/biomet/61.2.215
Hassenzahl, M. (2005). The Thing and I: Understanding the Relationship Between User and Product. In M. Blythe, K. Overbeeke, A. Monk & P. Wright (Eds.), Funology: From Usability to Enjoyment (Vol. 3, pp. 31–42). Berlin, DE: Springer. doi:10.1007/1-4020-2967-5_4
Hassenzahl, M., & Tractinsky, N. (2006). User experience – a research agenda. Behaviour & Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331
ISO 9241-11:1998. Ergonomic requirements for office work with visual display terminals – Part 11: Guidance on usability.
ISO 9241-210:2010. Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems.
Kortum, P. T., & Bangor, A. (2012). Usability Ratings for Everyday Products Measured With the System Usability Scale. International Journal of Human-Computer Interaction, 29(2), 67–76. doi:10.1080/10447318.2012.681221
Kortum, P. T., & Johnson, M. (2013). The Relationship Between Levels of User Experience with a Product and Perceived System Usability. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57(1), 197–201. doi:10.1177/1541931213571044
Lallemand, C., Gronier, G., & Koenig, V. (2015). User experience: A concept without consensus? Exploring practitioners' perspectives through an international survey. Computers in Human Behavior, 43, 35–48. doi:10.1016/j.chb.2014.10.048
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Lee, S., & Koubek, R. J. (2012). Users' perceptions of usability and aesthetics as criteria of pre- and post-use preferences. European Journal of Industrial Engineering, 6(1), 87–117. doi:10.1504/EJIE.2012.044812
Lewis, J. R. (2006). Usability testing. In G. Salvendy (Ed.), Handbook of Human Factors and Ergonomics (pp. 1275–1316). New York, NY: John Wiley & Sons.
Lewis, J. R. (2013). Critical Review of 'The Usability Metric for User Experience'. Interacting with Computers, 25(4), 320–324. doi:10.1093/iwc/iwt013
Lewis, J. R. (2014). Usability: Lessons Learned … and Yet to Be Learned. International Journal of Human-Computer Interaction, 30(9), 663–684. doi:10.1080/10447318.2014.930311
Lewis, J. R., & Sauro, J. (2009). The Factor Structure of the System Usability Scale. In M. Kurosu (Ed.), Human Centered Design (Vol. 5619, pp. 94–103). Berlin, DE: Springer. doi:10.1007/978-3-642-02806-9_12
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013, Apr 27–May 2). UMUX-LITE: When there's no time for the SUS. Paper presented at the Conference on Human Factors in Computing Systems: CHI '13, Paris, FR. doi:10.1145/2470654.2481287
Lindgaard, G., & Dudek, C. (2003). What is this evasive beast we call user satisfaction? Interacting with Computers, 15(3), 429–452. doi:10.1016/S0953-5438(02)00063-2
McLellan, S., Muddimer, A., & Peres, S. C. (2012). The Effect of Experience on System Usability Scale Ratings. Journal of Usability Studies, 7(2), 56–67.
Nering, M. L., & Ostini, R. (2010). Handbook of Polytomous Item Response Theory Models. New York, NY: Taylor and Francis.
Petrie, H., & Bevan, N. (2009). The Evaluation of Accessibility, Usability, and User Experience. In C. Stephanidis (Ed.), The Universal Access Handbook (pp. 299–314). Boca Raton, FL: CRC Press.
Sauro, J. (2011). Does Prior Experience Affect Perceptions Of Usability? Retrieved May 15, 2014, from http://www.measuringusability.com/blog/prior-exposure.php
Sauro, J., & Lewis, J. R. (2011). When Designing Usability Questionnaires, Does It Hurt To Be Positive? Paper presented at the Conference on Human Factors in Computing Systems: CHI '11, Vancouver, BC, CA. doi:10.1145/1978942.1979266
Sauro, J., & Lewis, J. R. (2012). Quantifying the User Experience: Practical statistics for user research. Burlington, MA: Morgan Kaufmann.
Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2), 461–464. doi:10.2307/2958889
Tractinsky, N. (1997, Mar 22–27). Aesthetics and Apparent Usability: Empirically Assessing Cultural and Methodological Issues. Paper presented at the Conference on Human Factors in Computing Systems: CHI '97, Atlanta, GA. doi:10.1145/258549.258626
van der Linden, W., & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. New York, NY: Springer.
Zviran, M., Glezer, C., & Avni, I. (2006). User satisfaction from commercial web sites: The effect of design and use. Information & Management, 43(2), 157–178. doi:10.1016/j.im.2005.04.002

Table 1. Items of the UMUX and UMUX-LITE

Item # – Scale                        Item content
Item 1 – UMUX / Item 1 – UMUX-LITE    [This system's] capabilities meet my requirements.
Item 2 – UMUX                         Using [this system] is a frustrating experience.
Item 3 – UMUX / Item 2 – UMUX-LITE    [This system] is easy to use.
Item 4 – UMUX                         I have to spend too much time correcting things with [this system].

Table 2. Principal components analysis of the SUS in Study 1 showing one- and two-component solutions (bidimensional loadings > .4 in bold)

Item                                                                          One component   Component 1   Component 2
9. I felt very confident using this website.                                  .936            .198          .803
7. I would imagine that most people would learn to use this website very quickly.  .932      .455          .505
1. I would like to use this website frequently.                               .867            .113          .762
2. I found this website unnecessarily complex.                                .865            .509          .418
8. I found this website very cumbersome/awkward to use.                       .839            .822          −.019
3. I thought this website was easy to use.                                    .814            .471          .503
6. I thought there was too much inconsistency in this website.                .815            .513          .266
5. I found the various functions in this website were well integrated.       .823            .500          .626
10. I needed to learn a lot of things before I could get going with this website.  .753      .789          .055
4. I think that I would need assistance to be able to use this website.      .637            .517          .360
Table 3. Unidimensional and bidimensional LC IRT models for SUS data: number of latent classes (C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC index

Unidimensional model:
C   ℓ           #par   BIC
1   −1470.447   40     3147.714
2   −1405.317   24     2934.725
3   −1395.186   26     2924.805
4   −1394.914   28     2934.602

Bidimensional model:
C   ℓ           #par   BIC
1   −1470.447   40     3147.714
2   −1405.057   27     2949.717
3   −1393.190   30     2941.496
4   −1386.052   33     2942.729

Table 4. Correlations among SUS, UMUX and UMUX-LITE in Study 1 (with 95% confidence intervals)

            UMUX                          UMUX-LITE
            r        CI lower   CI upper  r        CI lower   CI upper
SUS         .554**   .430       .679      .447**   .313       .581
UMUX        1        --         --        .838**   .756       .920
** Correlation is significant at the 0.01 level (2-tailed).

Table 5. Means and standard deviations of overall scores of the SUS, UMUX and UMUX-LITE for Study 1 with 95% confidence intervals (and associated CGS grades)

Scale       Mean         SD       95% CI Lower   95% CI Upper
SUS         70.88 (C)    6.703    69.9 (C)       71.8 (C+)
UMUX        84.66 (A+)   12.838   82.8 (A)       86.5 (A+)
UMUX-LITE   73.83 (B-)   9.994    72.4 (C+)      75.3 (B)

Table 6. Principal components analysis of the SUS in Study 2 (loadings > .4 in bold)

Item                                                                          Usability   Learnability
8. I found this website very cumbersome/awkward to use.                       .910        .121
1. I would like to use this website frequently.                               .869        .105
5. I found the various functions in this website were well integrated.       .800        .122
3. I thought this website was easy to use.                                    .769        .226
7. I would imagine that most people would learn to use this website very quickly.  .754   .258
2. I found this website unnecessarily complex.                                .739        .273
6. I thought there was too much inconsistency in this website.                .708        .183
9. I felt very confident using this website.                                  .683        .106
10. I needed to learn a lot of things before I could get going with this website.  .133   .841
4. I think that I would need assistance to be able to use this website.      .206        .753

Table 7. Unidimensional and bidimensional LC IRT models for SUS data: number of latent classes (C), estimated maximum log-likelihood (ℓ), number of parameters (#par), BIC index

Unidimensional model:
C   ℓ          #par   BIC
1   −1050.62   40     2282.965
2   −996.09    24     2101.22
3   −975.599   26     2069.323
4   −966.094   28     2059.4
5   −963.457   30     2063.212
6   −963.457   32     2072.299

Bidimensional model:
C   ℓ          #par   BIC
1   −1050.62   40     2282.965
2   −964.826   24     2038.692
3   −910.835   27     1944.338
4   −891.82    30     1919.938
5   −881.354   33     1912.636
6   −878.343   36     1920.244

Table 8. Correlations among SUS, UMUX and UMUX-LITE in Study 2 (with 95% confidence intervals)

            UMUX                          UMUX-LITE
            r        CI lower   CI upper  r        CI lower   CI upper
SUS         .716**   .571       .861      .658**   .502       .815
UMUX        1        --         --        .879**   .717       .953
** Correlation is significant at the 0.01 level (2-tailed).

Table 9. Means and standard deviations of overall scores of the SUS, UMUX and UMUX-LITE for Study 2 with 95% confidence intervals (and associated CGS grades)

Scale       Mean         SD       95% CI Lower   95% CI Upper
SUS         75.24 (B)    13.037   72.6 (B)       77.0 (B+)
UMUX        87.69 (A+)   10.291   85.6 (A+)      89.8 (A+)
UMUX-LITE   76.45 (B)    9.943    74.4 (B)       78.5 (B+)
Table 10. Main effects and interactions for combined ANOVA

Scale       Effect                             Outcome
SUS         Main effect of duration            F(1, 263) = 10.7, p = .001
            Main effect of frequency of use    F(2, 263) = 30.9, p < .0001
            Duration x frequency interaction   F(2, 263) = 15.8, p < .0001
UMUX        Main effect of duration            F(1, 263) = 17.4, p < .0001
            Main effect of frequency of use    F(2, 263) = 22.2, p < .0001
            Duration x frequency interaction   F(2, 263) = 3.4, p = .035
UMUX-LITE   Main effect of duration            F(1, 263) = 4.7, p = .03
            Main effect of frequency of use    F(2, 263) = 16.8, p < .0001
            Duration x frequency interaction   F(2, 263) = 4.7, p = .01

Table 11. Summary of study outcomes by hypothesis

Hypothesis 1 – Supported. SUS dimensionality was affected by the different levels of product experience. When administered to users with less product experience, the SUS had a unidimensional structure, whereas it had a bidimensional structure for respondents with more product experience.
Hypothesis 2 – Supported. All three scales were strongly correlated, independent of the administration conditions.
Hypothesis 3 – Supported. Participants with more product experience were more satisfied than those with less product experience, regardless of whether that experience was gained over a duration of exposure or by frequency of use.

List of Figures

Figure 1. Interaction between scale and frequency of use for Study 1
Figure 2. Interaction between scale and frequency of use for Study 2
Figure 3. Interaction between scale and frequency of use for Studies 1 and 2

BIOGRAPHIES

Simone Borsci is Research Fellow in Human Factors at the Imperial College London NIHR Diagnostic Evidence Cooperative group. He has over 10 years of experience as a human factors and HCI expert working on the interaction between people and technology in both industry and academia. He has worked as UX lead of the Italian Government working group on usability, and as a researcher at the University of Perugia, the School of Computing of Brunel University, and the Human Factors Research Group of the University of Nottingham. Simone holds a PhD in Cognitive Psychology from the University of Rome "La Sapienza". He has over 30 publications (journals, books, book chapters, and conference papers), many of which relate to the interaction of people with and without disabilities with complex systems, and to human error and risk reduction in the use of technologies.

Stefano Federici is currently Associate Professor of General Psychology at the University of Perugia. He is the coordinator of the CognitiveLab research team at the University of Perugia (www.cognitivelab.it). His research is focused on assistive technology assessment processes, disability, and cognitive and human interaction factors.

Michela Gnaldi is currently Assistant Professor of Social Statistics at the Department of Political Sciences of the University of Perugia. Her main research interest concerns measurement in education. On this topic, she participated in several research projects of national interest, in Italy and in the UK, where she worked as a statistician and researcher at the National Foundation for Educational Research (NFER). She actively collaborates with several research institutions and has been responsible for a number of contracts signed between the INVALSI (Istituto Nazionale di Valutazione del Sistema dell'Istruzione) and the Università di Perugia.
Among her publications, she is the author, jointly with F. Bartolucci and S. Bacci, of a book concerning item response theory models.

Silvia Bacci earned her PhD in Applied Statistics in 2007 and has been Assistant Professor of Statistics at the University of Perugia since 2009. Her research interests concern latent variable models, with a special focus on models for categorical and longitudinal/multilevel data, latent class models, and item response theory models. The applications range from the education field to the health field. On these topics, she participated in several research projects of national interest and earned research fellowships and research contracts. She currently participates in a FIRB project funded by the Italian government, concerning "Mixture and latent variable models for causal inference and analysis of socio-economic data". Among her publications, she is the author, jointly with F. Bartolucci and M. Gnaldi, of a book concerning item response theory models.

Francesco Bartolucci is a Full Professor of Statistics in the Department of Economics at the University of Perugia, where he coordinated the PhD program in "Mathematical and Statistical Methods for the Economic and Social Sciences". He was a member of the Group of Evaluation Experts (GEV) that managed the Evaluation of the Quality of Research (VQR) for the period 2004–2010. He is the Principal Investigator of the research project "Mixture and latent variable models for causal inference and analysis of socio-economic data" (FIRB 2012 – "Futuro in ricerca" – Italian Government). He was the Principal Investigator of the research project "Advances in nonlinear panel models with socioeconomic applications" funded by the Einaudi Institute for Economics and Finance (Rome) for the years 2009–2011. He has introduced several methodological advances on these topics, which have been published in top international journals and presented at many international conferences, often in the form of invited talks. These methodological advances have been carried out together with the development of relevant applications concerning labor market, education, health, and criminal data, often with a perspective of causal analysis and evaluation of policies and treatments.