Generalizability Theory
DeShon, 2007

Overview
- Extends classical test theory (CTT). In CTT, reliability is the ratio of true-score variance to observed-score variance:

  r_{xx} = \frac{\sigma^2_t}{\sigma^2_O} = \frac{\sigma^2_t}{\sigma^2_t + \sigma^2_e}

- G-theory seeks to decompose the undifferentiated error term into its constituents to better inform decisions.

Overview (continued)
- Developed by Cronbach and friends in 1963; resulted in much jargon.
- Seeks to identify the most serious sources of inconsistency in responses over measurement conditions.
- Assumes that the error component can be partitioned into multiple sources.
- Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163.
  - Attempted to clarify reliability concepts and to distinguish G-theory from CTT.

Jargon
- Universe of admissible observations: measurements are assumed to be a sample from a universe of potential measurement conditions (time, raters, items, etc.).
- Facets/conditions of measurement.
- Blah, blah, blah...
- Key question: How generalizable are measurements from some set of measurement conditions (e.g., items) to other measurement conditions?

Example
- Three judges rate the creativity of essays written by college applicants.
- Judges are the facet of the measurement universe.
- Can the ratings provided by any one judge be exchanged for any other judge and thus provide a good estimate of the true (universe) score?

Key Issues in G-theory
- G-study
  - Estimate the variance components.
  - Fully crossed designs are best.
  - Fixed vs. random conditions of measurement.
  - Absolute vs. relative decisions.
- D-study
  - Examine various combinations of the measurement conditions to determine the design that yields the desired reliability (a variant of the Spearman-Brown prophecy formula).

Single Facet G-study
- Simplest G-theory design: persons crossed with raters (e.g., Persons 1-5 each rated by Raters 1 and 2).
- Four sources of variance:
  - Object of measurement: systematic variability due to the focus of differentiation, persons.
  - Condition of measurement: systematic variability due to the rater facet.
  - Interaction variance.
  - Random error and unaccounted-for systematic variability (i.e., unmeasured facets).
- The last two sources of variability cannot be separated.

Single Facet G-study (continued)
- p: the object of measurement, the desirable source of variability.
- r: overall rater differences, the tendency for some raters to give generally higher or lower ratings.
- pr: the person x rater interaction, the tendency for some raters to rank order the objects differently than other raters.
- The magnitudes of the three sources of variability can be estimated and compared to make decisions about the adequacy of the current measurement or the best way to redesign a measure (see the numerical sketch following this slide).
- Measures are generalizable to the extent that variance due to the object of measurement is large relative to variance from the several sources of error.
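The following is a minimal numerical sketch, not from the original slides, of how the three estimable components of the single-facet design are obtained from the mean squares of a balanced persons x raters table using the standard random-effects expected-mean-squares algebra. The data matrix is hypothetical, chosen only for illustration.

  import numpy as np

  # Rows = persons (object of measurement), columns = raters (the facet).
  # Hypothetical illustration data, not from the lecture.
  X = np.array([[2., 1.],
                [1., 2.],
                [2., 2.],
                [3., 3.],
                [4., 3.]])
  n_p, n_r = X.shape
  grand = X.mean()

  # Sums of squares for the crossed design with one observation per cell
  ss_p = n_r * ((X.mean(axis=1) - grand) ** 2).sum()      # persons
  ss_r = n_p * ((X.mean(axis=0) - grand) ** 2).sum()      # raters
  ss_res = ((X - grand) ** 2).sum() - ss_p - ss_r         # pr + error

  ms_p = ss_p / (n_p - 1)
  ms_r = ss_r / (n_r - 1)
  ms_res = ss_res / ((n_p - 1) * (n_r - 1))

  # Expected-mean-squares solutions for the random-effects model
  var_pr_e = ms_res                  # pr + error (cannot be separated)
  var_p = (ms_p - ms_res) / n_r      # persons: the desirable variance
  var_r = (ms_r - ms_res) / n_p      # raters: overall leniency/severity

  print(f"Var(p) = {var_p:.3f}, Var(r) = {var_r:.3f}, Var(pr,e) = {var_pr_e:.3f}")

A large Var(p) relative to Var(r) and Var(pr,e) indicates that scores from one rater generalize well to the universe of raters.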
Multiple Facet G-Studies
- The single-facet study is no different from an interrater reliability coefficient or an ICC from Shrout & Fleiss.
- The real power of G-theory comes from extending the ICC to decompose the error into its multiple constituents.
- This information can be used to substantially improve decision making.

2 Facet G-study
- Raters and occasions as conditions of measurement.
- Now there are seven sources of variability: people (p), raters (r), occasions (o), people x raters (pr), people x occasions (po), raters x occasions (ro), and people x raters x occasions confounded with error (pro, e).
- This design allows us to determine the generalizability of ratings across different raters and different occasions.

Variance Component Estimation
Example data: 15 persons rated by 3 judges at two times of day (AM, PM).

              Judge 1      Judge 2      Judge 3
              AM    PM     AM    PM     AM    PM
  Person 1     2     3      1     3      3     5
  Person 2     1     2      2     4      4     6
  Person 3     2     3      2     4      5     4
  Person 4     3     4      3     3      4     6
  Person 5     4     5      3     5      5     7
  Person 6     4     6      3     3      5     4
  Person 7     3     7      4     6      6     7
  Person 8     4     7      4     6      5     6
  Person 9     3     5      4     7      3     7
  Person 10    4     4      4     5      4     4
  Person 11    3     5      3     4      5     5
  Person 12    3     4      3     2      3     5
  Person 13    3     3      2     4      1     2
  Person 14    1     2      2     3      2     4
  Person 15    2     3      1     2      3     3
  Mean       2.80  4.20   2.73  4.07   3.87  5.00
  Var.       1.03  2.60   1.07  2.21   1.84  2.29

[SPSS screenshots of the Variance Components dialog boxes omitted.]

SPSS Syntax

  VARCOMP Rating BY Person Rater Time
    /RANDOM = Person Rater Time
    /METHOD = MINQUE(1)
    /DESIGN = Person Rater Time Person*Rater Person*Time Rater*Time
    /INTERCEPT = INCLUDE .

Variance Estimates

  Component               Estimate
  Var(Person)               .863
  Var(Rater)                .300
  Var(Time)                 .821
  Var(Person * Rater)       .300
  Var(Person * Time)        .102
  Var(Rater * Time)        -.029 (a)
  Var(Error)                .573

Dependent variable: Rating. Method: Minimum Norm Quadratic Unbiased Estimation (weight = 1 for random effects and residual).
a. For the ANOVA and MINQUE methods, negative variance component estimates may occur. Some possible reasons for their occurrence are: (a) the specified model is not the correct model, or (b) the true value of the variance component equals zero.

SPSS Results

  Source         df      MS     Var. Comp.     %
  People         14    6.659       .863      29.17
  Judges          2    9.744       .300      10.14
  Occasions       1   37.378       .821      27.75
  P x J          28    1.173       .300      10.14
  P x O          14     .878       .102       3.45
  J x O           2     .144      -.029       0.00
  P x J x O, e   28     .573       .573      19.36

Interpreting Results
- Examine the variance components: there is a big effect of time and a smaller effect of raters.
- So, focus effort on reducing inconsistency over time:
  - Take more measurements over time.
  - Identify factors that might be responsible for the inconsistency and remove them (e.g., food).
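For readers without SPSS, the same components can be recovered by hand. Below is a numpy sketch, not part of the original slides, that applies the ANOVA expected-mean-squares solution to the balanced p x r x o table above; for this balanced design it reproduces the MINQUE(1) estimates SPSS reports (.863, .300, .821, .300, .102, -.029, .573).

  import numpy as np

  # X[p, r, o]: 15 persons x 3 judges x 2 occasions (AM, PM), from the table above.
  X = np.array([
      [[2, 3], [1, 3], [3, 5]],
      [[1, 2], [2, 4], [4, 6]],
      [[2, 3], [2, 4], [5, 4]],
      [[3, 4], [3, 3], [4, 6]],
      [[4, 5], [3, 5], [5, 7]],
      [[4, 6], [3, 3], [5, 4]],
      [[3, 7], [4, 6], [6, 7]],
      [[4, 7], [4, 6], [5, 6]],
      [[3, 5], [4, 7], [3, 7]],
      [[4, 4], [4, 5], [4, 4]],
      [[3, 5], [3, 4], [5, 5]],
      [[3, 4], [3, 2], [3, 5]],
      [[3, 3], [2, 4], [1, 2]],
      [[1, 2], [2, 3], [2, 4]],
      [[2, 3], [1, 2], [3, 3]],
  ], dtype=float)
  n_p, n_r, n_o = X.shape
  g = X.mean()

  # Marginal and two-way cell means
  mp, mr, mo = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
  mpr, mpo, mro = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

  # Mean squares for the fully crossed design with one observation per cell
  ms_p = n_r * n_o * ((mp - g) ** 2).sum() / (n_p - 1)
  ms_r = n_p * n_o * ((mr - g) ** 2).sum() / (n_r - 1)
  ms_o = n_p * n_r * ((mo - g) ** 2).sum() / (n_o - 1)
  ms_pr = n_o * ((mpr - mp[:, None] - mr[None, :] + g) ** 2).sum() / ((n_p - 1) * (n_r - 1))
  ms_po = n_r * ((mpo - mp[:, None] - mo[None, :] + g) ** 2).sum() / ((n_p - 1) * (n_o - 1))
  ms_ro = n_p * ((mro - mr[:, None] - mo[None, :] + g) ** 2).sum() / ((n_r - 1) * (n_o - 1))
  res = (X - mpr[:, :, None] - mpo[:, None, :] - mro[None, :, :]
         + mp[:, None, None] + mr[None, :, None] + mo[None, None, :] - g)
  ms_pro = (res ** 2).sum() / ((n_p - 1) * (n_r - 1) * (n_o - 1))

  # Expected-mean-squares solutions for the random-effects model
  var_pro_e = ms_pro                                    # pro + error
  var_pr = (ms_pr - ms_pro) / n_o
  var_po = (ms_po - ms_pro) / n_r
  var_ro = (ms_ro - ms_pro) / n_p                       # may be negative
  var_p = (ms_p - ms_pr - ms_po + ms_pro) / (n_r * n_o)
  var_r = (ms_r - ms_pr - ms_ro + ms_pro) / (n_p * n_o)
  var_o = (ms_o - ms_po - ms_ro + ms_pro) / (n_p * n_r)

  for name, v in [("p", var_p), ("r", var_r), ("o", var_o), ("pr", var_pr),
                  ("po", var_po), ("ro", var_ro), ("pro,e", var_pro_e)]:
      print(f"Var({name}) = {v:.3f}")

Negative estimates such as Var(ro) are possible with this method, as the SPSS footnote warns.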
D-study Details
- Once the variance components are estimated, D-studies can be conducted to explore the implications of using the measure in different designs and for different kinds of decisions.
- Estimating the magnitude of error (lack of generalizability) requires attention to four important distinctions:
  - Generalizability versus decision studies
  - Random versus fixed effects
  - Relative versus absolute decisions
  - Number of measurement conditions

D-study Details (continued)
- Decision 1: Absolute vs. relative error.
- Decision 2: The number of measurement conditions (e.g., raters, occasions) required to obtain the desired level of dependability/reliability. Use a variant of the Spearman-Brown formula to determine the number of measurement conditions.

D-study Details (continued)
Compute a G-coefficient based on these decisions to estimate dependability. For the persons x raters x occasions design with n_r raters and n_o occasions:

Absolute error:

  \Phi = \frac{\sigma^2_p}{\sigma^2_p + \frac{\sigma^2_r}{n_r} + \frac{\sigma^2_o}{n_o} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{po}}{n_o} + \frac{\sigma^2_{ro}}{n_r n_o} + \frac{\sigma^2_{pro,e}}{n_r n_o}}

Relative error:

  E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{po}}{n_o} + \frac{\sigma^2_{pro,e}}{n_r n_o}}

[Figure: surface plots of absolute and relative generalizability for the P x J x O design as the number of judges and the number of occasions each vary from 1 to 25; generalizability is plotted on the vertical axis from 0 to 1.]

Fixed Effects
- Someone might estimate a factor as random that you think of as fixed (e.g., raters).
- If one of the facets is fixed, then it makes no sense to speak of generalizing from a sample of facet levels to the universe of admissible facet levels: all facet levels are already present.
- Two methods to handle this:
  - An averaging approach.
  - Separate estimation of variance components within each level of the fixed facet.

Fixed Effects: Averaging Approach
If occasion is fixed, the averaging approach calculates the variance components as:

  Source      Var. Comp.     %
  People        .914       51.16
  Judges        .286       15.99
  P x J, e      .587       32.85

(Each component is the corresponding random-effects component plus its interaction with occasion averaged over the n_o = 2 occasions; e.g., for people, .863 + .102/2 = .914.)

Fixed Effects: Separate Analyses

                  AM                  PM
  Source     Var. Comp.    %     Var. Comp.    %
  People        .649     38.83     1.281     50.27
  Judges        .360     21.54      .183      7.18
  P x J, e      .662     39.61     1.084     42.54

Summary
- G-theory is an extension of classical measurement theory; it assumes that an observed score is a linear combination of true score and error.
- The major difference between the two approaches is how they treat error:
  - In CTT, error is considered a single entity that is random in its influence.
  - In G-theory, error is multi-faceted and can be systematic.

Summary (continued)
- G-theory is similar to domain-sampling theory in assuming that measurement conditions (items, times, judges, etc.) are randomly drawn from a population.
- The conditions of measurement define the universe of admissible observations.
- The relevant conditions, called facets, are the potential sources of systematic measurement error.

Summary (continued)
- Goal: estimate how well a given observation generalizes to the universe score, the average score that would be obtained over all observations in the universe of admissible observations.
- G-theory determines how exchangeable observations are and identifies the major obstacles to exchangeability. Large sources of error hinder exchangeability.
- The variance of observed scores is the sum of the variances for the universe score (true score) and all of the separate sources of error.
- The importance of different error sources is indexed by the relative size of the variance components. Estimating the size of the variance components is called a G-study.

Summary (continued)
- Error is suppressed by adding more levels of a facet (like adding more items to a questionnaire).
- Exploring the implications of adding levels to facets is called a D-study and is akin to the use of the Spearman-Brown formula in classical measurement theory.
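As a concrete D-study, the sketch below, which is not from the original slides, plugs the estimated variance components into the relative- and absolute-error formulas above and tabulates the two coefficients for several candidate designs. Following common practice, the negative raters x occasions component is truncated to zero.

  # Variance components from the G-study above (P x J x O design).
  var_p, var_r, var_o = 0.863, 0.300, 0.821
  var_pr, var_po, var_ro, var_pro_e = 0.300, 0.102, 0.0, 0.573  # ro truncated from -.029

  def rel_g(n_r, n_o):
      """Relative (norm-referenced) generalizability coefficient E-rho^2."""
      rel_err = var_pr / n_r + var_po / n_o + var_pro_e / (n_r * n_o)
      return var_p / (var_p + rel_err)

  def abs_g(n_r, n_o):
      """Absolute (criterion-referenced) dependability coefficient Phi."""
      abs_err = (var_r + var_pr) / n_r + (var_o + var_po) / n_o \
                + (var_ro + var_pro_e) / (n_r * n_o)
      return var_p / (var_p + abs_err)

  # D-study: how do extra judges or occasions change dependability?
  for n_r in (1, 3, 5):
      for n_o in (1, 2, 5):
          print(f"judges={n_r} occasions={n_o}  "
                f"E-rho^2={rel_g(n_r, n_o):.3f}  Phi={abs_g(n_r, n_o):.3f}")

For the observed design (3 judges, 2 occasions) this yields E-rho^2 of about .78 and Phi of about .53, and adding occasions raises Phi much faster than adding judges, consistent with the interpretation that inconsistency over time is the main obstacle.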
Summary (continued)
- Reliability coefficients in generalizability theory are called generalizability coefficients. They are defined the same way as in CTT: the ratio of universe-score (true-score) variance to observed-score variance.

Summary (continued)
- The major complications in generalizability theory model specification are:
  (a) Is the design crossed or nested?
  (b) Are there any fixed facets?
  (c) Is the decision to be made from the measurement a relative or an absolute one?
- Model specification determines the nature of the variance components and the definition of error.
- Generalizability theory is symmetrical: anything can be the object of measurement.
- A target reliability can be achieved in many different ways, so cost, feasibility, and design implications (threats to inference) need to be considered carefully.
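In symbols, using the notation from the D-study formulas above, the generalizability coefficient parallels the CTT reliability coefficient from the opening slide:

  E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}
  \qquad \longleftrightarrow \qquad
  r_{xx} = \frac{\sigma^2_t}{\sigma^2_t + \sigma^2_e}

where \sigma^2_\delta is the relative error for the chosen D-study design; for absolute decisions, substitute \sigma^2_\Delta (yielding \Phi).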