Journal of Advanced Nursing, 1994, 20, 196-201
Likert or Rasch? Nothing is more applicable than good theory
Arnold van Alphen MNSc RN PhD
Student, Department of Medical Informatics

Ruud Halfens PhD
Associate Professor, Department of Nursing Science

Ane Hasman PhD
Professor, Department of Medical Informatics

and Tjaart Imbos PhD
Associate Professor, Department of Methodology and Statistics, Faculty of Health Sciences, University of Limburg, Maastricht, The Netherlands

Accepted for publication 3 November 1993
VAN ALPHEN A., HALFENS R., HASMAN A. & IMBOS T. (1994) Journal of Advanced Nursing 20, 196-201
Likert or Rasch? Nothing is more applicable than good theory
In nursing research many concepts are measured by questionnaires. Respondents are asked to respond to a set of related statements or questions. In unidimensional scaling these statements or questions are indicants of the same concept. Scaling means to assign numbers to respondents according to their position on the continuum underlying the concept. It is very common to use the summative Likert scaling procedure. The sumscore of the responses to the items is the estimator of the position of the patient on the continuum. The rationale behind this procedure is classical test theory. The main assumption in this theory is that all items are parallel instruments. The Rasch model offers an alternative scaling procedure. With Rasch both respondents and items are scaled on the same continuum. Whereas in Likert scaling all items have the same weight in the summating procedure, in the Rasch model items are differentiated from each other by 'difficulty'. The model holds that the probability of a positive response to an item is dependent on the difference between the difficulty of the item and the value of the person on the latent trait. The rationale behind this procedure is item response theory. In this paper both scaling procedures and their rationales are discussed.
INTRODUCTION: MEASUREMENT
In nursing research it is common to measure concepts of importance to nursing practice by means of questionnaires. Measurement can be thought of as the process of assigning numbers or signs to characteristics or attributes of persons, objects, groups or systems according to rules. Measurement is a process involving both theoretical and empirical considerations. In a number of applications interest lies in an unobservable (and thus not directly measurable) concept that is represented by an observable referent.

In order to measure a new, unobservable concept, often a new questionnaire has to be constructed. In the construction phase many decisions are made. What is the purpose of the measurement? What kind of scale has to be chosen? Do you want to scale persons or items or both? How many questions or items should be used? What is the content of the items? What data level is appropriate? Are you measuring a unidimensional or a multidimensional concept?

Correspondence: Dr A. van Alphen, University of Limburg, Department of Nursing Science, PO Box 616, 6200 MD Maastricht, The Netherlands.
In many instances researchers choose a Likert scale in order to measure a unidimensional concept. In this paper an overview is presented of the rationale behind the Likert scale, paying special attention to classical test theory. It will be argued that Likert scaling does have limitations. The monopoly of Likert scaling cannot be understood well when taking alternative approaches into account. One such alternative approach to the measurement of unidimensional concepts is offered by the Rasch model.

In contrast to psychology, for instance, the Rasch model is hardly mentioned in the nursing literature. Over the years 1983-1992 only one reference (Ptaszynski Howard 1985) could be traced in the Medline databases. This absence is probably due to unfamiliarity with the Rasch model. This paper has the intention to acquaint nursing researchers with the Rasch model. The model will be compared with the Likert scale. The objective is to encourage nursing researchers to use the Rasch model more often, rather than to offer a fully theoretical and mathematical outline of the model. This contribution is also meant to lower the threshold to the more specific literature.
SCALING PROCEDURES
Measurement is the assignment of numbers to objects or subjects according to specified rules. The development of systematic rules and meaningful units of measurement for quantifying empirical observations is known as scaling.

The terms 'tests' and 'scales' should not be confused. A questionnaire itself is only a test, not a scale. It becomes a scale once we have determined the set of possible values that can be assigned during the measurement process and have stated an explicit assignment rule. After having determined the content and the number of items to measure a concept, we are actually engaged in testing a series of hypotheses about the 'scalability' of the data obtained from measurements of the proposed construct. First, a hypothesis has been explicitly formulated that the concept is a property occurring in patients in varying amounts, so that it can be quantified using the proposed scaling rule on a theoretical unidimensional continuum. Second is the question of what properties (order, distance and origin) the scale values on this continuum possess.

The Likert scaling is an example of a subject-centred scaling procedure. Subject-centred scaling focuses on the location of individuals on the underlying continuum (Torgerson 1958). It is the approach commonly used in many test-development efforts. The Rasch model, however, is an example of the response-centred approach to scaling. This approach permits simultaneous scaling of both stimuli and subjects. Response-centred approaches enable the researcher to test in a more explicit way hypotheses about the obtained data, using the measurement instrument under development.
LIKERT SCALING AND CLASSICAL TEST
THEORY
The first scaling theory, the one to which nursing research practice conforms, is the Likert scale, which builds on principles derived from classical test theory. Three issues will be discussed: the scaling procedure, the model assumptions and the testing of the model.
Scaling procedure
Likert scaling is an approach to unidimensional scaling (McIver & Carmines 1981). All statements persons have to respond to are operationalizations or empirical referents of one underlying unobservable concept. The scaling model involves a single type of stimulus and a single type of response. Typically in Likert scaling every item is scored on an ordinal level containing five response categories, ranging from totally disagree, disagree, neither disagree nor agree, agree, to totally agree.
Likert scaling is based on a summative model because the responses on the different items are summated. In order to do so, numerical values are assigned to the different response categories. Frequently for a five-point scale these values are 0-4 or 1-5. The sumscore of the item scores now is assumed to be the reliable indicator of the respondent's score on the concept. The set of possible values of sumscores constitutes the final scale for measuring the one-dimensional concept.
Model assumptions
The rationale behind the summation procedure in Likert scaling is based on the assumption of unidimensionality. In Likert scaling unidimensionality is defined in terms of equal and high correlations among all items. All items are assumed to be replications of each other; in other words, items are considered to be parallel instruments. This is implied by the notion of unidimensionality. By combining item scores as indicants of one and the same dimension, random error that occurs with respect to individual items is partly averaged away. As a consequence of the assumption of unidimensionality, all systematic variation in the responses to the items is attributed to differences in levels of the latent trait among respondents.

Another way of stating this consequence of the assumption of unidimensionality is that all items have almost equal tracelines. An item trace line is a curve describing the relationship between the probability of agreement to a statement and the attribute that the item is supposed to measure.
measure It is assumed that with an increase m
197
A van Alphen et al
10
00
Utant trait
Figure 1 Trace line for dichotomous items associated with Likert
scaling
of the latent trait the degree of agreement vnth an item
increases as well In other words, the higher one's magmtude the more likely it is for a respondent to agree with an
item In Figure 1 an item trace line is visualized indicating
the probability that a subject agrees with an item m case
of a dichotomous item as a function of one's value of tbe
concept In Likert scaling it is assumed that the trace lines
of all Items of the questionnaire coincide approximately
This implies tbat in Likert scaling no attenbon is paid to
the item 'strengths'
Model testing
Model testing assesses the degree to which the model assumptions are met. In order to assess the applicability of the Likert scale, its reliability has to be assessed. Reliability concerns the extent to which a measuring procedure yields the same results on repeated trials.

Reliability is expressed as the ratio of true to observed variance of a variable determined in a certain sample. In classical test theory the observed score of a person on a test is supposed to have two components: the true score component and a random error component (Carmines & Zeller 1979). A person's true score on a variable is approached by the average score that would be obtained if the person was re-measured an infinite number of times. In that case the mean random error component is zero. However, since true scores are not observable, reliability should be estimated in another way. Estimation of the reliability of a measure normally is based on the notion of parallel instruments. Two items or two instruments are called parallel when they have identical true scores and equal error variances.

Methods to estimate reliability are the test-retest method, the alternative form method, the split-halves method and the internal consistency method. The most popular of the internal consistency estimates is KR-20 or Cronbach's alpha. In nursing research this statistic normally suffices as the legitimation of the use of sumscores in a research project.
In summary, the virtues of the Likert scaling procedure are that it is simple to apply and has a straightforward logic (McIver & Carmines 1981). Subjects are assigned a number equal to the sumscore of their item responses. This procedure is legitimate under the assumption that items are parallel instruments. Internal consistency tests this assumption. In the practice of scaling the test developer only computes the reliability. There is no practice of systematically testing explicit hypotheses about the data measured with the test. It is this practice, in addition to the very weak assumptions of the underlying model, that has contributed to the acceptability of the model. In fact, one considers a concept to be unidimensional if all the inter-item correlations are positive. As we will see below, the concept of unidimensionality can be stated more explicitly.
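The informal criterion just mentioned, that all inter-item correlations are positive, can be sketched as follows. The scores are again hypothetical; the point is only to show how weak this check is compared with an explicit model test.

```python
# Sketch of the informal unidimensionality check: are all pairwise
# inter-item (Pearson) correlations positive? Scores are hypothetical.
import math

def pearson(x, y):
    """Pearson correlation between two score vectors of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def all_correlations_positive(items):
    """items: list of item-score vectors (one score per respondent)."""
    return all(
        pearson(items[i], items[j]) > 0
        for i in range(len(items))
        for j in range(i + 1, len(items))
    )

items = [
    [3, 4, 2, 4, 3],
    [3, 3, 2, 4, 4],
    [4, 4, 1, 3, 3],
]
print(all_correlations_positive(items))  # True
```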
ITEM RESPONSE THEORY: AN ALTERNATIVE
Using the practice of Likert scaling the researcher only deals with inter-item correlations. Although this practice works effectively in many test construction situations, it is conceivable that test development decisions could be improved by using additional information about item responses.

For instance, it might be important for the test developer to know how persons having different levels of the latent trait performed on a test item. An approach to yield a more complete picture of how an item functions is known as item response theory or latent trait theory (Hulin et al. 1983).
Just like the researcher of the Likert tradition, the item response researcher assumes that the responses to the test items can be accounted for by a single latent trait. At the heart of the theory is a mathematical model of how persons with different levels of the latent trait should respond to an item. This knowledge allows one to compare the performance of persons having taken different tests. It also permits one to apply the results of an item analysis to groups with levels different from the group used for the item analysis. Basic concepts of item response theory are latent traits, item characteristic curves, local independence and unidimensionality. These basic concepts will be discussed in the next sections.
Latent traits and item characteristic curves
One cenb'al concept of item response theory is the item
charactensbc curve (ICC) An ICC plots the probability of
agreeing with an item as a funcbon of the magnitude of
the latent trait underlying the performance on the test
items In most applicabons of item response theory the
ICC IS assumed to have an S-shape, as illustrated m
Figiu'e 2 In this figure the latent trait is denoted by p The
graph shows that a person with a higher score on the latent
Likert and Basch scaling procedures
00
Figure 2 Two item characteristic curves associated with item
response models
trait has a higher probability of answering posibvely on a
test item
Unidimensionality and local statistical independence

Unidimensionality is defined in terms of the statistical dependence among items. Unidimensionality means that the dependence among items can be accounted for by only one latent trait. So one can say that a test is unidimensional if its items are dependent in the entire population and independent in each homogeneous subpopulation (homogeneous with respect to the latent trait, e.g. people with the same latent trait value). Since this independence is defined for a subpopulation of people located at a single point on the latent trait scale, it is called local independence. In general, one can say that the dimensionality of a test is equal to the number of latent traits required to achieve local independence.

In the model we are using in this paper the assumption is that only one latent trait is needed to achieve local independence: only one latent trait explains the correlations among the test items. Researchers using a questionnaire or a test implicitly assume unidimensionality of their test or subtest. The use of total test scores further underpins this implicit assumption. By using an explicit test model these assumptions can be tested. There are several item response theories, of which we will describe only one: the one-parameter model or the Rasch model (Wright & Stone 1979).

THE RASCH MODEL

Scaling procedure

The aim of using the Rasch model is not only to scale subjects but also to scale items on the same continuum. First of all we need responses of persons to the items of the questionnaire. The scaling procedure starts with the estimation of the parameters of the Rasch model. These parameters are the so-called agreement tendencies of the items (for an explanation see the section Rationale below) and the subject scale values. The next step is to test the fit between the model assumptions and the data. When the items obey the Rasch assumptions, scale values for the items can be published and used for other populations. The researcher has then shown explicitly that the scale is a unidimensional test with good measurement properties.

Testing the fit between the data and the model, however, can lead to negative results. In that situation three strategies are recommended in the literature. The first strategy is to start all over again: design a new instrument with new items and gather new data. The second is to look for another latent model which offers a better explanation for the pattern of item responses. The third strategy is to analyse the data and to remove items or subject scores. It may be that one or more items or a certain subsample of subjects is responsible for the misfit between the model and the data. In this strategy statistical arguments and arguments of content both have to govern item or subject reduction.

Rationale

Within the context of item response theory each separate item and each individual response to that item is considered as a source of information about the scaling situation. This information is used to make inferences about the characteristics of the items of the scale and about the scale values of subjects. The Rasch model is one-parametrical because the items are characterized by only one parameter: the agreement tendency of the item. In the Rasch model each item may have a different strength in relation to the underlying concept. In the case of a knowledge test this strength usually is called the difficulty of the item. In our case of an agreement test this strength is referred to as the agreement tendency. An item can be stated in more or less extreme terms than other items. In this view a response to an item is caused by (a) the level on the latent trait of the subject, and (b) the agreement tendency of the item.

Figure 3 is an illustration of how the difference between the latent trait and the agreement tendency affects the probability that subjects with the same level will agree with an item.

[Figure 3. The relationship between the probability of agreement and the difference between the magnitude of the latent trait and the item agreement tendency.]
In this figure βv is the latent trait value of respondent v and δi is the agreement tendency of item i. When βv > δi, the probability P for respondent v to agree with item i is greater than ½ (P > ½). Equally, when βv < δi, then P < ½. In the case of βv = δi the difference is 0 and the probability of agreement to the item is ½.

Now it should be clear that the probability of agreement to an item depends on the difference between the latent trait and the agreement tendency. In the Rasch model the difference between latent trait and agreement tendency determines the probability of agreement with an item. In Figure 3 the item trace line is the visual representation of these probabilities as a function of the differences. The curve is characteristic for one item and therefore referred to as the item characteristic curve. The mathematical formula that describes the set of ICCs in the Rasch model is

P(Xvi = 1) = exp(βv - δi) / (1 + exp(βv - δi))

where Xvi = 1 is the dichotomous response of respondent v to item i and P(Xvi = 1) is the probability of agreement given βv and δi. By this formula the expected response pattern of respondents to an item set is determined, given the estimated βs and δs. When the observed response pattern coincides with, or does not deviate too much from, the expected response pattern, then the items constitute a true Rasch scale.

Model assumptions and model testing

As is the case with Likert scaling, the Rasch model makes the assumption that a single trait underlies subjects' performance on each item of the instrument. This assumption of unidimensionality is closely related to the assumptions of local statistical independence, sufficiency of the sumscore and monotonicity of the ICCs.

Local statistical independence has already been discussed. By sufficiency of the total scores is meant that the total score of a respondent contains all information about the scale value of that respondent, and the total score for one item over the sample of respondents contains all information for the estimation of the agreement tendency. Finally, the ICC should be monotonically increasing in β and monotonically decreasing in δ. It is the characteristic assumption of the one-parametrical Rasch model that all ICCs have the same slope and differ only in agreement tendency.

As is stated earlier, the Rasch model describes mathematically the response pattern that can be expected. This property of the Rasch model allows empirical testing of the assumptions underlying the model. One group of test procedures is based on the statistical or graphical comparison of expected proportions with the observed proportions of agreement to the items (Molenaar 1983, Gustafsson 1980). Another group of test procedures is based on the comparison of person and item parameters between subgroups of respondents (Andersen 1973).

In summary, the Rasch model makes very strong assumptions about the response patterns to the items. The mathematical character of the model allows testing of these assumptions. When the data fit the model, the latent trait can be measured on interval level. For the analyses the computer program PML, published by Gustafsson (1979), can be used. (There are some other computer programs based on the Rasch model available: BILOG 3, BIMAIN. These programs are distributed for European countries by ICC Programma, PO Box 841, 9700 AV Groningen, The Netherlands, and for Canada and North America by SSI (Scientific Software), 1525 East 53rd Street, Suite 906, Chicago, IL 60615, USA.)

DISCUSSION: LIKERT OR RASCH?

As has been stated before, there is necessarily a strong correlation between sumscores on the instrument (Likert scale) and the Rasch person parameters. That fact raises the question of why we want to use the more complicated person parameters and not the sumscores.

One answer to this question is that in evaluating an instrument as a Rasch model, more fundamental evidence can be provided to justify the use of the sumscore scale on an interval level. Distances on the Likert sumscore scale are interpreted as equal over the full range of the scale. The scale is treated as an interval scale based on ordinal-level item scoring. There has been a discussion in the nursing literature about the legitimacy of this practice. In this paper we take the stance that this practice cannot be defended against the performance of measurement theories like the Rasch model. The Rasch scale is a statistically proven interval scale and is to be preferred. For more discussion on the controversy we refer to Knapp (1990).

Another answer to the question is that a true Rasch scale permits us to benefit from the property of specific objectivity. Specific objectivity means that item parameters are the same within any subpopulation of the domain for which Rasch homogeneity is established. When item parameters deviate in a subpopulation out of that domain, this means that within this subpopulation the concept has a different meaning.

Specific objectivity also means that the set of person parameters is the same for any subset of the original set of items. Shortened item sets can be used for subpopulations with an expected specific limited range of subject values on the latent trait.

Strengths and weaknesses

Finally, the strengths and the weaknesses of the Rasch model can be summarized. Rasch models have very strong
points. First, the probabilistic nature of the model, in contrast with deterministic models like the Guttman scale, takes into account that human responses are subject to fluctuations. Second, the assumptions of the Rasch model can be tested statistically. Third, a Rasch scale is a psychometrically proven interval scale: you know better what you are measuring. Fourth, the estimates of the person and item parameters are sample-free. This means they will hold for every sample and not merely the sample under consideration. A final strong point is the availability of information about the various items. With this information, together with theoretical considerations, it is possible to search for Rasch homogeneous sets of items.

One of the disadvantages lies in the fact that applying the Rasch model requires some knowledge of and acquaintance with mathematics. As is the case with many statistical procedures, a specific computer program for the performance of the Rasch analysis is indispensable. Sometimes the handling of the program is complicated. Performing the analysis and interpreting the output of such a program requires experience and expertise. For these reasons it is strongly recommended that nursing researchers co-operate with a statistician or a psychometrician who is acquainted with the Rasch model. Second, a disadvantage of the Rasch model is the great number of observations or replications that are needed to estimate the parameters of the model. Third, the Rasch model holds strong assumptions which are not easy to meet by the observations.
CONCLUSION
Compared to Likert scales, Rasch scales are a good alternative for measuring concepts of interest in nursing research. When enough observations are available, it is worthwhile to check whether the data obey the assumptions of the Rasch model. If that is the case, patients can be scored on an interval-level scale that is independent of the number of items used in the instrument.
References

Andersen E.B. (1973) A goodness of fit test for the Rasch model. Psychometrika 38, 123-140.
Carmines E.G. & Zeller R.A. (1979) Reliability and Validity Assessment. Sage, Beverly Hills.
Gustafsson J.E. (1979) PML: A Computer Program for Conditional Estimation and Testing in the Rasch Model for Dichotomous Items. Reports from the Institute of Education, University of Göteborg, no. 85.
Gustafsson J.E. (1980) Testing and obtaining fit of data to the Rasch model. British Journal of Mathematical and Statistical Psychology 33, 205-233.
Hulin C.L., Drasgow F. & Parsons C.K. (1983) Item Response Theory: Applications to Psychological Measurement. Dow Jones-Irwin, Homewood.
Knapp T.R. (1990) Treating ordinal scales as interval scales: an attempt to resolve the controversy. Nursing Research 39(2), 121-123.
McIver J.P. & Carmines E.G. (1981) Unidimensional Scaling. Sage, Beverly Hills.
Molenaar I.W. (1983) Some improved diagnostics for failure of the Rasch model. Psychometrika 48(1), 49-72.
Ptaszynski Howard E. (1985) Applying the Rasch model to test administration. Journal of Nursing Education 24(8), 340-343.
Torgerson W.S. (1958) Theory and Methods of Scaling. John Wiley, New York.
Wright B.D. & Stone M.H. (1979) Best Test Design: Rasch Measurement. MESA Press, Chicago.