Research Method - VU Research Portal

A Framework for Developing Semantic Differentials in IS research:
Assessing the Meaning of Electronic Marketplace Quality (EMQ)
Tibert Verhagen
VU University Amsterdam, Department of Information Systems and Logistics
De Boelelaan 1105, room 3a-22
Tel: +31 (0)20 – 598 6185
Fax: +31 (0)20 – 598 6005
[email protected]
Selmar Meents
VU University Amsterdam, Department of Information Systems and Logistics
De Boelelaan 1105, room 3a-23
Tel: +31 (0)20 – 598 6185
Fax: +31 (0)20 – 598 6005
[email protected]
ACKNOWLEDGEMENTS
The authors would like to show their gratitude to the anonymous reviewers for their feedback
and comments. Moreover, the authors thank all members of the expert panels as well as the
three C2C electronic marketplaces who participated in the project.
BIOGRAPHIES
Tibert Verhagen is an assistant professor of E-business at the Department of Economics and
Business Administration of the Vrije Universiteit Amsterdam. He holds a Ph.D. in adoption
of web-based systems for consumer online purchasing. His research interests include research
methodology, measurement validation and consumer adoption of information systems. In addition to
conference proceedings, he has published in journals such as Information & Management and
the European Journal of Information Systems.
Selmar Meents is a PhD candidate at the Vrije Universiteit Amsterdam, the Department of
Economics and Business Administration. His research interests include research design,
electronic marketplaces, trust and risk in offline and online settings, and buyer-seller
relationships. His work has been published in several conference proceedings, and has
appeared in the European Journal of Information Systems.
ABSTRACT
Adequate usage of measurement instruments is key for solid research. In this study we focus
on the semantic differential as a general technique of measurement. Despite calls for
methodological rigor in information systems (IS) research, many of the applications of the
semantic differential in IS studies are characterized by flaws and weaknesses. Consequently,
the findings of these studies demand cautious usage since validity problems are likely to
exist. The aim of this study is to shed light on the semantic differential. Principles of semantic
differentiation are discussed, and used as a foundation to introduce a framework for developing
and applying semantic differentials. The framework delineates the crucial role of linguistics
and concept-scale interaction, and extends available guidelines for measurement validation
with procedures to test wording credibility, linguistic contrast, psychological bipolarity, and
contextual contamination. The framework is exemplified using a demonstration exercise,
which centers on the assessment of the meaning of the concept electronic marketplace quality
(EMQ). Using a mixture of qualitative and quantitative methods, the demonstration exercise
clarifies the prerequisites for semantic differentiation and provides guidelines for researchers.
The paper concludes with a discussion of implications for researchers, reviewers and practice.
KEYWORDS
Semantic differential, measurement validation, research methodology, linguistics, contextual
contamination, electronic marketplace quality.
1. INTRODUCTION
The semantic differential is frequently used in IS research. Because it is a technique to measure the
connotative meaning of concepts, IS researchers have applied the semantic differential to
measure concepts such as computer user satisfaction (Bailey and Pearson, 1983), information
system satisfaction (Bhattacherjee, 2001; Griffith and Northcraft, 1996; Li, McLeod and
Rogers, 1993; McKeen, Guimaraes and Wetherbe, 1994), information systems planning
success (Doherty, Marples and Suhaimi, 1999), information culture (Jarvenpaa and Staples,
2000), computer attitudes (Mykytyn and Green, 1992; Webster, Martocchio and Joseph,
1992), media perception (Chidambaram and Jones, 1993), and website performance (Huang,
2005). Both theory and empiricism support the adoption of the semantic differential in IS
research. Theory presents the semantic differential as an easy and quick method to assess both
the intensity and direction of the meaning of concepts (Mindak, 1961; Heise, 1970).
Moreover, empirical results underline its reliability (see Mindak, 1961; Wirtz and Lee, 2003),
validity (Van Auken and Barry, 1995), robustness (see Clevenger, Lazier and Clark, 1965;
Hawkins, Albaum and Best, 1974) and relative insensitivity to systematic response errors (see
Friborg, Martinussen and Rosenvinge, 2006).
Despite the adoption of the semantic differential in IS settings, there is ample evidence
that its development and usage therein are subject to serious weaknesses. We notice, for
example, that researchers arbitrarily select bipolar scales for the concept to be measured
without testing for concept-scale interaction (e.g., Bergeron, Raymond, Rivard and Gara,
1995; Jarupathirun and Zahedi, 2007; Winter, Saunders and Hart, 2003) or linguistic contrast
(e.g., McKinney, Yoon and Zahedi, 2002; Palmer, 2002; Winter et al., 2003). Moreover,
there is proof that in some cases the selected bipolar scales have not been subject to empirical
tests of factorial structure (e.g., Bailey and Pearson, 1983; Barki, Rivard and Talbot, 2001;
Galletta, Ahuja and Hartman, 1995; Singh and Dalal, 1999), even though the relevance of
factorial testing is highlighted in the semantic differential literature (Osgood, 1952; Osgood,
Suci and Tannenbaum, 1957; Sharpe and Anderson, 1972). Furthermore, we observe that
scholars replicate established semantic differentials to measure closely related but different
concepts (e.g., Suh and Lee, 2005; Van der Heijden, 2004), thereby neglecting that semantic
differentiation demands a tailored approach since the bipolar scales form the axes of a
multidimensional space in which the meaning of a particular concept is measured. In general,
researchers too often apply existing semantic differentials without any specific adaptation to
the research context (e.g., Bhattacherjee and Premkumar, 2004; Bhattacherjee and Sanford,
2006; Watts Sussman and Sproull, 1999), which results in a lack of relevance (Dickson and
Albaum, 1977; Sharpe and Anderson, 1972). Finally, we draw attention to the fact that the use of
semantic differentials is subject to misinterpretation. Some scholars refer to their
measurement instruments as semantic differentials although these comprise typical Likert-scale
items (e.g., Chenoweth, Dowling and St. Louis, 2004; Liao and Cheung, 2002; Okazaki,
2006) or, vice versa, as Likert scales although they consist of semantic differentials (e.g., Watts
Sussman and Sproull, 1999). Researchers even combine bipolar semantic differentials with
typical Likert-scale items within the same multi-item scale (e.g., Bruce, Briggs, Shepperd,
Yen and Nunamaker, 1995), thereby neglecting basic prerequisites for semantic differentiation
(see Osgood et al., 1957; Snider and Osgood, 1969).
Given these weaknesses, the results of the IS studies referred to above have to be interpreted
with care. When basic requirements of semantic differential measurement such as bipolarity,
relevant concept-scale pairing(s), unidimensionality and adaptation to the research context
are neglected, validity problems are likely to exist
(see Dickson and Albaum, 1977; Sharpe and Anderson, 1972). These observations contrast
with well-known calls for valid measurement instrument development (e.g., Boudreau, Gefen
and Straub, 2001; Straub, 1989) and highlight the need for more understanding of the
semantic differential technique and for a systematic overview of requirements regarding its
development.
This research is intended to provide more understanding of the development and use of
the semantic differential and to contribute to measurement validation in the IS research field.
Assumptions underlying the semantic differential technique are used to construct a
framework for developing and applying semantic differentials, thereby expanding well-known
general directives for scale development (e.g., Boudreau et al., 2001; Chin, Gopal and
Salisbury, 1997; Churchill, 1979; Straub, 1989) with semantic differential-specific test
procedures. The framework integrates the basics of psychometric measurement with
guidelines as recommended by semantic differential theorists, and puts emphasis on the
crucial role of linguistics.
Subsequently, the framework is elucidated and put to practice via a demonstration
exercise (cf. Straub, 1989) since “instrument validation may be best understood by seeing
how validation can be applied to an actual MIS research problem” (Straub, 1989, p. 154).
This demonstration exercise focuses on the assessment of the meaning of Electronic
Marketplace Quality (EMQ). Rooted in an established field of EM studies (e.g., Bakos, 1991;
Bakos, 1998; Cheng, Chan and Lin, 2006; Lancastre and Lages, 2006; Pavlou, 2002; Pavlou
and Gefen, 2004; Sarkar, Butler and Steinfield, 1995), EMQ refers to buyers’ quality
perceptions of consumer-to-consumer (C2C) electronic marketplaces (EMs). The adoption of
the EMQ concept is theoretically appealing since development of such a conceptual framework
offers new opportunities for theoretical advances in the emerging field of EM studies (e.g.,
Hsiao, 2003; Bapna, Goes, Gupta and Jin, 2004; Cheng et al., 2006; Lancastre and Lages,
2006; Pavlou, 2002). Drawing upon works on website quality (e.g., De Wulf, Schillewaert,
Muylle and Rangarajan, 2006; Kim and Stoel, 2004a), EMQ is expected to be
multidimensional and rather complex in nature (cf. Yang, Cai, Zhou and Zhou, 2005),
making the semantic differential one of the most appropriate techniques to measure this
concept (Mindak, 1961).
This research essay makes the following major contributions. First, drawing upon a
review of the semantic differential literature, this study describes and illuminates the
fundamentals of semantic differentiation. As such, we aim at providing a basic understanding
of the semantic differential, and intend to prevent inappropriate and arbitrary usage of the
measurement technique due to a lack of knowledge. Second, from a methodological
perspective, we introduce an integrative framework for semantic differential development
and usage. The framework integrates established works on scale development and the more
specific literature on semantic differentiation. The absence of directives for the semantic
differential technique in conventional paradigms for measurement validation stresses the need
for such an integrative approach. Third, we offer specific guidelines for semantic differential
development and usage. Drawing upon empirical exercises, principles for semantic
differentiation are proposed. As such, we aim at expanding measurement validation in the IS
research field in general.
This paper is structured as follows. First, we deliberate on the semantic differential
technique. We briefly discuss the basics of semantic differentiation and focus on fundamental
assumptions underlying the technique. Then, we use the assumptions as a foundation to
propose a framework for semantic differential development. A demonstration exercise
clarifies the framework. Building upon literature study, linguistic tests, expert interviews,
pilot tests and data collected in three EMs in the Netherlands, the exercise elaborates on the
proposed framework and puts forward a validated semantic differential to assess the meaning
of EMQ. Finally, we discuss our work, and conclude with recommendations for researchers,
reviewers and practitioners.
2. THE SEMANTIC DIFFERENTIAL
2.1 Technique of Measurement
Introduced in behavioral sciences by Charles Osgood and his associates (e.g., Osgood and
Suci, 1955; Osgood et al., 1957), the semantic differential is applied to many content areas
including the field of IS research (e.g., Banerjee, Cronan and Jones, 1998; Bhattacherjee,
2001; Burke and Chidambaram, 1999; Watts Sussman and Sproull, 1999). Despite its
adoption in IS research, there seem to be a number of misunderstandings about the semantic
differential. The vast majority of IS researchers seem to interpret it rather narrowly as being
an alternative scaling format. This point of view, however, does not hold. The semantic
differential is neither a predefined set of measurement items, nor a scaling format (Osgood
and Suci, 1955; Osgood et al., 1957), nor a device for measurement (Carroll, 1959). A
semantic differential is a highly general technique of measurement that has to be adapted to
each research context. As such, usage of the technique depends on the research goals and
objectives of the researcher (Osgood et al., 1957, p.76).
In essence, the semantic differential is a technique to measure the meaning of concepts
(Mindak, 1961) as well as consumer opinions and attitudes (Dickson and Albaum, 1977). Although
‘meaning’ can be viewed and interpreted from various perspectives (e.g., linguistic meaning,
sociological meaning, relational meaning), the semantic differential explicitly focuses on the
observation and measurement of the psychological meaning of concepts (Kerlinger, 1973).
The concept itself, also known as stimulus, can either be a noun, verb or a noun phrase
(Osgood et al., 1957). To measure the meaning of the concept the semantic differential uses a
list of bipolar scales, where bipolar reflects opposite-in-meaning. An example is retrieved
from the work of Burke and Chidambaram (1999) who measured the concept communication
interface perceptions with the following set of bipolar adjectives:
Difficult: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Easy
Complex: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Simple
Constrained: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Free
Constricted: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Spacious
For each pair of polar adjectives, respondents place a mark on the continuum of individual
lines to indicate the point that characterizes the concept (DeVellis, 2003). Assuming that each
pair of adjectives represents a different dimension, respondents allocate the meaning of a
concept in a multidimensional space or semantic space. This process of allocation is known
as semantic differentiation. Since in practice different bipolar scales are likely to load on the
same dimension, usually factor analysis is applied to determine the underlying dimensions
that together delineate the axes of the semantic space (Clevenger et al., 1965; Heise, 1970;
Osgood, 1952; Osgood and Suci, 1955; Osgood et al., 1957). In addition to bipolar adjectives and
verbs, contrasting phrases can also be applied as scale opposites (cf. Dickson and Albaum,
1977; Hawkins et al., 1974; Kelly and Stephenson, 1967). Contrasting phrases enable the
researcher to put forward more adequate descriptors for each dimension, which is likely to be
an advantage when measuring the meaning of rather complex concepts (cf. Mindak, 1961).
When applying such contrasting phrases, bipolarity is accomplished by the inclusion of
contrasting adjectives within the phrases, which function as scale anchors (cf. Dickson and
Slevin, 1975; Dickson and Albaum, 1977).
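To illustrate the scoring logic behind semantic differentiation, the following minimal Python sketch codes marks on the seven-point bipolar scales shown above as signed values and averages them per scale. The signed coding (-3 to +3, with 0 as the neutral midpoint) is a common convention that is assumed here rather than prescribed by the sources cited; the response data and the function names code_position and semantic_profile are hypothetical.

```python
from statistics import mean

# Bipolar scales from the Burke and Chidambaram (1999) example,
# written as (left anchor, right anchor).
SCALES = [
    ("Difficult", "Easy"),
    ("Complex", "Simple"),
    ("Constrained", "Free"),
    ("Constricted", "Spacious"),
]


def code_position(position, points=7):
    """Map a mark at position 1..points (counted from the left anchor)
    to a signed score centred on the neutral midpoint.
    Assumed coding for seven points: 1 -> -3, 4 -> 0, 7 -> +3."""
    if not 1 <= position <= points:
        raise ValueError(f"position must lie between 1 and {points}")
    midpoint = (points + 1) / 2
    return position - midpoint


def semantic_profile(responses):
    """Average the signed scores per bipolar scale over all respondents.
    Each response is a list with one marked position per scale."""
    profile = {}
    for index, (left, right) in enumerate(SCALES):
        scores = [code_position(response[index]) for response in responses]
        profile[f"{left}-{right}"] = mean(scores)
    return profile


# Hypothetical marks of three respondents (illustration only, not study data).
responses = [
    [6, 5, 4, 3],
    [7, 5, 4, 2],
    [5, 5, 4, 4],
]
profile = semantic_profile(responses)
for scale, score in profile.items():
    print(scale, score)
```

In this coding the sign of each mean indicates the direction of the meaning and its magnitude the intensity. Such a profile is only a descriptive summary and does not replace the factor analysis needed to identify the dimensions of the semantic space.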
2.2 Key Assumptions of Semantic Differentiation
Being a combination of controlled association and scaling procedures, the semantic
differential technique heavily relies on linguistics. In fact, building upon linguistic encoding
as index of meaning, the “crux of the method lies in selecting the sample of descriptive polar
terms” (Osgood et al., 1957, p.20). The crucial role of linguistics, combined with the focus on
controlled association and scaling, implies that the following assumptions have to be taken
into account to assure appropriate usage of the semantic differential.
1. Concept relevance: the selected concept must be relevant for the particular research
problem under study (Osgood et al., 1957). Irrelevant concepts are likely to result in
small variance among respondents, making them rather useless for the purpose of
research. In case of multidimensional semantic differentials, this assumption applies to
both the higher-order construct and its underlying dimensions (Kerlinger, 1973).
2. Representative sample of bipolar scales: the combined sample of selected bipolar scales
has to be large enough and widely distributed in meaning to cover the entire semantic
space and, thus, define the meaning of the concept (Carroll, 1959; Darnell, 1966;
Nunnally, 1967; Weinreich, 1958). The larger or more representative the sample of
bipolar scales, the better defined is the multidimensional meaning of the concept (Osgood
et al., 1957).
3. Relevant concept-scale pairings: the selected adjective pairs will have to be relevant to
the particular concept under study (Darnell, 1966; Dickson and Albaum, 1977; Osgood et
al., 1957; Sharpe and Anderson, 1972). Irrelevant concept-scale pairings are likely to
result in neutral responses and therefore reduce the amount of information gathered
(Osgood et al., 1957, p. 78-79).
4. Bipolarity: the anchors of the scales have to be bipolar (Dickson and Albaum, 1977;
Falthzik and Johnson, 1974; Osgood et al., 1957). Bipolarity involves linguistic contrast
as well as psychological bipolarity. Linguistic contrast implies that the distinct scale
anchors are truly bipolar from a purely linguistic point of view (nominal antonyms; see
Carroll, 1959). Psychological bipolarity extends this view by assuming that the selected
scale anchors are not only bipolar in isolation, but also in relation to the particular concept
to be measured (functional antonyms; see Carroll, 1959). As such, psychological
bipolarity demands linguistic tailoring of the polar terms to the concept under study
(Green and Goldfried, 1965; Heise, 1969; Schriesheim and Klich, 1991).
5. Credibility of within-scale wordings: when applying combinations of adjectives, nouns or
verbs as scale opposites, these combinations have to be as credible and “natural” as
possible (Cliff, 1959; Osgood et al., 1957). This assumption is crucial since the meaning
of a word is largely affected by the combination of words it is used with.
6. Good psychometric properties: the semantic differential has to satisfy conventional
criteria for psychometric measurement (Dickson and Albaum, 1977; Osgood et al., 1957).
Particular attention has to be paid to the role of factor analytic approaches to discover and
define the dimensionality of the semantic space (see Deese, 1964; Osgood and Suci,
1955; Osgood et al., 1957). To allocate the meaning of a concept accurately, the
dimensions that form the axes of the semantic space have to be unidimensional and
independent (Landon, 1971; Osgood et al., 1957).
7. Concept independency: since the semantic differential heavily relies on linguistics, the
different concepts judged in a semantic differential have to be independent. Due to
anchoring effects (see Simonson and Drolet, 2004; Tversky and Kahneman, 1974), there
is a chance that later responses are biased. This problem, which is known as contextual
contamination, needs to be tested and controlled for (Landon, 1971; Osgood et al., 1957).
Since the semantic differential is often applied to measure multidimensional constructs,
the instrument usually comprises several concepts or dimensions. In this context,
particular attention has to be paid to the chance that the first concept measured functions
as a frame of reference and thereby affects the subsequent evaluation of the other concepts
(Landon, 1971; Osgood et al., 1957).
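One possible way to check for contextual contamination, assuming a split-ballot design in which the order of the concepts is varied between two groups of respondents, is to compare the ratings a concept receives when it is judged first with the ratings it receives when it is judged after another concept. The sketch below computes Welch's t statistic for that difference. The design, the data and the function name welch_t are illustrative assumptions; the sources cited above do not prescribe this particular test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two independent means
    (no assumption of equal variances)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical seven-point ratings of the same concept (illustration only):
# one group judged it first, the other group judged it after another concept.
rated_first = [5, 6, 5, 6, 5, 6]
rated_after = [4, 5, 4, 5, 4, 5]

t_statistic = welch_t(rated_first, rated_after)
print(round(t_statistic, 3))
```

A large absolute value of the statistic would suggest that the position of the concept in the instrument shifts its evaluation. Whether the difference is significant still has to be judged against the t distribution with the appropriate degrees of freedom.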
2.3 Towards a Framework for Developing Semantic Differentials
The key assumptions underlying the semantic differential technique have fundamental
implications for the semantic differential development process. Most importantly, the
assumptions demonstrate that the construction of semantic differentials not only necessitates
usage of established scale development guidelines (e.g., Boudreau et al., 2001; Chin et al.,
1997; Churchill, 1979; Straub, 1989), but also requires adequate usage and thus testing of
linguistics. As such, it demands an integration of fundamentals of psychometric
measurement, and the more specific linguistic guidelines as put forward in the semantic
differential literature. Remarkably, such an integrative approach is lacking. Building upon
semantic differential studies and measurement development literature, we propose a
framework for systematic development of semantic differentials. Table 1 shows the
framework, which comprises nine steps.
Table 1: Framework for developing semantic differentials in IS research

Stage | Description | Underlying assumption | Key references
--- | --- | --- | ---
1 Concept definition | Definition of concept and delineation of research domain | Concept relevance | Chin et al. (1997); DeVellis (2003); Nunnally and Bernstein (1994); Straub (1989).
2 Generation of bipolar scales | Preliminary selection of bipolar scales that are widely distributed in meaning | Representative sample of bipolar scales | Dickson and Albaum (1977); Hawkins et al. (1974); Kelly and Stephenson (1967); Mindak (1961); Osgood et al. (1957); Sharpe and Anderson (1972).
3 Judgment of concept-scale pairings | Test of applicability of scale items and antonyms for the concept under study | Relevant concept-scale pairings | Bearden, Hardesty and Rose (2001); Hambleton and Rogers (1991); Hardesty and Bearden (2004); Malhotra (1981); Netemeyer, Bearden and Sharma (2003); Osgood et al. (1957); Tittle (1982).
4 Linguistic test of semantic bipolarity | Test of linguistic contrast and psychological bipolarity | Bipolarity | Carroll (1959); Dickson and Albaum (1977); Green and Goldfried (1965); Heise (1969); Mindak (1961); Osgood et al. (1957); Schriesheim and Klich (1991); Snider and Osgood (1969).
5 Linguistic test of semantic differential wording | Test for credible and ‘normal’ combinations of adjectives, verbs and nouns within each bipolar scale | Credibility of within-scale wording | Blair and Presser (1992); Foddy (1993); Foddy (2004); Kahn and Cannell (2004); Reynolds, Diamantopoulos and Schlegelmilch (1993); Reynolds and Diamantopoulos (1998); Song and Parry (1997).
6 Pilot to purify the semantic differential | Adoption of factor analytic approach: initial tests of dimensionality, validity and reliability | Good psychometric properties | Aaker (1997); Davis (1989); Moore and Benbasat (1991); Netemeyer et al. (2003); Osgood et al. (1957); Srinivasan, Vanden Abeele and Butaye (1989).
7 Scale purification | Adoption of factor analytic approach: exploratory and confirmatory tests of dimensionality, validity and reliability | Good psychometric properties | Aaker (1997); Dabholkar, Thorpe and Rentz (1996); DeVellis (2003); Doll, Xia and Torkzadeh (1994); Gerbing and Anderson (1988); Gerbing and Hamilton (1996); Osgood et al. (1957); Viswanathan, Childers and Moore (2000).
8 Test of contextual contamination | Test of anchoring effects between the concepts within the scale | Concept independency | Bickart (1993); Feldman and Lynch (1988); Landon (1971); Osgood et al. (1957); Tourangeau and Rasinski (1988); Tversky and Kahneman (1974).
9 Cross-validation | Adoption of factor analytic approach: confirmatory tests of dimensionality, validity and reliability | Good psychometric properties | Dabholkar et al. (1996); Gerbing and Anderson (1988); Netemeyer et al. (2003); Straub (1989); Viswanathan et al. (2000).

The framework extends available paradigms for scale development by highlighting the
principle of concept-scale interaction and by adding the notions of linguistic contrast,
psychological bipolarity, credible wording combinations and contextual contamination. In the
following sections we will elaborate on each stage of the framework by reporting a
demonstration exercise. Building upon qualitative and quantitative techniques, the exercise
focuses on the assessment of the meaning of EMQ.
3. DEMONSTRATION EXERCISE: ASSESSING THE MEANING OF EMQ
3.1 Concept Definition
A wide body of research on EMs exists in the field of IS research. In this research exercise
we view EMs from a system perspective (cf. Bakos, 1991; Hsiao, 2003; Lee and Clarke,
1997). Accordingly, an EM is considered an electronic market system, referring to an online
environment with specific boundaries that is operated by a particular intermediary. More
specifically, based on such studies as Bapna et al. (2004), Cheng et al. (2006), Lancastre and
Lages (2006) and Pavlou (2002), an EM is defined here as an environment located on the
Internet that is supported and enabled by a combination of IT and various services,
procedures and regulations offered by a third-party intermediary, in which parties can meet
and engage in exchange-related behavior. This conceptualization is in line with earlier studies
(e.g., Bakos, 1998; Pavlou and Gefen, 2004), and applies to EMs where transactions are fully
completed online, as well as to EMs that are primarily used to engage in search activities and
information exchange, before parties meet offline for further inspection and transaction
settlement.
The context of our research is delimited to EMs that facilitate transactions between
consumers (C2C). C2C EMs have been the context of several empirical explorations (e.g.,
Hu, Lin, Whinston and Zhang, 2004; Pavlou, 2002; Pavlou and Gefen, 2004). Remarkably, a
well-conceptualized and validated instrument to measure consumers’ overall perceptions of
an EM is lacking. Here, we apply the semantic differential technique to construct such an
instrument and assess the meaning of the EMQ concept. EMQ refers to the overall perception
a buyer has of a particular EM. It reflects an intertwined mixture of evaluations that are
derived from all kinds of implicit (i.e., imperceptible, psychological) and explicit (i.e.,
observable, concrete) functions and services (Sarkar et al., 1995) provided by the
intermediary and the population of sellers. Examples of functions and services made
available by the intermediary include providing the technological infrastructure (Pinker,
Seidmann and Vakrat, 2003), credit arrangements, logistical settlement, negotiation services
(Grewal et al., 2001) and control mechanisms (Pavlou, 2002; Pinker et al., 2003). Sellers
expand these functions and services by offering sales-related functions such as product
selection, product description, and provision of contact information (Sarkar et al., 1995).
Buyers evaluate the offered functions and services to assess and evaluate the EM, as well as
the behavior of the parties behind it (cf. Belanger, Hiller and Smith, 2002; Suh and Han,
2003).
Given that it comprises multiple intertwined perceptions, this research conceptualizes
EMQ as a composite construct that is rather complex and multidimensional in nature.
Although this conceptualization conforms to the literature on website quality (e.g., De Wulf
et al., 2006; Lee and Kozar, 2006; Yang et al., 2005), EMQ substantially differs from
constructs addressing website quality perceptions from a webstore perspective (e.g., Kim and
Stoel, 2004a; Wolfinbarger and Gilly, 2003), where transactions and thus related behavior are
mainly dyadic in nature and quality impressions are primarily derived from the functions,
services and actions of the selling party (Verhagen, Meents and Tan, 2006).
3.2 Generation of a Sample of Bipolar Scales
To ensure content-valid measures, the EMQ construct was conceptualized as in the previous
section and its domain delineated (cf. Chin et al., 1997; DeVellis, 2003; Nunnally and
Bernstein, 1994; Straub, 1989). Based on a literature study, a preliminary set of bipolar scales
was then generated. The EM literature (e.g., Bakos, 1998; Grewal et al., 2001; Pinker et al.,
2003) was studied to identify aspects that should be included. Moreover, existing
measurement items for website quality (e.g., Aladwani and Palvia, 2002; Kim and Stoel,
2004a, 2004b; Palmer, 2002; Van Iwaarden, Van der Wiele, Ball and Millen, 2004;
Wolfinbarger and Gilly, 2003; Yang et al., 2005) were evaluated for their applicability (cf.
Palmer, 2002) and if applicable were adapted to EM settings. Finally, a subjective content
analysis of EMs was applied to collect additional items (cf. Mindak, 1961). These steps
resulted in a preliminary list of 69 semantic differentials.
Given the complex nature of the concept to be measured, that is the multidimensional
nature of EMQ, mere adjectives were deemed to be too limited in scope to be adequate
descriptors (cf. Dickson and Albaum, 1977; Mindak, 1961). Accordingly, each of the
semantic differentials consisted of a pair of descriptive phrases (cf. Hawkins et al., 1974;
Kelly and Stephenson, 1967) containing antonyms as anchors that were based on the work of
Osgood et al. (1957). Osgood’s standardized list of antonyms, however, provided an
insufficient number of appropriate antonyms (cf. Mindak, 1961), or seemed inappropriate for
the current research context (cf. Dickson and Albaum, 1977; Sharpe and Anderson, 1972).
Therefore, we adapted the antonyms to the concept under study and, based on existing
semantic differential scales (e.g., Dickson and Albaum, 1977; Hawkins et al., 1974; Landon,
1971) and linguistic works such as Webster’s Collegiate Thesaurus (1988 edition), added
some extra antonyms.
3.3 Judgment of Concept-Scale Pairings
To judge the validity of the preliminary list of 69 bipolar scale items and their antonyms, and
to refine the item and antonym pool, a pretest was held using expert panels (cf. Netemeyer et
al., 2003). The experts included eleven academic researchers in the field of IS and Marketing
from a Dutch academic institution, and seven practitioners working for two C2C EMs in the
Netherlands. The pretest comprised a series of three assignments. First, we applied a free
association technique to ensure that the item pool represented a proper sample of the domain
of the EMQ construct (content validity). Members of the expert panel were requested to
freely suggest items that should be included in a measure of EMQ (cf. Netemeyer et al.,
2003). A second assignment was used to judge the applicability of the proposed list of 69
items to the EMQ concept under study (cf. Hardesty and Bearden, 2004). The experts were
asked to evaluate the applicability of each of the items (cf. Malhotra, 1981) to two existing
C2C EMs in the Netherlands. The applicability was registered using a formalized rating
procedure (cf. Hambleton and Rogers, 1991; Tittle, 1982), consisting of a seven-point scale
ranging from “very inapplicable” to “very applicable”. In addition to their ratings, we asked
the experts to explain their opinions and propose improvements. Finally, in the third
assignment an overview of all 69 preliminary EMQ items was presented to the experts, after
which they were requested to suggest additional items. Although this assignment had the
same goals as the first assignment, it was not based on free association and thus might lead to
the mention of other interesting items.
In order to identify possible refinements of the item pool, a mixture of qualitative and
quantitative procedures was then applied (cf. Netemeyer et al., 2003). Building upon the
results of the second assignment, we identified candidates for rewording or deletion using a
variant of the so-called “sumscore” technique (Hardesty and Bearden, 2004) by calculating
an average applicability score for each item. Those items that received an average
applicability rating lower than 6 (i.e., “quite applicable”) for at least one of the two EMs,
were considered candidates for rewording or deletion (cf. Bearden et al., 2001). The
explanations given by the experts during the second assignment were also taken into
consideration in the final decision. Two criteria were used to evaluate and select additional
items as suggested in the first and third assignment. The items had to be applicable to our
definition of EMQ and be mentioned multiple times by the experts (cf. Osgood et al., 1957).
The items were then edited and reworded where necessary. Of the initial item pool of 69
items, 9 items remained unedited, 23 items were reworded, 35 items were removed and 35
new items were added. The result was an updated item pool of 69 items that could be used for
further pretesting.
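The screening rule described above can be illustrated with a minimal sketch. It assumes that each item was rated by every expert for both EMs on the seven-point applicability scale; the item identifiers, EM names and ratings below are hypothetical and serve only to show the rule, not the data collected in the pretest.

```python
from statistics import mean

THRESHOLD = 6  # "quite applicable" on the seven-point applicability scale

def flag_items(ratings, threshold=THRESHOLD):
    """Return items whose mean rating is below the threshold for at least one EM.

    ratings: {item_id: {em_name: [expert ratings on the 1-7 scale]}}
    """
    flagged = []
    for item, per_em in ratings.items():
        means = {em: mean(scores) for em, scores in per_em.items()}
        if any(m < threshold for m in means.values()):
            flagged.append(item)
    return flagged

# Hypothetical ratings from three experts for two marketplaces.
example = {
    "item_01": {"EM_A": [7, 6, 6], "EM_B": [6, 7, 7]},
    "item_02": {"EM_A": [6, 6, 7], "EM_B": [5, 6, 6]},
}
print(flag_items(example))  # ['item_02']
```

Flagged items are only candidates for rewording or deletion; as noted above, the experts' explanations also informed the final decision.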
3.4 Linguistic Test of Semantic Bipolarity
Given that bipolarity should be tested empirically rather than assumed (Cacioppo and
Bernston, 1994; Schriesheim and Klich, 2001), we conducted two linguistic tests. The
primary objective of the first test was to assess the linguistic contrast of the distinct scale
anchors. We adopted the method as applied by Dickson and Albaum (1977) and split the
antonyms into one list of positive anchors and one list of negative anchors. The two lists were
then transformed into two online questionnaires asking subjects to fill in the missing
linguistic opposites. The test was performed by making use of a convenience sample of
native speakers of American English. An e-mail invitation was sent to friends and relatives
living in the USA, inviting them to participate in the test by clicking on a hyperlink that
would direct them to one of the two online questionnaires 1 . The sample of native speakers of
American English ensured linguistic homogeneity, which has been argued to increase the
likelihood that respondents apply the same meaning to the antonyms being presented to them
(Dickson and Albaum, 1977). To ensure that the respondents had a higher than average
intelligence, enabling them to better differentiate between the meaning of words (cf. Osgood
et al., 1957), we decided only to include respondents who had attended college 2 . The final
data consisted of 32 cases. The results of the test confirmed the selected list of anchors and
indicated that none of the anchors needed to be reworded.
After having established linguistic contrast, we assessed psychological bipolarity (see
Caroll, 1959; Green and Goldfried, 1965; Heise, 1969), using a pretest with an expert panel.
The panel members included the eleven academic researchers and seven practitioners who
also participated in the test of concept-scale pairing (section 3.3). The experts were asked to
judge and evaluate the applicability of the bipolar scale anchors defining the proposed list of
69 semantic phrases (cf. Mindak, 1961) in relation to the concept measured. For each
antonym pair the experts judged whether the two anchors linguistically aligned with the
corresponding concept (cf. Heise, 1969) by answering a “Yes-No” question, and were asked to
explain their opinions and suggest improvements. Based on their answers, the wording of some
anchors was slightly modified to align them more closely with their concepts (i.e., to assure
psychological bipolarity).
1 The hyperlink led to a webpage that hosted a program script. The script automatically redirected
each respondent to one of the two online questionnaires, thereby splitting the sample randomly.
2 A scale for educational level (Mittal and Kamakura, 2001) was included to probe the highest
education level the respondents had achieved. Those respondents who had not attended college were
excluded from the analysis.
3.5 Linguistic Test of Semantic Differential Wording
To assess the credibility of the scale wording of the semantic differential, a draft
questionnaire was constructed and thoroughly tested via a pretest. We constructed the draft
questionnaire using wording guidelines as presented in the scaling and pretest literature (e.g.,
Foddy, 1993 & 2004; Kahn and Cannell, 2004). In some bipolar scales that contained
potentially difficult or ambiguous words, the context or some examples of that particular
word were included at the end of the item, put between brackets (cf. Kahn and Cannell,
2004). As is common in semantic differential measurement, a seven-point scale was used 3 .
American English was selected as the base language of the draft questionnaire. The
questionnaire was constructed and checked for any mistakes by two translators who were not
only fluent in American English but also familiar with the concept under study (cf. Sekaran,
1983). The questionnaire was then translated into the language in which the questionnaire
was to be administered (i.e., Dutch), using a combination of the back translation and the
parallel or double translation technique 4 .
To investigate whether the Dutch draft questionnaire contained any faults in the wording
that could result in comprehension difficulties or would jeopardize the previously tested
assumptions of concept-scale pairing, bipolarity and wording interpretability, a pretest was
held using an expert panel (cf. Foddy, 1993 & 1998). Three academic experts with a
background in both questionnaire design and e-commerce research were asked to evaluate the
wording of each bipolar scale and its introduction (cf. Blair and Presser, 1992; Reynolds and
Diamantopoulos, 1998). Based on the pretest literature (e.g., Foddy, 1993; Reynolds et al.,
1993), a list was made consisting of commonly made faults in bipolar scale item wording.
The faults referred to potential incomprehensibility, complexity and ambiguity of the bipolar
scales, their introductions or their concepts. The experts were asked to evaluate the likelihood
that the bipolar scales were subject to these faults, using three-point rating scales (“certainly”,
“possibly”, “no”; cf. Cannell, Fowler, Kalton, Oksenberg and Bischoping, 2004). Open-ended
questions were added, enabling respondents to explain or comment on each rating.
Finally, we had the experts evaluate the overall questionnaire by asking them whether it
contained any other faults that could lead to incorrect interpretations. After the pretest the
EMQ scale was slightly modified by changing some words into synonyms and by shortening
some items without changing their meanings. The modifications resulted in a preliminary
instrument (see Appendix A for the English version) that was used as starting point for a pilot
study.
3 Response categories ranged from “very <negative anchor>” (1) to “very <positive anchor>” (7) with
a midpoint labeled “neutral” (4) (cf. Watts Sussman and Sproull, 1999; McKinney et al., 2002).
4 First a bilingual speaker whose base language is Dutch translated the English questionnaire into
Dutch (cf. Malhotra, Agarwal and Peterson, 1996). A second bilingual speaker whose base language
is English then compared this Dutch questionnaire to the original English questionnaire. Afterwards
both bilingual speakers discussed whether the translation of the questionnaire was appropriate (cf.
Song and Parry, 1997).
3.6 Pilot to Purify the Semantic Differential
To purify the semantic differential, a laboratory experiment was conducted. We used a
student sample for this pilot test, which simplified its administration and limited the
extraneous variance both within and across scales (Greenberg, 1987; Peterson, 2001) that
could impede the validity of the study.
Procedure
A sample of 196 undergraduate students following a mandatory information systems course
at a Dutch university participated in the experiment. The participants were instructed to study
four different EMs (eBay.nl and three Dutch EMs facilitating C2C exchanges in the
Netherlands), to focus on the purchase of a digital camera 5 , and to complete each visit by
filling in an online questionnaire addressing perceptions of EMQ. The entire experiment was
conducted in a lab, consisting of identical computer systems, and was monitored by a
supervisor of the research team. To mitigate the impact of differential familiarity (Srinivasan
et al., 1989) the students were instructed to carefully study each EM according to predefined
tasks (cf. Van der Heijden and Verhagen, 2004). To minimize order bias, a predetermined,
randomized scheme for visiting the EMs was distributed.
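A randomized visiting scheme of this kind can be sketched as follows. The sketch assumes an independent random permutation of the four EMs per participant with a fixed seed, so that the scheme is predetermined and reproducible; the labels EM1 to EM4, the seed and the function name are illustrative assumptions, and the scheme actually used in the pilot may have been constructed differently.

```python
import random

def visiting_scheme(participants, stimuli, seed=42):
    """Assign each participant a random order in which to visit the stimuli."""
    rng = random.Random(seed)  # fixed seed keeps the scheme predetermined
    scheme = {}
    for participant in participants:
        order = list(stimuli)
        rng.shuffle(order)
        scheme[participant] = order
    return scheme

# 196 participants and four electronic marketplaces (labels are placeholders).
scheme = visiting_scheme(range(1, 197), ["EM1", "EM2", "EM3", "EM4"])
print(scheme[1])
```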
Initial Item Analyses:
To further trim the item pool and to get an initial indication of the multidimensional meaning
of EMQ, four steps were taken. First, since literature on the dimensionality of EMQ was
lacking and scholars warn against the use of mere statistical techniques to purify
measurement scales (e.g., Allport and Kerler, 2003), we conducted an item sorting exercise
(cf. Davis, 1989; Moore and Benbasat, 1991). Based on our expertise and the statements from
experts in earlier pretests (cf. Shimp and Sharma, 1987) we grouped the 69 EMQ items in a
preliminary classification of twelve dimensions. Ten faculty members of an academic
institution then judged this classification. Overall, the judgments confirmed the preliminary
classification.
5 The frame of reference, buying a digital camera, was selected since this type of product was offered
in sufficient amounts on the four EMs under study, and it is a product that was deemed to appeal to
students, that they are familiar with and was easy to understand.
Thereupon, a factor analytic approach was adopted (cf. Osgood et al., 1957). The
primary dataset used in the analysis consisted of the scores of the respondents averaged
across the four stimuli (cf. Srinivasan et al., 1989). Principal components analysis with
varimax rotation was applied to extract the factors, remove some items, and achieve an
adequate level of preliminary unidimensionality. This resulted in a preliminary twelve-factor
solution of 56 items (KMO MSA 0.88, Bartlett’s test of sphericity: 11699, p < .001)
accounting for 74.82% of the variance. In general, the resulting factor structure showed a
more than reasonable match with the preliminary classification as established in the item
sorting exercise. To test the generality of the initial factor solution, EFA was run with the
initial item list on each of the four original datasets (cf. Aaker, 1997; Srinivasan et al., 1989).
The results indicated that the factor structure was quite homogeneous across the four datasets.
The 13 items that were removed in the analysis of the pooled data failed to meet EFA criteria
in all datasets. Fourteen other items did show factor loadings similar to those in the pooled dataset,
but not in the case of all original datasets. Accordingly, 27 items were considered primary
candidates for deletion.
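For readers less familiar with the extraction step, the following is a minimal sketch of principal components extraction followed by an orthogonal varimax rotation. It uses NumPy and the widely used SVD-based varimax iteration; the simulated data, the number of factors and the function names are assumptions made for illustration and do not reproduce the analysis reported here.

```python
import numpy as np

def pca_loadings(X, n_factors):
    """Unrotated principal-component loadings from the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]   # indices of the largest eigenvalues
    return eigvecs[:, top] * np.sqrt(eigvals[top])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (items x factors)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                          # orthogonal by construction
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):  # stop when improvement stalls
            break
        criterion = new_criterion
    return loadings @ rotation

# Simulated data: six hypothetical items, three driven by each of two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
X = latent @ weights.T + 0.5 * rng.normal(size=(200, 6))

unrotated = pca_loadings(X, n_factors=2)
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each item (the row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across factors changes.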
Subsequently, the internal consistency of the multidimensional EMQ construct was
investigated for each of the four EMs datasets. Cronbach’s alpha, average interitem
correlations and corrected item-to-total correlations were calculated for each of the twelve
preliminary EMQ dimensions. Items with either Cronbach’s alpha lower than .80 (cf.
Bearden and Netemeyer, 1998; Clark and Watson, 1995), average interitem correlations
lower than .30 (cf. Robinson, Shaver and Wrightsman, 1991) or corrected item-to-total
correlations lower than .50 (cf. Bearden and Netemeyer, 1998) in one or more datasets were
candidates for deletion. In total, 20 items did not meet the internal consistency criteria. Of
these 20 items, 18 were already identified as candidates for deletion in the EFA. The other 2
items were added to the list of 27, resulting in a total of 29 candidates for removal.
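The three internal consistency statistics can be computed as in the sketch below, which uses only the Python standard library. The scores are hypothetical responses of five respondents to a three-item dimension; the thresholds are the ones cited above, and the helper names are our own.

```python
from itertools import combinations
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

def cronbach_alpha(items):
    """items: one list of respondent scores per item of a dimension."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

def average_interitem_correlation(items):
    return mean(pearson(a, b) for a, b in combinations(items, 2))

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    result = []
    for idx, item in enumerate(items):
        rest = [sum(scores) - scores[idx] for scores in zip(*items)]
        result.append(pearson(item, rest))
    return result

def fails_criteria(items, alpha_min=0.80, interitem_min=0.30, item_total_min=0.50):
    """True if the dimension misses at least one of the three thresholds."""
    return (cronbach_alpha(items) < alpha_min
            or average_interitem_correlation(items) < interitem_min
            or min(corrected_item_total(items)) < item_total_min)

# Hypothetical seven-point scores: three items, five respondents.
scores = [[5, 6, 4, 7, 3], [5, 7, 4, 6, 3], [6, 6, 5, 7, 2]]
print(round(cronbach_alpha(scores), 2), fails_criteria(scores))
```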
Finally, we checked the 29 items for wording redundancy, face validity and content
validity. Based on this relatively subjective analysis, it was decided to retain 10 items even
though these items did not meet the EFA or internal consistency criteria (cf. Netemeyer et al.,
2003). The remaining 19 items were removed. The result was a preliminary EMQ scale consisting of 50
items that could be used for further testing.
3.7 Scale Purification
To purify the semantic EMQ scale, we adopted a dyadic approach building upon two
independent samples (cf. Viswanathan et al., 2000) of real EM visitors 6 . The first sample
consisted of 1428 visitors (Appendix B) of eBay.nl (EM1), the Dutch version of eBay.com.
The second sample consisted of 1051 visitors (Appendix B) of the Dutch EM with the largest
market share of EMs in the Netherlands (EM2). This EM is well known in the Netherlands
for using its classifieds system to bring together buyers and sellers, who then usually meet in
offline settings for product inspection and transaction settlement. The dyadic approach was
selected as a method of strong validity testing (cf. Viswanathan et al., 2000) since it implies
that validity, dimensionality and reliability are assessed independently, thereby extending the
generalizability of our findings and contributing to the strength of the semantic differential
development process.
Procedure
Like the pilot study, the focus of the research was on the purchase of a digital camera. On the
websites of both EMs, banners were placed in the digital camera section, inviting visitors
to participate in the survey voluntarily 7 . For the purpose of predictive validity testing, the
questionnaire included not only the preliminary list of 50 semantic differentials but also
two additional scales to measure the attitude towards purchasing and the intention to purchase,
respectively (Appendix F). This decision was supported by literature indicating that
perceptions of website quality are likely to affect consumer purchasing in general (Aladwani
and Palvia, 2002; Kim and Stoel, 2004a, 2004b; Yang et al. 2005), and consumer purchase
attitudes and intentions in particular (Wolfinbarger and Gilly, 2003). The two additional
measurement scales were taken from Van der Heijden, Verhagen and Creemers (2003). Both
scales were slightly adjusted to fit the target specificity of our research.
Results
Initial Test of Scale Dimensionality: To develop a better understanding of the underlying
structure of the semantic differential (cf. Gerbing and Hamilton, 1996), we first applied EFA
using the principal components model with orthogonal varimax rotation. Following a
recommended observation-to-item ratio of ten-to-one (Hair, Anderson, Tatham and Black,
1998) we decided to draw a subsample of 500 observations (50 items) from each dataset.
EFA was run (Appendix C). The data met the thresholds for sampling adequacy (sample 1:
KMO MSA 0.932, Bartlett’s test of sphericity 23194.964, p < .001; sample 2: KMO MSA
0.902, Bartlett’s test of sphericity 21269.033, p < .001). The two datasets revealed identical
factor solutions. Except for the item Instit8, which loaded high on the factor Seller
Information for sample 2, all items strongly loaded solely on their underlying factor. As such,
preliminary though strong evidence for unidimensionality, convergent validity and
discriminant validity was provided. To obtain first indications of construct reliability,
Cronbach’s alphas were computed. All alphas exceeded the standards for established research
(> 0.70; Hair et al., 1998).
6 The selection of real EM visitors as respondents extends the external validity of the scale to a
non-student population (Aaker, 1997; Webb, Green and Brashear, 2000), and contributes to the
generalizability of our findings due to the heterogeneous nature of both samples (Aaker, 1997).
7 When clicking on the banner, respondents were redirected to an online questionnaire. As an incentive,
respondents could enter a raffle for a book token of 20 Euro by filling in their e-mail address.
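The two sampling adequacy statistics reported above can be computed from an item correlation matrix as sketched below. The sketch uses NumPy and the standard textbook formulas for Bartlett's test of sphericity and the overall Kaiser-Meyer-Olkin (KMO) measure; the small equicorrelated matrix and the sample size of 100 are illustrative assumptions, not the study data.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test.

    R: item correlation matrix (p x p); n: number of observations.
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    statistic = -((n - 1) - (2 * p + 5) / 6) * logdet
    return statistic, p * (p - 1) // 2

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                    # partial correlations off the diagonal
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal cells
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Illustrative 3 x 3 correlation matrix with all correlations equal to .50.
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
chi2, df = bartlett_sphericity(R, n=100)
print(round(chi2, 2), df, round(kmo(R), 3))
```

A KMO value close to 1 and a significant Bartlett statistic both indicate that the correlations among items are strong enough for factor analysis to be meaningful.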
Test of Latent Factor Structure: Following Gerbing and Anderson (1988) and Gerbing and
Hamilton (1996), we validated the extracted latent structure using Confirmatory Factor
Analysis (CFA) 8 .
The remaining data of both samples was used for the purpose of the confirmation (i.e.,
data not used for the EFA), resulting in two independent sub-samples (sample 1: n = 928;
sample 2: n = 551). Taking into consideration both model size and Maximum Likelihood
Estimation as the most common estimation procedure, the size of the samples seemed more than
acceptable (cf. Hair et al., 1998). Amos 5.0 with maximum likelihood estimation was used
for the analysis (Arbuckle and Wothke, 1999; Arbuckle, 2003).
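The split into an exploratory subsample and a confirmatory holdout can be sketched as follows. The sketch assumes a simple random draw without replacement and a fixed seed; the text does not state how the 500 observations were drawn, so the drawing procedure, the seed and the function name are assumptions.

```python
import random

def split_sample(respondent_ids, n_items, ratio=10, seed=1):
    """Draw ratio * n_items cases for EFA; the remaining cases are kept for CFA."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    n_efa = ratio * n_items  # ten observations per item
    return ids[:n_efa], ids[n_efa:]

# Sample 1: 1428 visitors and 50 items give 500 EFA cases and 928 CFA cases.
efa_ids, cfa_ids = split_sample(range(1428), n_items=50)
print(len(efa_ids), len(cfa_ids))  # 500 928
```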
To assess the EFA solution, we first tested a correlated first-order model (cf. Doll et al.,
1994; Yang et al., 2005). The model consisted of the twelve extracted basic dimensions,
functioning as inter-correlated first-order factors. The fit indices of the initial solutions of
both datasets highlighted the need for model improvement (sample 1: χ2 = 4062.627,
p < .001; GFI 0.84; AGFI 0.82; RMR 0.07; NFI 0.90; TLI 0.92; CFI 0.93; RMSEA 0.054 /
sample 2: χ2 = 3066.442, p < .001; GFI 0.81; AGFI 0.78; RMR 0.099; NFI 0.88; TLI 0.91; CFI
0.92; RMSEA 0.057). Although measures such as TLI, CFI and RMSEA revealed acceptable
fit, both the chi-square test and the fit indices GFI, AGFI, RMR and NFI were below
recommended standards. Following Gerbing and Anderson (1988, p.417), we then decided to
study the pattern of residuals as one of the most useful sources to locate misspecification.
Thirteen items (Appendix D) shared large positive and negative residuals with items of other
factors. We deleted these items and re-estimated the model. The chi-square statistic again
demonstrated poor fit for both samples (sample 1: χ2 = 1226.605, p < .001 / sample 2: χ2 =
1154.230, p < .001). It has been recognized, however, that the significant chi-square statistic is
sensitive to large sample sizes and the complexity of the model (Bearden, Sharma and Teel,
1982; Bentler and Bonnet, 1980; Stewart and Segars, 2002). Other fit indices that are less
sensitive to sample size are more useful to determine model fit and make model comparisons
(Bagozzi, Yi and Phillips, 1991; Hair et al., 1998). For both samples, well-accepted fit indices
such as GFI, AGFI, RMR, RMSEA, NFI, TLI and CFI demonstrated good to very good fit
with the data (see table 2), thereby supporting the unidimensionality of the factors and the
multidimensionality of the EMQ scale.
8 Although EFA is a preferred method to identify relatively unknown measure structures (Gerbing and
Hamilton, 1996), it does not explicitly test for unidimensionality because in EFA each factor is
defined as a weighted sum of all observed variables. This implies that the extracted factors do not
correspond directly to an exclusive, predefined subset of indicators (Gerbing and Anderson, 1988, p.
189).
To further investigate the dimensionality of the 37-item EMQ instrument, and to test for
any underlying structure, two alternative models were tested. Drawing upon Doll et al. (1994)
we tested a model consisting of twelve uncorrelated first-order factors and a one-factor model
relating all single items to one first-order EMQ factor 9 . If these models showed acceptable
fit, this would not only refute the notion that the twelve factors share an underlying structure,
but also that the EMQ concept is multidimensional in nature.
Table 2: CFA results for sample 1 and sample 2
Model                                      χ2                  Df    GFI   AGFI  RMR   RMSEA  NFI   TLI   CFI
Sample 1 (n = 928)
Twelve first-order factors (correlated)    1226.605 (p<.001)   562   .93   .92   .059  .036   .96   .97   .98
Twelve first-order factors (uncorrelated)  4418.899 (p<.001)   629   .68   .64   .485  .081   .85   .86   .87
One first-order factor                     18256.962 (p<.001)  629   .47   .40   .228  .174   .36   .33   .37
Sample 2 (n = 551)
Twelve first-order factors (correlated)    1154.230 (p<.001)   562   .90   .87   .087  .044   .94   .96   .97
Twelve first-order factors (uncorrelated)  3058.767 (p<.001)   629   .67   .63   .487  .084   .83   .85   .86
One first-order factor                     12004.672 (p<.001)  629   .42   .35   .258  .181   .32   .29   .33
9 Another model, consisting of 12 first-order factors loading on a second-order factor representing the
overall concept of EMQ, was considered but not included. Following Chin (1998), postulation of such
a model had little value since the second-order factor was not expected to fully mediate the
relationships between the first-order factors and other variables in the pre-specified conceptual model.
As such, and given the stage of our research, inclusion of a second-order factor was inapplicable.
The results (table 2) indicated unacceptable fit for the two alternative models. It was
concluded that the correlated twelve first-order factor model is most applicable to model
EMQ. The twelve EMQ dimensions were then defined (table 3).
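As a worked check on the model comparison, the RMSEA values in table 2 can be recomputed from the reported chi-square statistics, degrees of freedom and sample sizes. The sketch below uses the common point-estimate formula with N - 1 in the denominator; this choice of denominator is an assumption, since the text does not state which variant the software applied.

```python
from math import sqrt

def rmsea(chi2, df, n):
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Chi-square, df and sample size as reported in table 2.
print(round(rmsea(1226.605, 562, 928), 3))   # 0.036 (sample 1, correlated)
print(round(rmsea(4418.899, 629, 928), 3))   # 0.081 (sample 1, uncorrelated)
print(round(rmsea(18256.962, 629, 928), 3))  # 0.174 (sample 1, one factor)
print(round(rmsea(1154.230, 562, 551), 3))   # 0.044 (sample 2, correlated)
```

The recomputed values match those in table 2, which shows how strongly the correlated twelve-factor model outperforms the two alternatives once the chi-square statistic is adjusted for model complexity and sample size.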
Table 3: overview of EMQ dimensions and their definitions
EMQ dimension: Definition
Layout: The buyer’s experience of the layout of the EM as being attractive and up to date.
Ease of Use: The perceived usability of the EM, including navigation options, site structures and
ease of learning how to use it.
Contacting the intermediary: Perceptions of the amount of information and options provided at
the EM that enable buyers to get in touch easily with the intermediary facilitating the EM.
Institutional control: Perceptions of the measures applied by the intermediary, such as
guarantees, privacy policy and rules, to protect buyers and regulate the EM.
Community: The perceived ability of buyers to share one’s experiences and communicate with
other buyers.
Contacting sellers: Perceptions of the amount of information and options provided at the EM
that enable buyers to get in touch with sellers easily.
Seller information: The perceived amount and clearness of the information provided about
sellers and their reputation.
Product information: The impression a buyer has about the way sellers describe and represent
the products offered at the EM.
Pricing mechanisms: The perceived clearness and convenience of the mechanism that is used to
establish and communicate prices at the EM.
Assortment: Overall buyer’s perception of the assortment at the EM, including a) the size of the
assortment and b) alignment of the assortment with one’s interests.
Settlement: The easiness and clearness of methods used for paying and receiving products
bought at the EM, as perceived by buyers.
Meeting sellers: The buyer’s perceived ease of meeting sellers in offline settings to inspect, pay
for and pick up products.
Reliability and Validity Testing: Having established the dimensionality of the semantic
differential, we examined its reliability and validity (table 4).
Table 4: Reliability and validity statistics for sample 1 and sample 2
(Min. ITC = minimum item-to-total correlation)
                              Sample 1 (n=928)          Sample 2 (n=551)
Dimension                     α     Min. ITC  AVE       α     Min. ITC  AVE
Layout                        .90   .776      .75       .90   .783      .75
Ease of use                   .88   .705      .73       .90   .735      .74
Contacting the intermediary   .96   .903      .89       .95   .867      .87
Institutional control         .91   .743      .71       .90   .747      .71
Community                     .84   .655      .64       .87   .740      .70
Contacting sellers            .94   .862      .83       .95   .869      .88
Seller information            .90   .715      .77       .91   .715      .80
Product information           .84   .659      .65       .77   .526      .57
Pricing mechanisms            .87   .653      .70       .87   .723      .70
Assortment                    .94   .861      .85       .91   .751      .79
Settlement                    .92   .782      .81       .93   .832      .81
Meeting sellers               .92   .817      .80       .93   .826      .81
The reliability statistics indicate good reliability for the twelve EMQ dimensions. Except for
Product Information (α= 0.77, sample 2), all Cronbach’s alphas surpass the 0.80 level. For all
dimensions, the Average Variance Extracted (AVE) exceeds the 0.50 threshold prescribed in
the literature (e.g., McKinney et al., 2002; Ping, 2004).
We then tested for convergent, discriminant and predictive validity (cf. Dabholkar et al.,
1996, p. 12). Convergent validity was assessed by AVE’s, Cronbach’s alphas and minimum
item-to-total correlations. All AVE’s exceed the recommended level of 0.50 (Segars, 1997;
Yi and Davis, 2003) and, except for the dimension Product Information (α= 0.77, sample 2),
all alphas surpass the 0.80 guideline (see Ping, 2004). The minimum item-to-total
correlations reveal high correlations, all exceeding the criterion of 0.40 (see Jayanti and
Burns, 1998), hereby providing strong additional support for convergent validity.
To test for discriminant validity, we studied the within-construct item-correlations for
each of the twelve EMQ dimensions and compared these loadings with cross-loadings on
items of other dimensions (cf. Ko, Kirsch and King, 2005). All within-construct item
loadings were higher than their cross-loadings, and no cross-loadings above .70 were
observed, implying discriminant validity (Ping, 2004). To further assess discriminant validity,
we measured the differences between the squared correlations between dimensions and their
individual AVE. Since the value of squared correlations was less than either of their
individual AVE’s for all pairs of dimensions we tested for, discriminant validity was
confirmed (Fornell and Larcker, 1981; Yi and Davis, 2003).
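The AVE-based check can be made concrete with a short sketch. It assumes standardized factor loadings, in which case the AVE of a dimension is the mean of its squared loadings, and it treats discriminant validity for a pair of dimensions as supported when their squared correlation is below both AVEs. The loadings and correlations used below are hypothetical.

```python
def ave(loadings):
    """Average variance extracted from the standardized loadings of one dimension."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def discriminant_ok(ave_a, ave_b, correlation):
    """True if the squared inter-dimension correlation is below both AVEs."""
    return correlation ** 2 < min(ave_a, ave_b)

# Hypothetical standardized loadings for a three-item dimension.
ave_a = ave([0.80, 0.90, 0.85])
print(round(ave_a, 3))                     # 0.724
print(discriminant_ok(ave_a, 0.75, 0.60))  # True: 0.36 is below both AVEs
print(discriminant_ok(0.57, 0.75, 0.80))   # False: 0.64 exceeds 0.57
```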
Finally, we assessed the predictive validity of the EMQ scale. For both samples, the
EMQ dimensions were regressed on the attitude towards purchasing and the intention to
purchase. Table 5 reports the results.
Table 5: Standardized regression coefficients of EMQ dimensions on attitude and intention
                              Sample 1 (n = 928)      Sample 2 (n = 551)
Dimension                     Attitude   Intention    Attitude   Intention
Layout                        -.01       .00          .00        .07
Ease of use                   -.03       -.05         .02        -.06
Contacting the intermediary   -.00       -.00         -.06       .08
Institutional control         .08        .06          -.02       -.08
Community                     -.06       -.04         .03        .04
Contacting sellers            -.07       -.00         -.12       -.03
Seller information            .02        -.08         .07        -.04
Product information           .28**      .20**        .11        .00
Pricing mechanisms            .11*       .07          .01        -.07
Assortment                    .16**      .20**        .24**      .03
Settlement                    .03        .08          .08        .07
Meeting sellers               .16**      .05          .25**      .00
R squared                     .27        .16          .26        .02
Adjusted R squared            .26        .15          .24        .00
** significant at p < .001, * significant at p < .01
A multicollinearity test revealed that the regression analysis had not been subject to
multicollinearity 10 . For both samples, the EMQ scale explains around 25% of the variance in the
attitude towards purchasing. Given the level of target specificity of the attitude construct, these
findings are quite encouraging. For the purchase intention, however, the results are less clear.
Even though explaining 16% of the online purchase intention for sample 1, the EMQ
dimensions do not add to the behavioral intention variance for sample 2. When focusing on
the impact of the individual dimensions for sample 1, Product Information, Pricing
Mechanisms, Assortment and Meeting Sellers are significant attitude and intention
determinants. For sample 2, only Assortment and Meeting Sellers have a significant impact
on the attitude. We believe these differences in significance are likely to be explained by
focusing on the extent to which both EMs support the different stages of the transaction
process (see also Grieger, 2003; Skjøtt-Larsen, Kotzab and Grieger, 2003). As mentioned
previously, EM1 is used by consumers to search for products, engage in bidding, and
complete transactions online. EM2, however, mainly matches buyers and sellers, who then
meet offline for further product inspection, negotiation and transaction settlement. The
absence of online transactions is likely to explain the insignificant influence of EMQ on online
purchase intentions for EM2. Moreover, we believe it clarifies why dimensions such as
Product Information and Pricing Mechanisms do not account for any variance in the purchase
attitude for this EM.
10 VIF-scores were computed. For sample 1 the highest VIF score was 1.982. For sample 2 the highest
VIF score was 1.871. All VIF values were below the cutoff value of 10 (Hair et al., 1998).
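For reference, variance inflation factors can be obtained as the diagonal of the inverse of the predictor correlation matrix, which equals 1 / (1 - R²) for each predictor regressed on the others. The sketch below uses NumPy; the two predictor columns are hypothetical scores and are not taken from the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor per predictor (columns of X)."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

# Two hypothetical, positively correlated predictor columns.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
X = np.column_stack([x1, x2])

scores = vif(X)
print(np.round(scores, 2), bool((scores < 10).all()))
```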
3.8 Test of Contextual Contamination
After having established the dimensionality, validity and reliability of the semantic
differential, we decided to address its sensitivity to what is known as contextual
contamination in the semantic differential literature (Landon, 1971; Osgood et al., 1957).
Contextual contamination concerns the presence of an order bias due to the likelihood that
later responses in the semantic differential are biased by previous questions (Landon, 1971, p.
375). As widely discussed in the psychological literature (e.g., Feldman and Lynch, 1988;
Tourangeau and Rasinski, 1988; Tversky and Kahneman, 1974), order bias occurs when
respondents give answers that are consistent or inconsistent with beliefs rendered accessible
by a previous response (Bickart, 1993, p. 52). Consistent answers are known as carryover
effects, while inconsistent answers have been referred to as backfire effects (see Bickart,
1993). Both carryover and backfire effects are important sources of measurement error and
can cause contextual contamination.
To assess the sensitivity of the EMQ scale to contextual contamination, a test was
conducted. Following Bickart (1993) and Landon (1971), we adopted a procedure consisting
of the estimation of shifts in mean evaluations. Using a quasi-experimental design, we
assessed the means of the twelve EMQ dimensions in the same sequence as used in the
development process (see Appendix C) and compared the results with the means of the EMQ
dimensions when put in reversed order.
Procedure
A sample of 192 undergraduate students following a mandatory information systems course
at a Dutch university was invited to participate in an experimental survey. The experimental
survey consisted of the study of an EM and filling in an online questionnaire addressing
perceptions of EMQ. The participants received the same instructions as those who
participated in the pilot study (see section 3.6). eBay.nl (EM1) and the Dutch EM with the
largest market share in the Netherlands (EM2) were selected as stimuli (cf. section 3.7). The
students were assigned randomly to one of the two EMs. For the sake of convenience, we
decided to hand out the instructions via a digital learning environment, and allow students to
participate in the project either at home or on campus.
To test for contextual contamination, three different versions of the questionnaire were
constructed and randomly assigned to the participants. Following Landon (1971), the first
version of the questionnaire presented the twelve EMQ dimensions and their items in the
order as used previously in the development process (standard order). In the second version, the
twelve dimensions were presented in reversed order (dimension reversed order). The third
version of the questionnaire extended the work of Landon by reversing the order of the items
within each dimension (item reversed order).
Results
To analyze the data and detect shifts in means, a series of independent t-tests was conducted.
We computed the means for the twelve dimensions, and compared the means in the standard
order EMQ questionnaire against the means in the two modified versions of the questionnaire
(Appendix E). The results demonstrate few significant shifts in the means of the dimensions
across the different groups. The reversal of the items within each dimension (item reversed
order) did not result in any different evaluations of the dimensions, implying an absence of
within-factor contextual contamination. The reversal of the dimensions (dimension reversed
order) did result in a few significant differences (EM1: Contacting the Intermediary, Meeting
Sellers; EM2: Community and Settlement). Although these differences are significant, a
visual inspection reveals that a pattern, indicating a structural shift in means across
subsequent dimensions, is lacking. Thus, the shift in means for one dimension is not carried
over to following dimensions. Evidently, the test did not reveal potential contextual
contamination weaknesses in the semantic differential. As such it verifies the steps taken
previously in the development process, including the tailoring of items and their antonyms to
the concept under study, and the appropriate usage of the factor analytic approach.
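The comparison of dimension means across questionnaire versions can be sketched as an independent-samples t-test. The sketch below computes Student's pooled-variance t statistic with the Python standard library; whether a pooled or an unequal-variance test was applied is not stated in the text, so the pooled form is an assumption, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t statistic and degrees of freedom for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical dimension scores under the standard and the reversed order.
standard_order = [1, 2, 3, 4, 5]
reversed_order = [2, 3, 4, 5, 6]
t, df = pooled_t(standard_order, reversed_order)
print(round(t, 2), df)  # -1.0 8
```

The resulting t value is compared with the critical value for the given degrees of freedom; repeating the test for each of the twelve dimensions yields the pattern of shifts discussed above.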
3.9 Cross-Validation
Finally, we cross-validated the semantic differential with new data (cf. Dabholkar et al.,
1996; Viswanathan et al., 2000). The data was collected in a Dutch EM facilitated by a
publisher of one of the largest daily newspapers in the Netherlands. Following the dyadic
approach as adopted previously (section 3.7), two independent samples were collected. The
samples included 863 visitors (Appendix C) of the automobile section of the EM (sample 3),
and 590 visitors (Appendix C) of the study books section of the EM (sample 4). The fact that
the cross-validation focused on the purchase of two products (cars and study books) other
than the one used for scale purification (digital camera) can be seen as a test of the robustness
of the EMQ scale.
Procedure
Banners were placed within the two sections of the EM, inviting visitors to participate
voluntarily in the research by filling in an online questionnaire (footnote 11). The online questionnaire
addressed basic demographics, perceptions of EMQ, purchase attitude and purchase intention
(cf. section 3.7). For the purpose of extended predictive validity testing, two additional
constructs were included, namely website satisfaction and loyalty intentions (Appendix F).
Both constructs are hypothesized to be affected by website quality perceptions (Wolfinbarger
and Gilly, 2003). The website satisfaction measure was taken directly from Szymanski and
Hise (2000) who, building upon established consumer satisfaction literature (e.g., Spreng,
MacKenzie and Olshavsky, 1996; Oliver, 1980; Zeithaml, Berry and Parasuraman, 1996),
applied a seven-point, two-item semantic differential. Both the work of Szymanski and Hise
(2000) and a re-examination conducted by Evanschitzky, Iyer, Hesse and Ahlert (2004),
corroborate the applicability of the items and strongly confirm the validity and reliability of
this semantic differential for website satisfaction. We slightly adapted the introduction to the
two bipolar items to make them more applicable to the context of our research. We then used
the Webster’s Collegiate Thesaurus (1988) to assess the linguistic contrast of the antonyms,
and subjectively judged the psychological bipolarity of the scales. The analysis supported
replication of the original items. To measure loyalty intentions, four Likert scale items were
taken from the work of Sirdeshmukh, Singh and Sabol (2002). The items were adapted to
reflect online loyalty intentions with respect to purchasing in an EM. Next, using three
academic experts in the field of questionnaire design, both instruments for website
satisfaction and loyalty intentions were translated into Dutch and then evaluated in terms of
concept-scale pairings (cf. Hardesty and Bearden, 2004), wording and interpretability (cf.
Cannell et al., 2004; Foddy, 1993; Reynolds et al., 1993).
Footnote 11: As an incentive, a raffle of 20 book tokens of 10 euro each was announced. The participants were asked
to fill in their e-mail address to take part in the raffle. The e-mail addresses were also used to verify that
respondents participated in no more than one survey.
Results
Tests of Scale Dimensionality, Reliability and Validity
The dimensionality of the correlated twelve first-order factor model was re-assessed using
Amos 5.0 with maximum likelihood estimation. Except for the chi-square tests (sample 3:
χ2 = 1352.609, df = 562, p < .001; sample 4: χ2 = 1039.206, df = 562, p < .001), all fit indices
demonstrated very good fit with the data for sample 3 (GFI .92; AGFI .90; RMR .062;
RMSEA .040; NFI .96; TLI .97; CFI .98) and sample 4 (GFI .91; AGFI .89; RMR .049;
RMSEA .038; NFI .95; TLI .97; CFI .98). As such, the results strongly confirm the
twelve-dimensional meaning of the semantic differential. Next, we re-assessed the reliability
and validity of the semantic differential (cf. Gerbing and Anderson, 1988; Netemeyer et al.,
2003). Table 6 displays the results.
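As a plausibility check on the reported fit statistics, RMSEA can be recomputed from the chi-square value, the degrees of freedom and the sample size. The sketch below assumes the common formulation with N - 1 in the denominator; software packages may differ slightly in this respect, and the function name `rmsea` is illustrative only.

```python
import math


def rmsea(chi_square, df, n):
    """Root mean square error of approximation (N - 1 formulation assumed)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square, df and sample size as reported for samples 3 and 4
reported = [
    ("sample 3", 1352.609, 562, 863),
    ("sample 4", 1039.206, 562, 590),
]

for label, chi2, df, n in reported:
    print(f"{label}: RMSEA = {rmsea(chi2, df, n):.3f}")
```

Under this assumption the recomputed values round to .040 and .038, in line with the figures reported for the two samples.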
Table 6: Reliability and validity statistics for sample 3 and 4

                               Sample 3 (n = 863)         Sample 4 (n = 590)
Dimension                      α     Min. ITC   AVE       α     Min. ITC   AVE
Layout                         .92   .803       .71       .93   .829       .73
Ease of use                    .93   .832       .75       .93   .807       .72
Contacting the intermediary    .96   .882       .82       .96   .894       .83
Institutional control          .95   .843       .76       .95   .849       .76
Community                      .89   .744       .61       .90   .776       .63
Contacting sellers             .95   .894       .82       .96   .891       .83
Seller information             .94   .800       .77       .94   .801       .76
Product information            .89   .754       .61       .86   .705       .55
Pricing mechanisms             .92   .791       .70       .90   .765       .66
Assortment                     .93   .807       .73       .94   .831       .77
Settlement                     .93   .823       .73       .96   .893       .83
Meeting sellers                .95   .858       .80       .92   .814       .71

Min. ITC = minimum item-to-total correlation; AVE = average variance extracted.
Using the same criteria applied in the scale-purification process, the alphas, AVEs and
minimum item-to-total correlations indicate good to very good reliability and strongly
confirm the convergent validity of the twelve EMQ dimensions. Applying the same
procedures as described previously (section 3.7), discriminant validity was also strongly
confirmed.
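For readers who wish to reproduce statistics of this kind, the sketch below shows how Cronbach's alpha and the average variance extracted (AVE) are commonly computed. It is an illustration under assumptions: AVE is taken as the mean of the squared standardized loadings (following Fornell and Larcker), and the item scores and loadings are hypothetical rather than taken from samples 3 or 4.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    # Total score per respondent across the k items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def average_variance_extracted(loadings):
    """AVE as the mean of squared standardized factor loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical responses of five respondents to a three-item dimension
items = [
    [6, 5, 7, 4, 6],
    [6, 4, 7, 5, 6],
    [7, 5, 6, 4, 5],
]
# Hypothetical standardized loadings for the same three items
loadings = [0.88, 0.84, 0.81]

print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"AVE   = {average_variance_extracted(loadings):.2f}")
```

An AVE above .50 is the conventional threshold for convergent validity, which is the type of criterion the dimensions in Table 6 are evaluated against.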
Finally, we assessed the predictive validity of the EMQ scale by regressing online purchase
attitude, online purchase intention, website satisfaction and loyalty intention on the twelve
EMQ dimensions. The results are reported in Table 7.
Table 7: Standardized regression coefficients of EMQ dimensions on attitude, intention,
e-satisfaction and e-loyalty

                              Sample 3 (n = 863)                         Sample 4 (n = 590)
Dimension                     Attitude  Intention  e-Satis.  e-Loyal.    Attitude  Intention  e-Satis.  e-Loyal.
Layout                         .04       .06        .13*      .08         .04      -.04        .18**     .09
Ease of use                   -.00      -.01        .23**     .04         .14*      .11        .25**     .04
Contacting the intermediary   -.00       .03        .02       .00         .07       .06        .03       .07
Institutional control          .05       .10        .17**     .10        -.03      -.07       -.01       .02
Community                     -.03      -.00        .07      -.00        -.09      -.07        .02      -.01
Contacting sellers             .01      -.02        .06       .03         .11       .09        .12*      .09
Seller information             .01      -.02       -.09      -.02        -.00      -.01        .08       .03
Product information            .04       .02        .05       .02         .12*      .07       -.03       .06
Pricing mechanisms             .09       .02       -.08       .05        -.03       .03        .04       .02
Assortment                     .20**     .22**      .09       .26**       .14*      .18**      .15**     .31**
Settlement                     .14*      .09        .07       .13*        .26**     .29**      .06       .08
Meeting sellers                .19**     .17**      .10*      .13*       -.00      -.04        .02       .02
R squared                      .33       .26        .40       .37         .30       .24        .40       .35
Adjusted R squared             .32       .25        .39       .36         .28       .22        .39       .33

** significant at p < .001; * significant at p < .01.

A computation of VIF scores showed an absence of multicollinearity (footnote 12). We then focused
on the amount of explained variance and the significance of the coefficients. The results strongly
support the predictive validity of the EMQ scale as a whole. For both samples, the EMQ dimensions
explain between 22% and 39% of the variance in the dependents (adjusted R squared).
Footnote 12: All scores were below the recommended cut-off value of 10 (sample 3: highest VIF = 2.268;
sample 4: highest VIF = 2.158).
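The multicollinearity check can be sketched as follows. The example assumes the standard result that the variance inflation factor of predictor j equals the j-th diagonal element of the inverse of the predictor correlation matrix. The helper names and the predictor scores are hypothetical and serve only to illustrate the cut-off logic.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical scores on three predictors for six respondents
predictors = [
    [5, 6, 4, 7, 5, 6],
    [4, 6, 5, 6, 5, 7],
    [3, 5, 6, 4, 7, 5],
]
CUT_OFF = 10  # cut-off value mentioned in footnote 12
for j, vif in enumerate(vif_scores(predictors), start=1):
    print(f"predictor {j}: VIF = {vif:.2f}, below cut-off: {vif < CUT_OFF}")
```

A VIF of 1 indicates a predictor that is uncorrelated with the others; values approaching the cut-off signal that the regression coefficients become unstable.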
Of the twelve EMQ dimensions, eight dimensions directly contribute to the variance of
at least one of the four dependents across the two datasets. Assortment and Settlement can be
labeled as the strongest and most stable determinants. The influence of Ease of Use and Meeting
Sellers is also strong, though less robust across the dependents or samples. Other dimensions
that affect at least one of the dependents include Layout, Institutional Control, Contacting
Sellers and Product Information. Regarding the four dimensions that do not significantly
contribute to the dependents, more research is needed. Following the literature on website
quality (e.g., Kim and Stoel, 2004b; Wolfinbarger and Gilly, 2003), components of first-order
factor website quality models are likely to affect consumer purchasing both directly and
indirectly. The indirect influence of the components is likely to occur via other quality
dimensions (Kim and Stoel, 2004b; Wolfinbarger and Gilly, 2003). Such second-order effects
(also see Van der Heijden and Verhagen, 2004) were not part of our direct-effects models,
and stress the need for further research.
When comparing the results across both samples, a predictive pattern is noteworthy.
Settlement strongly predicts purchase attitudes and intentions when purchasing study books,
but has only a weak effect when purchasing a car. Meeting Sellers, in contrast, significantly
influences all dependents when purchasing a car, but has no effect in the study book context.
Possibly, the results can be explained by literature on product complexity. Cars and books
have different levels of product complexity, the first being more complex (Iacobucci, 1992),
and therefore riskier to buy online (see Bhatnagar, Misra and Rao, 2000). This might explain
why physical inspection of the product seems inevitable and online transaction settlement
improbable when purchasing a car in an EM. When buying less complex and lower risk
products such as books, however, Settlement is likely to be crucial and Meeting Sellers
trivial. Falling outside the scope of this cross-validation exercise, which centers on plausible
predictive structures instead of advanced nomological models (cf. Agarwal and Karahanna,
2000; Chin et al., 1997), more theoretical advances are needed to address these issues.
4. DISCUSSION
The goal of this research is to shed light on the development and usage of the semantic
differential in IS research. As such, it aims at preventing arbitrary application of the semantic
differential and intends to add to measurement validation in the IS research field (also see
Boudreau et al., 2001; Straub, 1989; Straub, Hoffman, Weber and Steinfield, 2002). Drawing
upon the semantic differential literature, a framework for semantic differential development
is put forward. The framework highlights the need for adoption and extension of established
guidelines for scale development, by inclusion of testing procedures for concept-scale
interaction, wording credibility, linguistic contrast, psychological bipolarity, and contextual
contamination. As such, it complies with the call of Straub et al. (2002) to extend
measurement methods in the academic community. A demonstration exercise focusing on the
semantic differentiation of EMQ exemplifies the framework. Using theory review, focus
group interviews, linguistic testing and data collected in three EMs in the Netherlands, a
twelve-dimensional semantic differential was developed and cross-validated. We conclude
this study with a discussion of implications, recommendations and limitations.
Implications for Researchers
A significant contribution of this study is its clarification of the prerequisites for semantic
differentiation and the provision of corresponding guidelines for research. The literature
review underlines the key role of adequate bipolar scale selection, particularly because the
scales function as axes of the semantic space that is used to allocate the meaning of the
concept. For that reason, the selection of applicable bipolar scales is crucial and demands
alignment with the particular goals and objectives of the researcher (Osgood et al., 1957).
Moreover, the framework provides guidelines for bipolar scale selection and judgment
(sections 3.2 and 3.3), which were demonstrated to result in relevant concept-scale pairings
for the EMQ construct. Since linguistic encoding forms the heart of semantic differentiation
(Osgood et al., 1957), the technique heavily relies on linguistic bipolarity, psychological
bipolarity, and credibility of within-scale wording combinations. Directions for testing
bipolarity and wording combinations are put forward and empirically illustrated (sections 3.4
and 3.5). Finally, to define the axes of the semantic space, independence of the factors
representing these axes is a necessity. The framework addresses this issue by proposing
combined adoption of established factor analytic approaches, tests of psychometric properties
(sections 3.6, 3.7, 3.9), and a test of contextual contamination (section 3.8). Directions for
successful application of these methods are proposed and clarified empirically via the
development of the twelve-dimensional semantic differential for EMQ. On the whole, the
framework provides researchers with guidelines and examples that they can apply when
developing semantic differentials for their own research.
While instrument development is a necessary step for building theory in relatively new
or unexplored research fields (Straub, 1989), researchers are also encouraged to use existing
and validated semantic differentials. Such a confirmatory approach simplifies the instrument
development process and is especially useful for the purpose of theory testing (Boudreau et
al., 2001; Straub, 1989). The framework adds value to future theory testing studies by
highlighting the necessity and principles of concept-scale pairing judgment (section 3.3),
linguistic and psychological bipolarity (section 3.4), and linguistic clarity of wording
combinations (section 3.5). Given the flaws that are noticed in the usage of existing semantic
differentials, we strongly encourage researchers to address these issues when adopting
existing semantic differentials. Such a careful approach ensures that basic requirements
are met and secures the validity of the semantic differential in theory testing and
cross-validation studies (see section 3.9).
Implications for Reviewers
Prior applications of the semantic differential technique in established academic journals are
subject to weaknesses and misunderstandings. Observed flaws include (1) subjective bipolar
scale selections; (2) absence of tests of linguistic contrast, psychological bipolarity and
wording combinations; (3) ambiguous and weak tailoring of scales and concept; (4) absence
of empirical tests of factorial structure; and (5) confusion concerning the use of the semantic
differential wording versus Likert scale measurement. To prevent such flaws, we foresee an
important task for reviewers to confront researchers with unmet basic semantic differential
requirements before paper acceptance. The directives provided in this research project offer
reviewers criteria to evaluate the usage of the semantic differential in manuscripts. Reviewers
are encouraged to use these criteria and, most importantly, point out to researchers that the
semantic differential is not a mere alternative scaling format, and in such simplified form
cannot be accepted. One should realize that adoption of the semantic differential as a
measurement technique implies acceptance of its prerequisites, and necessitates a
comprehensive conceptual and empirical preparation.
Implications for EM Studies
In line with increased academic attention in the EM research community (e.g., Bapna et al.,
2004; Cheng et al., 2006; Lancastre and Lages, 2006), we see the EMQ concept as an
interesting avenue of advanced theory testing. Since the existing literature lacks both
conceptual and confirmatory studies of EMQ, the concept is likely to fill an existing research
gap. Of particular interest would be to extend our work to more advanced nomological
networks. Even though our study draws upon the literature to test predictive validity in a
plausible structure, the goal of our work is not theory testing per se (cf. Agarwal and
Karahanna, 2000). The specified model was developed solely as a plausible means to assess
the predictive validity of the semantic differential and should be viewed in this light (cf.
Salisbury, Chin and Gopal, 2002, p. 98).
We encourage researchers to adopt the EMQ concept and develop and test related
theoretical structures in more advanced settings. A possible direction for such research
follows from the results of our purification exercise, which show that the influence of EMQ
on consumer purchasing is likely to depend on the support provided by the EM during the
stages of the purchase decision-making process. Accordingly, it could be useful to replicate
our predictive model across EMs that differ in the stages that they support (see Grieger, 2003;
Lindemann and Schmid, 1999; Skjøtt-Larsen et al., 2003).
Another interesting future research area concerns an extension of our predictive model
with product complexity and perceived risk. As indicated by our cross-validation study, both
concepts might moderate the relationships between EMQ and consumer purchasing. Falling
outside the scope of this methodological paper, both research directions demand future
research.
Implications for Practice
The implications for practice are twofold. First, practitioners in general should recognize that
semantic differentiation depends on the requirements as illuminated in this paper. Since
questionable usage of the semantic differential is prevalent in the IS literature, validity
problems are likely to exist. This implies that these semantic differentials as well as the
findings of the studies in which they are used demand cautious adoption, and are unlikely to
be a credible basis for management decision-making. Second, the developed EMQ semantic
differential offers intermediaries of C2C EMs a validated instrument to gather information
about the quality of the EM they operate, based on which this quality can be enhanced. By
measuring and stimulating EMQ, intermediaries can improve consumer attitudes, intentions,
loyalty and satisfaction. Since some EMQ dimensions explicitly refer to features and actions
accounted for by sellers (e.g., Product Information; Meeting Sellers), intermediaries might
share their findings with the population of sellers. Together, both parties are able to optimize
the EM and influence (potential) buyers’ impressions thereof. Additional insight into
differences across product types, as referred to above, might be used as valuable input for
further fine-tuning of different trading sections within EMs (e.g., books, cameras and cars).
Limitations and Recommendations
Irrespective of the conceptual and empirical support for the proposed framework, our work
has a number of limitations that generate interesting new directions for research. First, even
though studying real EMs contributes to the external validity of our findings (Aaker, 1997),
the scope of our work is limited to Dutch EMs and respondents. This implies that the
constructed semantic differential is tailored to Dutch linguistics and interpretations. As
empirically demonstrated in the literature (e.g., Osgood, 1959; Osgood, 1960; Suci, 1960;
Tanaka, Oyama and Osgood, 1963), however, wording interpretations might differ across
cultures and even result in different factor structures. In addition, research (e.g., Baack and
Singh, 2007; Singh and Baack, 2004) has demonstrated that the influence of website
perceptions on consumer online purchasing also differs across cultures and countries. Future
research could address these issues and cross-validate EMQ as well as the influence of EMQ
on consumer purchasing across different countries and cultures.
Second, despite the methodological comprehensiveness of the framework, it is probable
that further refinements can be made. One area for improvement includes further bipolarity
testing. By addressing linguistic and psychological bipolarity, the framework draws upon a
rather broad definition of bipolarity. A stricter interpretation of bipolarity (see Cogliser and
Schriesheim, 1994; Schriesheim and Klich, 1991) further extends this view by adding metric
testing of (1) the midpoint of each semantic scale pair (see Cogliser and Schriesheim, 1994),
and (2) the equidistance of the value scales to their midpoint (Messick, 1957; Schriesheim
and Klich, 1991). Future research could address this issue and extend the framework with
additional methods for bipolarity testing.
Third, falling outside the scope of this study, we did not compare the psychometric
properties of the semantic differential versus alternative measurement techniques such as
Likert scaling and ratio scaling. Even though results of empirical studies tend to favor the
semantic differential over these methods (see Friborg et al., 2006; Van Auken and Barry,
1995; Van Auken, Barry and Bagozzi, 2006), more research in IS settings is needed. Of
particular interest would be to juxtapose the most prevalent measurement methods in IS
research, and to evaluate their psychometric properties for established IS constructs. Given
that weaker psychometric properties of scaling methods result in systematic measurement
error, thereby reducing the amount of explained trait variance (Cote and Buckley, 1987; see
also Van Auken et al., 2006), selection of a scaling method has substantial implications for
theory and practice. We encourage researchers to address these issues and add to an
unexplored research field.
References
Aaker, J.L. "Dimensions of Brand Personality," Journal of Marketing Research (34:3), 1997,
pp. 347-356.
Agarwal R. and Karahanna, E. “Time Flies When You’re Having Fun: Cognitive Absorption
and Beliefs about Information Technology Usage,” MIS Quarterly (24:4), 2000, pp. 665-694.
Aladwani, A. M. and Palvia, P.C. "Developing and Validating an Instrument for Measuring
User-Perceived Web Quality," Information and Management (39:6), 2002, pp. 467-476.
Allport, C.D. and Kerler III, W.A. "A Research Note Regarding the Development of the
Consensus on Appropriation Scale," Information Systems Research (14:4), 2003, pp. 356-59.
Arbuckle, J.L. and Wothke, W. Amos 4.0 User’s Guide, SmallWaters Corporation, Chicago,
1999.
Arbuckle, J.L. Amos 5.0 Update to the Amos User’s Guide, SmallWaters Corporation,
Chicago, 2003.
Baack, D.W. and Singh N. “Culture and Web Communications,” Journal of Business
Research (60:3), 2007, pp. 181-188.
Bagozzi, R.P., Yi, Y., and Phillips, L.W. “Assessing Construct Validity in Organizational
Research,” Administrative Science Quarterly (36:3), 1991, pp. 421-458.
Bailey, J. and Pearson, S.W. “Development of a Tool for Measuring and Analyzing
Computer User Satisfaction,” Management Science (29:5), 1983, pp. 530-545.
Bakos, Y. “A Strategic Analysis of Electronic Marketplaces,” MIS Quarterly (15:3), 1991,
pp. 295-310.
Bakos, Y. "The Emerging Role of Electronic Marketplaces on the Internet," Communications
of the ACM (41:8), 1998, pp. 35-42.
Banerjee, D., Cronan, T.P., and Jones, T.W. "Modeling IT Ethics: A Study in Situational
Ethics," MIS Quarterly (22:1), 1998, pp. 31-60.
Bapna, R., Goes, P., Gupta, A., and Jin, Y. “User Heterogeneity and its Impact on Electronic
Auction Market Design: An Empirical Exploration,” MIS Quarterly (28:1), 2004, pp. 21-43.
Barki, H., Rivard, S., and Talbot, J. “An Integrative Contingency Model of Software Project
Risk Management,” Journal of Management Information Systems (17:4), 2001, pp. 37-69.
Bearden W.O., Sharma S., and Teel J. E. “Sample Size Effects on Chi Square and Other
Statistics Used in Evaluating Causal Models,” Journal of Marketing Research (19:4), 1982,
pp. 425-430.
Bearden, W.O. and Netemeyer, R.G. Handbook of Marketing Scales: Multi-item Measures
for Marketing and Consumer Behavior Research. Sage Publications, Thousand Oaks,
California, 1998.
Bearden, W.O., Hardesty, D.M., and Rose, R. L. "Consumer Self-Confidence: Refinements
in Conceptualization and Measurement," Journal of Consumer Research (28:1), 2001, pp.
121-34.
Belanger, F., Hiller, J. S., and Smith, W. J. “Trustworthiness in Electronic Commerce: The
Role of Privacy, Security, and Site Attributes,” Journal of Strategic Information Systems
(11:3-4), 2002, pp. 245-270.
Bentler, P.M. and Bonett, D.G. “Significance Tests and Goodness-of-Fit in the Analysis of
Covariance Structures,” Psychological Bulletin (88:3), 1980, pp. 588-606.
Bergeron, F., Raymond, L., Rivard, S., and Gara, M-F. “Determinants of EIS Use: Testing a
Behavioral Model,” Decision Support Systems (14:2), 1995, pp. 131-146.
Bhatnagar, A., Misra, S., Rao, R.H. “On Risk, Convenience, and Internet Shopping
Behavior,” Communications of the ACM (43:11), 2000, pp. 98-105.
Bhattacherjee, A. "Understanding Information Systems Continuance: An Expectation-Confirmation Model," MIS Quarterly (25:3), 2001, pp. 351-70.
Bhattacherjee, A. and Premkumar, G. “Understanding Changes in Belief and Attitude Toward
Information Technology Usage: A Theoretical Model and Longitudinal Test,” MIS Quarterly
(28:2), 2004, pp. 229-254.
Bhattacherjee, A. and Sanford C. “Influence Processes for Information Technology
Acceptance: An Elaboration Likelihood Model,” MIS Quarterly (30:4), 2006, pp. 805-825.
Bickart, B.A. “Carryover and Backfire Effects in Marketing Research”, Journal of Marketing
Research (30:1), 1993, pp. 52-62.
Blair, J. and Presser, S. "An Experimental Comparison of Alternative Pretest Techniques: A
note on Preliminary Findings," Journal of Advertising Research (32:2), 1992, RC-2-RC-5.
Boudreau, M-C., Gefen, D., and Straub, D.W. "Validation in Information Systems Research:
A State-of-the-Art Assessment,” MIS Quarterly (25:1), 2001, pp. 1-16.
Bruce, A. R., Briggs, R.O., Shepherd, M.M., Yen, J., and Nunamaker Jr., J.F. “Affective
Reward and the Adoption of Group Support Systems: Productivity Is Not Always Enough,”
Journal of Management Information Systems (12:3), 1995, pp. 171-185.
Burke, K. and Chidambaram, L. "How Much Bandwidth is Enough? A Longitudinal
Examination of Media Characteristics and Group Outcomes," MIS Quarterly (23:4), 1999,
pp.557-80.
Cacioppo, J.T. and Berntson, G.G. “Relationships Between Attitudes and Evaluative Space:
A Critical Review, With Emphasis on the Separability of Positive and Negative Substrates,”
Psychological Bulletin (115:3), 1994, pp. 401-423.
Cannell, C.F., Fowler, F.J., Kalton, G., Oksenberg, L., and Bischoping, K. "New
Quantitative Techniques for Pretesting Survey Questions," in Questionnaires (4), M. Bulmer
(edt.), Sage Publications, Thousand Oaks, California, 2004, pp.187-201.
Carroll, J.B. “Review of the Measurement of Meaning”, Language (35), 1959, pp. 58-77.
Cheng, C-B., Chan, C-C.H., and Lin, K-C. “Intelligent Agents for E-marketplace:
Negotiation with Issue Trade-Offs by Fuzzy Inference Systems,” Decision Support Systems
(42:2), 2006, pp. 626-638.
Chenoweth, T., Dowling, K.L., and St. Louis, R.D. “Convincing DSS Users that Complex
Models are Worth The Effort,” Decision Support Systems (37:1), 2004, pp. 71-82.
Chidambaram, L. and Jones, B. “Impact of Communication Medium and Computer Support
on Group Perceptions and Performance: A Comparison of Face-to-Face and Dispersed
Meetings,” MIS Quarterly (17:4), 1993, pp. 465-491.
Chin, W.W. “Issues and Opinion on Structural Equation Modeling,” MIS Quarterly (22:1),
1998, pp. 7-16.
Chin, W.W., Gopal, A., and Salisbury, W.D. “Advancing the Theory of Adaptive
Structuration: The Development of a Scale to Measure Faithfulness of Appropriation,”
Information Systems Research (8:4), 1997, pp. 342-367.
Churchill Jr., G. A. "A Paradigm for Developing Better Measures of Marketing Constructs,"
Journal of Marketing Research (16:1), 1979, pp. 64-73.
Clark, L.A. and Watson, D. "Constructing Validity: Basic Issues in Scale Development,"
Psychological Assessment (7:3), 1995, pp. 309-319.
Clevenger T., Lazier, G.A., and Clark, M.L. “Measurement of Corporate Images by the
Semantic Differential,” Journal of Marketing Research (2:1), 1965, pp. 80-82.
Cliff, N. “Adverbs as Multipliers,” Psychological Review (66:1), 1959, pp. 27-44.
Cogliser, C.C. and Schriesheim, C.A. “Development and Application of a New Approach to
Testing the Bipolarity of Semantic Differential Items,” Educational and Psychological
Measurement (54:3), 1994, pp. 594-605.
Cote, J.A. and Buckley, M.R. “Estimating Trait, Method and Error Variance: Generalizing
Across 70 Construct Validation Studies,” Journal of Marketing Research (23:4), 1987, pp.
315-318.
Dabholkar, P.A., Thorpe, D.I., and Rentz, J.O. “A Measure of Service Quality for Retail
Stores: Scale Development and Validation,” Journal of the Academy of Marketing Science
(24:1), 1996, pp.3-16.
Darnell, D.K. “Concept Scale Interaction in the Semantic Differential,” Journal of
Communications (16:2), 1966, pp.104-115.
Davis, F.D. "Perceived Usefulness, Perceived Ease of Use, and User Acceptance of
Information Technology," MIS Quarterly (13:3), 1989, pp. 319-340.
Deese, J. “The Associative Structure of Some Common English Adjectives,” Journal of
Verbal Learning and Verbal Behavior (3:5), 1964, pp. 347-357.
DeVellis, R.F. Scale Development: Theory and Applications, Sage Publications, Thousand
Oaks, California, 2003.
De Wulf, K., Schillewaert, N., Muylle, S. and Rangarajan, D. “The Role of Pleasure in
Website Success,” Information & Management (43:4), 2006, pp. 434-446.
Dickson, J. and Albaum, G. "A Method for Developing Tailormade Semantic Differentials
for Specific Marketing Content Areas," Journal of Marketing Research (14:1), 1977, pp. 87-91.
Dickson, J.W. and Slevin, D.P. "The Use of Semantic Differential Scales in Studying the
Innovation Boundary," Academy of Management Journal (18:2), 1975, pp. 381-388.
Doherty, N.F., Marples, C.G. and Suhaimi, A. “The Relative Success of Alternative
Approaches to Strategic Information Systems Planning: An Empirical Analysis,” Journal of
Strategic Information Systems (8:3), 1999, pp. 263-283.
Doll, W.J., Xia, W. and Torkzadeh, G. “A Confirmatory Factor Analysis of the End-User
Computing Satisfaction Instrument,” MIS Quarterly (18:4), 1994, pp. 453-461.
Evanschitzky, H., Iyer, G.R., Hesse, J., and Ahlert, D. “E-Satisfaction: A Re-examination,”
Journal of Retailing (80:3), 2004, pp. 239-247.
Falthzik, A.M. and Johnson, M.A. “Statement Polarity in Attitude Studies,” Journal of
Marketing Research (11:1), 1974, pp. 102-105.
Feldman, J.M. and Lynch, J.G. Jr. “Self-Generated Validity: Effects of Measurement on
Belief, Attitude, Intention and Behavior,” Journal of Applied Psychology (73:3), 1998, pp.
421-435.
Foddy, W. Constructing Questions for Interviews and Questionnaires: Theory and Practice
in Social Research, Cambridge University Press, Cambridge, UK, 1993.
Foddy, W. "An Empirical Evaluation of In-Depth Probes Used to Pretest Survey Questions,"
Sociological Methods & Research (27:1), 1998, pp. 103-133.
Foddy, W. "The In-Depth Testing of Survey Questions: A Critical Appraisal of Methods," in
Questionnaires (4), M. Bulmer (edt), Sage Publications, Thousand Oaks, California, 2004,
pp. 329-338.
Fornell, C. and Larcker, D.F. “Evaluating Structural Equation Models with Unobserved
Variables and Measurement Error,” Journal of Marketing Research (18:1), 1981, pp. 39-50.
Friborg, O., Martinussen, M., and Rosenvinge J.H. “Likert-Based vs. Semantic Differential-Based Scorings of Positive Psychological Constructs: A Psychometric Comparison of Two
Versions of a Scale Measuring Resilience,” Personality and Individual Differences (40),
2006, pp. 873-884.
Galletta, D.F., Ahuja, M., Hartman, A., Thompson, T., and Peace, A.G. “Social Influence and
End-User Training,” Communications of the ACM (38:7), 1995, pp. 70-79.
Gerbing, D.W. and Anderson, J.C. "An Updated Paradigm for Scale Development
Incorporating Unidimensionality and Its Assessment," Journal of Marketing Research (25:2),
1988, pp.186-92.
Gerbing, D.W. and Hamilton, J.G. "Viability of Exploratory Factor Analysis as a Precursor to
Confirmatory Factor Analysis," Structural Equation Modeling (3:1), 1996, pp. 62-72.
Green, R.F. and Goldfried, M.R. “On the Bipolarity of Semantic Space,” Psychological
Monographs (79:6), 1965, p. 599 (whole no.).
Greenberg, J. "The College Sophomore as Guinea Pig: Setting the Record Straight," Academy
of Management Review (12:1), 1987, pp. 157-159.
Grewal, R., Comer, J.M., and Mehta, R. "An Investigation into the Antecedents of
Organizational Participation in Business-to-Business Electronic Markets," Journal of
Marketing (65:3), 2001, pp.17-33.
Grieger, M. “Electronic Marketplaces: A Literature Review and a Call for Supply Chain
Management Research,” European Journal of Operational Research (144:2), 2003, pp. 280-294.
Griffith, T.L. and Northcraft, G.B. “Cognitive Elements in the Implementation of New
Technology: Can Less Information Provide More Benefits?,” MIS Quarterly (20:1), 1996, pp.
99-110.
Hair Jr., J.F., Anderson, R.E., Tatham, R.L., and Black, W.C. Multivariate Data Analysis.
Prentice-Hall, Inc., Upper Saddle River, New Jersey, 1998.
Hambleton, R.K. and Rogers, J.H. "Advances in Criterion-Referenced Measurement," in
Advances in Educational and Psychological Testing: Theory and Applications, R.K.
Hambleton and J.N. Zaal (eds.), Kluwer Academic Publishers, Dordrecht, the Netherlands,
1991, pp. 3-43.
Hardesty, D.M. and Bearden, W.O. "The Use of Expert Judges in Scale Development:
Implications for Improving Face Validity of Measures of Unobservable Constructs," Journal
of Business Research (57:2), 2004, pp. 98-107.
Hawkins, D.I., Albaum, G., and Best, R. "Stapel Scale or Semantic Differential in Marketing
Research?," Journal of Marketing Research (11:3), 1974, pp.318-322.
Heise, D.R. “Some Methodological Issues in Semantic Differential Research,” Psychological
Bulletin (72:6), 1969, pp. 406-422.
Heise, D.R. “The Semantic Differential and Attitude Research,” in Attitude Measurement,
G.F. Summers (edt.), Rand McNally, Chicago, 1970, pp. 235-253.
Hsiao, R-L. “Technology Fears: Distrust and Cultural Persistence in Electronic Marketplace
Adoption,” Journal of Strategic Information Systems (12:3), 2003, pp. 169-199.
Hu, X., Lin, Z., Whinston, A.B., and Zhang, H. “Hope or Hype: on the Viability of Escrow
Services as Trusted Third Parties in Online Auction Environments,” Information Systems
Research (15:3), 2004, pp. 236-249.
Huang, M-H. “Web Performance Scale,” Information & Management (42:6), 2005, pp. 841-852.
Iacobucci, D. “An Empirical Examination of Some Basic Tenets in Services: Goods-Services
Continua,” in Advances in Services Marketing and Management (1), S. Teresa, D.E. Bowen,
and S.W. Brown (eds.), JAI Press, Inc., Greenwich, 1992, pp. 23-52.
Jarupathirun, S. and Zahedi, F.M. “Exploring the Influence of Perceptual Factors in the
Success of Web-Based Spatial DSS,” Decision Support Systems (43:3), 2007, pp. 933-951.
Jarvenpaa, S.L. and Staples, D.S. “The Use of Collaborative Electronic Media for
Information Sharing: An Exploratory Study of Determinants,” Journal of Strategic
Information Systems (9:2-3), 2000, pp. 129-154.
Jayanti, R.K. and Burns, A.C. “The Antecedents of Preventive Health Care Behavior: An
Empirical Study,” Journal of the Academy of Marketing Science (26:1), 1998, pp. 6-15.
Kahn, R.L. and Cannell, C.F. "The Formulation of Questions," in Questionnaires (1), M.
Bulmer (edt.), Sage Publications, Thousand Oaks, California, 2004, pp. 55-78.
Kelly, R.F. and Stephenson, R. "The Semantic Differential: An Information Source for
Designing Retail Patronage Appeals," Journal of Marketing (31:4), 1967, pp. 43-47.
Kerlinger, F.N. Foundations of Behavioral Research (second edition), Holt, Rinehart and
Winston, Inc., New York, 1973.
Kim, S. and Stoel, L. "Dimensional Hierarchy of Retail Website Quality," Information and
Management (41:5), 2004(a), pp. 619-633.
Kim, S. and Stoel, L. "Apparel Retailers: Website Quality Dimensions and Satisfaction,"
Journal of Retailing and Consumer Services (11:2), 2004(b), pp. 109-117.
Ko, D-G., Kirsch, L.J., and King, W.R. “Antecedents of Knowledge Transfer from
Consultants to Clients in Enterprise System Implementations,” MIS Quarterly (29:1), 2005,
pp. 59-85.
Lancastre, A. and Lages, L.F. “The Relationship Between Buyer and a B2B e-marketplace:
Cooperation Determinants in an Electronic Market Context,” Industrial Marketing
Management (35:6), 2006, pp. 774-789.
Landon Jr., E.L. "Order Bias, the Ideal Rating, and the Semantic Differential," Journal of
Marketing Research (8:3), 1971, pp. 375-78.
Lee, H-G. and Clark, T.H. “Market Process Reengineering Through Electronic Market
Systems: Opportunities and Challenges,” Journal of Management Information Systems
(13:3), 1997, pp. 113-136.
Lee, Y. and Kozar, K.A. “Investigating the Effect of Website Quality on E-business Success:
An Analytic Hierarchy Process (AHP) Approach,” Decision Support Systems (42:3), 2006, pp.
1383-1401.
Liao, Z. and Cheung, M.T. “Internet-based E-banking and Consumer Attitudes: An Empirical
Study,” Information & Management (39:4), 2002, pp. 283-295.
Li, E.Y., McLeod, R., and Rogers, J.C. “Marketing Information Systems in the Fortune 500
Companies: Past, Present and Future,” Journal of Management Information Systems (10:1),
1993, pp. 165-192.
Lindemann, M.A. and Schmid, B.F. “Framework for Specifying, Building, and Operating
Electronic Markets,” International Journal of Electronic Commerce (3:2), 1999, pp. 7-21.
Malhotra, N. K. "A Scale to Measure Self-Concepts, Person Concepts, and Product
Concepts," Journal of Marketing Research (18:4), 1981, pp. 456-64.
Malhotra, N.K., Agarwal, J., and Peterson, M. "Methodological Issues in Cross-Cultural
Marketing Research," International Marketing Review (13:5), 1996, pp. 7-43.
McKinney, V, Yoon, K., and Zahedi, F.M. "The Measurement of Web-Customer
Satisfaction: An Expectation and Disconfirmation Approach," Information Systems Research
(13:3), 2002, pp. 296-315.
McKeen, J.D., Guimaraes, T., and Wetherbe, J.C. “The Relationship Between User
Participation and User Satisfaction: An Investigation of Four Contingency Factors,” MIS
Quarterly (18:4), 1994, pp. 427-451.
Messick, S.J. “Metric Properties of the Semantic Differential,” Educational and
psychological measurement (17:2), 1957, pp. 200-206.
Mindak, W.A. "Fitting the Semantic Differential to the Marketing Problem," Journal of
Marketing (25:4), 1961, pp. 28-33.
Mittal, V. and Kamakura, W.A. "Satisfaction, Repurchase Intent, and Repurchase Behavior:
Investigating the Moderating Effect of Customer Characteristics," Journal of Marketing
Research (38:1), 2001, pp.131-142.
Moore, G. C. and Benbasat, I. "Development of an Instrument to Measure the Perceptions of
Adopting an Information Technology Innovation," Information Systems Research (2:3),
1991, pp. 192-222.
Mykytyn, P.P. and Green, G.I. “Effects of Computer Experience and Task Complexity on
Attitude of Managers,” Information & Management (23:5), 1992, pp. 263-278.
Netemeyer, R.G., Bearden, W.O., and Sharma, S. Scaling Procedures: Issues and
Applications, Sage Publications, Thousand Oaks, California, 2003.
Nunnally, J.C. Psychometric Theory, McGraw-Hill, Inc., New York, 1967.
Nunnally J. C. and Bernstein, I.H. Psychometric Theory. McGraw-Hill, Inc., New York,
1994.
Okazaki, S. “What Do We Know About Mobile Internet Adopters? A Cluster Analysis,”
Information & Management (43:2), 2006, pp. 127-141.
Oliver, R.L. “A Cognitive Model of the Antecedents and Consequences of Satisfaction
Decisions,” Journal of Marketing Research (17:4), 1980, pp. 460-469.
Osgood, C.E. “The Nature and Measurement of Meaning,” Psychological Bulletin (49:3),
1952, pp. 197-237.
Osgood, C.E. and Suci G.J. “Factor Analysis of Meaning,” Journal of Experimental
Psychology (50:5), 1955, pp. 325-338.
Osgood, C.E., Suci, G.J., and Tannenbaum, P.H. The Measurement of Meaning. University of
Illinois Press, Urbana, Illinois, 1957.
Osgood, C.E. “The Cross-Cultural Generality of Visual-Verbal Synesthetic Tendencies,”
Behavioral Science (5), 1960, pp. 146-169.
Palmer, J.W. "Web Site Usability, Design, and Performance metrics," Information Systems
Research (13:2), 2002, pp. 151-167.
Pavlou, P.A. “Institution-Based Trust in Interorganizational Exchange Relationships: The
Role of Online B2B Marketplaces on Trust Formation,” Journal of Strategic Information
Systems (11:3-4), 2002, pp. 215-243.
Pavlou, P.A. and Gefen, D. "Building Effective Online Marketplaces with Institution-Based
Trust," Information Systems Research (15:1), 2004, pp. 37-59.
Peterson, R.A. "On the Use of College Students in Social Science Research: Insights from a
Second-Order Meta-Analysis," Journal of Consumer Research (28:3), 2001, pp. 450-461.
Ping Jr., R.A. "On Assuring Valid Measures for Theoretical Models Using Survey Data,"
Journal of Business Research (57:2), 2004, pp. 125-141.
Pinker, E.J., Seidmann, A., and Vakrat, Y. "Managing Online Auctions: Current Business and
Research Issues," Management Science (49:11), 2003, pp. 1457-1484.
Reynolds, N. and Diamantopoulos, A. "The Effect of Pretest Method on Error Detection
Rates: Experimental Evidence," European Journal of Marketing (32:5-6), 1998, pp. 480-98.
Reynolds, N., Diamantopoulos, A., and Schlegelmilch, B. "Pretesting in Questionnaire
Design: A Review of the Literature and Suggestions for Further Research," Journal of the
Market Research Society (35:2), 1993, pp. 171-82.
Robinson, J.P., Shaver, P.R., and Wrightsman Jr., L.S. "Criteria for Scale Selection and
Evaluation," in Measures of personality and social psychological attitudes, J.P.
Robinson, P.R. Shaver, and L.S. Wrightsman Jr. (eds.), Academic Press, San Diego,
California, 1991, pp. 1-15.
Salisbury, D., Chin, W.W., Gopal, A., and Newsted, P.R. “Research Report: Better Theory
Through Measurement - Developing a Scale to Capture Consensus on Appropriation,”
Information Systems Research (13:1), 2002, pp. 91-103.
Sarkar, M.B., Butler, B., and Steinfield, C. "Intermediaries and Cybermediaries: A
Continuing Role for Mediating Players in the Electronic Marketplace," Journal of Computer-Mediated Communication (1:3), 1995, available online at
http://jcmc.indiana.edu/vol1/issue3/sarkar.html.
Schriesheim, C.A. and Klich, N.R. “Fiedler’s Least Preferred Coworker (LPC) Instrument:
An Investigation of Its True Bipolarity,” Educational and Psychological Measurement (51:2),
1991, pp. 305-315.
Segars, A.H. “Assessing the Unidimensionality of Measurement: A Paradigm and Illustration
Within the Context of Information Systems Research,” Omega (25:1), 1997, pp. 107-121.
Sekaran, U. "Methodological and Theoretical Issues and Advancements in Cross-Cultural
Research," Journal of International Business Studies, (14:2), 1983, pp. 61-73.
Sharpe, L.K. and Anderson, W.T. “Concept-Scale Interaction in the Semantic Differential,”
Journal of Marketing Research (9:4), 1972, pp. 432-434.
Shimp, T.A. and Sharma, S. "Consumer Ethnocentrism: Construction and Validation of the
CETSCALE," Journal of Marketing Research (24:3), 1987, pp.280-289.
Simonson, I. and Drolet, A. “Anchoring Effects on Consumers' Willingness-to-Pay and
Willingness-to-Accept,” Journal of Consumer Research (31:3), 2004, pp. 681-690.
Singh, S. N. and Dalal, N.P. “Web Home Pages as Advertisements,” Communications of the
ACM (42:8), 1999, pp. 91-98.
Singh, N. and Baack, D.W. “Studying Cultural Values on the Web: A Cross-Cultural Study
of U.S. and Mexican Websites,” Journal of Computer Mediated Communication (9:4), 2004,
available online at http://jcmc.indiana.edu/vol9/issue4/singh_baack.html
Sirdeshmukh, D., Singh, J., and Sabol, B. “Consumer Trust, Value, and Loyalty in Relational
Exchanges,” Journal of Marketing (66:1), 2002, pp.15-37.
Skjøtt-Larsen, T., Kotzab, H. and Grieger, M. “Electronic Marketplaces and Supply Chain
Relationships,” Industrial Marketing Management (32:3), 2003, pp. 199-210.
Snider, J.G. and Osgood, C.E. (eds.) “Semantic Differential Technique: A Sourcebook,”
Aldine Publishing Company, Chicago, 1969.
Song, X.M. and Parry. M.E. "A Cross-National Comparative Study of New Product
Development Processes: Japan and the United States," Journal of Marketing (61:2), 1997, pp.
1-18.
Spreng, R.A., MacKenzie, S.B., and Olshavsky, R.W. “A Reexamination of the Determinants
of Consumer Satisfaction,” Journal of Marketing (60:3), 1996, pp. 15-32.
Srinivasan, V., Vanden Abeele, P., and Butaye, I. "The Factor Structure of Multidimensional
Response to Marketing Stimuli: A Comparison of Two Approaches," Marketing Science
(8:1), 1998, pp. 78-88.
Stewart, K.A. and Segars, A.H. “An Empirical Examination of the Concern for Information
Privacy Instrument,” Information Systems Research (13:1), 2002, pp. 36-49.
Straub, D.W. "Validating Instruments in MIS Research," MIS Quarterly (13:2), 1989,
pp.147-169.
Straub, D.W., Hoffman, D.L., Weber, B.W. and Steinfield, C. “Toward New Metrics for Net-Enhanced Organizations,” Information Systems Research (13:3), 2002, pp. 227-238.
Suci, G.J. “A Comparison of Semantic Structures in American Southwest culture groups,”
Journal of Abnormal and Social Psychology (61:July), 1960, pp. 25-30.
Suh, K-S. and Lee, Y.E. “The Effects of Virtual Reality on Consumer Learning: An
Empirical Investigation,” MIS Quarterly (29:4), 2005, pp. 673-697.
Suh, B. and Han, I. “The Impact of Customer Trust and Perceptions of Security Control on
the Acceptance of Electronic Commerce,” International Journal of Electronic Commerce
(7:3), 2003, pp. 135-161.
Szymanski, D.M. and Hise, R.T. “E-Satisfaction: An Initial Examination,” Journal of
Retailing (76:3), 2000, pp. 309-322.
Tanaka, Y., Oyama, T., and Osgood, C.E. “A Cross-Cultural and Cross-Concept Study of the
Generality of Semantic Space,” Journal of Verbal Learning and Verbal Behavior (2:5-6),
1963, pp. 392-405.
Tittle, C.K. "Use of Judgmental Methods in Item Bias Studies," in Handbook of Methods for
Detecting Test Bias, R.A. Berk (edt.), The Johns Hopkins University Press, Baltimore,
Maryland, 1982, pp. 31-63.
Tourangeau, R. and Rasinski, K.A. “Cognitive Processes Underlying Context Effects in
Attitude Measurement,” Psychological Bulletin (103:3), 1988, pp. 299-314.
Tversky, A. and Kahneman, D. "Judgment under Uncertainty: Heuristics and Biases,"
Science (185: 4157), 1974, pp.1124-1131.
Van Auken, S. and Barry, T.E. “An Assessment of the Trait Validity of Cognitive Age
Measures,” Journal of Consumer Psychology (4:2), 1995, pp. 107-132.
Van Auken, S., Barry, T.E., and Bagozzi, R.P. “A Cross-Country Construct Validation of
Cognitive Age,” Journal of the Academy of Marketing Science (34:3), 2006, pp. 439-455.
Van der Heijden, H., Verhagen, T., and Creemers, M. "Understanding Online Purchase
Intentions: Contributions from Technology and Trust Perspectives," European Journal of
Information Systems (12:1), 2003, pp.41-48.
Van der Heijden, H. and Verhagen, T. “Online Store Image: Conceptual Foundations and
Empirical Measurement,” Information & Management (41:5), 2004, pp. 609-617.
Van der Heijden, H. “User Acceptance of Hedonic Information Systems,” MIS Quarterly
(28:4), 2004, pp. 695-704.
Van Iwaarden, J., Van der Wiele, T., Ball, L., and Millen, R. "Perceptions About the Quality
of Web Sites: A Survey Amongst Students at Northeastern University and Erasmus
University," Information and Management (41:8), 2004, pp. 947-959.
Verhagen, T., Meents, S. and Tan, Y-H. "Perceived Risk and Trust Associated with
Purchasing at Electronic Marketplaces," European Journal of Information Systems (15:6),
2006, pp. 542-555.
Viswanathan M., Childers, T., and Moore E.S. “The Measurement of Intergenerational
Communication and Influence on Consumption: Development, Validation, and Cross-Cultural Comparison of the IGEN Scale,” Journal of the Academy of Marketing Science
(28:3), 2000, pp. 406-424.
Watts Sussman, S. and Sproull, L. "Straight Talk: Delivering Bad News Through Electronic
Communication," Information Systems Research (10:2), 1999, pp.150-166.
Webb, D.J., Green, C.L., and Brashear, T.G. “Development and Validation of Scales to
Measure Attitudes Influencing Monetary Donations to Charitable Organizations,” Journal of
the Academy of Marketing Science (28:2), 2000, pp. 299-309.
Webster, J., Martocchio, J.J., and Joseph, J. “Microcomputer Playfulness: Development of a
Measure With Workplace Limitations,” MIS Quarterly (16:2), 1992, pp. 201-226.
Webster's collegiate thesaurus. Merriam-Webster, Springfield, Massachusetts, 1988.
Weinreich, U. “Travels Through Semantic Space,” Word (14:2-3), 1958, pp. 346-366.
Winter, S.J., Saunders, C., and Hart, P. “Electronic Window Dressing: Impression
Management with Websites,” European Journal of Information Systems (12:4), 2003, pp.
309-322.
Wirtz, J. and Lee, M.C. “An Examination of the Quality and Context-Specific Applicability
of Commonly Used Customer Satisfaction Measures,” Journal of Service Research (5:4),
2003, pp. 345-355.
Wolfinbarger, M. and Gilly, M.C. "ETailQ: Dimensionalizing, Measuring and Predicting
Etail Quality," Journal of Retailing (79:3), 2003, pp. 183-98.
Yang, Z., Cai, S., Zhou, Z., and Zhou, N. "Development and Validation of an Instrument to
Measure User Perceived Service Quality of Information Presenting Web Portals,"
Information & Management (42:4), 2005, pp.575-589.
Yi, M.Y. and Davis, F.D. "Developing and Validating an Observational Learning Model of
Computer Software Training and Skill Acquisition," Information Systems Research (14:2),
2003, pp. 146-169.
Zeithaml, V.A., Berry, L.L., and Parasuraman, A. “The Behavioral Consequences of Service
Quality,” Journal of Marketing (60:2), 1996, pp. 31-46.
APPENDIX A: Overview items used in pilot study
1. Insufficient information to contact sellers – sufficient information to contact sellers
2. Difficult to contact sellers via the website – easy to contact sellers via the website
3. Slow response from sellers (to questions) – fast response from sellers (to questions)***
4. Insufficient options to contact sellers – sufficient options to contact sellers
5. Insufficient information to contact <name intermediary> - sufficient information to contact <name
intermediary>
6. Difficult to contact <name intermediary> via the website – easy to contact <name intermediary> via
the website
7. Slow response from <name intermediary> (to questions) – fast response from <name intermediary> (to
questions)***
8. Insufficient options to contact <name intermediary> - sufficient options to contact <name
intermediary>
9. Unattractive website layout – attractive website layout
10. Outdated website layout – up to date website layout
11. Boring website layout – interesting website layout
12. Many annoying ads from sponsors (on the website) – few annoying ads from sponsors (on the
website)***
13. Many annoying ads from the company <name intermediary> (on the website) – few annoying ads from
the company <name intermediary> (on the website)***
14. Slow website – fast website***
15. Many technical problems – few technical problems***
16. Difficult to navigate website – easy to navigate website
17. Unclear website structure - clear website structure
18. Difficult to search on the website – easy to search on the website
19. Difficult to get an overview of all products from a seller– easy to get an overview of all products from
a seller***
20. Difficult to learn how to use the website - easy to learn how to use the website
21. Difficult to compare prices - easy to compare prices***
22. Difficult to compare products - easy to compare products***
23. Difficult to monitor interesting product ads – easy to monitor interesting product ads***
24. Boring to use the website – interesting to use the website***
25. Difficult to evaluate <name products> before you buy – easy to evaluate <name products> before you
buy***
26. Unclear how to pay for <name products> – clear how to pay for <name products>
27. Difficult to pay for <name products> - easy to pay for <name products>
28. Unclear how to receive <name products> – clear how to receive <name products>
29. Difficult to receive <name products> – easy to receive <name products>
30. Difficult to meet sellers and evaluate <name products> before you buy - easy to meet sellers and
evaluate <name products> before you buy
31. Difficult to meet sellers and pay them - easy to meet sellers and pay them
32. Difficult to pick up <name products> at the sellers’ location - easy to pick up <name products> at the
sellers’ location
33. Outdated website information – up to date website information***
34. Insufficient information on how to prevent being swindled – sufficient information on how to prevent
being swindled***
35. Unclear information on how to prevent being swindled – clear information on how to prevent being
swindled***
36. Insufficient information about sellers – sufficient information about sellers
37. Insufficient information about the company <name intermediary> - sufficient information about the
company <name intermediary>***
38. Unclear how final prices are effected – clear how final prices are effected
39. Inconvenient pricing method – convenient pricing method
40. Unreasonable prices - reasonable prices***
41. Little value for money – much value for money***
42. Unclear what final price to pay – clear what final price to pay
43. Unclear indication of sellers’ reputation – clear indication of sellers’ reputation
44. Insufficient information about sellers’ reputation - sufficient information about sellers’ reputation
45. Insufficient guarantees – sufficient guarantees
46. Unclear information about guarantees – clear information about guarantees
47. Insufficient information about the privacy policy – sufficient information about the privacy policy
48. Insufficient privacy protection - sufficient privacy protection
49. Unclear information about the rules on <name EM> – clear information about the rules on <name EM>
50. Insufficient rules that protect me on <name EM> – sufficient rules that protect me on <name EM>
51. Weak website security – strong website security
52. Insufficient monitoring of sellers - sufficient monitoring of sellers
53. Passive in removing swindlers – active in removing swindlers
54. Few interesting <name products> – many interesting <name products>
55. Limited range of <name products> – wide range of <name products>
56. Difficult to find the offered <name products> elsewhere – easy to find the offered <name products>
elsewhere***
57. Insufficient number of <name products> - sufficient number of <name products>
58. Unclear descriptions of <name products> - clear descriptions of <name products>
59. Incorrect descriptions of <name products> – correct descriptions of <name products>
60. Bad representation of <name products> (images/photos) – good representation of <name products>
(images/photos)
61. Difficult to assess the quality of <name products> - easy to assess the quality of <name products>
62. Insufficient product photos of <name products> – sufficient product photos of <name products>
63. Unclear whether <name products> are used - clear whether <name products> are used
64. Unclear condition of <name products> – clear condition of <name products>
65. Difficult to contact other buyers – easy to contact other buyers
66. Difficult to share experiences with other buyers – easy to share experiences with other buyers
67. Few buyers sharing their experiences on <name EM> - many buyers sharing their experiences on
<name EM>
68. Insufficient options to communicate with other buyers – sufficient options to communicate with other
buyers
69. Weak common bond between buyers – strong common bond between buyers
*** = removed after pilot study
APPENDIX B: Sample characteristics
                                 Sample 1         Sample 2         Sample 3         Sample 4
                                 (n=1428)         (n=1051)         (n=863)          (n=590)
                                 % of respondents (n) in each column
Gender
  Male                           50.8% (725)      33.7% (354)      70.3% (607)      28.5% (168)
  Female                         49.2% (703)      66.3% (697)      29.7% (256)      71.5% (422)
Age
  < 21                           4.1% (58)        7.2% (76)        8.2% (71)        13.6% (80)
  21-30                          19% (271)        21.9% (230)      24.6% (212)      24.9% (147)
  31-40                          34% (486)        27.7% (291)      28.4% (245)      20% (118)
  41-50                          25.9% (370)      22.0% (231)      19.8% (171)      21.4% (126)
  51-60                          14% (200)        14.5% (152)      15.5% (134)      15.3% (90)
  > 60                           3% (43)          6.8% (71)        3.5% (30)        4.9% (29)
Frequency of visiting the EM
  Never, this is the first time  0% (0)           0.8% (8)         2.3% (20)        8.1% (48)
  A couple of times per year     0.8% (12)        1.8% (19)        3.2% (28)        9.3% (55)
  Once per month                 2.1% (30)        6.3% (66)        8.8% (76)        15.9% (94)
  Once per week                  6.6% (94)        13.6% (143)      18.5% (160)      21.5% (127)
  A couple of times per week     90.5% (1292)     77.5% (815)      67.1% (579)      45.1% (266)
Times bought via the Internet
  Never                          0.9% (13)        5.4% (57)        7% (60)          4.9% (29)
  Once                           0.7% (10)        4.2% (44)        5.4% (47)        6.4% (38)
  Twice                          0.6% (9)         7.2% (76)        8.8% (76)        8.1% (48)
  Three times                    0.8% (12)        5.8% (61)        5.7% (49)        6.6% (39)
  Four times or more             96.9% (1384)     77.4% (813)      73.1% (631)      73.9% (436)
Times bought via the EM
  Never                          0.6% (9)         9.3% (98)        28.5% (246)      38.8% (229)
  Once                           1.6% (23)        9.5% (100)       18.5% (160)      17.3% (102)
  Twice                          2.5% (35)        10.4% (109)      16.9% (146)      14.7% (87)
  Three times                    2.7% (39)        8.8% (92)        9.4% (81)        8.3% (49)
  Four times or more             92.6% (1322)     62% (652)        26.7% (230)      20.8% (123)
Note: all measures are self-reported
APPENDIX C: Scale purification: results EFA and initial reliability test
                                Variance explained (%), eigenvalues
                                (in parentheses) and factor loadings      Reliability (α)
                                Sample 1            Sample 2              Sample 1  Sample 2
                                (n=500)             (n=500)               (n=500)   (n=500)
Layout                          2.265 (1.132)       2.741 (1.271)         .90       .89
  Layout1                       .774                .801
  Layout2                       .782                .847
  Layout3                       .821                .866
Ease of use                     4.317 (2.159)       4.541 (2.271)         .90       .92
  Ease1                         .797                .830
  Ease2                         .783                .824
  Ease3                         .725                .834
  Ease4                         .776                .851
Contacting the intermediary     2.768 (1.384)       3.261 (1.630)         .97       .96
  Contmed1                      .844                .851
  Contmed2                      .852                .871
  Contmed3                      .842                .872
Institutional control           35.640 (17.819)     29.463 (14.731)       .94       .92
  Instit1                       .810                .715
  Instit2                       .782                .811
  Instit3                       .785                .810
  Instit4                       .745                .799
  Instit5                       .682                .746
  Instit6                       .819                .824
  Instit7                       .656                .651
  Instit8                       .708                .612
  Instit9                       .653                .467
Community                       5.501 (2.750)       8.460 (4.230)         .89       .88
  Commu1                        .698                .613
  Commu2                        .805                .802
  Commu3                        .725                .778
  Commu4                        .789                .804
  Commu5                        .713                .751
Contacting sellers              3.962 (1.981)       3.373 (1.686)         .94       .94
  Contsel1                      .795                .820
  Contsel2                      .836                .846
  Contsel3                      .814                .826
Seller information              2.199 (1.100)       3.902 (1.951)         .92       .93
  Infsel1                       .665                .750
  Infsel2                       .750                .823
  Infsel3                       .745                .831
Product information             8.505 (4.252)       8.635 (4.317)         .93       .90
  Prodinf1                      .733                .685
  Prodinf2                      .727                .708
  Prodinf3                      .781                .675
  Prodinf4                      .802                .809
  Prodinf5                      .791                .788
  Prodinf6                      .769                .768
  Prodinf7                      .773                .771
Pricing mechanisms              2.597 (1.298)       2.384 (1.192)         .87       .89
  Pricing1                      .876                .831
  Pricing2                      .850                .817
  Pricing3                      .747                .795
Assortment                      3.827 (1.914)       2.999 (1.500)         .95       .91
  Assor1                        .880                .821
  Assor2                        .896                .871
  Assor3                        .878                .858
Settlement                      5.066 (2.533)       5.100 (2.550)         .94       .93
  Settl1                        .827                .802
  Settl2                        .838                .832
  Settl3                        .850                .828
  Settl4                        .813                .818
Meeting sellers                 2.985 (1.492)       2.470 (1.235)         .91       .91
  Meet1                         .838                .786
  Meet2                         .884                .838
  Meet3                         .836                .793
Total variance explained        79.63%              77.34%
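The variance percentages and eigenvalues in Appendix C are linked by a simple ratio. The sketch below illustrates that relation under the assumption that 50 standardized items entered the factor analysis (the number of items listed in this appendix), so that each eigenvalue divided by 50 gives the share of variance. The function name `variance_explained` is illustrative only and does not come from the study.

```python
# Assumption: 50 standardized items entered the factor analysis (the number
# of items listed in Appendix C), so total variance equals 50.
N_ITEMS = 50


def variance_explained(eigenvalue, n_items=N_ITEMS):
    """Percentage of total variance accounted for by one factor."""
    return 100.0 * eigenvalue / n_items


# Eigenvalues reported for Sample 1 for three of the factors.
sample1_eigenvalues = {
    "Layout": 1.132,
    "Ease of use": 2.159,
    "Institutional control": 17.819,
}

for factor, eigenvalue in sample1_eigenvalues.items():
    print(f"{factor}: {variance_explained(eigenvalue):.3f}%")
```

The computed values agree with the reported percentages up to rounding of the published eigenvalues (for example, 17.819 gives 35.638 against the reported 35.640).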
APPENDIX D: Factor and item overview scale purification process
Factor                       Item       Item wording
Layout                       Layout1    Unattractive website layout – attractive website layout
                             Layout2    Outdated website layout – up to date website layout
                             Layout3    Boring website layout – interesting website layout
Ease of use                  Ease1      Difficult to navigate website – easy to navigate website
                             Ease2      Unclear website structure - clear website structure
                             Ease3*     Difficult to search on the website – easy to search on the website
                             Ease4      Difficult to learn how to use the website - easy to learn how to use the website
Contacting the intermediary  Contmed1   Insufficient information to contact <name intermediary> - sufficient information to contact <name intermediary>
                             Contmed2   Difficult to contact <name intermediary> via the website – easy to contact <name intermediary> via the website
                             Contmed3   Insufficient options to contact <name intermediary> - sufficient options to contact <name intermediary>
Institutional control        Instit1*   Insufficient guarantees – sufficient guarantees
                             Instit2    Unclear information about guarantees – clear information about guarantees
                             Instit3    Insufficient information about the privacy policy – sufficient information about the privacy policy
                             Instit4    Insufficient privacy protection - sufficient privacy protection
                             Instit5    Unclear information about the rules on <name EM> – clear information about the rules on <name EM>
                             Instit6*   Insufficient rules that protect me on <name EM> – sufficient rules that protect me on <name EM>
                             Instit7*   Weak website security – strong website security
                             Instit8*   Insufficient monitoring of sellers - sufficient monitoring of sellers
                             Instit9*   Passive in removing swindlers – active in removing swindlers
Community                    Commu1*    Difficult to contact other buyers – easy to contact other buyers
                             Commu2     Difficult to share experiences with other buyers – easy to share experiences with other buyers
                             Commu3     Few buyers sharing their experiences on <name EM> - many buyers sharing their experiences on <name EM>
                             Commu4     Insufficient options to communicate with other buyers – sufficient options to communicate with other buyers
                             Commu5*    Weak common bond between buyers – strong common bond between buyers
Contacting sellers           Contsel1   Insufficient information to contact sellers – sufficient information to contact sellers
                             Contsel2   Difficult to contact sellers via the website – easy to contact sellers via the website
                             Contsel3   Insufficient options to contact sellers – sufficient options to contact sellers
Seller information           Infsel1    Insufficient information about sellers – sufficient information about sellers
                             Infsel2    Unclear indication of sellers’ reputation – clear indication of sellers’ reputation
                             Infsel3    Insufficient information about sellers’ reputation - sufficient information about sellers’ reputation
Product information          Prodinf1*  Unclear descriptions of <name products> - clear descriptions of <name products>
                             Prodinf2   Incorrect descriptions of <name products> – correct descriptions of <name products>
                             Prodinf3   Bad representation of <name products> (images/photos) – good representation of <name products> (images/photos)
                             Prodinf4*  Difficult to assess the quality of <name products> - easy to assess the quality of <name products>
                             Prodinf5*  Insufficient product photos of <name products> – sufficient product photos of <name products>
                             Prodinf6*  Unclear whether <name products> are used - clear whether <name products> are used
                             Prodinf7   Unclear condition of <name products> – clear condition of <name products>
Pricing mechanisms           Pricing1   Unclear how final prices are effected – clear how final prices are effected
                             Pricing2   Inconvenient pricing method – convenient pricing method
                             Pricing3   Unclear what final price to pay – clear what final price to pay
Assortment                   Assor1     Few interesting <name products> – many interesting <name products>
                             Assor2     Limited range of <name products> – wide range of <name products>
                             Assor3     Insufficient number of <name products> - sufficient number of <name products>
Settlement                   Settl1     Unclear how to pay for <name products> – clear how to pay for <name products>
                             Settl2     Difficult to pay for <name products> - easy to pay for <name products>
                             Settl3     Unclear how to receive <name products> – clear how to receive <name products>
                             Settl4*    Difficult to receive <name products> – easy to receive <name products>
Meeting sellers              Meet1      Difficult to meet sellers and evaluate <name products> before you buy - easy to meet sellers and evaluate <name products> before you buy
                             Meet2      Difficult to meet sellers and pay them - easy to meet sellers and pay them
                             Meet3      Difficult to pick up <name products> at the sellers’ location - easy to pick up <name products> at the sellers’ location
*: deleted after CFA with sample 1 and sample 2
APPENDIX E: Test of contextual contamination (Means and results independent
sample t-tests)
EM1 (n = 94)
                              EMQ       Dimension reversed            Item reversed
                              (n=32)    (n=32)                        (n=30)
Dependent                     M         M       t-value   p-level     M       t-value   p-level
Layout                        4.20      3.99    .688      .494        4.23    -.125     .901
Ease of use                   4.71      4.98    -.791     .432        5.03    -1.101    .275
Contacting the intermediary   4.07      5.06    -2.681    .009        4.61    -1.532    .131
Institutional control         4.91      4.81    .358      .721        5.03    -.546     .587
Community                     4.88      4.65    .813      .419        5.11    -.900     .372
Contacting sellers            5.22      5.30    -.311     .757        4.99    .784      .436
Seller information            5.26      5.46    -.664     .509        5.54    -.828     .411
Product information           4.80      4.61    .876      .385        4.73    .298      .771
Pricing mechanisms            4.68      4.64    .123      .902        4.70    -.066     .947
Assortment                    5.51      5.71    -.691     .492        5.77    -.912     .365
Settlement                    5.25      5.30    -.222     .825        5.03    1.001     .321
Meeting sellers               3.21      3.99    -2.446    .017        3.57    -1.232    .223
EM2 (n = 99)
                              EMQ       Dimension reversed            Item reversed
                              (n=35)    (n=32)                        (n=32)
Dependent                     M         M       t-value   p-level     M       t-value   p-level
Layout                        3.79      3.53    .836      .406        3.43    1.222     .226
Ease of use                   5.70      5.76    -.251     .803        5.43    1.173     .245
Contacting the intermediary   4.54      5.03    -1.525    .132        4.40    .407      .658
Institutional control         4.38      3.96    1.750     .085        4.07    1.417     .161
Community                     3.89      2.82    3.532     .001        3.76    .419      .676
Contacting sellers            5.08      5.04    .139      .890        5.24    -.606     .547
Seller information            2.79      2.68    .483      .631        2.61    .594      .555
Product information           4.59      4.45    .677      .501        4.51    .411      .682
Pricing mechanisms            4.50      4.28    .718      .475        4.40    .337      .737
Assortment                    5.75      5.54    .720      .474        5.63    .394      .695
Settlement                    4.11      4.96    -2.955    .004        4.30    -.678     .500
Meeting sellers               4.06      4.22    -.591     .557        3.92    .457      .649
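The comparisons in Appendix E are independent-samples t-tests between the original questionnaire and each reversed version. The following is a minimal sketch of how such a t-value can be computed, assuming the pooled-variance (equal variances) form; the appendix does not state which variant was used. The function name `independent_t` and the score lists are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances (equal-variance assumption).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical dimension scores for two questionnaire versions (illustration only).
original_order = [4.0, 5.0, 6.0, 5.0]
reversed_order = [3.0, 4.0, 5.0, 4.0]

t_value, df = independent_t(original_order, reversed_order)
print(f"t = {t_value:.3f}, df = {df}")
```

The sign convention here (first group minus second group) is a choice of the sketch; a negative t-value then means the second group scored higher on average.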
APPENDIX F: Measurement instruments used for predictive validity testing
Attitude towards purchasing (seven-point Likert scale; ranging from strongly disagree to strongly
agree; Van der Heijden et al., 2003). Alpha = 0.96 (sample 1), 0.97 (sample 2), 0.95 (sample 3), 0.96
(sample 4).
1. I am positive towards buying a <name product> on the <name> website.
2. The thought of buying a <name product> at the website of <name> is appealing to me.
3. I think it is a good idea to buy a <name product> at the website of <name>.
Intention to purchase (seven-point Likert scale; ranging from very unlikely to very likely; Van der
Heijden et al., 2003). Alpha = 0.78 (sample 1), 0.78 (sample 2), 0.85 (sample 3), 0.84 (sample 4).
How likely is it that you would…
1. return to the <name> website?
2. consider a purchase of a <name product> at the <name> website in the short term?
3. consider a purchase of a <name product> at the <name> website in the long term?
4. purchase a <name product> at the <name> website if you need one?
e-Satisfaction (seven-point semantic differentials; Szymanski and Hise, 2000; Evanschitzky et al.,
2004). Alpha = 0.85 (sample 3), 0.87 (sample 4).
Overall, how do you feel about your most recent experience at <name EM>?
1. very dissatisfied (=1) to very satisfied (=7)
2. very displeased (=1) to very pleased (=7)
e-Loyalty (seven-point Likert scale; ranging from very unlikely to very likely; Sirdeshmukh et al.,
2002). Alpha = 0.89 (sample 3), 0.88 (sample 4).
How likely is it that you would…
1. conduct most of your future <name product> purchases via the <name> website?
2. recommend the <name> website to friends, neighbors and relatives?
3. use the <name> website the very next time you purchase a <name product>?
4. do more than 50% of your future <name product> purchases via the <name> website?
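The Alpha values reported for the scales in Appendix F are reliability coefficients. Assuming they are Cronbach's alpha, the usual reading of this notation, the sketch below shows how the coefficient is computed from item-level responses. The function name `cronbach_alpha` and the response data are hypothetical and serve only as illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    # Total scale score per respondent.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical seven-point responses of five respondents to a three-item scale.
responses = [
    [6, 5, 7, 3, 4],  # item 1
    [6, 4, 7, 2, 4],  # item 2
    [5, 5, 6, 3, 5],  # item 3
]

print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Alpha rises when the items covary strongly relative to their individual variances, which is why it is read as an indicator of internal consistency.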