A Framework for Developing Semantic Differentials in IS research: Assessing the Meaning of Electronic Marketplace Quality (EMQ)

Tibert Verhagen
VU University Amsterdam, Department of Information Systems and Logistics
De Boelelaan 1105, room 3a-22
Tel: +31 (0)20 – 598 6185
Fax: +31 (0)20 – 598 6005
[email protected]

Selmar Meents
VU University Amsterdam, Department of Information Systems and Logistics
De Boelelaan 1105, room 3a-23
Tel: +31 (0)20 – 598 6185
Fax: +31 (0)20 – 598 6005
[email protected]

ACKNOWLEDGEMENTS
The authors would like to show their gratitude to the anonymous reviewers for their feedback and comments. Moreover, the authors thank all members of the expert panels as well as the three C2C electronic marketplaces who participated in the project.

BIOGRAPHIES
Tibert Verhagen is an assistant professor E-business at the Department of Economics and Business Administration of the Vrije Universiteit Amsterdam. He holds a Ph.D. in adoption of web-based systems for consumer online purchasing. His research interests include research methodology, measurement validation and consumer adoption of IS systems. Next to conference proceedings, he has published in journals such as Information & Management and the European Journal of Information Systems.

Selmar Meents is a PhD candidate at the Vrije Universiteit Amsterdam, the Department of Economics and Business Administration. His research interests include research design, electronic marketplaces, trust and risk in offline and online settings, and buyer-seller relationships. His work has been published in several conference proceedings, and has appeared in the European Journal of Information Systems.
ABSTRACT
Adequate usage of measurement instruments is key for solid research. In this study we focus on the semantic differential as a general technique of measurement. Despite calls for methodological rigor in information systems (IS) research, many of the applications of the semantic differential in IS studies are characterized by flaws and weaknesses. Consequently, the findings of these studies demand cautious usage since validity problems are likely to exist. The aim of this study is to shed light on the semantic differential. Principles of semantic differentiation are discussed and used as a foundation to introduce a framework for developing and applying semantic differentials. The framework delineates the crucial role of linguistics and concept-scale interaction, and extends available guidelines for measurement validation with procedures to test wording credibility, linguistic contrast, psychological bipolarity, and contextual contamination. The framework is exemplified using a demonstration exercise, which centers on the assessment of the meaning of the concept electronic marketplace quality (EMQ). Using a mixture of qualitative and quantitative methods, the demonstration exercise clarifies the prerequisites for semantic differentiation and provides guidelines for researchers. The paper concludes with a discussion of implications for researchers, reviewers and practice.

KEYWORDS
Semantic differential, measurement validation, research methodology, linguistics, contextual contamination, electronic marketplace quality.

1. INTRODUCTION
The semantic differential is frequently used in IS research.
As a technique for measuring the connotative meaning of concepts, the semantic differential has been applied by IS researchers to concepts such as computer user satisfaction (Bailey and Pearson, 1983), information system satisfaction (Bhattacherjee, 2001; Griffith and Northcraft, 1996; Li, McLeod and Rogers, 1993; McKeen, Guimaraes and Wetherbe, 1994), information systems planning success (Doherty, Marples and Suhaimi, 1999), information culture (Jarvenpaa and Staples, 2000), computer attitudes (Mykytyn and Green, 1992; Webster, Martocchio and Joseph, 1992), media perception (Chidambaram and Jones, 1993), and website performance (Huang, 2005). Both theory and empirical evidence support the adoption of the semantic differential in IS research. Theory presents the semantic differential as an easy and quick method to assess both the intensity and direction of the meaning of concepts (Mindak, 1961; Heise, 1970). Moreover, empirical results underline its reliability (see Mindak, 1961; Wirtz and Lee, 2003), validity (Van Auken and Barry, 1995), robustness (see Clevenger, Lazier and Clark, 1965; Hawkins, Albaum and Best, 1974) and relative insensitivity to systematic response errors (see Friborg, Martinussen and Rosenvinge, 2006).

Despite the adoption of the semantic differential in IS settings, there is ample evidence that its development and usage therein are subject to serious weaknesses. We notice, for example, that researchers arbitrarily select bipolar scales for the concept to be measured without testing for concept-scale interaction (e.g., Bergeron, Raymond, Rivard and Gara, 1995; Jarupathirun and Zahedi, 2007; Winter, Saunders and Hart, 2003) or linguistic contrast (e.g., McKinney, Yoon and Zahedi, 2002; Palmer, 2002; Winter et al., 2003).
Moreover, there is evidence that in some cases the selected bipolar scales have not been subject to empirical tests of factorial structure (e.g., Bailey and Pearson, 1983; Barki, Rivard and Talbot, 2001; Galetta, Ahuja and Hartman, 1995; Singh and Dalal, 1999), even though the relevance of factorial testing is highlighted in the semantic differential literature (Osgood, 1952; Osgood, Suci and Tannenbaum, 1957; Sharpe and Anderson, 1972). Furthermore, we observe that scholars replicate established semantic differentials to measure closely related but different concepts (e.g., Suh and Lee, 2005; Van der Heijden, 2004), thereby neglecting that semantic differentiation demands a tailored approach, since the bipolar scales form the axes of a multidimensional space in which the meaning of a particular concept is measured. In general, researchers too often apply existing semantic differentials without any specific adaptation to the research context (e.g., Bhattacherjee and Premkumar, 2004; Bhattacherjee and Sanford, 2006; Watts Sussman and Sproull, 1999), which results in a lack of relevance (Dickson and Albaum, 1977; Sharpe and Anderson, 1972). Finally, we draw attention to the fact that the use of semantic differentials is subject to misinterpretation. Some scholars refer to their measurement instruments as semantic differentials while these comprise typical Likert-scale items (e.g., Chenoweth, Dowling and St. Louis, 2004; Liao and Cheung, 2002; Okazaki, 2006) or, vice versa, as Likert scales while they consist of semantic differentials (e.g., Watts Sussman and Sproull, 1999). Researchers even combine bipolar semantic differentials with typical Likert-scale items within the same multi-item scale (e.g., Bruce, Briggs, Shepperd, Yen and Nunamaker, 1995), thereby neglecting basic prerequisites for semantic differentiation (see Osgood et al., 1957; Snider and Osgood, 1969).
Given these weaknesses, the results of the IS studies referred to above must be interpreted with care. When basic requirements of semantic differential measurement such as bipolarity, relevant concept-scale pairing(s), unidimensionality and adaptation to the research context are neglected, validity problems are likely to exist (see Dickson and Albaum, 1977; Sharpe and Anderson, 1972). These observations contrast with well-known calls for valid measurement instrument development (e.g., Boudreau, Gefen and Straub, 2001; Straub, 1989) and highlight the need for more understanding of the semantic differential technique and for a systematic overview of requirements regarding its development.

This research is intended to provide more understanding of the development and use of the semantic differential and to contribute to measurement validation in the IS research field. Assumptions underlying the semantic differential technique are used to construct a framework for developing and applying semantic differentials, thereby expanding well-known general directives for scale development (e.g., Boudreau et al., 2001; Chin, Gopal and Salisbury, 1997; Churchill, 1979; Straub, 1989) with semantic differential-specific test procedures. The framework integrates the basics of psychometric measurement with guidelines recommended by semantic differential theorists, and puts emphasis on the crucial role of linguistics. Subsequently, the framework is elucidated and put into practice via a demonstration exercise (cf. Straub, 1989) since “instrument validation may be best understood by seeing how validation can be applied to an actual MIS research problem” (Straub, 1989, p. 154). This demonstration exercise focuses on the assessment of the meaning of Electronic Marketplace Quality (EMQ).
Rooted in an established field of EM studies (e.g., Bakos, 1991; Bakos, 1998; Cheng, Chan and Lin, 2006; Lancastre and Lages, 2006; Pavlou, 2002; Pavlou and Gefen, 2004; Sarkar, Butler and Steinfield, 1995), EMQ refers to buyers’ quality perceptions of consumer-to-consumer (C2C) electronic marketplaces (EMs). The adoption of the EMQ concept is theoretically appealing since development of such a conceptual framework offers new opportunities for theoretical advances in the emerging field of EM studies (e.g., Hsiao, 2003; Bapna, Goes, Gupta and Jin, 2004; Cheng et al., 2006; Lancastre and Lages, 2006; Pavlou, 2002). Drawing upon works on website quality (e.g., De Wulf, Schillewaert, Muylle and Rangarajan, 2006; Kim and Stoel, 2004a), EMQ is expected to be multidimensional and rather complex in nature (cf. Yang, Cai, Zhou and Zhou, 2005), making the semantic differential one of the most appropriate techniques to measure this concept (Mindak, 1961).

This research essay makes the following major contributions. First, drawing upon a review of the semantic differential literature, this study describes and illuminates the fundamentals of semantic differentiation. As such, we aim to provide a basic understanding of the semantic differential, and intend to prevent inappropriate and arbitrary usage of the measurement technique due to a lack of knowledge. Second, from a methodological perspective, we introduce an integrative framework for semantic differential development and usage. The framework integrates established works on scale development and the more specific literature on semantic differentiation. The absence of directives for the semantic differential technique in conventional paradigms for measurement validation stresses the need for such an integrative approach. Third, we offer specific guidelines for semantic differential development and usage. Drawing upon empirical exercises, principles for semantic differentiation are proposed.
As such, we aim to expand measurement validation in the IS research field in general.

This paper is structured as follows. First, we deliberate on the semantic differential technique. We briefly discuss the basics of semantic differentiation and focus on fundamental assumptions underlying the technique. Then, we use the assumptions as a foundation to propose a framework for semantic differential development. A demonstration exercise clarifies the framework. Building upon literature study, linguistic tests, expert interviews, pilot tests and data collected in three EMs in the Netherlands, the exercise elaborates on the proposed framework and puts forward a validated semantic differential to assess the meaning of EMQ. Finally, we discuss our work, and conclude with recommendations for researchers, reviewers and practitioners.

2. THE SEMANTIC DIFFERENTIAL

2.1 Technique of Measurement
Introduced in behavioral sciences by Charles Osgood and his associates (e.g., Osgood and Suci, 1955; Osgood et al., 1957), the semantic differential is applied to many content areas including the field of IS research (e.g., Banerjee, Cronan and Jones, 1998; Bhattacherjee, 2001; Burke and Chidambaram, 1999; Watts Sussman and Sproull, 1999). Despite its adoption in IS research, there seem to be a number of misunderstandings about the semantic differential. The vast majority of IS researchers seem to interpret it rather narrowly as an alternative scaling format. This point of view, however, does not hold. The semantic differential is neither a predefined set of measurement items, nor a scaling format (Osgood and Suci, 1955; Osgood et al., 1957), nor a device for measurement (Carroll, 1959). A semantic differential is a highly general technique of measurement that has to be adapted to each research context. As such, usage of the technique depends on the research goals and objectives of the researcher (Osgood et al., 1957, p. 76).
In essence, the semantic differential is a technique to measure the meaning of concepts (Mindak, 1961), consumer opinions and attitudes (Dickson and Albaum, 1977). Although ‘meaning’ can be viewed and interpreted from various perspectives (e.g., linguistic meaning, sociological meaning, relational meaning), the semantic differential explicitly focuses on the observation and measurement of the psychological meaning of concepts (Kerlinger, 1973). The concept itself, also known as the stimulus, can be a noun, a verb or a noun phrase (Osgood et al., 1957). To measure the meaning of the concept, the semantic differential uses a list of bipolar scales, where bipolar means opposite in meaning. An example is retrieved from the work of Burke and Chidambaram (1999), who measured the concept communication interface perceptions with the following set of bipolar adjectives:

Difficult: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Easy
Complex: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Simple
Constrained: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Free
Constricted: ___ ; ___ ; ___ ; ___ ; ___ ; ___ ; ___: Spacious

For each pair of polar adjectives, respondents place a mark on the continuum of individual lines to indicate the point that characterizes the concept (DeVellis, 2003). Assuming that each pair of adjectives represents a different dimension, respondents allocate the meaning of a concept in a multidimensional space or semantic space. This process of allocation is known as semantic differentiation. Since in practice different bipolar scales are likely to load on the same dimension, factor analysis is usually applied to determine the underlying dimensions that together delineate the axes of the semantic space (Clevenger et al., 1965; Heise, 1970; Osgood, 1952; Osgood and Suci, 1955; Osgood et al., 1957).

Next to bipolar adjectives and verbs, contrasting phrases can also be applied as scale opposites (cf.
Dickson and Albaum, 1977; Hawkins et al., 1974; Kelly and Stephenson, 1967). Contrasting phrases enable the researcher to put forward more adequate descriptors for each dimension, which is likely to be an advantage when measuring the meaning of rather complex concepts (cf. Mindak, 1961). When applying such contrasting phrases, bipolarity is accomplished by the inclusion of contrasting adjectives within the phrases, which function as scale anchors (cf. Dickson and Slevin, 1975; Dickson and Albaum, 1977).

2.2 Key Assumptions of Semantic Differentiation
Being a combination of controlled association and scaling procedures, the semantic differential technique heavily relies on linguistics. In fact, building upon linguistic encoding as an index of meaning, the “crux of the method lies in selecting the sample of descriptive polar terms” (Osgood et al., 1957, p. 20). The crucial role of linguistics, combined with the focus on controlled association and scaling, implies that the following assumptions have to be taken into account to assure appropriate usage of the semantic differential.

1. Concept relevance: the selected concept must be relevant for the particular research problem under study (Osgood et al., 1957). Irrelevant concepts are likely to result in small variance among respondents, making them rather useless for the purpose of research. In case of multidimensional semantic differentials, this assumption applies to both the high-order construct and the underlying dimensions (Kerlinger, 1973).

2. Representative sample of bipolar scales: the combined sample of selected bipolar scales has to be large enough and widely distributed in meaning to cover the entire semantic space and, thus, define the meaning of the concept (Carroll, 1959; Darnell, 1966; Nunnally, 1967; Weinreich, 1958). The larger or more representative the sample of bipolar scales, the better defined is the multidimensional meaning of the concept (Osgood et al., 1957).

3.
Relevant concept-scale pairings: the selected adjective pairs have to be relevant to the particular concept under study (Darnell, 1966; Dickson and Albaum, 1977; Osgood et al., 1957; Sharpe and Anderson, 1972). Irrelevant concept-scale pairings are likely to result in neutral responses and therefore reduce the amount of information gathered (Osgood et al., 1957, pp. 78-79).

4. Bipolarity: the anchors of the scales have to be bipolar (Dickson and Albaum, 1977; Falthzik and Johnson, 1974; Osgood et al., 1957). Bipolarity involves linguistic contrast as well as psychological bipolarity. Linguistic contrast implies that the distinct scale anchors are truly bipolar from a purely linguistic point of view (nominal antonyms; see Carroll, 1959). Psychological bipolarity extends this view by assuming that the selected scale anchors are not only bipolar in isolation, but also in relation to the particular concept to be measured (functional antonyms; see Carroll, 1959). As such, psychological bipolarity demands linguistic tailoring of the polar terms to the concept under study (Green and Goldfried, 1965; Heise, 1969; Schriesheim and Klich, 1991).

5. Credibility of within-scale wordings: when applying combinations of adjectives, nouns or verbs as scale opposites, these combinations have to be as credible and “natural” as possible (Cliff, 1959; Osgood et al., 1957). This assumption is crucial since the meaning of a word is largely affected by the combination of words it is used with.

6. Good psychometric properties: the semantic differential has to satisfy conventional criteria for psychometric measurement (Dickson and Albaum, 1977; Osgood et al., 1957). Particular attention has to be paid to the role of factor analytic approaches to discover and define the dimensionality of the semantic space (see Deese, 1964; Osgood and Suci, 1955; Osgood et al., 1957).
To allocate the meaning of a concept accurately, the dimensions that form the axes of the semantic space have to be unidimensional and independent (Landon, 1971; Osgood et al., 1957).

7. Concept independency: since the semantic differential heavily relies on linguistics, the different concepts judged in a semantic differential have to be independent. Due to anchoring effects (see Simonson and Drolet, 2004; Tversky and Kahneman, 1974), there is a chance that later responses are biased. This problem, which is known as contextual contamination, needs to be tested and controlled for (Landon, 1971; Osgood et al., 1957). Since the semantic differential is often applied to measure multidimensional constructs, the instrument usually comprises several concepts or dimensions. In this context, particular attention has to be paid to the chance that the first concept measured functions as a frame of reference and thereby affects the subsequent evaluation of the other concepts (Landon, 1971; Osgood et al., 1957).

2.3 Towards a Framework for Developing Semantic Differentials
The key assumptions underlying the semantic differential technique have fundamental implications for the semantic differential development process. Most importantly, the assumptions demonstrate that the construction of semantic differentials not only necessitates usage of established scale development guidelines (e.g., Boudreau et al., 2001; Chin et al., 1997; Churchill, 1979; Straub, 1989), but also requires adequate usage and thus testing of linguistics. As such, it demands an integration of fundamentals of psychometric measurement, and the more specific linguistic guidelines as put forward in the semantic differential literature. Remarkably, such an integrative approach is lacking. Building upon semantic differential studies and measurement development literature, we propose a framework for systematic development of semantic differentials. Table 1 shows the framework, which comprises nine steps.
Table 1: Framework for developing semantic differentials in IS research

Stage 1: Concept definition
Description: Definition of concept and delineation of research domain
Underlying assumption: Concept relevance
Key references: Chin et al. (1997); DeVellis (2003); Nunnally and Bernstein (1994); Straub (1989).

Stage 2: Generation of bipolar scales
Description: Preliminary selection of bipolar scales that are widely distributed in meaning
Underlying assumption: Representative sample of bipolar scales
Key references: Dickson and Albaum (1977); Hawkins et al. (1974); Kelly and Stephenson (1967); Mindak (1961); Osgood et al. (1957); Sharpe and Anderson (1972).

Stage 3: Judgment of concept-scale pairings
Description: Test of applicability of scale items and antonyms for the concept under study
Underlying assumption: Relevant concept-scale pairings
Key references: Bearden, Hardesty and Rose (2001); Hambleton and Rogers (1991); Hardesty and Bearden (2004); Malhotra (1981); Netemeyer, Bearden and Sharma (2003); Osgood et al. (1957); Tittle (1982).

Stage 4: Linguistic test of semantic bipolarity
Description: Test of linguistic contrast and psychological bipolarity
Underlying assumption: Bipolarity
Key references: Carroll (1959); Dickson and Albaum (1977); Green and Goldfried (1965); Heise (1969); Mindak (1961); Osgood et al. (1957); Schriesheim and Klich (1991); Snider and Osgood (1969).

Stage 5: Linguistic test of semantic differential wording
Description: Test for credible and ‘normal’ combinations of adjectives, verbs and nouns within each bipolar scale
Underlying assumption: Credibility of within-scale wording
Key references: Blair and Presser (1992); Foddy (1993); Foddy (2004); Kahn and Cannell (2004); Reynolds, Diamantopoulos and Schlegelmilch (1993); Reynolds and Diamantopoulos (1998); Song and Parry (1997).

Stage 6: Pilot to purify the semantic differential
Description: Adoption of factor analytic approach: initial tests of dimensionality, validity and reliability
Underlying assumption: Good psychometric properties
Key references: Aaker (1997); Davis (1989); Moore and Benbasat (1991); Netemeyer et al. (2003); Osgood et al. (1957); Srinivasan, Vanden Abeele and Butaye (1989).

Stage 7: Scale purification
Description: Adoption of factor analytic approach: exploratory and confirmatory tests of dimensionality, validity and reliability
Underlying assumption: Good psychometric properties
Key references: Aaker (1997); Dabholkar, Thorpe and Rentz (1996); DeVellis (2003); Doll, Xia and Torkzadeh (1994); Gerbing and Anderson (1988); Gerbing and Hamilton (1996); Osgood et al. (1957); Viswanathan, Childers and Moore (2000).

Stage 8: Test of contextual contamination
Description: Test of anchoring effects between the concepts within the scale
Underlying assumption: Concept independency
Key references: Bickart (1993); Feldman and Lynch (1988); Landon (1971); Osgood et al. (1957); Tourangeau and Rasinski (1988); Tversky and Kahneman (1974).

Stage 9: Cross-validation
Description: Adoption of factor analytic approach: confirmatory tests of dimensionality, validity and reliability
Underlying assumption: Good psychometric properties
Key references: Dabholkar et al. (1996); Gerbing and Anderson (1988); Netemeyer et al. (2003); Straub (1989); Viswanathan et al. (2000).

The framework extends available paradigms for scale development by highlighting the principle of concept-scale interaction and by adding the notions of linguistic contrast, psychological bipolarity, credible wording combinations and contextual contamination. In the following sections we will elaborate on each stage of the framework by reporting a demonstration exercise. Building upon qualitative and quantitative techniques, the exercise focuses on the assessment of the meaning of EMQ.

3. DEMONSTRATION EXERCISE: ASSESSING THE MEANING OF EMQ

3.1 Concept Definition
A wide body of research on EMs exists in the field of IS research. In this research exercise we view EMs from a system perspective (cf. Bakos, 1991; Hsiao, 2003; Lee and Clarke, 1997). Accordingly, an EM is considered an electronic market system, referring to an online environment with specific boundaries that is operated by a particular intermediary. More specifically, based on such studies as Bapna et al. (2004), Cheng et al.
(2006), Lancastre and Lages (2006) and Pavlou (2002), an EM is defined here as an environment located on the Internet that is supported and enabled by a combination of IT and various services, procedures and regulations offered by a third-party intermediary, in which parties can meet and engage in exchange related behavior. This conceptualization is in line with earlier studies (e.g., Bakos, 1998; Pavlou and Gefen, 2004), and applies to EMs where transactions are fully completed online, as well as to EMs that are primarily used to engage in search activities and information exchange, before parties meet offline for further inspection and transaction settlement. The context of our research is delimited to EMs that facilitate transactions between consumers (C2C). C2C EMs have been the context of several empirical explorations (e.g., Hu, Lin, Whinston and Zhang, 2004; Pavlou, 2002; Pavlou and Gefen, 2004). Remarkably, a well-conceptualized and validated instrument to measure consumers’ overall perceptions of an EM is lacking. Here, we apply the semantic differential technique to construct such an instrument and assess the meaning of the EMQ concept. EMQ refers to the overall perception a buyer has of a particular EM. It reflects an intertwined mixture of evaluations that are derived from all kinds of implicit (i.e., imperceptible, psychological) and explicit (i.e., observable, concrete) functions and services (Sarkar et al., 1995) provided by the intermediary and the population of sellers. Examples of functions and services made available by the intermediary include providing the technological infrastructure (Pinker, Seidmann and Vakrat, 2003), credit arrangements, logistical settlement, negotiation services (Grewal et al., 2001) and control mechanisms (Pavlou, 2002; Pinker et al., 2003). 
Sellers expand these functions and services by offering sales-related functions such as product selection, product description, and provision of contact information (Sarkar et al., 1995). Buyers evaluate the offered functions and services to assess and evaluate the EM, as well as the behavior of the parties behind it (cf. Belanger, Hiller and Smith, 2002; Suh and Han, 2003). Given that it comprises multiple intertwined perceptions, this research conceptualizes EMQ as a composite construct that is rather complex and multidimensional in nature. Although this conceptualization conforms to the literature on website quality (e.g., De Wulf et al., 2006; Lee and Kozar, 2006; Yang et al., 2005), EMQ substantially differs from constructs addressing website quality perceptions from a webstore perspective (e.g., Kim and Stoel, 2004a; Wolfinbarger and Gilly, 2003), where transactions and thus related behavior are mainly dyadic in nature and quality impressions are primarily derived from the functions, services and actions of the selling party (Verhagen, Meents and Tan, 2006).

3.2 Generation of a Sample of Bipolar Scales
To ensure content-valid measures, the EMQ construct was conceptualized as in the previous section and its domain delineated (cf. Chin et al., 1997; Devellis, 2003; Nunnally and Bernstein, 1994; Straub, 1989). Based on a literature study, a preliminary set of bipolar scales was then generated. The EM literature (e.g., Bakos, 1998; Grewal et al., 2001; Pinker et al., 2003) was studied to identify aspects that should be included. Moreover, existing measurement items for website quality (e.g., Aladwani and Palvia, 2002; Kim and Stoel, 2004a, 2004b; Palmer, 2002; Van Iwaarden, Van der Wiele, Ball and Millen, 2004; Wolfinbarger and Gilly, 2003; Yang et al., 2005) were evaluated for their applicability (cf. Palmer, 2002) and, if applicable, were adapted to EM settings. Finally, a subjective content analysis of EMs was applied to collect additional items (cf.
Mindak, 1961). These steps resulted in a preliminary list of 69 semantic differentials.

Given the complex nature of the concept to be measured, that is the multidimensional nature of EMQ, mere adjectives were deemed to be too limited in scope to be adequate descriptors (cf. Dickson and Albaum, 1977; Mindak, 1961). Accordingly, each of the semantic differentials consisted of a pair of descriptive phrases (cf. Hawkins et al., 1974; Kelly and Stephenson, 1967) containing antonyms as anchors that were based on the work of Osgood et al. (1957). Osgood’s standardized list of antonyms, however, provided an insufficient number of appropriate antonyms (cf. Mindak, 1961), or seemed inappropriate for the current research context (cf. Dickson and Albaum, 1977; Sharpe and Anderson, 1972). Therefore, we adapted the antonyms to the concept under study and, based on existing semantic differential scales (e.g., Dickson and Albaum, 1977; Hawkins et al., 1974; Landon, 1971) and linguistic works such as Webster’s Collegiate Thesaurus (1988 edition), added some extra antonyms.

3.3 Judgment of Concept-Scale Pairings
To judge the validity of the preliminary list of 69 bipolar scale items and their antonyms, and to refine the item and antonym pool, a pretest was held using expert panels (cf. Netemeyer et al., 2003). The experts included eleven academic researchers in the field of IS and Marketing from a Dutch academic institution, and seven practitioners working for two C2C EMs in the Netherlands. The pretest comprised a series of three assignments. First, we applied a free association technique to ensure that the item pool represented a proper sample of the domain of the EMQ construct (content validity). Members of the expert panel were requested to freely suggest items that should be included in a measure of EMQ (cf. Netemeyer et al., 2003). A second assignment was used to judge the applicability of the proposed list of 69 items to the EMQ concept under study (cf.
Hardesty and Bearden, 2004). The experts were asked to evaluate the applicability of each of the items (cf. Malhotra, 1981) to two existing C2C EMs in the Netherlands. The applicability was registered using a formalized rating procedure (cf. Hambleton and Rogers, 1991; Tittle, 1982), consisting of a seven-point scale ranging from “very inapplicable” to “very applicable”. In addition to their ratings, we asked the experts to explain their opinions and propose improvements. Finally, in the third assignment an overview of all 69 preliminary EMQ items was presented to the experts, after which they were requested to suggest additional items. Although this assignment had the same goals as the first assignment, it was not based on free association and thus might lead to the mention of other interesting items. In order to identify possible refinements of the item pool, a mixture of qualitative and quantitative procedures was then applied (cf. Netemeyer et al., 2003). Building upon the results of the second assignment, we identified candidates for rewording or deletion using a variant of the so-called “sumscore” technique (Hardesty and Bearden, 2004) by calculating an average applicability score for each item. Those items that received an average applicability rating lower than 6 (i.e., “quite applicable”) for at least one of the two EMs, were considered candidates for rewording or deletion (cf. Bearden et al., 2001). The explanations given by the experts during the second assignment were also taken into consideration in the final decision. Two criteria were used to evaluate and select additional items as suggested in the first and third assignment. The items had to be applicable to our definition of EMQ and be mentioned multiple times by the experts (cf. Osgood et al., 1957). The items were then edited and reworded where necessary. Of the initial item pool of 69 items, 9 items remained unedited, 23 items were reworded, 35 items were removed and 35 new items were added. 
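The screening rule described above (an average applicability rating per item, with any item falling below 6 for at least one of the two EMs flagged for rewording or deletion) can be illustrated with a minimal sketch. The item names, the EM labels and all ratings below are hypothetical and serve only as an example; they are not data from the study, and the function is an illustration of the rule rather than the procedure actually used by the authors.

```python
from statistics import mean

# Cut-off taken from the text: 6 corresponds to "quite applicable"
# on the seven-point applicability scale.
THRESHOLD = 6


def flag_items(ratings, threshold=THRESHOLD):
    """Return the items whose average applicability rating is below
    the threshold for at least one EM.

    ratings: {item: {em_label: [expert ratings on a 1-7 scale]}}
    """
    flagged = []
    for item, by_em in ratings.items():
        # Average the expert ratings separately for each EM.
        averages = {em: mean(scores) for em, scores in by_em.items()}
        # One EM below the threshold is enough to flag the item.
        if any(avg < threshold for avg in averages.values()):
            flagged.append(item)
    return flagged


# Hypothetical ratings from four experts for two EMs (illustration only).
ratings = {
    "item_01": {"EM_A": [7, 6, 7, 6], "EM_B": [6, 7, 6, 6]},
    "item_02": {"EM_A": [6, 6, 7, 6], "EM_B": [5, 6, 5, 6]},
    "item_03": {"EM_A": [4, 5, 3, 4], "EM_B": [5, 4, 4, 5]},
}

print(flag_items(ratings))  # ['item_02', 'item_03']
```

In this sketch the flagged items are only candidates: as in the procedure described above, the experts' explanations would still inform the final decision to reword or delete an item.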
The result was an updated item pool of 69 items that could be used for further pretesting.

3.4. Linguistic Test of Semantic Bipolarity

Given that bipolarity should be tested empirically rather than assumed (Cacioppo and Berntson, 1994; Schriesheim and Klich, 2001), we conducted two linguistic tests. The primary objective of the first test was to assess the linguistic contrast of the distinct scale anchors. We adopted the method as applied by Dickson and Albaum (1977) and split the antonyms into one list of positive anchors and one list of negative anchors. The two lists were then transformed into two online questionnaires asking subjects to fill in the missing linguistic opposites. The test was performed using a convenience sample of native speakers of American English. An e-mail invitation was sent to friends and relatives living in the USA, inviting them to participate in the test by clicking on a hyperlink that would direct them to one of the two online questionnaires [1]. The sample of native speakers of American English ensured linguistic homogeneity, which has been argued to increase the likelihood that respondents apply the same meaning to the antonyms being presented to them (Dickson and Albaum, 1977). To ensure that the respondents had a higher than average intelligence, enabling them to better differentiate between the meaning of words (cf. Osgood et al., 1957), we decided only to include respondents who had attended college [2]. The final data consisted of 32 cases. The results of the test confirmed the selected list of anchors and indicated that none of the anchors needed to be reworded.

After having established linguistic contrast, we assessed psychological bipolarity (see Carroll, 1959; Green and Goldfried, 1965; Heise, 1969), using a pretest with an expert panel. The panel members included the eleven academic researchers and seven practitioners who also participated in the test of concept-scale pairing (section 3.3).
The experts were asked to judge and evaluate the applicability of the bipolar scale anchors defining the proposed list of 69 semantic phrases (cf. Mindak, 1961) in relation to the concept measured. For each antonym pair the experts judged whether the two anchors linguistically aligned with the corresponding concept (cf. Heise, 1969) by answering a “Yes-No” question, and were asked to explain their opinions and suggest improvements. Based on their answers, the wording of some anchors was slightly modified to align them more closely with their concepts (i.e., to assure psychological bipolarity).

[1] The hyperlink led to a webpage that hosted a program script. The script automatically redirected each respondent to one of the two online questionnaires, thereby splitting the sample randomly.
[2] A scale for educational level (Mittal and Kamakura, 2001) was included to probe the highest education level the respondents had achieved. Those respondents who had not attended college were excluded from the analysis.

3.5. Linguistic Test of Semantic Differential Wording

To assess the credibility of the scale wording of the semantic differential, a draft questionnaire was constructed and thoroughly tested via a pretest. We constructed the draft questionnaire using wording guidelines as presented in the scaling and pretest literature (e.g., Foddy, 1993 & 2004; Kahn and Cannell, 2004). In some bipolar scales that contained potentially difficult or ambiguous words, the context or some examples of that particular word were included at the end of the item, put between brackets (cf. Kahn and Cannell, 2004). As is common in semantic differential measurement, a seven-point scale was used [3]. American English was selected as the base language of the draft questionnaire. The questionnaire was constructed and checked for any mistakes by two translators who were not only fluent in American English but also familiar with the concept under study (cf. Sekaran, 1983).
The questionnaire was then translated into the language in which it was to be administered (i.e., Dutch), using a combination of the back translation and the parallel or double translation technique [4]. To investigate whether the Dutch draft questionnaire contained any faults in the wording that could result in comprehension difficulties or would jeopardize the previously tested assumptions of concept-scale pairing, bipolarity and wording interpretability, a pretest was held using an expert panel (cf. Foddy, 1993 & 1998). Three academic experts with a background in both questionnaire design and e-commerce research were asked to evaluate the wording of each bipolar scale and its introduction (cf. Blair and Presser, 1992; Reynolds and Diamantopoulos, 1998). Based on the pretest literature (e.g., Foddy, 1993; Reynolds et al., 1993), a list was made consisting of commonly made faults in bipolar scale item wording. The faults referred to potential incomprehensibility, complexity and ambiguity of the bipolar scales, their introductions or their concepts. The experts were asked to evaluate the likelihood that the bipolar scales were subject to these faults, using three-point rating scales (“certainly”, “possibly”, “no”; cf. Cannell, Fowler, Kalton, Oksenberg and Bischoping, 2004). Open-ended questions were added, enabling respondents to explain or comment on each rating. Finally, we had the experts evaluate the overall questionnaire by asking them whether it contained any other faults that could lead to incorrect interpretations. After the pretest the EMQ scale was slightly modified by changing some words into synonyms and by shortening some items without changing their meanings. The modifications resulted in a preliminary instrument (see Appendix A for the English version) that was used as starting point for a pilot study.

[3] Response categories ranged from “very <negative anchor>” (1) to “very <positive anchor>” (7) with a midpoint labeled “neutral” (4) (cf. Watts Sussman and Sproull, 1999; McKinney et al., 2002).
[4] First a bilingual speaker whose base language is Dutch translated the English questionnaire into Dutch (cf. Malhotra, Agarwal and Peterson, 1996). A second bilingual speaker whose base language is English then compared this Dutch questionnaire to the original English questionnaire. Afterwards both bilingual speakers discussed whether the translation of the questionnaire was appropriate (cf. Song and Parry, 1997).

3.6. Pilot to Purify the Semantic Differential

To purify the semantic differential, a laboratory experiment was conducted. We used a student sample for this pilot test, which simplified its administration and limited the extraneous variance both within and across scales (Greenberg, 1987; Peterson, 2001) that could impede the validity of the study.

Procedure

A sample of 196 undergraduate students following a mandatory information systems course at a Dutch university participated in the experiment. The participants were instructed to study four different EMs (eBay.nl and three Dutch EMs facilitating C2C exchanges in the Netherlands), to focus on the purchase of a digital camera [5], and to conclude each visit by filling in an online questionnaire addressing perceptions of EMQ. The entire experiment was conducted in a lab, consisting of identical computer systems, and was monitored by a supervisor of the research team. To mitigate the impact of differential familiarity (Srinivasan et al., 1989) the students were instructed to carefully study each EM according to predefined tasks (cf. Van der Heijden and Verhagen, 2004). To minimize order bias, a predetermined, randomized scheme for visiting the EMs was distributed.

Initial Item Analyses: To further trim the item pool and to get an initial indication of the multidimensional meaning of EMQ, four steps were taken.
First, since literature on the dimensionality of EMQ was lacking and scholars warn against the use of mere statistical techniques to purify measurement scales (e.g., Allport and Kerler, 2003), we conducted an item sorting exercise (cf. Davis, 1989; Moore and Benbasat, 1991). Based on our expertise and the statements from experts in earlier pretests (cf. Shimp and Sharma, 1987) we grouped the 69 EMQ items in a preliminary classification of twelve dimensions. Ten faculty members of an academic institution then judged this classification. Overall, the judgments confirmed the preliminary classification.

[5] The frame of reference, buying a digital camera, was selected since this type of product was offered in sufficient amounts on the four EMs under study, and since it was deemed to appeal to students, to be familiar to them and to be easy to understand.

Thereupon, a factor analytic approach was adopted (cf. Osgood et al., 1957). The primary dataset used in the analysis consisted of the scores of the respondents averaged across the four stimuli (cf. Srinivasan et al., 1989). Principal components analysis with varimax rotation was applied to extract the factors, remove some items, and achieve an adequate level of preliminary unidimensionality. This resulted in a preliminary twelve-factor solution of 56 items (KMO MSA 0.88, Bartlett’s test of sphericity: 11699, p < .001) accounting for 74.82% of the variance. In general, the resulting factor structure showed a more than reasonable match with the preliminary classification as established in the item sorting exercise. To test the generality of the initial factor solution, EFA was run with the initial item list on each of the four original datasets (cf. Aaker, 1997; Srinivasan et al., 1989). The results indicated that the factor structure was quite homogeneous across the four datasets. The 13 items that were removed in the analysis of the pooled data failed to meet EFA criteria in all datasets.
Fourteen other items showed factor loadings similar to those in the pooled dataset, but not in all of the original datasets. Accordingly, 27 items were considered primary candidates for deletion.

Subsequently, the internal consistency of the multidimensional EMQ construct was investigated for each of the four EM datasets. Cronbach’s alpha, average interitem correlations and corrected item-to-total correlations were calculated for each of the twelve preliminary EMQ dimensions. Items with either Cronbach’s alpha lower than .80 (cf. Bearden and Netemeyer, 1998; Clark and Watson, 1995), average interitem correlations lower than .30 (cf. Robinson, Shaver and Wrightsman, 1991) or corrected item-to-total correlations lower than .50 (cf. Bearden and Netemeyer, 1998) in one or more datasets were candidates for deletion. In total, 20 items did not meet the internal consistency criteria. Of these 20 items, 18 were already identified as candidates for deletion in the EFA. The other 2 items were added to the list of 27, resulting in a total of 29 candidates for removal.

Finally, we checked the 29 items for wording redundancy, face validity and content validity. Based on this relatively subjective analysis, it was decided to retain 10 items even though these items did not meet the EFA or internal consistency criteria (cf. Netemeyer et al., 2003). Nineteen items were removed. The result was a preliminary EMQ scale consisting of 50 items that could be used for further testing.

3.7. Scale Purification

To purify the semantic EMQ scale, we adopted a dyadic approach building upon two independent samples (cf. Viswanathan et al., 2000) of real EM visitors [6]. The first sample consisted of 1428 visitors (Appendix B) of eBay.nl (EM1), the Dutch version of eBay.com. The second sample consisted of 1051 visitors (Appendix B) of the Dutch EM with the largest market share of EMs in the Netherlands (EM2).
This EM is well known in the Netherlands for bringing buyers and sellers together via its classifieds system, who then usually meet in offline settings for product inspection and transaction settlement. The dyadic approach was selected as a method of strong validity testing (cf. Viswanathan et al., 2000) since it implies that validity, dimensionality and reliability are assessed independently, thereby extending the generalizability of our findings and contributing to the strength of the semantic differential development process.

Procedure

Like the pilot study, the focus of the research was on the purchase of a digital camera. On the website of both EMs, banners were placed in the digital camera section, inviting visitors to participate in the survey voluntarily [7]. For the purpose of predictive validity testing, the questionnaire did not only include the preliminary list of 50 semantic differentials but also two additional scales to measure the attitude towards purchasing and the intention to purchase respectively (Appendix F). This decision was supported by literature indicating that perceptions of website quality are likely to affect consumer purchasing in general (Aladwani and Palvia, 2002; Kim and Stoel, 2004a, 2004b; Yang et al., 2005), and consumer purchase attitudes and intentions in particular (Wolfinbarger and Gilly, 2003). The two additional measurement scales were taken from Van der Heijden, Verhagen and Creemers (2003). Both scales were slightly adjusted to fit the target specificity of our research.

Results

Initial Test of Scale Dimensionality: To develop a better understanding of the underlying structure of the semantic differential (cf.
Gerbing and Hamilton, 1996), we first applied EFA using the principal components model with orthogonal varimax rotation. Following a recommended observation-to-item ratio of ten-to-one (Hair, Anderson, Tatham and Black, 1998) we decided to draw a subsample of 500 observations (50 items) from each dataset. EFA was run (Appendix C). The data met the thresholds for sampling adequacy (sample 1: KMO MSA 0.932, Bartlett’s test of sphericity 23194.964, p < .001; sample 2: KMO MSA 0.902, Bartlett’s test of sphericity 21269.033, p < .001). The two datasets revealed identical factor solutions. Except for the item Instit8, which loaded high on the factor Seller Information for sample 2, all items strongly loaded solely on their underlying factor. As such, preliminary though strong evidence for unidimensionality, convergent validity and discriminant validity was provided. To obtain first indications of construct reliability, Cronbach’s alphas were computed. All alphas exceeded the standards for established research (> 0.70; Hair et al., 1998).

[6] The selection of real EM visitors as respondents extends the external validity of the scale to a nonstudent population (Aaker, 1997; Webb, Green and Brashear, 2000), and contributes to the generalizability of our findings due to the heterogeneous nature of both samples (Aaker, 1997).
[7] When clicking on the banner, respondents were redirected to an online questionnaire. As an incentive, respondents could enter a raffle for a book token of 20 euro by filling in their e-mail address.

Test of Latent Factor Structure: Following Gerbing and Anderson (1988) and Gerbing and Hamilton (1996), we validated the extracted latent structure using Confirmatory Factor Analysis (CFA) [8]. The remaining data of both samples was used for the purpose of the confirmation (i.e., data not used for the EFA), resulting in two independent sub-samples (sample 1: n = 928; sample 2: n = 551).
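The split into an EFA subsample and a CFA hold-out can be sketched as follows. The text does not state how the 500 observations were drawn, so simple random sampling without replacement is an assumption here, as are the function name `split_sample` and the seed.

```python
import random


def split_sample(n_total, n_efa, seed=0):
    """Randomly split respondent indices into an EFA subsample and a
    CFA hold-out. Random selection is an assumption; the paper only
    reports the subsample sizes."""
    ids = list(range(n_total))
    random.Random(seed).shuffle(ids)
    return ids[:n_efa], ids[n_efa:]


efa_1, cfa_1 = split_sample(1428, 500)  # sample 1 (EM1)
efa_2, cfa_2 = split_sample(1051, 500)  # sample 2 (EM2)
print(len(cfa_1), len(cfa_2))  # 928 551
```

The hold-out sizes follow directly from the reported totals (1428 - 500 = 928 and 1051 - 500 = 551), and no respondent appears in both parts.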
Taking both model size and Maximum Likelihood Estimation as most common estimation procedure into consideration, the size of the samples seemed more than acceptable (cf. Hair et al., 1998). Amos 5.0 with maximum likelihood estimation was used for the analysis (Arbuckle and Wothke, 1999; Arbuckle, 2003).

[8] Although EFA is a preferred method to identify relatively unknown measure structures (Gerbing and Hamilton, 1996), it does not explicitly test for unidimensionality because in EFA each factor is defined as a weighted sum of all observed variables. This implies that the extracted factors do not correspond directly to an exclusive, predefined subset of indicators (Gerbing and Anderson, 1988, p. 189).

To assess the EFA solution, we first tested a correlated first-order model (cf. Doll et al., 1994; Yang et al., 2005). The model consisted of the twelve extracted basic dimensions, functioning as inter-correlated first-order factors. The fit indices of the initial solutions of both datasets highlighted the need for model improvement (sample 1: χ2 = 4062.627, p<.001; GFI 0.84; AGFI 0.82; RMR 0.07; NFI 0.90; TLI 0.92; CFI 0.93; RMSEA 0.054 / sample 2: χ2 = 3066.442, p<.001; GFI 0.81; AGFI 0.78; RMR 0.099; NFI 0.88; TLI 0.91; CFI 0.92; RMSEA 0.057). Although measures such as TLI, CFI and RMSEA revealed acceptable fit, both the chi-square test and the fit indices GFI, AGFI, RMR and NFI were below recommended standards. Following Gerbing and Anderson (1988, p.417), we then decided to study the pattern of residuals as one of the most useful sources to locate misspecification. 13 items (Appendix D) shared large positive and negative residuals with items of other factors. We deleted these items and re-estimated the model. The chi-square statistic again demonstrated poor fit for both samples (sample 1: χ2 = 1226.605, p<.001 / sample 2: χ2 = 1154.230, p<.001).
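As a rough cross-check on the re-estimated model, RMSEA can be recomputed from the chi-square value, the degrees of freedom (562 for the correlated model, as listed in table 2) and the sample size. The sketch below uses the common point-estimate formula with N - 1 in the denominator; whether Amos 5.0 applied exactly this variant is an assumption.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Re-estimated correlated twelve-factor model; df = 562 is taken from table 2.
print(round(rmsea(1226.605, 562, 928), 3))  # 0.036 (sample 1)
print(round(rmsea(1154.230, 562, 551), 3))  # 0.044 (sample 2)
```

Both values agree with the RMSEA column of table 2. Because the formula divides by the sample size, the index does not grow with N for a fixed degree of misfit, whereas the chi-square statistic does.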
It has been recognized, however, that the significant chi-square statistic is sensitive to large sample sizes and the complexity of the model (Bearden, Sharma and Teel, 1982; Bentler and Bonett, 1980; Stewart and Segars, 2002). Other fit indices that are less sensitive to sample size are more useful to determine model fit and make model comparisons (Bagozzi, Yi and Phillips, 1991; Hair et al., 1998). For both samples, well-accepted fit indices such as GFI, AGFI, RMR, RMSEA, NFI, TLI and CFI demonstrated good to very good fit with the data (see table 2), thereby supporting the unidimensionality of the factors and the multidimensionality of the EMQ scale.

To further investigate the dimensionality of the 37-item EMQ instrument, and to test for any underlying structure, two alternative models were tested. Drawing upon Doll et al. (1994) we tested a model consisting of twelve uncorrelated first-order factors and a one-factor model relating all single items to one first-order EMQ factor [9]. If these models showed acceptable fit, this would not only refute the notion that the twelve factors share an underlying structure, but also that the EMQ concept is multidimensional in nature.

Table 2: CFA results for sample 1 and sample 2

| Sample | Model | χ2 | Df | GFI | AGFI | RMR | RMSEA | NFI | TLI | CFI |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample 1 (n= 928) | Twelve first-order factors (correlated) | 1226.605 (p<.001) | 562 | .93 | .92 | .059 | .036 | .96 | .97 | .98 |
| Sample 1 (n= 928) | Twelve first-order factors (uncorrelated) | 4418.899 (p<.001) | 629 | .68 | .64 | .485 | .081 | .85 | .86 | .87 |
| Sample 1 (n= 928) | One first-order factor | 18256.962 (p<.001) | 629 | .47 | .40 | .228 | .174 | .36 | .33 | .37 |
| Sample 2 (n= 551) | Twelve first-order factors (correlated) | 1154.230 (p<.001) | 562 | .90 | .87 | .087 | .044 | .94 | .96 | .97 |
| Sample 2 (n= 551) | Twelve first-order factors (uncorrelated) | 3058.767 (p<.001) | 629 | .67 | .63 | .487 | .084 | .83 | .85 | .86 |
| Sample 2 (n= 551) | One first-order factor | 12004.672 (p<.001) | 629 | .42 | .35 | .258 | .181 | .32 | .29 | .33 |

[9] Another model, consisting of 12 first-order factors loading on a second-order factor representing the overall concept of EMQ, was considered but not included. Following Chin (1998), postulation of such a model had little value since the second-order factor was not expected to fully mediate the relationships between the first-order factors and other variables in the pre-specified conceptual model. As such, and given the stage of our research, inclusion of a second-order factor was inapplicable.

The results (table 2) indicated unacceptable fit for the two alternative models. It was concluded that the correlated twelve first-order factor model is most applicable to model EMQ. The twelve EMQ dimensions were then defined (table 3).

Table 3: overview of EMQ dimensions and their definitions

| EMQ dimension | Definition |
|---|---|
| Layout | The buyer’s experience of the layout of the EM as being attractive and up to date. |
| Ease of Use | The perceived usability of the EM, including navigation options, site structures and ease of learning how to use it. |
| Contacting the intermediary | Perceptions of the amount of information and options provided at the EM that enable buyers to get in touch easily with the intermediary facilitating the EM. |
| Institutional control | Perceptions of the measures applied by the intermediary, such as guarantees, privacy policy and rules, to protect buyers and regulate the EM. |
| Community | The perceived ability of buyers to share one’s experiences and communicate with other buyers. |
| Contacting sellers | Perceptions of the amount of information and options provided at the EM that enable buyers to get in touch with sellers easily. |
| Seller information | The perceived amount and clearness of the information provided about sellers and their reputation. |
| Product information | The impression a buyer has about the way sellers describe and represent the products offered at the EM. |
| Pricing mechanisms | The perceived clearness and convenience of the mechanism that is used to establish and communicate prices at the EM. |
| Assortment | Overall buyer’s perception of the assortment at the EM, including a) the size of the assortment and b) alignment of the assortment with one’s interests. |
| Settlement | The easiness and clearness of methods used for paying and receiving products bought at the EM, as perceived by buyers. |
| Meeting sellers | The buyer’s perceived ease of meeting sellers in offline settings to inspect, pay for and pick up products. |

Reliability and Validity Testing: Having established the dimensionality of the semantic differential, we examined its reliability and validity (table 4).

Table 4: Reliability and validity statistics for sample 1 and sample 2

| Dimension | Sample 1 (n=928): α | Sample 1: Minimum item-to-total correlation | Sample 1: AVE | Sample 2 (n=551): α | Sample 2: Minimum item-to-total correlation | Sample 2: AVE |
|---|---|---|---|---|---|---|
| Layout | .90 | .776 | .75 | .90 | .783 | .75 |
| Ease of use | .88 | .705 | .73 | .90 | .735 | .74 |
| Contacting the intermediary | .96 | .903 | .89 | .95 | .867 | .87 |
| Institutional control | .91 | .743 | .71 | .90 | .747 | .71 |
| Community | .84 | .655 | .64 | .87 | .740 | .70 |
| Contacting sellers | .94 | .862 | .83 | .95 | .869 | .88 |
| Seller information | .90 | .715 | .77 | .91 | .715 | .80 |
| Product information | .84 | .659 | .65 | .77 | .526 | .57 |
| Pricing mechanisms | .87 | .653 | .70 | .87 | .723 | .70 |
| Assortment | .94 | .861 | .85 | .91 | .751 | .79 |
| Settlement | .92 | .782 | .81 | .93 | .832 | .81 |
| Meeting sellers | .92 | .817 | .80 | .93 | .826 | .81 |

The reliability statistics indicate good reliability for the twelve EMQ dimensions. Except for Product Information (α= 0.77, sample 2), all Cronbach’s alphas surpass the 0.80 level. For all dimensions, the Average Variance Extracted (AVE) exceeds the 0.50 threshold prescribed in the literature (e.g., McKinney et al., 2002; Ping, 2004). We then tested for convergent, discriminant and predictive validity (cf. Dabholkar et al., 1996, p. 12). Convergent validity was assessed by AVE’s, Cronbach’s alphas and minimum item-to-total correlations.
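Two of the statistics reported in table 4, Cronbach's alpha and the corrected item-to-total correlation, follow standard formulas that can be sketched in a few lines. The item scores below are hypothetical seven-point responses for a three-item dimension, not the study's data, and the function names are illustrative.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(items):
    """Correlate each item with the sum of the remaining items."""
    result = []
    for i, scores in enumerate(items):
        rest = [sum(row) - row[i] for row in zip(*items)]
        result.append(pearson(scores, rest))
    return result


# Hypothetical responses of six respondents to three items of one dimension.
dimension = [
    [5, 6, 7, 4, 6, 5],
    [5, 7, 6, 4, 6, 4],
    [6, 6, 7, 3, 5, 5],
]

alpha = cronbach_alpha(dimension)
item_total = corrected_item_total(dimension)
print(round(alpha, 2), [round(r, 2) for r in item_total])
```

The minimum of the corrected item-to-total correlations is the value that would be compared against a cutoff such as .40 or .50.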
All AVE’s exceed the recommended level of 0.50 (Segars, 1997; Yi and Davis, 2003) and, except for the dimension Product Information (α= 0.77, sample 2), all alphas surpass the 0.80 guideline (see Ping, 2004). The minimum item-to-total correlations reveal high correlations, all exceeding the criterion of 0.40 (see Jayanti and Burns, 1998), thereby providing strong additional support for convergent validity.

To test for discriminant validity, we studied the within-construct item loadings for each of the twelve EMQ dimensions and compared these loadings with cross-loadings on items of other dimensions (cf. Ko, Kirsch and King, 2005). All within-construct item loadings were higher than their cross-loadings, and no cross-loadings above .70 were observed, implying discriminant validity (Ping, 2004). To further assess discriminant validity, we compared the squared correlations between dimensions with their individual AVE’s. Since the squared correlation was less than either of the individual AVE’s for all pairs of dimensions we tested for, discriminant validity was confirmed (Fornell and Larcker, 1981; Yi and Davis, 2003).

Finally, we assessed the predictive validity of the EMQ scale. For both samples, the EMQ dimensions were regressed on the attitude towards purchasing and the intention to purchase. Table 5 reports the results.
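The two AVE-based checks described above can be sketched as follows. The standardized loadings and the inter-dimension correlation are hypothetical, and the AVE is computed under the usual assumption of standardized loadings, in which case it reduces to the mean squared loading.

```python
def ave(loadings):
    """Average variance extracted for standardized loadings:
    the mean of the squared loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def discriminant_ok(ave_a, ave_b, corr_ab):
    """Fornell-Larcker style check: the squared correlation between two
    dimensions must be smaller than each dimension's AVE."""
    return corr_ab ** 2 < min(ave_a, ave_b)


# Hypothetical standardized loadings for two dimensions (not the study's data).
layout = [0.85, 0.88, 0.87]
ease_of_use = [0.84, 0.86, 0.86]
correlation = 0.55  # hypothetical correlation between the two dimensions

ave_layout = ave(layout)
ave_ease = ave(ease_of_use)
print(round(ave_layout, 2), round(ave_ease, 2))
print(ave_layout > 0.50 and ave_ease > 0.50)              # convergent validity check
print(discriminant_ok(ave_layout, ave_ease, correlation))  # discriminant validity check
```

In a twelve-dimension scale the second check would be repeated for each of the 66 pairs of dimensions.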
Table 5: Standardized regression coefficients of EMQ dimensions on attitude and intention

| Dimension | Sample 1 (n= 928): Attitude | Sample 1: Intention | Sample 2 (n= 551): Attitude | Sample 2: Intention |
|---|---|---|---|---|
| Layout | -.01 | .00 | .00 | .07 |
| Ease of use | -.03 | -.05 | .02 | -.06 |
| Contacting the intermediary | -.00 | -.00 | -.06 | .08 |
| Institutional control | .08 | .06 | -.02 | -.08 |
| Community | -.06 | -.04 | .03 | .04 |
| Contacting sellers | -.07 | -.00 | -.12 | -.03 |
| Seller information | .02 | -.08 | .07 | -.04 |
| Product information | .28** | .20** | .11 | .00 |
| Pricing mechanisms | .11* | .07 | .01 | -.07 |
| Assortment | .16** | .20** | .24** | .03 |
| Settlement | .03 | .08 | .08 | .07 |
| Meeting sellers | .16** | .05 | .25** | .00 |
| R squared | .27 | .16 | .26 | .02 |
| Adjusted R squared | .26 | .15 | .24 | .00 |

** significant at p < .001, * significant at p < .01

A multicollinearity test revealed that the regression analysis had not been subject to multicollinearity [10]. For both samples, the EMQ scale explains around 25% of the attitude towards purchasing. Given the level of target specificity of the attitude construct, these findings are quite encouraging. For the purchase intention, however, the results are less clear. Even though explaining 16% of the online purchase intention for sample 1, the EMQ dimensions do not add to the behavioral intention variance for sample 2. When focusing on the impact of the individual dimensions for sample 1, Product Information, Pricing Mechanisms, Assortment and Meeting Sellers are significant attitude and intention determinants. For sample 2, only Assortment and Meeting Sellers have a significant impact on the attitude. We believe these differences in significance are likely to be explained by focusing on the extent to which both EMs support the different stages of the transaction process (see also Grieger, 2003; Skjøtt-Larsen, Kotzab and Grieger, 2003). As mentioned previously, EM1 is used by consumers to search for products, engage in bidding, and complete transactions online.
EM2, however, mainly matches buyers and sellers, who then meet offline for further product inspection, negotiation and transaction settlement. The absence of online transaction is likely to explain the insignificant influence of EMQ on online purchase intentions for EM2. Moreover, we believe it clarifies why dimensions such as Product Information and Pricing Mechanisms do not account for any variance in the purchase attitude for this EM.

[10] VIF-scores were computed. For sample 1 the highest VIF score was 1.982. For sample 2 the highest VIF score was 1.871. All VIF values were below the cutoff value of 10 (Hair et al., 1998).

3.8. Test of Contextual Contamination

After having established the dimensionality, validity and reliability of the semantic differential, we decided to address its sensitivity to what is known as contextual contamination in the semantic differential literature (Landon, 1971; Osgood et al., 1957). Contextual contamination concerns the presence of an order bias due to the likelihood that later responses in the semantic differential are biased by previous questions (Landon, 1971, p. 375). As widely discussed in the psychological literature (e.g., Feldman and Lynch, 1988; Tourangeau and Rasinski, 1988; Tversky and Kahneman, 1974), order bias occurs when respondents give answers that are consistent or inconsistent with beliefs rendered accessible by a previous response (Bickart, 1993, p. 52). Consistent answers are known as carryover effects, while inconsistent answers have been referred to as backfire effects (see Bickart, 1993). Both carryover and backfire effects are important sources of measurement error and can cause contextual contamination. To assess the sensitivity of the EMQ scale to contextual contamination, a test was conducted. Following Bickart (1993) and Landon (1971), we adopted a procedure consisting of the estimation of shifts in mean evaluations.
Using a quasi-experimental design, we assessed the means of the twelve EMQ dimensions in the same sequence as used in the development process (see Appendix C) and compared the results with the means of the EMQ dimensions when put in reversed order.

Procedure

A sample of 192 undergraduate students following a mandatory information systems course at a Dutch university was invited to participate in an experimental survey. The experimental survey consisted of the study of an EM and filling in an online questionnaire addressing perceptions of EMQ. The participants received the same instructions as those who participated in the pilot study (see section 3.6). eBay.nl (EM1) and the Dutch EM with the largest market share in the Netherlands (EM2) were selected as stimuli (cf. section 3.7). The students were assigned randomly to one of the two EMs. For the sake of convenience, we decided to hand out the instructions via a digital learning environment, and allow students to participate in the project either at home or on campus. To test for contextual contamination, three different versions of the questionnaire were constructed and randomly assigned to the participants. Following Landon (1971), the first version of the questionnaire presented the twelve EMQ dimensions and their items in the order as used previously in the development process (standard order). In the second version, the twelve dimensions were presented in reversed order (dimension reversed order). The third version of the questionnaire extended the work of Landon by reversing the order of the items within each dimension (item reversed order).

Results

To analyze the data and detect shifts in means, a series of independent t-tests was conducted. We computed the means for the twelve dimensions, and compared the means in the standard order EMQ questionnaire against the means in the two modified versions of the questionnaire (Appendix E).
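The comparison of dimension means across questionnaire versions can be sketched with a two-sample t statistic. The text does not say whether equal variances were assumed, so the pooled-variance form used below is an assumption, and the scores are hypothetical dimension means per respondent rather than the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance.
    Returns the t value and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical scores on one dimension for two questionnaire versions.
standard_order = [5.0, 5.5, 4.5, 6.0, 5.0, 5.5]
dimension_reversed = [5.5, 5.0, 4.5, 6.0, 5.5, 5.0]

t_value, dof = pooled_t(standard_order, dimension_reversed)
print(round(t_value, 3), dof)
```

The p-value would then be read from a t distribution with the returned degrees of freedom. The test is repeated per dimension and per modified version, and it is the pattern of shifts across subsequent dimensions, not a single significant difference, that signals contamination.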
The results demonstrate few significant shifts in the means of the dimensions across the different groups. The reversal of the items within each dimension (item reversed order) did not result in any different evaluations of the dimensions, implying an absence of within-factor contextual contamination. The reversal of the dimensions (dimension reversed order) did result in a few significant differences (EM1: Contacting the Intermediary, Meeting Sellers; EM2: Community and Settlement). Although these differences are significant, a visual inspection reveals that a pattern, indicating a structural shift in means across subsequent dimensions, is lacking. Thus, the shift in means for one dimension is not carried over to following dimensions. Evidently, the test did not reveal potential contextual contamination weaknesses in the semantic differential. As such it verifies the steps taken previously in the development process, including the tailoring of items and their antonyms to the concept under study, and the appropriate usage of the factor analytic approach.

3.9. Cross-Validation

Finally, we cross-validated the semantic differential with new data (cf. Dabholkar et al., 1996; Viswanathan et al., 2000). The data was collected in a Dutch EM facilitated by a publisher of one of the largest daily newspapers in the Netherlands. Following the dyadic approach as adopted previously (section 3.7), two independent samples were collected. The samples included 863 visitors (Appendix C) of the automobile section of the EM (sample 3), and 590 visitors (Appendix C) of the study books section of the EM (sample 4). The fact that the cross-validation focused on the purchase of two products (cars and study books) that differ from the product used for scale purification (digital camera) can be seen as a test of the robustness of the EMQ scale.
Procedure

Banners were placed within the two sections of the EM, inviting visitors to participate voluntarily in the research by filling in an online questionnaire [11]. The online questionnaire addressed basic demographics, perceptions of EMQ, purchase attitude and purchase intention (cf. section 3.7). For the purpose of extended predictive validity testing, two additional constructs were included, namely website satisfaction and loyalty intentions (Appendix F). Both constructs are hypothesized to be affected by website quality perceptions (Wolfinbarger and Gilly, 2003). The website satisfaction measure was taken directly from Szymanski and Hise (2000) who, building upon established consumer satisfaction literature (e.g., Spreng, MacKenzie and Olshavsky, 1996; Oliver, 1980; Zeithaml, Berry and Parasuraman, 1996), applied a seven point two-item semantic differential. Both the work of Szymanski and Hise (2000) and a re-examination conducted by Evanschitzky, Iyer, Hesse and Ahlert (2004), corroborate the applicability of the items and strongly confirm the validity and reliability of this semantic differential for website satisfaction. We slightly adapted the introduction to the two bipolar items to make them more applicable to the context of our research. We then used the Webster’s Collegiate Thesaurus (1988) to assess the linguistic contrast of the antonyms, and subjectively judged the psychological bipolarity of the scales. The analysis supported replication of the original items. To measure loyalty intentions, four Likert scale items were taken from the work of Sirdeshmukh, Singh and Sabol (2002). The items were adapted to reflect online loyalty intentions with respect to purchasing in an EM. Next, using three academic experts in the field of questionnaire design, both instruments for website satisfaction and loyalty intentions were translated into Dutch and then evaluated in terms of concept-scale pairings (cf.
Hardesty and Bearden, 2004), wording and interpretability (cf. Cannell et al., 2004; Foddy, 1993; Reynolds et al., 1993).

Footnote 11: As an incentive, a raffle of 20 book tokens of 10 euro each was announced. The participants were asked to fill in their e-mail address to take part in the raffle. The e-mail addresses were also used to verify that respondents participated in no more than one survey.

Results

Tests of Scale Dimensionality, Reliability and Validity

The dimensionality of the correlated twelve first-order factor model was re-assessed using Amos 5.0 with maximum likelihood estimation. Except for the chi-square tests (sample 3: χ2 = 1352.609, df = 562, p<.001; sample 4: χ2 = 1039.206, df = 562, p<.001), all fit indices demonstrated very good fit with the data for sample 3 (GFI .92; AGFI .90; RMR .062; RMSEA .040; NFI .96; TLI .97; CFI .98) and sample 4 (GFI .91; AGFI .89; RMR .049; RMSEA .038; NFI .95; TLI .97; CFI .98). As such, the results strongly confirm the twelve-dimensional meaning of the semantic differential. Next, we re-assessed the reliability and validity of the semantic differential (cf. Gerbing and Anderson, 1988; Netemeyer et al., 2003). Table 6 displays the results.

Table 6: Reliability and validity statistics for samples 3 and 4 (S3 = sample 3, n = 863; S4 = sample 4, n = 590; "Min. item-total" = minimum item-to-total correlation)

Dimension | S3 α | S3 Min. item-total | S3 AVE | S4 α | S4 Min. item-total | S4 AVE
Layout | .92 | .803 | .71 | .93 | .829 | .73
Ease of use | .93 | .832 | .75 | .93 | .807 | .72
Contacting the intermediary | .96 | .882 | .82 | .96 | .894 | .83
Institutional control | .95 | .843 | .76 | .95 | .849 | .76
Community | .89 | .744 | .61 | .90 | .776 | .63
Contacting sellers | .95 | .894 | .82 | .96 | .891 | .83
Seller information | .94 | .800 | .77 | .94 | .801 | .76
Product information | .89 | .754 | .61 | .86 | .705 | .55
Pricing mechanisms | .92 | .791 | .70 | .90 | .765 | .66
Assortment | .93 | .807 | .73 | .94 | .831 | .77
Settlement | .93 | .823 | .73 | .96 | .893 | .83
Meeting sellers | .95 | .858 | .80 | .92 | .814 | .71

Using the same criteria applied in the scale-purification process, the alphas, AVEs and minimum item-to-total correlations indicate good to very good reliability, and strongly confirm the convergent validity of the twelve EMQ dimensions. Applying the same procedures as described previously (section 3.7), discriminant validity was also strongly confirmed. Finally, we assessed the predictive validity of the EMQ scale by regressing online purchase attitude, online purchase intention, website satisfaction and loyalty intention on the twelve dimensions. The results are reported below.

Table 7: Standardized regression coefficients of EMQ dimensions on attitude, intention, e-satisfaction and e-loyalty (S3 = sample 3, n = 863; S4 = sample 4, n = 590)

Dimension | S3 Attitude | S3 Intention | S3 e-Satis. | S3 e-Loyal. | S4 Attitude | S4 Intention | S4 e-Satis. | S4 e-Loyal.
Layout | .04 | .06 | .13* | .08 | .04 | -.04 | .18** | .09
Ease of use | -.00 | -.01 | .23** | .04 | .14* | .11 | .25** | .04
Contacting the intermediary | -.00 | .03 | .02 | .00 | .07 | .06 | .03 | .07
Institutional control | .05 | .10 | .17** | .10 | -.03 | -.07 | -.01 | .02
Community | -.03 | -.00 | .07 | -.00 | -.09 | -.07 | .02 | -.01
Contacting sellers | .01 | -.02 | .06 | .03 | .11 | .09 | .12* | .09
Seller information | .01 | -.02 | -.09 | -.02 | -.00 | -.01 | .08 | .03
Product information | .04 | .02 | .05 | .02 | .12* | .07 | -.03 | .06
Pricing mechanisms | .09 | .02 | -.08 | .05 | -.03 | .03 | .04 | .02
Assortment | .20** | .22** | .09 | .26** | .14* | .18** | .15** | .31**
Settlement | .14* | .09 | .07 | .13* | .26** | .29** | .06 | .08
Meeting sellers | .19** | .17** | .10* | .13* | -.00 | -.04 | .02 | .02
R squared | .33 | .26 | .40 | .37 | .30 | .24 | .40 | .35
Adjusted R squared | .32 | .25 | .39 | .36 | .28 | .22 | .39 | .33

** significant at p < .001, * significant at p < .01

A computation of VIF-scores showed an absence of multicollinearity (see footnote 12). We then focused on the amount of explained variance and the significance of the coefficients. The results strongly support the predictive validity of the EMQ scale as a whole. For both samples, the EMQ dimensions explain between 22% and 39% of the variance in the dependents (adjusted R squared).

Footnote 12: All scores were below the recommended cut-off value of 10 (sample 3: highest VIF = 2.268; sample 4: highest VIF = 2.158).

Of the twelve EMQ dimensions, eight dimensions directly contribute to the variance of at least one of the four dependents across the two datasets. Assortment and Settlement can be labeled as the strongest and most stable determinants. The influence of Ease of Use and Meeting Sellers is also strong, though less robust across the dependents or samples. Other dimensions that affect at least one of the dependents include Layout, Institutional Control, Contacting Sellers and Product Information. Regarding the four dimensions that do not significantly contribute to the dependents, more research is needed.
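
For readers who wish to reproduce statistics of the kind reported in Tables 6 and 7, the following is a minimal sketch in Python with NumPy. It is not the procedure used in this study, which relied on Amos 5.0. The function names and the synthetic data are hypothetical and serve illustration only. The sketch assumes the usual formulas: Cronbach's alpha computed from item and total-score variances, AVE computed as the mean squared standardized loading, and VIF computed as the diagonal of the inverse predictor correlation matrix.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for one dimension.

    items: array of shape (n_respondents, k_items) with the item scores.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def average_variance_extracted(std_loadings):
    """AVE as the mean of the squared standardized factor loadings."""
    lam = np.asarray(std_loadings, dtype=float)
    return float(np.mean(lam ** 2))


def variance_inflation_factors(predictors):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(predictors, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic illustration only; these are NOT the study's data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))                # one hypothetical latent dimension
items = latent + 0.5 * rng.normal(size=(500, 3))  # three noisy indicators of it
alpha = cronbach_alpha(items)
ave = average_variance_extracted([0.9, 0.8, 0.7])  # hypothetical loadings
vifs = variance_inflation_factors(rng.normal(size=(500, 4)))
print(round(float(alpha), 2), round(ave, 2), vifs.round(2))
```

In such a sketch, VIF values close to 1 indicate nearly uncorrelated predictors, whereas values approaching the cut-off of 10 mentioned in footnote 12 would signal problematic multicollinearity.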
Following the literature on website quality (e.g., Kim and Stoel, 2004b; Wolfinbarger and Gilly, 2003), components of first-order factor website quality models are likely to affect consumer purchasing both directly and indirectly. The indirect influence of the components is likely to occur via other quality dimensions (Kim and Stoel, 2004b; Wolfinbarger and Gilly, 2003). Such second-order effects (also see Van der Heijden and Verhagen, 2004) were not part of our direct-effects models, and stress the need for further research. When comparing the results across both samples, a predictive pattern is noteworthy. Settlement strongly predicts purchase attitudes and intentions when purchasing study books, but has only a weak effect when purchasing a car. Meeting Sellers, in contrast, significantly influences all dependents when purchasing a car, but has no effect in the study book context. Possibly, the results can be explained by literature on product complexity. Cars and books have different levels of product complexity, the first being more complex (Iacobucci, 1992), and therefore riskier to buy online (see Bhatnagar, Misra and Rao, 2000). This might explain why physical inspection of the product seems inevitable and online transaction settlement improbable when purchasing a car in an EM. When buying less complex and lower risk products such as books, however, Settlement is likely to be crucial and Meeting Sellers trivial. These issues fall outside the scope of this cross-validation exercise, which centers on plausible predictive structures instead of advanced nomological models (cf. Agarwal and Karahanna, 2000; Chin et al., 1997); more theoretical advances are needed to address them.

4. DISCUSSION

The goal of this research is to shed light on the development and usage of the semantic differential in IS research.
As such, it aims at preventing arbitrary application of the semantic differential and intends to add to measurement validation in the IS research field (also see Boudreau et al., 2001; Straub, 1989; Straub, Hoffman, Weber and Steinfield, 2002). Drawing upon the semantic differential literature, a framework for semantic differential development is put forward. The framework highlights the need for adoption and extension of established guidelines for scale development, by inclusion of testing procedures for concept-scale interaction, wording credibility, linguistic contrast, psychological bipolarity, and contextual contamination. As such, it responds to the call of Straub et al. (2002) to extend measurement methods in the academic community. A demonstration exercise focusing on the semantic differentiation of EMQ exemplifies the framework. Using theory review, focus group interviews, linguistic testing and data collected in three EMs in the Netherlands, a twelve-dimensional semantic differential was developed and cross-validated. We conclude this study with a discussion of implications, recommendations and limitations.

Implications for Researchers

A significant contribution of this study is its clarification of the prerequisites for semantic differentiation and the provision of corresponding guidelines for research. The literature review underlines the key role of adequate bipolar scale selection, particularly because the scales function as axes of the semantic space that is used to allocate the meaning of the concept. For that reason, the selection of applicable bipolar scales is crucial and demands alignment with the particular goals and objectives of the researcher (Osgood et al., 1957). Moreover, the framework provides guidelines for bipolar scale selection and judgment (sections 3.2 and 3.3), which were demonstrated to result in relevant concept-scale pairings for the EMQ construct.
Since linguistic encoding forms the heart of semantic differentiation (Osgood et al., 1957), the technique heavily relies on linguistic bipolarity, psychological bipolarity, and credibility of within-scale wording combinations. Directions for testing bipolarity and wording combinations are put forward and empirically illustrated (sections 3.4 and 3.5). Finally, to define the axes of the semantic space, independence of the factors representing these axes is a necessity. The framework addresses this issue by proposing combined adoption of established factor analytic approaches, tests of psychometric properties (sections 3.6, 3.7, 3.9), and a test of contextual contamination (section 3.8). Directions for successful application of these methods are proposed and clarified empirically via the development of the twelve-dimensional semantic differential for EMQ. On the whole, the framework provides researchers with guidelines and examples that they can apply when developing semantic differentials for their own research. While instrument development is a necessary step for building theory in relatively new or unexplored research fields (Straub, 1989), researchers are also encouraged to use existing and validated semantic differentials. Such a confirmatory approach simplifies the instrument development process and is especially useful for the purpose of theory testing (Boudreau et al., 2001; Straub, 1989). The framework adds value to future theory testing studies by highlighting the necessity and principles of concept-scale pairing judgment (section 3.3), linguistic and psychological bipolarity (section 3.4), and linguistic clarity of wording combinations (section 3.5). Given the flaws that are noticed in the usage of existing semantic differentials, we strongly encourage researchers to address these issues when adopting existing semantic differentials.
Such a careful approach ensures that basic requirements are met and secures the validity of the semantic differential in theory testing and cross-validation studies (see section 3.9).

Implications for Reviewers

Prior applications of the semantic differential technique in established academic journals are subject to weaknesses and misunderstandings. Observed flaws include (1) subjective bipolar scale selections; (2) absence of tests of linguistic contrast, psychological bipolarity and wording combinations; (3) ambiguous and weak tailoring of scales and concept; (4) absence of empirical tests of factorial structure; and (5) confusion concerning the use of the semantic differential wording versus Likert scale measurement. To prevent such flaws, we foresee an important task for reviewers: confronting researchers with unmet basic semantic differential requirements before paper acceptance. The directives provided in this research project offer reviewers criteria to evaluate the usage of the semantic differential in manuscripts. Reviewers are encouraged to use these criteria and, most importantly, to point out to researchers that the semantic differential is not a mere alternative scaling format, and cannot be accepted in such a simplified form. One should realize that adoption of the semantic differential as a measurement technique implies acceptance of its prerequisites, and necessitates a comprehensive conceptual and empirical preparation.

Implications for EM Studies

In line with increased academic attention in the EM research community (e.g., Bapna et al., 2004; Cheng et al., 2006; Lancastre and Lages, 2006), we see the EMQ concept as an interesting avenue of advanced theory testing. Since the existing literature lacks both conceptual and confirmatory studies of EMQ, the concept is likely to fill an existing research gap. Of particular interest would be to extend our work to more advanced nomological networks.
Even though our study draws upon the literature to test predictive validity in a plausible structure, the goal of our work is not theory testing per se (cf. Agarwal and Karahanna, 2000). The specified model was developed solely as a plausible means to assess the predictive validity of the semantic differential and should be viewed in this light (cf. Salisbury, Chin and Gopal, 2002, p. 98). We encourage researchers to adopt the EMQ concept and develop and test related theoretical structures in more advanced settings. A possible direction for such research follows from the results of our purification exercise, which show that the influence of EMQ on consumer purchasing is likely to depend on the support provided by the EM during the stages of the purchase decision-making process. Accordingly, it could be useful to replicate our predictive model across EMs that differ in the stages that they support (see Grieger, 2003; Lindemann and Schmid, 1999; Skjøtt-Larsen et al., 2003). Another interesting future research area concerns an extension of our predictive model with product complexity and perceived risk. As indicated by our cross-validation study, both concepts might moderate the relationships between EMQ and consumer purchasing. Both research directions fall outside the scope of this methodological paper and are left for future research.

Implications for Practice

The implications for practice are twofold. First, practitioners in general should recognize that semantic differentiation depends on the requirements illuminated in this paper. Since questionable usage of the semantic differential is prevalent in the IS literature, validity problems are likely to exist. This implies that these semantic differentials, as well as the findings of the studies in which they are used, demand cautious adoption and are unlikely to be a credible basis for management decision-making.
Second, the developed EMQ semantic differential offers intermediaries of C2C EMs a validated instrument to gather information about the quality of the EM they operate, based on which this quality can be enhanced. By measuring and stimulating EMQ, intermediaries can improve consumer attitudes, intentions, loyalty and satisfaction. Since some EMQ dimensions explicitly refer to features and actions accounted for by sellers (e.g., Product Information; Meeting Sellers), intermediaries might share their findings with the population of sellers. Together, both parties are able to optimize the EM and influence (potential) buyers' impressions thereof. Additional insight into differences across product types, as referred to above, might be used as valuable input for further fine-tuning of different trading sections within EMs (e.g., books, cameras and cars).

Limitations and Recommendations

Notwithstanding the conceptual and empirical support for the proposed framework, our work has a number of limitations that generate interesting new directions for research. First, even though studying real EMs contributes to the external validity of our findings (Aaker, 1997), the scope of our work is limited to Dutch EMs and respondents. This implies that the constructed semantic differential is tailored to Dutch linguistics and interpretations. However, as empirically demonstrated in the literature (e.g., Osgood, 1959; Osgood, 1960; Suci, 1960; Tanaka, Oyama and Osgood, 1963), wording interpretations might differ across cultures and even result in different factor structures. In addition, research (e.g., Baack and Singh, 2007; Singh and Baack, 2004) has demonstrated that the influence of website perceptions on consumer online purchasing also differs across cultures and countries. Future research could address these issues and cross-validate EMQ as well as the influence of EMQ on consumer purchasing across different countries and cultures.
Second, despite the methodological comprehensiveness of the framework, it is probable that further refinements can be made. One area for improvement is further bipolarity testing. By addressing linguistic and psychological bipolarity, the framework draws upon a rather broad definition of bipolarity. A stricter interpretation of bipolarity (see Cogliser and Schriesheim, 1994; Schriesheim and Klich, 1991) further extends this view by adding metric testing of (1) the midpoint of each semantic scale pair (see Cogliser and Schriesheim, 1994), and (2) the equidistance of the value scales to their midpoint (Messick, 1957; Schriesheim and Klich, 1991). Future research could address this issue and extend the framework with additional methods for bipolarity testing. Third, we did not compare the psychometric properties of the semantic differential with those of alternative measurement techniques such as Likert scaling and ratio scaling, as this fell outside the scope of this study. Even though results of empirical studies tend to favor the semantic differential over these methods (see Friborg et al., 2006; Van Auken and Barry, 1995; Van Auken, Barry and Bagozzi, 2006), more research in IS settings is needed. Of particular interest would be to juxtapose the most prevalent measurement methods in IS research, and to evaluate their psychometric properties for established IS constructs. Given that weaker psychometric properties of scaling methods result in systematic measurement error, thereby reducing the amount of explained trait variance (Cote and Buckley, 1987; see also Van Auken et al., 2006), selection of a scaling method has substantial implications for theory and practice. We encourage researchers to address these issues and add to an unexplored research field.

References

Aaker, J.L. "Dimensions of Brand Personality," Journal of Marketing Research (34:3), 1997, pp. 347-356.
Agarwal, R. and Karahanna, E.
"Time Flies When You're Having Fun: Cognitive Absorption and Beliefs about Information Technology Usage," MIS Quarterly (24:4), 2000, pp. 665-694.
Aladwani, A.M. and Palvia, P.C. "Developing and Validating an Instrument for Measuring User-Perceived Web Quality," Information and Management (39:6), 2002, pp. 467-476.
Allport, C.D. and Kerler III, W.A. "A Research Note Regarding the Development of the Consensus on Appropriation Scale," Information Systems Research (14:4), 2003, pp. 356-59.
Arbuckle, J.L. and Wothke, W. Amos 4.0 User's Guide, SmallWaters Corporation, Chicago, 1999.
Arbuckle, J.L. Amos 5.0 Update to the Amos User's Guide, SmallWaters Corporation, Chicago, 2003.
Baack, D.W. and Singh, N. "Culture and Web Communications," Journal of Business Research (60:3), 2007, pp. 181-188.
Bagozzi, R.P., Yi, Y., and Phillips, L.W. "Assessing Construct Validity in Organizational Research," Administrative Science Quarterly (36:3), 1991, pp. 421-458.
Bailey, J. and Pearson, S.W. "Development of a Tool for Measuring and Analyzing Computer User Satisfaction," Management Science (29:5), 1983, pp. 530-545.
Bakos, Y. "A Strategic Analysis of Electronic Marketplaces," MIS Quarterly (15:3), 1991, pp. 295-310.
Bakos, Y. "The Emerging Role of Electronic Marketplaces on the Internet," Communications of the ACM (41:8), 1998, pp. 35-42.
Banerjee, D., Cronan, T.P., and Jones, T.W. "Modeling IT Ethics: A Study in Situational Ethics," MIS Quarterly (22:1), 1998, pp. 31-60.
Bapna, R., Goes, P., Gupta, A., and Jin, Y. "User Heterogeneity and its Impact on Electronic Auction Market Design: An Empirical Exploration," MIS Quarterly (28:1), 2004, pp. 21-43.
Barki, H., Rivard, S., and Talbot, J. "An Integrative Contingency Model of Software Project Risk Management," Journal of Management Information Systems (17:4), 2001, pp. 37-69.
Bearden, W.O., Sharma, S., and Teel, J.E.
"Sample Size Effects on Chi Square and Other Statistics Used in Evaluating Causal Models," Journal of Marketing Research (19:4), 1982, pp. 425-430.
Bearden, W.O. and Netemeyer, R.G. Handbook of Marketing Scales: Multi-item Measures for Marketing and Consumer Behavior Research, Sage Publications, Thousand Oaks, California, 1998.
Bearden, W.O., Hardesty, D.M., and Rose, R.L. "Consumer Self-Confidence: Refinements in Conceptualization and Measurement," Journal of Consumer Research (28:1), 2001, pp. 121-34.
Belanger, F., Hiller, J.S., and Smith, W.J. "Trustworthiness in Electronic Commerce: The Role of Privacy, Security, and Site Attributes," Journal of Strategic Information Systems (11:3-4), 2002, pp. 245-270.
Bentler, P.M. and Bonett, D.G. "Significance Tests and Goodness-of-Fit in the Analysis of Covariance Structures," Psychological Bulletin (88:3), 1980, pp. 588-606.
Bergeron, F., Raymond, L., Rivard, S., and Gara, M-F. "Determinants of EIS Use: Testing a Behavioral Model," Decision Support Systems (14:2), 1995, pp. 131-146.
Bhatnagar, A., Misra, S., and Rao, R.H. "On Risk, Convenience, and Internet Shopping Behavior," Communications of the ACM (43:11), 2000, pp. 98-105.
Bhattacherjee, A. "Understanding Information Systems Continuance: An Expectation-Confirmation Model," MIS Quarterly (25:3), 2001, pp. 351-70.
Bhattacherjee, A. and Premkumar, G. "Understanding Changes in Belief and Attitude Toward Information Technology Usage: A Theoretical Model and Longitudinal Test," MIS Quarterly (28:2), 2004, pp. 229-254.
Bhattacherjee, A. and Sanford, C. "Influence Processes for Information Technology Acceptance: An Elaboration Likelihood Model," MIS Quarterly (30:4), 2006, pp. 805-825.
Bickart, B.A. "Carryover and Backfire Effects in Marketing Research," Journal of Marketing Research (30:1), 1993, pp. 52-62.
Blair, J. and Presser, S. "An Experimental Comparison of Alternative Pretest Techniques: A Note on Preliminary Findings," Journal of Advertising Research (32:2), 1992, pp. RC-2-RC-5.
Boudreau, M-C., Gefen, D., and Straub, D.W. "Validation in Information Systems Research: A State-of-the-Art Assessment," MIS Quarterly (25:1), 2001, pp. 1-16.
Bruce, A.R., Briggs, R.O., Shepherd, M.M., Yen, J., and Nunamaker Jr., J.F. "Affective Reward and the Adoption of Group Support Systems: Productivity Is Not Always Enough," Journal of Management Information Systems (12:3), 1995, pp. 171-185.
Burke, K. and Chidambaram, L. "How Much Bandwidth is Enough? A Longitudinal Examination of Media Characteristics and Group Outcomes," MIS Quarterly (23:4), 1999, pp. 557-80.
Cacioppo, J.T. and Berntson, G.G. "Relationships Between Attitudes and Evaluative Space: A Critical Review, With Emphasis on the Separability of Positive and Negative Substrates," Psychological Bulletin (115:3), 1994, pp. 401-423.
Cannell, C.F., Fowler, F.J., Kalton, G., Oksenberg, L., and Bischoping, K. "New Quantitative Techniques for Pretesting Survey Questions," in Questionnaires (4), M. Bulmer (ed.), Sage Publications, Thousand Oaks, California, 2004, pp. 187-201.
Carroll, J.B. "Review of the Measurement of Meaning," Language (35), 1959, pp. 58-77.
Cheng, C-B., Chan, C-C.H., and Lin, K-C. "Intelligent Agents for E-marketplace: Negotiation with Issue Trade-Offs by Fuzzy Inference Systems," Decision Support Systems (42:2), 2006, pp. 626-638.
Chenoweth, T., Dowling, K.L., and St. Louis, R.D. "Convincing DSS Users that Complex Models are Worth The Effort," Decision Support Systems (37:1), 2004, pp. 71-82.
Chidambaram, L. and Jones, B. "Impact of Communication Medium and Computer Support on Group Perceptions and Performance: A Comparison of Face-to-Face and Dispersed Meetings," MIS Quarterly (17:4), 1993, pp. 465-491.
Chin, W.W. "Issues and Opinion on Structural Equation Modeling," MIS Quarterly (22:1), 1998, pp. 7-16.
Chin, W.W., Gopal, A., and Salisbury, W.D. "Advancing the Theory of Adaptive Structuration: The Development of a Scale to Measure Faithfulness of Appropriation," Information Systems Research (8:4), 1997, pp. 342-367.
Churchill Jr., G.A. "A Paradigm for Developing Better Measures of Marketing Constructs," Journal of Marketing Research (16:1), 1979, pp. 64-73.
Clark, L.A. and Watson, D. "Constructing Validity: Basic Issues in Scale Development," Psychological Assessment (7:3), 1995, pp. 309-319.
Clevenger, T., Lazier, G.A., and Clark, M.L. "Measurement of Corporate Images by the Semantic Differential," Journal of Marketing Research (2:1), 1965, pp. 80-82.
Cliff, N. "Adverbs as Multipliers," Psychological Review (66:1), 1959, pp. 27-44.
Cogliser, C.C. and Schriesheim, C.A. "Development and Application of a New Approach to Testing the Bipolarity of Semantic Differential Items," Educational and Psychological Measurement (54:3), 1994, pp. 594-605.
Cote, J.A. and Buckley, M.R. "Estimating Trait, Method and Error Variance: Generalizing Across 70 Construct Validation Studies," Journal of Marketing Research (23:4), 1987, pp. 315-318.
Dabholkar, P.A., Thorpe, D.I., and Rentz, J.O. "A Measure of Service Quality for Retail Stores: Scale Development and Validation," Journal of the Academy of Marketing Science (24:1), 1996, pp. 3-16.
Darnell, D.K. "Concept Scale Interaction in the Semantic Differential," Journal of Communications (16:2), 1966, pp. 104-115.
Davis, F.D. "Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology," MIS Quarterly (13:3), 1989, pp. 319-340.
Deese, J. "The Associative Structure of Some Common English Adjectives," Journal of Verbal Learning and Verbal Behavior (3:5), 1964, pp. 347-357.
DeVellis, R.F. Scale Development: Theory and Applications, Sage Publications, Thousand Oaks, California, 2003.
De Wulf, K., Schillewaert, N., Muylle, S., and Rangarajan, D.
"The Role of Pleasure in Website Success," Information & Management (43:4), 2006, pp. 434-446.
Dickson, J. and Albaum, G. "A Method for Developing Tailormade Semantic Differentials for Specific Marketing Content Areas," Journal of Marketing Research (14:1), 1977, pp. 87-91.
Dickson, J.W. and Slevin, D.P. "The Use of Semantic Differential Scales in Studying the Innovation Boundary," Academy of Management Journal (18:2), 1975, pp. 381-388.
Doherty, N.F., Marples, C.G., and Suhaimi, A. "The Relative Success of Alternative Approaches to Strategic Information Systems Planning: An Empirical Analysis," Journal of Strategic Information Systems (8:3), 1999, pp. 263-283.
Doll, W.J., Xia, W., and Torkzadeh, G. "A Confirmatory Factor Analysis of the End-User Computing Satisfaction Instrument," MIS Quarterly (18:4), 1994, pp. 453-461.
Evanschitzky, H., Iyer, G.R., Hesse, J., and Ahlert, D. "E-Satisfaction: A Re-examination," Journal of Retailing (80:3), 2004, pp. 239-247.
Falthzik, A.M. and Johnson, M.A. "Statement Polarity in Attitude Studies," Journal of Marketing Research (11:1), 1974, pp. 102-105.
Feldman, J.M. and Lynch, J.G. Jr. "Self-Generated Validity: Effects of Measurement on Belief, Attitude, Intention and Behavior," Journal of Applied Psychology (73:3), 1988, pp. 421-435.
Foddy, W. Constructing Questions for Interviews and Questionnaires: Theory and Practice in Social Research, Cambridge University Press, Cambridge, UK, 1993.
Foddy, W. "An Empirical Evaluation of In-Depth Probes Used to Pretest Survey Questions," Sociological Methods & Research (27:1), 1998, pp. 103-133.
Foddy, W. "The In-Depth Testing of Survey Questions: A Critical Appraisal of Methods," in Questionnaires (4), M. Bulmer (ed.), Sage Publications, Thousand Oaks, California, 2004, pp. 329-338.
Fornell, C. and Larcker, D.F. "Evaluating Structural Equation Models with Unobserved Variables and Measurement Error," Journal of Marketing Research (18:1), 1981, pp. 39-50.
Friborg, O., Martinussen, M., and Rosenvinge, J.H. "Likert-Based vs. Semantic Differential-Based Scorings of Positive Psychological Constructs: A Psychometric Comparison of Two Versions of a Scale Measuring Resilience," Personality and Individual Differences (40), 2006, pp. 873-884.
Galletta, D.F., Ahuja, M., Hartman, A., Thompson, T., and Peace, A.G. "Social Influence and End-User Training," Communications of the ACM (38:7), 1995, pp. 70-79.
Gerbing, D.W. and Anderson, J.C. "An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment," Journal of Marketing Research (25:2), 1988, pp. 186-92.
Gerbing, D.W. and Hamilton, J.G. "Viability of Exploratory Factor Analysis as a Precursor to Confirmatory Factor Analysis," Structural Equation Modeling (3:1), 1996, pp. 62-72.
Green, R.F. and Goldfried, M.R. "On the Bipolarity of Semantic Space," Psychological Monographs (79:6), 1965, p. 599 (whole no.).
Greenberg, J. "The College Sophomore as Guinea Pig: Setting the Record Straight," Academy of Management Review (12:1), 1987, pp. 157-159.
Grewal, R., Comer, J.M., and Mehta, R. "An Investigation into the Antecedents of Organizational Participation in Business-to-Business Electronic Markets," Journal of Marketing (65:3), 2001, pp. 17-33.
Grieger, M. "Electronic Marketplaces: A Literature Review and a Call for Supply Chain Management Research," European Journal of Operational Research (144:2), 2003, pp. 280-294.
Griffith, T.L. and Northcraft, G.B. "Cognitive Elements in the Implementation of New Technology: Can Less Information Provide More Benefits?," MIS Quarterly (20:1), 1996, pp. 99-110.
Hair Jr., J.F., Anderson, R.E., Tatham, R.L., and Black, W.C. Multivariate Data Analysis, Prentice-Hall, Inc., Upper Saddle River, New Jersey, 1998.
Hambleton, R.K. and Rogers, J.H. "Advances in Criterion-Referenced Measurement," in Advances in Educational and Psychological Testing: Theory and Applications, R.K. Hambleton and J.N.
Zaal (eds.), Kluwer Academic Publishers, Dordrecht, the Netherlands, 1991, pp. 3-43.
Hardesty, D.M. and Bearden, W.O. "The Use of Expert Judges in Scale Development: Implications for Improving Face Validity of Measures of Unobservable Constructs," Journal of Business Research (57:2), 2004, pp. 98-107.
Hawkins, D.I., Albaum, G., and Best, R. "Stapel Scale or Semantic Differential in Marketing Research?," Journal of Marketing Research (11:3), 1974, pp. 318-322.
Heise, D.R. "Some Methodological Issues in Semantic Differential Research," Psychological Bulletin (72:6), 1969, pp. 406-422.
Heise, D.R. "The Semantic Differential and Attitude Research," in Attitude Measurement, G.F. Summers (ed.), Rand McNally, Chicago, 1970, pp. 235-253.
Hsiao, R-L. "Technology Fears: Distrust and Cultural Persistence in Electronic Marketplace Adoption," Journal of Strategic Information Systems (12:3), 2003, pp. 169-199.
Hu, X., Lin, Z., Whinston, A.B., and Zhang, H. "Hope or Hype: On the Viability of Escrow Services as Trusted Third Parties in Online Auction Environments," Information Systems Research (15:3), 2004, pp. 236-249.
Huang, M-H. "Web Performance Scale," Information & Management (42:6), 2005, pp. 841-852.
Iacobucci, D. "An Empirical Examination of Some Basic Tenets in Services: Goods-Services Continua," in Advances in Services Marketing and Management (1), S. Teresa, D.E. Bowen, and S.W. Brown (eds.), JAI Press, Inc., Greenwich, 1992, pp. 23-52.
Jarupathirun, S. and Zahedi, F.M. "Exploring the Influence of Perceptual Factors in the Success of Web-Based Spatial DSS," Decision Support Systems (43:3), 2007, pp. 933-951.
Jarvenpaa, S.L. and Staples, D.S. "The Use of Collaborative Electronic Media for Information Sharing: An Exploratory Study of Determinants," Journal of Strategic Information Systems (9:2-3), 2000, pp. 129-154.
Jayanti, R.K. and Burns, A.C.
"The Antecedents of Preventive Health Care Behavior: An Empirical Study," Journal of the Academy of Marketing Science (26:1), 1998, pp. 6-15.
Kahn, R.L. and Cannell, C.F. "The Formulation of Questions," in Questionnaires (1), M. Bulmer (ed.), Sage Publications, Thousand Oaks, California, 2004, pp. 55-78.
Kelly, R.F. and Stephenson, R. "The Semantic Differential: An Information Source for Designing Retail Patronage Appeals," Journal of Marketing (31:4), 1967, pp. 43-47.
Kerlinger, F.N. Foundations of Behavioral Research (second edition), Holt, Rinehart and Winston, Inc., New York, 1973.
Kim, S. and Stoel, L. "Dimensional Hierarchy of Retail Website Quality," Information and Management (41:5), 2004(a), pp. 619-633.
Kim, S. and Stoel, L. "Apparel Retailers: Website Quality Dimensions and Satisfaction," Journal of Retailing and Consumer Services (11:2), 2004(b), pp. 109-117.
Ko, D-G., Kirsch, L.J., and King, W.R. "Antecedents of Knowledge Transfer from Consultants to Clients in Enterprise System Implementations," MIS Quarterly (29:1), 2005, pp. 59-85.
Lancastre, A. and Lages, L.F. "The Relationship Between Buyer and a B2B e-marketplace: Cooperation Determinants in an Electronic Market Context," Industrial Marketing Management (35:6), 2006, pp. 774-789.
Landon Jr., E.L. "Order Bias, the Ideal Rating, and the Semantic Differential," Journal of Marketing Research (8:3), 1971, pp. 375-78.
Lee, H-G. and Clark, T.H. "Market Process Reengineering Through Electronic Market Systems: Opportunities and Challenges," Journal of Management Information Systems (13:3), 1997, pp. 113-136.
Lee, Y. and Kozar, K.A. "Investigating the Effect of Website Quality on E-business Success: An Analytic Hierarchy Process (AHP) Approach," Decision Support Systems (42:3), 2006, pp. 1383-1401.
Liao, Z. and Cheung, M.T. "Internet-based E-banking and Consumer Attitudes: An Empirical Study," Information & Management (39:4), 2002, pp. 283-295.
Li, E.Y., McLeod, R., and Rogers, J.C.
“ Marketing Information Systems in the Fortune 500 Companies: Past, Present and Future,” Journal of Management Information Systems (10:1), 1993, pp. 165-192. Lindemann, M.A. and Schmid, B.F. “Framework for Specifying, Building, and Operating Electronic Markets,” International Journal of Electronic Commerce (3:2), 1999, pp. 7-21. Malhotra, N. K. "A Scale to Measure Self-Concepts, Person Concepts, and Product Concepts," Journal of Marketing (18:4), 1981, pp. 456-64. Malhotra, N.K., Agarwal, J., and Peterson, M. "Methodological Issues in Cross-Cultural Marketing Research," International Marketing Review (13:5), 1996, pp. 7-43. McKinney, V, Yoon, K., and Zahedi, F.M. "The Measurement of Web-Customer Satisfaction: An Expectation and Disconfirmation Approach," Information Systems Research (13:3), 2002, pp. 296-315. McKeen, J.D., Guimaraes, T., and Wetherbe, J.C. “The Relationship Between User Participation and User Satisfaction: An Investigation of Four Contingency Factors,” MIS Quarterly (18:4), 1994, pp. 427-451. Messick, S.J. “Metric Properties of the Semantic Differential,” Educational and psychological measurement (17:2), 1957, pp. 200-206. 45 Mindak, W.A. "Fitting the Semantic Differential to the Marketing Problem," Journal of Marketing (25:4), 1961, pp. 28-33. Mittal, V. and Kamakura, W.A. "Satisfaction, Repurchase Intent, and Repurchase Behavior: Investigating the Moderating Effect of Customer Characteristics," Journal of Marketing Research (38:1), 2001, pp.131-142. Moore, G. C. and Benbasat, I. "Development of an Instrument to Measure the Perceptions of Adopting an Information Technology Innovation," Information Systems Research (2:3), 1991, pp. 192-222. Mykytyn, P.P. and Green, G.I. “Effects of Computer Experience and Task Complexity on Attitude of Managers,” Information & Management (23:5), 1992, pp. 263-278. Netemeyer, R.G., Bearden, W.O., and Sharma, S. Scaling Procedures: Issues and Applications, Sage Publications, Thousands Oaks, California, 2003. 
Nunnally, J.C. Psychometric Theory, McGraw-Hill, Inc., New York, 1967.
Nunnally, J.C. and Bernstein, I.H. Psychometric Theory, McGraw-Hill, Inc., New York, 1994.
Okazaki, S. "What Do We Know About Mobile Internet Adopters? A Cluster Analysis," Information & Management (43:2), 2006, pp. 127-141.
Oliver, R.L. "A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions," Journal of Marketing Research (17:4), 1980, pp. 460-469.
Osgood, C.E. "The Nature and Measurement of Meaning," Psychological Bulletin (49:3), 1952, pp. 197-237.
Osgood, C.E. and Suci, G.J. "Factor Analysis of Meaning," Journal of Experimental Psychology (50:5), 1955, pp. 325-338.
Osgood, C.E., Suci, G.J., and Tannenbaum, P.H. The Measurement of Meaning, University of Illinois Press, Urbana, Illinois, 1957.
Osgood, C.E. "The Cross-Cultural Generality of Visual-Verbal Synesthetic Tendencies," Behavioral Science (5), 1960, pp. 146-169.
Palmer, J.W. "Web Site Usability, Design, and Performance Metrics," Information Systems Research (13:2), 2002, pp. 151-167.
Pavlou, P.A. "Institution-Based Trust in Interorganizational Exchange Relationships: The Role of Online B2B Marketplaces on Trust Formation," Journal of Strategic Information Systems (11:3-4), 2002, pp. 215-243.
Pavlou, P.A. and Gefen, D. "Building Effective Online Marketplaces with Institution-Based Trust," Information Systems Research (15:1), 2004, pp. 37-59.
Peterson, R.A. "On the Use of College Students in Social Science Research: Insights from a Second-Order Meta-Analysis," Journal of Consumer Research (28:3), 2001, pp. 450-461.
Ping Jr., R.A. "On Assuring Valid Measures for Theoretical Models Using Survey Data," Journal of Business Research (57:2), 2004, pp. 125-141.
Pinker, E.J., Seidmann, A., and Vakrat, Y. "Managing Online Auctions: Current Business and Research Issues," Management Science (49:11), 2003, pp. 1457-1484.
Reynolds, N. and Diamantopoulos, A. "The Effect of Pretest Method on Error Detection Rates: Experimental Evidence," European Journal of Marketing (32:5-6), 1998, pp. 480-98.
Reynolds, N., Diamantopoulos, A., and Schlegelmilch, B. "Pretesting in Questionnaire Design: A Review of the Literature and Suggestions for Further Research," Journal of the Market Research Society (35:2), 1993, pp. 171-82.
Robinson, J.P., Shaver, P.R., and Wrightsman Jr., L.S. "Criteria for Scale Selection and Evaluation," in Measures of Personality and Social Psychological Attitudes, J.P. Robinson, P.R. Shaver, and L.S. Wrightsman Jr. (eds.), Academic Press, San Diego, California, 1991, pp. 1-15.
Salisbury, D., Chin, W.W., Gopal, A., and Newsted, P.R. "Research Report: Better Theory Through Measurement - Developing a Scale to Capture Consensus on Appropriation," Information Systems Research (13:1), 2002, pp. 91-103.
Sarkar, M.B., Butler, B., and Steinfield, C. "Intermediaries and Cybermediaries: A Continuing Role for Mediating Players in the Electronic Marketplace," Journal of Computer-Mediated Communication (1:3), 1995, available online at http://jcmc.indiana.edu/vol1/issue3/sarkar.html.
Schriesheim, C.A. and Klich, N.R. "Fiedler’s Least Preferred Coworker (LPC) Instrument: An Investigation of Its True Bipolarity," Educational and Psychological Measurement (51:2), 1991, pp. 305-315.
Segars, A.H. "Assessing the Unidimensionality of Measurement: A Paradigm and Illustration Within the Context of Information Systems Research," Omega (25:1), 1997, pp. 107-121.
Sekaran, U. "Methodological and Theoretical Issues and Advancements in Cross-Cultural Research," Journal of International Business Studies (14:2), 1983, pp. 61-73.
Sharpe, L.K. and Anderson, W.T. "Concept-Scale Interaction in the Semantic Differential," Journal of Marketing Research (9:4), 1972, pp. 432-434.
Shimp, T.A. and Sharma, S. "Consumer Ethnocentrism: Construction and Validation of the CETSCALE," Journal of Marketing Research (24:3), 1987, pp. 280-289.
Simonson, I. and Drolet, A. "Anchoring Effects on Consumers' Willingness-to-Pay and Willingness-to-Accept," Journal of Consumer Research (31:3), 2004, pp. 681-690.
Singh, S.N. and Dalal, N.P. "Web Home Pages as Advertisements," Communications of the ACM (42:8), 1999, pp. 91-98.
Singh, N. and Baack, D.W. "Studying Cultural Values on the Web: A Cross-Cultural Study of U.S. and Mexican Websites," Journal of Computer-Mediated Communication (9:4), 2004, available online at http://jcmc.indiana.edu/vol9/issue4/singh_baack.html.
Sirdeshmukh, D., Singh, J., and Sabol, B. "Consumer Trust, Value, and Loyalty in Relational Exchanges," Journal of Marketing (66:1), 2002, pp. 15-37.
Skjøtt-Larsen, T., Kotzab, H., and Grieger, M. "Electronic Marketplaces and Supply Chain Relationships," Industrial Marketing Management (32:3), 2003, pp. 199-210.
Snider, J.G. and Osgood, C.E. (eds.) Semantic Differential Technique: A Sourcebook, Aldine Publishing Company, Chicago, 1969.
Song, X.M. and Parry, M.E. "A Cross-National Comparative Study of New Product Development Processes: Japan and the United States," Journal of Marketing (61:2), 1997, pp. 1-18.
Spreng, R.A., MacKenzie, S.B., and Olshavsky, R.W. "A Reexamination of the Determinants of Consumer Satisfaction," Journal of Marketing (60:3), 1996, pp. 15-32.
Srinivasan, V., Vanden Abeele, P., and Butaye, I. "The Factor Structure of Multidimensional Response to Marketing Stimuli: A Comparison of Two Approaches," Marketing Science (8:1), 1998, pp. 78-88.
Stewart, K.A. and Segars, A.H. "An Empirical Examination of the Concern for Information Privacy Instrument," Information Systems Research (13:1), 2002, pp. 36-49.
Straub, D.W. "Validating Instruments in MIS Research," MIS Quarterly (13:2), 1989, pp. 147-169.
Straub, D.W., Hoffman, D.L., Weber, B.W., and Steinfield, C. "Toward New Metrics for Net-Enhanced Organizations," Information Systems Research (13:3), 2002, pp. 227-238.
Suci, G.J. "A Comparison of Semantic Structures in American Southwest Culture Groups," Journal of Abnormal and Social Psychology (61:July), 1960, pp. 25-30.
Suh, K-S. and Lee, Y.E. "The Effects of Virtual Learning on Consumer Learning: An Empirical Investigation," MIS Quarterly (29:4), 2005, pp. 673-697.
Suh, B. and Han, I. "The Impact of Customer Trust and Perceptions of Security Control on the Acceptance of Electronic Commerce," International Journal of Electronic Commerce (7:3), 2003, pp. 135-161.
Szymanski, D.M. and Hise, R.T. "E-Satisfaction: An Initial Examination," Journal of Retailing (76:3), 2000, pp. 309-322.
Tanaka, Y., Oyama, T., and Osgood, C.E. "A Cross-Cultural and Cross-Concept Study of the Generality of Semantic Space," Journal of Verbal Learning and Verbal Behavior (2:5-6), 1963, pp. 392-405.
Tittle, C.K. "Use of Judgmental Methods in Item Bias Studies," in Handbook of Methods for Detecting Test Bias, R.A. Berk (ed.), The Johns Hopkins University Press, Baltimore, Maryland, 1982, pp. 31-63.
Tourangeau, R. and Rasinski, K.A. "Cognitive Processes Underlying Context Effects in Attitude Measurement," Psychological Bulletin (103:3), 1988, pp. 299-314.
Tversky, A. and Kahneman, D. "Judgment under Uncertainty: Heuristics and Biases," Science (185:4157), 1974, pp. 1124-1131.
Van Auken, S. and Barry, T.E. "An Assessment of the Trait Validity of Cognitive Age Measures," Journal of Consumer Psychology (4:2), 1995, pp. 107-132.
Van Auken, S., Barry, T.E., and Bagozzi, R.P. "A Cross-Country Construct Validation of Cognitive Age," Journal of the Academy of Marketing Science (34:3), 2006, pp. 439-455.
Van der Heijden, H., Verhagen, T., and Creemers, M. "Understanding Online Purchase Intentions: Contributions from Technology and Trust Perspectives," European Journal of Information Systems (12:1), 2003, pp. 41-48.
Van der Heijden, H. and Verhagen, T. "Online Store Image: Conceptual Foundations and Empirical Measurement," Information & Management (41:5), 2004, pp. 609-617.
Van der Heijden, H. "User Acceptance of Hedonic Information Systems," MIS Quarterly (28:4), 2004, pp. 695-704.
Van Iwaarden, J., Van der Wiele, T., Ball, L., and Millen, R. "Perceptions About the Quality of Web Sites: A Survey Amongst Students at Northeastern University and Erasmus University," Information & Management (41:8), 2004, pp. 947-959.
Verhagen, T., Meents, S., and Tan, Y-H. "Perceived Risk and Trust Associated with Purchasing at Electronic Marketplaces," European Journal of Information Systems (15:6), 2006, pp. 542-555.
Viswanathan, M., Childers, T., and Moore, E.S. "The Measurement of Intergenerational Communication and Influence on Consumption: Development, Validation, and Cross-Cultural Comparison of the IGEN Scale," Journal of the Academy of Marketing Science (28:3), 2000, pp. 406-424.
Watts Sussman, S. and Sproull, L. "Straight Talk: Delivering Bad News Through Electronic Communication," Information Systems Research (10:2), 1999, pp. 150-166.
Webb, D.J., Green, C.L., and Brashear, T.G. "Development and Validation of Scales to Measure Attitudes Influencing Monetary Donations to Charitable Organizations," Journal of the Academy of Marketing Science (28:2), 2000, pp. 299-309.
Webster, J., Martocchio, J.J., and Joseph, J. "Microcomputer Playfulness: Development of a Measure With Workplace Limitations," MIS Quarterly (16:2), 1992, pp. 201-226.
Webster's Collegiate Thesaurus, Merriam-Webster, Springfield, Massachusetts, 1988.
Weinreich, U. "Travels Through Semantic Space," Word (14:2-3), 1958, pp. 346-366.
Winter, S.J., Saunders, C., and Hart, P. "Electronic Window Dressing: Impression Management with Websites," European Journal of Information Systems (12:4), 2003, pp. 309-322.
Wirtz, J. and Lee, M.C. "An Examination of the Quality and Context-Specific Applicability of Commonly Used Customer Satisfaction Measures," Journal of Service Research (5:4), 2003, pp. 345-355.
Wolfinbarger, M. and Gilly, M.C. "ETailQ: Dimensionalizing, Measuring and Predicting Etail Quality," Journal of Retailing (79:3), 2003, pp. 183-98.
Yang, Z., Cai, S., Zhou, Z., and Zhou, N. "Development and Validation of an Instrument to Measure User Perceived Service Quality of Information Presenting Web Portals," Information & Management (42:4), 2005, pp. 575-589.
Yi, M.Y. and Davis, F.D. "Developing and Validating an Observational Learning Model of Computer Software Training and Skill Acquisition," Information Systems Research (14:2), 2003, pp. 146-169.
Zeithaml, V.A., Berry, L.L., and Parasuraman, A. "The Behavioral Consequences of Service Quality," Journal of Marketing (60:2), 1996, pp. 31-46.

APPENDIX A: Overview items used in pilot study

1. Insufficient information to contact sellers – sufficient information to contact sellers
2. Difficult to contact sellers via the website – easy to contact sellers via the website
3. Slow response from sellers (to questions) – fast response from sellers (to questions)***
4. Insufficient options to contact sellers – sufficient options to contact sellers
5. Insufficient information to contact <name intermediary> – sufficient information to contact <name intermediary>
6. Difficult to contact <name intermediary> via the website – easy to contact <name intermediary> via the website
7. Slow response from <name intermediary> (to questions) – fast response from <name intermediary> (to questions)***
8. Insufficient options to contact <name intermediary> – sufficient options to contact <name intermediary>
9. Unattractive website layout – attractive website layout
10. Outdated website layout – up to date website layout
11. Boring website layout – interesting website layout
12. Many annoying ads from sponsors (on the website) – few annoying ads from sponsors (on the website)***
13. Many annoying ads from the company <name intermediary> (on the website) – few annoying ads from the company <name intermediary> (on the website)***
14. Slow website – fast website***
15. Many technical problems – few technical problems***
16. Difficult to navigate website – easy to navigate website
17. Unclear website structure – clear website structure
18. Difficult to search on the website – easy to search on the website
19. Difficult to get an overview of all products from a seller – easy to get an overview of all products from a seller***
20. Difficult to learn how to use the website – easy to learn how to use the website
21. Difficult to compare prices – easy to compare prices***
22. Difficult to compare products – easy to compare products***
23. Difficult to monitor interesting product ads – easy to monitor interesting product ads***
24. Boring to use the website – interesting to use the website***
25. Difficult to evaluate <name products> before you buy – easy to evaluate <name products> before you buy***
26. Unclear how to pay for <name products> – clear how to pay for <name products>
27. Difficult to pay for <name products> – easy to pay for <name products>
28. Unclear how to receive <name products> – clear how to receive <name products>
29. Difficult to receive <name products> – easy to receive <name products>
30. Difficult to meet sellers and evaluate <name products> before you buy – easy to meet sellers and evaluate <name products> before you buy
31. Difficult to meet sellers and pay them – easy to meet sellers and pay them
32. Difficult to pick up <name products> at the sellers’ location – easy to pick up <name products> at the sellers’ location
33. Outdated website information – up to date website information***
34. Insufficient information on how to prevent being swindled – sufficient information on how to prevent being swindled***
35. Unclear information on how to prevent being swindled – clear information on how to prevent being swindled***
36. Insufficient information about sellers – sufficient information about sellers
37. Insufficient information about the company <name intermediary> – sufficient information about the company <name intermediary>***
38. Unclear how final prices are effected – clear how final prices are effected
39. Inconvenient pricing method – convenient pricing method
40. Unreasonable prices – reasonable prices***
41. Little value for money – much value for money***
42. Unclear what final price to pay – clear what final price to pay
43. Unclear indication of sellers’ reputation – clear indication of sellers’ reputation
44. Insufficient information about sellers’ reputation – sufficient information about sellers’ reputation
45. Insufficient guarantees – sufficient guarantees
46. Unclear information about guarantees – clear information about guarantees
47. Insufficient information about the privacy policy – sufficient information about the privacy policy
48. Insufficient privacy protection – sufficient privacy protection
49. Unclear information about the rules on <name EM> – clear information about the rules on <name EM>
50. Insufficient rules that protect me on <name EM> – sufficient rules that protect me on <name EM>
51. Weak website security – strong website security
52. Insufficient monitoring of sellers – sufficient monitoring of sellers
53. Passive in removing swindlers – active in removing swindlers
54. Few interesting <name products> – many interesting <name products>
55. Limited range of <name products> – wide range of <name products>
56. Difficult to find the offered <name products> elsewhere – easy to find the offered <name products> elsewhere***
57. Insufficient number of <name products> – sufficient number of <name products>
58. Unclear descriptions of <name products> – clear descriptions of <name products>
59. Incorrect descriptions of <name products> – correct descriptions of <name products>
60. Bad representation of <name products> (images/photos) – good representation of <name products> (images/photos)
61. Difficult to assess the quality of <name products> – easy to assess the quality of <name products>
62. Insufficient product photos of <name products> – sufficient product photos of <name products>
63. Unclear whether <name products> are used – clear whether <name products> are used
64. Unclear condition of <name products> – clear condition of <name products>
65. Difficult to contact other buyers – easy to contact other buyers
66. Difficult to share experiences with other buyers – easy to share experiences with other buyers
67. Few buyers sharing their experiences on <name EM> – many buyers sharing their experiences on <name EM>
68. Insufficient options to communicate with other buyers – sufficient options to communicate with other buyers
69. Weak common bond between buyers – strong common bond between buyers

*** = removed after pilot study

APPENDIX B: Sample characteristics

Cells show % of respondents (n).

| Characteristic | Category | Sample 1 (n=1428) | Sample 2 (n=1051) | Sample 3 (n=863) | Sample 4 (n=590) |
|---|---|---|---|---|---|
| Gender | Male | 50.8% (725) | 33.7% (354) | 70.3% (607) | 28.5% (168) |
| | Female | 49.2% (703) | 66.3% (697) | 29.7% (256) | 71.5% (422) |
| Age | < 21 | 4.1% (58) | 7.2% (76) | 8.2% (71) | 13.6% (80) |
| | 21-30 | 19% (271) | 21.9% (230) | 24.6% (212) | 24.9% (147) |
| | 31-40 | 34% (486) | 27.7% (291) | 28.4% (245) | 20% (118) |
| | 41-50 | 25.9% (370) | 22.0% (231) | 19.8% (171) | 21.4% (126) |
| | 51-60 | 14% (200) | 14.5% (152) | 15.5% (134) | 15.3% (90) |
| | > 60 | 3% (43) | 6.8% (6.8) | 3.5% (30) | 4.9% (29) |
| Frequency of visiting the EM | Never, this is the first time | 0% (0) | 0.8% (8) | 2.3% (20) | 8.1% (48) |
| | A couple of times per year | 0.8% (12) | 1.8% (19) | 3.2% (28) | 9.3% (55) |
| | Once per month | 2.1% (30) | 6.3% (66) | 8.8% (76) | 15.9% (94) |
| | Once per week | 6.6% (94) | 13.6% (143) | 18.5% (160) | 21.5% (127) |
| | A couple of times per week | 90.5% (1292) | 77.5% (815) | 67.1% (579) | 45.1% (266) |
| Times bought via the Internet | Never | 0.9% (13) | 5.4% (57) | 7% (60) | 4.9% (29) |
| | Once | 0.7% (10) | 4.2% (44) | 5.4% (47) | 6.4% (38) |
| | Twice | 0.6% (9) | 7.2% (76) | 8.8% (76) | 8.1% (48) |
| | Three times | 0.8% (12) | 5.8% (61) | 5.7% (49) | 6.6% (39) |
| | Four times or more | 96.9% (1384) | 77.4% (813) | 73.1% (631) | 73.9% (436) |
| Times bought via the EM | Never | 0.6% (9) | 9.3% (98) | 28.5% (246) | 38.8% (229) |
| | Once | 1.6% (23) | 9.5% (100) | 18.5% (160) | 17.3% (102) |
| | Twice | 2.5% (35) | 10.4% (109) | 16.9% (146) | 14.7% (87) |
| | Three times | 2.7% (39) | 8.8% (92) | 9.4% (81) | 8.3% (49) |
| | Four times or more | 92.6% (1322) | 62% (652) | 26.7% (230) | 20.8% (123) |

Note: all measures are self-reported

APPENDIX C: Scale purification: results EFA and initial reliability test

Factor rows give the variance explained (%) with the eigenvalue in parentheses, followed by the reliability (α). Item rows give the factor loadings. Both samples: n=500.

| Factor / item | Variance explained, Sample 1 | Variance explained, Sample 2 | α / loading, Sample 1 | α / loading, Sample 2 |
|---|---|---|---|---|
| Layout | 2.265 (1.132) | 2.741 (1.271) | .90 | .89 |
| Layout1 | | | .774 | .801 |
| Layout2 | | | .782 | .847 |
| Layout3 | | | .821 | .866 |
| Ease of use | 4.317 (2.159) | 4.541 (2.271) | .90 | .92 |
| Ease1 | | | .797 | .830 |
| Ease2 | | | .783 | .824 |
| Ease3 | | | .725 | .834 |
| Ease4 | | | .776 | .851 |
| Contacting the intermediary | 2.768 (1.384) | 3.261 (1.630) | .97 | .96 |
| Contmed1 | | | .844 | .851 |
| Contmed2 | | | .852 | .871 |
| Contmed3 | | | .842 | .872 |
| Institutional control | 35.640 (17.819) | 29.463 (14.731) | .94 | .92 |
| Instit1 | | | .810 | .715 |
| Instit2 | | | .782 | .811 |
| Instit3 | | | .785 | .810 |
| Instit4 | | | .745 | .799 |
| Instit5 | | | .682 | .746 |
| Instit6 | | | .819 | .824 |
| Instit7 | | | .656 | .651 |
| Instit8 | | | .708 | .612 |
| Instit9 | | | .653 | .467 |
| Community | 5.501 (2.750) | 8.460 (4.230) | .89 | .88 |
| Commu1 | | | .698 | .613 |
| Commu2 | | | .805 | .802 |
| Commu3 | | | .725 | .778 |
| Commu4 | | | .789 | .804 |
| Commu5 | | | .713 | .751 |
| Contacting sellers | 3.962 (1.981) | 3.373 (1.686) | .94 | .94 |
| Contsel1 | | | .795 | .820 |
| Contsel2 | | | .836 | .846 |
| Contsel3 | | | .814 | .826 |
| Seller information | 2.199 (1.100) | 3.902 (1.951) | .92 | .93 |
| Infsel1 | | | .665 | .750 |
| Infsel2 | | | .750 | .823 |
| Infsel3 | | | .745 | .831 |
| Product information | 8.505 (4.252) | 8.635 (4.317) | .93 | .90 |
| Prodinf1 | | | .733 | .685 |
| Prodinf2 | | | .727 | .708 |
| Prodinf3 | | | .781 | .675 |
| Prodinf4 | | | .802 | .809 |
| Prodinf5 | | | .791 | .788 |
| Prodinf6 | | | .769 | .768 |
| Prodinf7 | | | .773 | .771 |
| Pricing mechanisms | 2.597 (1.298) | 2.384 (1.192) | .87 | .89 |
| Pricing1 | | | .876 | .831 |
| Pricing2 | | | .850 | .817 |
| Pricing3 | | | .747 | .795 |
| Assortment | 3.827 (1.914) | 2.999 (1.500) | .95 | .91 |
| Assor1 | | | .880 | .821 |
| Assor2 | | | .896 | .871 |
| Assor3 | | | .878 | .858 |
| Settlement | 5.066 (2.533) | 5.100 (2.550) | .94 | .93 |
| Settl1 | | | .827 | .802 |
| Settl2 | | | .838 | .832 |
| Settl3 | | | .850 | .828 |
| Settl4 | | | .813 | .818 |
| Meeting sellers | 2.985 (1.492) | 2.470 (1.235) | .91 | .91 |
| Meet1 | | | .838 | .786 |
| Meet2 | | | .884 | .838 |
| Meet3 | | | .836 | .793 |
| Total variance explained | 79.63% | 77.34% | | |

APPENDIX D: Factor and item overview scale purification process

| Factor | Item | Item wording |
|---|---|---|
| Layout | Layout1 | Unattractive website layout – attractive website layout |
| | Layout2 | Outdated website layout – up to date website layout |
| | Layout3 | Boring website layout – interesting website layout |
| Ease of use | Ease1 | Difficult to navigate website – easy to navigate website |
| | Ease2 | Unclear website structure – clear website structure |
| | Ease3* | Difficult to search on the website – easy to search on the website |
| | Ease4 | Difficult to learn how to use the website – easy to learn how to use the website |
| Contacting the intermediary | Contmed1 | Insufficient information to contact <name intermediary> – sufficient information to contact <name intermediary> |
| | Contmed2 | Difficult to contact <name intermediary> via the website – easy to contact <name intermediary> via the website |
| | Contmed3 | Insufficient options to contact <name intermediary> – sufficient options to contact <name intermediary> |
| Institutional control | Instit1* | Insufficient guarantees – sufficient guarantees |
| | Instit2 | Unclear information about guarantees – clear information about guarantees |
| | Instit3 | Insufficient information about the privacy policy – sufficient information about the privacy policy |
| | Instit4 | Insufficient privacy protection – sufficient privacy protection |
| | Instit5 | Unclear information about the rules on <name EM> – clear information about the rules on <name EM> |
| | Instit6* | Insufficient rules that protect me on <name EM> – sufficient rules that protect me on <name EM> |
| | Instit7* | Weak website security – strong website security |
| | Instit8* | Insufficient monitoring of sellers – sufficient monitoring of sellers |
| | Instit9* | Passive in removing swindlers – active in removing swindlers |
| Community | Commu1* | Difficult to contact other buyers – easy to contact other buyers |
| | Commu2 | Difficult to share experiences with other buyers – easy to share experiences with other buyers |
| | Commu3 | Few buyers sharing their experiences on <name EM> – many buyers sharing their experiences on <name EM> |
| | Commu4 | Insufficient options to communicate with other buyers – sufficient options to communicate with other buyers |
| | Commu5* | Weak common bond between buyers – strong common bond between buyers |
| Contacting sellers | Contsel1 | Insufficient information to contact sellers – sufficient information to contact sellers |
| | Contsel2 | Difficult to contact sellers via the website – easy to contact sellers via the website |
| | Contsel3 | Insufficient options to contact sellers – sufficient options to contact sellers |
| Seller information | Infsel1 | Insufficient information about sellers – sufficient information about sellers |
| | Infsel2 | Unclear indication of sellers’ reputation – clear indication of sellers’ reputation |
| | Infsel3 | Insufficient information about sellers’ reputation – sufficient information about sellers’ reputation |
| Product information | Prodinf1* | Unclear descriptions of <name products> – clear descriptions of <name products> |
| | Prodinf2 | Incorrect descriptions of <name products> – correct descriptions of <name products> |
| | Prodinf3 | Bad representation of <name products> (images/photos) – good representation of <name products> (images/photos) |
| | Prodinf4* | Difficult to assess the quality of <name products> – easy to assess the quality of <name products> |
| | Prodinf5* | Insufficient product photos of <name products> – sufficient product photos of <name products> |
| | Prodinf6* | Unclear whether <name products> are used – clear whether <name products> are used |
| | Prodinf7 | Unclear condition of <name products> – clear condition of <name products> |
| Pricing mechanisms | Pricing1 | Unclear how final prices are effected – clear how final prices are effected |
| | Pricing2 | Inconvenient pricing method – convenient pricing method |
| | Pricing3 | Unclear what final price to pay – clear what final price to pay |
| Assortment | Assor1 | Few interesting <name products> – many interesting <name products> |
| | Assor2 | Limited range of <name products> – wide range of <name products> |
| | Assor3 | Insufficient number of <name products> – sufficient number of <name products> |
| Settlement | Settl1 | Unclear how to pay for <name products> – clear how to pay for <name products> |
| | Settl2 | Difficult to pay for <name products> – easy to pay for <name products> |
| | Settl3 | Unclear how to receive <name products> – clear how to receive <name products> |
| | Settl4* | Difficult to receive <name products> – easy to receive <name products> |
| Meeting sellers | Meet1 | Difficult to meet sellers and evaluate <name products> before you buy – easy to meet sellers and evaluate <name products> before you buy |
| | Meet2 | Difficult to meet sellers and pay them – easy to meet sellers and pay them |
| | Meet3 | Difficult to pick up <name products> at the sellers’ location – easy to pick up <name products> at the sellers’ location |

*: deleted after CFA with sample 1 and sample 2

APPENDIX E: Test of contextual contamination (Means and results independent sample t-tests)

EM1 (n = 94)

| Dependent | EMQ (n=32) M | Dimension reversed (n=32) M | t-value | p-level | Item reversed (n=30) M | t-value | p-level |
|---|---|---|---|---|---|---|---|
| Layout | 4.20 | 3.99 | .688 | .494 | 4.23 | -.125 | .901 |
| Ease of use | 4.71 | 4.98 | -.791 | .432 | 5.03 | -1.101 | .275 |
| Contacting the intermediary | 4.07 | 5.06 | -2.681 | .009 | 4.61 | -1.532 | .131 |
| Institutional control | 4.91 | 4.81 | .358 | .721 | 5.03 | -.546 | .587 |
| Community | 4.88 | 4.65 | .813 | .419 | 5.11 | -.900 | .372 |
| Contacting sellers | 5.22 | 5.30 | -.311 | .757 | 4.99 | .784 | .436 |
| Seller information | 5.26 | 5.46 | -.664 | .509 | 5.54 | -.828 | .411 |
| Product information | 4.80 | 4.61 | .876 | .385 | 4.73 | .298 | .771 |
| Pricing mechanisms | 4.68 | 4.64 | .123 | .902 | 4.70 | -.066 | .947 |
| Assortment | 5.51 | 5.71 | -.691 | .492 | 5.77 | -.912 | .365 |
| Settlement | 5.25 | 5.30 | -.222 | .825 | 5.03 | 1.001 | .321 |
| Meeting sellers | 3.21 | 3.99 | -2.446 | .017 | 3.57 | -1.232 | .223 |

EM2 (n = 99)

| Dependent | EMQ (n=35) M | Dimension reversed (n=32) M | t-value | p-level | Item reversed (n=32) M | t-value | p-level |
|---|---|---|---|---|---|---|---|
| Layout | 3.79 | 3.53 | .836 | .406 | 3.43 | 1.222 | .226 |
| Ease of use | 5.70 | 5.76 | -.251 | .803 | 5.43 | 1.173 | .245 |
| Contacting the intermediary | 4.54 | 5.03 | -1.525 | .132 | 4.40 | .407 | .658 |
| Institutional control | 4.38 | 3.96 | 1.750 | .085 | 4.07 | 1.417 | .161 |
| Community | 3.89 | 2.82 | 3.532 | .001 | 3.76 | .419 | .676 |
| Contacting sellers | 5.08 | 5.04 | .139 | .890 | 5.24 | -.606 | .547 |
| Seller information | 2.79 | 2.68 | .483 | .631 | 2.61 | .594 | .555 |
| Product information | 4.59 | 4.45 | .677 | .501 | 4.51 | .411 | .682 |
| Pricing mechanisms | 4.50 | 4.28 | .718 | .475 | 4.40 | .337 | .737 |
| Assortment | 5.75 | 5.54 | .720 | .474 | 5.63 | .394 | .695 |
| Settlement | 4.11 | 4.96 | -2.955 | .004 | 4.30 | -.678 | .500 |
| Meeting sellers | 4.06 | 4.22 | -.591 | .557 | 3.92 | .457 | .649 |

APPENDIX F: Measurement instruments used for predictive validity testing

Attitude towards purchasing (seven-point Likert scale; ranging from strongly disagree to strongly agree; Van der Heijden et al., 2003). Alpha = 0.96 (sample 1), 0.97 (sample 2), 0.95 (sample 3), 0.96 (sample 4).
1. I am positive towards buying a <name product> on the <name> website.
2. The thought of buying a <name product> at the website of <name> is appealing to me.
3. I think it is a good idea to buy a <name product> at the website of <name>.

Intention to purchase (seven-point Likert scale; ranging from very unlikely to very likely; Van der Heijden et al., 2003). Alpha = 0.78 (sample 1), 0.78 (sample 2), 0.85 (sample 3), 0.84 (sample 4).
How likely is it that you would…
1. return to the <name> website?
2. consider a purchase of a <name product> at the <name> website in the short term?
3. consider a purchase of a <name product> at the <name> website in the long term?
4. purchase a <name product> at the <name> website if you need one?

e-Satisfaction (seven-point semantic differentials; Szymanski and Hise, 2000; Evanschitzky et al., 2004). Alpha = 0.85 (sample 3), 0.87 (sample 4).
Overall, how do you feel about your most recent experience at <name EM>?
1. very dissatisfied (=1) to very satisfied (=7)
2. very displeased (=1) to very pleased (=7)

e-Loyalty (seven-point Likert scale; ranging from very unlikely to very likely; Sirdeshmukh et al., 2002). Alpha = 0.89 (sample 3), 0.88 (sample 4).
How likely is it that you would…
1. conduct most of your future <name product> purchases via the <name> website?
2. recommend the <name> website to friends, neighbors and relatives?
3. use the <name> website the very next time you purchase a <name product>?
4. do more than 50% of your future <name product> purchases via the <name> website?
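
The alpha values reported for the scales above are reliability coefficients, presumably Cronbach's alpha, the usual choice for multi-item scales. As a minimal sketch of how such a coefficient can be computed, the following Python example applies the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the summed scores). The function name `cronbach_alpha` and the response data are hypothetical illustrations and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score per respondent.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical seven-point responses of five respondents to a three-item scale.
responses = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 7],
    [2, 2, 3],
    [4, 5, 4],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values close to 1 indicate that the items of a scale vary together; the sketch needs at least two items and two respondents, and it assumes complete responses without missing values.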