The Dynamics of Seller Reputation: Theory and Evidence from eBay

Luís M B Cabral
New York University

Ali Hortaçsu
University of Chicago

Preliminary draft: April 2003

Abstract

We test specific implications of existing models of reputation using eBay data on auction prices and seller reputation. Previous papers have looked at the relation between reputation and sale price. In addition to this relation, we examine a series of further testable implications, including the evolution of reputation over time. Our evidence is mixed. We find some support for an adverse selection model where sellers can secretly change their identity. We also find support for a model where, in the spirit of Diamond (1989), a perfect record is very important and is worth investing in. Finally, our results reject implications of a pure moral hazard model as well as the Holmström (1999) model of “career concerns.”

1 Introduction

A great many models of reputation have been proposed in the economic theory literature, some featuring adverse selection, some moral hazard, some both. We believe data from online transactions such as eBay provide an excellent ground for testing some of these theories.

There is already some empirical literature that uses eBay data on seller reputation. These studies typically test whether seller reputation matters, specifically, whether sale price is influenced by seller reputation. While we too take this into consideration, we go beyond it and test implications of alternative models of seller reputation. Some implications relate reputation to selling price; other implications refer to the evolution of reputation over time.

Our evidence is mixed. We find some support for an adverse selection model where sellers can secretly change their identity. We also find support for a model where, in the spirit of Diamond (1989), a perfect record is very important and is worth investing in.
Finally, our results reject implications of a pure moral hazard model as well as the Holmström model of “career concerns.”

The paper is structured as follows. In Section 2, we present an overview of the theoretical literature, followed by a summary of the main models that we propose to test. Section 3 presents the dataset. Section 4, divided into several subsections, presents a series of tests of the theoretical models. Finally, Section 5 concludes the paper.

2 The economic theory of reputation

The economic theory of reputation can be broadly divided into two categories: models of adverse selection, where reputation corresponds to a Bayesian posterior regarding the informed player’s type; and repeated-game models, where reputation corresponds to a self-enforcing implicit contract between two parties. Seminal papers presenting these notions of reputation include Kreps et al. (1982) and Klein and Leffler (1981), respectively. Cabral (2002) argues that the term “trust” is more appropriate to the second meaning of reputation. The fact is that, in the literature, the term “reputation” takes on these two very different meanings.

In this paper, we consider both of the above notions of reputation. The literature on each of these strands is vast. Here, we limit ourselves to some of the main ideas, especially as they relate to our empirical work.

The original goal of Kreps et al. (1982) and companion papers was to establish the notion of reputation in a dynamic model with asymmetric information. By repeatedly choosing actions that a certain type of player would choose, one acquires the reputation of being of that type. If such a reputation is valuable, then players will “invest” in reputation by initially taking a suboptimal action that influences other players’ beliefs in the desired direction. Holmström (1982/1999) and Diamond (1989) propose two extensions of the basic framework.
One question that is common to these two papers is how a player’s effort to invest in reputation evolves over time and as a function of the current reputation level.

Reputation is an asset. This raises a number of issues, for example: can reputation be traded like any other asset? Tadelis (1999) provides conditions under which names must be traded in equilibrium. Mailath and Samuelson (2001) examine what types of players have the most to gain from acquiring a good name or reputation. Finally, Cabral (2000) examines when it is better to start a new reputation instead of extending an existing one. Although these papers do not directly model a market like eBay, their main results have testable implications for seller behavior in online auctions.

Let us now turn to the repeated-game notion of reputation. Klein and Leffler (1981) stressed that many market agreements are not backed up by formal contracts but rather by mutual trust and the desire to maintain one’s reputation. The idea follows from the folk theorem, namely, that in repeated games with sufficiently patient players any feasible, individually rational payoff can be attained in a Nash equilibrium.[1] Although there are many different equilibria that attain a given average payoff, typical equilibria are based on “grim” strategies: players move along a “good” phase until one of the players deviates from the agreed-upon sequence of strategies. If a deviation takes place, then players move to a “bad” phase (possibly forever). This “bad” phase may be interpreted as the guilty player’s loss of reputation.

A number of extensions of the basic folk theorem and grim-strategy equilibria have been suggested, including the case when actions are imperfectly observable (e.g., Green and Porter, 1984). In particular, Dellarocas (2002) considers a model that is fairly close to the problem faced by eBay sellers.

Application to eBay: assumptions and notation.
The purpose of our paper is to test the economic theory of reputation with eBay data. Although eBay presents a fairly controlled framework, we need to make some simplifying assumptions regarding the behavior of agents before we adapt the theoretical models. Specifically, we assume that

1. A transaction has two possible outcomes: successful or unsuccessful.[2] Consumer benefit is given by $\overline{\omega}$ and $\underline{\omega}$, respectively.[3]
2. A successful transaction is reported with probability $\pi_P$ and always reported as a positive. An unsuccessful transaction is reported with probability $\pi_N$ and always reported as a negative.[4]
3. Consumers are risk neutral: expected utility is a linear function of the probability of a successful transaction.
4. Transaction price is proportional to the buyer’s expected utility.

Consistent with these assumptions, we now define the following variables:

P         Positive transaction. Also, number of past positives.
N         Negative transaction. Also, number of past negatives.
n         Negative feedback ratio: n ≡ N/(P + N).
$e_t$     Seller’s effort at time t (in models with moral hazard).
θ         Seller’s type (in models with adverse selection).[5]
µ(P, N)   Posterior on θ given history (P, N). Also, willingness to pay.

Application to eBay: summary of models. Based on the literature on the economics of reputation, we consider a series of stylized models and derive testable empirical implications. The main models we will consider are the following:

1. Pure adverse selection. The seller’s type, θ, is the seller’s private information. Based on the seller’s record, buyers update their beliefs regarding the seller’s type.[6]

[1] In this sense, there were a number of precursors to Klein and Leffler’s (1981) results, including Friedman (1971) and Telser (1980).
[2] A possible extension is that the outcome is continuous and the transaction is successful if the outcome is above some critical value.
[3] With no further loss of generality, we assume $\overline{\omega} = 1$ and $\underline{\omega} = 0$.
[4] In the paper, we will assume $\pi_P = \pi_N = 1$.
However, the results can be extended to the case of misreporting.
[6] We are not aware of any paper that considers this simple extreme model. As we will see below, the results are fairly intuitive and straightforward.

2. Adverse selection with moral hazard. Suppose that each transaction’s outcome is a function of the seller’s type, θ, as well as the effort the seller puts in at time t, $e_t$. Suppose moreover that there is a large number of sales in each period. This situation is analogous to Holmström’s (1999) model of career concerns. Even though the original application was to managerial effort, the same ideas apply in the context of eBay seller reputation.

3. Adverse selection with moral hazard: the perfect history case. This is another variant of the adverse selection and moral hazard model. Suppose that there is a positive probability that the seller is “perfect,” which in the eBay context can be defined as a seller who always produces successful transactions. Then sellers will put a lot of effort into increasing and maintaining their reputation while they have a perfect record. This situation is similar to Diamond’s (1989) model of reputation acquisition. Even though the original application was to credit markets, the same ideas apply in the context of eBay seller reputation.

4. Adverse selection: changing identities. Suppose that the seller can change its identity without the buyer’s knowledge. Specifically, suppose the seller lives for two periods and has the option of maintaining one single history or restarting a new history after the first period. This situation is analogous to Cabral’s (2000) model of reputation stretching. Although the original application was to umbrella branding, the same ideas apply in the context of eBay seller reputation.

5. Adverse selection: “buying” a reputation. Tadelis (1999), Mailath and Samuelson (2001), and others consider the problem of buying names (and the associated reputation).
Name trades do not take place at eBay (to the best of our knowledge). However, there is some anecdotal evidence that many sellers started their reputations by making a series of purchases. In fact, it is easier to create a good reputation as a buyer than as a seller. Mailath and Samuelson’s (2001) question, “who wants to buy a reputation?,” can then be addressed in the eBay context.

6. Pure moral hazard with noise. Suppose that the success of a transaction at time t is a function of the seller’s effort at that time, $e_t$, and of noise. The optimal equilibrium in this situation is some mechanism whereby buyers reward sellers for good performance and punish them for bad performance. Following the seminal work by Green and Porter (1984), this problem has been examined by Abreu, Pearce and Stacchetti (1986, 1990). A specific application to eBay-like markets is presented by Dellarocas (2003).

In Section 3, we present the data we will use for testing these models. In Section 4, we present a series of regressions that test several implications of the theoretical models.

3 Data description

In this section, we present the dataset we work with. We first provide a general description of the four products we chose (and a justification of why they were chosen). Next we provide a detailed set of descriptive statistics, with an emphasis on the frequency of negative feedback.

3.1 Description of the product markets

We looked at four markets for relatively homogeneous goods in order to isolate price effects. Two of these markets are for collectible coins. We chose coin markets since 1) several previous studies of eBay auctions have looked at this market, so that we have a solid base for comparison of our results, and 2) the collectible coin market is one of the most active segments on eBay, with YYYYY auction listings every day.

The first type of coin we look at is the 1/10 oz. 5 dollar gold coin of 2002 vintage (gold American Eagle), produced by the U.S. Mint.
The second type of coin is the 2001 silver proof set, a set of ten coins of different denominations, also produced by the U.S. Mint. An important difference between these two types of coins is that there is more room for “quality” differences in the single gold coin than in the proof set. In the data, we found that the gold coins came in three different “grades”: MS–70, MS–69 and MS–67, with MS–70 being the highest. There is no grading in proof sets; these are all in “mint” condition, as produced by the U.S. Mint, and are preserved in a plastic container.

Aside from this difference, the markets for these two coins appear to be very similar. As displayed in the tables below, the average sale price for the gold coin in our data set was $50, and the proof sets sold on average for $78. We found that of 216 gold coin auctions, 90% resulted in a sale; similarly, of the 298 mint set auctions, 84% ended in a sale. The average minimum bid set by the gold coin sellers was $20, or about 40% of the average sale price; similarly, mint set sellers started their auctions at $38, or about 50% of the sale price. On average 6.8 bidders participated in gold coin auctions, whereas 7.5 bidders bid for mint sets. The sellers of these coins appear to be quite experienced and large: the average coin seller had 1500–1600 overall feedback points. The bidders seem less experienced, with an average of 120–150 feedback points. This suggests that the eBay coin market is populated by “coin dealers” on the sell side, and “coin collectors” on the buy side.

We also collected data on characteristics of the auction listing, as constructed by the seller. 78% of the gold coin sellers and 66% of the mint set sellers wrote that they would accept credit cards for payment; similar proportions (54% and 60%, respectively) indicated their willingness to use PayPal, the popular online payment system favored by eBay users.
40% and 33%, respectively, of the gold coin and mint set auction listings contained an image of the coin, pointing perhaps to the larger degree of information asymmetry regarding the condition of the gold coin. Consistent with this, gold coin listings contained more words than mint set listings, although our measure is only a rough count of the number of words in a listing and supports no inference about its content. Lastly, the modal length of the coin auctions was 7 days, ranging from 1 day [[check this!]] to 10 days.

The third type of object we look at is the IBM Thinkpad T23 PIII notebook computer. We chose this category since, according to the FBI’s online fraud investigation unit, most customer complaints regarding online auction fraud come from laptop auctions. We further chose this object since notebook computers tend to come in many different configurations (regarding memory, disk space, peripherals and screen size), but this particular IBM model seemed to have relatively minor differences in configuration compared to notebooks of other manufacturers.

The average sale price of the Thinkpad T23s in our dataset was $580. Of the 264 auctions in our dataset, 85% resulted in a sale, with one auction conspicuously resulting in a $1 sale (apparently due to a seller not setting his minimum bid high enough; the average minimum bid was $105). On average 21.6 bidders participated in these auctions, many more than in the coin auctions. The average seller in these auctions was quite large, with 12,445 total feedback points, although there was a seller with 0 total feedback points (and one with 25695!). Bidders were on average less experienced than coin buyers, with an average overall feedback rating of 68. 80% of the sellers used PayPal and accepted credit cards, and 80% provided an image of the computer, using, on average, 683 words to describe the object.
These latter two numbers are consistent with the idea that a seller feels obliged to provide more information regarding a big ticket item like a laptop (as opposed to a $50 coin), although it is also conceivable that, since a laptop is a more complex product, it simply takes more words to describe it fully. The Thinkpad auctions also appear to be somewhat shorter than the coin auctions. The bigger sellers in this market, in particular, appear to be online computer stores that use eBay as a shopfront; these sellers might be interested in keeping inventory turnover high, and hence tend to list their items in shorter auctions. In fact, the correlation between auction length and seller size is -0.0931.

The last set of objects we look at are 1998 Holiday Teddy Beanie Babies, produced by the Ty toy company. Beanie Babies are another hugely popular collector’s item on eBay and, according to the FBI’s Internet Fraud unit, comprise the second largest source of fraud complaints on online auctions. As can be seen from the summary statistics in Table 4, this is the least expensive item in our data set, with an average sale price of $10.7. Only 50% of these auctions end in a sale, and only 1.7 bidders on average attend these auctions (notice that there is a monotonic relationship between the sale price of the object and the number of bidders, consistent with an entry cost story explored in Bajari and Hortaçsu, 2003). However, sellers tend to set their minimum bids quite high, at about $9.8, which suggests that these sellers have good outside options for these items.[7] The average seller once again appears to be a dealer, and the average bidder a collector. 75% of the sellers declare that they will accept PayPal; however, only 39% say they will accept a credit card, most likely reflecting the transaction charges of Visa (for an $11 item, it might not be worth paying the credit card fee).
40% of the auctions contain an image, similar to the figure for coins, and a similar number of words, 300, are used to describe the object. The average auction appears to be shorter than the coin auctions, but longer than the Thinkpad auctions.

3.2 A more detailed look at seller characteristics in this market

We now take a more careful look at the supply side of the markets we are looking at. There were 72 unique sellers in the gold coin market (translating into an average of 3 auctions per seller), 157 sellers in the proof set market (2 auctions per seller), 62 sellers in the notebook market (4 auctions per seller), and 238 unique sellers in the Beanie Baby market (2.4 auctions per seller). We should also note that one seller conducted 133 of the total 264 auctions in the notebook market; the other markets were much less concentrated. This appears consistent with size being a more important factor in the market for notebook computers, both because of scale economies and informational asymmetries (that is, buyers tend to value size more when information asymmetries are more significant).

[7] This might correspond to alternative resale venues, or just the value from keeping these toys. Compared to coins, the value of minimum bids is rather surprising, since teddy bears are more difficult to store than coins, and hence one might think that inventory considerations would lead teddy sellers to want to sell off their items faster.

Table 5 shows the distribution of sellers’ total feedback points, pooled over the four markets. Assuming that a constant fraction of transactions are rated by bidders (reported to be about 50% by Resnick and Zeckhauser, 2001), the total number of feedback points is a good proxy for the total number of transactions conducted by the seller, and hence a good measure of size. We have 819 unique sellers in our sample. The average seller had 1632 total feedback responses. The median seller had 401.
The largest seller had 49558 feedback responses, and the smallest had 0 (she was yet to be rated, even though she was selling). As indicated in Figure 2, the distribution of seller size (proxied by the number of feedback points received) is approximately lognormal, a fact that is hardly surprising given the large empirical literature on firm size distributions.[8]

[8] We also broke down this distribution by market, but, aside from differences in concentration, we found no interesting patterns of variation.

3.3 Frequency of bad seller behavior

We now investigate the empirical frequency of “bad” seller behavior that creates the need for a reputation mechanism to be in place on eBay. As described earlier and in previous studies of eBay, users on eBay can enter a rating (+1, 0 or -1) regarding a certain buy or sell transaction on the eBay feedback forum. Some representative negative comments for sellers have the following textual content:

• “THIS PERSON RIPPED ME OFF, SENT SHODDY ITEM INSTEAD OF ITEM LISTED”
• “Sold product he didn’t have! Will not send refund! I am filing charges! No ansr”

Although the mean and median sellers in our sample are quite large (in terms of transactions conducted), they seem to have received very few negative comments. As can be seen from Table 6, the average seller in our sample has 4.86 negative feedback points, and the median seller has only 1. The maximum number of negative feedback points is 651.

eBay users can also give a neutral feedback rating to signal some amount of dissatisfaction with the seller. Our interpretation of user comments on eBay chatboards is that the information contained in a neutral rating is perceived by users to be much closer to negative feedback than to positive. Table 7 shows the distribution of neutral points. Observe that the distributions of neutrals and negatives are quite close, and that the neutrals distribution first-order stochastically dominates the negative distribution.
The average seller received 7.2 neutral comments in her lifetime, with a median, again, of 1. Since users seem to believe that “anything but a positive comment is a bad sign,” we will lump negative and neutral comments together when talking about “negative” comments.

4 Testing the theory

We now consider the test of various economic theories of reputation based on the dataset presented in Section 3. We do so in a series of subsections that deal with different aspects of the data. In each of these subsections, we consider the theory’s testable implications followed by their empirical test.

4.1 The evolution of seller ratings over time

eBay provides summary statistics on each seller’s past behavior, including information on how the seller’s performance evolved over time. This allows us to test some theoretical models insofar as they imply particular dynamic patterns.

Theory. The simplest model of seller reputation is the pure adverse selection model. Assume that each seller is exogenously endowed with certain characteristics, indexed by θ, which affect the probability of a P transaction. The seller’s characteristics (type) are unknown to buyers, who hold the belief that θ is drawn from a (prior) distribution µ(0, 0). Finally, suppose that sales take place according to an exogenous process that is independent of the seller’s type.

The pure adverse selection model is a stationary model: the process generating Ps and Ns is the same over time. This implies a straightforward testable implication, namely that the relative frequency of negative and positive feedback should be constant over time. Specifically, define n ≡ N/(P + N), where N is the number of negative and neutral feedback comments and P the number of positive comments (P + N is therefore the total number of feedback comments). We thus expect the value of n to be constant over time.

Consider now the Holmström (1999) model of career concerns.
Although the original application of this model was to managerial effort under adverse selection and moral hazard, the main ideas apply to seller reputation as well. Suppose that the probability of a successful transaction is a function of the seller’s type, θ, as well as the effort the seller puts into the transaction, $e_t$. Holmström’s (1999) results imply that, in early periods of a seller’s life, while there is significant uncertainty about his type, the seller has a great incentive to increase effort and convince the market that his type is high. In later periods, when the market holds a fairly precise estimate of θ, the seller has lower incentives to increase $e_t$. The implication is that n is increasing over time: since effort is decreasing, the probability of a negative is increasing (for a given θ).

An alternative variation on the pure adverse selection model is to consider the possibility of identity changes. While we are unsure of the prevalence of this practice, we know that it is possible. All that buyers know is the seller’s eBay ID; they do not know if this is the first or the nth ID used by that particular seller. Specifically, suppose that a seller’s life is divided into two periods. At the end of the first period, the seller has the option to change his identity without being noticed by buyers. This problem is similar to the problem of reputation stretching considered by Cabral (2000).[9] At the end of the first period, the seller has two options: either he continues with the same name, effectively pooling the reputations from initial sales and late sales; or he creates a new name, effectively separating the two reputations. Cabral (2000) shows that, unless the seller is very patient, sellers with good initial histories are more likely to continue with the same name (reputation exploiting). This implies that (a) n is higher for young sellers than for old sellers when they were young and that (b) n is increasing for old sellers.

Empirical test.
Tables 8 to 11 display the distribution of the ratio of negatives and neutrals to the total number of transactions, n. We calculate this ratio for: 1) all transactions conducted by the seller, 2) transactions conducted in the last 6 months, 3) transactions conducted in the last month, and 4) transactions conducted in the last week. eBay displays the entire set of feedback points and comments for each seller; however, as a summary of the seller’s behavior, it also makes available decompositions of the data over these four time periods.

[9] Tadelis (1999, 2002) considers similar questions in a model where sellers can buy and sell names.

These tables suggest that the average percentage of negative comments is 1%, and this average appears to be stable across the entire history of the sellers, as well as the most recent history. This suggests, as a benchmark statistical model of negative feedback occurrence, a Poisson arrival process in which, for every transaction, there is a 1% probability of a bad transaction.

Further tests. From Tables 9 to 11, it appears that the distribution of n is fairly constant over a seller’s lifetime. We now investigate this a bit more carefully. We divide a seller’s lifetime into two periods: the last six months vs. the period preceding the last six months. We then calculate n for each of these periods separately, and test their equality using a t-test. The t-test cannot reject the equality of n across the two periods (t = −0.2767, p = 0.7821). To account for selection effects, we redo this test using a before/after window of one month. The t-test once again fails to reject the equality of means across the seller’s record within the last month vs. his record preceding the last month (t = 0.8361, p = 0.4035). We conducted this test also using information in the feedback tables.
We compared the number of negatives received by a seller in the first T reported transactions, as opposed to the number of negatives received in the last T reported transactions, for T = 20 and T = 100. The results show that there is very little difference in the frequency of negatives between the early and late periods, which is not very surprising since 95% of the sellers did not receive a single negative comment!

In summary, the results on the evolution of n seem broadly consistent with the pure adverse selection model and not with the Holmström model (career concerns) or the Cabral model (changing identities).

4.2 Reputation, age and price

A number of our empirical hypotheses pertain to the relation between reputation and sale price. In this section we investigate this relation. Our basic empirical strategy is to run regressions of the form:

price = β(reputation measure) + γ(other demand factors) + ε.

Such regressions have been run by several other authors. Our main contribution is to interpret the coefficients on various reputation measures in light of the theoretical models we presented in Section 2.

Theory. Under a pure adverse selection model, once controlling for µ(P, N), the buyer posterior regarding the seller’s type, seller age should not affect sale price. In fact, µ(P, N) is all the buyer needs to know to estimate the value of the seller’s product.

Under the Holmström (1999) model, as we have seen in Section 4.1, the value of $e_t$ decreases over time. This implies that, for a given posterior on θ, buyers should be willing to pay less as t increases; that is, age has a negative impact on sale price.

Finally, consider the changed identities model. Recall that, at the end of the first period, the seller has two options: either he continues with the same name, effectively pooling the reputations from initial sales and late sales; or he creates a new name, effectively separating the two reputations.
In equilibrium, high-type sellers are more likely to continue with the same name. This implies that continuing with the same name is a positive signal of seller quality. It follows that, conditional on µ(P, N), age has a positive impact on sale price.

Empirical test. To test these hypotheses and draw an inference regarding the model governing seller behavior, we run regressions of the form:

price = β(reputation measure) + α(age) + γ(other demand factors) + ε.

Under the null hypothesis of the pure adverse selection model, n is a sufficient statistic for reputation. We measure the seller’s age in two different ways. According to our model, the relevant time index is the number of (rated) transactions, since each rated transaction enables buyers to update their prior about the seller’s type. We also measure AGE as the number of days that the seller has been active on eBay.[10]

Tables 12 to 14 report our first set of results. In these regressions, the dependent variable is the highest bid registered in the auction (according to eBay rules this is equal to the second highest bid plus the bid increment). Regression (1) gives a result that is consistent with the “changing identities” model, since the coefficient on n is negative and statistically significant, and the coefficient on AGE, measured as the total number of transactions conducted by the seller, is significantly positive. In this regression, the other highly significant determinants of the highest bid in the auction are the minimum bid set by the seller and the number of bidders in the auction. Dummy variables that control for differences across item markets also enter quite significantly (the base object is the IBM Thinkpad). By the same token, the results seem to reject the pure adverse selection model and the Holmström model.
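To make the specification concrete, the following is a minimal sketch of an ordinary least squares fit of the price equation above. It is an illustration only and not the estimation code behind Tables 12 to 14: the data are simulated, and the function name fit_price_regression, the single control variable (a minimum bid), and all parameter values are assumptions introduced here. The simulated data are built with a negative coefficient on n and a positive coefficient on age, which is the sign pattern predicted by the changing identities model.

```python
import numpy as np


def fit_price_regression(price, n_ratio, age, controls):
    """OLS of price on a constant, n, age and controls.

    Returns the coefficient vector [intercept, beta_n, alpha_age, gamma...].
    Hypothetical helper for illustration; not from the paper.
    """
    X = np.column_stack([np.ones_like(price), n_ratio, age, controls])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return coef


# Simulated auctions (all values below are assumptions, not eBay data).
rng = np.random.default_rng(0)
m = 500
n_ratio = rng.uniform(0.0, 0.05, m)   # negative feedback ratio n
age = rng.uniform(0.0, 5.0, m)        # e.g. thousands of rated transactions
min_bid = rng.uniform(0.0, 50.0, m)   # one "other demand factor"
eps = rng.normal(0.0, 1.0, m)         # error term

# Data-generating process with beta < 0 and alpha > 0.
price = 40.0 - 100.0 * n_ratio + 2.0 * age + 0.5 * min_bid + eps

coef = fit_price_regression(price, n_ratio, age, min_bid)
print("intercept, beta (n), alpha (age), gamma (min bid):", coef)
```

With real data, the three hypotheses differ only in the sign of the age coefficient conditional on n: zero under pure adverse selection, negative under career concerns, and positive under changing identities.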
[10] Since March 1, 2003, for each auction listing, eBay reports the eBay registration date of each seller, along with the overall feedback score, and the P/(P + N) ratio for each seller. Before then, a user had to infer the positive feedback ratio and the registration date from data that were several clicks away.

There are, of course, several caveats associated with the above regression. We will address these in turn. First, not all items end up being sold; hence, for some auctions in the data set, the highest bid is automatically equal to the seller’s minimum bid. To account for this problem, Regression (2) is restricted to the items that ended up being sold. As can be observed, this results in losing about 200 auctions from the data set. The signs of the coefficients on n and the AGE measure are still in accordance with the changing identities model; however, the statistical significance of the coefficient on n disappears.

We also investigate whether n and the seller’s age affect the probability that the object is sold, in a manner consistent with the implications for price. To investigate this, we run the logit analogue of Regression (1), with the indicator for sale on the left-hand side. We drop NUMBIDS from this regression, since there being more than one bidder perfectly predicts a sale. Regression (3) contains the result of this exercise. n has a significant negative impact on the probability of sale. The coefficient on TOTALTRANS is positive, though not significant.

A second issue with the above regressions is that the NUMBIDS variable (the number of unique bidders participating in the auction) is potentially correlated with variables omitted from the regressions above. Since this endogeneity could contaminate other coefficient estimates, we employ the following instrumentation strategy: we generate a dummy variable that turns on whenever the minimum bid in an auction is less than $1.
The number of bidders in zero minimum bid auctions is on average XXX bidders larger than the number of bidders in non-zero minimum bid auctions. Setting zero minimum bids appears to be a highly idiosyncratic marketing strategy among sellers: the distributions of seller ratings among the 150 sellers that used a zero minimum bid strategy and among the 655 sellers who set nonzero minimum bids appear to be almost identical, as shown in Table 15. The IV regression results, reported in Table 16, confirm the robustness of the support for the changing identities model: the coefficient on n is negative and significant, and the coefficient on TOTALTRANS is still positive and significant. Hence, the results of this section imply that the “changing identities” model, as opposed to the pure adverse selection and Holmström models, appears to be upheld by the data.

4.3 Recent vs. old reputation history

In Section 4.2, we focused on the differential impact of AGE vs. n on sale price. Fortunately, our data set is rich enough to allow us to investigate several other measures of reputation. In particular, as we mentioned previously, eBay allows users to display a seller’s feedback record for different time periods: the last six months, the last month, and the last week of activity.

Theory. A model of pure moral hazard with no adverse selection, specifically a model based on trigger-type punishment strategies in a repeated game framework, predicts that only the most recent history of a seller’s conduct should figure into the buyers’ response, and hence into prices. Specifically, suppose that all sellers are of the same type and that the value of θ is chosen by the seller at a cost c(θ). Suppose that the equilibrium played by seller and buyers is the simplest Pareto optimal equilibrium among the set of subgame perfect Nash equilibria. An equilibrium E is simpler than an equilibrium E′ if it is based on shorter histories of observable outcomes than E′.
Dellarocas (2002) has shown that, in a binomial outcome model like eBay, the simplest Pareto optimal equilibrium only requires one-period histories. More generally, we would expect average price to be a function only of the more recent performance history.

Empirical test. To test this prediction, we introduce the 6 month, 1 month and 1 week negative ratios into our price regressions. See Tables 17 to 19. We find the puzzling pattern that prices are positively correlated with the time-segmented values of n. This pattern persists when we insert the overall n separately into the regressions.11 We find that the overall n impacts price negatively, whereas n for the seller’s recent history impacts price positively. The magnitude of the price impact is larger for the seller’s entire history. The pattern persists when we correct for endogeneity of NUMBIDS using zero minimum bids.

This finding is indeed quite puzzling. A zero coefficient on the recent history negative feedback ratio would suggest that recent information does not contribute much to the market’s “learning” regarding a seller’s type. However, it is difficult to think of a theoretical explanation as to why we should find a positive coefficient on negative signals about a seller’s behavior.

We further investigate this puzzle by running the probability-of-sale logit regressions (see Table 20). These regressions yield the result that the negative ratio computed for the entire seller history is the only statistically significant determinant of whether the auction results in a sale; the recent history of the seller and the AGE of the seller do not enter significantly into these regressions.

11 We only report the results from this specification.

4.4 Arrival of negative feedback

As suggested by the descriptive statistics presented in Section 3, the distribution of negative feedback is very skewed: most sellers have no or very few negative comments.
This suggests that, in addition to looking at the evolution of n, we also look at the timing of appearance of the first, second, etc., negative comment.

Theory. Suppose there is a type of seller with θ = 1, i.e., a seller who never produces a negative transaction. In the spirit of Kreps et al. (1982) and Diamond (1989), we would expect new sellers to put a lot of effort into receiving positive feedback initially. A series of P outcomes increases the buyers’ belief that the seller is “perfect” (θ = 1), and accordingly increases willingness to pay. Because a “perfect” seller never produces negatives, the first negative implies a “discontinuous” update of the buyers’ belief: they no longer believe that the seller is perfect. More importantly, the arrival of the first negative produces a “discontinuous” effect on the seller’s incentives to increase effort: no matter how many positives he receives, buyers will never believe him to be perfect.12 The empirical implication is that, once the first negative appears, the seller’s incentive to invest in reputation is drastically reduced. Lower effort implies a higher probability of a negative. Consequently, we expect the second negative to arrive, on average, sooner after the first than the first negative took to arrive.

Empirical test. To test this hypothesis, we collected data on the timing of each negative/neutral comment received by the sellers in the data set (excluding those negative/neutral comments that were received as a “buyer,” though there were only 4 instances of this). We measured “time” in two ways: 1) in transactions, and 2) in days; and computed the time elapsed between consecutive negatives. Interestingly, we noticed that many times when an eBay seller receives a negative comment, there is a “war of words” between the seller and the buyer who places the negative. During this “war of words,” the two parties can give several negatives to each other within a period of two or three days.
Therefore, we did not count the negatives that the sellers received during such episodes, and concentrated on the timing between de novo negatives.

12 These ideas are implicit in the models of Kreps et al. (1982) and in particular Diamond (1989). Although the latter is cast in the context of reputation acquisition in a credit market, the main ideas also apply to the reputation of an eBay seller.

Tables 21 to 23 report the results of regressions of the dependent variable TIME FROM LAST NEGATIVE on dummy variables that turn on if the relevant comment is the second, third, fourth, fifth, etc., negative comment received by the seller. Specification (1) (Table 21) is the base OLS result, including controls for the different product markets. The base case is the time elapsed until the first negative; hence we see that the time between the first and second negative, on average, is 50 transactions shorter than the time to the first negative (which took 83 transactions for Thinkpads, 83+161 transactions for gold coins, 83+123 transactions for mint sets and 83+181 transactions for teddy bears). The difference between means is significant at the 10% level (which becomes 8% using robust standard errors correcting for heteroskedasticity within sellers; see Table 22). We also see weaker results for the time elapsed between the 2nd and 3rd negative, the 3rd and 4th negative, the 4th and 5th negative, and between consecutive negatives beyond the fifth.

In column (3) (Table 23), we repeat this regression controlling for seller fixed effects to address selection concerns. The time between the first and second negative is 22 transactions shorter on average compared to the time to the first negative; however, this difference is no longer statistically significant. Interestingly, the time between the 2nd and 3rd negatives now appears to be longer than the time to the first negative. These results seem to suggest weak support for the hypothesis.
However, we get much stronger support for the hypothesis if we measure “time” in days, instead of in “transactions.” One might argue that in terms of a seller’s effort provision decision, the variation across days might be more informative than variation across individual transactions, since the seller might be shipping objects out in discrete batches. Tables 24 to 26 report the same set of regressions as in columns (1) through (3), using the days measure of time. In column (4), we find that it takes, on average, 149 days less for the second negative to arrive than the first negative, and that subsequent negatives come at (roughly) increasingly shorter intervals (202 fewer days elapse between the 3rd and 4th negative, and the time increment between consecutive negatives above the 5th negative is 245 days shorter). All of these differences (from the base case) are strongly statistically significant. Column (5) (Table 25) reaffirms the basic OLS finding using heteroskedasticity correction for seller clusters.

When we account for selection using seller fixed effects in column (6) (Table 26), our empirical results are still in accordance with the hypothesis. Controlling for seller fixed effects, the time between the 1st and 2nd negatives is 84 days shorter than the time to the first negative (which, on average, is 142 days); the time between the 2nd and 3rd is 59 days shorter, between the 3rd and 4th is 95 days shorter, and consecutive negatives come at intervals that are 83 days shorter than the time to the first negative.

It appears from these results that there is certainly something “special” about the very first negative that a seller receives. Once the first negative arrives, the second one arrives faster.
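The construction of the gap variable can be illustrated with a minimal sketch. It is not the authors' code: the function names, the three-day rule used to identify a “war of words,” the choice of the first rated transaction as the starting date, and the example history are all assumptions made for illustration. The second function relies on the fact that a regression of the gap on a full set of order dummies, with no other controls, reproduces the base-group mean and the differences in group means; the regressions reported in the text also include product-market controls and, in some columns, seller fixed effects, which are omitted here.

```python
from collections import defaultdict
from datetime import date


def gaps_between_negatives(feedback, burst_days=3):
    """Return (order, gap_in_transactions, gap_in_days) per de novo negative.

    feedback: chronological list of (day, rating) pairs, one per rated
    transaction, with rating "P" (positive) or "N" (negative or neutral).
    """
    gaps = []
    order = 0
    last_idx = -1                                    # last counted negative
    last_day = feedback[0][0] if feedback else None  # assumed starting date
    prev_neg_day = None                              # last negative of any kind
    for idx, (day, rating) in enumerate(feedback):
        if rating != "N":
            continue
        # Assumption: a negative arriving within burst_days of the previous
        # negative belongs to the same episode and is not counted.
        in_burst = (prev_neg_day is not None
                    and (day - prev_neg_day).days <= burst_days)
        prev_neg_day = day
        if in_burst:
            continue
        order += 1
        gaps.append((order, idx - last_idx, (day - last_day).days))
        last_idx, last_day = idx, day
    return gaps


def mean_gap_differences(gaps, column=1, top_order=5):
    """Mean gap for the first negative and differences for later negatives.

    column=1 uses transactions, column=2 uses days. Negatives beyond
    top_order are pooled into a single group keyed top_order + 1.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in gaps:
        key = min(row[0], top_order + 1)
        sums[key] += row[column]
        counts[key] += 1
    if counts.get(1, 0) == 0:
        raise ValueError("no first negative in the data")
    base = sums[1] / counts[1]
    diffs = {k: sums[k] / counts[k] - base for k in sorted(counts) if k != 1}
    return base, diffs


# Hypothetical feedback history for a single seller.
history = [
    (date(2002, 1, 1), "P"),
    (date(2002, 1, 5), "P"),
    (date(2002, 1, 10), "N"),   # first negative
    (date(2002, 1, 11), "N"),   # same episode, not counted
    (date(2002, 1, 20), "P"),
    (date(2002, 2, 1), "N"),    # second de novo negative
]
gaps = gaps_between_negatives(history)
base_days, diff_days = mean_gap_differences(gaps, column=2)
```

In practice the gaps from all sellers would be pooled before computing the means; a negative difference for the second negative corresponds to the pattern described above.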
There might be alternative explanations for this finding. For example, a seller may take longer to acquaint himself with the market and may not do much business in the early days, thereby avoiding a negative for a reason other than higher effort provision. However, there is only weak evidence for such a “starting small and gradually building up” effect. In Tables 27 to 29, we regress the variable SALES PER DAY, computed as the average number of transactions that a seller conducted between two consecutive negatives, on dummy variables for the 2nd, 3rd, 4th, etc. negatives. The raw OLS result seems to suggest that the seller is less active during the period leading up to his first negative, as opposed to subsequent periods. However, once we correct for heteroskedasticity, and especially for seller fixed effects, this result no longer holds true; i.e., we cannot conclude that the seller was less active in the initial period, and hence was less likely to receive a negative comment.

4.5 “Buying” a reputation

A casual look at the feedback comment histories of some eBay sellers reveals an interesting pattern: some sellers appear to have started out as “buyers,” completing a string of purchases before attempting their first sale. This raises the possibility that some sellers might try to build up a reputation by being a careful buyer, and then leverage this reputation as sellers. Do sellers indeed do this? If so, what kind of sellers are most prone to do it?

Theory. Tadelis (1999) and Mailath and Samuelson (2001) address the issue of buying reputations. Although we do not observe (to the best of our knowledge) identity sales on eBay, the initial investment strategy described above has a similar flavor.
In particular, the question addressed by Mailath and Samuelson (2001), “Who wants to buy a reputation?,” seems to apply here as well: what seller has an incentive to start off by investing (as a buyer) in an initial reputation history? Is it low-type sellers or high-type sellers?

Consider the following stylized model. Suppose that each seller sells one unit per period. Some sellers have two-period lives, some three-period lives. Among the two-period sellers, some have the option of starting a reputation by making a purchase and receiving a P with probability one. For simplicity, we assume that only a measure zero of two-period sellers have this option. This implies that buyers take the initial record as a genuine selling record. The value of an initial P is given by

\[ v = x \, [\mu(2,0) - \mu(1,0)] + (1 - x) \, [\mu(1,1) - \mu(0,1)]. \]

Suppose that the prior on x is exponential with mean in the order of .05. This distribution and this mean are broadly consistent with the empirical evidence. Then it can be shown that

\[ \lim_{x,\mu \to 0} \frac{\partial v}{\partial x} = -1. \]

Moreover, numerical calculations show that the derivative is negative for values of x, µ lower than .2. We thus expect that, for small values of x and µ (as is the case in the eBay data sample), the benefit from buying a signal is declining in seller type. More generally, this suggests that sellers of lower type have a greater incentive to “purchase” an initial reputation by receiving a series of Ps as a buyer.

Test. To test these hypotheses, we utilize our data on the complete feedback records of the sellers. We utilize another source of variation: prior to June 26, 2001, feedback scores received by a user did not formally indicate whether the feedback was in relation to a purchase or a sale. Users had to read through the comments to figure out where the comment came from. This is feasible for sellers with ten comments, but wading through one hundred or one thousand comments would take a lot more time.
After June 26, 2001, however, eBay began to indicate, next to each feedback comment, whether the comment relates to a purchase (B), as opposed to a sale (S). This immediately suggests that sellers who enter the marketplace after June 26, 2001 would not find it very profitable to purchase a reputation. Our data, however, suggest that this is not necessarily true.

We isolated 183 sellers in our dataset who entered the marketplace after June 26, 2001. For every transaction for which they received feedback, we coded 1 if it was a “sell,” and 0 if it was a “buy.” We then defined the variable

\[ U(t) = \frac{1}{t} \, (\text{No. Sells Until Transaction } t). \]

This equals 1 if all transactions up to and including t were sells, and 0 if all were buys. We then classified the user as having used a “buy things first, begin to sell later” strategy if U(5) < U(end of sample), U(5) < 0.5 and U(end of sample) > 0.5; i.e., we checked whether the user had become more of a seller at the end of the observation period, compared to his first five transactions. (We redid the classification using ten transactions as the “initial” period, and the results we are going to report were very similar.)

This classification scheme revealed that 29 of the 183 sellers in our sample who joined eBay after June 26, 2001, used a “buy first, begin to sell later” strategy. We found that only 3 of these sellers joined eBay within a month after the policy change; hence the use of this strategy cannot be attributed to sellers being unaware that buyers could easily detect whether their feedback came from “buys” or “sells.” We then investigated whether the 29 sellers who used the “buy first, begin to sell later” strategy were in any way different from the other sellers. In particular, we compared the ratio of neutral and negative feedbacks for these sellers, n, measured at the end of the observation period.
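The classification rule can be written out as a short sketch. The function names and the example feedback sequence below are hypothetical, and the treatment of sellers with fewer rated transactions than the initial window (the window is shortened to the available history) is an assumption not specified in the text.

```python
def sell_share(flags, t):
    """U(t): share of the first t rated transactions that were sells.

    flags: chronological list with 1 for a "sell" and 0 for a "buy".
    """
    if t <= 0 or not flags:
        raise ValueError("need at least one transaction and t >= 1")
    t = min(t, len(flags))   # assumption: shorten the window if history is short
    return sum(flags[:t]) / t


def buys_first_then_sells(flags, initial=5):
    """Apply the rule U(initial) < U(end), U(initial) < 0.5, U(end) > 0.5."""
    if not flags:
        return False
    u_initial = sell_share(flags, initial)
    u_end = sell_share(flags, len(flags))
    return u_initial < u_end and u_initial < 0.5 and u_end > 0.5


# Hypothetical seller: four purchases, then eleven sales.
example = [0, 0, 0, 0, 1] + [1] * 10
print(sell_share(example, 5))           # share of sells in the first five
print(buys_first_then_sells(example))   # True under the rule above
```

Passing `initial=10` corresponds to the alternative ten-transaction window mentioned in the text. The first inequality is implied by the other two and is kept only to mirror the stated rule.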
Using the regression reported in Table 30, we found that the sellers who used the “buy first, begin to sell later” strategy had slightly higher negative feedback ratios (0.0008 higher, 8 parts in 10,000!), but the difference was not statistically significant. The sellers who used the “buy first” strategy and joined eBay within 30 days of the policy change had even higher negative ratios (5/1000 higher), but this difference was not statistically significant either. Hence, we cannot reject the hypothesis that sellers who use the “buy first” strategy are “as good” as the sellers who don’t.

We have not yet managed to investigate any differences between sellers who joined eBay before June 26, 2001, and those who joined after, since this requires that we classify the feedback comments as “buys” or “sells” by interpreting the comment text. We hope to complete this for a future version of the paper.

To be completed...

4.6 Is the percentage of negatives a good proxy for reputation?

Much of our analysis is based on the assumption that n provides a good approximation for µ(P, N), the buyer’s posterior belief regarding the seller’s type. In this section, we investigate this issue in greater detail.

The relation between n and µ(P, N). Consistently with our assumption that buyers are risk neutral, the willingness to pay for an item should be proportional to E(θ|P, N), where θ is the probability that the seller produces a successful transaction. The question is then, how good a proxy is n for the value of E(θ|P, N)? Equivalently, let µ(P, N) ≡ P(negative|P, N) = 1 − E(θ|P, N) be the expected probability of a negative. It is easy to see that, for a given prior µ(0, 0), µ(P, N) is monotonically increasing in n. But how close is it to a linear relationship, as we implicitly assume in our regressions?

[Figure 1 (plot not reproduced): vertical axis µ(n, T); horizontal axis n.]

Figure 1: Relation between n ≡ N/(N + P) and the posterior probability of a negative, µ. Two curves are plotted, one for T = 100, one for T = 100,000. The two curves are virtually identical.

The relation between n and µ(P, N) depends crucially on the prior µ(0, 0). One first approach to the question at hand is to consider the null hypothesis that the model is one of pure adverse selection, assume that sales rates are approximately constant across types, and take the population distribution of n as the prior µ(0, 0). The result of this exercise is depicted in Figure 1, which shows the relation between n, the percentage of negative feedback, and µ(n, T), the expected probability of a negative. Notice that a history (P, N) can be equivalently denoted by (n, T), where T is the total number of transactions. Figure 1 depicts E(µ|n, T) for two values of T: 100 and 100,000. Three facts are suggested by the figure. First, the relation between n and µ(n, T) is concave for lower values of n and approximately linear for higher values of n. Second, the relation is relatively flat. Third, the relation is virtually independent of the value of T.

Can eBay users compute sufficient statistics? On March 1st, 2003, eBay made a change in its auction listing format.
Before then, bidders would see only the seller’s overall (net positive) feedback points next to his name. To see the fraction of the seller’s negative comments, the bidder would have to click on the seller’s highlighted username, which would take the user to a new “feedback profile” page. There, the bidder could see a breakdown of the seller’s overall net feedback into its various components (but only the counts, not the percentages). On March 1st, eBay began to display the ratio of a seller’s positives to overall transactions, in percentage terms. eBay also began to display when the seller joined eBay as a registered user.

This interesting “policy change” by eBay allows us to analyze whether users’ response to the reputation measures we have been using has changed, due to the increase in the salience of a “percentage” based measure of reputation, as opposed to one that is in “levels.” Hence, we replicate the original price regressions, adding a dummy-interaction term for the auctions in our data set that are from the new listing format period.

We first run a regression of price, including the interaction terms of the policy change with NPN ALL, NPN MON (last month’s n), and TOTALTRANS. The result is quite striking: while there is no significant change in bidders’ response to NPN MON and TOTALTRANS before and after the change in the listing format, bidders became more responsive to NPN ALL after the listing format change occurred. In fact, the response of price to the ratio of overall negatives is about three times larger than it was before the listing change! (This result also holds up with an instrumental variables correction for endogeneity of NUMBIDS.)

We then ran the logit regression to predict sale probabilities. Once again, NPN ALL appears to be the only significant determinant of the probability of sale.
There is no significant change in the response of sale probabilities to the reputation measure due to the change in listing format (though the sign of the coefficient of the interaction term, NPN ALL NF, is negative, consistent with the response becoming stronger). These results suggest that the change in eBay’s listing format did indeed affect buyers’ response to the reputation measures.

To be completed...

5 Conclusion

To be completed...

References

Abreu, Dilip, David Pearce and Ennio Stacchetti (1986), “Optimal Cartel Equilibrium with Imperfect Monitoring,” Journal of Economic Theory 39, 251–269.

Abreu, Dilip, David Pearce and Ennio Stacchetti (1990), “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring,” Econometrica 58, 1041–1064.

Bajari, Pat, and Ali Hortaçsu (2003), “Winner’s Curse, Reserve Prices and Endogenous Entry: Empirical Insights from eBay,” Rand Journal of Economics, forthcoming.

Cabral, Luís M B (2000), “Stretching Firm and Brand Reputation,” Rand Journal of Economics 31, 658–673.

Cabral, Luís M B (2002), “The Economics of Reputation,” Lecture Notes, New York University.

Dellarocas, Chrysanthos (2003), “Efficiency and Robustness of eBay-like Online Reputation Mechanisms in Environments with Moral Hazard,” MIT.

Diamond, Douglas W (1989), “Reputation Acquisition in Debt Markets,” Journal of Political Economy 97, 828–862.

Friedman, James (1971), “A Noncooperative Equilibrium for Supergames,” Review of Economic Studies 28, 1–12.

Green, Ed and Robert Porter (1984), “Noncooperative Collusion Under Imperfect Price Information,” Econometrica 52, 87–100.

Holmström, Bengt (1999), “Managerial Incentive Problems: A Dynamic Perspective,” Review of Economic Studies 66, 169–182.

Mailath, George J, and Larry Samuelson (2001), “Who Wants a Good Reputation?,” Review of Economic Studies 68, 415–441.

Tadelis, S. (1999), “What’s in a Name? Reputation as a Tradeable Asset,” American Economic Review 89, 548–563.
Telser, L G (1980), “A Theory of Self-enforcing Agreements,” Journal of Business 53, 27–44.

Table 1: Golden Coin summary statistics

Variable        Obs    Mean        Std. Dev.   Min     Max
saleprice       194    50.75505    10.73737    34.03   107.5
highbid         216    50.03815    13.99348    .99     119
issold          216    .8981481    .303156     0       1
numbids         216    6.814815    4.628824    0       22
minbid          216    19.97801    23.05935    .01     125
sellerrating    216    1596.324    1639.755    0       7501
bidderfb        160    147.2688    304.3082    0       3464
usepaypal       216    .5416667    .4994183    0       1
image           216    .412037     .493345     0       1
wordcount       96     271.2292    115.6992    12      795
creditcard      216    .787037     .4103527    0       1
auction le h    191    5.874346    2.234872    0       10

Table 2: Mint set summary statistics

Variable        Obs    Mean        Std. Dev.   Min     Max
saleprice       250    77.82204    21.62466    2.01    115
highbid         298    75.78856    25.7028     1       115
issold          298    .8389262    .3682174    0       1
numbids         298    7.540268    6.930637    0       28
minbid          298    38.34238    38.16785    .01     115
sellerrating    298    1475.211    2250.246    3       9497
bidderfb        181    118.0221    181.0399    0       1235
usepaypal       298    .6107383    .488403     0       1
image           298    .3355705    .4729838    0       1
wordcount       107    241.3458    140.0917    47      845
creditcard      298    .6610738    .4741409    0       1
auction le h    269    5.542751    2.674338    0       10

Table 3: Thinkpad summary statistics

Variable        Obs    Mean        Std. Dev.   Min     Max
saleprice       226    578.6746    413.5531    1       999
highbid         264    529.5183    429.897     .01     999.99
issold          264    .8560606    .351695     0       1
numbids         264    21.55303    16.46141    0       60
minbid          263    104.6864    260.6692    .01     999.99
sellerrating    264    12442.13    11628.39    0       25695
bidderfb        211    68.12796    244.782     -3      1711
usepaypal       264    .7840909    .412233     0       1
image           264    .8068182    .3955442    0       1
wordcount       220    683.8455    192.5363    33      1360
creditcard      264    .8257576    .3800383    0       1
auction le h    240    4.6375      1.840346    0       10

Figure 2: Histogram.

Table 4: 1998 Holiday Teddy Beanie Babies summary statistics

Variable        Obs    Mean        Std. Dev.   Min     Max
saleprice       293    10.74406    3.833471    2       30
highbid         555    11.08982    4.257038    .99     30
issold          555    .5279279    .4996698    0       1
numbids         555    1.745946    2.938349    0       15
minbid          555    9.821351    5.040711    .01     30
sellerrating    555    2634.148    4371.484    0       19293
bidderfb        203    154.1823    296.4487    0       2444
usepaypal       555    .7567568    .4294278    0       1
image           555    .3837838    .486745     0       1
wordcount       345    301.0319    164.3157    52      735
creditcard      555    .390991     .4884126    0       1
auction le h    431    5.269142    2.260263    0       10

Table 5: Total transactions.
Variable Obs Mean Std. Dev. Min Max

Table 6: Negatives.
Variable Obs Mean Std. Dev. Min Max

Table 7: Neutrals.
Variable Obs Mean Std. Dev. Min Max

Table 8: Value of n over seller’s lifetime.
Variable Obs Mean Std. Dev. Min Max

Table 9: Value of n in most recent six months of transactions.
Variable Obs Mean Std. Dev. Min Max

Table 10: Value of n in most recent month of transactions.
Variable Obs Mean Std. Dev. Min Max

Table 11: Value of n in most recent week of transactions.
Variable Obs Mean Std. Dev. Min Max

Table 12: Seller age and sale price: Regression (1).
Variable Obs Mean Std. Dev. Min Max

Table 13: Seller age and sale price: Regression (2).
Variable Obs Mean Std. Dev. Min Max

Table 14: Seller age and sale price: Regression (3).
Variable Obs Mean Std. Dev. Min Max

Table 15: Sellers with and without minimum bid.
Variable Obs Mean Std. Dev. Min Max

Table 16: Seller age and sale price: Regression (4).
Variable Obs Mean Std. Dev. Min Max

Table 17: Impact of n in different periods of history: Regression (1).
Variable Obs Mean Std. Dev. Min Max

Table 18: Impact of n in different periods of history: Regression (2).
Variable Obs Mean Std. Dev. Min Max

Table 19: Impact of n in different periods of history: Regression (3).
Variable Obs Mean Std. Dev. Min Max

Table 20: Impact of n in different periods of history: Logit regression.
Variable Obs Mean Std. Dev. Min Max

Table 21: Gap between negatives in number of transactions: OLS.
Variable Obs Mean Std. Dev. Min Max

Table 22: Gap between negatives in number of transactions: OLS.
Variable Obs Mean Std. Dev. Min Max

Table 23: Gap between negatives in number of transactions: fixed effects.
Variable Obs Mean Std. Dev. Min Max

Table 24: Gap between negatives in days: OLS.
Variable Obs Mean Std. Dev. Min Max

Table 25: Gap between negatives in days: OLS.
Variable Obs Mean Std. Dev. Min Max

Table 26: Gap between negatives in days: fixed effects.
Variable Obs Mean Std. Dev. Min Max

Table 27: Sales rate between negatives in days: OLS.
Variable Obs Mean Std. Dev. Min Max

Table 28: Sales rate between negatives in days: heteroskedasticity correction.
Variable Obs Mean Std. Dev. Min Max

Table 29: Sales rate between negatives in days: fixed effects.
Variable Obs Mean Std. Dev. Min Max

Table 30: Evidence of reputation buying behavior.
Variable Obs Mean Std. Dev. Min Max