Matching and Sorting in Online Dating∗

Günter J. Hitsch, University of Chicago, Graduate School of Business
Ali Hortaçsu, University of Chicago, Department of Economics
Dan Ariely, Duke University, Fuqua School of Business

February 20, 2008
First Version: February 2006

Abstract

This paper studies the economics of match formation using a novel data set obtained from a major online dating service. Using detailed information on the users’ attributes and interactions, we estimate a model of mate preferences. We use the Gale-Shapley algorithm to compute the (man- and woman-optimal) stable matches generated by the estimated preferences, and find that they are similar to the matches observed online. We then explore whether the estimated mate preferences, in conjunction with the Gale-Shapley algorithm, can explain assortative mating patterns in “offline” marriages. We conclude that the estimated mate preferences can generate assortative mating patterns similar to those observed in marriages even in the absence of search frictions.

∗ We thank Babur De los Santos, Chris Olivola, and Tim Miller for their excellent research assistance. We are grateful to Derek Neal, Emir Kamenica, Peter Rossi, and Betsey Stevenson for comments and suggestions. Seminar participants at the 2006 AEA meetings, Boston College, the Choice Symposium in Estes Park, the Conference on Marriage and Matching at New York University 2006, ELSE Laboratory Experiments and the Field (LEaF) Conference, Northwestern University, the 2007 SESP Preconference in Chicago, SITE 2007, the University of Pennsylvania, the 2004 QME Conference, UC Berkeley, UCLA, the University of Chicago, UCL, the University of Naples Federico II, the University of Toronto, Stanford GSB, and Yale University provided valuable comments. This research was supported by the Kilts Center of Marketing (Hitsch), a John M. Olin Junior Faculty Fellowship, and the National Science Foundation, SES-0449625 (Hortaçsu).
Please address all correspondence to Hitsch ([email protected]), Hortaçsu ([email protected]), or Ariely ([email protected]).

1 Introduction

This paper studies the economics of match formation using a novel data set obtained from a major online dating service. Using detailed information on the users’ attributes and interactions we first estimate a rich model of mate preferences. Based on these preference estimates, we then examine whether an economic matching model can explain the observed online matching patterns, and we evaluate the efficiency of the matches obtained on the web site. Finally, we explore whether the estimated preferences and a matching model are helpful in understanding sorting patterns observed “offline,” in marriages.

Two distinct literatures motivate this study. The first is the market design literature, which focuses on designing and evaluating the performance of novel market institutions. A significant branch of this literature is devoted to analyzing important real-world matching markets (e.g. Roth and Sotomayor 1990; Niederle and Roth 2003; Abdülkadiroglu and Sönmez 2003; Roth, Sönmez, and Ünver 2004 for matching of medical interns and gastroenterologists to hospitals, children to schools, kidney transfers from donors to recipients). The goal is to understand the allocation mechanism used in particular markets, and to assess whether an alternative mechanism with better theoretical properties (typically in terms of efficiency and/or stability) can improve on the allocations obtained by the current mechanism.

The online dating site we analyze is similar to previously analyzed matching markets in that it is used to allocate indivisible “goods” without an explicit price/transfer mechanism. However, the decentralized nature of communication and “contracting” among the online daters makes it difficult to characterize the properties of a realistic model of match formation in this market. Thus, unlike e.g.
Abdülkadiroglu and Sönmez 2003, we cannot argue for the (in)efficiency and/or stability of the existing mechanism based on theoretical arguments. Moreover, unlike Roth 1984, Niederle and Roth 2003, and Frechette, Ünver, and Roth 2007, we do not have empirical evidence on unravelling or related phenomena indicating the (in)efficiency of the current market design.

We thus pursue a different strategy to assess the efficiency properties of the matching mechanism used by the website. We show, in Section 4, that although match formation in this market is difficult to model explicitly, the choices made by the website users as to whom to communicate with can inform us about their preferences under a weak set of assumptions. Once we estimate the preferences of the users, we can construct the benchmark stable matchings corresponding to these preference profiles using the Gale-Shapley algorithm. We can then compare the observed matching outcomes obtained by the actual market mechanism to those that would have been attained under the Gale-Shapley benchmark. We do this in Section 5.

The second literature that motivates this paper is the extensive body of work across economics, sociology, social psychology and anthropology that studies matching and sorting patterns in marriage markets. This literature has documented that marriages are not random, but exhibit strong sorting patterns along many attributes (e.g. Browning, Chiappori, and Weiss 2007, Kalmijn 1998). For example, marriage partners are similar in age, education levels, and physical traits such as looks, height, and weight.

Such sorting patterns can arise for several distinct reasons (Kalmijn 1998). First, sorting can arise due to search frictions.
For example, sorting along educational attainment might not reflect a preference for a partner with a certain education level, but rather the fact that many people spend much of their time in the company of others with a similar level of education, for example in school, college, or at work.1 Alternatively, sorting can arise in the absence of any search frictions as an equilibrium consequence of preferences and the market mechanism. For example, men and women may prefer a match with a similar partner, in which case sorting is due to “horizontal” mate preferences. On the other hand, preferences might be purely “vertical,” in the sense that each mate ranks all potential partners in the same way. In the equilibrium of a frictionless market, the ranks of the matched men and women will then be perfectly correlated. If the ranks are monotonically related to the mate’s attributes, there will also be sorting along these attributes. Online dating provides us with a unique environment to study the causes of sorting. First, search frictions in online dating markets are minimal—indeed, a main reason for the existence of online dating sites is to make the search for a partner as easy as possible. Hence, if sorting arises in online dating markets, it must be due to mate preferences and the market mechanism, but not due to search frictions. Second, online dating allows us to observe user attributes and mate choices in great detail. Moreover, the information to us, the researchers, is similar to the information available to the users of the online dating site. In particular, we observe the choice sets faced by the users and the choices they make from these choice sets. Thus, online dating data are particularly useful to estimate mate preferences over various mate attributes. Marriage market data, in contrast, usually lack information on crucial mate attributes such as physical traits, do not include information on the agents’ choice sets, and record final matches (marriages) only. 
The second objective of this paper is to explore whether we can predict the sorting patterns observed in marriages using the mate preference estimates obtained from our data and an economic matching model. An obvious caveat to this objective is that mate preferences and hence sorting in a dating market may differ from the mate preferences and resulting sorting patterns in a marriage market. However, this objection is not supported by previous studies that document similar matching patterns along observable attributes at various stages of a relationship: Laumann et al. (1994) demonstrate similar degrees of sorting along age, education, and ethnicity/race across married couples, cohabiting couples, and couples in a short-term relationship. We confirm these facts using data from the National Survey of Family Growth. Nonetheless, in order to make statements about marriage patterns based on the preference estimates obtained from our data, we need to make the important assumption that conditional on observable attributes, the users of our dating site do not differ in their mate preferences from the population at large.

1 Laumann et al. (1994) document that 60% of all married couples in their sample met in school, at work, at a private party, in church, or at a gym/social club (Table 6.1).

The steps we take towards achieving our research objectives are as follows. We first specify mate preferences as a function of own and partner attributes. To estimate these preferences, we exploit the detailed information on the site users’ attributes and partner search behavior contained in our data. The estimation strategy is based on the assumption that a user contacts a partner if and only if the potential utility from a match with that partner exceeds a threshold value (a “minimum standard”) for a mate. Such a rule is consistent with equilibrium behavior in a search model (Adachi 2003, for example).
Based on the preference estimates, we predict who matches with whom using the Gale-Shapley (1962) algorithm.2 Obviously, the centralized market mechanism of the Gale-Shapley model does not provide a literal description of the decentralized structure of the online dating site we study. However, we find that the Gale-Shapley match and actual matches realized on the online dating site are similar in many dimensions. In particular, by comparing the rank of a match (within each user’s ranking of all potential mates) across the actual matches and the matches generated by the Gale-Shapley algorithm, we find that, on average, the site users would not be able to achieve a more preferred match using Gale-Shapley. Because the Gale-Shapley algorithm achieves an efficient match, we thus conclude that the online dating market achieves an approximately efficient outcome.

The users of the dating site sort along various attributes, such as age, looks, income, education, and ethnicity. While qualitatively similar to sorting patterns in marriage data, the attribute correlations observed online cannot be directly compared to the correlations in marriages due to differences between the online and “offline” populations along demographic characteristics. We thus re-weight our sample of website users to match the “offline” population along key demographic attributes. We then predict equilibrium matches for this “synthetic” sample of online-daters, using the Gale-Shapley algorithm as before. We find that the Gale-Shapley algorithm, in conjunction with our estimates of mate preferences, can predict the sorting patterns in marriages: the attribute correlations in the observed matches are similar to the correlations observed in actual marriages and dating relationships. A main conclusion, then, is that sorting is not entirely due to search frictions, but rather that “real-world” sorting patterns arise as a consequence of mate preferences and the stability concept alone.
This is not to say that search frictions are not present in marriage markets. For example, we under-predict the correlation in education, possibly the main variable for which we expect that search frictions are a cause of sorting. However, our results suggest that preferences also play a very important role.

2 We do not observe offline activities such as eventual marriages in our data. Instead, we define a match as an event where two users exchange a phone number or e-mail address, or an e-mail that contains certain keywords such as “get together” or “let’s meet.”

In the final part of our analysis, we attempt to uncover the importance of different preference components in driving observed sorting patterns. As we discussed above, the “horizontal” and the “vertical” components of preferences can, in principle, lead to the same matching outcomes. Our framework enables us to analyze the relative importance of these two different components by constructing counter-factual preference profiles that omit one of the components and re-computing the equilibrium matches. The result of the exercise indicates that “horizontal” preference components are essential in generating the sorting patterns observed in the data. A similar exercise suggests that same-race preferences are essential in generating the lack of intermarriage across racial boundaries.

2 Market and Data Description

In this section, we first provide a description of online dating that clarifies how the data were collected. We then provide a brief data description; a detailed description can be found in the Appendix and in Hitsch et al. (2008).

2.1 Overview of Online Dating

Online dating is quickly becoming a common means to find a date or marriage partner. According to comScore World Metrix (2006), 17% of all North American and 18% of all European Internet users visited an online personals site in July 2006.
In the U.S., 37% of all single Internet users looking for a partner have gone to a dating website (Pew Internet & American Life Project 2006).

When they first join the dating service, the users answer questions from a mandatory survey and create “profiles” of themselves.3 A profile is a web page that provides information about a user and can be viewed by any other members of the dating site. The users indicate various demographic, socioeconomic, and physical characteristics, such as their age, gender, education level, height, weight, eye and hair color, and income. The users also indicate why they joined the service, for example to find a partner for a long-term relationship, or, alternatively, a partner for a “casual” relationship. In addition, the users provide information that relates to their personality, life style, or views. For example, the site members indicate what they expect on a first date, whether they have children, their religion, whether they attend church frequently, and their political views. The users can also answer essay questions that provide additional details on their attitudes and personalities, but this information is too unstructured to be usable for our analysis. Many users also include one or more photos in their profile. We will explain in more detail later how we used these photos to construct a measure of the users’ physical attractiveness.

3 Neither the names nor any contact information was provided to us in order to protect the privacy of the users.

After registering, the users can browse, search, and interact with the other members of the dating service. Typically, users start their search by indicating an age range and geographic location for their partners in a database query form. The query returns a list of “short profiles” indicating the user name, age, a brief description, and, if available, a thumbnail version of the photo of a potential mate.
By clicking on one of the short profiles, the searcher can view the full user profile, which contains socioeconomic and demographic information, a larger version of the profile photo (and possibly additional photos), and answers to several essay questions. Upon reviewing this detailed profile, the searcher then decides whether to send an e-mail (a “first contact”) to the user.

Our data contain a detailed, second-by-second account of all these user activities.4 We know if and when a user browses another user, views his or her photo(s), sends an e-mail to another user, answers a received e-mail, etc. We also have additional information that indicates whether an e-mail contains a phone number, e-mail address, or keyword or phrase such as “let’s meet,” based on an automated search for special words and characters in the exchanged e-mails.5

In order to initiate a contact by e-mail, a user has to become a paying member of the dating service. Once the subscription fee is paid, there is no limit on the number of e-mails a user can send. All users can reply to an e-mail that they receive, regardless of whether they are paying members or not.

2.2 Data Description

The analysis in this paper is based on a sample of 3,004 men and 2,783 women located in Boston and San Diego. The sample was chosen from 22,000 users of an online dating service; see Section 4.3 for a description of how the sample was selected. We observe all user activities over a period of three and a half months in 2003.

The registration survey asks users why they are joining the site. It is important to know the users’ motivation when we estimate mate preferences, because we need to be clear whether these preferences are with regard to a relationship that might end in a marriage or whether the users only seek a partner for casual sex. More than half of all observed activities (e-mails sent) are due to users who have a stated preference for a long-term relationship.
Furthermore, more than 20% of all activities are due to users who state that they are “just looking/curious,” an answer that is likely given as it sounds less committal than “hoping to start a long-term relationship.” Activities by users who state that they seek a casual relationship account for less than 4% of all activities.

4 We obtained this information in the form of a “computer log file.”

5 We do not see the full content of the e-mail, or the e-mail address or phone number that was exchanged.

To understand how representative our sample is compared to the general population, we contrast the reported characteristics of the site users with two samples of the population from the CPS: one sample is representative of the general population in Boston and San Diego, while the other sample only contains Internet users. Men are somewhat over-represented among the dating service users, although the difference is fairly small in the sample used in our estimation and matching analysis (51.9% men versus 48.1% women). As expected, the site users are younger than the population at large: the median user is in the 26-35 age range, whereas the median person in both CPS samples is in the 36-45 age range. The majority of the dating service users are single; married users account for less than 1% of all activities (e-mails sent). Site users are more educated and have a higher income than the general population; once we condition on Internet use, however, the remaining differences are not large. The profile of ethnicities represented among the site users roughly reflects the profile in the corresponding geographic areas, especially when conditioning on Internet use, although Hispanics and Asians are somewhat underrepresented on the San Diego site and whites are overrepresented. Hence, along demographic and socioeconomic attributes, our sample appears to be fairly representative of the population at large.
Our data set contains detailed (although self-reported) information regarding the physical attributes of the users. 27.5% post one or more photos online. For the rest of the users, the primary source of information about their appearance is the survey, which asks users to rate their looks on a subjective scale. Users also report their height and weight. As we expected that these are among the main attributes not truthfully reported, we compare the reported characteristics with information on the whole U.S. population, obtained from the National Health and Examination Survey Anthropometric Tables.6 We find that the average weight reported by women is lower (between 6 and 20 lbs) than the average weight in the population. Men report slightly higher weights than the national averages. Also, the stated height of both men and women is somewhat above the U.S. average. Thus, overall the data only provide evidence for small levels of mis-representation.

We constructed an attractiveness rating for the photos posted by the site users. This measure is based on the photo ratings (on a scale from 1 to 10) of 100 students at the University of Chicago. Following Biddle and Hamermesh (1998), we standardized each photo rating by subtracting the mean rating given by the subject and dividing by the standard deviation of the subject’s ratings. We then averaged these standardized ratings across subjects to obtain a single rating for each photo. Our data reveal that beauty is not entirely in the eye of the beholder: the attractiveness ratings are strongly correlated across observers.

6 The data are from the 1988-1994 survey and only cover Caucasians.

3 A Modeling Framework for Analyzing User Behavior

Our data are in the form of user activity records that describe, for each user, which profiles were browsed, and to which profiles an e-mail was sent.
In order to interpret the data using a revealed preference framework, we make the following assumption:

Assumption Suppose a user browses the profiles of two potential mates, $w$ and $w'$, and sends an introductory e-mail to $w$ but not to $w'$. Then the user prefers a potential match with $w$ over a potential match with $w'$.

We will thus interpret user actions as binary choices over potential mates. Let $U_M(m, w)$ be the expected utility that male user, $m$, gets from a potential match with woman $w$, and let $v_M(m)$ be the utility $m$ gets from his outside option of not responding to the ad. If $m$ browses $w$’s profile, he chooses to send an e-mail if and only if

\[ U_M(m, w) \geq v_M(m). \tag{1} \]

3.1 The Adachi Model

The threshold-crossing rule above arises naturally in a search model. In particular, we consider a search model due to Adachi (2003), which we believe provides a useful stylized description of user behavior on the dating site.

Adachi’s model is set in discrete time, with period discount factor $\rho$. In each period, there are $M$ men and $W$ women in the market. In each period, man $m$ comes across a randomly sampled profile, $w$. The sampling is done according to the distribution $F_W$ (the corresponding sampling distribution for women is $F_M$). We assume that the sampling distribution is stationary, and assigns positive probability of meeting each person on the opposite side of the market. A standard assumption (as in Morgan 1995, Burdett and Coles 1996, and Adachi 2003) that guarantees stationarity is that men and women who leave the market upon a match are immediately replaced by agents who are identical to them.

Let $v_M(m)$ and $v_W(w)$ be the reservation utilities of man $m$ and woman $w$ from staying single and continuing the search for a partner. Define the following indicator functions:

\[ A_W(m, w) = I\{\text{woman } w \text{ accepts man } m\} = I\{U_W(m, w) \geq v_W(w)\}, \]
\[ A_M(m, w) = I\{\text{man } m \text{ accepts woman } w\} = I\{U_M(m, w) \geq v_M(m)\}. \]
We can then characterize the utility that man $m$ gets upon meeting a woman $w$:

\[ EU_M(m, w) = U_M(m, w) A_M(m, w) A_W(m, w) + v_M(m) \left(1 - A_W(m, w)\right) A_M(m, w) + v_M(m) \left(1 - A_M(m, w)\right). \]

The first term in this expression is the utility from a mutual match, and the second and third terms capture the continuation utility from a mismatch. The Bellman equations characterizing the optimal reservation values and search rules of man $m$ and woman $w$ are given by:

\[ v_M(m) = \rho \int EU_M(m, w) \, dF_W(w), \qquad v_W(w) = \rho \int EU_W(m, w) \, dF_M(m). \tag{2} \]

Adachi (2003) shows that the above system of equations defines a monotone iterative mapping that converges to a profile of reservation utilities $(v_M^*(m), v_W^*(w))$ solving the system, and thus characterizing the stationary equilibrium in this market.7 The equilibrium reservation utilities $(v_M^*(m), v_W^*(w))$ can be thought of as person-specific “prices” that clear demand for and supply of that person.

3.2 The Gale-Shapley Model

Under some conditions, the predictions of who matches with whom from the Adachi model are identical to the predictions of the seminal Gale-Shapley (1962) matching model. Before explaining this result in detail, we briefly review the Gale-Shapley model.

The matching market is populated by the same set of men and women as in Adachi’s model, $m \in \mathcal{M} = \{1, \dots, M\}$, $w \in \mathcal{W} = \{M + 1, \dots, M + W\}$. The preference orderings are generated by $U_M(m, w)$ and $U_W(w, m)$.8 Let $\mu(m)$ denote the match of man $m$ that results from a matching procedure, and let $\mu(w)$ be the match of woman $w$. Note that if $\mu(m) \notin \mathcal{W}$, then $\mu(m) = m$, and if $\mu(w) \notin \mathcal{M}$, then $\mu(w) = w$. That is, agents may remain single. The matching $\mu$ is defined to be stable (in the Gale-Shapley sense) if there is no man $m$ and woman $w$ such that $U_M(m, w) > U_M(m, \mu(m))$ and $U_W(w, m) > U_W(w, \mu(w))$. That is, in a stable matching it is not possible to find a pair $(m, w)$ who are willing to abandon their partners and match with each other.
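The stability definition can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names, the utility tables, and the zero utility of staying single are all hypothetical. The first function is a one-proposal-at-a-time variant of the deferred-acceptance procedure of Gale and Shapley (1962); the second checks a matching for blocking pairs exactly as in the definition above.

```python
def deferred_acceptance(u_prop, u_recv, single_prop, single_recv):
    """Proposer-optimal matching; returns match[i] = receiver index or None.

    u_prop[i][j]: utility of proposer i from receiver j (hypothetical values)
    u_recv[j][i]: utility of receiver j from proposer i
    single_prop / single_recv: utilities from staying single (an assumption)
    """
    n_prop, n_recv = len(u_prop), len(u_recv)
    # Each proposer lists the receivers he prefers to staying single, best first.
    prefs = [
        sorted((j for j in range(n_recv) if u_prop[i][j] > single_prop[i]),
               key=lambda j: -u_prop[i][j])
        for i in range(n_prop)
    ]
    next_idx = [0] * n_prop          # next entry on each proposer's list
    engaged_to = [None] * n_recv     # receiver -> currently held proposer
    free = list(range(n_prop))
    while free:
        i = free.pop()
        if next_idx[i] >= len(prefs[i]):
            continue                 # list exhausted: proposer i stays single
        j = prefs[i][next_idx[i]]
        next_idx[i] += 1
        current = engaged_to[j]
        held = single_recv[j] if current is None else u_recv[j][current]
        if u_recv[j][i] > held:      # receiver keeps the better offer
            engaged_to[j] = i
            if current is not None:
                free.append(current)
        else:
            free.append(i)           # rejected: proposer tries the next name
    match = [None] * n_prop
    for j, i in enumerate(engaged_to):
        if i is not None:
            match[i] = j
    return match


def is_stable(match, u_m, u_w, single_m, single_w):
    """True if no man-woman pair strictly prefers each other to their matches."""
    partner_of_w = [None] * len(u_w)
    for m, w in enumerate(match):
        if w is not None:
            partner_of_w[w] = m
    for m in range(len(u_m)):
        val_m = single_m[m] if match[m] is None else u_m[m][match[m]]
        for w in range(len(u_w)):
            p = partner_of_w[w]
            val_w = single_w[w] if p is None else u_w[w][p]
            if u_m[m][w] > val_m and u_w[w][m] > val_w:
                return False         # (m, w) is a blocking pair
    return True


# Hypothetical 3x3 market with strict preferences.
u_m = [[3, 2, 1], [3, 1, 2], [2, 3, 1]]   # u_m[m][w]
u_w = [[1, 3, 2], [2, 1, 3], [3, 2, 1]]   # u_w[w][m]
zeros = [0, 0, 0]
men_optimal = deferred_acceptance(u_m, u_w, zeros, zeros)
```

Swapping the roles of the two utility tables, as in `deferred_acceptance(u_w, u_m, zeros, zeros)`, gives the women-proposing version, with the result indexed by woman.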
The set of stable matches in the Gale-Shapley model is not unique. However, the set of stable matches has two extreme points: the “men-optimal” and “women-optimal” stable matches. The men-optimal stable match is unanimously preferred by men and opposed by all women over all other stable matches, and vice versa (Roth and Sotomayor 1990). Either of these two extreme points can be reached through the use of Gale-Shapley’s deferred-acceptance algorithm.9

7 The solution is not unique, but has a lattice structure in strong analogy to the Gale-Shapley model. See the next section for further details.

8 We impose the restriction that the preferences are strict.

3.3 Equivalence Between Decentralized Search Outcomes and Gale-Shapley Stable Matches

A remarkable result obtained by Adachi (2003) is that, as search costs become negligible, i.e. $\rho \to 1$, the set of equilibrium matches obtainable in the search model outlined above is identical to the set of stable matches in a corresponding Gale-Shapley marriage model.

Adachi’s insight derives from an alternative characterization of Gale-Shapley stable matchings. In particular, let $v_M(m) = U_M(m, \mu(m))$, and let $v_W(w) = U_W(w, \mu(w))$ be the utility that $m$ and $w$ get from their match partners. Adachi shows that, in a stable match, $v_M(m)$ and $v_W(w)$ satisfy the following equations:

\[ v_M(m) = \max_{\mathcal{W} \cup \{m\}} \{U_M(m, w) \mid U_W(w, m) \geq v_W(w)\}, \qquad v_W(w) = \max_{\mathcal{M} \cup \{w\}} \{U_W(w, m) \mid U_M(m, w) \geq v_M(m)\}. \tag{3} \]

Furthermore, as $\rho \to 1$, the system of Bellman equations (2) becomes equivalent to the system of equations in (3). That is, as agents become more and more patient, or, equivalently, as search costs decline to zero, the search process will lead to matching outcomes that are stable in the Gale-Shapley sense. This is intuitive, as the equations (3) imply that in a stable match, man $m$ is matched with the best woman who is willing to match with him, and vice versa.

Generally, Adachi’s model has more than one equilibrium.
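The fixed-point system (3) can be explored numerically by iterating on it directly. The Python sketch below is illustrative only: the function name, the utility tables, and the zero single utilities are hypothetical, and the starting point (men at their highest attainable utility, women at their single utility) is an assumption chosen so that men begin as picky as possible.

```python
def iterate_system_3(u_m, u_w, single_m, single_w, max_iter=1000):
    """Iterate the two equations in (3) until the values stop changing.

    u_m[m][w], u_w[w][m]: hypothetical utility tables.
    single_m, single_w: utilities from staying single (an assumption).
    """
    n_m, n_w = len(u_m), len(u_w)
    # Starting point: every man demands his best conceivable outcome,
    # every woman demands only her single utility.
    v_m = [max([single_m[m]] + list(u_m[m])) for m in range(n_m)]
    v_w = list(single_w)
    for _ in range(max_iter):
        # Best woman for m among those who accept him at their current thresholds.
        new_v_m = [
            max([single_m[m]] +
                [u_m[m][w] for w in range(n_w) if u_w[w][m] >= v_w[w]])
            for m in range(n_m)
        ]
        # Best man for w among those who accept her at their current thresholds.
        new_v_w = [
            max([single_w[w]] +
                [u_w[w][m] for m in range(n_m) if u_m[m][w] >= v_m[m]])
            for w in range(n_w)
        ]
        if new_v_m == v_m and new_v_w == v_w:
            return v_m, v_w
        v_m, v_w = new_v_m, new_v_w
    raise RuntimeError("no fixed point reached within max_iter iterations")


# Hypothetical 3x3 market with strict preferences.
u_m = [[3, 2, 1], [3, 1, 2], [2, 3, 1]]   # u_m[m][w]
u_w = [[1, 3, 2], [2, 1, 3], [3, 2, 1]]   # u_w[w][m]
v_m_star, v_w_star = iterate_system_3(u_m, u_w, [0, 0, 0], [0, 0, 0])
```

In this small example the iteration settles on values that equal each agent's utility in the market's stable matching (man 0 with woman 2, man 1 with woman 0, man 2 with woman 1), which is the sense in which the reservation utilities in (3) describe a stable match.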
Analogous to the result on men- and women-optimal matches in the Gale-Shapley model, Adachi shows that the set of solutions of the system of equations (2) has a lattice structure and possesses extreme points. At the men-optimal extreme, men are pickier (i.e., they have higher reservation utilities) and women are less picky than in any other solution.

9 The algorithm that arrives at the men-optimal match works as follows. Men make offers (proposals) to the women, and the women accept or decline these offers. The algorithm proceeds over several rounds. In the first round, each man makes an offer to his most preferred woman. The women then collect offers from the men, rank the men who made proposals to them, and keep the highest ranked man engaged. The offers from the other men are rejected. In the second round, those men who are not currently engaged make offers to the women who are next highest on their list. Again, women consider all men who made them proposals, including the currently engaged man, and keep the highest ranked man among these. In each subsequent round, those men who are not engaged make an offer to the highest ranked woman to whom they have not previously made an offer, and women engage the highest ranked man among all currently available partners. The algorithm ends after a finite number of rounds. At this stage, men and women either have a partner or remain single. The women-optimal match is obtained using the same algorithm, where women make offers and men accept or decline these proposals.

3.4 Discussion

Of course, actual behavior in the online dating market that we study is not exactly described by the models of Adachi or Gale and Shapley. However, both models capture some basic mechanisms that apply to the workings of the dating market that we study.
The Adachi model captures the search process for a partner, and the plausible notion that people have an understanding of their own dating market value, which influences their threshold or “minimum standard” for a partner. The Gale-Shapley model, especially their deferred-acceptance algorithm, captures the notion that stability can be attained through a protocol of repeated rounds of offer-making and corresponding rejections, which reflects the process of the e-mail exchanges between the site users. Moreover, if search frictions on the online dating site are low enough, the difference in matching outcomes as predicted by the two modeling frameworks will be small, as suggested by Adachi’s equivalence result. However, whether this is the case is an empirical question, which we will investigate further in Section 6. 3.5 Costly Communication and Strategic Behavior If sending e-mails is costly, the threshold rule (1) may not hold. As an example, let us assume that there is a single dimensional index of attractiveness in the market, and consider the decision by an unattractive man as to whether he should send an introductory e-mail to a very attractive woman. If composing the e-mail is costly, or the psychological cost of being rejected is high, the man may not send an e-mail, thinking that the woman is “beyond his reach,” even though he would ideally like to match with her. Thus, the estimated preferences based on the threshold crossing rule reflect not only the users’ true preferences, but also their expectations on who is likely to match with them in equilibrium. Such strategic behavior can bias the preference estimates, and thus we will first investigate whether strategic behavior is an important concern in our data (Section 4.2) before we estimate mate preferences. A priori, however, we do not anticipate that strategic behavior is important in the context of online dating. 
Unlike a conventional marriage market, where the cost of approaching a potential partner is often non-trivial, online dating is designed to minimize this cost. The main cost associated with sending an e-mail is the cost of composing it. However, the marginal cost of producing yet another witty e-mail is likely to be small since one can easily personalize a polished form letter, or simply use a “copy and paste” approach. The fear of rejection should be mitigated by the anonymity provided by the dating site. Furthermore, rejection is common in online dating: in our data, 71% of men’s and 56% of women’s first-contact e-mails do not receive a reply.

4 Mate Preference Estimation

Our estimation approach is based on a sequence of binary decisions, as in the Adachi (2003) search and matching model of Section 3.1. For each user, we observe the potential mates that he or she browses, and we observe whether a first-contact e-mail was sent. In equilibrium, man m contacts woman w if and only if $U_M(m, w) > v_M(m)$, and woman w contacts man m if and only if $U_W(m, w) > v_W(w)$. Under this decision rule, mate preferences can be estimated using discrete choice methods. We first discuss the details of the discrete choice approach. We then examine whether there is any preliminary evidence pointing towards “strategic behavior,” as discussed in Section 3.5. Finally, we present the estimation results.

4.1 Discrete Choice Model

We assume that mate preferences depend on observed own and partner attributes, and on an idiosyncratic preference shock: $U_M(m, w) = U_M(X_m, X_w; \theta_M) + \epsilon_{mw}$. We split the attribute vector and the parameter vector into separate components: $X_w = (x_w, d_w)$ and $\theta_M = (\beta_M, \gamma_M^+, \gamma_M^-, \vartheta_M)$. The latent utility of man m from a match with woman w is parametrized as

$$U_M(m, w) = x_w' \beta_M + \left(|x_w - x_m|_+^{\alpha}\right)' \gamma_M^+ + \left(|x_w - x_m|_-^{\alpha}\right)' \gamma_M^- + \sum_{k,l=1}^{N} I\{d_{mk} = 1 \text{ and } d_{wl} = 1\} \cdot \vartheta_M^{kl} + \epsilon_{mw}. \qquad (4)$$

The first component of utility is a simple linear valuation of the woman’s attributes. The other components relate the man’s preferences to his own characteristics. $|x_w - x_m|_+$ is the difference between the woman’s and man’s attributes if this difference is positive, and $|x_w - x_m|_-$ denotes the absolute value of this difference if the difference is negative.10 For example, consider the difference in age between man m and woman w. If the coefficients corresponding to the age difference in $\gamma_M^+$ and $\gamma_M^-$ are negative, it means that users prefer someone of their own age. Note that each component of the difference terms is taken to the power $\alpha$. The fourth component in (4) relates preferences to categorical attributes of both mates. $d_{mk}$ and $d_{wl}$ are dummy variables indicating that man m and woman w possess a certain trait. For example, if $d_{mk} = 1$ and $d_{wl} = 1$ indicate that m is white and that w is Hispanic, then the parameter $\vartheta_M^{kl}$ expresses the relative preference of white men for Hispanic women.

10 Formally, $|a - b|_+ = \max(a - b, 0)$ and $|a - b|_- = \max(b - a, 0)$.

We assume that $\epsilon_{mw}$ has the standard logistic distribution, and is i.i.d. across all pairs of men and women. The reservation values $v_M(m)$ and $v_W(w)$ are estimated as person-specific fixed effects, denoted by $c_m$ and $c_w$. Choice probabilities then take the standard logit form:

$$\Pr\{m \text{ contacts } w \mid m \text{ browses } w\} = \frac{\exp\left(U_M(X_m, X_w; \theta_M) - c_m\right)}{1 + \exp\left(U_M(X_m, X_w; \theta_M) - c_m\right)}.$$

Under these assumptions, the preference parameters can be recovered using a fixed effects logit estimator. We estimate the model with quadratic differences ($\alpha = 2$). Alternatively, if the attribute differences enter the utility function in linear form ($\alpha = 1$), the fixed effects logit model is not identified.11 In Hitsch et al. (2008) we check the sensitivity of our results with respect to this functional form assumption.
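To make the functional form concrete, the following minimal Python sketch evaluates the systematic part of the utility for continuous attributes only (the categorical terms are omitted) and the resulting logit contact probability. It is an illustration under stated assumptions, not the estimation code: the function names, the single "age" attribute, and all coefficient values are hypothetical. The last loop illustrates the identification remark for the linear case: because x_w - max(x_w - x_m, 0) + max(x_m - x_w, 0) = x_m, a suitable shift of the coefficients changes the utility of every browsed profile by the same constant, which a person-specific fixed effect can absorb.

```python
import math

def pos_part(a, b):
    """|a - b|_+ = max(a - b, 0)."""
    return max(a - b, 0.0)

def neg_part(a, b):
    """|a - b|_- = max(b - a, 0)."""
    return max(b - a, 0.0)

def systematic_utility(x_m, x_w, beta, gamma_pos, gamma_neg, alpha=2):
    """Systematic utility for continuous attributes (categorical terms omitted)."""
    total = 0.0
    for xm, xw, b, gp, gn in zip(x_m, x_w, beta, gamma_pos, gamma_neg):
        total += b * xw                          # linear valuation of her attribute
        total += gp * pos_part(xw, xm) ** alpha  # she exceeds him on this attribute
        total += gn * neg_part(xw, xm) ** alpha  # she falls short of him
    return total

def contact_probability(utility, c_m):
    """Logit probability of a first contact, given utility and fixed effect c_m."""
    return 1.0 / (1.0 + math.exp(-(utility - c_m)))

# Hypothetical example: one attribute (age), man aged 35, quadratic distances.
x_m = [35.0]
for age_w in (30.0, 35.0, 40.0):
    u = systematic_utility(x_m, [age_w], [0.0], [-0.01], [-0.01])
    print(age_w, round(contact_probability(u, c_m=0.0), 3))

# Linear case (alpha = 1): shifting the coefficients as below moves every
# utility by the same constant a, whatever the woman's age.
a = 0.7
shift = a / x_m[0]
for age_w in (30.0, 35.0, 40.0):
    u_old = systematic_utility(x_m, [age_w], [0.02], [-0.05], [-0.03], alpha=1)
    u_new = systematic_utility(x_m, [age_w], [0.02 + shift], [-0.05 - shift],
                               [-0.03 + shift], alpha=1)
    print(age_w, round(u_new - u_old, 6))
```

With negative distance coefficients, the contact probability in the first loop peaks when the woman's age equals the man's, which is the "own age" preference described in the text.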
In that paper, we estimate a random effects probit model and compare the probit results with $\alpha = 1$ to the fixed effects logit results with $\alpha = 2$. Qualitatively, the results are very similar. Most importantly, we find that the model predictions of Sections 5 and 6 are not sensitive to the exact parametric specification of how attribute differences enter the utility function.

11 To see this, note that $x_w - |x_w - x_m|_+ + |x_w - x_m|_- = x_m$. Suppose the estimated fixed effect for man m is $c_m$. Let $e_k = (0, \ldots, 1, \ldots, 0)$ be a vector with 1 as the kth component. Choose some arbitrary number a. Then the parameter vectors $\hat{\beta}_M = \beta_M + (a/x_{mk}) e_k$, $\hat{\gamma}_M^+ = \gamma_M^+ - (a/x_{mk}) e_k$, $\hat{\gamma}_M^- = \gamma_M^- + (a/x_{mk}) e_k$, and the fixed effect $\hat{c}_m = c_m + a$ will yield the same utility function as the one parametrized by $(\beta_M, \gamma_M^+, \gamma_M^-, c_m)$.

4.2 Preliminary Evidence on “Strategic Behavior”

As we discussed in Section 3.5, strategic behavior may bias our preference estimates. We now examine whether there is any preliminary evidence pointing towards such behavior in the data. We focus on decisions based on physical attractiveness, as we expect that strategic behavior would be most prevalent with regard to looks. In particular, we investigate how a user’s propensity to send an e-mail is related to the attractiveness of a potential mate, and whether this propensity is different across attractive versus unattractive searchers. We first construct a choice set for each user that contains all potential mates that this user browses. We then construct a binary variable to indicate the choice of sending an e-mail. Our basic regression specification is a linear probability model of the form

$$\text{e-mail}_{ij} = \sum_{d=1}^{10} \beta_d \cdot I\{\text{attractiveness}_j = d\} + u_i + \varepsilon_{ij}. \qquad (5)$$

Here, $\text{e-mail}_{ij} = 1$ if browser i sends an e-mail to mate j, $\text{attractiveness}_j = d$ denotes that mate j is in the dth decile of attractiveness ratings, and I is an indicator function. $u_i$ is a person-specific fixed effect, i.e. the optimal search threshold for sending an e-mail to mate j.12 The physical attractiveness measure is used as a proxy for the overall attractiveness of a profile. We run the regression (5) separately for users in different groups of physical attractiveness. That is, we segment the suitors, indexed by i, according to their physical attractiveness, and allow for the possibility that users in different groups respond differently to the attractiveness of the mates that they browse. Figure 1 shows the relationship between a browsed user’s photo rating and the estimated probability that the browser will send a first-contact e-mail. We see that regardless of the physical attractiveness of the browser, the probability of sending a first-contact e-mail in response to a profile is monotonically increasing in the attractiveness of the photo in that profile. Thus, even if unattractive men (or women) take the cost of rejection and composing an e-mail into account, this perceived cost is not large enough for the net expected benefit of hearing back from a very attractive mate to fall below the net expected benefit of hearing back from a less attractive mate. These results provide support for our assumption regarding the absence of significant costs of e-mailing attractive users and (consequently) strategic behavior. Note that this evidence is not ultimately conclusive, in that multiple attributes enter into the perceived attractiveness of a given profile, while we focus only on a single dimension, physical attractiveness (the estimation results presented below confirm that physical attractiveness is one of the most important preference components). Nonetheless, the empirical evidence of this section suggests that our assumption is at least approximately correct, and we leave a more detailed examination of strategic behavior for future research.
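The logic of this check can be sketched in a few lines of Python. The sketch is a deliberate simplification and not the regression used for Figure 1: it drops the person-specific fixed effect, in which case ordinary least squares on a full set of decile indicators reduces to the e-mail frequency within each decile. The record layout, the group labels, and the toy data are hypothetical.

```python
from collections import defaultdict

def email_rate_by_decile(records):
    """Share of browsed profiles that receive a first-contact e-mail.

    records: iterable of (browser_group, mate_decile, emailed) tuples, one per
    browsed profile. Returns {(browser_group, mate_decile): e-mail frequency}.
    """
    sent = defaultdict(int)
    seen = defaultdict(int)
    for group, decile, emailed in records:
        seen[(group, decile)] += 1
        sent[(group, decile)] += int(emailed)
    return {key: sent[key] / seen[key] for key in seen}

def is_monotone_increasing(rates, group):
    """True if the e-mail rate of `group` never falls as the mate's decile rises."""
    deciles = sorted(d for g, d in rates if g == group)
    values = [rates[(group, d)] for d in deciles]
    return all(lo <= hi for lo, hi in zip(values, values[1:]))

# Hypothetical browsing records for browsers in a "low" attractiveness group.
records = [
    ("low", 1, 0), ("low", 1, 0), ("low", 1, 0), ("low", 1, 1),
    ("low", 10, 1), ("low", 10, 1), ("low", 10, 0), ("low", 10, 0),
]
rates = email_rate_by_decile(records)
print(rates[("low", 1)], rates[("low", 10)], is_monotone_increasing(rates, "low"))
```

If less attractive browsers shied away from the most attractive profiles, the rate for their group would flatten or fall in the top deciles, and the monotonicity check would fail.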
4.3 Sample and Attribute Selection

Because we are not interested in the preferences and matching patterns of users who joined the site to find a partner for “casual” sex, we only keep users who state that they are single, divorced, or describe themselves as “hopeful” in the estimation sample. Furthermore, we eliminate users who indicate that they joined the site to start a “casual” relationship. This left us with users who want to “start a long-term relationship” (59%), are “just looking/curious” (28%), “like to make new friends” (9%), and users who state that “a friend put me up on this” (4%). The latter three groups are likely to comprise users who want to sound less committal than users who state that they are “hoping to start a long-term relationship.” The estimation results and the predicted matching patterns in Sections 5 and 6 change little if we only keep users who explicitly state that they want to start a long-term relationship in our sample.13

12 Conditional logit estimates yield similar results.
13 For the full sample, where we also included the users who may be seeking casual affairs, many parameter estimates were smaller in absolute value. The online behavior of these users appears “less focused” than the behavior of the site members who try to find a long term partner.

Given the large number of observed characteristics, we can only employ a subset of all potential variables and potential variable interactions in the estimated model. To guide us in this choice we use the results from the “outcome regressions” in Hitsch et al. (2008). These regressions relate the number of unsolicited first-contact e-mails that a user receives in a given time period to his or her attributes.
Note that if mate preferences were homogeneous, these regressions would reveal the (common) utility index by which users rank their potential mates.14 This approach can be extended to allow for some preference heterogeneity by only considering first-contacts received from users who are homogeneous along some observed attribute. The outcome regressions included all observed attributes. For the discrete choice model, we select user characteristics that are of interest to us and are quantitatively (and statistically) significant in the outcome regressions. 4.4 Estimation Results Table 3 presents the maximum likelihood estimates of the fixed effects binary logit model. Before exploring the matching predictions implied by these estimates, we first provide a brief discussion of the results. For a comparison to alternative estimation approaches, robustness checks, and a discussion of the implied trade-offs among various attributes, we refer the reader to Hitsch et al. (2008). Our results are qualitatively similar to the “revealed preference” findings of Belot and Francesconi (2006) and Fisman et al. (2006, 2008) using speed dating experiments, and survey studies in psychology (Buss 2003, Etcoff 1999). As expected, we find that the users of the dating service prefer a partner whose age is similar to their own. Women who are single try to avoid divorced men, while divorced women have a preference for a partner who is also divorced. Similarly, single men avoid divorced women, but divorced men have no particular preference to date a divorced woman. Both men and women who have children prefer a partner who also has children. Members with children, however, are much less desirable to both men and women who themselves do not have children. Also, women seeking a long term relationship prefer a partner who indicates that he is also seeking a long term relationship, but no such preference is apparent for men. 
Looks and physique are important determinants of preferences for both men and women. The utility weights on the looks rating variable differ only slightly across men and women. Also, men and women have a stronger preference for mates who describe their looks as “above average” than for average looking members, and they have an even stronger preference for members with self-described “very good looks.” Regarding height, we find that men typically avoid tall women, while women have a preference for tall men. Men have a strong distaste for women with a large BMI, while women tend to prefer heavier men. The estimates of income preferences show that women place about twice as much weight on income as men. There is little evidence for preference heterogeneity here: the absolute value of the distance coefficients is small, and hence own income matters only slightly in the evaluation of a partner’s earnings. Regarding education, we find that both men and women want to meet a partner with a similar education level. Women have an overall strong preference for an educated partner, but also a relatively small tendency to avoid men who are more educated than themselves, whereas men generally shy away from educated women. The estimated same-race preferences show that both men and women have a preference for a partner of their own ethnicity. Finally, we find that both men and women have a preference for a partner of the same religion.

14 For this statement to be true, we also need to assume that users are uniformly sampled.

5 The Structure and Efficiency of Online Matches

The purpose of an online dating site is to arrange dates and ultimately matches, marriages in particular. As described in Section 2, the particular site we investigate uses a very decentralized mechanism, in which browsers communicate and interact without any intervention. In this section, we investigate the structure and efficiency of the matches that result from this mechanism.
In particular, we discuss whether we can predict the resulting structure of matches using a well-known economic model of match formation, the Gale-Shapley (1962) deferred-acceptance algorithm.

5.1 Sorting: Actual and Predicted

To investigate matching patterns, we first have to define what we mean by a “match” in our data. Since we can only track the users’ online behavior, we do not know whether two partners ever met offline or eventually got married. However, our data does allow us to observe whether users exchange a phone number or e-mail address, or whether an e-mail contains certain keywords or phrases such as “get together” or “let’s meet.” We therefore have some indirect information on whether the online meeting resulted in an initial match, i.e. a date between the users. Accordingly, we define a match as a situation where both mates exchange such contact information (i.e., for a match it is not enough for a man to offer his phone number; we also require that the woman responds by sending her contact information). Correlations in the attributes of couples have been studied widely in sociology, psychology, and economics. We find that matching outcomes in online dating also exhibit strong sorting patterns. Table 5, column (I), shows the correlation of several user attributes in the observed matches. Age is strongly correlated across men and women (ρ = 0.70). Looks, as measured by the standardized photo rating, are also strongly correlated (ρ = 0.31). There are smaller but positive correlations in height (ρ = 0.15), BMI (ρ = 0.13), income (ρ = 0.15), and years of education (ρ = 0.12). Having documented the matching patterns in our data, we turn to the question of whether these patterns can be explained by an economic matching model and the mate preferences of the site users. However, as we pointed out in the introduction, it is difficult to model how matches are formed on this site explicitly.
Adachi’s search equilibrium model outlined in Section 3.1 provides a useful stylized description of search behavior, but there are many aspects of behavior in this market that are not captured by this model. We thus pursue a different strategy, and compare the observed matching outcomes to the matching outcomes that would have been generated by a benchmark mechanism under the preference profiles we estimated. The benchmark in this case is the Gale-Shapley mechanism, which yields stable and efficient (under declared preference profiles) matching outcomes that we can compare to the observed matching outcomes. Aside from providing a descriptive vantage-point, running the Gale-Shapley algorithm using the estimated preference profiles helps us answer an interesting market design question as well: given the revealed preferences of the users, would significant efficiency gains have been achieved if a centralized matchmaking algorithm (possibly the Gale-Shapley algorithm) had been used by the website? A lively debate on a similar issue exists in the online dating industry: while some major dating services, such as Match.com, Yahoo!Personals, etc., use a decentralized approach allowing users to freely browse and communicate, a “planner” approach is pursued by other major dating sites. eHarmony.com, for example, does not allow users to search and communicate freely, but instead limits users to choose among a subset of all potential partners selected by the company’s proprietary algorithm. To compute the Gale-Shapley matches, we use the estimated preference profiles of the site users (Table 3) and run the deferred-acceptance algorithm for both geographic markets. We calculate both the man-optimal and woman-optimal stable solutions to assess whether the extreme points on the lattice of stable matches yield significantly different matching outcomes. The preferences include an i.i.d.
type I extreme value term, $\epsilon_{mw}$, for each pair of potential mates.15 We use random draws of these utility terms to draw from the distribution of preference profiles, and compute the corresponding man-optimal and woman-optimal stable matching, accounting for each user’s estimated utility from staying single. We repeat this process 50 times and report the average and standard deviation of the attribute correlations across these 50 repetitions. Table 5, columns II (a) and II (b), reports the attribute correlations across matched couples for the man-optimal and woman-optimal Gale-Shapley matches. The correlations in the man-optimal and woman-optimal predictions are virtually identical, which we also found for the other predicted matches reported below; hence, we will no longer separately report the man-optimal and woman-optimal results from now on.

15 To be precise: man m’s taste shock for woman w is different from woman w’s taste shock for man m, $\epsilon_{mw} \neq \epsilon_{wm}$.

We find that the predicted correlations largely agree with the observed keyword correlations in column (I). The predicted and observed age correlations are identical (ρ = 0.70). Although the predicted correlation in looks is somewhat smaller than the looks correlation in keyword matches (ρ = 0.16 versus ρ = 0.31), this could potentially be caused by the fact that many users in our sample do not post pictures.16 All users report their height and weight, and the correlations predicted by the Gale-Shapley algorithm are slightly higher than the observed correlations. We also get similar correlations in income and education across the keyword and Gale-Shapley matches. We should re-emphasize at this point that the above prediction exercise did not utilize data on matches as an input. Our preference estimates only used data on first-contacts; the estimation did not condition on a match taking place. Table 4 provides summary statistics on the progress of online relationships.
Of all first-contact e-mails sent among the users in our estimation sample, 4.8% lead to an eventual match. Typically, a match is achieved only after multiple rounds of e-mail exchanges: the median and mean numbers of e-mails sent back and forth before a match is realized are 6 and 11.9, respectively. Hence, matches are very different events from first-contacts, and the equilibrium matching predictions based on the estimated preferences that we presented above can be regarded as out-of-sample, rather than in-sample, predictions.

5.2 Efficiency

The finding that the observed and predicted attribute correlations are largely similar suggests that the online dating market achieves an approximately efficient matching. However, these correlations are aggregate measures that might mask important differences in the allocations obtained by the actual (“decentralized”) and Gale-Shapley (“planner”) protocols. Alternatively, we could measure the difference between actual and predicted matches by comparing the identities of the matched couples. However, since the iid error terms, $\epsilon_{mw}$ and $\epsilon_{wm}$, change the stable matches across each run of the deferred-acceptance algorithm, it is unreasonable to expect a one-to-one correspondence between the two match results. Therefore, we choose a different method to evaluate the efficiency of the observed match outcomes. As before, we construct each user’s preference profile over all potential mates, based on the preference estimates and the simulated error terms $\epsilon$. For each user i we then record the rank of the match that is achieved in the data, and the rank of the match achieved by the Gale-Shapley algorithm.17 Table 6 shows summary statistics of the predicted ranks achieved by the different matching protocols, averaged across users and 50 Gale-Shapley predictions.
The mean difference in ranks between the Gale-Shapley matches and the observed keyword matches is 75 for men and 66 for women; that is, the Gale-Shapley procedure would have matched our users with partners that are higher in their rankings. Expressed in percent of the highest rank achievable (the users’ “first choice”), the improvement is 5.4% for men and 4.4% for women. Table 6 also compares the observed and predicted matches, and the users’ first choice (the maximum achievable rank). The average difference between the users’ first choice and the ranks achieved by the Gale-Shapley algorithm is 39 for men and 31 for women. On the other hand, the difference between the first choice and the observed keyword matches is 114 for men and 97 for women. The conclusion one may draw from this result is that the estimated preference profiles give rise to “equilibrium” matches that are “not too far away,” on average, from the user’s first choice. This is very likely a result of the considerable heterogeneity in preferences, which “softens” the competition for mates. To illustrate, consider the extreme situation in which all men and women have identical preferences over the opposite sex. With approximately 1500 men and women on each side of the market, the resulting perfect assortative matching would lead to the average user being matched to a person who is 750 ranks away from his or her first choice.

16 The looks correlation was only computed when both members of the matched couple had posted photos.
17 If a user achieved more than one match in the data, we record the highest rank within the set of matched mates and the option of staying single.

The previous results seem to suggest that using the Gale-Shapley procedure may result in significant efficiency gains. However, our comparison is subject to an important bias due to the presence of the iid logit error term.
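As an illustration of the rank comparison and of the identical-preferences benchmark described above, the following minimal Python sketch runs man-proposing deferred acceptance and measures how far each matched man ends up from his first choice. It is a sketch under stated assumptions and not the code behind Table 6: the function names and the five-person toy market are hypothetical, and each preference list is assumed to contain only partners who are preferred to staying single. With N users on each side and identical preferences, the average gap is (N - 1)/2, that is 749.5 for N = 1500, in line with the figure of roughly 750 quoted above.

```python
def deferred_acceptance(men_prefs, women_prefs):
    """Man-proposing deferred acceptance (yields the man-optimal stable matching).

    men_prefs[m]:   women acceptable to man m, most preferred first.
    women_prefs[w]: men acceptable to woman w, most preferred first.
    Returns {woman: man} for all matched pairs; everyone else stays single.
    """
    rank_w = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_choice = [0] * len(men_prefs)   # index of the next woman each man proposes to
    engaged = {}                         # woman -> man she currently holds
    free = list(range(len(men_prefs)))
    while free:
        m = free.pop()
        if next_choice[m] >= len(men_prefs[m]):
            continue                     # list exhausted: m stays single
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if m not in rank_w[w]:
            free.append(m)               # w prefers staying single to m
        elif w not in engaged:
            engaged[w] = m               # w holds her first acceptable offer
        elif rank_w[w][m] < rank_w[w][engaged[w]]:
            free.append(engaged[w])      # w trades up and releases her previous offer
            engaged[w] = m
        else:
            free.append(m)               # w rejects m; he proposes again later
    return engaged

def mean_rank_gap(men_prefs, match):
    """Average number of ranks between each matched man's partner and his first choice."""
    gaps = [men_prefs[m].index(w) for w, m in match.items()]
    return sum(gaps) / len(gaps)

# Toy market with identical preferences: everyone ranks the other side 0 > 1 > ... > n-1.
n = 5
men_prefs = [list(range(n)) for _ in range(n)]
women_prefs = [list(range(n)) for _ in range(n)]
match = deferred_acceptance(men_prefs, women_prefs)
gap = mean_rank_gap(men_prefs, match)
print(match, gap)
```

Swapping the roles of the two sides gives the woman-optimal stable matching. Heterogeneous preference lists, for example lists built from estimated utilities plus simulated taste shocks, would typically shrink the average gap relative to this identical-preferences case.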
Per our assumption, these error terms are utility components that are observed by the site users but unobserved to us. Hence, in the event that m matches with w, $E(\epsilon_{mw} \mid m \text{ matches with } w) \geq E(\epsilon_{mw})$, and vice versa with m and w exchanged.18 This means that conditional on a match, the matched partners received a relatively high utility draw. Our procedure thus tends to under-predict the rank of a partner in a match, and thus our comparison of predicted and actual matches tends to over-predict the efficiency gains of the Gale-Shapley protocol. In order to understand the magnitude of this bias, we conduct the following simulation exercise. Given our sample of site users, we construct their preferences as before, including the $\epsilon$ draws. We then compute a match according to the Gale-Shapley algorithm, and treat the predicted matches as “data.” We then re-draw the $\epsilon$ terms, and compute a new Gale-Shapley match. The corresponding difference in ranks allows us to explore the magnitude of the bias, for if $\epsilon$ had no influence on the match outcomes, the predicted matches would be identical. Table 6 shows that the predicted “improvement” in achieved ranks using the Gale-Shapley protocol versus the simulated data is on the same order of magnitude as the predicted difference of Gale-Shapley and observed (keyword) matches. Hence, the bias of our comparison procedure appears to account for the previous finding that the Gale-Shapley protocol leads, on average, to a match with a more preferred partner. We thus conclude that the decentralized search protocol utilized by the dating site does not result in significant efficiency losses compared to what could have been achieved using the Gale-Shapley procedure. Note that an intuitive rationalization for this finding can be proposed based on Adachi’s asymptotic equivalence result discussed in Section 3.1: it appears that search frictions on this particular online dating site are small enough to allow decentralized search to approximate stable matches obtained from a centralized procedure like Gale-Shapley. While we believe that a characterization of how large search frictions have to be to lead to significant departures from efficiency may be interesting for the purpose of designing online dating sites,19 we leave this question for future research.

18 This result is a straightforward consequence of the asymptotic equivalence between the matching resulting from the Adachi and Gale-Shapley models. In the Adachi model, a match only arises if the threshold rule (1) is satisfied. Therefore, $\epsilon_{mw}$ will necessarily be left-truncated conditional on a match.

6 Predicting Sorting in Marriages

As noted in the introduction, a large empirical literature in sociology, psychology, and economics documents strong assortative mating (homogamy) patterns in the demographic, physical, and socioeconomic characteristics of married couples. In the previous section, we saw that the Gale-Shapley algorithm, based on mate preference estimates, predicts the matching patterns among the users of our online dating site well. We now explore if the same approach can also predict some aspects of the sorting patterns observed in marriages. Comparing the predictions of the Gale-Shapley model with actual marriages is particularly interesting because Gale-Shapley, as the limit of the Adachi model, describes a market without search costs, while such search frictions may play an important role in the formation of marriages. An immediate objection to this approach is that we estimate mate preferences for “dating,” which may not be appropriate to predict patterns of sorting in a marriage market. Previous studies, however, have documented that matching along observable attributes is similar at various stages of a relationship. Laumann et al.
(1994), using data from the National Health and Social Life Survey, show that the differences in sorting along age, education, and ethnicity between marriages, cohabitations, and short-term relationships are small.20 Here, following Blackwell and Lichter (2004), we document sorting patterns among dating, cohabiting, and married couples using data from the National Survey of Family Growth (NSFG), Cycle 6, which contains information on the relationship history of a sample of 7,643 women.21 Table 7 compares the correlations in age and education, and the inter-ethnicity matching patterns observed in the NSFG sample. The matching patterns among the three types of relationships largely agree; the age and education correlations among “dating” couples are even somewhat larger than the corresponding correlations in marriages.22

19 Along with cognitive costs imposed by search interfaces and site layout, some online dating sites introduce explicit search costs by charging a browsing fee per profile (e.g. RightStuffDating.com).
20 See Table 6.4 in Laumann et al. (1994). While married couples are slightly more similar in their ethnic background, couples in a short-term relationship are slightly more similar along age and education.
21 The survey was conducted between 2002 and 2003, and the women sampled were 15-44 years old. We classify women as “dating” if they were never married, do not live with a partner, and, regarding their partner, indicate that they are “going with him or going steady,” “going out with him once in a while,” are “just friends,” or “had just met him.” We classify women as “cohabitating” if they were never married and live together with a partner of the opposite sex.

6.1 Sorting Patterns in Marriages

Table 8, column (I), reports attribute correlations among married couples. To construct this table we utilized the 2000 Census IPUMS 5% sample for the two metropolitan areas covered by our online dating data.
In order to be able to compare the results to the predicted matches based on users in our online dating sample, we only use information on men and women in the 18-55 age range. Married couples exhibit a strong degree of sorting in age (ρ = 0.85) and years of education (ρ = 0.60). There is less sorting along income (ρ = 0.13), which increases somewhat (ρ = 0.17) when we consider the correlation in the imputed income which accounts for the potential earnings of spouses who do not participate in the labor force.23 Regarding the empirical correlations in looks, height, and weight, we consulted several empirical studies in sociology and psychology, which report high degrees of correlation among these characteristics as well (height has a ρ between 0.31 and 0.63, weight between 0.08 and 0.32, and looks—measured in a similar manner as in our study—between 0.34 and 0.54). Note that the studies reporting correlations in physical attributes typically use much smaller and more selective samples than the Census, and may not reflect the correlations in the metropolitan areas we are considering. Assortative mating is observed along ethnic-racial lines as well. In a recent survey of intermarriage patterns, Kalmijn (1998) reports that in the U.S. “virtually all ethnic subgroups marry within their group more often than can be expected under random mating.” We confirm this finding of strong endogamy (marrying within a group) for the two metropolitan areas covered by our sample using the 2000 Census IPUMS 5% sample. Table 9, panel (I) illustrates the matching patterns in the IPUMS data. For white, black, Hispanic, and Asian men and women, the table displays the percent matching with a mate of another ethnicity. Table 10 illustrates the same matching patterns using the odds-ratios of matches. For example, the first column in the top panel shows that the ratio of the fraction of white men married to white women to the fraction of all white women in the population is 1.3. 
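The ratio just described can be sketched as follows. This is an illustration only: the function name and the toy list of couples are hypothetical, and the "population" share is taken over the wives in the supplied list, which is an assumption about how the denominator is measured.

```python
def same_group_ratio(marriages, group):
    """Ratio of two shares for husbands belonging to `group`.

    marriages: list of (husband_group, wife_group) pairs.
    Numerator:   share of `group` husbands whose wife is also in `group`.
    Denominator: share of `group` wives among all wives in the list.
    """
    wives_of_group_men = [w for h, w in marriages if h == group]
    share_within = sum(w == group for w in wives_of_group_men) / len(wives_of_group_men)
    share_population = sum(w == group for _, w in marriages) / len(marriages)
    return share_within / share_population

# Hypothetical sample of eight couples drawn from two groups, "A" and "B".
sorted_couples = [("A", "A")] * 3 + [("A", "B")] + [("B", "A")] + [("B", "B")] * 3
mixed_couples = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
print(same_group_ratio(sorted_couples, "A"), same_group_ratio(mixed_couples, "A"))
```

In the second toy sample, husbands from group "A" marry within their group exactly as often as the share of "A" wives would imply, so the ratio equals 1; in the first sample it exceeds 1, the pattern reported for every group in Table 10.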
Random matching would predict this ratio to be 1; thus, we conclude that white men are more likely to marry white women compared to random matching. More drastically, we see that the same-race matching odds ratio for black men is 10.7, which means that black men are 10.7 times more likely to marry a black woman than they would under random matching. As can be seen from the diagonal entries, strong endogamy patterns are also present for Hispanic and Asian men and women.

22 Note that the correlations from the NSFG (Table 7) cannot be directly compared with the sorting patterns obtained from the IPUMS data (Tables 8 and 9), as the two samples cover couples within different age ranges and geographies.
23 We impute income using a regression that relates the logarithm of a person’s income to years of education, years in the work force (and the square of this variable), occupation dummies, and ethnicity dummies. We ran the regressions separately for men and women.

6.2 Gale-Shapley Predictions

To predict the equilibrium matches based on our preference estimates, we first re-weight our sample of dating site users to match the joint distribution of age, education, and ethnicity in the IPUMS data, separately for men and women in each market. We only consider users in the 18-55 age range, who constitute more than 95% of all dating site members. Then, as before in Section 5, we calculate the equilibrium matches among men and women using the deferred-acceptance algorithm, based on the estimated mate preferences. Table 8, column (II) displays the attribute correlations in the predicted marriage matches. Overall, as in Section 5, we predict sorting along many attributes. The predicted correlations of age, income, and height are similar to the correlations observed in marriages. The predicted correlations in looks, BMI, and in particular education, however, are smaller than the observed ones. Tables 9 and 10, panel (II), report the predicted interracial sorting patterns.
The results reveal strong endogamy patterns, although the observed marriages exhibit even stronger propensities for matches among ethnically homogeneous partners. An important issue that we face when using the Gale-Shapley model to simulate “offline” marriage patterns is how to treat the error terms in the fixed-effect logit model on which our preference estimates are based. The standard interpretation regards the error term as a utility component that is observed by the site users, but unobserved to us, the analysts. An alternative interpretation of the unobservable utility term is that it represents noise in the users’ behavior. That is, the site users, while browsing through the listings and deciding whom to contact by e-mail, make occasional mistakes in their selections, mistakes that they would not have made had they met the same potential partner offline. Under the latter interpretation, we should not incorporate the error terms in the utility functions on which the predictions of the Gale-Shapley simulations are based, because the errors that users make when contacting a potential mate will eventually be corrected before a final match is achieved. Table 8, column (III), displays the predicted attribute correlations when we disregard the logit error term. Most of these correlations are larger than the predicted correlations if the error term is included in the utility function. Moreover, some of these correlations are now quite close to the observed attribute correlations in marriages. In particular, the predicted age correlation is only marginally larger than the observed correlation in the IPUMS sample, while the predicted correlation in education is only marginally smaller. We over-predict the income and imputed income correlations, however. The height and BMI correlations are slightly lower than the correlations suggested by the literature, while the correlation in looks is larger than the previously reported numbers.
The differences in the correlations of physical characteristics should be viewed with some caution, however, as the samples in the previous literature are smaller and unlikely to be representative of the population we consider.

Tables 9 and 10, panel (III), report the predicted interracial marriage patterns when we disregard the error term. These results are similar to the matching patterns observed among married couples and predict strong degrees of endogamy, although we slightly over-predict endogamy for blacks, Hispanics and white women, and under-predict endogamy for white men. Specifically, 100% of black men and 94.2% of black women are predicted to marry a black partner, whereas the respective numbers are 72.2% of black men and 87.5% of black women in the 2000 IPUMS data. Similarly, our simulations suggest that 100% of Hispanic men and 84.2% of Hispanic women match with a Hispanic partner, compared with 81.4% of men and 75.8% of women in the Census data. 88.0% of white men and 98.0% of white women are predicted to marry a white partner, compared with 92.7% and 94.6% in the IPUMS data. Only for Asian men and women do we systematically under-predict endogamy.

Discussion

There are two main results from the Gale-Shapley predictions using the “IPUMS-matched” sample of online dating site users. First, we demonstrated that the preference estimates of Section 4 can lead to sorting patterns that are quantitatively similar to sorting patterns in marriages. This suggests that preferences alone can explain the homogamy/endogamy patterns observed in marriages to a significant degree. That is, search frictions and/or segregation cannot be the only cause of these patterns, since the Gale-Shapley model predicts matching outcomes in a frictionless matching environment in which all members of the opposite sex are within one’s choice set.
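The frictionless benchmark referred to here is the deferred-acceptance procedure of Gale and Shapley (1962). The following Python sketch illustrates the mechanics under simplifying assumptions: preferences are given as strict ordered lists of acceptable partners, and the names `deferred_acceptance`, `men_prefs`, and `women_prefs` are hypothetical and not taken from the paper. It is an illustration, not the code used to produce the reported predictions.

```python
from typing import Dict, List


def deferred_acceptance(proposer_prefs: Dict[str, List[str]],
                        receiver_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Proposer-optimal stable matching, returned as {proposer: receiver}.

    Each preference list holds only acceptable partners, best first.
    Agents who exhaust their list stay unmatched.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list position to try
    held: Dict[str, str] = {}                     # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue                              # p has no one left to propose to
        r = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank[r]:
            free.append(p)                        # r finds p unacceptable
        elif r not in held:
            held[r] = p                           # r tentatively accepts
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])                  # r trades up and releases the old proposer
            held[r] = p
        else:
            free.append(p)                        # r rejects p
    return {p: r for r, p in held.items()}


# Toy market (hypothetical) in which the two sides disagree about the best matching.
men_prefs = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
women_prefs = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}

men_optimal = deferred_acceptance(men_prefs, women_prefs)
women_optimal = deferred_acceptance(women_prefs, men_prefs)
```

Swapping the roles of the two sides yields the other extreme stable matching, which is how a man-optimal and a woman-optimal outcome can both be computed from the same preference lists.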
A caveat to our exercise and conclusions is that the mate preferences of the online dating sample can differ from a representative offline sample even after controlling for observables. Even though we cannot ultimately rule out this possibility without data on offline preferences, we can still offer the most conservative interpretation of our results: strong homogamy patterns can result as an equilibrium matching outcome in the absence of search frictions using the estimated preferences of a sample of online daters.

Second, we find that the error term in our utility specification (which, we believe, is quite rich, especially in that it includes numerous observable factors) plays an important role in the simulated matching patterns. Most of the correlations (except for the height correlation) diminish in size when we include the error term as a structural, unobserved utility component in the simulations. The decline in correlations due to the inclusion of the error term is especially noticeable for age, education, income, looks, and ethnicity. Indeed, the results suggest that we can regard the sorting predictions from the simulations without and with errors as providing upper and lower bounds, respectively, on the correlations achievable by the preferences we have estimated. Many of the “offline” correlations we observe fall within these bounds; those that do not, such as offline height, weight, income and education correlations, are quite close to one of the bounds. As we pointed out above, it is difficult to argue, based on available evidence, which interpretation of the error term is more appropriate. However, the two interpretations do differ with regard to their predictions as to how we might expect the nature of matches to evolve as online dating technology becomes more prevalent.
If we maintain the assumption that people make mistakes or are less selective when choosing their mates online (see footnote 24), we may expect strong sorting patterns along age, education, income, looks, and ethnicity to persist even if online dating becomes the norm, since people who use such sites to find a marriage partner may avoid making such errors and match in these dimensions similar to the way that “offline” couples do. However, if the error terms indeed reflect a utility component unobservable to the econometrician, we might expect widespread diffusion of online technology to lead to lower correlations in dimensions such as age, education, income, looks, and ethnicity. Indeed, many traditional settings where people commonly meet their mates (school, work, church, social clubs, bars/nightclubs) facilitate matching along such observable attributes. Being limited to finding one’s partner through these settings may preclude a wider search over potential partners who might compensate for their lack of observable qualities through other, more difficult to observe traits (such as a particular taste for movies or music), some of which may be easier to convey on a web page than in an offline environment. For example, in a noisy bar, physical attractiveness is probably the main attribute along which matching takes place. An online dating profile, on the other hand, can contain much information regarding a person’s personality that would have been unobservable in a bar. If this is indeed the case, the diffusion of online dating technology may lead to less sorting along easily observable attributes (such as age, race, education, income and looks), and more sorting along attributes that are more difficult to measure. Of course, it could be the case that the users of the dating site have a stronger preference for hard-to-observe attributes than the general population, which is precisely the reason why they try to find a partner online.
A further examination of our prediction is left for future research on the impact of the online dating technology on marriage markets.

24 We mean “less selective” in terms of not paying attention to details, not in terms of having a lower threshold for staying single.

6.3 How Do Different Preference Components Contribute to Sorting?

Our approach, which is based on an explicit model of preferences and a theoretical model to predict match outcomes, enables us to answer questions regarding how different aspects of men’s and women’s mate preferences affect the equilibrium matching patterns. We now present the results from two such exercises, in which we turn on or shut off different preference components, and use the Gale-Shapley algorithm to evaluate the contribution of these components to equilibrium matching patterns.

Vertical vs. Horizontal Preferences

As we already indicated in the introduction, sorting can be driven either by preferences over the absolute value of a partner’s trait, or by preferences for a partner with similar traits. For example, consider a market where people are only distinguished by their beauty. Suppose everyone prefers a more beautiful mate to a less beautiful one. In a stable match, the most beautiful woman marries the best looking man, the second most beautiful woman marries the second most beautiful man, and so forth. Thus, there will be perfect correlation along beauty in this matching. Alternatively, assume that everyone tries to find a partner who is exactly as beautiful as themselves. Perfect correlation along beauty will arise under this preference structure as well. We call the first example an instance of “vertical” preferences, and the second an instance of “horizontal” preferences. The specification of our utility function in Section 4 accounts for both.
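The beauty example can be checked numerically. The sketch below is a toy illustration under stated assumptions: five men and five women share the same hypothetical beauty scores, utilities are either the partner's score (vertical) or minus the absolute difference in scores (horizontal), and `stable_match` is an illustrative men-proposing deferred-acceptance routine, not the authors' implementation.

```python
from math import sqrt


def stable_match(u_men, u_women):
    """Men-proposing deferred acceptance on square utility matrices.

    u_men[i][j]   = utility man i derives from woman j
    u_women[j][i] = utility woman j derives from man i
    Returns partner, where partner[i] is the index of man i's match.
    """
    n = len(u_men)
    # Each man's list of women, sorted from most to least preferred.
    prefs = [sorted(range(n), key=lambda j: -u_men[i][j]) for i in range(n)]
    nxt = [0] * n          # next position in each man's list
    held = [None] * n      # held[j] = man currently held by woman j
    free = list(range(n))
    while free:
        i = free.pop()
        j = prefs[i][nxt[i]]
        nxt[i] += 1
        k = held[j]
        if k is None:
            held[j] = i
        elif u_women[j][i] > u_women[j][k]:
            held[j] = i
            free.append(k)  # the displaced man proposes again later
        else:
            free.append(i)
    partner = [None] * n
    for j, i in enumerate(held):
        partner[i] = j
    return partner


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


beauty = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical scores, same grid for both sexes
n = len(beauty)

# Vertical: everyone prefers a more beautiful partner.
vertical = [[beauty[j] for j in range(n)] for _ in range(n)]
# Horizontal: everyone prefers a partner whose beauty is close to their own.
horizontal = [[-abs(beauty[i] - beauty[j]) for j in range(n)] for i in range(n)]

match_v = stable_match(vertical, vertical)
match_h = stable_match(horizontal, horizontal)
corr_v = pearson(beauty, [beauty[j] for j in match_v])
corr_h = pearson(beauty, [beauty[j] for j in match_h])
```

Both preference structures produce the same assortative matching in this toy market, which is why observed sorting alone cannot distinguish them and the two components have to be separated in the utility specification.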
In order to assess the importance of vertical and horizontal preference components, we set the “distance terms” in the utility specification (4) equal to zero, and then simulate the stable matches under this alternative preference specification. Table 8, column (IV), shows that all attribute correlations become small in size. These results indicate that, according to our preference estimates, sorting is largely driven by preference heterogeneity, that is, by the horizontal components captured in the distance terms.

Matching in a “Color-Blind” World

How do ethnicity preferences affect marriage patterns? As shown before, both actual marriages and the predicted matches from the Gale-Shapley model exhibit strong intra-ethnicity sorting patterns. In Section 4, we also showed that the dating site users show strong same-race preferences. However, it is not obvious whether same-race preferences alone cause sorting within ethnicity groups: because other attributes, such as income and education, are correlated with race, preferences over these other attributes could lead to intra-ethnicity sorting as well. In the counter-factual exercise below we assume that men and women do not have preferences for a mate of a specific ethnicity. We also assume that the other preference components, such as preferences for looks, income, and education, stay the same. Tables 9 and 10, panel (IV), show that racial sorting almost completely disappears when we remove the race preferences from the utility specification. The matching frequencies are very close to what we would expect from random matching, except for Hispanic men and women, who continue to display some degree of endogamy. Table 8, column (V), shows that the correlations in non-race attributes remain almost identical to those obtained in Column (IV), when race preferences were part of the simulation.
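The comparison with random matching can be made concrete with a small calculation. The sketch below assumes one plausible reading of the benchmark, namely the share of same-group matches among men of a group divided by the share that would arise if partners were assigned at random; the exact statistic used in Tables 9 and 10 is defined elsewhere in the paper and may differ. The function name `endogamy_ratios` and the counts are hypothetical.

```python
def endogamy_ratios(counts):
    """Same-group matching relative to random matching, by husband's group.

    counts[g][h] = number of couples with a husband from group g and a wife
    from group h. A value of 1 means matching is as frequent as under
    random assignment; values above 1 indicate endogamy.
    """
    total = sum(sum(row.values()) for row in counts.values())
    ratios = {}
    for g, row in counts.items():
        husbands_g = sum(row.values())
        wives_g = sum(counts[h].get(g, 0) for h in counts)
        observed = row.get(g, 0) / husbands_g   # P(wife in g | husband in g)
        expected = wives_g / total              # P(wife in g) under random matching
        ratios[g] = observed / expected
    return ratios


# Hypothetical couple counts with strong within-group matching.
sorted_market = {"A": {"A": 80, "B": 20}, "B": {"A": 10, "B": 40}}
# Hypothetical counts that are exactly proportional to the group sizes.
random_market = {"A": {"A": 60, "B": 40}, "B": {"A": 30, "B": 20}}

sorted_ratios = endogamy_ratios(sorted_market)
random_ratios = endogamy_ratios(random_market)
```

In the second table every ratio equals 1, the pattern that the counter-factual without race preferences approximately reproduces.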
These results suggest that racial sorting is almost entirely due to the same-race preferences; preferences for non-race attributes cannot quantitatively account for the small fraction of marriages across different ethnic groups.

7 Conclusions

Based on revealed mate preferences, we demonstrated that the classic Gale-Shapley algorithm can rationalize the observed online matching patterns as the outcome of a frictionless and efficient matching market. Furthermore, the observed and predicted matches show that men and women sort along attributes such as age, education, and ethnicity. These sorting patterns resemble the sorting patterns in actual marriages. We extrapolated the efficient market benchmark provided by the Gale-Shapley model to assess how much of the sorting patterns in marriages can be attributed to preferences alone. We first re-weighted our sample of online daters to match the population at large along observable attributes. Based on our preference estimates, we then predicted matches (“marriages”) for the men and women in our sample using the Gale-Shapley algorithm. Quantitatively, the sorting patterns predicted by this exercise are not distant from actual sorting patterns. In particular, our results suggest that observed intra-ethnicity sorting patterns can arise as a result of preferences and the market mechanism alone, without resort to search frictions and/or segregation. On the other hand, our results also suggest that preferences operating in a market without search frictions may not be enough to generate some of the strong correlations observed offline: the very high degree of sorting along education in marriages, in particular, may indeed be the result of institutions by which men and women are mostly exposed to potential partners with a similar education level.
Based on the results obtained in this study, we conclude that online dating data provide valuable insights towards understanding the economic mechanisms underlying match formation and the formation of marriages. Many important issues are left for future research. For example, an obvious drawback of our analysis is that we cannot observe whether an online meeting finally resulted in a marriage, which is one outcome that we are interested in. This issue is addressed by the recent working paper of Lee (2008), who utilizes a data set that tracks participants across a sequence of dates, some of which result in marriages. Lee (2008) builds on the identification framework introduced in this paper, and augments our revealed preference specification to account for learning about unobservable match components. Another methodological drawback of our analysis concerns the issue of strategic behavior. A more structural estimation approach, such as in Choo and Siow (2006), Wong (2003), and Flinn and del Boca (2007), could potentially address this caveat to our estimation approach.

References

[1] Abdülkadiroglu, A. and T. Sönmez (2003): “School Choice: A Mechanism Design Approach,” American Economic Review, 93 (3), 729-747.
[2] Adachi, H. (2003): “A search model of two-sided matching under non-transferable utility,” Journal of Economic Theory, vol. 113, 182-198.
[3] Belot, M. and M. Francesconi (2006): “Can Anyone Be ‘The’ One? Evidence on Mate Selection from Speed Dating,” working paper.
[4] Biddle, J. and D. Hamermesh (1998): “Beauty, Productivity, and Discrimination: Lawyers’ Looks and Lucre,” Journal of Labor Economics, vol. 16 (1), 172-201.
[5] Blackwell, D. L. and D. T. Lichter (2004): “Homogamy Among Dating, Cohabiting, and Married Couples,” Sociological Quarterly, 45 (4), 719-737.
[6] Bozon, M. and F. Heran (1989): “Finding a Spouse: A Survey of How French Couples Meet,” Population, 44, 91-121.
[7] Browning, M., P.-A. Chiappori, and Y.
Weiss (2007): The Economics of the Family. Manuscript: http://www.tau.ac.il/~weiss/fam_econ/
[8] Buss, D. (2003): The Evolution of Desire: Strategies of Human Mating. Basic Books.
[9] Chamberlain, G. (1980): “Analysis of Covariance with Qualitative Data,” Review of Economic Studies, 47, 225-238.
[10] Choo, E. and A. Siow (2006): “Who Marries Whom and Why,” Journal of Political Economy, 114 (1), 175-201.
[11] Etcoff, N. (1999): Survival Of The Prettiest. Doubleday.
[12] Fisman, R., S. S. Iyengar, E. Kamenica, and I. Simonson (2006): “Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment,” Quarterly Journal of Economics, 121 (2), 673-697.
[13] Fisman, R., S. S. Iyengar, E. Kamenica, and I. Simonson (2008): “Racial Preferences in Dating,” Review of Economic Studies, 75 (1), 117-132.
[14] Flinn, C. and D. del Boca (2007): “Household Time Allocation and Models of Behavior: A Theory of Sorts,” working paper.
[15] Frechette, G., U. Ünver, and A. Roth (2007): “Unraveling Yields Inefficient Matchings: Evidence from Post-Season College Football Bowls,” forthcoming, RAND Journal of Economics.
[16] Gale, D. and L. S. Shapley (1962): “College Admissions and the Stability of Marriage,” American Mathematical Monthly, 69, 9-14.
[17] Hamermesh, D. and J. Biddle (1994): “Beauty and the Labor Market,” American Economic Review, 84 (5), 1174-1194.
[18] Hinsz, V. B. (1989): “Facial Resemblance in Engaged and Married Couples,” Journal of Social and Personal Relationships, 6, 223-229.
[19] Hitsch, G. J., A. Hortaçsu, and D. Ariely (2008): “What Makes You Click? — Mate Preferences in Online Dating,” manuscript, University of Chicago.
[20] Kalmijn, M. (1998): “Intermarriage and Homogamy: Causes, Patterns, Trends,” Annual Review of Sociology, 24, 395-421.
[21] Lee, S. (2007): “Preferences and Choice Constraints in Marital Sorting: Evidence From Korea,” manuscript, Stanford University.
[22] Langlois, J. H., Kalakanis, L., Rubenstein, A.
J., Larson, A., Hallam, M., and Smoot, M. (2000): “Maxims or Myths of Beauty? A Meta-Analytic and Theoretical Review,” Psychological Bulletin, 126, 390-423.
[23] Laumann, E. O., J. H. Gagnon, R. T. Michael, and S. Michaels (1994): The Social Organization of Sexuality. University of Chicago Press.
[24] Niederle, M. and A. E. Roth (2003): “Unraveling reduces mobility in a labor market: Gastroenterology with and without a centralized match,” Journal of Political Economy, 111 (6), 1342-1352.
[25] Roth, A. (1984): “The Evolution of the Labor Market For Medical Interns and Residents: A Case Study in Game Theory,” Journal of Political Economy, 92, 991-1016.
[26] Roth, A. and M. Sotomayor (1990): Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Cambridge: Cambridge University Press.
[27] Roth, A., T. Sönmez and U. Ünver (2004): “Kidney Exchange,” Quarterly Journal of Economics, 119, 457-488.
[28] Spuhler, J. N. (1968): “Assortative Mating with Respect to Physical Characteristics,” Social Biology, 15 (2), 128-140.
[29] Stevens, G., D. Owens, and E. C. Schaefer (1990): “Education and Attractiveness in Marriage Choices,” Social Psychology Quarterly, 53 (1), 62-70.
[30] Wong, L. Y. (2003): “Structural Estimation of Marriage Models,” Journal of Labor Economics, 21 (3), 699-727.

Appendix: Data Description

Of the roughly 22,000 users of the online dating site for which we have data, 10,721 users were located in the Boston area, and 11,024 users were located in San Diego. We observe the users’ activities over a period of three and a half months in 2003.

Motivation for using the dating service

The registration survey asks users why they are joining the site. It is important to know the users’ motivation when we estimate mate preferences, because we need to be clear whether these preferences are with regard to a relationship that might end in a marriage, or whether the users only seek a partner for casual sex.
The majority of all users are “hoping to start a long term relationship” (36% of men and 39% of women), or are “just looking/curious” (26% of men and 27% of women). Perhaps not surprisingly, an explicitly stated goal of finding a partner for casual sex (“Seeking an occasional lover/casual relationship”) is more common among men (14%) than among women (4%). More important than the number of users is the share of activities accounted for by users who joined the dating service for each reason. Users who seek a long-term relationship account for more than half of all observed activities. For example, men who are looking for a long-term relationship account for 55% of all e-mails sent by men; among women looking for a long-term relationship the percentage is 52%. The corresponding numbers for e-mails sent by users who are “just looking/curious” are 22% for men and 21% for women. Only a small percentage of activities is accounted for by members seeking a casual relationship (3.6% for men and 2.8% for women). We conclude that at least half of all observed activities are accounted for by people who have a stated preference for a long-term relationship and thus possibly for an eventual marriage. Moreover, it is likely that many of the users who state that they are “just looking/curious” chose this answer because it sounds less committal than “hoping to start a long-term relationship.” Under this assumption, about 75% of the observed activities are by users who joined the site to find a long-term partner (see footnote 25).

Demographic and socioeconomic attributes

To understand the representativeness of our sample compared to the general population, we contrast the reported characteristics of the site users with samples of the population from the CPS Community Survey Profile (Table 1). In particular, we contrast the site users with two sub-samples of the CPS.
The first sub-sample is a representative sample of the Boston and San Diego MSAs (Metropolitan Statistical Areas), and reflects information current to 2003. The second CPS sub-sample conditions on being an Internet user, as reported in the CPS Computer and Internet Use Supplement, which was administered in 2001.

25 The registration also asks users about their sexual preferences. Our analysis focuses on the preferences and match formation among men and women in heterosexual relationships; therefore, we retain only the heterosexual users in our sample.

A visible difference between the dating site and the population at large is the overrepresentation of men among the site users (men represent 54.7% of all users in Boston and 56.1% in San Diego). Another visible difference is in the age profiles: site users are more concentrated in the 26-35 year range than both CPS samples (the median user on the site is in the 26-35 age range, whereas the median person in both CPS samples is in the 36-45 age range). People above 56 years are underrepresented on the site compared to the general CPS sample; however, when we condition on Internet use, this difference in older users becomes smaller. While the majority of the general population is married, most users of the dating service are single (they have never been married or are divorced, separated, or widowed). About 7% of men and 2% of women state that they are married, but these users account for less than 1% of all activities (e-mails sent). This suggests that a small number of people in a long-term relationship may be using the site as a “search outlet.” Of course, the true percentage of otherwise committed people may be higher than reported. The education profile of the site users shows that they are on average more educated than the general CPS population.
However, the education profile is more similar to that of the Internet-using population, with only a slightly higher percentage of graduate and professional degree holders. The income profile reflects a pattern that is similar to the education profile. Site users have generally higher incomes than the overall CPS population, but not compared to the Internet-using population. The profile of ethnicities represented among the site users roughly reflects the profile in the corresponding geographic areas, especially when conditioning on Internet use, although Hispanics and Asians are somewhat underrepresented on the San Diego site and whites are overrepresented (see footnote 26). These comparisons show that the online dating site attracts users who are typically single, somewhat younger, more educated, and have a higher income than the general population. Once we condition on household Internet use, however, the remaining differences are not large. Hence, along demographic and socioeconomic attributes, our sample appears to be fairly representative of the population at large.

26 We should note that we had difficulty in reconciling the “other” category in the site’s ethnic classification with the CPS classification and that some of the discrepancy may be driven by this.

Reported physical characteristics of the users

Our data set contains detailed (although self-reported) information regarding the physical attributes of the users. 27.5% post one or more photos online. For the rest of the users, the survey is the primary source of information about their appearance. The survey asks the users to rate their looks on a subjective scale.
19% of men and 24% of women report having “very good looks,” while 49% of men and 48% of women report “above average looks.” Only a minority (29% of men and 26% of women) declare that they are “looking like anyone else walking down the street.” That leaves less than 1% of users with “less than average looks,” and a few members who avoid the question and joke that a date should “bring your bag in case mine tears.” Posting a photo online is a choice, and hence one might suspect that those users who post a photo are on average better looking. On the other hand, those users who do not post a photo might misrepresent their looks and give an inflated assessment of themselves. The data suggest that the former effect is more important. Among those users who have a photo online, the fraction of “above average” or “very good looking” members is about 7% larger compared to all site users. The registration survey contains information on the users’ height and weight. We expect that these are among the main attributes that the users might not truthfully report. To investigate such possible misrepresentation, we compared the reported characteristics with information on the whole U.S. population, obtained from the National Health and Nutrition Examination Survey Anthropometric Tables (the data are from the 1988-1994 survey and only cover Caucasians). Table 2 reports this comparison. Among women, we find that the average stated weight is less than the average weight in the U.S. population. The discrepancy is about 6 lbs among 20-29 year old women, 18 lbs among 30-39 year old women, and 20 lbs among 40-49 year old women. On the other hand, the reported weights of men are slightly higher than the national averages. Thus, the data only provide evidence for small levels of misrepresentation. Table 2 also shows that the stated height of both men and women is somewhat above the U.S. average. This difference is more pronounced among men, although the numbers are small in size.
For example, among 20-29 year olds, the difference is 1.3 inches for men and 1 inch for women (see footnote 27).

Measured Physical Characteristics of the Users

26% of men (3174 users) and 29% of women (2811 users) post one or more photos online. To construct an attractiveness rating for these available photos, we recruited 100 subjects from the University of Chicago GSB Decision Research Lab mailing list. The subjects were University of Chicago undergraduate and graduate students in the 18-25 age group, with an equal fraction of male and female recruits. Each subject was paid $10 to rate, on a scale of 1 to 10, 400 male faces and 400 female faces displayed on a computer screen. Each picture was used approximately 12 times across subjects. We randomized the ordering of the pictures across subjects to minimize bias due to boredom or fatigue. Consistent with findings in a large literature in cognitive psychology, attractiveness ratings by independent observers appear to be positively correlated (for surveys of this literature, see Langlois et al. 2000, Etcoff 1999, and Buss 2003). Cronbach’s alpha across 12 ratings per photo was calculated to be 0.80 and satisfies the reliability criterion (0.80) utilized in several studies that employed similar rating schemes (see footnote 28). To eliminate rater-specific mean and variance differences in rating choices, we followed Biddle and Hamermesh (1998) and standardized each photo rating by subtracting the mean rating given by the subject, and dividing by the standard deviation of the subject’s ratings. We then averaged these standardized ratings across the subjects to obtain a single rating for each photo.

27 The weight and height differences translate into body mass indices (BMI) that are 2 to 4 points less than national averages among women, and about 1 point less than national averages among men.

28 Biddle and Hamermesh (1998) report a Cronbach alpha of 0.75.
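The rating construction described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names and toy scores are hypothetical, the sample standard deviation is used because the text does not say which variant was applied, and the reliability function assumes a complete photo-by-rating matrix, whereas in the study each photo was seen by only about 12 of the 100 subjects.

```python
from statistics import mean, stdev, variance


def standardized_photo_ratings(ratings):
    """ratings[rater][photo] = raw score on the 1-10 scale.

    Each score is converted to a z-score within its rater (subtract the
    rater's mean, divide by the rater's standard deviation), and the
    z-scores are then averaged per photo.
    """
    by_photo = {}
    for scores in ratings.values():
        values = list(scores.values())
        mu, sd = mean(values), stdev(values)
        for photo, s in scores.items():
            by_photo.setdefault(photo, []).append((s - mu) / sd)
    return {photo: mean(zs) for photo, zs in by_photo.items()}


def cronbach_alpha(matrix):
    """Cronbach's alpha for matrix[p][k] = k-th rating of photo p.

    alpha = k / (k - 1) * (1 - sum of per-rating variances / variance of row totals)
    """
    k = len(matrix[0])
    item_vars = [variance([row[j] for row in matrix]) for j in range(k)]
    total_var = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: rater r2 is systematically more generous than r1 but ranks photos identically.
toy_ratings = {
    "r1": {"photo_a": 2, "photo_b": 4, "photo_c": 6},
    "r2": {"photo_a": 5, "photo_b": 7, "photo_c": 9},
}
toy_scores = standardized_photo_ratings(toy_ratings)

# Three photos rated identically by three raters give perfect reliability.
toy_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

In the toy data the level difference between the two raters disappears after standardization, which is the purpose of the within-rater adjustment.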
32 Table 1 – Dating Service Members and County Profile of General Demographic Characteristics Variable Dating Service San Diego General Population Internet User Dating Service Boston General Population Internet User General Information No. of Members/Population 11,024 2,026,020 1,180,020 10,721 2,555,874 1,581,711 Percentage of Men 56.1 49.9 49.4 54.7 49.0 50.6 Age Composition 18 to 20 years 21 to 25 years 26 to 35 years 36 to 45 years 46 to 55 years 56 to 60 years 61 to 65 years 66 to 75 years Over 76 20.3 30.7 27.0 10.0 6.6 4.4 0.8 0.1 0.2 6.0 9.5 21.3 23.0 18.5 6.3 2.9 6.9 5.7 6.4 11.5 18.8 28.6 19.0 6.5 3.6 4.8 0.8 19.0 33.1 27.2 10.1 6.2 3.6 0.5 0.1 0.2 5.8 9.3 17.2 23.1 17.6 7.3 4.3 8.8 6.8 7.2 12.0 19.7 26.8 20.1 6.9 3.7 2.9 0.7 Race Composition (1) Whites Blacks Hispanics Asian Other 72.4 4.3 11.0 5.3 7.1 61.9 4.8 19.5 13.0 0.9 71.3 4.2 9.8 13.6 1.1 82.2 4.8 4.1 3.9 5.0 84.2 7.4 4.4 3.8 0.3 89.1 4.2 2.3 4.2 0.2 65.6 6.3 4.0 1.8 22.3 31.8 57 1.2 2.3 8.1 28.5 62.0 0.7 1.5 7.4 67.2 7.2 4.8 1.4 19.4 35.3 54.1 1.1 3.6 6.0 36.8 56.7 0.3 1.0 5.2 62.2 2.6 3.7 3.5 28.1 20.2 57 3.9 6.3 12.3 23.9 62.5 1.9 2.0 9.7 65.9 2.0 4.3 3.0 24.7 28.0 49.0 2.4 13.4 7.2 32.7 55.9 0.9 3.5 7.0 1.4 9.4 12.1 23.0 3.0 17.8 1.6 10.2 9.2 30.1 3.2 20.4 31.9 5.2 5.4 23.6 7.3 7.6 6.8 27.9 28.5 4.6 14.1 15.0 Marital Status Men Never married Married & not separated Separated Widowed Divorced Women Never married Married & not separated Separated Widowed Divorced Educational Attainment Have not finished high school High school graduate Technical training (2-year degree) Some college 33 Bachelor's degree Master's degree Doctoral degree Professional degree 28.6 11.2 3.5 7.3 22.7 6.0 1.5 1.7 31.5 9.0 2.6 2.3 33.9 16.4 3.8 5.8 22.2 11.7 3.3 2.0 29.7 16.3 5.2 2.6 Income (2) Individuals with Income information 6,549 283,442 224,339 6,349 396,065 281,619 7.9 5.1 8.7 14.1 20.4 20.0 10.5 6.7 2.7 3.9 12.5 3.0 13.8 23.3 12.4 17.3 7.2 7.5 3.2 0.0 12.4 1.9 10.1 22.3 10.6 20.2 9.1 9.5 4.0 0.0 8.6 3.9 6.1 
12.5 21.9 22.8 12.0 7.1 2.0 3.0 7.6 5.0 21.4 19.9 16.5 21.7 4.8 1.9 1.1 0.0 4.6 6.0 16.2 21.4 18.5 24.6 4.5 2.7 1.6 0.0 Less than $12,000 $12,000 to $15,000 $15,001 to $25,000 $25,001 to $35,000 $35,001 to $50,000 $50,001 to $75,000 $75,001 to $100,000 $100,001 to $150,000 $150,001 to $200,000 $200,001 or more

Source. Estimates from CPS Internet and Computer Use Supplement, September 2001. All the CPS estimates are weighted. We only consider individuals who are at least 18 years old. The percentages for the column "Internet user" are from the CPS sample, restricted to individuals who report using the Internet.

Notes. The geographic information regards two MSAs. The Boston PMSA includes a New Hampshire portion. San Diego geographic information corresponds to the San Diego MSA. The site member information is from 2003.
(1) The figures for whites, blacks, Asians and “other” ethnicities for the CPS data correspond to those with non-Hispanic ethnicity.
(2) The income figures from the CPS data were adjusted to 2003 dollars.

Table 2 – Physical Characteristics of Dating Service Members vs. General Population

                     Men                            Women
Variable             Dating     General             Dating     General
                     Service    Population*         Service    Population*

Weight (lbs)
  20-29 years        175.3      172.1               136.3      141.7
  30-39 years        184.6      182.5               136.9      154.2
  40-49 years        187.9      187.3               138.4      157.4
  50-59 years        187.0      189.2               140.8      163.7
  60-69 years        188.5      182.8               147.2      155.9
  70-79 years        185.9      173.6               144.1      148.2

Height (inches)
  20-29 years        70.6       69.3                65.1       64.1
  30-39 years        70.7       69.5                65.1       64.3
  40-49 years        70.7       69.4                65.1       64.1
  50-59 years        70.6       69.2                64.7       63.7
  60-69 years        70.3       68.5                64.6       63.1
  70-79 years        69.0       67.7                63.7       62.2

BMI**
  20-29 years        24.7       25.2                22.6       24.3
  30-39 years        25.9       26.5                22.7       26.3
  40-49 years        26.4       27.3                23.0       27.0
  50-59 years        26.3       27.8                23.6       28.4
  60-69 years        26.7       27.3                24.8       27.6
  70-79 years        27.7       26.7                25.0       26.9

* General population statistics obtained from the National Health and Nutrition Examination Survey, 1988-1994 Anthropometric Reference Data Tables.
** BMI (body mass index) is calculated as weight (in kilograms) divided by height (in meters) squared.

Table 3 – Binary Logit Estimates

Age; Age Difference (+); Age Difference (-); Single, Mate: Divorced a; Divorced, Mate: Divorced; “Long Term”, Mate: “Long Term”; Has Children, Mate: Has Children; No Children, Mate: No Children; Has Photo; Looks Rating; “Very Good” Looks; “Above Average” Looks; “Other Looks”; Height; Height Difference (+); Height Difference (-); BMI; BMI2; BMI Difference (+); BMI Difference (-); Education (Years); Education Difference (+); Education Difference (-); Income ($ 1,000); Income (>50) b; Income (>100) b; Income (>200) b; Income Difference (+); Income Difference (-); Income “Only Accountant Knows”; Income “What, Me Work?”; White, Mate: Black; White, Mate: Hispanic; White, Mate: Asian; White, Mate: Other; Black, Mate: White; Black, Mate: Hispanic; Black, Mate: Asian; Black, Mate: Other; Hispanic, Mate: White; Hispanic, Mate: Black; Hispanic, Mate: Asian; Hispanic, Mate: Other; Asian, Mate: White; Asian, Mate: Black; Asian, Mate: Hispanic; Asian, Mate: Other; Same Religion

Men Estimate -0.0597 -0.0007 -0.0050 -0.0466 0.0954 0.0180 0.1872 -0.2650 -0.0638 0.5602 0.5755 0.2769 0.1715 -0.1419 -0.0019 -0.0099 -0.3965 0.0043 0.0034 -0.0101 -0.0028 -0.0039 -0.0026 0.0052 -0.0026 -0.0047 -0.0017 0.0000 0.0000 0.3309 0.2833 -0.8305 -0.2813 -0.4944 -0.0510 -0.2383 -0.2297 -0.6817 0.0052 -0.3678 -0.3810 -0.3176 -0.3656 -0.5154

SE 0.0023 0.0002 0.0001 0.0231 0.0275 0.0178 0.0271 0.0224 0.0341 0.0144 0.0397 0.0363 0.2045 0.0066 0.0036 0.0005 0.0280 0.0006 0.0008 0.0005 0.0056 0.0010 0.0008 0.0012 0.0019 0.0021 0.0034 0.0000 0.0000 0.0453 0.0542 0.0862 0.0368 0.0437 0.0226 0.3706 0.4212 0.4611 0.3894 0.1445 0.3548 0.2548 0.1714 0.3096 -0.0709 -0.0703 0.1804 0.4206 0.3607 0.0218

Women Estimate -0.0093 -0.0016 -0.0063 -0.0748 0.1712 0.2399 0.2041 -0.3642 0.1297 0.5855 0.5502 0.1719 0.0587 0.1841 -0.0096 -0.0229 0.1321 -0.0007 -0.0103 0.0022 0.0466 -0.0085 -0.0022 0.0169 -0.0067 -0.0081 0.0073 0.0000 0.0000 1.1103 0.7260 -0.7340 -0.5688 -1.5841 -0.0716 -1.5767 -1.5868

SE 0.0034 0.0002 0.0004 0.0316 0.0305 0.0258 0.0298 0.0333 0.0457 0.0211 0.0554 0.0494 0.2073 0.0093 0.0006 0.0093 0.0497 0.0010 0.0008 0.0009 0.0076 0.0012 0.0013 0.0029 0.0035 0.0016 0.0018 0.0000 0.0000 0.1284 0.1439 0.1194 0.0898 0.2408 0.0300 0.3851 0.8775 -1.2764 -0.6646 -0.8447 0.4387 0.2314 0.5075 -0.6044 -0.0260 -0.7650 -0.4890 -0.2559 0.2965 0.2640 0.4619 0.9053 0.5987 0.4870 0.0264

                   Men         Women
Log-likelihood     -72,078     -49,029
Observations       242,478     196,363
Individuals        3,004       2,783

a The user who makes the first-contact choice is single, and the potential mate is divorced.
b Income (> x) is the amount of income (in $1,000) above the income level x.

Note: The dependent variable is the 0/1 choice to contact another previously “browsed” user. The model includes fixed effects for each suitor. The estimation is based on a subsample of users who state they are single, divorced, or describe themselves as “hopeful.” Furthermore, we eliminate users who indicate that they joined the site to start a “casual” relationship and only keep users who want to “start a long-term relationship,” are “just looking/curious,” “like to make new friends,” and users who state that “a friend put me up on this.” In the full sample we observe more choices for men than for women. In order to reduce the computation time, we took a random sample of the men’s choices (i.e., we kept all men, but randomly discarded some of their observed choices).

Table 4 – User Behavior Summary Statistics

                                             Men         Women
Users                                        3,004       2,783

First contact behavior
  Profiles browsed                           385,470     172,946
  First contact e-mails                      49,223      14,178
  (% of browses)                             12.7        8.2

Matching
  First contacts that lead to match          2,130       914
  (% of first contacts)                      4.3         6.4

E-mails exchanged until match is achieved
  Mean                                       11.6        12.6
  Median                                     6           6
  SD                                         22.8        26.3

Note: The summary statistics apply to the sample of users employed in the estimation and matching sections of this paper.
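The note to Table 3 describes a binary logit for the 0/1 first-contact choice with a fixed effect for each suitor. The following is a minimal sketch of how such a model maps attributes into a contact probability. The attribute names, coefficient values, and the fixed effect are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 3.

```python
import math

def contact_probability(features, coefs, suitor_effect=0.0):
    """Binary logit probability that a suitor e-mails a browsed profile.

    features and coefs are dicts keyed by attribute name; suitor_effect is
    the suitor-specific intercept (the fixed effect).
    """
    index = suitor_effect + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical coefficients and profile attributes (NOT taken from Table 3).
coefs = {"has_photo": 0.5, "age_diff_plus": -0.05, "same_religion": 0.3}
profile = {"has_photo": 1.0, "age_diff_plus": 4.0, "same_religion": 0.0}

p = contact_probability(profile, coefs, suitor_effect=-2.0)
print(round(p, 3))
```

A more negative suitor effect corresponds to a more selective suitor: the same profile then receives a first-contact e-mail with lower probability.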
Table 5 – Attribute Correlations in Observed and Predicted Online Matches

                 Observed          Simulated Gale-Shapley Matches
                 Online Matches    From Estimated Preferences
                                   Men-Optimal           Women-Optimal
                 (I)               (II a)                (II b)
Age              0.70 [3034]       0.70 (0.01) [1785]    0.70 (0.01) [1785]
Height           0.15 [3034]       0.25 (0.02) [1785]    0.25 (0.02) [1785]
BMI              0.13 [3034]       0.19 (0.03) [1785]    0.19 (0.03) [1785]
Looks Rating     0.31 [1729]       0.16 (0.05) [481]     0.16 (0.05) [481]
Income           0.15 [670]        0.11 (0.05) [427]     0.11 (0.05) [427]
Education        0.12 [3034]       0.15 (0.02) [1785]    0.15 (0.02) [1785]

Note: The table displays Pearson correlation coefficients between mate attributes. Column (I) reports the matching patterns observed in the “keyword” matches in our online dating sample. Columns (II a) and (II b) report attribute correlations in the predicted Gale-Shapley matches, based on the logit preference estimates. To simulate the matches, we first drew the random utility terms and constructed the preference orderings for each user. We then ran the Gale-Shapley algorithm. We repeated this process 50 times to account for randomness in the preference orderings, and report the average and standard deviation of the attribute correlations across these 50 repetitions. The figures in square brackets indicate the median number of matches on which the correlations are based. For some users we do not have looks or income information; therefore, the matches involving these users are not included in the reported attribute correlation numbers.

Table 6 – Rank Differences of “Keyword” and Simulated Match Outcomes

                                    Men                         Women
                                    Mean     Median   SD        Mean     Median   SD
Observed Keyword Matches
Rank Differences
  Gale-Shapley vs. Actual Match     75.2     37       121.6     66.2     35       100.0
  First Choice vs. Actual Match     114.4    76       122.1     96.9     65       100.6
  First Choice vs. Gale-Shapley     39.2     25       45.8      30.7     19       36.1
Rank Diff. as % of Max Rank
  Gale-Shapley vs. Actual Match     5.4      2.7      8.8       4.4      2.3      6.6
  First Choice vs. Actual Match     8.2      5.5      8.8       6.5      4.3      6.7
  First Choice vs. Gale-Shapley     2.8      1.8      3.3       2.1      1.3      2.5

Simulated Data: Keyword Matches Generated Using Gale-Shapley Algorithm
Rank Differences
  Gale-Shapley vs. Actual Match     120.8    70       163.6     114.5    69       148.0
  First Choice vs. Actual Match     168.1    114      164.1     147.6    101      148.1
  First Choice vs. Gale-Shapley     47.3     29       61.2      33.1     20       40.7
Rank Diff. as % of Max Rank
  Gale-Shapley vs. Actual Match     8.7      5.0      11.8      7.6      4.6      9.8
  First Choice vs. Actual Match     12.1     8.2      11.8      9.8      6.8      9.8
  First Choice vs. Gale-Shapley     3.4      2.1      4.4       2.2      1.4      2.8

Note. The table shows summary statistics on the preference rank difference across various actual and potential match outcomes. All numbers are with respect to the site users who achieved at least one “keyword match.” If a user achieved more than one match in the data, we record the highest rank within the set of matched mates and the option of staying single. “First choice” denotes the outcome where a user is matched to his or her top-ranked partner.
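The rank comparisons in Table 6 can be illustrated with a small sketch. It assumes that a rank difference is the position of the worse outcome minus the position of the better outcome in a user's best-to-worst preference ordering, and that "max rank" is the length of that ordering. Both conventions, and all identifiers below, are assumptions made for illustration rather than definitions taken from the paper.

```python
def rank_of(ordering, option):
    """1-based position of `option` in a best-to-worst preference ordering."""
    return ordering.index(option) + 1

def rank_difference(ordering, reference, outcome):
    """Rank gap between `outcome` and `reference`, raw and as % of max rank."""
    diff = rank_of(ordering, outcome) - rank_of(ordering, reference)
    return diff, 100.0 * diff / len(ordering)

# Hypothetical ordering for one user, including the option of staying single.
ordering = ["w3", "w1", "single", "w4", "w2"]
first_choice = ordering[0]
gs_match = "w1"       # partner assigned by a Gale-Shapley run (assumed)
actual_match = "w4"   # partner observed in the data (assumed)

gs_vs_actual = rank_difference(ordering, gs_match, actual_match)
first_vs_actual = rank_difference(ordering, first_choice, actual_match)
first_vs_gs = rank_difference(ordering, first_choice, gs_match)
print(gs_vs_actual, first_vs_actual, first_vs_gs)
```

Under these conventions the first and third differences add up to the second for every user, which is consistent with the means in Table 6 (for men in the observed matches, 75.2 + 39.2 = 114.4).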
Table 7 – Sorting Patterns Among Dating, Cohabiting, and Married Couples

(a) Attribute Correlations

               Dating Couples    Cohabiting Couples    Married Couples
Age            0.81              0.76                  0.78
Education      0.56              0.57                  0.54

(b) Sorting Along Race/Ethnicity

(I) Dating Couples
               % of Women
Matched with:  White     Black     Hispanic  Asian     Population
White          83.3      2.9       21.9      11.7      51.5
Black          6.2       94.8      10.2      0.0       34.2
Hispanic       7.1       2.3       67.1      0.0       11.3
Asian          3.0       0.0       0.0       88.3      1.5
Other          0.4       0.0       0.8       0.0       1.5

(II) Cohabiting Couples
               % of Women
Matched with:  White     Black     Hispanic  Asian     Population
White          80.1      7.1       8.9       13.4      56.5
Black          6.1       88.9      5.8       7.3       16.3
Hispanic       11.9      2.8       83.4      26.5      23.8
Asian          0.4       0.0       1.1       52.8      1.7
Other          1.6       1.2       0.9       0.0       1.8

(III) Married Couples
               % of Women
Matched with:  White     Black     Hispanic  Asian     Population
White          92.4      4.2       17.0      14.6      72.2
Black          1.0       88.5      2.5       0.7       8.0
Hispanic       4.7       3.9       79.4      2.7       14.6
Asian          1.0       1.1       0.3       80.8      3.6
Other          0.8       2.4       0.8       1.2       1.5

Note: The tables are based on data from the National Survey of Family Growth (NSFG), Cycle 6, which contains information on the relationship history of a sample of 7,643 women. The survey was conducted between 2002 and 2003, and the women sampled were 15-44 years old. We classify women as “dating” if they were never married, do not live with a partner, and, regarding their partner, indicate that they are “going with him or going steady,” “going out with him once in a while,” are “just friends,” or “had just met him.” We classify women as “cohabitating” if they were never married and live together with a partner of the opposite sex.
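The attribute correlations in panel (a) are correlations across couples between the woman's and the man's value of the same attribute. A minimal sketch of that computation, assuming a plain Pearson correlation over paired observations, is given below. The ages are made-up values, not NSFG data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical couples: element i of each list belongs to the same couple.
her_age = [24, 31, 28, 40, 35]
his_age = [26, 30, 33, 41, 34]

r_age = pearson(her_age, his_age)
print(round(r_age, 2))
```

A value near 1 indicates strong positive sorting on the attribute, and a value near 0 indicates little sorting.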
Table 8 – Observed and Predicted Attribute Correlations in Marriages

                                          Gale-Shapley Predictions
                           Marriages      With Random             Random Utility          Difference              Ethnicity
                                          Utility                 Terms = 0               Terms = 0               Terms = 0
                           (I)            (II)                    (III)                   (IV)                    (V)
Age                        0.85           0.73 (0.01) [3,318]     0.89 [258]              0.09 (0.02) [3,966]     0.73 (0.01) [3,479]
Height                     0.31 - 0.63    0.28 (0.02) [3,318]     0.20 [258]              0.01 (0.01) [719]       0.27 (0.01) [3,479]
Weight (I), BMI (II-IV)    0.08 - 0.32    0.02 (0.02) [3,318]     0.06 [258]              -0.01 (0.02) [3,966]    0.03 (0.02) [3,479]
Looks Rating               0.34 - 0.54    0.14 (0.04) [593]       0.72 [21]               0.02 (0.04) [3,966]     0.14 (0.04) [614]
Income                     0.11           0.14 (0.03) [1,040]     0.42 [90]               0.03 (0.03) [1,180]     0.12 (0.03) [1,091]
Imputed Income             0.17           0.37 (0.01) [3,318]     0.55 (0.01) [258]       0.09 (0.01) [3,966]     0.32 (0.01) [3,479]
Education                  0.60           0.19 (0.02) [3,318]     0.55 [258]              0.06 (0.02) [3,966]     0.17 (0.02) [3,479]

Note: The table displays Pearson correlation coefficients between mate attributes. The age, income, and education numbers in column (I) are from the 2000 IPUMS 5% sample, restricted to Boston and San Diego, the metropolitan areas for which we also have online dating data. We restricted the sample to men and women in the 18-55 age range, and transformed the age and education variables to conform with the online dating data. The height and weight entries in column (I) report the range of results obtained by anthropometric studies (no. obs. = 46-984) surveyed by Spuhler (1968). The entries for looks correlations are from Hinsz (1989) and Stevens et al. (1990), who construct the attractiveness of engaged and married couples whose engagement, marriage, and 25th anniversary announcements were published in local newspapers. Columns (II)-(IV) report attribute correlations in the predicted Gale-Shapley matches, based on the logit preference estimates. To obtain these matches, we re-weighted our sample to match the joint distribution of age, education, and ethnicity (separately for men and women in each market) observed in the IPUMS sample. To simulate the matches, we first drew the random utility terms and constructed the preference orderings for each user. We then ran the Gale-Shapley algorithm. We repeated this process 50 times to account for randomness in the preference orderings, and report the average and standard deviation of attribute correlations across these 50 repetitions. The figures in square brackets indicate the median number of matches on which the correlations are based. For some users we do not have looks or income information; therefore, the matches involving these users are not included in the reported attribute correlation numbers. Column (II) reports attribute correlations that are obtained in the men-optimal stable matches; the results for the women-optimal matches are almost identical. In column (III), we set the unobserved utility terms to zero. In (IV), we set the difference or distance terms in the utility specification, which account for preference heterogeneity, to zero.

Table 9 – Observed and Predicted Sorting Patterns Along Race/Ethnicity

(I) IPUMS
               % of Men                                             % of Women
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          92.7      16.2      16.0      10.4      69.9         94.6      7.9       20.6      23.6      70.3
Black          0.4       72.2      1.0       0.4       6.7          1.0       87.5      2.1       2.3       6.7
Hispanic       3.9       6.2       81.4      2.3       14.7         2.9       3.3       75.8      2.1       14.7
Asian          2.5       4.0       1.3       86.5      6.6          1.0       0.7       1.1       71.6      6.6
Other          0.4       1.4       0.4       0.4       1.7          0.5       0.7       0.5       0.5       1.7

(II) Gale-Shapley, Random Utility Terms Included
               % of Men                                             % of Women
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          78.4      36.1      30.1      24.9      70.8         88.3      40.0      46.4      63.0      71.8
Black          3.5       44.7      5.5       8.5       6.0          3.5       43.0      5.3       3.0       6.1
Hispanic       10.8      14.5      56.3      21.1      15.5         6.1       11.2      43.6      13.3      15.3
Asian          6.3       3.5       7.4       42.9      7.0          1.3       4.4       4.2       19.8      6.0
Other          0.9       1.2       0.6       2.6       0.8          0.8       1.3       0.4       0.9       0.7

(III) Gale-Shapley, Random Utility Terms = 0
               % of Men                                             % of Women
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          88.0      0.0       0.0       0.0       70.8         98.0      0.0       15.8      81.3      71.8
Black          0.0       100.0     0.0       66.7      6.0          0.0       94.2      0.0       0.0       6.1
Hispanic       1.8       0.0       100.0     0.0       15.5         0.0       0.0       84.2      0.0       15.3
Asian          7.8       0.0       0.0       33.3      7.0          0.0       5.8       0.0       12.5      6.0
Other          2.4       0.0       0.0       0.0       0.8          2.0       0.0       0.0       6.3       0.7

(IV) Gale-Shapley, Race Terms = 0
               % of Men                                             % of Women
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          68.3      64.1      51.3      57.7      70.8         77.3      75.8      62.3      68.3      71.8
Black          8.1       8.6       6.6       6.7       6.0          6.4       7.1       6.5       6.5       6.1
Hispanic       15.5      18.5      31.8      23.9      15.5         10.9      11.7      23.9      17.0      15.3
Asian          7.2       7.8       9.5       11.0      7.0          4.7       4.5       6.8       7.5       6.0
Other          1.0       1.0       0.7       0.8       0.8          0.8       0.9       0.5       0.8       0.7

Note: Panel (I) reports the inter-marriage ethnicity patterns observed in the 2001 IPUMS 5% sample, restricted to Boston and San Diego. We also restricted the sample to men and women in the 18-55 age range. The columns in the left half of the table display the percent of men in each ethnic group that are matched with women of the ethnicities given in the rows. Panel (II) reports the intermarriage patterns that are generated using the (male-optimal) Gale-Shapley procedure, based on the fixed effects logit preference estimates. The simulation results are averaged across 50 draws of the unobservable utility term. Panel (III) reports the marriage patterns if the random utility term is discarded from the preference specification, and panel (IV) reports the patterns if the race preferences are set to zero.
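The simulation procedure described in the notes to Tables 8 and 9 (draw random utility terms, build preference orderings, run Gale-Shapley, repeat 50 times) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the random terms are drawn from a standard Gumbel distribution (a common choice for logit-type models, assumed here), everyone ranks every potential partner, and the option of staying single and the sample re-weighting are omitted. All utilities and identifiers are hypothetical.

```python
import math
import random

def gumbel(rng):
    """One standard Gumbel draw (assumed distribution of the random utility term)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def preference_orderings(base_utility, rng):
    """Rank partners best-to-worst by deterministic utility plus a random term."""
    orderings = {}
    for person, utilities in base_utility.items():
        noisy = {other: u + gumbel(rng) for other, u in utilities.items()}
        orderings[person] = sorted(noisy, key=noisy.get, reverse=True)
    return orderings

def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance; returns the matching as {man: woman}."""
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}
    engaged_to = {}  # woman -> man she currently holds
    free = list(men_prefs)
    while free:
        m = free.pop()
        if next_choice[m] >= len(men_prefs[m]):
            continue  # list exhausted: this man stays unmatched
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m      # she trades up and releases her old partner
            free.append(current)
        else:
            free.append(m)         # rejected: he proposes to his next choice later
    return {m: w for w, m in engaged_to.items()}

# Hypothetical deterministic utilities (NOT the paper's estimates).
men_u = {"m1": {"w1": 1.0, "w2": 0.2}, "m2": {"w1": 0.8, "w2": 0.5}}
women_u = {"w1": {"m1": 0.3, "m2": 0.9}, "w2": {"m1": 0.6, "m2": 0.4}}

rng = random.Random(0)
counts = {}
for _ in range(50):  # 50 repetitions, as in the table notes
    match = gale_shapley(preference_orderings(men_u, rng),
                         preference_orderings(women_u, rng))
    key = tuple(sorted(match.items()))
    counts[key] = counts.get(key, 0) + 1
print(counts)
```

Setting every random draw to zero in `preference_orderings` would correspond to the "Random Utility Terms = 0" scenario, in which the orderings are determined by the deterministic utilities alone.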
Table 10 – Observed and Predicted Sorting Patterns Along Race/Ethnicity: Odds Ratios

(I) IPUMS
               Odds Ratio (Men)                                     Odds Ratio (Women)
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          1.3       0.2       0.2       0.1       69.9         1.3       0.1       0.3       0.3       70.3
Black          0.1       10.7      0.1       0.1       6.7          0.2       13.0      0.3       0.3       6.7
Hispanic       0.3       0.4       5.6       0.2       14.7         0.2       0.2       5.2       0.1       14.7
Asian          0.4       0.6       0.2       13.1      6.6          0.1       0.1       0.2       10.9      6.6
Other          0.2       0.8       0.2       0.3       1.7          0.3       0.4       0.2       0.3       1.7

(II) Gale-Shapley, Random Utility Terms Included
               Odds Ratio (Men)                                     Odds Ratio (Women)
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          1.1       0.5       0.4       0.4       70.8         1.2       0.6       0.6       0.9       71.8
Black          0.6       7.5       0.9       1.4       6.0          0.6       7.0       0.9       0.5       6.1
Hispanic       0.7       0.9       3.6       1.4       15.5         0.4       0.7       2.9       0.9       15.3
Asian          0.9       0.5       1.1       6.2       7.0          0.2       0.7       0.7       3.3       6.0
Other          1.1       1.4       0.8       3.2       0.8          1.1       1.8       0.6       1.2       0.7

(III) Gale-Shapley, Random Utility Terms = 0
               Odds Ratio (Men)                                     Odds Ratio (Women)
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          1.2       0.0       0.0       0.0       70.8         1.4       0.0       0.2       1.1       71.8
Black          0.0       16.7      0.0       11.1      6.0          0.0       15.4      0.0       0.0       6.1
Hispanic       0.1       0.0       6.5       0.0       15.5         0.0       0.0       5.5       0.0       15.3
Asian          1.1       0.0       0.0       4.8       7.0          0.0       1.0       0.0       2.1       6.0
Other          2.9       0.0       0.0       0.0       0.8          2.7       0.0       0.0       8.4       0.7

(IV) Gale-Shapley, Race Terms = 0
               Odds Ratio (Men)                                     Odds Ratio (Women)
Matched with:  White     Black     Hispanic  Asian     Population   White     Black     Hispanic  Asian     Population
White          1.0       0.9       0.7       0.8       70.8         1.1       1.1       0.9       68.3      71.8
Black          1.3       1.4       1.1       1.1       6.0          1.0       1.2       1.1       6.5       6.1
Hispanic       1.0       1.2       2.1       1.5       15.5         0.7       0.8       1.6       17.0      15.3
Asian          1.0       1.1       1.4       1.6       7.0          0.8       0.8       1.1       7.5       6.0
Other          1.2       1.3       0.9       0.9       0.8          1.1       1.2       0.6       0.8       0.7

Note: Panel (I) reports the inter-marriage ethnicity patterns observed in the 2001 IPUMS 5% sample, restricted to Boston and San Diego. We also restricted the sample to men and women in the 18-55 age range. The columns in the left half of the table display the percent of men in each ethnic group that are matched with women of the ethnicities given in the rows. Panel (II) reports the intermarriage patterns that are generated using the (male-optimal) Gale-Shapley procedure, based on the fixed effects logit preference estimates. The simulation results are averaged across 50 draws of the unobservable utility term. Panel (III) reports the marriage patterns if the random utility term is discarded from the preference specification, and panel (IV) reports the patterns if the race preferences are set to zero.

[Figure 1, two panels. Top panel: Men’s First-Contact Response to Women’s Photo Rating; vertical axis: Probability of Sending E-Mail; horizontal axis: Women’s Looks Rating: Percentiles (1-10 through 91-100); separate series by Men’s Looks Rating (1-20th, 21-40th, 41-60th, 61-80th, and 81-100th percentile). Bottom panel: Women’s First-Contact Response to Men’s Photo Rating; vertical axis: Probability of Sending E-Mail; horizontal axis: Men’s Looks Rating: Percentiles (1-10 through 91-100); separate series by Women’s Looks Rating (1-20th, 21-40th, 41-60th, 61-80th, and 81-100th percentile).]

Figure 1: Evidence For/Against Strategic Behavior

Note: The graphs display the results of OLS regressions where the dependent variable is an indicator variable for whether a user sends a first-contact e-mail after browsing the profile of a potential mate. The independent variables are indicators for the photo rating of the user being browsed. The regressions control for browser fixed effects. The vertical axis plots the estimated mean probability of sending a first-contact e-mail to a browsed profile, while the horizontal axis indicates the photo rating of the browsed profile. The regressions were estimated separately for different groups of suitors. The first group comprises users who fall within percentiles 1-20 of the photo ratings distribution within their gender, etc. The estimates shown are from a sample of users in the 30-39 age range.
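The regressions behind Figure 1 are linear probability models with browser fixed effects. One standard way to estimate such a model is the within transformation: subtract browser-level means from the outcome and from the rating-bin indicators, then run OLS on the demeaned data. The sketch below follows that approach as an assumption about the estimation details. The function names and the data are hypothetical, and the coefficients it returns are differences relative to the lowest rating bin rather than the mean probabilities plotted in the figure.

```python
def within_transform(rows, groups):
    """Subtract each group's column means from its rows."""
    totals, counts = {}, {}
    for row, g in zip(rows, groups):
        acc = totals.setdefault(g, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[g] = counts.get(g, 0) + 1
    return [[v - totals[g][i] / counts[g] for i, v in enumerate(row)]
            for row, g in zip(rows, groups)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fixed_effects_lpm(sent, rating_bin, browser, n_bins):
    """OLS of the e-mail indicator on rating-bin dummies with browser fixed effects.

    Bin 0 is the omitted reference category; returns coefficients for bins 1..n_bins-1.
    """
    rows = [[float(s)] + [1.0 if b == k else 0.0 for k in range(1, n_bins)]
            for s, b in zip(sent, rating_bin)]
    demeaned = within_transform(rows, browser)
    y = [r[0] for r in demeaned]
    x = [r[1:] for r in demeaned]
    k = n_bins - 1
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical browsing records: browser id, rating bin of the browsed profile, e-mail sent.
browser = ["a", "a", "a", "a", "b", "b", "b", "b"]
rating_bin = [0, 1, 2, 2, 0, 0, 1, 2]
sent = [0, 0, 1, 1, 0, 0, 0, 1]

effects = fixed_effects_lpm(sent, rating_bin, browser, n_bins=3)
print([round(e, 3) for e in effects])
```

The estimator needs variation in rating bins within browsers; if a bin is never observed alongside other bins for the same browser, the normal equations are singular and `solve` fails.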