Matching and Sorting in Online Dating

Matching and Sorting in Online Dating∗
Günter J. Hitsch
Ali Hortaçsu
Dan Ariely
University of Chicago
University of Chicago
Duke University
Graduate School of Business
Department of Economics
Fuqua School of Business
February 20, 2008
First Version: February 2006
Abstract
This paper studies the economics of match formation using a novel data set obtained
from a major online dating service. Using detailed information on the users’ attributes
and interactions, we estimate a model of mate preferences. We use the Gale-Shapley
algorithm to compute the (man- and woman-optimal) stable matches generated by the
estimated preferences, and find that they are similar to the matches observed online.
We then explore whether the estimated mate preferences, in conjunction with the GaleShapley algorithm, can explain assortative mating patterns in “offline” marriages. We
conclude that the estimated mate preferences can generate assortative mating patterns
similar to those observed in marriages even in the absence of search frictions.
∗
We thank Babur De los Santos, Chris Olivola, and Tim Miller for their excellent research assistance. We
are grateful to Derek Neal, Emir Kamenica, Peter Rossi, and Betsey Stevenson for comments and suggestions.
Seminar participants at the 2006 AEA meetings, Boston College, the Choice Symposium in Estes Park, the
Conference on Marriage and Matching at New York University 2006, ELSE Laboratory Experiments and the
Field (LEaF) Conference, Northwestern University, the 2007 SESP Preconference in Chicago, SITE 2007, the
University of Pennsylvania, the 2004 QME Conference, UC Berkeley, UCLA, the University of Chicago, UCL,
the University of Naples Federico II, the University of Toronto, Stanford GSB, and Yale University provided
valuable comments. This research was supported by the Kilts Center of Marketing (Hitsch), a John M. Olin
Junior Faculty Fellowship, and the National Science Foundation, SES-0449625 (Hortaçsu). Please address all
correspondence to Hitsch ([email protected]), Hortaçsu ([email protected]), or Ariely
([email protected]).
1
1
Introduction
This paper studies the economics of match formation using a novel data set obtained from
a major online dating service. Using detailed information on the users’ attributes and interactions we first estimate a rich model of mate preferences. Based on these preference
estimates, we then examine whether an economic matching model can explain the observed
online matching patterns, and we evaluate the efficiency of the matches obtained on the web
site. Finally, we explore whether the estimated preferences and a matching model are helpful
in understanding sorting patterns observed “offline,” in marriages.
Two distinct literatures motivate this study. The first is the market design literature, which
focuses on designing and evaluating the performance of novel market institutions. A significant
branch of this literature is devoted to analyzing important real-world matching markets (e.g.
Roth and Sotomayor 1990; Niederle and Roth 2003; Abdülkadiroglu and Sönmez 2003; Roth,
Sönmez, and Ünver 2004 for matching of medical interns and gastroenterologists to hospitals,
children to schools, kidney transfers from donors to recipients). The goal is to understand
the allocation mechanism used in particular markets, and to assess whether an alternative
mechanism with better theoretical properties (typically in terms of efficiency and/or stability)
can improve on the allocations obtained by the current mechanism.
The online dating site we analyze is similar to previously analyzed matching markets in
that it is used to allocate indivisible “goods” without an explicit price/transfer mechanism.
However, the decentralized nature of communication and “contracting” among the online
daters makes it difficult to characterize the properties of a realistic model of match formation
in this market. Thus, unlike e.g. Abdülkadiroglu and Sönmez 2003, we can not argue for
the (in)efficiency and/or stability of the existing mechanism based on theoretical arguments.
Moreover, unlike Roth 1984, Niederle and Roth 2003, and Frechette,Ünver, Roth 2007, we do
not have empirical evidence on unravelling or related phenomena indicating the (in)efficiency
of the current market design.
We thus pursue a different strategy to assess the efficiency properties of the matching
mechanism used by the website. We show, in Section 4, that although match formation in
this market is difficult to model explicitly, the choices made by the website users as to who
to communicate with can inform us about their preferences under a weak set of assumptions.
Once we estimate the preferences of the users, we can construct the benchmark stable matchings corresponding to these preference profiles using the Gale-Shapley algorithm. We can then
compare the observed matching outcomes obtained by the actual market mechanism to those
that would have been attained under the Gale-Shapley benchmark. We do this in Section 5.
The second literature that motivates this paper is the extensive body of work across
economics, sociology, social psychology and anthropology that studies matching and sorting
patterns in marriage markets. This literature has documented that marriages are not random,
2
but exhibit strong sorting patterns along many attributes (e.g. Browning, Chiappori, and
Weiss 2007, Kalmijn 1998). For example, marriage partners are similar in age, education
levels, and physical traits such as looks, height, and weight. Such sorting patterns can arise
due to several distinct reasons (Kalmijn 1998). First, sorting can arise due to search frictions.
For example, sorting along educational attainment might not reflect a preference for a partner
with a certain education level, but rather the fact that many people spend much of their time
in the company of others with a similar level of education, for example in school, college,
or at work.1 Alternatively, sorting can arise in the absence of any search frictions as an
equilibrium consequence of preferences and the market mechanism. For example, men and
women may prefer a match with a similar partner, in which case sorting is due to “horizontal”
mate preferences. On the other hand, preferences might be purely “vertical,” in the sense that
each mate ranks all potential partners in the same way. In the equilibrium of a frictionless
market, the ranks of the matched men and women will then be perfectly correlated. If the
ranks are monotonically related to the mate’s attributes, there will also be sorting along these
attributes.
Online dating provides us with a unique environment to study the causes of sorting. First,
search frictions in online dating markets are minimal—indeed, a main reason for the existence
of online dating sites is to make the search for a partner as easy as possible. Hence, if sorting
arises in online dating markets, it must be due to mate preferences and the market mechanism,
but not due to search frictions. Second, online dating allows us to observe user attributes and
mate choices in great detail. Moreover, the information to us, the researchers, is similar to
the information available to the users of the online dating site. In particular, we observe the
choice sets faced by the users and the choices they make from these choice sets. Thus, online
dating data are particularly useful to estimate mate preferences over various mate attributes.
Marriage market data, in contrast, usually lack information on crucial mate attributes such as
physical traits, do not include information on the agents’ choice sets, and record final matches
(marriages) only.
The second objective of this paper is to explore whether we can predict the sorting patterns
observed in marriages using the mate preference estimates obtained from our data and an
economic matching model. An obvious caveat to this objective is that mate preferences and
hence sorting in a dating market may differ from the mate preferences and resulting sorting
patterns in a marriage market. However, this objection is not supported by previous studies
that document similar matching patterns along observable attributes at various stages of
a relationship: Laumann et al. (1994) demonstrate similar degrees of sorting along age,
education, and ethnicity/race across married couples, cohabiting couples, and couples in a
short-term relationship. We confirm these facts using data from the National Survey of
1
Laumann et al. (1994) document that 60% of all married couples in their sample met in school, at work,
at a private party, in church, or at a gym/social club (Table 6.1).
3
Family Growth. Nonetheless, in order to make statement about marriage patterns based on
the preference estimates obtained from our data, we need to make the important assumption
that conditional on observable attributes, the users of our dating site do not differ in their
mate preferences from the population at large.
The steps we take towards achieving our research objectives are as follows. We first specify
mate preferences as a function of own and partner attributes. To estimate these preferences,
we exploit the detailed information on the site users’ attributes and partner search behavior
contained in our data. The estimation strategy is based on the assumption that a user contacts
a partner if and only if the potential utility from a match with that partner exceeds a threshold
value (a “minimum standard”) for a mate. Such a rule is consistent with equilibrium behavior
in a search model (Adachi 2003, for example).
Based on the preference estimates, we predict who matches with who using the GaleShapley (1962) algorithm.2 Obviously, the centralized market mechanism of the Gale-Shapley
model does not provide a literal description of the decentralized structure of the online dating
site we study. However, we find that the Gale-Shapley match and actual matches realized
on the online dating site are similar in many dimensions. In particular, by comparing the
rank of a match (within each user’s ranking of all potential mates) across the actual matches
and the matches generated by the Gale-Shapley algorithm, we find that, on average, the site
users would not be able to achieve a more preferred match using Gale-Shapley. Because the
Gale-Shapley algorithm achieves an efficient match, we thus conclude that the online dating
market achieves an approximately efficient outcome.
The users of the dating site sort along various attributes, such as age, looks, income,
education, and ethnicity. While qualitatively similar to sorting patterns in marriage data,
the attribute correlations observed online cannot be directly compared to the correlations in
marriages due to differences between the online and “offline” populations along demographic
characteristics. We thus re-weight our sample of website users to match the “offline” population
along key demographic attributes. We then predict equilibrium matches for this “synthetic”
sample of online-daters, using the Gale-Shapley algorithm as before.
We find that the Gale-Shapley algorithm, in conjunction with our estimates of mate
preferences, can predict the sorting patterns in marriages: the attribute correlations in the
observed matches are similar to the correlations observed in actual marriages and dating
relationships. A main conclusion, then, is that sorting is not entirely due to search frictions,
but rather that “real-world” sorting patterns arise as a consequence of mate preferences and
the stability concept alone. This is not to say that search frictions are not present in marriage
markets. For example, we under-predict the correlation in education, possibly the main
2
We do not observe offline activities such as eventual marriages in our data. Instead, we define a match
as an event where two users exchange a phone number or e-mail address, or an e-mail that contains certain
keywords such as “get together” or “let’s meet.”
4
variable for which we expect that search frictions are a cause of sorting. However, our results
suggest that preferences also play a very important role.
In the final part of our analysis, we attempt to uncover the importance of different preference components in driving observed sorting patterns. As we discussed above, the “horizontal”
and the “vertical” components of preferences can, in principle, lead to the same matching outcomes. Our framework enables us to analyze the relative importance of these two different
components by constructing counter-factual preference profiles that omit one of the components and re-computing the equilibrium matches. The result of the exercise indicates that
“horizontal” preference components are essential in generating the sorting patterns observed
in the data. A similar exercise suggests that same-race preferences are essential in generating
the lack of intermarriage across racial boundaries.
2
Market and Data Description
In this section, we first provide a description of online dating that clarifies how the data were
collected. We then provide a brief data description; a detailed description can be found in
the Appendix and in Hitsch et al. (2008).
2.1
Overview of Online Dating
Online dating is quickly becoming a common means to find a date or marriage partner.
According to comScore World Metrix (2006), 17% of all North American and 18% of all
European Internet users visited an online personals site in July 2006. In the U.S., 37% of
all single Internet users looking for a partner have gone to a dating website (Pew Internet &
American Life Project 2006).
When they first join the dating service, the users answer questions from a mandatory
survey and create “profiles” of themselves.3 A profile is a web page that provides information
about a user and can be viewed by any other members of the dating site. The users indicate
various demographic, socioeconomic, and physical characteristics, such as their age, gender,
education level, height, weight, eye and hair color, and income. The users also indicate
why they joined the service, for example to find a partner for a long-term relationship, or,
alternatively, a partner for a “casual” relationship. In addition, the users provide information
that relates to their personality, life style, or views. For example, the site members indicate
what they expect on a first date, whether they have children, their religion, whether they
attend church frequently, and their political views. The users can also answer essay questions
that provide additional details on their attitudes and personalities, but this information is
3
Neither the names nor any contact information was provided to us in order to protect the privacy of the
users.
5
too unstructured to be usable for our analysis. Many users also include one or more photos
in their profile. We will explain in more detail later how we used these photos to construct a
measure of the users’ physical attractiveness.
After registering, the users can browse, search, and interact with the other members of the
dating service. Typically, users start their search by indicating an age range and geographic
location for their partners in a database query form. The query returns a list of “short profiles”
indicating the user name, age, a brief description, and, if available, a thumbnail version of the
photo of a potential mate. By clicking on one of the short profiles, the searcher can view the
full user profile, which contains socioeconomic and demographic information, a larger version
of the profile photo (and possibly additional photos), and answers to several essay questions.
Upon reviewing this detailed profile, the searcher then decides whether to send an e-mail (a
“first contact”) to the user. Our data contain a detailed, second by second account of all these
user activities.4 We know if and when a user browses another user, views his or her photo(s),
sends an e-mail to another user, answers a received e-mail, etc. We also have additional
information that indicates whether an e-mail contains a phone number, e-mail address, or
keyword or phrase such as “let’s meet,” based on an automated search for special words and
characters in the exchanged e-mails.5
In order to initiate a contact by e-mail, a user has to become a paying member of the
dating service. Once the subscription fee is paid, there is no limit on the number of e-mails a
user can send. All users can reply to an e-mail that they receive, regardless of whether they
are paying members or not.
2.2
Data Description
The analysis in this paper is based on as sample of 3,004 men and 2,783 women located in
Boston and San Diego. The sample was chosen from 22,000 users of an online dating service;
see Section 4.3 for a description of how the sample was selected. We observe all user activities
over a period of three and a half months in 2003.
The registration survey asks users why they are joining the site. It is important to know
the users’ motivation when we estimate mate preferences, because we need to be clear whether
these preferences are with regard to a relationship that might end in a marriage or whether
the users only seek a partner for casual sex. More than half of all observed activities (e-mails
sent) are due to users who have a stated preference for a long term relationship. Furthermore,
more than 20% of all activities are due to users who state that they are “just looking/curious,”
an answer that is likely given as it sounds less committal than “hoping to start a long-term
relationship.” Activities by users who state that they seek a casual relationship account for
4
5
We obtained this information in the form of a “computer log file.”
We do not see the full content of the e-mail, or the e-mail address or phone number that was exchanged.
6
less than 4% of all activities.
To understand how representative our sample is compared to the general population, we
contrast the reported characteristics of the site users with two samples of the population from
the CPS: one sample is representative of the general population in Boston and San Diego,
while the other sample only contains Internet users. Men are somewhat over-represented
among the dating service users, although the difference is fairly small in the sample used in
our estimation and matching analysis (51.9% men versus 48.1% women). As expected, the
site users are younger than the population at large: the median user is in the 26-35 age range,
whereas the median person in both CPS samples is in the 36-45 age range. The majority
of the dating service users are single; married users account for less than 1% of all activities
(e-mails sent). Site users are more educated and have a higher income than the general
population; once we condition on Internet use, however, the remaining differences are not
large. The profile of ethnicities represented among the site users roughly reflects the profile in
the corresponding geographic areas, especially when conditioning on Internet use, although
Hispanics and Asians are somewhat underrepresented on the San Diego site and whites are
overrepresented. Hence, along demographic and socioeconomic attributes, our sample appears
to be fairly representative of the population at large.
Our data set contains detailed (although self-reported) information regarding the physical
attributes of the users. 27.5% post one or more photos online. For the rest of the users, the
primary source of information about their appearance is the survey, which asks users to rate
their looks on a subjective scale. Users also report their height and weight. As we expected
that these are among the main attributes not truthfully reported, we compare the reported
characteristics with information on the whole U.S. population, obtained from the National
Health and Examination Survey Anthropometric Tables.6 We find that the average weight
reported by women is lower (between 6 and 20 lbs) than the average weight in the population.
Men report slightly higher weights than the national averages. Also, the stated height of both
men and women is somewhat above the U.S. average. Thus, overall the data only provide
evidence for small levels of mis-representation.
We constructed an attractiveness rating for the photos posted by the site users. This
measure is based on the photo ratings (on a scale from 1 to 10) of 100 students at the University
of Chicago. Following Biddle and Hamermesh (1998), we standardized each photo rating by
subtracting the mean rating given by the subject and dividing by the standard deviation of
the subject’s ratings. We then averaged these standardized rating across subjects to obtain
a single rating for each photo. Our data reveal that beauty is not entirely in the eye of the
beholder: the attractiveness ratings are strongly correlated across observers.
6
The data are from the 1988-1994 survey and only cover Caucasians.
7
3
A Modeling Framework for Analyzing User Behavior
Our data is in the form of user activity records that describe, for each user, which profiles
were browsed, and to which profiles an e-mail was sent to. In order to interpret the data using
a revealed preference framework, we make the following assumption:
Assumption Suppose a user browses the profiles of two potential mates, w and w0 , and
sends an introductory e-mail to w but not to w0 . Then the user prefers a potential match with
w over a potential match with w0 .
We will thus interpret user actions as binary choices over potential mates. Let UM (m, w)
be the expected utility that male user, m, gets from a potential match with woman w, and
let vM (m) be the utility m gets from his outside option of not responding to the ad. If m
browses w’s profile, he chooses to send an e-mail if and only if
UM (m, w) ≥ vM (m).
3.1
(1)
The Adachi Model
The threshold-crossing rule above arises naturally in a search model. In particular, we consider
a search model due to Adachi (2003), which we believe provides a useful stylized description
of user behavior on the dating site.
Adachi’s model is set in discrete time, with period discount factor ρ. In each period, there
are M men and W women in the market. In each period, man m comes across a randomly
sampled profile, w. The sampling is done according to the distribution FW (the corresponding sampling distribution for women is FM ). We assume that the sampling distribution is
stationary, and assigns positive probability of meeting each person on the opposite side of the
market. A standard assumption (as in Morgan 1995, Burdett and Coles 1996, and Adachi
2003) that guarantees stationarity is that men and women who leave the market upon a match
are immediately replaced by agents who are identical to them.
Let vM (m) and vW (w) be the reservation utilities of man m and woman w from staying
single and continuing the search for a partner. Define the following indicator functions:
AW (m, w) = I {woman w accepts man m} = I {UW (m, w) ≥ vW (w)} ,
AM (m, w) = I {man m accepts woman w} = I {UM (m, w) ≥ vM (m)} .
8
We can then characterize the utility that man m gets upon meeting a woman w:
EUM (m, w) = UM (m, w)AM (m, w)AW (m, w)
+ vM (m) (1 − AW (m, w)) AM (m, w) + vM (m) (1 − AM (m, w)) .
The first term in this expression is the utility from a mutual match, and the second and third
terms capture the continuation utility from a mismatch.
The Bellman equations characterizing the optimal reservation values and search rules of
man m and woman w are given by:
Z
vM (m) = ρ
EUM (m, w)dFW (w),
(2)
Z
vW (w) = ρ
EUW (m, w)dFM (m).
Adachi (2003) shows that the above system of equations defines a monotone iterative map∗ (m), v ∗ (w)) solving the system,
ping that converges to a profile of reservation utilities (vM
W
and thus characterizing the stationary equilibrium in this market.7 The equilibrium reserva∗ (m), v ∗ (w)) can be thought of as person-specific “prices” that clear demand
tion utilities (vM
W
for and supply of that person.
3.2
The Gale-Shapley Model
Under some conditions, the predictions of who matches with whom from the Adachi model
are identical to the predictions of the seminal Gale-Shapley (1962) matching model. Before
explaining this result in detail, we briefly review the Gale-Shapley model.
The matching market is populated by the same set of men and women as in Adachi’s
model, m ∈ M = {1, . . . , M }, w ∈ W = {M + 1, . . . , M + W }. The preference orderings are
generated by UM (m, w) and UW (w, m).8
Let µ(m) denote the match of man m that results from a matching procedure, and let
µ(w) be the match of woman w. Note that if µ(m) ∈
/ W, then µ(m) = m, and if µ(w) ∈
/ M,
then µ(w) = w. I.e., agents may remain single.
The matching µ is defined to be stable (in the Gale-Shapley sense) if there is no man m
and woman w such that UM (m, w) > UM (m, µ(m)) and UW (w, m) > UW (w, µ(w)). That is,
in a stable matching it is not possible to find a pair (m, w) who are willing to abandon their
partners and match with each other.
The set of stable matches in the Gale-Shapley model is not unique. However, the set
7
The solution is not unique, but has a lattice structure in strong analogy to the Gale-Shapley model. See
the next section for further details.
8
We impose the restriction that the preferences are strict.
9
of stable matches has two extreme points: the “men-optimal” and “women-optimal” stable
matches. The men-optimal stable match is unanimously preferred by men and opposed by all
women over all other stable matches, and vice versa (Roth and Sotomayor 1990). Either of
these two extreme points can be reached through the use of Gale-Shapley’s deferred-acceptance
algorithm.9
3.3
Equivalence Between Decentralized Search Outcomes and Gale-Shapley
Stable Matches
A remarkable result obtained by Adachi (2003) is that, as search costs become negligible,
i.e. ρ → 1, the set of equilibrium matches obtainable in the search model outlined above is
identical to the set of stable matches in a corresponding Gale-Shapley marriage model.
Adachi’s insight derives from an alternative characterization of Gale-Shapley stable matchings. In particular, let vM (m) = UM (m, µ(m)), and let vW (w) = UW (w, µ(w)) be the utility
that m and w get from their match partners. Adachi shows that, in a stable match, vM (m)
and vW (w) satisfy the following equations:
vM (m) = max {UM (m, w)|UW (w, m) ≥ vW (w)},
W∪{m}
(3)
vW (w) = max {UW (w, m)|UM (m, w) ≥ vM (m)}.
M∪{w}
Furthermore, as ρ → 1, the system of Bellman equations (2) becomes equivalent to the system
of equations in (3). That is, as agents become more and more patient, or, equivalently, as
search costs decline to zero, the search process will lead to matching outcomes that are stable
in the Gale-Shapley sense. This is intuitive, as the equations (3) imply that in a stable match,
man m is matched with the best woman who is willing to match with him, and vice versa.
Generally, Adachi’s model has more than one equilibrium. Analogous to the result on
men- and women-optimal matches in the Gale-Shapley model, Adachi shows that the set of
solutions of the system of equations (2) has a lattice structure and possesses extreme points.
At the men-optimal extreme, men are pickier (i.e., they have higher reservation utilities) and
9
The algorithm that arrives at the men-optimal match works as follows. Men make offers (proposals) to
the women, and the women accept or decline these offers. The algorithm proceeds over several rounds. In the
first round, each man makes an offer to his most preferred woman. The women then collect offers from the
men, rank the men who made proposals to them, and keep the highest ranked men engaged. The offers from
the other men are rejected. In the second round, those men who are not currently engaged make offers to the
women who are next highest on their list. Again, women consider all men who made them proposals, including
the currently engaged man, and keep the highest ranked man among these. In each subsequent round, those
men who are not engaged make an offer to the highest ranked woman who they have not previously made an
offer to, and women engage the highest ranked man among all currently available partners. The algorithm
ends after a finite number of rounds. At this stage, men and women either have a partner or remain single.
The women-optimal match is obtained using the same algorithm, where women make offers and men accept
or decline these proposals.
10
women are less picky than in any other solution.
3.4
Discussion
Of course, actual behavior in the online dating market that we study is not exactly described
by the models of Adachi or Gale and Shapley. However, both models capture some basic
mechanisms that apply to the workings of the dating market that we study. The Adachi
model captures the search process for a partner, and the plausible notion that people have
an understanding of their own dating market value, which influences their threshold or “minimum standard” for a partner. The Gale-Shapley model, especially their deferred-acceptance
algorithm, captures the notion that stability can be attained through a protocol of repeated
rounds of offer-making and corresponding rejections, which reflects the process of the e-mail
exchanges between the site users. Moreover, if search frictions on the online dating site are low
enough, the difference in matching outcomes as predicted by the two modeling frameworks
will be small, as suggested by Adachi’s equivalence result. However, whether this is the case
is an empirical question, which we will investigate further in Section 6.
3.5
Costly Communication and Strategic Behavior
If sending e-mails is costly, the threshold rule (1) may not hold. As an example, let us assume
that there is a single dimensional index of attractiveness in the market, and consider the
decision by an unattractive man as to whether he should send an introductory e-mail to a
very attractive woman. If composing the e-mail is costly, or the psychological cost of being
rejected is high, the man may not send an e-mail, thinking that the woman is “beyond his
reach,” even though he would ideally like to match with her. Thus, the estimated preferences
based on the threshold crossing rule reflect not only the users’ true preferences, but also their
expectations on who is likely to match with them in equilibrium.
Such strategic behavior can bias the preference estimates, and thus we will first investigate
whether strategic behavior is an important concern in our data (Section 4.2) before we estimate
mate preferences. A priori, however, we do not anticipate that strategic behavior is important
in the context of online dating. Unlike a conventional marriage market, where the cost of
approaching a potential partner is often non-trivial, online dating is designed to minimize
this cost. The main cost associated with sending an e-mail is the cost of composing it.
However, the marginal cost of producing yet another witty e-mail is likely to be small since
one can easily personalize a polished form letter, or simply use a “copy and paste” approach.
The fear of rejection should be mitigated by the anonymity provided by the dating site.
Furthermore, rejection is common in online dating: in our data, 71% of men’s and 56% of
women’s first-contact e-mails do not receive a reply.
11
4
Mate Preference Estimation
Our estimation approach is based on a sequence of binary decisions, as in the Adachi (2003)
search and matching model of Section 3.1. For each user, we observe the potential mates that
he or she browses, and we observe whether a first-contact e-mail was sent. In equilibrium, man
m contacts woman w if and only if UM (m, w) > vM (m), and woman w contacts man m if and
only if UW (m, w) > vW (w). Under this decision rule, mate preferences can be estimated using
discrete choice methods. We first discuss the details of the discrete choice approach. We then
examine whether there is any preliminary evidence pointing towards “strategic behavior,” as
discussed in Section 3.5. Finally, we present the estimation results.
4.1
Discrete Choice Model
We assume that mate preferences depend on observed own and partner attributes, and on
an idiosyncratic preference shock: UM (m, w) = UM (Xm , Xw ; θM ) + mw . We split the attribute vector and the parameter vector into separate components: Xw = (xw , dw ) , θM =
−
+
, ϑM . The latent utility of man m from a match with woman w is parametrized
, γM
βM , γM
as
UM (Xm , Xw ; θM ) = x0w βM + |xw − xm |0+
+
N
X
α
+
+ |xw − xm |0−
γM
I {dmk = 1 and dwl = 1} · ϑkl
M + mw .
α
−
γM
(4)
k,l=1
The first component of utility is a simple linear valuation of the woman’s attributes. The
other components relate the man’s preferences to his own characteristics. |xw − xm |+ is
the difference between the woman’s and man’s attributes if this difference is positive, and
|xw − xm |− denotes the absolute value of this difference is the difference is negative.10 For
example, consider the difference in age between man m and woman w. If the coefficient
+
−
corresponding to the age difference in γM
and γM
is negative, it means that users prefer
someone of their own age. Note that each component of the difference terms is taken to the
power α. The fourth component in (4) relates preferences to categorical attributes of both
mates. dmk and dwl are dummy variables indicating that man m and woman w possess a
certain trait. For example, if dmk = 1 and dwl = 1 indicate that m is white and that w is
Hispanic, then the parameter ϑkl
M expresses the relative preference of white men for Hispanic
women.
We assume that mw has the standard logistic distribution, and is i.i.d. across all pairs of
men and women. The reservation values vM (m) and vW (w) are estimated as person-specific
10
Formally, |a − b|+ = max(a − b, 0) and |a − b|− = max(b − a, 0).
12
fixed effects, denoted by cm and cw . Choice probabilities then take the standard logit form:
Pr {m contacts w|m browses w} =
exp (UM (Xm , Xw ; θM ) − cm )
.
1 + exp (UM (Xm , Xw ; θM ) − cm )
Under these assumptions, the preference parameters can be recovered using a fixed effects
logit estimator.
We estimate the model with quadratic differences (α = 2). Alternatively, if the attribute
differences enter the utility function in linear form (α = 1), the fixed effects logit model is not
identified.11 In Hitsch et al. (2008) we check the sensitivity of our results with respect to this
functional form assumption. There, we estimate a random effects probit model and compare
the probit results with α = 1 to the fixed effects logit results with α = 2. Qualitatively, the
results are very similar. Most importantly, we find that the model predictions of Sections 5
and 6 are not sensitive to the exact parametric specification of how attribute differences enter
the utility function.
4.2
Preliminary Evidence on “Strategic Behavior”
As we discussed in Section 3.5, strategic behavior may bias our preference estimates. We
now examine whether there is any preliminary evidence pointing towards such behavior in
the data. We focus on decisions based on physical attractiveness, as we expect that strategic
behavior would be most prevalent with regard to looks. In particular, we investigate how a
user’s propensity to send an e-mail is related to the attractiveness of a potential mate, and
whether this propensity is different across attractive versus unattractive searchers.
We first construct a choice set for each user that contains all potential mates that this
user browses. We then construct a binary variable to indicate the choice of sending an e-mail.
Our basic regression specification is a linear probability model of the form
e-mail ij =
10
X
βd · I{attractiveness j = d} + ui + εij .
(5)
d=1
Here, e-mail ij = 1 if browser i sends an e-mail to mate j, attractiveness j = d denotes that
mate j is in the dth decile of attractiveness ratings, and I is an indicator function. ui is a
11
To see this, note that xw − |xw − xm |+ + |xw − xm |− = xm . Suppose the estimated fixed effect for man
m is cm . Let ek = (0, . . . , 1, . . . 0) be a vector with 1 as the kth component. Choose some arbitrary number a.
Then the parameter vectors
β̂M
=
βM + (a/xmk ) ek
+
γ̂M
=
+
γM
− (a/xmk ) ek
−
γ̂M
=
−
γM
+ (a/xmk ) ek
`
´
+
−
and the fixed effect ĉm = cm + a will yield same utility function as the one parametrized by βM , γM
, γM
, cm .
13
person-specific fixed effect, i.e. the optimal search threshold for sending an e-mail to mate
j.12
The physical attractiveness measure is used as a proxy for the overall attractiveness of a
profile. We run the regression (5) separately for users in different groups of physical attractiveness. I.e., we segment the suitors, indexed by i, according to their physical attractiveness,
and allow for the possibility that users in different groups respond differently to the attractiveness of the mates that they browse. Figure 1 shows the relationship between a browsed
user’s photo rating and the estimated probability that the browser will send a first-contact
e-mail. We see that regardless of the physical attractiveness of the browser, the probability
of sending a first-contact e-mail in response to a profile is monotonically increasing in the
attractiveness of the photo in that profile. Thus, even if unattractive men (or women) take
the cost of rejection and composing an e-mail into account, this perceived cost is not large
enough such that the net expected benefit of hearing back from a very attractive mate would
be less than the net expected benefit of hearing back from a less attractive mate.
These results provide support for our assumption regarding the absence of significant costs
of e-mailing attractive users and (consequently) strategic behavior. Note that this evidence is
not ultimately conclusive, in that multiple attributes enter into the perceived attractiveness of
a given profile, while we focus only on a single dimension, physical attractiveness (the estimation results presented below confirm that physical attractiveness is one of the most important
preference components). Nonetheless, the empirical evidence of this section suggests that our
assumption is at least approximately correct, and we leave a more detailed examination of
strategic behavior for future research.
4.3
Sample and Attribute Selection
Because we are not interested in the preferences and matching patterns of users who joined
the site to find a partner for “casual” sex, we only keep users who state that they are single,
divorced, or describe themselves as “hopeful” in the estimation sample. Furthermore, we
eliminate users who indicate that they joined the site to start a “casual” relationship. This
left us with users so want to “start a long-term relationship” (59%), are “just looking/curious”
(28%), “like to make new friends” (9%), and users who state that “a friend put me up on this”
(4%). The latter three groups are likely to comprise users who want to sound less committal
than users who state that they are “hoping to start a long-term relationship.” The estimation
results and the predicted matching patterns in Sections 5 and 6 change little if we only keep
users who explicitly state that they want to start a long-term relationship in our sample.13
12
Conditional logit estimates yield similar results.
For the full sample, where we also included the users who may be seeking casual affairs, many parameter
estimates were smaller in absolute value. The online behavior of these users appears “less focused” than the
behavior of the site members who try to find a long term partner.
13
14
Given the large number of observed characteristics, we can only employ a subset of all
potential variables and potential variable interactions in the estimated model. To guide us in
this choice we use the results from the “outcome regressions” in Hitsch et al. (2008). These
regressions relate the number of unsolicited first-contact e-mails that a user receives in a
given time period to his or her attributes. Note that if mate preferences were homogeneous,
these regressions would reveal the (common) utility index by which users rank their potential mates.14 This approach can be extended to allow for some preference heterogeneity by
only considering first-contacts received from users who are homogeneous along some observed
attribute. The outcome regressions included all observed attributes. For the discrete choice
model, we select user characteristics that are of interest to us and are quantitatively (and
statistically) significant in the outcome regressions.
4.4
Estimation Results
Table 3 presents the maximum likelihood estimates of the fixed effects binary logit model.
Before exploring the matching predictions implied by these estimates, we first provide a brief
discussion of the results. For a comparison to alternative estimation approaches, robustness
checks, and a discussion of the implied trade-offs among various attributes, we refer the reader
to Hitsch et al. (2008). Our results are qualitatively similar to the “revealed preference”
findings of Belot and Francesconi (2006) and Fisman et al. (2006, 2008) using speed dating
experiments, and survey studies in psychology (Buss 2003, Etcoff 1999).
As expected, we find that the users of the dating service prefer a partner whose age is
similar to their own. Women who are single try to avoid divorced men, while divorced women
have a preference for a partner who is also divorced. Similarly, single men avoid divorced
women, but divorced men have no particular preference to date a divorced woman. Both
men and women who have children prefer a partner who also has children. Members with
children, however, are much less desirable to both men and women who themselves do not
have children. Also, women seeking a long term relationship prefer a partner who indicates
that he is also seeking a long term relationship, but no such preference is apparent for men.
Looks and physique are important determinants of preferences for both men and women.
The utility weights on the looks rating variable differ only slightly across men and women.
Also men and women have a stronger preference for mates who describe their looks as “above
average” than for average looking members, and they have an even stronger preference for
members with self described “very good looks.” Regarding height, we find that men typically
avoid tall women, while women have a preference for tall men. Men have a strong distaste for
women with a large BMI, while women tend to prefer heavier men. The estimates of income
preferences show that women place about twice as much weight on income than men. There is
14
For this statement to be true, we also need to assume that users are uniformly sampled.
15
little evidence for preference heterogeneity here—the absolute value of the distance coefficients
is small, and hence own income matters only slightly in the evaluation of a partner’s earnings.
Regarding education, we find that both men and women want to meet a partner with a similar
education level. While women have an overall strong preference for an educated partner, but
also have a relatively small tendency to avoid men who are more educated than themselves,
men generally shy away from educated women. The estimated same-race preferences show
that both men and women have a preference for a partner of their own ethnicity. Finally, we
find that both men and women have a preference for a partner of the same religion.
5
The Structure and Efficiency of Online Matches
The purpose of an online dating site is to arrange dates and ultimately matches, marriages
in particular. As described in Section 2, the particular site we investigate uses a very decentralized mechanism, in which browsers communicate and interact without any intervention.
In this section, we investigate the structure and efficiency of the matches that result from
this mechanism. In particular, we discuss whether we can predict the resulting structure of
matches using a well-known economic model of match formation—the Gale-Shapley (1962)
deferred-acceptance algorithm.
5.1
Sorting: Actual and Predicted
To investigate matching patterns, we first have to define what we mean by a “match” in
our data. Since we can only track the users’ online behavior, we do not know whether two
partners ever met offline or eventually got married. However, our data does allow us to
observe whether users exchange a phone number or e-mail address, or whether an e-mail
contains certain keywords or phrases such as “get together” or “let’s meet.” We therefore
have some indirect information on whether the online meeting resulted in an initial match,
i.e. a date between the users. We therefore define a match as a situation where both mates
exchange such contact information (i.e., for a match it is not enough for a man to offer his
phone number, we also require that the woman responds by sending her contact information).
Correlations in the attributes of couples have been studied widely in sociology, psychology,
and economics. We find that matching outcomes in online dating also exhibit strong sorting
patterns. Table 5, column (I), shows the correlation of several user attributes in the observed
matches. Age is strongly correlated across men and women (ρ = 0.70). Looks, as measured
by the standardized photo rating, are also strongly correlated (ρ = 0.31). There are smaller
but positive correlations in height (ρ = 0.15), BMI (ρ = 0.13), income (ρ = 0.15), and years
of education (ρ = 0.12).
Having documented the matching patterns in our data, we turn to the question of whether
16
these patterns can be explained by an economic matching model and the mate preferences
of the site users. However, as we pointed out in the introduction, it is difficult to model
how matches are formed on this site explicitly. Adachi’s search equilibrium model outlined
in Section 3.1provides a useful stylized description of search behavior, but there are many
aspects of behavior in this market that are not captured by this model.
We thus pursue a different strategy, and compare the observed matching outcomes to the
matching outcomes that would have been generated by a benchmark mechanism under the
preference profiles we estimated. The benchmark in this case is the Gale-Shapley mechanism,
which yields stable, and efficient (under declared preference profiles) matching outcomes which
we can compare to the observed matching outcomes. Aside from providing a descriptive
vantage-point, running the Gale-Shapley algorithm using the estimated preference profiles
helps us answer an interesting market design question as well: given the revealed preferences
of the users, would significant efficiency gains have been achieved if a centralized matchmaking
algorithm (possibly the Gale-Shapley algorithm) had been used by the website? A lively
debate on a similar issue exists in the online dating industry: while some major dating
services, such as Match.com, Yahoo!Personals, etc., use a decentralized approach allowing
users to freely browse and communicate, a “planner” approach is pursued by other major
dating sites. eHarmony.com, for example, does not allow users to search and communicate
freely, but instead limits users to choose among a subset of all potential partners selected by
the company’s proprietary algorithm.
To compute the Gale-Shapley matches, we use the estimated preference profiles of the
site users (Table 3) and run the deferred-acceptance algorithm for both geographic markets.
We calculate both the man-optimal and woman-optimal stable solutions to assess whether
the extreme points on the lattice of stable matches yield significantly different matching
outcomes. The preferences include an i.i.d. type I extreme value term, mw , for each pair of
potential mates.15 We use random draws of these utility terms to draw from the distribution of
preference profiles, and compute the corresponding man-optimal and woman-optimal stable
matching, accounting for each user’s estimated utility from staying single. We repeat this
process 50 times and report the average and standard deviation of the attribute correlations
across these 50 repetitions.
Table 5, column II (a) and II (b) reports the attribute correlations across matched couples
for the man-optimal and woman-optimal Gale-Shapley matches. The correlations in the manoptimal and woman-optimal predictions are virtually identical, which we also found for the
other predicted matches reported below; hence, we will no longer separately report the manoptimal and woman-optimal results from now on.
We find that the predicted correlations largely agree with the observed keyword corre15
To be precise: Man m’s taste shock for woman w is different from woman w’s taste shock for man m,
mw =
6 wm .
17
lations in column (I). The predicted and observed age correlations are identical (ρ = 0.70).
Although the predicted correlation in looks is somewhat smaller than the looks correlation in
keyword matches (ρ = 0.16 versus ρ = 0.31), this could potentially be caused by the fact that
many users in our sample do not post pictures.16 All users report their height and weight,
and the correlations predicted by the Gale-Shapley algorithm are slightly higher than the
observed correlations. We also get similar correlations in income and education across the
keyword and Gale-Shapley matches.
We should re-emphasize at this point that the above prediction exercise did not utilize
data on matches as an input. Our preference estimates only used data on first-contacts; the
estimation did not condition on a match taking place. Table 4 provides summary statistics
on the progress of online relationships. Of all first-contact e-mails sent among the users in
our estimation sample, 4.8% lead to an eventual match. Typically, a match is achieved only
after multiple rounds of e-mail exchanges: the median and mean numbers of e-mails sent
back and forth before a match is realized are 6 and 11.9 respectively. Hence, matches are very
different events than first-contacts. Hence, the equilibrium matching predictions based on the
estimated preferences that we presented above can be regarded as out-of-sample, rather than
in-sample predictions.
5.2
Efficiency
The finding that the observed and predicted attribute correlations are largely similar suggests
that the online dating market achieves an approximately efficient matching. However, these
correlations are aggregate measures that might mask important differences in the allocations
obtained by the actual (“decentralized”) and Gale-Shapley (“planner”) protocols. Alternatively,
we could measure the difference between actual and predicted matches by comparing the
identities of the matched couples. However, since the iid error terms, mw and wm , change
the stable matches across each run of the deferred-acceptance algorithm, it is unreasonable
to expect a one-to-one correspondence between the two match results. Therefore, we choose
a different method to evaluate the efficiency of the observed match outcomes. As before,
we construct each user’s preference profile over all potential mates, based on the preference
estimates and the simulated error terms . For each user i we then record the rank of the
match that is achieved in the data, and the rank of the match achieved by the Gale-Shapley
algorithm.17
Table 6 shows summary statistics of the predicted ranks achieved by the different matching
protocols, averaged across users and 50 Gale-Shapley predictions. The mean difference in
ranks between the Gale-Shapley matches and the observed keyword matches is 75 for men
16
The looks correlation was only computed when both members of the matched couple had posted photos.
If a user achieved more than one match in the data, we record the highest rank within the set of matched
mates and the option of staying single.
17
18
and 66 for women; that is, the Gale-Shapley procedure would have matched our users with
partners that are higher in their rankings. Expressed in percent of the highest rank achievable
(the users’ “first choice”), the improvement is 5.4% for men and 4.4% for women.
Table 6 also compares the observed and predicted matches, and the users’ first choice (the
maximum achievable rank). The average difference between the users’ first choice and the
ranks achieved by the Gale-Shapley algorithm is 39 for men and 31 for women. On the other
hand, the difference between the first choice and the observed keyword matches is 114 for
men and 97 for women. The conclusion one may draw from this result is that the estimated
preference profiles give rise to equilibrium’ matches that are “not too far away,” on average,
from the user’s first choice. This is very likely a result of the considerable heterogeneity in
preferences, which “softens” the competition for mates. To illustrate, consider the extreme
situation in which all men and women have identical preferences over the opposite sex. With
approximately 1500 men and women on each side of the market, the resulting perfect assortative matching would lead to the average user being matched to a person who is 750 ranks
away from his or her first choice.
The previous results seem to suggest that using the Gale-Shapley procedure may result
in significant efficiency gains. However, our comparison is subject to an important bias due
to the presence of the iid logit error term. Per our assumption, these error terms are utility
components that are observed by the site users but unobserved to us. Hence, in the event
that m matches with w, E(mw |m matches with w) ≥ E(mw ) and vice versa with m and
w exchanged.18 This means that conditional on a match, the matched partners received a
relatively high utility draw. Our procedure thus tends to under-predict the rank of a partner
in a match, and thus our comparison of predicted and actual matches tends to over-predict
the efficiency gains of the Gale-Shapley protocol.
In order to understand the magnitude of this bias, we conduct the following simulation
exercise. Given our sample of site users, we construct their preferences as before, including the
draws. We then compute a match according to the Gale-Shapley algorithm, and treat the
predicted matches as “data.” We then re-draw the terms, and compute a new Gale-Shapley
match. The corresponding difference in ranks allows us to explore the magnitude of the bias,
for if had no influence on the match outcomes, the predicted matches would be identical.
Table 6 shows that the predicted “improvement” in achieved ranks using the Gale-Shapley
protocol versus the simulated data is on the same order of magnitude as the predicted difference of Gale-Shapley and observed (keyword) matches. Hence, the bias of our comparison
procedure appears to account for the previous finding that the Gale-Shapley protocol leads,
on average, to a match with a more preferred partner. We thus conclude that the decentral18
This result is a straightforward consequence of the asymptotic equivalence between the matching resulting
from the Adachi and Gale-Shapley models. In the Adachi model, a match only arises if the threshold rule (1)
is satisfied. Therefore, mw will necessarily be left-truncated conditional on a match.
19
ized search protocol utilized by the dating site does not result in significant efficiency losses
compared to what could have been achieved using the Gale-Shapley procedure.
Note that an intuitive rationalization for this finding can be proposed based on Adachi’s
asymptotic equivalence result discussed in Section 3.1: it appears that search frictions on this
particular online dating site are small enough to allow decentralized search to approximate
stable matches obtained from a centralized procedure like Gale-Shapley. While we believe that
a characterization of how large search frictions have to be to lead to significant departures
from efficiency maybe interesting for the purpose of designing online dating sites19 , we leave
this question for future research.
6
Predicting Sorting in Marriages
As noted in the introduction, a large empirical literature in sociology, psychology, and economics documents strong assortative mating (homogamy) patterns in the demographic, physical, and socioeconomic characteristics of married couples. In the previous section, we saw
that the Gale-Shapley algorithm, based on mate preference estimates, predicts the matching
patterns among the users of our online dating site well. We now explore if the same approach
can also predict some aspects of the sorting patterns observed in marriages. Comparing the
predictions of the Gale-Shapley model with actual marriages is particularly interesting because Gale-Shapley, as the limit of the Adachi model, describes a market without search costs,
while such search frictions may play an important role in the formation of marriages.
An immediate objection to this approach is that we estimate mate preferences for “dating,”
which may not be appropriate to predict patterns of sorting in a marriage market. Previous
studies, however, have documented that matching along observable attributes is similar at
various stages of a relationship. Laumann et al. (1994), using data from the National Health
and Social Life Survey, show that the differences in sorting along age, education, and ethnicity
between marriages, cohabitations, and short-term relationships are small.20 Here, following
Blackwell and Lichter (2004), we document sorting patterns among dating, cohabiting, and
married couples using data from the National Survey of Family Growth (NSFG), Cycle 6,
which contains information on the relationship history of a sample of 7,643 women.21 Table
7 compares the correlations in age and education, and the inter-ethnicity matching patterns
19
Along with cognitive costs imposed by search interfaces and site layout, some online dating sites introduce
explicit search costs by charging a browsing fee per profile (e.g. RightStuffDating.com).
20
See Table 6.4 in Laumann et al. (1994). While married couples are slightly more similar in their ethnic
background, couples in a short-term relationship are slightly more similar along age and education.
21
The survey was conducted between 2002 and 2003, and the women sampled were 15-44 years old. We
classify women as “dating” if they were never married, do not live with a partner, and, regarding their partner,
indicate that they are “going with him or going steady,” “going out with him once in a while,” are “just friends,”
or “had just met him.” We classify women as “cohabitating” if they were never married and live together with
a partner of the opposite sex.
20
observed in the NSFG sample. The matching patterns among the three types of relationships
largely agree; the age and education correlations among “dating” couples are even somewhat
larger than the corresponding correlations in marriages.22
6.1
Sorting Patterns in Marriages
Table 8, column (I), reports attribute correlations among married couples. To construct this
table we utilized the 2000 Census IPUMS 5% sample for the two metropolitan areas covered
by our online dating data. In order to be able to compare the results to the predicted matches
based on users in our online dating sample, we only use information on men and women in
the 18-55 age range.
Married couples exhibit a strong degree of sorting in age (ρ = 0.85) and years of education
(ρ = 0.60). There is less sorting along income (ρ = 0.13), which increases somewhat (ρ = 0.17)
when we consider the correlation in the imputed income which accounts for the potential
earnings of spouses who do not participate in the labor force.23 Regarding the empirical
correlations in looks, height, and weight, we consulted several empirical studies in sociology
and psychology, which report high degrees of correlation among these characteristics as well
(height has a ρ between 0.31 and 0.63, weight between 0.08 and 0.32, and looks—measured
in a similar manner as in our study—between 0.34 and 0.54). Note that the studies reporting
correlations in physical attributes typically use much smaller and more selective samples than
the Census, and may not reflect the correlations in the metropolitan areas we are considering.
Assortative mating is observed along ethnic-racial lines as well. In a recent survey of
intermarriage patterns, Kalmijn (1998) reports that in the U.S. “virtually all ethnic subgroups
marry within their group more often than can be expected under random mating.” We confirm
this finding of strong endogamy (marrying within a group) for the two metropolitan areas
covered by our sample using the 2000 Census IPUMS 5% sample. Table 9, panel (I) illustrates
the matching patterns in the IPUMS data. For white, black, Hispanic, and Asian men and
women, the table displays the percent matching with a mate of another ethnicity. Table 10
illustrates the same matching patterns using the odds-ratios of matches. For example, the
first column in the top panel shows that the ratio of the fraction of white men married to
white women to the fraction of all white women in the population is 1.3. Random matching
would predict this ratio to be 1; thus, we conclude that white men are more likely to marry
white women compared to random matching. More drastically, we see that the same-race
matching odds ratio for black men is 10.7, which means that black men are 10.7 times more
22
Note that the correlations from the NSFG (Table 7) cannot be directly compared with the sorting patterns
obtained from the IPUMS data (Tables 8 and 9), as the two samples cover couples within different age ranges
and geographies.
23
We impute income using a regression that relates the logarithm of a person’s income to years of education,
years in the work force (and the square of this variable), occupation dummies, and ethnicity dummies. We
ran the regressions separately for men and women.
21
likely to marry a black woman than they would under random matching. As can be seen from
the diagonal entries, strong endogamy patterns are also present for Hispanic and Asian men
and women.
6.2
Gale-Shapley Predictions
To predict the equilibrium matches based on our preference estimates, we first re-weight our
sample of dating site users to match the joint distribution of age, education, and ethnicity
in the IPUMS data, separately for men and women in each market. We only consider users
in the 18-55 age range, who constitute more than 95% of all dating site members. Then, as
before in Section 5, we calculate the equilibrium matches among men and women using the
deferred-acceptance algorithm, based on the estimated mate preferences.
Table 8, column (II) displays the attribute correlations in the predicted marriage matches.
Overall, as in Section 5, we predict sorting along many attributes. The predicted correlations
of age, income, and height are similar to the correlations observed in marriages. The predicted
correlations in looks, BMI, and in particular education, however, are smaller than the observed
ones.
Tables 9 and 10, panel II, report the predicted interracial sorting patterns. The results reveal strong endogamy patterns, although the observed marriages exhibit even stronger propensities for matches among ethnically homogeneous partners.
An important issue that we face when using the Gale-Shapley model to simulate “offline”
marriage patterns is how to treat the error terms in the fixed-effect logit model on which
our preference estimates are based. The standard interpretation regards the error term as a
utility component that is observed by the site users, but unobserved to us, the analysts. An
alternative interpretation of the unobservable utility term is that it represents noise in the
users’ behavior. I.e., the site users, while browsing through the listings and deciding who to
contact by e-mail, make occasional mistakes in their selections—mistakes that they would not
have made had they met the same potential partner offline. Under the latter interpretation,
we should not incorporate the error terms in the utility functions on which the predictions of
the Gale-Shapley simulations are based, because the errors that users make when contacting
a potential mate will eventually be corrected before a final match is achieved.
Table 8, column (III), displays the predicted attribute correlations when we disregard the
logit error term. Most of these correlations are larger than the predicted correlations if the
error term is included in the utility function. Moreover, some of these correlations are now
quite close to the observed attribute correlations in marriages. In particular, the predicted
age correlation is only marginally larger than the observed correlation in the IPUMS sample,
while the predicted correlation in education is only marginally smaller. We over-predict the
income and imputed income correlations, however. The height and BMI correlations are
22
slightly lower than the correlations suggested by the literature, while the correlation in looks
is larger than the previously reported numbers. The differences in the correlations of physical
characteristics should be viewed with some caution, however, as the samples in the previous
literature are smaller and unlikely to be representative of the population we consider.
Tables 9 and 10, panel (III), report the predicted interracial marriage patterns when we
disregard the error term. These results are similar to the matching patterns observed among
married couples and predict strong degrees of endogamy, although we slightly over-predict
endogamy for blacks, Hispanics and white women, and under-predict endogamy for white
men. Specifically, 100% of black men and 94.2% of black women are predicted to marry a
black partner, whereas the respective numbers are 72.2% of black men and 87.5% of black
women in the 2000 IPUMS data. Similarly, our simulations suggest that 100% of Hispanic
men and 84.2% of Hispanic women are predicted to match with a Hispanic partner, compared
with 81.4% of men and 75.8% of women in the Census data. 88.0% of white men and 98.0%
of white women are predicted to marry a white partner, compared with 92.7% and 94.6% in
the IPUMS data. Only for Asian men and women we systematically under-predict endogamy.
Discussion There are two main results from the Gale-Shapley predictions using the “IPUMSmatched” sample of online dating site users. First, we demonstrated that the preference estimates of Section 4 can lead to sorting patterns that are quantitatively similar to sorting patterns in marriages. This suggests that preferences alone can explain the homogamy/endogamy
patterns observed in marriages to a significant degree. That is, search frictions and/or segregation cannot be the only cause of these patterns—since the Gale-Shapley model predicts
matching outcomes in a frictionless matching environment in which all members of the opposite sex are within one’s choice set.
A caveat to our exercise and conclusions is that the mate preferences of the online dating
sample can differ from a representative offline sample even after controlling for observables.
Even though we cannot ultimately rule out this possibility without data on offline preferences,
we can still offer the most conservative interpretation of our results, that strong homogamy
patterns can result as an equilibrium matching outcome in the absence of searching frictions
using the estimated preferences of a sample of online daters.
Second, we find that the error term in our utility specification (which, we believe, is quite
rich especially in that it includes numerous observable factors), plays an important role in the
simulated matching patterns. Most of the correlations (except for height correlation) diminish
in size when we include the error term as a structural, unobserved utility component in the
simulations. The decline in correlations due to the inclusion of the error term is especially
noticeable for age, education, income, looks, and ethnicity. Indeed, the results suggest that
we can regard the sorting predictions from the simulations with and without errors to provide
upper and lower bounds, respectively, on the correlations achievable by the preferences we
23
have estimated. Many of the “offline” correlations we observe fall within these upper bounds;
those that do not, such as offline height, weight, income and education correlations, are quite
close to one of the bounds.
As we pointed out above, it is difficult to argue, based on available evidence, which
interpretation of the error term is more appropriate. However, the two interpretations do
differ with regards to their predictions as to how we might expect the nature of matches to
evolve as online dating technology becomes more prevalent. If we maintain the assumption
that people make mistakes or are less selective when choosing their mates online,24 we may
expect strong sorting patterns along age, education, income, looks, and ethnicity to persist
even if online dating becomes the norm, since people who use such sites to find a marriage
partner may avoid making such errors and match in these dimensions similar to the way that
“offline” couples do.
However, if the error terms indeed reflect a utility component unobservable to the econometrician, we might expect widespread diffusion of online technology to lead to lower correlations in dimensions such as age, education, income, looks, and ethnicity. Indeed, many
traditional settings where people commonly meet their mates (school, work, church, social
clubs, bars/nightclubs) facilitate matching along such observable attributes. Being limited
to finding one’s partner through these settings may preclude a wider search over potential
partners who might compensate for their lack of observable qualities through other, more difficult to observe traits, (such as a particular taste for movies or music) some of which may be
easier to convey on a web page than on an offline environment. For example, in a noisy bar,
physical attractiveness is probably the main attribute along which matching takes place. An
online dating profile, on the other hand, can contain much information regarding a person’s
personality that would have been unobservable in a bar. If this is indeed the case, the diffusion of online dating technology may lead to less sorting along easily observable attributes
(such as age, race, education, income and looks), and more sorting along attributes that are
more difficult to measure. Of course, it could be the case that the users of the dating site
have a stronger preference for hard-to-observe attributes than the general population, which
is precisely the reason why they try to find a partner online. A further examination of our
prediction is left for future research on the impact of the online dating technology on marriage
markets.
6.3
How Do Different Preference Components Contribute to Sorting?
Our approach, which is based on an explicit model of preferences and a theoretical model
to predict match outcomes, enables us to answer questions regarding how different aspects
24
We mean “less selective” in terms of not paying attention to details, not in terms of having a lower threshold
of staying single.
24
of men’s and women’s mate preferences affect the equilibrium matching patterns. We now
present the results from two such exercises, in which we will turn on or shut off different
preference components, and use the Gale-Shapley algorithm to evaluate the contribution of
these components to equilibrium matching patterns.
Vertical vs. Horizontal Preferences As we already indicated in the introduction, sorting
can be driven both by preferences over the absolute value of a partner’s trait, or by preferences
for a partner with similar traits. For example, consider a market where people are only
distinguished by their beauty. Suppose everyone prefers a more beautiful mate to a less
beautiful one. In a stable match, the most beautiful woman marries the best looking man,
the second most beautiful woman marries the second most beautiful man, and so forth. Thus,
there will be perfect correlation along beauty in this matching. Alternatively, assume that
everyone tries to find a partner who is exactly as beautiful as themselves. Perfect correlation
along beauty will arise under this preference structure as well.
We will call the first example an instance of “vertical” preference, and the second one with
“horizontal” preferences. The specification of our utility function in Section 4 accounts for
both. In order to assess the importance of vertical and horizontal preference components, we
set the “distance terms” in the utility specification (4) equal to zero, and then simulate the
stable matches under this alternative preference specification. Table 8, column (IV) shows
that all attribute correlations become small in size. These results indicate that according to
our preference estimates, sorting is largely driven by preference heterogeneity.
Matching in a “Color-Blind” World How do ethnicity preferences affect marriage patterns? As shown before, both actual marriages and the predicted matches from the GaleShapley model exhibit strong intra-ethnicity sorting patterns. In Section 4, we also showed
that the dating site users show strong same-race preferences. However, it is not obvious
whether same-race preferences alone cause sorting within ethnicity groups: because other attributes, such as income and education, are correlated with race, preferences over these other
attributes could lead to intra-ethnicity sorting as well.
In the counter-factual exercise below we will assume that men and women do not have
preferences for a mate of a specific ethnicity. We also assume that the other preference
components, such as preferences for looks, income, and education, stay the same.
Tables 9 and 10, panel (IV), show that racial sorting almost completely disappears when
we remove the race preferences from the utility specification. The matching frequencies are
very close to what we would expected from random matching, except for Hispanic men and
women who continue to display some degree of endogamy. Table 8, column (V), shows that
the correlations in non-race attributes remain almost identical to those obtained in Column
(IV), when race preferences were part of the simulation.
25
These results suggest that racial sorting is almost entirely due to the same-race preferences;
preferences for non-race attributes can not quantitatively account for the small fraction of
marriages across different ethnic groups.
7
Conclusions
Based on revealed mate preferences, we demonstrated that the classic Gale-Shapley algorithm
can rationalize the observed online matching patterns as the outcome of a frictionless and
efficient matching market. Furthermore, the observed and predicted matches show that men
and women sort along attributes such as age, education, and ethnicity. These sorting patterns
resemble the sorting patterns in actual marriages.
We extrapolated the efficient market benchmark provided by the Gale-Shapley model to
assess how much of the sorting patterns in marriages can be attributed to preferences alone.
We first re-weighted our sample of online daters to match the population at large along observable attributes. Based on our preference estimates, we then predicted matches (“marriages”)
for the men and women in our sample using the Gale-Shapley algorithm. Quantitatively,
the sorting patterns predicted by this exercise are not distant from actual sorting patterns.
In particular, our results suggest that observed intra-ethnicity sorting patterns can arise as
a result of preferences and the market mechanism alone, without resort to search frictions
and/or segregation. On the other hand, our results also suggest that preferences operating
in an market without search frictions may not be enough to generate some of the strong
correlations observed offline: the very high degree of sorting along education in marriages,
in particular, may indeed be the result of institutions by which men and women are mostly
exposed to a potential partner with a similar education level.
Based on the results obtained in this study, we conclude that online dating data provide
valuable insights towards understanding the economic mechanisms underlying match formation and the formation of marriages. Many important issues are left for future research. For
example, an obvious drawback of our analysis is that we cannot observe whether an online
meeting finally resulted in a marriage, which is one outcome that we are interested in. This
issue is addressed by the recent working paper of Lee (2008), who utilizes a data set that
tracks participants across a sequence of dates, some of which result in marriages. Lee (2008)
builds on the identification framework introduced in this paper, and augments our revealed
preference specification to account for learning about unobservable match components. Another methodological drawback of our analysis concerns the issue of strategic behavior. A
more structural estimation approach, such as in Choo and Siow (2006), Wong (2003), and
Flinn and del Boca (2007), could potentially address this caveat to our estimation approach.
26
References
[1] Abdülkadiroglu, A. and T. Sönmez (2003): “School Choice: A Mechanism Design Approach” American Economic Review, 93(3), 729-747.
[2] Adachi, H. (2003): “A search model of two-sided matching under non-transferable utility,”
Journal of Economic Theory, vol. 113, 182-198.
[3] Belot, M. and Francesconi, M. (2006): "Can Anyone Be ’The’ One? Evidence on Mate
Selection from Speed Dating," working paper.
[4] Biddle J. and D. Hamermesh (1998): “Beauty, Productivity, and Discrimination:
Lawyers’ Looks and Luchre,” Journal of Labor Economics, vol. 16 (1), 172-201.
[5] Blackwell, D. L. and D. T. Lichter (2004): “Homogamy Among Dating, Cohabiting, and
Married Couples,” Sociological Quarterly, 45 (4), 719-737.
[6] Bozon, M. and F. Heran (1989): “Finding a Spouse: A Survey of How French Couples
Meet,” Population, 44, 91-121.
[7] Browning, M., P.-A. Chiappori, and Y. Weiss (2007): The Economics of the Family.
Manuscript: http://www.tau.ac.il/~weiss/fam_econ/
[8] Buss, D. (2003): The Evolution of Desire: Strategies of Human Mating. Basic Books.
[9] Chamberlain, G. (1980): “Analysis of Covariance with Qualitative Data,” Review of
Economic Studies, 47, 225-238.
[10] Choo, E. and A. Siow (2006): “Who Marries Whom and Why,” Journal of Political
Economy, 114 (1), 175-201.
[11] Etcoff, N. (1999): Survival Of The Prettiest. Doubleday.
[12] Fisman, R., S. S. Iyengar, E. Kamenica, and I. Simonson (2006): “Gender Differences
in Mate Selection: Evidence From a Speed Dating Experiment,” Quarterly Journal of
Economics, 121 (2), 673-697.
[13] Fisman, R., S. S. Iyengar, E. Kamenica, and I. Simonson (2008): “Racial Preferences in
Dating,” Review of Economic Studies, 75 (1), 117-132.
[14] Flinn, C. and D. del Boca (2007): “Household Time Allocation and Models of Behavior:
A Theory of Sorts,” working paper.
[15] Frechette, G., U, Unver, A. Roth (2007): “Unraveling Yields Inefficient Matchings: Evidence from Post-Season College Football Bowls,” forthcoming, RAND Journal of Economics.
27
[16] Gale, D. and L. S. Shapley (1962): “College Admissions and the Stability of Marriage,”
American Mathematical Monthly, 69, 9-14.
[17] Hamermesh, D., J. Biddle (1994): “Beauty and the Labor Market,” American Economic
Review, 84 (5), 1174-1194.
[18] Hinsz, V. B. (1989), “Facial Resemblance in Engaged and Married Couples,” Journal of
Social and Personal Relationships, 6, 223-229.
[19] Hitsch, G. J., A. Hortaçsu, and D. Ariely (2008): “What Makes You Click? — Mate
Preferences in Online Dating,” manuscript, University of Chicago.
[20] Kalmijn, M. (1998): “Intermarriage and Homogamy: Causes, Patterns, Trends” Annual
Review of Sociology, 24, 395-421.
[21] Lee, S. (2007): “Preferences and Choice Constraints in Marital Sorting: Evidence From
Korea,” manuscript, Stanford University.
[22] Langlois, J. H., Kalakanis, L., Rubenstein, A. J., Larson, A., Hallam, M., and Smoot,
M. (2000): “Maxims or Myths of Beauty? A Meta-Analytic and Theoretical Review,”
Psychological Bulletin, 126, 390-423.
[23] Laumann, E. O., J. H. Gagnon, R. T. Michael, and S. Michaels (1994): The Social
Organization of Sexuality. University of Chicago Press.
[20] Niederle, M. and A. E. Roth (2003), “Unraveling reduces mobility in a labor market:
Gastroenterology with and without a centralized match”, Journal of Political Economy,
111 (6), 1342-1352.
[24] Roth, A. (1984): “The Evolution of the Labor Market For Medical Interns and Residents:
A Case Study in Game Theory,” Journal of Political Economy, 92, 991-1016.
[25] Roth, A. and M. Sotomayor (1990): Two-Sided Matching: A Study in Game-Theoretic
Modeling and Analysis. Cambridge: Cambridge University Press.
[26] Roth, A., T. Sönmez and U. Ünver (2004): “Kidney Exchange,” Quarterly Journal of
Economics, 119, 457-488.
[21] Spuhler, J.N. (1968), “Assortative Mating with Respect to Physical Characteristics,”
Social Biology, 15 (2), 128-140.
[22] Stevens, G. and D. Owens and E. C. Schaefer (1990): “Education and Attractiveness in
Marriage Choices,” Social Psychology Quarterly, 53 (1), 62-70.
[23] Wong, L. Y. (2003): “Structural Estimation of Marriage Models,” Journal of Labor
Economics, 21 (3), 699-727.
28
Appendix: Data Description
Of all 22,000 users of the online dating site for which we have data, 10,721 users were located in
the Boston area, and 11,024 users were located in San Diego. We observe the users’ activities
over a period of three and a half months in 2003.
Motivation for using the dating service The registration survey asks users why they
are joining the site. It is important to know the users’ motivation when we estimate mate
preferences, because we need to be clear whether these preferences are with regard to a
relationship that might end in a marriage, or whether the users only seek a partner for casual
sex. The majority of all users are “hoping to start a long term relationship” (36% of men
and 39% of women), or are “just looking/curious” (26% of men and 27% of women). Perhaps
not surprisingly, an explicitly stated goal of finding a partner for casual sex (“Seeking an
occasional lover/casual relationship”) is more common among men (14%) than among women
(4%).
More important than the number is the share of activities accounted for by users who
joined the dating service for various reasons. Users who seek a long-term relationship account
for more than half of all observed activities. For example, men who are looking for a long-term
relationship account for 55% of all e-mails sent by men; among women looking for a long-term
relationship the percentage is 52%. The corresponding numbers for e-mails sent by users who
are “just looking/curious” is 22% for men and 21% for women. Only a small percentage of
activities is accounted for by members seeking a casual relationship (3.6% for men and 2.8%
for women).
We conclude that at least half of all observed activities is accounted for by people who have
a stated preference for a long-term relationship and thus possibly for an eventual marriage.
Moreover, it is likely that many of the users who state that they are “just looking/curious”
chose this answer because it sounds less committal than “hoping to start a long-term relationship.” Under this assumption, about 75% of the observed activities are by users who joined
the site to find a long-term partner.25
Demographic and socioeconomic attributes To understand the representativeness of
our sample compared to the general population, we contrast the reported characteristics of
the site users with samples of the population from the CPS Community Survey Profile (Table
1). In particular, we contrast the site users with two sub-samples of the CPS. The first
sub-sample is a representative sample of the Boston and San Diego MSA’s (Metropolitan
Statistical Areas), and reflects information current to 2003. The second CPS sub-sample
25
The registration also asks users about their sexual preferences. Our analysis focuses on the preferences
and match formation among men and women in heterosexual relationships; therefore, we retain only the
heterosexual users in our sample.
29
conditions on being an Internet user, as reported in the CPS Computer and Internet Use
Supplement, which was administered in 2001.
A visible difference between the dating site and the population at large is the overrepresentation of men among the site users (men represent 54.7% of all users in Boston
and 56.1% San Diego). Another visible difference is in the age profiles: site users are more
concentrated in the 26-35 year range than both CPS samples (the median user on the site is
in the 26-35 age range, whereas the median person in both CPS samples is in the 36-45 age
range). People above 56 years are underrepresented on the site compared to the general CPS
sample; however, when we condition on Internet use, this difference in older users becomes
smaller.
While the majority of the general population is married, most users of the dating service
are single (they have never been married or are divorced, separated, or widowed). About 7%
of men and 2% of women state that they are married, but these users account for less than
1% of all activities (e-mails sent). This suggests that a small number of people in a long
term relationship may be using the site as a “search outlet.” Of course, the true percentage
of otherwise committed people may be higher than reported.
The education profile of the site users shows that they are on average more educated than
the general CPS population. However, the education profile is more similar to that of the
Internet using population, with only a slightly higher percentage of graduate and professional
degree holders.
The income profile reflects a pattern that is similar to the education profile. Site users
have generally higher incomes than the overall CPS population, but not compared to the
Internet-using population.
The profile of ethnicities represented among the site users roughly reflects the profile in
the corresponding geographic areas, especially when conditioning on Internet use, although
Hispanics and Asians are somewhat underrepresented on the San Diego site and whites are
overrepresented.26
These comparisons show that the online dating site attracts users who are typically single,
somewhat younger, more educated, and have a higher income than the general population.
Once we condition on household Internet use, however, the remaining differences are not
large. Hence, along demographic and socioeconomic attributes, our sample appears to be
fairly representative of the population at large.
Reported physical characteristics of the users Our data set contains detailed (although self-reported) information regarding the physical attributes of the users. 27.5% post
one or more photos online. For the rest of the users, the survey is the primary source of
26
We should note that we had difficulty in reconciling the “other” category in the site’s ethnic classification
with the CPS classification and that some of the discrepancy may be driven by this.
30
information about their appearance.
The survey asks the users to rate their looks on a subjective scale. 19% of men and 24% of
women possess “very good looks,” while 49% of men and 48% of women have “above average
looks.” Only a minority—29% of men and 26% of women—declare that they are “looking
like anyone else walking down the street.” That leaves less than 1% of users with “less than
average looks,” and a few members who avoid the question and joke that a date should “bring
your bag in case mine tears.” Posting a photo online is a choice, and hence one might suspect
that those users who post a photo are on average better looking. On the other hand, those
users who do not post a photo might misrepresent their looks and give an inflated assessment
of themselves. The data suggest that the former effect is more important. Among those users
who have a photo online, the fraction of “above average” or “very good looking” members is
about 7% larger compared to all site users.
The registration survey contains information on the users’ height and weight. We expect
that these are among the main attributes that the users might not truthfully report. To
investigate such possible mis-representation, we compared the reported characteristics with
information on the whole U.S. population, obtained from the National Health and Examination Survey Anthropometric Tables (the data are from the 1988-1994 survey and only cover
Caucasians). Table 2 reports this comparison. Among women, we find that the average stated
weight is less than the average weight in the U.S. population. The discrepancy is about 6 lbs
among 20-29 year old women, 18 lbs among 30-39 year old women, and 20 lbs among 40-49
year old women. On the other hand, the reported weights of men are slightly higher than the
national averages. Thus, the data only provide evidence for small levels of mis-representation.
Table 2 also shows that he stated height of both men and women is somewhat above the
U.S. average. This difference is more pronounced among men, although the numbers are small
in size. For example, among the 20-29 year old, the difference is 1.3 inches for men and 1 inch
for women.27
Measured Physical Characteristics of the Users 26% of men (3174 users) and 29% of
women (2811 users) post one or more photos online. To construct an attractiveness rating
for these available photos, we recruited 100 subjects from the University of Chicago GSB
Decision Research Lab mailing list. The subjects were University of Chicago undergraduate
and graduate students in the 18-25 age group, with an equal fraction of male and female
recruits.
Each subject was paid $10 to rate, on a scale of 1 to 10, 400 male faces and 400 female
faces displayed on a computer screen. Each picture was used approximately 12 times across
subjects. We randomized the ordering of the pictures across subjects to minimize bias due to
27
The weight and height differences translate into body mass indices (BMI) that are 2 to 4 points less than
national averages among women, and about 1 point less than national averages among men.
31
boredom or fatigue.
Consistent with findings in a large literature in cognitive psychology, attractiveness ratings
by independent observers appear to be positively correlated (for surveys of this literature, see
Langlois et al. 2000, Etcoff 1999, and Buss 2003). Cronbach’s alpha across 12 ratings per
photo was calculated to be 0.80 and satisfies the reliability criterion (0.80) utilized in several
studies that employed similar rating schemes.28 To eliminate rater-specific mean and variance
differences in rating choices, we followed Biddle and Hamermesh (1998) and standardized
each photo rating by subtracting the mean rating given by the subject, and dividing by the
standard deviation of the subject’s ratings. We then averaged these standardized rating across
the subjects to obtain a single rating for each photo.
28
Biddle and Hamermesh (1998) report a Cronbach alpha of 0.75.
32
Table 1 – Dating Service Members and County Profile of General Demographic Characteristics
Variable
Dating
Service
San Diego
General
Population Internet User
Dating
Service
Boston
General
Population
Internet User
General Information
No. of Members/Population
11,024
2,026,020
1,180,020
10,721
2,555,874
1,581,711
Percentage of Men
56.1
49.9
49.4
54.7
49.0
50.6
Age Composition
18 to 20 years
21 to 25 years
26 to 35 years
36 to 45 years
46 to 55 years
56 to 60 years
61 to 65 years
66 to 75 years
Over 76
20.3
30.7
27.0
10.0
6.6
4.4
0.8
0.1
0.2
6.0
9.5
21.3
23.0
18.5
6.3
2.9
6.9
5.7
6.4
11.5
18.8
28.6
19.0
6.5
3.6
4.8
0.8
19.0
33.1
27.2
10.1
6.2
3.6
0.5
0.1
0.2
5.8
9.3
17.2
23.1
17.6
7.3
4.3
8.8
6.8
7.2
12.0
19.7
26.8
20.1
6.9
3.7
2.9
0.7
Race Composition (1)
Whites
Blacks
Hispanics
Asian
Other
72.4
4.3
11.0
5.3
7.1
61.9
4.8
19.5
13.0
0.9
71.3
4.2
9.8
13.6
1.1
82.2
4.8
4.1
3.9
5.0
84.2
7.4
4.4
3.8
0.3
89.1
4.2
2.3
4.2
0.2
65.6
6.3
4.0
1.8
22.3
31.8
57
1.2
2.3
8.1
28.5
62.0
0.7
1.5
7.4
67.2
7.2
4.8
1.4
19.4
35.3
54.1
1.1
3.6
6.0
36.8
56.7
0.3
1.0
5.2
62.2
2.6
3.7
3.5
28.1
20.2
57
3.9
6.3
12.3
23.9
62.5
1.9
2.0
9.7
65.9
2.0
4.3
3.0
24.7
28.0
49.0
2.4
13.4
7.2
32.7
55.9
0.9
3.5
7.0
1.4
9.4
12.1
23.0
3.0
17.8
1.6
10.2
9.2
30.1
3.2
20.4
31.9
5.2
5.4
23.6
7.3
7.6
6.8
27.9
28.5
4.6
14.1
15.0
Marital Status
Men
Never married
Married & not separated
Separated
Widowed
Divorced
Women
Never married
Married & not separated
Separated
Widowed
Divorced
Educational Attainment
Have not finished high school
High school graduate
Technical training
(2-year degree)
Some college
33
Bachelor's degree
Master's degree
Doctoral degree
Professional degree
28.6
11.2
3.5
7.3
22.7
6.0
1.5
1.7
31.5
9.0
2.6
2.3
33.9
16.4
3.8
5.8
22.2
11.7
3.3
2.0
29.7
16.3
5.2
2.6
Income (2)
Individuals with
Income information
6,549
283,442
224,339
6,349
396,065
281,619
7.9
5.1
8.7
14.1
20.4
20.0
10.5
6.7
2.7
3.9
12.5
3.0
13.8
23.3
12.4
17.3
7.2
7.5
3.2
0.0
12.4
1.9
10.1
22.3
10.6
20.2
9.1
9.5
4.0
0.0
8.6
3.9
6.1
12.5
21.9
22.8
12.0
7.1
2.0
3.0
7.6
5.0
21.4
19.9
16.5
21.7
4.8
1.9
1.1
0.0
4.6
6.0
16.2
21.4
18.5
24.6
4.5
2.7
1.6
0.0
Less than $12,000
$12,000 to $15,000
$15,001 to $25,000
$25,001 to $35,000
$35,001 to $50,000
$50,001 to $75,000
$75,001 to $100,000
$100,001 to $150,000
$150,001 to $200,000
$200,001 or more
Source. Estimates from CPS Internet and Computer use Supplement, September 2001. All the CPS estimates are
weighted. We only consider individuals who are at least 18 years old. The percentages for the column "Internet
user" are from the CPS sample, restricted to individuals who declare to use the Internet.
Notes. The geographic information regards two MSA’s. The Boston PMSA includes a New Hampshire portion.
San Diego geographic information corresponds to the San Diego MSA. The site member information is from 2003.
(1) The figures for whites, blacks, Asians and “other” ethnicities for the CPS data correspond to those with nonHispanic ethnicity.
(2) The income figures from the CPS data were adjusted to 2003 dollars.
34
Table 2 – Physical Characteristics of Dating Service Members vs. General Population
Variable
Men
Dating
General
Service Population
Women
Dating
General
Service Population
Weight (lbs)
20-29 years
30-39 years
40-49 years
50-59 years
60-69 years
70-79 years
175.3
184.6
187.9
187.0
188.5
185.9
172.1
182.5
187.3
189.2
182.8
173.6
136.3
136.9
138.4
140.8
147.2
144.1
141.7
154.2
157.4
163.7
155.9
148.2
Height (inches)
20-29 years
30-39 years
40-49 years
50-59 years
60-69 years
70-79 years
70.6
70.7
70.7
70.6
70.3
69.0
69.3
69.5
69.4
69.2
68.5
67.7
65.1
65.1
65.1
64.7
64.6
63.7
64.1
64.3
64.1
63.7
63.1
62.2
BMI**
20-29 years
30-39 years
40-49 years
50-59 years
60-69 years
70-79 years
24.7
25.9
26.4
26.3
26.7
27.7
25.2
26.5
27.3
27.8
27.3
26.7
22.6
22.7
23.0
23.6
24.8
25.0
24.3
26.3
27.0
28.4
27.6
26.9
*General population statistics obtained from the National Health and
Nutrition Examination Survey, 1988-1994 Anthropometric
Reference Data Tables.
** BMI (body mass index) is calculated as weight (in kilograms)
divided by height (in meters) squared.
35
Table 3 – Binary Logit Estimates
Age
Age Difference (+)
Age Difference (-)
Single, Mate: Divorced a
Divorced, Mate: Divorced
“Long Term”, Mate: “Long Term”
Has Children, Mate: Has Children
No Children, Mate: No Children
Has Photo
Looks Rating
“Very Good” Looks
“Above Average” Looks
“Other Looks”
Height
Height Difference (+)
Height Difference (-)
BMI
BMI2
BMI Difference (+)
BMI Difference (-)
Education (Years)
Education Difference (+)
Education Difference (-)
Income ($ 1,000)
Income (>50) b
Income (>100) b
Income (>200) b
Income Difference (+)
Income Difference (-)
Income “Only Accountant Knows”
Income “What, Me Work?”
White, Mate: Black
White, Mate: Hispanic
White, Mate: Asian
White, Mate: Other
Black, Mate: White
Black, Mate: Hispanic
Black, Mate: Asian
Black, Mate: Other
Hispanic, Mate: White
Hispanic, Mate: Black
Hispanic, Mate: Asian
Hispanic, Mate: Other
Asian, Mate: White
Asian, Mate: Black
Asian, Mate: Hispanic
Asian, Mate: Other
Same Religion
Men
Estimate
-0.0597
-0.0007
-0.0050
-0.0466
0.0954
0.0180
0.1872
-0.2650
-0.0638
0.5602
0.5755
0.2769
0.1715
-0.1419
-0.0019
-0.0099
-0.3965
0.0043
0.0034
-0.0101
-0.0028
-0.0039
-0.0026
0.0052
-0.0026
-0.0047
-0.0017
0.0000
0.0000
0.3309
0.2833
-0.8305
-0.2813
-0.4944
-0.0510
-0.2383
-0.2297
-0.6817
0.0052
-0.3678
-0.3810
-0.3176
-0.3656
-0.5154
SE
0.0023
0.0002
0.0001
0.0231
0.0275
0.0178
0.0271
0.0224
0.0341
0.0144
0.0397
0.0363
0.2045
0.0066
0.0036
0.0005
0.0280
0.0006
0.0008
0.0005
0.0056
0.0010
0.0008
0.0012
0.0019
0.0021
0.0034
0.0000
0.0000
0.0453
0.0542
0.0862
0.0368
0.0437
0.0226
0.3706
0.4212
0.4611
0.3894
0.1445
0.3548
0.2548
0.1714
0.3096
-0.0709
-0.0703
0.1804
0.4206
0.3607
0.0218
36
Women
Estimate
-0.0093
-0.0016
-0.0063
-0.0748
0.1712
0.2399
0.2041
-0.3642
0.1297
0.5855
0.5502
0.1719
0.0587
0.1841
-0.0096
-0.0229
0.1321
-0.0007
-0.0103
0.0022
0.0466
-0.0085
-0.0022
0.0169
-0.0067
-0.0081
0.0073
0.0000
0.0000
1.1103
0.7260
-0.7340
-0.5688
-1.5841
-0.0716
-1.5767
-1.5868
SE
0.0034
0.0002
0.0004
0.0316
0.0305
0.0258
0.0298
0.0333
0.0457
0.0211
0.0554
0.0494
0.2073
0.0093
0.0006
0.0093
0.0497
0.0010
0.0008
0.0009
0.0076
0.0012
0.0013
0.0029
0.0035
0.0016
0.0018
0.0000
0.0000
0.1284
0.1439
0.1194
0.0898
0.2408
0.0300
0.3851
0.8775
-1.2764
-0.6646
-0.8447
0.4387
0.2314
0.5075
-0.6044
-0.0260
-0.7650
-0.4890
-0.2559
0.2965
0.2640
0.4619
0.9053
0.5987
0.4870
0.0264
Log-likelihood
-72,078
-49,029
Observations
Individuals
242,478
3,004
196,363
2,783
a
b
The user who makes the first-contact choice is single, and the potential mate is divorced
Income (> x) is the amount of income (in $1,000) above the income level x
Note: The dependent variable is the 0/1 choice to contact another previously “browsed”
user. The model includes fixed effects for each suitor. The estimation is based on a subsample of users who state they are single, divorced, or describe themselves as “hopeful.”
Furthermore, we eliminate users who indicate that they joined the site to start a “casual”
relationship and only keep users who want to “start a long-term relationship,” are “just
looking/curious,” “like to make new friends,” and users who state that “a friend put me up
on this.” In the full sample we observe more choices for men than for women. In order to
reduce the computation time, we took a random sample of the men’s choices (i.e., we kept
all men, but randomly discarded some of their observed choices).
37
Table 4 – User Behavior Summary Statistics
Men
3,004
Users
First contact behavior
Profiles browsed
First contact e-mails
(% of browses)
Women
2,783
385,470
49,223
12.7
172,946
14,178
8.2
Matching
First contacts that lead to match
(% of first contacts)
2,130
4.3
914
6.4
E-mails exchanged until match is
achieved
Mean
Median
SD
11.6
6
22.8
12.6
6
26.3
Note: The summary statistics apply to the sample of users
employed in the estimation and matching sections of this paper.
38
Table 5 – Attribute Correlations in Observed and Predicted Online Matches
Observed
Online
Matches
Simulated Gale-Shapley
Matches From
Estimated Preferences
Men-Optimal Women-Optimal
(II a)
(II b)
(I)
Age
0.70
[3034]
Height
0.15
[3034]
BMI
0.13
[3034]
Looks Rating
0.31
[1729]
Income
0.15
[670]
Education
0.12
[3034]
0.70
(0.01)
[1785]
0.70
(0.01)
[1785]
0.25
(0.02)
[1785]
0.25
(0.02)
[1785]
0.19
(0.03)
[1785]
0.19
(0.03)
[1785]
0.16
(0.05)
[481]
0.16
(0.05)
[481]
0.11
(0.05)
[427]
0.11
(0.05)
[427]
0.15
(0.02)
[1785]
0.15
(0.02)
[1785]
Note: The table displays Pearson correlation coefficients between
mate attributes. Column (I) reports the matching patterns observed
in the “keyword” matches in our online dating sample. Columns (II
a) and (II b) report attribute correlations in the predicted GaleShapley matches, based on the logit preference estimates. To
simulate the matches, we first drew the random utility terms and
constructed the preference orderings for each user. We then ran the
Gale-Shapley algorithm. We repeated this process 50 times to
account for randomness in the preference orderings, and report the
average and standard deviation of the attribute correlations across
these 50 repetitions. The figures in square brackets indicate the
median number of matches on which the correlations are based. For
some users we do not have looks or income information; therefore,
the matches involving these users are not included in the reported
attribute correlation numbers.
39
Table 6 – Rank Differences of “Keyword” and Simulated Match Outcomes
Mean
Men
Median
SD
Mean
Women
Median
SD
Rank Differences
Gale-Shapley vs. Actual Match
First Choice vs. Actual Match
First Choice vs. Gale-Shapley
75.2
114.4
39.2
37
76
25
121.6
122.1
45.8
66.2
96.9
30.7
35
65
19
100.0
100.6
36.1
Rank Diff. as % of Max Rank.
Gale-Shapley vs. Actual Match
First Choice vs. Actual Match
First Choice vs. Gale-Shapley
5.4
8.2
2.8
2.7
5.5
1.8
8.8
8.8
3.3
4.4
6.5
2.1
2.3
4.3
1.3
6.6
6.7
2.5
Observed Keyword Matches
Simulated Data: Keyword Matches Generated Using Gale-Shapley Algorithm
Rank Differences
Gale-Shapley vs. Actual Match
First Choice vs. Actual Match
First Choice vs. Gale-Shapley
120.8
168.1
47.3
70
114
29
163.6
164.1
61.2
114.5
147.6
33.1
69
101
20
148.0
148.1
40.7
Rank Diff. as % of Max Rank.
Gale-Shapley vs. Actual Match
First Choice vs. Actual Match
First Choice vs. Gale-Shapley
8.7
12.1
3.4
5.0
8.2
2.1
11.8
11.8
4.4
7.6
9.8
2.2
4.6
6.8
1.4
9.8
9.8
2.8
Note. The table shows summary statistics on the preference rank difference across various actual and
potential match outcomes. All numbers are with respect to the site users who achieved at least one
“keyword match.” If a user achieved more than one match in the data, we record the highest rank
within the set of matched mates and the option of staying single. “First choice” denotes the outcome
where a user is matched to his or her top-ranked partner.
40
Table 7 – Sorting Patterns Among Dating, Cohabiting, and Married Couples
(a) Attribute Correlations
Dating
Couples
Cohabiting
Couples
Married
Couples
0.81
0.56
0.76
0.57
0.78
0.54
Age
Education
(b) Sorting Along Race/Ethnicity
(I) Dating Couples
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
83.3
6.2
7.1
3.0
0.4
2.9
94.8
2.3
0.0
0.0
% of Women
Hispanic
Asian
21.9
10.2
67.1
0.0
0.8
11.7
0.0
0.0
88.3
0.0
Population
51.5
34.2
11.3
1.5
1.5
(II) Cohabiting Couples
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
80.1
6.1
11.9
0.4
1.6
7.1
88.9
2.8
0.0
1.2
% of Women
Hispanic
Asian
8.9
5.8
83.4
1.1
0.9
13.4
7.3
26.5
52.8
0.0
Population
56.5
16.3
23.8
1.7
1.8
(III) Married Couples
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
92.4
1.0
4.7
1.0
0.8
4.2
88.5
3.9
1.1
2.4
% of Women
Hispanic
Asian
17.0
2.5
79.4
0.3
0.8
14.6
0.7
2.7
80.8
1.2
Population
72.2
8.0
14.6
3.6
1.5
Note: The tables are based on data from the National Survey of Family Growth
(NSFG), Cycle 6, which contains information on the relationship history of a sample of
7,643 women. The survey was conducted between 2002 and 2003, and the women
sampled were 15-44 years old. We classify women as “dating” if they were never
married, do not live with a partner, and, regarding their partner, indicate that they are
“going with him or going steady,” “going out with him once in a while,” are “just
friends,” or “had just met him.” We classify women as “cohabitating” if they were
never married and live together with a partner of the opposite sex.
41
Table 8 – Observed and Predicted Attribute Correlations in Marriages
Marriages
(I)
Age
0.85
Height
0.31 - 0.63
Weight (I), BMI (II-IV)
0.08 - 0.32
Looks Rating
0.34 - 0.54
Income
0.11
Imputed Income
0.17
Education
0.60
Gale-Shapley Predictions
With Random Random Utility
Utility
Terms = 0
(II)
(III)
0.73
(0.01)
[3,318]
0.28
(0.02)
[3,318]
0.02
(0.02)
[3,318]
0.14
(0.04)
[593]
0.14
(0.03)
[1,040]
0.37
(0.01)
[3,318]
0.19
(0.02)
[3,318]
0.89
[258]
0.20
[258]
0.06
[258]
0.72
[21]
0.42
[90]
0.55
(0.01)
[258]
0.55
[258]
Difference
Terms = 0
(IV)
Ethnicity
Terms = 0
(V)
0.09
(0.02)
[3,966]
0.01
(0.01)
[719]
-0.01
(0.02)
[3,966]
0.02
(0.04)
[3,966]
0.03
(0.03)
[1,180]
0.09
(0.01)
[3,966]
0.06
(0.02)
[3,966]
0.73
(0.01)
[3,479]
0.27
(0.01)
[3,479]
0.03
(0.02)
[3,479]
0.14
(0.04)
[614]
0.12
(0.03)
[1,091]
0.32
(0.01)
[3,479]
0.17
(0.02)
[3,479]
Note: The table displays Pearson correlation coefficients between mate attributes. The age, income, and education
numbers in column (I) are from the 2000 IPUMS 5% sample, restricted to Boston and San Diego, the metropolitan
areas for which we also have online dating data. We restricted the sample to men and women in the 18-55 age
range, and transformed the age and education variables to conform with the online dating data. The height and
weight entries in column (I) report the range of results obtained by anthropometric studies (no. obs. = 46-984)
surveyed by Spuhler (1968). The entries for looks correlations in from Hinsz (1989) and Stevens et al. (1990) who
construct the attractiveness of engaged and married couples whose engagement, marriage, and 25th anniversary
announcements were published in local newspapers.
Columns (II)-(IV) report attribute correlations in the predicted Gale-Shapley matches, based on the logit
preference estimates. To obtain these matches, we re-weighted our sample to match the joint distribution of age,
education, and ethnicity (separately for men and women in each market) observed in the IPUMS sample. To
simulate the matches, we first drew the random utility terms and constructed the preference orderings for each user.
We then ran the Gale-Shapley algorithm. We repeated this process 50 times to account for randomness in the
preference orderings, and report the average and standard deviation of attribute correlations across these 50
repetitions. The figures in square brackets indicate the median number of matches on which the correlations are
based. For some users we do not have looks or income information; therefore, the matches involving these users are
not included in the reported attribute correlation numbers. Column (II) reports attribute correlations that are
obtained in the men-optimal stable matches; the results for the women-optimal matches are almost identical. In
column (IIII), we set the unobserved utility terms to zero. In (IV), we set the difference or distance terms in the
utility specification, which account for preference heterogeneity, to zero.
42
Table 9 – Observed and Predicted Sorting Patterns Along Race/Ethnicity
(I) IPUMS
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
92.7
0.4
3.9
2.5
0.4
16.2
72.2
6.2
4.0
1.4
% of Men
Hispanic
16.0
1.0
81.4
1.3
0.4
Asian
10.4
0.4
2.3
86.5
0.4
Population
69.9
6.7
14.7
6.6
1.7
White
Black
94.6
1.0
2.9
1.0
0.5
7.9
87.5
3.3
0.7
0.7
White
Black
88.3
3.5
6.1
1.3
0.8
40.0
43.0
11.2
4.4
1.3
White
Black
98.0
0.0
0.0
0.0
2.0
0.0
94.2
0.0
5.8
0.0
White
Black
77.3
6.4
10.9
4.7
0.8
75.8
7.1
11.7
4.5
0.9
% of Women
Hispanic
Asian
20.6
2.1
75.8
1.1
0.5
23.6
2.3
2.1
71.6
0.5
Population
70.3
6.7
14.7
6.6
1.7
(II) Gale-Shapley, Random Utility Terms Included
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
78.4
3.5
10.8
6.3
0.9
36.1
44.7
14.5
3.5
1.2
% of Men
Hispanic
Asian
30.1
5.5
56.3
7.4
0.6
24.9
8.5
21.1
42.9
2.6
Population
70.8
6.0
15.5
7.0
0.8
% of Women
Hispanic
Asian
46.4
5.3
43.6
4.2
0.4
63.0
3.0
13.3
19.8
0.9
Population
71.8
6.1
15.3
6.0
0.7
(III) Gale-Shapley, Random Utility Terms = 0
Matched
with:
White
White
Black
Hispanic
Asian
Other
88.0
0.0
1.8
7.8
2.4
Black
0.0
100.0
0.0
0.0
0.0
% of Men
Hispanic
Asian
0.0
0.0
100.0
0.0
0.0
0.0
66.7
0.0
33.3
0.0
Population
70.8
6.0
15.5
7.0
0.8
% of Women
Hispanic
Asian
15.8
0.0
84.2
0.0
0.0
81.3
0.0
0.0
12.5
6.3
Population
71.8
6.1
15.3
6.0
0.7
(IV) Gale-Shapley, Race Terms = 0
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
68.3
8.1
15.5
7.2
1.0
64.1
8.6
18.5
7.8
1.0
% of Men
Hispanic
Asian
51.3
6.6
31.8
9.5
0.7
57.7
6.7
23.9
11.0
0.8
Population
70.8
6.0
15.5
7.0
0.8
% of Women
Hispanic
Asian
62.3
6.5
23.9
6.8
0.5
68.3
6.5
17.0
7.5
0.8
Population
71.8
6.1
15.3
6.0
0.7
Note: Panel (I) reports the inter-marriage ethnicity patterns observed in the 2001 IPUMS 5% sample, restricted to
Boston and San Diego. We also restricted the sample to men and women in the 18-55 age range. The columns in
the left half of the table display the percent of men in each ethnic group that are matched with women of the
ethnicities given in the rows. Panel (II) reports the intermarriage patterns that are generated using the (male-optimal)
Gale-Shapley procedure, based on the fixed effects logit preference estimates. The simulation results are averaged
across 50 draws of the unobservable utility term. Panel (III) reports the marriage patterns if the random utility term
is discarded from the preference specification, and panel (IV) the race preferences are set to zero.
43
Table 10 – Observed and Predicted Sorting Patterns Along Race/Ethnicity: Odds Ratios
(I) IPUMS
Matched
with:
White
Black
White
Black
Hispanic
Asian
Other
1.3
0.1
0.3
0.4
0.2
0.2
10.7
0.4
0.6
0.8
Odds Ratio
Hispanic
Asian
0.2
0.1
5.6
0.2
0.2
0.1
0.1
0.2
13.1
0.3
Population
69.9
6.7
14.7
6.6
1.7
White
Black
1.3
0.2
0.2
0.1
0.3
0.1
13.0
0.2
0.1
0.4
White
Black
1.2
0.6
0.4
0.2
1.1
0.6
7.0
0.7
0.7
1.8
White
Black
1.4
0.0
0.0
0.0
2.7
0.0
15.4
0.0
1.0
0.0
White
Black
1.1
1.0
0.7
0.8
1.1
1.1
1.2
0.8
0.8
1.2
% of Women
Hispanic
Asian
0.3
0.3
5.2
0.2
0.2
0.3
0.3
0.1
10.9
0.3
Population
70.3
6.7
14.7
6.6
1.7
(II) Gale-Shapley, Random Utility Terms Included
Matching
with:
White
Black
Hispanic
Asian
Other
White
Black
1.1
0.6
0.7
0.9
1.1
0.5
7.5
0.9
0.5
1.4
% of Men
Hispanic
Asian
0.4
0.9
3.6
1.1
0.8
0.4
1.4
1.4
6.2
3.2
Population
70.8
6.0
15.5
7.0
0.8
% of Women
Hispanic
Asian
0.6
0.9
2.9
0.7
0.6
0.9
0.5
0.9
3.3
1.2
Population
71.8
6.1
15.3
6.0
0.7
(III) Gale-Shapley, Random Utility Terms = 0
Matching
with:
White
Black
Hispanic
Asian
Other
White
Black
1.2
0.0
0.1
1.1
2.9
0.0
16.7
0.0
0.0
0.0
% of Men
Hispanic
Asian
0.0
0.0
6.5
0.0
0.0
0.0
11.1
0.0
4.8
0.0
Population
70.8
6.0
15.5
7.0
0.8
% of Women
Hispanic
Asian
0.2
0.0
5.5
0.0
0.0
1.1
0.0
0.0
2.1
8.4
Population
71.8
6.1
15.3
6.0
0.7
(IV) Gale-Shapley, Race Terms = 0
Matching
with:
White
Black
Hispanic
Asian
Other
White
Black
1.0
1.3
1.0
1.0
1.2
0.9
1.4
1.2
1.1
1.3
% of Men
Hispanic
Asian
0.7
1.1
2.1
1.4
0.9
0.8
1.1
1.5
1.6
0.9
Population
70.8
6.0
15.5
7.0
0.8
% of Women
Hispanic
Asian
0.9
1.1
1.6
1.1
0.6
68.3
6.5
17.0
7.5
0.8
Population
71.8
6.1
15.3
6.0
0.7
Note: Panel (I) reports the inter-marriage ethnicity patterns observed in the 2001 IPUMS 5% sample, restricted to
Boston and San Diego. We also restricted the sample to men and women in the 18-55 age range. The columns in
the left half of the table display the percent of men in each ethnic group that are matched with women of the
ethnicities given in the rows. Panel (II) reports the intermarriage patterns that are generated using the (male-optimal)
Gale-Shapley procedure, based on the fixed effects logit preference estimates. The simulation results are averaged
across 50 draws of the unobservable utility term. Panel (III) reports the marriage patterns if the random utility term
is discarded from the preference specification, and panel (IV) the race preferences are set to zero.
44
Men’s First−Contact Response to Women’s Photo Rating
Probability of Sending E−Mail
.25
Men’s Looks Rating
.2
1−20th Percentile
21−40th Percentile
41−60th Percentile
61−80th Percentile
81−100th Percentile
.15
.1
.05
91−100
81−90
71−80
61−70
51−60
41−50
31−40
21−30
11−20
1−10
0
Women’s Looks Rating: Percentiles
Women’s First−Contact Response to Men’s Photo Rating
.175
Women’s Looks Rating
Probability of Sending E−Mail
.15
1−20th Percentile
21−40th Percentile
41−60th Percentile
61−80th Percentile
81−100th Percentile
.125
.1
.075
.05
.025
91−100
81−90
71−80
61−70
51−60
41−50
31−40
21−30
11−20
1−10
0
Men’s Looks Rating: Percentiles
Figure 1: Evidence For/Against Strategic Behavior
Note: The graphs disply the results of OLS regressions where the dependent variable is an
indicator variable for whether a user sends a first-contact e-mail after browsing the profile
of a potential mate. The independent variables are indicators for the photo rating of the
user being browsed. The regressions control for browser fixed effects. The vertical axis plots
the estimated mean probability of sending a first-contact e-mail to a browsed profile, while
the horizontal axis indicates the photo rating of the browsed profile. The regressions were
estimated separately for different groups of suitors. The first group comprises users who
fall within percentiles 1-20 of the photo ratings distribution within their gender, etc. The
estimates shown are from a sample of users in the 30-39 age range.
45