Experimental and Behavioral Economics for ECO

TILBURG UNIVERSITY
Guideline
Spring 2013
Content
Lecture Notes
  Lecture Notes 1: Economics as an Experimental Discipline
  Lecture Notes 2: Designing and Conducting Economic Experiments
  Lecture Notes 3: Data Analysis Part 1
  Lecture Notes 4: Data Analysis Part 2
Lecture Slides
  Lecture Slides Introduction Lecture EvdH
  Lecture Slides Part 1
  Lecture Slides Part 2
  Lecture Slides Part 3b
  Lecture Slides Part 4a
  Lecture Slides Part 5b
  Lecture Slides Part 6b
  Lecture Slides Part 8a
  Lecture Slides Part 8b
  Lecture Slides Part 9a
  Lecture Slides Part 9b
  Lecture Slides Part 10a
  Lecture Slides Part 13b
Lecture Notes
Lecture Notes 1: Economics as an Experimental Discipline
Economic science: the science of the use of scarce resources.
Economic experiment: controlled economic environment in which experimental subjects make
decisions that the experimenter records for the purpose of scientific analysis.

Controlled economic environment: individual economic agents together with an institution
through which the agents interact.

An experiment with proper (e.g. monetary) incentives is economic reality, as the decisions
have economic effects.

Data sources:
  o Experimental data: deliberately created under controlled conditions.
  o Happenstance data: by-product of ongoing uncontrolled processes.
  o Laboratory data: gathered in an artificial environment.
  o Field data: gathered in a naturally occurring environment.
Field/empirical data:
• Real and detailed.
• Not controlled, allows for limited analysis.
• Validity: OVB, measurement errors, skewed coverage, etc.
• Economic environment happens to be what it is; no repetition and no deliberate manipulation.
Experimental data:
• Simplified economic reality.
• Control to isolate effects and to learn causalities step by step.
• Validity:
  o Internal: do the data permit causal inferences?
  o External: can we generalize our inferences from lab to field? (problem)
• Parallelism and inductive reasoning: if relevant conditions remain unchanged, we expect to observe the same causalities.
• Induced-value theory: proper use of a reward medium allows an experimenter to induce prespecified characteristics in experimental subjects, and subjects’ innate characteristics become largely irrelevant.
Advantages of economic experiments
Replicability
Control
Limitations of economic experiments
Internal validity
External validity
Descriptive experiments: means of testing the descriptive validity of assumptions about human
behavior on which theory is based (most psychologists).
Prescriptive/normative experiments: theory is of direct practical interest if conclusions provide good
approximations of actual behavior even when assumptions are not precisely satisfied (economics).
Davis and Holt, types of experiments:
1. Tests of behavioral hypotheses/theory falsification.
2. Theory stress tests/sensitivity tests.
3. Searching for empirical regularities.
Friedman and Sunder, purposes of experiments:
1. Generate data that might influence a specific decision (policymaking).
a. Roth: whispering in the ears of princes.
2. Discover empirical regularities in areas for which existing theory has little to say.
3. In case of competing theories with differing predictions experiments may help map the range
of applicability of each theory.
4. Test theory for robustness (are there conditions under which theory can account for the data?).
5. Measure individual characteristics in a population.
6. Simulate natural economic processes (rare).
7. Study new situations in the lab before introducing them in the field (in particular newly
invented institutions or situations that are too complicated to be analyzed theoretically).
8. Pedagogical purposes.
Induced-value theory: create a simplified economic reality with a reward medium, which is used to
induce specific preferences of subjects. Conditions to achieve control over agents’ characteristics:

Monotonicity: more reward is better than less (no satiation).

Salience: reward depends on subject’s actions as defined by institutional rules that (s)he
understands (no flat payoff).

Dominance: changes in subject’s utility from the experiment come predominantly from the
reward medium and not from other elements in the experimental procedure.
Experimental economics:
• Focus on behavior in specific institutions (e.g. markets) that tightly constrain behavior.
• Theory-based.
• Salient reward.
• Hardly any deception.
Experimental psychology:
• Focus on behavior in the absence of institutions.
• Based on process of actions.
• Reward not salient (none, or flat fee).
• Deception or manipulation of subjects (generating more data in an easy way, but spoiling the subject pool).
History of experimental economics:

Market experiments (1945-1965).

Game experiments (1950-1970).

Individual-choice experiments (1950-1970).

Mainstream economics (1980- ).

Nobel prizes in Economic Science:
  o 2002: Vernon Smith and Daniel Kahneman.
  o 2009: Elinor Ostrom.
  o 2012: Al Roth (with Lloyd Shapley).
Lecture Notes 2: Designing and Conducting Economic Experiments
Observations:
• Casual vs. statistical observation:
  o Any relevant information recorded during the experiment is a casual observation, but only those observations that can be used for statistical tests are statistical observations.
  o Scientific knowledge is based on hard facts (statistical observations); soft facts (casual observations) can be useful for inspiring new theories and research opportunities.
• Data descriptors, single vs. compound observations: instead of using observations of single decisions (e.g. buyer 3 bid 100), groups of (non-independent) single decisions can be aggregated into one compound observation (e.g. the average bid of buyer 3).
Some definitions:

Trial: logically or organizationally indivisible set of decisions (e.g. one market period in a
double auction market), also called run, period or round.

Cohort: all subjects that interact during a session, also called independent subject group.

Session: all trials that are conducted with the same group of subjects at one occasion (e.g. the
Thursday afternoon session).

Treatment: one or more sessions with an identical economic environment, also called block.

Experiment: collection of sessions of one or more related treatments.

Experiment design: a specification of sessions in one or more treatments to evaluate the
propositions of interest. It specifies how variables are controlled within and across a block of
trials.
Independence requirement:

A stochastic variable X is independent of a stochastic variable Y if the conditional distribution
of X does not vary with the value of Y.

Procedural independence: observations are assumed to be independent, because they are
generated by subjects that could not interact due to the experimental procedure.

Hypothesized independence: observations are assumed to be independent, even though
subjects could have interacted (or were forced to), given the experimental procedure. It is only
justified if it is statistically tested.
Uniformity requirement:

Comparison of observations is only feasible if ceteris paribus holds (i.e. all parameters not
explicitly controlled for are held constant).

Uniformity requirement for statistically independent observations: the observations have to be
independently and identically distributed (i.i.d.).

Uniformity may be assumed if all observations are generated by the same experimental
procedure and the same process of recruitment.

Systematic biases should be avoided by using standardization if control is possible
(deterministic uniformity) and randomization if control is not possible (stochastic uniformity).
o
Subject recruiting processes should be standardized (only one type of advertisement is
used, identical pre-experimental information is given to all recruits, etc.).
o
The random assignment of subjects to treatments is necessary to ensure i.i.d. over the
personal characteristics of the subjects.
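The random assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name, the subject labels and the seed are hypothetical, and a real experiment would document its own randomization procedure.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)          # seeded so the assignment can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # random order removes systematic selection
    groups = {t: [] for t in treatments}
    for position, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in turn.
        groups[treatments[position % len(treatments)]].append(subject)
    return groups

# Hypothetical subject pool of 12 recruits and two treatments A and B.
subjects = [f"S{i:02d}" for i in range(1, 13)]
groups = assign_treatments(subjects, ["A", "B"], seed=2013)
```

Because the order is shuffled before subjects are dealt out, personal characteristics (nuisance variables) cannot systematically end up in one treatment.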
Focus variable: a variable that we want to observe and record. In a market experiment, these are e.g.
bids, asks and contract prices.
Nuisance variable: a variable that we are not interested in, but must record in case it affects the
results. In a market experiment, these are e.g. sex, race and education.
Treatment variables: parameters of the economic environment that are set at specific values in order
to test treatment differences. Variables controlled at two or more levels are treatment variables.
Confounding effects: should be avoided by using controlled variation. Treatment variables should be
varied one by one; otherwise observed differences cannot be attributed to a specific treatment variable.
Control by randomization – between-subject design:

Completely randomized design: subjects are randomly assigned to different treatments. Any
effect of the nuisance variables on the results is cancelled out if there are enough subjects.

Factorial design: the experimenter makes sure that there are the same number (or minimum
number) of observations in each treatment.

Fractional factorial design: runs a (balanced) subset of the factorial design.

Parameter variation design: sub-form of the factorial design. Many levels of a treatment
variable are systematically used, but only a very small number of independent observations
per level are gathered. All independent observations are used in the correlation analysis.
Control by blocking – within-subject design:

Simple blocking design (AB): each subject is his/her own control (block), since every subject
takes part in all treatments. Because there is an observation for every subject in every
treatment, the nuisance variables can be cancelled out by looking at the difference between
each subject’s decisions. Also called before-and-after design.

Crossover design (sequential: ABA): by crossing over, the effect that experimental sequence
of actions may induce is controlled. If subjects are learning from one round to another, then
there will be an order effect on the switch from one treatment to another. To control for this,
the original treatment is run a second time after the second treatment.

Dual trial design (simultaneous: A&B): subjects do not participate in both treatments
sequentially, but make more decisions at once (separate decision sheet).

Enhanced within subject design (combining: AB and BA): subjects are randomly assigned to
one experimental sequence (part BA, part AB). The data is checked within each group and
across the groups.
Type of elicited decisions:

Spontaneous decisions: subjects decide on the spot during the experiment, making one
decision at a time as the experiment proceeds.

Heuristics: subjects provide a decision rule for some number of future decisions, possibly even
for slightly varying environments.

Strategies: subjects provide a complete set of decision rules for all possible states.
Outcome data: all final decisions and the states of the world that are actually reached.
Process data: outcome data plus data on the path of decision making as well as on states of the world
that were not reached.
Process data collection:

Think aloud studies: express mental processing of the decision task out loud. These protocols
are examined with the goal of understanding the decision process.

Mouse lab studies: subjects collect their information and input their decisions on a computer
that records mouse movements. The sequence in which the information is collected is
analyzed.

Videotaped group discussions: subjects are put into groups (often of three) in which they can
freely communicate and reach an agreement on the mutual group decision. The earnings are
equally split. The discussions are videotaped and then typed into discussion protocols, which
are analyzed with the goal of understanding individual and group decision processes.
Monetary incentives:

Show-up fees: flat fees in order to motivate subjects to participate. This is often used in
experiments in which subjects may incur high losses.

Monotonicity, salience, dominance.

Best alternative payoff or opportunity cost approach: subjects are paid an amount that is close
to the average pay they could expect for the same amount of time on a typical alternative job.
Experiments in which subjects are paid much more than this benchmark are called high-stake
experiments.

Bankruptcy problems: subjects that go bankrupt during the experiment do not face a real
sanction, so the control over their actions is lost and they will take extreme risks.
Non-monetary incentives:

Grades.

Peer comparisons: e.g. if the names of the three subjects with the highest round payoffs are
announced in each round, this may spur their peers to try harder.
Experimental subjects:

• Choosing the number of subjects:
  o Check the economic design requirements (e.g. market experiment: number of subjects needed depends on the number of traders in the market examined).
  o Check the statistical testing requirements.
  o Budget.
• Choosing the type of subjects:
  o Ideal: fully randomized population.
  o Normal: students.
  o Special cases: specific subject pools.
• Experience of subjects can be a treatment variable.
• Subject population can be a treatment variable.
• Recruiting subjects:
  o Randomized population sample: expensive, seldom practical, ideal.
  o Posters and flyers: cheap, practical, randomization depends on location/spread.
  o Direct access to special groups: necessary if subject population is a treatment variable, within group randomization should be employed.
  o Via the internet/database (TiU).
Experimental procedure:
1. Instructions
a. Welcome statement.
b. Exact description of the economic environment incl. possible states, actions,
information feedback and rules determining payoffs.
c. Examples and quizzes: exactness vs. suggestiveness.
i. Exactness: examples very specific to the experimental environment.
ii. Suggestiveness: subjects often anchor on numbers or decisions that are
presented to them.
d. Comprehensiveness vs. comprehensibility:
i. Comprehensive instructions: all subjects receive all information on the
experimental setting in exactly the same way.
ii. Comprehensibility: longer instructions are more likely to confuse or bore.
e. Typically instructions are read aloud to establish common knowledge.
2. Running pilot experiments to enhance the experimental design.
3. Debriefing: talk to subjects about their experiences after participation.
4. Payment:
a. Anonymous: spare subjects from possible peer pressure or envy.
b. Knowing that the payoff will be revealed might affect decision behavior.
5. Bailout: if subjects may go bankrupt in the experiment, it is crucial to have a bailout plan that
specifies how to deal with this problem.
6. Other aspects (extra subjects, duration, human subjects committees, subject history, etc.).
Lecture Notes 3: Data Analysis Part 1
Eyeballing raw data:



• Scatter plots:
  o Method: plot observations vs. an experimental parameter in an x-y diagram.
  o Goal: visual impression of the observation-to-parameter relationship.
• Histograms/frequency distributions:
  o Method: plot frequencies of observations vs. intervals or categories.
  o Goal: visual impression of the frequency relationships.
• Cumulative distributions:
  o Method: plot cumulative frequencies of observations vs. intervals or categories of an ordered experimental parameter.
  o Goal: visual impression of the observed distribution.
• Time series:
  o Method: plot observations vs. time or decision periods.
  o Goal: visual impression of development over time.
Eyeballing data descriptors:


• Frequently used data descriptors:
  o Location: mean, median, mode.
  o Variation: variance, standard deviation, min-max-range.
  o Correlation: (rank) correlation coefficients.
  o Benchmarking: squared or absolute deviation from theoretical prediction.
• Advantages of data descriptors:
  o Data point reduction (smoothing), easy discovery of effects.
• Disadvantages of data descriptors:
  o Obscuring variation may make non-significant effects look meaningful.
  o Variation among groups may become less clear.
• Appropriate use of descriptors:
  o Do not over-aggregate (stay on the level of independent observations).
  o Visualize variation whenever possible (e.g. use means together with min, max, 90% intervals).
  o Calculate various descriptors and mention discrepancies in results (e.g. mean vs. median).
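A small Python sketch of why several descriptors should be reported together. The contribution data are made up for illustration and the helper name `describe` is hypothetical.

```python
from statistics import mean, median, stdev

def describe(values):
    """Return location and variation descriptors for one list of observations."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),      # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical contributions of seven independent groups; one group is an outlier.
contributions = [0, 0, 1, 2, 2, 3, 20]
summary = describe(contributions)
```

Here the mean (4) is twice the median (2) because of a single extreme observation, which is exactly the kind of discrepancy that should be mentioned alongside the min-max range.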
Hypothesis testing:
• Null hypothesis vs. alternative hypothesis:
  o Null hypothesis H0: corresponds to the absence of a regularity.
  o Alternative hypothesis H1: corresponds to a regularity that is suggested by a well-founded conjecture.
• Two types of errors:
  o Type 1 error (α-error): rejection of null hypothesis H0 although it is true.
  o Type 2 error (β-error): failure to reject H0 although it is not true.
  o Statistical tests generally focus on type 1 errors.
• Significance level:
  o A significance level α is an upper bound for the error probability p under the null hypothesis H0. Given H0, the probability p of observing an outcome at least as extreme as the observed one is calculated; a small p-value indicates that the observed outcome is very unlikely under H0.
  o For p < α, H0 is rejected in favor of H1.
    ▪ H1 is a statistically ‘better’ hypothesis, but this does not mean it is the best possible explanation.
  o For p ≥ α, H0 is not rejected, but it does not follow that H0 can be accepted.
  o The smaller the significance level α, the stronger the statistical support. Typically used values are 0.01, 0.02, 0.05 (and sometimes 0.10).
• One-tailed vs. two-tailed tests:
  o One-tailed: the alternative hypothesis H1 contains the direction of the effect.
  o Two-tailed: the alternative hypothesis H1 does not contain the direction of the effect.
  o A one-tailed test allows rejection of H0 at an increased level of confidence relative to the two-tailed test.
Choosing an appropriate test:
• Levels of measurement:
  o Nominal: classified observations.
    ▪ Example: A or B.
    ▪ Descriptors: mode, frequency counts.
    ▪ Tests: binomial test, Fisher’s exact test, χ²-test.
  o Ordinal: ranked observations.
    ▪ Example: low, medium or high.
    ▪ Descriptors: median, percentiles, rank correlation coefficients.
    ▪ Tests: sign test, median test, signed-rank test, Mann-Whitney U test, Kolmogorov-Smirnov test.
  o Interval (cardinal): ranked observations with a measure of distance.
    ▪ Example: all kinds of numerical measurements.
    ▪ Descriptors: mean, variance.
    ▪ Tests: randomization test and under certain circumstances parametric tests (e.g. t test, F test).
  o Interval level data are a subset of ordinal level data, which are a subset of nominal level data. The smaller the set, the more information is available.
• Types of data:
  o Binary.
  o Discrete.
  o Continuous.
• Data structure:
  o One sample: single observation from a single subject group.
  o Two (or more) related samples: multiple observations from a single subject group, control by blocking (within-subject design).
  o Two (or more) independent samples: multiple observations from several independent subject groups, control by randomization (between-subject design).
• Parametrics vs. non-parametrics:
  o Parametric tests:
    ▪ Basic assumptions on the distribution of the data or the errors are made.
    ▪ Distributional assumptions allow powerful tests that rely on the distribution parameters.
    ▪ Advantage: easy to construct confidence intervals.
    ▪ Disadvantage: results are only valid if the distributional assumptions are valid. By the central limit theorem, the assumptions are usually acceptable if many independent observations are available.
  o Non-parametric tests:
    ▪ Fewer basic assumptions on the distribution.
    ▪ Advantages:
      - Results are valid no matter what the underlying distributions were.
      - Some non-parametric tests may be applied appropriately to ordinal data and others to data in a nominal or categorical scale.
    ▪ Disadvantage: quantification of observed significant differences is generally not possible.
Non-parametric tests for one sample:
• Binomial test (binary data):
  o Most basic test for a single sample of n nominal level observations that can be partitioned into two classes containing x and n-x observations respectively. The classes are assumed to have relative frequencies of q and 1-q in the population.
  o Inference can be of two types:
    ▪ Given q and 1-q are correct, is it reasonable to believe that the process of selection leading to the observed x and n-x was truly random?
    ▪ Given the selection process was truly random, is it reasonable to believe the true distribution in the population is given by q and 1-q?
  o The binomial test returns the probability p with which realizations can be observed that are as extreme or even more extreme than the current observation:
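    \[ p = P(X \ge x) = \sum_{i=x}^{n} \binom{n}{i} q^{i} (1-q)^{n-i}, \qquad \binom{n}{i} = \frac{n!}{i!\,(n-i)!} \]
    Where x = number of observations in A, n = number of observations in the sample. (If the observation is extreme at the lower end, the sum runs from 0 to x instead.)
  o Check for significance:
    ▪ p < α: the test is significant at level α.
    ▪ p ≥ α: the test is not significant.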
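The binomial p-value can be computed directly from the binomial probabilities. The following Python sketch uses only the standard library; the function names and the example numbers (9 of 10 observations in class A, q = 0.5) are illustrative assumptions, and the two-tailed rule used here (sum all outcomes that are no more likely than the observed one) is one common convention, not necessarily the one from the lecture.

```python
from math import comb

def binomial_pmf(k, n, q):
    """Probability of exactly k observations in class A out of n."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def binomial_test_upper(x, n, q=0.5):
    """One-tailed p-value P(X >= x): outcomes as extreme or more extreme at the upper end."""
    return sum(binomial_pmf(k, n, q) for k in range(x, n + 1))

def binomial_test_two_tailed(x, n, q=0.5):
    """Two-tailed p-value: add up all outcomes that are at most as likely as x."""
    observed = binomial_pmf(x, n, q)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (binomial_pmf(k, n, q) for k in range(n + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical sample: 9 of 10 subjects choose A, H0: q = 0.5.
p_one = binomial_test_upper(9, 10)        # (10 + 1) / 1024
p_two = binomial_test_two_tailed(9, 10)   # (1 + 10 + 10 + 1) / 1024
```

With α = 0.05 both p-values (about 0.011 and 0.021) are significant, while at α = 0.01 neither is.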
• The χ² goodness-of-fit test:
  o Assess the degree of correspondence between the observed and expected observations in each category, based on the null hypothesis (goodness of fit).
  o H0 states the proportion of observations falling in each of the categories, which we can use to deduce the expected frequencies. The test gives the probability that the observed frequencies could have been sampled from a population with the expected frequencies.
    \[ X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]
    Where O_i = observed number of cases in the ith category, E_i = expected number of cases in the ith category, k = number of categories.
  o The sampling distribution of X² under H0 follows the χ² distribution with df = k-1. To obtain p, look at the table.
  o If the probability p of occurrence under H0 of the obtained X² for df = k-1 is smaller than α, reject H0.
  o Limitations:
    ▪ When k = 2 (two categories), each expected frequency should be at least 5.
    ▪ When k > 2, the test should not be used if more than 20% of the expected frequencies are less than 5 or when any expected frequency is less than 1.
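A minimal Python sketch of the goodness-of-fit statistic. The function name and the example counts are hypothetical; the p-value itself is still read from a χ² table, as in the notes.

```python
def chi_square_gof(observed, expected_props):
    """Return (X², df, expected counts) for observed counts and H0 proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]           # E_i under H0
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                # df = k - 1
    return statistic, df, expected

# Hypothetical data: 60 choices over three options, H0: all options equally likely.
statistic, df, expected = chi_square_gof([30, 14, 16], [1 / 3, 1 / 3, 1 / 3])
# X² is about 7.6 with df = 2; the tabulated critical value for α = 0.05 is 5.991,
# so H0 would be rejected at the 5% level in this made-up example.
```

All expected frequencies are 20 here, so the limitations on small expected frequencies do not bind.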
Non-parametric test for two dependent (related) samples:

Sign test:
o
Application of the binomial test for the case of two dependent variables, with n paired
observations.
o
Each pair of observations is put into one of the classes:

+ class (before measurement > after measurement)

0 class (before = after)

- class (before < after)
o
0 class observations are dropped and n is reduced correspondingly.
o
We assume + and – are equally likely (q=0.5) and apply a binomial test.
o
If significant: reasonable to believe that before measurements are lower (higher) than
after measurements.
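Since the sign test is a binomial test with q = 0.5, it takes only a few lines of Python. The before/after numbers are invented for illustration and the function name is an assumption.

```python
from math import comb

def sign_test(before, after):
    """Return (#plus, #minus, two-tailed p) for paired observations."""
    plus = sum(b > a for b, a in zip(before, after))    # + class: before > after
    minus = sum(b < a for b, a in zip(before, after))   # - class: before < after
    n = plus + minus                                     # 0 class (ties) is dropped
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for the two-tailed test.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return plus, minus, min(1.0, 2 * one_tail)

# Hypothetical paired measurements for eight subjects.
before = [12, 15, 9, 14, 11, 13, 10, 16]
after = [10, 15, 7, 11, 12, 9, 8, 13]
plus, minus, p_value = sign_test(before, after)
```

In this example there are 6 plus signs, 1 minus sign and 1 tie, giving a two-tailed p of 0.125, which is not significant at the usual levels.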
• Wilcoxon matched-pairs signed-ranks test:
  o Extension of the sign test that uses both the signs and the relative magnitude of the differences (ranks).
  o Test construction:
    ▪ d_i = x_i − y_i, where x_i and y_i are paired observations (e.g. before and after).
    ▪ r_i = rank of |d_i| (where ranks of tied differences are averaged and 1 is the lowest difference).
    ▪ s_i = signed rank of d_i (where s_i = r_i if d_i > 0 and s_i = −r_i if d_i < 0).
    ▪ T+ = sum of all positive s_i’s.
    ▪ T− = absolute sum of all negative s_i’s.
    ▪ T = min(T+, T−).
  o Compare T to the critical value for the chosen α.
  o This test is more efficient than the sign test, since it uses both the signs and the ranks of the differences.
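The construction of the signed-rank statistic can be sketched in Python as follows. The function name and data are hypothetical, and dropping zero differences is an assumption carried over from the sign test; the resulting T is still compared with a tabulated critical value.

```python
def wilcoxon_signed_rank(x, y):
    """Return (T_plus, T_minus, T) for paired observations x and y."""
    # Assumption: pairs with a zero difference are dropped, as in the sign test.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the absolute differences are tied.
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1       # average rank of the tied block (1 = smallest)
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)

# Hypothetical paired measurements (same numbers as a before/after comparison).
before = [12, 15, 9, 14, 11, 13, 10, 16]
after = [10, 15, 7, 11, 12, 9, 8, 13]
t_plus, t_minus, t = wilcoxon_signed_rank(before, after)
```

For these numbers the seven non-zero differences give a positive rank sum of 27 and a negative rank sum of 1, so T = 1.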
Lecture Notes 4: Data Analysis Part 2
Non-parametric tests for two independent samples:
• Fisher’s exact test:
  o A test for two independent samples that fall into two distinct classes.
  o 2x2 contingency table:

                Class 1   Class 2   Total
    Sample I    A         B         A+B
    Sample II   C         D         C+D
    Total       A+C       B+D       N = A+B+C+D

  o The greater the difference between the samples, the more the observations concentrate on one of the diagonals (i.e. large A+D or B+C).
  o Probability p for observing exactly such a distribution of frequencies:
    \[ p = \frac{(A+B)!\,(C+D)!\,(A+C)!\,(B+D)!}{N!\,A!\,B!\,C!\,D!} \]
  o If no cell frequency is 0, there are ‘more extreme’ distributions to be considered: the probability of the observed and all of the more extreme distributions should be added up to p. So, make the distribution more extreme and add those p’s until one of the cells equals 0.
  o Significance is checked by comparing total p to the chosen level of α.
  o If significant, this is statistical evidence for the difference between the two samples concerning the distribution over the two classes.
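The "add the more extreme tables" procedure can be sketched in Python. This is a one-tailed version under the stated assumptions; the function names and the example table are hypothetical. The single-table probability is written with binomial coefficients, which is algebraically the same as the factorial formula.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of one 2x2 table given fixed row and column totals."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def fisher_exact_one_tailed(a, b, c, d):
    """Sum the observed table and all more extreme tables in the observed direction."""
    p = 0.0
    if a * d >= b * c:
        # Observations concentrate on the A-D diagonal: move mass from B and C.
        while b >= 0 and c >= 0:
            p += table_probability(a, b, c, d)
            a, b, c, d = a + 1, b - 1, c - 1, d + 1
    else:
        # Observations concentrate on the B-C diagonal: move mass from A and D.
        while a >= 0 and d >= 0:
            p += table_probability(a, b, c, d)
            a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return p

# Hypothetical table: Sample I = (7, 1), Sample II = (2, 6).
p_value = fisher_exact_one_tailed(7, 1, 2, 6)
```

For this table only one more extreme distribution exists, (8, 0, 1, 7), and the two probabilities add up to 232/11440, about 0.02.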
• Median test:
  o Application of Fisher’s exact test to the case in which classes are constructed by comparing each observation to the median of all observations. So, data must be at least at the ordinal level.
  o 2x2 contingency table:

                > median   < median   Total
    Sample I    A          B          A+B
    Sample II   C          D          C+D
    Total       A+C        B+D        N = A+B+C+D

  o Calculate the median, dispose of every observation that equals the median.
  o Add the extreme options and calculate p.
  o If significant, then it is reasonable to believe that the two samples differ in the location of the observed (tested) variable.
• Mann-Whitney U test:
  o To test differences in two independent samples. Most widely used non-parametric test, and the non-parametric alternative to Student’s t-test.
  o Two independent samples with sizes n1 and n2, with n1 ≤ n2.
    ▪ Sample 1 has observations x_1, ..., x_n1.
    ▪ Sample 2 has observations y_1, ..., y_n2.
  o Rank all n1 + n2 observations in increasing order (1 is lowest). If the observations from both samples are similar, then the average ranks of the two samples should have similar values. The test statistic U takes this into account.
  o R1 = sum of ranks of sample 1.
    \[ U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \]
  o R2 = sum of ranks of sample 2.
    \[ U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 \]
  o The test statistic is U = min(U1, U2). Critical values of U for given α can be checked in tables.
  o If observed value U < critical value, reject H0 in favor of H1.
  o If significant, then it is reasonable to believe that the two samples differ in size of the observed (tested) variable.
  o The Mann-Whitney U test is more efficient than the median test, since it uses both whether observations are smaller/larger than the median and the ranks of the observations.
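A Python sketch of the U statistic, assuming the common textbook form U = n1·n2 + n(n+1)/2 − R for each sample and taking the smaller of the two values. The function names and the data are hypothetical; tied observations receive average ranks.

```python
def average_ranks(values):
    """Ranks in increasing order (1 = lowest); tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, U) with U = min(U1, U2)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))   # joint ranking
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])              # rank sums per sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)

# Hypothetical independent observations from two treatments.
u1, u2, u = mann_whitney_u([3, 5, 8], [6, 9, 10, 12])
```

Here R1 = 7 and R2 = 21, so U1 = 11, U2 = 1 and U = 1; as a check, U1 + U2 always equals n1·n2.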
• The χ² test for two independent variables:
  o Data consist of frequencies in discrete categories. We test whether the groups differ w.r.t. the relative frequency with which group members fall into several categories.
  o Contingency table: columns represent groups, rows represent categories of the measured variable. The observed frequency of the ith category for the jth group is O_ij.
  o Null hypothesis: groups are sampled from the same population.
    \[ X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
    Where O_ij = observed number of cases in the ith row of the jth column, E_ij = expected number of cases in the ith row of the jth column, r = number of rows, c = number of columns.
  o Sampling distribution of X² under H0 follows the χ² distribution with df = (r-1)(c-1).
  o Under the assumption of independence, the expected frequency of observations in each cell should be proportional to the distribution of row and column totals: E_ij = (total of row i × total of column j) / N.
  o If the probability p of occurrence under H0 of the obtained X² for df = (r-1)(c-1) is smaller than the chosen value of α, reject H0.
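A short Python sketch of how the expected frequencies and the X² statistic follow from the row and column totals. The function name and the example table are hypothetical; the p-value is again looked up in a χ² table.

```python
def chi_square_independence(table):
    """Return (X², df, expected) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency under H0: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical 2x2 table: rows are categories, columns are groups.
statistic, df, expected = chi_square_independence([[20, 10], [10, 20]])
# X² is about 6.67 with df = 1; the tabulated critical value for α = 0.05 is 3.841.
```

Every expected frequency equals 15 in this example, so the statistic is 4 × 25/15.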
RANK CORRELATION ANALYSIS
Lecture Slides
Lecture Slides Introduction Lecture EvdH
Goal of empirical economics: one wants to draw inferences, or make predictions about new policy
measures and examine the effects of treatments.
Proper counterfactual is important!
Notation:
• Y_i(1): outcome with treatment.
• Y_i(0): outcome without treatment.
Treatment effect for person i: Y_i(1) − Y_i(0).
Controlled methods: lab experiments. This is a method of creating the counterfactual, since a control
group is constructed via randomization (can be used to create variation among participants). This
avoids confounding effects.
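Why randomization creates a usable counterfactual can be illustrated with a small simulation. Everything in this Python sketch is an assumption made for illustration: the outcome distribution, the constant treatment effect of 3 and the function name are invented, and in a real experiment only one of the two potential outcomes is ever observed per person.

```python
import random
from statistics import mean

def simulate_experiment(n=10_000, true_effect=3.0, seed=1):
    """Estimate the average treatment effect by a difference in group means."""
    rng = random.Random(seed)
    # Potential outcomes: without treatment and with treatment (hypothetical values).
    y_without = [rng.gauss(10.0, 2.0) for _ in range(n)]
    y_with = [y + true_effect for y in y_without]
    # Randomization: half of the subjects are drawn into the treatment group.
    treated = set(rng.sample(range(n), n // 2))
    observed_treated = [y_with[i] for i in range(n) if i in treated]
    observed_control = [y_without[i] for i in range(n) if i not in treated]
    # The control group mean stands in for the missing counterfactual.
    return mean(observed_treated) - mean(observed_control)

estimate = simulate_experiment()
```

Because assignment is random, the control group resembles the treated group in everything except the treatment, so the difference in means should land close to the true effect.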
Lecture Slides Part 1
Economic experiment: controlled economic environment in which experimental subjects make
decisions that the experimenter records for the purpose of scientific analysis.
Controlled economic environment: individual economic agents together with an institution through
which the agents interact.
A ‘good’ experiment is economic reality, because the decisions have economic effects.
Data sources:

Experimental data (deliberately, controlled conditions) vs. happenstance data (by-product,
uncontrolled processes).

Laboratory data (artificial environment) vs. field data (natural environment).
Six criteria that define the field context of an experiment:
1. Nature of the subject pool
2. Nature of the information that the subjects bring to the task
3. Nature of the commodity
4. Nature of the task or trading rules applied
5. Nature of the stakes
6. Nature of the environment that the subject operates in
A field experiment bridge (Harrison and List, 2004):
• Conventional Lab Experiment (Lab): standard subject pool of students, abstract framing, imposed set of rules.
  o e.g. measure cooperation among fishermen: students in the lab, abstract framing.
• Artefactual Field Experiment (AFE): same as Lab, but with a non-standard subject pool.
  o e.g. measure cooperation among fishermen: fishermen in the lab, abstract framing.
• Framed Field Experiment (FFE): same as AFE, but with field context in either the commodity, task, or information set that subjects can use.
  o e.g. measure cooperation among fishermen: fishermen at the pond, catching fish.
• Natural Field Experiment (NFE): same as FFE, but where the environment is one where the subjects naturally undertake these tasks and do not know they are in an experiment.
  o e.g. experiment in charitable fundraising: letters asking to contribute to charity, each dollar matched with 1 dollar/0.50 dollar.
Field Data
Real and detailed
Not controlled, limited analysis
Validity, sometimes limited
Given environment, no repetition, no deliberate
manipulation
Experimental Data
Simplified economic reality
Control to isolate effects and to learn causalities
step by step
Internal and external validity questionable
Induced-value theory
Parallelism and inductive reasoning
Vernon Smith, parallelism precept: propositions about the behavior of individuals and the
performance of institutions that have been tested in laboratory microeconomics apply also to non-laboratory economies where similar ceteris paribus conditions hold. This is a presumption of external validity.
Advantages of economic experiments
Replicability
Control
Limitations of economic experiments
Internal validity (population validity)
External validity (environmental validity)
Types of experiments:

Test of behavioral hypotheses/theory falsification:
Chamberlin’s results inspired Smith to look at trading rules.

Theory stress test/sensitivity test:
Oligopoly market with/without sunk costs.

Searching for empirical regularities.
Induced-value theory: a reward medium is used to induce specific preferences of subjects. Three
conditions must hold:

Monotonicity: more reward is better than less, no satiation.

Salience: rewards are explicitly and unambiguously connected to subject’s actions/decisions.

Dominance: changes in subject’s utility from the experiment come mainly from the reward
medium and other subjective costs or benefits are rendered negligible by comparison.
Lecture Slides Part 2
Independence requirement: a stochastic variable X is independent of stochastic variable Y, if the
conditional distribution of X does not vary with the value of Y.

Procedural independence: observations are assumed to be independent, because they are
generated by subjects that could not interact, due to the experimental procedure.

Hypothesized independence: observations are assumed to be independent, even though
subjects could have interacted given the experimental procedure. This is only justified if it is
statistically tested.
Uniformity requirement:

Comparison of observations is only feasible if the ceteris paribus assumption holds.

Uniformity may be assumed, if all observations are generated by the same experimental
procedure and process of recruitment.

Systematic biases should be avoided by using standardization of control if possible and
randomization if control is not possible.
Basic ways to control for the effect of nuisance variables:

Control by randomization: between-subject design. Each subject participates in one treatment.

Control by blocking: within-subject design. Each subject participates in more than one
treatment.
Lecture Slides Part 3b
Experiment 1: Investment decision

Measure how much risk subjects are willing to take.

Design:
  o Investment S: return = 1.
  o Investment R: return = 0 with probability 50%, return = 2.5 with probability 50%.
  o Three simultaneous decisions; one chosen at random for payment.
• Dreber, Rand, Garcia, Wernerfelt, Lum, and Zeckhauser (2010):
  o 105 males: 79.5%.
  o 81 females: 48.0%.
  o Wilcoxon rank-sum test: Z = 5.91 (p<0.001).
  o T-test: t = 6.35 (p<0.001).
• Dreber and Hoffman (2007):
  o 92 males: 68.8%.
  o 54 females: 49.6%.
  o Wilcoxon rank-sum test: Z = 3.77 (p<0.001).
  o T-test: t = 3.79 (p<0.001).
• Average over three decisions:
  o 24 males: 65.8%.
  o 14 females: 48.5%.
  o Wilcoxon rank-sum test: Z = 2.11 (p=0.035).
  o T-test: t = 2.01 (p=0.052).
Factors affecting a person’s willingness to take risk:

Stakes.

Wealth and income.

Timing: sequential vs simultaneous.
Experiment 2: pairwise lottery choices.
• Problem 1                                        Class   (KT)
    A: 100% chance for $3000                        55%    (80%)
    B: 80% chance for $4000, 20% chance for $0      45%    (20%)
• Problem 2                                        Class   (KT)
    A: 20% chance for $4000, 80% chance for $0      73%    (65%)
    B: 25% chance for $3000, 75% chance for $0      27%    (35%)
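Expected Utility Theory:
  o Method to value risky options: EU(L) = \sum_i p_i u(x_i), where outcome x_i occurs with probability p_i. Take the option with the highest EU.
  o Problem 1: preferring A to B implies u(3000) > 0.8 u(4000) + 0.2 u(0).
  o Problem 2: preferring A to B implies 0.2 u(4000) + 0.8 u(0) > 0.25 u(3000) + 0.75 u(0).
  o Multiplying the Problem 2 inequality by 4 and simplifying gives 0.8 u(4000) + 0.2 u(0) > u(3000), which contradicts the Problem 1 inequality.
• Inconsistent with Expected Utility Theory!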
Prospect Theory (Kahneman and Tversky, 1979):
• People typically do not weigh outcomes linearly with the corresponding probabilities.
• Outcomes are weighed non-linearly in probabilities: a probability p enters the valuation through a decision weight \pi(p) rather than through p itself.
• People are risk averse for positive outcomes, and risk seeking for negative outcomes.
• Reflection effect: risk attitude (curvature of the utility function) is different for positive outcomes (gains) than for negative outcomes (losses).

Properties that capture many of the descriptive failures of Expected Utility Theory:
o
Reference-dependence: changes rather than final states are the carriers of utility.
o
Loss aversion: losses loom larger than gains.
o
Diminishing sensitivity: risk aversion for gains, risk loving for losses.
o Probability weighting: small probabilities are overweighted.
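Prospect theory:
  o A prospect (x, p; y, q) is evaluated by V(x, p; y, q) = \pi(p) v(x) + \pi(q) v(y), while v(0) = 0, \pi(0) = 0 and \pi(1) = 1.
• Value function v:
  o Concave for gains: v''(x) < 0 for x > 0.
  o Convex for losses: v''(x) > 0 for x < 0.
  o Steeper for losses than for gains: v'(-x) > v'(x) for x > 0.
  o Often used form: v(x) = x^\alpha for x \geq 0, and v(x) = -\lambda (-x)^\beta for x < 0.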
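As a rough numerical illustration of the evaluation rule, the sketch below applies a power value function and an inverse-S weighting function to Problems 1 and 2 of Experiment 2. The parameter values (0.88, 2.25, 0.61) and the specific weighting function are assumptions taken from the later Tversky and Kahneman (1992) specification, not from these slides, and the same curvature is used for gains and losses for simplicity.

```python
# Illustrative parameters (assumed, not from the slides).
ALPHA = 0.88   # curvature of the value function
LAMBDA = 2.25  # loss-aversion coefficient
GAMMA = 0.61   # curvature of the probability weighting function


def value(x):
    """Value function: concave for gains, convex and steeper for losses."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA


def weight(p):
    """Inverse-S probability weighting: small probabilities are overweighted."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)


def prospect_value(prospect):
    """Evaluate a prospect given as [(outcome, probability), ...]."""
    return sum(weight(p) * value(x) for x, p in prospect)


problem_1 = {"A": [(3000, 1.0)], "B": [(4000, 0.8), (0, 0.2)]}
problem_2 = {"A": [(4000, 0.2), (0, 0.8)], "B": [(3000, 0.25), (0, 0.75)]}

for name, problem in (("Problem 1", problem_1), ("Problem 2", problem_2)):
    scores = {option: prospect_value(p) for option, p in problem.items()}
    best = max(scores, key=scores.get)
    print(name, {k: round(v, 1) for k, v in scores.items()}, "->", best)
```

Under these assumed parameters option A has the higher value in both problems, which is the modal choice pattern that Expected Utility Theory cannot accommodate.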
Endowment effect:
• Once a person comes to possess a good, its value increases.
• Kahneman, Knetsch, Thaler 1991:
  o 22 randomly selected subjects are given a coffee mug; the other 22 only look at the mug.
  o WTA (willingness to accept) is elicited for the first group, WTP (willingness to pay) for the second group.
  o Market clearing price is determined and the items are traded at that price.
  o Standard theory would predict: 11 trades expected.
  o What happened: 3 trades.
  o Prospect theory explanation: buyers treat the mug as a gain and sellers treat giving it up as a loss. Because losses loom larger than gains, the price at which an owner sells exceeds the price at which a non-owner buys, so WTA > WTP and few trades occur.
• Important for Coase theorem.
Disposition effect:

People are reluctant to sell at a nominal loss.

Odean (1998):
o
When investors sell stocks, they are more prone to sell their winners than their losers.
Lecture Slides Part 4a
Intertemporal choice:
• Many economic decisions involve an intertemporal tradeoff.
• Exponential discounting model:
  o Intertemporal utility: U_t = \sum_{\tau = t}^{T} \delta^{\tau - t} u_\tau.
  o Where \delta \in (0, 1] is the discount factor.
    ▪ \delta < 1: delays make consequences count less.
    ▪ You want to do actions with positive utility as early as possible; actions with negative utility you want to postpone.
  o Time-consistency: intertemporal preferences depend on the delay (k) between two periods, not how far in the future these periods are (s). If you prefer plan A over B now, you should also prefer plan A over B as time passes.
• Many people are time-inconsistent.
  o Whether they prefer A over B depends on when you ask them.
  o Many people realize this to some extent and try to commit themselves or tie their hands.
• Hyperbolic discounting (β-δ model):
  o U_t = u_t + \beta \sum_{\tau = t+1}^{T} \delta^{\tau - t} u_\tau.
  o \delta is the standard discount factor, \beta \in (0, 1] is the present-bias.
    ▪ \beta < 1: everything in the future counts less than the present.
    ▪ Captures the tendency to want pleasant things now (immediate gratification) and postpone unpleasant things (procrastination).
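A small sketch may help to see how present bias produces time-inconsistency. The amounts (100 vs. 110), the one-period delay, and the parameter values β = 0.7 and δ = 0.99 are hypothetical and chosen only for illustration.

```python
def discounted_value(amount, delay, beta=1.0, delta=0.99):
    """Present value of `amount` received `delay` periods from now.

    beta = 1 gives exponential discounting; beta < 1 adds present bias.
    """
    if delay == 0:
        return amount
    return beta * delta ** delay * amount


def prefers_later(start, beta, delta=0.99, small=100, large=110):
    """True if `large` at start+1 is preferred to `small` at start."""
    later = discounted_value(large, start + 1, beta, delta)
    sooner = discounted_value(small, start, beta, delta)
    return later > sooner


for label, beta in (("exponential", 1.0), ("beta-delta", 0.7)):
    print(label, "today:", prefers_later(0, beta),
          "in 30 periods:", prefers_later(30, beta))
```

With β = 1 the ranking of the two options does not depend on when the choice is made. With β < 1 the larger, later amount is preferred when both options lie in the future, but the smaller amount wins once it becomes immediately available.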
Probability judgments:

Many decisions are based on beliefs concerning the likelihood of uncertain events.

Determination of those beliefs:
o
Standard economics (rational choice): people adhere to the laws of statistics.
o
Psychology: people use heuristics (mental shortcuts, which speed up cognition, but
sometimes lead to mistakes).

Bayes’ Rule:
  o P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|\neg A) P(\neg A)}.
Judgment heuristics:

Representativeness: infer from a (small) sample on the basis of how representative it is for the
parent population.

Implication of representativeness:
o
Base rate neglect: when assessing P(A|B), people focus on P(B|A) and neglect the
prior probabilities P(A)/P(B).
o
Gambler’s fallacy: the belief that HHHH is less likely than HTHT, although for a fair coin both sequences are equally likely.
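To make base rate neglect concrete, the following sketch applies Bayes' rule to a made-up diagnostic example. The numbers (a 1% prior, a 90% hit rate and a 10% false-positive rate) are hypothetical.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) from Bayes' rule with the law of total probability."""
    joint_a = p_b_given_a * prior_a
    joint_not_a = p_b_given_not_a * (1 - prior_a)
    return joint_a / (joint_a + joint_not_a)


# Hypothetical numbers: A = "has the condition", B = "test is positive".
p = posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.10)
print(f"P(A|B) = {p:.3f}")
```

Someone who focuses on P(B|A) = 0.90 and ignores the low prior would greatly overestimate P(A|B), which here stays below 10%.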
Lecture Slides Part 5b
Theory of demand and supply:
• Normally, we do not observe supply and demand functions, so the theory is hard to test using field data.
• In the lab: assign costs to sellers and values to buyers → create S and D curves.
  o We can derive the equilibrium price, quantity and surplus of the theory.
  o To test the theory, we compare the observations to the corresponding predictions.
  o Pioneered by Vernon Smith.
• Continuous Double Auction Market:
  o Market institution (rules of trade):
    ▪ Market opens for a fixed period of time.
    ▪ At any time, buyers and sellers can send public offers and accept outstanding offers to buy or sell. If accepted, a transaction occurs at that price.
    ▪ Similar to the rules on financial markets.
  o Typical results:
    ▪ Prices and quantities converge to near the competitive equilibrium with repetitions.
    ▪ Efficiency is usually above 95%.
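The benchmark against which a double-auction session is compared can be computed directly from the induced values and costs. The sketch below assumes each buyer wants one unit and each seller has one unit; the function names and the example numbers are hypothetical.

```python
def competitive_equilibrium(values, costs):
    """Return (quantity, price_range, max_surplus) for unit demands/supplies."""
    demand = sorted(values, reverse=True)  # highest-value buyer first
    supply = sorted(costs)                 # lowest-cost seller first
    quantity, surplus = 0, 0
    for v, c in zip(demand, supply):
        if v < c:
            break
        quantity += 1
        surplus += v - c
    if quantity == 0:
        return 0, None, 0
    # The price must keep the marginal traders in and the next pair out.
    low = supply[quantity - 1]
    high = demand[quantity - 1]
    if quantity < len(demand):
        low = max(low, demand[quantity])
    if quantity < len(supply):
        high = min(high, supply[quantity])
    return quantity, (low, high), surplus


def efficiency(trades, max_surplus):
    """Share of the maximum surplus realized by observed (value, cost) trades."""
    return sum(v - c for v, c in trades) / max_surplus


# Hypothetical induced values and costs.
quantity, price_range, max_surplus = competitive_equilibrium([10, 8, 6, 4], [3, 5, 7, 9])
print(quantity, price_range, max_surplus)
print(efficiency([(10, 5), (6, 3)], max_surplus))
```

Efficiency is the realized surplus divided by the maximum surplus, so a session in which an extra-marginal buyer displaces an infra-marginal one scores below 100%.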
Auctions:

Used to allocate goods in fixed supply, or in fixed demand.

Many possible formats: open (Dutch, English), closed (First-price, Second-price/Vickrey).

Revenue Equivalence Theorem (Vickrey, 1961): the four basic auction types all raise the same
amount of money, under certain assumptions (bidders maximize expected payoffs,
equilibrium).

Equilibrium bidding:
  o First-price Sealed-bid auction:
    ▪ Bidders bid simultaneously.
    ▪ Highest bidder wins, and pays highest bid.
    ▪ Equilibrium bids: b_i < v_i (bids are shaded below value). With n risk-neutral bidders and independent, uniformly distributed values, b_i = \frac{n-1}{n} v_i.
  o Second-price Sealed-bid auction:
    ▪ Bidders bid simultaneously.
    ▪ Highest bidder wins, pays second-highest bid.
    ▪ Equilibrium bids: b_i = v_i.
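A quick simulation can illustrate revenue equivalence between the two sealed-bid formats. It is only a sketch under stated assumptions: risk-neutral bidders, independent values drawn uniformly from [0, 1], first-price bids of (n-1)/n times the value, and truthful bidding in the second-price auction.

```python
import random


def simulate_revenue(n_bidders=2, n_auctions=100_000, seed=1):
    """Average seller revenue in first- and second-price sealed-bid auctions."""
    rng = random.Random(seed)
    first_total = second_total = 0.0
    for _ in range(n_auctions):
        values = sorted(rng.random() for _ in range(n_bidders))
        # First-price: the winner pays her own bid, (n-1)/n times her value.
        first_total += (n_bidders - 1) / n_bidders * values[-1]
        # Second-price: bids equal values; the winner pays the second-highest bid.
        second_total += values[-2]
    return first_total / n_auctions, second_total / n_auctions


first, second = simulate_revenue()
print(f"first-price: {first:.3f}, second-price: {second:.3f}")
```

Both averages should be close to (n-1)/(n+1), which is 1/3 for two bidders. Persistent overbidding in the lab shows up as observed first-price revenue above this benchmark.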
Experimental test of Revenue Equivalence Theorem:
o
Difficult with field data (values unobserved, value distributions may vary with auction format). In the lab, we can induce values and keep distributions constant.
o
There is persistent overbidding in sealed-bid auctions (relative to the theoretical
equilibrium) in both second-price and first-price auctions.

Risk aversion.

Bounded rationality.

Regret: what is best ex-ante may not turn out to be best ex-post.

Winner regret: could have won with a lower bid.

Loser regret: could have made a profit with a higher bid.
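Duopoly experiments:
• Cournot model (quantity competition):
  o Two firms, each sets quantity q_i.
  o Inverse demand function: P = P(Q), with Q = q_1 + q_2.
  o Production cost: C(q_i).
  o Profits: \pi_i = P(Q) q_i - C(q_i).
  o Cournot equilibrium: each firm's quantity maximizes its profit given the other firm's quantity, hence q_1^* and q_2^* are mutual best responses.
• Bertrand model (price competition):
  o Two firms, each sets price p_i.
  o Demand function: D_i(p_i, p_j) = D(p_i) if p_i < p_j, D(p_i)/2 if p_i = p_j, and 0 if p_i > p_j.
  o Production costs: constant marginal cost c.
  o Profits: \pi_i = (p_i - c) D_i(p_i, p_j).
  o Bertrand equilibrium: p_1^* = p_2^* = c, hence \pi_1 = \pi_2 = 0.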
Bertrand vs. Cournot:
  o Bertrand market is less competitive than predicted:
    ▪ Average observed prices are higher than equilibrium prices.
  o Cournot market is more competitive than predicted:
    ▪ Average observed quantities are higher than equilibrium quantities.
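The two benchmark predictions can be computed for a concrete parameterization. The linear inverse demand P = a - Q, the constant marginal cost c, and the values a = 100 and c = 10 are assumptions made for this sketch, not the parameters used in the lecture.

```python
def cournot_best_response(q_other, a=100.0, c=10.0):
    """Profit-maximizing quantity given the rival's quantity (P = a - Q)."""
    return max(0.0, (a - c - q_other) / 2)


def cournot_equilibrium(a=100.0, c=10.0, rounds=200):
    """Iterate simultaneous best responses until they settle."""
    q1 = q2 = 0.0
    for _ in range(rounds):
        q1, q2 = cournot_best_response(q2, a, c), cournot_best_response(q1, a, c)
    return q1, q2


a, c = 100.0, 10.0
q1, q2 = cournot_equilibrium(a, c)
cournot_price = a - (q1 + q2)
bertrand_price = c  # price competition drives price down to marginal cost
print(q1, q2, cournot_price, bertrand_price)
```

In this example each Cournot firm produces (a - c)/3, so the Cournot price stays above marginal cost while the Bertrand price equals it. The experimental findings above are deviations from these two reference points.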
Lecture Slides Part 6b
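Game theory:
• A game (in normal form):
  o 2 players, i = 1, 2.
  o Strategies: s_i \in S_i.
  o Payoffs: \pi_i(s_1, s_2).
• Nash equilibrium:
  o A pair of strategies (s_1^*, s_2^*) such that:
    ▪ \pi_1(s_1^*, s_2^*) \geq \pi_1(s_1, s_2^*) for all s_1 \in S_1.
    ▪ \pi_2(s_1^*, s_2^*) \geq \pi_2(s_1^*, s_2) for all s_2 \in S_2.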
No player can earn a higher profit by deviating from an equilibrium unilaterally.
o
Each player maximizes her payoffs given her beliefs, and beliefs are constant.
Learning in games:
• Reinforcement learning:
  o Tendency to play a particular strategy increases with the payoff you received in the past when playing that strategy.
  o Also displayed by animals.
• Belief learning:
  o Update your beliefs about others on the basis of their past play.
  o Play a best response given these updated beliefs.
Guessing game (aka Beauty Contest Game):

Choose a number between 0 and 100, win when your guess is closest to 2/3 of the average.

Equilibrium: Iterated Elimination of Dominated Strategies:
o
The average cannot be larger than 100, so it cannot be optimal to choose a number
larger than 2/3 * 100 = 67.
o
If no-one chooses a number larger than 67, it cannot be optimal to choose a number
larger than 2/3 * 67 = 44.
o
Iterating this process of elimination eventually leads to the conclusion that it cannot be
optimal to choose a number larger than 0.

The equilibrium relies on the assumptions:
o
Each player is rational.
o
Each player believes the other players are rational.
o
Each player believes that the other players believe that other players are rational… etc.
Rationality is common knowledge.

Irrationality is a multiplier: if you believe others are irrational → adjust your strategy.
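The elimination argument can be traced numerically. The second function is an added illustration of adjusting to less-than-fully-rational opponents: it uses a level-k style rule in which level-0 players guess 50 on average and each higher level best responds to the level below. That rule and the value 50 are assumptions, not part of the slides.

```python
def elimination_bounds(p=2 / 3, upper=100.0, rounds=10):
    """Upper bounds on undominated guesses after each elimination round."""
    bounds = []
    for _ in range(rounds):
        upper *= p
        bounds.append(upper)
    return bounds


def level_k_guess(k, p=2 / 3, level0_mean=50.0):
    """Guess of a level-k player who best responds to level-(k-1) players."""
    return level0_mean * p ** k


print([round(b, 1) for b in elimination_bounds(rounds=5)])
print([round(level_k_guess(k), 1) for k in range(4)])
```

The bounds shrink geometrically towards 0, the equilibrium guess. A player who expects others to stop after one or two steps of reasoning should guess well above 0.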
Conclusions:



Nash equilibrium predicts rather well:
o
In simple games.
o
In more complex games, after learning.
o
When there is a unique equilibrium.
o
When there is no clear conflict between rationality and efficiency or equity.
When there are multiple equilibria, coordination failure is frequent and selection depends on:
o
Riskiness,
o
Efficiency,
o
Focality of the different equilibria.
When an equilibrium is either very inefficient or very unequal, social preferences may start to
play a role.
Lecture Slides Part 8a
Social preferences:

In many economic settings, there are interdependencies and externalities (people are affected
by the actions taken by others).

Standard economic models: all people are always exclusively motivated by their own interest
and do not care about these effects on others.

Evidence: self-interest assumption is wrong, people have ‘social preferences’:
  o Private life: family, friends, neighbors.
  o Public life: work, leisure, travel, cyberspace.
• Ultimatum game:
  o 2 players divide a sum of 10: the proposer proposes a division, the responder accepts or rejects (after a rejection both get nothing).
  o Standard theory (rationality, self-interest):
    ▪ Responder accepts any positive amount.
    ▪ Proposer proposes 9.99 to himself and 0.01 to responder.
  o Observation:
    ▪ Average offer percentage is 65% for the proposer and 35% for the responder.
    ▪ Modal (most common) is 50-50 split.
    ▪ A majority of offers giving the responder less than 20% is rejected.
  o Why offer positive amounts:
    ▪ Fear of rejection.
    ▪ Altruism/fairness.
    ▪ Negative reciprocity: punish someone who has been (very) selfish.
• Dictator game:
  o 2 players divide a sum of 10, proposed division is implemented.
  o Standard theory:
    ▪ Responder has to accept any offer.
    ▪ Proposer proposes 10 to himself and 0 to responder.
  o Observation:
    ▪ Altruism: 64% of the proposers give positive offers.
    ▪ 35% give 0, 31% give 40% or more.
    ▪ Lower average offers in Dictator Game than in Ultimatum Game, so fear of rejection also plays a role.
• Henrich et al. (2001): In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies.
  o Average UG offers 28-58% (compared to 44% in industrial societies).
  o Rejection rate of offers 0-40%. Variation in rejection rates of low offers (<20%) is even larger.

Trust game:
  o Test positive reciprocity (reward someone who has been kind).
  o Player C sends an amount X out of an endowment of 10; the amount is tripled, and player D returns an amount Y.
  o Player C’s payoff: 10 – X + Y, player D’s payoff: 3X – Y.
  o Standard prediction: X = 0, Y = 0.
  o Results:
    ▪ On average, player D returns a little bit less than player C sent. The more he receives, the more he returns.
    ▪ Player C sends on average around 50% to D. Trust yields slightly negative profits on average.
    ▪ Player C gets highest expected profit by sending all his money.
    ▪ On average, men trust more than women, but women reciprocate more.
Modeling social preferences:
• Models of inequity aversion (Fehr and Schmidt, 1999; Bolton and Ockenfels, 2000).
• Two components:
  o Self-centered motivation: person’s own payoff (x_i).
  o Other-regarding motives, care for fairness: inequity aversion (\alpha_i, \beta_i).
  o U_i(x_i, x_j) = x_i - \alpha_i \max(x_j - x_i, 0) - \beta_i \max(x_i - x_j, 0).
    ▪ \alpha_i: dislike disadvantageous inequality (envy).
    ▪ \beta_i: dislike advantageous inequality (guilt).
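As an illustration of how inequity aversion can generate rejections in the ultimatum game, the sketch below evaluates the two-player Fehr-Schmidt utility for a responder. The parameter values α = 1 and β = 0.25 and the pie of 10 are hypothetical.

```python
def fs_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility: payoff minus envy and guilt terms."""
    envy = alpha * max(other - own, 0)
    guilt = beta * max(own - other, 0)
    return own - envy - guilt


def responder_accepts(offer, pie=10, alpha=1.0, beta=0.25):
    """Accept if the split gives at least the utility of rejecting (0, 0)."""
    accept_utility = fs_utility(offer, pie - offer, alpha, beta)
    reject_utility = fs_utility(0, 0, alpha, beta)
    return accept_utility >= reject_utility


for offer in (1, 2, 3, 4, 5):
    print(offer, responder_accepts(offer))
```

With α = 0 the responder is purely self-interested and accepts every positive offer; a larger α raises the smallest acceptable offer.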
Lecture Slides Part 8b
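Volunteer dilemma game:
• Symmetric Nash equilibrium volunteer rate: each of the n players volunteers with the same probability p, chosen so that every player is indifferent between volunteering and not volunteering.
• Probability that nobody volunteers: (1 - p)^n.
• If group size increases:
  o Probability of volunteering drops.
    ▪ Supported by data.
  o Probability that nobody volunteers increases.
    ▪ Not supported by data.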
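The two comparative-static predictions can be checked with a standard parameterization of the game. The sketch assumes that everyone receives a benefit b if at least one player volunteers and that a volunteer bears a cost c < b; indifference between volunteering and not then gives (1 - p)^(n-1) = c/b. The values b = 1 and c = 0.2 are hypothetical, and the notation may differ from the one on the slides.

```python
def volunteer_rate(n, benefit=1.0, cost=0.2):
    """Symmetric mixed-strategy equilibrium probability of volunteering."""
    return 1 - (cost / benefit) ** (1 / (n - 1))


def prob_nobody(n, benefit=1.0, cost=0.2):
    """Probability that none of the n players volunteers in that equilibrium."""
    return (1 - volunteer_rate(n, benefit, cost)) ** n


for n in (2, 3, 6, 12):
    print(n, round(volunteer_rate(n), 3), round(prob_nobody(n), 3))
```

As n grows, the individual volunteering probability falls and the probability that nobody volunteers rises towards c/b.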
Experimental methodology:
• Design issues:
  o Anonymity:
    ▪ Single-blind: one subject vs. other subjects.
    ▪ Double-blind: subject vs. experimenter.
  o Incentives:
    ▪ Hypothetical vs. real (actions speak louder than words).
    ▪ Small vs. large.
    ▪ Earned vs. unearned endowment.
  o Between-subject vs. within-subject.
  o Order effects.
  o One-shot vs. repeated interaction.
  o Strategy method: ask for complete strategy.
    ▪ E.g. in UG not just ask for accept/reject in response to actual offer of proposer, but for all possible offers of proposer.
  o External validity:
    ▪ Are social preferences observed in the lab also relevant in the field?
    ▪ Field experiments.
Isolating motives for rejection in UG:
• Two possibilities:
  o Envy.
  o Revenge.
• Measure envy:
  o Have a computer generate the offer randomly. Player B accepts or rejects.
  o Rejection can only arise from envy, not from revenge.
  o Compare which offers are rejected with the standard version of UG.
  o Strategy method:
    ▪ Take players B before they see the offer they are going to get.
    ▪ Have them indicate the minimum they would accept, and hold them to it.
  o People are more willing to take a smaller share when a computer proposed it than when the other player did → revenge plays a role.
Lecture Slides Part 9a
Levels of measurement:
• Nominal/categorical/classificatory scale:
  o Examples: car license plates, states of mind, A or B, red, green, blue.
  o Appropriate descriptors: mode, frequency counts.
  o Appropriate tests: binomial test, Fisher’s exact test, χ² test.
• Ordinal/ranking scale: some relation between objects in scale.
  o Examples: socio-economic status, grades, low/medium/high.
  o Appropriate descriptors: median, percentiles, rank correlation coefficients.
  o Appropriate tests: sign test, median test, signed-rank test, Mann-Whitney U test, Kolmogorov-Smirnov test.
• Interval/cardinal scale: distances or differences between any two numbers on the scale have meaning.
  o Examples: temperature.
  o Appropriate descriptors: mean, variance, correlations.
  o Appropriate tests: randomization tests and (under some circumstances) parametric tests like t-test and F-test.
• Ratio scale: all characteristics of interval scale, plus a true zero point as its origin.
  o Examples: weight.
Data structure:
• One sample:
  o Single observation from a single subject group.
• Two (or more) related samples:
  o Multiple observations from a single subject group.
  o Control by blocking (within-subject design).
• Two (or more) independent samples:
  o Multiple observations from several independent subject groups.
  o Control by randomization (between-subject design).
Parametrics and non-parametrics:
• Parametric tests:
  o Some basic assumptions on the distribution of the data or the errors are made.
  o These assumptions allow powerful tests that rely only on the distribution parameters.
  o Advantage: easy to construct confidence intervals.
  o Disadvantage: validity of assumptions → law of large numbers → assumptions are acceptable if many independent observations are available.
  o Experimental data often appear to be highly non-normal.
• Non-parametric tests:
  o Fewer/no basic assumptions on the distribution of the data or errors.
  o Also possible with small n.
  o Advantage:
    ▪ Results are valid no matter what the underlying distribution is.
    ▪ Also possible to use data in ordinal or nominal scale.
  o Disadvantage: quantification of observed significant differences is generally not possible.
Non-parametric tests for one sample:

Binomial test (nominal or categorical data):
  o Most basic test for a single sample of n nominal level observations that can be partitioned into classes A and B, containing x and n-x observations.
  o The classes are assumed to be represented with relative frequencies of q (for A) and 1-q (for B) in the population.
  o P(X \geq x) = \sum_{i=x}^{n} \binom{n}{i} q^i (1-q)^{n-i}.
    ▪ x = number of observations in A, n = number of observations in sample.
    ▪ P \leq \alpha: significant.
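The tail probability used by the binomial test is easy to compute directly. The sketch below takes the one-sided upper tail P(X ≥ x); the direction of the tail and the example sample (9 of 10 observations in class A under H0: q = 0.5) are assumptions made for illustration.

```python
from math import comb


def binomial_upper_tail(x, n, q=0.5):
    """P(X >= x) when X ~ Binomial(n, q): one-sided binomial test p-value."""
    return sum(comb(n, i) * q ** i * (1 - q) ** (n - i) for i in range(x, n + 1))


# Hypothetical sample: 9 of 10 observations fall in class A, H0: q = 0.5.
p_value = binomial_upper_tail(9, 10)
print(f"p = {p_value:.4f}")
```

The same function covers the sign test for paired data: drop the ties, count the plusses as x, and keep q = 0.5.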
• The χ² goodness-of-fit test (nominal or categorical data):
  o Test whether there is a significant difference between an observed number of outcomes falling in each category and an expected number based on H0.
  o X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.
    ▪ O_i: observed number of cases in category i.
    ▪ E_i: expected number of cases in category i when H0 is true.
    ▪ k: number of categories.
  o Sampling distribution of X² follows the χ² distribution with df = k-1.
    ▪ X^2 \geq \chi^2_{\alpha, k-1}: significant, reject H0.
  o Limitations:
    ▪ If there are only two categories, each expected frequency should be at least 5.
    ▪ If there are more than 2 categories, the test should not be used if more than 20% of the expected frequencies are less than 5 or when any expected frequency is less than 1.

Wilcoxon Signed-Ranks test (interval or ratio data):
o
Non-parametric alternative to one sample t-test.
o
Assumption: underlying population is symmetrical.
Lecture Slides Part 9b
Non-parametric tests for two dependent (related) samples (within-subject design):

Sign test:
  o Application of the binomial test for the case of two dependent samples, with n paired observations (e.g. before and after measurement, or high costs and low costs).
  o Three classes:
    ▪ +: before measurement > after measurement.
    ▪ 0: before measurement = after measurement.
    ▪ -: before measurement < after measurement.
  o 0 class observations are dropped and n is reduced correspondingly.
  o Binomial test applied to +/- observations, with the assumption that plusses and minuses are equally likely (i.e. q = 0.5).
  o If P \leq \alpha, the test is significant and it is reasonable to believe that before measurements are lower/higher than after measurements.

Wilcoxon matched-pairs signed-ranks test:
  o Uses both the signs and the relative magnitudes (ranks) of differences.
  o Test construction:
    ▪ d_i = x_i - y_i, where x_i and y_i are paired observations.
    ▪ r_i = ranks of |d_i|, where ranks of tied differences are averaged.
    ▪ s_i = signed rank of d_i: s_i = r_i if d_i > 0, and s_i = -r_i if d_i < 0.
    ▪ T^+ and T^-, where T^+ = sum of all positive s_i’s and T^- = sum of all negative s_i’s.
    ▪ T = \min(T^+, |T^-|).
  o Compare T to the critical value for the chosen α.
Lecture Slides Part 10a
Non-parametric tests for two independent samples (between-subject design):

Fisher exact test:
  o A test for two independent variables that fall into two distinct classes.
  o 2x2 contingency table with cell frequencies A, B (first row) and C, D (second row), and N = A + B + C + D.
  o The greater the difference between the samples, the higher the occupation in the diagonals (A+D or B+C is very large).
  o Probability p for exactly this distribution: p = \frac{(A+B)! (C+D)! (A+C)! (B+D)!}{N! A! B! C! D!}.
  o If no cell frequency is 0, more extreme distributions must be considered → compute total p by adding up the probabilities of the observed and more extreme distributions.
  o p \leq \alpha: significant, reject H0.
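The summation over the observed and more extreme tables can be sketched as follows. The code computes a one-sided p-value in the direction of a larger A (and D) with all margins held fixed; that choice of direction and the example table are assumptions for illustration.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of one 2x2 table with fixed margins.

    Equivalent to (A+B)!(C+D)!(A+C)!(B+D)! / (N! A! B! C! D!).
    """
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_one_sided(a, b, c, d):
    """Sum the observed table and all more extreme ones (larger A, same margins)."""
    total = 0.0
    while b >= 0 and c >= 0:
        total += table_probability(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


# Hypothetical table: rows = two independent groups, columns = two classes.
print(round(fisher_one_sided(8, 2, 3, 7), 4))
```

The loop stops as soon as B or C would become negative, which is why no further tables need to be added once a cell frequency is 0.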
• Median test:
  o Application of Fisher’s exact test. Classes are constructed by comparison of each observation to the median of all observations (of both independent samples).
  o 2x2 contingency table.
• Mann-Whitney U test (Wilcoxon rank-sum test):
  o Two independent samples with sample sizes n_1 and n_2.
    ▪ Sample 1 has observations x_1, ..., x_{n_1}.
    ▪ Sample 2 has observations y_1, ..., y_{n_2}.
  o Rank all (n_1 + n_2) observations in an increasing order.
  o If the observations from both samples are similar, then the average ranks of the two samples should have similar values.
  o Calculate:
    ▪ R_1 = sum of ranks of sample 1.
    ▪ R_2 = sum of ranks of sample 2.
    ▪ U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1.
    ▪ U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2.
    ▪ U = \min(U_1, U_2).
  o Find critical values for U statistic for given α in tables.
  o If U \leq U_{crit}, significant → reject H0.
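The calculation steps for the U statistic can be sketched in a few lines. Ties receive average ranks here, which is a common convention but an assumption as far as these notes are concerned; the data in the usage example are made up.

```python
def average_ranks(values):
    """Ranks 1..n in increasing order; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(sample_1, sample_2):
    """Return (U1, U2, U) from the rank sums of the pooled observations."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


# Hypothetical data: every observation in sample 1 lies below sample 2.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

A useful check is that U_1 + U_2 always equals n_1 n_2, so a U close to 0 means the two samples barely overlap.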
• The χ² test for two independent samples:
  o Data consists of frequencies in discrete categories. Test whether groups differ with respect to the relative frequency with which group members fall into categories.
  o Arrange data into frequency/contingency table (columns groups, rows categories).
  o n_{ij} = observed frequency of occurrence of the ith category for the jth group.
  o Under the assumption of independence, the expected frequency of observation in each cell should be proportional to the distribution of row and column totals: E_{ij} = \frac{R_i C_j}{N}, with row totals R_i = \sum_j n_{ij}, column totals C_j = \sum_i n_{ij} and N = \sum_i \sum_j n_{ij}.
  o X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}.
  o The sampling distribution of X² follows the χ² distribution with df = (r-1)(c-1).

Spearman’s Rank Correlation Coefficient:
  o Rank correlation analysis, remarks:
    ▪ Used to assess the correlation between two matched variables.
    ▪ Both variables may be experimental observations (or one may be control).
    ▪ Non-parametric alternative to the parametric correlation analysis.
    ▪ Only uses data on the ordinal level (compares only the ranks, not values).
    ▪ Rank correlation coefficient (RCC) is +1 if variable x is perfectly positively correlated to variable y.
    ▪ RCC is -1 if x is perfectly negatively correlated to y.
  o n paired observations (either both measurements or one control/one measurement).
  o Observations x_i and y_i, where x_i and y_i are matched.
  o Rank the n observations of each variable independently in an increasing order and calculate the squared rank differences.
  o Let r_i be the ranks of x_i and s_i the ranks of y_i; the squared rank difference is d_i^2 = (r_i - s_i)^2.
    ▪ If matched observations are positively correlated: \sum d_i^2 is small, since each x_i should almost have the same rank as the corresponding y_i.
    ▪ If matched observations are negatively correlated: \sum d_i^2 is large, since each x_i should have the mirrored rank of the corresponding y_i.
  o r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n}.
  o If |r_s| is at least the critical value for the chosen α: significant → reject H0.
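The rank correlation coefficient can be computed from the squared rank differences as sketched below. The sketch uses the formula r_s = 1 - 6 Σ d_i² / (n³ - n) and assumes there are no tied observations; the example data are made up.

```python
def simple_ranks(values):
    """Ranks 1..n in increasing order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rcc(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n^3 - n) for matched observations."""
    n = len(x)
    d_squared = sum((r - s) ** 2 for r, s in zip(simple_ranks(x), simple_ranks(y)))
    return 1 - 6 * d_squared / (n ** 3 - n)


# Hypothetical matched observations.
print(spearman_rcc([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

Identical rankings give Σ d_i² = 0 and hence r_s = +1, while fully mirrored rankings give the largest possible Σ d_i² and r_s = -1.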
Lecture Slides Part 13b
Laboratory experiments:
• Key features:
  o Control: environment and institutions.
    ▪ Incentives, resources, information.
    ▪ Rules of the game, market communication.
  o Observation: behavior and outcomes.
    ▪ Choices, messages (even brain activity).
    ▪ Prices, efficiency.
• Main advantages:
  o Testing theory ‘on its own domain’:
    ▪ Align environment with theoretical assumptions.
    ▪ Compare behavior to theoretical predictions.
  o Controlled variation:
    ▪ Ceteris paribus condition (comparative statics).
    ▪ Examine new institutions (experimental test-bedding).
• External validity:
  o Population-validity: students vs. ‘real people’.
    ▪ Allow for learning.
    ▪ Replications with managers, consumers, ….
  o Environmental-validity: experiments are simple and artificial.
    ▪ Theories are often simple.
    ▪ Principle of increasing complexity.
    ▪ Replications in the field.
Field experiments:
• Increase external validity:
  o Relevant target population of subjects.
  o Natural environment.
• Maintain internal validity:
  o Control.
  o Observation.
• Types of field experiments:
  o Artefactual field experiment:
    ▪ Lab environment.
    ▪ Target population of subjects.
  o Framed field experiment:
    ▪ Natural environment.
    ▪ Target population of subjects, who know they are in an experiment.
  o Natural field experiment:
    ▪ Natural environment.
    ▪ Target population of subjects, who do not know they are in an experiment.
• Key element is random assignment to treatment and control, which allows the creation of a sound counterfactual. The lack of such a counterfactual is a major problem with field data.
Practical problems:
• Attrition: participants dropping out in the course of the experiment.
  o Problematic if the rate of attrition is correlated with the treatment vs. control.
• Spill-over effects: the treatment group can affect the control group.