Slide 1 - University of Texas Inequality Project

Measuring Inequality
A practical workshop
On theory and technique
San Jose, Costa Rica
August 4-5, 2004
Panel Session on:
Mechanics to Calculate the Theil
and
Use of Programs and Macros
by
James K. Galbraith and Enrique Garcilazo
The University of Texas Inequality Project
http://utip.gov.utexas.edu
Session 3
Outline
1. Two-level hierarchical grouping
2. Data collection
3. Between-group Theil
4. Within-group Theil
5. Programs and macros
6. Working with output: analysis
Two-Level Hierarchical Grouping
• Data are typically given by geographical units, with several sectors within each unit
$$T = \sum_{i} \frac{Y_i}{Y} \ln\left(\frac{Y_i/Y}{n_i/n}\right) + \sum_{i} \frac{Y_i}{Y} \left[\sum_{p} \frac{Y_{ip}}{Y_i} \ln\left(\frac{Y_{ip}/Y_i}{n_{ip}/n_i}\right)\right]$$
where Y and n are total compensation and total employment, Y_i and n_i are compensation and employment in group i, and Y_ip and n_ip are compensation and employment in subgroup p of group i
• The first sum is the between-group component
• The second sum is the within-group component
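The split can be checked in three lines, assuming the total Theil is defined over all cells (every subgroup p within every group i) with the same income-share and employment-share terms. Because the cell ratio (Y_ip/Y)/(n_ip/n) factors into a group ratio times a within-group ratio, the logarithm separates into two pieces:

```latex
\begin{aligned}
T &= \sum_{i}\sum_{p} \frac{Y_{ip}}{Y}\ln\left(\frac{Y_{ip}/Y}{n_{ip}/n}\right) \\
  &= \sum_{i}\sum_{p} \frac{Y_{ip}}{Y}\left[\ln\left(\frac{Y_i/Y}{n_i/n}\right) + \ln\left(\frac{Y_{ip}/Y_i}{n_{ip}/n_i}\right)\right] \\
  &= \sum_{i} \frac{Y_i}{Y}\ln\left(\frac{Y_i/Y}{n_i/n}\right) + \sum_{i}\frac{Y_i}{Y}\sum_{p}\frac{Y_{ip}}{Y_i}\ln\left(\frac{Y_{ip}/Y_i}{n_{ip}/n_i}\right)
\end{aligned}
```

The last step uses Σ_p Y_ip = Y_i in the first sum and Y_ip/Y = (Y_i/Y)(Y_ip/Y_i) in the second.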
Collecting Data
• Variables:
– #1 Compensation earned (wages)
– #2 Number of people earning (employees)
• Two sets of data
– Equal number of observations in each data set
– No zero observations
– Arrange the data for each year into a matrix (rows are geographical units, columns are sectors)
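A minimal Python sketch of the arrangement step, assuming the raw data arrive as (unit, sector, wages, employees) records. The function name `to_matrices` and the record layout are hypothetical, chosen for illustration only. Rows follow the geographical units and columns the sectors, as in the year-k example tables.

```python
def to_matrices(records, units, sectors):
    """Arrange (unit, sector, wages, employees) records into two yearly matrices.

    Hypothetical helper: rows follow `units`, columns follow `sectors`.
    Cells with no record stay None so they show up during cleaning.
    """
    row = {u: r for r, u in enumerate(units)}
    col = {s: c for c, s in enumerate(sectors)}
    comp = [[None] * len(sectors) for _ in units]  # compensation (wages)
    emp = [[None] * len(sectors) for _ in units]   # employment (employees)
    for unit, sector, wages, employees in records:
        comp[row[unit]][col[sector]] = wages
        emp[row[unit]][col[sector]] = employees
    return comp, emp


records = [(1, "a", 40, 10), (1, "b", 65, 15), (2, "a", 30, 11)]
comp, emp = to_matrices(records, units=[1, 2], sectors=["a", "b"])
print(comp)  # [[40, 65], [30, None]]
print(emp)   # [[10, 15], [11, None]]
```

Leaving missing cells as None, rather than filling them with 0, keeps a gap in the data distinct from a genuine zero.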
Data Management
Year k
Employment                  Compensation
      a    b    c                 a    b    c
1                           1
2                           2
3                           3
4                           4
• Sectors = {a, b, c}
• Geographical units = {1, 2, 3, 4}
• Check:
– matching number of observations in both matrices
– non-zero observations
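The two checks can be sketched in Python as a single pass over both matrices. This is a minimal illustration with a hypothetical function name (`check_matrices`), standing in for the manual comparison in Excel rather than reproducing any UTIP tool.

```python
def check_matrices(emp, comp):
    """List the problems found in one year's employment and compensation matrices."""
    problems = []
    if len(emp) != len(comp):
        problems.append(f"row count differs: {len(emp)} vs {len(comp)}")
    for r, (e_row, c_row) in enumerate(zip(emp, comp), start=1):
        if len(e_row) != len(c_row):
            problems.append(f"row {r}: column count differs")
        # zip stops at the shorter row, so only cells present in both are checked
        for c, (e, y) in enumerate(zip(e_row, c_row), start=1):
            if e is None or y is None:
                problems.append(f"cell ({r},{c}): missing observation")
            elif e == 0 or y == 0:
                problems.append(f"cell ({r},{c}): zero observation")
    return problems


emp = [[10, 15, 20], [11, 0, 22]]
comp = [[40, 65, 85], [30, 66]]
for problem in check_matrices(emp, comp):
    print(problem)
```

An empty list means both matrices have matching shapes and no missing or zero cells.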
Cleaning The Data
• Compare both databases and look for missing observations
– use the COUNT function in Excel
– compare the number of observations in rows and columns
• Remove observations equal to zero (the Theil terms involve ratios and logarithms, which are undefined at zero)
• If the database is a time series, make sure the set of industries included in the analysis is consistent over time
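One way to keep the industry set consistent over time is to retain only sectors that have non-zero data in every year. The Python sketch below assumes a hypothetical `panel` layout (year mapped to {sector: (employees, wages)}). It is one possible rule, not necessarily the one UTIP applies.

```python
def common_sectors(panel):
    """Sectors with non-zero employment and compensation in every year.

    `panel` maps year -> {sector: (employees, wages)}; this layout is a
    hypothetical choice for illustration. Zero or None values count as unusable.
    """
    keep = None
    for cells in panel.values():
        usable = {s for s, (employees, wages) in cells.items() if employees and wages}
        keep = usable if keep is None else keep & usable
    return sorted(keep) if keep else []


panel = {
    2001: {"a": (10, 40), "b": (15, 65), "c": (0, 0)},
    2002: {"a": (11, 30), "b": (8, 66), "c": (22, 88)},
}
print(common_sectors(panel))  # ['a', 'b']
```

This rule is conservative: a sector with a zero in any single year is dropped from every year.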
Calculating The Between Theil
• The between component can be expressed by two equivalent equations:
$$T_B = \sum_{i} \frac{n_i}{n} \, \frac{y_i}{\mu} \, \ln\left(\frac{y_i}{\mu}\right)$$
$$T_B = \sum_{i} \frac{Y_i}{Y} \, \ln\left(\frac{Y_i/Y}{n_i/n}\right)$$
where y_i = Y_i/n_i is the average compensation in group i and μ = Y/n is the overall average compensation; the two agree because (n_i/n)(y_i/μ) = Y_i/Y and y_i/μ = (Y_i/Y)/(n_i/n)
– First equation: more intuitive
– Second equation: easier to calculate
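The equivalence is easy to confirm numerically. This minimal Python sketch codes both equations, using the regional totals from the year-k example. The function names are illustrative.

```python
import math


def between_intuitive(emp_totals, comp_totals):
    """T_B = sum_i (n_i/n)(y_i/mu) ln(y_i/mu), with y_i = Y_i/n_i and mu = Y/n."""
    n, Y = sum(emp_totals), sum(comp_totals)
    mu = Y / n
    total = 0.0
    for n_i, Y_i in zip(emp_totals, comp_totals):
        ratio = (Y_i / n_i) / mu  # group average wage relative to the overall average
        total += (n_i / n) * ratio * math.log(ratio)
    return total


def between_shares(emp_totals, comp_totals):
    """T_B = sum_i (Y_i/Y) ln((Y_i/Y)/(n_i/n)), using only income and employment shares."""
    n, Y = sum(emp_totals), sum(comp_totals)
    return sum((Y_i / Y) * math.log((Y_i / Y) / (n_i / n))
               for n_i, Y_i in zip(emp_totals, comp_totals))


region_emp = [45, 41, 47, 48]       # n_i: employment totals by region
region_comp = [190, 184, 192, 184]  # Y_i: compensation totals by region
print(round(between_intuitive(region_emp, region_comp), 4))  # 0.0016
print(round(between_shares(region_emp, region_comp), 4))     # 0.0016
```

The share form needs no intermediate averages, which is why it is the easier one to lay out in a spreadsheet.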
Calculating The Between Theil
Year k
Employment                        Compensation
      a    b    c    sum                a    b    c    sum
1                                 1
2                                 2
3                                 3
4                                 4
sum                               sum
– Add across rows (totals by geographical unit) and down columns (totals by sector)
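The row and column totals take one line each in Python. This is a minimal sketch using the year-k example figures. `add_totals` is an illustrative name.

```python
# Year-k example: rows = geographical units 1..4, columns = sectors a, b, c
EMP = [[10, 15, 20], [11, 8, 22], [12, 14, 21], [9, 16, 23]]
COMP = [[40, 65, 85], [30, 66, 88], [35, 67, 90], [33, 60, 91]]


def add_totals(matrix):
    """Return (row sums, column sums, grand total) for a list-of-rows matrix."""
    row_sums = [sum(row) for row in matrix]        # totals by geographical unit
    col_sums = [sum(col) for col in zip(*matrix)]  # totals by sector
    return row_sums, col_sums, sum(row_sums)


print(add_totals(EMP))   # ([45, 41, 47, 48], [42, 53, 86], 181)
print(add_totals(COMP))  # ([190, 184, 192, 184], [138, 258, 354], 750)
```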
Calculating The Between Theil
Year k (i = geographical unit, j = sector)
Employment
i       a     b     c    sum
1      10    15    20     45
2      11     8    22     41
3      12    14    21     47
4       9    16    23     48
sum    42    53    86    181
Compensation
i       a     b     c    sum
1      40    65    85    190
2      30    66    88    184
3      35    67    90    192
4      33    60    91    184
sum   138   258   354    750
$$T_{B,1} = \frac{190}{750}\ln\left(\frac{190/750}{45/181}\right) = 0.0048$$
$$T_{B,a} = \frac{138}{750}\ln\left(\frac{138/750}{42/181}\right) = -0.04269$$
i      b/w i Theil component
1       0.0048
2       0.0196
3      -0.0036
4      -0.0191
sum     0.0016
j      b/w j Theil component
a      -0.04269
b       0.055415
c      -0.00313
sum     0.009601
$$T_{B,\text{region}} = \sum_{i=1}^{4} T_{B,i}$$
$$T_{B,\text{sector}} = \sum_{j=1}^{3} T_{B,j}$$
Calculating The Between Theil
• Use natural logs (ln), not base-10 logs
• Interpreting the between components
– A positive component means the group's average compensation (Y_i/n_i) is above the overall average (Y/n)
– A negative component means it is below the overall average
• Make sure the sum of all components (the between Theil) is positive and within a reasonable range
• The between Theil can be used as an estimate of the total Theil; it is a lower bound, since the within component is never negative
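The sign rule follows from the formula. Each component is (Y_g/Y) ln(y_g/μ), and the weight Y_g/Y is positive, so the component's sign is the sign of ln(y_g/μ). That sign is positive exactly when the group average y_g exceeds μ. The Python sketch below checks this for the year-k sector totals. The function name is illustrative.

```python
import math


def components_with_sign_check(emp_totals, comp_totals):
    """Pair each between component with 'is the group's average wage above the overall average?'."""
    n, Y = sum(emp_totals), sum(comp_totals)
    overall_avg = Y / n
    pairs = []
    for n_g, Y_g in zip(emp_totals, comp_totals):
        component = (Y_g / Y) * math.log((Y_g / Y) / (n_g / n))
        pairs.append((component, Y_g / n_g > overall_avg))
    return pairs


# Sector totals from the year-k example (columns a, b, c)
pairs = components_with_sign_check([42, 53, 86], [138, 258, 354])
for component, above in pairs:
    print(f"{component:+.5f}  above overall average: {above}")
between = sum(component for component, _ in pairs)
print(f"between Theil = {between:.6f}")
```

Individual components can be negative, but their sum cannot be. A negative or implausibly large total usually points to a data-entry or formula error.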
Calculating The Within Theil
$$T_w = \sum_{i} \frac{Y_i}{Y} \, T_{wi}, \qquad T_{wi} = \sum_{p} \frac{Y_{ip}}{Y_i} \ln\left(\frac{Y_{ip}/Y_i}{n_{ip}/n_i}\right)$$
T_wi = unweighted within Theil of group i
T_w = weighted within Theil
Calculating The Within Theil
1. First: calculate the unweighted within Theil in each group (across its p subgroups)
– Calculate the Theil by sector or by region
– The unweighted Theil is the total Theil for the group at the second hierarchical level
$$T_{wi} = \sum_{p} \frac{Y_{ip}}{Y_i} \ln\left(\frac{Y_{ip}/Y_i}{n_{ip}/n_i}\right)$$
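Step 1 amounts to computing an ordinary Theil index inside one group, with shares taken relative to the group's own totals. This is a minimal Python sketch with an illustrative function name.

```python
import math


def unweighted_within(emp_row, comp_row):
    """T_wi = sum_p (Y_ip/Y_i) ln((Y_ip/Y_i)/(n_ip/n_i)) for one group i.

    This is the Theil index of group i on its own, across its subgroups p
    (for example, the sectors within one region).
    """
    n_i, Y_i = sum(emp_row), sum(comp_row)
    return sum((y / Y_i) * math.log((y / Y_i) / (m / n_i))
               for m, y in zip(emp_row, comp_row))


# Region 1 of the year-k example: sectors a, b, c
print(round(unweighted_within([10, 15, 20], [40, 65, 85]), 5))  # 0.00044
```

If every subgroup has the same average wage, every log ratio is zero and so is T_wi.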
Calculating The Within Theil
2. Second: multiply each group's unweighted within Theil by the group's income weight, Y_i/Y
3. Third: add up these weighted components over all groups to obtain the weighted within Theil
$$T_w = \sum_{i} \frac{Y_i}{Y} \, T_{wi}$$
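The three steps can be combined into one loop. This Python sketch treats rows as groups and uses the year-k example. `theil_within` is an illustrative name, not a UTIP routine.

```python
import math


def theil_within(emp, comp):
    """Weighted within Theil T_w = sum_i (Y_i/Y) * T_wi, treating rows as groups i."""
    Y = sum(map(sum, comp))
    t_w = 0.0
    for emp_row, comp_row in zip(emp, comp):
        n_i, Y_i = sum(emp_row), sum(comp_row)
        # Step 1: unweighted within Theil of group i
        t_wi = sum((y / Y_i) * math.log((y / Y_i) / (m / n_i))
                   for m, y in zip(emp_row, comp_row))
        # Steps 2 and 3: weight by the income share Y_i/Y and accumulate
        t_w += (Y_i / Y) * t_wi
    return t_w


EMP = [[10, 15, 20], [11, 8, 22], [12, 14, 21], [9, 16, 23]]
COMP = [[40, 65, 85], [30, 66, 88], [35, 67, 90], [33, 60, 91]]
print(round(theil_within(EMP, COMP), 5))  # 0.02456
```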
Calculating The Within Theil
Year k (i = geographical unit, j = sector)
Compensation
i       a     b     c    sum   Unweighted Theil T_i*   Weight Y_i/Y   Weighted w/in comp.
1      40    65    85    190          0.00044              0.2533           0.00011
2      30    66    88    184          0.08215              0.2453           0.02015
3      35    67    90    192          0.01629              0.2560           0.00417
4      33    60    91    184          0.00050              0.2453           0.00012
sum   138   258   354    750                                                0.02456
Employment
i       a     b     c    sum
1      10    15    20     45
2      11     8    22     41
3      12    14    21     47
4       9    16    23     48
sum    42    53    86    181
Unweighted within components T*_ij
i         a          b          c
1     -0.0114     0.0089     0.0029
2     -0.0812     0.2184    -0.0550
3     -0.0614     0.0552     0.0225
4     -0.0080    -0.0072     0.0156
$$T^{*}_{1a} = \frac{40}{190}\ln\left(\frac{40/190}{10/45}\right) = -0.0114$$
$$T^{*}_{i} = T^{*}_{ia} + T^{*}_{ib} + T^{*}_{ic}$$
$$T^{*}_{1} = -0.0114 + 0.0089 + 0.0029 \approx 0.00044$$
$$T_{w/in,1} = 0.2533 \times 0.00044 = 0.00011$$
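The full table can be regenerated in Python for checking spreadsheet work. This sketch computes each cell component T*_ij, each row's T_i*, its weight Y_i/Y, and the weighted component, using the year-k figures. Its printed rows should agree with the table to the rounding shown.

```python
import math

# Year-k example: rows i = regions 1..4, columns j = sectors a, b, c
EMP = [[10, 15, 20], [11, 8, 22], [12, 14, 21], [9, 16, 23]]
COMP = [[40, 65, 85], [30, 66, 88], [35, 67, 90], [33, 60, 91]]

Y = sum(map(sum, COMP))
rows = []
for i, (emp_row, comp_row) in enumerate(zip(EMP, COMP), start=1):
    n_i, Y_i = sum(emp_row), sum(comp_row)
    # T*_ij = (Y_ij/Y_i) ln((Y_ij/Y_i) / (n_ij/n_i)) for each sector j
    cells = [(y / Y_i) * math.log((y / Y_i) / (m / n_i))
             for m, y in zip(emp_row, comp_row)]
    t_star = sum(cells)  # unweighted within Theil T_i*
    weight = Y_i / Y     # income weight Y_i / Y
    rows.append((i, cells, t_star, weight, weight * t_star))

for i, cells, t_star, weight, weighted in rows:
    print(i, " ".join(f"{c:8.4f}" for c in cells), f"{t_star:9.5f} {weight:7.4f} {weighted:9.5f}")
print(f"weighted within Theil: {sum(r[4] for r in rows):.5f}")
```

Summing rounded cell values by hand (as in the T*_1 line) can differ from the table in the last digit, because the table is built from unrounded terms.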
Output
• For a given year, obtain:
– Total Theil
– Between-group Theil
– Weighted within-group Theil
• Interpret the individual between-group Theil components
• Interpret the unweighted within-group Theil component as the total Theil at the second hierarchical level
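The three yearly outputs can be produced together. This Python sketch computes them for one year, assuming the total Theil is taken over all individual cells. It confirms that total = between + within and that the between Theil is a lower bound on the total. Transposing the matrices switches the grouping from regions to sectors: the total is unchanged, while the between/within split moves. Function names are illustrative.

```python
import math

# Year-k example: rows = regions 1..4, columns = sectors a, b, c
EMP = [[10, 15, 20], [11, 8, 22], [12, 14, 21], [9, 16, 23]]
COMP = [[40, 65, 85], [30, 66, 88], [35, 67, 90], [33, 60, 91]]


def term(y_share, n_share):
    """One Theil element: income share times ln(income share / employment share)."""
    return y_share * math.log(y_share / n_share)


def theil_summary(emp, comp):
    """Total, between-group and weighted within-group Theil, treating rows as groups."""
    n = sum(map(sum, emp))
    Y = sum(map(sum, comp))
    total = between = within = 0.0
    for e_row, c_row in zip(emp, comp):
        n_i, Y_i = sum(e_row), sum(c_row)
        between += term(Y_i / Y, n_i / n)
        within += (Y_i / Y) * sum(term(y / Y_i, m / n_i) for m, y in zip(e_row, c_row))
        total += sum(term(y / Y, m / n) for m, y in zip(e_row, c_row))
    return {"total": total, "between": between, "within": within}


by_region = theil_summary(EMP, COMP)
# Transposing the matrices makes sectors the groups instead of regions
by_sector = theil_summary([list(c) for c in zip(*EMP)], [list(c) for c in zip(*COMP)])
for name, result in (("region", by_region), ("sector", by_sector)):
    print(name, {k: round(v, 5) for k, v in result.items()})
```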
Programs and Macros
• Excel spreadsheet
• Stata program
• SAS options
For more information:
The University of Texas Inequality Project
http://utip.gov.utexas.edu
Type “Inequality” into Google to find us on the Web