Benford Behavior of Dependent Random Variables

1
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Benford Behavior of
Dependent Random Variables
Taylor Corcoran - University of Arizona
[email protected]
Jaclyn Porfilio - Williams College
[email protected]
Co-Authors: Joseph Iafrate, Jirapat Samranvedhya
Advisor: Steven J. Miller
Williams College
YMC - Ohio State
Friday, August 9th 2013
References
2
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Summary
Benford’s Law
Stick Decomposition
Conjectures and Other Dependent Systems
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Consider the dataset of the populations of Spanish cities.
How often do you expect a leading digit of 1 to occur?
3
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Consider the dataset of the populations of Spanish cities.
How often do you expect a leading digit of 1 to occur?
References
5
Introduction
Fixed Proportion Decomposition
Additive Decomposition
First Digit Bias
Population of Spanish Cities
Other Work and Conclusions
References
6
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
First Digit Bias
Population of Spanish Cities
Twitter Followers per User
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
First Digit Bias
Population of Spanish Cities
File Sizes in Linux Source Tree
7
Twitter Followers per User
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
First Digit Bias
8
Population of Spanish Cities
Twitter Followers per User
File Sizes in Linux Source Tree
Fibonacci numbers
References
9
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Benford’s Law
Definition
A dataset is said to follow Benford’s Law (base b) if the
probability of observing a first digit of d is logb 1+d
d .
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Logarithms and Benford’s Law
P(leading digit d) =
log10 (d + 1) − log10 (d)
Benford’s law ↔ mantissa of
logarithms of data are uniformly
distributed
10
Other Work and Conclusions
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Applications of Benford’s Law
Fraud detection
Data integrity
Analyzing
round-off errors
11
Other Work and Conclusions
References
12
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Previous Work
Arithmetic operations on random variables.
Reliance on independence of random variables.
Becker, Greaves-Tunnell, Miller, Ronan, Strauch:
techniques to deal with dependencies.
Lemons: process of fragmenting a conserved quantity.
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Fixed Proportion Decomposition
13
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Fixed Proportion Decomposition Process
Decomposition Process
1
Consider a stick of length L.
Other Work and Conclusions
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Fixed Proportion Decomposition Process
Decomposition Process
15
1
Consider a stick of length L.
2
Uniformly choose a proportion p ∈ (0, 1).
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Fixed Proportion Decomposition Process
Decomposition Process
16
1
Consider a stick of length L.
2
Uniformly choose a proportion p ∈ (0, 1).
3
Break the stick into two pieces: lengths pL and (1 − p)L.
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Fixed Proportion Decomposition Process
Decomposition Process
1
Consider a stick of length L.
2
Uniformly choose a proportion p ∈ (0, 1).
3
Break the stick into two pieces: lengths pL and (1 − p)L.
4
Repeat N times (using the same proportion).
18
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Fixed Proportion Decomposition Process
L
(1 − p)L
pL
p2 L
p(1 − p)L
p(1 − p)L
(1 − p)2 L
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Fixed Proportion Conjecture
Joy Jing’s Conjecture
The above decomposition process results in stick lengths that
obey Benford’s Law as N → ∞ for any p ∈ (0, 1), p 6= 12 .
19
20
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Counterexample: p =
1
11 ,
Other Work and Conclusions
1−p =
10
11 .
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Benford Analysis
After Nth interation,
2N sticks
N + 1 distinct lengths.
Distinct lengths are given by
1−p
xj+1 =
xj , x0 = pN .
p
Frequency of xj =
21
N
j
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Benford Analysis
After Nth interation,
2N sticks
N + 1 distinct lengths.
Distinct lengths are given by
1−p
xj+1 =
xj , x0 = pN .
p
Frequency of xj =
Let
22
1−p
p
= 10y .
N
j
References
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
= 10y , y ∈ Q
Theorem
y
Let 1−p
p = 10 . If y ∈ Q, the described decomposition process
results in stick lengths that do not obey Benford’s Law.
23
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
= 10y , y ∈ Q
Theorem
y
Let 1−p
p = 10 . If y ∈ Q, the described decomposition process
results in stick lengths that do not obey Benford’s Law.
Let y = qr .
Leading digit of xj repeats every q indices. Thus,
X
X N P(xj+kq ) =
.
j + kq
k
24
k
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Series Multisection
Multisection Formula
∞
P
an x n ,
If f (x) =
n=−∞
∞
X
akq+j x kq+j
k =−∞
=
q−1
1 X −jp
ω f (ω p x)
q
p=0
where ω is the primitive qth root of unity e2πi/q .
Multisection of Binomial Coefficients
X N =
j + kq
k
25
q−1 2N X
πs N
π(N − 2j)s
cos
cos
.
q
q
q
s=0
References
26
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈ Q
X
k
1
P(xj+kq ) =
q
"
#!
π N
1 + E (q − 1) cos
q
References
27
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈ Q
X
k
1
P(xj+kq ) =
q
"
#!
π N
1 + E (q − 1) cos
q
Digit frequencies are multiples of q1 .
Benford frequencies are irrational, so not perfect Benford.
References
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
= 10y , y ∈
/ Q: Outline
Theorem
y
Let 1−p
/ Q, the described decomposition process
p = 10 . If y ∈
results in stick lengths that obey Benford’s Law.
28
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
= 10y , y ∈
/ Q: Outline
Theorem
y
Let 1−p
/ Q, the described decomposition process
p = 10 . If y ∈
results in stick lengths that obey Benford’s Law.
{xj } ∼ Bin(N, 12 )
mean:
N
2
standard deviation:
29
Outline of proof strategy:
√
N
2
1
Truncation
2
Break into intervals
- Roughly equal probability
- Equidistribution
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Truncation
For > 0, Chebyshev’s Inequality gives
1
1
N N +
P x − ≥ N 2
= P x − ≥ N N 2
2
2
1
≤
.
N 2
30
References
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Truncation
For > 0, Chebyshev’s Inequality gives
1
1
N N +
P x − ≥ N 2
= P x − ≥ N N 2
2
2
1
≤
.
N 2
So we can limit our analysis to
N standard deviations
Right half of binomial
31
References
32
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Intervals and Roughly Equal Probability
I` = {x` , x` + 1, . . . , x` + N δ − 1}.
Let x0 = N/2. It follows that x` = N/2 + `N δ .
N
1
N N
N − 2 +δ+ ,
x` − x
≤
x`
`+1
when δ < 1/2 − and ` ≤ N 1/2−δ+ .
References
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Equidistribution
Definition
{xn }∞
n=1 is equidistributed modulo 1 if for any [a, b] ⊂ [0, 1],
P (xn mod 1 ∈ [a, b]) → b − a:
#{n ≤ N : xn mod 1 ∈ [a, b]}
= b − a.
N
N→∞
lim
Recall: Leading digits of stick lengths are Benford if their
logarithms are equidistributed modulo 1.
33
References
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Equidistribution
Consider an interval I` where
I` = {x` + i : i ∈ {0, 1, . . . , N δ − 1}}
J` ⊂ {0, 1, . . . , N δ − 1} = {i : log(x` + i) mod 1 ∈ [a, b]}.
34
References
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Equidistribution
Consider an interval I` where
I` = {x` + i : i ∈ {0, 1, . . . , N δ − 1}}
J` ⊂ {0, 1, . . . , N δ − 1} = {i : log(x` + i) mod 1 ∈ [a, b]}.
If the irrationality exponent κ of y is finite,
1
|J` | = (b − a)N δ + O(N δ(1− κ +) )
35
References
36
Introduction
1−p
p
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
= 10y , y ∈
/ Q: Equidistribution
Using
equidistribution within intervals
roughly equal probability
we have
XX
`
i∈J`
1
1
f (x` + i) = (b − a) + O(N δ(− κ +) + N − 2 +δ+ ).
where the irrationality exponent κ of y is finite.
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Additive Decomposition
37
References
38
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Additive Decomposition Process
Decomposition Process
1
Consider a stick of length L.
2
Break the stick into two pieces, both of integer length.
3
Freeze a piece if its length exists in the specified stopping
sequence.
4
Repeat decomposition with pieces that have not been
frozen.
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Additive Decomposition Processes: Conjectures
Benford
Non-Benford
Stop at evens (proved)
Stop at squares
Stop at primes
Stop at powers of two
Stop at powers of three
Stop at Fibonnaci numbers
39
40
Introduction
Fixed Proportion Decomposition
Continuous Model
Additive Decomposition
Other Work and Conclusions
References
41
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Continuous Model
Approach:
Draw cut proportions from uniform distributon on (0, 1).
References
42
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Continuous Model
Approach:
Draw cut proportions from uniform distributon on (0, 1).
Label sticks according to number of proportions in their
product.
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Continuous Model Stick Labeling
Iteration 0:
43
L
Other Work and Conclusions
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Continuous Model Stick Labeling
Iteration 0:
L
Iteration 1:
p1 L
x1 = (1 − p1 )L
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Continuous Model Stick Labeling
45
Iteration 0:
L
Iteration 1:
p1 L
x1 = (1 − p1 )L
Iteration 2:
p2 p1 L
x2 = (1 − p2 )p1 L
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Continuous Model Stick Labeling
Iteration 0:
L
Iteration 1:
p1 L
x1 = (1 − p1 )L
Iteration 2:
p2 p1 L
x2 = (1 − p2 )p1 L
Iteration N: pN pN−1 · · · p1 L
46
xN = (1 − pN )pN−1 · · · p1 L
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Continuous Model
Approach:
Draw cut proportions from uniform distributon on (0, 1).
Label sticks according to number of proportions in their
product.
47
References
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Continuous Model
Approach:
Draw cut proportions from uniform distributon on (0, 1).
Label sticks according to number of proportions in their
product.
Do not consider first log N sticks as well as pairs of sticks
xi , xj where i and j differ by less than log N.
48
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Continuous Model
Approach:
Draw cut proportions from uniform distributon on (0, 1).
Label sticks according to number of proportions in their
product.
Do not consider first log N sticks as well as pairs of sticks
xi , xj where i and j differ by less than log N.
Show E[PN (s)] → log s and Var[PN (s)] → 0.
49
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Other Work and Conclusions
50
References
51
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Determinant Expansions
Theorem
Let A be an n × n matrix with i.i.d. entries aij ∼ X with density f .
The n! terms in the determinant expansion of A are Benford if
lim
n→∞
n
∞ Y
X
l=−∞ m=1
l6=0
2πil
Mf 1 −
=0
log 10
52
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Determinant Expansions
Theorem
Let A be an n × n matrix with i.i.d. entries aij ∼ X with density f .
The n! terms in the determinant expansion of A are Benford if
lim
n→∞
n
∞ Y
X
l=−∞ m=1
l6=0
2πil
Mf 1 −
=0
log 10
Important Technique: Quantify dependencies among terms.
53
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Conclusions
Dependent vs Independent Random Variables
Stick Decomposition
Fixed Proportion Decomposition
1−p
p
- y ∈ Q, not Benford
-y ∈
/ Q, Benford (finite vs infinite κ)
Additive Stick Decomposition
- Conjectures
- Continuous Model
Determinant Expansion
= 10y
References
54
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Acknowledgements
We would like to thank our advisor, Steven J. Miller, our
co-authors Joseph R. Iafrate, Jirapat Samranvedhya, B. G.
Opher as well as Frederick W. Strauch and Joy Jing.
We would also like to thank the Williams College SMALL
summer research program.
This research was funded by NSF grant DMS0850577.
References
55
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Irrationality Exponent
Let y ∈ R. Denote by A the set of positive numbers κ for which
p
1
0 ≤ y − ≤ n
q
q
has at most finitely many solutions for p, q ∈ Z.
The irrationality measure of y , denoted κ(y ), is inf κ.
κ∈A
If A is empty, κ(y ) = ∞
For nonempty
A,


1 if y is rational
κ(y ) = 2 if y is algebraic of degree > 1


≥ 2 if y is transcendental
56
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Proof of Multisection Formula I
As f (x) =
P∞
kX
−1
n=0 an x
n,
ω −jm f (ω j x) =
j=0
kX
−1
ω −jm
j=0
=
∞
X
n=0
∞
X
an (ω j x)n
n=0

an 
kX
−1
j=0

ω ( n − m)j  x n
References
57
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Proof of Multisection Formula II
If n − m = kl for some l ∈ Z, then using the fact that ω k = 1
gives
ω (n−m)j = ω klj = 1
which gives
Pk −1
j=0
kX
−1
j=0
ω (n−m)j = k If n − m 6= lk ,
ω (n−m)j =
1 − ω (n−m)k
=0
1 − ω n−m
References
58
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
κ Infinite
Let α ∈
/ Q. It is well known that nα mod 1 is equidistributed.
For all [a, b] ⊂ [0, 1], given > 0, there exists M(, a, b, α) such
that
#{n ≤ N : nα mod 1 ∈ [a, b]} = (b − a)N + E (N)}
for all N ≥ M(, a, b, α).
Now let N be sufficiently large so that N δ is greater than
M(, a, b, α).
59
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
Quantifying Dependencies
Given a fixed Xi,n , the number of terms that share k elements is
n−k
n X n−k
(n − k − α)!(−1)α
k
α
α=0
Let Ki,j be the number of matrix entries shared by Xi,n and Xj,n .
Fixing Xi,n ,
P(Ki,j = k ) =
=
n−k
1 X (−1)α
k!
α!
α=0
1
n
1
+O
ek !
k (n − k )n!
60
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
Quantifying Dependencies
E[Ki,j ]
=
n−2
n−2
X
X
k
n
k
O
+
k (n − k )n!
ek !
k =0
k =0
→ 1.
Var Ki,j
=
n−2
n−2
X
X
k2
n
k2
+
O
−1
ek !
k n!(n − k )
k =0
k =0
→ 1
Using Chebyshev’s Inequality,
P(|Ki,j − 1| ≥ γ) ≤ 1/γ 2
References
61
Introduction
Fixed Proportion Decomposition
Additive Decomposition
Other Work and Conclusions
References
References
BGMRS T. Becker, A. Greaves-Tunnell, S. Miller, R. Ronan, and F. Strauch, Benford’s Law and Continuous
Dependent Random Variables. http://arxiv.org/abs/1111.0568.
BLD Benford’s Law Distribution Leading Digit. Digital image. ISACA. N.p., n.d. Web. 15 June 2013.
http://www.isaca.org/Journal/Past-Issues/2011/Volume-3/PublishingImages/
11v3-understanding-and.jpg.
C Chen, Hongwei. On the Summation of First Subseries in Closed Form. International Journal of
Mathematical Education in Science and Technology 41:4, 538-547.
DFD Distribution of First Digit. Digital image. Benford’s Law. DataGenetics, n.d. Web. 15 July 2013.
http://www.datagenetics.com/blog/march52012/.
JJ Jing, Joy. Benford’s Law and Stick Decomposition. Undergraduate Mathematics Thesis, Williams College.
Advisor: Steven Miller.
JKKKM D. Jang, J. U. Kang, A. Kruckman, J. Kudo and S. J. Miller, Chains of distributions, hierarchical Bayesian
models and Benford’s Law, to appear in the Journal of Algebra, Number Theory: Advances and
Applications.
KM Kontorovich, Alex V. and Steven J. Miller. Benford’s Law, Values of L-functions and the 3x + 1 Problem.
http://arxiv.org/pdf/math/0412003v2.pdf
MT Miller, Steven J. and Ramin Takloo-Bighash. “Needed Gaussian Integral." An Invitation to Modern Number
Theory. Princeton: Princeton UP, 2006. 222. Print. http://arxiv.org/pdf/1111.0568v1.pdf
S Singleton, Tommie W., Ph.D. Benford’s Law Test/Comparison. Digital image. ISACA. N.p., 2011. Web. 16
July 2013. http://www.isaca.org/Journal/Past-Issues/2011/Volume-3/Pages/
Understanding-and-Applying-Benfords-Law.aspx.