Head/Tail Breaks: A New Classification Scheme for Data with a Heavy-tailed Distribution

Bin Jiang
University of Gävle, Sweden
http://fromto.hig.se/~bjg/

Movement trajectories versus patterns

Commenting on city growth and city
modeling, Paul Krugman had this insightful
statement:
“We have complex messy models, yet
reality is startlingly neat and simple.”
I think this statement applies to movement
data as well.
While focusing on trajectories, we need
very complicated models or algorithms.
While focusing on patterns, the movement
data are startlingly simple: as simple as “far
more small things than large ones.”
Human mobility patterns: related work

Jiang B., Yin J., and Zhao S. (2009), Characterizing
human mobility patterns in a large street network,
Physical Review E, 80, 021136.
Jiang B. (2009), Street hierarchies: a minority of
streets account for a majority of traffic flow,
International Journal of Geographical Information
Science, 23(8), 1033-1048.
Jiang B. and Jia T. (2011), Agent-based simulation of
human movement shaped by the underlying street
structure, International Journal of Geographical
Information Science, 25(1), 51-64.

The fourth paradigm (BIG data)

The bigger the data, the more
likely they are to be heavy-tailed.
Scaling or heavy-tailed distributions are ubiquitously observed.
Classification

Data classification involves two basic issues:
Number of classes, and
Class intervals

Natural breaks (Jenks 1963)

Minimizes the variance within classes and
maximizes the variance between classes
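
The natural breaks criterion can be illustrated with a small sketch. This is not Jenks' own procedure: it is an exhaustive search, assumed here only for illustration, that tries every way of cutting the sorted values into k contiguous classes and keeps the split with the smallest within-class sum of squared deviations. Because the total variation of a dataset is fixed, minimizing the within-class part also maximizes the between-class part. The function names are hypothetical, and the search is only practical for a handful of values.

```python
from itertools import combinations


def within_class_ssd(classes):
    """Sum of squared deviations of every value from its own class mean."""
    total = 0.0
    for values in classes:
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


def brute_force_natural_breaks(values, k):
    """Split the sorted values into k contiguous classes with minimal
    within-class sum of squared deviations (exhaustive search, tiny inputs only)."""
    data = sorted(values)
    n = len(data)
    best_score, best_classes = None, None
    # Choose k - 1 cut positions between consecutive sorted values.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        classes = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        score = within_class_ssd(classes)
        if best_score is None or score < best_score:
            best_score, best_classes = score, classes
    return best_classes


if __name__ == "__main__":
    sample = [1, 2, 3, 10, 11, 12, 50, 52]  # made-up illustrative values
    print(brute_force_natural_breaks(sample, 3))
```

Note that the number of classes k must be supplied by the user, which is one of the two basic issues listed above.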
Sample data on population densities
Histogram: Gaussian way of thinking
The highly improbable is an outlier under Gaussian thinking
Rank-size: Scaling way of thinking
What is scaling?

Scaling = A recurring structure of far more small things
than large ones.
NO
YES
The highly improbable is ranked number 1 under scaling thinking
Scaling of geographic space (a hidden order)
Heavy tailed distributions
[Figure: log-log plot, ln y against ln x]
Jiang B., Liu X. and Jia T. (2012), Scaling of geographic space as a universal rule for map
generalization, Annals of AAG, Preprint: http://arxiv.org/abs/1102.1561.
Head/tail division rule

Given a variable x, if its values follow a heavy-tailed
distribution, then the mean of x divides all the values
into two parts: those above the mean in the head and
those below the mean in the tail (Jiang and Liu 2012).

Jiang B. and Liu X. (2012), Scaling of geographic space from the perspective of
city and field blocks and using volunteered geographic information, International
Journal of Geographical Information Science, 26(2), 215-229.

Head/tail movement

Head (centralized mindset, top-down)
AT&T
Britannica
National mapping agency
Governments/CNN

Looooooooong tail (decentralized mindset, bottom-up)
Skype
Wikipedia
OpenStreetMap
WikiLeaks (OpenLeaks)
Victory of the long tail again

Obama is re-elected for the second term
Romney represents the top 1%
Obama represents the long long tail

Head/tail breaks

Iteratively apply the head/tail division rule to a
dataset with a heavy-tailed distribution, until the
data in the head are no longer heavy-tailed,
or specifically, until the values in the
head are no longer a minority (a minority being, e.g., < 40%).
Both the number of classes and the class intervals
are automatically or naturally determined.
For example, four classes: [min, m1), [m1, m2),
[m2, m3), [m3, max].
Head/tail breaks is more natural than natural
breaks (the reasons come later).
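
The head/tail division rule and its iterative application can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a reference implementation: the function names and the `head_limit` parameter are hypothetical, the 40% threshold is the example value given above, and the choice to record a break before testing whether the head is still a minority is one possible reading of the stopping rule.

```python
from bisect import bisect_right


def head_tail_breaks(values, head_limit=0.4):
    """Return the break points m1, m2, ... (the successive means).

    Assumption: each split is recorded, and the rule is re-applied to the
    head only while the head holds less than head_limit of the values.
    """
    data = list(values)
    breaks = []
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]  # values above the mean
        if not head:  # all values equal: nothing left to divide
            break
        breaks.append(mean)
        if len(head) / len(data) >= head_limit:  # head is no longer a minority
            break
        data = head
    return breaks


def classify(value, breaks):
    """Class index for intervals [min, m1), [m1, m2), ..., [m_last, max]."""
    return bisect_right(breaks, value)


if __name__ == "__main__":
    # Made-up data with far more small values than large ones.
    sample = [1] * 27 + [10] * 9 + [100] * 3 + [1000]
    breaks = head_tail_breaks(sample)
    print(breaks)
    print([classify(v, breaks) for v in (1, 10, 100, 1000)])
```

For this made-up sample the rule is applied twice, so the two means define three classes without the number of classes being chosen in advance.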
Head/tail breaks: A first look
Scaling of USCities
Unrevealed scaling of USCities
minorities
Scaling of Swedish streets
Why more natural than natural breaks?

Reflects human binary thinking.
Captures the scaling pattern of the data.
Both the number of classes and the class
intervals are automatically or naturally
determined.
Reflects figure/ground perception.
Nature and society are like that: “far more
small things than large ones”.
The unique thing about the scheme is its simplicity.

Conclusion

This paper has proposed a novel classification
scheme, head/tail breaks, for data with a
heavy-tailed distribution.
The head/tail breaks scheme captures the
hierarchy of the data.
Head/tail breaks can be used for statistical
mapping, map generalization and cognitive
mapping.
We can unite cartographic mapping and
cognitive mapping under the same scaling law.
Thank you very much for your attention!

Questions and comments?