Tutorial for applying head/tail breaks in ArcGIS
Sirui Wu
Department of Technology and Built Environment, Division of Geomatics
University of Gävle, 801 76, Sweden
Email: wsr102209@163.com
Introduction
Power laws and lognormal distributions are called heavy-tailed distributions, which imply that there are far more small things than large ones. To better illustrate the underlying scaling pattern of far more small things than large ones, a new classification scheme, namely head/tail breaks (Jiang, 2013), has been developed. It divides things around an average, according to their geometric, topological, and/or semantic properties, into a few large things (in the head) and many small things (in the tail), and recursively continues for the large things, or those in the head, until the notion of far more small things than large ones is violated. Applications of head/tail breaks have been found of vital importance in mapping, map generalization and perception of beauty (Jiang 2014). This tutorial is intended to provide a step-by-step guide for applying the head/tail breaks method in ArcGIS. To give a comprehensive understanding of the head/tail breaks method, a conventional classification method, Jenks natural breaks (Jenks, 1967), is also applied in this tutorial as a comparison.
The remainder of the tutorial is organized in five parts: creating your own artificial data; importing data into ArcGIS; Jenks natural breaks; head/tail breaks; and visualized comparison. Please make sure that you have already installed ArcGIS and that Microsoft Excel is also available. No specific version of the software is required, but this tutorial suggests ArcGIS 10.0 or a later version.
Create data
In this tutorial you are required to create your own data that follows a heavy-tailed distribution. The rank-size distribution, one of the typical heavy-tailed distributions, is very suitable for testing the head/tail breaks method. Go through the following steps to learn how to create a rank-size distribution.
1. Start a new blank workbook in Microsoft Excel. Create values in column A (named rank) with 1, 2, 3, 4… until 1023, and in column B (named size) with 1/A (Figure 1).
Figure 1: Create rank and size in column A and column B.
2. Add two more columns, named x and y, and assign random numbers to x and y. Note: you can use the “=RAND()” function in cell C2 and press Enter to generate a random value (Figure 2). Fill all cells from C2 to C1024, i.e. create one random value, then drag downwards when the cursor turns into a plus sign.
Figure 2: Use the RAND() function to create random values.
3. Fill all these four columns from row 2 to row 1024 and make sure there is no gap left. When you have finished the above steps, create a blank .txt file, paste all data from this Excel file into it, and save it as Inputdata.txt.
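For readers who prefer a script to the spreadsheet, the three steps above can be sketched in Python. This is only an illustration under assumptions: the function names `make_rank_size` and `write_table`, the tab delimiter, and the fixed random seed are choices made here, not part of the original workflow. Only the column names (rank, size, x, y), the 1023 data rows, and the file name Inputdata.txt come from the steps above.

```python
import csv
import random


def make_rank_size(n=1023, seed=0):
    """Return n rows of (rank, size, x, y) with size = 1 / rank.

    x and y are uniform random numbers in [0, 1), mirroring Excel's RAND().
    The seed is an assumption added here so that the output is repeatable.
    """
    rng = random.Random(seed)
    return [(rank, 1.0 / rank, rng.random(), rng.random())
            for rank in range(1, n + 1)]


def write_table(rows, path="Inputdata.txt"):
    """Write a header line plus one tab-delimited line per row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["rank", "size", "x", "y"])
        writer.writerows(rows)


if __name__ == "__main__":
    write_table(make_rank_size())
```

The resulting file has one header row followed by 1023 data rows, which corresponds to rows 1 to 1024 of the spreadsheet.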
Import data
1. Start ArcGIS and import the data created in Microsoft Excel by clicking the Add Data button.
2. When Inputdata.txt has been imported successfully, right click Inputdata.txt > Display XY Data. A new window will appear (Figure 3).
Figure 3: Set WGS1984 as input coordinate system.
3. Make sure x and y correspond to X Field and Y Field, respectively. Choose the GCS_WGS_1984 Geographic Coordinate System as the input coordinate system. Click OK > Yes.
4. Right click Inputdata.txt Events > Data > Export Data, and name the output naturalbreaks.shp.
5. Export one more shapefile in the same way as in step 4 and name it headtailbreaks.shp.
Now you have successfully prepared these two test data sets in ArcGIS. The following parts will guide you through processing the data with two different classification methods.
Jenks natural breaks
1. Right click naturalbreaks.shp > Properties > select Symbology > Quantities. Select Size as Value (Figure 4). You may notice that Natural Breaks (Jenks) has been automatically selected as the default and that there are five classes. Then, click Apply > OK.
Figure 4: Jenks natural breaks can be automatically calculated in ArcGIS.
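ArcGIS computes the Jenks classes internally, so nothing needs to be coded for this step. For readers curious about what natural breaks optimizes, the sketch below is a plain dynamic-programming version of the usual formulation: contiguous classes of the sorted values that minimize the total within-class squared deviation. The function name `jenks_breaks` is made up for this illustration, and ArcGIS's own implementation may differ in details, so the break values need not match the software exactly.

```python
def jenks_breaks(values, k):
    """Return the upper bound of each of k classes (illustrative sketch).

    Classes are contiguous runs of the sorted values, chosen to minimize the
    sum of squared deviations from each class mean.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums allow the squared deviation of data[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the stored split points to collect class upper bounds.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = split[c][j]
    return breaks[::-1]
```

Calling `jenks_breaks(sizes, 5)` on the size column would give five upper bounds comparable to the five default classes mentioned in step 1. The triple loop is quadratic in the number of values, which is acceptable for about a thousand points but not for large data sets.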
Head/tail breaks
Unlike Jenks natural breaks, which can be classified automatically in ArcGIS, head/tail breaks need to be calculated manually. We need to calculate the mean m1 of the given data, and then repeat this step for the head part until the head no longer follows a heavy-tailed distribution.
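The manual procedure can be cross-checked outside ArcGIS with a short script. The sketch below assumes the rank-size data created earlier (size = 1/rank for ranks 1 to 1023) and uses a hypothetical function name, `head_tail_means`. Its stopping rule (stop once the head is no longer a minority of the data) is one possible reading of "no longer heavy-tailed", chosen here for illustration.

```python
def head_tail_means(values):
    """Return one record per iteration: (mean, count, head count, tail count).

    Head: values strictly greater than the mean.
    Tail: values less than or equal to the mean.
    Assumed stopping rule: stop when the head is at least as large as the tail.
    """
    records = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        tail_count = len(data) - len(head)
        records.append((mean, len(data), len(head), tail_count))
        if not head or len(head) >= tail_count:
            break
        data = head  # recurse into the head only
    return records


if __name__ == "__main__":
    sizes = [1.0 / rank for rank in range(1, 1024)]
    for i, (mean, n, n_head, n_tail) in enumerate(head_tail_means(sizes), 1):
        print(f"m{i} = {mean:.6f}  n = {n}  "
              f"head = {n_head} ({100 * n_head / n:.1f}%)  "
              f"tail = {n_tail} ({100 * n_tail / n:.1f}%)")
```

For the rank-size data, the first record should agree with the mean of about 0.007339 and the 136 head points quoted in the steps that follow.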
1. Right click headtailbreaks.shp > Open Attribute Table > right click the size column in the attribute table > Statistics. Take note of the count, sum and mean (Figure 5) and close the window.
Figure 5: The statistics of headtailbreaks data.
2. Click Select by Attributes in the attribute table > type “Size > 0.007339” as a query under SELECT * FROM HeadTailBreaks WHERE (Figure 6). Click Apply > Close. This query selects those data whose size values are greater than the first mean m1.
Figure 6: Using the Select by Attributes function to select the required data.
3. Once you have done this step, a new table will be created. This new table only contains features whose size values are greater than the first mean m1. You can check this new table by clicking the Show selected records button (Figure 7). It can be seen that 136 points have been selected as the new data.
Figure 7: Click the Show selected records button to check the selected data.
4. The given data will be divided into a head part (136 points) and a tail part according to the first mean value m1. Calculate the mean of those values greater than m1 to obtain m2. Again, calculate the mean of those values greater than m2 to obtain m3. Repeat these steps until the head part no longer follows a heavy-tailed distribution.
Tips: How to define the head and tail parts?
Head part: values which are greater than the mean value (not including the mean value).
Tail part: values which are equal to or less than the mean value (including the mean value).
More tips can be found at http://en.wikipedia.org/wiki/Head/tail_Breaks (method part).
5. When you have finished calculating the mean values, record all mean values and calculate the relevant statistics as shown in Table 1.
Table 1: The statistics of using the head/tail breaks method (# = number, % = percentage).
According to the above table, you can find that the percentage of the head part increases while the percentage of the tail part decreases. In this tutorial, the head/tail breaks calculation stops when the percentage of the head part is equal to the percentage of the tail part.
6. Right click headtailbreaks.shp > Properties > Symbology > Quantities > select Size as Value again > click Classify (Figure 8).
Figure 8: Click Classify and manually set break values for classification.
7. A new window appears when you click the Classify button. Import the above mean values as break values and click OK (Figure 9). Please make sure these five classes follow this interval rule: [minimum, m1], (m1, m2], (m2, m3], (m3, m4], (m4, m5]. The new class intervals have now been defined. Click Apply > OK.
Figure 9: Import mean values calculated above as break values.
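The interval rule in step 7 closes each class on the right, so a value exactly equal to a break falls into the lower class. A minimal Python sketch of that rule follows. `classify` is a hypothetical helper name, the break values in the usage lines are placeholders rather than the tutorial's results, and the sketch only mirrors the rule as written above, not ArcGIS internals.

```python
from bisect import bisect_left


def classify(value, breaks):
    """Return the 0-based class index of value for right-closed intervals.

    breaks must be sorted in ascending order, e.g. [m1, m2, m3, m4, m5].
    Class 0 is [minimum, m1], class 1 is (m1, m2], and so on. A value above
    the last break gets index len(breaks), i.e. it lies outside all classes.
    """
    return bisect_left(breaks, value)


if __name__ == "__main__":
    placeholder_breaks = [1.0, 2.0, 3.0]  # illustrative values only
    for value in (0.5, 1.0, 1.5, 3.0):
        print(value, "-> class", classify(value, placeholder_breaks))
```

Using `bisect_left` rather than `bisect_right` is what makes the upper bound inclusive, which matches the tail definition (values equal to the mean belong to the tail).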
Visualize two classification methods
When these two methods have been successfully applied, you need to set the same symbol (such as a point) and color for both layers so that they can be compared. In addition, you should give the two data frames the same width and height by using the data frame properties under the layout view. Figure 10 is a layout view of the two classification results. You can easily see that the head/tail breaks method reflects the underlying phenomenon more realistically. In other words, the distribution of points using head/tail breaks is much more natural than that using Jenks natural breaks.
Figure 10: The 1023 points using (a) natural breaks, and (b) head/tail breaks
Acknowledgements
I would like to thank Prof. Bin Jiang and Mr. Ding Ma for their comments and suggestions. A small program developed by Mr. Ding Ma for automatically running the head/tail breaks process is available at https://github.com/digmaa/HeadTailBreaks
References
Jenks G. F. (1967), The data model concept in statistical mapping, International Yearbook of Cartography, 7(1), 186-190.
Jiang B. (2013), Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution, The Professional Geographer, 65(3), 482-494.
Jiang B. (2014), The fractal nature of maps and mapping, International Journal of Geographical Information Science, xx(x), xx-xx, DOI: 10.1080/13658816.2014.953165