Chapter 7 `Hot Spot` Analysis II

This chapter continues the discussion of hot spots. Three additional routines are
discussed: ICJIA’s STAC routine (discussed by Richard and Carolyn Block), the K-means
routine, and Anselin’s Local Moran. Figure 7.1 displays the ‘Hot Spot’ Analysis II page.
The first of these routines, the Spatial and Temporal Analysis of Crime (STAC), was
developed by the Illinois Criminal Justice Information Authority and integrated into
CrimeStat in version 2. The second routine, K-means, is a partitioning technique. The
third technique, Anselin’s Local Moran, is a zonal hot spot method. We’ll start with
STAC, and who better to explain it than the authors of the routine, Richard and Carolyn
Block.
Spatial and Temporal Analysis of Crime (STAC)
by
Richard Block
Professor of Sociology
Criminal Justice
Loyola University
Chicago, IL
Carolyn Rebecca Block
Senior Research Analyst
Illinois Criminal Justice Information Authority
Chicago, IL
The amount of information available in an automated pin map can be enormous.
When geographic information systems were first introduced into policing, there were few
ways to summarize the huge reservoir of mapped information that was suddenly available.
In 1989, police departments in Illinois asked the Illinois Criminal Justice Information
Authority to develop a technique to identify Hot Spot Areas (the densest clusters of points
on a map). The result was STAC, the first crime hot spot program.¹ Through the years,
“bells and whistles” have been added to STAC, but the algorithm has remained essentially
the same. STAC is a quick, visual, easy-to-use program for identifying Hot Spot Areas.
STAC is a scan-type clustering algorithm in which a circle is repeatedly laid over a
grid and the number of points within the circle is counted (Openshaw, Charlton, Wymer,
and Craft, 1987; Openshaw, Craft, Charlton, and Birch, 1988; Turnbull, Iwano, Burnett,
Howe, and Clark, 1990; Kulldorff, 1995). It thus shares with those other scan routines the
property of multiple tests, but it differs in that overlapping clusters are combined into
larger clusters until there are no longer any overlapping circles. Thus, STAC clusters can
be of differing sizes. The routine therefore combines some elements of partitioning
clustering (the search circles) with hierarchical clustering (the aggregating of smaller
clusters into larger clusters).
Figure 7.1: 'Hot Spot' Analysis II Screen
The STAC Hot Spot Area routine in CrimeStat searches for and identifies the
densest clusters of incidents based on the scatter of points on the map. The STAC Hot
Spot Area routine creates areal units from point data. It identifies the major
concentrations of points for a given distribution. It then represents each dense area by
either a standard deviational ellipse or a convex hull, or both (see chapter 4). The
boundaries of the ellipses or convex hulls can easily be displayed as mapped layers by
standard GIS software.
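To make the convex hull option concrete, the following sketch computes the hull of a
small set of points. It is an illustration only: the function name convex_hull and the
sample coordinates are invented here, and the method shown (Andrew's monotone chain) is a
standard textbook algorithm that is assumed for the example, not necessarily the one
CrimeStat uses (chapter 4 describes the program's own approach).

```python
def convex_hull(points):
    """Return the convex hull of 2-D points, counter-clockwise.

    Illustrative only: Andrew's monotone chain, assumed for this sketch.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical incident locations: four corners plus two non-extreme points.
incidents = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(incidents)
```

Points inside the polygon or lying along an edge, such as (1, 1) and (1, 0) here, are
not hull vertices, but the hull still encloses every point in the set.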
STAC is not constrained by artificial or political boundaries, such as police beats or
census tracts. This is important, because clusters of events and places (such as drug
markets, gang territories, high violence taverns, or graffiti) do not necessarily stop at the
border of a police beat. Also, shading over an entire area may make it seem that the whole
neighborhood is high-crime (or low-crime), even though the area may contain only one or
two dense pockets of crime. Therefore, area-shaded maps could be misleading. In contrast,
STAC Hot Spot Areas are based on the actual clusters of events or places on the map.
STAC is designed to help the crime analyst summarize a vast amount of geographic
information so that practical policy-related issues can be addressed, such as resource
allocation, crime analysis, beat definition, tactical and investigation decisions, or
development of intervention strategies. An immediate concern of a law enforcement user
of automated pin maps is the identification of areas that contain especially dense clusters
of events. These pockets of crime demand police attention and could indicate different
things for different crimes. For instance, a grouping of Criminal Damage to Property
offenses could indicate gang activity. If motor vehicle thefts consistently cluster in one
section of town, it could point to the need to change patrol patterns and procedures.
To take an example, Figure 7.2 shows the location of the seven densest Hot Spot
Areas of street robbery in 1999 in Chicago. Four of the seven span the boundaries of police
districts and two cover only a small part of a larger district. In a shaded area map, these
dense clusters of robbery might not be easily identifiable. An area that is really dense
might appear to be low-crime because it is divided by an arbitrary boundary. Using a
shaded areal map aggregating the data within each district would give a general idea of
the distribution of crime over the entire map, but it would not tell exactly where the
clusters of crime are located.
For example, figure 7.3 zooms in on Hot Spot Area 4 (the northernmost Hot Spot
Area in Figure 7.2). Hot Spot Area 4 covers parts of two districts (shown by a pink
boundary line in figure 7.2). There are also four beats (shown by blue boundary lines). The
shaded map indicates many incidents in beat 2311, but few in beats 2312 and 2313.² The
incident distribution indicates that while few incidents occurred overall in 2312 and 2313,
most of the incidents that did occur were near to beat 2311. Incidents in beat 2311 mainly
occurred on its eastern boundary. Portions of the beat were relatively free from street
robbery. The Hot Spot Area identifies this clustering that spans beats and districts. Hot
Spot Areas that overlap beat and district boundaries might indicate to patrol officers in
these neighboring areas that they should coordinate their efforts in combating crime.
Figure 7.2: STAC Hot Spots for 1999 Street Robberies
Figure 7.3: STAC 1999 Street Robbery Hot Spot Area 4
How STAC Identifies Hot Spot Areas
The following procedure identifies hot spots in STAC. The program implements a
search algorithm, looking for Hot Spot Areas.
1.  STAC lays out a 20 x 20 grid structure (triangular or rectangular, defined by
    the user) on the plane defined by the area boundary (defined by the user).
2.  STAC places a circle on every node of the grid, with a radius equal to 1.414
    (the square root of 2) times the specified search radius. Thus, the circles
    overlap.
3.  STAC counts the number of points falling within each circle, and ranks the
    circles in descending order.
4.  For a maximum of 25 circles, STAC records all circles with at least two data
    points along with the number of points within each circle. The X and Y
    coordinates of any node with at least two incidents within the search radius
    are recorded, along with the number of data points found for each node.
5.  These circles are then ranked according to the number of points and the top
    25 search areas are selected.
6.  If a point belongs to two different circles, the points within the circles are
    combined. This process is repeated until there are no overlapping circles.
    This routine avoids the problem of data points belonging to more than one
    cluster, and the additional problem of different cluster arrangements being
    possible with the same points. The result is called Hot Clusters.
7.  Using the data points in each Hot Cluster, for each cluster the program can
    calculate the best-fitting standard deviational ellipse or convex hull (see
    chapter 4). These are called Hot Spot Areas. Because the standard
    deviational ellipse is a statistical summary of the Hot Cluster points, it may
    not contain every Hot Cluster point. It also may contain points that are not
    in the Hot Cluster. On the other hand, the convex hulls will create a polygon
    around all points in the cluster.
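Steps 1 through 6 can be sketched in a few lines of Python. This is a simplified
illustration under stated assumptions, not CrimeStat's code: it handles only the
rectangular grid, takes "20 x 20" to mean 20 nodes per side, uses a circle radius of the
square root of 2 times the search radius as in step 2, and treats two circles as
overlapping when they share at least one point. The function name stac_hot_clusters and
all parameter names are hypothetical. Step 7 (fitting an ellipse or hull) is left out.

```python
import math
from itertools import combinations


def stac_hot_clusters(points, bounds, radius, min_points=2, grid=20, max_circles=25):
    """Return Hot Clusters as sets of point indices (illustrative sketch only)."""
    xmin, ymin, xmax, ymax = bounds
    r = math.sqrt(2) * radius  # step 2: circle radius is 1.414 x search radius

    # Step 1: rectangular grid of nodes laid over the study area boundary.
    xs = [xmin + i * (xmax - xmin) / (grid - 1) for i in range(grid)]
    ys = [ymin + j * (ymax - ymin) / (grid - 1) for j in range(grid)]

    # Steps 2-4: one circle per node; keep circles with enough points.
    circles = []
    for cx in xs:
        for cy in ys:
            members = {
                k for k, (x, y) in enumerate(points)
                if math.hypot(x - cx, y - cy) <= r
            }
            if len(members) >= min_points:
                circles.append(members)

    # Step 5: rank by point count and keep the top circles.
    circles.sort(key=len, reverse=True)
    clusters = circles[:max_circles]

    # Step 6: merge circles that share a point until none overlap.
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            if clusters[a] & clusters[b]:
                clusters[a] |= clusters[b]
                del clusters[b]
                merged = True
                break  # restart the scan, since the list changed

    return sorted(clusters, key=len, reverse=True)


# Hypothetical data: two tight groups and two isolated incidents.
pts = [(20, 20), (21, 20), (20, 21), (21, 21), (19.5, 20.5),
       (80, 70), (81, 70), (80, 71),
       (50, 50), (10, 90)]
hot = stac_hot_clusters(pts, bounds=(0, 0, 100, 100), radius=5)
```

With this toy data the two isolated incidents never reach the two-point minimum, so only
the two dense groups come back as Hot Clusters.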
The user can specify different search radii and re-run the routine. Given the same
area boundary, different search radii will often produce slightly different numbers of Hot
Clusters. A search radius that is either too large or too small may fail to produce any.
Experience and experimentation are needed to determine the most useful search radii.
Steps in Using STAC
STAC is available on the Hot Spot Analysis II tab under Spatial Description (see
figure 7.1). A brief summary of the steps is as follows:
1.  STAC requires a primary file and a reference file (see chapter 3). Optionally,
    STAC requires the reference file area (on the measurement parameters tab)
    if simulation runs are requested. Note: while STAC runs quite quickly, it
    runs more quickly with a Euclidean coordinate system such as UTM or State
    Plane. For example, an analysis of 13,000 street robberies in Chicago ran in
    less than two seconds on an 800 MHz PC with projected coordinates
    (Euclidean), while it took longer with spherical coordinates
    (latitude/longitude).
2.  Define the reference file (see chapter 3). While CrimeStat does not include a
    database manager or query system, a user can carry out analysis of different
    areas of a jurisdiction by using the boundaries of several reference areas.
    For example, define all of Chicago as a reference area and define each of the
    twenty-five police districts as additional reference areas. Hot Spot Areas
    can be identified for the city as a whole and for each district. In other words,
    the same incident file may be used for analysis of different map areas by
    using multiple reference files.
3.  Define the search radius. Generally, a two-stage analysis is best. Start with
    a larger search radius and then analyze Hot Spot Areas with a smaller
    search radius. A search radius of more than one mile may not yield useful
    results in an area the size of Chicago (320 square miles).
4.  Set the output units to miles or kilometers.
5.  Specify the file output name for the ellipses or convex hulls.
6.  Click on the STAC parameters button.
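Because search radii may be thought of in miles while projected coordinates are often in
meters, a small conversion helper can be useful when planning a run. The sketch below is
not part of CrimeStat; the function names are invented for illustration, and the only
fact relied on is the definition of the international mile (1,609.344 meters).

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition


def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE


def meters_to_miles(meters):
    """Convert a distance in meters to miles."""
    return meters / METERS_PER_MILE


half_mile_m = miles_to_meters(0.5)   # a half-mile radius expressed in meters
radius_750_mi = meters_to_miles(750)  # a 750 meter radius expressed in miles
```

A half-mile radius is 804.672 meters, and a 750 meter radius is a little under half a
mile.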
The object of STAC is to identify hot spots and display them with ellipses or convex
hulls. Its key function is visual. Save the ellipses or hulls in the form most appropriate for
the system (e.g., ArcView, Atlas, MapInfo). Because the ellipses or convex hulls are
generated as polygons, they can be used for selections, queries, or thematic maps in the
GIS. In addition to the ellipses and convex hulls, a table is output with all the information
on density and location for each ellipse. It can be saved to a ‘dbf’ file, which can then be
read by any spreadsheet program. The ellipses and convex hulls are numbered in the same
order as the printed output.
Figure 7.4: STAC Parameters Setup
STAC Parameters
The two most important parameters for running STAC are the boundary of the
study area (reference area) and the search radius. A detailed discussion of the parameters
follows. Figure 7.4 shows the STAC parameters screen.
Search Radius
1.  The search radius is the key setting in STAC. In general, the larger the
    search radius, the more incidents will be included in each Hot Cluster
    and the larger the ellipse that will be displayed. Smaller search radii
    generally result in more ellipses of a smaller size. A good strategy is to
    initially use a larger radius and then re-analyze areas that are ‘hot’ with a
    smaller radius. In Chicago, we have found that a 750 meter radius is
    appropriate for the city as a whole and a 200 meter search radius for one of
    the 25 districts. It will be necessary to experiment to determine an
    appropriate search radius.
Units
2.  Specify the units for the search radius. The default is miles and the default
    search radius is 0.5 miles. Be careful about using larger search radii. In
    Chicago, a search radius larger than one mile generates ellipses that are too
    large to be of any tactical or planning use. Other good choices are 750
    meters or 0.25 miles.
Minimum Points Per Cluster
3.  Specify the minimum number of points to be included in a Hot Cluster. The
    limit for the minimum points in a Hot Cluster is two. We usually use a
    minimum of 10.
Boundary
4.  Select the reference file to be used for the analysis. The user can choose the
    boundary from the data set (i.e., the minimum and maximum X/Y values) or
    from the reference boundary. In our opinion, the choice of the reference
    boundary is best. If the data set is used to define the reference boundary,
    the smallest rectangle that encompasses all incidents will be used.
Scan Type
5.  Select the scan type for the grid. Choose Rectangular if the analysis area has
    a mostly gridded street pattern. Choose Triangular if the analysis area
    generally has an irregular street pattern.
Graphical output files
6.  Select whether the graphical output will be displayed as standard
    deviational ellipses or as convex hulls, or both (see chapter 4). For ellipses,
    select the number of standard deviations for the ellipses. One (1X), 1.5X, and
    2X standard deviations can be selected. One standard deviational ellipses
    should be sufficient for most analyses. While one standard deviational
    ellipses rarely overlap, 1.5X and 2X standard deviational ellipses often
    do. A larger ellipse will include more of the Hot Cluster points; a smaller
    ellipse will produce a more focused Hot Cluster identification. The user will
    have to work out a balance between defining a cluster precisely and
    making it so large that it is unclear where one starts and another ends.
Simulation Runs
7.  Specify whether any simulation runs are to be made. To test the significance
    of STAC clusters, it is necessary to run a Monte Carlo simulation (Dwass,
    1957; Barnard, 1963). CrimeStat includes a Monte Carlo simulation routine
    that produces approximate confidence intervals for the particular STAC
    model that has been run. The difference between the density of incidents in
    STAC ellipses in a spatially random data set and the STAC ellipses in the
    actual data set is a test of the strength of the clustering detected by STAC.
    Essentially, the Monte Carlo simulation assigns N cases randomly to a
    rectangle with the same area as the defined study area as specified on the
    Measurement Parameters tab and evaluates the number of clusters
    according to the defined parameters (i.e., search radius). It repeats this test
    K times, where K is defined by the user (e.g., 100, 1,000, 10,000). By
    running the simulation many times, the user can assess approximate
    confidence intervals for the particular number of clusters and density of
    clusters. The default is zero simulation runs because the simulation run
    option usually increases the calculation time considerably. If a simulation
    run is selected, the user should identify the area of the study region on the
    Measurement Parameters tab. It is better to use the jurisdictional area
    rather than the reference area if the jurisdiction is irregularly shaped.
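The logic of the simulation can be sketched as follows. This is a minimal illustration
under loud assumptions, not CrimeStat's routine: random points are scattered uniformly
over the bounding rectangle itself, and the statistic tracked is a simple stand-in (the
largest number of points found in any search circle on a 20-node-per-side grid) rather
than the number and density of STAC ellipses. The function names, the nearest-rank
percentile rule, and the sample parameters are all hypothetical.

```python
import math
import random


def max_circle_count(points, bounds, radius, grid=20):
    """Stand-in clustering statistic: most points inside any grid circle."""
    xmin, ymin, xmax, ymax = bounds
    best = 0
    for i in range(grid):
        cx = xmin + i * (xmax - xmin) / (grid - 1)
        for j in range(grid):
            cy = ymin + j * (ymax - ymin) / (grid - 1)
            n = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
            best = max(best, n)
    return best


def simulate_percentiles(n_points, bounds, radius, runs,
                         percentiles=(0.5, 1, 2.5, 5, 95, 97.5, 99, 99.5), seed=0):
    """Run `runs` spatially random data sets and summarize the statistic."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    xmin, ymin, xmax, ymax = bounds
    stats = []
    for _ in range(runs):
        pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
               for _ in range(n_points)]
        stats.append(max_circle_count(pts, bounds, radius))
    stats.sort()

    def nearest_rank(p):
        # Nearest-rank percentile, an assumption made for this sketch.
        idx = min(len(stats) - 1, max(0, math.ceil(p / 100 * len(stats)) - 1))
        return stats[idx]

    summary = {"min": stats[0], "max": stats[-1]}
    summary.update({p: nearest_rank(p) for p in percentiles})
    return summary


# Hypothetical run: 50 random incidents, 20 simulations.
envelope = simulate_percentiles(50, bounds=(0, 0, 10, 10), radius=1.0, runs=20, seed=1)
```

An observed value above the upper percentiles of such an envelope would suggest more
clustering than spatial randomness produces. With only 20 runs the percentiles are
coarse, which is why far larger values of K are typical.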
Output
Ellipses or convex hulls
The ellipses are output with a prefix of “St” before the output file name while the
convex hulls are output with a prefix of “Cst” before the output file name. These objects can
easily be incorporated into a GIS system. ArcView shape files can be opened as themes.
STAC graphic files also can be added as a MapInfo layer using the Universal Translator
Tool. MapInfo Mif/Mid files must be imported using the command Table > Import. Both
MapInfo and ArcView files are polygons and can be used for queries, thematics, and
selections.
Printed Output
Table 7.1 shows the printed output. Note that the printed output does not include
the file name. Be sure to record the file name and the reference file (if any) that is used.
1.  The first section of the output documents parameter settings and file size.
    Sample size indicates the number of points in the file specified in the setup.
2.  Measurement Type indicates the type of distance measurement, Direct or
    Indirect (Manhattan).
3.  Scan Type indicates a rectangular or triangular grid specified in the setup.
4.  Input Unit indicates the units of the coordinates specified in the setup,
    degrees (if latitude/longitude) or meters or feet (if projected).
5.  Output Units indicate the unit of density and length specified in the setup
    for the output and ellipses. Output Units are generally miles or kilometers.
6.  Search Radius indicates the units specified in the setup. In Figure 7.2 above,
    this is meters.
7.  Boundary identifies the coordinates of the lower left and upper right corner
    of the study area.
8.  Points inside the boundary counts the number of points within the reference
    file. This may be fewer than the number of points in the total file when a
    smaller area is being used for analysis (see above).
9.  Simulation Runs indicates the number of runs, if any, specified in the setup.
10. Finally, STAC printed output provides summary statistics for each Hot Spot
    Area.
    A.  Cluster - an identification number for each ellipse. This corresponds
        to their order in a table view in ArcView, or the browser in MapInfo.
    B.  Mean X and Mean Y - coordinates of the mean center of the ellipse.
    C.  Rotation - the degrees the ellipse is rotated (0 is horizontal; 90 is
        vertical).
    D.  X-axis and Y-axis - the length (in the selected output units) of the x
        and y axis. In the example, the length of the x axis of ellipse 1 is
        1.04768 miles.
Table 7.1
Printed Output for STAC

Spatial and Temporal Analysis of Crime:
---------------------------------------
Sample size ...........: 1181
Measurement type ......: Direct
Scan type .............: Rectangular
Input units ...........: Degrees
Output units ..........: Miles, Squared Miles, Points per Squared Miles
Standard Deviations ...: 1
Search radius .........: 804.672000
Boundary ..............: -76.83302,39.23274 to -76.38390,39.59103
Points inside boundary.: 1179
Simulation runs .......: 1000

                                                            Ellipse
Cluster     Mean X    Mean Y  Rotation   X-Axis   Y-Axis       Area  Points     Density
-------  ---------  --------  --------  -------  -------  ---------  ------  ----------
      1  -76.44915  39.31484  89.41867  1.04768  0.25053    0.82460     106  128.546688
      2  -76.73681  39.28658  69.91502  0.22142  0.88202    0.61354      63  102.682109
      3  -76.57098  39.38499  37.10812  0.34793  0.82213    0.89863      61   67.880882
      4  -76.77129  39.35987  11.26360  0.94336  0.26216    0.77695      61   78.511958
      5  -76.51830  39.26019   8.37773  0.43717  0.25497    0.35017      43   22.796997
      6  -76.60231  39.40086  14.84392  0.17969  0.29466    0.16634      36   16.423811
      7  -76.73087  39.34246  41.07812  0.31007  0.25885    0.25215      35   38.806566
      8  -76.75451  39.31110  74.78196  0.19154  0.31572    0.18998      24   26.326405

Distribution of the number of clusters found in simulation (percentile):

Percentile  Clusters     Area  Points     Density
----------  --------  -------  ------  ----------
min               12  0.01113       5    4.673554
0.5               13  0.02389       5    4.924993
1.0               13  0.03587       5    4.977644
2.5               14  0.05081       5    5.236646
5.0               14  0.06177       5    5.505124
95.0              19  1.24974      14   82.281060
97.5              19  1.39923      16  101.053102
99.0              20  1.58861      17  140.078387
99.5              20  1.67065      19  209.279368
max               20  2.08665      23  449.401912

    E.  Area - the area of the ellipse in square units. Ellipses are ordered
        according to their size. In the example, Ellipse 1 is 0.8246 square
        miles.
    F.  Points - the number of points in the Hot Cluster. In the example,
        there are 61 points in cluster 3.
    G.  Cluster Density - the number of points per square unit. The largest
        cluster is not necessarily the densest. In this example, cluster eight is
        the smallest, but its density is higher than two other clusters.
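The relationship among the axis, area, points, and density columns can be checked by hand
for cluster 1. The sketch below rests on an inference, not on a statement in the program
documentation: the printed area agrees with pi times the product of the X-axis and Y-axis
values, which suggests the axis columns are semi-axis lengths. The function names are
invented for illustration.

```python
import math


def ellipse_area(x_axis, y_axis):
    """Area of an ellipse from its two semi-axis lengths (assumed meaning)."""
    return math.pi * x_axis * y_axis


def cluster_density(points, area):
    """Points per square unit, as defined for the Density column."""
    return points / area


# Cluster 1 from the sample printout: X-axis, Y-axis, points, printed area.
area_1 = ellipse_area(1.04768, 0.25053)
density_1 = cluster_density(106, 0.82460)
```

Both values agree with the printed 0.82460 square miles and 128.546688 points per square
mile to within rounding of the displayed inputs.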
The best way to print or save CrimeStat printed output is to place the cursor inside
the output window and Select all, then copy and paste the selection into a word processing
document in landscape mode.
Make sure to adequately annotate the file, especially the type of incidents, the
reference boundary, and the name of the output file. This can be very important for future
reference.
For Old STAC Users
In general, STAC has retained all the functionality and speed of previous versions.
The ellipses will look somewhat different than previous versions, because a more widely
accepted method for calculating standard deviational ellipses has been used. STAC for
DOS used a 1x standard deviation ellipse. Analysts who want results similar to STAC for
DOS should set standard deviations to 1.
The CrimeStat version of STAC has the following improvements over STAC for
DOS:
1.  STAC no longer requires the use of a special ASCII data file. The data file
    can be any of those available in CrimeStat.
2.  Any projection can be used, including latitude/longitude. Files are not
    converted into a Euclidean projection.
3.  We have not found a limit on the number of points that can be analyzed with
    the CrimeStat version of STAC. Therefore, a small radius can now be used
    over large areas.
4.  STAC can generate Shape files for ArcView or Mif/Mid files for MapInfo.
    Both are polygons, not points.
5.  It is easier for the user to specify the number of standard deviations for an
    ellipse (1X, 1.5X, or 2X).
6.  Convex hull output has been added.
7.  The user can run STAC on a spatially random data set to get an estimate of
    the degree of clustering detected by STAC in the incident data.
8.  The study area boundary (reference file) can be generated from the data set
    (we would not suggest doing this since it will be difficult to compare
    distributions).
Example 1: A STAC Analysis of 1999 Chicago Street Robberies
STAC Hot Spot Areas were calculated for all street (or sidewalk or alley) robberies
occurring in Chicago in 1999 (n=13,009).³ There were 13,007 within the search boundary.
The search radius was set for 750 meters (approximately ½ mile), and the ellipses were set
to one standard deviation. Ten was the minimum number of incidents per cluster.
In figure 7.2 (shown earlier), STAC detected seven ellipses. The areas of the seven
ellipses ranged from 5 square kilometers to 0.7 square kilometers, and the number of
incidents in an ellipse ranged from 760 to 153. The smallest ellipse (number 7 in figure
7.2) was the densest, 222 robberies per square kilometer. Of the 13,007 incidents, 2,375
were in a cluster. Therefore, 18 percent of all of Chicago’s street robberies in 1999 occurred
in 6 percent of its 233 square mile area.
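The 18 percent figure follows directly from the two incident counts quoted above. A quick
check, with an illustrative helper name that is not part of any program discussed here:

```python
def percent_share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


in_clusters = 2375   # incidents inside the seven ellipses
in_boundary = 13007  # incidents inside the search boundary
share = percent_share(in_clusters, in_boundary)
print(f"{share:.1f}% of incidents fall inside a Hot Spot Area")
```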
To map the results, the ellipse boundaries were imported into MapInfo as a mif/mid
file and overlaid on a map of police districts. The large blue rectangle in figure 7.2
designates the search boundary (reference file). O’Hare Airport was excluded because
exact geo-coding is not possible for the few street robberies that occurred there. At a
citywide scale, the map is interesting, but is mainly useful for confirming what is already
known. Ellipse 1, on the west side, has had a high level of violence for many years.
Ellipses 2 and 6 are centered on areas where high rise public housing projects are
gradually being abandoned. Overall, these ellipses are not very useful for tactical
purposes. However, they point out that four Hot Spot Areas cross District boundaries, and
that the large number of street robberies in these areas might be lost in separate district
reports.
A Neighborhood STAC Analysis
The presence of Ellipse 4 (the northernmost ellipse in figure 7.2) might be
unexpected to many Chicagoans. The mid-North side, near Lake Michigan, is generally
considered to be a relatively affluent and safe neighborhood. However, the neighborhood
around Ellipse 4 has had a high level of crime for many years. It was an entertainment
center in the Roaring Twenties, and several institutions of that era remain. Today it is an
area with multiple, often conflicting, uses. A more detailed analysis of the neighborhood
with the help of STAC may point to specific areas that need increased patrol or prevention
activities.
The second step of STAC analysis was to define a focused search boundary area
around Ellipse 4. This was done easily by creating a new map layer in MapInfo and
drawing a rectangle around the desired study area. Clicking on the study area gave the
required CrimeStat reference boundary maximum and minimum coordinates. Using this
more focused boundary, STAC was run a second time with a 200 meter search radius and
the same file of 13,009 cases. The search boundary (reference file) now contained 442
incidents. STAC detected three ellipses that contained 231 incidents. The STAC ellipses
were then imported into MapInfo and mapped (Figure 7.5).
Figure 7.5: STAC Hot Spots for Northeast Side Street Robberies
As the area covered by a map grows smaller, detailed information about crime
patterns and the community can be added. In this map, the STAC ellipses were overlain
with the address locations of incidents (sized according to the number occurring at each
location) and streets.⁴ Much of the area is relatively crime-free. The most frequent
locations for street robbery do not coincide with main streets. Street robbery incidents
tend to cluster near rapid transit stations and the blocks immediately surrounding them.
For example, Argyle Street, between Broadway and Sheridan, is the site of “New China
Town.” It is an area with a number of street robberies and is a destination area for
“Northsiders” who want an inexpensive Chinese or Vietnamese meal.
There is a particularly risky area in the neighborhood of Broadway and Wilson
adjacent to Truman Community College. In a previous analysis of the Bronx, Fordham
University was shown to be a similar attractor for robbery incidents. Colleges supply good
targets for street robbery. Also, authority for security is split between the college and the
city police. The area around Broadway and Wilson has been risky for many years. Ninety
years ago, it was the northern terminus of rapid transit, and the site of several very
inexpensive hotels, two of which still exist. Today the area has several pawnshops and
currency exchanges. There is an ATM located in the EL station. The area looks
dangerous and dirty. Finally, the area has many blind corners and alleys that could serve
as sites for robbery; this is unusual for Chicago. The census block that includes the
northwest corner of Broadway and Wilson ranked fifth among Chicago’s 21,000 census
blocks in number of street robberies in 1999.
Changes need to be made to reduce the risk of street robbery in this area. Mapping
identifies a problem with street robberies, but to investigate possible changes it is
necessary to go beyond mapping. Aside from changes in patrol practices, what physical
changes might aid in crime reduction? The campus has very little parking. The
administration assumes that students take public transportation, but many do not. A
secure parking garage that could serve both the elevated station and the school could be
constructed (vacant land is available). In addition, increased police patrol in the area
between the school and the El station could be implemented.
Advantages of STAC
STAC has a number of advantages as a clustering algorithm:
1. STAC can analyze a very large number of cases quickly. It is very fast using
   a Euclidean projection such as UTM or State Plane, and not quite as fast
   using spherical coordinates (latitude/longitude).
2. The STAC user controls the approximate size of the ellipses (search radius),
   the minimum number of points per ellipse, and the study area. These
   features allow for a broad search for Hot Spot Areas over an entire city and a
   second search concentrating on a smaller area and deriving focused Hot Spot
   Areas for local tactical use.
3. STAC and Hierarchical Clustering are complementary. Hierarchical
   Clustering first derives small ellipses and then aggregates to larger ones.
   The recommended STAC procedure is to first derive large scale ellipses and
   then to analyze these for tactical use.
4. The visual display of STAC ellipses or convex hulls is quite intuitive.
5. Hot spots need not be limited to a single kind of crime, place or event. For
   example, ellipses of drug crime can be overlain on those for burglary. Some
   causal factors are also analyzable with STAC ellipses. For example, ellipses
   of street robbery can be compared to those for liquor licenses.
6. STAC combines features of hierarchical and partitioning search methods
   and adapts itself to the size of the clusters.
7. Unlike the Nnh routine, which has a constant threshold (search radius),
   STAC can create clusters of unequal size because overlapping clusters are
   combined until there is no overlap.
Limitations of STAC
There are also some limitations to using STAC:
1. The distribution of incidents within clusters is not necessarily uniform. The
   user should be careful not to assume that it is. A mapped theme of the Mode
   routine (see chapter 6) according to number of incidents or the single kernel
   density interpolation (see chapter 8) overlaid with STAC ellipses are good
   ways to overcome this problem (figure 7.5 above and figure 7.6 below).
2. STAC is based on the distribution of data points. Neither land use nor risk
   factors are accounted for. It is up to the analyst to identify the characteristics
   that make a Hot Spot ‘hot’.
3. Small changes in the STAC study area boundary can result in quite different
   depictions of the ellipses. This is true of any clustering routine. Retaining
   the same reference file over repeated analyses alleviates this problem. The
   analysis parameters should also be documented.
Nevertheless, if used carefully, STAC is a powerful tool for detecting clusters and
can allow an analyst to experiment with varying search radii and reference boundaries.
Next, the K-means clustering routine is examined.
Figure 7.6: STAC Robbery Hot Spots and Kernel Density Estimation
K-Means Partitioning Clustering
The K-means clustering routine (Kmeans) is a partitioning procedure where the
data are grouped into K groups defined by the user. A specified number of seed locations,
K, are defined by the user (Fisher, 1958; MacQueen, 1967; Aldenderfer and Blashfield,
1984; Systat, 2000). The routine tries to find the best positioning of the K centers and then
assigns each point to the center that is nearest. Like the Nnh routine, the Kmeans assigns
points to one, and only one, cluster. However, unlike the nearest neighbor hierarchical
(Nnh) procedure, all points are assigned to clusters. Thus, there is no hierarchy in the
routine; that is, there are no second- and higher-order clusters.
The technique is useful when a user wants to control the grouping. For example, if
there are 10 precincts in a jurisdiction, an analyst might want to identify the 10 most
compact clusters, one for each precinct. Alternatively, if a previous analysis has shown there
were 24 clusters, then an analyst could check whether the clusters have shifted over time
by also asking for 24 clusters. By definition, the technique is somewhat arbitrary since the
user defines how many clusters are to be expected. Whether a cluster could be a ‘hot spot’
or not would depend on the extent to which a user wanted to replicate ‘hot spots’.
The theory of the K-means procedure is relatively straightforward. The
implementation is more complicated. K-means represents an attempt to define an optimal
set of K locations where the sum of the distance from every point to each of the K
centers is minimized. It is a variation of the old location theory paradigm of how to locate
K facilities (e.g., police stations, hospitals, shopping centers) given the distribution of
population (Haggett, Cliff, and Frey, 1977). That is, how does one identify supply locations
in relation to demand locations? In theory, this question is solved empirically, by
what is frequently called global optimization. One tries every combination of K objects,
where K is a subset of the total population of incidents (or people), N, and measures the
distance from every incident point to every one of the K locations. The particular
combination which gives the minimal sum of all distances (or all squared distances) is
considered the best solution. In practice, however, solving this is computationally almost
impossible, particularly if N is large. For example, with 6000 incidents grouped into 20
partitions (clusters), one cannot solve this with any normal computer since there are

$$\frac{6000!}{20!\;5980!} = 1.456 \times 10^{57}$$

combinations. No computer can enumerate that many combinations, and few spreadsheets
can calculate the factorial of N greater than about 127.⁵ In other words, it is almost
impossible to solve computationally.
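The size of this search space can be checked directly. The short Python sketch below is an illustration only (the function name is hypothetical and not part of CrimeStat); it uses exact integer arithmetic from the standard library to count the ways of choosing 20 seed incidents out of 6000.

```python
import math


def n_seed_combinations(n_incidents: int, k: int) -> int:
    """Number of ways to choose k seed incidents out of n_incidents (n choose k)."""
    return math.comb(n_incidents, k)


count = n_seed_combinations(6000, 20)
# Printed in scientific notation; the result is on the order of 10^57,
# in line with the figure quoted in the text.
print(f"{count:.3e}")
```

Because Python integers have arbitrary precision, the count itself is easy to compute; it is enumerating and scoring that many candidate solutions that is infeasible.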
Practically, therefore, the different implementations of the K-means routine all
make initial guesses about the K locations and then optimize the positioning of these
locations in relation to the nearby points. This is called local optimization. Unfortunately,
each K-means routine has a different way to define the initial locations so that two K-means
procedures will usually not produce the same results, even if K is identical (Everitt, 1974;
Systat, Inc., 1994).
CrimeStat K-means Routine
The K-means routine in CrimeStat also makes an initial guess about the K locations
and then optimizes the distribution locally. The procedure that is adopted makes initial
estimates about the location of the K clusters (seeds), assigns each point to its nearest seed
location, re-calculates a center for each cluster which becomes a new seed, and then
repeats the procedure all over again. The procedure stops when there are very few changes
to the cluster composition.⁶
The default K-means clustering routine follows an algorithm for grouping all point
locations into one, and only one, of these K groups. There are two general steps: 1) the
identification of an initial guess (seed) for the location of the K clusters, and 2) local
optimization which assigns each point to the nearest of the K clusters. A grid is overlaid
on the data set and the number of points falling within each grid cell is counted. The grid
cell with the most points is the initial first cluster. Then, the second initial cluster is the
grid cell with the next most points that is separated by at least:

$$\text{Separation} = t \times 0.5 \times \sqrt{\frac{A}{N}} \qquad (7.1)$$

where t is the Student’s t-value for the .01 significance level (2.358), A is the area of the
region, and N is the sample size. A third initial cluster is then selected which is the grid
cell with the third most points and is separated from the first two grid cells by at least the
separation factor defined above. This process is repeated until all K initial seed locations
are chosen.
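The seed selection just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CrimeStat's code: the function names, the fixed `cell_size` parameter, and the use of grid-cell centers as seed coordinates are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def separation_distance(t: float, area: float, n: int) -> float:
    """Equation 7.1: t * 0.5 * sqrt(A / N)."""
    return t * 0.5 * math.sqrt(area / n)


def grid_seeds(points, k, cell_size, min_sep):
    """Pick up to k seeds: the densest grid cells whose centers are at
    least min_sep away from every seed already chosen."""
    # Count the points falling in each grid cell.
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    seeds = []
    # Visit cells from most to fewest points.
    for (cx, cy), _ in counts.most_common():
        center = ((cx + 0.5) * cell_size, (cy + 0.5) * cell_size)
        if all(math.dist(center, s) >= min_sep for s in seeds):
            seeds.append(center)
        if len(seeds) == k:
            break
    return seeds


# Three groups of points; the second group is too close to the first.
pts = [(0.5, 0.5)] * 5 + [(1.5, 0.5)] * 4 + [(9.5, 9.5)] * 3
print(grid_seeds(pts, k=2, cell_size=1.0, min_sep=2.0))
```

Note that the function can return fewer than K seeds when not enough cells satisfy the separation requirement.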
The algorithm then conducts local optimization. It assigns each point to the nearest
of the K seed locations to form an initial cluster. For each of the initial clusters, it
calculates the center of minimum distance and then re-assigns all points to the nearest
cluster, based on the distance to the center of minimum distance. It repeats this process
until no points change clusters. To increase the flexibility of the routine, the grid that is
overlaid on the data points is re-sized to accommodate different cluster structures,
increasing or decreasing in size to try to find the K clusters. After iterating through
different grid sizes, the code makes sure that the final seeds are from the "best" grid or the
grid that produces the most clusters. Finally, for each cluster, the routine calculates a
standard deviational ellipse and optionally can output the results graphically as either
standard deviational ellipses or convex hulls.
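The assign-and-recompute loop can be illustrated with a minimal Python sketch. It is not the CrimeStat implementation: for brevity it recomputes each center as the mean center of its members, whereas the text describes the center of minimum distance, and the function names and the `max_iter` safeguard are assumptions made for the example.

```python
import math


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def local_optimization(points, seeds, max_iter=100):
    """Iterate assignment and center updates until no point changes cluster."""
    centers = list(seeds)
    labels = assign(points, centers)
    for _ in range(max_iter):
        # Recompute each center from its current members
        # (mean center here, as a simplifying assumption).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        new_labels = assign(points, centers)
        if new_labels == labels:  # no point changed cluster: stop
            break
        labels = new_labels
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = local_optimization(pts, seeds=[(0, 0), (10, 10)])
print(labels)
```

The result depends on the seeds supplied, which is why the choice of initial locations matters so much for the final grouping.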
Control over Initial Selection of Clusters
Changing the separation between clusters
One problem with this approach is that in highly concentrated distributions, such as
with most crime incidents in a metropolitan area, the separation between clusters may not
be sufficiently large to detect clusters farther away from the concentration; the algorithm
will tend to sub-divide concentrated groupings of incidents into multiple clusters rather
than seek clusters that are less concentrated and, usually, farther away. To increase the
flexibility of the routine, CrimeStat allows the user to modify the initial selection of
clusters since this has a large effect on the final grouping (Everitt, 1974). There are two
ways the initial selection of cluster centers can be modified. First, the user can increase or
decrease the separation factor. Formula 7.1 is still used to separate each of the initial
clusters, but the user can either select a t-value from 1 to 10 from the drop down menu or
write in any number for the separation, including fractions, to increase or decrease the
separation between the initial clusters. The default is set at 4.
Figure 7.7 shows a simulation of eight clusters, four of which have higher
concentrations than the other four. Two partitions of the data set into eight groups are
shown, one using a separation of 4 (dashed green ellipses) and one with a separation of 15
(solid blue ellipses). As seen, the partition with the larger separation captures the eight
clusters better. With the smaller separation, the routine will tend to sub-divide more
concentrated clusters because that reduces the distance of each point from the cluster
center. Depending on the purpose of the partitioning, a greater or lesser separation may be
desired.
Selecting the initial seed locations
Alternatively, the initial clusters can be modified to allow the user to define the
actual locations for the initial cluster centers. This approach was used by Friedman and
Rubin (1967) and Ball and Hall (1970). In CrimeStat, the user-defined locations are
entered with the secondary file which lists the location of the initial clusters. The routine
reads the secondary file and uses the number of points in the file for K and the X/Y
coordinates of each point as the initial seed locations. It then proceeds in the same way
with local optimization. When eight points that were approximately in the middle of the
eight clusters in figure 7.7 were input as the secondary file, the K-means routine
immediately identified the eight clusters (results not shown). Again, depending on the
purpose, the user can test a particular clustering by requiring the routine to consider that
model, at least for the initial seed location. The routine will conduct local optimization for
the rest of the clustering, as in the above method.
The K-means output is similar for both routines. It includes the parameters for the
standard deviational ellipse of each cluster in the table. In addition, graphically one can
output each cluster as a standard deviational ellipse or as a convex hull (see chapter 4).
The convex hull draws a polygon around all the points in a cluster. Hence it is a literal
description of the extent of the cluster. The ellipse, on the other hand, is an abstraction for
the cluster. Typically, one standard deviation will cover more than 50% of the cases, one
and a half standard deviations will cover more than 90% of the cases, and two standard
deviations will cover more than 99% of the cases, although the exact percentage will
depend on the distribution. In general, use a 1X standard deviational ellipse since 1.5X
and 2X standard deviations can create an exaggerated view of the underlying cluster. The
ellipse, after all, is an abstraction from the points in the cluster which may be arranged in
an irregular manner. On the other hand, for a regional view, a convex hull or a one
standard deviational ellipse may not be very visible. The user has to balance the need to
accurately display the cluster against making it easier for a viewer to understand its
location.
Figure 7.7: Separated Data and K-Means Solution. K=8 Partitions with Two Separations of Initial Seed Locations (Separation=4 and Separation=15; scale bar 0 to 10 miles)
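To make the idea of a convex hull as a literal outline of a cluster concrete, the following Python sketch computes the hull of a set of points with Andrew's monotone chain method. The choice of algorithm and the function name are assumptions made for illustration; the source does not say how CrimeStat constructs its hulls.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


cluster = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(cluster))  # the interior point (1, 1) is not a vertex
```

Every point of the cluster lies on or inside the returned polygon, which is what makes the hull a literal description of the cluster's extent, in contrast to the ellipse.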
Mean squared error
In addition, the output for each cluster lists two additional statistics:

Sum of squares of cluster C:

$$SSE_C = \sum_{i=1}^{N_C} \left[ (X_{iC} - \text{Mean}X_C)^2 + (Y_{iC} - \text{Mean}Y_C)^2 \right] \qquad (7.2)$$

Mean squared error of cluster C:

$$MSE_C = \frac{SSE_C}{N_C - 1} \qquad (7.3)$$

where $X_{iC}$ is the X value of a point that belongs to cluster C, $Y_{iC}$ is the Y value of a point
that belongs to cluster C, $\text{Mean}X_C$ is the mean X value of cluster C (i.e., of only those points
belonging to C), $\text{Mean}Y_C$ is the mean Y value of cluster C, and $N_C$ is the number of points in
cluster C. There is also a total sum of squares and a total mean squared error, which are
summed over all clusters:

$$\text{Total Sum of Squares} = \sum_{C} SSE_C \qquad (7.4)$$

$$\text{Total Mean Squared Error} = \frac{\sum_{C} SSE_C}{N - K - 1} \qquad (7.5)$$

where $SSE_C$ is the sum of squares for cluster C, N is the total sample size, and K is the
number of clusters. The sum of squares is the squared deviations of each cluster point
from the center of minimum distance while the mean squared error is the average of the
squared deviations for each cluster.
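Equations 7.2 through 7.5 translate directly into code. The Python sketch below follows the formulas as written (deviations from the cluster's mean X and mean Y); the function names are hypothetical, and each cluster is assumed to be a list of (x, y) tuples with at least two points.

```python
def cluster_sse(points):
    """Equation 7.2: sum of squared deviations from the cluster's mean center."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)


def cluster_mse(points):
    """Equation 7.3: SSE_C / (N_C - 1)."""
    return cluster_sse(points) / (len(points) - 1)


def total_statistics(clusters):
    """Equations 7.4 and 7.5: total SSE and total SSE / (N - K - 1)."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    total_sse = sum(cluster_sse(c) for c in clusters)
    return total_sse, total_sse / (n - k - 1)


clusters = [[(0, 0), (2, 0)], [(0, 0), (0, 2), (0, 4)]]
print([cluster_sse(c) for c in clusters])  # per-cluster sums of squares
print(total_statistics(clusters))          # (total SSE, total MSE)
```

For a fixed K, running this on two alternative partitions of the same data gives a quick way to compare how tightly each one groups the points.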
The sum of squares (or sum of squared errors) is frequently used as a criterion for
identifying ‘goodness of fit’ (Everitt, 1974; Aldenderfer and Blashfield, 1984; Gersho and
Gray, 1992). In general, for a given number of clusters, K, those with a smaller sum of
squares and, correspondingly, smaller mean squared error are better defined than clusters
with a larger sum of squares and larger mean squared error. Similarly, a K-means
solution that produces a smaller overall sum of squares is a tighter grouping than a
grouping that produces a larger overall sum of squares.
But there can be exceptions. If there are points which are ‘outliers’, that is, which
don’t obviously fall into one cluster or another, re-assigning them to one or another cluster
can distort the sum of squares statistics. Also, in highly concentrated distributions, such
as with crime incidents, a smaller sum of squares can be obtained by splitting the
concentrations rather than clustering less central and less dense groups of incidents (such
as in figure 7.7); the results, while minimizing the sum of squared errors from the cluster
centers, will be less desirable because the peripheral clusters are ignored. Thus, these
statistics are presented for the user’s information only. In assigning points to clusters,
CrimeStat still uses the distance to the nearest seed location, rather than a solution that
minimizes the sum of squared distances.
Visualizing the Cluster
Finally, the K-means clustering routine (Kmeans) outputs clusters graphically as
either ellipses or convex hulls, similar to the other clustering routines. For the ellipses, the
user can choose between 1X, 1.5X, and 2X standard deviations to display the ellipses. The
graphical ellipses are output with the prefix ‘KM’ before the file name. It should be noted,
however, that the ellipses are an abstraction of the cluster. The clusters are not
necessarily arranged in ellipses. They are for visualization purposes only. For the convex
hull, the routine draws a polygon around the points in each cluster. The graphical convex
hulls are output with the prefix ‘CKM’ before the file name.
K-means Output Files
The naming system for the K-means outputs is simpler than the Nnh routine since
there are no higher-order clusters. Each file is named

Km<username>     [for the ellipse]
Ckm<username>    [for the convex hull]

where username is the name of the file provided by the user. Within the file, each cluster
is named

KmEll<N><username>      [for the ellipse]
CkmHull<N><username>    [for the convex hull]

where N is the cluster number and username is the name of the file provided by the user.
For example,

KmEll3robbery

is the third ellipse for the file called ‘robbery’ and

CkmHull12burglary

is the 12th convex hull for the file called ‘burglary’.
For the ellipses, a slide-bar allows ellipses to be defined for 1X, 1.5X, and 2X
standard deviations and can be output in ArcView ‘.shp’, MapInfo ‘.mif’ or Atlas*GIS ‘.bna’
formats. The convex hulls, on the other hand, draw a polygon around the clustered points.
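When post-processing many output files, it can help to generate the expected names programmatically. The Python helper below is a hypothetical convenience written for this illustration, not something CrimeStat provides; it simply applies the naming pattern described above and assumes cluster numbers start at 1.

```python
def kmeans_output_names(username: str, n_clusters: int) -> dict:
    """Build the file and object names described in the text for one run.

    Assumes clusters are numbered 1..n_clusters.
    """
    numbers = range(1, n_clusters + 1)
    return {
        "ellipse_file": f"Km{username}",
        "hull_file": f"Ckm{username}",
        "ellipses": [f"KmEll{i}{username}" for i in numbers],
        "hulls": [f"CkmHull{i}{username}" for i in numbers],
    }


names = kmeans_output_names("robbery", 3)
print(names["ellipses"][-1])  # the third ellipse for the file 'robbery'
```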
Example 2: K-means Clustering of Street Robberies
In CrimeStat, the user specifies the number of groups into which to sub-divide the data. Using
the 1996 robbery incidents for Baltimore County, the data were partitioned into 10 groups
with the K-means routine (figure 7.8). As can be seen, the clusters tend to fall along the
border with Baltimore City. But there are three more dispersed clusters, one concentrated
in the central eastern part of the county and two north of the border with the City.
Because these clusters are very large, a finer mesh clustering was conducted by
partitioning the data into 31 clusters (figure 7.9). Thirty-five clusters were requested but
the routine only found 31 seed locations. Consequently, it output 31 clusters, which are
displayed as ellipses. Though the ellipses are still larger than those produced by the
nearest neighbor hierarchical procedure (see figure 6.7 in chapter 6), there is some
congruency; clusters identified by the nearest neighbor procedure have corresponding
ellipses using the K-means procedure.
Figure 7.10 shows a section of southwest Baltimore County with four full clusters
and three partial clusters visible, displayed as convex hulls. Looking at the distribution,
several clusters make intuitive sense while a couple of others do not. For example, two
clusters highlight a concentration along a major arterial (U.S. Highway 40). Similarly, the
cluster in the middle right appears to capture incidents along two arterial roads. However,
the other three full clusters do not appear to capture meaningful patterns and appear
somewhat arbitrary.
Other uses of the K-means algorithm are possible. One problem that affects most
police departments is the need to allocate personnel throughout a city in a balanced and
fair way. Too often, some police precincts or districts are overburdened with Calls for
Service whereas others have more moderate demand. The issue of re-drawing or
re-assigning police boundaries in order to re-establish balance is a continual one for police
departments. The K-means algorithm can help in defining this balance, though there are
many other factors that will affect particular boundaries. The number of groupings, K, can
be chosen based on the number of police districts that exist or that are desired. The
locations of division or precinct stations can be entered in a secondary file in order to define
the initial ‘seed’ locations. The K-means routine can then be run to assign each incident to
one of the K groups. The analyst can vary the location of the initial seeds or, even, the
number of groups in order to explore different arrangements in space. Once an agreed-upon
solution is found, it is easy to then re-assign police beats to fit the new arrangement.
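As a rough illustration of this use, the Python sketch below takes station locations as fixed seeds, assigns each incident to its nearest station, and reports the resulting workload per station. It performs only the initial assignment step, not the full K-means iteration, and the function name, station names, and coordinates are invented for the example.

```python
import math


def workload_by_station(incidents, stations):
    """Count the incidents whose nearest station is each station.

    incidents: list of (x, y) tuples.
    stations: dict mapping a station name to its (x, y) location.
    """
    counts = {name: 0 for name in stations}
    for p in incidents:
        nearest = min(stations, key=lambda name: math.dist(p, stations[name]))
        counts[nearest] += 1
    return counts


# Hypothetical stations and calls for service.
stations = {"North": (0.0, 10.0), "South": (0.0, 0.0)}
calls = [(0, 1), (1, 0), (0, 9), (1, 1)]
print(workload_by_station(calls, stations))
```

A strongly unequal count suggests the seeds (or the number of districts) should be varied and the comparison repeated, as the text recommends.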
Figure 7.8: Baltimore County Robbery 'Hot Spots' Using K-Means Routine with K=10 Clusters (map labels: Baltimore County, City of Baltimore; scale bar 0 to 4 miles)
Figure 7.9: Baltimore County Robbery 'Hot Spots' Using K-Means Routine with K=31 Clusters (map labels: Baltimore County, City of Baltimore; scale bar 0 to 4 miles)
Figure 7.10: Southwest Baltimore County Robbery 'Hot Spots' Using K-Means Routine with K=31 Clusters (scale bar 0 to 2 miles)
Advantages and Disadvantages of the K-means Procedure
In short, the K-means procedure will divide the data into the number of groups
specified by the user. Whether these groups make any sense or not will depend on how
carefully the user has selected clusters. Choosing too many will lead to defining patterns
that don’t really exist whereas choosing too few will lead to poor differentiation among
neighborhoods that are distinctly different.
It is this choice that is both a strength and a weakness of the technique. The
K-means procedure provides a great deal of control for the user and can be used as an
exploratory tool to identify possible ‘hot spots’. Whereas the nearest neighbor hierarchical
method produces a solution based on geographical proximity with most clusters being very
small, the K-means can allow the user to control the size of the clusters. In terms of
policing, the K-means is better suited for defining larger geographical areas than the
nearest neighbor method, perhaps more appropriate for a patrol area than for a particular
‘hot spot’. Again, if carefully used, the K-means gives the user the ability to ‘fine tune’ a
particular model of ‘hot spots’, adjusting the size of the clusters (vis-a-vis the number of
clusters selected) in order to fit a particular pattern which is known.
Yet it is this same flexibility that makes the technique potentially
difficult to use and prone to misuse. Since the technique will divide the data set into K
groups regardless, there is no guarantee that these K groups represent real ‘hot spots’. A user
cannot just arbitrarily put in a number and expect it to produce meaningful results. A more
extensive discussion of this issue can be found in Murray and Grubesic (2002). Grubesic
and Murray (2001) present some newer approaches in the K-means methodology.
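The point that K groups are always returned, whether or not they are real, can be
illustrated with a minimal sketch. It uses a plain Lloyd-style K-means with random
starting seeds, which is an assumption made for illustration and is not the CrimeStat
seeding routine; the function name k_means and the simulated points are hypothetical.

```python
import math
import random

def k_means(points, k, iterations=20, seed=0):
    """Generic K-means sketch: always returns k groups, meaningful or not."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: k random incidents as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels, centers

# Spatially random 'incidents': by construction there are no real hot spots.
rng = random.Random(42)
random_points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
labels, centers = k_means(random_points, k=5)
print([labels.count(c) for c in range(5)])  # five groups are reported anyway
```

Even on uniformly random points the routine reports five groups, which is why the
choice of K has to be justified by knowledge of the area rather than by the output.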
The technique is, therefore, better seen as both an exploratory tool and a tool
for refining a ‘hot spot’ search. If the user has a good idea of where there should be ‘hot
spots’, based on community experience and the reports of beat officers, then the technique
can be used to see if the incidents actually correspond to the perception. It also can help
identify ‘hot spots’ which have not been perceived or identified by officers. Alternatively, it
can identify ‘hot spots’ that don’t really exist and which are merely by-products of the
statistical procedure. Experience and sensitivity are needed to know whether an identified
‘hot spot’ is real or not.
Anselin’s Local Moran Statistic (LMoran)
The last ‘hot spot’ technique in CrimeStat is a zonal technique called Anselin’s
Local Moran statistic, which was developed by Luc Anselin (1995). Unlike the nearest
neighbor hierarchical and K-means procedures, the local Moran statistic requires data to
be aggregated by zones, such as census block groups, zip codes, police reporting areas or
other aggregations. The procedure applies Moran’s I statistic to individual zones, allowing
them to be identified as similar or different to their nearby pattern.
K-Means Clustering as an Alternative Measure of Urban Accessibility
Richard J. Crepeau
Department of Geography and Planning
Appalachian State University
Boone, NC
The relationship between land use and the transportation system is an
important issue. Many planners recognize that transportation policies, practices
and outcomes affect changes in land use, and vice versa, but there is disagreement
as to how best to describe this phenomenon. Traditional methods include measures
of accessibility via a matrix of zones (tracts, traffic analysis zones, etc.). However,
there are limits to the way interaction and accessibility are described with such
discrete units.
Through the use of K-Means clustering, an alternate measure of accessibility
can be calculated. Rather than relying on census geography, the left map shows ten
retail clusters in San Diego County (1995) as calculated by CrimeStat’s K-Means
clustering technique (using 1x standard deviational ellipse). The retail hot spots
were calculated using a geocoded point file of retail establishments in the county.
These clusters are not bound by census geography and allow a more realistic
appraisal about the attractiveness of specific regions within the county. An analyst
can then determine if residential location within a hot spot has an effect on travel
patterns, or if there is a relationship between proximity to a hot spot and travel
behavior. While this example illustrates a measure of regional retail attractiveness,
the flexibility of CrimeStat allows an analyst to evaluate these relationships on a
local level, thus allowing a scope of inquiry from regional to local accessibility (as
shown in right map, which uses the same parameters as the left figure, but limiting
its sample to retail in a sub-region of San Diego County noted by the arrow).
[Maps: Regional Hot Spots (left) and Local Hot Spots (right)]
Hot Spot Verification in Auto Theft Recoveries
Bryan Hill
Glendale Police Department
Glendale, AZ
We use CrimeStat as a verification tool to help isolate clusters of activity
when one application or method does not appear to completely identify a problem.
The following example utilizes several CrimeStat statistical functions to verify a
recovery pattern for auto thefts in the City of Glendale (AZ). The recovery data
included recovery locations for the past 6 months in the City of Glendale which were
geocoded with a county-wide street centerline file using ArcView.
First, a spatial density “grid” was created using Spatial Analyst with a grid
cell size of 300 feet and a search radius of 0.75 miles for the 307 recovery locations.
We then created a graduated color legend, using standard deviation as the
classification type and the value for the legend being the CrimeStat “Z” field that is
calculated.
In the map, the K-means (red ellipses), Nnh (green ellipses) and Spatial
Analyst grid (red-yellow grid cells) all showed that the area had a high density, or
clustering, of stolen vehicle recoveries. Although this information was not new, it did
help verify our conclusion and aided in organizing a response.
The basic concept is that of a local indicator of spatial association (LISA) and has
been discussed by a number of researchers (Mantel, 1967; Getis, 1991; Anselin, 1995). For
example, Anselin (1995) defines this as any statistic that satisfies two requirements:
1. The LISA for each observation indicates the extent to which there is
   significant spatial clustering of similar values around that observation; and
2. The sum of the LISAs for all observations is proportional to the global
   indicator of spatial association.
$$ L_i = f(Y_i, Y_{J_i}) \tag{7.6} $$
where $L_i$ is the local indicator, $Y_i$ is the value of an intensity variable at
location i, and $Y_{J_i}$ are the values observed in the neighborhood $J_i$ of i.
In other words, a LISA is an indicator of the extent to which the value of an
observation is similar or different from its neighboring observations. This requires two
conditions. First, each observation must have a variable value that can be assigned to it (i.e.,
an intensity or a weight) in addition to its X and Y coordinates. For crime incidents, this
means data that are aggregated into zones (e.g., number of incidents by census tracts, zip
codes, or police reporting districts). Second, the neighborhood has to be defined. This
could be either adjacent zones or all other zones inversely weighted by the distance from
the observation zone.
Once these are defined, the LISA indicates the value of the observation zone in
relation to its neighborhood. Thus, in neighborhoods where there are ‘high’ intensity
values, the LISA indicates whether a particular observation is similar (i.e., also ‘high’) or
different (i.e., low) and, conversely, in neighborhoods where there are ‘low’ intensity values,
the LISA indicates whether a particular observation is similar (i.e., also ‘low’) or different
(i.e., ‘high’). That is, the LISA is an indicator of similarity, not absolute value of the
intensity variable.
Formal Definition of Local Moran Statistic
The $I_i$ statistic
Anselin (1995) has applied the concept to a number of spatial autocorrelation
statistics. The most commonly used, which is included in CrimeStat, is Anselin’s Local
Moran statistic, $I_i$, the use of Moran’s I statistic as a LISA. The definition of $I_i$ is (from
Getis and Ord, 1996):
$$ I_i = \frac{(Z_i - \bar{Z})}{S_Z^2} \sum_{j=1,\, j \neq i}^{N} \left[ W_{ij} (Z_j - \bar{Z}) \right] \tag{7.7} $$
where $\bar{Z}$ is the mean intensity over all observations, $Z_i$ is the intensity of
observation i, $Z_j$ is the intensity for all other observations, j (where $j \neq i$), $S_Z^2$ is
the variance over all observations, and $W_{ij}$ is a distance weight for the interaction
between observations i and j. Note that the first term refers only to observation i, while the
second term is the sum of the weighted values for all other observations (but not
including i itself).
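As a rough illustration of the definition of the Local Moran statistic given here, the
sketch below computes it for every zone from a list of intensities and a weight matrix.
The function name local_moran, the use of the population variance (dividing by N) and
the unstandardized adjacency weights in the example are assumptions made for
illustration; they are not taken from the CrimeStat implementation.

```python
def local_moran(z, w):
    """Local Moran I_i for each zone.

    z: list of zone intensities (e.g., incident counts).
    w: N x N list of weights W_ij; the diagonal is ignored.
    """
    n = len(z)
    mean = sum(z) / n
    # Assumption: population variance (divide by N, not N - 1).
    variance = sum((v - mean) ** 2 for v in z) / n
    scores = []
    for i in range(n):
        # Weighted sum of deviations over all other zones j != i.
        weighted = sum(w[i][j] * (z[j] - mean) for j in range(n) if j != i)
        scores.append((z[i] - mean) / variance * weighted)
    return scores

# Four hypothetical zones in a row; only neighboring zones get weight 1.
counts = [8, 10, 9, 1]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
for zone, score in enumerate(local_moran(counts, adjacency)):
    print(zone, round(score, 2))
```

In this toy example the last zone has a low count next to a high-count neighbor, so its
score is negative, while the second zone, surrounded by similar high counts, is positive.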
Distance weights
The weights, $W_{ij}$, can be either an indicator of the adjacency of a zone to the
observation zone (i.e., 1 if adjacent; 0 if not adjacent) or a distance-based weight which
decreases with distance between zones i and j. Adjacency indices are useful for defining
near neighborhoods; the adjacent zones have full weight while all other zones have no
weight. Distance weights, on the other hand, are useful for defining spatial interaction;
zones which are farther away can have an influence on an observation zone, although one
that is much less. CrimeStat uses distance weights, in two forms.
First, there is a traditional distance decay function:
$$ W_{ij} = \frac{1}{d_{ij}} \tag{7.8} $$
where $d_{ij}$ is the distance between the observation zone, i, and another zone, j. Thus, a zone
which is two miles away has half the weight of a zone that is one mile away.
Small distance adjustment
Second, there is an adjustment for small distances. Depending on the distance scale
used (miles, kilometers, meters), the weight index becomes problematic when the distance
falls below 1 (i.e., below 1 mile, 1 kilometer); the weight then increases as the distance
decreases, going to infinity for $d_{ij} = 0$. To correct for this, CrimeStat includes an
adjustment for small distances so that the maximum weight can never be greater than
1.0 (see chapter 4). The adjustment scales distances to one mile. When the small distance
adjustment is turned on, the minimal distance is scaled automatically to be one mile. The
formula used is
$$ W_{ij} = \frac{\text{one mile}}{\text{one mile} + d_{ij}} \tag{7.9} $$
in whichever units are specified.
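The two weighting forms can be compared with a short sketch. The function name
distance_weight and the parameter one_unit (the length of one mile expressed in the
working distance units) are hypothetical names introduced here for illustration; the
behavior at zero distance without the adjustment is simply reported as infinity.

```python
def distance_weight(d, small_distance_adjustment=False, one_unit=1.0):
    """Weight between two zones separated by distance d (d >= 0).

    Without the adjustment: simple inverse distance, 1 / d.
    With the adjustment:    one_unit / (one_unit + d), which never exceeds 1.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    if small_distance_adjustment:
        return one_unit / (one_unit + d)
    if d == 0:
        return float("inf")  # the unadjusted weight is unbounded at d = 0
    return 1.0 / d

# Unadjusted weights grow without bound below one mile; adjusted ones do not.
for miles in (2.0, 1.0, 0.25, 0.0):
    print(miles, distance_weight(miles), distance_weight(miles, True))
```

With the adjustment turned on, two zones at zero distance get a weight of exactly 1,
and every larger distance gets a smaller weight.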
Similarity or dissimilarity
An exact test of significance has not been worked out because the distribution of the
statistic is not known. The expected value of $I_i$ and the variance of $I_i$ are somewhat
complicated (see endnote 7 for the formulas). Instead, high positive or high
negative standardized scores of $I_i$, $Z(I_i)$, are taken as indicators of similarity or
dissimilarity. A high positive standardized score indicates the spatial clustering of similar
values (either high or low) while a high negative standardized score indicates a clustering
of dissimilar values (high relative to a neighborhood that is low or, conversely, low relative
to a neighborhood that is high). The larger the standardized score in absolute value, the
more the observation is similar (positive) or dissimilar (negative) to its neighbors.
In other words, the Local Moran statistic is a good indicator of either ‘hot spots’ or
‘cold spots’, zones which are different from their neighborhood. ‘Hot spots’ would be seen
where the number of incidents in a zone is much higher than in the nearby zones. ‘Cold
spots’ would be seen where the number of incidents in a zone is much lower than in the
nearby zones. The Local Moran statistic indicates whether the zone is similar or dissimilar
to its neighbors. A user must then look at the absolute value of the zone (i.e., the number
of incidents in the zone) to see whether it is a ‘hot spot’ or a ‘cold spot’.
For each observation, CrimeStat calculates the Local Moran statistic and the
expected value of the Local Moran. If the variance box is checked, the program will also
calculate the variance and the standardized Z-value of the Local Moran. The default is for
the variance not to be calculated because the calculations are very intense and may take a
long time. Therefore, a user should test how long it takes to calculate variances for a small
sample on a particular computer before running the variance routine on a large sample.
Example 3: Local Moran Statistics for Auto Thefts
Using data on 14,853 motor vehicle thefts for 1996 in both Baltimore County and
Baltimore City, the number of incidents occurring in each of 1,349 census block groups was
calculated with a GIS (Figure 7.11). As seen, the pattern shows a higher concentration
towards the center of the metropolitan area, as would be expected, but the pattern is
not completely uniform. There are many block groups within the City of Baltimore with
very low numbers of auto thefts and there are a number of block groups within the County
with a very high number.
Using these data, CrimeStat calculated the Local Moran statistic with the variance
box being checked and the small distance adjustment being used. The range of $I_i$ values
varied from -37.26 to +180.14 with a mean of 5.20. The pseudo-standardized Local Moran
‘Z’ varied from -12.71 to 50.12 with a mean of 1.61. Figure 7.12 maps the distribution.
Because a negative $I_i$ value indicates dissimilarity, these values have been drawn in red,
compared to blue for a positive $I_i$ value. As seen, in both the City of Baltimore and the
County of Baltimore, there are block groups with large negative $I_i$ values, indicating that
they differ from their surrounding block groups. For example, in the central part of
Baltimore City, there is a small area of about eight block groups with low numbers of auto
thefts, compared to the surrounding block groups. These form a ‘cold spot’. Consequently,
they appear in dark tones in figure 7.12, indicating that they have large negative $I_i$ values
(i.e., negative autocorrelation). Similarly, there are several block groups on the western side
of the County which have relatively high numbers of auto thefts compared to the surrounding
block groups. They form a ‘hot spot’. Consequently, they also appear in dark tones in
figure 7.12 because this indicates negative spatial autocorrelation, having values that are
dissimilar to the surrounding blocks.

Figure 7.11:
1996 Motor Vehicle Thefts
Number of Auto Thefts Per Block Group
[Map of Baltimore County, the City of Baltimore, and Howard County. Legend classes for
auto thefts: 10 or fewer; 11-20; 21-30; 31-40; 41-50; 51 or more. Scale bar in miles.]
Another use of Anselin’s Local Moran statistic is to identify ‘outliers’, zones that are
very different from their neighbors. In this case, zones with a high negative I value (e.g.,
with a standardized I smaller than two standard deviations below the mean, -2) are
indicative of outliers. They either have a high number of incidents whereas their neighbors
have a low number or, the opposite, a low number of incidents amidst zones with a high
number of incidents. Identifying the outliers can focus on zones which are unique (and which
should be studied) or, in multivariate analysis, on zones which need to be statistically treated
differently in order to minimize a large modeling error (e.g., creating a dummy variable for
the extreme outliers in a regression model).
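The outlier rule described above can be sketched as a simple filter on the
standardized scores. The threshold of -2 follows the example in the text; the function
name classify_outliers, the tuple layout, and the use of the mean count of the
neighboring zones to decide the direction of the outlier are assumptions made for
illustration, and the zone data are hypothetical.

```python
def classify_outliers(zones, threshold=-2.0):
    """Flag zones whose standardized Local Moran score is below the threshold.

    zones: list of (zone_id, incident_count, neighbor_mean_count, z_score).
    Returns a dict mapping each flagged zone_id to the kind of outlier.
    """
    flagged = {}
    for zone_id, count, neighbor_mean, z_score in zones:
        if z_score < threshold:
            # Assumption: direction is judged against the neighbors' mean count.
            if count > neighbor_mean:
                flagged[zone_id] = "high among low"
            else:
                flagged[zone_id] = "low among high"
    return flagged

# Hypothetical block groups: (id, count, mean count of neighbors, Z of I_i).
example_zones = [
    ("A", 50, 5.0, -3.1),
    ("B", 2, 40.0, -2.4),
    ("C", 30, 28.0, 1.7),
]
print(classify_outliers(example_zones))
```

Zones flagged this way could then be examined individually or, in a regression model,
represented by a dummy variable as suggested in the text.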
In short, the Local Moran statistic can be a useful tool for identifying zones which
are dissimilar from their neighborhood. It is the only statistic in CrimeStat that
demonstrates dissimilarity. The other ‘hot spot’ tools will only identify areas with high
concentrations. To use the Local Moran statistic, however, requires that the data be
summarized into zones in order to produce the necessary intensity value. Given that most
crime incident data bases will list individual events without intensities, this will entail
additional work by a law enforcement agency.
Some Thoughts on the Concept of ‘Hot Spots’
Advantages
The seven techniques discussed in this and the last chapter have both advantages
and disadvantages. Among the advantages are that they attempt to isolate areas of high
concentration (or low concentration in the case of the Local Moran statistic) of incidents
and can, therefore, help law enforcement agencies focus their resources on these areas.
One of the powerful uses of a ‘hot spot’ concept is that it is focused. It can provide new
information about locations that police officers or community workers may not recognize
(Rengert, 1995). Given that most police departments are understaffed, a strategy that
prioritizes intervention is very appealing. The ‘hot spot’ concept is eminently practical.
Another advantage to the identification of ‘hot spots’ is that the techniques
systematically implement an algorithm. In this sense, they minimize bias on the part of
officers and analysts since the technique operates somewhat independently of
preconceptions. As has been mentioned, however, these techniques are not totally without
human judgement since the user must make decisions on the number of ‘hot spots’ and the
size of the search radius, choices that can allow different users to come to different
conclusions. There is probably no way to get around subjectivity since law enforcement
personnel may not use a result unless it partly confirms what they already know. But
implementing an algorithm at least forces users to go through the steps systematically.

Figure 7.12:
Local Spatial Autocorrelation of 1996 Vehicle Thefts
Local Moran Z-Value of Block Groups
[Map of Baltimore County, the City of Baltimore, and Howard County. Legend classes for
the LMoran Z-value: Z < -2.58; Z > -2.58 and Z <= -1.96; Z > -1.96 and Z <= 0;
Z > 0 and Z <= 1.96; Z > 1.96 and Z <= 2.58; Z > 2.58; No Information. Scale bar in miles.]

Using Local Moran’s I to Detect Spatial Outliers in Soil Organic
Carbon Concentrations in Ireland
Chaosheng Zhang (1), Lecturer in GIS
David McGrath (2), Research Officer
(1) Department of Geography, National University of Ireland, Galway, Ireland
(2) Teagasc, Johnstown Castle Research Centre, Wexford, Ireland
One objective in the study of soil organic carbon concentrations is to produce
a reliable spatial distribution map. A geostatistical variogram analysis was applied
to study the spatial structure of soils in Ireland for the purpose of carrying out a
spatial interpolation with the Kriging method. The variogram looks at similarities
in organic carbon concentrations as a function of distance. In the analysis, a
relatively poor variogram was observed, and one of the main reasons was the
existence of spatial outliers. Spatial outliers make the variogram curve erratic and
hard to interpret, and impair the quality of the spatial distribution map.
CrimeStat was used to identify the spatial outliers. The parameter of the
standardized Anselin’s Local Moran’s I (z) was used. When z < -1.96, the sample was
defined as a spatial outlier. Out of 678 soil samples, a total of 39 samples were
detected as spatial outliers, and excluded from the spatial structure calculation. As a
consequence, the variogram curve was significantly improved. This improvement
made the final spatial distribution map more reliable and trustworthy.
Spatial outliers are clearly different from the majority of samples nearby.
Compared with the samples nearby, high value spatial outliers are found in the
southeastern part, and low value spatial outliers are located in the western and
northern parts of the country.
A third advantage is that these techniques are visual, particularly when used with a
GIS. The mode and fuzzy mode routines output the results as a dbf file, which can be
displayed in a GIS as a proportional circle. The Nnh, Rnnh, Stac, and Kmeans routines
can output the results directly as graphical objects, either as standard deviational ellipses
or convex hulls; these can be displayed directly in a GIS. The Local Moran technique can
be adapted for thematic mapping (as Figure 7.12 demonstrates). Visual information can
help crime analysts and officers to understand the distribution of crime in an area, a
necessary step in planning a successful intervention. We should never underestimate the
importance of visualization in any analysis.
Limitations
However, there are also some distinct limitations to the concept of a ‘hot spot’, some
technical and some theoretical. The choice involved in a user making a decision on how
strict or how loose to create clusters allows the potential for subjectivity, as has been
mentioned. In this sense, isolating clusters (or ‘hot spots’) can be as much an art as it is a
science. There are limits to this, however. As the sample size goes up, there is less
difference in the result that can be produced by adjusting the parameters. For example,
with 6,000 or more cases, there is very little difference between using the 0.1 significance
level in the nearest neighbor clustering routine and the 0.001 significance level (see
endnote 8). Thus, the subjectivity of the user is more important for smaller samples than
for larger ones.
A second problem with the ‘hot spot’ concept is that it is usually applied to the
volume of incidents and not to the underlying risk. Clusters (or ‘hot spots’) are defined by
a high concentration of incidents within a small geographical area, that is on the volume of
incidents within an area. This is an implicit density measure - the number of incidents per
unit of area (e.g., incidents per square mile). But higher density can also be a function of a
higher population at risk.
For some policing policies, this is fine. For example, beat officers will necessarily
concentrate on high incident density neighborhoods because so much of their activity
revolves around those neighborhoods. From a viewpoint of providing concentrated policing,
the density or volume of incidents is a good index for assigning police officers (Sherman
and Weisburd, 1995). From the viewpoint of ancillary security services, such as access to
emergency medical services, neighborhood watch organizations, or residential burglar
alarm retail outlets, areas with higher concentrations of incidents may be a good focal
point for organizing these services.
But for other law enforcement policies, a density index is not a good one. From the
viewpoint of crime prevention, for example, high incident volume areas are not necessarily
unsafe, and effective preventive intervention in them will not necessarily lead to a reduction
in crime. It may be far more effective to target high risk areas rather than high volume
areas. In high risk areas, there are special circumstances which expose the population to
higher-than-expected levels of crime, perhaps particular concentrations of activities (e.g.,
drug trading) or particular land uses that encourage crime (e.g., skid row areas) or
particular concentrations of criminal activities (e.g., gangs). A prevention strategy will
want to focus on those special factors and try to reduce them.
Risk, which is defined as the number of incidents relative to the number of potential
victims/targets, is only loosely correlated with the volume of incidents. Yet, ‘hot spots’ are
usually defined by volume, rather than risk. The risk-adjusted hierarchical nearest
neighbor clustering routine, discussed in chapter 6, is the only tool among these that
identifies risk, rather than volume. It is clear that more tools will be needed to examine
hot spot locations that are more at risk.
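The difference between volume and risk can be made concrete with a small worked
example. All zone figures below are hypothetical, and expressing risk as incidents per
1,000 persons at risk is an illustrative choice of unit, not one prescribed by the text.

```python
def density(incidents, area_sq_miles):
    """Volume measure: incidents per square mile."""
    return incidents / area_sq_miles

def risk(incidents, population_at_risk, per=1000):
    """Risk measure: incidents per `per` potential victims/targets."""
    return per * incidents / population_at_risk

# Hypothetical zones: (name, incidents, area in square miles, population at risk).
zones = [
    ("A", 120, 1.0, 12000),
    ("B", 40, 1.0, 1000),
]

for name, incidents, area, population in zones:
    print(name, density(incidents, area), risk(incidents, population))

highest_volume = max(zones, key=lambda z: density(z[1], z[2]))[0]
highest_risk = max(zones, key=lambda z: risk(z[1], z[3]))[0]
print(highest_volume, highest_risk)
```

Here zone A has three times the incident density of zone B, yet zone B has four times
the risk, so a ranking by volume and a ranking by risk point to different places.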
The final problem with the ‘hot spot’ concept is more theoretical. Namely, given a
concentration of incidents, how do we explain it? To identify a concentration is one thing.
To know how to intervene is another. It is imperative that the analyst discover some of the
underlying causes that link the events together in a systematic way. Otherwise, all that is
left is an empirical description without any concept of the underlying causes. For one
thing, the concentration could be random or haphazard; it could have happened one time,
but never again. For another, it could be due to the concentration of the population at risk,
as discussed above. Finally, the concentration could be circumstantial and not be related
to anything inherent about the location.
The point here is that an empirical description of a location where crime incidents
are concentrated is only a first step in defining a real ‘hot spot’. It is an apparent ‘hot spot’.
Unless the underlying vector (cause) is discovered, it will be difficult to provide adequate
intervention. The causes could be environmental (e.g., concentrations of land uses that
attract attackers and victims) or behavioral (e.g., concentrations of gangs). Without knowing
the cause, the most one can do is try to increase the concentration of police officers. This is
expensive, of course, and can only be done for limited periods. Eventually, if the underlying
vector is not dealt with, incidents will continue and will overwhelm the additional police
enforcement. In other words, ultimately, reducing crime around a ‘hot spot’ will need to
involve many policies other than simply police enforcement, such as community involvement,
gang intervention, land use modification, job creation, the expansion of services, and other
community-based interventions. In this sense, the identification of an empirical ‘hot spot’
is frequently only a window into a much deeper problem that will involve more than
targeted enforcement.
Endnotes for Chapter 7
1. STAC is an abbreviation for Spatial and Temporal Analysis of Crime. The temporal
   section of the program was superseded by several other programs and was not
   updated for the millennium. Because many law enforcement users refer to STAC
   ellipses, we have retained that name.
2. The first two digits of a beat number designate the District.
3. The Chicago Police Department made available the incidents in this analysis to
   Richard Block for the evaluation of the Chicago Alternative Police Strategy (CAPS).
4. In general a designated main surface street occurs every mile on Chicago’s grid, and
   there are eight blocks to the mile. In this map, Lawrence and Ashland are main
   Grid streets. In this area, there are also several diagonal main streets that either
   follow the lake shore or old Indian trails.
5. The total number of ways for selecting K distinct combinations of N incidents,
   irrespective of order, is (Burt and Barber, 1996, 155):
   $$ \frac{N!}{K!\,(N-K)!} $$
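As a worked illustration of the combinations formula in endnote 5, the sketch below
evaluates it for two small cases. The function name ways_to_choose and the values of N
and K are hypothetical, chosen only to show how quickly the count grows.

```python
import math

def ways_to_choose(n, k):
    """Number of ways to select k of n incidents, irrespective of order."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(ways_to_choose(20, 3))   # 1140
print(ways_to_choose(100, 5))  # already in the tens of millions
```

The standard library function math.comb(n, k) returns the same value and avoids
computing the large intermediate factorials.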
6. The steps are as follows:
   Global Selection of Initial Seed Locations
   A. A 100 x 100 grid is overlaid on the point distribution; the dimensions of the
      grid are defined by the minimum and maximum X and Y coordinates.
   B. A separation distance is defined, which is
      $$ \text{Separation} = t \times 0.5 \sqrt{\frac{A}{N}} $$
      where t is the Student’s t-value for the .01 significance level (2.358), A is the
      area of the region, and N is the sample size. The separation distance was
      calculated to prevent adjacent cells from being selected as seeds.
   C. For each grid cell, the number of incidents found are counted and then sorted
      in descending order.
   D. The cell with the highest number of incidents found is the initial seed for
      cluster 1.
E. The cell with the next highest number of incidents is temporarily selected. If
the distance between that cell and the seed 1 location is equal to or greater
than the separation distance, this cell becomes initial seed 2.
F. If the distance is less than the separation distance, the cell is dropped and
the routine proceeds to the cell with the next highest number of incidents.
G. This procedure is repeated until K initial seeds have been located. Each
time, the remaining cell with the highest number of incidents is selected and
its distance to all prior seeds is calculated. If the distance is equal to or
greater than the separation distance, then the cell is selected as a seed. If
the distance is less than the separation distance, then the cell is dropped as a
seed candidate. Thus, it is possible that K initial seeds cannot be identified
because K locations separated by at least the threshold distance cannot be
found. In this case, CrimeStat keeps the number it has located and prints
out a message to this effect.
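The seed-selection steps A through G can be sketched roughly as follows. This is an illustrative sketch, not CrimeStat's actual code. The function name `select_initial_seeds` is hypothetical, and the sketch makes three assumptions the text does not state: a seed is placed at the center of its grid cell, the area A defaults to the bounding rectangle of the points, and cells with equal counts are taken in arbitrary order.

```python
import math
from collections import Counter

T_VALUE = 2.358  # t-value quoted in the text for the .01 significance level

def select_initial_seeds(points, k, area=None, grid_size=100):
    """Pick up to k seed locations from a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero-width extent (assumption, not in the text).
    width = (max(xs) - min_x) or 1.0
    height = (max(ys) - min_y) or 1.0
    if area is None:
        area = width * height  # assumption: bounding rectangle as the region

    # Step B: separation distance.
    separation = T_VALUE * 0.5 * math.sqrt(area / len(points))

    # Steps A and C: overlay the grid and count incidents per cell.
    cell_w, cell_h = width / grid_size, height / grid_size
    counts = Counter()
    for x, y in points:
        col = min(int((x - min_x) / cell_w), grid_size - 1)
        row = min(int((y - min_y) / cell_h), grid_size - 1)
        counts[(col, row)] += 1

    # Steps D to G: walk the cells in descending order of count and keep a
    # cell only if it is far enough from every seed chosen so far.
    seeds = []
    for (col, row), _count in counts.most_common():
        cx = min_x + (col + 0.5) * cell_w  # assumption: seed at cell center
        cy = min_y + (row + 0.5) * cell_h
        if all(math.hypot(cx - sx, cy - sy) >= separation for sx, sy in seeds):
            seeds.append((cx, cy))
            if len(seeds) == k:
                break
    return seeds, separation
```

If fewer than k cells satisfy the separation rule, the function simply returns the seeds it found, which mirrors the behavior described in step G.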
Local Optimization of Seed Locations
H. After the K initial seeds have been selected, all points are assigned to the
nearest initial seed location. These are the initial cluster groupings.
I. For each initial cluster grouping in turn, the center of minimum distance is
calculated. These are the second seed locations.
J. All points are assigned to the nearest second seed location.
K. For each new cluster grouping in turn, the center of minimum distance is
calculated. These are the third seed locations.
L. Steps J and K are repeated until no more points change cluster groupings.
These are the final seed locations and cluster groupings.
7. The formulas are as follows. The expected value of the Local Moran is:
$$E(I_i) = \frac{-\sum_{j=1}^{N} W_{ij}}{N-1}$$
where $W_{ij}$ is a distance weight for the interaction between observations i and j
(either an adjacency index or a weight decreasing with distance). The variance of
the Local Moran is defined in three steps:
A. First, define $b_2$:
$$b_2 = \frac{\sum_{i} (X_i - \bar{X})^4 / N}{\left[ \sum_{i} (X_i - \bar{X})^2 / N \right]^2}$$
This is the fourth moment around the mean divided by the squared second
moment around the mean.
B. Second, define $2w_{i(kh)}$:
$$2w_{i(kh)} = \sum_{k \ne i} \sum_{h \ne i} W_{ik} W_{ih}$$
This term is twice the sum of the cross-products of all weights for i with
themselves, using k and h to avoid the use of identical subscripts. Since each
pair of observations, i and j, has its own specific weight, a cross-product of
weights is two weights multiplied by each other (where i ≠ j), and the sum of
these cross-products is twice the sum of all possible interactions irrespective
of order (i.e., $W_{ij} = W_{ji}$). Because the weight of an observation with itself is
zero (i.e., $W_{ii} = 0$), all terms can be included in the summation.
C. Third, define the variance, standard deviation, and an approximate (pseudo)
standardized score of $I_i$:
$$\mathrm{Var}(I_i) = \frac{\left(\sum_{j} W_{ij}^2\right)(N - b_2)}{N-1} + \frac{2w_{i(kh)}\,(2b_2 - N)}{(N-1)(N-2)} - \frac{\left(\sum_{j} W_{ij}\right)^2}{(N-1)^2}$$
$$S(I_i) = \sqrt{\mathrm{Var}(I_i)}$$
$$Z(I_i) = \frac{I_i - E(I_i)}{S(I_i)}$$
The final term of the variance is the square of the expected value $E(I_i)$, which is
subtracted.
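The three steps of this endnote can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not CrimeStat's implementation. The function names `local_moran_moments` and `pseudo_z` are hypothetical, the value of I_i is taken as an input (its formula is given in the chapter text, not here), and the double sum for 2w_i(kh) follows the endnote literally.

```python
import math

def local_moran_moments(x, w_row, i):
    """Expected value, variance and standard deviation of I_i.

    x     : list of the N observed values
    w_row : list of the N weights W_ij for observation i (w_row[i] is 0)
    i     : index of the observation
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second moment about the mean
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth moment about the mean
    b2 = m4 / m2 ** 2                         # step A

    others = [w_row[j] for j in range(n) if j != i]
    sum_w = sum(others)
    sum_w2 = sum(wt ** 2 for wt in others)
    # Step B: double sum over k != i and h != i, as the text states it.
    # A stricter reading of "cross-products" would also skip k == h; that is
    # an interpretation and is not stated in the text.
    two_w_kh = sum(wk * wh for wk in others for wh in others)

    expected = -sum_w / (n - 1)
    # Step C: the last term is E(I_i) squared, subtracted as in
    # Var = E[I^2] - (E[I])^2.
    variance = (sum_w2 * (n - b2) / (n - 1)
                + two_w_kh * (2 * b2 - n) / ((n - 1) * (n - 2))
                - sum_w ** 2 / (n - 1) ** 2)
    return expected, variance, math.sqrt(variance)

def pseudo_z(i_value, expected, sd):
    """Approximate standardized score Z(I_i) = [I_i - E(I_i)] / S(I_i)."""
    return (i_value - expected) / sd
```

A caller would compute I_i for each zone by whatever routine it already uses, then pass that value to `pseudo_z` along with the moments returned for the same zone.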
8. On one test of 6,051 burglaries with a minimum cluster size requirement of 10
incidents, for example, we obtained 100 first-order clusters, 9 second-order clusters,
and no third-order clusters by using a 0.1 significance level for the nearest neighbor
hierarchical clustering routine. When the significance level was reduced to 0.001,
the number of clusters extracted was 97 first-order clusters, 8 second-order clusters,
and no third-order clusters.