CIS Doctoral Colloquium 2013 Proceedings

We are delighted to welcome you to the first Computing and Information Systems Doctoral Colloquium. This
event is organised by the CIS Postgraduate Society to provide RHD students with the opportunity to present
their work to a broader audience, in a fashion not afforded by other conferences and venues. As such, the
work presented at this event will be either 'doctoral' (broadly presenting their PhD research) or 'short'
(presenting a work-in-progress or early results) track. We're delighted to have so many students involved,
either delivering presentations during the day or presenting at the poster session in the concluding hour.
A little over a year ago, the two departments of Computer Science and Information Systems were merged
into a new home in the Engineering Faculty at The University of Melbourne. This merger presents exciting
new opportunities for collaboration and education. Consequently, this event is also organised to facilitate the
merger of these two postgraduate communities, and afford students, academics and our friends outside the
university system the ability to hear the stimulating and innovative work of their colleagues in this new,
merged department. Accordingly, we are delighted that the proceedings of the day reflect the broad
research interests of the department.
I would like to acknowledge the assistance of Justin Zobel, Rhonda Smithies and Leonora Stroscio, along with
our Colloquium Committee in making this event happen:
Marcus Carter, Chair
Sergey Demyanov, Proceedings Co-Chair
Florian Hanke, Flash Talks Chair
Mahtab Mirmomeni, Life Science Chair
Ida Asadi Someh, Industry Relations Chair
Elham NaghiZadeh Kakhki, Proceedings Co-Chair
Maryam Fanaeepour, Poster Chair
Aleksandr Kan, Photographer
This event would not be possible without the generous sponsorship of NICTA, Google and VLSCI. We're
excited to have them involved, and to have the opportunity to share our research with their representatives.
We are also immensely grateful to the Department of Computing and Information Systems at The University
of Melbourne, and the Graduate Student Association, who have also provided financial assistance to hold
this event.
- Marcus Carter
2013 President of the CIS Postgraduate Society
SIDNEY MYER ASIA CENTRE
Carrillo Gantner Theatre
The conference will be held in the Carrillo Gantner Theatre (and closely located seminar rooms) in the
Sidney Myer Asia Centre, which is the first University of Melbourne building to the west of the
University of Melbourne Tram Stop on Swanston St.
[Program schedule table: session entries not recoverable from the source]
LUNCH 12:20 – 1:20 (not provided)
Afternoon Tea (Provided!)
POSTER SESSION
5pm – 6pm, Sidney Myer Asia Centre
In addition to presentations throughout the day, we are delighted to have a poster session which will be run
from 5pm - 6pm in the Sidney Myer Asia Centre. 21 PhD students in the Department of Computing and
Information Systems will be presenting posters, and will each be in attendance to answer questions and
discuss their research.
DesTeller: A System for Destination Prediction Based on
Trajectories with Privacy Protection
Andy Yuan Xue, Rui Zhang, Yu Zheng, Xing Xie, Jianhui
Yu and Yong Tang
Designing Digital Technologies that Support
Memorialization for Distributed Populations: A 'Black
Saturday' Bushfire Study
Joji Mori, Steve Howard and Martin Gibbs
iRobot: A Stacking-based Approach to Twitter User
Geolocation Prediction
Bo Han, Paul Cook and Timothy Baldwin
Understanding Exploration in Seeking Health Information
Patrick Pang, Shanton Chang, Jon Pearce and Karin
Verspoor
Quality versus Fidelity in Genomic Data
Rodrigo Canovas, Alistair Moffat and Andrew Turpin
Managing Multiple Influences: The Case of Self-Monitoring
and Social Comparison
Pedro Rosas, Steve Howard, Martin Gibbs and Jon Pearce
Private Spatial Data Processing on Trajectory Data
Maryam Fanaeepour, Egemen Tanin, Lars Kulik
The Voice Box: A Novel Language Recording Method
Florian Hanke and Lauren Gawne
Leveraging Enterprise 2.0 for Next Generation Knowledge
Management
Diana Wong, Rachelle Bosua, Shanton Chang and Sherah
Kurnia
Predicting Traffic Congestion through Mining Sensed
Traffic Data
Hengfeng Li, Lars Kulik and Rao Kotagiri
Behaviour Pattern Mining using Cellular Network
Trajectories
Kushani Perera, Lars Kulik, James Bailey
Understanding the Experience of Mixed Reality Quests
Aleksandr Kan
Analysing Virtual Machine Usage in Cloud Computing
Yi Han, Jeffrey Chan, Christopher Leckie
Understanding the Role of Technology in Parent-Child
Reunion
Konstantinos "Kostas" Kazakos and Frank Vetere
Anomaly Detection in Data Streams Using a Consensus
Approach
Masha Salehi, Christopher Leckie and Tharshan
Vaithianathan
Resolving Ambiguity in Genome Assembly using High
Performance Computing
Mahtab Mirmomeni, Tom Conway, Matthias Reumann
and Justin Zobel
Analysis of Sample Structure in a GWAS Celiac Case-Control Dataset using PCA
Karin Klotzbücher, Justin Bedo, Christopher Leckie and
Adam Kowalczyk
Learning Analytics for Informal Interprofessional Learning
Xin Li
Lazy Priority Queue for the Set Cover problem
LIM Ching Lih, Alistair Moffat and Tony Wirth
Toward a Personal Health Information Self-Quantification System (PHI-SQS)
Manal Almalki, Fernando Sanchez and Kathleen Gray
Privacy-preserving data mining in Internet of Things (IoT)
Sarah Erfani, Shanika Karunasekera and Christopher Leckie
PROCEEDINGS
HUMAN COMPUTER INTERACTION
Blogs as a Domain of Scientific Discourse: The Construction of New Knowledge in the Blogosphere
Marcus Carter and Sophie Ritson
Negotiating Frames, Rules and Motivations
Mitchell Harrop
Symbolism in Commemoration Using Technology
Joji Mori
The Use of Facebook by Social Brokers in Malawi
Thomas McNamara and Marcus Carter
NATURAL LANGUAGE PROCESSING
Knowledge Discovery and the Extraction of Domain Specific Web Data
Li Wang
Mixed Progression and Regression in the Situation Calculus
Christopher Ewin and Adrian Pearce
The Universal Tagger
Long Duong
BUSINESS INFORMATION SYSTEMS
Exploring Information Sharing Needs, Mechanisms and IT Support Nursing Handovers in Clinical Settings
Nazik ALTurki, Rachelle Bosua and Sherah Kurnia
How do Business Analytics Systems Create Business Value?
Ida Asadi Someh and Graeme Shanks
Investigating the Relationship Between Security Culture and Security Practices in Organisations
Moneer Alshaikh, Sean Maynard, Atif Ahmad and Shanton Chang
Organisational Forensic Readiness Model
Mohamed Elyas, Atif Ahmad, Sean B. Maynard and Andrew Lonie
Toward an Intelligence-Driven Information Security Risk Management Enterprise for Organisations
Jeb Webb
DATABASE AND SECURITY
Analysing Virtual Machine Usage in Cloud Computing
Yi Han, Jeffrey Chan and Christopher Leckie
How Tightly Connected are Communities?
Minh Van Nguyen, Michael Kirley and Rodolfo García-Flores
Optimality of Resilient Functions for Hashing Biased Data
Andrew Peel
Private Spatial Data Processing on Trajectory Data
Maryam Fanaeepour, Egemen Tanin, Lars Kulik
The Earth Mover's Distance Based Similarity Join Using MapReduce
Jin Huang
LIFE SCIENCE
A Model to Evaluate Therapies for Mental Health Disorders
Fernando Estrada
A Network Model of a Whole Kidney
Thomas Gale
Review of Web-Based Software Frameworks for Clinical and Biomedical Research Collaborations
Tracy McLean
The Use of Ontologies in Neuroimaging and Their Application in Answering Abstract Queries
Aref Eshghishargh, Simon Milton, Andrew Lonie and Gary Egan
Blogs as a Domain of Scientific Discourse: The
Construction of New Knowledge in the Blogosphere
Marcus Carter
Interaction Design Lab
Department of Computing and Information Systems
The University of Melbourne
[email protected]
Sophie Ritson
Unit for History and Philosophy of Science
Faculty of Science
Sydney University
[email protected]
Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human Factors
Keywords
Blogs, String Theory, Scientific Communication
1. INTRODUCTION
Tim Berners-Lee initially developed the World Wide Web,
between 1989 and 1991, as a tool designed to help high-energy
physicists connect globally and to share data, news and
documentation. Following its rapid commercialization in the mid
1990s, and the collapse of the ‘Dot-Com Bubble’ in 2001, the
now ubiquitous Web underwent a fundamental ideological shift in
the way information and content were to be shared and created
online. Since nicknamed ‘Web 2.0’, this shift entailed the
democratization of information sharing and the rise of the
Weblog, or simply ‘blog’: typically a personal or news-sharing
website focused on a single theme.
Blogs are broadly used by scientists in all fields. Indeed, a large
number of opinion pieces and educational curricula recommend
that new researchers in certain fields blog, as a form of both
scientific communication and identity management.
2. PRIOR WORK
Existing research into the impact of the internet on scholarly
communication has been mostly positive. Scientific research is an
inherently social undertaking, and the internet facilitates
communication and collaboration in existing social networks and
assists in the development of new networks [2]. Technologies
such as VoIP applications (e.g. Skype), email and the bulk
online sharing of data have traditionally been neatly
conceptualized within the informal domain of scientific
communication. This domain comprises ephemeral communication
conducted within private networks for the purpose of
developing raw information into scientific knowledge before
its transition into the permanent, public formal domain of journals,
conferences and books. William Garvey, who first conceptualized
these domains [1], argued that the transition between informal and
formal communication is a boundary established by science to
delay new scientific information so that it might be sufficiently
examined and mediated by the community. On this view, blogs are
simply a new form of informal communication: they do not
fundamentally affect the processes of scientific communication,
and so do not affect how new knowledge is created.
On the basis of our initial research, we claim that this distinction
is untenable, and blogs confound categorization as formal or
informal communication [see also 3]. Blogs resemble formal
scientific communication as they are public, and have potentially
large audiences. They are permanently stored and retrievable and
are non-interactive (in contrast to, say, a conference presentation).
However, blogs also resemble informal communication in both
style and content; they contain the most current information,
discuss open-ended questions and works-in-progress, and the
types of discussions more closely resemble informal
communications. In consequence, we believe that scientific blogs
represent a new form of scientific discourse, which challenges
existing theories regarding how scientists communicate and new
knowledge is formed.
3. THE STRING WARS
The tag ‘string wars’ originated with the press, who identified the
controversy that was raging online. This controversy occurred in
the high-energy physics community and was, to a significant
extent, played out on blogs discussing string theory. The blogs
were written by both sides and became intensely personal and
malicious.
This research explores a number of different controversies of
epistemic authority that occurred across comments on these blogs
(and in other Web 2.0 technologies, such as Twitter). In
particular, we examine how this new form of internet-mediated
communication technology is transforming how scientific
knowledge is created (or co-created) by participants.
4. CONTRIBUTION
Understanding the impact of internet-communication technologies
(ICTs) on scientific communication is crucially important for
understanding how modern scientific knowledge is produced.
Developing this understanding now is important because
scientific communication is likely to be further transformed as
broadband-enabled technologies become widely implemented. The
String Wars present a fruitful domain in which to conduct this
research, as high-energy physics has developed a number of
different tools (such as ‘trackbacks’, which link the discussions
of an article to the article in an online repository). These tools
make blog communication (and its effect) more prominent and
also mark this as a possible space for studying how changes in
the design of these technologies change the communications that
take place.
5. REFERENCES
[1] Garvey, W. 1979. Communication, the Essence of Science:
Facilitating Information Exchange Among Librarians,
Scientists, Engineers and Students. Pergamon Press.
[2] Olson, G. M., Zimmerman, A. & Bos, N. 2008. Scientific
Collaboration on the Internet. Cambridge: MIT Press.
[3] Warden, T. 2010. The Internet and Science Communication:
Blurring the Boundaries. European Association of Cancer
Research 4(203), 1-8.
Negotiating Frames, Rules and Motivations
Mitchell Harrop
The University of Melbourne
Melbourne, Victoria, Australia
(+613) 83441553
[email protected]
Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human Factors
Keywords
Frame Analysis, Games, Negotiation, Oscillating Engrossment.
1. INTRODUCTION AND BACKGROUND
This thesis aims to extend the Frame Analysis work of Fine
(1983) to the domain of digital game studies. In particular, this
involves incorporating existing work in digital game studies such
as Player Types and Player Motivations (see Bartle, 2003; Yee,
2006) into the theories. Fine’s work was largely concerned with
gameplay contexts; hence, in extending Fine’s work, this
thesis has a particular focus not only on activities during play, but
also on play-related activities and the influences of ongoing changes
to the technology and software of games.
Secondly, this thesis aims to examine the role of ‘fabrications’ in
the negotiation of game rules and experiences. Fabrications were a
major part of Goffman’s (1974) original Frame Analysis work, but
have not been incorporated into contemporary digital games
studies in any extensive manner.
2. METHODS AND APPROACH
The study design involved three case studies which built upon
each other. The first was an exploratory study using the case of
Defence of the Ancients (DotA), a game modification that went
through many versions and was selected for the known
complexities as to how players framed their playing experiences
and utilised different social rules for play. The second study
concerned the negotiation of loot distribution (in-game items) in
the massively multiplayer online role playing game World of
Warcraft (WoW) and how this occurred in the context of changes
to the game mechanics. The final study focused primarily on
fabrication behaviours across different games.
The studies used ethnographically informed data gathering
techniques with the primary data collection tool of semi-structured
open-ended interviews and focus groups. These
primary data gathering techniques were augmented by observation
and recording of play sessions as well as the examination of
paratexts (Consalvo, 2007) such as forums, YouTube videos, and
player created art and fiction. Finally, detailed notes from the
researcher’s own playing experiences of the games in question
were incorporated into the analysis. This very holistic approach
gave insights not only into the playing of games, but also allowed
for the situating of these games as part of broader gamer culture.
Data analysis was conducted using a Grounded Theory-informed
approach (Strauss & Corbin, 1990).
3. FINDINGS AND DISCUSSION
Fine’s (1983) work found that typically three frames operate
during tabletop fantasy role playing games: players frame their
interactions as part of their common understanding of social
reality; as part of their understanding of the game and game rules;
or as part of the fantasy frame players collectively imagine. Fine
argued for the ‘oscillating nature of engrossment’ in these frames,
as players swiftly move attention back and forth between them
(talking in-character, then as a player concerned with rules, and
then as a social person). This idea can be used to explain the data
and observations gathered as part of this thesis: during game-play,
groups oscillated between framing their games as
different kinds of experiences. Many of these frames
corresponded to what others have described as Player Types and
Motivations (Bartle and Yee above), i.e. a temporally social
event versus a group of Socialiser player types.
4. REFERENCES
[1] Bartle, R. (2003). Designing Virtual Worlds. Indianapolis:
New Riders.
[2] Consalvo, M. (2007). Cheating: Gaining Advantage in
Videogames. Cambridge, Massachusetts: The MIT Press.
[3] Fine, G. A. (1983). Shared Fantasy: Role Playing Games as
Social Worlds. Chicago: The University of Chicago Press.
[4] Goffman, E. (1974). Frame Analysis: An Essay on the
Organization of Experience. Cambridge, Massachusetts:
Harvard University Press.
[5] Strauss, A. L. & Corbin, J. (1990). Basics of Qualitative
Research: Grounded Theory Procedures and Techniques.
Thousand Oaks, California: Sage.
[6] Yee, N. (2006). Motivations for play in online games.
CyberPsychology & Behavior, 9(6), 772-775.
Symbolism in Commemoration using Technology
Joji Mori
Interaction Design Laboratory
Department of Computing and
Information Systems
The University of Melbourne
[email protected]
Categories and Subject Descriptors
H.1.2 [Information Systems]: User/Machine Systems – Human
factors
Keywords
Commemoration, condolence, design, symbolism.
1. BACKGROUND
Physical objects such as flowers are used to represent the beauty
and fragility of life in commemorating the death of a loved one
[2]. This may be in the form of gifting flowers to the bereaved or
arranging them on the grave as an ongoing visiting ritual.
Interactive technologies, on the other hand, are often used to bring
a person ‘to life’ through rich multimedia. In death, this can be
useful for creating a commemorative video where photos and
video footage of the deceased are put together by loved ones as a
way to commemorate and celebrate their life [3]. For example,
showing slideshows of photographs to a backdrop of sentimental
music is becoming increasingly commonplace at funerals. In this
talk, however, rather than limiting the role of technology to that of
replaying multimedia of the deceased, I emphasise the importance
of understanding symbolism surrounding death as a way to
approach technology design for commemorative purposes.
To do this, I will be talking about a website I developed that
incorporates the idea of using symbolism in the design of a
technology for commemorating the Black Saturday bushfires.
Using the website [1], users can make a simple gesture of
condolence to the bereaved community. People do this by
selecting a shape on the website, alongside a message to send to
survivors who were affected by the Black Saturday bushfires
which devastated regional Victoria in 2009.
Figure 1 – Sending a shape from the website as condolence
On the website (Figure 1), people select a shape that had been
hand painted using a brush by an artist in a fire affected
community and then scanned into the website for people to
choose. People can then select the shape’s colour, and type a
message. These shapes were then sent to small screens in
domestic spaces of those who had lost people they knew in the
fires, such as their kitchen or dining room. Many of the shapes
people could select had no predefined meaning attributed to them
(e.g. a circle, square or triangle), while others related to
established and well understood symbols such as a flower or a
heart. On the fourth anniversary of Black Saturday, we sent out
emails to people who might be interested in sending a shape from
the website to those affected by Black Saturday. In total, 147 shapes
were sent from people both in the affected community itself and
beyond. In the shape and message combinations that
people sent, users embraced symbolism. Below are three examples
which highlight how people appropriated the shapes to send their
messages of hope to people in the affected communities.
"I selected the full green circle. For me, it signifies
fullness of life hope, gradual completion of renewal –
my heart and soul are with you"
"Just like this flower, the communities are growing
again thanks to the support of the people within them"
"Sending thoughts of love and hope to you!"
Figure 2 – Shapes and their associated messages
For the first shape, the green circle was interpreted as a signifier
for the fullness of life by the sender. The second shape includes a
flower which represented the communities growing again after the
fires, and the third example includes a heart as an expression of
love. The design of the website afforded agency to the person
sending the shape to come up with their own symbolism, which
could then be communicated to the recipient.
This gesture of sending a shape from a website to people in
Black Saturday affected communities is a simple example of how
technology can be designed such that symbolism is an active
consideration rather than leaving symbolism to the domain of
physical objects.
2. REFERENCES
[1] Commemorating Black Saturday:
http://commemoratingblacksaturday.com/1/form/index.php.
[2] Hallam, E. and Hockey, J.L. 2001. Death, memory, and
material culture. Berg Publishers.
[3] Wahlberg, M. 2009. YouTube Commemoration: Private
Grief and Communal Consolation. The YouTube Reader.
Stockholm: National Library of Sweden. 218–235.
The Use of Facebook by Social Brokers in Malawi
Thomas McNamara
School of Social and Political Sciences
Faculty of Arts
The University of Melbourne
[email protected]
Marcus Carter
Interaction Design Lab
Department of Computing and Information Systems
The University of Melbourne
[email protected]
Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human Factors
Keywords
Facebook, social brokers, Africa.
1. INTRODUCTION
There are approximately 1,000,000,000 users of the social media
site Facebook.com, representing close to 14% of the global
population [5]. This is an astonishing statistic given that global
internet penetration currently stands at 34.3%. However,
this user base is predominantly found in first-world countries,
reaching as high as 49.9% in North America and 38.4% in Oceania.
In consequence, Facebook is a vitally important phenomenon of
study in the field of internet research and human-computer
interaction. The majority of studies of Facebook reflect this user
base, predominantly focusing on the use of Facebook by
first-world users and, overwhelmingly, by young university
students [e.g. 3]. In a broad and uncritical summary of this corpus
for the purpose of this extended abstract, the use of Facebook is
generally understood through theories of identity presentation and
as serving the sole function of enhancing social capital.
2. FACEBOOK IN AFRICA
Internet penetration on the African continent is at 15%, but is as
low as 1.1% in poorer countries like Ethiopia. However, the
ubiquity of cheap mobile technologies with internet capabilities
and huge investments in wireless infrastructure are likely to drive
up this penetration in coming years. Malawi, one of Africa’s
poorest countries with the 14th lowest per capita income in the
world, has an internet penetration of only 4.4%. However, 28% of
those with internet access have an account on Facebook (in
comparison, that statistic is 46% for Europe, 60% for Oceania and
67% for North America). Facebook membership in Malawi can be
expected to grow significantly in the future as international
investments expand the penetration of 4G networks and the cost
of mobile technologies drops.
As far as we are aware, no existing study has examined Facebook
use in the rural African context. This study aims to fill this gap by
examining Facebook use among Malawian social brokers: an
interstitial elite who move between western and rural spaces and
as such are incentivized to be early adopters of the technology in
Africa [3, 4].
3. RESEARCH QUESTIONS
1. How do Malawian social brokers utilize Facebook to manage
their social networks and express their identity? This question
focuses on the contrasting but coexisting identities that are
generated through the respondents’ relationships with myriad
northern and southern actors and the different literacies each uses
when interpreting a Facebook profile.
2. How do theories of identity presentation and social capital
enhancement incorporate the differing functional environment in
which Malawian development brokers utilize Facebook, and do
they need to be modified, abandoned or complemented
by functionalist theories?
3. How do differing and reduced technical literacies and
infrastructure impact upon Malawian social brokers’ utilization of
various features of Facebook?
4. RESULTS
In contrast to the single use/function paradigm found in analyses
of Western Facebook use, our preliminary research indicates that
rural Malawian social brokers employ Facebook for
multiple, sometimes competing, functions, with the desire to
present a positive identity tempered by both the liabilities this
identity may create and the incompatible values and literacies of
the disparate audiences. Economic and knowledge impediments,
for instance internet speed and cost, hinder the utilization of
some aspects of Facebook and alter the symbolic meaning of
others. For the technologically enabled, Facebook sometimes
represented a cheaper form of communication than phone-calls,
incentivizing future rapid uptake of the medium when developing
communities are able to appropriate its (Western) structure to
their social interactions.
5. FUTURE WORK
In the face of the digital divide [2], understanding how early
adopters employ social internet technologies like Facebook in one
of the most disadvantaged contexts is likely to contribute to the
future development of technologies that ameliorate inequality.
Further, the timing of this research presents unique opportunities
for investigating how technologies like Facebook fundamentally
alter communication practices within the community. Finally, the
contrasting use between first-world and third-world users presents
a potentially fruitful approach for understanding how
the design of social technologies imposes particular (Western)
social interactions and structures.
6. REFERENCES
[1] Bierschenk, T., Chauveau, J. & de Sardan, J. 2002. Local
Development Brokers in Africa: The Rise of a New Social Category.
Working Papers no. 13, Johannes Gutenberg University.
[2] Norris, P. 2003. Digital Divide: Civic engagement, information
poverty, and the internet worldwide. Vol. 40. Cambridge:
Cambridge University Press.
[3] Raacke, J. & Bonds-Raacke, J. 2008. MySpace and Facebook:
Applying the uses and gratifications theory to exploring
friend-networking sites. CyberPsychology & Behavior, 11(2), 169-174.
[4] Swidler, A. & Watkins, S. C. 2009. ‘Teach a Man to Fish’: the
Doctrine of Sustainability and its Effects on Three Strata of
Malawian Society. World Development, 37(7), 1182-1196.
[5] Tsukayama, H. 2012. Facebook hits milestone of 1 billion users. The
Washington Post. Retrieved 15/06/13 from
<http://articles.washingtonpost.com/2012-1004/business/35498784_1_user-mark-mark-zuckerberg-facebookusers>
Knowledge Discovery and Extraction
of Domain-specific Web Data
[Extended Abstract]
Li Wang
Dept. of Computing and Information Systems, The University of Melbourne
NICTA Victoria Research Laboratory
Supervisors: Timothy Baldwin, Su Nam Kim
[email protected]
General Terms
Natural Language Processing
Keywords
Discourse Structure, Web User Forums, Social Media, Dialogue Act
Web user forums (or simply “forums”) are online platforms
for people to discuss information and obtain information via
a text-based threaded discourse, generally in a pre-determined
domain (e.g. IT support or DSLR cameras). With the advent of Web 2.0, there has been an explosion of web authorship in this area, and forums are now widely used in various
areas such as customer support, community development,
interactive reporting and online education. In addition to
providing the means to interactively participate in discussions or obtain/provide answers to questions, the vast volumes of data contained in forums make them a valuable resource for “support sharing”, i.e. looking over records of past
user interactions to potentially find an immediately applicable solution to a current problem. On the one hand, more
and more answers to questions over a wide range of domains
are becoming available on forums; on the other hand, it is
becoming harder and harder to extract and access relevant
information due to the sheer scale and diversity of the data.
To address this problem, we propose the task of automatically parsing the Discourse Structure of forum threads, for
the purpose of enhancing information access and solution
sharing over web user forums.
The discourse structure of a forum thread is modelled as a
rooted Directed Acyclic Graph (DAG), and each post in the
thread is represented as a node in this DAG. The reply-to
relations between posts are then denoted as directed edges between nodes in the DAG (LINK), and the type of a reply-to
relation is defined as its Dialogue Act (DA). The LINK between two connected posts (i.e. posts having a reply-to relation)
is represented as the distance between the two posts in their
chronological ordering. Our specific focus is first on automatic Discourse Structure Parsing. We approach this parsing task in several ways, including a structured classification approach, where Conditional Random Fields (CRFs)
are used either to classify the LINK and DA separately and
compose them afterwards, or classify the combined LINK
and DA directly. Another technique we adopt is to treat
this Discourse Structure Parsing as a Dependency Parsing
problem, which is the task of automatically predicting the
dependency structure of a token sequence, in the form of
binary asymmetric dependency relations with dependency
types. We obtain high Discourse Structure Parsing F-scores
with the proposed methods.
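The representation described above can be sketched in a few lines of Python. This is a minimal illustration only: the class and function names, the convention that distance 0 marks the root post, the example thread and the dialogue act labels (`Question`, `Answer`, `Resolution`) are all assumptions made for the sketch, not the label set or implementation used in this work.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One post (node) in a thread, identified by its chronological index."""
    index: int
    # Each edge is (distance, dialogue_act): the post replies to the post
    # `distance` positions earlier in chronological order.
    links: list = field(default_factory=list)


def build_thread(edges_per_post):
    """Build a thread from per-post lists of (distance, dialogue_act) pairs.

    Distance 0 marks a post that replies to nothing (assumed convention).
    """
    thread = []
    for i, edges in enumerate(edges_per_post):
        for distance, _act in edges:
            # A reply must point at a strictly earlier post, which keeps
            # the graph acyclic by construction.
            if distance != 0 and not (1 <= distance <= i):
                raise ValueError(f"post {i}: invalid LINK distance {distance}")
        thread.append(Post(index=i, links=list(edges)))
    return thread


def parents(thread, i):
    """Return the (parent_index, dialogue_act) pairs that post i replies to."""
    return [(i - d, act) for d, act in thread[i].links if d != 0]


# Hypothetical four-post thread: post 3 replies to both post 2 and post 0,
# so the structure is a DAG rather than a tree.
example = build_thread([
    [(0, "Question")],
    [(1, "Answer")],
    [(2, "Answer")],
    [(1, "Resolution"), (3, "Resolution")],
])
```

Storing the LINK as a relative distance rather than an absolute post index gives every post a label drawn from a small set of integers, which is what makes joint (LINK, DA) labels usable in a sequence classifier.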
Furthermore, we investigate ways of using this Discourse
Structure information to improve information access and solution sharing over web user forums. In particular, we explore the tasks of thread Solvedness (i.e. whether the problem asked in a thread is solved or not) classification, and
thread-level Information Retrieval over forums. Our experiments show that using the Discourse Structure information
of forum threads can benefit both tasks significantly, especially for the forum Information Retrieval task, where statistical significance is achieved by using only the automatically
predicted Discourse Structure with out-of-domain training
data.
Additionally, we are planning to carry out inter-domain experiments to analyse the generalisability of our proposed
Discourse Structure representation and respective learning
models over different domains.
Mixed Progression and Regression in the Situation
Calculus
Christopher Ewin
Adrian Pearce
Department of Computing and Information
Systems
The University of Melbourne
Victoria, 3010, Australia
Department of Computing and Information
Systems
The University of Melbourne
Victoria, 3010, Australia
[email protected]
Categories and Subject Descriptors
[email protected]
allow non-sequential actions to be reordered without introducing undesirable complexity characteristics.
I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods
Keywords
Intuitively, action a_y can be said to dominate action a_x iff
performing a_y makes the occurrence of a_x irrelevant. We
show a set of conditions for which one action dominates another, and demonstrate the use of sensing actions to simplify
reasoning about dominated actions.
Situation Calculus, Regression, Progression
1. INTRODUCTION
4. HYBRID REASONING
The situation calculus is a logic formalism used for reasoning
in dynamical domains. Existing techniques for reasoning in
the situation calculus focus on either progressing a knowledge base to a given situation in order to enable a query to
be resolved [4] [1], or regressing the query to an earlier situation [2]. Progression based techniques are limited in their
versatility, as progression is not first order definable in the
general case [5]. As a result, reasoning with progression can
only be accomplished by placing significant restrictions on
the domain. Regression based techniques are inefficient in
a wide range of practical domains, such as for long running
programs, as each query must be regressed back to the initial situation.
Here we present methods of combining progression and regression, with an aim to developing more efficient and versatile mechanisms for theorem proving and agent control. We
focus on exploring action sequences in which actions can be
reordered or omitted in order to obtain more desirable computational characteristics while maintaining semantic equivalence with the original action theory.
As well as introducing potential performance benefits, the
ability to reorder or omit actions also allows us to perform
more sophisticated regression and progression based reasoning. Computationally, it is generally desirable to progress a
database as far as possible, since reasoning about the progressed database will be more efficient; however, in many
cases it is not possible to progress past certain actions. There
is no known progression strategy that will allow us to progress
past an unrestricted global-effect action, for example. The
application of these theorems makes it possible, in some circumstances, to omit an action which is not progressable, or
to move it further down the action sequence such that other
progressable actions can be considered beforehand. We propose a hybrid knowledge base as follows:
2. PRELIMINARIES
Let D be a basic action theory and {b1, ..., bm} be a sequence
of ground actions. D0[b1, ..., bm] is a hybrid KB iff it is a sequence of m operations on D0 of the form op1(b1) ... opm(bm),
each of which is either of the form [b] or ⟨b⟩, where [b] denotes a regression strategy wrt b and ⟨b⟩ denotes a progression
strategy wrt b.
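As a bookkeeping illustration of this definition, the following Python sketch pairs each ground action with exactly one operation and prints the sequence with square brackets for regression and angle brackets for progression. It models only the shape of a hybrid KB, not the regression or progression reasoning itself; the names `Op`, `make_hybrid_kb` and `render`, and the example actions, are hypothetical.

```python
from enum import Enum


class Op(Enum):
    REGRESS = "regress"    # regression strategy wrt an action
    PROGRESS = "progress"  # progression strategy wrt an action


def make_hybrid_kb(actions, ops):
    """Pair each ground action b_i with exactly one operation op_i."""
    if len(actions) != len(ops):
        raise ValueError("need exactly one operation per action")
    if not all(isinstance(op, Op) for op in ops):
        raise TypeError("each operation must be Op.REGRESS or Op.PROGRESS")
    return list(zip(ops, actions))


def render(hybrid_kb):
    """Write the operation sequence using [b] for regression, <b> for progression."""
    parts = []
    for op, action in hybrid_kb:
        parts.append(f"[{action}]" if op is Op.REGRESS else f"<{action}>")
    return " ".join(parts)


# Hypothetical action names; which strategy suits which action depends on
# the domain (e.g. an action that cannot be progressed would be regressed).
kb = make_hybrid_kb(
    ["move(a,b)", "reset_all", "pickup(c)"],
    [Op.PROGRESS, Op.REGRESS, Op.PROGRESS],
)
```

Calling `render(kb)` on the example yields the string `<move(a,b)> [reset_all] <pickup(c)>`.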
5. REFERENCES
Details of the situation calculus formalism used can be found
in [3]. We define the concepts of the argument and characteristic sets as in [4]. Intuitively, these correspond respectively to the set of objects affected by a particular action
and the set of ground fluent atoms affected by that action.
We then define the concept of independence as follows: Actions α1 and α2 are independent iff the argument sets C_{F1}
wrt α1 and C_{F2} wrt α2 do not share any elements. Formally,
$\neg\exists \vec{c}\,(\vec{c} \in C_{F1} \land \vec{c} \in C_{F2})$
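The independence condition amounts to a disjointness test on two sets of constant tuples. The sketch below shows that test, together with a reordering step that swaps two adjacent actions only when they are independent. It assumes the argument sets have already been computed and are supplied as Python sets of tuples; the function names and the example actions are hypothetical.

```python
def independent(args_1, args_2):
    """True iff the two argument sets share no tuple of constants."""
    return set(args_1).isdisjoint(args_2)


def swap_if_independent(sequence, i, arg_sets):
    """Swap actions i and i+1 when they are independent; otherwise keep order."""
    a, b = sequence[i], sequence[i + 1]
    reordered = list(sequence)
    if independent(arg_sets[a], arg_sets[b]):
        reordered[i], reordered[i + 1] = b, a
    return reordered


# Hypothetical argument sets: the objects each action affects.
arg_sets = {
    "paint(door)": {("door",)},
    "open(window)": {("window",)},
    "lock(door)": {("door",)},
}
plan = ["paint(door)", "open(window)", "lock(door)"]
```

Here `swap_if_independent(plan, 0, arg_sets)` swaps the first two actions, while swapping `paint(door)` directly with `lock(door)` would be refused because both affect the same object.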
3. ORDERING & DOMINATING ACTIONS
[1] Liu, Y., and Lakemeyer, G. On first-order
definability and computability of progression for
local-effect actions and beyond. In IJCAI (2009),
pp. 860–866.
[2] Pirri, F., and Reiter, R. Some contributions to the
metatheory of the situation calculus. J. ACM 46, 3
(May 1999), 325–361.
[3] Reiter, R. Knowledge in Action: Logical Foundations
for Specifying and Implementing Dynamical Systems.
MIT Press, 2001.
[4] Vassos, S. A Reasoning Module for Long-Lived
Cognitive Agents. PhD thesis, University of Toronto,
Toronto, Canada, 2009.
[5] Vassos, S., and Levesque, H. J. On the progression
of situation calculus basic action theories: Resolving a
10-year-old conjecture. In Proc. AAAI08 (2008).
If actions α1 and α2 are independent, then the order of these
two actions can be reversed without affecting the truth values after both actions have been completed. A similar argument is made for reordering wider classes of actions, such as
global-effect actions. We also show that sensing actions can
The Universal Tagger
Long Duong
The University of Melbourne
Australia
[email protected]
newspapers, etc.). We also employ the consensus 12-tag Universal
Tagset, which enables cross-language processing and thereby resolves
the second challenge.
Categories and Subject Descriptors
I.2.7 [Natural Language Processing]
In contrast to the existing state-of-the-art approach of Das and
Petrov [1], we have developed a substantially simpler method
(Universal Tagger) that automatically identifies "good" training
sentences from the parallel corpus and applies self-training with
revision. In experiments on eight languages, our method
achieves state-of-the-art results while (1) using less training data (we
use only the Europarl [2] parallel corpus, whereas Das and Petrov [1]
additionally use the ODS United Nations corpus); (2) being simpler, as it
does not involve building a large graph and optimizing a
convex function; and (3) being faster: our approach's
complexity is O(n log n), compared to O(n^2) for theirs.
Keywords
Part-of-speech tagger, cross-language, unsupervised, multilingual
NLP.
1. ABSTRACT
A part-of-speech (POS) tagger automatically assigns a word class
such as Noun, Verb or Preposition to lexical items (words).
POS tagging is one of the most basic operations in computational
linguistics. Since it helps to disambiguate syntactic categories (and
possibly senses), POS tags are regularly used in various Natural
Language Processing (NLP) tasks such as parsing, sentence
classification, word sense disambiguation and so forth.
There are two main challenges for POS tagging. The first
challenge is the training data. Currently, supervised taggers
outperform unsupervised ones. Supervised POS
taggers achieve accuracies as high as 97% for English, French and many
other resource-rich languages. However, supervised learning
needs manually annotated data, which is time-consuming and
costly to construct. There are approximately 7000 languages in the
world, but only a very small fraction (around 30 languages) has sufficient
manually POS-annotated data for building a reliable supervised
POS tagger. Unsupervised POS tagging, on the other hand, does
not need any manually annotated data. However, there is a huge
gap between supervised and unsupervised learning accuracy. The
second challenge is that current POS taggers are language-specific
and lack a consensus tagset. Tagsets are adapted to each language
and therefore hinder cross-language processing. For example,
when comparing the syntactic similarity of two languages, we
need to compare tag sequence similarity. However, since the tagset
for each language is different, the sequences are not directly comparable. Another
example is a multilingual environment such as the
World Wide Web, where a solution that works for every
language is in high demand. However, if we keep an individual
tagset for each language, we might have to manually or semi-automatically map tagsets between languages pairwise.
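The cost of pairwise mapping, and the way a shared tagset avoids it, can be illustrated with a short Python sketch. The twelve category names follow the widely used universal POS tagset; the two mapping tables are small illustrative excerpts (Penn Treebank tags for English, STTS tags for German), and the function names are hypothetical. None of this is the tagger described here.

```python
# The 12 coarse categories of the universal POS tagset.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Small illustrative excerpts, not complete mapping tables.
PENN_TO_UNIVERSAL = {"DT": "DET", "NN": "NOUN", "VBZ": "VERB", "JJ": "ADJ", "IN": "ADP"}
STTS_TO_UNIVERSAL = {"ART": "DET", "NN": "NOUN", "VVFIN": "VERB", "ADJA": "ADJ", "APPR": "ADP"}


def to_universal(tags, mapping):
    """Map a language-specific tag sequence to universal tags ("X" if unknown)."""
    return [mapping.get(tag, "X") for tag in tags]


def mappings_needed(num_languages, shared_tagset):
    """Count mapping tables: one per language with a shared tagset, else one per pair."""
    if shared_tagset:
        return num_languages
    return num_languages * (num_languages - 1) // 2


# After mapping, tag sequences from different languages become comparable.
english = to_universal(["DT", "JJ", "NN", "VBZ"], PENN_TO_UNIVERSAL)
german = to_universal(["ART", "ADJA", "NN", "VVFIN"], STTS_TO_UNIVERSAL)
```

Both example sequences map to `DET ADJ NOUN VERB`, so they can be compared directly. For eight languages, a shared tagset needs 8 mapping tables where pairwise mapping needs 28.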
Bilingual corpora offer a promising bridge between resource-rich
and resource-poor languages, enabling the development of
multilingual NLP technologies. English is often used as a source
language, but it is not the only available resource-rich language.
The era of English dominating one side of parallel data is giving way
to a far wider range of other languages. The choice of source
language may have a dramatic effect on target language tagger
performance. In an effort to further improve our Universal Tagger,
we investigate how to choose better source language(s). To the best
of our knowledge, we are the first to investigate this issue.
We found that English is seldom the best source language. We
are able to construct a model that predicts the best source language,
based only on monolingual features of the source and target
languages, and that improves tagger accuracy compared to choosing
the single best (overall) source language. Moreover, if parallel data
is available, our predictive model is able to leverage it to select
a more appropriate source language and obtain further
improvements in accuracy. Finally, we show that if multiple
source languages are available, even better accuracy can be
obtained by combining information from just those sources that
are selected by our model.
2. REFERENCES
[1] Das, Dipanjan, and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In
Proceedings of the 49th Annual Meeting of the Association
for Computational Linguistics: Human Language
Technologies - Volume 1 , HLT '11, 600-609, Stroudsburg,
PA, USA. Association for Computational Linguistics.
In this paper, we aim to resolve these two challenges. We narrow
the gap between supervised and unsupervised approaches by
proposing an unsupervised multilingual POS tagger which
additionally exploits parallel data. We use parallel data as the
bridge to transfer POS information from resource-rich to
resource-poor languages. The intuition is that, for many resource-poor languages, there is no manually POS-annotated data, since producing it
involves intensive work by linguists, but parallel data is easier
to acquire (e.g. from multilingual government documents, film
subtitles, large translation memories from books,
[2] Koehn, Philipp. 2005. Europarl: A Parallel Corpus for
Statistical Machine Translation. In Proceedings of the Tenth
Machine Translation Summit (MT Summit X), 79-86, Phuket,
Thailand. AAMT.
Exploring Information Sharing Needs, Mechanisms and IT
Support Nursing Handovers in Clinical Settings
Nazik ALTurki
Rachelle Bosua
Sherah Kurnia
The University of Melbourne,
3010, Australia
61344 1517
[email protected]
The University of Melbourne,
3010, Australia
61344 81398
[email protected]
The University of Melbourne,
3010, Australia
61344 1534
[email protected]
inadequate handover artifacts that are not supportive in carrying
key shift information from one shift to another and insufficient
integration and support of existing IT systems to support
handover.
Keywords
Activity Theory, clinical settings, information sharing needs, shift
handover.
1. INTRODUCTION
Shift work in clinical settings is highly dependent on effective
information sharing during shift handover to ensure patient safety.
This process determines outcomes associated with planning the
delivery and evaluation of patient care [1, 2]. Therefore, poor
communication tools, strategies and settings may lead to
insufficient information sharing during handover, which may
result in adverse events, delays in treatment and low patient and
healthcare provider satisfaction [3].
2. REFERENCES
[1] Bardram, J.E., Mobility work: The spatial dimension of
collaboration at a hospital. Computer supported cooperative
work, 2005. 14(2): p. 131.
Despite the frequency of handover activity and the use of
modern Information Technology (IT), few guidelines exist to
facilitate effective handover in terms of information sharing
practices, and nurses still experience information sharing
problems during handover [4]. Thus, there is a need
for a deeper study of the integral information elements and
mechanisms of nursing handover to improve the quality of the
information shared [2].
[2] Matic, J., Review: bringing patient safety to the forefront
through structured computerisation during clinical handover.
Journal of clinical nursing, 2010. 20(1-2): p. 184.
[3] Patterson, E. and R. Wears, Patient handoffs: standardized
and reliable measurement tools remain elusive. The joint
commission journal on quality and patient safety, 2010. 36(2):
p. 52-61.
The aim of this study is twofold: to explore specific information
needs and mechanisms required to improve information sharing
during shift-to-shift handover, and the utilization of IT to enable
and support information sharing during handover.
[4] ALTurki, N. and R. Bosua, Assessing Nurses’ Knowledge
Sharing Problems Associated with Shift Handover in Hospital
Settings. 2011.
[5] Cavaye, A.L.M., Case study research: a multi faceted
research approach for IS. Information systems journal, 1996.
6(3): p. 227-242.
[6] Darke, P., G. Shanks, and M. Broadbent, Successfully
completing case study research: combining rigour, relevance
and pragmatism. Information systems journal, 1998. 8(4): p.
273-289.
A multiple case study approach was followed to compare current
information sharing phenomena and problems of shift handover
across three different hospitals located in Riyadh, Saudi Arabia [5,
6]. Three data collection techniques were employed:
1) individual semi-structured interviews, 2) observation and 3)
examination of key artifacts used to conduct handover.
Fifty-nine audio-taped interviews were transcribed
and analyzed using thematic coding and content analysis, with
Activity Theory [7] as the theoretical lens. Current
results show inconsistent information sharing practices within
hierarchies of teams and between shifts, non-standard and
[7] Engeström, Y., Learning by Expanding: An Activity-theoretical Approach to Developmental Research. 1987:
Orienta-Konsultit Oy.
How do Business Analytics Systems Create Business
Value?
Ida Asadi Someh
Graeme Shanks
PhD student
University of Melbourne
Doug McDonell Building
The University of Melbourne
Parkville 3010 VIC Australia
Professor of Information Systems
University of Melbourne
Doug McDonell Building
The University of Melbourne
Parkville 3010 VIC Australia
[email protected]
[email protected]
Keywords
benefits [7] and competitive advantage [8]. Accordingly, we have
developed a research model which explains that the synergistic
combination of BA resources with other organizational resources
leads to the emergence of BA-enabled organizational resources
which have the capability to generate significant business value.
In future research, the research model will be evaluated using a
survey of large Australian organisations that are mature users of
BA systems in the context of customer relationship management.
Business Analytics, Systems theory, Synergy, Resource-based
view, Organisational value.
1. INTRODUCTION
Business Analytics (BA) is a key subset of IT, which provides
managers with insight for their decision-making. Insights from BA
systems enable organizational decision makers to take competitive
actions that differentiate them from their rivals. Data is usually
stored in data warehouses and is processed using analytical tools
including reporting, data mining, statistical data analysis, on-line
analytical processing (OLAP) and visualisation. Understanding
how BA systems contribute to organisational value and create
competitive advantage is an important area of research. Recently,
business intelligence (BI) applications were ranked the top
technical priority for CIOs [5]. Case study reports have provided
strong evidence of organisational benefits from these investments
[3]. However, they do not theoretically explain how the benefits
are achieved. Several theoretical models have been proposed to
explain how value is created from BA systems [2–4, 6, 9, 10].
However, the underlying mechanisms through which BA systems
interact with other organisational systems to generate business
value are poorly understood. Thus, the research question addressed
by this study is:
How do business analytics systems contribute to business
value?
To answer this question, concepts from the business value of IT
literature and the resource-based view (RBV) literature are
analysed to highlight the importance of the synergistic
combination of BA resources and organisational resources.
Systems theory is then used to explain the mechanism through
which BA systems interact with and enhance other organisational
systems to create value.
The business value of IT literature suggests that IT resources
indirectly influence business value [8]. This indirect relationship
implies that IT augments other organizational resources. Together,
they may be conceptualized as higher-order IT-enabled business
resources, which influence firm performance [1, 7]. Hence, IT
resources are not able to create business value on their own
and should be implemented together with other organizational
resources. When the IT and other organizational resources are
synergistically related, they mutually reinforce each other, leading
to outcomes greater than the additive effect of the individual
resources. Therefore, synergy of IT resources with other
organizational resources is an important source of organizational
2. REFERENCES
[1] Bharadwaj, A. 2000. A Resource-based Perspective on
Information Technology Capability and Firm Performance:
An Empirical Investigation. MIS Quarterly. 24, 1 (2000),
169–196.
[2] Davenport, T.H. et al. 2010. Analytics at Work: Smarter
Decisions, Better Results. Cambridge, MA: Harvard
Business School Press.
[3] Davenport, T.H. and Harris, J.G. 2007. Competing on
Analytics: The New Science of Winning. Harvard Business
School Press.
[4] Elbashir, M.Z. et al. 2011. The Role of Organizational
Absorptive Capacity in Strategic Use of Business
Intelligence to Support Integrated Management Control
Systems. The Accounting Review. 86, 1 (2011), 155–184.
[5] Gartner Executive Programs’ Worldwide Survey of More
Than 2,300 CIOs Shows Flat IT Budgets in 2012, but IT
Organizations Must Deliver on Multiple Priorities: 2012.
http://www.gartner.com/it/page.jsp?id=1897514. Accessed:
2012-08-21.
[6] Isik, O. et al. 2011. Business Intelligence (BI) Success and
the Role of BI Capabilities. Intelligent Systems in
Accounting, Finance and Management. 176, January
(2011), 161–176.
[7] Nevo, S. and Wade, M. 2011. Firm-Level Benefits of IT-enabled Resources: A Conceptual Extension and an
Empirical Assessment. The Journal of Strategic Information
Systems. 20, 4 (2011), 403–418.
[8] Nevo, S. and Wade, M. 2010. The Formation and Value of
IT-enabled Resources: Antecedents and Consequences of
Synergistic Relationships. MIS Quarterly. 34, 1 (2010),
163–183.
[9] Seddon, P. et al. 2012. How Does Business Analytics
Contribute to Business Value? Thirty Third International
Conference on Information Systems (Orlando, 2012).
[10] Wixom, B. and Watson, H. 2001. An Empirical
Investigation of the Factors Affecting Data Warehousing
Success. MIS Quarterly. 25, 1 (2001), 17–41.
Investigating the relationship between security culture
and security practices in organisations
Moneer Alshaikh
Sean Maynard
Atif Ahmad
Shanton Chang
PhD student
Doug McDonell Building
The University of
Melbourne
[email protected]
b.edu.au
Lecturer
Doug McDonell Building
The University of
Melbourne
Sean.maynard@unimelb.edu.au
Lecturer
Doug McDonell Building
The University of
Melbourne
[email protected]
Lecturer
Doug McDonell Building
The University of
Melbourne
[email protected]
risk levels. This research aims to conduct a cross-cultural
comparison between organisations in different cultures. Therefore,
the results of this research will be compared to Lim’s findings [10].
The findings of this research will enable organisations to increase
their security by cultivating a security culture through the
implementation of information security practices.
Keywords
Information security culture, organizational culture, information
security, National culture.
1. INTRODUCTION
A considerable amount of information security literature
focuses on the implementation of technical security controls to
prevent security breaches. However, recent security reports
show that security incidents in organisations have increased and
that nearly half of these security breaches are caused by users
within organisations [1, 2]. Several researchers have suggested that
security culture can improve organisations’ information security
by positively influencing employees’ behaviour towards security
such that security becomes part of their daily activities [3-9].
Security culture is an informal security control, which
encompasses all socio-cultural activities, including employees’
behaviour, attitudes and practices, as well as management
responsibility to support the technical aspect of information
security [3]. This recognition of the importance of security culture
has led to many attempts to understand this culture and to suggest
methods to cultivate and assess it by applying different concepts
and frameworks.
2. REFERENCES
[1] Baker, W., et al., 2013 Data Breach Investigations Report, in
United States Secret Service 2013.
[2] Global Information Security Survey, in Fighting to close the
gap. 2012, Ernst & Young.
[3] Schlienger, T. and S. Teufel. Information Security Culture:
The Socio-Cultural Dimension in Information Security
Management. 2002. Boston, London, Kluwer Academic
Publishers.
[4] Dhillon, G., Principles of information systems security: text
and cases. 2007, Hoboken, NJ: John Wiley & Sons.
This research project investigates the relationship between
information security culture and information security practices in
Saudi Arabian and Australian organizations. Lim [10] empirically
defined the relationship between security culture and security
practices. This relationship assists organizations to align the
security culture of their employees to their enterprise security
objectives. Although the findings of Lim [10] were a significant
contribution to the domain of security culture, empirical data was
only collected from Malaysian organizations. Therefore, we
intend to develop a cross-cultural perspective by evaluating
different cultural contexts, Saudi Arabia (representing Middle
Eastern culture) and Australia (representing Western culture), to
explore the influence of national culture on cultivating security
culture. Thus, the researchers will attempt to answer the following
question:
[5] Thomson, K.-L., R. von Solms, and L. Louw, Cultivating an
organizational information security culture. Computer Fraud
& Security, 2006. 2006(10): p. 7-11.
[6] Zakaria, O. and A. Gani. A Conceptual Checklist Of
Information Security Culture. in Proceedings of the 2nd
European Conference on Information Warfare and Security. 2003. Academic Conferences Limited.
[7] Martins, A. and J. Eloff, Information Security Culture, in
Security in the Information Society, M.A. Ghonaimy, M. El-Hadidi, and H. Aslan, Editors. 2002, Springer US. p. 203-214.
[8] Chia, P., S. Maynard, and A. Ruighaver. Exploring
Organisational Security Culture: Developing A
Comprehensive Research Model. in Proceedings of IS ONE
World Conference. 2002.
What is the relationship between information security culture and
information security practices?
[9] Ruighaver, A.B., S.B. Maynard, and S. Chang, Organisational
security culture: Extending the end-user perspective.
Computers & Security, 2007. 26(1): p. 56-62.
To address this question, we will develop a variance model based
on Lim’s theoretical framework [10] of the relationship between
security culture and security practices in the Malaysian context.
The model will then be tested in different cultural contexts. A
survey will be conducted in organisations with different perceived
[10] Lim, et al., Towards an Organizational Culture Framework
for Information Security Practices, in Strategic and Practical
Approaches for Information Security Governance:
Technologies and Applied Solutions 2012, IGI Global. p.
296-315.
Organizational Forensic Readiness Model
Mohamed Elyas
Atif Ahmad
Sean B. Maynard
Andrew Lonie
Department of Computing and
Information Systems
The University of Melbourne
+61 3 8344 1517
Department of Computing and
Information Systems
The University of Melbourne
+61 3 8344 1396
Department of Computing and
Information Systems
The University of Melbourne
+61 3 8344 1573
Victorian Life Sciences Institute
(VLSI),
The University of Melbourne
+61 3 8344 1395
[email protected]
b.edu.au
[email protected]
sean.maynard@unimelb. [email protected]
edu.au
systematic holistic approach to the phenomenon has been lacking
[9]. In our project, a comprehensive model for organizational
forensic readiness is introduced. A Systematic Literature Review
(SLR) has been conducted over a course of two years to
synthesize the body of knowledge in the area, with an aspiration
to develop a more holistic understanding of the phenomenon. The
proposed model explains the key drivers of forensic readiness, the
key factors that contribute to readiness, and how these factors
work together to achieve forensic readiness in organizations. The
model has been refined through a series of focus groups with
forensic experts from major consulting firms, business, academia,
and law enforcement. The model is currently being validated
through a multi-round online survey (Delphi study). The panel
members of Delphi are computer forensic professionals and
academics recruited from across the world. To the best of our
knowledge, this is one of the most rigorously validated works in the
area. From a practical point of view, the model is designed to help
organizations assess, and subsequently improve, their
forensic readiness.
Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]:
Security and Protection – Unauthorized access (Hacking,
Phreaking, etc).
Keywords
Digital Forensic Readiness, Proactive Digital Forensic, Corporate
Forensic
1. INTRODUCTION
Individuals, organizations, and governments are becoming
increasingly dependent on information technology over time. More
transactions and business are conducted online than ever before,
saving organizations and individuals considerable amounts of
time and effort. However, the brilliance and convenience of IT is
not headache-free. Criminals have invented methods to exploit
the digital realms of people, which led to the emergence of what
is known as digital forensics.
2. REFERENCES
Digital forensics is “the application of science to the
identification, collection, examination, and analysis of data while
preserving the integrity of the information and maintaining a strict
chain of custody for the data” [1]. Digital forensics (also known
as computer forensics, IT forensics, and forensic technology) aims
to secure proper digital evidence, in order to ensure that
wrongdoers are held legally accountable for their actions. The field has
long been associated with law enforcement [2]. However, influenced by
new regulations, cyber attacks, industry standards, and the heavy
reliance on digital assets, digital forensics now plays a more
prominent role in many civil organizations [3]. The practice of digital
forensics is generally reactive, carried out in response to incidents.
However, in the context of organizations, digital forensics takes
a more proactive stance [3], which brings us to the concept of
organizational forensic readiness.
[1] NIST 2006. Guide to Integrating Forensic Techniques into Incident
Response. NIST SP800-86 Notes. US
[2] Cully, A. 2003. Computer forensics: past, present and future.
Information Security Technical Report, pages 32-36.
[3] Elyas, M., Maynard, S., Ahmad, A., and Lonie, A. 2014. Towards a
Systematic Framework for Digital Forensic Readiness. Journal of
Computer Information Systems. (In Press)
[4] Tan, J. 2001 Forensic Readiness. @stake. Retrieved September 26,
2005, from:
http://www.atstake.com/research/reports/acrobat/atstake_forensic_re
adiness.pdf.
Forensic readiness was introduced by Tan [4] in a technical report
focusing on system monitoring techniques. Since then, the term
has gained popularity and has frequently been used in digital forensic
literature [5] [6]. Forensic readiness is concerned with increasing the
forensic potential of organizations [7]. An increased forensic
potential means that an organization would stand a better chance of
securing sound digital evidence that may be used successfully in
prosecution, defense, and other legal issues [3]. Forensic
readiness also helps organizations to demonstrate compliance with
the relevant laws and regulations.
[5] Mouhtaropoulos, A., Grobler, M. & Li, C.T. 2011. Digital Forensic
Readiness - An Insight into Governmental and Academic Initiatives.
European Intelligence and Security Informatics Conference. Pages
191-196, Athens, Greece: IEEE Computer Society.
[6] Popovsky, B. E., Frincke, D. A. & Taylor, C. A. 2007. A Theoretical
Framework for Organizational Network Forensic Readiness. Journal
of Computers. Pages 1-11.
[7] Rowlingson, R. 2004. A Ten Step Process for Forensic Readiness.
International Journal of Digital Evidence. 2 (3).
[8] Australian Institute of Criminology 2009. The Australian Business
Assessment of Computer User Security: A National Survey.
Australian Institute of Criminology, Australia
[9] Grobler, C., Louwrens, C. & von Solms, S. 2010. A framework to
guide the implementation of Proactive Digital Forensics in
organizations. International Conference on Availability, Reliability
and Security. Pages 677-682, IEEE Computer Society.
Despite its importance, recent studies showed that only 2% of
Australian organizations have a forensic plan at all [8]. There
have been a number of attempts throughout the past decade to
study forensic readiness from different perspectives, but a
Toward an Intelligence-Driven Information Security Risk
Management Enterprise for Organizations
Jeb Webb
Department of Computing & Information Systems
University of Melbourne School of Engineering, Australia
(04) 5227-7553
[email protected]
[4] Baskerville, R. 1991. Risk Analysis: an interpretive
feasibility tool in justifying information systems security.
European Journal of Information Systems 1, no.2: 121-130.
DOI= 10.1057/ejis.1991.20
Categories and Subject Descriptors
K.6.5 [Security and Protection] and K.6.1 [Project and People
Management]: Management techniques, strategic information
systems planning, systems analysis and design
[5] Coles, Robert S., and Rolf Moulton. 2003. Operationalizing
IT risk management. Computers & Security 22.6: 487-493.
DOI= 10.1016/S0167-4048(03)00606-0
Keywords
Strategic interest; information security; risk management process;
organizations; business processes; intelligence cycle; collection
and analysis; U.S. Intelligence Community; situation awareness;
human factors and ergonomics
[6] Endsley, M.R. 1995. Toward a Theory of Situation
Awareness in Dynamic Systems. Human Factors 37, no. 1:
32-64. DOI= 10.1518/001872095779049543
[7] Endsley, Mica R. and Debra G. Jones. 2011. Designing for
Situation Awareness: an Approach to User-Centered Design.
Boca Raton, Florida; London: CRC Press. eBook ISBN: 978-1-4200-6358-5
1. INTRODUCTION
A literature review revealed three endemic deficiencies in
information security as practiced today. Organizations tend to
focus on compliance more than protection; to estimate risk rather
than investigating it; and to assess risk on an occasional (as
opposed to continuous) basis. These tendencies all indicate that
important data is being missed and that the situation awareness of
decision makers in many organizations is currently inadequate. To
answer the research question “how can situational awareness be
increased in information security risk management?” this PhD
research project turns to Mica Endsley's situation awareness
theory, and examines, by way of case study using publicly
available documents, how the U.S. national security intelligence
enterprise, as a best practice case of situation awareness
development in security and risk management, achieves this. We
will then adapt these functions for use by organizations in their
information security risk management processes.
[8] Johnson, Loch K. 2012. National Security Intelligence:
Secret Operations in Defense of the Democracies.
Cambridge, UK; Malden, MA.
[9] Lowenthal, Mark M. 2003. Intelligence: From Secrets to
Policy. Washington, D.C.: CQ Press.
[10] Office of the Director of National Intelligence. 2011. U.S.
National Intelligence: an Overview. Intelligence Consumer’s
Guide. Washington, DC.
[11] Parker, Donn B. 2007. Risks of Risk-Based Security.
Communications of the ACM 50, no. 3: 120. DOI:
10.1145/1226736.1226774
2. REFERENCES
[12] Pironti, John P. 2008. Key Elements of an Information Risk
Management Program: Transforming Information Security
into Information Risk Management. Information Systems
Control Journal Vol. 2, 1-6.
[1] Ahmad, Atif, Justin Hadgkiss, and A.B. Ruighaver. 2012.
Incident response teams – Challenges in supporting the
organizational security function. Computers & Security 31,
643-652. DOI= 10.1016/j.cose.2012.04.001
[13] Salas, E., T.L. Dickinson, S. Converse, and S.I.
Tannenbaum. Toward an Understanding of Team
Performance and Training. In Teams: Their Training and
Performance, by R.W. Swezey & E. Salas (eds.), 3-29.
Norwood, New Jersey: Ablex, 1992.
[2] Alter, S. 2008. Defining Information Systems as Work
Systems: Implications for the IS Field. European Journal of
Information Systems 17, no. 5: 448-469. DOI=
10.1057/ejis.2008.37
[3] Artman, H., Garbis, C. (1998). Team Communication and
Coordination as Distributed Cognition. In T. Green, L.
Bannon, C. Warren, Buckley (Eds.) Cognition and
cooperation. Proceedings of 9th Conference of Cognitive
Ergonomics, (pp. 151-156). Limerick: Ireland. DOI=
[14] Shedden, Piya, Rens Scheepers, Wally Smith, and Atif
Ahmad. 2011. Incorporating a knowledge perspective into
security risk assessments. VINE: The Journal Of Information
& Knowledge Management Systems 41, no. 2: 152. DOI=
10.1108/03055721111134790
[15] Spears, Janine L. 2007. Institutionalizing Information
Security Risk Management: A Multi-Method Empirical
Study on the Effects of Regulation. PhD dissertation.
Pennsylvania State University.
Analysing Virtual Machine Usage in Cloud Computing
Yi Han
Jeffrey Chan
Christopher Leckie
Department of Computing and Information Systems
University of Melbourne
Melbourne, Australia
[email protected], {jeffrey.chan, caleckie}@unimelb.edu.au
information. However, the attacker needs to start more VMs than
normal users. In addition, they will probably stop those VMs that
are not co-resident with the victim’s VM, in order to minimize the
cost, i.e., the attacker’s VMs are relatively short-lived. If normal
user behaviours are modelled, it will be much easier to identify
these anomalous activities that are likely to deviate from normal
behaviour. Finally, such a traffic model is crucial for developing
accurate simulations of VM loads in cloud computing
environments. An ideal cloud simulation platform should be able
to reflect VM usage fluctuations that occur in real life, in order to
provide realistic simulation results.
Categories and Subject Descriptors
C.4 [Performance of Systems]: Modeling techniques.
Keywords
Self-similarity, ARIMA, VM arrival/departure statistics
1. INTRODUCTION
Due to the benefits of cloud computing in terms of
availability, cost efficiency, and scalability, many companies are
moving their computing infrastructure to the cloud. It was
estimated that by March 2012, there were around 454,400 servers
in the Amazon EC2 cloud [1], and more than 50,000 virtual
machines (VM) were requested within 24 hours in the US-East
region of the Amazon EC2 cloud. A major challenge in managing
the performance of cloud platforms in the presence of such large
and complex traffic demands is how to model the dynamics of
VM usage and predict future usage.
While the statistical behaviour of traffic in the Internet [2, 3]
and grid computing [4] has been well studied, there has been little
empirical analysis of the statistical behaviour of VMs in cloud
computing. In this paper, we monitor the arrival and departure
rates of VM requests, and the number of live VMs in the Amazon
EC2 and Windows Azure platforms. Based on the measurements,
we characterise these arrival and departure processes, and develop
a model to forecast VM demands in the cloud environment.
In this paper, we study VM usage in two commercial
clouds, Amazon EC2 and Windows Azure. Our contributions
are as follows: (1) We collect real-world data on the request arrival and
departure processes for VMs, identify the bursty nature of VM
arrivals and departures on different time scales, and show that
these two processes exhibit self-similarity; (2) We give a possible
reason why these two processes are self-similar: the number
of VMs started/stopped in each time period follows a power law
distribution, where a time period can be either one or two
minutes; (3) We fit ARIMA (autoregressive integrated
moving average) models to the number of live VMs in the system,
and the models can be used for forecasting up to 60 time periods
(of one or two minutes) ahead. While traffic self-similarity has been
widely studied in distributed computing and networking [2-4], to
the best of our knowledge there has been no work that confirms this
is the case for cloud VM usage.
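As an illustration of how self-similarity in a count series might be checked, the sketch below estimates the Hurst parameter H with the aggregated variance method. This is one common estimator and is not necessarily the one used in this work; the function names and the synthetic input are hypothetical stand-ins, and real per-minute VM arrival counts would replace the random series. Values of H clearly above 0.5 would suggest long-range dependence, whereas independent noise gives H near 0.5.

```python
import math
import random


def block_means(series, m):
    """Average the series over non-overlapping blocks of length m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from the slope of log Var(block means) against log m.

    For a self-similar process the variance of the block means decays
    like m^(2H - 2), so H = 1 + slope / 2.
    """
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(block_means(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1 + slope / 2


# Hypothetical stand-in for per-minute VM arrival counts: independent noise,
# which is not self-similar and should therefore give H close to 0.5.
rng = random.Random(0)
arrivals = [rng.gauss(100, 10) for _ in range(4096)]
h = hurst_aggregated_variance(arrivals, [1, 2, 4, 8, 16, 32, 64])
print(f"estimated Hurst parameter: {h:.2f}")
```

The block sizes are powers of two only for convenience; any range of sizes that leaves enough blocks for a stable variance estimate would do.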
There are four main benefits of this research. First, an
accurate model of the VM demands enables cloud providers to
dimension the infrastructure more precisely. One key property of a
cloud is elasticity – the provided service should expand and
shrink with user demands, and ideally, additional resources should
be available instantly. An accurate prediction of future requests is
a critical step to achieving this goal. Second, the statistics of the
arrival and departure processes play a vital role in designing
effective VM allocation policies. Common policies [5] include:
choosing the server (1) on a round robin basis, (2) with the least
number of VMs, (3) with the greatest number of VMs, (4) with
the greatest number of free CPUs, and (5) with the greatest ratio
of free CPUs to allocated CPUs. None of these policies considers
future demands, but in order to achieve an optimal result over a
long period of time, forecasting future demands is helpful. Hence
an accurate prediction model could improve existing allocation
policies by incorporating predicted future VM statistics. Third, an
understanding of normal traffic behaviour can be used by
administrators to differentiate malicious and normal user
behaviours. Security and privacy protection are major challenges
in cloud computing. Because of the comparatively low cost,
illegal users can exploit cloud resources to launch attacks.
Specifically, in [6], the authors point out a novel malicious attack:
the attacker first co-locates their VM on the server that hosts the
victim’s VM, and then builds side channels to obtain sensitive
2. REFERENCES
[1] Amazon data center size, http://huanliu.wordpress.com/
2012/03/13/amazon-data-center-size/
[2] Crovella, M.E., and Bestavros, A. Self-similarity in World
Wide Web traffic: Evidence and possible causes. IEEE-ACM
Trans. Networking, 5, 6 (1997), 835-846.
[3] Papagiannaki, K., Taft, N., Zhang, Z.L., and Diot, C. Long-term forecasting of Internet backbone traffic. IEEE Trans.
Neural Networks, 16, 5 (2005), 1110-1124.
[4] Li, H., Muskulus, M., and Wolters, L. Modeling job arrivals
in a data-intensive Grid. Job Scheduling Strategies for
Parallel Processing, 4376 (2007), 210-231.
[5] Jansen, R., and Brenner, P.R. Energy efficient virtual
machine allocation in the cloud: An analysis of cloud
allocation policies. In Proceedings of the International
Green Computing Conference and Workshops, 2011, 1-8
[6] Ristenpart, T., Tromer, E., Shacham, H., and Savage, S. Hey,
You, Get Off of My Cloud: Exploring Information Leakage
in Third-Party Compute Clouds. In Proceedings of the ACM
Conference on Computer and Communications Security,
2009, 199-212.
How tightly connected are communities?
[Extended abstract]
Minh Van Nguyen
Michael Kirley
Dept. Computing and Information Systems
The University of Melbourne
Melbourne, Australia
Dept. Computing and Information Systems
The University of Melbourne
Melbourne, Australia
[email protected]
[email protected]
Rodolfo García-Flores
CSIRO Mathematics, Informatics and Statistics
Clayton, Australia
[email protected]
Categories and Subject Descriptors: H.2.8 Database Management: Database applications – Data mining
General Terms: Measurement; Experimentation
Keywords: Community structure; Graph partitioning
1. INTRODUCTION
Networks are ubiquitous in modern society. From the Internet to
social networks, a network can be divided into clusters where the
nodes in each cluster are tightly connected among themselves, with
sparse connection between a cluster and the rest of the network.
Clusters that satisfy this property are known as communities [1].
In terms of the Internet, communities represent clusters of Autonomous Systems that, once extracted, allow us to identify a minimum set of links whose removal would fragment the Internet. While
there are efficient algorithms to extract communities, a fundamental
issue remains: How well connected are the nodes in a community?
[Figure 1: two panels, titled Google and Enron, each plotting compactness (vertical axis, 0 to 1) against community size (horizontal axis, 10^0 to 10^4).]
Figure 1: (Color online) The compactness of communities in
two networks. In each plot, black dots represent the compactness $W^*$ of communities and a blue dot represents the ideal
compactness $W^*_{K_n}$ when a community has all possible edges.
Each blue curve models the ideal compactness and the red
curve provides an upper bound on both $W^*$ and $W^*_{K_n}$.
2. METHODS
We propose a technique to measure how tightly connected is a community. Denote by $d_{ij}$ the distance, or minimum number of links,
separating two nodes $i$ and $j$ in a community $C$. A community
whose nodes are tightly connected among themselves must be such
that the sum of the distances between all pairs of nodes is as small
as possible. That is, the sum $W(C) = \sum_{i<j} d_{ij}$ should be minimal. If $T$ is a minimum spanning tree of $C$, the ratio $W(C)/W(T)$
quantifies the probability that $C$ has a topology similar to $T$. We
define the compactness ratio $W^* = 1 - \frac{W(C)}{W(T)}$ to measure how
tightly connected is the community. The closer that $W^*$ is to 1,
the more tightly connected is the community. A related measure
is the clustering coefficient [2], which in terms of social networks
quantifies the probability that two friends have a friend in common.
The compactness ratio measures the global connectedness across a
community, in contrast to the clustering coefficient which measures
the average local cliquishness of a node.
3. RESULTS
We have computed the compactness of communities in various social, information, technological, and biological networks. See Figure 1 for results for two real-world networks. For large communities on $n$ nodes, we find that the compactness changes at a rate that
is at most proportional to $\frac{1}{n(\log n)^2}$. The largest communities in
some networks have low clustering coefficient (< 0.02), yet high
compactness ($W^* > 0.3$). We have verified that the high compactness of a community can be attributed to edges that act as shortcuts in the community. The shortcuts connect nodes having high
numbers of links to nodes with low numbers of links, thereby decreasing the minimum number of links that separate distant nodes.
The presence of shortcut edges means that the overall structure of
a community can be highly compact, despite a low clustering coefficient.
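The compactness ratio defined in the Methods section can be computed directly for a small community. The sketch below is a minimal illustration under stated assumptions: the community is an unweighted, connected graph given as an adjacency dictionary, and the spanning tree T is taken to be a breadth-first tree from a chosen root. In an unweighted graph every spanning tree has the same number of edges, so the choice of T is an assumption of this sketch (the abstract does not say which tree is used), and the resulting value depends on it. All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def total_distance(adj):
    """W = sum of d_ij over all unordered pairs i < j."""
    total = sum(sum(bfs_distances(adj, s).values()) for s in adj)
    return total // 2  # every pair was counted from both endpoints


def bfs_tree(adj, root):
    """Spanning tree made of the edges first used to reach each node."""
    tree = {u: set() for u in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                tree[u].add(v)
                tree[v].add(u)
                queue.append(v)
    return tree


def compactness(adj, root):
    """Compactness ratio 1 - W(C) / W(T) with T a BFS tree from root."""
    return 1 - total_distance(adj) / total_distance(bfs_tree(adj, root))


# A community with all possible edges on 4 nodes, and a path on 4 nodes.
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(compactness(complete4, 0), compactness(path4, 0))
```

For the complete graph the BFS tree is a star, so W(C) = 6 and W(T) = 9; the path is its own spanning tree, so its compactness is 0.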
4. REFERENCES
[1] S. Fortunato. Community detection in graphs. Phys. Rep.,
486:75–174, 2010.
[2] D. J. Watts and S. H. Strogatz. Collective dynamics of
‘small-world’ networks. Nature, 393:440–442, 1998.
Private Spatial Data Processing on Trajectory Data
Maryam Fanaeepour, Egemen Tanin, Lars Kulik
National ICT Australia (NICTA),
Department of Computing and Information Systems,
University of Melbourne, Parkville, Victoria 3010, Australia
[email protected], {etanin, lkulik}@unimelb.edu.au
Categories and Subject Descriptors
2. REFERENCES
H.2.8 [Database Applications]: Spatial Databases
[1] F. Braz, S. Orlando, R. Orsini, A. Raffaetà, A. Roncato, and
C. Silvestri. Approximate aggregations in trajectory data
warehouses. In Data Engineering Workshop, 2007 IEEE
23rd International Conference on, pages 536-545, 2007.
Keywords
Location Privacy, Aggregate Data, Trajectory Analytics, Distinct
Counting Problem, Spatial Databases.
1. EXTENDED ABSTRACT
[2] C.-Y. Chow and M. F. Mokbel. Trajectory privacy in
location-based services and data publication. SIGKDD
Explor. Newsl., 13(1):19-29, 2011.
The demand for location-based services (LBSs) has been
increasing due to advances in location-based technologies
such as GPS, RFID, and GSM networks. As a result, a large number of
spatio-temporal datasets describing the trajectories of moving objects are
being created every day. Keeping personal spatial data private is a
significant concern and a challenging issue for LBSs, because of the
potential disclosure of users' individual information [2]. This data
exposure is considered a potential danger to privacy. A
successful method of protecting personal data is the
use of aggregated data [1]. Aggregated data can be counts
computed over spatial data, so that
individual data are not accessible to others. Trajectory
mining, as a new research direction, has drawn the attention of many
researchers [4]. Traffic monitoring and control systems have
become popular applications of spatial data analytics [11]. In this kind of
application, decision-making processes are based on the results of trajectory analytics [1].
[3] L. Gomez, B. Kuijpers, B. Moelans, and A. Vaisman. A
state-of-the-art in spatio-temporal data warehousing, olap
and mining. Integrations of Data Warehousing, Data Mining
and Database Technologies: Innovative Approaches, page
200, 2011.
[4] H. Jeung, M. Yiu, and C. Jensen. Trajectory pattern mining.
In Y. Zheng and X. Zhou, editors, Computing with Spatial
Trajectories, pages 143-177. Springer New York, 2011.
[5] I. Lopez, R. Snodgrass, and B. Moon. Spatiotemporal
aggregate computation: a survey. Knowledge and Data
Engineering, IEEE Transactions on, 17(2):271-286, 2005.
Aggregation is a key method for privacy-aware trajectory
analytics. In some applications, such as traffic monitoring systems that
estimate traffic volume, processing individual data is not
required. In fact, aggregate data is the data of choice for
answering typical traffic monitoring queries,
e.g., “the number of cars passing a specific query region during a
specific time” or “the number of users visiting a particular area
within a particular period of the day”. A common problem for
spatial data mining using aggregate data is the distinct counting
problem, also known as the double counting problem,
in which an object with an extent is counted multiple times in the result because it
re-enters the query region at several timestamps during the query
interval.
In traffic monitoring, a very popular application domain that uses
car trajectory data, the distinct counting problem can lead to a considerable
level of inaccuracy in query answers.
In the literature [1, 3, 5, 6, 7, 8, 9, 11],
the problem of maintaining an accurate count has been considered
a difficult research question and no solution has been provided.
We are the first to propose an accurate answer for the distinct
counting problem. We propose the Connection Aware Spatial
Euler Histograms (CASE Histograms). In CASE histograms, we
keep the connectivity of a moving object's path without
storing the object's ID. Therefore, if an object re-enters a region more than
once, it will not be counted multiple times. Theoretically and
experimentally, we show that this new method provides
accurate answers whilst preserving privacy.
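To make the distinct counting problem concrete, the following sketch compares the sum of per-timestamp aggregate counts with the true number of distinct objects for a toy data set. It only illustrates the problem and is not the CASE histogram method. The cell labels, object names and function names are hypothetical, and the ground-truth function uses object IDs purely for comparison, which a privacy-preserving structure would not store.

```python
def per_timestamp_counts(trajectories, region, timestamps):
    """Aggregate view: number of objects inside the region at each timestamp."""
    return {t: sum(1 for path in trajectories.values() if path.get(t) in region)
            for t in timestamps}


def distinct_count(trajectories, region, timestamps):
    """Ground truth: number of different objects that visit the region at all."""
    return sum(1 for path in trajectories.values()
               if any(path.get(t) in region for t in timestamps))


# Each trajectory maps a timestamp to the grid cell occupied at that time.
trajectories = {
    "car_a": {0: "c1", 1: "c5", 2: "c2", 3: "c5"},  # leaves c5 and re-enters
    "car_b": {0: "c5", 1: "c5", 2: "c3", 3: "c3"},  # stays in c5 for two steps
}
query_region = {"c5"}
query_interval = range(0, 4)

counts = per_timestamp_counts(trajectories, query_region, query_interval)
naive_total = sum(counts.values())   # adds up the aggregate counts
true_total = distinct_count(trajectories, query_region, query_interval)
print(counts, naive_total, true_total)
```

Summing the aggregate counts reports four visits although only two cars were ever in the region, which is the overcount that a connectivity-aware structure is meant to avoid.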
[6] S. Orlando, R. Orsini, A. Raffaetà, A. Roncato, and C.
Silvestri. Spatio-temporal aggregations in trajectory data
warehouses. In I. Song, J. Eder, and T. Nguyen, editors, Data
Warehousing and Knowledge Discovery, volume 4654 of
Lecture Notes in Computer Science, pages 66-77. Springer
Berlin Heidelberg, 2007.
[7] T. B. Pedersen and N. Tryfona. Pre-aggregation in spatial
data warehouses. In Proceedings of the 7th International
Symposium on Advances in Spatial and Temporal Databases,
SSTD '01, pages 460-480. Springer-Verlag, 2001.
[8] Y. Tao, G. Kollios, J. Considine, F. Li, and D. Papadias.
Spatio-temporal aggregation using sketches. In Data
Engineering, 2004. Proceedings. 20th International
Conference on, pages 214-225, 2004.
[9] T. Wan, K. Zeitouni, and X. Meng. An olap system for
network-constrained moving objects. In Proceedings of the
2007 ACM symposium on Applied computing, SAC '07,
pages 13-18. ACM, 2007.
[10] H. Xie, L. Kulik, and E. Tanin. Privacy-aware traffic
monitoring. Intelligent Transportation Systems, IEEE
Transactions on, 11(1):61-70, 2010.
[11] H. Xie, E. Tanin, and L. Kulik. Distributed histograms for
processing aggregate data from moving objects. In Mobile
Data Management, 2007 International Conference on, pages
152-157. IEEE, 2007.
The Earth Mover’s Distance Based Similarity Join Using
MapReduce
Jin Huang
Department of CIS
University of Melbourne
Melbourne, VIC, Australia
[email protected]
Categories and Subject Descriptors
H.2.8 [Database Management]: database applications;
C.2.4 [Distributed Systems]: distributed applications
Keywords
Similarity Join, MapReduce, Earth Mover’s Distance
Introduction
We investigate processing the similarity join query in a distributed cluster using the MapReduce programming model [2].
The similarity join finds all pairs of records in the data sets such
that the distance between them is smaller than a given
threshold. In this study, we focus on the Earth Mover’s Distance (EMD) [8] due to its popularity in content-based
image retrieval and uncertainty analysis [14] [3] [10]. The
major challenge is that the computation cost of this advanced metric is prohibitive (super-cubic on average), making the join operation impractical as the data sets grow.
MapReduce is a good option for enabling the operation
as it provides horizontal scalability. Recent years have
seen some effort devoted to designing join algorithms using
MapReduce [1, 7, 12, 13, 6, 5, 11, 15, 4], but these have significant drawbacks when applied to the EMD similarity join, such as
the assumption that data are sparse in high-dimensional
data spaces [13, 6], extensive EMD computation, reliance
on sampled data, and intolerance of skewed data
sets [11], which are commonly observed in real applications.
To tackle these problems, this study aims to devise a more
efficient MapReduce algorithm to perform EMD-based similarity joins on large-scale distribution (histogram) data sets.
The proposed approach follows the filter-and-refine strategy.
The general idea is to employ much cheaper lower bounds
of EMD to avoid expensive exact computations, and to rely
on space transformation techniques to enable early pruning
on the transformed data. To implement the idea we need
to integrate the lower bounds of EMD into the geometric
pruning techniques and to balance the workloads of the
reducers so that the computation is parallelized to the highest
degree. Specifically, the solution first applies the normal distribution approximation [9] to the distribution data set, and
then transforms the approximations into a two-dimensional
space where the geometric pruning and grid-based load balancing techniques can be used to determine to which reducers
the data should be assigned for the exact EMD computation. For better load balancing, a quantile-based
technique is introduced to partition the space into grid cells
of different sizes. These grid cells are used as the basic
unit of workload. To further enhance the pruning power,
multiple approximations in different spaces are integrated
and the data are partitioned using the reduce key corresponding to their relationships in different spaces.
References
[1] S. Blanas, J. M. Patel, V. Ercegovac, and J. Rao, “A
comparison of join algorithms for log processing in
mapreduce,” in SIGMOD, 2010.
[2] J. Dean and S. Ghemawat, “Mapreduce: Simplified
data processing on large clusters,” in OSDI, 2004.
[3] K. Grauman and T. Darrell, “Fast contour matching
using approximate earth mover’s distance,” in CVPR,
2004.
[4] Y. Kim and K. Shim, “Parallel top-k similarity join
algorithms using mapreduce,” in ICDE, 2012.
[5] J. Lin, “Brute force and index approaches to pairwise
document similarity comparisons with mapreduce,” in
SIGIR, 2009.
[6] A. Metwally and C. Faloutsos, “V-smart-join: A
scalable mapreduce framework for all-pair similarity
joins of multisets and vectors,” PVLDB, vol. 5, no. 8,
2012.
[7] A. Okcan and M. Riedewald, “Processing theta-join
using mapreduce,” in SIGMOD, 2011.
[8] Y. Rubner, C. Tomasi, and L. J. Guibas, “The earth
mover’s distance as a metric for image retrieval,”
International Journal of Computer Vision, vol. 40, pp.
99–121, 2000.
[9] B. E. Ruttenberg and A. K. Singh, “Indexing the
earth mover’s distance using normal distributions,”
PVLDB, 2012.
[10] M. A. Ruzon and C. Tomasi, “Edge, junction, and
corner detection using color distributions,” IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 2001.
[11] Y. N. Silva, J. M. Reed, and L. M. Tsosie,
“Mapreduce-based similarity join,” 2012.
[12] F. Ture, T. Elsayed, and J. Lin, “No free lunch: Brute
force vs. locality-sensitive hashing for cross-lingual
pairwise similarity,” in SIGIR, 2011.
[13] R. Vernica, M. J. Carey, and C. Li, “Efficient parallel
set-similarity joins using mapreduce,” in SIGMOD,
2010.
[14] D. Xu, T.-J. Cham, S. Yan, and S.-F. Chang, “Near
duplicate image identification with spatially aligned
pyramid matching,” in CVPR, 2008.
[15] C. Zhang, F. Li, and J. Jestes, “Efficient parallel knn
joins for large data in mapreduce,” in EDBT, 2012.
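The idea of using a cheap lower bound to avoid exact EMD computations, described in the Introduction above, can be sketched on a single machine as follows. This is a simplified illustration under several assumptions and not the proposed algorithm: histograms are one-dimensional and normalized, the ground distance between bins i and j is |i - j| (so the exact EMD reduces to the L1 distance between cumulative sums), the filter is a centroid-difference lower bound rather than the normal distribution approximation [9], and there is no MapReduce partitioning. All function names are hypothetical.

```python
from itertools import combinations


def emd_1d(p, q):
    """Exact EMD between two normalized 1-D histograms with ground
    distance |i - j|: the L1 distance between their cumulative sums."""
    total, cum_p, cum_q = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        total += abs(cum_p - cum_q)
    return total


def centroid(p):
    """Mean bin index of a normalized histogram."""
    return sum(i * w for i, w in enumerate(p))


def emd_similarity_join(histograms, threshold):
    """Return all index pairs with EMD < threshold, plus the number of
    exact EMD computations that the filter step could not avoid."""
    centroids = [centroid(h) for h in histograms]
    result, exact_calls = [], 0
    for i, j in combinations(range(len(histograms)), 2):
        # Filter: the centroid difference never exceeds the EMD here,
        # so a pair whose bound reaches the threshold cannot qualify.
        if abs(centroids[i] - centroids[j]) >= threshold:
            continue
        # Refine: compute the exact distance only for surviving pairs.
        exact_calls += 1
        if emd_1d(histograms[i], histograms[j]) < threshold:
            result.append((i, j))
    return result, exact_calls


histograms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.1, 0.9],
]
pairs, exact_calls = emd_similarity_join(histograms, threshold=0.5)
print(pairs, exact_calls)
```

Of the six candidate pairs in this toy input, four are discarded by the filter, so only two exact computations are performed.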
A Model to Evaluate Therapies for Mental Health Disorders
Fernando Estrada
PhD student (starting August 2013)
University of Melbourne
5/22 Abinger Street
Richmond, VIC 3121
Mobile 0403857797
[email protected]
Supervisors: Reeva Lederman /
Gregory Wadley
Department of Information Systems
University of Melbourne
Categories and Subject Descriptors
4. Design and Evaluation of Mobile Therapy
H.0 General – H.1 Models and Principles - H1.2 User / Machine
systems (Human Information Processing)
Keywords
In order to accomplish the above goal, this research will focus on
designing a method to efficiently evaluate and test mobile phone
applications in relation to mental health disorders. The
methodology to be used is a qualitative and quantitative approach
based on:
Mental Health, Mobile therapy, Model.
1. Background: Mental Health
a) Systematic review of literature available in relation to
computer-based and mobile applications in mental health;
As stated by the World Health Organization, mental health
disorders have become one of the leading causes of death among
children and, later in life, a cause of disability in adults causing
not only suffering but also impacting on quality of life, wellbeing
and productivity of individuals, their social environment and even
those around them.
b) Consultation with experts in the health/science fields,
understanding their measures and approaches to support
individuals remotely;
2. Technologies for Mental Health
c) Generating a model to evaluate and test mobile phone
applications in relation to mental health; and
d) Generating, evaluating and testing a mobile phone software
application based on on-line therapy.
With the increase of mental health disorders, supporting tools are
crucial to treatment and a faster recovery of the individual.
Statistics show that, if addressed early in life, the chances of a full
recovery increase. Potential illnesses as a consequence of an
undiagnosed and untreated mental health disorder could be
diminished, as could resources required as a result of medical
intervention.
5. Proposed research program
My three years plan is as follows:
Year 1.
Systematic literature review, developing research framework,
defining objectives / main questions, outlining research
methodology, scheduling tasks.
3. Mobile Therapy
Mobile phones, as valuable supportive tools in our daily life, may
be able to play a key role in reaching and supporting individuals
with mental health disorders. Mobile technology embedded with
knowledgeable systems reaches us everywhere, even when auto-isolated from others, as may be the case for some individuals with
mental health disorders; it relates to us, gathering data from, and
interacting with us. Therefore, this research aims to enhance the
mental wellbeing of individuals through the use of this technology
to efficiently support those with mental health disorders.
Year 2.
Update systematic review, qualitative research
Year 3.
Quantitative research, data collection, analysis, designing
prototype and mobile phone app, generating model, writing
and submitting thesis.
A network model of a whole kidney
Thomas Gale
Department of Computing and Information Systems
University of Melbourne
Parkville, Victoria, Australia
[email protected]
Categories and Subject Descriptors
3. RESULTS
J.3 [Life and Medical Sciences]: Biology and genetics;
I.6.3 [Simulation and Modelling]: Applications
A large number of arterial trees have been computer generated. These generated arterial trees have similar
statistics to rat kidneys and are more suitable for physiological simulation than trees produced by pre-existing algorithms.
Simulations of smaller structures with 4 and 16 nephrons
reproduce the behaviour reported by Moss [2] when comparable simulations are performed. Whole rat kidney simulations with approximately 60,000 artery segments and 30,000
nephrons have also been performed. These simulations are
stable and produce sensible overall results, such as whole
system filtration rate and individual nephron behaviour.
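The reported ratio of roughly two artery segments per nephron is what one would expect if the arterial tree is strictly bifurcating with one nephron attached to each leaf. That is an assumption made here for illustration only; the generation algorithm itself is not reproduced. A minimal Python sketch that grows a random bifurcating tree (the function name and splitting rule are hypothetical) and counts its segments:

```python
import random

def grow_bifurcating_tree(n_leaves, seed=0):
    """Grow a random bifurcating tree and return (n_segments, n_leaves).

    Hypothetical illustration: each step splits one randomly chosen
    terminal segment into two daughters. This is NOT the optimisation-based
    generation algorithm used in the work described above.
    """
    rng = random.Random(seed)
    children = {0: []}   # segment id -> list of daughter segment ids
    leaves = [0]         # terminal segments; each would feed one nephron
    next_id = 1
    while len(leaves) < n_leaves:
        # Pick a random terminal segment (swap with last for an O(1) pop).
        i = rng.randrange(len(leaves))
        leaves[i], leaves[-1] = leaves[-1], leaves[i]
        parent = leaves.pop()
        daughters = [next_id, next_id + 1]
        next_id += 2
        children[parent] = daughters
        for d in daughters:
            children[d] = []
            leaves.append(d)
    return len(children), len(leaves)

segments, nephrons = grow_bifurcating_tree(30_000)
print(segments, nephrons)  # 59999 30000
```

Every split adds two segments and one net leaf, so a tree with n leaves always has 2n - 1 segments regardless of its shape, which is consistent with the figures quoted above.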
Keywords
kidney, physiology, modelling, simulation
1. INTRODUCTION
The kidney is a complex system, made up of many nephrons
with varying behaviour and interactions with other nephrons.
As a whole, the kidneys behave very stably to regulate the
extracellular environment, despite outside influences, damage
or loss of tissue. Existing computational models of kidney
physiology either cover the whole organ using lumped parameters or only consider local function. Whole kidney simulation
with explicit structure for each nephron will contribute to
understanding how whole organ stability arises and how function deteriorates in diseased kidneys, which is not possible
in lumped parameter or localised models.
Moss [2] showed that it is computationally tractable to
simulate whole kidneys with a network model, but only
simulated 384 nephrons. Animal kidneys contain many more
nephrons, about 30,000 in rats and 1,000,000 in humans.
This work aims to produce a network model of a whole
rat kidney with arteries supplying blood to nephrons, then
validate that model by comparison with animal data and
known behaviours of other existing computational models.
The rat kidney is a useful target due to its smaller size, the
wide availability of animal experiment data and its use in
other computational models.
4. DISCUSSION AND CONCLUSIONS
These results from small scale and large scale simulations
are a strong indication that the whole rat kidney model
proposed is valid. However, simulations over a wider range of
model sizes and conditions, in addition to further comparison
with data from animals and other computer models, are
needed to properly validate this model.
Work is currently in progress on performing this validation. Other remaining work includes improved analysis and
visualisation techniques for simulation results with large numbers of nephrons and demonstrating that the modelling and
computational approach extends to models of human sized
kidneys.
5. ACKNOWLEDGMENTS
2. METHODS
Work on the large set of arterial structures and whole
kidney simulation was carried out using computing facilities provided by the Victorian Life Sciences Computation
Initiative.
Thank you to my PhD supervisors, Ed Kazmierczak and
Linda Stern, for their advice and encouragement.
The kidney arterial tree contains a large number of segments, making it infeasible to manually reproduce at whole
kidney scale. Algorithms have been developed to computer generate arterial tree structures, based on optimisation
rules and measurements taken from CT scanned rat kidneys.
Nephron structures are attached to leaves of the arterial tree,
including a distribution of loop of Henle lengths.
This generated kidney structure is then used for physiological simulation. A network arterial model calculates
pressure, resistance and flow rates in each artery segment,
using Poiseuille flow and the myogenic response model from
Kleinstreuer [1]. The arteries are connected to the nephron
model from Moss [2], with modifications to the afferent arteriole and glomerulus to use the blood pressure supplied by
the connecting arteries.
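The steady-state part of such a network calculation can be sketched for a very small tree. The sketch below assumes rigid cylindrical segments, constant viscosity and a common outlet pressure at every leaf; the dictionary layout, function names and numeric values are hypothetical, and the myogenic response and nephron models are deliberately omitted.

```python
import math

def poiseuille_resistance(length, radius, viscosity):
    """Hydraulic resistance of a rigid cylindrical segment: 8*mu*L / (pi*r^4)."""
    return 8.0 * viscosity * length / (math.pi * radius ** 4)

def subtree_resistance(seg, viscosity):
    """Resistance of a segment in series with its daughters in parallel."""
    own = poiseuille_resistance(seg["length"], seg["radius"], viscosity)
    kids = seg.get("children", [])
    if not kids:
        return own
    return own + 1.0 / sum(1.0 / subtree_resistance(k, viscosity) for k in kids)

def segment_flows(seg, flow, viscosity, out=None):
    """Split `flow` among daughters in proportion to subtree conductance."""
    if out is None:
        out = {}
    out[seg["name"]] = flow
    kids = seg.get("children", [])
    if kids:
        conductances = [1.0 / subtree_resistance(k, viscosity) for k in kids]
        total = sum(conductances)
        for kid, g in zip(kids, conductances):
            segment_flows(kid, flow * g / total, viscosity, out)
    return out

# Hypothetical three-segment tree in SI units (illustrative values only).
tree = {
    "name": "root", "length": 2e-3, "radius": 1.0e-4,
    "children": [
        {"name": "left", "length": 1e-3, "radius": 8e-5},
        {"name": "right", "length": 1e-3, "radius": 6e-5},
    ],
}
MU = 3.5e-3             # assumed blood viscosity, Pa*s
PRESSURE_DROP = 1.0e4   # assumed inlet-to-outlet pressure drop, Pa

total_flow = PRESSURE_DROP / subtree_resistance(tree, MU)
flows = segment_flows(tree, total_flow, MU)
print(flows)
```

Because resistance scales with the inverse fourth power of radius, the wider daughter carries most of the flow; a myogenic response would act by adjusting the radii between such steady-state solves.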
6. REFERENCES
[1] Kleinstreuer, N. C., David, T., Plank, M. J., and
Endre, Z. Dynamic myogenic autoregulation in the rat
kidney: a whole-organ model. American Journal of
Physiology – Renal Physiology (2008).
[2] Moss, R., Kazmierczak, E., Kirley, M., and
Harris, P. A computational model for emergent
dynamics in the kidney. Philosophical Transactions of
the Royal Society A (2009).
Review of Web-based Software Frameworks for Clinical
and Biomedical Research Collaborations
Tracy McLean
PhD Student
Computing and Information Systems
The University of Melbourne
Parkville 3010 VIC Australia
(+61) 490088422
[email protected]
technologies, security models and levels of maturity in which
these frameworks are used in a clinical/biomedical setting. The
author will also review efforts made towards achieving a unified
platform in a clinical/biomedical setting, and determine the
feasibility of achieving such a platform that could potentially
integrate different applications such as registries, data
management systems, hospital patient care systems within the
same domain and also across different domains including neuro-,
endo-, and cancer.
Categories and Subject Descriptors
D.2.13 [Reusable Software]: Domain engineering, reusable
models, reusable libraries.
Keywords
Clinical research, biomedical research, software frameworks.
1. INTRODUCTION
The increasing demand to support interdisciplinary research and
collaboration in the field of biomedical and clinical research
requires lessons to be learnt in building such systems and
improving the overall knowledge of how to build such systems. It
is often the case that research projects begin with little or no
recourse to, or understanding of, previously undertaken efforts. Web-based clinical/biomedical research frameworks represent one way
that successful designs can be captured and applied.
2. ACKNOWLEDGMENTS
I would like to thank my supervisor Richard Sinnott for his
guidance throughout my PhD thus far.
The evolution of clinical and biomedical research has traversed
different data management methods, from paper-based to
spreadsheets to standalone ad-hoc systems to web-based
integrated systems [1] [2]. Whilst some organisations continue to
use paper files, spreadsheets and home-grown databases, the
emergence of collaborative scientific research and the evolution of
the World Wide Web have helped to shape the way in which
developers design and implement software systems for clinical
and biomedical research collaboration.
3. REFERENCES
It is evident that web-based systems in clinical and biomedical
research are becoming more prevalent and usage of these systems has
increased dramatically; however, the integrated usage of these
systems is a complex activity due to heterogeneity of data and
differences in access control and security domains. Whilst there
have been numerous efforts to develop a single centrally-mandated system, such efforts, including the UK Connecting for
Health [3], have failed miserably [4]. However, similar
contributions continue to evolve in the implementation of unified
platforms for research, for example the Australian Urban Research
Infrastructure Network (AURIN) [5] initiative, which aims to
integrate and analyse heterogeneous data from multiple sources to
enable researchers to access a wide range of data sets.
This paper focuses on clinical/biomedical frameworks that span
across different areas of research including medical imaging,
clinical trials, -omics and phenotypic research and discusses the
design, framework architecture, standards, computing
[1] J. D. Franklin, A. Guidry, and J. F. Brinkley, “A
partnership approach for Electronic Data Capture in
small-scale clinical trials,” Journal of Biomedical
Informatics, vol. 44, Suppl. 1, pp. S103–8, Dec. 2011.
[2] S. Myneni and V. L. Patel, “Organization of Biomedical
Data for Collaborative Scientific Research: A Research
Information Management System,” International
Journal of Information Management, vol. 30, no. 3, pp.
256–264, Jun. 2010.
[3] M. Cross, “Information technology: Will Connecting for
Health deliver its promises?,” British Medical Journal,
vol. 332, pp. 599–601, Mar. 2006.
[4] D. Martin, “NHS IT project failure: Labour’s £12bn
computer scheme scrapped,” Daily Mail (Mail Online),
22-Sep-2011.
[5] R. J. Stimson, R. Sinnott, and M. Tomko, “The
Australian Urban Research Infrastructure Network
(AURIN) Initiative: A Platform Offering Data and Tools
for Urban and Built Environment Researchers across
Australia,” June 2010, pp. 1–16.
The Use of Ontologies in Neuroimaging and Their
Application in Answering Abstract Queries
Aref Eshghishargh
Simon Milton
Andrew Lonie
Gary Egan
University of Melbourne
Department of
Computing and
Information Systems
Associate Professor
University of Melbourne
Associate Professor
Head of the Life
Sciences Computation
Centre at the Victorian
Life Sciences
Computation Initiative
(VLSCI)
Professor
Foundation Director
Monash Biomedical
Imaging, Monash
University
[email protected]
simon.milton@unimelb.edu.au
[email protected]
.au
gary.egan@monash.edu
[4] Mei, J., L. Ma, and Y. Pan, Ontology query answering on
databases, in The Semantic Web - ISWC 2006. 2006, Springer.
p. 445-458.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Query formulation
with the use of ontologies, Search process. H.2.4 [Systems]:
Query processing with ontologies. J.3 [LIFE AND MEDICAL
SCIENCES]: Medical information systems.
Design.
[5] Möller, M., S. Regel, and M. Sintek, Radsem: Semantic
annotation and retrieval for medical images, in The
Semantic Web: Research and Applications. 2009, Springer. p.
21-35.
Keywords
[6] Seifert, S., et al., Semantic annotation of medical images.
2010: p. 762808-762808.
Neuroscience, Neuroimaging, Ontology, Query answering,
Semantic annotation.
1. INTRODUCTION
[7] Uren, V., et al., Semantic annotation for knowledge
management: Requirements and a survey of the state of the
art. Web Semantics: Science, Services and Agents on the
World Wide Web, 2006. 4(1): p. 14-28.
[8] Magnini, B., M. Speranza, and V. Kumar. Towards
interactive question answering: an ontology-based approach.
in Semantic Computing, 2009. ICSC'09. IEEE International
Conference on. 2009. IEEE.
Large neuro-images are produced every day [1, 2] as the
output of experimental workflows [11] in neuroscience. This has led
to the design and implementation of various neuroimaging tools
and techniques [12, 13]. Our research seeks a way for
researchers to easily query these images and their contents. We
propose the use of ontologies as one of the practices best suited
to managing and retrieving information from neuro-images, because
ontologies have specifications that match
neuroscience and the data produced in this field [3]. We first
investigate current uses of ontologies, their use in neuroscience
and how they can be used to address queries [4]. We also investigate
how the images should be annotated [5-7] so that the annotations can
assist in answering queries with maximum confidence. In the next
stage we will try to answer abstract queries with the aid of
ontologies [8-10].
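As a rough illustration of how an ontology can let an abstract query reach images annotated with more specific terms, the following Python sketch walks a toy term hierarchy. The hierarchy, the image identifiers and the function names are hypothetical and are not drawn from any ontology or tool cited here.

```python
# Hypothetical hierarchy: narrower term -> broader term.
PARENT = {
    "hippocampus": "temporal lobe",
    "amygdala": "temporal lobe",
    "temporal lobe": "cerebrum",
    "frontal lobe": "cerebrum",
}

# Hypothetical annotations: image id -> set of ontology terms.
ANNOTATIONS = {
    "img001": {"hippocampus"},
    "img002": {"frontal lobe"},
    "img003": {"cerebellum"},
}

def ancestors(term):
    """Yield the term itself and every broader term above it."""
    while term is not None:
        yield term
        term = PARENT.get(term)

def query(term):
    """Return ids of images annotated with `term` or any narrower term."""
    return sorted(
        image
        for image, terms in ANNOTATIONS.items()
        if any(term in ancestors(t) for t in terms)
    )

print(query("cerebrum"))       # ['img001', 'img002']
print(query("temporal lobe"))  # ['img001']
```

A query for the broad term "cerebrum" returns an image annotated only with "hippocampus", which is the kind of inference that plain keyword matching on annotations would miss.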
[9] Vargas-Vera, Maria, Enrico Motta, and John Domingue.
"AQUA: An ontology-driven question answering system."
New Directions in Question Answering, Papers from 2003
AAAI Spring Symposium, Stanford University. 2003.
[10] Chen, L., et al., OntoQuest: exploring ontological data made
easy, in Proceedings of the 32nd International Conference on
Very Large Data Bases. 2006, VLDB Endowment: Seoul,
Korea. p. 1183-1186.
2. REFERENCES
[11] Killeen, N. E., Lohrey, J. M., Farrell, M., Liu, W., Garic, S.,
Abramson, D., ... & Egan, G. (2012, October). Integration of
modern data management practice with scientific workflows.
In E-Science (e-Science), 2012 IEEE 8th International
Conference on (pp. 1-8). IEEE.
[1] Ozyurt, I., et al., Federated Web-accessible Clinical Data
Management within an Extensible NeuroImaging Database.
Neuroinformatics, 2010. 8(4): p. 231-249.
[2] Uppoor, R.S., The use of imaging in the early development
of neuropharmacological drugs: a survey of approved NDAs.
Clinical pharmacology & therapeutics, 2008. 84(1): p. 69.
[12] Adamson, C. L., & Wood, A. G. (2010). DFBIdb: a software
package for neuroimaging data management.
Neuroinformatics, 8(4), 273-284.
[3] Horrocks, I., What are ontologies good for? Evolution of
Semantic Systems, 2013: p. 175-188.
[13] Temal, L., Dojat, M., Kassel, G., & Gibaud, B. (2008).
Towards an ontology for sharing medical images and regions
of interest in neuroimaging. Journal of Biomedical
Informatics, 41(5), 766-778.