AUXILIARY MATERIAL
MCMC algorithm
Bayesian inference for MSARMs is performed by sampling from the posterior density with a Gibbs sampling algorithm, which is described here. The algorithm is run for L+M+N iterations, where L is the number of burn-in iterations, M is the number of iterations used to compute the posterior mode, and N is the number of iterations forming the MCMC sample. The parameters 𝐆, 𝛍, 𝛔2, 𝛗, 𝐱T are drawn through five steps. The algorithm is completed by the β€œpermutation” step and the β€œestimation” step.
[Step 0] Generate starting values 𝐆1(0), 𝛍(0), 𝛔2(0), 𝛗(0) from their respective priors; set 𝐆2(0), …, 𝐆H(0) equal to 𝐆1(0); and generate 𝐱T(0) from 𝐆1(0). Then, for any iteration k, go through Steps 1-8.
[Step 1] Placing a Normal prior $N(\mu_M, \sigma_M^2)$ on each $\mu_i$, the signals $\mu_i^{(k)}$, for any $i \in S_X$, are generated independently from Normal distributions of mean
$$\frac{\sigma_i^{-2(k-1)} \sum_{\{t:\, x_t^{(k-1)}=i\}} \left( y_t - \sum_{\tau=1}^{p} \varphi_{\tau,i}^{(k-1)} y_{t-\tau} \right) + \mu_M \sigma_M^{-2}}{T_i^{(k-1)} \sigma_i^{-2(k-1)} + \sigma_M^{-2}}$$
and variance
$$\left( T_i^{(k-1)} \sigma_i^{-2(k-1)} + \sigma_M^{-2} \right)^{-1},$$
where $T_i^{(k-1)}$ is the number of observations corresponding to the contemporary state $i$ of the hidden chain $\mathbf{x}^{T(k-1)}$.
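A minimal NumPy sketch of this draw follows (our working implementation is in Fortran with the IMSL Libraries, as discussed at the end of this material; all names here are illustrative). States are assumed coded 0, …, m-1, and Z is the (TΓ—p) matrix of lagged observations defined in Step 3.

    import numpy as np

    def draw_means(y, Z, x, phi, prec, mu_M, prec_M, m, rng):
        # Step 1: draw each state-specific signal mu_i from its Normal
        # full conditional. y: (T,) series; Z: (T, p) lag matrix with
        # Z[t, tau-1] = y[t-tau]; x: (T,) current states coded 0..m-1;
        # phi[i]: AR coefficients of state i; prec[i]: current sigma_i^-2;
        # (mu_M, prec_M): prior mean and precision; rng: numpy Generator.
        mu = np.empty(m)
        for i in range(m):
            sel = (x == i)                        # observations in state i
            T_i = sel.sum()
            resid = y[sel] - Z[sel] @ phi[i]      # y_t minus its AR part
            post_prec = T_i * prec[i] + prec_M    # T_i sigma_i^-2 + sigma_M^-2
            post_mean = (prec[i] * resid.sum() + mu_M * prec_M) / post_prec
            mu[i] = rng.normal(post_mean, np.sqrt(1.0 / post_prec))
        return mu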
[Step 2] Placing a Gamma prior $G(\alpha_\Sigma, \beta_\Sigma)$ on each precision $\sigma_i^{-2}$, the precisions $\sigma_i^{-2(k)}$, for any $i \in S_X$, are generated independently from Gamma distributions of parameters
$$\frac{T_i^{(k-1)}}{2} + \alpha_\Sigma$$
and
$$\frac{1}{2} \sum_{\{t:\, x_t^{(k-1)}=i\}} \left( y_t - \mu_i^{(k)} - \sum_{\tau=1}^{p} \varphi_{\tau,i}^{(k-1)} y_{t-\tau} \right)^2 + \beta_\Sigma.$$
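The corresponding sketch under the same conventions, assuming, as is standard for this conjugate update, that the second parameter acts as a rate (NumPy's gamma generator is parameterised by shape and scale, so the rate enters as its reciprocal):

    import numpy as np

    def draw_precisions(y, Z, x, mu, phi, alpha_S, beta_S, m, rng):
        # Step 2: draw each precision sigma_i^-2 from its Gamma full
        # conditional, with shape T_i/2 + alpha_S and rate equal to half
        # the sum of squared residuals plus beta_S.
        prec = np.empty(m)
        for i in range(m):
            sel = (x == i)
            resid = y[sel] - mu[i] - Z[sel] @ phi[i]
            shape = sel.sum() / 2.0 + alpha_S
            rate = 0.5 * np.sum(resid ** 2) + beta_S
            prec[i] = rng.gamma(shape, 1.0 / rate)  # numpy wants a scale
        return prec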
[Step 3] Placing a multivariate Normal prior $N^{(p)}(\boldsymbol{\mu}_\Phi; \boldsymbol{\Lambda}_\Phi^{-1})$ on each $\boldsymbol{\varphi}_i$, the autoregressive coefficients $\boldsymbol{\varphi}_i^{(k)}$, for any $i \in S_X$, are generated independently from multivariate Normal distributions of mean vectors
$$\left[ \sigma_i^{-2(k)} \mathbf{Z}' \mathbf{Q}_i^{(k-1)} \mathbf{Z} + \boldsymbol{\Lambda}_\Phi \right]^{-1} \left[ \sigma_i^{-2(k)} \mathbf{Z}' \mathbf{Q}_i^{(k-1)} \left( \mathbf{y}^T - \mu_i^{(k)} \mathbf{1}_{(T)} \right) + \boldsymbol{\Lambda}_\Phi \boldsymbol{\mu}_\Phi \right]$$
and covariance matrices
$$\left[ \sigma_i^{-2(k)} \mathbf{Z}' \mathbf{Q}_i^{(k-1)} \mathbf{Z} + \boldsymbol{\Lambda}_\Phi \right]^{-1},$$
where $\mathbf{Z}$ is a $(T \times p)$ matrix whose generic entry on the $t$-th row and the $j$-th column is $y_{t-j}$, whereas $\mathbf{Q}_i$ is a $(T \times T)$ diagonal matrix whose $t$-th diagonal term is $I(x_t = i)$.
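A sketch of this draw follows; since Q_i is diagonal with 0/1 entries, multiplying by Q_i is equivalent to selecting the rows of Z (and of y) allocated to state i, which is what the mask below does.

    import numpy as np

    def draw_ar_coefficients(y, Z, x, mu, prec, mu_Phi, Lambda_Phi, m, rng):
        # Step 3: draw each AR vector phi_i from its multivariate Normal
        # full conditional. Masking the rows of Z and y with x == i is
        # equivalent to the products Z'Q_i Z and Z'Q_i (y - mu_i 1).
        p = Z.shape[1]
        phi = np.empty((m, p))
        for i in range(m):
            sel = (x == i)
            Zi = Z[sel]
            A = prec[i] * (Zi.T @ Zi) + Lambda_Phi          # posterior precision
            b = prec[i] * (Zi.T @ (y[sel] - mu[i])) + Lambda_Phi @ mu_Phi
            cov = np.linalg.inv(A)                          # posterior covariance
            phi[i] = rng.multivariate_normal(cov @ b, cov)
        return phi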
[Step 4] Let $\mathbf{G}_{j\bullet}^h = (g_{j,1}^h, \ldots, g_{j,m}^h)$ be the $j$-th row of each matrix $\mathbf{G}_h$ ($h = 1, \ldots, H$). Placing a Dirichlet prior of parameter $\boldsymbol{\alpha} = (\alpha_{j,1}, \ldots, \alpha_{j,m})'$ on $\mathbf{G}_{j\bullet}^h$, each row $\mathbf{G}_{j\bullet}^{h(k)}$, for any $j \in S_X$, is generated independently from a Dirichlet $D(\boldsymbol{\alpha} + \mathbf{T}_{j\bullet}^{h(k-1)})$, where $\mathbf{T}_{j\bullet}^{h(k-1)} = (T_{j,1}^{h(k-1)}, \ldots, T_{j,m}^{h(k-1)})$ collects the current transition counts from state $j$.
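A sketch of this draw, under the assumption (ours, for illustration) that $T_{j,l}^h$ counts the transitions from state j to state l occurring while covariate category h is in force; h_of_t[t] indexes the category governing the transition into time t, and alpha is an (m, m) array whose j-th row is the prior parameter vector for row j.

    import numpy as np

    def draw_transition_rows(x, h_of_t, alpha, m, H, rng):
        # Step 4: draw each row of each transition matrix G^h from a
        # Dirichlet with parameter alpha[j] + current transition counts.
        # h_of_t[t] indexes the covariate category governing the
        # transition into time t (an assumption of this sketch).
        counts = np.zeros((H, m, m))
        for t in range(1, len(x)):
            counts[h_of_t[t], x[t - 1], x[t]] += 1.0   # transitions j -> l under h
        G = np.empty((H, m, m))
        for h in range(H):
            for j in range(m):
                G[h, j] = rng.dirichlet(alpha[j] + counts[h, j])
        return G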
[Step 5 - Permutation step] If $k = L+1$, set the posterior mode $(\mathbf{G}^*, \boldsymbol{\mu}^*, \boldsymbol{\sigma}^{2*}, \boldsymbol{\varphi}^*)$ equal to $(\mathbf{G}^{(L+1)}, \boldsymbol{\mu}^{(L+1)}, \boldsymbol{\sigma}^{2(L+1)}, \boldsymbol{\varphi}^{(L+1)})$.
If $L+2 \le k \le L+M$, compute the posterior mode as
$$(\mathbf{G}^*, \boldsymbol{\mu}^*, \boldsymbol{\sigma}^{2*}, \boldsymbol{\varphi}^*) = \arg\max \left\{ p(\mathbf{G}^{(k)}, \boldsymbol{\mu}^{(k)}, \boldsymbol{\sigma}^{2(k)}, \boldsymbol{\varphi}^{(k)} | \mathbf{y}^T, \mathbf{y}^0);\; p(\mathbf{G}^*, \boldsymbol{\mu}^*, \boldsymbol{\sigma}^{2*}, \boldsymbol{\varphi}^* | \mathbf{y}^T, \mathbf{y}^0) \right\},$$
where $p(\cdot\, | \mathbf{y}^T, \mathbf{y}^0)$ is computed by marginalizing over the sequence of the hidden states.
Let H be the class of the m! permutations $\eta_j$ of the labels ($\eta_j \in H$, for any $j = 1, \ldots, m!$), so that $\eta_j(\mathbf{G}^{(k)}, \boldsymbol{\mu}^{(k)}, \boldsymbol{\sigma}^{2(k)}, \boldsymbol{\varphi}^{(k)})$ is the permutation of the parameters obtained at the k-th iteration by which the signals, the variances, the columns of the autoregressive coefficient matrix, and the rows and columns of the transition matrices assume a new order.
If $L+M+1 \le k \le L+M+N$, compute $\eta^*$ such that
$$\eta^* = \arg\min_{\eta_j \in H} \left\| \eta_j(\mathbf{G}^{(k)}, \boldsymbol{\mu}^{(k)}, \boldsymbol{\sigma}^{2(k)}, \boldsymbol{\varphi}^{(k)}) - (\mathbf{G}^*, \boldsymbol{\mu}^*, \boldsymbol{\sigma}^{2*}, \boldsymbol{\varphi}^*) \right\|$$
and set
$$(\mathbf{G}^{(k)}, \boldsymbol{\mu}^{(k)}, \boldsymbol{\sigma}^{2(k)}, \boldsymbol{\varphi}^{(k)}) = \eta^*(\mathbf{G}^{(k)}, \boldsymbol{\mu}^{(k)}, \boldsymbol{\sigma}^{2(k)}, \boldsymbol{\varphi}^{(k)}).$$
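Since m is small (m = 2 in our applications), the minimisation over the m! permutations can be done by enumeration, as in the following sketch; the distance below is a sum of Euclidean norms over the parameter blocks, one concrete choice for the norm in the display above.

    import numpy as np
    from itertools import permutations

    def relabel_draw(G, mu, s2, phi, G_star, mu_star, s2_star, phi_star):
        # Step 5: find the label permutation bringing the current draw
        # closest to the posterior mode, then apply it. G is (H, m, m),
        # mu and s2 are (m,), phi is (m, p) with one row per state.
        m = len(mu)
        best, best_dist = None, np.inf
        for perm in permutations(range(m)):
            q = list(perm)
            dist = (np.linalg.norm(G[:, q][:, :, q] - G_star)  # rows and columns
                    + np.linalg.norm(mu[q] - mu_star)
                    + np.linalg.norm(s2[q] - s2_star)
                    + np.linalg.norm(phi[q] - phi_star))
            if dist < best_dist:
                best, best_dist = q, dist
        return G[:, best][:, :, best], mu[best], s2[best], phi[best]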
[Step 6] The sequence of the hidden states $\mathbf{x}^{T(k)}$ is generated by the forward-filtering backward-sampling (ff-bs) algorithm of Carter and Kohn (1994) and FrΓΌhwirth-Schnatter (1994). It is so called because the filtered probabilities of the hidden states are first computed going forwards; then the conditional probabilities of the hidden states are computed going backwards, sampling the states in block from the full conditional
$$p(\mathbf{x}^T | \mathbf{y}^T, \mathbf{y}^0) = p(x_T | \mathbf{y}^T) \prod_{t=1}^{T-1} p(x_t | x_{t+1}, \mathbf{y}^t),$$
suppressing the conditioning on $\mathbf{G}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\varphi}$, with $\mathbf{y}^t = (y_{-p+1}, \ldots, y_0, y_1, \ldots, y_t)'$.
Let $\xi_{t+1|t}$ be the m-dimensional vector whose generic entry is $P(x_{t+1} = i | \mathbf{y}^t)$; $\xi_{t|t}$ be the m-dimensional vector whose generic entry is $P(x_t = i | \mathbf{y}^t)$; and $\xi_t$ be the m-dimensional vector whose generic entry is $P(x_t = i | x_{t+1}, \mathbf{y}^t)$, for any $i \in S_X$. The iterative scheme of the ff-bs algorithm is as follows.
Set
$$\xi_{1|0}^{(k)} = (m^{-1}, \ldots, m^{-1})'$$
as the initial filtered probability. Compute
$$\xi_{t|t}^{(k)} = \frac{\mathbf{F}_t^{(k)} \xi_{t|t-1}^{(k)}}{\mathbf{1}'_{(m)} \left( \mathbf{F}_t^{(k)} \xi_{t|t-1}^{(k)} \right)}$$
and
$$\xi_{t+1|t}^{(k)} = \left[ \boldsymbol{\Gamma}_t^{(k)} \right]' \xi_{t|t}^{(k)},$$
for any $t = 1, \ldots, T-1$, where
$$\mathbf{F}_t^{(k)} = \mathrm{diag}\left[ p(y_t | y_{t-1}, \ldots, y_{t-p}, \mu_1^{(k)}, \sigma_1^{2(k)}, \boldsymbol{\varphi}_1^{(k)}, x_t = 1), \ldots, p(y_t | y_{t-1}, \ldots, y_{t-p}, \mu_m^{(k)}, \sigma_m^{2(k)}, \boldsymbol{\varphi}_m^{(k)}, x_t = m) \right],$$
with $\mathbf{1}_{(m)}$ the m-dimensional vector of ones. Compute
$$\xi_{T|T}^{(k)} = \frac{\mathbf{F}_T^{(k)} \xi_{T|T-1}^{(k)}}{\mathbf{1}'_{(m)} \left( \mathbf{F}_T^{(k)} \xi_{T|T-1}^{(k)} \right)}.$$
Generate $x_T^{(k)}$ from $\xi_{T|T}^{(k)}$. Compute
$$\xi_t^{(k)} = \frac{\mathrm{diag}\left( \xi_{t|t}^{(k)} \right) \boldsymbol{\Gamma}_{\bullet x_{t+1}}^{(k)}}{\mathbf{1}'_{(m)} \left( \mathrm{diag}\left( \xi_{t|t}^{(k)} \right) \boldsymbol{\Gamma}_{\bullet x_{t+1}}^{(k)} \right)}$$
and generate $x_t^{(k)}$ from $\xi_t^{(k)}$, for any $t = T-1, \ldots, 1$. Vector $\boldsymbol{\Gamma}_{\bullet x_{t+1}}^{(k)}$ represents the column of $\boldsymbol{\Gamma}^{(k)}$ corresponding to the state generated previously.
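The whole step can be sketched in a few lines; here dens[t, i] plays the role of the diagonal of F_t, and Gamma[t] is the transition matrix in force between t and t+1, so the non-homogeneous case is covered. Names and array layout are illustrative, not our Fortran interface.

    import numpy as np

    def ffbs(dens, Gamma, rng):
        # Step 6: forward-filtering backward-sampling. dens[t, i] is the
        # conditional density p(y_t | y_{t-1},...,y_{t-p}, theta_i, x_t = i),
        # i.e. the diagonal of F_t; Gamma[t] is the transition matrix in
        # force between times t and t+1.
        T, m = dens.shape
        xi_filt = np.empty((T, m))                 # xi_{t|t}
        xi_pred = np.full(m, 1.0 / m)              # xi_{1|0}: uniform start
        for t in range(T):                         # forward filtering
            w = dens[t] * xi_pred                  # F_t xi_{t|t-1}
            xi_filt[t] = w / w.sum()
            if t < T - 1:
                xi_pred = Gamma[t].T @ xi_filt[t]  # xi_{t+1|t}
        x = np.empty(T, dtype=int)                 # backward sampling
        x[T - 1] = rng.choice(m, p=xi_filt[T - 1])
        for t in range(T - 2, -1, -1):
            w = xi_filt[t] * Gamma[t][:, x[t + 1]] # diag(xi_{t|t}) Gamma_col
            x[t] = rng.choice(m, p=w / w.sum())
        return x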
[Step 7 – Estimation step] Let $\theta$ be the generic entry of the parameters $\mathbf{G}$ or $\boldsymbol{\mu}$ or $\boldsymbol{\sigma}^2$ or $\boldsymbol{\varphi}$. If $L+M+1 \le k \le L+M+N$, for each possible $\theta$, compute its ergodic mean $\Theta^{(g)}$, with $g = k - M - L$:
$$\Theta^{(g)} = \left\{ \Theta^{(g-1)} (g-1) + \theta^{(k)} \right\} / g,$$
with $\Theta^{(0)} = 0$. Also compute, for any $t = 1, \ldots, T$ and any $i \in S_X$, the sum
$$C_{t,i}^{(g)} = C_{t,i}^{(g-1)} + I(x_t = i),$$
with $C_{t,i}^{(0)} = 0$.
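A sketch of the two running updates (illustrative names; Theta may be any array of parameters, updated elementwise):

    import numpy as np

    def update_estimates(theta, x, Theta, C, g, m):
        # Step 7: running (ergodic) mean of the parameters and per-time
        # state-visit counts, after the g-th retained iteration
        # (g = k - M - L); Theta starts at 0 and C at a zero matrix.
        Theta = (Theta * (g - 1) + theta) / g      # Theta^(g)
        for i in range(m):
            C[:, i] += (x == i)                    # C_{t,i}^(g)
        return Theta, C

At the end of the run (Step 8), the state estimate at time t is the i maximising C[t, i], e.g. C.argmax(axis=1).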
[Step 8] If $k \le L+M+N-1$, return to Step 1.
If $k = L+M+N$, the ergodic mean $\Theta^{(g)}$ is the estimate of $\theta$, for each $\theta \in \{\mathbf{G}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\varphi}\}$. The estimate of the sequence of hidden states is obtained by computing $\arg\max_{i \in S_X} C_{t,i}^{(g)}$, for any $t = 1, \ldots, T$.
References
Carter, C.K. & Kohn, R. (1994) On Gibbs sampling for state space models. Biometrika, 81,
541-553.
Frühwirth-Schnatter, S. (1994) Data Augmentation and Dynamic Linear Models. Journal of
Time Series Analysis, 15, 183-202.
Comment on the Fortran codes
In our Fortran codes we used the commercial IMSL Libraries provided by Rogue Wave Software. We are aware that this might cause difficulties in making our codes publicly available, and we apologize for that. The IMSL Libraries are called only for generating random numbers and manipulating matrices. These functions can also be obtained from freely available routines such as RanLib (for random numbers) and LAPACK (for matrices). Unfortunately, replacing all calls to IMSL with calls to freely available libraries is not straightforward and needs a massive rewriting of the code. We will do that in our next work on the multivariate extension of MSARMs, which is going to start very soon. The final intention of our new work is to build our Fortran codes into an R package for multivariate MSARMs, which will also handle special cases such as univariate MSARMs and uni/multivariate (Normal) HMMs.
Table S1 – Posterior means of the parameters of the MSARMs applied to the four individuals. Numerical standard errors are given in brackets (non-overlapping first, overlapping second); entries shown without brackets had standard errors of (0.000, 0.000). A dash marks a parameter not present in that individual's model; NA marks a covariate category with no recorded data.
Parameter    A (m=2; p=3)            B (m=2; p=4)    C (m=2; p=4)            D (m=2; p=3)
g^1_{1,1}    0.967                   0.952           0.940                   0.906
g^1_{1,2}    0.033                   0.048           0.060                   0.094
g^1_{2,1}    0.095                   0.061           0.031                   0.095
g^1_{2,2}    0.905                   0.939           0.969                   0.905
g^2_{1,1}    0.966                   0.960           0.943                   NA
g^2_{1,2}    0.034                   0.040           0.057                   NA
g^2_{2,1}    0.079                   0.056           0.047                   NA
g^2_{2,2}    0.921                   0.944           0.953                   NA
g^3_{1,1}    0.977                   0.944           0.925                   0.836
g^3_{1,2}    0.023                   0.056           0.075                   0.164
g^3_{2,1}    0.074                   0.042           0.043                   0.178
g^3_{2,2}    0.926                   0.958           0.957                   0.822
g^4_{1,1}    0.976                   0.957           0.939                   0.846
g^4_{1,2}    0.024                   0.043           0.061                   0.154
g^4_{2,1}    0.075                   0.057           0.035                   0.133
g^4_{2,2}    0.925                   0.943           0.965                   0.867
g^5_{1,1}    –                       –               0.941                   0.734
g^5_{1,2}    –                       –               0.059                   0.266
g^5_{2,1}    –                       –               0.045                   0.268 (0.001, 0.001)
g^5_{2,2}    –                       –               0.955                   0.732 (0.001, 0.001)
g^6_{1,1}    –                       –               0.943                   NA
g^6_{1,2}    –                       –               0.057                   NA
g^6_{2,1}    –                       –               0.036                   NA
g^6_{2,2}    –                       –               0.964                   NA
g^7_{1,1}    –                       –               0.924                   0.810
g^7_{1,2}    –                       –               0.076                   0.190
g^7_{2,1}    –                       –               0.035                   0.110
g^7_{2,2}    –                       –               0.965                   0.890
g^8_{1,1}    –                       –               0.934                   0.767
g^8_{1,2}    –                       –               0.066                   0.233
g^8_{2,1}    –                       –               0.042                   0.228
g^8_{2,2}    –                       –               0.958                   0.772
g^9_{1,1}    –                       –               –                       0.823 (0.001, 0.001)
g^9_{1,2}    –                       –               –                       0.177 (0.001, 0.001)
g^9_{2,1}    –                       –               –                       0.076
g^9_{2,2}    –                       –               –                       0.924
g^10_{1,1}   –                       –               –                       NA
g^10_{1,2}   –                       –               –                       NA
g^10_{2,1}   –                       –               –                       NA
g^10_{2,2}   –                       –               –                       NA
g^11_{1,1}   –                       –               –                       0.950
g^11_{1,2}   –                       –               –                       0.050
g^11_{2,1}   –                       –               –                       0.061
g^11_{2,2}   –                       –               –                       0.939
g^12_{1,1}   –                       –               –                       0.900
g^12_{1,2}   –                       –               –                       0.100
g^12_{2,1}   –                       –               –                       0.640 (0.001, 0.001)
g^12_{2,2}   –                       –               –                       0.360 (0.001, 0.001)
g^13_{1,1}   –                       –               –                       0.939
g^13_{1,2}   –                       –               –                       0.161
g^13_{2,1}   –                       –               –                       0.248
g^13_{2,2}   –                       –               –                       0.752
g^14_{1,1}   –                       –               –                       NA
g^14_{1,2}   –                       –               –                       NA
g^14_{2,1}   –                       –               –                       NA
g^14_{2,2}   –                       –               –                       NA
g^15_{1,1}   –                       –               –                       0.859
g^15_{1,2}   –                       –               –                       0.141
g^15_{2,1}   –                       –               –                       0.261
g^15_{2,2}   –                       –               –                       0.739
g^16_{1,1}   –                       –               –                       0.923
g^16_{1,2}   –                       –               –                       0.077
g^16_{2,1}   –                       –               –                       0.685 (0.001, 0.001)
g^16_{2,2}   –                       –               –                       0.315 (0.000, 0.001)
ΞΌ1           0.013                   -0.007          0.048                   -0.006
ΞΌ2           -0.253 (0.001, 0.001)   -0.341          -0.948 (0.002, 0.002)   -0.439 (0.003, 0.004)
Οƒ1Β²          0.071                   0.082           0.759 (0.002, 0.002)    0.026
Οƒ2Β²          3.700 (0.003, 0.003)    15.521          21.223 (0.004, 0.004)   7.407 (0.008, 0.009)
Ο†_{1,1}      1.097 (0.001, 0.001)    1.014           1.150                   0.991
Ο†_{2,1}      -0.010 (0.001, 0.001)   -0.092          -0.133                  0.014
Ο†_{3,1}      -0.087                  -0.052          -0.007                  -0.058
Ο†_{4,1}      –                       -0.007          -0.016                  –
Ο†_{1,2}      1.369                   1.309           1.223                   1.380
Ο†_{2,2}      0.323                   -0.374          -0.257                  -0.316
Ο†_{3,2}      -0.047                  0.006           0.051                   -0.069
Ο†_{4,2}      –                       -0.002          -0.025                  –
Comments on Table S1
Estimates of the parameters are reported in Table S1, along with the corresponding simulation standard errors, computed through non-overlapping (Green, 2000) and overlapping (Chen et al., 2000, pp. 71–75) batch statistic methods.
We looked at the various transition matrices of the non-homogeneous hidden Markov chain and analysed each of them as if it were the transition matrix of a homogeneous Markov chain. Under this assumption, the time spent in each state of the Markov chain upon each return to it has a geometric distribution with mean 1/(1 βˆ’ g^h_{i,i}), for any i=1,…,m and any h=1,…,H. These mean times were computed, along with their corresponding simulation standard errors, by using the MCMC sample. From the diagonal entries of the transition matrices, it was therefore possible to compute the mean time of persistence in a state and see how it varied according to the dynamics of the selected covariates.
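As an illustration, the conversion from the diagonal of a transition matrix to mean dwell times in minutes (observations arrive every two minutes) can be sketched as follows; applied over the MCMC draws of G, it also yields the simulation standard errors quoted below. Names and array layout are illustrative.

    import numpy as np

    def mean_dwell_minutes(G, interval=2.0):
        # Mean time spent in each state upon each return to it, for all
        # transition matrices G^h stacked in an (H, m, m) array: the dwell
        # time is geometric with mean 1/(1 - g_ii) observations, converted
        # to minutes via the two-minute sampling interval.
        diag = np.diagonal(G, axis1=1, axis2=2)    # g_ii for every h
        return interval / (1.0 - diag)             # (H, m) array of minutes

For skate A under the new moon, for instance, g^1_{1,1} = 0.967 gives 2/(1 βˆ’ 0.967) β‰ˆ 61 minutes, in line with the value of about 60 reported below.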
Explanatory variable L4 best explained the switching between the hidden states of skate A. The mean number of minutes spent in state 1 was about 60 (the non-overlapping standard error of the dwell time was 0.049, its overlapping standard error 0.071), 58 (0.054, 0.098), 89 (0.074, 0.096), and 85 (0.081, 0.091), whereas in state 2 it was about 21 (0.006, 0.014), 25 (0.010, 0.025), 27 (0.007, 0.011), and 26 (0.010, 0.014), when the lunar phase was new moon, first quarter, full moon, and last quarter, respectively. Explanatory variable L4 also best explained the switching between the hidden states of skate B: the mean number of minutes spent in state 1 was about 42 (0.031, 0.070), 51 (0.014, 0.055), 36 (0.041, 0.085), and 48 (0.016, 0.059), whereas in state 2 it was about 33 (0.023, 0.060), 36 (0.017, 0.081), 48 (0.032, 0.079), and 35 (0.012, 0.058), when the lunar phase was new moon, first quarter, full moon, and last quarter, respectively. The lunar covariates L4 and L2 were selected jointly as best explaining the switching between the hidden states of skate C. The eight categories were defined by the formula l4+l2*4. For each of the eight categories, the mean number of minutes spent in state 1 was about 34 (0.024, 0.056), 36 (0.033, 0.059), 27 (0.018, 0.045), 33 (0.023, 0.055), 34 (0.016, 0.056), 36 (0.020, 0.059), 26 (0.022, 0.044), and 30 (0.026, 0.051), whereas in state 2 it was about 66 (0.045, 0.051), 43 (0.043, 0.044), 47 (0.031, 0.057), 58 (0.032, 0.040), 44 (0.018, 0.024), 57 (0.026, 0.039), 58 (0.051, 0.064), and 48 (0.033, 0.049). All three explanatory variables were selected to explain the switching between the hidden states of skate D. The 12 categories were defined by the formula l4+l2*3+d*6. Note that no data were recorded when the moon was in its first quarter (i.e., L4=2). For each of the 12 categories, the mean number of minutes spent in state 1 was about 22 (0.022, 0.036), 12 (0.030, 0.044), 13 (0.056, 0.093), 8 (0.093, 0.164), 10 (0.022, 0.032), 8 (0.011, 0.014), 11 (0.048, 0.052), 40 (0.424, 0.439), 20 (0.086, 0.091), 22 (0.052, 0.055), 14 (0.052, 0.053), and 26 (0.139, 0.167), whereas in state 2 it was about 21 (0.021, 0.035), 11 (0.082, 0.135), 15 (0.008, 0.023), 8 (0.039, 0.061), 18 (0.023, 0.040), 9 (0.007, 0.010), 27 (0.108, 0.155), 33 (0.003, 0.005), 3 (0.017, 0.021), 8 (0.229, 0.366), 8 (0.014, 0.017), and 3 (0.002, 0.004). Evidently, all skates produced transition matrices that differ across the categories obtained by combining the covariates; this is made even clearer by the mean number of minutes spent in each state for each combination of the covariates. This result confirms that it is worth modelling the hidden Markov chain as a non-homogeneous process.
References
Chen, M.-H., Shao, Q.-M. & Ibrahim, J.G. (2000) Monte Carlo Methods in Bayesian Computation. Springer, New York.
Green, P.J. (2000) A primer on Markov chain Monte Carlo. In: Barndorff-Nielsen, O.E., Cox, D.R. & Klüppelberg, C. (Eds.), Complex Stochastic Systems. Chapman & Hall/CRC, Boca Raton, pp. 1–62.
Table S2 – Number of observations classified into the two states, overall and given the value assumed by each of the three covariates separately.

                      A          B          C          D
Whole sequence
  State 1          189,612     68,337     40,277      6,254
  State 2           64,501     59,019     67,610      5,147
Diel cycle
  State 1; D=0      81,916     41,870     21,816      3,325
  State 2; D=0      41,109     36,466     45,812      4,047
  State 1; D=1     107,696     26,467     18,461      2,929
  State 2; D=1      23,392     22,553     21,798      1,100
Lunar phase
  State 1; L4=1     47,292     18,331     10,688        990
  State 2; L4=1     16,082     14,575     17,277        834
  State 1; L4=2     44,244     19,642     11,748         NA
  State 2; L4=2     18,800     13,818     17,199         NA
  State 1; L4=3     47,808     13,044      8,537      2,172
  State 2; L4=3     14,085     17,790     17,191      2,438
  State 1; L4=4     50,268     17,320      9,304      3,092
  State 2; L4=4     15,534     12,836     15,943      1,875
Lunar cycle
  State 1; L2=0     90,298     34,375     19,379      2,837
  State 2; L2=0     35,217     29,211     32,616      2,680
  State 1; L2=1     99,314     33,962     20,898      3,417
  State 2; L2=1     29,284     29,808     34,994      2,467
Figure S1 – The autocorrelation functions of the depth profiles of each individual (A, B, C,
D).
Figure S2 – The autocorrelation functions of the residuals of the MSARMs applied to the
depth profiles of each individual (A, B, C, D).
Figure S3 - Actual (black dots) and fitted (red solid line) values of the whole series, for individual A. Time axis units are numbers of two-minute
intervals.
Figure S4 - Actual (black dots) and fitted (red solid line) values of the whole series, for individual B. Time axis units are numbers of two-minute
intervals.
Figure S5 - Actual (black dots) and fitted (red solid line) values of the whole series, for individual C. Time axis units are numbers of two-minute
intervals.
Figure S6 - Actual (black dots) and fitted (red solid line) values of the whole series, for individual D. Time axis units are numbers of two-minute
intervals.
Figure S7 - Observations associated with state 1 (top) and 2 (bottom), for individual A. Time axis units are numbers of two-minute intervals.
Figure S8 - Observations associated with state 1 (top) and 2 (bottom), for individual B. Time axis units are numbers of two-minute intervals.
Figure S9 - Observations associated with state 1 (top) and 2 (bottom), for individual C. Time axis units are numbers of two-minute intervals.
Figure S10 - Observations associated with state 1 (top) and 2 (bottom), for individual D. Time axis units are numbers of two-minute intervals.
Comparison of Markov switching autoregressive models (MSARMs) with hidden Markov models (HMMs)
Normal HMMs form a special case of MSARMs, obtained by setting the autoregressive order to zero. Both HMMs and MSARMs can handle non-linear, non-Normal, non-stationary time series with a long memory, but HMMs have some limitations due to their simpler structure (no autoregressive component). HMMs can be used for classification purposes only, whereas MSARMs can also produce a fit of the data.
On the computational side, HMMs are not easier to implement than MSARMs. If we can simulate from an HMM by Gibbs sampling, it is straightforward to implement the Gibbs sampler for MSARMs: one further parameter, the matrix of the autoregressive coefficients, must be generated, so we just need to add a full conditional (a multivariate Normal) for the autoregressive coefficients.
As we said in the paper, HMMs are used frequently in ecology, but we are not aware of any applications of MSARMs. This is quite surprising, because with a modest extra effort in coding we can obtain a precise fit of the actual data. This cannot be obtained by HMMs, as shown in Figures S11, S12, and S13 of this auxiliary material. In fact, we re-analysed one of our four series (skate B, six months long) with two HMMs, one with two hidden states and the other with six. Because Normal HMMs are signal-plus-noise models, the fitted values lie close to the state-dependent means, so they could not capture all the spikes in the series, even when the number of hidden states is very large.
As reported in the paper, model performance was assessed by the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). When the MSARM was applied to the depth profile of skate B, the RMSE was 2.683 (1.2% of the data range) and the MAE was 1.296 (0.6%). The HMM with two hidden states resulted in an RMSE of 24.21 (10.7%) and an MAE of 20.38 (9.1%), whereas the one with six hidden states gave an RMSE of 8.33 (3.7%) and an MAE of 6.72 (3.0%).
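For reference, a small sketch of how these two criteria can be computed from actual and fitted values (illustrative names; the data range is used to express the errors as percentages):

    import numpy as np

    def fit_errors(y, y_fit):
        # RMSE and MAE of the fitted values, also expressed as a share of
        # the observed data range.
        err = np.asarray(y) - np.asarray(y_fit)
        rmse = np.sqrt(np.mean(err ** 2))
        mae = np.mean(np.abs(err))
        span = np.max(y) - np.min(y)
        return rmse, mae, rmse / span, mae / span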
Further, the observations were classified into states according to the values of the state-dependent means (Figures S14 and S15). The states therefore represent different mean levels of depth at which the skate is found, rather than proper behaviours such as those obtained by MSARMs, where the observations were grouped into states according to their variability.
Finally, we produced plots of the ACF of the residuals of the HMMs (Figure S16); it is evident that the residuals are not uncorrelated, showing a slow decay at the higher lags.
We hope our comments, along with the results and Figures S11-S16, show the improvements obtained when MSARMs are applied.
Figure S11 - Actual (black dots) and fitted (red solid line) values of the whole series, for individual B, when the hidden Markov model with two
hidden states (i.e., MSARM with m=2 and p=0) is applied. Time axis units are numbers of two-minute intervals.
Figure S12 - Actual (black dots) and fitted (red solid line) values of the whole series, for individual B, when the hidden Markov model with six
hidden states (i.e., MSARM with m=6 and p=0) is applied. Time axis units are numbers of two-minute intervals.
Figure S13 - Actual (dots) and fitted (solid line) values of subseries of 500 points, for
individual B, when the hidden Markov model with six hidden states (i.e., MSARM with m=6
and p=0) is applied. Time axis units are numbers of two-minute intervals.
[Figure consists of four panels: Skate B - subseries[1001:1500], subseries[35001:35500], subseries[60501:61000], and subseries[106501:107000]; each panel plots Meters (0 to -250) against Time (0 to 500).]
Figure S14 - Observations of individual B associated with the different states, when the
hidden Markov model with two hidden states (i.e., MSARM with m=2 and p=0) is applied.
Time axis units are numbers of two-minute intervals.
[Figure consists of two panels, Observations in state 1 and Observations in state 2, each plotting Meters (0 to -250) against Time (0 to 107,891).]
Figure S15 – Observations of individual B associated with the different states, when the
hidden Markov model with six hidden states (i.e., MSARM with m=6 and p=0) is applied.
Time axis units are numbers of two-minute intervals.
[Figure consists of six panels, Observations in states 1 to 6, each plotting Meters (0 to -250) against Time.]
Figure S16 – The autocorrelation functions of the residuals of hidden Markov model with
two (left) and six (right) hidden states (i.e., MSARM with m=2,6 and p=0) applied to the
depth profile of individual B.
[Figure consists of two panels, "two hidden states" (left) and "six hidden states" (right), each plotting the ACF (0.0 to 1.0) against Lag (0 to 50).]