IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, XX XXXX

APPENDIX A
PROOF OF LEMMA 6

Proof: We only prove the first equation. Let $D = \{t : x(t) \geq 0\}$ be the set of timestamps at which observations are available. Then, it is easy to see that $|S_i^+| = \sum_{t \in S_i \cap D} x_0(t)$. Now, let us partition $S_i$ into

\[
S_{i,j} = \{i + jT,\; i + (j + T_0)T,\; i + (j + 2T_0)T,\; \ldots\},
\]

where $j \in \{0, \ldots, T_0 - 1\}$. Since each subsequence $\{x_0(t) : t \in S_{i,j}\}$ is the realization of a single mixing process, we have with probability 1 that

\[
\lim_{n\to\infty} \frac{|S_i^+|}{n}
= \lim_{n\to\infty} \frac{\sum_{j=0}^{T_0-1} \sum_{t \in S_{i,j} \cap D} x_0(t)}{|S_i \cap D|} \cdot \lim_{n\to\infty} \frac{|S_i \cap D|}{n}
= \frac{\rho_f \cdot \sum_{j=0}^{T_0-1} p^{T_0}_{F_{T_0}(i+j\times T)}}{T_0 T},
\]

where we use $\lim_{n\to\infty} |S_i \cap D| / n = \rho_f / T$ for the last equality. Also, since the random process can be decomposed into $T_0$ mixing processes, we have with probability 1 that $\lim_{n\to\infty} |S^+|/n = \frac{\rho_f}{T_0} \sum_{k=0}^{T_0-1} p_k^{T_0}$. Therefore,

\[
\lim_{n\to\infty} \mu_X^+(I, T)
= \lim_{n\to\infty} \frac{|S_I^+|/n}{|S^+|/n}
= \lim_{n\to\infty} \frac{\sum_{i \in I} |S_i^+|/n}{|S^+|/n}
= \sum_{i \in I} \frac{1}{T} \sum_{j=0}^{T_0-1} \frac{p^{T_0}_{F_{T_0}(i+j\times T)}}{\sum_{k=0}^{T_0-1} p_k^{T_0}}.
\]

APPENDIX B
PROOF OF THEOREM 3

Proof: Define $c_i^+ = \frac{p_i^{T_0}}{\sum_{k=0}^{T_0-1} p_k^{T_0}} - \frac{1}{T_0}$. It is easy to see that the value $\lim_{n\to\infty} \gamma_X^+(T_0)$ is achieved by $I^* = \{i \in [0, T_0 - 1] : c_i^+ > 0\}$. So it suffices to show that for any $T \in \mathbb{Z}$ and $I \in \mathcal{I}_T$,

\[
\lim_{n\to\infty} \Delta_X^+(I, T) \leq \lim_{n\to\infty} \Delta_X^+(I^*, T_0) = \sum_{i \in I^*} c_i^+.
\]

Meanwhile, from Lemma 6, we have

\begin{align*}
\lim_{n\to\infty} \Delta_X^+(I, T)
&= \frac{1}{T} \sum_{i \in I} \left( \sum_{j=0}^{T_0-1} \frac{p^{T_0}_{F_{T_0}(i+j\times T)}}{\sum_{k=0}^{T_0-1} p_k^{T_0}} - 1 \right) \\
&= \frac{1}{T} \sum_{i \in I} \sum_{j=0}^{T_0-1} \left( \frac{p^{T_0}_{F_{T_0}(i+j\times T)}}{\sum_{k=0}^{T_0-1} p_k^{T_0}} - \frac{1}{T_0} \right) \\
&= \frac{1}{T} \sum_{i \in I} \sum_{j=0}^{T_0-1} c^+_{F_{T_0}(i+j\times T)} \\
&\leq \frac{1}{T} \sum_{i \in I} \sum_{j=0}^{T_0-1} \max\left(c^+_{F_{T_0}(i+j\times T)}, 0\right) \\
&\leq \frac{1}{T} \sum_{j=0}^{T_0 T-1} \max\left(c^+_{F_{T_0}(j)}, 0\right) \\
&= \frac{1}{T} \times T \sum_{i \in I^*} c_i^+ = \sum_{i \in I^*} c_i^+,
\end{align*}

where the fourth equality uses the definition of $I^*$. So the proof is complete.

APPENDIX C
DISCUSSION ON THE PERFORMANCE OF FOURIER TRANSFORM

Fourier transform has long been regarded as one of the standard tools for periodicity analysis. Therefore, some readers may find it rather surprising that it actually performs much worse than our method, especially with randomly generated periodic behaviors. Also, as shown in Figure 12(a) and 12(b), Fourier transform is not as robust as our method with respect to missing observations. To provide further insight into these important issues, we next give a brief review of Fourier transform.

The normalized discrete Fourier transform (DFT) of a sequence $\{x(t)\}_{t=0}^{n-1}$ is a sequence of complex numbers $X(f)$:

\[
X(f_{k/n}) = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} x(t)\, e^{-\frac{j 2\pi k t}{n}}, \quad k = 0, \ldots, n-1, \tag{18}
\]

where the subscript $k/n$ denotes the frequency (normalized to $[0, 1]$) that each coefficient captures. Since we are dealing with real signals, the Fourier coefficients are (conjugate) symmetric around 0.5. Most importantly, the Fourier transform aims to represent the original signal as a linear combination of the complex sinusoids, given by the inverse Fourier transform:

\[
x(t) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} X(f_{k/n})\, e^{\frac{j 2\pi k t}{n}}, \quad t = 0, \ldots, n-1. \tag{19}
\]

To discover potential periodicities in the input sequence $X$, one needs to examine its power spectrum. Mathematically, this is given by the periodogram $P$, whose values are the squared lengths of the Fourier coefficients:

\[
P(f_{k/n}) = \|X(f_{k/n})\|^2, \quad k = 0, 1, \ldots, \left\lceil \frac{n-1}{2} \right\rceil. \tag{20}
\]

Then, the dominant period of $X$ is obtained by assuming that it corresponds to the frequency at which the periodogram achieves its highest value.

In Figure 18, we show the periodograms of three synthetic sequences, all generated with $T = 24$, $TN = 1000$, $\eta = 1$, $\alpha = 1$ and $\beta = 0$. As one can see, when the periodic behavior is regular (Figure 18, first row), the dominant frequency does correspond to the actual period, suggesting that the time series can be well approximated by a sinusoid with period 24. However, this is not true when the periodic behavior is highly irregular (Figure 18, third row), in which case the periodogram is dominated by higher frequencies. This explains why Fourier transform performs poorly with randomly generated periodic behaviors. In contrast, our method does not make any assumption on the periodic behaviors, and is guaranteed to work with any sequence as long as it is generated by some periodically mixing process.

[Figure 18: three rows of panels, one row per synthetic sequence. (a) x(1 : T), plotted over timestamps 0 to 24 with values between −1 and 1; (b) Periodogram, plotted over normalized frequencies 0 to 0.5, each panel labeled T=24.]
Fig. 18. Sequences with the same period (T = 24) may have very different power spectra. The dominant frequency does not necessarily indicate the true period.
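The periodogram-based procedure of Eqs. (18) and (20) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`normalized_dft`, `periodogram`, `dominant_period`) and the two toy binary sequences are hypothetical and are not the synthetic generator used for Figure 18. The zero-frequency term is skipped when searching for the peak, since it only reflects the mean of the sequence.

```python
import cmath
import math


def normalized_dft(x):
    """Normalized DFT as in Eq. (18): (1/sqrt(n)) * sum_t x(t) * exp(-j*2*pi*k*t/n)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def periodogram(x):
    """Squared magnitudes of the coefficients for k = 0..ceil((n-1)/2), as in Eq. (20)."""
    n = len(x)
    coeffs = normalized_dft(x)
    k_max = math.ceil((n - 1) / 2)
    return [abs(coeffs[k]) ** 2 for k in range(k_max + 1)]


def dominant_period(x):
    """Period n/k of the largest periodogram value, ignoring k = 0 (the mean)."""
    n = len(x)
    p = periodogram(x)
    k_best = max(range(1, len(p)), key=lambda k: p[k])
    return n / k_best


# Toy sequences (assumptions for illustration), 10 repetitions of a 24-step period.
# Regular behavior: events during the first 12 timestamps of every period.
regular = [1.0 if t % 24 < 12 else 0.0 for t in range(240)]
# Irregular behavior: events at timestamps 0, 1, 8 and 16 of every period.
irregular = [1.0 if t % 24 in (0, 1, 8, 16) else 0.0 for t in range(240)]

print(dominant_period(regular))    # 24.0
print(dominant_period(irregular))  # 8.0
```

In the second toy sequence the true period is still 24 (the event at timestamp 1 breaks the period-8 pattern), yet the periodogram peaks at the third harmonic, so the reported dominant period is 8. This mirrors, on a small hand-made example, the observation that the dominant frequency need not indicate the true period.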
In addition, in Figure 19 we show the periodograms of two sequences, which are generated by sampling the sequence shown in the second row of Figure 18 at sampling rates η = 0.1 and η = 0.01, respectively. As one can see, the dominant frequency of the periodogram no longer corresponds to the true period in these cases. This example illustrates that Fourier transform may be sensitive to missing observations.
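The sampling experiment can be imitated with the sketch below, which keeps each observation independently with probability η and then looks for the periodogram peak. Several points are assumptions rather than details taken from the text: the helper names (`subsample`, `dominant_period`), the toy input sequence, and in particular the choice to fill unobserved timestamps with 0 before applying the transform. No particular output is claimed for η < 1, since it depends on the random draw.

```python
import cmath
import math
import random


def subsample(x, eta, rng, fill=0.0):
    """Keep each observation with probability eta; replace the others by `fill`.

    Filling missing timestamps with 0 is an assumption made for this sketch.
    """
    return [v if rng.random() < eta else fill for v in x]


def dominant_period(x):
    """Period n/k at which the periodogram (squared normalized DFT magnitude) peaks."""
    n = len(x)
    k_max = math.ceil((n - 1) / 2)

    def power(k):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        return abs(coeff) ** 2 / n

    # k = 0 is skipped because it only captures the mean of the sequence.
    k_best = max(range(1, k_max + 1), key=power)
    return n / k_best


# Toy sequence (assumption): 20 repetitions of a regular 24-step period.
x = [1.0 if t % 24 < 12 else 0.0 for t in range(24 * 20)]

rng = random.Random(0)  # fixed seed so that repeated runs give the same draw
for eta in (1.0, 0.1, 0.01):
    sampled = subsample(x, eta, rng)
    kept = sum(1 for v in sampled if v != 0.0)
    print(f"eta={eta}: {kept} events kept, dominant period = {dominant_period(sampled)}")
```

With η = 1 the sequence is unchanged and the peak sits at period 24. For small η only a handful of events survive, so the peak location is driven largely by which timestamps happen to be retained.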
[Figure 19: two periodogram panels plotted over normalized frequencies 0 to 0.5, each labeled T=24. (a) Periodogram, η = 0.1; (b) Periodogram, η = 0.01.]
Fig. 19. Effect of missing observations on FFT.