manual - EpiSouth

Introduction to Time Series Analysis
Madrid, Spain
10-14 September 2007
Case study: Exploring sine curves and Fourier transform
Objectives: at the end of the case-study, the participant should
• Understand and manipulate sine curves
• Manipulate Fourier Transform and build a periodogramme
Denis Coulombier, Fernando Simón, Bruno Coignard
Sine curves and Fourier transform case-study
Presentation
This case study includes 4 parts. Each part requires the participant to perform actions in the spreadsheet.
• Part 1: definition and principle of trigonometry
• Part 2: decomposition of a signal
• Part 3: manual calculation of a single Fourier coefficient
• Part 4: computation of all Fourier coefficients using Excel
Programs needed on the computer:
• Microsoft Excel
Example file used by the case study:
• SINE.XLS
Text style used in the case-study
Commands to type in the computer. The text between ' and ' is the text you
actually need to type
Additional information about the programs
Reference to cells in Excel uses the following syntax:
A1 refers to the first cell of the first row.
When a formula is copied over a range, it is important to keep in mind that Excel considers cells entered in this format
as relative references. If the formula:
=A1 + B1
is copied over a range, the output cells will have their references incremented:
=A2 + B2
=A3 + B3
and so on.
Using $ sign in front of one of the coordinates makes it an absolute reference. If the formula:
=$A$1 + $B1
is copied over, it gives:
=$A$1 + $B2
=$A$1 + $B3
A cell can be named rather than expressed as coordinates. To name a cell, place the cursor in the cell, click on the left
cell of the line immediately above the header line and enter the desired name. Subsequently, you can use this name to
refer to this cell. This allows for more meaningful names of cells in formulas. In the same way, a range can be named.
Select the range to name (it appears against a black background) and indicate the appropriate name in the left cell above
the top row. In this case-study, most cells have been renamed.
23/11/2007, 12:10 PM
All formulas and labels have been already entered in the spreadsheet. Results indicated when loading the spreadsheet
may be erroneous at this stage because some of the referenced cells are empty. Follow instructions to fill these cells in
order to get proper values.
The protection of the spreadsheet is activated in order to avoid erasing a cell. However the protection of cells in which
you need to enter, change the value, or copy cells has been disabled.
This case-study uses Excel version 5 or later. Excel uses different names for formulas in the various European
languages. This case-study uses English denomination. If you are using non-English version of Excel, you should use
the following table to adapt formulas, after having removed the workbook protection.
Lexicon of formulas in European languages
Formula             French              English    German        Italian
Pi                  PI()                PI()       PI()          PI.GRECO()
Sum of a range      SOMME()             SUM()      SUMME()       SOMMA()
Complex modulus     COMPLEXE.MODULUS()  IMABS()    IMABS()       COMP.MODULO()
Variance            VAR()               VAR()      VARIANZ()     VAR()
Covariance          COVARIANCE()        COVAR()    KOVAR()       COVARIANZA()
Standard deviation  STD()               STDEV()    MITELABW()    DEV.ST()
Mean                MOYENNE()           AVERAGE()  MITTELWERT()  MEDIA()
This case-study uses some Excel add-ins. You need to check that these add-ins have been activated before proceeding.
In the ‘Tools’ menu of the main menu bar, you should see options for ‘Solver’ and ‘Analysis tool pack’. If not,
click on ‘Add-ins’, and activate the ‘Solver’ and the ‘Analysis tool pack’.
If you do not see the analysis tool pack option, it means that you did not carry out a complete installation of Excel.
When a complete installation is carried out, you should see a ‘Analysis’ directory under ‘EXCEL\LIBRARY’ on your
hard disk. If not, reinstall Excel.
Load the SINE.XLS spreadsheet in Excel
Intro: setting-up the spreadsheet
Adjust screen display button
The button “Adjust screen display” will optimise the layout of the various spreadsheets for the resolution of your
screen.
Click on the “Adjust screen display” button
Reset spreadsheet
The “Reset spreadsheet” button copies default values into all the necessary fields. It erases all activities previously carried
out by the participant. This button is only active when the protection has been removed. It is not necessary to
execute this procedure upon loading the spreadsheet for the first time.
Remove protection/Protect document
This button removes the protection of the workbook, and allows you to change formulas and values in all the fields.
This should be done with caution since the spreadsheet may not work properly if content of cells are altered. It is
advised to make a copy of the spreadsheet if you intend to modify its content.
Part 1: definitions and principles
Activate “Cosine” spreadsheet.
Introduction
Spectrum analysis is concerned with the exploration of cyclical patterns of data. The purpose of the analysis is to
decompose a complex time series with cyclical components into a few underlying sinusoidal (sine and cosine) functions
of particular wavelengths. The term "spectrum" provides an appropriate metaphor for the nature of this analysis:
Suppose you study a beam of white sun light, which at first looks like a random (white noise) accumulation of light of
different wavelengths. However, when put through a prism, we can separate the different wavelengths or cyclical
components that make up white sun light. In fact, via this technique we can now identify and distinguish between
different sources of light. Thus, by identifying the important underlying cyclical components, we have learned
something about the phenomenon of interest. In essence, performing spectrum analysis on a time series is like putting
the series through a prism in order to identify the wave lengths and importance of underlying cyclical components. As
a result of a successful analysis one might uncover just a few recurring cycles of different lengths in the time series of
interest, which at first looked more or less like random noise.
With increasing computer power, spectrum analysis has become one of the most widely used mathematical tools. Powerful
tools to analyse signals are crucial in telecommunication, image processing, weather forecasting, and many other fields.
The analysis of the number of cases of a disease over time can benefit greatly from these techniques.
Definition
Wavelength or Period is the number of time units in a sinusoidal cycle between 2 crests.
Frequency: term used to denote the number of times that any regularly recurring phenomenon occurs in a given time
interval. In wave motion of all kinds, the frequency of the wave is usually given in terms of the number of wave crests
that pass a given point in a given period, usually seconds. In our example we will express it over the entire study period
of 256 weeks.
Trigonometry
The development of trigonometry was triggered by the need to perform angular calculations for astronomical
navigation and measurement of the height of buildings.
On a circle centred on 0 having a radius of 1, an angle θ determines several ratios:
Cosine is one of the basic trigonometric ratios. In the right triangle BOC, the
cosine of θ, written Cos θ, is defined as the ratio of the side adjacent to θ over
the hypotenuse (which is 1, being equal to the radius). It corresponds to the
orthogonal projection of C on the horizontal axis OB.
(Figure: unit circle centred on O, with point C on the circle, angle θ at O, B on the horizontal "Cosinus" axis and A on the vertical "Sinus" axis.)
Sine is another of the basic trigonometric ratios. In the right triangle BOC, the
sine of θ, written Sin θ, is defined as the ratio of the side of the triangle
opposite to θ over the hypotenuse (which is 1, being equal to the radius). It
corresponds to the orthogonal projection of C on the vertical axis OA.
When C goes around the circle, the sine varies from 0 to +1 (for a 90º angle),
back to 0 (180º), decreases to -1 (270º) and returns to 0 (360º or 0º). Over the same turn, the cosine takes successively the values +1, 0, -1, 0 and +1.
We will use these cyclical values of the sine and cosine to model cyclical time series.
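The cyclical values described above can be checked outside Excel. The following is a minimal Python sketch (Python is not part of the case study; the variable names are illustrative) that evaluates the sine and cosine at each quarter turn of the circle.

```python
import math

# Angles at each quarter turn of the trigonometric circle, in degrees.
angles_deg = [0, 90, 180, 270, 360]

# Round away floating-point dust; adding 0.0 turns a possible -0.0 into 0.0.
values = [
    (d,
     round(math.sin(math.radians(d)), 10) + 0.0,
     round(math.cos(math.radians(d)), 10) + 0.0)
    for d in angles_deg
]

for d, s, c in values:
    print(f"{d:3d} deg   sin = {s:+.1f}   cos = {c:+.1f}")
```

The sine column follows the sequence 0, +1, 0, -1, 0 and the cosine column the sequence +1, 0, -1, 0, +1.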
Spectral distribution function
The natural function to model a periodic component in a time series is:
Yt = R cos(ωt + θ)
Where:
ω is the frequency of the periodic variation = 2π/period
R is the amplitude of the variation
θ is the phase
Figure 1.1. Cosine curve to represent one periodic component in a time series
(Chart: Cosine Curve: Y = R cos(ωt + θ), annotated with Period = 2π/ω, Phase θ and Amplitude R; vertical axis Y from -1 to 1; horizontal axis t in weeks, with ticks at 0, 64 and 128.)
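As an illustration of the model Yt = R cos(ωt + θ), here is a small Python sketch that generates such a curve. The function name and the parameter values (amplitude 1, period 64 weeks, phase 0) are assumptions chosen for the example, not values taken from the spreadsheet.

```python
import math

def cosine_curve(t, amplitude, period, phase):
    """Return R * cos(omega * t + theta), with omega = 2*pi / period."""
    omega = 2 * math.pi / period
    return amplitude * math.cos(omega * t + phase)

# Illustrative parameters: one crest every 64 weeks, observed for 128 weeks.
series = [cosine_curve(t, amplitude=1.0, period=64, phase=0.0) for t in range(129)]

for t in (0, 16, 32, 48, 64):
    print(f"t = {t:3d}   Y = {series[t]:+.3f}")
```

The curve starts on a crest at t = 0, reaches its trough half a period later (t = 32) and returns to a crest after one full period (t = 64).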
In 1807, Fourier showed that any periodic signal can be decomposed into a series of sinusoidal curves such as the one above,
one for each possible frequency in the signal. The signal in figure 1.2 is the sum of the 4 sinusoidal curves in
figure 1.3 plus a constant term.
Fourier, Jean Baptiste Joseph, Baron (1768-1830), French mathematician, born in Auxerre, and educated at the
monastery of Saint-Benoît-sur-Loire. He taught (1795) at the École Normale, where he had been a student, and at the
École Polytechnique in Paris from 1795 to 1798 when he joined the campaign of Napoleon I in Egypt. After returning
to France in 1802 he published important material on Egyptian antiquities and was, until 1815, prefect of Isère
Department. He was created baron by Napoleon in 1808. In 1816 he was elected to the Academy of Sciences and in
1827 to the French Academy. His fame rests on his work in mathematics and mathematical physics. In his treatise The
Analytical Theory of Heat (1822; trans. 1878), he employed a trigonometric series, usually called the Fourier series, by
means of which discontinuous functions can be expressed as the sum of an infinite series of sines and cosines.
"Fourier, Jean Baptiste Joseph, Baron," Microsoft (R) Encarta. Copyright (c) 1994 Microsoft Corporation. Copyright
(c) 1994 Funk & Wagnall's Corporation.
Example
Discrete time series, aggregated over constant periods of time (days, week, month …).
Figure 1.2. Random time series or structured data?
(Chart: "Surveillance signal ?"; vertical axis from 0.00 to 8.00; horizontal axis with ticks at 0, 52 and 104.)
Figure 1.3. Sinusoidal decomposition of figure 1.2
Sum of 4 sinusoidal curves
In 128 data points we can fit 64 sinusoidal curves. The first sine curve (curve 1 of figure 1.4) has one oscillation over
the 128 weeks: its period is 128 weeks, its frequency is 1, and it crosses the axis at 64 weeks. The second one has 2
oscillations: its period is 64 weeks, and its frequency is 2. More generally, there are n/2 frequencies in a
signal of n data points, thus there are n/2 sinusoidal curves in a Fourier series.
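The relation between the number of data points, the number of oscillations and the period can be tabulated with a few lines of Python. This is only a sketch; the function name `fitting_curves` is hypothetical.

```python
def fitting_curves(n_points):
    """List (number of oscillations, period) for the n/2 curves fitting n points."""
    return [(k, n_points / k) for k in range(1, n_points // 2 + 1)]

curves = fitting_curves(128)

print(len(curves), "sinusoidal curves")
for oscillations, period in curves[:3]:
    print(f"{oscillations} oscillation(s): period = {period:g} weeks")
```

For 128 data points the list holds 64 curves, from 1 oscillation (period 128) down to 64 oscillations (period 2).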
Figure 1.4. Sinusoidal curves fitting a time series
(Chart: First 3 sinusoidal curves fitting a 128 data point time series, labelled by number of oscillations 1, 2 and 3; horizontal axis from 0 to 128, with ticks every 32.)
For each frequency, the curve has a certain amplitude and phase which can be calculated (figure 1.1).
The Fourier transform
Definition
The Fourier transform converts a signal expressed as a function of time into a function of frequency. The world of time
and the world of frequency are 2 completely separate worlds, both sides of a mirror, but in each world the signal can be
expressed completely, and the Fourier transform allows switching between worlds without losing information. For each
frequency, the Fourier transform returns 2 Fourier coefficients that enable us to get the corresponding amplitude and phase.
Since:
cos(a + b) = cos(a) cos(b) - sin(a) sin(b),
the expression
Yt = R cos(ωt + θ)
where R is the amplitude and θ the phase, can also be written in the form
Yt = R (cos ωt cos θ - sin ωt sin θ)
By grouping:
a = R cos θ
and
b = -R sin θ
the expression becomes
Yt = a cos ωt + b sin ωt
(Figure: trigonometric circle showing the radius R at angle θ.)
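The equivalence of the two forms can be verified numerically. The Python sketch below uses arbitrary illustrative values for R, θ and the period (they are assumptions, not spreadsheet values), compares both expressions over 256 time points, and then recovers the amplitude and phase from a and b.

```python
import math

# Illustrative values only.
R, theta, period = 3.0, 0.7, 52
omega = 2 * math.pi / period

# Coefficients obtained by grouping, as in the derivation.
a = R * math.cos(theta)
b = -R * math.sin(theta)

# Largest difference between R*cos(omega*t + theta) and a*cos(omega*t) + b*sin(omega*t).
max_gap = max(
    abs(R * math.cos(omega * t + theta)
        - (a * math.cos(omega * t) + b * math.sin(omega * t)))
    for t in range(256)
)

# Going back from (a, b) to amplitude and phase.
R_back = math.hypot(a, b)        # sqrt(a^2 + b^2)
theta_back = math.atan2(-b, a)   # because b = -R sin(theta)

print(max_gap, R_back, theta_back)
```

The gap is at the level of floating-point rounding, and the amplitude and phase are recovered from a and b, which is why the pair (a, b) carries the same information as (R, θ).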
Expression as complex numbers
In this form, a and b hold the information about the phase and the amplitude. It is in this form that the Fourier transform
outputs its results for each frequency. The output is usually expressed as a complex number, which carries both pieces of
information in a single notation. Complex numbers are of the form a + b*i, where a is called the real component of the
complex number, b the imaginary component, and i is the imaginary unit, equal to the square root of -1.
Complex Number, in mathematics, is a number of the form a + bi, in which a and b are real numbers and
i = √(−1), that is, a root of the equation x² + 1 = 0.
The product of a real number multiplied by itself is 0 or positive, so the equation x² = -1 has no solutions in the real
number system. If such a solution is desired, new numbers must be invented. Let i be a new number representing a
solution of the preceding equation. All numbers of the form a + bi, in which a and b are real numbers, belong to the
complex number system. If b is not 0, the complex number is called an imaginary number; if b is not 0 but a is 0, the
complex number is called a pure imaginary number; if b is 0, the complex number is a real number. Imaginary
numbers (the term must not be used in a literal sense but in the technical sense just described) are extremely useful in
the theory of alternating currents and many other branches of physics and natural science.
In fact, as strange as it appears, complex numbers were invented to simplify calculations!
From each of the Fourier coefficients, a sinusoidal curve can be constructed, of the form:
Yt = a cos ωt + b sin ωt
In an n data point signal, the Fourier transform returns n/2 distinct complex numbers of the form a + bi, one per frequency.
The addition of all sine curves reproduces the original signal without losing any information. Signal processing, such
as the filtering of sound waves or images, makes intensive use of these techniques to get rid of high frequency « noise » by omitting
the highest frequency sine curves when reconstructing the signal. The « Dolby » processing used to increase purity of
sound is derived from such techniques.
The following formula has been entered in cell B3:
=AMPLITUDE*COS(2*PI*(A3+LAG)/PERIOD)
This is the equation shown in figure 1.1, written as an Excel formula.
The following default values have been entered:
Amplitude in cell E2 = 3
Period in cell E3 = 52
Phase (or Lag) in cell E4 = 0
You can now change these values using the spin buttons next to each field and
explore how it affects the display of the cosine curve.
Set the Phase to 13
The circle on the right exemplifies the link between a sine curve and the
trigonometric circle.
(Circle labels: 13 weeks : 0.5 pi; 0.12 radians; 38.88 weeks : 1.49 pi.)
The yellow area represents the phase. A phase of 13 weeks is ¼ of the 52
week period, which is why it occupies ¼ of the circle.
The red area represents ω, the angle in radians covered by one time unit:
2*PI/52, or about 0.12 radians, for a 52 week periodicity.
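The formula in cell B3 and the two angles shown on the circle can be reproduced with a short Python sketch. The function name `cell_b3` is hypothetical; its defaults mirror the default values entered in cells E2 to E4.

```python
import math

def cell_b3(time, amplitude=3.0, period=52.0, lag=0.0):
    """Python equivalent of =AMPLITUDE*COS(2*PI*(A3+LAG)/PERIOD)."""
    return amplitude * math.cos(2 * math.pi * (time + lag) / period)

# Angle covered by one week for a 52 week period (the red area).
omega = 2 * math.pi / 52

# A phase of 13 weeks expressed as a multiple of pi (the yellow area).
phase_angle_in_pi = 2 * 13 / 52

print(f"omega = {omega:.2f} radians per week")
print(f"phase of 13 weeks = {phase_angle_in_pi} pi")
print(f"value at time 0 with lag 0:  {cell_b3(0):.3f}")
print(f"value at time 0 with lag 13: {cell_b3(0, lag=13):.3f}")
```

With a lag of 13 weeks the curve is shifted by a quarter of a turn, so it crosses zero where the unshifted curve had its crest.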
Part 2: Decomposition of a signal in its sine curves
Activate “Decomposition” spreadsheet
This sheet explores the decomposition of a series into its sine curves. There are 8 data segments plotted on the top
graph. 4 sine curves can fit through these 8 data points. They have respectively:
o 1 oscillation, period = 8 (8/1)
o 2 oscillations, period = 4 (8/2)
o 3 oscillations, period = 2.67 (8/3)
o 4 oscillations, period = 2 (8/4)
The second graph represents the sum of the 4 sine curves plus a constant, which is represented by sine curve 5, having a period of 1.
Modify the amplitude and the phase of the 5 sine curves to see how it affects the display.
For example, enter the following values:
            Sine 1   Sine 2   Sine 3   Sine 4   Sine 5
Points      8.000
Frequency   1.000    2.000    3.000    4.000    8.000
Period      8.000    4.000    2.667    2.000    1.000
Amplitude   0.050    0.050    0.050    0.100    0.100
Phase       0.100    1.800    -1.300   -0.100   0.800
They correspond to the following sine curves:
(Chart: the individual sine curves; vertical axis from -0.1 to 0.1; horizontal axis from 0 to 8.)
The sum of the sine curves is represented by the following graph:
(Chart: Sum (Sine); vertical axis from 0.0 to 0.3; horizontal axis from 0 to 8.)
Its equation is given by:
0.05*Cos(2PI*(TIME-0.1)/8.0) + 0.05*Cos(2PI*(TIME-1.8)/4) +
0.05*Cos(2PI*(TIME+1.3)/2.667) + 0.1*Cos(2PI*(TIME+0.1)/2) + 0.095
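The equation above can be evaluated outside the spreadsheet. The following Python sketch transcribes it term by term (amplitude, period and phase of each of the 4 sine curves, plus the constant 0.095) and computes the sum at the 9 time points 0 to 8. The variable and function names are illustrative.

```python
import math

# (amplitude, period, phase) of the 4 sine curves, read from the equation:
# a term Cos(2PI*(TIME+1.3)/2.667) corresponds to a phase of -1.3.
components = [
    (0.05, 8.0, 0.1),
    (0.05, 4.0, 1.8),
    (0.05, 2.667, -1.3),
    (0.10, 2.0, -0.1),
]
constant = 0.095

def summed_signal(time):
    """Sum of the 4 cosine terms plus the constant term."""
    return constant + sum(
        amplitude * math.cos(2 * math.pi * (time - phase) / period)
        for amplitude, period, phase in components
    )

values = [summed_signal(t) for t in range(9)]
for t, y in enumerate(values):
    print(f"TIME = {t}   sum = {y:.3f}")
```

Because every period divides the 8 data segments (up to the rounding of 8/3 to 2.667), the value at TIME = 8 practically repeats the value at TIME = 0, which is why the spreadsheet treats the signal as periodic.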
The test curve, at the bottom of the sheet, is an 8 data segment curve whose values are defined in cells B23 to B30.
B31 is a copy of the value in B23 since the signal is considered periodic.
Any curve going through these data points can be represented by the 4+1 sine curves. To test this,
Enter random values in the cells B23 to B30
in order to display a series to test.
To define the test curve, you can either enter the values manually in cells B23 to B30, or click twice on a data point on
the graph and drag it up or down to a new value.
Press the "Find test curve" button
This macro uses the solver to find the values of amplitude and phase of the 5 sine curves that best fit the test curve.
Alternatively, you can run the solver manually by indicating cell D22 as the target cell and cells C18 to G18 and C20
to G20 as the cells to change. Click on "Minimum" for the solver option.
Note that cells C17 to G17 contain formulas referring to C19 to G19, in order to get small increase of the amplitude
while using the spin button. Same remark applies to cells C19 to G19, referring to cells C20 to G20 for the phase.
Part 3: Manual calculation of a single Fourier coefficient in Excel
In part 3, we will calculate a Fourier coefficient for the past 5 years of food-borne disease cases in France using Microsoft Excel.
Load the file SINE.XLS in Excel.
Activate “Fourier” spreadsheet.
Here is the structure of the spreadsheet. Time intervals appear in cells $A$3 to $A$258, and corresponding number of
cases in cells B3 to B258.
     A     B     C      D         E   F           G
1                                     Parameters
2    Time  Data  Cos 0  Product0
3    1     14    0.99   13.898        Phase       0.00
4    2     13                         Period      52
5    3     20                         Frequency   4.92
6    4     9                          Area        13.90
7    5     4
Figure 2.1. Food-borne disease cases in France, 1992-1996
(Chart: Signal + cosine curve; vertical axis Cases from 0 to 30; horizontal axis Week, with ticks at 1, 53, 105, 157 and 209.)
To simplify calculation, the time series includes 256 data points. Fourier coefficient calculation is easier if the
number of data points in the time series is a power of 2. Calculation of a single Fourier coefficient for a given period requires 4 steps:
• Drawing a sinusoidal curve of the period to test,
• Multiplying the signal by the sine curve,
• Getting the integral (the area under the curve) of the resulting curve,
• Adjusting the phase to maximize this integral.
The resulting area represents the "magnitude" of the coefficient or the 'contribution' of this sine curve. It is called the
"energy" for this frequency or period. In other words, we will test all frequencies, evaluate their respective energy
and retain in the model only the most significant ones.
1. Draw a sinusoidal curve of a given period to test
The equation for a simple cosine curve is:
Y = cos ωt
The frequency ω can be further expressed as:
ω = 2π/Period
Expressing the frequency in terms of the period yields the generic formula for a sinusoidal curve:
Y = cos(2π t/Period)
In order to adjust for the starting point of the sine curve, we introduce the phase, expressed in weeks. The equation
becomes:
Y = cos(2π(t-Phase)/Period)
Visual review of figure 2.1 shows a strong 52 week seasonal variation. We start with the Fourier coefficient which
corresponds to 1 oscillation every year, that is, a period of 52 weeks. We define the period as a parameter in
the spreadsheet, so that we can modify it later.
The frequency is the number of oscillations over the entire study period. Cell G5
contains the formula =256/PERIOD0. PERIOD0 is the name given to cell G4. It
indicates 4.92, which is the number of data points, 256, divided by 52.
Cell C3 contains the formula =COS(2*PI*(TIME-LAG0)/PERIOD0). It represents the
equation of a simple cosine curve of period = PERIOD0.
Copy the formula in C3 to the range C4 to C258 in order to draw the sine curve
by clicking on the button.
In Excel, it is possible to rename a cell in order to refer to it by its name rather than by its coordinates. In this part, we
have named the cell G3 LAG0, the cell G4 PERIOD0, and the range A3:A258 TIME. PI is the name given to the cell
E6 of the first part sheet. It contains the formula =PI(), which is the Excel built-in function for Pi.
We use a cosine rather than a sine curve in order to have the sinusoidal curve starting on the crest of the wave at t=1.
Formulas in the spreadsheet:
     A     B     C                                D          E   F           G
1                                                                Parameters
2    Time  Data  Cos 0                            Product0
3    1     14    =COS(2*PI*(TIME-LAG0)/PERIOD0)   =B3*COS0       Lag0        0.00
4    2     13    =COS(2*PI*(TIME-LAG0)/PERIOD0)   =B4*COS0       Period0     52
5    3     20    =COS(2*PI*(TIME-LAG0)/PERIOD0)   =B5*COS0       Frequency   4.92
6    4     9     =COS(2*PI*(TIME-LAG0)/PERIOD0)   =B6*COS0       Area0       13.90
7    5     4     =COS(2*PI*(TIME-LAG0)/PERIOD0)   =B7*COS0
Values returned by the formulas:
     A     B     C      D         E   F           G
1                                     Parameters
2    Time  Data  Cos 0  Product0
3    1     14    0.99   13.898        Lag0        0.00
4    2     13    0.97   12.622        Period0     52
5    3     20    0.94   18.700        Frequency   4.92
6    4     9     0.89   7.969         Area0       476.2
7    5
2. Get the product of the signal by the sinusoidal curve
The formulas in column D return the product of the 2 curves:
The formula '=B3*COS0' appears in cell D3
COS0 is the name given to the range C3:C258. PRODUCT0 is the name given to the range D3:D258
3. Get the area under the product curve
The area under the curve is an estimate of the contribution of oscillations at this frequency to the signal.
The integral of the product curve is the area under it and can be approximated by summing the values of the entire series
in column D:
The formula '=SUM(PRODUCT0)' in cell G6 calculates the sum of PRODUCT0,
corresponding to the range D3 to D258
A value of 476.25 should be returned for the area
AREA0 corresponds to the content of the cell G6
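Steps 1 to 3 can be sketched in Python as follows. The surveillance data are not reproduced in this manual, so the sketch uses a synthetic signal (a baseline of 8 cases plus a 52 week seasonal wave); this signal, the function name `area` and its parameters are assumptions made for the illustration, and the areas obtained will differ from the 476.25 of the spreadsheet.

```python
import math

N = 256  # number of weekly data points, as in the "Fourier" sheet

# Hypothetical signal standing in for column B (not the real data).
signal = [8 + 5 * math.cos(2 * math.pi * (t - 2) / 52) for t in range(1, N + 1)]

def area(signal, period, lag=0.0):
    """Steps 1 to 3: cosine curve, product with the signal, sum of the product."""
    times = range(1, len(signal) + 1)                                     # TIME
    cos0 = [math.cos(2 * math.pi * (t - lag) / period) for t in times]    # column C
    product0 = [y * c for y, c in zip(signal, cos0)]                      # column D
    return sum(product0)                                                  # cell G6

area_52 = area(signal, period=52)
area_33 = area(signal, period=33)
print(f"area for a 52 week period: {area_52:.2f}")
print(f"area for a 33 week period: {area_33:.2f}")
```

Because the synthetic signal oscillates with a 52 week period, the area obtained for 52 weeks is much larger in absolute value than the one obtained for 33 weeks.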
Output:
Figure 2.3. Food-borne disease cases in France, 1992-1996 + cosine curve + product area
(Chart: Signal + cosine curve; vertical axis Cases from -15 to 30; horizontal axis Week, with ticks at 1, 53, 105, 157 and 209.)
The black area is the area under the product curve. Since the cosine curve oscillates between -1 and +1, the product
curve oscillates between minus the signal and plus the signal. Positive and negative areas sum up to 476.25.
4. Adjust the phase to maximize the area
We arbitrarily started the cosine curve from the origin. This step involves moving the curve along the X axis in order to
maximize the area under the product curve. When the area is maximal, the 2 signals are « in phase ». This can be done
by trial and error, or by using the Excel solver.
A value of 0 for the phase or lag appears in cell G3. This cell is named LAG0.
To maximize the area manually, increase gradually the value for Phase in cell G3
by clicking on the spin button next to it, until you get the largest possible
value in cell G6 for AREA
A value of 483.51 is returned for AREA for a Phase of 2. However, we can use the solver to let Excel get the "best"
value for maximizing the area.
Call the 'solver' by clicking on the button
This button performs the following actions:
Enter 'AREA0' for ' Set the target cell '
Set 'Equal to' to Max
Enter 'LAG0' in 'By changing cell'
Call on 'Solve'
You can use the button or call directly the Excel solver through the “Tool” menu
To facilitate the selection of cells in the solver, you can click on the parameter to be defined ('Set the target cell' for
instance) then click in the spreadsheet on the corresponding cell.
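The search performed by the solver can be imitated with a simple grid search over the lag. The Python sketch below is not the algorithm used by the Excel solver; it simply tries every lag in steps of 0.05 week and keeps the one giving the largest area. It relies on a synthetic signal (baseline 8, seasonal wave of period 52 shifted by 2 weeks), which is an assumption standing in for the real data.

```python
import math

N, PERIOD = 256, 52

# Hypothetical signal standing in for column B; its seasonal wave peaks at week 2.
signal = [8 + 5 * math.cos(2 * math.pi * (t - 2) / PERIOD) for t in range(1, N + 1)]

def area(lag, period=PERIOD):
    """Sum of signal * cosine curve for a given lag (the role of cell AREA0)."""
    return sum(
        y * math.cos(2 * math.pi * (t - lag) / period)
        for t, y in enumerate(signal, start=1)
    )

# Try every lag from 0 to one full period, in steps of 0.05 week.
candidates = [step / 20 for step in range(PERIOD * 20)]
best_lag = max(candidates, key=area)
best_area = area(best_lag)

print(f"area with lag 0:        {area(0.0):.2f}")
print(f"best lag:               {best_lag:.2f} weeks")
print(f"area with the best lag: {best_area:.2f}")
```

The best lag found is close to the 2 week shift built into the synthetic signal; it is not exactly 2 because 256 weeks do not contain a whole number of 52 week cycles.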
Excel should return the following "best solution":
     A     B     C      D         E   F           G
1                                     Parameters
2    Time  Data  Cos 0  Product0
3    1     14    1.00   13.973        Phase       1.51
4    2     13    1.00   12.978        Period      52
5    3     20    0.98   19.679        Frequency   4.92
6    4     9     0.96   8.597         Area        484.34
The new value for the area, 484.34, is greater than the 483.51 obtained by adjusting the lag manually. This is the magnitude of the contribution
of the sinusoidal curve with a period of 52 weeks.
The values returned by the solver can differ slightly from those indicated in this example. These differences can be
explained by a different set-up of the solver on your computer (number of iterations for example) or by different initial
values. The solver is quite sensitive to initial values in its search for the best solution.
Figure 2.4. Food-borne disease cases in France, 1992-1996 + cosine curve + product area
(Chart: Signal + cosine curve; vertical axis Cases from -15 to 30; horizontal axis Week, with ticks at 1, 53, 105, 157 and 209.)
After optimisation of the lag (or phase), positive areas are bigger than negative areas for this frequency of oscillations.
Let's change arbitrarily the period to 33 weeks.
Reset the lag to '0' in cell G3 by clicking on the spin button
Set the period to '33' in cell G4 by clicking on the spin button
This yields a frequency of 7.76, and an area of -97.58. We need to maximize again the area by setting the lag through
the solver.
Click on “Optimise with solver”
     A     B     C      D         E   F           G
1                                     Parameters
2    Time  Data  Cos 0  Product0
3    1     14    -0.75  -10.544       Phase       13.73
4    2     13    -0.62  -7.996        Period      33
5    3     20    -0.45  -9.094        Frequency   7.76
6    4     9     -0.28  -2.501        Area        142.42
7    5     4     -0.09  -0.364
The area or magnitude of the contribution of the 33 week sine curve is 142.42, which is much smaller than 484.34 that
we got for 52 weeks, indicating a much weaker contribution of 33 week oscillations in the data.
Summary
In part 3 we have calculated a single Fourier coefficient showing the magnitude of the contribution of a given
frequency in our data set. This calculation is quite simple, but would become cumbersome if we had to repeat it 128
times, for each frequency in 256 data points. Part 4 will use Excel to do all calculations at once by the Fast Fourier
Transform (FFT).
Part 4: Calculation of all sine contributions using Excel Fast Fourier Transform function
Computation
Load the file SINE.XLS in Excel.
Activate FFT spreadsheet
The « analysis » add-in macro should be loaded in order to use the FFT (add-ins in the tool directory of Excel 5). When
the macro is loaded, an additional option appears on the tool menu, for « other analysis ». When activated, a list of add-ins appears.
Click on the button
If you want to practice Excel, you can carry-out this operation manually as
follows:
Click on the Fast Fourier Transform
The Fourier Transform dialog takes only 2 parameters: the input and output range:
Put the cursor in the Input range box, erase its content if any, then select
the range B3:B258 on the spreadsheet with the mouse.
Click on 'Output range', erase its content if any, then click on cell C3 on the
spreadsheet. The dialog box should look as above (but not necessarily in
French…)
Click on the 'Ok' button
Sample output
     A     B      C
1
2    Time  Data   Fourier coefficients
3    1     14.00  1986
4    2     13.00  -11.7313339426719-111.231527951024i
5    3     20.00  -44.0200955863075+97.1432268207391i
6    4     9.00   -36.3893403262534-8.98050299780206i
The output lists the coefficients expressed as complex numbers, starting with the cell indicated for output range. The
first coefficient (cell C3) does not have an imaginary component. It is calculated for 0 oscillation and corresponds to the
sum of observations over the entire range of values. The next coefficient (cell C4) corresponds to 1 oscillation over the
256 data points (period of 256 weeks); next one (cell C5) corresponds to 2 oscillations (period of 128 weeks). There are
256 coefficients, but the last 128 coefficients are mirrored images of the first 128. We will just look at the first 128.
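The two properties described above (a first coefficient equal to the sum of the observations, and a mirrored second half) can be observed on a small example. The Python sketch below computes a plain discrete Fourier transform, not Excel's fast algorithm, on the first 8 values of the Data column only. The function name `dft` and the sign convention of the exponent are assumptions; the convention affects the sign of the imaginary parts but not their magnitude.

```python
import cmath

def dft(values):
    """Plain discrete Fourier transform: one complex coefficient per frequency k."""
    n = len(values)
    return [
        sum(y * cmath.exp(-2j * cmath.pi * k * t / n) for t, y in enumerate(values))
        for k in range(n)
    ]

# First 8 observations of the Data column (a power of 2, as the FFT requires).
data = [14, 13, 20, 9, 4, 5, 6, 6]
coeffs = dft(data)

for k, c in enumerate(coeffs):
    print(f"{k} oscillation(s): {c.real:9.3f} {c.imag:+9.3f}i")
```

The coefficient for 0 oscillation is real and equals the sum of the 8 values, and the coefficient for k oscillations is the complex conjugate of the one for 8 - k oscillations, which is the mirror effect mentioned in the text.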
Building the periodogram
We want to know for which period in weeks (or which frequency) we get the strongest oscillations in the signal. The
periodogram is a graph of the energy of oscillations against the period. In order to get the period and the energy, a couple of
additional values and formulas have been entered.
The range F4 to F131 contains numbers from 1 to 128. They correspond to the 128 cosine curves that can fit 256 data
points. 1 means one oscillation over the 256 data points. Cells D4 to D131 contain the corresponding
period, =256/F4 to =256/F131.
The energy is the natural (Napierian) logarithm of the square root of the sum of the squares of the real and imaginary components of
the complex number. Excel provides a formula (under 'scientific' in the list of available formulas) which extracts this modulus
directly: IMABS(Complex Number Cell). Cells E4 to E131 contain the formula =LN(IMABS(C4)) to
=LN(IMABS(C131)).
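The Period and Energy formulas can be reproduced in Python from the coefficients listed in the sample output. In this sketch, `abs()` of a complex number plays the role of IMABS and `math.log` the role of LN; the dictionary layout is only illustrative.

```python
import math

N = 256  # number of data points

# Fourier coefficients copied from the sample output, keyed by number of oscillations.
coefficients = {
    1: complex(-11.7313339426719, -111.231527951024),
    2: complex(-44.0200955863075, 97.1432268207391),
    3: complex(-36.3893403262534, -8.98050299780206),
    5: complex(492.233449348617, -127.547857769363),
}

# For each frequency: (period, energy) = (256/F, LN(IMABS(coefficient))).
rows = {k: (N / k, math.log(abs(c))) for k, c in coefficients.items()}

for k, (period, energy) in rows.items():
    print(f"{k} oscillation(s): period = {period:6.2f} weeks, energy = {energy:.2f}")
```

The energies agree with those shown in the spreadsheet (4.72, 4.67, 3.62 and 6.23), the largest one being obtained for 5 oscillations, that is, a period of 51.2 weeks.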
Formulas in the example:
     A     B      C                                     D        E                F
1
2    Time  Data   Fourier coefficients                  Period   Energy           F
3    1     14.00  1986
4    2     13.00  -11.7313339426719-111.231527951024i   =256/F4  =LN(IMABS(C4))   1
5    3     20.00  -44.0200955863075+97.1432268207391i   =256/F5  =LN(IMABS(C5))   2
6    4     9.00   -36.3893403262534-8.98050299780206i   =256/F6  =LN(IMABS(C6))   3
Corresponding values in the spreadsheet:
     A     B      C                                     D        E       F
1
2    Time  Data   Fourier coefficients                  Period   Energy  F
3    1     14.00  1986
4    2     13.00  -11.7313339426719-111.231527951024i   256.00   4.72    1
5    3     20.00  -44.0200955863075+97.1432268207391i   128.00   4.67    2
6    4     9.00   -36.3893403262534-8.98050299780206i   85.33    3.62    3
7    5     4.00   -57.2818674126406+76.8869391583771i   64.00    4.56    4
8    6     5.00   492.233449348617-127.547857769363i    51.20    6.23    5
9    7     6.00   16.8729323342511-74.6127879077932i    42.67    4.34    6
10   8     6.00   -46.1167202378768-60.0396218812223i   36.57    4.33    7
Cells H28 to I30 contain the mean and standard deviation of the “Energy” coefficients. The “Cut off” cell represents the
value of the mean + ALPHA standard deviations, which is the cut-off value we will use to assess the significance of the
corresponding cyclical contributions. If a coefficient is greater than the cut-off, it is displayed in bold and red, using the
Excel conditional formatting that can be set through the “Format” menu.
Mean     3.82052077
STD      0.61527743
CutOff   4.83256169
P(%)     90
Alpha    1.64
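The cut-off can be recomputed from the mean and standard deviation shown above. The Python sketch below assumes that ALPHA is the standard normal quantile leaving 5% in the upper tail (which matches P = 90% read as a two-sided level and rounds to the 1.64 displayed); this interpretation is an assumption, not something stated in the spreadsheet. It then flags, among the 7 energies listed in the previous table, those exceeding the cut-off.

```python
from statistics import NormalDist

# Values read from cells H28 to I30.
mean_energy = 3.82052077
std_energy = 0.61527743

# Assumption: ALPHA is the 95th percentile of the standard normal distribution.
alpha = NormalDist().inv_cdf(0.95)
cut_off = mean_energy + alpha * std_energy

# Period (weeks) -> energy, for the first 7 frequencies of the sample output.
energies = {256.00: 4.72, 128.00: 4.67, 85.33: 3.62, 64.00: 4.56,
            51.20: 6.23, 42.67: 4.34, 36.57: 4.33}
significant = [period for period, energy in energies.items() if energy > cut_off]

print(f"alpha = {alpha:.2f}, cut-off = {cut_off:.4f}")
print("periods above the cut-off:", significant)
```

Under this assumption the recomputed cut-off agrees with the value in the CutOff cell to several decimal places, and only the 51.2 week period stands above it among the frequencies listed.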
Looking at the E column shows that the energy is maximal for a period of 51.2
weeks. This shows a strong yearly seasonal effect in our data set.
EXP(6.23)=508.49, which is close to the value of 484.34 that we got in
part 3 using 52 weeks. The FFT can only test for 51.20 weeks since it uses
fractions of 256. This explains the discrepancy, but does not cause problems in
interpreting the periodogram. In fact, if you enter 51.2 for the period in the
“Fourier” sheet and optimise using the solver, you get exactly the same value.
Interpretation of the periodogram
Figure 3.1. Periodogram of Food-borne disease cases
(Chart: Periodogram of foodborne notifications; vertical axis Energy from 0 to 10; horizontal axis Period in weeks, with ticks at 0, 52, 104, 156 and 208.)
The visual analysis of the periodogram shows a strong contribution for a period of 52 weeks and a smaller contribution
for 25.6 weeks, which corresponds to half a year. This second contribution, being a sub-multiple of 52, is called a
harmonic.
As the period decreases to 0, corresponding to higher frequencies, the energy decreases to low values. The presence of
harmonics in the periodogram can be explained by the fact that seasonal peaks are narrower than those of the 52-week
sine curve. In order to narrow the peaks of the model, we should take into account the harmonics, which are sub-multiples of 52. Every 52 weeks, these harmonics will reinforce the peak, and in between, they will tend to compensate
each other, resulting in narrower seasonal peaks. Thus, these harmonics are closely related to the 52-week cycle.
Summary
In part 4, we have used Excel to perform a spectral analysis of our signal. We have built the periodogram, and
shown a strong contribution of 52-week oscillations in our data set. In fact, it was rather obvious from looking at the
original time series. However, this is not always the case, especially when cycles of 2 or 3 years contribute to
the series.