G563 Week 3: Random walks, trends, and time series

1. Random Walk = Brownian Motion = Markov Chain (with caveat that each of these is a class of
different models, not all of which are exactly equal)
2. A random walk is a specific kind of time series process in which each step is random
with respect to the previous steps. A time series process is any process in which a series of
events occurs, each event starting from the state left by the previous one. Evolution, both within a lineage and
on a phylogeny, is a type of time series, though not always a random walk. Random walks are
often used as null models for the evolutionary process because they specify what happens in
a time series if all the events are completely random.
3. Outcomes of random walks are predictable, in a statistical sense, if the number of steps
and the average amount of change at each step are known. The statistics of random walks thus
form an important part of the quantitative analysis of evolutionary patterns within lineages and on
phylogenetic trees.
4. Statistical outcomes of univariate, unbiased random walks:
4.1. The most likely end point is the starting point
4.2. End points are normally distributed with the mean equal to the starting point
4.3. The variance of the end points equals the variance of the per-step change times the number of
steps
4.4. The standard deviation of the end points is equal to the standard deviation of the per-step change
times the square root of the number of steps
5. Key ingredients for programming a random walk
5.1. Choose a starting point and record it
5.2. Draw a random number from a distribution whose average is 0 and where there is equal probability
of getting a positive or negative outcome
5.3. Add the random value to the previous value and repeat
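Points 4.3 and 4.4 follow from the independence of the steps. As a short worked sketch, assume each step s_i is drawn independently with mean 0 and variance σ² (the symbols x_0, s_i, σ, and n are introduced here for illustration and do not appear in the notes):

```latex
x_n = x_0 + \sum_{i=1}^{n} s_i, \qquad
\mathrm{E}[x_n] = x_0, \qquad
\mathrm{Var}(x_n) = \sum_{i=1}^{n} \mathrm{Var}(s_i) = n\sigma^2, \qquad
\mathrm{SD}(x_n) = \sigma\sqrt{n}
```

For example, with σ = 0.5 and n = 100 steps, the end points have variance 25 and standard deviation 5.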
1. Do loop: “Do loops” are standard programming structures that “do” something repeatedly until
told to stop. Mathematica can handle these loops. Key ingredients are:
1.1. setting up variables to record the outcomes of each step of the Do loop
1.2. setting up the structure to repeat commands
1.3. saving the outcome at each step
1.4. setting up a control to end the loop (often this simply involves specifying the number of times to
repeat as an iterator at the end of the Do function)
Lecture 3 - Trait evolution, random walks.nb
Here is a short Do loop that generates a random walk starting at zero. The repeated commands appear before the
comma within the Do[] function. The iterator is {x, 10000}, which repeats the loop 10,000 times. First, a step value of -1
or 1 is chosen randomly with the RandomChoice[] function. Then that value is added to the last value stored in the
randomwalk variable (randomwalk[[-1]]), and the sum is appended to the values already stored there.
randomwalk = {0};
Do[
  step = RandomChoice[{-1, 1}];
  randomwalk = Append[randomwalk, randomwalk[[-1]] + step];
  , {x, 10000}];
ListPlot[randomwalk, Joined -> True]
[Plot: the random walk joined as a line, value against step number, steps 0 to 10,000]
Note that you can trigger the Do loop to end early when conditions that you specify are reached. For example, you can
end early if the value reaches 50 or higher, or you could end early if a random extinction event happens. Use the If[]
function to do this. If has two or three parameters, the first of which is a logical statement that specifies the condition, the
second is the command to execute if the statement is true (the third is optional, an alternative command to execute if the
statement is false). Break[] is a function to end a loop. Note that I have added options to the ListPlot so that all 10000
steps are shown even if the lineage becomes extinct early, which makes the effect of the If[] function more obvious.
randomwalk = {0};
Do[
  step = RandomChoice[{-1, 1}];
  randomwalk = Append[randomwalk, randomwalk[[-1]] + step];
  If[randomwalk[[-1]] >= 50, Break[]]
  , {x, 10000}];
ListPlot[randomwalk, Joined -> True, PlotRange -> {{0, 10000}, {-200, 200}}]
[Plot: the random walk on fixed axes, steps 0 to 10,000 and values -200 to 200]
In the following example we simulate a random walk that has a 1% chance of becoming extinct at each step. In the If[]
function, a random number between 0 and 1 is chosen and the walk is terminated if that number is greater than .99 (i.e.,
there is a 99% chance of not becoming extinct and a 1% chance of ending). Why doesn’t the walk ever get very far?
How far, on average, does it get? How can you change the code so that 50% of the walks end before finishing, but the
other 50% make it all the way through 10,000 steps?
endpoints = {};
Do[
  randomwalk = {0};
  Do[
    step = RandomChoice[{-1, 1}];
    randomwalk = Append[randomwalk, randomwalk[[-1]] + step];
    If[RandomReal[{0, 1}] >= 0.99, Break[]]
    , {x, 10000}];
  endpoints = Append[endpoints, randomwalk[[-1]]]
  , {y, 1000}]
Histogram[endpoints]
[Histogram: end points of the simulated walks, falling between roughly -40 and 40]
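One way to reason about the questions posed for this simulation, as a sketch: assume extinction is tested independently after every step with probability p (p = 0.01 in the code above), and let n = 10,000 be the maximum number of steps. The symbols p, n, and T (the step at which the walk ends) are introduced here for illustration.

```latex
P(\text{walk survives all } n \text{ steps}) = (1-p)^n, \qquad
\mathrm{E}[T] \approx \frac{1}{p} \quad \text{when } (1-p)^n \approx 0

(1-p)^n = \tfrac{1}{2}
\;\Longrightarrow\;
p = 1 - 2^{-1/n} \approx \frac{\ln 2}{n}
```

With p = 0.01 the expected walk length is about 100 steps, so a typical end point is only about 10 units (the square root of 100) from the start. With n = 10,000 the second line gives p ≈ 6.9 × 10^-5, which corresponds to a threshold of about 0.99993 in place of 0.99.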
2. Table: Table[] is a specialized Mathematica function that has many of the properties of a Do
loop, but which saves each value automatically. Like Do[], the function ends with an iterator
preceded by a comma. You can place several commands in front of the comma by
separating them with semicolons. Note that the results of commands ending with semicolons
are not returned in the table, only the result of the last command before the comma. You still
have to keep track of the current value of the random walk at each step. Here, this is done
using a temporary variable val to hold the most recent value, while at the same time inserting it
into the table.
val = 0;
randomwalk = Table[val = val + RandomChoice[{-1, 1}], {x, 10000}];
ListPlot[randomwalk, Joined -> True]
[Plot: the random walk generated with Table, steps 0 to 10,000]
3. Accumulate is a function that adds up a list of numbers and returns the running total after
each addition. This function can be used to generate a simple random walk
like the first ones above.
randomwalk = Accumulate[RandomChoice[{-1, 1}, 10000]];
ListPlot[randomwalk, Joined -> True]
[Plot: the random walk generated with Accumulate, steps 0 to 10,000]
4. NestList is a more general function than Accumulate: it applies any function you choose,
including addition, repeatedly to its own previous result and returns every intermediate value.
If you use addition, the result is the same kind of walk that Accumulate produces, except that the
list also includes the starting value. # serves as a placeholder for the current value at each step.
Here, the random choice of -1 or 1 is added to # at each step, with the whole thing starting at 0
and repeating 10,000 times.
randomwalk = NestList[# + RandomChoice[{-1, 1}] &, 0, 10000];
ListPlot[randomwalk, Joined -> True]
[Plot: the random walk generated with NestList, steps 0 to 10,000]
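To see how Accumulate and NestList differ on a small, reproducible input, here is a minimal sketch. The fixed list of steps is made up for illustration and stands in for the random draws:

```mathematica
steps = {1, -1, 1, 1};      (* hypothetical fixed steps instead of RandomChoice *)
Accumulate[steps]            (* running totals: {1, 0, 1, 2} *)
NestList[# + 1 &, 0, 4]      (* start value plus four applications: {0, 1, 2, 3, 4} *)
```

Accumulate returns one value per step, whereas NestList returns the starting value followed by one value per application, so its list is one element longer.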
The random walk code above produces binary random walks, in which each step is +1 or -1. What
is the average change at each step? What is the expected variance for a given number of steps?
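1. Selection coefficient drawn from a normal distribution
1.1. The NormalDistribution[] function takes the mean and standard deviation as input parameters.
RandomReal can be used to select a value from that distribution (in recent versions of Mathematica,
RandomVariate is the documented function for sampling from a distribution).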
randomwalk = {0};
Do[
  step = RandomReal[NormalDistribution[0, 1]];
  randomwalk = Append[randomwalk, randomwalk[[-1]] + step];
  , {x, 10000}];
ListPlot[randomwalk, Joined -> True]
[Plot: the random walk with normally distributed steps, steps 0 to 10,000]
2. McKinney Equation 2. This is a different method altogether for generating a random walk. If r=1, then
the formula produces a random walk. What happens if r<1 or r>1? Note what happens if
r is much bigger than 1, for example 2.0.
r = 1.0;  (* value illegible in the source; r = 1 gives a plain random walk *)
randomwalk = {0};
Do[
  et = RandomReal[NormalDistribution[0, 1]];
  randomwalk = Append[randomwalk, r*randomwalk[[-1]] + et];
  , {x, 100}];
ListPlot[randomwalk, Joined -> True, PlotRange -> All]
[Plot: the series generated with the r formula, steps 0 to 100]
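The effect of r can be explored by plotting several series side by side. The sketch below is one possible way to do that. The helper name series, the chosen values of r, and the use of RandomVariate are assumptions made for illustration; they are not taken from McKinney's paper or from the notebook.

```mathematica
(* Hypothetical helper: n steps of x[t] = r x[t-1] + e[t], starting at 0, with e drawn from Normal(0, 1) *)
series[r_, n_] := NestList[r # + RandomVariate[NormalDistribution[0, 1]] &, 0, n];

(* One series each for r below 1, equal to 1, and above 1 *)
ListPlot[{series[0.9, 100], series[1.0, 100], series[1.05, 100]},
 Joined -> True, PlotRange -> All,
 PlotLegends -> {"r = 0.9", "r = 1.0", "r = 1.05"}]
```

Each evaluation draws new random numbers, so it is worth running the plot several times before drawing conclusions about any one value of r.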
Serial correlation
Serial correlation is one way to measure whether a sequence of data forms a time series. If each
value depends on the value before it, then the serial correlation is high (near 1). Time series like
random walks have strong serial correlations. Show this with a Monte Carlo simulation, following the
instructions in the McKinney paper.
1. First generate a random walk.
randomwalk = {{1, 0}};
Do[
  step = RandomReal[NormalDistribution[0, 1]];
  randomwalk = Append[randomwalk, {x, randomwalk[[-1, 2]] + step}];
  , {x, 2, 10000}];
ListPlot[randomwalk, Joined -> True]
[Plot: the random walk, value against step number, steps 0 to 10,000]
randomwalk[[1 ;; 10]]
[Output: the first ten {step, value} pairs, beginning with {1, 0}; the remaining values are illegible in the source]
2. Then find a linear regression line through the random walk, which describes its average trend (a
random trend, in this case). Plot the regression line to make sure it is right.
lm = LinearModelFit[randomwalk, {1, x}, x]
Show[ListPlot[randomwalk], Plot[lm[x], {x, 1, 10000}]]
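[Output: a FittedModel object for the regression line (coefficients illegible in the source), followed by a plot of the random walk with the regression line overlaid, steps 0 to 10,000]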
3. Regress each residual around the fit onto the residual at the previous step. To do this, offset the residuals: take the first through the next-to-last residual as the predictor, and regress the second through the last residual
onto them. If the regression slope is near 1.0, then the time series has a near-perfect serial correlation.
LinearModelFit[
 Thread[{lm["FitResiduals"][[1 ;; -2]], lm["FitResiduals"][[2 ;;]]}], {1, x}, x]
[Output: a FittedModel object with a negative intercept and a positive slope; the coefficients are illegible in the source]
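As a cross-check on the regression slope, the lag-1 correlation of the residuals can also be computed directly. This is a sketch under assumptions: the names walk, fit, and res are introduced here, and the use of Correlation with Most and Rest is a substitute for the regression described above, not a reproduction of the method in the McKinney paper.

```mathematica
walk = Accumulate[RandomVariate[NormalDistribution[0, 1], 10000]];  (* random walk of 10000 steps *)
fit = LinearModelFit[walk, t, t];        (* for a plain list, the predictor t is taken as 1, 2, 3, ... *)
res = fit["FitResiduals"];               (* residuals around the fitted trend line *)
Correlation[Most[res], Rest[res]]        (* each residual paired with the one that follows it *)
```

A value close to 1 indicates strong serial correlation.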

randomwalk1 = Accumulate[RandomChoice[{-1, 1}, 10000]];
randomwalk2 = Accumulate[RandomChoice[{-1, 1}, 10000]];
ListPlot[{randomwalk1, randomwalk2}, Joined -> True]
[Plot: the two random walks on the same axes, steps 0 to 10,000]
Correlation[randomwalk1, randomwalk2] // N
[Output: the correlation coefficient between the two walks; the value is illegible in the source]
randomdiff1 = Table[randomwalk1[[x + 1]] - randomwalk1[[x]], {x, Length[randomwalk1] - 1}];
randomdiff2 = Table[randomwalk2[[x + 1]] - randomwalk2[[x]], {x, Length[randomwalk2] - 1}];
Correlation[randomdiff1, randomdiff2] // N
[Output: the correlation coefficient between the step-to-step differences, a negative value; the digits are illegible in the source]