Lab3 Introduction to Matlab III

Geog410 Modeling of Environmental Systems
Date Due: 5 pm, Sep. 18, 2008
1. Goals
(1) To learn how to import data into and export data from Matlab.
(2) To learn how to do numerical integration with Matlab.
(3) To learn how to perform simple linear regression analysis with Matlab.
2. Import/Export
The basic data arrangement in Matlab is in rows and/or columns, called an array in
computing terms. The numbers of rows and columns are called the dimensions of the data. In
many other types of software, you have to define the dimensions before importing or
exporting data, but you don’t have to do this in Matlab.
In the first lab we learned how to enter data into Matlab on screen. However, this
manual input only works for very small data volumes. If we have thousands of input
values saved in a file, it is far more convenient and efficient to read the data directly
from the file than to type them in. The Matlab function for data import is
>> matlab_variable=dlmread('text_filename');
Note: dlmread() is the function for reading an external text file. The name inside
the single quotes is the name of a text file whose numbers are delimited by either
commas (,) or spaces. The file MUST be located in the current directory, as
indicated at the top of the Matlab window. “matlab_variable” is the name of the variable
that holds the data in Matlab. The semicolon at the end suppresses the output on screen. If your
files are big, it is important to include it.
Example:
>>x=dlmread('myfile.txt');
The function for data export is
>>dlmwrite('text_filename', matlab_variablename);
Note: “text_filename” is the name of the external file to which the contents of
“matlab_variablename” will be written. The result is a comma-delimited text file. I recommend
that you name the file filename.txt. Adding the .txt extension helps you to recognize the
data type later on.
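As a quick check of the two functions, the following sketch writes a small array to a file and reads it back. The file name demo_roundtrip.txt and the variables A and B are made up for illustration; they are not part of the lab data.

```matlab
% Round trip: write a small array to a comma-delimited text file, then read it back.
% 'demo_roundtrip.txt', A and B are hypothetical names used only for this sketch.
A = [1 2 3; 4 5 6];                 % 2 rows, 3 columns
dlmwrite('demo_roundtrip.txt', A);  % the file holds the lines 1,2,3 and 4,5,6
B = dlmread('demo_roundtrip.txt');  % the dimensions are detected from the file
disp(size(B))                       % number of rows and columns that were read
disp(isequal(A, B))                 % 1 (true) when nothing was lost on the way
```

Note that dlmwrite keeps only about five significant digits by default, so non-integer data may come back slightly rounded; the 'precision' option of dlmwrite can raise this. In recent Matlab releases, readmatrix and writematrix are the recommended replacements for these two functions.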
Copy the file “theta.txt” from the data/ directory to your geog410 student folder. Start
Matlab, and set the current directory to your student folder. Then import theta.txt into
Matlab by issuing the following command:
>>theta=dlmread('theta.txt')
You should see the contents of theta on the screen, as we did not put a semicolon at the
end. Use WordPad to open theta.txt from your student folder; you should see
that theta.txt has the same content as you have on the screen. “theta.txt” contains
100 angles in radians, equally spaced between 0 and 2π. Now issue the following
commands:
>> x=cos(theta);   % take the cosine of the angles and assign the output to x
>> y=sin(theta);   % take the sine of the angles and assign the output to y
Now we can export x and y as in the following:
>>dlmwrite('x.txt',x);
>>dlmwrite('y.txt',y);
You should see two new files named x.txt and y.txt in your student folder. If you type
>>x
>>y
You should see the content of x and y. Please use wordpad to open x.txt and y.txt from
your student folder. You should see the exact content as you have in the Matlab window.
The content of x and y are now in two separate files. Sometimes, we want them to be two
columns in a single file. We can combine them into an ARRAY with two columns of data
with column 1 being x and column 2 being y.
>>xy(:,1)=x;
>>xy(:,2)=y;
The above commands assign x to the first column of xy (xy is the name of a new
Matlab variable), and y to the second column of xy. Please note the colon and the comma
in the parentheses. For any array in Matlab, we can reference any element of the array by
its row and column as
>>variable(row,col)
For example, xy(50,1) returns the element in the 50th row and the first
column of the xy array.
>> xy(50,1)
ans =
-0.9995
If you want to reference an entire column, use
>> xy(:,1)
or
>>xy(:,2)
If you want to reference an entire row, use
>>xy(50,:)
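The same indexing rules can be tried on a small array where the answers are easy to check by eye. The array M below is hypothetical and only serves to illustrate the row, column and colon notation.

```matlab
% Hypothetical 3-by-2 array used only to illustrate indexing.
M = [10 20; 30 40; 50 60];
M(2,1)      % single element: row 2, column 1 -> 30
M(:,2)      % entire second column -> 20, 40, 60 (as a column)
M(3,:)      % entire third row -> 50 60
size(M)     % number of rows and columns -> 3 2
```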
Now the Matlab variable “xy” has two columns, x and y. We can export the two columns
at one time into a single file as
>>dlmwrite('xy.txt',xy);
You should see a new file named xy.txt in your student folder. Please use wordpad to
open it and see its content.
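The whole import/export walk-through can also be reproduced without the data file. The sketch below assumes that theta.txt holds the same values as linspace(0,2*pi,100), which is consistent with the value of xy(50,1) shown earlier; the output file name xy_demo.txt is hypothetical.

```matlab
% Assumption: theta.txt contains 100 angles equally spaced from 0 to 2*pi,
% i.e. the same values as linspace(0, 2*pi, 100).
theta = linspace(0, 2*pi, 100)';   % column vector of 100 angles in radians
x = cos(theta);
y = sin(theta);
xy = [x y];                        % two columns in one step: x first, y second
disp(size(xy))                     % 100 rows, 2 columns
disp(xy(50,1))                     % element in row 50, column 1
dlmwrite('xy_demo.txt', xy);       % hypothetical file name, one x,y pair per line
```

Writing xy = [x y] builds the same array as the two column assignments, provided x and y are both column vectors of the same length.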
Exercises 1
1. Data Export: for the line $y = 2x + 1$, $0 \le x \le 10$, take 100 linearly spaced
points in the given interval for x (refer to the last lab instruction for how to do this)
and calculate y for each x. Combine x and y into an array, and save the
coordinates to a file named ‘array_line.txt’ in your geog410 student
folder. (Tip: x=0:0.1:10 gives 101 points spaced 0.1 apart; linspace(0,10,100) gives exactly 100 points.)
2. Data Import: Import the data in the text file ‘xy.txt’ you just created and plot the
first column on the horizontal axis and the second column on the vertical axis. Save
your figure in your geog410 student folder as lab3-figure.tif. Include the figure in
your lab report.
3. Integration
We define $f(x)$ as a continuous function on the interval $[a, b]$. The area enclosed in
Figure 1 by the curve $y = f(x)$, the x-axis, and the two vertical lines at $x = a$ and $x = b$
is called a trapezium (here, a trapezium whose top edge is the curve). The definite integral of
$f(x)$ over $[a, b]$, $\int_a^b f(x)\,dx$, is the area of this enclosed region. We can divide
the interval $[a, b]$ into $n$ small intervals, with dividing points
$a = x_0 < x_1 < x_2 < \cdots < x_n = b$. The whole trapezium is thereby divided into $n$
small trapezia. For the $k$th small trapezium, the length of the base is $\Delta x_k = x_k - x_{k-1}$,
and its area can be approximated by the area of a rectangle, calculated
as $S_k = f(x_k)\,\Delta x_k$. The area of the whole trapezium is then approximated by the sum of
the areas of all the small trapezia:
$S = \sum_{k=1}^{n} S_k = \sum_{k=1}^{n} f(x_k)\,\Delta x_k$.
Fig. 1 The basic principle of numerical integration
As the number of dividing points increases, $S = \sum_{k=1}^{n} S_k$ gets
increasingly close to the area of the whole enclosed region.
Let’s work through an example using the function $y = f(x) = e^x$, $0 \le x \le 3$. The process to
calculate the integral of the function on the interval [0, 3] is below:
>> x=0:0.01:3;
The above command assigns to x the values starting from 0 and increasing in steps of
0.01 all the way to 3. This assigns 301 values to x, which bound 300 intervals.
>>y=exp(x);
If we plot y against x,
>> plot(x,y)
we get the curve of $e^x$ on [0, 3].
[Figure: plot of y = exp(x) for 0 ≤ x ≤ 3]
The 301 points divide the x-axis into 300 equal intervals with a step Δx=0.01. Taking the
value of y at the right-hand end of each interval, as in $f(x_k)\,\Delta x_k$ above, the small
trapezium on each interval has an area of about 0.01*y. The total area below the entire
curve is then approximately
>>A=sum(0.01*y(2:end))
This provides a numerical approximation of $\int_0^3 e^x\,dx$, whose exact value is
$e^3 - 1 \approx 19.0855$.
Note: You can repeat the above example with Δx=0.1, 0.001, etc. The smaller the step
size, the more accurate the integration.
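The effect of the step size can be seen by looping over several values of Δx and comparing each estimate with the exact value $e^3 - 1$. This is a minimal sketch: it uses one rectangle per interval, evaluated at the right-hand end point, which is one reading of the sum described in this section. The built-in function trapz (trapezoidal rule) is included only for comparison.

```matlab
% Compare rectangle-sum estimates of the integral of exp(x) on [0, 3]
% with the exact value exp(3) - 1 for several step sizes.
exact = exp(3) - 1;
steps = [0.1 0.01 0.001];
for dx = steps
    x = 0:dx:3;                  % grid points, including both end points
    y = exp(x);
    A = sum(dx * y(2:end));      % one rectangle per interval (right end point)
    T = trapz(x, y);             % built-in trapezoidal rule, for comparison
    fprintf('dx = %5.3f  rectangles = %.4f  trapz = %.4f  exact = %.4f\n', ...
            dx, A, T, exact);
end
```

Both estimates approach the exact value as dx shrinks; the trapezoidal rule gets there with far fewer points because it averages the two ends of each interval.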
Exercises 2
1. Calculate the integral of $y = f(x) = \ln(x + 1)$, $1 \le x \le 10$ (Tip: ln(x) in MATLAB
is log(x)). Include all the MATLAB commands and the final result in your report.
2. Calculate the integral of $y = f(x) = \frac{x^2 - 1}{x^2 + 1}$, $2 \le x \le 6$. Include all the MATLAB
commands and the final result in your report. (Tip: y=(x.^2-1)./(x.^2+1);)
4. Simple Linear Regression Analysis
In regression analysis, the variable to be estimated is called the dependent
variable, usually denoted y. The variable used to estimate the dependent
variable is called the independent variable, usually denoted x.
The simple linear regression model between x and y can be expressed as
$y = a + bx + \varepsilon_0$,
where a and b are called regression coefficients, and $\varepsilon_0$ is the error term.
Our goal is to estimate a and b as well as the coefficient of determination
($R^2$). For notational convenience, we usually denote
$\hat{y} = a + bx$
The regression coefficients, a and b, are estimated such that the sum of the squared
differences between the observations ($y_i$) and the estimates ($\hat{y}_i$) is the smallest.
Mathematically, the solution for the regression coefficients is
$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$,
$a = \bar{y} - b\bar{x}$
And the coefficient of determination is
$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$, where $0 \le R^2 \le 1$
The closer $R^2$ is to 1, the stronger the linear relationship between x and y. The closer
$R^2$ is to 0, the weaker the linear relationship. We do not have to remember how to calculate the
coefficients; MATLAB provides functions that calculate them
easily.
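To see that the formulas are nothing mysterious, they can be typed into Matlab almost term by term. The sketch below uses five made-up data points (not the lab data) and cross-checks the result with polyfit, which fits a straight line when the degree is 1.

```matlab
% Hypothetical data, made up for illustration only.
x = [1 2 3 4 5]';
y = [2.1 3.9 6.2 7.8 10.1]';
n = numel(x);

% Slope b and intercept a from the summation formulas
b = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
a = mean(y) - b*mean(x);

% Coefficient of determination from the fitted values
yhat = a + b*x;
R2 = sum((yhat - mean(y)).^2) / sum((y - mean(y)).^2);

% Cross-check: polyfit returns [slope intercept] for a degree-1 fit
p = polyfit(x, y, 1);
fprintf('a = %.4f, b = %.4f, R2 = %.4f\n', a, b, R2);
fprintf('polyfit: intercept = %.4f, slope = %.4f\n', p(2), p(1));
```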
First, please copy the file “mgt-ndvi0082.txt” from the data/ directory to your geog410
student folder. This file contains the change in vegetation index measured from
satellites (first column) and the factor by which migration increased (second column) from 1982 to 2000 for a
dozen eastern provinces in China. These data are from Dr. Song’s research (Song, C.,
Lord, W. J., Zhou, L. and Xiao, J. 2008. Empirical Evidence for Impacts of Internal
Migration on Vegetation Dynamics in China from 1982 to 2000. Sensors, 8: 5069-5080;
DOI: 10.3390/s8085069).
Second, let’s import the data into Matlab as
>>mgtndvi=dlmread('mgt-ndvi0082.txt')
You should see the data on the screen, since we did not put a semicolon at the
end. We will use the first column as the dependent variable (y) and the second column as
the independent variable (x). In Matlab, the dependent variable has to be a single-column
matrix, and the independent variable (x) has to be a two-column matrix with the first
column being ones and the second column being the actual x values. This seems weird,
but it is how the matrix operations for simple linear regression work: the column of ones
multiplies the intercept a.
$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ and $\mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$
In matrix notation, the regression between X and Y can be written as
$\mathbf{Y} = \mathbf{X}\mathbf{b} + \boldsymbol{\varepsilon}$
I put every letter in bold font to indicate that each one is a matrix. The matrix $\mathbf{b}$ contains
the regression coefficients a and b as
$\mathbf{b} = \begin{bmatrix} a \\ b \end{bmatrix}$
Now let’s create X and Y for the simple linear regression analysis using the data in
mgtndvi.
>>Y(:,1)=mgtndvi(:,1)
>>X(:,1)=ones(12,1)
>>X(:,2)=mgtndvi(:,2)
Note that the Matlab function “ones(#1,#2)” creates an array of ones with #1
rows and #2 columns. In the above, ones(12,1) creates a matrix of ones with 12 rows and one
column, which becomes the first column of X. Then we assign the actual x values to the
second column of X.
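The matrix form can be tried out on a few made-up numbers before running it on the lab data. This sketch builds X with a column of ones and solves for b with the backslash operator, Matlab's built-in least-squares solver; it is an alternative route to the coefficients that needs no toolbox. The data values are hypothetical.

```matlab
% Hypothetical data, made up for illustration only.
x = [1 2 3 4 5]';
Y = [2.1 3.9 6.2 7.8 10.1]';
n = numel(x);

X = [ones(n,1) x];     % first column: ones, second column: the x values
b = X \ Y;             % least-squares solution of Y = X*b, so b = [a; b]
r = Y - X*b;           % residuals, y_i minus the fitted value
% For a least-squares fit with an intercept, 1 - SSE/SST equals R^2
R2 = 1 - sum(r.^2) / sum((Y - mean(Y)).^2);
fprintf('a = %.4f, b = %.4f, R2 = %.4f\n', b(1), b(2), R2);
```

Using numel(x) instead of a fixed number such as 12 lets the same lines work for a data set of any length.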
The function in Matlab to perform regression is “regress(dep, indep)”. Its output looks
weird again.
>>[b,bint,r,rint,stats]=regress(Y,X);
where b is the vector b above, bint holds the confidence intervals for a and b (95% by
default), r holds the residuals, i.e. $(y_i - \hat{y}_i)$, and rint holds the corresponding
interval for each residual. “stats” contains $R^2$, the F value, the P value, and the
variance of the errors, in that order.
>> [b,bint,r, rint,stats]=regress(Y,X)
b=
0.5829
-0.0843
bint =
0.4303 0.7355
-0.1106 -0.0581
r=
-0.0163
-0.0471
-0.0027
-0.0753
-0.0530
0.0470
0.0814
0.0342
-0.0036
-0.0936
0.1933
-0.0644
rint =
-0.1976 0.1649
-0.2129 0.1187
-0.1795 0.1741
-0.2466 0.0959
-0.2360 0.1299
-0.1371 0.2311
-0.0899 0.2528
-0.1538 0.2222
-0.1835 0.1764
-0.2666 0.0794
0.0699 0.3168
-0.2171 0.0883
stats =
0.8365 51.1600 0.0000 0.0071
Based on the above results, the regression equation is
y = 0.5829 - 0.0843x, $R^2$ = 0.8365, P < 0.0001 (Matlab displays the P value as 0.0000).
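The individual numbers in b and stats can be picked out by index. The sketch below copies the values from the regress output shown above rather than calling regress, so it runs without the data file; the value xnew is hypothetical and only illustrates how the fitted equation is used for a prediction.

```matlab
% Values copied from the regress output shown above.
b     = [0.5829; -0.0843];
stats = [0.8365 51.1600 0.0000 0.0071];

R2 = stats(1);   % coefficient of determination
F  = stats(2);   % F value
P  = stats(3);   % P value (0.0000 only means it is below the display precision)
s2 = stats(4);   % estimate of the error variance

xnew  = 3;                    % hypothetical x value, for illustration only
ypred = b(1) + b(2)*xnew;     % prediction from yhat = a + b*x
fprintf('R2 = %.4f, F = %.2f, predicted y at x = %g: %.4f\n', R2, F, xnew, ypred);
```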
Exercise 3
Copy the file mgt-ndvi0082-2.txt from the data/ directory to your geog410 student folder.
This is the same kind of data as used in the example above, but for another set of provinces.
Please import the data into Matlab and run a simple linear regression analysis with the first
column as the dependent variable and the second column as the independent variable.
Copy the model output into your lab report, and identify the regression coefficients, a and b,
and $R^2$. Describe how x and y are related.