Numerical Methods for CSE — Problem Sheet 1
Prof. R. Hiptmair, G. Alberti, F. Leonardi
D-MATH, ETH Zürich, AS 2015
You should try your best to solve the core problems. If time permits, please try to do the rest as well.
Problem 1
Arrow matrix-vector multiplication (core problem)
Consider the multiplication of the product of two identical "arrow matrices" A with a vector x, implemented as a function arrowmatvec(d,a,x) in the following MATLAB script.
Listing 5: multiplying a vector with the product of two "arrow matrices"
function y = arrowmatvec(d,a,x)
% Multiplying a vector with the product of two ``arrow matrices''
% Arrow matrix is specified by passing two column vectors a and d
if (length(d) ~= length(a)), error('size mismatch'); end
% Build arrow matrix using the MATLAB function diag()
A = [diag(d(1:end-1)), a(1:end-1); (a(1:end-1))', d(end)];
y = A*A*x;
(1a) For general vectors d = (d_1, ..., d_n)^T and a = (a_1, ..., a_n)^T, sketch the matrix A created in line 6 of Listing 5.
HINT: This MATLAB script is provided as file arrowmatvec.m.
Solution:

$$A = \begin{pmatrix}
d_1 & & & & a_1 \\
& d_2 & & & a_2 \\
& & \ddots & & \vdots \\
& & & d_{n-1} & a_{n-1} \\
a_1 & a_2 & \cdots & a_{n-1} & d_n
\end{pmatrix}$$
(1b)
The tic-toc timing results for arrowmatvec.m are available in Figure 1.
Give a detailed explanation of the results.
Figure 1: timings for arrowmatvec(d,a,x) (log-log plot of the tic-toc time against the vector size n, with an O(n^3) reference line).
HINT: This MATLAB-generated figure is provided as file arrowmatvectiming.{eps,jpg}.
Solution: The standard matrix-matrix multiplication has runtimes growing as O(n^3) and the standard matrix-vector multiplication has runtimes growing as O(n^2). Hence, the overall computational complexity is dominated by the O(n^3) matrix-matrix product A*A.
(1c) Write an efficient MATLAB function

function y = arrowmatvec2(d,a,x)

that computes the same multiplication as in Listing 5 but with optimal asymptotic complexity with respect to n. Here d passes the vector (d_1, ..., d_n)^T and a passes the vector (a_1, ..., a_n)^T.
Solution: Due to the sparsity and special structure of the matrix, it is possible to write a more efficient implementation than the standard matrix-vector multiplication; see Listing 6.
Listing 6: implementation of the function arrowmatvec2
function y = arrowmatvec2(d,a,x)
if (length(d) ~= length(a)), error('size mismatch'); end
% Recursive matrix-vector multiplication to obtain A*A*x = A*(A*x)
y = A(d,a,A(d,a,x));
end

% Efficient multiplication of a vector with the ``arrow matrix''
function Ax = A(d,a,x)
Ax = d.*x;
Ax(1:end-1) = Ax(1:end-1) + a(1:end-1)*x(end);
Ax(end) = Ax(end) + a(1:end-1)'*x(1:end-1);
end
(1d) What is the complexity of your algorithm from sub-problem (1c) (with respect to problem size n)?
Solution: The efficient implementation only needs two element-wise vector-vector multiplications and one vector-scalar multiplication per application of A. Therefore the complexity is O(n).
(1e) Compare the runtime of your implementation and the implementation given in Listing 5 for n = 2^5, 2^6, ..., 2^12. Use the routines tic and toc as explained in [1, Ex. 1.4.10] of the Lecture Slides.
Solution: The standard matrix multiplication has runtimes growing as O(n^3). The runtimes of the more efficient implementation grow as O(n). See Listing 7 and Figure 2.
Listing 7: Execution and timings of arrowmatvec and arrowmatvec2
nruns = 3; res = [];
ns = 2.^(2:12);
for n = ns
    a = rand(n,1); d = rand(n,1); x = rand(n,1);
    t = realmax;
    t2 = realmax;
    for k=1:nruns
        tic; y = arrowmatvec(d,a,x); t = min(toc,t);
        tic; y2 = arrowmatvec2(d,a,x); t2 = min(toc,t2);
    end;
    res = [res; t t2];
end
figure('name','timings arrowmatvec and arrowmatvec2');
c1 = sum(res(:,1))/sum(ns.^3);
c2 = sum(res(:,2))/sum(ns);
loglog(ns, res(:,1),'r+', ns, res(:,2),'bo',...
       ns, c1*ns.^3, 'k-', ns, c2*ns, 'g-');
xlabel('{\bf vector size n}','fontsize',14);
ylabel('{\bf time[s]}','fontsize',14);
title('{\bf timings for arrowmatvec and arrowmatvec2}','fontsize',14);
legend('arrowmatvec','arrowmatvec2','O(n^3)','O(n)',...
       'location','best');
print -depsc2 '../PICTURES/arrowmatvec2timing.eps';
(1f) Write the EIGEN codes corresponding to the functions arrowmatvec and arrowmatvec2.
Solution: See Listing 8 and Listing 9.
Figure 2: timings for arrowmatvec(d,a,x) and arrowmatvec2(d,a,x) (log-log plot of the tic-toc times against the vector size n, with O(n^3) and O(n) reference lines).

Listing 8: Implementation of arrowmatvec in EIGEN
#include <Eigen/Dense>
#include <iostream>
#include <ctime>

using namespace Eigen;

// The function arrowmatvec computes the desired product directly as A*A*x
template <class Matrix>
void arrowmatvec(const Matrix & d, const Matrix & a, const Matrix & x, Matrix & y)
{
    // Here, we choose a MATLAB-style implementation using block construction;
    // you can also use loops. If you are interested, you can compare both
    // implementations and see if and how they differ.
    int n = d.size();
    VectorXd dcut = d.head(n-1);
    VectorXd acut = a.head(n-1);
    MatrixXd A(n,n);
    // If you do not create the temporary matrix D, you will get an error:
    // the diagonal must be cast to MatrixXd
    MatrixXd D = dcut.asDiagonal();
    A << D, a.head(n-1), acut.transpose(), d(n-1);
    y = A*A*x;
}

// We test the function arrowmatvec with 5-dimensional random vectors.
int main(void)
{
    // srand((unsigned int) time(0));
    VectorXd a = VectorXd::Random(5);
    VectorXd d = VectorXd::Random(5);
    VectorXd x = VectorXd::Random(5);
    VectorXd y;
    arrowmatvec(d, a, x, y);
    std::cout << "A*A*x = " << y << std::endl;
}
Listing 9: Implementation of arrowmatvec2 in EIGEN
#include <Eigen/Dense>
#include <iostream>

using namespace Eigen;

// The auxiliary function Atimesx computes A*x in a smart way,
// using the particular structure of the matrix A.
template <class Matrix>
void Atimesx(const Matrix & d, const Matrix & a, const Matrix & x, Matrix & Ax)
{
    int n = d.size();
    Ax = (d.array()*x.array()).matrix();
    VectorXd Axcut = Ax.head(n-1);
    VectorXd acut = a.head(n-1);
    VectorXd xcut = x.head(n-1);
    Ax << Axcut + x(n-1)*acut, Ax(n-1) + acut.transpose()*xcut;
}

// We compute A*A*x by using the function Atimesx twice
// with 5-dimensional random vectors.
int main(void)
{
    VectorXd a = VectorXd::Random(5);
    VectorXd d = VectorXd::Random(5);
    VectorXd x = VectorXd::Random(5);
    VectorXd Ax(5);
    Atimesx(d, a, x, Ax);
    VectorXd AAx(5);
    Atimesx(d, a, Ax, AAx);
    std::cout << "A*A*x = " << AAx << std::endl;
}
Problem 2
Avoiding cancellation (core problem)
In [1, Section 1.5.4] we saw that the so-called cancellation phenomenon is a major cause
of numerical instability, cf. [1, § 1.5.41]. Cancellation is the massive amplification of
relative errors when subtracting two real numbers of about the same value.
Fortunately, expressions vulnerable to cancellation can often be recast in a mathematically
equivalent form that is no longer affected by cancellation, see [1, § 1.5.50]. There we
studied several examples, and this problem gives some more.
(2a) We consider the function

f_1(x_0, h) := sin(x_0 + h) − sin(x_0).    (1)

It can be transformed into another form, f_2(x_0, h), using the trigonometric identity

$$\sin(\varphi) - \sin(\psi) = 2 \cos\Big(\frac{\varphi+\psi}{2}\Big) \sin\Big(\frac{\varphi-\psi}{2}\Big).$$

Thus, f_1 and f_2 give the same values, in exact arithmetic, for any given argument values x_0 and h.

1. Derive f_2(x_0, h), which no longer involves the difference of return values of trigonometric functions.

2. Suggest a formula that avoids cancellation errors for computing the approximation (f(x_0 + h) − f(x_0))/h of the derivative of f(x) := sin(x) at x = x_0. Write a MATLAB program that implements your formula and computes an approximation of f'(1.2), for h = 10^−20, 10^−19, ..., 1.

HINT: For background information refer to [1, Ex. 1.5.43].

3. Plot the error (in doubly logarithmic scale using MATLAB's loglog plotting function) of the derivative computed with the suggested formula and with the naive implementation using f_1.

4. Explain the observed behaviour of the error.
Solution: Check the MATLAB implementation in Listing 10 and the plot in Fig. 3. We can clearly observe that the computation using f_1 leads to a big error as h → 0. This is due to the cancellation error caused by the subtraction of two numbers of approximately the same magnitude. The second implementation, using f_2, is very stable and does not suffer from this round-off amplification.
Listing 10: MATLAB script for Problem 2
%% Cancellation
h = 10.^(-20:0);
x = 1.2;

% Derivative
g1 = (sin(x+h) - sin(x)) ./ h;                % Naive
g2 = 2 .* cos(x + h*0.5) .* sin(h*0.5) ./ h;  % Better
ex = cos(x);                                  % Exact

% Plot
loglog(h, abs(g1-ex), 'r', h, abs(g2-ex), 'b', h, h, 'k--');
title('Error of the approximation of f''(x_0)');
legend('g_1','g_2', 'O(h)');
xlabel('h');
ylabel('| f''(x_0) - g_i(x_0,h) |');
grid on
Figure 3: error of the approximations g_1 and g_2 of f'(x_0) for Problem 2 (log-log plot of |f'(x_0) − g_i(x_0,h)| against h, with an O(h) reference line).
(2b) Using a trick applied in [1, Ex. 1.5.55] show that

$$\ln\big(x - \sqrt{x^2 - 1}\big) = -\ln\big(x + \sqrt{x^2 - 1}\big).$$

Which of the two formulas is more suitable for numerical computation? Explain why, and provide a numerical example in which the difference in accuracy is evident.

Solution: We immediately derive ln(x − √(x² − 1)) + ln(x + √(x² − 1)) = ln(x² − (x² − 1)) = ln(1) = 0. As x → ∞, the left-hand logarithm involves the subtraction of two numbers of almost equal magnitude, whilst the right-hand logarithm involves the addition of two numbers of approximately the same magnitude. Therefore, in the first case there may be cancellation for large values of x, making it worse for numerical computation. Try, in MATLAB, with x = 10^8.
(2c) For the following expressions, state the numerical difficulties that may occur, and rewrite the formulas in a way that is more suitable for numerical computation.

1. $\sqrt{x + \frac{1}{x}} - \sqrt{x - \frac{1}{x}}$, where $x \gg 1$.

2. $\sqrt{\frac{1}{a^2} + \frac{1}{b^2}}$, where $a \approx 0$, $b \approx 1$.

Solution:

1. Inside the square roots we have the addition (resp. subtraction) of a small number to a big number. The difference of the square roots incurs cancellation, since both have the same, large magnitude. Setting $A := \sqrt{x + \frac{1}{x}}$ and $B := \sqrt{x - \frac{1}{x}}$, we obtain

$$A - B = \frac{(A-B)(A+B)}{A+B} = \frac{2/x}{A+B} = \frac{2}{\sqrt{x}\,\big(\sqrt{x^2+1} + \sqrt{x^2-1}\big)}.$$

2. $\frac{1}{a^2}$ becomes very large as a approaches 0, whilst $\frac{1}{b^2} \to 1$ as b → 1. Therefore, the relative size of $\frac{1}{a^2}$ and $\frac{1}{b^2}$ becomes so big that, in computer arithmetic, $\frac{1}{a^2} + \frac{1}{b^2} = \frac{1}{a^2}$. On the other hand, $\frac{1}{|a|}\sqrt{1 + (\frac{a}{b})^2}$ avoids this problem by performing a division between two numbers of very different magnitude, instead of a summation.
Problem 3
Kronecker product
In [1, Def. 1.4.16] we learned about the so-called Kronecker product, available in MATLAB through the command kron. In this problem we revisit the discussion of [1, Ex. 1.4.17]. Please refresh yourself on this example and study [1, Code 1.4.18] again.
As in [1, Ex. 1.4.17], the starting point is the line of MATLAB code

y = kron(A, B) * x,    (2)

where the arguments are A, B ∈ R^{n,n}, x ∈ R^{n·n}.
(3a) Obtain further information about the kron command from the MATLAB help by issuing doc kron in the MATLAB command window.
Solution: See the MATLAB help.
(3b) Explicitly write Eq. (2) in the form y = Mx (i.e. write down M), for

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}.$$

Solution:

$$y = \begin{pmatrix} 5 & 6 & 10 & 12 \\ 7 & 8 & 14 & 16 \\ 15 & 18 & 20 & 24 \\ 21 & 24 & 28 & 32 \end{pmatrix} x.$$
(3c) What is the asymptotic complexity (→ [1, Def. 1.4.3]) of the MATLAB code (2)? Use the Landau symbol from [1, Def. 1.4.4] to state your answer.
Solution: kron(A,B) results in a matrix of size n² × n², and x has length n². So the complexity is the same as a matrix-vector multiplication for the resulting sizes. In total this is O(n² · n²) = O(n⁴).
(3d) Measure the runtime of (2) for n = 2³, 2⁴, 2⁵, 2⁶ and random matrices. Use the MATLAB functions tic and toc as explained in [1, Ex. 1.4.10] of the Lecture Slides.
Solution: Since kron(A,B) creates a large matrix consisting of smaller blocks of size n, namely B multiplied by A(i,j), we can split the problem into n matrix-vector multiplications of size n. This results in a routine with complexity n · O(n²) = O(n³). The implementation is shown in Listing 11. The runtimes are shown in Figure 4.
Listing 11: An efficient implementation for Problem 3
function y = Kron_B(A,B,x)
% Return y = kron(A,B)*x (smart version)
% Input:   A,B: 2 n x n matrices.
%          x:   Vector of length n*n.
% Output:  y:   Result vector of length n*n.

% check size of A
[n,m] = size(A);
assert(n == m, 'expected quadratic matrix')

% kron gives a matrix with n x n blocks; block i,j is A(i,j)*B
% => y = M*x can be done block-wise so that we reuse B*x(...)

% init
y = zeros(n*n,1);
% loop first over columns and then (!) over rows
for j = 1:n
    % reuse B*x(...) part (constant in given column) => O(n^2)
    z = B*x((j-1)*n+1:j*n);
    % add to result vector (need to go through full vector) => O(n^2)
    for i = 1:n
        y((i-1)*n+1:i*n) = y((i-1)*n+1:i*n) + A(i,j)*z;
    end
end
% Note: complexity is O(n^3)
end
(3e) Explain in detail why (2) can be replaced with the single line of MATLAB code

y = reshape(B * reshape(x,n,n) * A', n*n, 1);    (3)

and compare the execution times of (2) and (3) for random matrices of size n = 2³, 2⁴, 2⁵, 2⁶.
Solution:
Listing 12: A second efficient implementation for Problem 3 using reshape.
function y = Kron_C(A,B,x)
% Return y = kron(A,B)*x (smart version with reshapes)
% Input:   A,B: 2 n x n matrices.
%          x:   Vector of length n*n.
% Output:  y:   Result vector of length n*n.

% check size of A
[n,m] = size(A);
assert(n == m, 'expected quadratic matrix')

% init
yy = zeros(n,n);
xx = reshape(x,n,n);

% precompute the multiplications of B with the parts of vector x
Z = B*xx;
for j=1:n
    yy = yy + Z(:,j) * A(:,j)';
end
y = reshape(yy,n^2,1);
% Note: complexity is O(n^3)
end
Listing 13: Main routine for runtime measurements of Problem 3
%% Kron (runtimes)
clear all; close all;
nruns = 3;               % we average over a few runs
N = 2.^(2:10);
T = zeros(length(N),5);
for i = 1:length(N)
    n = N(i)             % problem size n
    % initial matrices A,B and vector x
    A = reshape(1:n*n, n, n)';
    B = reshape(n*n+1:2*n*n, n, n)';
    x = (1:n*n)';
    % alternative choice:
    % A = rand(n);
    % B = rand(n);
    % x = rand(n^2,1);

    tic;                 % smart implementation #1
    for ir = 1:nruns
        yB = Kron_B(A,B,x);
    end
    tb = toc/nruns;

    tic;                 % smart implementation #2: with reshapes
    for ir = 1:nruns
        yC = Kron_C(A,B,x);
    end
    tc = toc/nruns;
    fprintf('Error B-vs-C: %g\n', norm(yB-yC))

    tic;                 % implementation with kron and matrix*vector
    if (N(i)<128)
        for ir = 1:nruns
            yA = kron(A,B)*x;
        end
        ta = toc/nruns;
        fprintf('Error A-vs-B: %g\n', norm(yA-yB))
    else
        ta = 0;
    end

    tic;                 % implementation with 1-line command!
    % inspired by the solution of Kevin Bocksrocker
    for ir = 1:nruns
        yD = reshape(B*reshape(x,n,n)*A',n^2,1);
    end
    td = toc/nruns;
    fprintf('Error D-vs-B: %g\n', norm(yD-yB));

    fprintf('Timings: Matrix: %g\n         Smart: %g\n', ta, tb)
    fprintf('         Smart+Reshape: %g\n         1-line Smart: %g\n', tc, td)
    T(i,:) = [n, ta, tb, tc, td];
end

% log-scale plot for investigation of asymptotic complexity
a2 = sum(T(:,3)) / sum(T(:,1).^2);
a3 = sum(T(:,3)) / sum(T(:,1).^3);
a4 = sum(T(1:5,2)) / sum(T(1:5,1).^4);
figure('name','kron timings');
loglog(T(:,1),T(:,2),'m+', T(:,1),T(:,3),'ro',...
       T(:,1),T(:,4),'bd', T(:,1),T(:,5),'gp',...
       T(:,1),(T(:,1).^2)*a2,'k-', T(:,1),(T(:,1).^3)*a3,'k--',...
       T(1:5,1),(T(1:5,1).^4)*a4,'k-.', 'linewidth', 2);
xlabel('{\bf problem size n}','fontsize',14);
ylabel('{\bf average runtime (s)}','fontsize',14);
title(sprintf('tic-toc timing averaged over %d runs', nruns),'fontsize',14);
legend('slow evaluation','efficient evaluation',...
       'efficient ev. with reshape','Kevin 1-line',...
       'O(n^2)','O(n^3)','O(n^4)','location','northwest');
print -depsc2 '../PICTURES/kron_timings.eps';
(3f) Based on the EIGEN numerical library (→ [1, Section 1.2.3]) implement a C++ function

template <class Matrix>
void kron(const Matrix & A, const Matrix & B, Matrix & C) {
    // Your code here
}

that returns the Kronecker product of the argument matrices A and B in the matrix C.
HINT: Feel free (but not forced) to use the partial codes provided in kron.cpp as well as the CMake file CMakeLists.txt (including cmake-modules) and the timing header file timer.h.
Solution: See kron.cpp or Listing 14.
(3g) Devise an implementation of the MATLAB code (2) in C++ according to the function definition

template <class Matrix, class Vector>
void kron_mv(const Matrix & A, const Matrix & B, const Vector & x, Vector & y);

The meaning of the arguments should be self-explanatory.
Solution: See kron.cpp or Listing 14.
(3h) Now, using a function definition similar to that of the previous sub-problem, implement the C++ equivalent of (3) in the function kron_mv_fast.
HINT: Study [1, Rem. 1.2.23] about "reshaping" matrices in EIGEN.
(3i) Compare the runtimes of your two implementations as you did for the MATLAB implementations in sub-problem (3e).
Solution:
Listing 14: Main routine for runtime measurements of Problem 3
#include <Eigen/Dense>
#include <iostream>
#include <vector>

#include "timer.h"

//! \brief Compute the Kronecker product C = A ⊗ B.
//! \param[in] A Matrix n x n
//! \param[in] B Matrix n x n
//! \param[out] C Kronecker product of A and B of dim n^2 x n^2
template <class Matrix>
void kron(const Matrix & A, const Matrix & B, Matrix & C)
{
    C = Matrix(A.rows()*B.rows(), A.cols()*B.cols());
    for (unsigned int i = 0; i < A.rows(); ++i) {
        for (unsigned int j = 0; j < A.cols(); ++j) {
            C.block(i*B.rows(), j*B.cols(), B.rows(), B.cols()) = A(i,j)*B;
        }
    }
}

//! \brief Compute y = kron(A,B)*x. Exploit the matrix-vector product.
//! A, B and x must have dimension n x n resp. n*n
//! \param[in] A Matrix n x n
//! \param[in] B Matrix n x n
//! \param[in] x Vector of dim n*n
//! \param[out] y Vector y = kron(A,B)*x
template <class Matrix, class Vector>
void kron_fast(const Matrix & A, const Matrix & B, const Vector & x, Vector & y)
{
    y = Vector::Zero(A.rows()*B.rows());
    unsigned int n = A.rows();
    for (unsigned int j = 0; j < A.cols(); ++j) {
        Vector z = B * x.segment(j*n, n);
        for (unsigned int i = 0; i < A.rows(); ++i) {
            y.segment(i*n, n) += A(i,j)*z;
        }
    }
}

//! \brief Compute y = kron(A,B)*x using fast remapping techniques (similar to
//! MATLAB reshape). A, B and x must have dimension n x n resp. n*n.
//! Elegant way using reshape.
//! WARNING: using Matrix::Map we assume the matrix is in ColMajor format;
//! *beware*, you may incur bugs if the matrix is in RowMajor format instead.
//! \param[in] A Matrix n x n
//! \param[in] B Matrix n x n
//! \param[in] x Vector of dim n*n
//! \param[out] y Vector y = kron(A,B)*x
template <class Matrix, class Vector>
void kron_super_fast(const Matrix & A, const Matrix & B, const Vector & x, Vector & y)
{
    unsigned int n = A.rows();
    Matrix t = B * Matrix::Map(x.data(), n, n) * A.transpose();
    y = Matrix::Map(t.data(), n*n, 1);
}

int main(void) {
    // Check if kron works
    Eigen::MatrixXd A(2,2);
    A << 1, 2, 3, 4;
    Eigen::MatrixXd B(2,2);
    B << 5, 6, 7, 8;
    Eigen::MatrixXd C;

    Eigen::VectorXd x = Eigen::VectorXd::Random(4);
    Eigen::VectorXd y;
    kron(A, B, C);
    y = C*x;
    std::cout << "kron(A,B) = " << std::endl << C << std::endl;
    std::cout << "Using kron: y = " << std::endl << y << std::endl;

    kron_fast(A, B, x, y);
    std::cout << "Using kron_fast: y = " << std::endl << y << std::endl;
    kron_super_fast(A, B, x, y);
    std::cout << "Using kron_super_fast: y = " << std::endl << y << std::endl;

    // Compute runtime of different implementations of kron
    unsigned int repeats = 10;
    timer<> tm_kron, tm_kron_fast, tm_kron_super_fast;
    std::vector<int> times_kron, times_kron_fast, times_kron_super_fast;

    for (unsigned int p = 2; p <= 9; p++) {
        tm_kron.reset();
        tm_kron_fast.reset();
        tm_kron_super_fast.reset();
        for (unsigned int r = 0; r < repeats; ++r) {
            unsigned int M = pow(2,p);
            A = Eigen::MatrixXd::Random(M,M);
            B = Eigen::MatrixXd::Random(M,M);
            x = Eigen::VectorXd::Random(M*M);

            // May be too slow for large p, comment if so
            tm_kron.start();
            // kron(A,B,C);
            // y = C*x;
            tm_kron.stop();

            tm_kron_fast.start();
            kron_fast(A, B, x, y);
            tm_kron_fast.stop();

            tm_kron_super_fast.start();
            kron_super_fast(A, B, x, y);
            tm_kron_super_fast.stop();
        }

        std::cout << "Lazy Kron took:       " << tm_kron.min().count() / 1000000. << " ms" << std::endl;
        std::cout << "Kron fast took:       " << tm_kron_fast.min().count() / 1000000. << " ms" << std::endl;
        std::cout << "Kron super fast took: " << tm_kron_super_fast.min().count() / 1000000. << " ms" << std::endl;
        times_kron.push_back(tm_kron.min().count());
        times_kron_fast.push_back(tm_kron_fast.min().count());
        times_kron_super_fast.push_back(tm_kron_super_fast.min().count());
    }

    for (auto it = times_kron.begin(); it != times_kron.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
    for (auto it = times_kron_fast.begin(); it != times_kron_fast.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
    for (auto it = times_kron_super_fast.begin(); it != times_kron_super_fast.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
}
Problem 4
Structured matrix–vector product
In [1, Ex. 1.4.14] we saw how the particular structure of a matrix can be exploited to compute a matrix-vector product with substantially reduced computational effort. This problem presents a similar case.
Consider the real n × n matrix A defined by (A)_{i,j} = a_{i,j} = min{i, j}, for i, j = 1, ..., n. The matrix-vector product y = Ax can be implemented in MATLAB as

y = min(ones(n,1)*(1:n), (1:n)'*ones(1,n)) * x;    (4)

(4a) What is the asymptotic complexity (for n → ∞) of the evaluation of the MATLAB command displayed above, with respect to the problem size parameter n?
Solution: Matrix-vector multiplication: quadratic dependence, O(n²).
(4b) Write an efficient MATLAB function

function y = multAmin(x)

that computes the same multiplication as (4) but with a better asymptotic complexity with respect to n.
Figure 4: Timings for Problem 3 with MATLAB and C++ implementations (log-log plot of average runtime over 3 runs against problem size n, with O(n²), O(n³) and O(n⁴) reference lines).
HINT: You can test your implementation by comparing the returned values with the ones obtained with code (4).
Solution: For every j we have $y_j = \sum_{k=1}^{j} k x_k + j \sum_{k=j+1}^{n} x_k$, so we pre-compute the two sums for every j only once.
Listing 15: implementation for the function multAmin
function y = multAmin(x)
% O(n), slow version
n = length(x);
y = zeros(n,1);
v = zeros(n,1);
w = zeros(n,1);

v(1) = x(n);
w(1) = x(1);

for j = 2:n
    v(j) = v(j-1)+x(n+1-j);
    w(j) = w(j-1)+j*x(j);
end
for j = 1:n-1
    y(j) = w(j) + v(n-j)*j;
end
y(n) = w(n);

% To check the code, run:
% n=500; x=randn(n,1); y = multAmin(x);
% norm(y - min(ones(n,1)*(1:n), (1:n)'*ones(1,n))*x)
(4c) What is the asymptotic complexity (in terms of the problem size parameter n) of your function multAmin?
Solution: Linear dependence: O(n).
(4d) Compare the runtime of your implementation and the implementation given in (4) for n = 2⁵, 2⁶, ..., 2¹². Use the routines tic and toc as explained in [1, Ex. 1.4.10] of the Lecture Slides.
Solution: The matrix multiplication in (4) has runtimes growing as O(n²). The runtimes of the more efficient implementations, with hand-coded loops or using the MATLAB function cumsum, grow as O(n).
Listing 16: comparison of execution timings
ps = 4:12;
ns = 2.^ps;
ts = 1e6*ones(length(ns),3);   % timings
nruns = 10;                    % to average the runtimes

% loop over different problem sizes
for j=1:length(ns)
    n = ns(j);
    fprintf('Vector length: %d \n', n);
    x = rand(n,1);

    % timing for naive multiplication
    tic;
    for k=1:nruns
        y = min(ones(n,1)*(1:n), (1:n)'*ones(1,n)) * x;
    end
    ts(j,1) = toc/nruns;

    % timing multAmin
    tic;
    for k=1:nruns
        y = multAmin(x);
    end
    ts(j,2) = toc/nruns;

    % timing multAmin2
    tic;
    for k=1:nruns
        y = multAmin2(x);
    end
    ts(j,3) = toc/nruns;
end

c1 = sum(ts(:,2)) / sum(ns);
c2 = sum(ts(:,1)) / sum(ns.^2);

loglog(ns, ts(:,1), '-k', ns, ts(:,2), '-og', ns, ts(:,3), '-xr',...
       ns, c1*ns, '-.b', ns, c2*ns.^2, '--k', 'linewidth', 2);
legend('naive','multAmin','multAmin2','O(n)','O(n^2)',...
       'Location','NorthWest')
title(sprintf('tic-toc timing averaged over %d runs', nruns),'fontsize',14);
xlabel('{\bf problem size n}','fontsize',14);
ylabel('{\bf runtime (s)}','fontsize',14);

print -depsc2 '../PICTURES/multAmin_timings.eps';
Figure 5: Timings for different implementations of y = Ax with both MATLAB and C++ (log-log plot of runtime averaged over 10 runs against problem size n, with O(n) and O(n²) reference lines).
(4e) Can you solve task (4b) without using any for- or while-loop? Implement it in the function

function y = multAmin2(x)

HINT: You may use the MATLAB built-in command cumsum.
Solution: Using cumsum to avoid the for-loops:
Listing 17: implementation for the function multAmin without loops
function y = multAmin2(x)
% O(n), no-for version
n = length(x);
v = cumsum(x(end:-1:1));
w = cumsum(x.*(1:n)');
y = w + (1:n)'.*[v(n-1:-1:1);0];
(4f) Consider the following MATLAB script multAB.m:

Listing 18: MATLAB script calling multAmin
n = 10;
B = diag(-ones(n-1,1),-1) + diag([2*ones(n-1,1);1],0)...
    + diag(-ones(n-1,1),1);
x = rand(n,1);
fprintf('|x-y| = %d\n', norm(multAmin(B*x)-x));

Sketch the matrix B created in lines 2-3 of multAB.m.
HINT: This MATLAB script is provided as file multAB.m.
Solution:

$$B := \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & -1 & 2 & -1 \\
0 & \cdots & 0 & -1 & 1
\end{pmatrix}$$

Notice the value 1 in the entry (n, n).
(4g) Run the code of Listing 18 several times and conjecture a relationship between the matrices A and B from the output. Prove your conjecture.
HINT: You must take into account that computers inevitably commit round-off errors, see [1, Section 1.5].
Solution: It is easy to verify with MATLAB (or to prove) that B = A⁻¹.
For 2 ≤ j ≤ n − 1, we obtain:

$$(AB)_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j} = a_{i,j-1} b_{j-1,j} + a_{i,j} b_{j,j} + a_{i,j+1} b_{j+1,j}
= -\min(i, j-1) + 2\min(i, j) - \min(i, j+1)
= \begin{cases}
-i + 2i - i = 0, & \text{if } i < j, \\
-(i-1) + 2i - i = 1, & \text{if } i = j, \\
-(j-1) + 2j - (j+1) = 0, & \text{if } i > j.
\end{cases}$$

Furthermore, $(AB)_{i,1} = \delta_{i,1}$ and $(AB)_{i,n} = \delta_{i,n}$; hence AB = I.
The last line of multAB.m prints the value of ‖ABx − x‖ = ‖x − x‖ = 0. The returned values are not exactly zero due to round-off errors.
(4h) Implement a C++ function with declaration

template <class Vector>
void minmatmv(const Vector &x, Vector &y);

that realizes the efficient version of the MATLAB line of code (4). Test your function by comparing with output from the equivalent MATLAB functions.
Solution:
Listing 19: C++ script implementing multAmin
#include <Eigen/Dense>
#include <iostream>
#include <vector>

#include "timer.h"

//! \brief build A*x using array of ranges and of ones.
//! \param[in] x vector x for A*x = y
//! \param[out] y y = A*x
void multAminSlow(const Eigen::VectorXd & x, Eigen::VectorXd & y) {
    unsigned int n = x.size();
    Eigen::VectorXd one = Eigen::VectorXd::Ones(n);
    Eigen::VectorXd linsp = Eigen::VectorXd::LinSpaced(n,1,n);
    y = ( (one*linsp.transpose()).cwiseMin(linsp*one.transpose()) )*x;
}

//! \brief build A*x using a simple for loop
//! \param[in] x vector x for A*x = y
//! \param[out] y y = A*x
void multAminLoops(const Eigen::VectorXd & x, Eigen::VectorXd & y) {
    unsigned int n = x.size();
    Eigen::MatrixXd A(n,n);
    for (unsigned int i = 0; i < n; ++i) {
        for (unsigned int j = 0; j < n; ++j) {
            A(i,j) = std::min(i+1,j+1);
        }
    }
    y = A*x;
}

//! \brief build A*x using a clever representation
//! \param[in] x vector x for A*x = y
//! \param[out] y y = A*x
void multAmin(const Eigen::VectorXd & x, Eigen::VectorXd & y) {
    unsigned int n = x.size();
    y = Eigen::VectorXd::Zero(n);
    Eigen::VectorXd v = Eigen::VectorXd::Zero(n);
    Eigen::VectorXd w = Eigen::VectorXd::Zero(n);

    v(0) = x(n-1);
    w(0) = x(0);

    for (unsigned int j = 1; j < n; ++j) {
        v(j) = v(j-1) + x(n-j-1);
        w(j) = w(j-1) + (j+1)*x(j);
    }
    for (unsigned int j = 0; j < n-1; ++j) {
        y(j) = w(j) + v(n-j-2)*(j+1);
    }
    y(n-1) = w(n-1);
}

int main(void) {
    // Build matrix B with 10x10 dimensions such that B = inv(A)
    unsigned int n = 10;
    Eigen::MatrixXd B = Eigen::MatrixXd::Zero(n,n);
    for (unsigned int i = 0; i < n; ++i) {
        B(i,i) = 2;
        if (i < n-1) B(i+1,i) = -1;
        if (i > 0) B(i-1,i) = -1;
    }
    B(n-1,n-1) = 1;
    std::cout << "B = " << B << std::endl;

    // Check that B = inv(A) (up to machine precision)
    Eigen::VectorXd x = Eigen::VectorXd::Random(n), y;
    multAmin(B*x, y);
    std::cout << "|y-x| = " << (y - x).norm() << std::endl;
    multAminSlow(B*x, y);
    std::cout << "|y-x| = " << (y - x).norm() << std::endl;
    multAminLoops(B*x, y);
    std::cout << "|y-x| = " << (y - x).norm() << std::endl;

    // Timing from 2^4 to 2^13, repeating nruns times
    timer<> tm_slow, tm_slow_loops, tm_fast;
    std::vector<int> times_slow, times_slow_loops, times_fast;
    unsigned int nruns = 10;
    for (unsigned int p = 4; p <= 13; ++p) {
        tm_slow.reset();
        tm_slow_loops.reset();
        tm_fast.reset();
        for (unsigned int r = 0; r < nruns; ++r) {
            x = Eigen::VectorXd::Random(pow(2,p));

            tm_slow.start();
            multAminSlow(x, y);
            tm_slow.stop();

            tm_slow_loops.start();
            multAminLoops(x, y);
            tm_slow_loops.stop();

            tm_fast.start();
            multAmin(x, y);
            tm_fast.stop();
        }
        times_slow.push_back( tm_slow.avg().count() );
        times_slow_loops.push_back( tm_slow_loops.avg().count() );
        times_fast.push_back( tm_fast.avg().count() );
    }

    for (auto it = times_slow.begin(); it != times_slow.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
    for (auto it = times_slow_loops.begin(); it != times_slow_loops.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
    for (auto it = times_fast.begin(); it != times_fast.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
}
106
107
108
109
110
111
112
113
114
115
116
117
118
}
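The recurrence behind multAmin can also be cross-checked without Eigen. The following is a minimal plain-C++ sketch (hypothetical names multAminPlain and multAminNaive) of the same O(n) prefix/suffix-sum trick for y = A*x with A(i,j) = min(i+1, j+1), compared against the naive O(n^2) double loop:

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>
#include <cmath>
#include <cassert>

// Hypothetical helper: y = A*x with A(i,j) = min(i+1,j+1), using the same
// prefix/suffix sums as multAmin: O(n) instead of O(n^2).
std::vector<double> multAminPlain(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> v(n), w(n), y(n);
    v[0] = x[n-1];   // v[j] = sum of the last j+1 entries of x
    w[0] = x[0];     // w[j] = sum_{i<=j} (i+1)*x[i]
    for (std::size_t j = 1; j < n; ++j) {
        v[j] = v[j-1] + x[n-j-1];
        w[j] = w[j-1] + (j+1)*x[j];
    }
    for (std::size_t j = 0; j + 1 < n; ++j)
        y[j] = w[j] + v[n-j-2]*(j+1);   // tail sum enters with weight j+1
    y[n-1] = w[n-1];
    return y;
}

// Naive O(n^2) reference, in the spirit of multAminLoops.
std::vector<double> multAminNaive(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += static_cast<double>(std::min(i+1, j+1)) * x[j];
    return y;
}
```

For x = (1, 2, 3, 4, 5) both variants return (15, 29, 41, 50, 55).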
Problem 5    Matrix powers

(5a) Implement a MATLAB function

Pow(A, k)

that, using only basic linear algebra operations (including matrix-vector or matrix-matrix multiplications), computes efficiently the k-th power of the n × n matrix A.

HINT: use the MATLAB operator ^ to test your implementation on random matrices A.
HINT: use the MATLAB function de2bi to extract the "binary digits" of an integer.
Solution: Write k in binary format: k = ∑_{j=0}^{M} b_j 2^j, with b_j ∈ {0, 1}. Then

A^k = ∏_{j=0}^{M} (A^{2^j})^{b_j} = ∏_{j : b_j = 1} A^{2^j}.

We compute A, A^2, A^4, ..., A^{2^M} (one matrix-matrix multiplication each) and we multiply only the matrices A^{2^j} such that b_j ≠ 0.
Listing 20: An efficient implementation for Problem 5
function X = Pow(A, k)
% Pow - Return A^k for square matrix A and integer k
% Input:   A: n*n matrix
%          k: positive integer
% Output:  X: n*n matrix X = A^k

% transform k in basis 2
bin_k = de2bi(k);
M = length(bin_k);
X = eye(size(A));

for j = 1:M
    if bin_k(j) == 1
        X = X*A;
    end
    A = A*A;   % now A{new} = A{initial}^(2^j)
end
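The same digit scan can be tried on scalars first; below is a minimal C++ sketch (hypothetical name powBin, not part of the provided codes) mirroring the loop of Listing 20 with O(log2 k) multiplications:

```cpp
#include <cassert>

// Scalar analogue of Pow: scan the binary digits of k from least to most
// significant; square the base each round, multiply it in when the digit is 1.
long long powBin(long long a, unsigned int k) {
    long long x = 1;          // plays the role of X = eye(size(A))
    while (k > 0) {
        if (k & 1u) x *= a;   // current binary digit of k is 1
        a *= a;               // a <- a^2, as A = A*A in Listing 20
        k >>= 1;
    }
    return x;
}
```

For example, powBin(3, 13) = 1594323 = 3^13 takes 4 loop rounds (13 = 1101 in binary) instead of 12 naive multiplications.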
(5b) Find the asymptotic complexity in k (and n), taking into account that in MATLAB a matrix-matrix multiplication requires an O(n^3) effort.

Solution: Using the simplest implementation

A^k = ((...((A·A)·A)...)·A)·A   (k − 1 multiplications)   →   O((k − 1) n^3).

Using the efficient implementation from Listing 20, for each j ∈ {1, 2, ..., ⌈log2(k)⌉} we have to perform at most two multiplications (X*A and A*A):

complexity ≤ 2 · M · (matrix-matrix mult.) ≈ 2 ⌈log2 k⌉ n^3

(here ⌈a⌉ = ceil(a) = inf{b ∈ Z : a ≤ b}).
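To make the bound concrete, one can simply count products; a small sketch (hypothetical helpers, not from the problem sheet) comparing the naive count k − 1 with the binary scheme's "digits plus set bits" count:

```cpp
#include <cassert>

// Naive repeated multiplication: k-1 matrix products.
unsigned int naiveMults(unsigned int k) { return k - 1; }

// Binary scheme: one squaring per binary digit of k plus one extra product
// per digit equal to 1 -- never more than twice the number of digits.
unsigned int binMults(unsigned int k) {
    unsigned int digits = 0, ones = 0;
    for (unsigned int m = k; m > 0; m >>= 1) { ++digits; ones += (m & 1u); }
    return digits + ones;
}
```

For k = 50 (binary 110010) this gives 6 + 3 = 9 products versus 49 for the naive scheme.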
(5c) Plot the runtime of the built-in MATLAB power function (^) and find out its complexity. Compare it with the function Pow from (5a). Use the matrix

A_{j,k} = (1/√n) exp(2πi jk / n)

to test the two functions.
Solution:

Listing 21: Timing plots for Problem 5
clear all; close all;
nruns = 10;      % we average over a few runs
nn = 30:3:60;    % dimensions used
kk = (2:50);     % powers used
tt1 = zeros(length(nn), length(kk)); % times for Pow
tt2 = zeros(length(nn), length(kk)); % times for Matlab power

for i = 1:length(nn)
    n = nn(i);

    % matrix with |eigenvalues| = 1:
    % A = vander([exp(1i*2*pi*[1:n]/n)])/sqrt(n);
    A = exp(1i*2*pi*[1:n]'*[1:n]/n)/sqrt(n);

    for j = 1:length(kk)
        k = kk(j);
        tic
        for run = 1:nruns
            X = Pow(A, k);
        end
        tt1(i,j) = toc;

        tic
        for run = 1:nruns
            XX = A^k;
        end
        tt2(i,j) = toc;
        n_k_err = [n, k, max(max(abs(X-XX)))]
    end
end

figure('name','Pow timings');
subplot(2,1,1)
n_sel = 6;   % plot in k only for a selected n
% expected logarithmic dep. on k, semilogx used:
semilogx(kk, tt1(n_sel,:),'m+', kk, tt2(n_sel,:),'ro',...
         kk, sum(tt1(n_sel,:))*log(kk)/(length(kk)*log(k)),...
         'linewidth', 2);
xlabel('{\bf power k}','fontsize',14);
ylabel('{\bf average runtime (s)}','fontsize',14);
title(sprintf('tic-toc timing averaged over %d runs, matrix size = %d',...
      nruns, nn(n_sel)),'fontsize',14);
legend('our implementation','Matlab built-in',...
       'O(C log(k))','location','northwest');

subplot(2,1,2)
k_sel = 35;  % plot in n only for a selected k
loglog(nn, tt1(:,k_sel),'m+', nn, tt2(:,k_sel),'ro',...
       nn, sum(tt1(:,k_sel))*nn.^3/sum(kk.^3),...
       'linewidth', 2);
xlabel('{\bf dimension n}','fontsize',14);
ylabel('{\bf average runtime (s)}','fontsize',14);
title(sprintf('tic-toc timing averaged over %d runs, power = %d',...
      nruns, kk(k_sel)),'fontsize',14);
legend('our implementation','Matlab built-in',...
       'O(n^3)','location','northwest');
print -depsc2 'Pow_timings.eps';
Figure 6: Timings for Problem 5. Top: average runtime vs. power k (semilogx), tic-toc timing averaged over 10 runs for matrix size n = 45, comparing our implementation, the Matlab built-in, and an O(C log(k)) reference. Bottom: average runtime vs. dimension n (loglog), averaged over 10 runs for power k = 36, with an O(n^3) reference.
The MATLAB ^-function has (at most) logarithmic complexity in k, but its timings are slightly better than those of our implementation.
All the eigenvalues of the Vandermonde matrix A have absolute value 1, so the powers A^k are "stable": the eigenvalues of A^k approach neither 0 nor ∞ as k grows.
(5d) Using EIGEN, devise a C++ function with the calling sequence

template <class Matrix>
void matPow(Matrix &A, unsigned int k);

that computes the k-th power of the square matrix A (passed in the argument A). Of course, your implementation should be as efficient as the MATLAB version from sub-problem (5a).

HINT: matrix multiplication suffers no aliasing issues (you can safely write A = A*A).
HINT: feel free to use the provided matPow.cpp.
HINT: you may want to use log and ceil.
HINT: an EIGEN implementation of the matrix power (A.pow(k)) can be found in:
#include <unsupported/Eigen/MatrixFunctions>
Solution:

Listing 22: Implementation of matPow
#include <Eigen/Dense>
#include <unsupported/Eigen/MatrixFunctions>
#include <iostream>
#include <complex>
#include <math.h>

//! \brief Compute powers of a square matrix using smart binary representation
//! \param[in,out] A matrix for which you want to compute A^k; A^k is stored in A
//! \param[in] k integer exponent for A^k
template <class Matrix>
void matPow(Matrix & A, unsigned int k) {
    Matrix X = Matrix::Identity(A.rows(), A.cols());
    // p is used as binary mask to check whether k = sum_{i=0}^{M} b_i 2^i
    // has a 1 in the i-th binary digit.
    // The binary representation of k can be obtained in many ways; here we
    // use ~k & p to check whether the i-th binary digit is 1.
    unsigned int p = 1;
    // Cycle all the way up to the length of the binary representation of k
    // (log2(k+1) rather than log2(k), so the top digit is also covered when
    // k is a power of two)
    for (unsigned int j = 1; j <= ceil(log2(k+1)); ++j) {
        if ( ( ~k & p ) == 0 ) {
            X = X*A;
        }
        A = A*A;
        p = p << 1;
    }
    A = X;
}

int main(void) {
    // Check/Test with provided, complex, matrix
    unsigned int n = 3; // size of matrix
    unsigned int k = 9; // power

    double PI = M_PI; // from math.h
    std::complex<double> I = std::complex<double>(0,1); // imaginary unit

    Eigen::MatrixXcd A(n,n);

    for (unsigned int i = 0; i < n; ++i) {
        for (unsigned int j = 0; j < n; ++j) {
            A(i,j) = exp(2. * PI * I * (double)(i+1) * (double)(j+1)
                         / (double)n) / sqrt((double)n);
        }
    }

    // // Test with simple matrix/simple power
    // Eigen::MatrixXd A(2,2);
    // k = 3;
    // A << 1,2,3,4;

    // Output results
    std::cout << "A = " << A << std::endl;
    std::cout << "Eigen:" << std::endl << "A^" << k << " = "
              << A.pow(k) << std::endl;
    matPow(A, k);
    std::cout << "Ours:" << std::endl << "A^" << k << " = "
              << A << std::endl;
}
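The mask test in matPow may look cryptic: (~k & p) == 0 holds exactly when the single bit selected by p is set in k, because ~k has a 0 in that position. A scalar sketch of the identity (hypothetical name bitSetViaMask):

```cpp
#include <cassert>

// True iff the (single) bit selected by the mask p is set in k:
// ~k clears exactly the bits that are set in k, so ~k & p vanishes
// precisely when p's bit is among them.
bool bitSetViaMask(unsigned int k, unsigned int p) {
    return (~k & p) == 0;
}
```

For k = 9 (1001 in binary), the masks p = 1 and p = 8 select set bits, while p = 2 and p = 4 do not.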
Problem 6    Complexity of a MATLAB function

In this problem we recall a concept from linear algebra, the diagonalization of a square matrix. If you can no longer define what this means, please look up the chapter on "eigenvalues" in your linear algebra lecture notes. This problem also has a subtle relationship with Problem 5.
We consider the MATLAB function defined in getit.m (cf. Listing 23).
Listing 23: MATLAB implementation of getit for Problem 6.
function y = getit(A, x, k)
[S,D] = eig(A);
y = S*diag(diag(D).^k)*(S\x);
end
HINT: Give the command doc eig in MATLAB to understand what eig does.
HINT: You may use that eig applied to an n × n matrix requires an asymptotic computational effort of O(n^3) for n → ∞.
HINT: in MATLAB, the function diag(x) for x ∈ R^n builds a diagonal n × n matrix with x as diagonal. If M is an n × n matrix, diag(M) returns (extracts) the diagonal of M as a vector in R^n.
HINT: the operator v.^k for v ∈ R^n and k ∈ N ∖ {0} returns the vector with components v_i^k (i.e. component-wise exponentiation).
(6a) What is the output of getit, when A is a diagonalizable n × n matrix, x ∈ R^n and k ∈ N?

Solution: The output is y = A^k x. The eigenvalue decomposition A = S D S^{-1} (where D is diagonal and S invertible) allows us to write

A^k = (S D S^{-1})^k = S D^k S^{-1},

and D^k can be computed efficiently (component-wise) for diagonal matrices.
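The identity A^k = S D^k S^{-1} can be checked by hand on a small example. The sketch below (plain C++, names hypothetical) applies the getit formula to the hand-chosen matrix A = [2 0; 1 3], whose eigenvalues are 2 and 3 with eigenvector matrix S = [1 0; -1 1], so no eigensolver is needed:

```cpp
#include <array>
#include <cmath>
#include <cassert>

using Mat2 = std::array<std::array<double,2>,2>;

// 2x2 matrix product.
Mat2 mul(const Mat2& X, const Mat2& Y) {
    Mat2 Z{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int l = 0; l < 2; ++l)
                Z[i][j] += X[i][l]*Y[l][j];
    return Z;
}

// A^k = S * diag(lam1^k, lam2^k) * S^{-1}: the core of getit, with the
// eigendecomposition supplied by hand instead of eig().
Mat2 getit2x2(double lam1, double lam2,
              const Mat2& S, const Mat2& Sinv, int k) {
    Mat2 Dk{{{std::pow(lam1,k), 0.0}, {0.0, std::pow(lam2,k)}}};
    return mul(mul(S, Dk), Sinv);
}
```

With S = [1 0; -1 1], S^{-1} = [1 0; 1 1] and k = 2 this reproduces A^2 = [4 0; 5 9], which matches squaring A directly.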
(6b) Fix k ∈ N. Discuss (in detail) the asymptotic complexity of getit as n → ∞.

Solution: The algorithm comprises the following operations:
• diagonalization of the matrix A: O(n^3);
• matrix-vector multiplication: O(n^2);
• raising a vector in R^n to the power k component-wise: O(n);
• solving a (uniquely solvable) linear system: O(n^3).
The complexity of the algorithm is dominated by the operations with the highest exponent. Therefore the total complexity of the algorithm is O(n^3) for n → ∞.
Issue date: 17.09.2015
Hand-in: 24.09.2015 (in the boxes in front of HG G 53/54).
Version compiled on: July 16, 2016 (v. 1.0).