
A Quantum Statistic Algorithm for Missing Value
Estimation in Gene Expression Matrices
By
Wei Jing
University Scholars Program
National University of Singapore
&
Department of Computer Science
School of Computing
National University of Singapore
&
Department of Philosophy
Faculty of Arts and Social Sciences
National University of Singapore
Project Title: A Quantum Statistic Algorithm for Missing Value Estimation
in Gene Expression Matrices
Project Supervisor: Dr Kuldip Singh
Abstract
Gene expression matrices are essential for mining gene functions and information. However, expression matrices obtained from gene microarray experiments are sometimes incomplete, so missing value estimation becomes an important issue for recovering the gene expression matrices. Several statistical methods for estimating missing values have been developed, such as K-Nearest Neighbor (KNN) and Singular Value Decomposition (SVD). Unlike the existing methods, this essay introduces a quantum statistic algorithm (qStas) for approximating the missing values in gene expression matrices. The new algorithm is an integration of quantum theory and statistical methods.
Key words: quantum method, qStas algorithm, microarrays, missing value estimation.
Acknowledgements
- Dr Kuldip Singh, Lecturer of UIT2205: Quantum Computing
- Prof Wong Limsoon, Lecturer of CS2220: Introduction to Computational Biology
- UIT2205 and CS2220 classmates
- University Scholars Program, National University of Singapore
- Department of Computer Science, School of Computing, National University of Singapore
Table of Contents
Abstract
Acknowledgements
1 Introduction
2 Inspiration
  2.1 A Simplified Model of Gene Expression Matrix
  2.2 SVD Algorithm
  2.3 Quantum Theory
3 Quantum Statistic Algorithm
  3.1 Idea
  3.2 Workflow
  3.3 Analysis
4 Results and Discussion
5 Conclusion
Appendix A
References
1 Introduction
In bioinformatics, gene expression matrices, which contain gene expression values, are important for gene analysis. However, the expression matrices obtained from microarray experiments sometimes have a few missing values, for diverse reasons; for example, some cells of the microarrays might be covered by dust. Unfortunately, many gene analysis algorithms, like the C4.5 decision tree algorithm and K-clustering algorithms, require the expression matrices to be complete; otherwise, the accuracy and effectiveness of the algorithms are weakened. Recovering the expression matrices is thus a significant issue.
Bioinformatics, statistics and artificial intelligence researchers have developed several effective methods for approximating the missing expression values, for instance, Singular Value Decomposition (SVD) and K-Nearest Neighbor (KNN). Most of these are statistical methods, which make use of the relations between the rows and columns of the expression matrices.
This paper introduces a new algorithm, the quantum statistic algorithm (qStas for short), which integrates the idea of a statistical method (SVD) with concepts developed in quantum mechanics. The Inspiration section briefly describes a simplified model of the gene expression matrix, the SVD algorithm and quantum theory. After that, the Algorithm section discusses in detail the idea, workflow and analysis of the qStas algorithm. Experimental results are presented in the Results section.
2 Inspiration
2.1 A Simplified Model of Gene Expression Matrix
A gene expression matrix (denoted as A) contains the data obtained from gene
microarray experiments, where each row of the matrix contains the expression values of a
gene under different conditions and each column contains the expression values of
different genes under a condition. In other words, each entry A[i][j] represents the
expression value of the i-th gene under the j-th condition.
However, an actual expression matrix is usually huge, involving hundreds or thousands of genes and tens of conditions, and the values in different rows are often of different scales. To make the analysis easier, the experiments in this paper are conducted on a simplified model: a gene expression matrix with seven rows and four columns (Table 1), with the values in all columns tuned to a similar scale. However, if the algorithm works for the simplified model, it can also be applied to actual gene expression matrices.
          Condition 1   Condition 2   Condition 3   Condition 4
Gene 1       1.00          3.00          9.00         13.00
Gene 2       1.20          3.39         *(8.80)       12.59
Gene 3       1.50          4.00          8.49         12.00
Gene 4       2.00          5.01          8.01        *(11.00)
Gene 5       2.40          5.81          7.61        *(10.20)
Gene 6       3.00          7.01          6.99          9.02
Gene 7       4.00          8.99         *(6.00)        6.98

Table 1. A simplified gene expression matrix. The (i, j)-entry represents the expression value of the i-th gene under the j-th condition. Missing values are marked with *, with the corresponding true value in brackets.
2.2 SVD Algorithm
The Singular Value Decomposition (SVD) algorithm seeks mutually orthogonal gene expressions which can be linearly combined to approximate the expressions of all genes in the matrix. These mutually orthogonal genes are termed eigengenes, and they can be obtained from the decomposition

    A = U D Vᵀ

where A is the gene expression matrix containing the expression values of m genes under n conditions, U and V are orthogonal matrices, and D is a diagonal matrix. Each row of Vᵀ is an eigengene, whose corresponding eigenvalue is a diagonal entry of D.

Given A = U D Vᵀ, the following equation can be derived:

    AᵀA = (U D Vᵀ)ᵀ (U D Vᵀ) = V (DᵀD) Vᵀ

Thus V can be obtained by diagonalizing AᵀA. The SVD algorithm then chooses the k eigengenes in Vᵀ which correspond to the k largest eigenvalues in DᵀD. After that, it estimates the missing expression values by linearly regressing the genes against the k eigengenes, that is, by computing the least-squares solution X to

    A = U_J X        (0)

where U_J is a new matrix derived from U by choosing J < p eigengenes.¹
The workflow of the SVD algorithm is:

Step 1: Fill the missing entries with the average value of the corresponding row
Repeat:
    Step 2: Obtain the eigengenes and eigenvalues
    Step 3: Choose the k eigengenes which correspond to the k largest eigenvalues
    Step 4: Estimate the expression values by linearly regressing the genes against the k eigengenes
Until the estimated values converge (convergence test)
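The loop above can be sketched in code. The paper's experiments use MATLAB, but the same steps read naturally in Python/NumPy; the following is a simplified sketch, where the regression and convergence details are assumptions and the function name `svd_impute` is invented for illustration.

```python
import numpy as np

def svd_impute(A, mask, k=2, tol=1e-3, max_iter=100):
    """Sketch of the SVD-based workflow: fill missing entries (mask=True)
    with row averages, then repeatedly regress each incomplete gene on the
    k most significant eigengenes (rows of V^T) until estimates converge."""
    A = A.copy()
    for i in range(A.shape[0]):                   # Step 1: row-average fill
        if mask[i].any():
            A[i, mask[i]] = A[i, ~mask[i]].mean()
    prev = A[mask].copy()
    for _ in range(max_iter):
        _, _, Vt = np.linalg.svd(A, full_matrices=False)  # Step 2: eigengenes
        E = Vt[:k]                                # Step 3: k largest eigenvalues
        for i in range(A.shape[0]):               # Step 4: linear regression
            if mask[i].any():
                known = ~mask[i]
                # fit the gene on its known positions only
                coef, *_ = np.linalg.lstsq(E[:, known].T, A[i, known], rcond=None)
                A[i, mask[i]] = coef @ E[:, mask[i]]
        if np.abs(A[mask] - prev).max() < tol:    # convergence test
            break
        prev = A[mask].copy()
    return A
```

Applied to the Table 1 matrix (missing entries entered as `np.nan` and located with `np.isnan`), the sketch fills only the masked entries and leaves the known expression values untouched.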
2.3 Quantum Theory
Quantum theory claims that a state (represented by a vector) collapses to an eigenstate (an eigenvector) of a Hermitian operator (a Hermitian matrix) when a measurement is made:

    Q̂: |x⟩ → |e_i⟩

Let the eigenvectors of a Hermitian operator Q̂ be |e_1⟩, |e_2⟩, ..., |e_n⟩ and the corresponding eigenvalues be c_1, c_2, ..., c_n. The probability that a state |x⟩ collapses to eigenstate |e_i⟩ is given by

    P(|x⟩ → |e_i⟩) = ⟨x|e_i⟩⟨e_i|x⟩        (1)

where the sandwiched term |e_i⟩⟨e_i| is called a projector. The summation of all projectors is the identity matrix, since the eigenvectors are complete:

    Σ_{i=1}^{n} |e_i⟩⟨e_i| = I        (2)

And the operator can be written as the summation of the projectors weighted by the corresponding eigenvalues:

    Q̂ = Σ_{i=1}^{n} c_i |e_i⟩⟨e_i|        (3)

Equations (1), (2) and (3) are key concepts in quantum theory and are used for developing and analyzing the quantum statistic algorithm (qStas).

¹ For a detailed explanation of the SVD algorithm and the equation, please refer to Alter et al.'s and Troyanskaya et al.'s papers.
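These three identities are easy to check numerically. The following is a small Python/NumPy sketch with a made-up real symmetric operator (real symmetric matrices are the Hermitian matrices used throughout this paper):

```python
import numpy as np

# A made-up Hermitian (real symmetric) operator Q-hat
Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
c, e = np.linalg.eigh(Q)          # eigenvalues c_i, eigenvectors e[:, i]

# Equation (2): the projectors |e_i><e_i| sum to the identity
P = [np.outer(e[:, i], e[:, i]) for i in range(3)]
assert np.allclose(sum(P), np.eye(3))

# Equation (3): Q-hat is the eigenvalue-weighted sum of its projectors
assert np.allclose(sum(ci * Pi for ci, Pi in zip(c, P)), Q)

# Equation (1): for a normalized state |x>, the collapse probabilities
# <x|e_i><e_i|x> are non-negative and sum to 1
x = np.array([1.0, 2.0, 2.0]) / 3.0          # unit norm
probs = np.array([x @ Pi @ x for Pi in P])
assert np.all(probs >= 0) and np.isclose(probs.sum(), 1.0)
```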
3 Quantum Statistic Algorithm
3.1 Idea
Inspired by the SVD algorithm and quantum theory, the Quantum Statistic Algorithm (qStas) is a hybrid of the two. It adopts a similar schema to SVD: gradually adjusting the values of the missing entries and terminating the process by a convergence test. However, instead of choosing eigengenes and doing classical linear regression, i.e. Steps 2 to 4 using Equation (0), it adopts the quantum probability rule of Equation (1) to approximate the missing values.
The operator Q̂ in the qStas algorithm is simply AᵀA, whose Hermitian property can be proven. For any 1 ≤ i, j ≤ n, the (i, j) entry of Q̂ = AᵀA is

    Q̂[i][j] = Σ_{k=1}^{m} Aᵀ[i][k] A[k][j] = Σ_{k=1}^{m} A[k][i] A[k][j] = Σ_{k=1}^{m} Aᵀ[j][k] A[k][i] = Q̂[j][i]

Thus Q̂ = AᵀA is a symmetric matrix; in other words, Q̂ = AᵀA is a Hermitian matrix in which the imaginary component of every entry is 0.
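The symmetry of AᵀA can also be confirmed numerically; a one-line check with an arbitrary made-up real matrix (Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 4))   # any real m-by-n matrix, e.g. 7 genes x 4 conditions
Q = A.T @ A                       # the operator Q-hat = A^T A
assert np.allclose(Q, Q.T)        # symmetric: Hermitian with zero imaginary part
```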
The eigenvectors can be obtained by orthogonalizing Q̂. As in SVD, the eigenvectors of Q̂ are termed eigengenes, each of which corresponds to a column of an orthogonal matrix V.

Let |G_x⟩ be a gene, whose corresponding bra vector ⟨G_x| is represented as a row in the expression matrix. Let Q̂ = AᵀA be the Hermitian operator whose eigenstates are |e_1⟩, |e_2⟩, ..., |e_n⟩. If Q̂ is operated on the state (gene) |G_x⟩, the probability that |G_x⟩ collapses to an eigenstate (eigengene) |e_i⟩ is ⟨G_x|e_i⟩⟨e_i|G_x⟩ by Equation (1).

Therefore, the expected state that |G_x⟩ collapses to when operated on by Q̂ is

    Exp(Q, |G_x⟩) = Σ_{i=1}^{n} ⟨G_x|e_i⟩⟨e_i|G_x⟩ |e_i⟩        (4)

obtained by accumulating over all eigenstates and the corresponding probabilities (this is slightly different from the quantum mechanics tradition, which uses the eigenvalues as the possible outcomes). However, Equation (4) exhibits a problem: the probabilities do not sum up to 1. This is because |G_x⟩ is not normalized (the norm of |G_x⟩ is not guaranteed to be 1). A solution is to normalize |G_x⟩ before projecting it onto the eigenstates, and to restore its norm after all the projections.

Let |Ĝ_x⟩ = |G_x⟩ / ‖|G_x⟩‖. The expected state that |G_x⟩ collapses to can then be given by

    Exp(Q, |G_x⟩) = ‖|G_x⟩‖ Σ_{i=1}^{n} ⟨Ĝ_x|e_i⟩⟨e_i|Ĝ_x⟩ |e_i⟩        (5)

By substituting in |Ĝ_x⟩ = |G_x⟩ / ‖|G_x⟩‖, Equation (5) can be simplified to

    Exp(Q, |G_x⟩) = (1 / ‖|G_x⟩‖) Σ_{i=1}^{n} ⟨G_x|e_i⟩⟨e_i|G_x⟩ |e_i⟩        (6)

The qStas algorithm relies on Equation (6) to approximate the missing values, by operating the Hermitian operator Q̂ on every gene |G_x⟩ which has expression values initially missing. Like the SVD algorithm, qStas also starts by filling the missing entries with the corresponding row averages, and terminates when the estimated values converge.
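The normalization issue that motivates Equations (5) and (6) can be seen numerically. A small Python/NumPy sketch with a made-up matrix (the weights of Equation (4) sum to ‖G_x‖², not 1, and the forms (5) and (6) agree):

```python
import numpy as np

A = np.array([[1.0, 3.0], [2.0, 5.0], [4.0, 9.0]])   # made-up 3x2 matrix
Q = A.T @ A                                          # Hermitian operator
_, e = np.linalg.eigh(Q)                             # eigengenes as columns

g = A[0]                                             # a gene row, |Gx>
weights = (g @ e) ** 2                               # <Gx|e_i><e_i|Gx>
# Equation (4)'s weights sum to ||Gx||^2, not 1, when |Gx> is unnormalized
assert np.isclose(weights.sum(), np.linalg.norm(g) ** 2)

# Normalizing first, as in Equation (5), gives a probability distribution
gn = g / np.linalg.norm(g)
assert np.isclose(((gn @ e) ** 2).sum(), 1.0)

# Equations (5) and (6) produce the same expected state
exp5 = np.linalg.norm(g) * (e @ ((gn @ e) ** 2))
exp6 = (e @ ((g @ e) ** 2)) / np.linalg.norm(g)
assert np.allclose(exp5, exp6)
```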
3.2 Workflow
The workflow of the qStas algorithm is similar to that of SVD, except that it uses Equation (6), derived from quantum theory, to perform the approximation.

Step 1: Fill the missing values with the corresponding row averages
Repeat:
    Step 2: Obtain the Hermitian operator Q̂ and the eigenstates (eigengenes)
    Step 3: Operate Q̂ on the genes which have missing values, using Equation (6)
    Step 4: Replace the missing values with the newly estimated values, while the other entries remain unchanged
Until the estimated values converge

*Note: missing values stand for values that are missing in the initial matrix.
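The workflow can be sketched end to end. This is a minimal Python/NumPy reading of the pseudocode above, not the original MATLAB: the names `qstas` and `expected_state` are invented here, and since numerical eigensolvers choose eigenvector signs arbitrarily, the trajectory of the estimates may differ from the run reported in Section 4.

```python
import numpy as np

def expected_state(Q, g):
    """Equation (6): Exp(Q, |Gx>) = (1/||Gx||) * sum_i <Gx|e_i><e_i|Gx> |e_i>."""
    _, e = np.linalg.eigh(Q)          # eigengenes as columns of an orthogonal matrix
    weights = (g @ e) ** 2            # <Gx|e_i><e_i|Gx> for each eigengene
    return (e @ weights) / np.linalg.norm(g)

def qstas(A, mask, tol=1e-3, max_iter=50):
    """Sketch of the qStas workflow: row-average initialization, then
    repeated application of Q-hat = A^T A to the incomplete genes."""
    A = A.copy()
    for i in range(A.shape[0]):       # Step 1: fill with row averages
        if mask[i].any():
            A[i, mask[i]] = A[i, ~mask[i]].mean()
    for _ in range(max_iter):
        Q = A.T @ A                   # Step 2: operator and eigengenes
        new = A.copy()
        for i in range(A.shape[0]):   # Step 3: operate on genes with missing values
            if mask[i].any():
                est = expected_state(Q, A[i])
                new[i, mask[i]] = est[mask[i]]   # Step 4: replace missing entries only
        if np.abs(new[mask] - A[mask]).max() < tol:  # convergence test
            return new
        A = new
    return A
```

One sanity property of Equation (6): a unit-norm eigengene is mapped to itself, since all of its collapse weight falls on a single eigenstate.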
The algorithm can also be treated as a black box which contains a Hermitian operator Q̂ and an evaluation function f.

[Figure: black-box view of qStas. An expression matrix A enters the box; if the estimates converge, the output matrix A' leaves the box; otherwise A' is fed back in.]

The black box shown above simulates the behavior of the qStas algorithm. It:
1) receives a gene expression matrix A;
2) operates Q̂ on A and evaluates the resulting matrix A' using f;
3) outputs A' if the estimated values converge;
4) otherwise feeds A' back into the box and goes to step 1).
3.3 Analysis
Optimality and termination are considered when analyzing whether a general algorithm is effective. For an estimation algorithm, optimality, or accuracy, depends on its approximation method; hence the accuracy of qStas relies on Equation (6), which is derived from quantum theory. Therefore, qStas would be a plausible estimation algorithm if it terminates after a finite number of steps. The following section proves its termination using a physical model.
3.3.1 Physical model of the qStas algorithm
In this model, the eigengenes and the genes with missing values are modeled as electric bars lying in an n-dimensional space. Every electric bar has one end fixed at the origin. The electric bars of the eigengenes are always orthogonal to each other, and they repel the electric bars of the genes. For simplicity, we explain this with a case in a 2-dimensional space, i.e. a plane, but the argument can theoretically be analogized to higher dimensions.

Let |NG_x⟩ = |G_x⟩ / ‖|G_x⟩‖ be the normalized form of gene |G_x⟩. Let p = ⟨NG_x|e_i⟩ be the projection of |NG_x⟩ on eigengene |e_i⟩. Clearly, ‖p‖ ≤ 1. After projecting |NG_x⟩ onto |e_i⟩⟨e_i|,

    ⟨NG_x|e_i⟩⟨e_i|NG_x⟩ |e_i⟩ = p² |e_i⟩

the new projection of |NG_x⟩ on |e_i⟩ is p' = p² ⟨e_i|e_i⟩ = p² ≤ ‖p‖.

Suppose p ≥ 0. Since the projection of |NG_x⟩ is decreased, the angle between |NG_x⟩ and |e_i⟩ is enlarged if |NG_x⟩ keeps a similar length. In the p < 0 case, the angle between |NG_x⟩ and -|e_i⟩ is enlarged. In summary, |e_i⟩ or -|e_i⟩ repels the genes such that the angle in between tends to 90°.

[Figure: the projection of |NG_x⟩ on |e_i⟩ shrinks from p to p², enlarging the angle between them.]

Therefore, the eigengenes make the genes rotate in each operation, while also gently adjusting themselves as a whole after the operation. Classical physics has proven that a perpetual motion machine does not exist. Thus the electric bars, as a closed system (the operators are generated from the system itself), must come to rest after a finite number of rotations. In other words, the estimated values converge and the algorithm terminates after a finite number of operations.
3.3.2 Time complexity
The running time of qStas depends on two factors: the size of the matrix and the number of iterations before termination. The most expensive matrix operation in each iteration is computing AᵀA, which takes O(mn²) time for an m×n matrix and dominates the O(n³) eigendecomposition when m ≥ n. The overall time complexity is therefore O(kmn²), where k is the number of operations performed before the estimated values converge. This is of the same order as the SVD algorithm.
4 Results and Discussion
The qStas algorithm is tested on a simplified gene expression matrix, as shown in Table 2. Matrix operations are conducted using MATLAB 7.0.1, and the discrepancy threshold for determining convergence is set to 10⁻³.
          Condition 1   Condition 2   Condition 3   Condition 4
Gene 1       1.00          3.00          9.00         13.00
Gene 2       1.20          3.39         *(8.80)       12.59
Gene 3       1.50          4.00          8.49         12.00
Gene 4       2.00          5.01          8.01        *(11.00)
Gene 5       2.40          5.81          7.61        *(10.20)
Gene 6       3.00          7.01          6.99          9.02
Gene 7       4.00          8.99         *(6.00)        6.98

Table 2. A simplified gene expression matrix. The (i, j)-entry represents the expression value of the i-th gene under the j-th condition. Missing values are marked with *, with the corresponding true value in brackets.
In the initiation step, the missing values are filled with the corresponding row averages: A[2][3]=5.73, A[4][4]=5.00, A[5][4]=5.27 and A[7][3]=6.66. After that, by orthogonalizing Q̂ = AᵀA, the eigengenes |e_1⟩, |e_2⟩, |e_3⟩ and |e_4⟩ are obtained, each corresponding to a column of the orthogonal matrix

    V1 = ( 0.9017  -0.2243   0.3346  0.1569
          -0.4308  -0.4124   0.7024  0.3885
           0.0346   0.8056   0.1831  0.5623
           0.0091  -0.3614  -0.6009  0.7129 )

After the first operation of Q̂ on the genes using Equation (6), the missing values are estimated to be A[2][3]=7.7107, A[4][4]=6.4547, A[5][4]=6.6801 and A[7][3]=7.2017. Compared to the initial values, three of the estimates (A[2][3], A[4][4], A[5][4]) get closer to the true values, while one (A[7][3]) crosses over the expected value. On average, however, the missing values are approaching the corresponding expected values.
After replacing the missing values with the estimated values, the expression matrix A is fed back for a second operation. This time, the eigengenes obtained are

    V2 = ( 0.8993  -0.1854   0.3657  0.1519
          -0.4349  -0.3010   0.7606  0.3764
           0.0449   0.8162   0.0655  0.5723
           0.0019  -0.4570  -0.5324  0.7125 )

By operating the new Hermitian operator on A, new estimates are obtained: A[2][3]=8.3673, A[4][4]=7.6975, A[5][4]=7.8773 and A[7][3]=7.1293. In this round, all the missing values change towards the true values. However, they have not yet converged.
After 8 operations, the approximated values are already very close to the expected values: A[2][3]=8.2280 (8.8000), A[4][4]=10.6039 (11.0000), A[5][4]=10.8094 (10.2000) and A[7][3]=6.3903 (6.0000). Although the results have still not converged, the rate of change is smaller than in the previous operations.
Another important observation is that the eigengenes in one round become very similar to those in the previous round. For example, the eigengenes in the ninth iteration are

    V9 = ( 0.9033   0.0774  -0.3967  0.1435
          -0.4281   0.1400  -0.8188  0.3560
           0.0213  -0.8360   0.0816  0.5421
           0.0150   0.5248   0.4069  0.7475 )

and the eigengenes in the eighth iteration are

    V8 = ( 0.9032   0.0808  -0.3963  0.1438
          -0.4284   0.1431  -0.8178  0.3566
           0.0228  -0.8349   0.0793  0.5442
           0.0141   0.5253   0.4097  0.7456 )
This implies that the eigengenes, as well as the estimates of the missing values, are converging. The intermediate expression matrices for the test example can be found in Appendix A.
Unsurprisingly, the estimated values converge at the 14th operation, where the estimates are A[2][3]=8.0838, A[4][4]=10.9689, A[5][4]=11.1889 and A[7][3]=6.2527. Compared to the 13th operation, where A[2][3]=8.0937, A[4][4]=10.9443, A[5][4]=11.1632 and A[7][3]=6.2621, the discrepancy is only 0.09%, which is smaller than the threshold 10⁻³. Compared to the true values, the root-mean-square difference of the estimates is 0.6236, and the coefficient of variation is 6.93%. Thus the precision of the estimation is 93.07%, which is very high for an approximation algorithm. Given its O(kmn²) time complexity, the qStas algorithm is also scalable to gene expression matrices of actual size.
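The reported error figures can be reproduced from the estimates above. A quick Python check (taking the coefficient of variation as the RMS difference divided by the mean of the true values, which matches the reported numbers):

```python
import numpy as np

estimated = np.array([8.0838, 10.9689, 11.1889, 6.2527])   # 14th operation
true_vals = np.array([8.8000, 11.0000, 10.2000, 6.0000])   # Table 2 true values

rms = np.sqrt(np.mean((estimated - true_vals) ** 2))       # root-mean-square difference
cv = rms / true_vals.mean()                                # coefficient of variation

print(round(rms, 4))       # 0.6236
print(round(cv * 100, 2))  # 6.93, i.e. precision 100 - 6.93 = 93.07%
```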
5 Conclusion
Missing value estimation in expression matrices is an important issue for gene analysis. The quantum statistic algorithm (qStas) adopts a probability rule from quantum mechanics to approximate the missing expression values. Its termination has been proven using a classical physical model, and its time complexity is O(kmn²). The new algorithm works effectively on the simplified model and produces estimates of high precision. Predictably, it would also work well when applied to actual gene expression matrices of greater size.
Quantum mechanics has developed a set of state-of-the-art theorems and methods. These methods are useful not only for quantum physics itself, but also for other disciplines like statistics, computer science and analytic philosophy. The qStas algorithm developed in this paper is an example of applying quantum theory in bioinformatics.
Appendix A
The intermediate results of operating qStas on the simplified gene expression matrix. For operations 1…9, both the eigengenes and the estimated expression matrices are recorded; for operations 10…14, only the estimated matrices are recorded. Since each intermediate matrix Ai differs from the expected matrix Exp only in the four initially missing entries, the matrices A1…A14 are summarized by those four entries.

Expected expression matrix Exp =
    1.0000   3.0000   9.0000  13.0000
    1.2000   3.3900   8.8000  12.5900
    1.5000   4.0000   8.4900  12.0000
    2.0000   5.0100   8.0100  11.0000
    2.4000   5.8100   7.6100  10.2000
    3.0000   7.0100   6.9900   9.0200
    4.0000   8.9900   6.0000   6.9800

Original matrix A0 (missing entries filled with row averages) =
    1.0000   3.0000   9.0000  13.0000
    1.2000   3.3900   5.7300  12.5900
    1.5000   4.0000   8.4900  12.0000
    2.0000   5.0100   8.0100   5.0000
    2.4000   5.8100   7.6100   5.2700
    3.0000   7.0100   6.9900   9.0200
    4.0000   8.9900   6.6600   6.9800

Estimated values of the initially missing entries after each operation:

    Operation   A[2][3]   A[4][4]   A[5][4]   A[7][3]
    A1           7.7107    6.4547    6.6801    7.2017
    A2           8.3673    7.6975    7.8773    7.1293
    A3           8.5241    8.6301    8.7796    6.8609
    A4           8.5132    9.3985    9.5690    6.7294
    A5           8.4405    9.8881   10.0715    6.6179
    A6           8.3584   10.2158   10.4086    6.5233
    A7           8.2860   10.4429   10.6428    6.4479
    A8           8.2280   10.6039   10.8094    6.3903
    A9           8.1836   10.7198   10.9296    6.3473
    A10          8.1503   10.8040   11.0171    6.3155
    A11          8.1255   10.8656   11.0812    6.2921
    A12          8.1072   10.9109   11.1284    6.2748
    A13          8.0937   10.9443   11.1632    6.2621
    A14          8.0838   10.9689   11.1889    6.2527

Eigengene matrices for operations 1…9:

    V1 = ( 0.9017  -0.2243   0.3346  0.1569
          -0.4308  -0.4124   0.7024  0.3885
           0.0346   0.8056   0.1831  0.5623
           0.0091  -0.3614  -0.6009  0.7129 )

    V2 = ( 0.8993  -0.1854   0.3657  0.1519
          -0.4349  -0.3010   0.7606  0.3764
           0.0449   0.8162   0.0655  0.5723
           0.0019  -0.4570  -0.5324  0.7125 )

    V3 = ( 0.8994  -0.1539   0.3809  0.1490
          -0.4345  -0.2299   0.7886  0.3693
           0.0468   0.8209  -0.0014  0.5691
          -0.0003  -0.4995  -0.4826  0.7194 )

    V4 = ( 0.9001   0.1303  -0.3888  0.1471
          -0.4333   0.1827  -0.8036  0.3648
           0.0460  -0.8245   0.0430  0.5623
          -0.0003   0.5194   0.4485  0.7274 )

    V5 = ( 0.9009   0.1134  -0.3927  0.1457
          -0.4320   0.1604  -0.8107  0.3613
           0.0413  -0.8280   0.0618  0.5557
           0.0025   0.5251   0.4299  0.7345 )

    V6 = ( 0.9020   0.0989  -0.3946  0.1448
          -0.4304   0.1512  -0.8142  0.3591
           0.0337  -0.8310   0.0708  0.5508
           0.0073   0.5262   0.4199  0.7394 )

    V7 = ( 0.9028   0.0875  -0.3956  0.1442
          -0.4291   0.1466  -0.8164  0.3576
           0.0267  -0.8332   0.0760  0.5470
           0.0117   0.5259   0.4138  0.7430 )

    V8 = ( 0.9032   0.0808  -0.3963  0.1438
          -0.4284   0.1431  -0.8178  0.3566
           0.0228  -0.8349   0.0793  0.5442
           0.0141   0.5253   0.4097  0.7456 )

    V9 = ( 0.9033   0.0774  -0.3967  0.1435
          -0.4281   0.1400  -0.8188  0.3560
           0.0213  -0.8360   0.0816  0.5421
           0.0150   0.5248   0.4069  0.7475 )
References
Alter, O., Brown, P. O., & Botstein, D. (2000). Singular Value Decomposition for Genome-wide Expression Data Processing and Modeling. Proc. Natl Acad. Sci. USA, 97, 10101-10106.
Li, J., & Wong, L. (2004). Techniques for Analysis of Gene Expression Data. World Scientific review volume.
Singh, K. (2011). UIT2205 Quantum Computing Lecture Notes. National University of Singapore.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., & Altman, R. B. (2001). Missing Value Estimation Methods for DNA Microarrays. Bioinformatics, 17(6), 520-525.
Yu, T., Peng, H., & Sun, W. (2011). Incorporating Nonlinear Relationships in Microarray Missing Value Imputation. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3).