FARKAS’ LEMMA
DERIVED BY ELEMENTARY LINEAR ALGEBRA
Krister Svanberg
Technical Report TRITA-MAT-2008-OS7
Department of Mathematics
Royal Institute of Technology (KTH)
SE-10044 Stockholm, SWEDEN
Abstract
This note presents a partially new proof of Farkas’ lemma, based on no other tools than
elementary linear algebra (matrix and vector calculus). No properties of the real numbers
other than those shared by the rational numbers are used. The general approach is the same
as in the paper by A. Dax from 1997 (SIAM Review, 39(3):503–507), but instead of using
an active set method for proving the existence of optimal solutions to sign-constrained linear
least squares problems (the hard part of the proof), we use a natural partitioning of the
feasible region into disjoint subregions, followed by a simple investigation of each subregion.
1  Introduction
The hundred-year-old Farkas’ lemma is a fundamental result for systems of linear inequalities
and an important tool in optimization theory, e.g., when deriving the Karush-Kuhn-Tucker
optimality conditions for inequality-constrained nonlinear programming and when proving
duality theorems for linear programming. The lemma can be stated as follows:
Assume that A is a given m×n matrix and b a given m-dimensional vector. Then exactly
one of the following two systems has a solution.
Either
    Ax = b, x ≥ 0
or
    bᵀy < 0, Aᵀy ≥ 0.
The variables in the two systems are, respectively, x = (x1, …, xn)ᵀ and y = (y1, …, ym)ᵀ.
Vector inequalities should be interpreted component-wise, so that Aᵀy ≥ 0 means that each
of the n components of the vector Aᵀy should be nonnegative.
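For readers who like to experiment, both systems are easy to test numerically. The following minimal sketch assumes NumPy and a floating-point tolerance tol (the lemma itself, as stressed below, holds in exact rational arithmetic, where no tolerance is needed):

    import numpy as np

    def solves_first_system(A, b, x, tol=1e-10):
        # Check Ax = b and x >= 0, component-wise, up to the tolerance tol.
        return np.allclose(A @ x, b, atol=tol) and np.all(x >= -tol)

    def solves_second_system(A, b, y, tol=1e-10):
        # Check b^T y < 0 and A^T y >= 0, component-wise, up to the tolerance tol.
        return b @ y < -tol and np.all(A.T @ y >= -tol)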
The traditional textbook proofs of Farkas’ lemma are based on some “advanced” properties
of the real numbers ℝ not shared by the rational numbers ℚ, e.g., the existence of a nearest
point in a closed subset of ℝᵐ to a given point outside the set. These traditional
proofs therefore assume that A ∈ ℝᵐˣⁿ, b ∈ ℝᵐ, x ∈ ℝⁿ and y ∈ ℝᵐ. However, Farkas’ lemma
still holds under the assumption that A ∈ ℚᵐˣⁿ, b ∈ ℚᵐ, x ∈ ℚⁿ and y ∈ ℚᵐ, and the
lemma can be proved without relying on any “advanced” arguments (in the above sense).
Such an elementary proof, valuable because it places Farkas’ lemma at its correct
mathematical level, was presented a decade ago by Dax [1].
The purpose of this note is to further simplify the proof in [1]. More precisely, we avoid
the active set method used in [1] for proving the existence of optimal solutions to
sign-constrained linear least squares problems (the hard part of the proof). Instead, we use a
natural partitioning of the feasible region {x ≥ 0} into 2ⁿ disjoint subregions, followed by
a simple investigation of each subregion. Based on our teaching experience, we believe that
this alternative proof has substantial pedagogical advantages.
2  A short proof of Farkas’ lemma
We repeat the formulation of Farkas’ lemma:
Theorem 1: Assume that A is a given m×n matrix and b a given m-dimensional vector.
Then exactly one of the following two systems (2.1) and (2.2) has a solution.
Either
    Ax = b, x ≥ 0.                                              (2.1)
or
    bᵀy < 0, Aᵀy ≥ 0.                                           (2.2)
Closely related to the first system (2.1) is the following linear least squares problem with
non-negativity constraints on the variables:
    minimize ½ ‖Ax − b‖²  subject to  x ≥ 0.                    (2.3)
As shown in [1], the proof of Theorem 1 can be based on the following two lemmas:
Lemma 1: There always exists an optimal solution x̂ of problem (2.3).
Lemma 2: The point x̂ is an optimal solution of problem (2.3) if and only if
    (1): x̂ ≥ 0,
    (2): Aᵀ(Ax̂ − b) ≥ 0,
    (3): x̂ᵀAᵀ(Ax̂ − b) = 0.
Proof of Theorem 1 (Farkas’ lemma):
First, assume that there is a solution x̂ of the system (2.1).
Then bᵀy = (Ax̂)ᵀy = x̂ᵀ(Aᵀy), which is ≥ 0 if Aᵀy ≥ 0.
Thus, there is no solution of the system (2.2).
Next, assume that there is no solution of the system (2.1).
Let x̂ be an optimal solution of problem (2.3) and let ŷ = Ax̂ − b.
Then ŷ ≠ 0, since there is no solution of (2.1).
By Lemma 2, x̂ ≥ 0, Aᵀŷ ≥ 0 and x̂ᵀAᵀŷ = 0, which implies that
bᵀŷ = (Ax̂ − ŷ)ᵀŷ = x̂ᵀAᵀŷ − ŷᵀŷ = 0 − ‖ŷ‖² < 0.
Thus, ŷ is a solution of the system (2.2).
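This proof is constructive and translates directly into code. The sketch below assumes NumPy and SciPy; scipy.optimize.nnls computes an optimal solution of problem (2.3), playing the role of Lemma 1 (SciPy uses an active-set-type method internally, not the subset argument of section 4):

    import numpy as np
    from scipy.optimize import nnls

    def farkas_certificate(A, b, tol=1e-10):
        # Solve problem (2.3): minimize ||Ax - b|| subject to x >= 0.
        xhat, rnorm = nnls(A, b)
        if rnorm <= tol:
            return ('x', xhat)       # Ax = b, x >= 0: system (2.1) is solvable
        yhat = A @ xhat - b          # by the proof above, yhat solves (2.2):
        return ('y', yhat)           # b^T yhat = -||yhat||^2 < 0 and A^T yhat >= 0

    A = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(farkas_certificate(A, np.array([1.0, -1.0])))   # b2 < 0 forces case 'y'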
It remains to prove Lemma 1 and Lemma 2. This will be done using no mathematical
tools other than elementary linear algebra, and of course some reasoning.
3  Linear least squares problems and a proof of Lemma 2
Before proving Lemma 2, we give an elementary proof of a well-known result for unconstrained
linear least squares problems. This is used later in the proof of Lemma 1.
Let f be the following quadratic function of the variable vector x = (x1, …, xn)ᵀ:

    f(x) = ½ ‖Ax − b‖² = ½ (Ax − b)ᵀ(Ax − b).                   (3.1)
Then, for all vectors x̂ and d,

    f(x̂ + d) = ½ ‖Ax̂ − b + Ad‖²
             = ½ ‖Ax̂ − b‖² + ½ (Ax̂ − b)ᵀAd + ½ (Ad)ᵀ(Ax̂ − b) + ½ ‖Ad‖²
             = f(x̂) + dᵀAᵀ(Ax̂ − b) + ½ ‖Ad‖².
With d = x − x̂, this identity can be written

    f(x) = f(x̂) + (x − x̂)ᵀAᵀ(Ax̂ − b) + ½ ‖A(x − x̂)‖²,          (3.2)

which thus holds for all n-dimensional vectors x and x̂.
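Since identity (3.2) is purely algebraic, it can be sanity-checked on random data. A small NumPy experiment (an illustration only, not part of the proof):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))
    b = rng.standard_normal(5)
    x = rng.standard_normal(3)
    xhat = rng.standard_normal(3)

    f = lambda z: 0.5 * np.linalg.norm(A @ z - b) ** 2
    rhs = (f(xhat) + (x - xhat) @ (A.T @ (A @ xhat - b))
           + 0.5 * np.linalg.norm(A @ (x - xhat)) ** 2)
    print(np.isclose(f(x), rhs))   # True for any choice of x and xhat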
Lemma 3: The point x̂ minimizes f(x), defined by (3.1), if and only if AᵀAx̂ = Aᵀb.
    There is a unique point x̂ which minimizes f(x) if and only if the columns
    of A are linearly independent.
Proof: Let ĝ = Aᵀ(Ax̂ − b). Then, by (3.2),

    f(x) − f(x̂) = (x − x̂)ᵀĝ + ½ ‖A(x − x̂)‖².                   (3.3)

Assume first that AᵀAx̂ = Aᵀb. Then ĝ = 0, which by (3.3) implies that
f(x) − f(x̂) = ½ ‖A(x − x̂)‖² ≥ 0 for all x, so that x̂ minimizes f(x).
Assume next that AᵀAx̂ ≠ Aᵀb. Then ĝ ≠ 0, and there is a scalar t
such that t > 0 and ½ t ‖Aĝ‖² < ‖ĝ‖², which by (3.3) implies that
f(x̂ − t ĝ) − f(x̂) = t (−‖ĝ‖² + ½ t ‖Aĝ‖²) < 0, so that x̂ does not minimize f(x).
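In floating-point arithmetic, Lemma 3 says that the unconstrained problem reduces to the normal equations. A minimal NumPy sketch with placeholder data:

    import numpy as np

    A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])   # independent columns
    b = np.array([1.0, 2.0, 3.0])

    xhat = np.linalg.solve(A.T @ A, A.T @ b)   # unique solution of A^T A xhat = A^T b
    ghat = A.T @ (A @ xhat - b)
    print(xhat, np.allclose(ghat, 0))          # ghat = 0 characterizes the minimizer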
Proof of Lemma 2:
With f(x) = ½ ‖Ax − b‖² and ĝ = Aᵀ(Ax̂ − b) as above, the problem (2.3) becomes:

    minimize f(x) subject to x ≥ 0,

while the three conditions in Lemma 2 become

    x̂ ≥ 0,  ĝ ≥ 0  and  x̂ᵀĝ = 0.                               (3.4)

First, assume that all the conditions (3.4) are fulfilled. Then (3.3) implies that
f(x) − f(x̂) = xᵀĝ − x̂ᵀĝ + ½ ‖A(x − x̂)‖² = xᵀĝ + ½ ‖A(x − x̂)‖² ≥ xᵀĝ ≥ 0 for all x ≥ 0,
which shows that x̂ is an optimal solution of problem (2.3).
Next, assume that x̂ ≥ 0 does not hold. Then x̂ is infeasible and hence not optimal to (2.3).
Next, assume that x̂ ≥ 0, but ĝk < 0 for some k.
Then there is a scalar t such that t > 0 and ½ t ‖Aek‖² < −ĝk,
where ek = (0, …, 1, …, 0)ᵀ with 1 in the kth position.
Now, x̂ + t ek ≥ 0 and, by (3.3), f(x̂ + t ek) − f(x̂) = t (ĝk + ½ t ‖Aek‖²) < 0,
which shows that x̂ is not an optimal solution of problem (2.3).
Finally, assume that x̂ ≥ 0 and ĝ ≥ 0, but x̂k > 0 and ĝk > 0 for some k.
Then there is a scalar t such that 0 < t < x̂k and ½ t ‖Aek‖² < ĝk.
Now, x̂ − t ek ≥ 0 and, by (3.3), f(x̂ − t ek) − f(x̂) = t (−ĝk + ½ t ‖Aek‖²) < 0,
which shows that x̂ is not an optimal solution of problem (2.3).
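The three conditions of Lemma 2 are straightforward to test at a candidate point. A minimal checker, assuming NumPy and a tolerance tol:

    import numpy as np

    def satisfies_lemma_2(A, b, xhat, tol=1e-10):
        # Test the three conditions of Lemma 2 at a candidate point xhat.
        ghat = A.T @ (A @ xhat - b)
        cond1 = np.all(xhat >= -tol)       # (1): xhat >= 0
        cond2 = np.all(ghat >= -tol)       # (2): A^T (A xhat - b) >= 0
        cond3 = abs(xhat @ ghat) <= tol    # (3): xhat^T ghat = 0 (complementarity)
        return bool(cond1 and cond2 and cond3)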
4  A proof of Lemma 1
In this section, we give an elementary proof of Lemma 1 from section 2. First some notation:
Assume that σ is a subset of the index set {1, . . . , n}.
Then let | σ | denote the number of indices in σ, let Aσ denote the m × | σ | matrix with
columns {aj }j∈σ , where aj is the jth column of A, and let xσ denote the | σ |-dimensional
vector with components {xj }j∈σ (in the same order as the columns in Aσ ).
Further, let the regions Xσ and Xσ+ be defined by

    Xσ  = { x = (x1, …, xn)ᵀ ; xj = 0 for j ∉ σ }  and          (4.1)
    Xσ+ = { x = (x1, …, xn)ᵀ ; xj > 0 for j ∈ σ, xj = 0 for j ∉ σ }.   (4.2)
Note that there are 2ⁿ different subsets σ of {1, …, n}. The corresponding 2ⁿ different regions
Xσ+ are pairwise disjoint, and their union is equal to the whole feasible region of problem
(2.3), since for each point x ≥ 0 there is a unique subset σ such that x ∈ Xσ+ (namely the
subset σ defined by j ∈ σ if xj > 0 and j ∉ σ if xj = 0).
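In code, the unique subset σ associated with a point x ≥ 0 is simply the support of x. A tiny sketch (0-based indices, whereas the text indexes components from 1):

    def support(x):
        # The unique sigma with x in X_sigma^+: indices of the strictly positive components.
        return frozenset(j for j, xj in enumerate(x) if xj > 0)

    print(support([2.0, 0.0, 0.5]))   # frozenset({0, 2})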
Let Pσ denote the following linear least squares problem with some variables fixed to zero:

    Pσ:  minimize f(x) = ½ ‖Ax − b‖²  subject to  x ∈ Xσ.       (4.3)

If σ = ∅ then Xσ+ = Xσ = {0}, and then x̂ = 0 is the unique optimal solution of Pσ.
If σ ≠ ∅ then there is a unique optimal solution of Pσ if and only if the columns of Aσ
are linearly independent. This follows from Lemma 3, since solving Pσ is equivalent to
letting xj = 0 for j ∉ σ and minimizing ½ ‖Aσxσ − b‖² with respect to xσ, i.e., solving
the system AσᵀAσxσ = Aσᵀb.
Definition: The subset σ is said to be an interesting subset if Pσ has a unique optimal
    solution x̂, and this unique optimal solution satisfies x̂ ∈ Xσ+. In this case,
    x̂ is said to be an interesting point.
    Otherwise, if Pσ does not have a unique optimal solution, or if Pσ has a unique
    optimal solution x̂ but x̂ ∉ Xσ+, the subset σ is said to be a boring subset. In
    this case, there is no interesting point corresponding to σ.
Since 0 is always an interesting point (corresponding to σ = ∅), and since at most one
interesting point corresponds to each subset σ, there is always at least one, and at most 2ⁿ,
interesting points. We will show that at least one of these interesting points is an optimal
solution of problem (2.3).
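This definition suggests a direct computational test. The sketch below (assuming NumPy; the function name classify and the explicit rank test are illustrative choices, not taken from the paper) solves Pσ through the restricted normal equations and classifies σ:

    import numpy as np

    def classify(A, b, sigma, tol=1e-10):
        # Solve P_sigma; return ('interesting', xhat) or ('boring', None).
        n = A.shape[1]
        cols = sorted(sigma)
        if not cols:                                   # sigma empty: xhat = 0 is optimal
            return ('interesting', np.zeros(n))
        Asig = A[:, cols]
        if np.linalg.matrix_rank(Asig) < len(cols):    # dependent columns: no unique solution
            return ('boring', None)
        xsig = np.linalg.solve(Asig.T @ Asig, Asig.T @ b)
        if np.any(xsig <= tol):                        # unique solution lies outside X_sigma^+
            return ('boring', None)
        xhat = np.zeros(n)
        xhat[cols] = xsig
        return ('interesting', xhat)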
Lemma 4: Assume that σ is an interesting subset and that x̂ is the corresponding
    interesting point. Then f(x̂) ≤ f(x) for all x ∈ Xσ+.
    Assume that σ is a boring subset. Then, for each given point x ∈ Xσ+, there is
    a strict subset σ̃ ⊂ σ and a point x̃ ∈ Xσ̃+ such that f(x̃) ≤ f(x).
Proof: If σ is an interesting subset and x̂ is the corresponding interesting point then f(x̂) ≤
f(x) for all x ∈ Xσ, and thus f(x̂) ≤ f(x) for all x ∈ Xσ+, since Xσ+ is a subset of Xσ.
Assume now that σ is a boring subset such that the columns of Aσ are linearly dependent.
Then there is a d ∈ Xσ , with at least one strictly negative component, such that Ad = 0.
Assume that x ∈ Xσ+. Let t̄ = min { xj /(−dj) ; dj < 0 }. Note that t̄ > 0.
Then x + t d ∈ Xσ+ for all t ∈ [0, t̄), while x + t̄ d ∈ Xσ̃+ for some strict subset σ̃ ⊂ σ,
since xj + t̄ dj = 0 for at least one j ∈ σ. Moreover, since Ad = 0,

    f(x + t d) = ½ ‖Ax − b + t Ad‖² = ½ ‖Ax − b‖² = f(x) for all t.

Thus, if we let x̃ = x + t̄ d then f(x̃) = f(x) and x̃ ∈ Xσ̃+ with σ̃ ⊂ σ.
Assume now that σ is a boring subset such that the columns of Aσ are linearly independent,
but x̂j ≤ 0 for at least one j ∈ σ, where x̂ is the unique optimal solution of Pσ .
Assume that x ∈ Xσ+. Let t̄ = min { xj /(xj − x̂j) ; xj > x̂j }. Note that t̄ ∈ (0, 1].
Then x + t (x̂ − x) ∈ Xσ+ for all t ∈ [0, t̄), while x + t̄ (x̂ − x) ∈ Xσ̃+ for some strict subset
σ̃ ⊂ σ, since xj + t̄ (x̂j − xj) = 0 for at least one j ∈ σ. Moreover, the quadratic function
f(x + t (x̂ − x)) of the single variable t is minimized uniquely at t = 1 (since x̂ uniquely
minimizes f over Xσ and the whole line lies in Xσ), and therefore
f(x + t (x̂ − x)) < f(x) for all t ∈ (0, 1]. In particular, f(x + t̄ (x̂ − x)) < f(x).
Thus, if we let x̃ = x + t̄ (x̂ − x) then f(x̃) < f(x) and x̃ ∈ Xσ̃+ with σ̃ ⊂ σ.
Definition: An interesting point x̂ is said to be a best interesting point if f(x̂) ≤ f(x)
    for all interesting points x.
Note that there always exists at least one best interesting point, since there is always at least
one, and at most 2ⁿ, interesting points.
Lemma 5: If x̂ is a best interesting point and σ is any interesting subset
    then f(x̂) ≤ f(x) for every x ∈ Xσ+.
Proof: Assume that x ∈ Xσ+, where σ is an interesting subset.
Since σ is an interesting subset, there is a corresponding interesting point x̄ ∈ Xσ+.
Then f(x) ≥ f(x̄) by Lemma 4, and f(x̄) ≥ f(x̂) since x̂ is a best interesting point.
Lemma 6: If x̂ is a best interesting point and σ is any boring subset
    then f(x̂) ≤ f(x) for every x ∈ Xσ+.
Proof: Assume that x ∈ Xσ+, where σ is a boring subset.
Then, by Lemma 4, there is a subset σ̃ ⊂ σ and a point x̃ ∈ Xσ̃+ such that f(x̃) ≤ f(x).
Note that |σ̃| ≤ |σ| − 1, so that there are fewer strictly positive components in x̃ than in x.
If σ̃ is an interesting subset then, by Lemma 5, f(x̂) ≤ f(x̃) ≤ f(x), and we are done.
Otherwise, if σ̃ is a boring subset, we repeat the above argument, but now starting from
the point x̃ ∈ Xσ̃+ instead of from the point x ∈ Xσ+. Since the number of strictly positive
components in the variable vector strictly decreases with each repetition, at most n repetitions
are needed before an interesting subset is reached (since σ = ∅ is an interesting
subset), whereafter Lemma 5 can be applied.
Proof of Lemma 1:
Let x̂ be a best interesting point, and assume that x ≥ 0.
Then there is a unique subset σ such that x ∈ Xσ+,
namely the subset σ defined by j ∈ σ if xj > 0 and j ∉ σ if xj = 0.
If σ is an interesting subset, it follows from Lemma 5 that f(x̂) ≤ f(x),
while if σ is a boring subset, it follows from Lemma 6 that f(x̂) ≤ f(x).
The conclusion is that x̂ is an optimal solution of problem (2.3).
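The entire argument compresses into a brute-force procedure: enumerate all 2ⁿ subsets σ, keep the interesting points, and return a best one. The sketch below (NumPy assumed; names are illustrative) is exponential in n and intended only to mirror the proof, not to compete with practical NNLS algorithms:

    import itertools
    import numpy as np

    def best_interesting_point(A, b, tol=1e-10):
        # Enumerate all 2^n subsets sigma; by the proof of Lemma 1, a best
        # interesting point is an optimal solution of problem (2.3).
        n = A.shape[1]
        f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
        best = np.zeros(n)                              # sigma = {} is always interesting
        for k in range(1, n + 1):
            for sigma in itertools.combinations(range(n), k):
                Asig = A[:, list(sigma)]
                if np.linalg.matrix_rank(Asig) < k:     # boring: dependent columns
                    continue
                xsig = np.linalg.solve(Asig.T @ Asig, Asig.T @ b)
                if np.any(xsig <= tol):                 # boring: solution leaves X_sigma^+
                    continue
                x = np.zeros(n)
                x[list(sigma)] = xsig
                if f(x) < f(best):
                    best = x
        return best

    A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
    b = np.array([3.0, -1.0])
    print(best_interesting_point(A, b))   # prints [3. 0. 0.], optimal for (2.3) here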
References
[1] A. Dax. An elementary proof of Farkas’ lemma. SIAM Rev., 39(3):503–507, 1997.