Condition Numbers


Learning Objectives

Condition Number Definition

The condition number of a square nonsingular matrix AA is defined by cond(A)=κ(A)=AA1cond(A)=κ(A)=AA1 which is also the condition number associated with solving the linear system A\boldsymbolx=\boldsymbolbA\boldsymbolx=\boldsymbolb. A matrix with a large condition number is said to be ill-conditioned.

The condition number can be measured with any pp-norm, so to be precise we typically specify the norm being used, i.e. cond2cond2, cond1,condcond1,cond.

If AA is singular, we can define cond(A)=cond(A)= by convention.

Perturbed Matrix Problem and Error Bound

Let \boldsymbolx\boldsymbolx be the solution of A\boldsymbolx=\boldsymbolbA\boldsymbolx=\boldsymbolb and ^\boldsymbolx^\boldsymbolx be the solution of the perturbed problem A^\boldsymbolx=\boldsymbolb+Δ\boldsymbolbA^\boldsymbolx=\boldsymbolb+Δ\boldsymbolb. Let Δ\boldsymbolx=^\boldsymbolx\boldsymbolxΔ\boldsymbolx=^\boldsymbolx\boldsymbolx be the absolute error in output. Then we have A\boldsymbolx+AΔ\boldsymbolx=\boldsymbolb+Δ\boldsymbolb,A\boldsymbolx+AΔ\boldsymbolx=\boldsymbolb+Δ\boldsymbolb, so AΔ\boldsymbolx=Δ\boldsymbolb.AΔ\boldsymbolx=Δ\boldsymbolb. Now we want to see how the relative error in output (Δ\boldsymbolx\boldsymbolx)(Δ\boldsymbolx\boldsymbolx) is related to the relative error in input (Δ\boldsymbolb\boldsymbolb)(Δ\boldsymbolb\boldsymbolb):

Δ\boldsymbolx/\boldsymbolxΔ\boldsymbolb/\boldsymbolb=Δ\boldsymbolx\boldsymbolb\boldsymbolxΔ\boldsymbolb=A1Δ\boldsymbolbA\boldsymbolx\boldsymbolxΔ\boldsymbolbA1Δ\boldsymbolbA\boldsymbolx\boldsymbolxΔ\boldsymbolb=A1A=cond(A)

where we used A\boldsymbolxA\boldsymbolx,\boldsymbolx

Then

Δ\boldsymbolx\boldsymbolxcond(A)Δ\boldsymbolb\boldsymbolb(1)

Therefore, if we know the relative error in input, then we can use the condition number of the system to obtain an upper bound for the relative error of our computed solution (output).

Residual vs Error

The residual vector \boldsymbolr of approximate solution ^\boldsymbolx for the linear system A\boldsymbolx=\boldsymbolb is defined as \boldsymbolr=\boldsymbolbA^\boldsymbolx. In the perturbed matrix problem described above, we have

\boldsymbolr=\boldsymbolb(\boldsymbolb+Δ\boldsymbolb)=Δ\boldsymbolb

Therefore, equation (1) can also be written as

Δ\boldsymbolx\boldsymbolxcond(A)\boldsymbolr\boldsymbolb

If we define relative residual as \boldsymbolr\boldsymbolb, we can see that small relative residual implies small relative error in approximate solution only if A is well-conditioned (cond(A) is small).

Gaussian Elimination (with Partial Pivoting) is Guaranteed to Produce a Small Residual

When we use Gaussian elimination with partial pivoting to compute the solution for the linear system A\boldsymbolx=\boldsymbolb and obtain an approximate solution ^\boldsymbolx, the residual vector \boldsymbolr satisfies:

\boldsymbolrA^\boldsymbolxEAcϵmach

where E is backward error in A (which is defined by (A+E)^\boldsymbolx=\boldsymbolb), c is a coefficient related to A and ϵmach is machine epsilon.

Typically c is small with partial pivoting, but c can be arbitrarily large without pivoting.

Therefore, Gaussian elimination with partial pivoting yields small relative residual regardless of conditioning of the system.

For more details, see Gaussian Elimination & Roundoff Error.

Accuracy Rule of Thumb and Example

Suppose we apply Gaussian elimination with partial pivoting and back substitution to the linear system A\boldsymbolx=\boldsymbolb and obtain a computed solution ^\boldsymbolx. If the entries in A and \boldsymbolb are accurate to s decimal digits, and cond(A)10t, then the elements of the solution vector ^\boldsymbolx will be accurate to about st decimal digits.

For a proof of this rule of thumb, please see Fundamentals of Matrix Computations by David S. Watkins.

Example: How many accurate decimal digits in the solution can we expect to obtain if we solve a linear system A\boldsymbolx=\boldsymbolb where cond(A)=1010 using Gaussian elimination with partial pivoting, assuming we are using IEEE double precision and the inputs are accurate to machine precision?

In IEEE double precision, ϵmach2.2×1016, which means the entries in A and \boldsymbolb are accurate to |log10(2.2×1016)|16 decimal digits.

Then, using the rule of thumb, we know the entries in ^\boldsymbolx will be accurate to about 1610=6 decimal digits.

Review Questions

ChangeLog