Condition Numbers


Learning Objectives

Condition Number Definition

The condition number of a square nonsingular matrix $A$ is defined by $\text{cond}(A) = \kappa(A) = \|A\|\,\|A^{-1}\|$, which is also the condition number associated with solving the linear system $Ax = b$. A matrix with a large condition number is said to be ill-conditioned.

The condition number can be measured with any $p$-norm, so to be precise we typically specify the norm being used, i.e. $\text{cond}_2$, $\text{cond}_1$, $\text{cond}_\infty$.

If $A$ is singular ($A^{-1}$ does not exist), we define $\text{cond}(A) = \infty$ by convention.

The identity matrix is well conditioned. Assuming the inverse of $A$ exists, $\text{cond}(A) = \|A\|\,\|A^{-1}\| \geq \|A A^{-1}\| = \|I\| = 1$. This is the smallest possible condition number.
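As a minimal sketch (assuming NumPy; the example matrix is an arbitrary choice, not from the notes), `numpy.linalg.cond` computes these quantities directly:

```python
import numpy as np

# Illustrative 2x2 matrix (arbitrary choice).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# cond(A) = ||A||_p * ||A^{-1}||_p in the 2-, 1-, and infinity-norms.
cond2 = np.linalg.cond(A, 2)
cond1 = np.linalg.cond(A, 1)
cond_inf = np.linalg.cond(A, np.inf)

# The identity matrix attains the smallest possible condition number, 1.
cond_eye = np.linalg.cond(np.eye(3), 2)
print(cond1, cond2, cond_inf, cond_eye)
```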

Things to Remember About Condition Numbers

Perturbed Matrix Problem and Error Bound

Let $x$ be the solution of $Ax = b$ and $\hat{x}$ be the solution of the perturbed problem $A\hat{x} = b + \Delta b$. Let $\Delta x = \hat{x} - x$ be the absolute error in output. Then we have $Ax + A\Delta x = b + \Delta b$, so $A\Delta x = \Delta b$. Now we want to see how the relative error in output ($\|\Delta x\|/\|x\|$) is related to the relative error in input ($\|\Delta b\|/\|b\|$):

$$\frac{\|\Delta x\|/\|x\|}{\|\Delta b\|/\|b\|} = \frac{\|\Delta x\|\,\|b\|}{\|x\|\,\|\Delta b\|} = \frac{\|A^{-1}\Delta b\|\,\|Ax\|}{\|x\|\,\|\Delta b\|} \leq \frac{\|A^{-1}\|\,\|\Delta b\|\,\|A\|\,\|x\|}{\|x\|\,\|\Delta b\|} = \|A^{-1}\|\,\|A\| = \text{cond}(A)$$

where we used $\|Av\| \leq \|A\|\,\|v\|$ for all $v$.

Then

$$\frac{\|\Delta x\|}{\|x\|} \leq \text{cond}(A)\,\frac{\|\Delta b\|}{\|b\|} \qquad (1)$$

Therefore, if we know the relative error in input, then we can use the condition number of the system to obtain an upper bound for the relative error of our computed solution (output).
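Bound (1) can be checked numerically. The sketch below (assuming NumPy; the random system and perturbation are arbitrary illustrative choices) perturbs $b$ and compares the two relative errors:

```python
import numpy as np

# Arbitrary example system (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
x = rng.standard_normal(5)
b = A @ x

# Perturb the right-hand side and solve the perturbed system.
db = 1e-8 * rng.standard_normal(5)
x_hat = np.linalg.solve(A, b + db)

rel_err_out = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
rel_err_in = np.linalg.norm(db) / np.linalg.norm(b)

# Bound (1): the relative output error is at most cond(A) times
# the relative input error.
print(rel_err_out, np.linalg.cond(A, 2) * rel_err_in)
```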

Residual vs Error

The residual vector $r$ of an approximate solution $\hat{x}$ for the linear system $Ax = b$ is defined as $r = b - A\hat{x}$. In the perturbed matrix problem described above, we have

$$r = b - (b + \Delta b) = -\Delta b,$$ so $\|r\| = \|\Delta b\|$.

Therefore, equation (1) can also be written as

$$\frac{\|\Delta x\|}{\|x\|} \leq \text{cond}(A)\,\frac{\|r\|}{\|b\|}$$

If we define the relative residual as $\frac{\|r\|}{\|b\|}$, we can see that a small relative residual implies a small relative error in the approximate solution only if $A$ is well-conditioned ($\text{cond}(A)$ is small).
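To illustrate, the sketch below (assuming NumPy) solves a system built from the Hilbert matrix, a standard ill-conditioned example: the relative residual comes out near machine epsilon, while the relative error is many orders of magnitude larger.

```python
import numpy as np

# Hilbert matrix H[i, j] = 1 / (i + j + 1): a standard
# ill-conditioned example.
n = 12
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)

x_true = np.ones(n)
b = H @ x_true
x_hat = np.linalg.solve(H, b)

r = b - H @ x_hat
rel_residual = np.linalg.norm(r) / np.linalg.norm(b)
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

# The residual is near machine epsilon, but because cond(H) is huge,
# the error in the computed solution is many orders of magnitude larger.
print(f"cond(H)           ~ {np.linalg.cond(H):.1e}")
print(f"relative residual ~ {rel_residual:.1e}")
print(f"relative error    ~ {rel_error:.1e}")
```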

Alternative Definitions of Relative Residual

There are other closely related quantities that also have the name “relative residual”. Note that

$$\|\Delta x\| = \|\hat{x} - x\| = \|A^{-1}A\hat{x} - A^{-1}b\| = \|A^{-1}(A\hat{x} - b)\| = \|A^{-1}r\| \leq \|A^{-1}\|\,\|r\| = \frac{\|A^{-1}\|\,\|A\|\,\|r\|}{\|A\|} = \text{cond}(A)\,\frac{\|r\|}{\|A\|}$$

(here $A\hat{x} - b = -r$, and the norm is unaffected by the sign).

In summary,

$$\|\Delta x\| \leq \text{cond}(A)\,\frac{\|r\|}{\|A\|} \qquad (2)$$

We can divide this inequality by $\|x\|$ to obtain

$$\frac{\|\Delta x\|}{\|x\|} \leq \text{cond}(A)\,\frac{\|r\|}{\|A\|\,\|x\|}.$$

The quantity $\frac{\|r\|}{\|A\|\,\|x\|}$ is also known as the relative residual. This inequality is useful mathematically, but it involves the norm $\|x\|$ of the unknown solution, so it isn't a practical way to bound the relative error. Since $\|b\| = \|Ax\| \leq \|A\|\,\|x\|$, we have

$$\frac{\|r\|}{\|A\|\,\|x\|} \leq \frac{\|r\|}{\|b\|},$$

though the two quantities can be equal for certain choices of $b$.

We can also divide equation (2) by $\|\hat{x}\|$ to obtain

$$\frac{\|\Delta x\|}{\|\hat{x}\|} \leq \text{cond}(A)\,\frac{\|r\|}{\|A\|\,\|\hat{x}\|}.$$

The left-hand side is no longer the relative error (the norm of the approximate solution is in the denominator, not the exact solution), but the right-hand side can still provide a reasonable estimate of the relative error. It is also computable, since the norm of the true solution does not appear on the right-hand side.

For this reason, the quantity $\frac{\|r\|}{\|A\|\,\|\hat{x}\|}$ is also known as the relative residual. This is used in the next section to describe the relationship between the residual and errors in the matrix $A$.
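As a sketch (assuming NumPy; the example system is arbitrary, and $x_{\text{true}}$ appears only so the estimate can be compared with the true error), this computable quantity combined with $\text{cond}(A)$ gives a rough error estimate without knowing the exact solution:

```python
import numpy as np

# Arbitrary example system; x_true is known here only so we can
# compare the estimate against the actual error.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
x_true = rng.standard_normal(6)
b = A @ x_true
x_hat = np.linalg.solve(A, b)

# Computable relative residual ||r|| / (||A|| ||x_hat||):
# only the approximate solution x_hat is needed.
r = b - A @ x_hat
rel_res = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat))

# cond(A) * rel_res is a rough, fully computable error estimate.
estimate = np.linalg.cond(A, 2) * rel_res
print(rel_res, estimate)
```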

While it may be confusing that three different quantities all go by the name "relative residual," you should be able to determine which one is meant from context.

Gaussian Elimination (with Partial Pivoting) is Guaranteed to Produce a Small Residual

When we use Gaussian elimination with partial pivoting to compute the solution for the linear system $Ax = b$ and obtain an approximate solution $\hat{x}$, the residual vector $r$ satisfies:

$$\frac{\|r\|}{\|A\|\,\|\hat{x}\|} \leq \frac{\|E\|}{\|A\|} \leq c\,\epsilon_{\text{mach}}$$

where $E$ is the backward error in $A$ (defined by $(A + E)\hat{x} = b$), $c$ is a coefficient related to $A$, and $\epsilon_{\text{mach}}$ is machine epsilon.

Typically $c$ is small with partial pivoting, but $c$ can be arbitrarily large without pivoting.

Therefore, Gaussian elimination with partial pivoting yields a small relative residual regardless of the conditioning of the system.
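A sketch of this behavior (assuming NumPy, whose `solve` calls LAPACK's LU factorization with partial pivoting; the matrix construction below via a prescribed SVD is an illustrative choice):

```python
import numpy as np

# Build a matrix with cond(A) ~ 1e12 from a prescribed SVD
# (illustrative construction).
rng = np.random.default_rng(2)
n = 10
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -12, n)) @ V.T

b = rng.standard_normal(n)
x_hat = np.linalg.solve(A, b)   # LU with partial pivoting under the hood

# Despite the terrible conditioning, the relative residual sits near
# machine epsilon; the ratio below is the modest coefficient c.
r = b - A @ x_hat
rel_res = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat))
eps = np.finfo(np.float64).eps
print(rel_res / eps)
```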

For more details, see Gaussian Elimination & Roundoff Error.

Accuracy Rule of Thumb and Example

Suppose we apply Gaussian elimination with partial pivoting and back substitution to the linear system $Ax = b$ and obtain a computed solution $\hat{x}$. If the entries in $A$ and $b$ are accurate to $s$ decimal digits, and $\text{cond}(A) \approx 10^t$, then the elements of the solution vector $\hat{x}$ will be accurate to about $s - t$ decimal digits.

For a proof of this rule of thumb, please see Fundamentals of Matrix Computations by David S. Watkins.

Example: How many accurate decimal digits in the solution can we expect to obtain if we solve a linear system $Ax = b$ where $\text{cond}(A) = 10^{10}$ using Gaussian elimination with partial pivoting, assuming we are using IEEE double precision and the inputs are accurate to machine precision?

In IEEE double precision, $\epsilon_{\text{mach}} \approx 2.2 \times 10^{-16}$, which means the entries in $A$ and $b$ are accurate to $|\log_{10}(2.2 \times 10^{-16})| \approx 16$ decimal digits.

Then, using the rule of thumb, we know the entries in $\hat{x}$ will be accurate to about $16 - 10 = 6$ decimal digits.
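The rule of thumb can be checked numerically. The sketch below (assuming NumPy) uses the $8 \times 8$ Hilbert matrix, whose condition number is about $10^{10}$, so $t \approx 10$ and roughly $16 - 10 = 6$ accurate digits are expected:

```python
import numpy as np

# 8x8 Hilbert matrix: cond ~ 1.5e10, i.e. t ~ 10 in the rule of thumb.
n = 8
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)

x_true = np.ones(n)
b = H @ x_true
x_hat = np.linalg.solve(H, b)

t = np.log10(np.linalg.cond(H))
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
accurate_digits = -np.log10(rel_error)

# The rule of thumb predicts roughly 16 - t ~ 6 accurate digits;
# the observed count is typically in that neighborhood.
print(f"t ~ {t:.1f}, accurate digits ~ {accurate_digits:.1f}")
```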

Review Questions

ChangeLog