CS 440/ECE 448

Margaret Fleck

Here's a model of two variables for the University/Goodwin intersection:

                         E/W light
                   green    yellow   red
N/S light  green   0        0        0.2
           yellow  0        0        0.1
           red     0.5      0.1      0.1

To be a probability distribution, the numbers must add up to 1 (which they do in this example).

Most model-builders assume that probabilities aren't actually zero. That is, unobserved events do occur but they just happen so infrequently that we haven't yet observed one. So a more realistic model might be

                         E/W light
                   green    yellow   red
N/S light  green   e        e        0.2-f
           yellow  e        e        0.1-f
           red     0.5-f    0.1-f    0.1-f

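Here e is some small positive number. Four cells each gain e and five cells each lose f, so the total is 1 + 4e - 5f. To make this a proper probability distribution, we need to set f=(4/5)e so all the values add up to 1.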
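Here's a small sketch that checks this numerically. The function name `smoothed_joint` and the sample value e = 0.001 are assumptions made for illustration; they aren't part of the model above. Table keys are (N/S color, E/W color).

```python
import math


def smoothed_joint(e):
    """Build the smoothed table for a small probability e (hypothetical helper)."""
    f = 4 * e / 5  # four cells gain e, five cells lose f, so 4e = 5f
    return {
        ("green", "green"): e,
        ("green", "yellow"): e,
        ("green", "red"): 0.2 - f,
        ("yellow", "green"): e,
        ("yellow", "yellow"): e,
        ("yellow", "red"): 0.1 - f,
        ("red", "green"): 0.5 - f,
        ("red", "yellow"): 0.1 - f,
        ("red", "red"): 0.1 - f,
    }


table = smoothed_joint(0.001)  # 0.001 is an arbitrary illustrative value
total = sum(table.values())

# Compare with a tolerance, since floating-point sums are rarely exact.
print(math.isclose(total, 1.0))
print(all(p > 0 for p in table.values()))
```

Both checks should succeed for any e small enough that the entries 0.1-f stay positive.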

Suppose we are given a joint distribution like the one above, but we want to pay attention to only one variable. To get its distribution, we sum probabilities across all values of the other variable.

                         E/W light
                   green    yellow   red      marginals
N/S light  green   0        0        0.2      0.2
           yellow  0        0        0.1      0.1
           red     0.5      0.1      0.1      0.7
-------------------------------------------------------
marginals          0.5      0.1      0.4

So the marginal distribution of the N/S light is

P(green) = 0.2

P(yellow) = 0.1

P(red) = 0.7

To write this in formal notation, suppose Y has values \( y_1, \ldots, y_n \). Then we compute the marginal probability P(X=x) using the formula \( P(X=x) = \sum_{k=1}^n P(X=x, Y=y_k) \).
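As a minimal sketch, the summation can be done in a few lines of Python. The dictionary layout (keys are (N/S color, E/W color) pairs) and the variable names are choices made for this example.

```python
from collections import defaultdict

# Joint distribution for the intersection; keys are (N/S color, E/W color).
joint = {
    ("green", "green"): 0.0, ("green", "yellow"): 0.0, ("green", "red"): 0.2,
    ("yellow", "green"): 0.0, ("yellow", "yellow"): 0.0, ("yellow", "red"): 0.1,
    ("red", "green"): 0.5, ("red", "yellow"): 0.1, ("red", "red"): 0.1,
}

ns_marginal = defaultdict(float)
ew_marginal = defaultdict(float)
for (ns, ew), p in joint.items():
    ns_marginal[ns] += p  # sum over all E/W values
    ew_marginal[ew] += p  # sum over all N/S values

# Round for display only; floating-point sums can be off in the last digits.
for color in ("green", "yellow", "red"):
    print(color, round(ns_marginal[color], 3), round(ew_marginal[color], 3))
```

The two marginals are computed in the same pass because each joint entry contributes to exactly one row sum and one column sum.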

Suppose we know that the N/S light is red. What are the probabilities for the E/W light? Let's just extract that row of our joint distribution.

                         E/W light
                   green    yellow   red
N/S light  red     0.5      0.1      0.1

So we have a distribution that looks like this:

P(E/W=green | N/S = red) = 0.5

P(E/W=yellow | N/S = red) = 0.1

P(E/W=red | N/S = red) = 0.1

Oops, these three probabilities don't sum to 1. So this isn't a legit probability distribution (see Kolmogorov's Axioms above). To make them sum to 1, divide each one by the sum they currently have (which is 0.7). This gives us

P(E/W=green | N/S = red) = 0.5/0.7 = 5/7

P(E/W=yellow | N/S = red) = 0.1/0.7 = 1/7

P(E/W=red | N/S = red) = 0.1/0.7 = 1/7
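The same normalization step can be sketched in Python. This version uses exact fractions so the results print as 5/7 and 1/7 rather than rounded decimals; the variable names are just for illustration.

```python
from fractions import Fraction

# The N/S = red row of the joint distribution, keyed by E/W color.
red_row = {
    "green": Fraction("0.5"),
    "yellow": Fraction("0.1"),
    "red": Fraction("0.1"),
}

# The row total is the marginal probability P(N/S = red).
row_total = sum(red_row.values())

# Divide each entry by the row total to get P(E/W = color | N/S = red).
conditional = {ew: p / row_total for ew, p in red_row.items()}

for ew, p in conditional.items():
    print(ew, p)
```

The divisor 0.7 is exactly the marginal P(N/S = red), which is why this recipe matches dividing the joint probability by the probability of the context.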

Conditional probability models how frequently we see each variable value in some context (e.g. how often is the barrier-arm down if it's nighttime). The conditional probability of A in a context C is defined to be

P(A | C) = P(A,C)/P(C)

Many other useful formulas can be derived from this definition plus the basic formulas given above. In particular, we can transform this definition into

P(A,C) = P(C) * P(A | C)

P(A,C) = P(A) * P(C | A)

These formulas extend to multiple inputs like this:

P(A,B,C) = P(A) * P(B | A) * P(C | A,B)
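To see the two-variable versions in action, here's a sketch that checks both factorizations on every cell of the traffic light table. The helper function names are made up for this example, and exact fractions are used so the comparisons aren't disturbed by rounding.

```python
from fractions import Fraction as F

# Joint distribution; keys are (N/S color, E/W color).
joint = {
    ("green", "green"): F("0"), ("green", "yellow"): F("0"), ("green", "red"): F("0.2"),
    ("yellow", "green"): F("0"), ("yellow", "yellow"): F("0"), ("yellow", "red"): F("0.1"),
    ("red", "green"): F("0.5"), ("red", "yellow"): F("0.1"), ("red", "red"): F("0.1"),
}


def p_ns(ns):
    """Marginal probability of an N/S color."""
    return sum(p for (n, _), p in joint.items() if n == ns)


def p_ew(ew):
    """Marginal probability of an E/W color."""
    return sum(p for (_, e), p in joint.items() if e == ew)


def p_ew_given_ns(ew, ns):
    """P(E/W = ew | N/S = ns), straight from the definition."""
    return joint[(ns, ew)] / p_ns(ns)


def p_ns_given_ew(ns, ew):
    """P(N/S = ns | E/W = ew), straight from the definition."""
    return joint[(ns, ew)] / p_ew(ew)


checks = []
for (ns, ew), p in joint.items():
    checks.append(p == p_ns(ns) * p_ew_given_ns(ew, ns))  # P(A,C) = P(C) * P(A | C)
    checks.append(p == p_ew(ew) * p_ns_given_ew(ns, ew))  # P(A,C) = P(A) * P(C | A)

print(all(checks))
```

This only works because every marginal in this table is nonzero; the conditional probability is undefined when the context has probability zero.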

Two events A and B are independent iff

P(A,B) = P(A) * P(B)

This equation is equivalent to each of the following equations:

P(A | B) = P(A)

P(B | A) = P(B)

Exercise for the reader: why are these three equations all equivalent? Hint: use definition of conditional probability. Figure this out for yourself, because it will help you become familiar with the definitions.
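As a concrete check of the definition (not a solution to the exercise), here's a sketch that tests whether the two lights in the intersection model are independent. It compares every joint entry against the product of the corresponding marginals; the variable names are choices made for this example.

```python
from fractions import Fraction as F

colors = ("green", "yellow", "red")

# Joint distribution; keys are (N/S color, E/W color).
joint = {
    ("green", "green"): F("0"), ("green", "yellow"): F("0"), ("green", "red"): F("0.2"),
    ("yellow", "green"): F("0"), ("yellow", "yellow"): F("0"), ("yellow", "red"): F("0.1"),
    ("red", "green"): F("0.5"), ("red", "yellow"): F("0.1"), ("red", "red"): F("0.1"),
}

ns_marginal = {ns: sum(joint[(ns, ew)] for ew in colors) for ns in colors}
ew_marginal = {ew: sum(joint[(ns, ew)] for ns in colors) for ew in colors}

# Independence requires P(A,B) = P(A) * P(B) for every pair of values.
independent = all(
    joint[(ns, ew)] == ns_marginal[ns] * ew_marginal[ew]
    for ns in colors
    for ew in colors
)

print(independent)  # False
print(joint[("red", "red")], ns_marginal["red"] * ew_marginal["red"])  # 1/10 7/25
```

The lights are far from independent, as we'd hope: both lights are red with probability 0.1, but the product of the marginals is 0.7 * 0.4 = 0.28.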