Bayes’s Theorem

import pandas as pd
import numpy as np

from utils import values

Review

In the previous notebook I defined probability, conjunction, and conditional probability, and used data from the General Social Survey (GSS) to compute the probability of various logical propositions.

To review, here’s how we loaded the dataset:

gss = pd.read_csv('gss_bayes.csv', index_col=0)

And here are the logical propositions we defined, represented using Boolean series.

banker = (gss['indus10'] == 6870)
female = (gss['sex'] == 2)
liberal = (gss['polviews'] < 4)
democrat = (gss['partyid'] <= 1)

I defined the following function, which uses mean to compute the fraction of True values in a Boolean series.

def prob(A):
    """Computes the probability of a proposition, A.
    
    A: Boolean series
    
    returns: probability
    """
    assert isinstance(A, pd.Series)
    assert A.dtype == 'bool'
    
    return A.mean()

So we can compute the probability of a proposition like this:

prob(female)
0.5378575776019476

Then we used the & operator to compute the probability of a conjunction, like this:

prob(female & banker)
0.011381618989653074

Next I defined the following function, which uses the bracket operator to compute conditional probability:

def conditional(A, B):
    """Conditional probability of A given B.
    
    A: Boolean series
    B: Boolean series
    
    returns: probability
    """
    return prob(A[B])

We showed that conjunction is commutative, so prob(A & B) equals prob(B & A), for any logical propositions A and B.

For example:

prob(liberal & democrat)
0.1425238385067965

prob(democrat & liberal)
0.1425238385067965

But conditional probability is NOT commutative, so conditional(A, B) is generally not the same as conditional(B, A).

For example, here’s the probability that a respondent is female, given that they are a banker.

conditional(female, banker)
0.7706043956043956

And here’s the probability that a respondent is a banker, given that they are female.

conditional(banker, female)
0.02116102749801969

Not even close.
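To make the asymmetry concrete, here is a minimal sketch on a toy table of six made-up respondents. The toy frame, its values, and the toy_ names are my own assumptions for illustration; they are not GSS data.

```python
import pandas as pd

# Hypothetical toy data: six made-up respondents, not GSS values.
toy = pd.DataFrame({
    'sex':     [2, 2, 2, 2, 1, 1],
    'indus10': [6870, 770, 770, 770, 6870, 770],
})
toy_female = (toy['sex'] == 2)
toy_banker = (toy['indus10'] == 6870)

# Conjunction gives the same result in either order.
p_fb = (toy_female & toy_banker).mean()
p_bf = (toy_banker & toy_female).mean()
print(p_fb == p_bf)

# The bracket operator keeps only the rows where the condition is True,
# so the order of the two series matters.
p_female_given_banker = toy_female[toy_banker].mean()  # 1 of the 2 bankers is female
p_banker_given_female = toy_banker[toy_female].mean()  # 1 of the 4 women is a banker
print(p_female_given_banker, p_banker_given_female)
```

With so few rows you can check both conditional probabilities by counting in your head, which is a good habit before trusting the results on the full dataset.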

More propositions

For the sake of variety in our examples, let’s define some new propositions.

Here’s the probability that a random respondent is male.

male = (gss['sex']==1)
prob(male)
0.46214242239805237

The industry code for “Construction” is 770. Let’s call someone in this field a “builder”.

builder = (gss['indus10'] == 770)
prob(builder)
0.05978900385473727

And let’s define propositions for conservatives and Republicans.

conservative = (gss['polviews'] > 4)
prob(conservative)
0.3419354838709677

republican = (gss['partyid'].isin([5,6]))
prob(republican)
0.2610062893081761

The isin method checks whether each value in the series is in a given sequence. In this example, the values 5 and 6 represent the responses “Not Strong Republican” and “Strong Republican”, respectively.
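As a quick illustration, here is a small sketch showing that isin gives the same Boolean series as combining two equality tests with the | operator. The codes series is made up for this example; it is not a column from the GSS.

```python
import pandas as pd

# Hypothetical party codes, for illustration only.
codes = pd.Series([0, 5, 6, 3, 6])

# One call to isin...
in_list = codes.isin([5, 6])

# ...is equivalent to two equality tests joined with "or".
either = (codes == 5) | (codes == 6)

same = in_list.equals(either)
print(same)
```

The isin version is easier to read and scales better when the list of accepted values gets longer.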

Finally, I’ll use age to define propositions for young and old.

young = (gss['age'] < 30)
prob(young)
0.19435991073240008

old = (gss['age'] >= 65)
prob(old)
0.17328058429701765

For these thresholds, I chose round numbers near the 20th and 80th percentiles. Depending on your age, you may or may not agree with these definitions of “young” and “old”.

Exercise: There’s a famous quote about young people, old people, liberals, and conservatives that goes something like:

If you are not a liberal at 25, you have no heart. If you are not a conservative at 35, you have no brain.

Whether you agree with this proposition or not, it suggests some probabilities we can compute as a review exercise.
Use prob and conditional to compute these probabilities.

  • What is the probability that a randomly chosen respondent is a young liberal?

  • What is the probability that a young person is liberal?

  • What fraction of respondents are old conservatives?

  • What fraction of conservatives are old?

For each statement, think about whether it is expressing a conjunction, or a conditional probability, or both.

And for the conditional probabilities, be careful about the order!

If your last answer is greater than 30%, you have it backwards!

Onward!

In this notebook, we’ll derive three relationships between conjunction and conditional probability:

  • Theorem 1: Using conjunction to compute a conditional probability.

  • Theorem 2: Using a conditional probability to compute a conjunction.

  • Theorem 3: Using conditional(A, B) to compute conditional(B, A).

Theorem 3 is also known as Bayes’s Theorem, which is the foundation of Bayesian statistics.

For parts of this notebook it will be useful to use mathematical notation for probability, so I’ll introduce that now.

  • P(A) is the probability of proposition A.

  • P(A and B) is the probability of the conjunction of A and B, that is, the probability that both are true.

  • P(A|B) is the conditional probability of A given that B is true. The vertical line between A and B is pronounced “given”.

With that, we are ready for Theorem 1.

Theorem 1

What fraction of builders are male? We have already seen one way to compute the answer:

  1. Use the bracket operator to select the builders, then

  2. Use mean to compute the fraction of builders who are male.

We can write these steps like this:

male[builder].mean()
0.8920936545639634

Or we can use the conditional function, which does the same thing:

conditional(male, builder)
0.8920936545639634

But there is another way: to compute the fraction of builders who are male, we can compute the ratio of two probabilities:

  1. The fraction of respondents who are male builders, and

  2. The fraction of respondents who are builders.

Here’s what that looks like.

prob(male & builder) / prob(builder)
0.8920936545639634

The result is the same.

This example demonstrates a general rule that relates conditional probability and conjunction. Here’s what it looks like in math notation:

P(A|B) = P(A and B) / P(B)

And that’s Theorem 1.

In this example:

conditional(male, builder) = prob(male & builder) / prob(builder)

Exercise: What fraction of conservatives are Republican? Compute the answer two ways:

  • Use conditional (which uses the bracket operator), and

  • Use Theorem 1.

Confirm that you get the same answer.

Note: Due to floating-point arithmetic, the results might not be exactly the same, but almost all of the digits should be the same.
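One way to compare two results that should be equal up to floating-point error is math.isclose from the Python standard library. The sketch below applies it to the two values this notebook reports for the conservative and Republican conjunction; using isclose here is my suggestion, not something the notebook itself does.

```python
import math

# The two results this notebook reports for the same conjunction,
# computed with & and with Theorem 2.
x = 0.15396632176912153
y = 0.1539663217691215

# An exact comparison fails because the last digits differ...
print(x == y)

# ...but isclose compares within a relative tolerance (default 1e-09).
close = math.isclose(x, y)
print(close)
```

If you are comparing whole arrays or series rather than single numbers, numpy.isclose does the same job elementwise.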

Proof?

I didn’t really prove Theorem 1; mostly, it is a statement of what conditional probability means.

For example, consider this Venn diagram:

The blue circle represents male respondents. The red circle represents builders. The intersection represents male builders.

To compute the fraction of builders who are male, we can compute the ratio of the intersection, which is prob(male & builder), to the red circle, which is prob(builder).
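The same argument can be written in terms of counts. In the sketch below, N is the total number of respondents and n(X) is the number of respondents for whom X is true; both symbols are notation I am introducing here, and the steps assume n(B) is greater than zero.

```latex
P(A \mid B)
  = \frac{n(A \text{ and } B)}{n(B)}
  = \frac{n(A \text{ and } B) / N}{n(B) / N}
  = \frac{P(A \text{ and } B)}{P(B)}
```

The first step is what the bracket operator followed by mean computes; dividing the numerator and denominator by N turns the two counts into the two probabilities in Theorem 1.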

Exercise: For practice, compute the fraction of bankers who are old both ways: using conditional and using Theorem 1.

Theorem 2

Here’s Theorem 1 again:

P(A|B) = P(A and B) / P(B)

If we multiply both sides by P(B), we get Theorem 2.

P(A and B)=P(B)P(A|B)

This formula suggests a second way to compute a conjunction: instead of using the & operator, we can compute the product of two probabilities.

Let’s see if it works for conservative and republican. Here’s the result using &:

prob(conservative & republican)
0.15396632176912153

And here’s the result using Theorem 2:

prob(republican) * conditional(conservative, republican)
0.1539663217691215

Because of floating-point errors, they might not be identical, but almost all of the digits are the same.

Exercise: Check Theorem 2 one more time by computing the fraction of respondents who are old liberals both ways:

  • Using the & operator, and

  • Using Theorem 2.

The results should be the same, or at least very close.

Conjunction is commutative

We have already established that conjunction is commutative. In math notation, that means:

P(A and B)=P(B and A)

If we apply Theorem 2 to both sides, we have

P(B)P(A|B)=P(A)P(B|A)

Here’s one way to interpret that: if you want to check A and B, you can do it in either order:

  1. You can check B first, then A conditioned on B, or

  2. You can check A first, then B conditioned on A.

To try it out, I’ll compute the fraction of young builders both ways:

prob(young) * conditional(builder, young)
0.012314871170622844

prob(builder) * conditional(young, builder)
0.012314871170622844

Same thing!
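If you want to see why the two orders agree, it can help to try them on data small enough to count by hand. Here is a minimal sketch with eight made-up respondents; the toy frame and the names A and B are assumptions for illustration, not GSS data.

```python
import math
import pandas as pd

# Hypothetical toy data: eight made-up respondents, not GSS values.
toy = pd.DataFrame({
    'age':     [22, 25, 28, 41, 50, 67, 70, 33],
    'indus10': [770, 6870, 770, 770, 6870, 770, 6870, 6870],
})
A = (toy['age'] < 30)          # "young": 3 of 8 rows
B = (toy['indus10'] == 770)    # "builder": 4 of 8 rows

# Direct conjunction: 2 of 8 rows are young builders.
direct = (A & B).mean()

# Check B first, then A conditioned on B.
b_first = B.mean() * A[B].mean()

# Check A first, then B conditioned on A.
a_first = A.mean() * B[A].mean()

print(direct, b_first, a_first)
```

All three results should agree up to floating-point error, because each one is just a different way of arriving at the same count of 2 out of 8.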

Exercise: Compute the probability of being a male banker both ways and see if you get the same thing.

Theorem 3

In the previous section we established that

P(B)P(A|B)=P(A)P(B|A)

If we divide through by P(B), we get Theorem 3:

P(A|B) = P(A) P(B|A) / P(B)

And that, my friends, is Bayes’s Theorem.

To see how it works, let’s try one more combination of our propositions. Let’s compute the fraction of builders who are liberal, first using conditional:

conditional(liberal, builder)
0.24431625381744146

Now using Bayes’s Theorem:

prob(liberal) * conditional(builder, liberal) / prob(builder)
0.24431625381744151

Same thing!
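The last few digits differ above only because of floating-point arithmetic. To see that Bayes’s Theorem holds exactly, here is a small sketch that uses Python’s fractions module with made-up counts. The counts and variable names are hypothetical; they are not taken from the GSS.

```python
from fractions import Fraction

# Hypothetical counts for illustration (not GSS values).
n_total = 100    # all respondents
n_a = 40         # respondents for whom A is true
n_b = 10         # respondents for whom B is true
n_a_and_b = 6    # respondents for whom both A and B are true

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
p_b_given_a = Fraction(n_a_and_b, n_a)   # fraction of A that is also B
p_a_given_b = Fraction(n_a_and_b, n_b)   # fraction of B that is also A

# Theorem 3: P(A|B) = P(A) P(B|A) / P(B)
bayes = p_a * p_b_given_a / p_b

print(bayes, p_a_given_b)
```

Because Fraction does exact rational arithmetic, the two results are identical rather than merely close.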

Exercise: Try it yourself! Compute the fraction of young people who are Republican both ways: using conditional and using Bayes’s Theorem. See if you get the same thing.

conditional(republican, young)
0.23319415448851774

prob(republican) * conditional(young, republican) / prob(young)
0.2331941544885177

Summary

Here’s what we have so far:

Theorem 1 gives us a new way to compute a conditional probability using a conjunction:

P(A|B) = P(A and B) / P(B)

Theorem 2 gives us a new way to compute a conjunction using a conditional probability:

P(A and B)=P(B)P(A|B)

Theorem 3, also known as Bayes’s Theorem, gives us a way to get from P(A|B) to P(B|A), or the other way around:

P(A|B) = P(A) P(B|A) / P(B)

But at this point you might ask, “So what?” If we have all of the data, we can compute any probability we want, any conjunction, or any conditional probability, just by counting. Why do we need these formulas?

And you are right, if we have all of the data. But often we don’t, and in that case, these formulas can be pretty useful – especially Bayes’s Theorem.
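As a small preview of what that looks like, here is a sketch of a helper that applies Theorem 3 to three summary numbers, with no dataset in sight. The function name bayes and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def bayes(p_a, p_b_given_a, p_b):
    """Compute P(A|B) from P(A), P(B|A), and P(B) using Theorem 3.

    p_a: probability of A
    p_b_given_a: conditional probability of B given A
    p_b: probability of B, which must be positive

    returns: probability of A given B
    """
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a * p_b_given_a / p_b

# Hypothetical summary numbers, not taken from the GSS.
p_a_given_b = bayes(p_a=0.3, p_b_given_a=0.5, p_b=0.25)
print(p_a_given_b)
```

The point of the sketch is that the three inputs could come from anywhere, such as a published report, even when the underlying rows are not available.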

In the next notebook, we’ll see how.