# Bayes’s Theorem

```
import pandas as pd
import numpy as np
from utils import values
```

## Review

In the previous notebook I defined probability, conjunction, and conditional probability, and used data from the General Social Survey (GSS) to compute the probability of various logical propositions.

To review, here’s how we loaded the dataset:

```
gss = pd.read_csv('gss_bayes.csv', index_col=0)
```

And here are the logical propositions we defined, represented using Boolean series.

```
banker = (gss['indus10'] == 6870)
```

```
female = (gss['sex'] == 2)
```

```
liberal = (gss['polviews'] < 4)
```

```
democrat = (gss['partyid'] <= 1)
```

I defined the following function, which uses `mean` to compute the fraction of `True` values in a Boolean series.

```
def prob(A):
    """Computes the probability of a proposition, A.
    A: Boolean series
    returns: probability
    """
    assert isinstance(A, pd.Series)
    assert A.dtype == 'bool'
    return A.mean()
```

So we can compute the probability of a proposition like this:

```
prob(female)
```

```
0.5378575776019476
```
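To see why `mean` yields a probability, here is a minimal sketch on a made-up five-element series. The name `toy` and its values are assumptions for illustration, not GSS data. pandas treats `True` as 1 and `False` as 0, so the mean is the fraction of `True` values.

```python
import pandas as pd

# Made-up Boolean series, not GSS data: 3 of the 5 values are True.
toy = pd.Series([True, False, True, True, False])

# mean() counts True as 1 and False as 0, so it returns the fraction of True values.
fraction_true = toy.mean()
print(fraction_true)
```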

Then we used the `&` operator to compute the probability of a conjunction, like this:

```
prob(female & banker)
```

```
0.011381618989653074
```

Next I defined the following function, which uses the bracket operator to compute conditional probability:

```
def conditional(A, B):
    """Conditional probability of A given B.
    A: Boolean series
    B: Boolean series
    returns: probability
    """
    return prob(A[B])
```
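The bracket operator does the real work here: `A[B]` keeps only the elements of `A` where `B` is `True`. The sketch below shows that selection on two made-up six-element series (the `toy_` names and values are assumptions, not GSS data).

```python
import pandas as pd

# Made-up Boolean series that share the same default index (0..5).
toy_female = pd.Series([True, True, False, True, False, False])
toy_banker = pd.Series([True, False, False, True, True, False])

# Keep only the rows where toy_banker is True (index 0, 3, and 4).
selected = toy_female[toy_banker]
print(selected.tolist())

# Fraction of the selected rows that are female: 2 out of 3.
p_female_given_banker = selected.mean()
print(p_female_given_banker)
```

Swapping the two series, as in `toy_banker[toy_female]`, selects a different set of rows, which is why the order of the arguments matters.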

We showed that conjunction is commutative, so `prob(A & B)` equals `prob(B & A)` for any logical propositions `A` and `B`.

For example:

```
prob(liberal & democrat)
```

```
0.1425238385067965
```

```
prob(democrat & liberal)
```

```
0.1425238385067965
```

But conditional probability is NOT commutative, so `conditional(A, B)` is generally not the same as `conditional(B, A)`.

For example, here’s the probability that a respondent is female, given that they are a banker.

```
conditional(female, banker)
```

```
0.7706043956043956
```

And here’s the probability that a respondent is a banker, given that they are female.

```
conditional(banker, female)
```

```
0.02116102749801969
```

Not even close.
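The reason for the gap is that both conditional probabilities share a numerator (the number of female bankers) but have very different denominators. Here is a small sketch with hypothetical counts, chosen for easy arithmetic and not taken from the GSS:

```python
from fractions import Fraction

# Hypothetical counts, not GSS data.
n_female = 500         # respondents who are female
n_banker = 20          # respondents who are bankers
n_female_banker = 15   # respondents who are both

# Same numerator, different denominators.
p_female_given_banker = Fraction(n_female_banker, n_banker)
p_banker_given_female = Fraction(n_female_banker, n_female)

print(p_female_given_banker)   # fraction of bankers who are female
print(p_banker_given_female)   # fraction of females who are bankers
```

Because bankers are a small group and female respondents a large one, dividing by the first gives a large fraction and dividing by the second gives a small one.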

## More propositions

For the sake of variety in our examples, let’s define some new propositions.

Here’s the probability that a random respondent is male.

```
male = (gss['sex'] == 1)
prob(male)
```

```
0.46214242239805237
```

The industry code for “Construction” is `770`. Let’s call someone in this field a “builder”.

```
builder = (gss['indus10'] == 770)
prob(builder)
```

```
0.05978900385473727
```

And let’s define propositions for conservatives and Republicans.

```
conservative = (gss['polviews'] > 4)
prob(conservative)
```

```
0.3419354838709677
```

```
republican = gss['partyid'].isin([5, 6])
prob(republican)
```

```
0.2610062893081761
```

The `isin` method checks whether each value is in a given sequence and returns a Boolean series. In this example, the values `5` and `6` represent the responses “Strong Republican” and “Not Strong Republican”.
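As a quick illustration of what `isin` returns, the sketch below applies it to a made-up series of party codes (`toy_partyid` is an assumption, not GSS data) and compares the result with the equivalent expression built from `==` and `|`.

```python
import pandas as pd

# Made-up party ID codes, not GSS data.
toy_partyid = pd.Series([0, 5, 6, 3, 5, 1])

# True wherever the value is 5 or 6.
via_isin = toy_partyid.isin([5, 6])

# The same proposition written as a disjunction of two comparisons.
via_or = (toy_partyid == 5) | (toy_partyid == 6)

print(via_isin.tolist())
print(via_isin.equals(via_or))
```

With only two codes the disjunction is still readable, but `isin` scales better as the list of values grows.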

Finally, I’ll use `age` to define propositions for `young` and `old`.

```
young = (gss['age'] < 30)
prob(young)
```

```
0.19435991073240008
```

```
old = (gss['age'] >= 65)
prob(old)
```

```
0.17328058429701765
```

For these thresholds, I chose round numbers near the 20th and 80th percentiles. Depending on your age, you may or may not agree with these definitions of “young” and “old”.

**Exercise:** There’s a famous quote about young people, old people, liberals, and conservatives that goes something like:

> If you are not a liberal at 25, you have no heart. If you are not a conservative at 35, you have no brain.

Whether you agree with this proposition or not, it suggests some probabilities we can compute as a review exercise.

Use `prob` and `conditional` to compute these probabilities.

What is the probability that a randomly chosen respondent is a young liberal?

What is the probability that a young person is liberal?

What fraction of respondents are old conservatives?

What fraction of conservatives are old?

For each statement, think about whether it is expressing a conjunction, or a conditional probability, or both.

And for the conditional probabilities, be careful about the order!

If your last answer is greater than 30%, you have it backwards!

## Onward!

In this notebook, we’ll derive three relationships between conjunction and conditional probability:

Theorem 1: Using conjunction to compute a conditional probability.

Theorem 2: Using a conditional probability to compute a conjunction.

Theorem 3: Using `conditional(A, B)` to compute `conditional(B, A)`.

Theorem 3 is also known as Bayes’s Theorem, which is the foundation of Bayesian statistics.

For parts of this notebook it will be useful to use mathematical notation for probability, so I’ll introduce that now.

\(P(A)\) is the probability of proposition \(A\).

\(P(A~\mathrm{and}~B)\) is the probability of the conjunction of \(A\) and \(B\), that is, the probability that both are true.

\(P(A | B)\) is the conditional probability of \(A\) given that \(B\) is true. The vertical line between \(A\) and \(B\) is pronounced “given”.

With that, we are ready for Theorem 1.

## Theorem 1

What fraction of builders are male? We have already seen one way to compute the answer:

Use the bracket operator to select the builders, then

Use `mean` to compute the fraction of builders who are male.

We can write these steps like this:

```
male[builder].mean()
```

```
0.8920936545639634
```

Or we can use the `conditional`

function, which does the same thing:

```
conditional(male, builder)
```

```
0.8920936545639634
```

But there is another way: to compute the fraction of builders who are male, we can compute the ratio of two probabilities:

The fraction of respondents who are male builders, and

The fraction of respondents who are builders.

Here’s what that looks like.

```
prob(male & builder) / prob(builder)
```

```
0.8920936545639634
```

The result is the same.

This example demonstrates a general rule that relates conditional probability and conjunction. Here’s what it looks like in math notation:

\(P(A|B) = \frac{P(A~\mathrm{and}~B)}{P(B)}\)

And that’s Theorem 1.

In this example:

`conditional(male, builder) = prob(male & builder) / prob(builder)`
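Theorem 1 can also be checked against the numbers printed in the Review section, without touching the dataset. The sketch below takes \(A\) to be `banker` and \(B\) to be `female`; the values are copied from the outputs above, and the variable names are introduced here for illustration.

```python
import math

# Values copied from the outputs printed in the Review section.
p_female = 0.5378575776019476                # prob(female)
p_female_and_banker = 0.011381618989653074   # prob(female & banker)
p_banker_given_female = 0.02116102749801969  # conditional(banker, female)

# Theorem 1 with A = banker and B = female.
ratio = p_female_and_banker / p_female
print(ratio)

# Compare with a relative tolerance rather than ==, to allow for rounding.
print(math.isclose(ratio, p_banker_given_female, rel_tol=1e-9))
```

The ratio should come out very close to the printed value of `conditional(banker, female)`, about 0.0212.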

**Exercise:** What fraction of conservatives are Republican? Compute the answer two ways:

Use `conditional` (which uses the bracket operator), and

Use Theorem 1.

Confirm that you get the same answer.

Note: Due to floating-point arithmetic, the results might not be exactly the same, but almost all of the digits should be the same.

## Proof?

I didn’t really prove Theorem 1; mostly, it is a statement of what conditional probability means.

For example, consider this Venn diagram:

The blue circle represents male respondents. The red circle represents builders. The intersection represents male builders.

To compute the fraction of builders who are male, we can compute the ratio of the intersection, which is `prob(male & builder)`, to the red circle, which is `prob(builder)`.
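One way to make this argument explicit is to write each probability as a count divided by the total number of respondents. The symbols below are notation introduced here for illustration: \(n\) is the number of respondents, \(n_B\) is the number for whom \(B\) is true, and \(n_{A~\mathrm{and}~B}\) is the number for whom both \(A\) and \(B\) are true.

\(P(A|B) = \frac{n_{A~\mathrm{and}~B}}{n_B} = \frac{n_{A~\mathrm{and}~B} / n}{n_B / n} = \frac{P(A~\mathrm{and}~B)}{P(B)}\)

The total \(n\) cancels, which is why selecting the builders first and taking the ratio of two probabilities give the same answer, assuming \(n_B > 0\).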

**Exercise:** For practice, compute the fraction of bankers who are old both ways: using `conditional` and using Theorem 1.

## Theorem 2

Here’s Theorem 1 again:

\(P(A|B) = \frac{P(A~\mathrm{and}~B)}{P(B)}\)

If we multiply both sides by \(P(B)\), we get Theorem 2.

\(P(A~\mathrm{and}~B) = P(B) P(A|B)\)

This formula suggests a second way to compute a conjunction: instead of using the `&` operator, we can compute the product of two probabilities.

Let’s see if it works for `conservative` and `republican`. Here’s the result using `&`:

```
prob(conservative & republican)
```

```
0.15396632176912153
```

And here’s the result using Theorem 2:

```
prob(republican) * conditional(conservative, republican)
```

```
0.1539663217691215
```

Because of floating-point rounding, the two results are not identical, but they agree to 16 significant digits.
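One way to make "almost the same" precise is to compare with a tolerance instead of `==`. The sketch below uses `math.isclose` from the Python standard library on the two values printed above; the variable names are introduced here for illustration.

```python
import math

# Values copied from the two outputs above.
via_and = 0.15396632176912153       # prob(conservative & republican)
via_theorem2 = 0.1539663217691215   # prob(republican) * conditional(conservative, republican)

# Exact comparison fails because the last digits differ.
print(via_and == via_theorem2)

# Comparison with the default relative tolerance (1e-09) succeeds.
print(math.isclose(via_and, via_theorem2))
```

The same kind of check is a reasonable way to confirm the exercise results in this notebook.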

**Exercise:** Check Theorem 2 one more time by computing the fraction of respondents who are old liberals both ways:

Using the `&` operator, and

Using Theorem 2.

The results should be the same, or at least very close.

## Conjunction is commutative

We have already established that conjunction is commutative. In math notation, that means:

\(P(A~\mathrm{and}~B) = P(B~\mathrm{and}~A)\)

If we apply Theorem 2 to both sides, we have

\(P(B) P(A|B) = P(A) P(B|A)\)

Here’s one way to interpret that: if you want to check \(A\) and \(B\), you can do it in either order:

You can check \(B\) first, then \(A\) conditioned on \(B\), or

You can check \(A\) first, then \(B\) conditioned on \(A\).

To try it out, I’ll compute the fraction of young builders both ways:

```
prob(young) * conditional(builder, young)
```

```
0.012314871170622844
```

```
prob(builder) * conditional(young, builder)
```

```
0.012314871170622844
```

Same thing!
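To see why the two orders must agree, it can help to work with exact fractions, where no rounding is involved. The following sketch uses hypothetical counts (not GSS data) and Python's `fractions` module; in both products the intermediate count cancels, leaving the number of cases where both propositions hold divided by the total.

```python
from fractions import Fraction

# Hypothetical counts, not GSS data.
n = 200          # total respondents
n_a = 40         # respondents for whom A is true
n_b = 25         # respondents for whom B is true
n_a_and_b = 10   # respondents for whom both are true

p_a = Fraction(n_a, n)
p_b = Fraction(n_b, n)
p_a_given_b = Fraction(n_a_and_b, n_b)
p_b_given_a = Fraction(n_a_and_b, n_a)

# Check B first, then A conditioned on B.
b_first = p_b * p_a_given_b

# Check A first, then B conditioned on A.
a_first = p_a * p_b_given_a

print(b_first, a_first)
```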

**Exercise:** Compute the probability of being a male banker both ways and see if you get the same thing.

## Theorem 3

In the previous section we established that

\(P(B) P(A|B) = P(A) P(B|A)\)

If we divide through by \(P(B)\), we get Theorem 3:

\(P(A|B) = \frac{P(A) P(B|A)}{P(B)}\)

And that, my friends, is Bayes’s Theorem.

To see how it works, let’s try one more combination of our propositions. Let’s compute the fraction of builders who are liberal, first using `conditional`:

```
conditional(liberal, builder)
```

```
0.24431625381744146
```

Now using Bayes’s Theorem:

```
prob(liberal) * conditional(builder, liberal) / prob(builder)
```

```
0.24431625381744151
```

Same thing, up to floating-point rounding!
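If you find yourself writing this expression often, it could be wrapped in a small function. The helper below, `bayes_rule`, is hypothetical (it is not part of this notebook or of `utils`) and takes three probabilities rather than Boolean series; the example inputs are made up.

```python
from fractions import Fraction

def bayes_rule(p_a, p_b_given_a, p_b):
    """Return P(A|B) given P(A), P(B|A), and P(B).

    Hypothetical helper for illustration; inputs are plain probabilities.
    """
    if p_b == 0:
        raise ValueError("P(B) must be nonzero to condition on B")
    return p_a * p_b_given_a / p_b

# Made-up probabilities: P(A) = 1/5, P(B|A) = 1/4, P(B) = 1/8.
p_a_given_b = bayes_rule(Fraction(1, 5), Fraction(1, 4), Fraction(1, 8))
print(p_a_given_b)
```

Using `Fraction` here keeps the arithmetic exact; the function works the same way with ordinary floats such as the results of `prob` and `conditional`.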

**Exercise:** Try it yourself! Compute the fraction of young people who are Republican both ways: using `conditional` and using Bayes’s Theorem. See if you get the same thing.

```
conditional(republican, young)
```

```
0.23319415448851774
```

```
prob(republican) * conditional(young, republican) / prob(young)
```

```
0.2331941544885177
```

## Summary

Here’s what we have so far:

**Theorem 1** gives us a new way to compute a conditional probability using a conjunction:

\(P(A|B) = \frac{P(A~\mathrm{and}~B)}{P(B)}\)

**Theorem 2** gives us a new way to compute a conjunction using a conditional probability:

\(P(A~\mathrm{and}~B) = P(B) P(A|B)\)

**Theorem 3**, also known as Bayes’s Theorem, gives us a way to get from \(P(A|B)\) to \(P(B|A)\), or the other way around:

\(P(A|B) = \frac{P(A) P(B|A)}{P(B)}\)

But at this point you might ask, “So what?” If we have all of the data, we can compute any probability we want, any conjunction, or any conditional probability, just by counting. Why do we need these formulas?

And you are right, *if* we have all of the data. But often we don’t, and in that case, these formulas can be pretty useful – especially Bayes’s Theorem.
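As a small illustration, suppose we had only three numbers printed earlier in this notebook and no access to the dataset. Bayes’s Theorem still lets us recover a conditional probability that was never computed directly: the fraction of Republicans who are young. The variable names in this sketch are introduced here for illustration.

```python
# Values copied from outputs printed earlier in this notebook.
p_young = 0.19435991073240008                    # prob(young)
p_republican = 0.2610062893081761                # prob(republican)
p_republican_given_young = 0.23319415448851774   # conditional(republican, young)

# Bayes's Theorem with A = young and B = republican.
p_young_given_republican = p_young * p_republican_given_young / p_republican
print(p_young_given_republican)
```

The result is about 0.17, which could be confirmed against the data with `conditional(young, republican)`.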

In the next notebook, we’ll see how.