Causation, Collision, and Confusion
# Install empiricaldist if we don't already have it
try:
    import empiricaldist
except ImportError:
    !pip install empiricaldist
# download utils.py
from os.path import basename, exists
def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
download("https://github.com/AllenDowney/ProbablyOverthinkingIt/raw/book/notebooks/utils.py")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from utils import decorate
# Set the random seed so we get the same results every time
np.random.seed(17)
The Low Birthweight Paradox was born in 1971, when Jacob Yerushalmy, a researcher at U.C. Berkeley, published “The relationship of parents’ cigarette smoking to outcome of pregnancy – implications as to the problem of inferring causation from observed associations”. As the title suggests, the paper is about the relationship between smoking during pregnancy, the weight of babies at birth, and mortality in the first month of life.
Based on data from about 13,000 babies born near San Francisco between 1960 and 1967, Yerushalmy reported that
Babies of mothers who smoked were about 6% lighter at birth.
Smokers were about twice as likely to have babies lighter than 2500 grams, which is considered “low birthweight”.
Low-birthweight babies were much more likely to die within a month of birth: the mortality rate was 174 per 1000 for low-birthweight babies and 7.8 per 1000 for others.
These results were not surprising. At that time, it was well known that children of smokers were lighter at birth, and that low-birthweight babies were more likely to die.
In the first part of this notebook, there are several cells like the following that compute percentages based on numbers from tables in Yerushalmy’s paper.
(3464 - 3255) / 3255 * 100
6.42089093701997
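The gap between the two mortality rates in the list above is easier to appreciate as a ratio. Here is a minimal sketch that uses only the two rates quoted from Yerushalmy's paper; the variable names are illustrative and do not come from the original notebook.

```python
# Mortality rates per 1000 live births, as quoted from Yerushalmy's paper
lbw_rate = 174      # low-birthweight babies
other_rate = 7.8    # all other babies

# Relative risk: how many times more likely an LBW baby was to die
relative_risk = lbw_rate / other_rate
print(round(relative_risk, 1))
```

By this measure, low-birthweight babies in the 1960s data were more than 20 times as likely to die in the first month.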
Putting those results together, you might expect mortality rates to be higher for children of smokers. And you would be right, but the difference was not very big. For White mothers, the mortality rate was 11.3 per 1000 for children of smokers, compared to 11.0 for children of nonsmokers.
That’s strange, but it gets even stranger. If we select only the low-birthweight (LBW) babies, we find:
For LBW babies of nonsmokers, the mortality rate was 218 per 1000;
For LBW babies of smokers, it was only 114 per 1000, about 48% lower.
(218 - 114) / 218 * 100
47.706422018348626
Yerushalmy also compared rates of congenital anomalies (birth defects).
For LBW babies of nonsmokers, the rate was 147 per 1000,
For LBW babies of smokers, it was 72 per 1000, about 51% lower.
These results make maternal smoking seem beneficial for low-birthweight babies, somehow protecting them from birth defects and mortality. Yerushalmy concluded:
These paradoxical findings raise doubts and argue against the proposition that cigarette smoking acts as an exogenous factor which interferes with intrauterine development of the fetus.
In other words, maybe maternal smoking isn’t bad for babies after all.
round((147 - 72) / 147 * 100, 1)
51.0
But it was a mistake. At the risk of giving away the ending, the Low Birthweight Paradox is a statistical artifact. In fact, maternal smoking is harmful to babies, regardless of birthweight. It only seems beneficial because the analysis is misleading.
An explanation came in 2006 from epidemiologists at Harvard University and the National Institutes of Health (NIH), based on data from 3 million babies born in 1991. Using the same dataset, which is available from the National Center for Health Statistics (NCHS), I will replicate their results and summarize their explanation. Then I’ll repeat the analysis with data from 2018, and we’ll see what has changed.
Three Million Babies Can’t Be Wrong
The data are originally from the National Center for Health Statistics (NCHS).
I selected the columns we need and stored them in a compressed HDF file.
DATA_PATH = "https://github.com/AllenDowney/ProbablyOverthinkingIt/raw/book/data/"
download(DATA_PATH + "nchs.hdf")
vs1991 = pd.read_hdf("nchs.hdf", "vs1991")
vs1991.shape
(4115493, 5)
If age of death is NaN, that means the baby survived.
vs1991["mort"] = vs1991["aged"].notnull()
vs1991["mort"].value_counts()
mort
False 4079973
True 35520
Name: count, dtype: int64
Recode the tobacco variable.
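# Assign the result back; in-place replace on a selected column is unreliable in recent pandas
vs1991["tobacco"] = vs1991["tobacco"].replace([9], np.nan)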
vs1991["tobacco"].value_counts()
tobacco
2.0 2471563
1.0 533202
Name: count, dtype: int64
from empiricaldist import Pmf
Pmf.from_seq(vs1991["tobacco"])
tobacco | probs |
---|---|
1.0 | 0.177452 |
2.0 | 0.822548 |
The following cell puts the counts of nonsmokers and smokers from the paper, for White and Black mothers, into a PMF.
white = np.array([6067, 3726])
black = np.array([2219, 1071])
pmf = Pmf(white + black, index=["Nonsmoker", "Smoker"])
pmf.normalize()
pmf
probs | |
---|---|
Nonsmoker | 0.633341 |
Smoker | 0.366659 |
Checking the distribution of birthweights.
from empiricaldist import Cdf
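# Assign the result back; in-place replace on a selected column is unreliable in recent pandas
vs1991["birthweight"] = vs1991["birthweight"].replace([7777, 9999], np.nan)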
Cdf.from_seq(vs1991["birthweight"]).plot()
<Axes: xlabel='birthweight'>
vs1991["birthweight"].nlargest(10)
1282965 8164.0
2328658 8164.0
2345800 8147.0
3078492 8108.0
1407733 7966.0
3037159 7965.0
2395933 7910.0
3046903 7890.0
2513239 7889.0
2582863 7880.0
Name: birthweight, dtype: float64
Flagging low birthweight babies.
vs1991["lbw"] = vs1991["birthweight"] < 2500
Dividing birthweights into bins.
bins = np.arange(1000, 5000, 250)
bins
array([1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500,
3750, 4000, 4250, 4500, 4750])
vs1991["bin"] = pd.cut(vs1991["birthweight"], bins)
vs1991["bin"].value_counts().sort_index()
bin
(1000, 1250] 12374
(1250, 1500] 14661
(1500, 1750] 21393
(1750, 2000] 34816
(2000, 2250] 61218
(2250, 2500] 123119
(2500, 2750] 246404
(2750, 3000] 426903
(3000, 3250] 696739
(3250, 3500] 817652
(3500, 3750] 712945
(3750, 4000] 482449
(4000, 4250] 240867
(4250, 4500] 122762
(4500, 4750] 46395
Name: count, dtype: int64
Checking the codes for congenital birth defects.
vs1991["congenit"].value_counts()
congenit
2222222222222222222222 3556241
9999999999999999999999 498038
2222222222222222222221 23448
2222222222222222221222 5843
2222221222222222222222 3771
...
2222211112112122221222 1
2222121222122222222222 1
2212122222221221222222 1
2222222122222211222222 1
2222122222122222122222 1
Name: count, Length: 1008, dtype: int64
Each code is a vector that indicates the presence or absence of a particular condition. All 2’s means no anomalies.
vs1991["anomaly"] = vs1991["congenit"] != "2222222222222222222222"
vs1991["anomaly"].value_counts()
anomaly
False 3556241
True 559252
Name: count, dtype: int64
no_defect = vs1991["congenit"] == "2222222222222222222222"
no_defect.sum()
3556241
unknown = vs1991["congenit"] == "9999999999999999999999"
vs1991.loc[unknown, "anomaly"] = np.nan
unknown.sum()
498038
known_defect = (~no_defect) & (~unknown)
known_defect.sum()
61214
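The three-way split computed in the cells above (no anomaly, unknown, known anomaly) can also be written as one small function. This is a sketch, not part of the original notebook: `classify_anomaly_code` is a hypothetical helper, and it encodes only what the cells above rely on, namely that all 2's means no anomaly and all 9's means unknown. That a `1` marks a condition as present is an assumption consistent with the counts shown, not something the notebook states.

```python
def classify_anomaly_code(code):
    """Classify a congenital-anomaly code string.

    Returns False if every position is '2' (no anomaly reported),
    None if every position is '9' (status unknown),
    and True for any other code, matching the cells above.
    """
    symbols = set(code)
    if symbols == {"2"}:
        return False
    if symbols == {"9"}:
        return None
    return True


# Codes taken from the value counts shown earlier
print(classify_anomaly_code("2222222222222222222222"))
print(classify_anomaly_code("9999999999999999999999"))
print(classify_anomaly_code("2222222222222222222221"))
```

Like the notebook's logic, this treats a code that mixes 2's and 9's as a known anomaly; a stricter rule would need the NCHS documentation for the field.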
If we select cases with known tobacco use and birthweight, we’re down to about three million live births.
subset = vs1991.dropna(subset=["tobacco", "birthweight"])
subset.shape
(3001607, 9)
for name, group in subset.groupby("tobacco"):
    print(name, group["anomaly"].mean() * 1000)
1.0 20.599922088040515
2.0 18.1934659988842
def percent_diff(x, rel):
    """Percentage difference.

    x: the number of interest
    rel: the number it is relative to

    returns: float percent
    """
    return (x - rel) / rel * 100
percent_diff(20.6, 18.2)
13.186813186813199
subset_no_defect = subset[subset["anomaly"] == 0]
subset_no_defect.shape
(2816996, 9)
weights = subset.groupby("tobacco")["birthweight"].mean()
weights
tobacco
1.0 3145.189278
2.0 3370.030102
Name: birthweight, dtype: float64
percent_diff(3145, 3370)
-6.6765578635014835
In the 1991 data from NCHS, about 18% of the mothers reported smoking during pregnancy, down from 37% in Yerushalmy’s dataset from the 1960s. Babies of smokers were lighter on average than babies of nonsmokers by about 7%, which is comparable to the difference in the 1960s data.
The following figure shows the distribution of weights for the two groups. The vertical line is at 2500 grams, the threshold for low birthweight.
from utils import kdeplot
labels = {1: "Smokers", 2: "Nonsmokers"}
ls = {1: "-", 2: "--"}
xs = np.linspace(subset["birthweight"].min(), subset["birthweight"].max(), 101)
for name, group in subset.groupby("tobacco"):
    kdeplot(group["birthweight"], xs, label=labels[name], ls=ls[name])
plt.axvline(2500, color="gray", ls=":", alpha=0.5)
decorate(
    xlabel="Birthweight (grams)",
    ylabel="Relative likelihood",
    title="Distributions of birthweights",
    xlim=[0, 6000],
)
The shapes of the distributions are similar, but for smokers it is shifted to the left. For mothers who smoked, the fraction of babies below 2500 grams is about 11%; for nonsmokers it is only 6%.
subset["lbw"].mean()
0.07262476400141657
for name, group in subset.groupby("tobacco"):
    lbw_rate = group["lbw"].mean()
    print(name, lbw_rate)
1.0 0.11421285960264155
2.0 0.06365429478663773
percent_diff(11.4, 6.4)
78.125
(9793 * 11.1 + 3290 * 18.5) / (9793 + 3290)
12.960888175494917
subset["mort"].mean() * 1000
8.449473898481713
Overall infant mortality was substantially lower in 1991. In the 1960s dataset, about 13 per 1000 babies died within the first month of life; in 1991, about 8.5 per 1000 died in the first year.
In 1991, the mortality rate was higher for babies of smokers, almost 12 per 1000, than babies of nonsmokers, 7.7 per 1000. So the risk of mortality was 54% higher for babies of mothers who smoked.
for name, group in subset.groupby("tobacco"):
    rate = group["mort"].mean() * 1000
    print(name, rate)
1.0 11.874562261649707
2.0 7.71068917473998
percent_diff(1187, 771)
53.95590142671855
In summary, babies of mothers who smoked were nearly twice as likely to be underweight and about 54% more likely to die. However, if we select babies lighter than 2500 grams, the mortality rate is 20% lower for babies of smokers, compared to LBW babies of nonsmokers.
for name, group in subset[subset["lbw"]].groupby("tobacco"):
    rate = group["mort"].mean() * 1000
    print(name, rate)
1.0 60.138756452832666
2.0 74.8512709572742
percent_diff(60.1, 74.85)
-19.70607882431529
The analysis so far is based on only two groups, babies born lighter or heavier than 2500 grams. But it might be a mistake to lump all LBW babies together. In reality, a baby born close to 2500 grams has a better chance of surviving than a baby born at 1500 grams.
So, following the analysis in the 2006 paper, I partitioned the dataset into groups with similar birthweight and computed the mortality rate in each group. The following figure shows the results.
table = pd.pivot_table(subset, index="bin", columns="tobacco", values="mort")
table *= 1000
table.index = bins[:-1] + np.diff(bins) / 2
table
tobacco | 1.0 | 2.0 |
---|---|---|
1125.0 | 102.376600 | 106.864725 |
1375.0 | 65.099458 | 72.467402 |
1625.0 | 43.963878 | 48.194837 |
1875.0 | 31.085935 | 32.530380 |
2125.0 | 22.160247 | 20.968439 |
2375.0 | 16.087278 | 13.399794 |
2625.0 | 9.483955 | 7.970344 |
2875.0 | 7.015749 | 4.987658 |
3125.0 | 5.610098 | 3.629459 |
3375.0 | 5.005695 | 2.534403 |
3625.0 | 3.994581 | 2.394741 |
3875.0 | 4.040855 | 1.983212 |
4125.0 | 3.064351 | 1.966262 |
4375.0 | 4.212744 | 2.005241 |
4625.0 | 5.084083 | 2.539889 |
def plot_table(table, title):
    table[1].plot(label="Smoker")
    table[2].plot(ls="--", label="Nonsmoker")
    decorate(
        xlabel="Birth weight (grams)",
        ylabel="Mortality rate per 1000",
        title=title,
    )
plot_table(table, "Mortality rate vs birthweight, NCHS 1991")
This figure provides a more detailed view of the Low Birthweight Paradox. Among babies heavier than 2000 grams, mortality is higher for children of smokers, as expected. Among lighter babies, mortality is lower for children of smokers.
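The table above labels its rows with `bins[:-1] + np.diff(bins) / 2`, which works as long as the pivot table has exactly one row per bin. As a hedged alternative, the sketch below attaches the midpoint to each row before grouping, so an empty bin cannot shift the labels. The tiny DataFrame is made up for illustration and has nothing to do with the NCHS data.

```python
import numpy as np
import pandas as pd

# Made-up records, for illustration only
df = pd.DataFrame({
    "birthweight": [1100, 1200, 1300, 1400, 2600, 2700, 2600, 2700],
    "tobacco": [1, 2, 1, 2, 1, 2, 1, 2],
    "mort": [True, False, False, True, False, False, True, False],
})

bins = np.arange(1000, 3250, 250)
mids = bins[:-1] + np.diff(bins) / 2

# Label each row with the midpoint of its bin, then convert to plain floats
df["mid"] = pd.cut(df["birthweight"], bins, labels=mids).astype(float)

# Mortality per 1000 in each (bin, tobacco) group; empty bins simply do not appear
rates = df.groupby(["mid", "tobacco"])["mort"].mean().unstack() * 1000
print(rates)
```

Several bins between 1500 and 2500 grams are empty in this toy data, yet the remaining rows keep the correct midpoints.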
Other Groups
As it turns out, the Low Birthweight Paradox doesn’t apply only to smokers and nonsmokers. The 2006 paper describes a similar effect for babies born at high altitude: they are lighter on average than babies born at low altitude, but if we select LBW babies, the mortality rate is lower for the ones born at high altitude.
And Yerushalmy reported another example. Babies of short mothers are lighter, on average, than babies of tall mothers. In his dataset, babies of short mothers were twice as likely to be LBW, but among LBW babies, those of short mothers had a mortality rate 49% lower and a rate of birth defects 34% lower than those of tall mothers.
Yerushalmy called the relationship between smokers and nonsmokers, and between short and tall mothers, a “remarkable parallelism”. But he did not recognize it as evidence that statistical bias is the explanation for both. Instead, he doubled down:
This comparison is presented not as proof that the differences between smokers and nonsmokers are necessarily of biological origin, rather it is to indicate that a biological hypothesis is not unreasonable.
With the benefit of further research, we can see that Yerushalmy was mistaken. Smoking, high altitude, and short mothers do not protect low-birthweight babies from birth defects and mortality. Rather, they provide a relatively benign explanation for low birthweight.
percent_diff(110, 214)
-48.598130841121495
percent_diff(96, 146)
-34.24657534246575
To see why, suppose four things can cause low birthweight:
The mother might be short, which is not at all harmful to the baby.
The baby might be born at high altitude, which has little if any effect on mortality.
The mother might be a smoker, which is somewhat harmful to the baby, or
The baby might have a birth defect, which greatly increases the rate of mortality.
Now suppose you are a doctor and you hear that a baby under your care was born underweight. You would be concerned, because you know that the baby faces a higher than average risk of mortality.
But suppose the baby was born in Santa Fe, New Mexico, at 2200 meters of elevation to a mother at only 150 cm of elevation (just under five feet). You would be relieved, because either of those factors might explain low birthweight, and neither implies a substantial increase in mortality.
And if you learned that the mother was a smoker, that would be good news, too, because it provides another possible explanation for low birthweight, which means that the last and most harmful explanation is less likely. Maternal smoking is still bad for babies, but it is not as bad as birth defects.
It is frustrating that Yerushalmy did not discover this explanation. In retrospect, he had all the evidence he needed, including the smoking gun (sorry!): the rates of birth defects.
We’ve seen that LBW babies of smokers are less likely to have birth defects, but that’s not because maternal smoking somehow protects babies from congenital anomalies. It’s because low birthweight generally has a cause, and if the cause is not smoking, it is more likely to be something else, including a birth defect.
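The doctor's reasoning above can be checked with a small simulation. The following is a sketch under invented assumptions, not an analysis of the NCHS data: every prevalence and probability is made up, only two of the four causes (smoking and birth defects) are modeled, and mortality depends on those causes rather than on birthweight itself.

```python
import numpy as np

rng = np.random.default_rng(17)
n = 1_000_000

# Hypothetical prevalences, chosen for illustration only
smoker = rng.random(n) < 0.20
defect = rng.random(n) < 0.02

# Low birthweight can come from background causes, smoking, or a defect;
# each cause acts independently (a "noisy-OR" assumption)
p_normal_weight = 0.97 * np.where(smoker, 0.90, 1.0) * np.where(defect, 0.40, 1.0)
lbw = rng.random(n) > p_normal_weight

# Smoking is somewhat harmful; a birth defect is far more harmful
p_death = 0.004 + 0.004 * smoker + 0.15 * defect
died = rng.random(n) < p_death


def rate_per_1000(mask):
    """Mortality per 1000 among the babies selected by a boolean mask."""
    return died[mask].mean() * 1000


overall_smokers = rate_per_1000(smoker)
overall_nonsmokers = rate_per_1000(~smoker)
lbw_smokers = rate_per_1000(lbw & smoker)
lbw_nonsmokers = rate_per_1000(lbw & ~smoker)
no_defect_lbw_smokers = rate_per_1000(lbw & ~defect & smoker)
no_defect_lbw_nonsmokers = rate_per_1000(lbw & ~defect & ~smoker)

print(f"All babies:         {overall_smokers:.1f} vs {overall_nonsmokers:.1f}")
print(f"LBW only:           {lbw_smokers:.1f} vs {lbw_nonsmokers:.1f}")
print(f"LBW, no defect:     {no_defect_lbw_smokers:.1f} vs {no_defect_lbw_nonsmokers:.1f}")
```

Under these made-up parameters, smoking is harmful for every simulated baby, yet selecting on low birthweight makes it look protective. Among LBW babies, those of nonsmokers are more likely to have a defect. Restricting to babies without defects reverses the apparent benefit.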
We can confirm that this explanation is correct by selecting babies with no congenital anomalies observed at birth. If we do that, we find that babies of smokers have higher mortality rates in nearly every weight category, as expected.
lbw = (subset_no_defect["birthweight"] > 1000) & (
subset_no_defect["birthweight"] < 2500
)
rate = subset_no_defect[lbw].groupby("tobacco")["mort"].mean() * 1000
rate
tobacco
1.0 21.316747
2.0 19.730921
Name: mort, dtype: float64
percent_diff(*rate)
8.037265001800348
table = pd.pivot_table(subset_no_defect, index="bin", columns="tobacco", values="mort")
table *= 1000
table.index = bins[:-1] + np.diff(bins) / 2
table
tobacco | 1.0 | 2.0 |
---|---|---|
1125.0 | 90.295797 | 88.376720 |
1375.0 | 50.944947 | 54.970094 |
1625.0 | 30.974633 | 32.201915 |
1875.0 | 24.103738 | 21.577559 |
2125.0 | 17.170891 | 13.721836 |
2375.0 | 12.980332 | 8.864062 |
2625.0 | 8.139342 | 5.766043 |
2875.0 | 6.234500 | 3.911355 |
3125.0 | 5.139620 | 2.867080 |
3375.0 | 4.480155 | 2.117150 |
3625.0 | 3.559452 | 2.045752 |
3875.0 | 3.715916 | 1.655556 |
4125.0 | 2.733804 | 1.659370 |
4375.0 | 3.903527 | 1.799781 |
4625.0 | 4.558641 | 2.009632 |
plot_table(table, "Mortality rate vs birthweight, no anomaly")
The End of the Paradox
In the most recent NCHS dataset, including 3.8 million babies born in 2018, the Low Birthweight Paradox has disappeared.
vs2018 = pd.read_hdf("nchs.hdf", "vs2018")
vs2018.shape
(3801533, 3)
vs2018["yod"].value_counts(dropna=False)
yod
NaN 3780154
2018.0 18735
2019.0 2644
Name: count, dtype: int64
vs2018["mort"] = vs2018["yod"].notnull()
vs2018["mort"].value_counts()
mort
False 3780154
True 21379
Name: count, dtype: int64
vs2018["tobacco"].value_counts()
tobacco
N 3539051
Y 245360
U 17122
Name: count, dtype: int64
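# Assign the result back; in-place replace on a selected column is unreliable in recent pandas
vs2018["tobacco"] = vs2018["tobacco"].replace(["U"], np.nan)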
vs2018["tobacco"].value_counts(dropna=False)
tobacco
N 3539051
Y 245360
NaN 17122
Name: count, dtype: int64
from empiricaldist import Pmf
Pmf.from_seq(vs2018["tobacco"])
tobacco | probs |
---|---|
N | 0.935166 |
Y | 0.064834 |
In this dataset, only 6% of the mothers reported smoking during pregnancy, down from 18% in 1991 and 37% in the 1960s.
vs2018["birthweight"].value_counts().sort_index()
birthweight
227 157
228 5
229 7
230 34
231 4
...
8025 1
8160 1
8161 1
8165 12
9999 2100
Name: count, Length: 5357, dtype: int64
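# Assign the result back; in-place replace on a selected column is unreliable in recent pandas
vs2018["birthweight"] = vs2018["birthweight"].replace([7777, 8165, 9999], np.nan)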
Cdf.from_seq(vs2018["birthweight"]).plot()
<Axes: xlabel='birthweight'>
vs2018["birthweight"].nlargest(10)
2960521 8161.0
2599521 8160.0
1781210 8025.0
3162300 7975.0
1691806 7940.0
3218818 7940.0
2001264 7875.0
3185226 7860.0
1877614 7853.0
2902164 7815.0
Name: birthweight, dtype: float64
vs2018["lbw"] = vs2018["birthweight"] < 2500
bins = np.arange(1000, 5000, 250)
bins
array([1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500,
3750, 4000, 4250, 4500, 4750])
vs2018["bin"] = pd.cut(vs2018["birthweight"], bins)
vs2018["bin"].value_counts().sort_index()
bin
(1000, 1250] 12263
(1250, 1500] 15841
(1500, 1750] 23552
(1750, 2000] 38581
(2000, 2250] 69739
(2250, 2500] 132815
(2500, 2750] 258005
(2750, 3000] 460541
(3000, 3250] 702121
(3250, 3500] 772754
(3500, 3750] 617937
(3750, 4000] 379535
(4000, 4250] 176360
(4250, 4500] 75196
(4500, 4750] 26293
Name: count, dtype: int64
subset = vs2018.dropna(subset=["tobacco", "birthweight"])
subset.shape
(3782433, 6)
weights = subset.groupby("tobacco")["birthweight"].mean()
weights
tobacco
N 3275.077779
Y 3067.694444
Name: birthweight, dtype: float64
percent_diff(3067, 3275)
-6.35114503816794
Babies of smokers were lighter on average than babies of nonsmokers by about 6%, comparable to the difference in the previous two datasets.
labels = {"Y": "Smokers", "N": "Nonsmokers"}
ls = {"Y": "-", "N": "--"}
xs = np.linspace(subset["birthweight"].min(), subset["birthweight"].max(), 101)
for name, group in subset.groupby("tobacco"):
    kdeplot(group["birthweight"], xs, label=labels[name], ls=ls[name])
plt.axvline(2500, color="gray", alpha=0.5)
decorate(
    xlabel="Birthweight (grams)",
    ylabel="Relative likelihood",
    title="Distributions of birth weight",
    xlim=[0, 6000],
)
subset["lbw"].mean()
0.08271157744234994
for name, group in subset.groupby("tobacco"):
    lbw_rate = group["lbw"].mean()
    print(name, lbw_rate)
N 0.07847403941187513
Y 0.1438352533387705
percent_diff(14.38, 7.84)
83.4183673469388
In 2018, fewer babies died in the first year of life; the mortality rate was 5.5 per 1000, down from 8.5 in 1991. And the mortality rate for babies of smokers was more than twice the rate for babies of nonsmokers, almost 11 per 1000 compared to 5.2.
subset["mort"].mean() * 1000
5.5213139267767595
for name, group in subset.groupby("tobacco"):
    rate = group["mort"].mean() * 1000
    print(name, rate)
N 5.16056731750013
Y 10.724844530533185
percent_diff(10.72, 5.16)
107.75193798449614
for name, group in subset[subset["lbw"]].groupby("tobacco"):
    rate = group["mort"].mean() * 1000
    print(name, rate)
N 45.07905857431578
Y 43.20707643456566
round(percent_diff(43.2, 45.1), 1)
-4.2
Again, we can partition the dataset into groups with similar birthweight and compute the mortality rate in each group. The following figure shows the results.
table = pd.pivot_table(subset, index="bin", columns="tobacco", values="mort")
table *= 1000
table.index = bins[:-1] + np.diff(bins) / 2
table
tobacco | N | Y |
---|---|---|
1125.0 | 59.384364 | 67.014795 |
1375.0 | 40.366712 | 46.951220 |
1625.0 | 31.345859 | 36.663981 |
1875.0 | 19.119896 | 25.081282 |
2125.0 | 12.202122 | 16.664654 |
2375.0 | 6.994627 | 12.336111 |
2625.0 | 3.954523 | 9.317009 |
2875.0 | 2.652722 | 6.695117 |
3125.0 | 1.772246 | 4.676507 |
3375.0 | 1.385945 | 4.235809 |
3625.0 | 1.212840 | 3.887780 |
3875.0 | 1.184276 | 3.244174 |
4125.0 | 1.186478 | 2.538474 |
4375.0 | 1.297196 | 4.076641 |
4625.0 | 1.735084 | 5.995204 |
def plot_table(table, title):
    table["Y"].plot(label="Smoker")
    table["N"].plot(ls="--", label="Nonsmoker")
    decorate(
        xlabel="Birth weight (grams)", ylabel="Mortality rate per 1000", title=title
    )
plot_table(table, "Mortality rate vs birthweight, NCHS 2018")
At every birthweight, mortality is higher for children of smokers.
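That claim can be checked against the 2018 table above. The sketch below copies the two columns from that table, rounded to two decimals, and computes the ratio of smoker to nonsmoker mortality in each bin; the array names are illustrative.

```python
import numpy as np

# Bin midpoints (grams) and mortality per 1000, copied from the 2018 table
# above and rounded to two decimals
midpoints = np.arange(1125, 4750, 250)
nonsmoker = np.array([59.38, 40.37, 31.35, 19.12, 12.20, 6.99, 3.95, 2.65,
                      1.77, 1.39, 1.21, 1.18, 1.19, 1.30, 1.74])
smoker = np.array([67.01, 46.95, 36.66, 25.08, 16.66, 12.34, 9.32, 6.70,
                   4.68, 4.24, 3.89, 3.24, 2.54, 4.08, 6.00])

# Ratio above 1 means higher mortality for babies of smokers in that bin
ratio = smoker / nonsmoker
print((ratio > 1).all())
for mid, r in zip(midpoints, ratio):
    print(mid, round(r, 2))
```

The ratio exceeds 1 in every bin between 1000 and 4750 grams. It is smallest in the lightest bin and generally larger among heavier babies.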
Causal diagrams
The 2006 paper explaining the Low Birthweight Paradox and the 2013 paper explaining the Obesity Paradox are noteworthy because they use causal diagrams to represent hypothetical causes and their effects. For example, here is a causal diagram that represents an explanation for the Low Birthweight Paradox:
# based on https://matplotlib.org/matplotblog/posts/mpl-for-making-diagrams/
def make_diagram(fig_width=6, fig_height=2, bg_color="white"):
    fig = plt.figure(figsize=(fig_width, fig_height))
    ax = fig.add_axes((0, 0, 1, 1))
    ax.set_xlim(0, fig_width)
    ax.set_ylim(0, fig_height)
    ax.set_facecolor(bg_color)
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    ax.tick_params(labelbottom=False, labeltop=False, labelleft=False, labelright=False)
    return fig, ax
fig, ax = make_diagram()
# add rectangle to plot
# ax.add_patch(Rectangle((1, 1), 2, 1))
options = dict(fontsize=12, va="center")
y = 1
plt.text(1, y + 0.6, "Smoking", ha="right", **options)
plt.text(1, y - 0.6, "Unknown", ha="right", **options)
plt.text(2.75, y, "LBW", ha="right", **options)
plt.text(4, y, "Mortality", ha="left", **options)
arrowprops = dict(arrowstyle="<|-")
plt.annotate("", [1.1, y + 0.5], [2.35, y + 0.05], arrowprops=arrowprops)
plt.annotate("", [1.1, y - 0.5], [2.35, y - 0.05], arrowprops=arrowprops)
plt.annotate("", [2.9, y], [3.9, y], arrowprops=arrowprops)
plt.annotate("", [1.1, y + 0.6], [3.9, y + 0.1], arrowprops=arrowprops)
plt.annotate("", [1.1, y - 0.6], [3.9, y - 0.1], arrowprops=arrowprops)
None
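The arrows drawn by the cell above can also be written down as a plain edge list, which makes the structure easy to inspect. This is a sketch using only built-in Python; the edge list mirrors the five arrows in the plotting code, and the variable names are illustrative. A node with two or more incoming arrows can act as a collider, and selecting on such a node (here, LBW) can induce an association between its causes.

```python
# Each pair is (cause, effect), mirroring the five arrows drawn above
edges = [
    ("Smoking", "LBW"),
    ("Unknown", "LBW"),
    ("LBW", "Mortality"),
    ("Smoking", "Mortality"),
    ("Unknown", "Mortality"),
]

# Collect the direct causes (parents) of every node
parents = {}
for cause, effect in edges:
    parents.setdefault(effect, set()).add(cause)

# Nodes with two or more parents are candidate colliders
candidates = sorted(node for node, causes in parents.items() if len(causes) >= 2)
print(candidates)
print(sorted(parents["LBW"]))
```

In this diagram, LBW has two parents, Smoking and Unknown, so restricting the analysis to LBW babies links the two: learning that the mother smoked makes the unknown cause less likely.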
The following causal diagram represents the explanation of the Obesity Paradox proposed in the 2013 paper:
fig, ax = make_diagram()
options = dict(fontsize=12, va="center")
y = 1
plt.text(1.2, y + 0.6, "Obesity", ha="right", **options)
plt.text(1.2, y - 0.55, "Unmeasured\nrisk factors", ha="right", **options)
plt.text(2.6, y, "Heart failure", ha="left", **options)
plt.text(4.6, y, "Mortality", ha="left", **options)
arrowprops = dict(arrowstyle="<|-")
plt.annotate("", [1.3, y + 0.5], [2.55, y + 0.05], arrowprops=arrowprops)
plt.annotate("", [1.3, y - 0.5], [2.55, y - 0.05], arrowprops=arrowprops)
plt.annotate("", [3.6, y], [4.5, y], arrowprops=arrowprops)
plt.annotate("", [1.3, y - 0.6], [4.5, y - 0.1], arrowprops=arrowprops)
None
Probably Overthinking It
Copyright 2022 Allen Downey
The code in this notebook and utils.py is under the MIT license.