{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Decision Analysis" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "Think Bayes, Second Edition\n", "\n", "Copyright 2020 Allen B. Downey\n", "\n", "License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:41.807708Z", "iopub.status.busy": "2021-04-16T19:35:41.807149Z", "iopub.status.idle": "2021-04-16T19:35:41.810352Z", "shell.execute_reply": "2021-04-16T19:35:41.809718Z" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# If we're running on Colab, install empiricaldist\n", "# https://pypi.org/project/empiricaldist/\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", " !pip install empiricaldist" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:41.814888Z", "iopub.status.busy": "2021-04-16T19:35:41.814343Z", "iopub.status.idle": "2021-04-16T19:35:41.816295Z", "shell.execute_reply": "2021-04-16T19:35:41.816736Z" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# Get utils.py\n", "\n", "from os.path import basename, exists\n", "\n", "def download(url):\n", " filename = basename(url)\n", " if not exists(filename):\n", " from urllib.request import urlretrieve\n", " local, _ = urlretrieve(url, filename)\n", " print('Downloaded ' + local)\n", " \n", "download('https://github.com/AllenDowney/ThinkBayes2/raw/master/soln/utils.py')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:41.820085Z", "iopub.status.busy": "2021-04-16T19:35:41.819538Z", "iopub.status.idle": "2021-04-16T19:35:42.501987Z", "shell.execute_reply": "2021-04-16T19:35:42.501513Z" }, "tags": [ 
"remove-cell" ] }, "outputs": [], "source": [ "from utils import set_pyplot_params\n", "set_pyplot_params()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This chapter presents a problem inspired by the game show *The Price is Right*.\n", "It is a silly example, but it demonstrates a useful process called Bayesian [decision analysis](https://en.wikipedia.org/wiki/Decision_analysis).\n", "\n", "As in previous examples, we'll use data and prior distribution to compute a posterior distribution; then we'll use the posterior distribution to choose an optimal strategy in a game that involves bidding.\n", "\n", "As part of the solution, we will use kernel density estimation (KDE) to estimate the prior distribution, and a normal distribution to compute the likelihood of the data.\n", "\n", "And at the end of the chapter, I pose a related problem you can solve as an exercise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Price Is Right Problem\n", "\n", "On November 1, 2007, contestants named Letia and Nathaniel appeared on *The Price is Right*, an American television game show. They competed in a game called \"The Showcase\", where the objective is to guess the price of a collection of prizes. The contestant who comes closest to the actual price, without going over, wins the prizes.\n", "\n", "Nathaniel went first. His showcase included a dishwasher, a wine cabinet, a laptop computer, and a car. He bid \\\\$26,000.\n", "\n", "Letia's showcase included a pinball machine, a video arcade game, a pool table, and a cruise of the Bahamas. She bid \\\\$21,500.\n", "\n", "The actual price of Nathaniel's showcase was \\\\$25,347. His bid was too high, so he lost.\n", "\n", "The actual price of Letia's showcase was \\\\$21,578. \n", "\n", "She was only off by \\\\$78, so she won her showcase and, because her bid was off by less than 250, she also won Nathaniel's showcase." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a Bayesian thinker, this scenario suggests several questions:\n", "\n", "1. Before seeing the prizes, what prior beliefs should the contestants have about the price of the showcase?\n", "\n", "2. After seeing the prizes, how should the contestants update those beliefs?\n", "\n", "3. Based on the posterior distribution, what should the contestants bid?\n", "\n", "The third question demonstrates a common use of Bayesian methods: decision analysis.\n", "\n", "This problem is inspired by [an example](https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter5_LossFunctions/Ch5_LossFunctions_PyMC3.ipynb) in Cameron Davidson-Pilon's book, [*Probabilistic Programming and Bayesian Methods for Hackers*](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Prior\n", "\n", "To choose a prior distribution of prices, we can take advantage of data from previous episodes. Fortunately, [fans of the show keep detailed records](https://web.archive.org/web/20121107204942/http://www.tpirsummaries.8m.com/).\n", "\n", "For this example, I downloaded files containing the price of each showcase from the 2011 and 2012 seasons and the bids offered by the contestants." ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "The following cells load the data files." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:42.507523Z", "iopub.status.busy": "2021-04-16T19:35:42.506995Z", "iopub.status.idle": "2021-04-16T19:35:42.508609Z", "shell.execute_reply": "2021-04-16T19:35:42.509016Z" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# Load the data files\n", "\n", "download('https://raw.githubusercontent.com/AllenDowney/ThinkBayes2/master/data/showcases.2011.csv')\n", "download('https://raw.githubusercontent.com/AllenDowney/ThinkBayes2/master/data/showcases.2012.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function reads a data file, drops rows with missing values, and transposes the result so that each variable is a column." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:42.513237Z", "iopub.status.busy": "2021-04-16T19:35:42.512673Z", "iopub.status.idle": "2021-04-16T19:35:42.515256Z", "shell.execute_reply": "2021-04-16T19:35:42.514810Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "def read_data(filename):\n", "    \"\"\"Read the showcase price data.\"\"\"\n", "    # Use the first column as the index and skip the second line of the file\n", "    df = pd.read_csv(filename, index_col=0, skiprows=[1])\n", "    # Drop rows with missing values, then swap rows and columns\n", "    return df.dropna().transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'll read both files and concatenate them into a single `DataFrame`." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:42.519025Z", "iopub.status.busy": "2021-04-16T19:35:42.518583Z", "iopub.status.idle": "2021-04-16T19:35:42.547366Z", "shell.execute_reply": "2021-04-16T19:35:42.546804Z" } }, "outputs": [], "source": [ "df2011 = read_data('showcases.2011.csv')\n", "df2012 = read_data('showcases.2012.csv')\n", "\n", "df = pd.concat([df2011, df2012], ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:42.550926Z", "iopub.status.busy": "2021-04-16T19:35:42.550422Z", "iopub.status.idle": "2021-04-16T19:35:42.553032Z", "shell.execute_reply": "2021-04-16T19:35:42.552623Z" }, "tags": [ "hide-cell" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(191, 6) (122, 6) (313, 6)\n" ] } ], "source": [ "print(df2011.shape, df2012.shape, df.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what the dataset looks like:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:35:42.558946Z", "iopub.status.busy": "2021-04-16T19:35:42.557889Z", "iopub.status.idle": "2021-04-16T19:35:42.570199Z", "shell.execute_reply": "2021-04-16T19:35:42.570579Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "| | Showcase 1 | Showcase 2 | Bid 1 | Bid 2 | Difference 1 | Difference 2 |\n",
"|---|---|---|---|---|---|---|\n",
"| 0 | 50969.0 | 45429.0 | 42000.0 | 34000.0 | 8969.0 | 11429.0 |\n",
"| 1 | 21901.0 | 34061.0 | 14000.0 | 59900.0 | 7901.0 | -25839.0 |\n",
"| 2 | 32815.0 | 53186.0 | 32000.0 | 45000.0 | 815.0 | 8186.0 |\n", "