{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Comparison" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "Think Bayes, Second Edition\n", "\n", "Copyright 2020 Allen B. Downey\n", "\n", "License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.234088Z", "iopub.status.busy": "2021-04-16T19:37:12.233652Z", "iopub.status.idle": "2021-04-16T19:37:12.235885Z", "shell.execute_reply": "2021-04-16T19:37:12.235454Z" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# If we're running on Colab, install empiricaldist\n", "# https://pypi.org/project/empiricaldist/\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", " !pip install empiricaldist" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.239747Z", "iopub.status.busy": "2021-04-16T19:37:12.239319Z", "iopub.status.idle": "2021-04-16T19:37:12.241356Z", "shell.execute_reply": "2021-04-16T19:37:12.240953Z" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# Get utils.py\n", "\n", "from os.path import basename, exists\n", "\n", "def download(url):\n", " filename = basename(url)\n", " if not exists(filename):\n", " from urllib.request import urlretrieve\n", " local, _ = urlretrieve(url, filename)\n", " print('Downloaded ' + local)\n", " \n", "download('https://github.com/AllenDowney/ThinkBayes2/raw/master/soln/utils.py')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.244102Z", "iopub.status.busy": "2021-04-16T19:37:12.243682Z", "iopub.status.idle": "2021-04-16T19:37:12.920327Z", "shell.execute_reply": "2021-04-16T19:37:12.920694Z" }, "tags": [ 
"remove-cell" ] }, "outputs": [], "source": [ "from utils import set_pyplot_params\n", "set_pyplot_params()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This chapter introduces joint distributions, which are an essential tool for working with distributions of more than one variable.\n", "\n", "We'll use them to solve a silly problem on our way to solving a real problem.\n", "The silly problem is figuring out how tall two people are, given only that one is taller than the other.\n", "The real problem is rating chess players (or participants in other kinds of competition) based on the outcome of a game.\n", "\n", "To construct joint distributions and compute likelihoods for these problems, we will use outer products and similar operations. And that's where we'll start." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outer Operations\n", "\n", "Many useful operations can be expressed as the \"outer product\" of two sequences, or another kind of \"outer\" operation.\n", "Suppose you have sequences like `x` and `y`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.923867Z", "iopub.status.busy": "2021-04-16T19:37:12.923379Z", "iopub.status.idle": "2021-04-16T19:37:12.925164Z", "shell.execute_reply": "2021-04-16T19:37:12.925562Z" } }, "outputs": [], "source": [ "x = [1, 3, 5]\n", "y = [2, 4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The outer product of these sequences is an array that contains the product of every pair of values, one from each sequence.\n", "There are several ways to compute outer products, but the one I think is the most versatile is a \"mesh grid\".\n", "\n", "NumPy provides a function called `meshgrid` that computes a mesh grid. If we give it two sequences, it returns two arrays." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.928585Z", "iopub.status.busy": "2021-04-16T19:37:12.928155Z", "iopub.status.idle": "2021-04-16T19:37:12.929813Z", "shell.execute_reply": "2021-04-16T19:37:12.930151Z" } }, "outputs": [], "source": [ "import numpy as np\n", "\n", "X, Y = np.meshgrid(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first array contains copies of `x` arranged in rows, where the number of rows is the length of `y`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.935088Z", "iopub.status.busy": "2021-04-16T19:37:12.934570Z", "iopub.status.idle": "2021-04-16T19:37:12.937438Z", "shell.execute_reply": "2021-04-16T19:37:12.937057Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 3, 5],\n", " [1, 3, 5]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second array contains copies of `y` arranged in columns, where the number of columns is the length of `x`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.940916Z", "iopub.status.busy": "2021-04-16T19:37:12.940298Z", "iopub.status.idle": "2021-04-16T19:37:12.943428Z", "shell.execute_reply": "2021-04-16T19:37:12.942892Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[2, 2, 2],\n", " [4, 4, 4]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the two arrays are the same size, we can use them as operands for arithmetic functions like multiplication." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.947253Z", "iopub.status.busy": "2021-04-16T19:37:12.946619Z", "iopub.status.idle": "2021-04-16T19:37:12.949435Z", "shell.execute_reply": "2021-04-16T19:37:12.949886Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 6, 10],\n", " [ 4, 12, 20]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X * Y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This result is the outer product of `x` and `y`.\n", "We can see that more clearly if we put it in a `DataFrame`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2021-04-16T19:37:12.957805Z", "iopub.status.busy": "2021-04-16T19:37:12.955256Z", "iopub.status.idle": "2021-04-16T19:37:12.962953Z", "shell.execute_reply": "2021-04-16T19:37:12.962527Z" } }, "outputs": [ { "data": { "text/html": [ "
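<table>\n", " <thead>\n", " <tr>\n", " <th></th>\n", " <th>1</th>\n", " <th>3</th>\n", " <th>5</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
" <tr>\n", " <th>2</th>\n", " <td>2</td>\n", " <td>6</td>\n", " <td>10</td>\n", " </tr>\n",
" <tr>\n", " <th>4</th>\n", " <td>4</td>\n", " <td>12</td>\n", " <td>20</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "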