{ "cells": [ { "cell_type": "markdown", "id": "d2f1937f", "metadata": {}, "source": [ "# Distributions" ] }, { "cell_type": "markdown", "id": "32393b32", "metadata": { "tags": [ "remove-print" ] }, "source": [ "[Click here to run this notebook on Colab](https://colab.research.google.com/github/AllenDowney/ThinkStats/blob/v3/nb/chap02.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "29ea6b54", "metadata": { "tags": [ "remove-print", "hide-cell" ] }, "outputs": [], "source": [ "%load_ext nb_black\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 1, "id": "1b3797b5", "metadata": { "tags": [ "remove-print", "hide-cell" ] }, "outputs": [], "source": [ "from os.path import basename, exists\n", "\n", "\n", "def download(url):\n", "    filename = basename(url)\n", "    if not exists(filename):\n", "        from urllib.request import urlretrieve\n", "\n", "        local, _ = urlretrieve(url, filename)\n", "        print(\"Downloaded \" + local)\n", "\n", "\n", "download(\"https://github.com/AllenDowney/ThinkStats/raw/v3/nb/thinkstats.py\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "f43cb8b9", "metadata": { "tags": [ "remove-print", "hide-cell" ] }, "outputs": [], "source": [ "try:\n", "    import empiricaldist\n", "except ImportError:\n", "    !pip install empiricaldist" ] }, { "cell_type": "code", "execution_count": 3, "id": "338d162a", "metadata": { "tags": [ "remove-print", "hide-cell" ] }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from thinkstats import decorate" ] }, { "cell_type": "markdown", "id": "01d03795", "metadata": {}, "source": [ "## Histograms\n", "\n", "One of the best ways to describe a variable is to report the quantities it contains and how many times each one appears.\n", "This description is called the **distribution** of the variable.\n", "\n", "A common representation of a distribution is a **histogram**, a graph that shows the **frequency** of each 
quantity, which is the number of times it appears.\n", "\n", "To represent distributions, we'll use a library called `empiricaldist`.\n", "In this context, \"empirical\" means that the distributions are based on data rather than mathematical models.\n", "\n", "`empiricaldist` provides a class called `Hist` we can use to compute and plot a histogram.\n", "We can import it like this." ] }, { "cell_type": "code", "execution_count": 4, "id": "142ef9f5", "metadata": {}, "outputs": [], "source": [ "from empiricaldist import Hist" ] }, { "cell_type": "markdown", "id": "026f22d6", "metadata": {}, "source": [ "To show how it works, we'll start with a small list of values." ] }, { "cell_type": "code", "execution_count": 5, "id": "a8c88e69", "metadata": {}, "outputs": [], "source": [ "t = [1.0, 2.0, 2.0, 3.0, 5.0]" ] }, { "cell_type": "markdown", "id": "6addd31b", "metadata": {}, "source": [ "`Hist` provides a method called `from_seq` that takes a sequence and makes a `Hist` object." ] }, { "cell_type": "code", "execution_count": 6, "id": "32a8fcc3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|     | freqs |\n",
"| --- | ----- |\n",
"| 1.0 | 1     |\n",
"| 2.0 | 2     |\n",
"| 3.0 | 1     |\n",
"| 5.0 | 1     |\n",
"