Think Stats is an introduction to statistics and data science for Python programmers. If you have basic skills in Python, you can use them to learn concepts in probability and statistics and to develop practical skills for working with data.
This book emphasizes simple techniques you can use to explore real data sets and answer interesting questions.
It includes case studies using datasets from the National Institutes of Health and other sources.
Many of the exercises use short programs to run experiments and help readers develop understanding.
This book is available under a Creative Commons license, which means that you are free to copy, distribute, and modify it, as long as you attribute the source and don’t use it for commercial purposes.
The easiest way to work with this code is to run it on Colab, a free service that runs Jupyter notebooks in a web browser. For every chapter, I provide two notebooks: one contains the code from the chapter and the exercises; the other also contains the solutions.
If you want to run these notebooks on your own computer, you can download them individually from GitHub or download the entire repository as a Zip file.
I developed this book using Anaconda, a free Python distribution that includes all the packages you'll need to run the code (and lots more). I found Anaconda easy to install. By default it does a user-level installation, so you don't need administrative privileges. You can download it from the Anaconda website.