14. Further Reading

The first part of this book is an accelerated introduction to Python with emphasis on tools for working with data. One of the benefits of learning Python is that it is useful for many other kinds of computing, not just data science. If you would like to learn more about Python, there are many good books and online resources, but if the style of this book works well for you, you might like Think Python, also by Allen Downey and published by O’Reilly Media.

If you are interested in scientific computing, you might like Modeling and Simulation in Python, published by No Starch Press. It is an introduction to Python focused on modeling and simulating physical systems. It explores a range of topics including population growth, infectious disease, and simple mechanical systems.

The second part of this book is about exploratory data analysis and visualization. If you are interested in exploratory data analysis, you might also like Think Stats: Exploratory Data Analysis in Python, published by O’Reilly Media. If you are interested in data visualization, you might like Nathan Yau’s blog, FlowingData, and his books, Visualize This and Data Points.

The third part of this book is about statistical inference, that is, using data from a sample to estimate something about a population. We used resampling to quantify the precision of those estimates, and hypothesis testing to consider whether an effect we observe might be due to chance. I am currently working on a book called Data Q&A: Answering the Real Questions with Python that applies these methods to questions posted on Reddit’s statistics forum.

The methods I demonstrate in this book might be called conventional inference, in contrast to the alternative, which is Bayesian inference. If you are interested in learning more about that, you might like Think Bayes: Bayesian Statistics in Python, also by Allen Downey and published by O’Reilly Media.

The Political Alignment case study uses data from the General Social Survey (GSS) to explore political beliefs in the United States, how they differ between groups, and how they change over time. If you are interested in this topic, you might like Probably Overthinking It, published by University of Chicago Press, which explores the GSS data in greater depth. It presents a variety of other topics as well, exploring, as the subtitle explains, “How to use data to answer questions, avoid statistical traps, and make better decisions”.

Finally, the Recidivism case study explores the use of predictive algorithms in the criminal justice system and explains the metrics we use to assess them. There are many books and articles on this topic, reflecting both its importance, given the impact these algorithms have on people’s lives, and the difficulty of resolving conflicting requirements of fairness. If you would like to read more, I recommend Orly Lobel’s recent book, The Equality Machine, which reviews many of the challenges algorithms pose while also recognizing their potential to do good.

Elements of Data Science

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International