# Introduction

## Contents

# Introduction#

*Data Structures and Information Retrieval in Python* is an introduction to data structures and algorithms using a web search engine as a motivating example.
It is based in part on *Think Data Structures*, which uses Java.

The elements of the search engine are:

The Crawler, which downloads web pages and follows links to other pages,

The Indexer, which builds a map from each search term to the pages where it appears, and

The Retriever, which looks up search terms and finds relevant, high-quality pages.

The index is stored in Redis, which is a data store that provides structures like sets, lists, and hashmaps. The book presents each data structure first in Python, then in Redis, which should help readers see which features are essential and which are implementation details.

As I did with *Think Bayes*, I wrote this book entirely in Jupyter notebooks, and used JupyterBook to translate them to HTML. The notebooks run on Colab, which is a service provided by Google that runs notebooks in a browser. So you can read the book, run the code, and work on exercises without installing anything.

This material is a work in progress, so your feedback is welcome. The best way to provide that feedback is to click here and create an issue in this GitHub repository.

## The notebooks#

Click on the links below to run the notebooks on Colab.

Algorithms: Day One activity checking for anagrams and finding anagram sets.

NotebookAnalysis: Introduction to the analysis of algorithms and Big O notation.

Slides

NotebookTiming: Checking asymptotic behavior by measuring run time.

NotebookQuiz 1

NotebookGenerator functions: Separate the iteration from the program logic

Slides

NotebookSet: Using Python sets to cheat at word games.

NotebookRecursion: Practice recursive functions.

NotebookQuiz 2

NotebookDepth First Search: Tree traversal in BeautifulSoup.

Slides

NotebookSearch: Linear search, bisection, and binary search trees.

Slides

NotebookHashmap: How the greatest of all data structures works.

Slides

NotebookQuiz 3

NotebookHeap: It’s an array, it’s a tree, it’s a PriorityQueue!

Slides

NotebookHuffman Code: Use the structures we’ve learned to make an optimal prefix code.

Slides

NotebookGetting to Philosophy: Follow Wikipedia links until you get to Philosophy.

Slides

NotebookQuiz 4

NotebookRedis: Introduction to the Redis data store.

NotebookLinked List: Trees before lists? Strange but true.

Slides

NotebookIndexer: Make a map of the internet for fast lookups.

NotebookQuiz 5

NotebookDeque: Like a linked list, but more so.

NotebookGraphs: Representing graphs with NetworkX.

NotebookLevel Order Search: Use the

`os`

module to traverse a file system.

Slides

NotebookBreadth-First Search: The foundation of graph algorithms.

Slides

NotebookQuiz 6

NotebookCrawler: Follow links and breadth-first search the internet.

NotebookMergesort: Divide, conquer, and merge in linearithmic time.

Slides

NotebookFast Fourier Transform: It’s like mergesort, but with complex numbers.

Slides

NotebookQuiz 7 Notebook

PageRank: Random walks, adjacency matrices, and eigenvectors!

Slides

Notebook

Copyright 2021 Allen B. Downey

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)