# Introduction#

Data Structures and Information Retrieval in Python is an introduction to data structures and algorithms using a web search engine as a motivating example. It is based in part on Think Data Structures, which uses Java.

The elements of the search engine are:

• The Indexer, which builds a map from each search term to the pages where it appears, and

• The Retriever, which looks up search terms and finds relevant, high-quality pages.

The index is stored in Redis, which is a data store that provides structures like sets, lists, and hashmaps. The book presents each data structure first in Python, then in Redis, which should help readers see which features are essential and which are implementation details.

As I did with Think Bayes, I wrote this book entirely in Jupyter notebooks, and used JupyterBook to translate them to HTML. The notebooks run on Colab, which is a service provided by Google that runs notebooks in a browser. So you can read the book, run the code, and work on exercises without installing anything.

This material is a work in progress, so your feedback is welcome. The best way to provide that feedback is to click here and create an issue in this GitHub repository.

Overview slides

## The notebooks#

Click on the links below to run the notebooks on Colab.

• Algorithms: Day One activity checking for anagrams and finding anagram sets.
Notebook

• Analysis: Introduction to the analysis of algorithms and Big O notation.
Slides
Notebook

• Timing: Checking asymptotic behavior by measuring run time.
Notebook

• Quiz 1
Notebook

• Generator functions: Separate the iteration from the program logic
Slides
Notebook

• Set: Using Python sets to cheat at word games.
Notebook

• Recursion: Practice recursive functions.
Notebook

• Quiz 2
Notebook

• Depth First Search: Tree traversal in BeautifulSoup.
Slides
Notebook

• Search: Linear search, bisection, and binary search trees.
Slides
Notebook

• Hashmap: How the greatest of all data structures works.
Slides
Notebook

• Quiz 3
Notebook

• Heap: It’s an array, it’s a tree, it’s a PriorityQueue!
Slides
Notebook

• Huffman Code: Use the structures we’ve learned to make an optimal prefix code.
Slides
Notebook

• Getting to Philosophy: Follow Wikipedia links until you get to Philosophy.
Slides
Notebook

• Quiz 4
Notebook

• Redis: Introduction to the Redis data store.
Notebook

• Linked List: Trees before lists? Strange but true.
Slides
Notebook

• Indexer: Make a map of the internet for fast lookups.
Notebook

• Quiz 5
Notebook

• Deque: Like a linked list, but more so.
Notebook

• Graphs: Representing graphs with NetworkX.
Notebook

• Level Order Search: Use the os module to traverse a file system.
Slides
Notebook

• Breadth-First Search: The foundation of graph algorithms.
Slides
Notebook

• Quiz 6
Notebook

Notebook

• Mergesort: Divide, conquer, and merge in linearithmic time.
Slides
Notebook

• Fast Fourier Transform: It’s like mergesort, but with complex numbers.
Slides
Notebook

• Quiz 7 Notebook

• PageRank: Random walks, adjacency matrices, and eigenvectors!
Slides
Notebook