Priority Queues and Heaps#

Click here to run this chapter on Colab

The heapq module#

The heapq module provides functions for adding and removing elements to and from a heap.

from heapq import heappush, heappop

The heap itself is literally a list, so if you create an empty list, you can think of it as a heap with no elements.

heap = []

Then you can use heappush to add one element at a time.

data = [4, 9, 3, 7, 5, 1, 6, 8, 2]

for x in data:
    heappush(heap, x)
    
heap
[1, 2, 3, 5, 7, 4, 6, 9, 8]

The result is a list that represents a tree. Here’s how the correspondence works between the list representation and the tree representation:

  • The first element (index 0) is the root.

  • The next two elements are the children of the root.

  • The next four element are the grandchildren of the root.

And so on.

In general, if the index of an element is i, its parent is (i-1)//2 and its children are 2*i + 1 and 2*i + 2.

Drawing the Tree#

To generate the tree representation of the heap, the following function iterates through the heap and makes a NetworkX graph with an edge between each node and its parent.

import networkx as nx

def make_dag(heap):
    """Make a NetworkX graph that represents the heap."""
    G = nx.DiGraph()
    
    for i in range(1, len(heap)):
        parent = (i-1)//2
        G.add_edge(parent, i)
    
    return G
G = make_dag(heap)

To draw the tree, we’ll use a module called EoN that provides a function called hierarchy_pos.

It takes as a parameter a NetworkX graph that represents a tree, and it returns a dictionary that maps from each node to a position in the Cartesian plane. If we pass this dictionary to nx.draw, it lays the tree out accordingly.

try:
    import EoN
except ImportError:
    !pip install EoN
Collecting EoN
  Using cached EoN-1.1-py3-none-any.whl
Requirement already satisfied: numpy in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from EoN) (1.21.4)
Requirement already satisfied: scipy in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from EoN) (1.7.3)
Requirement already satisfied: matplotlib in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from EoN) (3.5.1)
Requirement already satisfied: networkx in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from EoN) (2.6.3)
Requirement already satisfied: cycler>=0.10 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (2.8.2)
Requirement already satisfied: pyparsing>=2.2.1 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (3.0.6)
Requirement already satisfied: fonttools>=4.22.0 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (4.28.5)
Requirement already satisfied: packaging>=20.0 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (21.3)
Requirement already satisfied: pillow>=6.2.0 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (8.4.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from matplotlib->EoN) (1.3.2)
Requirement already satisfied: six>=1.5 in /home/downey/anaconda3/envs/DSIRP/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib->EoN) (1.16.0)
Installing collected packages: EoN
Successfully installed EoN-1.1
from EoN import hierarchy_pos

def draw_heap(heap):
    G = make_dag(heap)
    pos = hierarchy_pos(G)
    labels = dict(enumerate(heap))
    nx.draw(G, pos, labels=labels, alpha=0.4)

Here’s what the tree representation looks like.

print(heap)
draw_heap(heap)
[1, 2, 3, 5, 7, 4, 6, 9, 8]
_images/heap_17_1.png

The Heap Property#

If the list is a heap, the tree should have the heap property:

Every parent is less than or equal to its children.

Or more formally:

For all pairs of nodes P and C, where P is the parent of C, the value of P must be less than or equal to the value of C.

The following function checks whether this property is true for all nodes.

def is_heap(heap):
    """Check if a sequence has the heap property.
    
    Every child should be >= its parent.
    """
    for i in range(1, len(heap)):
        parent = (i-1) // 2
        if heap[parent] > heap[i]:
            return False
    return True

As we might hope, heap is a heap.

is_heap(heap)
True

Here’s a list of integers in no particular order, and as you might expect, it does not have the heap property.

data = [4, 9, 3, 7, 5, 1, 6, 8, 2]
is_heap(data)
False

Using a Heap to Sort#

Given a heap, we can implement a sort algorithm called heapsort.

Let’s start again with a fresh heap:

heap = []
for x in data:
    heappush(heap, x)

If we know that a list is a heap, we can use heappop to find and remove the smallest element.

heappop(heap)
1

heappop rearranges the remaining elements of the list to restore the heap property (we’ll see how soon).

heap
[2, 5, 3, 8, 7, 4, 6, 9]
is_heap(heap)
True

And that means we can use heappop again to get the second smallest element (of the original heap):

heappop(heap)
2

Which means we can use a heap to sort a list.

Exercise: Write a generator function called heapsort that takes an iterable and yields the elements of the iterable in increasing order.

Now let’s see how a heap is implemented. The two key methods are push and pop.

Push#

To insert an element in a heap, you start by appending it to the list.

The result is generally not a heap, so you have to do some work to restore the heap property:

  • If the new element is greater than or equal to its parent, you are done.

  • Otherwise swap the new element with its parent.

  • If the new element is greater than or equal to the parent’s parent, you are done.

  • Otherwise swap the new element with its parent’s parent.

  • And repeat, working your way up the tree, until you’re done or you reach the root.

This process is called “sift-up” or sometimes swim.

Exercise: Write a function called push that does the same thing as heappush: it should take as parameters a list (which should be a heap) and a new element; it should add the new element to the list and restore the heap property.

You can use this example to test your code:

heap = []
for x in data:
    push(heap, x)
    assert is_heap(heap)

heap
[1, 2, 3, 5, 7, 4, 6, 9, 8]
is_heap(heap)
True

Pop#

To remove an element from the heap, you:

  • Make a copy of the root element,

  • Pop the last element off the list and store it at the root.

  • Then you have to restore the heap property. If the new root is less than or equal to both of its children, you are done.

  • Otherwise, swap the parent with the smaller of its children.

  • Then repeat the process with the child you just replaced, and continue until you get to a leaf node.

This process is called a “sift-down” or sometimes “sink”.

Exercise: Write a function called pop that does the same thing as heappop: it should remove the smallest element, restore the heap property, and return the smallest element.

Hint: This one is tricky because you have to deal with several special cases.

heap = []
for x in data:
    heappush(heap, x)

while heap:
    print(pop(heap))
    assert is_heap(heap)
1
2
3
4
5
6
7
8
9

Data Structures and Information Retrieval in Python

Copyright 2021 Allen Downey

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International