# Search

## Contents

# Search#

Click here to run this chapter on Colab

## Linear Search#

Suppose you have a list.

```
t = [5, 1, 2, 4, 2]
```

And you want to know whether an element appears in the list. You can use the `in`

operator, which returns `True`

or `False`

.

```
5 in t, 6 in t
```

```
(True, False)
```

If you want to know where in the list it is, you can use `index`

, which returns the index of the element if it appears.

```
t.index(2)
```

```
2
```

Or raises a `ValueError`

otherwise.

```
try:
t.index(6)
except ValueError as e:
print(e)
```

```
6 is not in list
```

The following function does the same thing as `string.index`

:

```
def index(t, target):
for i, x in enumerate(t):
if x == target:
return i
raise ValueError(f'{target} is not in list')
```

```
index(t, 2)
```

```
2
```

```
try:
index(t, 6)
except ValueError as e:
print(e)
```

```
6 is not in list
```

The runtime of this kind of search is in `O(n)`

, where `n`

is the length of the list, because

If the target is not in the list, you have to check every element in the list.

If the target is in a random location, you have to check half the list on average.

As an exception, if you know that the target is within the first `k`

elements, for a value of `k`

that does not depend on `n`

, you can consider this search `O(1)`

.

## Bisection#

If we know that the elements of the list are in order, we can do better.

The `bisect`

module provides an implementation of a “bisection search”, which works by

Checking the element in the middle of the list. If it’s the target, we’re done.

If the middle element is larger than the target, we search the left half.

If the middle element is smaller than the target, we search the right half.

Here is the documentation of the bisect modle.

To test it, we’ll start with a sorted list.

```
t.sort()
t
```

```
[1, 2, 2, 4, 5]
```

`bisect_left`

is similar to `index`

```
from bisect import bisect_left
bisect_left(t, 1), bisect_left(t, 2), bisect_left(t, 4), bisect_left(t, 5)
```

```
(0, 1, 3, 4)
```

But with elements that are not in the list, it returns their insertion point, that is, the place where you would put the target to keep the list sorted.

```
bisect_left(t, 6)
```

```
5
```

We can use `bisect_left`

to implement `index`

, like this:

```
from bisect import bisect_left
def index_bisect(a, x):
"""Locate the leftmost value exactly equal to x"""
i = bisect_left(a, x)
if i != len(a) and a[i] == x:
return i
raise ValueError(f'{x} not in list')
```

```
index_bisect(t, 1), index_bisect(t, 2), index_bisect(t, 4), index_bisect(t, 5)
```

```
(0, 1, 3, 4)
```

```
try:
index_bisect(t, 6)
except ValueError as e:
print(e)
```

```
6 not in list
```

**Exercise:** Write your own version of `bisect_left`

. You can do it iteratively or recursively.

Each time through the loop, we cut the search region in half, so if we start with `n`

elements, we have `n/2`

during the next loop, `n/4`

during the second loop, and so on. When we get to 1 element, we’re done.

So how many steps does that take? Thinking about it in reverse, starting with 1, how many times do we have to double it before we get to `n`

? In math notation, the question is

Where `x`

is the unknown number of steps. Taking the log of both sides, base 2:

In terms of order of growth, bisection search is in `O(log n)`

. Notice that we don’t bother to specify the base of the logarithm, because a log in one base is a constant multiple of a log in any other base.

`bisect`

also provides methods to insert elements while maintaining sorted order.

```
from bisect import insort
insort(t, 3)
t
```

```
[1, 2, 2, 3, 4, 5]
```

However, as the documentation explains, “Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.”

## Binary search tree#

Using a sorted array to support log-time search is a reasonable choice if we don’t have to add or remove elements very often.

But if the number of add/remove operations is similar to the number of searches, the overall performance would be linear.

We can solve that problem with a binary search tree.

To implement a tree, I’ll define a new class that represents a `Node`

.
Each node contains data and a reference to two “children” called `left`

and `right`

.
(It’s called a binary tree because every node has two children).

```
class Node:
def __init__(self, data, left=None, right=None):
self.data = data
self.left = left
self.right = right
def __repr__(self):
return f'Node({self.data}, {repr(self.left)}, {repr(self.right)})'
```

Here’s how we can instantiate two nodes.

```
node3 = Node(3)
node10 = Node(10)
```

Because `Node`

provides `__repr__`

, we can display a node like this.

```
node3
```

```
Node(3, None, None)
```

Now we’ll make a parent node that has the first two nodes as children.

```
node8 = Node(8, node3, node10)
node8
```

```
Node(8, Node(3, None, None), Node(10, None, None))
```

I’ll define another class to represent the tree. The only thing it contains is a reference to the top of the tree, which is confusingly called the root node.

```
class BSTree:
def __init__(self, root=None):
self.root = root
def __repr__(self):
return f'BSTree({repr(self.root)})'
```

Here’s tree with a reference to `node8`

, so it implicitly contains `node3`

and `node10`

as well.

```
tree = BSTree(node8)
tree
```

```
BSTree(Node(8, Node(3, None, None), Node(10, None, None)))
```

A binary tree is a binary search tree if for every node (1) the value of the left child is lower and (2) the value of the right child is higher. Let’s assume for now that there are no duplicates.

We can check whether a tree is a BST like this:

```
def is_bst(tree):
return is_bst_rec(tree.root)
def is_bst_rec(node):
if node is None:
return True
if node.left and node.left.data > node.data:
return False
if node.right and node.right.data < node.data:
return False
return is_bst_rec(node.left) and is_bst_rec(node.right)
```

```
is_bst(tree)
```

```
True
```

And let’s see an example where it’s not true.

```
node5 = Node(5, node10, node3)
node5
```

```
Node(5, Node(10, None, None), Node(3, None, None))
```

```
tree2 = BSTree(node5)
is_bst(tree2)
```

```
False
```

## Draw the Tree#

One of the better functions for drawing trees is part of a package called `EoN`

, for “Epidemics on Networks”, which provides “tools to study the spread of SIS and SIR diseases in networks”.

The function we’ll use is called hierarchy_pos.
It takes as a parameter a NetworkX graph that represents a tree, and it returns a dictionary that maps from each node to a position in the Cartesian plane.
If we pass this dictionary to `nx.draw`

, it lays the tree out accordingly.

```
try:
import EoN
except ImportError:
!pip install EoN
```

```
import networkx as nx
def add_edges(node, G):
"""Make a NetworkX graph that represents the heap."""
if node is None:
return
G.add_node(node, label=node.data)
for child in (node.left, node.right):
if child:
G.add_edge(node, child)
add_edges(child, G)
```

```
G = nx.DiGraph()
add_edges(tree.root, G)
G.nodes()
```

```
NodeView((Node(8, Node(3, None, None), Node(10, None, None)), Node(3, None, None), Node(10, None, None)))
```

```
labels = {node: node.data for node in G.nodes()}
labels
```

```
{Node(8, Node(3, None, None), Node(10, None, None)): 8,
Node(3, None, None): 3,
Node(10, None, None): 10}
```

```
from EoN import hierarchy_pos
def draw_tree(tree):
G = nx.DiGraph()
add_edges(tree.root, G)
pos = hierarchy_pos(G)
labels = {node: node.data for node in G.nodes()}
nx.draw(G, pos, labels=labels, alpha=0.4)
```

```
draw_tree(tree)
```

## Search#

Given a tree and a target value, how do we determine whether the target is in the tree?

Start at the root. If you find the target, stop.

If the target is less than the value at the root, go left.

If the target is greater than the value at the root, go right.

If you get to a non-existent node, stop.

**Exercise:** Write a function called `search`

that takes a `BSTree`

and a target value and returns `True`

if the target value appears in the tree.

**Exercise:** Many tree operations lend themselves to recursive implementations. Write a function called `search_rec`

that searches the tree recursively.

Hint: Start with a copy of `is_bst`

.

## Insert#

The point of the BST is that we can add and remove elements efficiently, compared to a sorted array.

So let’s see what that looks like.

```
def insert(tree, data):
tree.root = insert_node(tree.root, data)
def insert_node(node, data):
if node is None:
return Node(data)
if data < node.data:
node.left = insert_node(node.left, data)
else:
node.right = insert_node(node.right, data)
return node
```

We’ll test it by starting with an empty tree and adding elements one at a time.

```
tree = BSTree()
values = [8, 3, 10, 1, 6, 14, 4, 7, 13]
for value in values:
insert(tree, value)
tree
```

```
BSTree(Node(8, Node(3, Node(1, None, None), Node(6, Node(4, None, None), Node(7, None, None))), Node(10, None, Node(14, Node(13, None, None), None))))
```

```
draw_tree(tree)
```

If things have gone according to plan, the result should be a BST.

```
is_bst(tree)
```

```
True
```

## Sorting#

If we traverse the tree recursively and print the elements as we go, we get the values in sorted order.

```
def print_tree(tree):
print_tree_rec(tree.root)
def print_tree_rec(node):
if node is None:
return
print_tree_rec(node.left)
print(node.data, end=' ')
print_tree_rec(node.right)
```

```
print_tree(tree)
```

```
1 3 4 6 7 8 10 13 14
```

**Exercise:** Write a generator method called `iterate_tree`

that traverses the tree and yields the elements in order.

You can do this iteratively or recursively.

## Badness 10000#

If the tree is reasonably well balanced, the height is proportional to `log n`

, where `n`

is the number of elements.

But let’s see what happens if we add elements in sorted order.

```
tree3 = BSTree()
for x in sorted(values):
insert(tree3, x)
```

```
draw_tree(tree3)
```

Now traversing the tree takes linear time. To avoid this problem, there are variations of BST that are self-balancing.

Most are based on tree rotation operations. For example, the following is a function that rotates a tree to the left (following Wikipedia’s nomenclature for what “left” and “right” mean).

```
def rotate_left(node):
if node is None or node.right is None:
return node
pivot = node.right
node.right = pivot.left
pivot.left = node
return pivot
```

```
tree3.root = rotate_left(tree3.root)
draw_tree(tree3)
```

*Data Structures and Information Retrieval in Python*

Copyright 2021 Allen Downey

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International