# 2. Erdos-Renyi Graphs#

Code examples from Think Complexity, 2nd edition.

Copyright 2016 Allen Downey, MIT License

```
from os.path import basename, exists
def download(url):
filename = basename(url)
if not exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
download('https://github.com/AllenDowney/ThinkComplexity2/raw/master/notebooks/utils.py')
```

```
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns
from utils import decorate, savefig
# Set the random seed so the notebook
# produces the same results every time.
np.random.seed(17)
```

```
# make a directory for figures
!mkdir -p figs
```

```
# node colors for drawing networks
colors = sns.color_palette('pastel', 5)
#sns.palplot(colors)
sns.set_palette(colors)
```

## 2.1. Directed graph#

The first example is a directed graph that represents a social network with three nodes.

```
G = nx.DiGraph()
G.add_node('Alice')
G.add_node('Bob')
G.add_node('Cate')
list(G.nodes())
```

Here’s how we add edges between nodes.

```
G.add_edge('Alice', 'Bob')
G.add_edge('Alice', 'Cate')
G.add_edge('Bob', 'Alice')
G.add_edge('Bob', 'Cate')
list(G.edges())
```

And here’s how to draw the graph.

```
nx.draw_circular(G,
node_color='C0',
node_size=2000,
with_labels=True)
plt.axis('equal')
savefig('figs/chap02-1')
```

**Exercise:** Add another node and a few more edges and draw the graph again.

## 2.2. Undirected graph#

The second example is an undirected graph that represents cities and the driving times between them.

`positions`

is a dictionary that maps from each city to its coordinates.

```
positions = dict(Albany=(-74, 43),
Boston=(-71, 42),
NYC=(-74, 41),
Philly=(-75, 40))
positions['Albany']
```

We can use the keys in `pos`

to add nodes to the graph.

```
G = nx.Graph()
G.add_nodes_from(positions)
G.nodes()
```

`drive_times`

is a dictionary that maps from pairs of cities to the driving times between them.

```
drive_times = {('Albany', 'Boston'): 3,
('Albany', 'NYC'): 4,
('Boston', 'NYC'): 4,
('NYC', 'Philly'): 2}
```

We can use the keys from `drive_times`

to add edges to the graph.

```
G.add_edges_from(drive_times)
G.edges()
```

Now we can draw the graph using `positions`

to indicate the positions of the nodes, and `drive_times`

to label the edges.

```
nx.draw(G, positions,
node_color='C1',
node_shape='s',
node_size=2500,
with_labels=True)
nx.draw_networkx_edge_labels(G, positions,
edge_labels=drive_times)
plt.axis('equal')
savefig('figs/chap02-2')
```

**Exercise:** Add another city and at least one edge.

## 2.3. Complete graph#

To make a complete graph, we use a generator function that iterates through all pairs of nodes.

```
def all_pairs(nodes):
for i, u in enumerate(nodes):
for j, v in enumerate(nodes):
if i < j:
yield u, v
```

`make_complete_graph`

makes a `Graph`

with the given number of nodes and edges between all pairs of nodes.

```
def make_complete_graph(n):
G = nx.Graph()
nodes = range(n)
G.add_nodes_from(nodes)
G.add_edges_from(all_pairs(nodes))
return G
```

Here’s a complete graph with 10 nodes:

```
complete = make_complete_graph(10)
complete.number_of_nodes()
```

And here’s what it looks like.

```
nx.draw_circular(complete,
node_color='C2',
node_size=1000,
with_labels=True)
savefig('figs/chap02-3')
```

The `neighbors`

method the neighbors for a given node.

```
list(complete.neighbors(0))
```

**Exercise:** Make and draw complete directed graph with 5 nodes.

## 2.4. Random graphs#

Next we’ll make a random graph where the probability of an edge between each pair of nodes is \(p\).

The helper function `flip`

returns True with probability `p`

and False with probability `1-p`

```
def flip(p):
return np.random.random() < p
```

`random_pairs`

is a generator function that enumerates all possible pairs of nodes and yields each one with probability `p`

```
def random_pairs(nodes, p):
for edge in all_pairs(nodes):
if flip(p):
yield edge
```

`make_random_graph`

makes an ER graph where the probability of an edge between each pair of nodes is `p`

.

```
def make_random_graph(n, p):
G = nx.Graph()
nodes = range(n)
G.add_nodes_from(nodes)
G.add_edges_from(random_pairs(nodes, p))
return G
```

Here’s an example with `n=10`

and `p=0.3`

```
np.random.seed(10)
random_graph = make_random_graph(10, 0.3)
len(random_graph.edges())
```

And here’s what it looks like:

```
nx.draw_circular(random_graph,
node_color='C3',
node_size=1000,
with_labels=True)
savefig('figs/chap02-4')
```

## 2.5. Connectivity#

To check whether a graph is connected, we’ll start by finding all nodes that can be reached, starting with a given node:

```
def reachable_nodes(G, start):
seen = set()
stack = [start]
while stack:
node = stack.pop()
if node not in seen:
seen.add(node)
stack.extend(G.neighbors(node))
return seen
```

In the complete graph, starting from node 0, we can reach all nodes:

```
reachable_nodes(complete, 0)
```

In the random graph we generated, we can also reach all nodes (but that’s not always true):

```
reachable_nodes(random_graph, 0)
```

We can use `reachable_nodes`

to check whether a graph is connected:

```
def is_connected(G):
start = next(iter(G))
reachable = reachable_nodes(G, start)
return len(reachable) == len(G)
```

Again, the complete graph is connected:

```
is_connected(complete)
```

But if we generate a random graph with a low value of `p`

, it’s not:

```
random_graph = make_random_graph(10, 0.1)
len(random_graph.edges())
```

```
is_connected(random_graph)
```

**Exercise:** What do you think it means for a directed graph to be connected? Write a function that checks whether a directed graph is connected.

## 2.6. Probability of connectivity#

Now let’s estimare the probability that a randomly-generated ER graph is connected.

This function takes `n`

and `p`

, generates `iters`

graphs, and returns the fraction of them that are connected.

```
# version with a for loop
def prob_connected(n, p, iters=100):
count = 0
for i in range(iters):
random_graph = make_random_graph(n, p)
if is_connected(random_graph):
count += 1
return count/iters
```

```
# version with a list comprehension
def prob_connected(n, p, iters=100):
tf = [is_connected(make_random_graph(n, p))
for i in range(iters)]
return np.mean(tf)
```

With `n=10`

and `p=0.23`

, the probability of being connected is about 33%.

```
np.random.seed(17)
n = 10
prob_connected(n, 0.23, iters=10000)
```

According to Erdos and Renyi, the critical value of `p`

for `n=10`

is about 0.23.

```
pstar = np.log(n) / n
pstar
```

So let’s plot the probability of connectivity for a range of values for `p`

```
ps = np.logspace(-1.3, 0, 11)
ps
```

I’ll estimate the probabilities with `iters=1000`

```
ys = [prob_connected(n, p, 1000) for p in ps]
for p, y in zip(ps, ys):
print(p, y)
```

And then plot them, adding a vertical line at the computed critical value

```
plt.axvline(pstar, color='gray')
plt.plot(ps, ys, color='green')
decorate(xlabel='Prob of edge (p)',
ylabel='Prob connected',
xscale='log')
savefig('figs/chap02-5')
```

We can run the same analysis for a few more values of `n`

.

```
ns = [300, 100, 30]
ps = np.logspace(-2.5, 0, 11)
sns.set_palette('Blues_r', 4)
for n in ns:
print(n)
pstar = np.log(n) / n
plt.axvline(pstar, color='gray', alpha=0.3)
ys = [prob_connected(n, p) for p in ps]
plt.plot(ps, ys, label='n=%d' % n)
decorate(xlabel='Prob of edge (p)',
ylabel='Prob connected',
xscale='log',
xlim=[ps[0], ps[-1]],
loc='upper left')
savefig('figs/chap02-6')
```

As `n`

increases, the critical value gets smaller and the transition gets more abrupt.

## 2.7. Exercises#

**Exercise:** In Chapter 2 we analyzed the performance of `reachable_nodes`

and classified it in \(O(n + m)\), where \(n\) is the number of nodes and \(m\) is the number of edges. Continuing the
analysis, what is the order of growth for `is_connected`

?

```
def is_connected(G):
start = list(G)[0]
reachable = reachable_nodes(G, start)
return len(reachable) == len(G)
```

**Exercise:** In my implementation of `reachable_nodes`

, you might be bothered by the apparent inefficiency of adding *all* neighbors to the stack without checking whether they are already in `seen`

. Write a version of this function that checks the neighbors before adding them to the stack. Does this “optimization” change the order of growth? Does it make the function faster?

```
def reachable_nodes_precheck(G, start):
# FILL THIS IN
return []
```

```
%timeit len(reachable_nodes(complete, 0))
```

```
%timeit len(reachable_nodes_precheck(complete, 0))
```

**Exercise:** There are actually two kinds of ER graphs. The one we generated in the chapter, \(G(n, p)\), is characterized by two parameters, the number of nodes and the probability of an edge between nodes.

An alternative definition, denoted \(G(n, m)\), is also characterized by two parameters: the number of nodes, \(n\), and the number of edges, \(m\). Under this definition, the number of edges is fixed, but their location is random.

Repeat the experiments we did in this chapter using this alternative definition. Here are a few suggestions for how to proceed:

Write a function called

`m_pairs`

that takes a list of nodes and the number of edges, \(m\), and returns a random selection of \(m\) edges. A simple way to do that is to generate a list of all possible edges and use`random.sample`

.Write a function called

`make_m_graph`

that takes \(n\) and \(m\) and returns a random graph with \(n\) nodes and \(m\) edges.Make a version of

`prob_connected`

that uses`make_m_graph`

instead of`make_random_graph`

.Compute the probability of connectivity for a range of values of \(m\).

How do the results of this experiment compare to the results using the first type of ER graph?

```
m_graph.number_of_edges()
```