2. Erdos-Renyi Graphs#
Code examples from Think Complexity, 2nd edition.
Copyright 2016 Allen Downey, MIT License
from os.path import basename, exists
def download(url):
filename = basename(url)
if not exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
download('https://github.com/AllenDowney/ThinkComplexity2/raw/master/notebooks/utils.py')
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns
from utils import decorate, savefig
# Set the random seed so the notebook
# produces the same results every time.
np.random.seed(17)
# make a directory for figures
!mkdir -p figs
# node colors for drawing networks
colors = sns.color_palette('pastel', 5)
#sns.palplot(colors)
sns.set_palette(colors)
2.1. Directed graph#
The first example is a directed graph that represents a social network with three nodes.
G = nx.DiGraph()
G.add_node('Alice')
G.add_node('Bob')
G.add_node('Cate')
list(G.nodes())
Here’s how we add edges between nodes.
G.add_edge('Alice', 'Bob')
G.add_edge('Alice', 'Cate')
G.add_edge('Bob', 'Alice')
G.add_edge('Bob', 'Cate')
list(G.edges())
And here’s how to draw the graph.
nx.draw_circular(G,
node_color='C0',
node_size=2000,
with_labels=True)
plt.axis('equal')
savefig('figs/chap02-1')
Exercise: Add another node and a few more edges and draw the graph again.
2.2. Undirected graph#
The second example is an undirected graph that represents cities and the driving times between them.
positions
is a dictionary that maps from each city to its coordinates.
positions = dict(Albany=(-74, 43),
Boston=(-71, 42),
NYC=(-74, 41),
Philly=(-75, 40))
positions['Albany']
We can use the keys in pos
to add nodes to the graph.
G = nx.Graph()
G.add_nodes_from(positions)
G.nodes()
drive_times
is a dictionary that maps from pairs of cities to the driving times between them.
drive_times = {('Albany', 'Boston'): 3,
('Albany', 'NYC'): 4,
('Boston', 'NYC'): 4,
('NYC', 'Philly'): 2}
We can use the keys from drive_times
to add edges to the graph.
G.add_edges_from(drive_times)
G.edges()
Now we can draw the graph using positions
to indicate the positions of the nodes, and drive_times
to label the edges.
nx.draw(G, positions,
node_color='C1',
node_shape='s',
node_size=2500,
with_labels=True)
nx.draw_networkx_edge_labels(G, positions,
edge_labels=drive_times)
plt.axis('equal')
savefig('figs/chap02-2')
Exercise: Add another city and at least one edge.
2.3. Complete graph#
To make a complete graph, we use a generator function that iterates through all pairs of nodes.
def all_pairs(nodes):
for i, u in enumerate(nodes):
for j, v in enumerate(nodes):
if i < j:
yield u, v
make_complete_graph
makes a Graph
with the given number of nodes and edges between all pairs of nodes.
def make_complete_graph(n):
G = nx.Graph()
nodes = range(n)
G.add_nodes_from(nodes)
G.add_edges_from(all_pairs(nodes))
return G
Here’s a complete graph with 10 nodes:
complete = make_complete_graph(10)
complete.number_of_nodes()
And here’s what it looks like.
nx.draw_circular(complete,
node_color='C2',
node_size=1000,
with_labels=True)
savefig('figs/chap02-3')
The neighbors
method the neighbors for a given node.
list(complete.neighbors(0))
Exercise: Make and draw complete directed graph with 5 nodes.
2.4. Random graphs#
Next we’ll make a random graph where the probability of an edge between each pair of nodes is \(p\).
The helper function flip
returns True with probability p
and False with probability 1-p
def flip(p):
return np.random.random() < p
random_pairs
is a generator function that enumerates all possible pairs of nodes and yields each one with probability p
def random_pairs(nodes, p):
for edge in all_pairs(nodes):
if flip(p):
yield edge
make_random_graph
makes an ER graph where the probability of an edge between each pair of nodes is p
.
def make_random_graph(n, p):
G = nx.Graph()
nodes = range(n)
G.add_nodes_from(nodes)
G.add_edges_from(random_pairs(nodes, p))
return G
Here’s an example with n=10
and p=0.3
np.random.seed(10)
random_graph = make_random_graph(10, 0.3)
len(random_graph.edges())
And here’s what it looks like:
nx.draw_circular(random_graph,
node_color='C3',
node_size=1000,
with_labels=True)
savefig('figs/chap02-4')
2.5. Connectivity#
To check whether a graph is connected, we’ll start by finding all nodes that can be reached, starting with a given node:
def reachable_nodes(G, start):
seen = set()
stack = [start]
while stack:
node = stack.pop()
if node not in seen:
seen.add(node)
stack.extend(G.neighbors(node))
return seen
In the complete graph, starting from node 0, we can reach all nodes:
reachable_nodes(complete, 0)
In the random graph we generated, we can also reach all nodes (but that’s not always true):
reachable_nodes(random_graph, 0)
We can use reachable_nodes
to check whether a graph is connected:
def is_connected(G):
start = next(iter(G))
reachable = reachable_nodes(G, start)
return len(reachable) == len(G)
Again, the complete graph is connected:
is_connected(complete)
But if we generate a random graph with a low value of p
, it’s not:
random_graph = make_random_graph(10, 0.1)
len(random_graph.edges())
is_connected(random_graph)
Exercise: What do you think it means for a directed graph to be connected? Write a function that checks whether a directed graph is connected.
2.6. Probability of connectivity#
Now let’s estimare the probability that a randomly-generated ER graph is connected.
This function takes n
and p
, generates iters
graphs, and returns the fraction of them that are connected.
# version with a for loop
def prob_connected(n, p, iters=100):
count = 0
for i in range(iters):
random_graph = make_random_graph(n, p)
if is_connected(random_graph):
count += 1
return count/iters
# version with a list comprehension
def prob_connected(n, p, iters=100):
tf = [is_connected(make_random_graph(n, p))
for i in range(iters)]
return np.mean(tf)
With n=10
and p=0.23
, the probability of being connected is about 33%.
np.random.seed(17)
n = 10
prob_connected(n, 0.23, iters=10000)
According to Erdos and Renyi, the critical value of p
for n=10
is about 0.23.
pstar = np.log(n) / n
pstar
So let’s plot the probability of connectivity for a range of values for p
ps = np.logspace(-1.3, 0, 11)
ps
I’ll estimate the probabilities with iters=1000
ys = [prob_connected(n, p, 1000) for p in ps]
for p, y in zip(ps, ys):
print(p, y)
And then plot them, adding a vertical line at the computed critical value
plt.axvline(pstar, color='gray')
plt.plot(ps, ys, color='green')
decorate(xlabel='Prob of edge (p)',
ylabel='Prob connected',
xscale='log')
savefig('figs/chap02-5')
We can run the same analysis for a few more values of n
.
ns = [300, 100, 30]
ps = np.logspace(-2.5, 0, 11)
sns.set_palette('Blues_r', 4)
for n in ns:
print(n)
pstar = np.log(n) / n
plt.axvline(pstar, color='gray', alpha=0.3)
ys = [prob_connected(n, p) for p in ps]
plt.plot(ps, ys, label='n=%d' % n)
decorate(xlabel='Prob of edge (p)',
ylabel='Prob connected',
xscale='log',
xlim=[ps[0], ps[-1]],
loc='upper left')
savefig('figs/chap02-6')
As n
increases, the critical value gets smaller and the transition gets more abrupt.
2.7. Exercises#
Exercise: In Chapter 2 we analyzed the performance of reachable_nodes
and classified it in \(O(n + m)\), where \(n\) is the number of nodes and \(m\) is the number of edges. Continuing the
analysis, what is the order of growth for is_connected
?
def is_connected(G):
start = list(G)[0]
reachable = reachable_nodes(G, start)
return len(reachable) == len(G)
Exercise: In my implementation of reachable_nodes
, you might be bothered by the apparent inefficiency of adding all neighbors to the stack without checking whether they are already in seen
. Write a version of this function that checks the neighbors before adding them to the stack. Does this “optimization” change the order of growth? Does it make the function faster?
def reachable_nodes_precheck(G, start):
# FILL THIS IN
return []
%timeit len(reachable_nodes(complete, 0))
%timeit len(reachable_nodes_precheck(complete, 0))
Exercise: There are actually two kinds of ER graphs. The one we generated in the chapter, \(G(n, p)\), is characterized by two parameters, the number of nodes and the probability of an edge between nodes.
An alternative definition, denoted \(G(n, m)\), is also characterized by two parameters: the number of nodes, \(n\), and the number of edges, \(m\). Under this definition, the number of edges is fixed, but their location is random.
Repeat the experiments we did in this chapter using this alternative definition. Here are a few suggestions for how to proceed:
Write a function called
m_pairs
that takes a list of nodes and the number of edges, \(m\), and returns a random selection of \(m\) edges. A simple way to do that is to generate a list of all possible edges and userandom.sample
.Write a function called
make_m_graph
that takes \(n\) and \(m\) and returns a random graph with \(n\) nodes and \(m\) edges.Make a version of
prob_connected
that usesmake_m_graph
instead ofmake_random_graph
.Compute the probability of connectivity for a range of values of \(m\).
How do the results of this experiment compare to the results using the first type of ER graph?
m_graph.number_of_edges()