3. Small World Graphs#
Code examples from Think Complexity, 2nd edition.
Copyright 2016 Allen Downey, MIT License
from os.path import basename, exists
def download(url):
filename = basename(url)
if not exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
download('https://github.com/AllenDowney/ThinkComplexity2/raw/master/notebooks/utils.py')
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns
from utils import decorate, savefig
# Set the random seed so the notebook
# produces the same results every time.
np.random.seed(17)
# make a directory for figures
!mkdir -p figs
# node colors for drawing networks
colors = sns.color_palette('pastel', 5)
#sns.palplot(colors)
sns.set_palette(colors)
3.1. Regular ring lattice#
To make a ring lattice, I’ll start with a generator function that yields edges between each node and the next halfk
neighbors.
def adjacent_edges(nodes, halfk):
"""Yields edges between each node and `halfk` neighbors.
halfk: number of edges from each node
"""
n = len(nodes)
for i, u in enumerate(nodes):
for j in range(i+1, i+halfk+1):
v = nodes[j % n]
yield u, v
We can test it with 3 nodes and halfk=1
nodes = range(3)
for edge in adjacent_edges(nodes, 1):
print(edge)
Now we use adjacent_edges
to write make_ring_lattice
def make_ring_lattice(n, k):
"""Makes a ring lattice with `n` nodes and degree `k`.
Note: this only works correctly if k is even.
n: number of nodes
k: degree of each node
"""
G = nx.Graph()
nodes = range(n)
G.add_nodes_from(nodes)
G.add_edges_from(adjacent_edges(nodes, k//2))
return G
And we can test it out with n=10
and k=4
lattice = make_ring_lattice(10, 4)
nx.draw_circular(lattice,
node_color='C0',
node_size=1000,
with_labels=True)
savefig('figs/chap03-1')
Exercise: To see how this function fails when k
is odd, run it again with k=3
or k=5
.
One of the exercises below asks you to explore regular graphs with odd values of k
.
3.2. WS graph#
To make a WS graph, you start with a ring lattice and then rewire.
def make_ws_graph(n, k, p):
"""Makes a Watts-Strogatz graph.
n: number of nodes
k: degree of each node
p: probability of rewiring an edge
"""
ws = make_ring_lattice(n, k)
rewire(ws, p)
return ws
To do the rewiring, we’ll need flip
.
def flip(p):
"""Returns True with probability `p`."""
return np.random.random() < p
Here’s the function that does the rewiring
def rewire(G, p):
"""Rewires each edge with probability `p`.
G: Graph
p: float
"""
# Fill this in
Here’s an example with p=0.2
ws = make_ws_graph(10, 4, 0.2)
nx.draw_circular(ws,
node_color='C1',
node_size=1000,
with_labels=True)
Just checking that we have the same number of edges we started with:
len(lattice.edges()), len(ws.edges())
Now I’ll generate a plot that shows WS graphs for a few values of p
n = 10
k = 4
ns = 100
plt.subplot(1,3,1)
ws = make_ws_graph(n, k, 0)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')
plt.subplot(1,3,2)
ws = make_ws_graph(n, k, 0.2)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')
plt.subplot(1,3,3)
ws = make_ws_graph(n, k, 1.0)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')
savefig('figs/chap03-2')
Exercise: What is the order of growth of rewire
?
3.3. Clustering#
The following function computes the local clustering coefficient for a given node, u
:
def node_clustering(G, u):
"""Computes local clustering coefficient for `u`.
G: Graph
u: node
returns: float
"""
neighbors = G[u]
k = len(neighbors)
if k < 2:
return np.nan
possible = k * (k-1) / 2
exist = 0
for v, w in all_pairs(neighbors):
if G.has_edge(v, w):
exist +=1
return exist / possible
def all_pairs(nodes):
"""Generates all pairs of nodes."""
for i, u in enumerate(nodes):
for j, v in enumerate(nodes):
if i < j:
yield u, v
The network average clustering coefficient is just the mean of the local CCs.
def clustering_coefficient(G):
"""Average of the local clustering coefficients.
G: Graph
returns: float
"""
# FILL THIS IN
return 0
In a ring lattice with k=4
, the clustering coefficient for each node should be 0.5
lattice = make_ring_lattice(10, 4)
node_clustering(lattice, 1)
And the network average should be 0.5
clustering_coefficient(lattice)
Correct.
%timeit clustering_coefficient(lattice)
Exercise: Write a version of node_clustering
that replaces the for
loop with a list comprehension. Is it faster?
%timeit clustering_coefficient(lattice)
Exercise: What is the order of growth of clustering_coefficient
in terms of n
, m
, and k
?
3.4. Path length#
The following function computes path lengths between all pairs of nodes
def path_lengths(G):
# FILL THIS IN
yield 0
The characteristic path length is the mean path length for all pairs.
def characteristic_path_length(G):
return np.mean(list(path_lengths(G)))
On a complete graph, the average path length should be 1
complete = nx.complete_graph(10)
characteristic_path_length(complete)
On a ring lattice with n=1000
and k=10
, the mean is about 50
lattice = make_ring_lattice(1000, 10)
characteristic_path_length(lattice)
Exercise: What is the mean path length in a ring lattice with n=10
and k=4
?
3.5. The experiment#
This function generates a WS graph with the given parameters and returns a pair of (mean path length, clustering coefficient):
def run_one_graph(n, k, p):
"""Makes a WS graph and computes its stats.
n: number of nodes
k: degree of each node
p: probability of rewiring
returns: tuple of (mean path length, clustering coefficient)
"""
ws = make_ws_graph(n, k, p)
mpl = characteristic_path_length(ws)
cc = clustering_coefficient(ws)
print(mpl, cc)
return mpl, cc
With n=1000
and k=10
, it takes a few seconds on my computer:
%time run_one_graph(1000, 10, 0.01)
Now we’ll run it with a range of values for p
.
ps = np.logspace(-4, 0, 9)
print(ps)
This function runs each value of p
several times and returns a dictionary that maps from each p
to a list of (mpl, cc) pairs.
def run_experiment(ps, n=1000, k=10, iters=10):
"""Computes stats for WS graphs with a range of `p`.
ps: sequence of `p` to try
n: number of nodes
k: degree of each node
iters: number of times to run for each `p`
returns:
"""
res = []
for p in ps:
print(p)
t = [run_one_graph(n, k, p) for _ in range(iters)]
means = np.array(t).mean(axis=0)
print(means)
res.append(means)
return np.array(res)
Here are the raw results. Warning: this takes a few minutes to run.
%time res = run_experiment(ps)
res
Let’s get the results into a form that’s easy to plot.
L, C = np.transpose(res)
L
C
And normalize them so they both start at 1.0
L /= L[0]
C /= C[0]
Here’s the plot that replicates Watts and Strogatz’s Figure 2.
plt.plot(ps, C, 's-', linewidth=1, label='C(p) / C(0)')
plt.plot(ps, L, 'o-', linewidth=1, label='L(p) / L(0)')
decorate(xlabel='Rewiring probability (p)', xscale='log',
title='Normalized clustering coefficient and path length',
xlim=[0.00009, 1.1], ylim=[-0.01, 1.01])
savefig('figs/chap03-3')
3.6. Breadth-first search#
Now let’s see how the shortest path algorithm works. We’ll start with BFS, which is the basis for Dijkstra’s algorithm.
Here’s our old friend, the ring lattice:
lattice = make_ring_lattice(10, 4)
nx.draw_circular(lattice,
node_color='C2',
node_size=1000,
with_labels=True)
And here’s my implementation of BFS using a deque.
from collections import deque
def reachable_nodes_bfs(G, start):
"""Finds reachable nodes by BFS.
G: graph
start: node to start at
returns: set of reachable nodes
"""
seen = set()
queue = deque([start])
while queue:
node = queue.popleft()
if node not in seen:
seen.add(node)
queue.extend(G.neighbors(node))
return seen
It works:
reachable_nodes_bfs(lattice, 0)
Here’s a version that’s a little faster, but maybe less readable.
def reachable_nodes_bfs(G, start):
"""Finds reachable nodes by BFS.
G: graph
start: node to start at
returns: set of reachable nodes
"""
seen = set()
queue = deque([start])
while queue:
node = queue.popleft()
if node not in seen:
seen.add(node)
neighbors = set(G[node]) - seen
queue.extend(neighbors)
return seen
It works, too.
reachable_nodes_bfs(lattice, 0)
3.7. Dijkstra’s algorithm#
Now we’re ready for Dijkstra’s algorithm, at least for graphs where all the edges have the same weight/length.
def shortest_path_dijkstra(G, source):
"""Finds shortest paths from `source` to all other nodes.
G: graph
source: node to start at
returns: make from node to path length
"""
dist = {source: 0}
queue = deque([source])
while queue:
node = queue.popleft()
new_dist = dist[node] + 1
neighbors = set(G[node]).difference(dist)
for n in neighbors:
dist[n] = new_dist
queue.extend(neighbors)
return dist
Again, we’ll test it on a ring lattice.
lattice = make_ring_lattice(10, 4)
nx.draw_circular(lattice,
node_color='C3',
node_size=1000,
with_labels=True)
Here’s my implementation:
d1 = shortest_path_dijkstra(lattice, 0)
d1
And here’s the result from NetworkX:
d2 = nx.shortest_path_length(lattice, 0)
d2
They are the same:
d1 == d2
Exercise: In a ring lattice with n=1000
and k=10
, which node is farthest from 0 and how far is it? Use shortest_path_dijkstra
to check your answer.
Note: the maximum distance between two nodes is the diameter of the graph.
3.8. Exercises#
Exercise: In a ring lattice, every node has the same number of neighbors. The number of neighbors is called the degree of the node, and a graph where all nodes have the same degree is called a regular graph.
All ring lattices are regular, but not all regular graphs are ring lattices. In particular, if k
is odd, we can’t construct a ring lattice, but we might be able to construct a regular graph.
Write a function called make_regular_graph
that takes n
and k
and returns a regular graph that contains n
nodes, where every node has k
neighbors. If it’s not possible to make a regular graph with the given values of n
and k
, the function should raise a ValueError
.
# Here's `adjacent_edges` again for comparison:
def adjacent_edges(nodes, halfk):
n = len(nodes)
for i, u in enumerate(nodes):
for j in range(i+1, i+halfk+1):
v = nodes[j % n]
yield u, v
Exercise: My implementation of reachable_nodes_bfs
is efficient in the sense that it is in \(O(n + m)\), but it incurs a lot of overhead adding nodes to the queue and removing them. NetworkX provides a simple, fast implementation of BFS, available from the NetworkX repository on GitHub.
Here is a version I modified to return a set of nodes:
def plain_bfs(G, start):
"""A fast BFS node generator"""
seen = set()
nextlevel = {start}
while nextlevel:
thislevel = nextlevel
nextlevel = set()
for v in thislevel:
if v not in seen:
seen.add(v)
nextlevel.update(G[v])
return seen
Compare this function to reachable_nodes_bfs
and see which is faster. Then see if you can modify this function to implement a faster version of shortest_path_dijkstra
Exercise: The following implementation of a BFS contains two performance errors. What are they? What is the actual order of growth for this algorithm?
def bfs(G, start):
"""Breadth-first search on a graph, starting at top_node."""
visited = set()
queue = [start]
while len(queue):
curr_node = queue.pop(0) # Dequeue
visited.add(curr_node)
# Enqueue non-visited and non-enqueued children
queue.extend(c for c in G[curr_node]
if c not in visited and c not in queue)
return visited
Exercise: In the book, I claimed that Dijkstra’s algorithm does not work unless it uses BFS. Write a version of shortest_path_dijkstra
that uses DFS and test it on a few examples to see what goes wrong.