3. Small World Graphs#

Code examples from Think Complexity, 2nd edition.

Copyright 2016 Allen Downey, MIT License

from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://github.com/AllenDowney/ThinkComplexity2/raw/master/notebooks/utils.py')
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns

from utils import decorate, savefig

# Set the random seed so the notebook 
# produces the same results every time.
np.random.seed(17)
# make a directory for figures
!mkdir -p figs
# node colors for drawing networks
colors = sns.color_palette('pastel', 5)
#sns.palplot(colors)
sns.set_palette(colors)

3.1. Regular ring lattice#

To make a ring lattice, I’ll start with a generator function that yields edges between each node and the next halfk neighbors.

def adjacent_edges(nodes, halfk):
    """Yields edges between each node and `halfk` neighbors.
    
    halfk: number of edges from each node
    """
    n = len(nodes)
    for i, u in enumerate(nodes):
        for j in range(i+1, i+halfk+1):
            v = nodes[j % n]
            yield u, v

We can test it with 3 nodes and halfk=1

nodes = range(3)
for edge in adjacent_edges(nodes, 1):
    print(edge)

Now we use adjacent_edges to write make_ring_lattice

def make_ring_lattice(n, k):
    """Makes a ring lattice with `n` nodes and degree `k`.
    
    Note: this only works correctly if k is even.
    
    n: number of nodes
    k: degree of each node
    """
    G = nx.Graph()
    nodes = range(n)
    G.add_nodes_from(nodes)
    G.add_edges_from(adjacent_edges(nodes, k//2))
    return G

And we can test it out with n=10 and k=4

lattice = make_ring_lattice(10, 4)
nx.draw_circular(lattice, 
                 node_color='C0', 
                 node_size=1000, 
                 with_labels=True)

savefig('figs/chap03-1')

Exercise: To see how this function fails when k is odd, run it again with k=3 or k=5.
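
For example, here is the same call with k=3 (the variable name lattice3 is mine); check the degree of each node in the drawing:

lattice3 = make_ring_lattice(10, 3)
nx.draw_circular(lattice3, 
                 node_color='C0', 
                 node_size=1000, 
                 with_labels=True)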

One of the exercises below asks you to explore regular graphs with odd values of k.

3.2. WS graph#

To make a WS graph, you start with a ring lattice and then rewire.

def make_ws_graph(n, k, p):
    """Makes a Watts-Strogatz graph.
    
    n: number of nodes
    k: degree of each node
    p: probability of rewiring an edge
    """
    ws = make_ring_lattice(n, k)
    rewire(ws, p)
    return ws

To do the rewiring, we’ll need flip.

def flip(p):
    """Returns True with probability `p`."""
    return np.random.random() < p

Here’s the skeleton of the function that does the rewiring; the body is left for you to fill in.

def rewire(G, p):
    """Rewires each edge with probability `p`.
    
    G: Graph
    p: float
    """
    # Fill this in
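
If you want a working version to continue with, here is a minimal sketch of the standard Watts-Strogatz rewiring step (my own sketch, not necessarily the book's implementation): for each edge, with probability p, keep the first endpoint and reconnect it to a node chosen at random from the nodes it is not already connected to.

def rewire(G, p):
    """Rewires each edge with probability `p` (one possible implementation).
    
    G: Graph
    p: float
    """
    nodes = set(G)
    for u, v in list(G.edges()):
        if flip(p):
            # candidate endpoints: not u itself and not already a neighbor of u
            choices = nodes - {u} - set(G[u])
            if not choices:
                continue
            new_v = np.random.choice(list(choices))
            G.remove_edge(u, v)
            G.add_edge(u, new_v)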

Here’s an example with p=0.2

ws = make_ws_graph(10, 4, 0.2)
nx.draw_circular(ws, 
                 node_color='C1', 
                 node_size=1000, 
                 with_labels=True)

Just checking that we have the same number of edges we started with:

len(lattice.edges()), len(ws.edges())

Now I’ll generate a plot that shows WS graphs for a few values of p

n = 10
k = 4
ns = 100

plt.subplot(1,3,1)
ws = make_ws_graph(n, k, 0)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')

plt.subplot(1,3,2)
ws = make_ws_graph(n, k, 0.2)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')

plt.subplot(1,3,3)
ws = make_ws_graph(n, k, 1.0)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')

savefig('figs/chap03-2')

Exercise: What is the order of growth of rewire?

3.3. Clustering#

The following function computes the local clustering coefficient for a given node, u:

def node_clustering(G, u):
    """Computes local clustering coefficient for `u`.
    
    G: Graph
    u: node
    
    returns: float
    """
    neighbors = G[u]
    k = len(neighbors)
    if k < 2:
        return np.nan
        
    possible = k * (k-1) / 2
    exist = 0    
    for v, w in all_pairs(neighbors):
        if G.has_edge(v, w):
            exist += 1
    return exist / possible

def all_pairs(nodes):
    """Generates all pairs of nodes."""
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            if i < j:
                yield u, v

The network average clustering coefficient is just the mean of the local CCs.

def clustering_coefficient(G):
    """Average of the local clustering coefficients.
    
    G: Graph
    
    returns: float
    """
    # FILL THIS IN
    return 0
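
If you want a working version to continue with, here is one possible implementation (a sketch): compute the local coefficient for every node and take the mean, using np.nanmean so that nodes with fewer than two neighbors (which get NaN) are ignored.

def clustering_coefficient(G):
    """Average of the local clustering coefficients (one possible implementation).
    
    G: Graph
    
    returns: float
    """
    # nanmean ignores the NaN returned for nodes with degree < 2
    cu = [node_clustering(G, node) for node in G]
    return np.nanmean(cu)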

In a ring lattice with k=4, each node's four neighbors are joined by 3 of the 6 possible edges, so the clustering coefficient for each node should be 3/6 = 0.5.

lattice = make_ring_lattice(10, 4)
node_clustering(lattice, 1)

And the network average should be 0.5

clustering_coefficient(lattice)

Correct.

%timeit clustering_coefficient(lattice)

Exercise: Write a version of node_clustering that replaces the for loop with a list comprehension. Is it faster?

%timeit clustering_coefficient(lattice)

Exercise: What is the order of growth of clustering_coefficient in terms of n, m, and k?

3.4. Path length#

The following function should compute the shortest path length between all pairs of nodes; the body is left for you to fill in.

def path_lengths(G):
    # FILL THIS IN
    yield 0
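
If you want a working version to continue with, here is one possible sketch that lets NetworkX do the work. In recent NetworkX versions, nx.shortest_path_length with no source returns an iterator of (source, distances) pairs; yielding every pair in both directions does not change the mean.

def path_lengths(G):
    """Yields the shortest path length for each pair of nodes (a sketch)."""
    for source, dist_map in nx.shortest_path_length(G):
        for dest, dist in dist_map.items():
            # skip the zero-length path from a node to itself
            if source != dest:
                yield dist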

The characteristic path length is the mean path length for all pairs.

def characteristic_path_length(G):
    return np.mean(list(path_lengths(G)))

On a complete graph, every pair of distinct nodes is connected directly, so the average path length should be 1.

complete = nx.complete_graph(10)
characteristic_path_length(complete)

On a ring lattice with n=1000 and k=10, the mean is about 50: the average distance around the ring is about n/4, and each hop covers k/2 positions, so the mean path length is roughly n/(2k) = 50.

lattice = make_ring_lattice(1000, 10)
characteristic_path_length(lattice)

Exercise: What is the mean path length in a ring lattice with n=10 and k=4?

3.5. The experiment#

This function generates a WS graph with the given parameters and returns a pair of (mean path length, clustering coefficient):

def run_one_graph(n, k, p):
    """Makes a WS graph and computes its stats.
    
    n: number of nodes
    k: degree of each node
    p: probability of rewiring
    
    returns: tuple of (mean path length, clustering coefficient)
    """
    ws = make_ws_graph(n, k, p)    
    mpl = characteristic_path_length(ws)
    cc = clustering_coefficient(ws)
    print(mpl, cc)
    return mpl, cc

With n=1000 and k=10, it takes a few seconds on my computer:

%time run_one_graph(1000, 10, 0.01)

Now we’ll run it with a range of values for p.

ps = np.logspace(-4, 0, 9)
print(ps)

This function runs run_one_graph several times for each value of p and returns an array with one row per value of p, containing the mean path length and mean clustering coefficient averaged over the runs.

def run_experiment(ps, n=1000, k=10, iters=10):
    """Computes stats for WS graphs with a range of `p`.
    
    ps: sequence of `p` to try
    n: number of nodes
    k: degree of each node
    iters: number of times to run for each `p`
    
    returns: array of (mpl, cc) means, one row per value of `p`
    """
    res = []
    for p in ps:
        print(p)
        t = [run_one_graph(n, k, p) for _ in range(iters)]
        means = np.array(t).mean(axis=0)
        print(means)
        res.append(means)
    return np.array(res)

Here are the raw results. Warning: this takes a few minutes to run.

%time res = run_experiment(ps)
res

Let’s get the results into a form that’s easy to plot.

L, C = np.transpose(res)
L
C

And normalize them so they both start at 1.0

L /= L[0]
C /= C[0]

Here’s the plot that replicates Watts and Strogatz’s Figure 2.

plt.plot(ps, C, 's-', linewidth=1, label='C(p) / C(0)')
plt.plot(ps, L, 'o-', linewidth=1, label='L(p) / L(0)')
decorate(xlabel='Rewiring probability (p)', xscale='log',
         title='Normalized clustering coefficient and path length',
         xlim=[0.00009, 1.1], ylim=[-0.01, 1.01])

savefig('figs/chap03-3')

3.7. Dijkstra’s algorithm#

Now we’re ready for Dijkstra’s algorithm, at least for graphs where all the edges have the same weight/length.

from collections import deque

def shortest_path_dijkstra(G, source):
    """Finds shortest paths from `source` to all other nodes.
    
    G: graph
    source: node to start at
    
    returns: map from node to path length
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        new_dist = dist[node] + 1

        neighbors = set(G[node]).difference(dist)
        for n in neighbors:
            dist[n] = new_dist
        
        queue.extend(neighbors)
    return dist

Again, we’ll test it on a ring lattice.

lattice = make_ring_lattice(10, 4)
nx.draw_circular(lattice, 
                 node_color='C3', 
                 node_size=1000, 
                 with_labels=True)

Here’s my implementation:

d1 = shortest_path_dijkstra(lattice, 0)
d1

And here’s the result from NetworkX:

d2 = nx.shortest_path_length(lattice, 0)
d2

They are the same:

d1 == d2

Exercise: In a ring lattice with n=1000 and k=10, which node is farthest from 0 and how far is it? Use shortest_path_dijkstra to check your answer.

Note: the maximum distance between two nodes is the diameter of the graph.
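
For example, on the small 10-node lattice above, you can find the farthest node from 0 and its distance using the d1 computed earlier (just a pattern to adapt for the exercise):

max(d1.items(), key=lambda item: item[1])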

3.8. Exercises#

Exercise: In a ring lattice, every node has the same number of neighbors. The number of neighbors is called the degree of the node, and a graph where all nodes have the same degree is called a regular graph.

All ring lattices are regular, but not all regular graphs are ring lattices. In particular, if k is odd, we can’t construct a ring lattice, but we might be able to construct a regular graph.

Write a function called make_regular_graph that takes n and k and returns a regular graph that contains n nodes, where every node has k neighbors. If it’s not possible to make a regular graph with the given values of n and k, the function should raise a ValueError.

# Here's `adjacent_edges` again for comparison:

def adjacent_edges(nodes, halfk):
    n = len(nodes)
    for i, u in enumerate(nodes):
        for j in range(i+1, i+halfk+1):
            v = nodes[j % n]
            yield u, v
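
If you get stuck, here is one possible approach (a sketch, not necessarily the intended solution): for even k, a ring lattice already works; for odd k, n must be even, so build a ring lattice of degree k-1 and then connect each node to the node directly opposite it on the ring.

def make_regular_graph(n, k):
    """Makes a regular graph with `n` nodes of degree `k` (one possible approach).
    
    Raises ValueError if this construction cannot produce such a graph.
    """
    if k >= n or (n * k) % 2 != 0:
        raise ValueError('cannot make a regular graph with n=%d, k=%d' % (n, k))
    G = nx.Graph()
    nodes = range(n)
    G.add_nodes_from(nodes)
    # gives degree k if k is even, degree k-1 if k is odd
    G.add_edges_from(adjacent_edges(nodes, k//2))
    if k % 2 == 1:
        # for odd k (and even n), add one edge from each node to the
        # node halfway around the ring
        G.add_edges_from((i, i + n//2) for i in range(n//2))
    return G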

Exercise: My implementation of reachable_nodes_bfs is efficient in the sense that it is in \(O(n + m)\), but it incurs a lot of overhead adding nodes to the queue and removing them. NetworkX provides a simple, fast implementation of BFS, available from the NetworkX repository on GitHub.

Here is a version I modified to return a set of nodes:

def plain_bfs(G, start):
    """A fast BFS node generator"""
    seen = set()
    nextlevel = {start}
    while nextlevel:
        thislevel = nextlevel
        nextlevel = set()
        for v in thislevel:
            if v not in seen:
                seen.add(v)
                nextlevel.update(G[v])
    return seen

Compare this function to reachable_nodes_bfs and see which is faster. Then see if you can modify this function to implement a faster version of shortest_path_dijkstra

Exercise: The following implementation of a BFS contains two performance errors. What are they? What is the actual order of growth for this algorithm?

def bfs(G, start):
    """Breadth-first search on a graph, starting at top_node."""
    visited = set()
    queue = [start]
    while len(queue):
        curr_node = queue.pop(0)    # Dequeue
        visited.add(curr_node)

        # Enqueue non-visited and non-enqueued children
        queue.extend(c for c in G[curr_node]
                     if c not in visited and c not in queue)
    return visited

Exercise: In the book, I claimed that Dijkstra’s algorithm does not work unless it uses BFS. Write a version of shortest_path_dijkstra that uses DFS and test it on a few examples to see what goes wrong.