Erdos-Renyi Graphs

2. Erdos-Renyi Graphs#

Code examples from Think Complexity, 2nd edition.

from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://github.com/AllenDowney/ThinkComplexity2/raw/master/notebooks/utils.py')

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns

from utils import decorate, savefig

# Set the random seed so the notebook 
# produces the same results every time.
np.random.seed(17)

# make a directory for figures
!mkdir -p figs

# node colors for drawing networks
colors = sns.color_palette('pastel', 5)
#sns.palplot(colors)
sns.set_palette(colors)

2.1. Directed graph#

The first example is a directed graph that represents a social network with three nodes.

G = nx.DiGraph()
G.add_node('Alice')
G.add_node('Bob')
G.add_node('Cate')
list(G.nodes())

Here’s how we add edges between nodes.

G.add_edge('Alice', 'Bob')
G.add_edge('Alice', 'Cate')
G.add_edge('Bob', 'Alice')
G.add_edge('Bob', 'Cate')
list(G.edges())

And here’s how to draw the graph.

nx.draw_circular(G,
                 node_color='C0',
                 node_size=2000, 
                 with_labels=True)
plt.axis('equal')
savefig('figs/chap02-1')

Exercise: Add another node and a few more edges and draw the graph again.

2.2. Undirected graph#

The second example is an undirected graph that represents cities and the driving times between them.

positions is a dictionary that maps from each city to its coordinates.

positions = dict(Albany=(-74, 43),
                 Boston=(-71, 42),
                 NYC=(-74, 41),
                 Philly=(-75, 40))

positions['Albany']

We can use the keys in pos to add nodes to the graph.

G = nx.Graph()
G.add_nodes_from(positions)
G.nodes()

drive_times is a dictionary that maps from pairs of cities to the driving times between them.

drive_times = {('Albany', 'Boston'): 3,
               ('Albany', 'NYC'): 4,
               ('Boston', 'NYC'): 4,
               ('NYC', 'Philly'): 2}

We can use the keys from drive_times to add edges to the graph.

G.add_edges_from(drive_times)
G.edges()

Now we can draw the graph using positions to indicate the positions of the nodes, and drive_times to label the edges.

nx.draw(G, positions, 
        node_color='C1', 
        node_shape='s', 
        node_size=2500, 
        with_labels=True)

nx.draw_networkx_edge_labels(G, positions, 
                             edge_labels=drive_times)

plt.axis('equal')
savefig('figs/chap02-2')

Exercise: Add another city and at least one edge.

2.3. Complete graph#

To make a complete graph, we use a generator function that iterates through all pairs of nodes.

def all_pairs(nodes):
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            if i < j:
                yield u, v

make_complete_graph makes a Graph with the given number of nodes and edges between all pairs of nodes.

def make_complete_graph(n):
    G = nx.Graph()
    nodes = range(n)
    G.add_nodes_from(nodes)
    G.add_edges_from(all_pairs(nodes))
    return G

Here’s a complete graph with 10 nodes:

complete = make_complete_graph(10)
complete.number_of_nodes()

And here’s what it looks like.

nx.draw_circular(complete, 
                 node_color='C2', 
                 node_size=1000, 
                 with_labels=True)
savefig('figs/chap02-3')

The neighbors method the neighbors for a given node.

list(complete.neighbors(0))

Exercise: Make and draw complete directed graph with 5 nodes.

2.4. Random graphs#

Next we’ll make a random graph where the probability of an edge between each pair of nodes is $p$ .

The helper function flip returns True with probability p and False with probability 1-p

def flip(p):
    return np.random.random() < p

random_pairs is a generator function that enumerates all possible pairs of nodes and yields each one with probability p

def random_pairs(nodes, p):
    for edge in all_pairs(nodes):
        if flip(p):
            yield edge

make_random_graph makes an ER graph where the probability of an edge between each pair of nodes is p.

def make_random_graph(n, p):
    G = nx.Graph()
    nodes = range(n)
    G.add_nodes_from(nodes)
    G.add_edges_from(random_pairs(nodes, p))
    return G

Here’s an example with n=10 and p=0.3

np.random.seed(10)

random_graph = make_random_graph(10, 0.3)
len(random_graph.edges())

And here’s what it looks like:

nx.draw_circular(random_graph, 
                 node_color='C3', 
                 node_size=1000, 
                 with_labels=True)
savefig('figs/chap02-4')

2.5. Connectivity#

To check whether a graph is connected, we’ll start by finding all nodes that can be reached, starting with a given node:

def reachable_nodes(G, start):
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(G.neighbors(node))
    return seen

In the complete graph, starting from node 0, we can reach all nodes:

reachable_nodes(complete, 0)

In the random graph we generated, we can also reach all nodes (but that’s not always true):

reachable_nodes(random_graph, 0)

We can use reachable_nodes to check whether a graph is connected:

def is_connected(G):
    start = next(iter(G))
    reachable = reachable_nodes(G, start)
    return len(reachable) == len(G)

Again, the complete graph is connected:

is_connected(complete)

But if we generate a random graph with a low value of p, it’s not:

random_graph = make_random_graph(10, 0.1)
len(random_graph.edges())

is_connected(random_graph)

Exercise: What do you think it means for a directed graph to be connected? Write a function that checks whether a directed graph is connected.

2.6. Probability of connectivity#

Now let’s estimare the probability that a randomly-generated ER graph is connected.

This function takes n and p, generates iters graphs, and returns the fraction of them that are connected.

# version with a for loop

def prob_connected(n, p, iters=100):
    count = 0
    for i in range(iters):
        random_graph = make_random_graph(n, p)
        if is_connected(random_graph):
            count += 1
    return count/iters

# version with a list comprehension

def prob_connected(n, p, iters=100):
    tf = [is_connected(make_random_graph(n, p))
          for i in range(iters)]
    return np.mean(tf)

With n=10 and p=0.23, the probability of being connected is about 33%.

np.random.seed(17)

n = 10
prob_connected(n, 0.23, iters=10000)

According to Erdos and Renyi, the critical value of p for n=10 is about 0.23.

pstar = np.log(n) / n
pstar

So let’s plot the probability of connectivity for a range of values for p

ps = np.logspace(-1.3, 0, 11)
ps

I’ll estimate the probabilities with iters=1000

ys = [prob_connected(n, p, 1000) for p in ps]

for p, y in zip(ps, ys):
    print(p, y)

And then plot them, adding a vertical line at the computed critical value

plt.axvline(pstar, color='gray')
plt.plot(ps, ys, color='green')
decorate(xlabel='Prob of edge (p)',
                 ylabel='Prob connected',
                 xscale='log')

savefig('figs/chap02-5')

We can run the same analysis for a few more values of n.

ns = [300, 100, 30]
ps = np.logspace(-2.5, 0, 11)

sns.set_palette('Blues_r', 4)
for n in ns:
    print(n)
    pstar = np.log(n) / n
    plt.axvline(pstar, color='gray', alpha=0.3)

    ys = [prob_connected(n, p) for p in ps]
    plt.plot(ps, ys, label='n=%d' % n)

decorate(xlabel='Prob of edge (p)',
         ylabel='Prob connected',
         xscale='log', 
         xlim=[ps[0], ps[-1]],
         loc='upper left')

savefig('figs/chap02-6')

As n increases, the critical value gets smaller and the transition gets more abrupt.

2.7. Exercises#

Exercise: In Chapter 2 we analyzed the performance of reachable_nodes and classified it in $O (n + m)$ , where $n$ is the number of nodes and $m$ is the number of edges. Continuing the analysis, what is the order of growth for is_connected?

def is_connected(G):
    start = list(G)[0]
    reachable = reachable_nodes(G, start)
    return len(reachable) == len(G)

Exercise: In my implementation of reachable_nodes, you might be bothered by the apparent inefficiency of adding all neighbors to the stack without checking whether they are already in seen. Write a version of this function that checks the neighbors before adding them to the stack. Does this “optimization” change the order of growth? Does it make the function faster?

def reachable_nodes_precheck(G, start):
    # FILL THIS IN
    return []

%timeit len(reachable_nodes(complete, 0))

%timeit len(reachable_nodes_precheck(complete, 0))

Exercise: There are actually two kinds of ER graphs. The one we generated in the chapter, $G (n, p)$ , is characterized by two parameters, the number of nodes and the probability of an edge between nodes.

An alternative definition, denoted $G (n, m)$ , is also characterized by two parameters: the number of nodes, $n$ , and the number of edges, $m$ . Under this definition, the number of edges is fixed, but their location is random.

Repeat the experiments we did in this chapter using this alternative definition. Here are a few suggestions for how to proceed:

Write a function called m_pairs that takes a list of nodes and the number of edges, $m$ , and returns a random selection of $m$ edges. A simple way to do that is to generate a list of all possible edges and use random.sample.
Write a function called make_m_graph that takes $n$ and $m$ and returns a random graph with $n$ nodes and $m$ edges.
Make a version of prob_connected that uses make_m_graph instead of make_random_graph.
Compute the probability of connectivity for a range of values of $m$ .

How do the results of this experiment compare to the results using the first type of ER graph?

m_graph.number_of_edges()