5.6. Clustering¶
The next step is to compute the clustering coefficient, which quantifies the tendency for the nodes to form cliques. A clique is a set of nodes that are completely connected; that is, there are edges between all pairs of nodes in the set.
Suppose a particular node, u
, has k neighbors. If all of the neighbors are connected to each other, there would be
If we compute the average of
Here is a function that computes it.
def node_clustering(G, u):
neighbors = G[u]
k = len(neighbors)
if k < 2:
return np.nan
possible = k * (k-1) / 2
exist = 0
for v, w in all_pairs(neighbors):
if G.has_edge(v, w):
exist +=1
return exist / possible
Again I use G[u]
, which returns a dictionary with the neighbors of u
as keys.
If a node has fewer than 2 neighbors, the clustering coefficient is undefined, so we return np.nan
, which is a special value that indicates “Not a Number”.
Otherwise we compute the number of possible edges among the neighbors, count the number of those edges that actually exist, and return the fraction that exist.
We can test the function like this:
>>> lattice = make_ring_lattice(10, 4)
>>> node_clustering(lattice, 1)
0.5
In a ring lattice with
Now we can compute the network average clustering coefficient like this:
def clustering_coefficient(G):
cu = [node_clustering(G, node) for node in G]
return np.nanmean(cu)
The NumPy function nanmean
computes the mean of the local clustering coefficients, ignoring any values that are NaN
.
We can test clustering_coefficient
like this:
>>> clustering_coefficient(lattice)
0.5
In this graph, the local clustering coefficient for all nodes is