Community Detection Algorithms

Important - page not maintained

This page is no longer being maintained and its content may be out of date. For the latest guidance, please visit the Neo4j Graph Data Science Library .

Goals
This guide covers community detection algorithms in the Neo4j Data Science Library, like Louvain, Label Propagation, Weakly Connected Components, and more.
Prerequisites
Please have Neo4j (version 4.0 or later) and the Graph Data Science Library downloaded and installed to use centrality algorithms.

Beginner

What are community detection algorithms?

Community Algo Icon

Community detection algorithms are used to evaluate how groups of nodes are clustered or partitioned, as well as their tendency to strengthen or break apart.

The Neo4j Graph Data Science Library supports many different centrality algorithms.

Louvain

The Louvain method for community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means that the algorithm evaluates how much more densely connected the nodes within a community are, compared to how connected they would be in a random network. On each iteration the Louvain algorithm recursively merges communities into a single node and executes the modularity clustering on the condensed graphs. It has the following use cases:

Modularity Optimization

Similar to the Louvain algorithm, the Modularity Optimization algorithm tries to detect communities in the graph based on their modularity. Modularity is a measure of the structure of a graph, measuring the density of connections within a module or community. The algorithm explores for every node if its modularity score might increase if it changes its community to one of its neighboring nodes.

Label Propagation

The Label Propagation algorithm (LPA) detects communities in a graph using network structure alone as its guide, and doesn’t require a pre-defined objective function or prior information about the communities. LPA propagates labels throughout the network and forms communities based on this process of label propagation. It has the following use cases:

Weakly Connected Components

The Weakly Connected Components algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set. It is often used early in an analysis to understand a graph’s structure, and also has the following use cases:

  • Testing whether a graph is connected is an essential pre-processing step for every graph algorithm. Such tests can be performed so quickly, and easily, that you should always verify that your input graph is connected, even when you know it has to be. Subtle, difficult-to-detect, bugs often result when your algorithm is run only on one component of a disconnected graph.

  • Keeping track of clusters of database records, as part of the de-duplication process - an important task in master data management applications. Read more in "An efficient domain-independent algorithm for detecting approximately duplicate database records".

  • Analysing citation networks. One study uses WCC to work out how well connected the network is, and then to see whether the connectivity remains if 'hub' or 'authority' nodes are moved from the graph. Read more in "Characterizing and Mining Citation Graph of Computer Science Literature".

Strongly Connected Components

The Strongly Connected Components (SCC) algorithm finds sets of connected nodes in a directed graph where each node is reachable in both directions from any other node in the same set. It has the following use cases:

  • In the analysis of powerful transnational corporations, finding the set of firms in which every member owns directly and/or indirectly owns shares in every other member. Although it has benefits, such as reducing transaction costs and increasing trust, this type of structure can weaken market competition. Read more in "The Network of Global Corporate Control".

  • Computing the connectivity of different network configurations when measuring routing performance in multihop wireless networks. Read more in "Routing performance in the presence of unidirectional links in multihop wireless networks".

  • As the first step in many graph algorithms that work only on strongly connected graph. In social networks, a group of people are generally strongly connected (For example, students of a class or any other common place). Many people in these groups generally like some common pages, or play common games. The SCC algorithms can be used to find such groups, and suggest the commonly liked pages or games to the people in the group who have not yet liked those pages or games.

Triangle Counting

Triangle counting is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.

The triangle count of a node is useful as a features for classifying a given website as spam, or non-spam, content. This is described in "Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs".

Local Clustering Coefficient

The Local Clustering Coefficient algorithm computes the local clustering coefficient for each node in the graph. The local clustering coefficient of a node describes the likelihood that the neighbors of that node are also connected. It has the following use cases:

K-1 Coloring

The K-1 Coloring algorithm assigns colors to each node in the graph, while trying to use as few colors as possible and make sure that neighbors have different colors.

K-1 Coloring is one of the graph coloring algorithms, which are often used to solve scheduling and assignment problems. See more in Graph colouring problems and their applications in scheduling by Daniel Marx.