Built-in Algorithms

The graph analytical engine (GAE) of GraphScope offers many built-in algorithms, which enable users to analyze their graph data with least effort. The built-in algorithms covers a wide range of applications, such as the shortest path, community detection, clustering, etc. The built-in algorithms are implemented with the PIE programming model and highly optimized for the best performance, and users can use them in the out-of-box manner.

Here is the full list of supported built-in algorithms:

All Pairs Shortest Path Length

This algorithm is used to find all pair shortest path problem from a given weighted graph. As a result of this algorithm, it will compute the minimum distance from any vertex to all other vertices in the graph.

all_pairs_shortest_path_length()

Compute the shortest path lengths between all vertices in the graph.

Attribute Assortativity

Assortativity in a graph refers to the tendency of vertices to connect with other similar vertices over dissimilar vertices. Attribute assortativity is a measure of the extent to which vertices with the same properties connect to each other.

attribute_assortativity()

Compute assortativity for vertex attributes.

Average Degree Connectivity

The average degree connectivity is the average nearest neighbor degree of vertices with degree k. This algorithm returns a list of k-average degree connectivity for a graph for successive k=0,1,2….

average_degree_connectivity(source_degree_type, target_degree_type)

Compute the average degree connectivity of the graph.

Parameters:
  • source_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for source vertex

  • target_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for target vertex

Betweenness Centrality

Betweenness centrality is a measure of centrality in a graph based on shortest paths. Betweenness centrality of a vertex v is the sum of the fraction of all-pairs shortest paths that pass through v.

betweenness_centrality(normalized, endpoints)

Compute the shortest-path betweenness centrality for vertices.

Parameters:
  • normalized (bool) – whether the result should be normalized

  • endpoints (bool) – whether including the endpoints in the shortest path counts

CDLP

Evaluate Community Detection with Label Propagation, Also known as LPA. See LPA

cdlp(max_round=10)

Evaluate Community Detection with Label Propagation

Parameters:

max_round (int) – Maximum rounds, default to 10

Closeness Centrality

The original closeness centrality of a vertex v is the reciprocal of the average shortest path distance to v over all n-1 reachable nodes. Wasserman and Faust proposed an improved formula for graphs with more than one connected component. The result is “a ratio of the fraction of actors in the group who are reachable, to the average distance” from the reachable actors.

closeness_centrality(wf)

Compute closeness centrality for vertices.

Parameters:

wf (bool) – whether the Wasserman and Faust improved formula is used

Clustering

The clustering algorithm is to compute the clustering coefficient for each vertex of a graph. The clustering coefficient of a vertex in a graph quantifies how close its neighbors are to being a clique (complete graph).

clustering()

Compute the clustering coefficient for each vertex in the graph.

Degree Assortativity Coefficient

Similar to attribute assortativity, degree assortativity coefficient measures tendency of having an edge (u,v) such that, degree(u) equals to degree(v).

degree_assortativity_coefficient(source_degree_type, target_degree_type, weighted)

Compute degree assortativity coefficient of the graph.

Parameters:
  • source_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for source vertex

  • target_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for target vertex

  • weight (bool) – The edge attribute that holds the numerical value used as a weight. If False, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the vertex.

Degree Centrality

The degree centrality for a vertex v is the fraction of nodes it is connected to. If the graph is directed, then three separate measures of degree centrality are defined, namely, in-degree, out-degree and both in- and out-degree.

degree_centrality(centrality_type)

Compute the degree centrality for vertices.

Parameters:

centrality_type (str) – in, out or both degree are applied

Eigenvector Centrality

Eigenvector centrality is a measure of the influence of a vertex in a graph. Relationships originating from high-scoring vertices contribute more to the score of a vertex than connections from low-scoring vertices. A high eigenvector score means that a vertex is connected to many vertices who themselves have high scores.

eigenvector_centrality(tolerance, max_round)

Compute the eigenvector centrality for the graph.

Parameters:
  • tolerance (double) – error tolerance used to check convergence

  • max_round (int) – maximum number of iterations

Katz Centrality

Katz centrality computes the relative influence of a vertex within a graph by measuring the number of the immediate neighbors (first degree vertices) and also all other vertices in the graph that connect to the vertex under consideration through these immediate neighbors.

katz_centrality(alpha, beta, tolerance, max_round, normalized)

Compute the Katz centrality for the vertices of the graph.

Parameters:
  • alpha (double) – attenuation factor

  • beta (double) – weight attributed to the immediate neighborhood

  • tolerance (double) – error tolerance used to check convergence

  • max_round (int) – maximum number of iterations

  • normalized (bool) – whether we need to normalize the resulting values

K-Core

This algorithm is to find a k-core from an original graph, and a k-core is a maximal subgraph that contains vertices of degree k or more. The k-core is found by recursively pruning vertices with degrees less than k.

kkore(k)

Compute the k-core of the graph.

Parameters:

k (int) – The order of the core

K-Shell

The k-shell is the subgraph induced by vertices with core number k. That is, vertices in the k-core that are not in the (k+1)-core.

kshell(k)

Compute the k-shell of the graph.

Parameters:

k (int) – The order of the shell

Label Propagation Algorithm

The Label Propagation Algorithm (LPA) is a fast algorithm for finding communities in a graph. It is an iterative algorithm where we assign labels to unlabelled vertices by propagating labels through the graph.

lpa(max_round)

Compute the label of each vertex in the graph.

Parameters:

max_round (int) – the number of iterations during the computation

LCC

The LCC is aims to compute the local clustering coefficient, follow the specification of LDBC

lcc()

Compute the local clustering coefficient for each vertex in the graph.

PageRank

PageRank is a way of measuring the importance of each vertices in a graph. The PageRank algorithm exists in many variants, the PageRank in GAE follows the PageRank implementation of NetworkX.

pagerank(alpha, max_round, tolerance)

Compute the PageRank value for each vertex in the graph.

Parameters:
  • alpha (double) – damping parameter for PageRank

  • max_round (int) – maximum number of iterations

  • tolerance (double) – error tolerance used to check convergence

Sampling Path

This algorithm samples a set of paths starting from a root vertex with path length.

sampling_path(source_id, cutoff)

Compute a set of paths starting from a root vertex.

Parameters:
  • source_id (int) – the source vertex of path sampling

  • cutoff (int) – maximum length of paths

Single-Source Shortest Paths

The Single-Source Shortest Paths (SSSP) fixes a single vertex as the “source” vertex and finds shortest paths from the source to all other vertices in the graph.

sssp(source)

Compute shortest paths from a source vertex in the graph.

Parameters:

source (int) – The source vertex for the shortest paths

VoteRank

VoteRank is to measure a ranking of the vertices in a graph based on a voting scheme. With VoteRank, all vertices vote for each of its in-neighbors and the vertices with the top highest votes is elected iteratively.

voterank(num_of_nodes)

Select a list of influential vertices in a graph using VoteRank algorithm.

Parameters:

num_of_nodes (int) – the number of ranked vertices to extract

WCC

Compute the weakly connected component

wcc()

Compute the weakly connected component in the graph.