GraphRAG

cluster_graph

Apply a hierarchical clustering algorithm to a graph. The input graph is expected to be in GraphML format. The verb writes two new columns: one containing the clustered graph and one containing the hierarchy level of that clustered graph.
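To make the expected input concrete, the sketch below reads the nodes and edges of a tiny GraphML document using only Python's standard library. The sample graph, its node names, and the `parse_graphml` helper are illustrative assumptions and are not part of GraphRAG.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical example of what one cell of the graph column could hold.
SAMPLE_GRAPHML = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="A"/>
    <node id="B"/>
    <node id="C"/>
    <edge source="A" target="B"/>
    <edge source="B" target="C"/>
  </graph>
</graphml>
"""


def parse_graphml(text):
    """Return (node_ids, edges) read from a GraphML string."""
    root = ET.fromstring(text.strip())
    nodes = [n.get("id") for n in root.findall("g:graph/g:node", GRAPHML_NS)]
    edges = [
        (e.get("source"), e.get("target"))
        for e in root.findall("g:graph/g:edge", GRAPHML_NS)
    ]
    return nodes, edges


nodes, edges = parse_graphml(SAMPLE_GRAPHML)
print(nodes)
print(edges)
```

A string of this shape is what the `column` argument is expected to point at, one graph per row.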

Usage

verb: cluster_graph
args:
    column: entity_graph # The name of the column containing the graph; the graph should be in GraphML format
    to: clustered_graph # The name of the column to output the clustered graph to
    level_to: level # The name of the column to output the level to
    strategy: <strategy config> # See strategies section below

Strategies

The cluster_graph verb uses a strategy to cluster the graph. The strategy is a JSON object that names the strategy type and sets its options. The following strategies are available:

leiden

This strategy uses the Leiden algorithm to cluster a graph. The strategy config is as follows:

strategy:
    type: leiden
    max_cluster_size: 10 # Optional, the maximum cluster size to use, default: 10
    use_lcc: true # Optional, whether the largest connected component should be used with the Leiden algorithm, default: true
    seed: 0xDEADBEEF # Optional, the seed to use for the Leiden algorithm, default: 0xDEADBEEF
    levels: [0, 1] # Optional, the levels to output, default: all the levels detected
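One way to read these options is sketched below, assuming the defaults documented above. `resolve_strategy`, `largest_connected_component`, and `select_levels` are hypothetical helpers written for illustration. They are not GraphRAG's implementation, and they do not run the Leiden algorithm itself.

```python
from collections import deque

# Defaults as documented above; None for "levels" means "keep every level".
LEIDEN_DEFAULTS = {
    "type": "leiden",
    "max_cluster_size": 10,
    "use_lcc": True,
    "seed": 0xDEADBEEF,
    "levels": None,
}


def resolve_strategy(config):
    """Fill in the documented default for any option the caller omitted."""
    unknown = set(config) - set(LEIDEN_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown strategy options: {sorted(unknown)}")
    return {**LEIDEN_DEFAULTS, **config}


def largest_connected_component(nodes, edges):
    """Return the node set of the largest connected component, found by BFS."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def select_levels(clusters_by_level, levels):
    """Keep only the requested levels; None keeps all detected levels."""
    if levels is None:
        return dict(clusters_by_level)
    return {lvl: c for lvl, c in clusters_by_level.items() if lvl in levels}


strategy = resolve_strategy({"type": "leiden", "levels": [0, 1]})
lcc = largest_connected_component(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
print(strategy["max_cluster_size"], sorted(lcc))
```

Under this reading, `use_lcc: true` would mean that nodes outside the largest connected component (here `D` and `E`) are left out before clustering, and `levels: [0, 1]` would drop any deeper levels from the output. The hexadecimal seed `0xDEADBEEF` is the integer 3735928559.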

Code

cluster_graph.py