Graph Merge

The Graph Merge operation takes one or more instances of kgx.graph.base_graph.BaseGraph and merges them into a single graph.

Depending on the desired outcome, there are two entry points for merging graphs:

  • kgx.graph_operations.graph_merge.merge_all_graphs: This method takes a list of graphs, identifies the largest graph in the list, and merges all the remaining graphs into it. Reusing the largest graph as the merge target reduces the memory footprint. The side effect is that the incoming graphs are modified during this operation.

  • kgx.graph_operations.graph_merge.merge_graphs: This method takes a list of graphs and merges all of them into a new graph. While this approach ensures that the incoming graphs are not modified, there is an increased memory requirement to accommodate the newly created graph.
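
The trade-off between the two entry points can be sketched without KGX at all. The example below is only an illustration under stated assumptions: a graph is modeled as a plain dict mapping node IDs to property dicts, and the names merge_into_largest and merge_into_new are hypothetical, not part of the KGX API. It does not reproduce the real signatures of merge_all_graphs or merge_graphs, and it ignores edges and property conflicts.

```python
from typing import Dict, List

# Hypothetical stand-in for a graph: node id -> node properties.
Graph = Dict[str, dict]


def merge_into_largest(graphs: List[Graph]) -> Graph:
    """Merge every graph into the largest one, mutating that graph in place."""
    target = max(graphs, key=len)  # "largest" here means most nodes
    for graph in graphs:
        if graph is target:
            continue
        for node_id, props in graph.items():
            target.setdefault(node_id, {}).update(props)
    return target


def merge_into_new(graphs: List[Graph]) -> Graph:
    """Merge every graph into a fresh graph, leaving the inputs untouched."""
    target: Graph = {}
    for graph in graphs:
        for node_id, props in graph.items():
            # A new property dict is created per node, so inputs are not shared.
            target.setdefault(node_id, {}).update(props)
    return target


big = {"A": {"name": "a"}, "B": {"name": "b"}}
small = {"C": {"name": "c"}}

in_place = merge_into_largest([big, small])
# in_place is the same object as big, which now also holds node "C".

fresh = merge_into_new([{"A": {"name": "a"}}, {"C": {"name": "c"}}])
# fresh is a separate object; the two input dicts are unchanged.
```

The first function allocates nothing beyond the nodes it copies into the existing largest graph, while the second holds a full extra copy of the merged result in memory.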

The following criteria are used when merging graphs:

  • Two nodes are said to be identical if they have the same id

  • If two identical nodes have conflicting node properties,

    • when preserve is True, the values for the properties are concatenated to a list, if and only if the node property is not a core node property

    • when preserve is False, the values for the properties are replaced with the values from the incoming node, if and only if the node property is not a core node property

  • Two edges are said to be identical if they have the same subject, object, and edge key, where the edge key is either a pre-defined UUID or an ID autogenerated from the edge’s subject, predicate, and object

  • If two identical edges have conflicting edge properties,

    • when preserve is True, the values for the properties are concatenated to a list, if and only if the edge property is not a core edge property

    • when preserve is False, the values for the properties are replaced with the values from the incoming edge, if and only if the edge property is not a core edge property
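
The conflict rules above can be sketched as a small, self-contained function. This is a minimal illustration, not KGX's implementation: merge_properties and make_edge_key are hypothetical names, the default set of core properties is a placeholder, and leaving a conflicting core property at its existing value is an assumption (the criteria above only say that concatenation and replacement do not apply to core properties). The format of the autogenerated edge key is likewise assumed.

```python
from typing import Any, Dict, FrozenSet, Optional

# Placeholder set of core properties; the real list is defined by KGX.
EXAMPLE_CORE_PROPERTIES: FrozenSet[str] = frozenset({"id", "category"})


def merge_properties(
    existing: Dict[str, Any],
    incoming: Dict[str, Any],
    preserve: bool = True,
    core_properties: FrozenSet[str] = EXAMPLE_CORE_PROPERTIES,
) -> Dict[str, Any]:
    """Return a new dict combining two property dicts of identical records."""
    merged = dict(existing)
    for key, new_value in incoming.items():
        if key not in merged:
            merged[key] = new_value  # no conflict: simply add the property
        elif merged[key] == new_value:
            continue  # same value on both sides: nothing to resolve
        elif key in core_properties:
            continue  # assumption: core properties keep the existing value
        elif preserve:
            # Concatenate both values into one list, skipping duplicates.
            old = merged[key] if isinstance(merged[key], list) else [merged[key]]
            new = new_value if isinstance(new_value, list) else [new_value]
            merged[key] = old + [v for v in new if v not in old]
        else:
            merged[key] = new_value  # replace with the incoming value
    return merged


def make_edge_key(subject: str, predicate: str, obj: str, key: Optional[str] = None) -> str:
    """Use a pre-defined key if given, else derive one from the triple."""
    # The "subject-predicate-object" format is an assumption for illustration.
    return key if key is not None else f"{subject}-{predicate}-{obj}"


node_a = {"id": "X:1", "name": "foo"}
node_b = {"id": "X:1", "name": "bar", "synonym": "baz"}

kept = merge_properties(node_a, node_b, preserve=True)
replaced = merge_properties(node_a, node_b, preserve=False)
```

With these inputs, kept["name"] holds both values in a list, while replaced["name"] holds only the incoming value; the same function applies to edge properties once two edges have been matched on subject, object, and edge key.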

kgx.graph_operations.graph_merge