Graph Merge

The Graph Merge operation takes one or more instances of kgx.graph.base_graph.BaseGraph and merges them into a single graph.

Depending on the desired outcome, there are two entry points for merging graphs:

  • kgx.graph_operations.graph_merge.merge_all_graphs: This method takes a list of graphs, identifies the largest graph in the list, and merges all the remaining graphs into it. This reduces the memory footprint, but as a side effect the largest incoming graph is modified during the operation.

  • kgx.graph_operations.graph_merge.merge_graphs: This method takes a list of graphs and merges all of them into a new graph. While this approach ensures that the incoming graphs are not modified, there is an increased memory requirement to accommodate the newly created graph.
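
The trade-off between the two entry points can be sketched in plain Python. This is only an illustration of the two strategies, not KGX's implementation: TinyGraph, absorb, merge_into_largest and merge_into_new are hypothetical names, and property conflicts are ignored here.

```python
from typing import Dict, List, Optional


class TinyGraph:
    """Hypothetical stand-in for a graph: id -> properties for nodes and edges."""

    def __init__(self, nodes: Optional[Dict[str, dict]] = None,
                 edges: Optional[Dict[str, dict]] = None) -> None:
        self.nodes: Dict[str, dict] = dict(nodes or {})
        self.edges: Dict[str, dict] = dict(edges or {})


def absorb(target: TinyGraph, source: TinyGraph) -> None:
    """Copy every node and edge of source into target (modifies target)."""
    for node_id, props in source.nodes.items():
        target.nodes.setdefault(node_id, {}).update(props)
    for edge_key, props in source.edges.items():
        target.edges.setdefault(edge_key, {}).update(props)


def merge_into_largest(graphs: List[TinyGraph]) -> TinyGraph:
    """In-place strategy: reuse the graph with the most edges as the target."""
    largest = max(graphs, key=lambda g: len(g.edges))
    for g in graphs:
        if g is not largest:
            absorb(largest, g)
    return largest


def merge_into_new(graphs: List[TinyGraph]) -> TinyGraph:
    """Copying strategy: leave every input untouched, at the cost of a new graph."""
    merged = TinyGraph()
    for g in graphs:
        absorb(merged, g)
    return merged


small = TinyGraph({"A": {}, "B": {}}, {"A-B": {}})
large = TinyGraph({"B": {}, "C": {}, "D": {}}, {"B-C": {}, "C-D": {}})

fresh = merge_into_new([small, large])       # inputs are unchanged
reused = merge_into_largest([small, large])  # 'large' itself is returned, now modified
```

After the second call, reused is the same object as large, which is the side effect described for merge_all_graphs.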

The following criteria are used for merging graphs:

  • Two nodes are said to be identical if they have the same id

  • If two identical nodes have conflicting node properties,

    • when preserve is True, the values for the properties are concatenated to a list, if and only if the node property is not a core node property

    • when preserve is False, the values for the properties are replaced with the values from the incoming node, if and only if the node property is not a core node property

  • Two edges are said to be identical if they have the same subject, object and edge key, where the edge key is either a pre-defined UUID or an ID autogenerated from the edge’s subject, predicate, and object

  • If two identical edges have conflicting edge properties,

    • when preserve is True, the values for the properties are concatenated to a list, if and only if the edge property is not a core edge property

    • when preserve is False, the values for the properties are replaced with the values from the incoming edge, if and only if the edge property is not a core edge property
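
The property rules above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the library's code: merge_properties and as_list are hypothetical helpers, the CORE_PROPERTIES set is a placeholder for whatever KGX treats as core properties, and skipping duplicate values when building the list is a choice made for this sketch.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical set of core properties, chosen only for this illustration.
CORE_PROPERTIES: FrozenSet[str] = frozenset({"id", "name"})


def as_list(value: Any) -> list:
    """Wrap a scalar in a list; copy an existing list or tuple."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def merge_properties(existing: Dict[str, Any], incoming: Dict[str, Any],
                     preserve: bool = True,
                     core: FrozenSet[str] = CORE_PROPERTIES) -> Dict[str, Any]:
    """Return a new dict combining existing and incoming properties."""
    merged = dict(existing)
    for key, new_value in incoming.items():
        if key not in merged:
            merged[key] = new_value      # no conflict: simply add the property
        elif merged[key] == new_value or key in core:
            continue                     # same value, or core property: keep existing
        elif preserve:
            combined = as_list(merged[key])
            for item in as_list(new_value):
                if item not in combined:  # assumption: avoid duplicate entries
                    combined.append(item)
            merged[key] = combined       # preserve=True: collect both values
        else:
            merged[key] = new_value      # preserve=False: incoming value wins
    return merged


existing = {"id": "HGNC:1", "name": "A", "description": "x"}
incoming = {"id": "HGNC:1", "name": "B", "description": "y"}

kept = merge_properties(existing, incoming, preserve=True)
replaced = merge_properties(existing, incoming, preserve=False)
```

In both calls the name property stays "A" because it is treated as core here; only description differs between the two results.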

kgx.graph_operations.graph_merge

kgx.graph_operations.graph_merge.add_all_edges(g1: kgx.graph.base_graph.BaseGraph, g2: kgx.graph.base_graph.BaseGraph, preserve: bool = True) → int

Add all edges from source graph (g2) to target graph (g1).

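Parameters
  • g1 (kgx.graph.base_graph.BaseGraph) – The target graph

  • g2 (kgx.graph.base_graph.BaseGraph) – The source graph

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns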

Number of edges merged during this operation

Return type

int

kgx.graph_operations.graph_merge.add_all_nodes(g1: kgx.graph.base_graph.BaseGraph, g2: kgx.graph.base_graph.BaseGraph, preserve: bool = True) → int

Add all nodes from source graph (g2) to target graph (g1).

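Parameters
  • g1 (kgx.graph.base_graph.BaseGraph) – The target graph

  • g2 (kgx.graph.base_graph.BaseGraph) – The source graph

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns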

Number of nodes merged during this operation

Return type

int

kgx.graph_operations.graph_merge.merge_all_graphs(graphs: List[kgx.graph.base_graph.BaseGraph], preserve: bool = True) → kgx.graph.base_graph.BaseGraph

Merge one or more graphs.

Note

This method first picks the largest graph in graphs and uses it as the target into which the remaining graphs are merged. This reduces the memory footprint of the operation. The largest graph is the one with the greatest number of edges.

The caveat is that the merge operation has a side effect where the largest graph is altered.

If you would like to ensure that all incoming graphs remain as-is, then look at merge_graphs.

The outcome of the merge on node and edge properties depends on the preserve parameter.

If preserve is True, then

  • core properties will not be overwritten

  • other properties will be concatenated to a list

If preserve is False, then

  • core properties will not be overwritten

  • other properties will be replaced

Parameters
  • graphs (List[kgx.graph.base_graph.BaseGraph]) – A list of instances of BaseGraph to merge

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged graph

Return type

kgx.graph.base_graph.BaseGraph

kgx.graph_operations.graph_merge.merge_edge(g: kgx.graph.base_graph.BaseGraph, u: str, v: str, key: str, data: dict, preserve: bool = True) → dict

Merge edge u -> v into graph g.

Parameters
  • g (kgx.graph.base_graph.BaseGraph) – The target graph

  • u (str) – Subject node id

  • v (str) – Object node id

  • key (str) – Edge key

  • data (dict) – Edge properties

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged edge

Return type

dict
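
Since an edge is identified by its subject, object and edge key, the lookup that precedes an edge merge can be sketched as follows. The helper names make_edge_key and edge_identity are hypothetical, and the hyphen-joined key format is an assumption made for illustration; the format KGX actually generates may differ.

```python
from typing import Dict, Tuple


def make_edge_key(subject: str, predicate: str, obj: str) -> str:
    """Derive a deterministic key from subject, predicate and object.

    The hyphen-joined format is an assumption for this sketch.
    """
    return f"{subject}-{predicate}-{obj}"


def edge_identity(edge: Dict[str, str]) -> Tuple[str, str, str]:
    """Return the (subject, object, key) triple used to compare edges.

    A pre-defined key (for example a UUID) is used when present;
    otherwise a key is generated from the edge itself.
    """
    key = edge.get("key") or make_edge_key(
        edge["subject"], edge["predicate"], edge["object"]
    )
    return edge["subject"], edge["object"], key


e1 = {"subject": "HGNC:1", "predicate": "biolink:related_to", "object": "MONDO:1"}
e2 = dict(e1, provided_by="source B")  # same triple, extra property
e3 = dict(e1, predicate="biolink:causes")

same = edge_identity(e1) == edge_identity(e2)       # identical: candidates for merging
different = edge_identity(e1) == edge_identity(e3)  # different predicate, different key
```

Under this scheme two edges between the same pair of nodes stay distinct as long as their predicates differ, because the predicate is part of the generated key.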

kgx.graph_operations.graph_merge.merge_graphs(graph: kgx.graph.base_graph.BaseGraph, graphs: List[kgx.graph.base_graph.BaseGraph], preserve: bool = True) → kgx.graph.base_graph.BaseGraph

Merge all graphs in graphs to graph.

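Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph to merge into

  • graphs (List[kgx.graph.base_graph.BaseGraph]) – A list of instances of BaseGraph to merge

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns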

The merged graph

Return type

kgx.graph.base_graph.BaseGraph

kgx.graph_operations.graph_merge.merge_node(g: kgx.graph.base_graph.BaseGraph, n: str, data: dict, preserve: bool = True) → dict

Merge node n into graph g.

Parameters
  • g (kgx.graph.base_graph.BaseGraph) – The target graph

  • n (str) – Node id

  • data (dict) – Node properties

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged node

Return type

dict