Operations

This module provides a set of operations that are supported by KGX. Each operation has an entry point: a function that takes a networkx.MultiDiGraph as input and operates on the nodes and/or edges of that graph.

Clique Merge

kgx.operations.clique_merge.build_cliques(target_graph: networkx.classes.multidigraph.MultiDiGraph) → networkx.classes.graph.Graph

Builds a clique graph from same_as edges in target_graph.

Parameters

target_graph (networkx.MultiDiGraph) – A MultiDiGraph that contains nodes and edges

Returns

The clique graph with only same_as edges

Return type

networkx.Graph
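The grouping that build_cliques performs can be sketched without networkx. This dependency-free sketch assumes edges arrive as hypothetical (subject, predicate, object) triples, accepts both same_as and biolink:same_as as the predicate (an assumption about naming), and treats each connected component of the undirected same_as graph as one clique. The function name build_cliques_sketch and the example identifiers are made up for illustration; this is not KGX's implementation.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_cliques_sketch(edges: Iterable[Tuple[str, str, str]]) -> List[Set[str]]:
    """Group node ids into cliques using only same_as edges (hypothetical helper)."""
    # Undirected adjacency built from same_as edges only; all other edges are ignored.
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate in {"same_as", "biolink:same_as"}:
            adjacency[subject].add(obj)
            adjacency[obj].add(subject)

    cliques: List[Set[str]] = []
    seen: Set[str] = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        cliques.append(component)
    return cliques


edges = [
    ("HGNC:1", "biolink:same_as", "NCBIGene:2"),
    ("NCBIGene:2", "biolink:same_as", "ENSEMBL:ENSG3"),
    ("HGNC:1", "biolink:interacts_with", "HGNC:4"),
]
print(build_cliques_sketch(edges))
```

In KGX itself the result is a networkx.Graph; with networkx available, networkx.connected_components(clique_graph) yields node sets comparable to the ones returned here.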

kgx.operations.clique_merge.clique_merge(target_graph: networkx.classes.multidigraph.MultiDiGraph, leader_annotation: Optional[str] = None, prefix_prioritization_map: Optional[Dict[str, List[str]]] = None, category_mapping: Optional[Dict[str, str]] = None) → Tuple[networkx.classes.multidigraph.MultiDiGraph, networkx.classes.graph.Graph]

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

  • prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

Returns

A tuple containing the updated target graph and the clique graph

Return type

Tuple[networkx.MultiDiGraph, networkx.Graph]

kgx.operations.clique_merge.consolidate_edges(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str) → networkx.classes.multidigraph.MultiDiGraph

Move all edges from nodes in a clique to the clique leader.

The original subject and object of each edge are preserved in the edge properties named by ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

Returns

The target graph, where all edges from nodes in a clique are moved to the clique leader

Return type

networkx.MultiDiGraph
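The rewiring that consolidate_edges describes can be illustrated on plain tuples. This sketch assumes a precomputed mapping from each node to its clique leader and uses the hypothetical property names _original_subject and _original_object as stand-ins for ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY, whose actual values may differ. The helper name is made up, and the sketch does not special-case the same_as edges themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for KGX's ORIGINAL_SUBJECT_PROPERTY and
# ORIGINAL_OBJECT_PROPERTY; the real constant values may differ.
ORIGINAL_SUBJECT = "_original_subject"
ORIGINAL_OBJECT = "_original_object"

Edge = Tuple[str, str, Dict[str, object]]


def consolidate_edges_sketch(edges: List[Edge], leader_of: Dict[str, str]) -> List[Edge]:
    """Rewire every edge so both endpoints point at their clique leader.

    Nodes missing from leader_of are not in any clique and are left unchanged.
    """
    consolidated: List[Edge] = []
    for subject, obj, data in edges:
        new_subject = leader_of.get(subject, subject)
        new_object = leader_of.get(obj, obj)
        new_data = dict(data)  # copy so the input edge is not mutated
        # Record the original endpoint only when it was actually replaced.
        if new_subject != subject:
            new_data[ORIGINAL_SUBJECT] = subject
        if new_object != obj:
            new_data[ORIGINAL_OBJECT] = obj
        consolidated.append((new_subject, new_object, new_data))
    return consolidated
```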

kgx.operations.clique_merge.elect_leader(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str, prefix_prioritization_map: Optional[Dict[str, List[str]]], category_mapping: Optional[Dict[str, str]]) → networkx.classes.multidigraph.MultiDiGraph

Elect a leader for each clique in a graph.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

  • prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

Returns

The updated target graph

Return type

networkx.MultiDiGraph

kgx.operations.clique_merge.get_category_from_equivalence(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, node: str, attributes: Dict) → List

Get category for a node based on its equivalent nodes in a graph.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • node (str) – Node identifier

  • attributes (Dict) – Node’s attributes

Returns

Category for the node

Return type

List

kgx.operations.clique_merge.get_leader_by_annotation(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, leader_annotation: str) → Tuple[Optional[str], Optional[str]]

Get leader by searching for leader annotation property in any of the nodes in a given clique.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes from a clique

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]
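The annotation-based election can be sketched with a plain dict of node attributes standing in for the graph. This sketch assumes the leader is the first clique member whose leader_annotation attribute is truthy; the annotation name clique_leader, the strategy label LEADER_ANNOTATION, and the helper name are hypothetical, and KGX may resolve multiple annotated nodes differently.

```python
from typing import Dict, List, Optional, Tuple


def get_leader_by_annotation_sketch(
    node_attributes: Dict[str, Dict[str, object]],
    clique: List[str],
    leader_annotation: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Return the first clique member carrying a truthy leader annotation."""
    for node in clique:
        if node_attributes.get(node, {}).get(leader_annotation):
            # The strategy label is a hypothetical placeholder.
            return node, "LEADER_ANNOTATION"
    # No annotated node: signal that another strategy must be tried.
    return None, None


attrs = {"A:1": {}, "B:2": {"clique_leader": True}, "C:3": {"clique_leader": False}}
print(get_leader_by_annotation_sketch(attrs, ["A:1", "B:2", "C:3"], "clique_leader"))
```

Returning (None, None) mirrors the Optional return type in the signature above and lets a caller fall back to another election strategy.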

kgx.operations.clique_merge.get_leader_by_prefix_priority(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, prefix_priority_list: List) → Tuple[Optional[str], Optional[str]]

Get leader from clique based on a given prefix priority.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes that correspond to a clique

  • prefix_priority_list (List) – A list of prefixes in descending priority

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]
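Prefix-priority election can be sketched as a scan over the priority list. The prefix_priority_list would presumably be one per-category entry of prefix_prioritization_map, for example a hypothetical {"biolink:Gene": ["HGNC", "NCBIGene", "ENSEMBL"]}. This sketch assumes the prefix is the text before the first colon of a CURIE, sorts matches for determinism, and uses a hypothetical strategy label; it is not KGX's implementation.

```python
from typing import List, Optional, Tuple


def get_leader_by_prefix_priority_sketch(
    clique: List[str], prefix_priority_list: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the clique member whose CURIE prefix appears earliest in the priority list."""
    for prefix in prefix_priority_list:
        # Sort the matches so the result does not depend on clique order.
        matches = sorted(n for n in clique if n.split(":", 1)[0] == prefix)
        if matches:
            return matches[0], "PREFIX_PRIORITIZATION"
    # No member has a prioritized prefix.
    return None, None


clique = ["ENSEMBL:ENSG3", "NCBIGene:2", "HGNC:1"]
print(get_leader_by_prefix_priority_sketch(clique, ["HGNC", "NCBIGene", "ENSEMBL"]))
```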

kgx.operations.clique_merge.get_leader_by_sort(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List) → Tuple[Optional[str], Optional[str]]

Get the leader of a clique by sorting its nodes alphabetically by identifier prefix and selecting the first one.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes that correspond to a clique

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]
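The sort-based fallback can be sketched with min over a sort key. This sketch assumes the prefix is the text before the first colon and breaks ties between equal prefixes by the full identifier, a choice the description above does not specify; the helper name and strategy label are hypothetical. Note that Python string comparison is case-sensitive, so uppercase prefixes sort before lowercase ones.

```python
from typing import List, Optional, Tuple


def get_leader_by_sort_sketch(clique: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Pick the node whose CURIE prefix sorts first alphabetically."""
    if not clique:
        return None, None
    # Key on (prefix, full id) so ties between equal prefixes are deterministic.
    leader = min(clique, key=lambda n: (n.split(":", 1)[0], n))
    return leader, "ALPHABETICAL_SORT"


print(get_leader_by_sort_sketch(["NCBIGene:2", "HGNC:1", "ENSEMBL:ENSG3"]))
```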

kgx.operations.clique_merge.get_the_most_specific_category(categories: List) → Tuple[Optional[Any], List[Any]]

From a list of categories, get the ancestors of each one. The category with the longest list of ancestors is considered to be the most specific.

Note

This assumes that all the categories in categories are part of the same closure.

Parameters

categories (List) – A list of categories

Returns

A tuple of the most specific category and a list of ancestors of that category

Return type

Tuple[Optional[Any], List[Any]]
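The "longest ancestor list wins" rule can be sketched with a lookup table. The TOY_ANCESTORS table below is a hypothetical, heavily abbreviated stand-in for the Biolink Model hierarchy that KGX consults, not real model data; unknown categories get an empty ancestor list and are never chosen. Like the note above, the sketch only gives a meaningful answer when all categories share one closure.

```python
from typing import Any, Dict, List, Optional, Tuple

# Toy ancestor table (hypothetical and abbreviated; not the real Biolink Model).
TOY_ANCESTORS: Dict[str, List[str]] = {
    "biolink:NamedThing": ["biolink:NamedThing"],
    "biolink:BiologicalEntity": ["biolink:BiologicalEntity", "biolink:NamedThing"],
    "biolink:Gene": ["biolink:Gene", "biolink:BiologicalEntity", "biolink:NamedThing"],
}


def get_the_most_specific_category_sketch(
    categories: List[str], ancestors: Dict[str, List[str]] = TOY_ANCESTORS
) -> Tuple[Optional[Any], List[Any]]:
    """Return the category with the longest ancestor list, plus those ancestors."""
    most_specific: Optional[str] = None
    most_specific_ancestors: List[str] = []
    for category in categories:
        category_ancestors = ancestors.get(category, [])
        # A strictly longer ancestor chain means a deeper, more specific category.
        if len(category_ancestors) > len(most_specific_ancestors):
            most_specific = category
            most_specific_ancestors = category_ancestors
    return most_specific, most_specific_ancestors


print(get_the_most_specific_category_sketch(["biolink:NamedThing", "biolink:Gene"]))
```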

kgx.operations.clique_merge.update_node_categories(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, category_mapping: Optional[Dict[str, str]]) → List

For a given clique, get category for each node in clique and validate against Biolink Model, mapping to Biolink Model category where needed.

For example, if a node has biolink:Gene as its category, then this method adds all of its ancestors.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes from a clique

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

Returns

The clique

Return type

List

kgx.operations.clique_merge.validate_clique_category(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List) → Tuple[Optional[str], List[Any]]

Validate the category of each node in a clique to make sure that all nodes in the clique are of the same type.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes from a clique

Returns

A tuple of the clique category string and a list of invalid nodes

Return type

Tuple[Optional[str], List[Any]]

Graph Merge

kgx.operations.graph_merge.add_all_edges(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) → int

Add all edges from source graph (g2) to target graph (g1).

Parameters
  • g1 (networkx.MultiDiGraph) – Target graph

  • g2 (networkx.MultiDiGraph) – Source graph

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

Number of edges merged during this operation

Return type

int

kgx.operations.graph_merge.add_all_nodes(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) → int

Add all nodes from source graph (g2) to target graph (g1).

Parameters
  • g1 (networkx.MultiDiGraph) – Target graph

  • g2 (networkx.MultiDiGraph) – Source graph

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

Number of nodes merged during this operation

Return type

int

kgx.operations.graph_merge.merge_all_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) → networkx.classes.multidigraph.MultiDiGraph

Merge one or more graphs.

Note

This method will first pick the largest graph in graphs and use that as the target to merge the remaining graphs. This is to reduce the memory footprint for this operation. The criterion for the largest graph is the number of edges: the graph with the most edges is chosen.

The caveat is that the merge operation has a side effect: the largest graph is altered in place.

If you would like to ensure that all incoming graphs remain as-is, then look at merge_graphs.

The outcome of the merge on node and edge properties depends on the preserve parameter. If preserve is True, then:
  • core properties will not be overwritten

  • other properties will be concatenated to a list

If preserve is False, then:
  • core properties will not be overwritten

  • other properties will be replaced

Parameters
  • graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged graph

Return type

networkx.MultiDiGraph
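The preserve rules described for merge_all_graphs can be sketched on a single pair of property dicts. The CORE_PROPERTIES set below is a hypothetical stand-in for KGX's own list of core properties, and dropping duplicate values when concatenating is a choice made in this sketch, not something stated above. The helper name and sample values are made up.

```python
from typing import Any, Dict, Set

# Hypothetical set of core properties; KGX defines its own list.
CORE_PROPERTIES: Set[str] = {"id", "subject", "predicate", "object"}


def merge_properties_sketch(
    existing: Dict[str, Any], incoming: Dict[str, Any], preserve: bool = True
) -> Dict[str, Any]:
    """Merge incoming into a copy of existing, following the preserve rules."""
    merged = dict(existing)
    for key, value in incoming.items():
        if key not in merged:
            merged[key] = value  # new properties are always added
        elif key in CORE_PROPERTIES:
            continue  # core properties are never overwritten
        elif preserve:
            # Concatenate old and new values into one list, skipping repeats.
            old = merged[key] if isinstance(merged[key], list) else [merged[key]]
            new = value if isinstance(value, list) else [value]
            merged[key] = old + [v for v in new if v not in old]
        else:
            merged[key] = value  # non-core properties are replaced
    return merged


a = {"id": "HGNC:1", "name": "example gene", "provided_by": "source_a"}
b = {"id": "HGNC:999", "name": "Example Gene", "provided_by": ["source_b"]}
print(merge_properties_sketch(a, b, preserve=True))
print(merge_properties_sketch(a, b, preserve=False))
```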

kgx.operations.graph_merge.merge_edge(g: networkx.classes.multidigraph.MultiDiGraph, u: str, v: str, key: str, data: dict, preserve: bool = True) → dict

Merge edge u -> v into graph g.

Parameters
  • g (networkx.MultiDiGraph) – The target graph

  • u (str) – Subject node id

  • v (str) – Object node id

  • key (str) – Edge key

  • data (dict) – Edge properties

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged edge

Return type

dict

kgx.operations.graph_merge.merge_graphs(graph: networkx.classes.multidigraph.MultiDiGraph, graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) → networkx.classes.multidigraph.MultiDiGraph

Merge all graphs in graphs into graph.

Parameters
  • graph (networkx.MultiDiGraph) – The target graph into which graphs are merged

  • graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged graph

Return type

networkx.MultiDiGraph

kgx.operations.graph_merge.merge_node(g: networkx.classes.multidigraph.MultiDiGraph, n: str, data: dict, preserve: bool = True) → dict

Merge node n into graph g.

Parameters
  • g (networkx.MultiDiGraph) – The target graph

  • n (str) – Node id

  • data (dict) – Node properties

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged node

Return type

dict

Summarize Graph

kgx.operations.summarize_graph.generate_graph_stats(graph: networkx.classes.multidigraph.MultiDiGraph, graph_name: str, filename: str, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → None

Generate stats from a graph and write them to filename.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • graph_name (str) – Name for the graph

  • filename (str) – Filename to write the stats to

  • node_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']

  • edge_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']

kgx.operations.summarize_graph.get_facet_counts(data: Dict, stats: Dict, x: str, y: str, facet_property: str) → Dict

Facet on facet_property and record the count for stats[x][y][facet_property].

Parameters
  • data (dict) – Node/edge data dictionary

  • stats (dict) – The stats dictionary

  • x (str) – First key

  • y (str) – Second key

  • facet_property (str) – The property to facet on

Returns

The stats dictionary

Return type

Dict
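One plausible reading of "record the count for stats[x][y][facet_property]" is a per-value counter nested under the two keys. This sketch assumes that layout, treats list-valued properties as one count per element, and files missing values under a hypothetical "unknown" bucket; the key node_categories and the helper name are illustrative only, not KGX's actual stats schema.

```python
from typing import Any, Dict


def get_facet_counts_sketch(
    data: Dict[str, Any], stats: Dict[str, Any], x: str, y: str, facet_property: str
) -> Dict[str, Any]:
    """Increment stats[x][y][facet_property][value] for each value of facet_property."""
    # "unknown" is a hypothetical bucket for records lacking the property.
    values = data.get(facet_property, "unknown")
    if not isinstance(values, list):
        values = [values]
    # Create the nested dictionaries on first use.
    facet = stats.setdefault(x, {}).setdefault(y, {}).setdefault(facet_property, {})
    for value in values:
        facet[value] = facet.get(value, 0) + 1
    return stats


stats: Dict[str, Any] = {}
get_facet_counts_sketch({"provided_by": ["src_a", "src_b"]}, stats, "node_categories", "biolink:Gene", "provided_by")
get_facet_counts_sketch({"provided_by": "src_a"}, stats, "node_categories", "biolink:Gene", "provided_by")
print(stats)
```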

kgx.operations.summarize_graph.summarize_edges(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None)

Summarize the edges in a graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • facet_properties (Optional[List]) – The properties to facet on

Returns

The edge stats

Return type

Dict

kgx.operations.summarize_graph.summarize_graph(graph: networkx.classes.multidigraph.MultiDiGraph, name: Optional[str] = None, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → Dict

Summarize the entire graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • name (str) – Name for the graph

  • node_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']

  • edge_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']

Returns

The stats dictionary

Return type

Dict

kgx.operations.summarize_graph.summarize_nodes(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None) → Dict

Summarize the nodes in a graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • facet_properties (Optional[List]) – A list of properties to facet on

Returns

The node stats

Return type

Dict
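A node summary of this kind can be sketched over (id, attributes) pairs. This sketch assumes each node's category is stored under a category key (list or string), takes the CURIE prefix as the text before the first colon, and skips nodes that lack a faceted property; the returned key names and the helper name are hypothetical, not the stats schema KGX produces.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional, Tuple


def summarize_nodes_sketch(
    nodes: Iterable[Tuple[str, Dict[str, Any]]],
    facet_properties: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Count nodes, CURIE prefixes and categories, optionally faceting on extra properties."""
    prefixes: Counter = Counter()
    categories: Counter = Counter()
    facets: Dict[str, Counter] = {p: Counter() for p in (facet_properties or [])}
    total = 0
    for node_id, data in nodes:
        total += 1
        prefixes[node_id.split(":", 1)[0]] += 1
        node_categories = data.get("category", [])
        if isinstance(node_categories, str):
            node_categories = [node_categories]
        for category in node_categories:
            categories[category] += 1
        for prop, counter in facets.items():
            value = data.get(prop)
            if value is None:
                continue  # nodes without the property are skipped in this sketch
            for v in value if isinstance(value, list) else [value]:
                counter[v] += 1
    # The key names below are hypothetical, not KGX's actual stats schema.
    return {
        "total_nodes": total,
        "node_id_prefixes": dict(prefixes),
        "node_categories": dict(categories),
        "facets": {p: dict(c) for p, c in facets.items()},
    }
```

With a networkx.MultiDiGraph, graph.nodes(data=True) yields (id, attributes) pairs in the shape this sketch expects.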