Operations

This module provides the graph operations supported by KGX. Each operation has an entry point: a function that takes a networkx.MultiDiGraph as input and operates on the nodes and/or edges of that graph.

Clique Merge

kgx.operations.clique_merge.build_cliques(target_graph: networkx.classes.multidigraph.MultiDiGraph) -> networkx.classes.graph.Graph

Builds a clique graph from same_as edges in target_graph.

Parameters

target_graph (networkx.MultiDiGraph) – A MultiDiGraph that contains nodes and edges

Returns

The clique graph with only same_as edges

Return type

networkx.Graph
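
To illustrate what a clique graph captures, the sketch below groups node ids connected by same_as edges into cliques using only the standard library. It is not KGX's implementation: the edge tuples, the identifiers, the predicate string, and the helper name group_same_as are all assumptions made for the example.

```python
from collections import defaultdict

# Made-up edge list of (subject, predicate, object) tuples. The predicate
# string is an assumption; use whatever label your graph gives same_as edges.
edges = [
    ("HGNC:1", "biolink:same_as", "NCBIGene:1"),
    ("NCBIGene:1", "biolink:same_as", "ENSEMBL:1"),
    ("HGNC:1", "biolink:related_to", "MONDO:1"),
]


def group_same_as(edges, same_as="biolink:same_as"):
    """Return the connected components formed by same_as edges only."""
    neighbors = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate == same_as:
            # Treat the relation as undirected, like a networkx.Graph would.
            neighbors[subject].add(obj)
            neighbors[obj].add(subject)

    seen, cliques = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        cliques.append(component)
    return cliques


cliques = group_same_as(edges)
```

Here the three gene identifiers end up in one clique, while MONDO:1 is left out because it is only reachable through a non-same_as edge.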

kgx.operations.clique_merge.check_all_categories(categories) -> Tuple[List, List, List]

Check all categories in categories.

Parameters

categories (List) – A list of categories

Returns

A tuple consisting of valid biolink categories, invalid biolink categories, and invalid categories

Return type

Tuple[List, List, List]

kgx.operations.clique_merge.check_categories(categories: List, closure: List, category_mapping: Optional[Dict[str, str]] = None) -> Tuple[List, List, List]

Check whether the values in categories are valid Biolink categories.

Parameters
  • categories (List) – A list of categories to check

  • closure (List) – A list of nodes in a clique

  • category_mapping (Optional[Dict[str, str]]) – A map that provides mapping from a non-biolink category to a biolink category

Returns

A tuple consisting of valid biolink categories, invalid biolink categories, and invalid categories

Return type

Tuple[List, List, List]

kgx.operations.clique_merge.clique_merge(target_graph: networkx.classes.multidigraph.MultiDiGraph, leader_annotation: Optional[str] = None, prefix_prioritization_map: Optional[Dict[str, List[str]]] = None, category_mapping: Optional[Dict[str, str]] = None, strict: bool = True) -> Tuple[networkx.classes.multidigraph.MultiDiGraph, networkx.classes.graph.Graph]

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

  • prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

  • strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories

Returns

A tuple containing the updated target graph, and the clique graph

Return type

Tuple[networkx.MultiDiGraph, networkx.Graph]

kgx.operations.clique_merge.consolidate_edges(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str) -> networkx.classes.multidigraph.MultiDiGraph

Move all edges from nodes in a clique to the clique leader.

The original subject and object of each moved edge are preserved via ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

Returns

The target graph, in which all edges from nodes in a clique have been moved to the clique leader

Return type

nx.MultiDiGraph
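
The idea of moving edges to a clique leader while remembering where they came from can be sketched with plain dictionaries. This is only an illustration under stated assumptions: the property key values, the edge layout, the identifiers, and the leader_of mapping are hypothetical, and KGX defines its own constants and works on networkx graphs.

```python
# Hypothetical key names for this sketch; KGX uses its own constants
# (ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY).
ORIGINAL_SUBJECT_KEY = "original_subject"
ORIGINAL_OBJECT_KEY = "original_object"


def move_edges_to_leader(edges, leader_of):
    """Rewrite each edge so that clique members are replaced by their leader.

    edges: list of dicts with at least 'subject' and 'object' keys.
    leader_of: maps a node id to the id of its clique leader.
    """
    moved = []
    for edge in edges:
        new_edge = dict(edge)  # leave the input untouched
        subject = leader_of.get(edge["subject"], edge["subject"])
        obj = leader_of.get(edge["object"], edge["object"])
        if subject != edge["subject"]:
            new_edge[ORIGINAL_SUBJECT_KEY] = edge["subject"]
            new_edge["subject"] = subject
        if obj != edge["object"]:
            new_edge[ORIGINAL_OBJECT_KEY] = edge["object"]
            new_edge["object"] = obj
        moved.append(new_edge)
    return moved


# Made-up identifiers: HGNC:1 leads a clique with two other members.
leader_of = {"HGNC:1": "HGNC:1", "NCBIGene:1": "HGNC:1", "ENSEMBL:1": "HGNC:1"}
edges = [
    {"subject": "NCBIGene:1", "predicate": "biolink:related_to", "object": "MONDO:1"},
    {"subject": "MONDO:2", "predicate": "biolink:related_to", "object": "ENSEMBL:1"},
]
moved = move_edges_to_leader(edges, leader_of)
```

Edges that already point at the leader, or that touch no clique member, pass through unchanged and gain no extra properties.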

kgx.operations.clique_merge.elect_leader(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str, prefix_prioritization_map: Optional[Dict[str, List[str]]], category_mapping: Optional[Dict[str, str]], strict: bool = True) -> networkx.classes.multidigraph.MultiDiGraph

Elect leader for each clique in a graph.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

  • prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

  • strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories

Returns

The updated target graph

Return type

networkx.MultiDiGraph

kgx.operations.clique_merge.get_category_from_equivalence(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, node: str, attributes: Dict) -> List

Get category for a node based on its equivalent nodes in a graph.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • node (str) – Node identifier

  • attributes (Dict) – Node’s attributes

Returns

Category for the node

Return type

List

kgx.operations.clique_merge.get_clique_category(clique_graph: networkx.classes.graph.Graph, clique: List) -> Tuple[str, List]

Given a clique, identify the category of the clique.

Parameters
  • clique_graph (nx.Graph) – Clique graph

  • clique (List) – A list of nodes in clique

Returns

A tuple of clique category and its ancestors

Return type

Tuple[str, list]

kgx.operations.clique_merge.get_leader_by_annotation(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, leader_annotation: str) -> Tuple[Optional[str], Optional[str]]

Get leader by searching for leader annotation property in any of the nodes in a given clique.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes from a clique

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]
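
A minimal sketch of annotation-based election, assuming node attributes are available as a plain dictionary and that a truthy value for the annotation marks the leader. The helper name, the strategy label, and the example annotation name are hypothetical; KGX reads attributes from the graphs it is given.

```python
from typing import Dict, List, Optional, Tuple


def leader_by_annotation(
    node_attributes: Dict[str, dict], clique: List[str], leader_annotation: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return the first clique member whose annotation value is truthy."""
    for node in clique:
        if node_attributes.get(node, {}).get(leader_annotation):
            # The strategy label is a made-up string for this sketch.
            return node, "LEADER_ANNOTATION"
    return None, None


# Made-up attributes; 'clique_leader' is an assumed annotation name.
node_attributes = {
    "HGNC:1": {"name": "gene one", "clique_leader": True},
    "NCBIGene:1": {"name": "gene one"},
}
elected = leader_by_annotation(node_attributes, ["NCBIGene:1", "HGNC:1"], "clique_leader")
not_elected = leader_by_annotation(node_attributes, ["NCBIGene:1"], "clique_leader")
```

Returning a pair of None values when no node carries the annotation lets a caller fall through to another election strategy.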

kgx.operations.clique_merge.get_leader_by_prefix_priority(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, prefix_priority_list: List) -> Tuple[Optional[str], Optional[str]]

Get leader from clique based on a given prefix priority.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes that correspond to a clique

  • prefix_priority_list (List) – A list of prefixes in descending priority

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]
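
Prefix-priority election can be pictured as walking the priority list and returning the first clique member whose CURIE prefix matches. The sketch below assumes node ids are CURIEs of the form PREFIX:local_id; the helper name, the strategy label, and the identifiers are hypothetical, and tie-breaking by sorted id is a choice made here, not something the source specifies.

```python
from typing import List, Optional, Tuple


def leader_by_prefix_priority(
    clique: List[str], prefix_priority_list: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return the first node whose prefix appears earliest in the priority list."""
    for prefix in prefix_priority_list:  # highest priority first
        for node in sorted(clique):  # sorted for a deterministic tie-break
            if node.split(":", 1)[0] == prefix:
                # The strategy label is a made-up string for this sketch.
                return node, "PREFIX_PRIORITIZATION"
    return None, None


clique = ["NCBIGene:1", "HGNC:1", "ENSEMBL:1"]
by_priority = leader_by_prefix_priority(clique, ["HGNC", "NCBIGene", "ENSEMBL"])
no_match = leader_by_prefix_priority(clique, ["UniProtKB"])
```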

kgx.operations.clique_merge.get_leader_by_sort(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List) -> Tuple[Optional[str], Optional[str]]

Get leader from clique based on the first selection from an alphabetical sort of the node id prefixes.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes that correspond to a clique

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]
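
Election by alphabetical sort can be sketched as picking the node whose prefix sorts first. As before, this assumes CURIE-style ids; the helper name and strategy label are hypothetical, the comparison here is case-sensitive, and falling back to the full id to break ties is an assumption of the sketch.

```python
from typing import List, Optional, Tuple


def leader_by_sort(clique: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return the node whose id prefix comes first alphabetically."""
    if not clique:
        return None, None
    # Sort key: prefix first, then the whole id to break ties deterministically.
    leader = min(clique, key=lambda node: (node.split(":", 1)[0], node))
    # The strategy label is a made-up string for this sketch.
    return leader, "ALPHABETICAL_SORT"


sorted_leader = leader_by_sort(["NCBIGene:1", "HGNC:1", "ENSEMBL:1"])
```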

kgx.operations.clique_merge.sort_categories(categories: Union[List, Set, ordered_set.OrderedSet]) -> List

Sort a list of categories from most specific to the most generic.

Parameters

categories (Union[List, Set, OrderedSet]) – A list of categories

Returns

A sorted list of categories

Return type

List
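
One way to order categories from most specific to most generic is to sort by depth in the class hierarchy. The sketch below uses a deliberately simplified, hand-written parent map as a stand-in for the Biolink Model; the map, the helper names, and the alphabetical tie-break are assumptions, not KGX's implementation.

```python
# Toy, simplified parent map standing in for the Biolink Model hierarchy.
# The real model has more intermediate classes.
PARENT = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:NamedThing": None,
}


def depth(category):
    """Number of ancestors between a category and the root of the toy map."""
    steps = 0
    while PARENT.get(category) is not None:
        category = PARENT[category]
        steps += 1
    return steps


def sort_most_specific_first(categories):
    """Deepest (most specific) category first; ties broken alphabetically."""
    return sorted(set(categories), key=lambda c: (-depth(c), c))


ordered = sort_most_specific_first(
    ["biolink:NamedThing", "biolink:Gene", "biolink:BiologicalEntity"]
)
```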

kgx.operations.clique_merge.update_node_categories(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, category_mapping: Optional[Dict[str, str]], strict: bool = True) -> List

For a given clique, get category for each node in clique and validate against Biolink Model, mapping to Biolink Model category where needed.

For example, if a node has biolink:Gene as its category, then this method adds all of that category's ancestors.

Parameters
  • target_graph (networkx.MultiDiGraph) – The original graph

  • clique_graph (networkx.Graph) – The clique graph

  • clique (List) – A list of nodes from a clique

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

  • strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories

Returns

The clique

Return type

List
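
The ancestor expansion described above can be illustrated with a toy hierarchy. The parent map below is a simplified stand-in for the Biolink Model, and the helper name is hypothetical; KGX resolves ancestors against the real model.

```python
# Toy, simplified parent map; the real Biolink Model has more classes.
PARENT = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:NamedThing": None,
}


def with_ancestors(category):
    """Return the category followed by its ancestors, nearest first."""
    lineage = []
    while category is not None:
        lineage.append(category)
        category = PARENT.get(category)  # None ends the walk
    return lineage


gene_categories = with_ancestors("biolink:Gene")
```

A category that is missing from the map comes back on its own, which is one place where a category_mapping or a strictness check would come into play.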

Graph Merge

kgx.operations.graph_merge.add_all_edges(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) -> int

Add all edges from source graph (g2) to target graph (g1).

Parameters
  • g1 (networkx.MultiDiGraph) – Target graph

  • g2 (networkx.MultiDiGraph) – Source graph

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

Number of edges merged during this operation

Return type

int

kgx.operations.graph_merge.add_all_nodes(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) -> int

Add all nodes from source graph (g2) to target graph (g1).

Parameters
  • g1 (networkx.MultiDiGraph) – Target graph

  • g2 (networkx.MultiDiGraph) – Source graph

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

Number of nodes merged during this operation

Return type

int

kgx.operations.graph_merge.merge_all_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) -> networkx.classes.multidigraph.MultiDiGraph

Merge one or more graphs.

Note

This method first picks the largest graph in graphs and uses it as the target into which the remaining graphs are merged. This reduces the memory footprint of the operation. The largest graph is the one with the most edges.

The caveat is that the merge operation has a side effect where the largest graph is altered.

If you would like to ensure that all incoming graphs remain as-is, then look at merge_graphs.

The outcome of the merge on node and edge properties depends on the preserve parameter.

If preserve is True:
  • core properties will not be overwritten

  • other properties will be concatenated to a list

If preserve is False:
  • core properties will not be overwritten

  • other properties will be replaced

Parameters
  • graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged graph

Return type

nx.MultiDiGraph
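
The effect of the preserve parameter on conflicting properties can be sketched with plain dictionaries. This is an illustration rather than KGX's code: which properties count as core is an assumption here (id and name), as are the helper name and the choice to skip duplicate values when concatenating.

```python
# Assumption for this sketch: these are the "core" properties.
CORE_PROPERTIES = {"id", "name"}


def merge_properties(existing, incoming, preserve=True):
    """Merge incoming properties into a copy of existing."""
    merged = dict(existing)
    for key, value in incoming.items():
        if key not in merged:
            merged[key] = value  # new property, nothing to reconcile
        elif key in CORE_PROPERTIES:
            continue  # core properties are never overwritten
        elif preserve:
            # Concatenate into a list, skipping values already present.
            current = merged[key] if isinstance(merged[key], list) else [merged[key]]
            new = value if isinstance(value, list) else [value]
            merged[key] = current + [v for v in new if v not in current]
        else:
            merged[key] = value  # replace the old value
    return merged


existing = {"id": "HGNC:1", "name": "gene one", "provided_by": "source_a"}
incoming = {"id": "HGNC:1", "name": "GENE1", "provided_by": "source_b", "xref": "X:1"}

kept = merge_properties(existing, incoming, preserve=True)
replaced = merge_properties(existing, incoming, preserve=False)
```

With preserve=True the provided_by values from both graphs survive as a list; with preserve=False the incoming value wins. In both cases name keeps its original value.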

kgx.operations.graph_merge.merge_edge(g: networkx.classes.multidigraph.MultiDiGraph, u: str, v: str, key: str, data: dict, preserve: bool = True) -> dict

Merge edge u -> v into graph g.

Parameters
  • g (nx.MultiDiGraph) – The target graph

  • u (str) – Subject node id

  • v (str) – Object node id

  • key (str) – Edge key

  • data (dict) – Edge properties

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged edge

Return type

dict

kgx.operations.graph_merge.merge_graphs(graph: networkx.classes.multidigraph.MultiDiGraph, graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) -> networkx.classes.multidigraph.MultiDiGraph

Merge all graphs in graphs to graph.

Parameters
  • graph (networkx.MultiDiGraph) – A networkx graph

  • graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged graph

Return type

nx.MultiDiGraph

kgx.operations.graph_merge.merge_node(g: networkx.classes.multidigraph.MultiDiGraph, n: str, data: dict, preserve: bool = True) -> dict

Merge node n into graph g.

Parameters
  • g (nx.MultiDiGraph) – The target graph

  • n (str) – Node id

  • data (dict) – Node properties

  • preserve (bool) – Whether or not to preserve conflicting properties

Returns

The merged node

Return type

dict

Summarize Graph

kgx.operations.summarize_graph.generate_graph_stats(graph: networkx.classes.multidigraph.MultiDiGraph, graph_name: str, filename: str, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) -> None

Generate stats from Graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • graph_name (str) – Name for the graph

  • filename (str) – Filename to write the stats to

  • node_facet_properties (Optional[List]) – A list of node properties to facet on. For example, ['provided_by']

  • edge_facet_properties (Optional[List]) – A list of edge properties to facet on. For example, ['provided_by']

kgx.operations.summarize_graph.get_facet_counts(data: Dict, stats: Dict, x: str, y: str, facet_property: str) -> Dict

Facet on facet_property and record the count for stats[x][y][facet_property].

Parameters
  • data (dict) – Node/edge data dictionary

  • stats (dict) – The stats dictionary

  • x (str) – first key

  • y (str) – second key

  • facet_property (str) – The property to facet on

Returns

The stats dictionary

Return type

Dict
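
Faceting amounts to counting how often each value of a property occurs under a pair of keys in a nested stats dictionary. The sketch below shows one plausible shape for that bookkeeping; the helper name, the key names used in the example, and the "unknown" fallback for missing values are assumptions, not KGX's behavior.

```python
def record_facet_count(data, stats, x, y, facet_property):
    """Increment stats[x][y][facet_property][value] for each value in data."""
    values = data.get(facet_property, "unknown")  # fallback is an assumption
    if not isinstance(values, (list, set, tuple)):
        values = [values]
    # Create the nested dictionaries on first use.
    bucket = stats.setdefault(x, {}).setdefault(y, {}).setdefault(facet_property, {})
    for value in values:
        bucket[value] = bucket.get(value, 0) + 1
    return stats


# Made-up node records and key names for illustration.
nodes = [
    {"id": "HGNC:1", "provided_by": ["source_a", "source_b"]},
    {"id": "HGNC:2", "provided_by": "source_a"},
    {"id": "HGNC:3"},
]
stats = {}
for node in nodes:
    record_facet_count(node, stats, "count_by_category", "biolink:Gene", "provided_by")
```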

kgx.operations.summarize_graph.summarize_edges(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None)

Summarize the edges in a graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • facet_properties (Optional[List]) – The properties to facet on

Returns

The edge stats

Return type

Dict

kgx.operations.summarize_graph.summarize_graph(graph: networkx.classes.multidigraph.MultiDiGraph, name: Optional[str] = None, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) -> Dict

Summarize the entire graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • name (str) – Name for the graph

  • node_facet_properties (Optional[List]) – A list of node properties to facet on. For example, ['provided_by']

  • edge_facet_properties (Optional[List]) – A list of edge properties to facet on. For example, ['provided_by']

Returns

The stats dictionary

Return type

Dict

kgx.operations.summarize_graph.summarize_nodes(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None) -> Dict

Summarize the nodes in a graph.

Parameters
  • graph (networkx.MultiDiGraph) – The graph

  • facet_properties (Optional[List]) – A list of properties to facet on

Returns

The node stats

Return type

Dict