Operations
This module provides the set of operations supported by KGX. Each operation has an entry point: a function that takes a networkx.MultiDiGraph as input and operates on the nodes and/or edges of that graph.
Clique Merge
-
kgx.operations.clique_merge.build_cliques(target_graph: networkx.classes.multidigraph.MultiDiGraph) → networkx.classes.graph.Graph
Builds a clique graph from the same_as edges in target_graph.
- Parameters
target_graph (networkx.MultiDiGraph) – A MultiDiGraph that contains nodes and edges
- Returns
The clique graph with only same_as edges
- Return type
networkx.Graph
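The idea of collecting same_as edges into cliques can be illustrated without KGX or networkx. The following is a minimal, dependency-free sketch, not the library's implementation. It assumes that a clique means a group of node ids connected, directly or transitively, by same_as edges, and that edges are given as hypothetical (subject, predicate, object) tuples. The function name group_same_as and the identifiers are made up for illustration.

```python
from collections import defaultdict


def group_same_as(edges):
    """Group node ids linked by same_as edges into sorted lists (one per clique)."""
    # Build an undirected adjacency map from same_as edges only.
    adjacency = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate == "same_as":
            adjacency[subject].add(obj)
            adjacency[obj].add(subject)

    cliques, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk collects every node reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        cliques.append(sorted(members))
    return cliques


# Made-up identifiers; only the first two edges are same_as edges.
edges = [
    ("HGNC:100", "same_as", "NCBIGene:200"),
    ("NCBIGene:200", "same_as", "ENSEMBL:300"),
    ("HGNC:100", "interacts_with", "HGNC:999"),
]
cliques = group_same_as(edges)
```

Nodes that take part only in other kinds of edges (HGNC:999 here) do not appear in any clique.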
-
kgx.operations.clique_merge.clique_merge(target_graph: networkx.classes.multidigraph.MultiDiGraph, leader_annotation: Optional[str] = None, prefix_prioritization_map: Optional[Dict[str, List[str]]] = None, category_mapping: Optional[Dict[str, str]] = None) → Tuple[networkx.classes.multidigraph.MultiDiGraph, networkx.classes.graph.Graph]
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories
category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories
- Returns
A tuple containing the updated target graph, and the clique graph
- Return type
Tuple[networkx.MultiDiGraph, networkx.Graph]
-
kgx.operations.clique_merge.consolidate_edges(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str) → networkx.classes.multidigraph.MultiDiGraph
Move all edges from nodes in a clique to the clique leader.
The original subject and object of an edge are preserved via ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
- Returns
The target graph, with all edges from nodes in a clique moved to the clique leader
- Return type
networkx.MultiDiGraph
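As a rough illustration of moving edges to a clique leader while keeping a record of the original endpoints, here is a small sketch on plain dictionaries. It is not KGX's code. The edge layout, the leader_of lookup, and the keys original_subject and original_object are assumptions standing in for whatever ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY actually resolve to.

```python
def move_edges_to_leader(edges, leader_of):
    """Return copies of `edges` with each endpoint replaced by its clique leader."""
    moved = []
    for edge in edges:
        new_edge = dict(edge)  # leave the input edge untouched
        for role in ("subject", "object"):
            original = edge[role]
            # Nodes outside any clique act as their own leader.
            leader = leader_of.get(original, original)
            if leader != original:
                # Hypothetical key names; the real property names are not shown here.
                new_edge[f"original_{role}"] = original
                new_edge[role] = leader
        moved.append(new_edge)
    return moved


# Made-up data: NCBIGene:200 belongs to a clique led by HGNC:100.
leader_of = {"HGNC:100": "HGNC:100", "NCBIGene:200": "HGNC:100"}
edges = [{"subject": "NCBIGene:200", "predicate": "interacts_with", "object": "HGNC:999"}]
moved = move_edges_to_leader(edges, leader_of)
```

Only endpoints that actually change get an original_* entry, so edges already attached to the leader are copied unchanged.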
-
kgx.operations.clique_merge.elect_leader(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str, prefix_prioritization_map: Optional[Dict[str, List[str]]], category_mapping: Optional[Dict[str, str]]) → networkx.classes.multidigraph.MultiDiGraph
Elect a leader for each clique in a graph.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories
category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories
- Returns
The updated target graph
- Return type
networkx.MultiDiGraph
-
kgx.operations.clique_merge.get_category_from_equivalence(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, node: str, attributes: Dict) → List
Get the category for a node based on its equivalent nodes in a graph.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
node (str) – Node identifier
attributes (Dict) – Node’s attributes
- Returns
Category for the node
- Return type
List
-
kgx.operations.clique_merge.get_leader_by_annotation(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, leader_annotation: str) → Tuple[Optional[str], Optional[str]]
Get the leader by searching for the leader annotation property in any of the nodes in a given clique.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes from a clique
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
- Returns
A tuple containing the node that has been elected as the leader and the election strategy
- Return type
Tuple[Optional[str], Optional[str]]
-
kgx.operations.clique_merge.get_leader_by_prefix_priority(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, prefix_priority_list: List) → Tuple[Optional[str], Optional[str]]
Get the leader from a clique based on a given prefix priority.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes that correspond to a clique
prefix_priority_list (List) – A list of prefixes in descending priority
- Returns
A tuple containing the node that has been elected as the leader and the election strategy
- Return type
Tuple[Optional[str], Optional[str]]
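Election by prefix priority can be sketched on a plain list of node ids. This is an illustrative approximation, not the library's code. It assumes ids are CURIE-like strings of the form PREFIX:LOCAL, and the strategy label "prefix_priority" is a made-up placeholder for whatever string the real function returns.

```python
def leader_by_prefix_priority(clique, prefix_priority_list):
    """Return (leader, strategy) for the first prefix in priority order that matches."""
    for prefix in prefix_priority_list:
        # Sort the matches so the result does not depend on input order.
        matches = sorted(n for n in clique if n.split(":", 1)[0] == prefix)
        if matches:
            return matches[0], "prefix_priority"  # hypothetical strategy label
    # No node in the clique carries any of the prioritized prefixes.
    return None, None


clique = ["NCBIGene:200", "HGNC:100", "ENSEMBL:300"]  # made-up ids
result = leader_by_prefix_priority(clique, ["HGNC", "NCBIGene"])
```

Returning (None, None) when nothing matches mirrors the Optional members of the documented return type and lets a caller fall back to another strategy.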
-
kgx.operations.clique_merge.get_leader_by_sort(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List) → Tuple[Optional[str], Optional[str]]
Get the leader from a clique by sorting the node id prefixes alphabetically and taking the first.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes that correspond to a clique
- Returns
A tuple containing the node that has been elected as the leader and the election strategy
- Return type
Tuple[Optional[str], Optional[str]]
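The sort-based fallback amounts to picking the node whose id prefix comes first alphabetically. The sketch below is an assumption-laden illustration rather than KGX's implementation. It treats ids as PREFIX:LOCAL strings, breaks ties on the full id, uses Python's default case-sensitive string ordering, and returns a made-up strategy label "sort".

```python
def leader_by_sort(clique):
    """Return (leader, strategy), choosing the node with the alphabetically first prefix."""
    if not clique:
        return None, None
    # Compare on (prefix, full id) so ties between equal prefixes are deterministic.
    leader = min(clique, key=lambda n: (n.split(":", 1)[0], n))
    return leader, "sort"  # hypothetical strategy label


# Made-up ids; ENSEMBL sorts before HGNC and NCBIGene.
result = leader_by_sort(["NCBIGene:200", "HGNC:100", "ENSEMBL:300"])
```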
-
kgx.operations.clique_merge.get_the_most_specific_category(categories: List) → Tuple[Optional[Any], List[Any]]
From a list of categories, get the ancestors of each. The category with the longest list of ancestors is considered to be the most specific.
Note
This assumes that all the categories in categories are part of the same closure.
- Parameters
categories (List) – A list of categories
- Returns
A tuple of the most specific category and a list of ancestors of that category
- Return type
Tuple[Optional[Any], List[Any]]
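The "longest ancestor list wins" rule can be shown with a tiny hand-written hierarchy. This sketch is not the library's code. The ANCESTORS table and the ex: category names are invented for illustration and are not taken from the Biolink Model; the real function presumably looks ancestors up in the model itself.

```python
# Hypothetical hierarchy: each category maps to its ancestors, nearest first.
ANCESTORS = {
    "ex:Thing": [],
    "ex:Entity": ["ex:Thing"],
    "ex:Gene": ["ex:Entity", "ex:Thing"],
}


def most_specific_category(categories, ancestors=ANCESTORS):
    """Return (category, its ancestors) for the category with the most ancestors."""
    best, best_chain = None, []
    for category in categories:
        chain = ancestors.get(category, [])
        # A strictly longer ancestor chain means a more specific category.
        if best is None or len(chain) > len(best_chain):
            best, best_chain = category, chain
    return best, best_chain


result = most_specific_category(["ex:Thing", "ex:Gene", "ex:Entity"])
```

The comparison is only meaningful when all categories lie on one branch of the hierarchy, which is the same-closure assumption stated in the note above.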
-
kgx.operations.clique_merge.update_node_categories(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, category_mapping: Optional[Dict[str, str]]) → List
For a given clique, get the category for each node in the clique and validate it against the Biolink Model, mapping to a Biolink Model category where needed.
For example, if a node has biolink:Gene as its category, then this method adds all of its ancestors.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes from a clique
category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories
- Returns
The clique
- Return type
List
-
kgx.operations.clique_merge.validate_clique_category(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List) → Tuple[Optional[str], List[Any]]
For nodes in a clique, validate the category of each node to make sure that all nodes in the clique are of the same type.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes from a clique
- Returns
A tuple of clique category string and a list of invalid nodes
- Return type
Tuple[Optional[str], List[Any]]
Graph Merge
-
kgx.operations.graph_merge.add_all_edges(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) → int
Add all edges from the source graph (g2) to the target graph (g1).
- Parameters
g1 (networkx.MultiDiGraph) – Target graph
g2 (networkx.MultiDiGraph) – Source graph
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
Number of edges merged during this operation
- Return type
int
-
kgx.operations.graph_merge.add_all_nodes(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) → int
Add all nodes from the source graph (g2) to the target graph (g1).
- Parameters
g1 (networkx.MultiDiGraph) – Target graph
g2 (networkx.MultiDiGraph) – Source graph
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
Number of nodes merged during this operation
- Return type
int
-
kgx.operations.graph_merge.merge_all_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) → networkx.classes.multidigraph.MultiDiGraph
Merge one or more graphs.
Note
This method first picks the largest graph in graphs, measured by number of edges, and uses it as the target into which the remaining graphs are merged. This reduces the memory footprint of the operation.
The caveat is that the merge has a side effect: the largest graph is altered.
If you would like to ensure that all incoming graphs remain as-is, then look at merge_graphs.
The outcome of the merge on node and edge properties depends on the preserve parameter.
If preserve is True, then:
- core properties will not be overwritten
- other properties will be concatenated to a list
If preserve is False, then:
- core properties will not be overwritten
- other properties will be replaced
- Parameters
graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged graph
- Return type
networkx.MultiDiGraph
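The effect of preserve on conflicting properties can be sketched on plain dictionaries. This is a simplified illustration, not the library's merge logic. The CORE_PROPERTIES set is a hypothetical stand-in, since the list of core properties is not given here, and dropping duplicate values when concatenating is an assumption of this sketch.

```python
CORE_PROPERTIES = {"id", "name"}  # hypothetical; the real core property set is not shown here


def merge_properties(existing, incoming, preserve=True):
    """Merge `incoming` into a copy of `existing` following the preserve rules."""
    merged = dict(existing)
    for key, value in incoming.items():
        if key not in merged:
            merged[key] = value  # no conflict: simply add the property
        elif key in CORE_PROPERTIES:
            continue  # core properties are never overwritten
        elif preserve:
            # Concatenate old and new values into one list (duplicates dropped).
            current = merged[key] if isinstance(merged[key], list) else [merged[key]]
            extra = value if isinstance(value, list) else [value]
            merged[key] = current + [v for v in extra if v not in current]
        else:
            merged[key] = value  # preserve=False: the incoming value replaces the old one
    return merged


# Made-up node records that conflict on name and provided_by.
existing = {"id": "HGNC:100", "name": "gene A", "provided_by": "source1"}
incoming = {"id": "HGNC:100", "name": "Gene-A", "provided_by": "source2", "taxon": "x"}
kept = merge_properties(existing, incoming, preserve=True)
replaced = merge_properties(existing, incoming, preserve=False)
```

In both modes the core property name keeps its original value; only provided_by differs between the two results.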
-
kgx.operations.graph_merge.merge_edge(g: networkx.classes.multidigraph.MultiDiGraph, u: str, v: str, key: str, data: dict, preserve: bool = True) → dict
Merge edge u -> v into graph g.
- Parameters
g (networkx.MultiDiGraph) – The target graph
u (str) – Subject node id
v (str) – Object node id
key (str) – Edge key
data (dict) – Edge properties
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged edge
- Return type
dict
-
kgx.operations.graph_merge.merge_graphs(graph: networkx.classes.multidigraph.MultiDiGraph, graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) → networkx.classes.multidigraph.MultiDiGraph
Merge all graphs in the graphs list into graph.
- Parameters
graph (networkx.MultiDiGraph) – A networkx graph
graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged graph
- Return type
networkx.MultiDiGraph
-
kgx.operations.graph_merge.merge_node(g: networkx.classes.multidigraph.MultiDiGraph, n: str, data: dict, preserve: bool = True) → dict
Merge node n into graph g.
- Parameters
g (networkx.MultiDiGraph) – The target graph
n (str) – Node id
data (dict) – Node properties
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged node
- Return type
dict
Summarize Graph
-
kgx.operations.summarize_graph.generate_graph_stats(graph: networkx.classes.multidigraph.MultiDiGraph, graph_name: str, filename: str, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → None
Generate stats from a graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
graph_name (str) – Name for the graph
filename (str) – Filename to write the stats to
node_facet_properties (Optional[List]) – A list of node properties to facet on. For example, ['provided_by']
edge_facet_properties (Optional[List]) – A list of edge properties to facet on. For example, ['provided_by']
-
kgx.operations.summarize_graph.get_facet_counts(data: Dict, stats: Dict, x: str, y: str, facet_property: str) → Dict
Facet on facet_property and record the count for stats[x][y][facet_property].
- Parameters
data (dict) – Node/edge data dictionary
stats (dict) – The stats dictionary
x (str) – first key
y (str) – second key
facet_property (str) – The property to facet on
- Returns
The stats dictionary
- Return type
Dict
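To make the stats[x][y][facet_property] layout concrete, here is a small stand-alone sketch of facet counting on a nested dictionary. It is not the library's implementation. The shape of the counts (a plain value-to-integer mapping), the "unknown" bucket for records that lack the property, and the example keys node_stats and biolink:Gene are all assumptions made for illustration.

```python
def facet_counts(data, stats, x, y, facet_property):
    """Increment counts under stats[x][y][facet_property] for each value in `data`."""
    # Records without the property are counted under a hypothetical "unknown" bucket.
    values = data.get(facet_property, "unknown")
    if not isinstance(values, (list, set, tuple)):
        values = [values]  # treat a single value as a one-element list
    # Create the nested dictionaries on demand.
    bucket = stats.setdefault(x, {}).setdefault(y, {}).setdefault(facet_property, {})
    for value in values:
        bucket[value] = bucket.get(value, 0) + 1
    return stats


stats = {}
records = [
    {"provided_by": ["s1", "s2"]},  # multi-valued property
    {"provided_by": "s1"},          # single-valued property
    {},                             # property missing
]
for record in records:
    facet_counts(record, stats, "node_stats", "biolink:Gene", "provided_by")
```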
-
kgx.operations.summarize_graph.summarize_edges(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None)
Summarize the edges in a graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
facet_properties (Optional[List]) – The properties to facet on
- Returns
The edge stats
- Return type
Dict
-
kgx.operations.summarize_graph.summarize_graph(graph: networkx.classes.multidigraph.MultiDiGraph, name: Optional[str] = None, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → Dict
Summarize the entire graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
name (str) – Name for the graph
node_facet_properties (Optional[List]) – A list of node properties to facet on. For example, ['provided_by']
edge_facet_properties (Optional[List]) – A list of edge properties to facet on. For example, ['provided_by']
- Returns
The stats dictionary
- Return type
Dict
-
kgx.operations.summarize_graph.summarize_nodes(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None) → Dict
Summarize the nodes in a graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
facet_properties (Optional[List]) – A list of properties to facet on
- Returns
The node stats
- Return type
Dict