Operations
This module provides the set of operations supported by KGX. Each operation has an entry point: a function that takes a networkx.MultiDiGraph as input and performs some operation on the nodes and/or edges of that graph.
Clique Merge
-
kgx.operations.clique_merge.build_cliques(target_graph: networkx.classes.multidigraph.MultiDiGraph) → networkx.classes.graph.Graph
Builds a clique graph from same_as edges in target_graph.
- Parameters
target_graph (networkx.MultiDiGraph) – A MultiDiGraph that contains nodes and edges
- Returns
The clique graph with only same_as edges
- Return type
networkx.Graph
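To illustrate the idea, here is a minimal sketch that does not call KGX or networkx. It assumes edges are plain tuples, that equivalence is marked by a hypothetical predicate property with the value biolink:same_as, and that each connected component of the resulting undirected graph is treated as a clique.

```python
from collections import defaultdict

# Hypothetical edge list: (subject, object, properties). The "predicate" key
# and its values are assumptions made for this illustration.
edges = [
    ("HGNC:1", "NCBIGene:1", {"predicate": "biolink:same_as"}),
    ("NCBIGene:1", "ENSEMBL:1", {"predicate": "biolink:same_as"}),
    ("HGNC:1", "MONDO:9", {"predicate": "biolink:related_to"}),
    ("HGNC:2", "NCBIGene:2", {"predicate": "biolink:same_as"}),
]


def build_same_as_adjacency(edges):
    """Keep only same_as edges and store them as an undirected adjacency map."""
    adjacency = defaultdict(set)
    for subject, obj, props in edges:
        if props.get("predicate") == "biolink:same_as":
            adjacency[subject].add(obj)
            adjacency[obj].add(subject)
    return adjacency


def connected_components(adjacency):
    """Return each group of equivalent nodes as a sorted list of node ids."""
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


adjacency = build_same_as_adjacency(edges)
cliques = connected_components(adjacency)
```

In this sketch MONDO:9 never enters the clique graph, because the only edge touching it is not a same_as edge.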
-
kgx.operations.clique_merge.check_all_categories(categories) → Tuple[List, List, List]
Check all categories in categories.
- Parameters
categories (List) – A list of categories
- Returns
A tuple consisting of valid biolink categories, invalid biolink categories, and invalid categories
- Return type
Tuple[List, List, List]
-
kgx.operations.clique_merge.check_categories(categories: List, closure: List, category_mapping: Optional[Dict[str, str]] = None) → Tuple[List, List, List]
Check whether the values in categories are valid Biolink categories.
- Parameters
categories (List) – A list of categories to check
closure (List) – A list of nodes in a clique
category_mapping (Optional[Dict[str, str]]) – A map that provides mapping from a non-biolink category to a biolink category
- Returns
A tuple consisting of valid biolink categories, invalid biolink categories, and invalid categories
- Return type
Tuple[List, List, List]
-
kgx.operations.clique_merge.clique_merge(target_graph: networkx.classes.multidigraph.MultiDiGraph, leader_annotation: Optional[str] = None, prefix_prioritization_map: Optional[Dict[str, List[str]]] = None, category_mapping: Optional[Dict[str, str]] = None, strict: bool = True) → Tuple[networkx.classes.multidigraph.MultiDiGraph, networkx.classes.graph.Graph]
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
leader_annotation (Optional[str]) – The field on a node that signifies that the node is the leader of a clique
prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories
category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories
strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories
- Returns
A tuple containing the updated target graph, and the clique graph
- Return type
Tuple[networkx.MultiDiGraph, networkx.Graph]
-
kgx.operations.clique_merge.consolidate_edges(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str) → networkx.classes.multidigraph.MultiDiGraph
Move all edges from nodes in a clique to the clique leader.
The original subject and object of each edge are preserved via ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
- Returns
The target graph where all edges from nodes in a clique are moved to clique leader
- Return type
networkx.MultiDiGraph
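The edge move can be pictured with a small sketch over plain tuples. It does not use KGX itself. The property names below are hypothetical placeholders for the values of ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY, and the leader_of mapping is assumed to have been produced by an earlier leader election.

```python
# Hypothetical property names; the actual values of KGX's
# ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY are not shown here.
ORIGINAL_SUBJECT = "_original_subject"
ORIGINAL_OBJECT = "_original_object"


def move_edges_to_leaders(edges, leader_of):
    """Rewrite edge endpoints to their clique leaders, recording the originals."""
    consolidated = []
    for subject, obj, props in edges:
        new_subject = leader_of.get(subject, subject)
        new_obj = leader_of.get(obj, obj)
        new_props = dict(props)  # copy so the input edge is not mutated
        if new_subject != subject:
            new_props[ORIGINAL_SUBJECT] = subject
        if new_obj != obj:
            new_props[ORIGINAL_OBJECT] = obj
        consolidated.append((new_subject, new_obj, new_props))
    return consolidated


# Assumed output of a leader election: non-leader node id -> leader node id.
leader_of = {"NCBIGene:1": "HGNC:1", "ENSEMBL:1": "HGNC:1"}
edges = [
    ("NCBIGene:1", "MONDO:9", {"predicate": "biolink:related_to"}),
    ("GO:5", "ENSEMBL:1", {"predicate": "biolink:related_to"}),
]
result = move_edges_to_leaders(edges, leader_of)
```

Unlike this sketch, which returns a new list, the documented function returns the target graph with the edges moved.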
-
kgx.operations.clique_merge.elect_leader(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, leader_annotation: str, prefix_prioritization_map: Optional[Dict[str, List[str]]], category_mapping: Optional[Dict[str, str]], strict: bool = True) → networkx.classes.multidigraph.MultiDiGraph
Elect leader for each clique in a graph.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories
category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories
strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories
- Returns
The updated target graph
- Return type
networkx.MultiDiGraph
-
kgx.operations.clique_merge.get_category_from_equivalence(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, node: str, attributes: Dict) → List
Get category for a node based on its equivalent nodes in a graph.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
node (str) – Node identifier
attributes (Dict) – Node’s attributes
- Returns
Category for the node
- Return type
List
-
kgx.operations.clique_merge.get_clique_category(clique_graph: networkx.classes.graph.Graph, clique: List) → Tuple[str, List]
Given a clique, identify the category of the clique.
- Parameters
clique_graph (networkx.Graph) – Clique graph
clique (List) – A list of nodes in clique
- Returns
A tuple of clique category and its ancestors
- Return type
Tuple[str, List]
-
kgx.operations.clique_merge.get_leader_by_annotation(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, leader_annotation: str) → Tuple[Optional[str], Optional[str]]
Get leader by searching for leader annotation property in any of the nodes in a given clique.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes from a clique
leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique
- Returns
A tuple containing the node that has been elected as the leader and the election strategy
- Return type
Tuple[Optional[str], Optional[str]]
-
kgx.operations.clique_merge.get_leader_by_prefix_priority(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, prefix_priority_list: List) → Tuple[Optional[str], Optional[str]]
Get leader from clique based on a given prefix priority.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes that correspond to a clique
prefix_priority_list (List) – A list of prefixes in descending priority
- Returns
A tuple containing the node that has been elected as the leader and the election strategy
- Return type
Tuple[Optional[str], Optional[str]]
-
kgx.operations.clique_merge.get_leader_by_sort(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List) → Tuple[Optional[str], Optional[str]]
Get leader from clique based on the first selection from an alphabetical sort of the node id prefixes.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes that correspond to a clique
- Returns
A tuple containing the node that has been elected as the leader and the election strategy
- Return type
Tuple[Optional[str], Optional[str]]
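The three leader-selection strategies described above (annotation, prefix priority, alphabetical sort) can be combined as in the following sketch. It is not the KGX implementation: the order in which the strategies are tried, the strategy labels in the returned tuples, and the node_data dictionary are all assumptions made for illustration.

```python
def prefix_of(node_id):
    """Return the CURIE prefix, e.g. 'HGNC' for 'HGNC:1100'."""
    return node_id.split(":", 1)[0]


def elect_leader_sketch(clique, node_data, leader_annotation, prefix_priority_list):
    """Try the annotation first, then prefix priority, then an alphabetical sort.

    Returns a (leader, strategy) tuple, or (None, None) for an empty clique.
    The strategy labels are hypothetical.
    """
    # 1. A node explicitly flagged with the leader annotation wins.
    for node in clique:
        if node_data.get(node, {}).get(leader_annotation):
            return node, "LEADER_ANNOTATION"
    # 2. Otherwise use the highest-priority prefix that has a member in the clique.
    for prefix in prefix_priority_list:
        matches = sorted(n for n in clique if prefix_of(n) == prefix)
        if matches:
            return matches[0], "PREFIX_PRIORITIZATION"
    # 3. Fall back to sorting by prefix (node id breaks ties).
    if clique:
        leader = sorted(clique, key=lambda n: (prefix_of(n), n))[0]
        return leader, "ALPHABETICAL_SORT"
    return None, None


clique = ["NCBIGene:1", "HGNC:1", "ENSEMBL:1"]
by_annotation = elect_leader_sketch(
    clique, {"NCBIGene:1": {"clique_leader": True}}, "clique_leader", ["HGNC"]
)
by_prefix = elect_leader_sketch(clique, {}, "clique_leader", ["HGNC", "NCBIGene"])
by_sort = elect_leader_sketch(clique, {}, "clique_leader", [])
```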
-
kgx.operations.clique_merge.sort_categories(categories: Union[List, Set, ordered_set.OrderedSet]) → List
Sort a list of categories from most specific to the most generic.
- Parameters
categories (Union[List, Set, OrderedSet]) – A list of categories
- Returns
A sorted list of categories
- Return type
List
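One way to order categories from most specific to most generic is to sort by depth in the class hierarchy. The sketch below assumes a toy parent map. It is illustrative only and does not reflect the actual Biolink Model hierarchy, which depends on the model version, nor how KGX looks up ancestors.

```python
# Toy parent map standing in for a class hierarchy (an assumption for this
# illustration, not the real Biolink Model).
PARENT = {
    "biolink:Gene": "biolink:GenomicEntity",
    "biolink:GenomicEntity": "biolink:MolecularEntity",
    "biolink:MolecularEntity": "biolink:NamedThing",
    "biolink:NamedThing": None,
}


def depth(category):
    """Number of steps from the category up to the root of the toy hierarchy."""
    steps = 0
    while PARENT.get(category) is not None:
        category = PARENT[category]
        steps += 1
    return steps


def sort_most_specific_first(categories):
    """Deepest (most specific) categories first; ties are broken by name."""
    return sorted(set(categories), key=lambda c: (-depth(c), c))


ordered = sort_most_specific_first(
    ["biolink:NamedThing", "biolink:Gene", "biolink:MolecularEntity"]
)
```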
-
kgx.operations.clique_merge.update_node_categories(target_graph: networkx.classes.multidigraph.MultiDiGraph, clique_graph: networkx.classes.graph.Graph, clique: List, category_mapping: Optional[Dict[str, str]], strict: bool = True) → List
For a given clique, get the category for each node in the clique and validate it against the Biolink Model, mapping to a Biolink Model category where needed.
For example, if a node has biolink:Gene as its category, then this method adds all of its ancestors.
- Parameters
target_graph (networkx.MultiDiGraph) – The original graph
clique_graph (networkx.Graph) – The clique graph
clique (List) – A list of nodes from a clique
category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories
strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories
- Returns
The clique
- Return type
List
Graph Merge
-
kgx.operations.graph_merge.add_all_edges(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) → int
Add all edges from source graph (g2) to target graph (g1).
- Parameters
g1 (networkx.MultiDiGraph) – Target graph
g2 (networkx.MultiDiGraph) – Source graph
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
Number of edges merged during this operation
- Return type
int
-
kgx.operations.graph_merge.add_all_nodes(g1: networkx.classes.multidigraph.MultiDiGraph, g2: networkx.classes.multidigraph.MultiDiGraph, preserve: bool = True) → int
Add all nodes from source graph (g2) to target graph (g1).
- Parameters
g1 (networkx.MultiDiGraph) – Target graph
g2 (networkx.MultiDiGraph) – Source graph
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
Number of nodes merged during this operation
- Return type
int
-
kgx.operations.graph_merge.merge_all_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) → networkx.classes.multidigraph.MultiDiGraph
Merge one or more graphs.
Note
This method first picks the largest graph in graphs and uses it as the target into which the remaining graphs are merged. This reduces the memory footprint of the operation. The largest graph is the one with the greatest number of edges.
The caveat is that the merge operation has a side effect: the largest graph is altered.
If you would like to ensure that all incoming graphs remain as-is, then look at merge_graphs.
The outcome of the merge on node and edge properties depends on the preserve parameter.
If preserve is True, then:
  - core properties will not be overwritten
  - other properties will be concatenated to a list
If preserve is False, then:
  - core properties will not be overwritten
  - other properties will be replaced
- Parameters
graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged graph
- Return type
networkx.MultiDiGraph
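The target-selection step in the note above can be sketched as follows. This is an illustration with plain dictionaries rather than networkx graphs. The graph layout (name and edges keys) is hypothetical, and the merge itself is reduced to appending edges so that the side effect on the largest graph is visible.

```python
def pick_merge_target(graphs):
    """Return (target, others), where target is the graph with the most edges."""
    if not graphs:
        raise ValueError("need at least one graph")
    target_index = max(range(len(graphs)), key=lambda i: len(graphs[i]["edges"]))
    others = [g for i, g in enumerate(graphs) if i != target_index]
    return graphs[target_index], others


# Hypothetical stand-ins for graphs: only the edge lists matter here.
small = {"name": "small", "edges": [("A", "B")]}
large = {"name": "large", "edges": [("A", "B"), ("B", "C"), ("C", "D")]}

target, others = pick_merge_target([small, large])

# Merging into the target mutates it in place, which is the side effect the
# note warns about; the smaller graphs are left untouched.
for graph in others:
    target["edges"].extend(e for e in graph["edges"] if e not in target["edges"])
```

Because target is the same object as large, any caller still holding large sees the merged content afterwards.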
-
kgx.operations.graph_merge.merge_edge(g: networkx.classes.multidigraph.MultiDiGraph, u: str, v: str, key: str, data: dict, preserve: bool = True) → dict
Merge edge u -> v into graph g.
- Parameters
g (networkx.MultiDiGraph) – The target graph
u (str) – Subject node id
v (str) – Object node id
key (str) – Edge key
data (dict) – Edge properties
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged edge
- Return type
dict
-
kgx.operations.graph_merge.merge_graphs(graph: networkx.classes.multidigraph.MultiDiGraph, graphs: List[networkx.classes.multidigraph.MultiDiGraph], preserve: bool = True) → networkx.classes.multidigraph.MultiDiGraph
Merge all graphs in graphs to graph.
- Parameters
graph (networkx.MultiDiGraph) – A networkx graph
graphs (List[networkx.MultiDiGraph]) – A list of networkx.MultiDiGraph to merge
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged graph
- Return type
networkx.MultiDiGraph
-
kgx.operations.graph_merge.merge_node(g: networkx.classes.multidigraph.MultiDiGraph, n: str, data: dict, preserve: bool = True) → dict
Merge node n into graph g.
- Parameters
g (networkx.MultiDiGraph) – The target graph
n (str) – Node id
data (dict) – Node properties
preserve (bool) – Whether or not to preserve conflicting properties
- Returns
The merged node
- Return type
dict
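The effect of the preserve flag on conflicting properties can be sketched with plain dictionaries. This is not the KGX implementation: the set of core properties below is a hypothetical placeholder, and de-duplicating values when concatenating is an assumption.

```python
# Hypothetical set of core properties; KGX's actual list is not given here.
CORE_PROPERTIES = {"id", "name"}


def merge_properties(existing, incoming, preserve=True):
    """Merge incoming properties into a copy of the existing ones."""
    merged = dict(existing)
    for key, value in incoming.items():
        if key not in merged:
            merged[key] = value  # no conflict: just add the property
        elif key in CORE_PROPERTIES:
            continue  # core properties are never overwritten
        elif preserve:
            # Concatenate into a list, skipping values already present
            # (the de-duplication is an assumption of this sketch).
            current = merged[key] if isinstance(merged[key], list) else [merged[key]]
            new = value if isinstance(value, list) else [value]
            merged[key] = current + [v for v in new if v not in current]
        else:
            merged[key] = value  # replace the conflicting value
    return merged


existing = {"id": "HGNC:1", "name": "A", "provided_by": "src1"}
incoming = {"id": "HGNC:1", "name": "B", "provided_by": "src2", "description": "d"}

preserved = merge_properties(existing, incoming, preserve=True)
replaced = merge_properties(existing, incoming, preserve=False)
```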
Summarize Graph
-
kgx.operations.summarize_graph.generate_graph_stats(graph: networkx.classes.multidigraph.MultiDiGraph, graph_name: str, filename: str, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → None
Generate stats from Graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
graph_name (str) – Name for the graph
filename (str) – Filename to write the stats to
node_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']
edge_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']
-
kgx.operations.summarize_graph.get_facet_counts(data: Dict, stats: Dict, x: str, y: str, facet_property: str) → Dict
Facet on facet_property and record the count for stats[x][y][facet_property].
- Parameters
data (dict) – Node/edge data dictionary
stats (dict) – The stats dictionary
x (str) – first key
y (str) – second key
facet_property (str) – The property to facet on
- Returns
The stats dictionary
- Return type
Dict
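As a rough illustration of faceting, the sketch below increments a counter per facet value under stats[x][y][facet_property]. The exact shape of the stats dictionary in KGX, the key names used in the example, and the "unknown" bucket for records without the property are assumptions.

```python
def count_facet(data, stats, x, y, facet_property):
    """Increment stats[x][y][facet_property][value] for each value in data."""
    # Records without the property are counted under "unknown" (an assumption).
    values = data.get(facet_property, "unknown")
    if not isinstance(values, (list, set, tuple)):
        values = [values]
    bucket = stats.setdefault(x, {}).setdefault(y, {}).setdefault(facet_property, {})
    for value in values:
        bucket[value] = bucket.get(value, 0) + 1
    return stats


stats = {}
# "node_stats" and "biolink:Gene" are hypothetical first and second keys.
count_facet({"provided_by": ["a", "b"]}, stats, "node_stats", "biolink:Gene", "provided_by")
count_facet({"provided_by": "a"}, stats, "node_stats", "biolink:Gene", "provided_by")
count_facet({}, stats, "node_stats", "biolink:Gene", "provided_by")
```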
-
kgx.operations.summarize_graph.summarize_edges(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None)
Summarize the edges in a graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
facet_properties (Optional[List]) – The properties to facet on
- Returns
The edge stats
- Return type
Dict
-
kgx.operations.summarize_graph.summarize_graph(graph: networkx.classes.multidigraph.MultiDiGraph, name: Optional[str] = None, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → Dict
Summarize the entire graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
name (Optional[str]) – Name for the graph
node_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']
edge_facet_properties (Optional[List]) – A list of properties to facet on. For example, ['provided_by']
- Returns
The stats dictionary
- Return type
Dict
-
kgx.operations.summarize_graph.summarize_nodes(graph: networkx.classes.multidigraph.MultiDiGraph, facet_properties: Optional[List] = None) → Dict
Summarize the nodes in a graph.
- Parameters
graph (networkx.MultiDiGraph) – The graph
facet_properties (Optional[List]) – A list of properties to facet on
- Returns
The node stats
- Return type
Dict