Clique Merge

The Clique Merge operation performs the following steps on your target (input) graph:

  • Build cliques from nodes in the target graph

  • Elect a leader for each individual clique

  • Move all edges in a clique to the leader node

The main entry point is the kgx.graph_operations.clique_merge.clique_merge function, which takes an instance of kgx.graph.base_graph.BaseGraph.

Build cliques from nodes in the target graph

Given a target graph, create a clique graph where nodes in the same clique are connected via biolink:same_as edges.

In the target graph, you can define nodes that belong to the same clique as follows:

  • Having biolink:same_as edges between nodes (preferred and consistent with Biolink Model)

  • Having a same_as node property on a node that lists all equivalent nodes (deprecated)
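
As a rough illustration of this step, the sketch below groups node IDs into cliques using plain Python. It is not the KGX implementation: the helper name group_cliques, the tuple-based edge representation, and the node identifiers are all assumptions made for the example. It treats both biolink:same_as edges and the deprecated same_as node property as equivalence assertions, and takes each connected component of the resulting equivalence graph as one clique.

```python
from collections import defaultdict

SAME_AS = "biolink:same_as"


def group_cliques(nodes, edges):
    """Group node IDs into cliques (hypothetical helper, not part of KGX).

    nodes: dict mapping node ID -> dict of node properties
    edges: iterable of (subject, predicate, object) tuples
    """
    # Undirected adjacency built only from equivalence assertions.
    adjacency = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate == SAME_AS:
            adjacency[subject].add(obj)
            adjacency[obj].add(subject)
    # Deprecated style: a same_as property listing equivalent node IDs.
    for node_id, props in nodes.items():
        for equivalent in props.get("same_as", []):
            adjacency[node_id].add(equivalent)
            adjacency[equivalent].add(node_id)

    # Each connected component of the equivalence graph is one clique.
    seen, cliques = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        cliques.append(sorted(component))
    return cliques


# Made-up identifiers, used only to exercise the helper.
nodes = {
    "HGNC:100": {},
    "NCBIGene:100": {},
    "ENSEMBL:ENSG100": {"same_as": ["HGNC:100"]},
    "HGNC:200": {},
}
edges = [
    ("HGNC:100", "biolink:same_as", "NCBIGene:100"),
    ("HGNC:200", "biolink:interacts_with", "HGNC:100"),
]
cliques = group_cliques(nodes, edges)
```

In this sketch HGNC:200 does not appear in any clique, because its only edge is not a biolink:same_as edge.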

Elect a leader for each individual clique

Once the clique graph is built, go through each clique and elect a representative node or leader node for that clique.

The leader of each clique is elected using three criteria, checked in the following order:

  • Leader annotation: Elect the leader node for a clique based on clique_leader annotation on the node

  • Prefix prioritization: Elect the node whose ID prefix has the highest priority in the identifier prefixes list, as defined in the Biolink Model

  • Prefix prioritization fallback: Elect the node whose ID prefix comes first in an alphabetically sorted list of all ID prefixes within the clique
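
The order of the three criteria can be sketched in plain Python as follows. This is an illustration only, not the KGX code: the helper name pick_leader, the strategy labels it returns, and the node identifiers are assumptions, and the prefix priority list is passed in directly rather than looked up from the Biolink Model.

```python
def pick_leader(clique, nodes, leader_annotation="clique_leader", prefix_priority=None):
    """Return (leader_id, strategy) for one clique (hypothetical helper).

    clique: list of node IDs written as CURIEs, e.g. "HGNC:100"
    nodes: dict mapping node ID -> dict of node properties
    prefix_priority: list of ID prefixes in descending priority, or None
    """
    # 1. Leader annotation: a node explicitly flagged as leader wins.
    for node_id in clique:
        if nodes.get(node_id, {}).get(leader_annotation):
            return node_id, "LEADER_ANNOTATION"

    # 2. Prefix prioritization: walk the priority list and take the first
    #    prefix that matches a node in the clique.
    for prefix in prefix_priority or []:
        for node_id in sorted(clique):
            if node_id.split(":", 1)[0] == prefix:
                return node_id, "PREFIX_PRIORITIZATION"

    # 3. Fallback: the node whose prefix sorts first alphabetically.
    leader = min(clique, key=lambda node_id: (node_id.split(":", 1)[0], node_id))
    return leader, "PREFIX_PRIORITIZATION_FALLBACK"


# Made-up identifiers, used only to exercise the helper.
nodes = {"HGNC:100": {}, "NCBIGene:100": {}, "ENSEMBL:ENSG100": {}}
clique = list(nodes)

by_fallback = pick_leader(clique, nodes)
by_priority = pick_leader(clique, nodes, prefix_priority=["NCBIGene", "HGNC"])
annotated = {**nodes, "HGNC:100": {"clique_leader": True}}
by_annotation = pick_leader(clique, annotated, prefix_priority=["NCBIGene", "HGNC"])
```

Because the annotation is checked first, by_annotation selects HGNC:100 even though NCBIGene ranks higher in the priority list.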

Move all edges in a clique to the leader node

The last step is edge consolidation, where all the edges from nodes in a clique are moved to the leader node.

The original subject and object nodes of an edge are tracked via the _original_subject and _original_object edge properties.
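
A minimal sketch of edge consolidation is shown below, assuming edges are plain dictionaries and that a mapping from each clique member to its leader is already available. The helper name move_edges_to_leaders, the leader_of mapping, and the node identifiers are hypothetical. The sketch does not address how biolink:same_as edges themselves are treated.

```python
def move_edges_to_leaders(edges, leader_of):
    """Rewrite edges so that clique members are replaced by their leader.

    edges: list of dicts with "subject", "predicate" and "object" keys
    leader_of: dict mapping a clique member's ID -> its leader's ID
    """
    consolidated = []
    for edge in edges:
        moved = dict(edge)  # copy, so the input edges are left untouched
        new_subject = leader_of.get(edge["subject"], edge["subject"])
        new_object = leader_of.get(edge["object"], edge["object"])
        if new_subject != edge["subject"]:
            # Remember where the edge started before it was moved.
            moved["_original_subject"] = edge["subject"]
            moved["subject"] = new_subject
        if new_object != edge["object"]:
            moved["_original_object"] = edge["object"]
            moved["object"] = new_object
        consolidated.append(moved)
    return consolidated


# Made-up identifiers: HGNC:100 leads a clique of three nodes.
leader_of = {"NCBIGene:100": "HGNC:100", "ENSEMBL:ENSG100": "HGNC:100"}
edges = [
    {"subject": "NCBIGene:100", "predicate": "biolink:interacts_with", "object": "HGNC:200"},
    {"subject": "HGNC:300", "predicate": "biolink:interacts_with", "object": "ENSEMBL:ENSG100"},
]
result = move_edges_to_leaders(edges, leader_of)
```

Only the endpoint that actually changes receives a tracking property, so an edge that never touched a clique member comes out unchanged.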

kgx.graph_operations.clique_merge

kgx.graph_operations.clique_merge.build_cliques(target_graph: kgx.graph.base_graph.BaseGraph) → networkx.classes.multidigraph.MultiDiGraph[source]

Builds a clique graph from same_as edges in target_graph.

Parameters

target_graph (kgx.graph.base_graph.BaseGraph) – An instance of BaseGraph that contains nodes and edges

Returns

The clique graph with only same_as edges

Return type

networkx.MultiDiGraph

kgx.graph_operations.clique_merge.check_all_categories(categories) → Tuple[List, List, List][source]

Check all categories in categories.

Parameters

categories (List) – A list of categories

Returns

A tuple consisting of valid biolink categories, invalid biolink categories, and invalid categories

Note: the sort_categories method will re-arrange the passed in category list according to the distance of each list member from the top of their hierarchy. Each category’s hierarchy is made up of its ‘is_a’ and mixin ancestors.

Return type

Tuple[List, List, List]

kgx.graph_operations.clique_merge.check_categories(categories: List, closure: List, category_mapping: Optional[Dict[str, str]] = None) → Tuple[List, List, List][source]

Check whether the values in categories are valid biolink categories. Valid biolink categories are classes that descend from ‘NamedThing’. Mixins, while valid ancestors, are not valid categories.

Parameters
  • categories (List) – A list of categories to check

  • closure (List) – A list of nodes in a clique

  • category_mapping (Optional[Dict[str, str]]) – A map that provides mapping from a non-biolink category to a biolink category

Returns

A tuple consisting of valid biolink categories, invalid biolink categories, and invalid categories

Return type

Tuple[List, List, List]
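
The three-way split described above can be pictured with the following sketch. It is based only on the prose description, not on the KGX source: the toy hierarchy stands in for the Biolink Model, the helper names are hypothetical, the closure argument is left out, and applying category_mapping before classification is an assumption.

```python
# Toy stand-in for the Biolink Model: class name -> parent (None for a root).
TOY_PARENTS = {
    "biolink:NamedThing": None,
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:GenomicEntity": None,  # treated as a mixin in this toy model
}


def descends_from_named_thing(category):
    """Walk up the toy hierarchy and report whether NamedThing is reached."""
    current = category
    while current is not None:
        if current == "biolink:NamedThing":
            return True
        current = TOY_PARENTS.get(current)
    return False


def classify_categories(categories, category_mapping=None):
    """Split categories into (valid biolink, invalid biolink, invalid)."""
    mapping = category_mapping or {}
    valid, invalid_biolink, invalid = [], [], []
    for category in categories:
        # Assumption: a mapped category is classified by its mapped value.
        resolved = mapping.get(category, category)
        if resolved not in TOY_PARENTS:
            invalid.append(category)
        elif descends_from_named_thing(resolved):
            valid.append(resolved)
        else:
            invalid_biolink.append(resolved)
    return valid, invalid_biolink, invalid


valid, invalid_biolink, invalid = classify_categories(
    ["biolink:BiologicalEntity", "biolink:GenomicEntity", "gene", "unknown"],
    category_mapping={"gene": "biolink:Gene"},
)
```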

kgx.graph_operations.clique_merge.clique_merge(target_graph: kgx.graph.base_graph.BaseGraph, leader_annotation: str = None, prefix_prioritization_map: Optional[Dict[str, List[str]]] = None, category_mapping: Optional[Dict[str, str]] = None, strict: bool = True) → Tuple[kgx.graph.base_graph.BaseGraph, networkx.classes.multidigraph.MultiDiGraph][source]
Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

  • prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

  • strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories

Returns

A tuple containing the updated target graph, and the clique graph

Return type

Tuple[kgx.graph.base_graph.BaseGraph, networkx.MultiDiGraph]

kgx.graph_operations.clique_merge.consolidate_edges(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, leader_annotation: str) → kgx.graph.base_graph.BaseGraph[source]

Move all edges from nodes in a clique to the clique leader.

The original subject and object of an edge are preserved via ORIGINAL_SUBJECT_PROPERTY and ORIGINAL_OBJECT_PROPERTY.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

Returns

The target graph where all edges from nodes in a clique are moved to the clique leader

Return type

kgx.graph.base_graph.BaseGraph

kgx.graph_operations.clique_merge.elect_leader(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, leader_annotation: str, prefix_prioritization_map: Optional[Dict[str, List[str]]], category_mapping: Optional[Dict[str, str]], strict: bool = True) → kgx.graph.base_graph.BaseGraph[source]

Elect leader for each clique in a graph.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

  • prefix_prioritization_map (Optional[Dict[str, List[str]]]) – A map that gives a prefix priority for one or more categories

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

  • strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories

Returns

The updated target graph

Return type

kgx.graph.base_graph.BaseGraph

kgx.graph_operations.clique_merge.get_category_from_equivalence(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, node: str, attributes: Dict) → List[source]

Get category for a node based on its equivalent nodes in a graph.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • node (str) – Node identifier

  • attributes (Dict) – Node’s attributes

Returns

Category for the node

Return type

List

kgx.graph_operations.clique_merge.get_clique_category(clique_graph: networkx.classes.multidigraph.MultiDiGraph, clique: List) → Tuple[str, List][source]

Given a clique, identify the category of the clique.

Parameters
  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • clique (List) – A list of nodes in a clique

Returns

A tuple of clique category and its ancestors

Return type

Tuple[str, List]

kgx.graph_operations.clique_merge.get_leader_by_annotation(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, clique: List, leader_annotation: str) → Tuple[Optional[str], Optional[str]][source]

Get leader by searching for leader annotation property in any of the nodes in a given clique.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • clique (List) – A list of nodes from a clique

  • leader_annotation (str) – The field on a node that signifies that the node is the leader of a clique

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]

kgx.graph_operations.clique_merge.get_leader_by_prefix_priority(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, clique: List, prefix_priority_list: List) → Tuple[Optional[str], Optional[str]][source]

Get leader from clique based on a given prefix priority.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • clique (List) – A list of nodes that correspond to a clique

  • prefix_priority_list (List) – A list of prefixes in descending priority

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]

kgx.graph_operations.clique_merge.get_leader_by_sort(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, clique: List) → Tuple[Optional[str], Optional[str]][source]

Get leader from clique based on the first selection from an alphabetical sort of the node id prefixes.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • clique (List) – A list of nodes that correspond to a clique

Returns

A tuple containing the node that has been elected as the leader and the election strategy

Return type

Tuple[Optional[str], Optional[str]]

kgx.graph_operations.clique_merge.sort_categories(categories: Union[List, Set, ordered_set.OrderedSet]) → List[source]

Sort a list of categories from the most specific to the most generic.

Parameters

categories (Union[List, Set, OrderedSet]) – A list of categories

Returns

A sorted list of categories, where the first element in the returned list has the greatest number of ancestors in the class hierarchy.

Return type

List
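
The ordering described above can be illustrated with a small sketch that sorts by ancestor count. It uses a toy single-parent hierarchy rather than the Biolink Model (so mixin ancestors are ignored), and the helper names are hypothetical.

```python
# Toy hierarchy: class name -> parent (None for the root).
TOY_PARENTS = {
    "biolink:NamedThing": None,
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:Gene": "biolink:BiologicalEntity",
}


def ancestor_count(category):
    """Count how many ancestors a category has in the toy hierarchy."""
    count = 0
    parent = TOY_PARENTS.get(category)
    while parent is not None:
        count += 1
        parent = TOY_PARENTS.get(parent)
    return count


def sort_most_specific_first(categories):
    """Order categories so that the deepest (most specific) comes first."""
    return sorted(categories, key=ancestor_count, reverse=True)


ordered = sort_most_specific_first(
    ["biolink:NamedThing", "biolink:Gene", "biolink:BiologicalEntity"]
)
```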

kgx.graph_operations.clique_merge.update_node_categories(target_graph: kgx.graph.base_graph.BaseGraph, clique_graph: networkx.classes.multidigraph.MultiDiGraph, clique: List, category_mapping: Optional[Dict[str, str]], strict: bool = True) → List[source]

For a given clique, get the category of each node in the clique, validate it against the Biolink Model, and map it to a Biolink Model category where needed.

For example, if a node has biolink:Gene as its category, then this method adds all of its ancestors.

Parameters
  • target_graph (kgx.graph.base_graph.BaseGraph) – The original graph

  • clique_graph (networkx.MultiDiGraph) – The clique graph

  • clique (List) – A list of nodes from a clique

  • category_mapping (Optional[Dict[str, str]]) – Mapping for non-Biolink Model categories to Biolink Model categories

  • strict (bool) – Whether or not to merge nodes in a clique that have conflicting node categories

Returns

The clique

Return type

List
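
The biolink:Gene example for update_node_categories can be pictured as ancestor expansion. The sketch below assumes a toy single-parent hierarchy in place of the Biolink Model, and the helper name with_ancestors is hypothetical. It shows only the expansion itself, not validation or category mapping.

```python
# Toy hierarchy: class name -> parent (None for the root).
TOY_PARENTS = {
    "biolink:NamedThing": None,
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:Gene": "biolink:BiologicalEntity",
}


def with_ancestors(categories):
    """Return the categories followed by their ancestors, without duplicates."""
    expanded = []
    for category in categories:
        current = category
        # Stop early once an ancestor is already present: its own
        # ancestors were added when it was first visited.
        while current is not None and current not in expanded:
            expanded.append(current)
            current = TOY_PARENTS.get(current)
    return expanded


gene_categories = with_ancestors(["biolink:Gene"])
```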