Utilities

The utilities module includes all the utility methods used throughout KGX.

graph_utils

Utility methods for working with graphs.

kgx.utils.graph_utils.curie_lookup(curie: str) → Optional[str]

Given a CURIE, find its label.

This method first does a lookup in predefined maps. If no label is found, it uses CurieLookupService to look for the CURIE in a set of preloaded ontologies.

Parameters

curie (str) – A CURIE

Returns

The label corresponding to the given CURIE

Return type

Optional[str]

kgx.utils.graph_utils.get_ancestors(graph: kgx.graph.base_graph.BaseGraph, node: str, relations: Optional[List[str]] = None) → List[str]

Return all ancestors of specified node, filtered by relations.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – Graph to traverse

  • node (str) – node identifier

  • relations (List[str]) – list of relations

Returns

A list of ancestor nodes

Return type

List[str]

kgx.utils.graph_utils.get_category_via_superclass(graph: kgx.graph.base_graph.BaseGraph, curie: str, load_ontology: bool = True) → Set[str]

Get category for a given CURIE by tracing its superclass, via subclass_of hierarchy, and getting the most appropriate category based on the superclass.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – Graph to traverse

  • curie (str) – Input CURIE

  • load_ontology (bool) – Determines whether to load ontology, based on CURIE prefix, or to simply rely on subclass_of hierarchy from graph

Returns

A set containing one or more categories for the given CURIE

Return type

Set[str]

kgx.utils.graph_utils.get_parents(graph: kgx.graph.base_graph.BaseGraph, node: str, relations: Optional[List[str]] = None) → List[str]

Return all direct parents of a specified node, filtered by relations.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – Graph to traverse

  • node (str) – node identifier

  • relations (List[str]) – list of relations

Returns

A list of parent node(s)

Return type

List[str]
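The difference between get_parents (one hop) and get_ancestors (all hops) can be illustrated with a minimal sketch. It does not use KGX: the graph is a hypothetical list of triples, the function names sketch_parents and sketch_ancestors are made up for illustration, and the default relation biolink:subclass_of is an assumption, since the reference above does not state one.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a graph: a list of (subject, predicate, object) triples.
Edge = Tuple[str, str, str]

# Assumed default relation; the reference above does not state one.
DEFAULT_RELATIONS = ["biolink:subclass_of"]


def sketch_parents(edges: List[Edge], node: str,
                   relations: Optional[List[str]] = None) -> List[str]:
    """Return direct parents of `node`, following only the given relations."""
    allowed = relations if relations is not None else DEFAULT_RELATIONS
    parents: List[str] = []
    for subject, predicate, obj in edges:
        if subject == node and predicate in allowed and obj not in parents:
            parents.append(obj)
    return parents


def sketch_ancestors(edges: List[Edge], node: str,
                     relations: Optional[List[str]] = None) -> List[str]:
    """Return every node reachable from `node` by repeatedly taking parents."""
    ancestors: List[str] = []
    pending = sketch_parents(edges, node, relations)
    while pending:
        current = pending.pop()
        if current == node or current in ancestors:
            continue  # guard against cycles and repeated visits
        ancestors.append(current)
        pending.extend(sketch_parents(edges, current, relations))
    return ancestors


# Illustrative identifiers only; they do not refer to real ontology terms.
EDGES = [
    ("EX:3", "biolink:subclass_of", "EX:2"),
    ("EX:2", "biolink:subclass_of", "EX:1"),
    ("EX:3", "biolink:related_to", "EX:9"),
]

print(sketch_parents(EDGES, "EX:3"))
print(sorted(sketch_ancestors(EDGES, "EX:3")))
```

Passing a different relations list changes which edges are followed, which is the filtering that both functions describe.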

kgx.utils.graph_utils.remap_edge_property(graph: kgx.graph.base_graph.BaseGraph, edge_predicate: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph

  • edge_predicate (string) – edge_predicate referring to edges whose property needs to be remapped

  • old_property (string) – Old property name whose value needs to be replaced

  • new_property (string) – New property name from which the value is pulled

kgx.utils.graph_utils.remap_node_identifier(graph: kgx.graph.base_graph.BaseGraph, category: str, alternative_property: str, prefix=None) → kgx.graph.base_graph.BaseGraph

Remap a node’s ‘id’ attribute with the value of the node’s alternative_property attribute.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph

  • category (string) – category referring to nodes whose ‘id’ needs to be remapped

  • alternative_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for alternative_property is a list and the prefix indicates which value to pick from the list

Returns

The modified graph

Return type

kgx.graph.base_graph.BaseGraph
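The role of prefix is easiest to see on a small example. The sketch below is not the KGX implementation: it works on a hypothetical dict of node records keyed by id, the name sketch_remap_node_identifier is made up, and it assumes that the first list entry starting with the prefix is chosen. Rewriting the edges that point at a remapped node is omitted.

```python
from typing import Any, Dict, Optional


def sketch_remap_node_identifier(nodes: Dict[str, Dict[str, Any]],
                                 category: str,
                                 alternative_property: str,
                                 prefix: Optional[str] = None) -> Dict[str, Dict[str, Any]]:
    """Return a new node dict whose keys and 'id' values are remapped."""
    remapped: Dict[str, Dict[str, Any]] = {}
    for node_id, props in nodes.items():
        new_id = node_id
        in_category = category in props.get("category", [])
        if in_category and alternative_property in props:
            value = props[alternative_property]
            if isinstance(value, (list, tuple)):
                # Assumption: pick the first entry that starts with the prefix,
                # or simply the first entry when no prefix is given.
                candidates = [v for v in value if prefix is None or v.startswith(prefix)]
                if candidates:
                    new_id = candidates[0]
            else:
                new_id = value
        remapped[new_id] = {**props, "id": new_id}
    return remapped


# Illustrative records; the identifiers are placeholders.
NODES = {
    "HGNC:1": {
        "id": "HGNC:1",
        "category": ["biolink:Gene"],
        "xref": ["NCBIGene:10", "ENSEMBL:ENSG1"],
    },
    "MONDO:1": {"id": "MONDO:1", "category": ["biolink:Disease"]},
}

print(list(sketch_remap_node_identifier(NODES, "biolink:Gene", "xref", prefix="ENSEMBL")))
```

Nodes outside the category, or without the alternative property, keep their original identifier.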

kgx.utils.graph_utils.remap_node_property(graph: kgx.graph.base_graph.BaseGraph, category: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph

  • category (string) – Category referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

kgx_utils

Utility methods that are reused across the codebase.

kgx.utils.kgx_utils.apply_edge_filters(graph: kgx.graph.base_graph.BaseGraph, edge_filters: Dict[str, Union[str, Set]]) → None

Apply filters to graph and remove edges that do not pass given filters.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph

  • edge_filters (Dict[str, Union[str, Set]]) – Edge filters

kgx.utils.kgx_utils.apply_filters(graph: kgx.graph.base_graph.BaseGraph, node_filters: Dict[str, Union[str, Set]], edge_filters: Dict[str, Union[str, Set]]) → None

Apply filters to graph and remove nodes and edges that do not pass given filters.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph

  • node_filters (Dict[str, Union[str, Set]]) – Node filters

  • edge_filters (Dict[str, Union[str, Set]]) – Edge filters

kgx.utils.kgx_utils.apply_node_filters(graph: kgx.graph.base_graph.BaseGraph, node_filters: Dict[str, Union[str, Set]]) → None

Apply filters to graph and remove nodes that do not pass given filters.

Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph

  • node_filters (Dict[str, Union[str, Set]]) – Node filters
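The filter arguments above map a property name to either a single string or a set of strings. One plausible reading, sketched below, is that a record passes when, for every filter key, at least one of its values is among the allowed values. This semantics is an assumption rather than something the reference states, and the names passes_filters and sketch_apply_node_filters, as well as the dict-of-dicts graph, are hypothetical.

```python
from typing import Any, Dict, Set, Union

Filters = Dict[str, Union[str, Set[str]]]


def passes_filters(properties: Dict[str, Any], filters: Filters) -> bool:
    """True if the record has an allowed value for every filter key."""
    for key, allowed in filters.items():
        allowed_set = {allowed} if isinstance(allowed, str) else set(allowed)
        value = properties.get(key)
        if value is None:
            return False
        values = set(value) if isinstance(value, (list, set, tuple)) else {value}
        if not values & allowed_set:
            return False
    return True


def sketch_apply_node_filters(nodes: Dict[str, Dict[str, Any]], filters: Filters) -> None:
    """Remove, in place, every node that does not pass the filters."""
    rejected = [node_id for node_id, props in nodes.items()
                if not passes_filters(props, filters)]
    for node_id in rejected:
        del nodes[node_id]


# Illustrative records; the identifiers are placeholders.
nodes = {
    "HGNC:1": {"category": ["biolink:Gene"], "provided_by": "source_a"},
    "MONDO:1": {"category": ["biolink:Disease"], "provided_by": "source_a"},
}
sketch_apply_node_filters(nodes, {"category": {"biolink:Gene"}})
print(list(nodes))
```

Like the functions documented above, the sketch mutates its input and returns None.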

kgx.utils.kgx_utils.camelcase_to_sentencecase(s: str) → str

Convert CamelCase to sentence case.

Parameters

s (str) – Input string in CamelCase

Returns

string in sentence case form

Return type

str

kgx.utils.kgx_utils.contract(uri: str, prefix_maps: Optional[List[Dict]] = None, fallback: bool = True) → str

Contract a given URI to a CURIE, based on mappings from prefix_maps. If no prefix map is provided, the defaults from prefixcommons-py are used.

This method will return the URI as the CURIE if there is no mapping found.

Parameters
  • uri (str) – A URI

  • prefix_maps (Optional[List[Dict]]) – A list of prefix maps to use for mapping

  • fallback (bool) – Determines whether to fall back to default prefix mappings, as determined by prefixcommons.curie_util, when URI prefix is not found in prefix_maps.

Returns

A CURIE corresponding to the URI

Return type

str

kgx.utils.kgx_utils.current_time_in_millis()

Get current time in milliseconds.

Returns

Time in milliseconds

Return type

int

kgx.utils.kgx_utils.expand(curie: str, prefix_maps: Optional[List[dict]] = None, fallback: bool = True) → str

Expand a given CURIE to a URI, based on mappings from prefix_maps.

This method will return the CURIE as the URI if there is no mapping found.

Parameters
  • curie (str) – A CURIE

  • prefix_maps (Optional[List[dict]]) – A list of prefix maps to use for mapping

  • fallback (bool) – Determines whether to fall back to default prefix mappings, as determined by prefixcommons.curie_util, when CURIE prefix is not found in prefix_maps.

Returns

A URI corresponding to the CURIE

Return type

str
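The contract and expand behaviour described above (user-supplied maps first, then an optional fallback, and the input returned unchanged when nothing matches) can be sketched without KGX or prefixcommons. The names sketch_contract, sketch_expand and DEFAULT_PREFIX_MAP are hypothetical, and the one-entry default map merely stands in for the prefixcommons defaults.

```python
from typing import Dict, List, Optional

# Stand-in for the default mappings; the real defaults come from prefixcommons.
DEFAULT_PREFIX_MAP: Dict[str, str] = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def _maps_to_search(prefix_maps: Optional[List[Dict[str, str]]],
                    fallback: bool) -> List[Dict[str, str]]:
    """User maps take priority; defaults are added when allowed or needed."""
    maps = list(prefix_maps or [])
    if fallback or not maps:
        maps.append(DEFAULT_PREFIX_MAP)
    return maps


def sketch_contract(uri: str, prefix_maps: Optional[List[Dict[str, str]]] = None,
                    fallback: bool = True) -> str:
    """Contract a URI to a CURIE, or return the URI if no mapping matches."""
    for prefix_map in _maps_to_search(prefix_maps, fallback):
        for prefix, base in prefix_map.items():
            if uri.startswith(base):
                return f"{prefix}:{uri[len(base):]}"
    return uri


def sketch_expand(curie: str, prefix_maps: Optional[List[Dict[str, str]]] = None,
                  fallback: bool = True) -> str:
    """Expand a CURIE to a URI, or return the CURIE if no mapping matches."""
    prefix, separator, local_id = curie.partition(":")
    if not separator:
        return curie
    for prefix_map in _maps_to_search(prefix_maps, fallback):
        if prefix in prefix_map:
            return prefix_map[prefix] + local_id
    return curie


custom_maps = [{"EX": "http://example.org/"}]
print(sketch_contract("http://example.org/123", custom_maps))
print(sketch_expand("EX:123", custom_maps))
```

With fallback=False, only the supplied maps are consulted, so a CURIE such as rdfs:label comes back unchanged.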

Convert a sentence case Biolink category name to a proper Biolink CURIE with the category itself in CamelCase form.

Parameters

s (str) – Input string in sentence case

Returns

a proper Biolink CURIE

Return type

str

kgx.utils.kgx_utils.generate_edge_identifiers(graph: kgx.graph.base_graph.BaseGraph)

Generate unique identifiers for edges in a graph that do not have an id field.

Parameters

graph (kgx.graph.base_graph.BaseGraph) –

kgx.utils.kgx_utils.generate_edge_key(s: str, edge_predicate: str, o: str) → str

Generates an edge key based on a given subject, predicate, and object.

Parameters
  • s (str) – Subject

  • edge_predicate (str) – Edge label

  • o (str) – Object

Returns

Edge key as a string

Return type

str
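An edge key only needs to be a deterministic string built from the three parts. The sketch below joins them with a hyphen. The separator and the name sketch_edge_key are assumptions, since the reference above does not specify the key format.

```python
def sketch_edge_key(s: str, edge_predicate: str, o: str) -> str:
    """Build a deterministic key from subject, predicate, and object."""
    # Assumption: a hyphen-separated concatenation; the real format may differ.
    return f"{s}-{edge_predicate}-{o}"


# Illustrative identifiers only.
key = sketch_edge_key("HGNC:1", "biolink:related_to", "MONDO:1")
print(key)
```

Because the key depends only on its inputs, two edges with the same subject, predicate, and object produce the same key.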

kgx.utils.kgx_utils.generate_uuid()

Generates a UUID.

Returns

A UUID

Return type

str
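A UUID helper pairs naturally with generate_edge_identifiers, which fills in an id for edges that lack one. The sketch below shows that combination on a hypothetical list of edge dicts. The urn:uuid: form and the names sketch_generate_uuid and sketch_assign_edge_ids are assumptions, not the documented output format.

```python
import uuid
from typing import Any, Dict, List


def sketch_generate_uuid() -> str:
    """Return a random UUID as a string (assumed here to use the urn:uuid: form)."""
    return f"urn:uuid:{uuid.uuid4()}"


def sketch_assign_edge_ids(edges: List[Dict[str, Any]]) -> None:
    """Give every edge without an 'id' field a freshly generated identifier."""
    for edge in edges:
        if "id" not in edge:
            edge["id"] = sketch_generate_uuid()


# Illustrative edge records.
edges = [
    {"id": "edge-1", "subject": "A", "object": "B"},
    {"subject": "B", "object": "C"},
]
sketch_assign_edge_ids(edges)
print([edge["id"] for edge in edges])
```

Existing identifiers are left untouched, so the operation is safe to run more than once.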

Get ancestors for a given Biolink class.

Parameters

name (str) –

Returns

A list of ancestors

Return type

List

Get Biolink element for a given name, where name can be a class, slot, or relation.

Parameters

name (str) – The name

Returns

An instance of biolinkml.meta.Element

Return type

Optional[biolinkml.meta.Element]

Get all Biolink property types. This includes both node and edges properties.

Returns

A dict containing all Biolink property and their types

Return type

Dict

kgx.utils.kgx_utils.get_cache(maxsize=10000)

Get an instance of cachetools.cache

Parameters

maxsize (int) – The max size for the cache (10000, by default)

Returns

An instance of cachetools.cache

Return type

cachetools.cache

kgx.utils.kgx_utils.get_curie_lookup_service()

Get an instance of kgx.curie_lookup_service.CurieLookupService

Returns

An instance of CurieLookupService

Return type

kgx.curie_lookup_service.CurieLookupService

kgx.utils.kgx_utils.get_prefix_prioritization_map() → Dict[str, List]

Get prefix prioritization map as defined in Biolink Model.

Returns

Return type

Dict[str, List]

kgx.utils.kgx_utils.get_toolkit(schema: Optional[str] = None) → bmt.toolkit.Toolkit

Get an instance of bmt.Toolkit. If no instance is defined, one is instantiated and returned.

kgx.utils.kgx_utils.get_type_for_property(p: str) → str

Get type for a property.

TODO: Move this to biolink-model-toolkit

Parameters

p (str) –

Returns

The type for a given property

Return type

str

kgx.utils.kgx_utils.prepare_data_dict(d1: Dict, d2: Dict, preserve: bool = True) → Dict

Given two dict objects, make a new dict object that is the intersection of the two.

If a key is known to be multivalued, its value is converted to a list. If a key is already multivalued, it is updated with the new values. If a key is single valued and a new unique value is found, the existing value is converted to a list and the new value is appended to it.

Parameters
  • d1 (Dict) – Dict object

  • d2 (Dict) – Dict object

  • preserve (bool) – Whether or not to preserve values for conflicting keys

Returns

The intersection of d1 and d2

Return type

Dict
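The value-merging rules above can be traced on a small example. The following sketch is not the KGX implementation: the set of multivalued keys is hypothetical, keeping keys that occur in only one of the two dicts is an assumption, and treating preserve=False as "the second value replaces the first" is also an assumption.

```python
from typing import Any, Dict, List

# Hypothetical set of keys that are always treated as lists.
MULTIVALUED_KEYS = {"category", "synonym"}


def _as_list(value: Any) -> List[Any]:
    """Copy list-like values into a new list; wrap scalars in a list."""
    return list(value) if isinstance(value, (list, set, tuple)) else [value]


def _normalize(key: str, value: Any) -> Any:
    """Convert known multivalued keys (and existing lists) to fresh lists."""
    if key in MULTIVALUED_KEYS or isinstance(value, list):
        return _as_list(value)
    return value


def sketch_prepare_data_dict(d1: Dict[str, Any], d2: Dict[str, Any],
                             preserve: bool = True) -> Dict[str, Any]:
    """Combine two property dicts following the rules described above."""
    merged = {key: _normalize(key, value) for key, value in d1.items()}
    for key, value in d2.items():
        if key not in merged:
            merged[key] = _normalize(key, value)
            continue
        existing = merged[key]
        if isinstance(existing, list):
            # Already multivalued: add only values that are not yet present.
            for item in _as_list(value):
                if item not in existing:
                    existing.append(item)
        elif existing != value:
            if preserve:
                # Single valued with a new unique value: promote to a list.
                merged[key] = [existing] + [v for v in _as_list(value) if v != existing]
            else:
                merged[key] = value
    return merged


d1 = {"id": "A", "name": "alpha", "category": "biolink:Gene"}
d2 = {"name": "ALPHA", "category": ["biolink:Gene", "biolink:NamedThing"]}
print(sketch_prepare_data_dict(d1, d2))
```

The inputs are not modified, so the same pair of dicts can be merged again with a different preserve setting.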

kgx.utils.kgx_utils.sentencecase_to_camelcase(s: str) → str

Convert sentence case to CamelCase.

Parameters

s (str) – Input string in sentence case

Returns

string in CamelCase form

Return type

str

kgx.utils.kgx_utils.sentencecase_to_snakecase(s: str) → str

Convert sentence case to snake_case.

Parameters

s (str) – Input string in sentence case

Returns

string in snake_case form

Return type

str

kgx.utils.kgx_utils.snakecase_to_sentencecase(s: str) → str

Convert snake_case to sentence case.

Parameters

s (str) – Input string in snake_case

Returns

string in sentence case form

Return type

str
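The four case conversions in this module are small string transformations. The sketch below shows one way to write them with the standard library. The function names are hypothetical, and the exact output conventions (lowercase words separated by single spaces for sentence case) are assumptions rather than documented behaviour.

```python
import re


def camel_to_sentence(s: str) -> str:
    """'NamedThing' -> 'named thing' (splits before every capital letter)."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", s).lower()


def sentence_to_camel(s: str) -> str:
    """'named thing' -> 'NamedThing'."""
    return "".join(word.capitalize() for word in s.split())


def sentence_to_snake(s: str) -> str:
    """'named thing' -> 'named_thing'."""
    return "_".join(s.lower().split())


def snake_to_sentence(s: str) -> str:
    """'named_thing' -> 'named thing'."""
    return " ".join(s.split("_"))


print(camel_to_sentence("NamedThing"))
print(sentence_to_camel("named thing"))
print(sentence_to_snake("named thing"))
print(snake_to_sentence("named_thing"))
```

Note that this simple splitting rule breaks runs of capitals apart, so a name containing an acronym would need special handling.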

rdf_utils

Utility methods that are used for handling RDF.

kgx.utils.rdf_utils.infer_category(iri: rdflib.term.URIRef, rdfgraph: rdflib.graph.Graph) → Optional[List]

Infer category for a given iri by traversing rdfgraph.

Parameters
  • iri (rdflib.term.URIRef) – IRI

  • rdfgraph (rdflib.Graph) – A graph to traverse

Returns

A list of categories corresponding to the given IRI

Return type

Optional[List]

cli_utils

Utility methods that are used by the KGX command line interface.

kgx.cli.cli_utils.apply_filters(transformer: kgx.transformers.transformer.Transformer, node_filters: Optional[Dict], edge_filters: Optional[Dict]) → kgx.transformers.transformer.Transformer

Apply filters to the given transformer.

Parameters
  • transformer (kgx.Transformer) – The transformer corresponding to the source

  • node_filters (Optional[Dict]) – Node filters

  • edge_filters (Optional[Dict]) – Edge filters

Returns

transformer – The transformer with filters applied

Return type

kgx.Transformer

kgx.cli.cli_utils.apply_operations(source: dict, graph: kgx.graph.base_graph.BaseGraph) → kgx.graph.base_graph.BaseGraph

Apply operations as defined in the YAML.

Parameters
  • source (dict) – The source from the YAML

  • graph (kgx.graph.base_graph.BaseGraph) – The graph corresponding to the source

Returns

The graph corresponding to the source

Return type

kgx.graph.base_graph.BaseGraph

kgx.cli.cli_utils.get_file_types() → Tuple

Get all file formats supported by KGX.

Returns

A tuple of supported file formats

Return type

Tuple

kgx.cli.cli_utils.get_transformer(file_format: str) → Any

Get a Transformer corresponding to a given file format.

Note

This method returns a reference to kgx.Transformer class and not an instance of kgx.Transformer class. You will have to instantiate the class by calling its constructor.

Parameters

file_format (str) – File format

Returns

Reference to kgx.Transformer class corresponding to file_format

Return type

Any
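The note above distinguishes a class reference from an instance. The sketch below illustrates that pattern with a hypothetical registry. The classes SketchTsvTransformer and SketchJsonTransformer, the registry, and sketch_get_transformer are stand-ins invented for illustration and are not part of KGX.

```python
from typing import Dict, Type


class SketchTsvTransformer:
    """Hypothetical transformer for TSV files."""


class SketchJsonTransformer:
    """Hypothetical transformer for JSON files."""


# Hypothetical mapping from file format to transformer class.
SKETCH_REGISTRY: Dict[str, Type] = {
    "tsv": SketchTsvTransformer,
    "json": SketchJsonTransformer,
}


def sketch_get_transformer(file_format: str) -> Type:
    """Return the class (not an instance) registered for the given format."""
    try:
        return SKETCH_REGISTRY[file_format]
    except KeyError:
        raise ValueError(f"Unsupported file format: {file_format}") from None


transformer_class = sketch_get_transformer("tsv")  # a class reference
transformer = transformer_class()                   # the caller instantiates it
print(type(transformer).__name__)
```

Returning the class leaves the choice of constructor arguments to the caller.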

kgx.cli.cli_utils.graph_summary(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str], node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → Dict

Loads and summarizes a knowledge graph from a set of input files.

Parameters
  • inputs (List[str]) – Input files

  • input_format (str) – Input file format

  • input_compression (Optional[str]) – The input compression type

  • output (Optional[str]) – Where to write the output (stdout, by default)

  • node_facet_properties (Optional[List]) – A list of node properties from which to generate counts per value for those properties. For example, ['provided_by']

  • edge_facet_properties (Optional[List]) – A list of edge properties from which to generate counts per value for those properties. For example, ['provided_by']

Returns

A dictionary with the graph stats

Return type

Dict
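The facet parameters above request "counts per value" for selected properties. A minimal sketch of that counting step, using collections.Counter on hypothetical node records, is shown below. The function name sketch_facet_counts and the shape of the returned dictionary are assumptions and do not reflect the actual structure of the graph stats.

```python
from collections import Counter
from typing import Any, Dict, List


def sketch_facet_counts(records: List[Dict[str, Any]],
                        facet_properties: List[str]) -> Dict[str, Dict[str, int]]:
    """Count how often each value occurs for each requested property."""
    counts = {prop: Counter() for prop in facet_properties}
    for record in records:
        for prop in facet_properties:
            value = record.get(prop)
            if value is None:
                continue  # records without the property are not counted
            values = value if isinstance(value, (list, set, tuple)) else [value]
            counts[prop].update(values)
    return {prop: dict(counter) for prop, counter in counts.items()}


# Illustrative node records.
node_records = [
    {"id": "A", "provided_by": "source_1"},
    {"id": "B", "provided_by": ["source_1", "source_2"]},
    {"id": "C"},
]
print(sketch_facet_counts(node_records, ["provided_by"]))
```

A multivalued property contributes one count per value, so a record can appear under several facet values.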

kgx.cli.cli_utils.merge(merge_config: str, source: Optional[List] = None, destination: Optional[List] = None, processes: int = 1) → kgx.graph.base_graph.BaseGraph

Load nodes and edges from files and KGs, as defined in a config YAML, and merge them into a single graph. The merged graph can then be written to a local/remote Neo4j instance OR be serialized into a file.

Parameters
  • merge_config (str) – Merge config YAML

  • source (Optional[List]) – A list of sources to load from the YAML

  • destination (Optional[List]) – A list of destinations to write to, as defined in the YAML

  • processes (int) – Number of processes to use

Returns

The merged graph

Return type

kgx.graph.base_graph.BaseGraph

kgx.cli.cli_utils.neo4j_download(uri: str, username: str, password: str, output: str, output_format: str, output_compression: Optional[str], node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) → kgx.transformers.transformer.Transformer

Download nodes and edges from Neo4j database.

Parameters
  • uri (str) – Neo4j URI. For example, https://localhost:7474

  • username (str) – Username for authentication

  • password (str) – Password for authentication

  • output (str) – Where to write the output (stdout, by default)

  • output_format (Optional[str]) – The output type (tsv, by default)

  • output_compression (Optional[str]) – The output compression type

  • node_filters (Optional[Tuple]) – Node filters

  • edge_filters (Optional[Tuple]) – Edge filters

Returns

The NeoTransformer

Return type

kgx.Transformer

kgx.cli.cli_utils.neo4j_upload(inputs: List[str], input_format: str, input_compression: Optional[str], uri: str, username: str, password: str, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) → kgx.transformers.transformer.Transformer

Upload a set of nodes/edges to a Neo4j database.

Parameters
  • inputs (List[str]) – A list of files that contains nodes/edges

  • input_format (str) – The input format

  • input_compression (Optional[str]) – The input compression type

  • uri (str) – The full HTTP address for Neo4j database

  • username (str) – Username for authentication

  • password (str) – Password for authentication

  • node_filters (Optional[Tuple]) – Node filters

  • edge_filters (Optional[Tuple]) – Edge filters

Returns

The NeoTransformer

Return type

kgx.Transformer

kgx.cli.cli_utils.parse_source(key: str, source: dict, output_directory: str, curie_map: Optional[Dict[str, str]] = None, node_properties: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, checkpoint: bool = False)

Parse a source from a merge config YAML.

Parameters
  • key (str) – Source key

  • source (Dict) – Source configuration

  • output_directory (str) – Location to write output to

  • curie_map (Dict[str, str]) – Non-canonical CURIE mappings

  • node_properties (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)

  • predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)

  • checkpoint (bool) – Whether to serialize each individual source to a TSV

Returns

An instance of BaseGraph corresponding to the source

Return type

kgx.graph.base_graph.BaseGraph

kgx.cli.cli_utils.parse_source_input(key: Optional[str], source: Dict, output_directory: Optional[str], curie_map: Optional[Dict[str, str]] = None, node_properties: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, property_types=None, checkpoint: bool = False) → kgx.transformers.transformer.Transformer

Parse a source’s input from a transform config YAML.

Parameters
  • key (Optional[str]) – Source key

  • source (Dict) – Source configuration

  • output_directory (Optional[str]) – Location to write output to

  • curie_map (Dict[str, str]) – Non-canonical CURIE mappings

  • node_properties (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)

  • predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)

  • property_types (Dict[str, str]) – The XML property type for properties whose type is other than xsd:string. Relevant for RDF export.

  • checkpoint (bool) – Whether to serialize each individual source to a TSV

Returns

An instance of kgx.Transformer corresponding to the source format

Return type

kgx.Transformer

kgx.cli.cli_utils.transform(inputs: Optional[List[str]], input_format: Optional[str] = None, input_compression: Optional[str] = None, output: Optional[str] = None, output_format: Optional[str] = None, output_compression: Optional[str] = None, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None, transform_config: Optional[str] = None, source: Optional[List] = None, destination: Optional[List] = None, processes: int = 1) → None

Transform a Knowledge Graph from one serialization form to another.

Parameters
  • inputs (Optional[List[str]]) – A list of files that contains nodes/edges

  • input_format (Optional[str]) – The input format

  • input_compression (Optional[str]) – The input compression type

  • output (Optional[str]) – The output file

  • output_format (Optional[str]) – The output format

  • output_compression (Optional[str]) – The output compression type

  • node_filters (Optional[Tuple]) – Node filters

  • edge_filters (Optional[Tuple]) – Edge filters

  • transform_config (Optional[str]) – The transform config YAML

  • source (Optional[List]) – A list of sources to load from the YAML

  • destination (Optional[List]) – A list of destinations to write to, as defined in the YAML

  • processes (int) – Number of processes to use

kgx.cli.cli_utils.transform_source(key: str, source: Dict, output_directory: Optional[str], curie_map: Optional[Dict[str, str]] = None, node_properties: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, property_types=None, checkpoint: bool = False, preserve_graph: bool = True) → kgx.graph.base_graph.BaseGraph

Transform a source from a transform config YAML.

Parameters
  • key (str) – Source key

  • source (Dict) – Source configuration

  • output_directory (Optional[str]) – Location to write output to

  • curie_map (Dict[str, str]) – Non-canonical CURIE mappings

  • node_properties (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)

  • predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)

  • property_types (Dict[str, str]) – The XML property type for properties whose type is other than xsd:string. Relevant for RDF export.

  • checkpoint (bool) – Whether to serialize each individual source to a TSV

  • preserve_graph (bool) – Whether or not to preserve the graph corresponding to the source

Returns

An instance of BaseGraph corresponding to the source

Return type

kgx.graph.base_graph.BaseGraph

kgx.cli.cli_utils.validate(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str]) → List

Run KGX validator on an input file to check for Biolink Model compliance.

Parameters
  • inputs (List[str]) – Input files

  • input_format (str) – The input format

  • input_compression (Optional[str]) – The input compression type

  • output (Optional[str]) – Path to output file (stdout, by default)

Returns

A list of errors, if any

Return type

List