Utilities
The utilities module includes all the utility methods used throughout KGX.
graph_utils
Utility methods for working with graphs.
-
kgx.utils.graph_utils.curie_lookup(curie: str) → Optional[str]
Given a CURIE, find its label.
This method first does a lookup in predefined maps. If no label is found, it uses CurieLookupService to look for the CURIE in a set of preloaded ontologies.
- Parameters
curie (str) – A CURIE
- Returns
The label corresponding to the given CURIE
- Return type
Optional[str]
-
kgx.utils.graph_utils.get_ancestors(graph: kgx.graph.base_graph.BaseGraph, node: str, relations: Optional[List[str]] = None) → List[str]
Return all ancestors of the specified node, filtered by relations.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – Graph to traverse
node (str) – node identifier
relations (List[str]) – list of relations
- Returns
A list of ancestor nodes
- Return type
List[str]
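As a rough illustration of the traversal described above, the sketch below follows only the edges whose predicate appears in relations and collects every node reachable that way. It does not use BaseGraph: the dictionary-based edge store and the helper name find_ancestors are assumptions made for this example, and following every edge when relations is None is a choice of the sketch, not documented KGX behavior.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a graph: node -> list of (predicate, parent) pairs.
Edges = Dict[str, List[Tuple[str, str]]]


def find_ancestors(edges: Edges, node: str, relations: Optional[List[str]] = None) -> List[str]:
    """Collect all nodes reachable from `node` through the allowed relations."""
    found: List[str] = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for predicate, parent in edges.get(current, []):
            # Assumption of this sketch: with no relations given, follow every edge.
            if relations is not None and predicate not in relations:
                continue
            if parent != node and parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


edges = {
    "A": [("biolink:subclass_of", "B"), ("biolink:related_to", "X")],
    "B": [("biolink:subclass_of", "C")],
}
print(find_ancestors(edges, "A", ["biolink:subclass_of"]))
```

Direct parents correspond to stopping after the first level instead of continuing the walk.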
-
kgx.utils.graph_utils.get_category_via_superclass(graph: kgx.graph.base_graph.BaseGraph, curie: str, load_ontology: bool = True) → Set[str]
Get the category for a given CURIE by tracing its superclasses via the subclass_of hierarchy and selecting the most appropriate category based on the superclass.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – Graph to traverse
curie (str) – Input CURIE
load_ontology (bool) – Determines whether to load an ontology, based on the CURIE prefix, or to rely only on the subclass_of hierarchy from the graph
- Returns
A set containing one or more categories for the given CURIE
- Return type
Set[str]
-
kgx.utils.graph_utils.get_parents(graph: kgx.graph.base_graph.BaseGraph, node: str, relations: Optional[List[str]] = None) → List[str]
Return all direct parents of the specified node, filtered by relations.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – Graph to traverse
node (str) – node identifier
relations (List[str]) – list of relations
- Returns
A list of parent node(s)
- Return type
List[str]
-
kgx.utils.graph_utils.remap_edge_property(graph: kgx.graph.base_graph.BaseGraph, edge_predicate: str, old_property: str, new_property: str) → None
Remap the value of an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
edge_predicate (string) – Predicate identifying the edges whose property needs to be remapped
old_property (string) – Old property name whose value needs to be replaced
new_property (string) – New property name from which the value is pulled
-
kgx.utils.graph_utils.remap_node_identifier(graph: kgx.graph.base_graph.BaseGraph, category: str, alternative_property: str, prefix=None) → kgx.graph.base_graph.BaseGraph
Remap a node’s ‘id’ attribute with the value from the node’s alternative_property attribute.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
category (string) – Category identifying the nodes whose ‘id’ needs to be remapped
alternative_property (string) – Property name from which the new value is pulled
prefix (string) – Signifies that the value for alternative_property is a list; the prefix indicates which value to pick from the list
- Returns
The modified graph
- Return type
kgx.graph.base_graph.BaseGraph
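To make the role of prefix concrete, here is a minimal sketch of how a replacement identifier might be chosen from alternative_property. It works on a plain node dictionary rather than a BaseGraph; the helper name pick_new_id, the placeholder identifiers, and the decision to return None when no candidate matches are all assumptions of this example.

```python
from typing import Any, Dict, Optional


def pick_new_id(node: Dict[str, Any], alternative_property: str, prefix: Optional[str] = None) -> Optional[str]:
    """Choose a replacement identifier from `alternative_property`, or None if there is none."""
    value = node.get(alternative_property)
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        if prefix is None:
            # A list without a prefix is ambiguous; this sketch leaves the id unchanged.
            return None
        for candidate in value:
            if candidate.startswith(prefix):
                return candidate
        return None
    return value


# Placeholder identifiers, not real records.
node = {"id": "EX:1", "category": ["biolink:Gene"], "xref": ["NCBIGene:0000", "ENSEMBL:0000"]}
new_id = pick_new_id(node, "xref", prefix="NCBIGene")
if new_id is not None:
    node["id"] = new_id
print(node["id"])
```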
-
kgx.utils.graph_utils.remap_node_property(graph: kgx.graph.base_graph.BaseGraph, category: str, old_property: str, new_property: str) → None
Remap the value of a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
category (string) – Category identifying the nodes whose property needs to be remapped
old_property (string) – Old property name whose value needs to be replaced
new_property (string) – New property name from which the value is pulled
kgx_utils
Utility methods that are reused across the codebase.
-
kgx.utils.kgx_utils.apply_edge_filters(graph: kgx.graph.base_graph.BaseGraph, edge_filters: Dict[str, Union[str, Set]]) → None
Apply filters to the graph and remove edges that do not pass the given filters.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
edge_filters (Dict[str, Union[str, Set]]) – Edge filters
-
kgx.utils.kgx_utils.apply_filters(graph: kgx.graph.base_graph.BaseGraph, node_filters: Dict[str, Union[str, Set]], edge_filters: Dict[str, Union[str, Set]]) → None
Apply filters to the graph and remove nodes and edges that do not pass the given filters.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
node_filters (Dict[str, Union[str, Set]]) – Node filters
edge_filters (Dict[str, Union[str, Set]]) – Edge filters
-
kgx.utils.kgx_utils.apply_node_filters(graph: kgx.graph.base_graph.BaseGraph, node_filters: Dict[str, Union[str, Set]]) → None
Apply filters to the graph and remove nodes that do not pass the given filters.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
node_filters (Dict[str, Union[str, Set]]) – Node filters
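The filter dictionaries map a property name to either a single string or a set of accepted values. The sketch below shows one plausible reading of that shape on a plain dictionary of nodes. The matching rule (a node passes when every filter key is present and shares at least one value with the filter) and the helper names are assumptions of this example, not a statement of how KGX evaluates filters.

```python
from typing import Any, Dict, Set, Union

Filters = Dict[str, Union[str, Set]]


def node_passes(node: Dict[str, Any], node_filters: Filters) -> bool:
    """Return True when the node satisfies every filter (assumed semantics)."""
    for key, allowed in node_filters.items():
        if key not in node:
            return False
        allowed_values = {allowed} if isinstance(allowed, str) else set(allowed)
        value = node[key]
        node_values = set(value) if isinstance(value, (list, set, tuple)) else {value}
        if not allowed_values & node_values:
            return False
    return True


def remove_failing_nodes(nodes: Dict[str, Dict[str, Any]], node_filters: Filters) -> None:
    """Delete, in place, every node that does not pass the filters."""
    failing = [node_id for node_id, data in nodes.items() if not node_passes(data, node_filters)]
    for node_id in failing:
        del nodes[node_id]


nodes = {
    "EX:1": {"category": ["biolink:Gene"], "provided_by": "source_a"},
    "EX:2": {"category": ["biolink:Disease"], "provided_by": "source_a"},
    "EX:3": {"category": ["biolink:Gene"]},
}
remove_failing_nodes(nodes, {"category": {"biolink:Gene"}, "provided_by": "source_a"})
print(sorted(nodes))
```

Like the functions documented here, the helper mutates its input and returns None.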
-
kgx.utils.kgx_utils.camelcase_to_sentencecase(s: str) → str
Convert CamelCase to sentence case.
- Parameters
s (str) – Input string in CamelCase
- Returns
string in sentence case form
- Return type
str
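For readers unfamiliar with the conversion, the following is a small stand-alone sketch of one way to turn CamelCase into lowercase, space-separated words. The function name camel_to_sentence and the regular expression are assumptions of this example; KGX may handle edge cases such as acronyms differently.

```python
import re


def camel_to_sentence(s: str) -> str:
    """Insert a space at each lowercase-to-uppercase boundary, then lowercase the result."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", s).lower()


print(camel_to_sentence("NamedThing"))
print(camel_to_sentence("GeneToDiseaseAssociation"))
```

This simple rule does not split runs of capitals, so an input such as "RNAProduct" comes out as a single word.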
-
kgx.utils.kgx_utils.contract(uri: str, prefix_maps: Optional[List[Dict]] = None, fallback: bool = True) → str
Contract a given URI to a CURIE, based on mappings from prefix_maps. If no prefix map is provided, the defaults from prefixcommons-py are used.
This method returns the URI as the CURIE if no mapping is found.
- Parameters
uri (str) – A URI
prefix_maps (Optional[List[Dict]]) – A list of prefix maps to use for mapping
fallback (bool) – Determines whether to fall back to default prefix mappings, as determined by prefixcommons.curie_util, when the URI prefix is not found in prefix_maps.
- Returns
A CURIE corresponding to the URI
- Return type
str
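The contraction step can be pictured with a small stand-alone sketch: find a namespace in the supplied prefix maps that the URI starts with, and swap it for its prefix. The name contract_uri and the longest-match rule are assumptions of this example, and the fallback to the prefixcommons defaults is deliberately left out.

```python
from typing import Dict, List, Optional


def contract_uri(uri: str, prefix_maps: Optional[List[Dict[str, str]]] = None) -> str:
    """Return a CURIE for `uri`, or `uri` unchanged when no namespace matches."""
    best_prefix: Optional[str] = None
    best_base = ""
    for prefix_map in prefix_maps or []:
        for prefix, base in prefix_map.items():
            # Prefer the longest matching namespace so nested namespaces resolve sensibly.
            if uri.startswith(base) and len(base) > len(best_base):
                best_prefix, best_base = prefix, base
    if best_prefix is None:
        return uri
    return f"{best_prefix}:{uri[len(best_base):]}"


go_map = {"GO": "http://purl.obolibrary.org/obo/GO_"}
print(contract_uri("http://purl.obolibrary.org/obo/GO_0008150", [go_map]))
print(contract_uri("http://example.org/unknown/1", [go_map]))
```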
-
kgx.utils.kgx_utils.current_time_in_millis()
Get the current time in milliseconds.
- Returns
Time in milliseconds
- Return type
int
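A millisecond timestamp like this is commonly used for coarse timing. The sketch below is a stand-alone equivalent built on the standard library; the name now_in_millis is an assumption of this example and the KGX implementation may compute the value differently.

```python
import time


def now_in_millis() -> int:
    """Return the current Unix time as a whole number of milliseconds."""
    return time.time_ns() // 1_000_000


start = now_in_millis()
total = sum(range(100_000))  # stand-in for real work
elapsed_ms = now_in_millis() - start
print(f"elapsed: {elapsed_ms} ms")
```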
-
kgx.utils.kgx_utils.expand(curie: str, prefix_maps: Optional[List[dict]] = None, fallback: bool = True) → str
Expand a given CURIE to a URI, based on mappings from prefix_maps.
This method returns the CURIE as the IRI if no mapping is found.
- Parameters
curie (str) – A CURIE
prefix_maps (Optional[List[dict]]) – A list of prefix maps to use for mapping
fallback (bool) – Determines whether to fall back to default prefix mappings, as determined by prefixcommons.curie_util, when the CURIE prefix is not found in prefix_maps.
- Returns
A URI corresponding to the CURIE
- Return type
str
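Expansion is the reverse lookup: split the CURIE at its first colon and replace the prefix with its namespace. The sketch below assumes the same list-of-dicts shape for prefix_maps; the name expand_curie is hypothetical and the prefixcommons fallback is again omitted.

```python
from typing import Dict, List, Optional


def expand_curie(curie: str, prefix_maps: Optional[List[Dict[str, str]]] = None) -> str:
    """Return the URI for `curie`, or `curie` unchanged when its prefix is unknown."""
    prefix, separator, local_id = curie.partition(":")
    if not separator:
        return curie  # not a CURIE at all
    for prefix_map in prefix_maps or []:
        if prefix in prefix_map:
            return prefix_map[prefix] + local_id
    return curie


go_map = {"GO": "http://purl.obolibrary.org/obo/GO_"}
print(expand_curie("GO:0008150", [go_map]))
print(expand_curie("FOO:1", [go_map]))
```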
-
kgx.utils.kgx_utils.format_biolink_category(s: str) → str
Convert a sentence case Biolink category name to a proper Biolink CURIE with the category itself in CamelCase form.
- Parameters
s (str) – Input string in sentence case
- Returns
a proper Biolink CURIE
- Return type
str
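As an illustration of the expected input and output, the sketch below capitalizes each word, joins the words, and adds the biolink prefix. The name to_biolink_category is hypothetical, and returning already-prefixed input unchanged is an assumption of this example.

```python
def to_biolink_category(s: str) -> str:
    """Turn a sentence case category name into a biolink CURIE in CamelCase."""
    if s.startswith("biolink:"):
        # Assumption of this sketch: input that is already a Biolink CURIE is returned as-is.
        return s
    camel = "".join(word[:1].upper() + word[1:] for word in s.split())
    return f"biolink:{camel}"


print(to_biolink_category("named thing"))
print(to_biolink_category("gene to disease association"))
```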
-
kgx.utils.kgx_utils.generate_edge_identifiers(graph: kgx.graph.base_graph.BaseGraph)
Generate unique identifiers for edges in a graph that do not have an id field.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph
-
kgx.utils.kgx_utils.generate_edge_key(s: str, edge_predicate: str, o: str) → str
Generate an edge key based on a given subject, predicate, and object.
- Parameters
s (str) – Subject
edge_predicate (str) – Edge label
o (str) – Object
- Returns
Edge key as a string
- Return type
str
-
kgx.utils.kgx_utils.get_biolink_ancestors(name: str)
Get ancestors for a given Biolink class.
- Parameters
name (str) – Name of the Biolink class
- Returns
A list of ancestors
- Return type
List
-
kgx.utils.kgx_utils.get_biolink_element(name) → Optional[biolinkml.meta.Element]
Get the Biolink element for a given name, where name can be a class, slot, or relation.
- Parameters
name (str) – The name
- Returns
An instance of biolinkml.meta.Element
- Return type
Optional[biolinkml.meta.Element]
-
kgx.utils.kgx_utils.get_biolink_property_types() → Dict
Get all Biolink property types. This includes both node and edge properties.
- Returns
A dict containing all Biolink properties and their types
- Return type
Dict
-
kgx.utils.kgx_utils.get_cache(maxsize=10000)
Get an instance of cachetools.cache.
- Parameters
maxsize (int) – The max size for the cache (10000, by default)
- Returns
An instance of cachetools.cache
- Return type
cachetools.cache
-
kgx.utils.kgx_utils.get_curie_lookup_service()
Get an instance of kgx.curie_lookup_service.CurieLookupService.
- Returns
An instance of CurieLookupService
- Return type
kgx.curie_lookup_service.CurieLookupService
-
kgx.utils.kgx_utils.get_prefix_prioritization_map() → Dict[str, List]
Get the prefix prioritization map as defined in the Biolink Model.
- Returns
- Return type
Dict[str, List]
-
kgx.utils.kgx_utils.get_toolkit(schema: Optional[str] = None) → bmt.toolkit.Toolkit
Get an instance of bmt.Toolkit. If there is no instance defined, one is instantiated and returned.
-
kgx.utils.kgx_utils.get_type_for_property(p: str) → str
Get the type for a property.
TODO: Move this to biolink-model-toolkit
- Parameters
p (str) – The property name
- Returns
The type for a given property
- Return type
str
-
kgx.utils.kgx_utils.prepare_data_dict(d1: Dict, d2: Dict, preserve: bool = True) → Dict
Given two dict objects, make a new dict object that is the intersection of the two.
If a key is known to be multivalued, its value is converted to a list. If a key is already multivalued, it is updated with the new values. If a key is single valued and a new unique value is found, the existing value is converted to a list and the new value is appended to this list.
- Parameters
d1 (Dict) – Dict object
d2 (Dict) – Dict object
preserve (bool) – Whether or not to preserve values for conflicting keys
- Returns
The intersection of d1 and d2
- Return type
Dict
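The three value-handling rules in the description above can be sketched on plain dictionaries as follows. This is an illustration of those rules only: the set of multivalued keys, the helper names, and the sample records are assumptions of this example, and the preserve flag is not modeled.

```python
from typing import Any, Dict, List, Set

# Hypothetical set of keys treated as multivalued for this example.
MULTIVALUED_KEYS: Set[str] = {"category", "xref"}


def as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]


def merge_properties(d1: Dict[str, Any], d2: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two property dicts following the three rules described above."""
    merged: Dict[str, Any] = {}
    for source in (d1, d2):
        for key, value in source.items():
            values = as_list(value)
            if key not in merged:
                # Rule 1: known multivalued keys are always stored as lists.
                is_multi = key in MULTIVALUED_KEYS or isinstance(value, list)
                merged[key] = values if is_multi else value
                continue
            current = merged[key]
            if not isinstance(current, list):
                if all(v == current for v in values):
                    continue  # same single value, nothing new to record
                # Rule 3: a single value becomes a list once a new value appears.
                current = [current]
            # Rule 2: an existing list is extended with values it does not yet hold.
            for v in values:
                if v not in current:
                    current.append(v)
            merged[key] = current
    return merged


d1 = {"id": "EX:1", "name": "alpha", "category": "biolink:Gene"}
d2 = {"id": "EX:1", "name": "beta", "category": ["biolink:Gene", "biolink:NamedThing"]}
print(merge_properties(d1, d2))
```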
-
kgx.utils.kgx_utils.sentencecase_to_camelcase(s: str) → str
Convert sentence case to CamelCase.
- Parameters
s (str) – Input string in sentence case
- Returns
string in CamelCase form
- Return type
str
rdf_utils
Utility methods that are used for handling RDF.
-
kgx.utils.rdf_utils.infer_category(iri: rdflib.term.URIRef, rdfgraph: rdflib.graph.Graph) → Optional[List]
Infer the category for a given iri by traversing rdfgraph.
- Parameters
iri (rdflib.term.URIRef) – IRI
rdfgraph (rdflib.Graph) – A graph to traverse
- Returns
A list of categories corresponding to the given IRI
- Return type
Optional[List]
cli_utils
Utility methods that are used in the KGX command line.
-
kgx.cli.cli_utils.apply_filters(transformer: kgx.transformers.transformer.Transformer, node_filters: Optional[Dict], edge_filters: Optional[Dict]) → kgx.transformers.transformer.Transformer
Apply filters to the given transformer.
- Parameters
transformer (kgx.Transformer) – The transformer corresponding to the source
node_filters (Optional[Dict]) – Node filters
edge_filters (Optional[Dict]) – Edge filters
- Returns
transformer – The transformer with filters applied
- Return type
kgx.Transformer
-
kgx.cli.cli_utils.apply_operations(source: dict, graph: kgx.graph.base_graph.BaseGraph) → kgx.graph.base_graph.BaseGraph
Apply operations as defined in the YAML.
- Parameters
source (dict) – The source from the YAML
graph (kgx.graph.base_graph.BaseGraph) – The graph corresponding to the source
- Returns
The graph corresponding to the source
- Return type
kgx.graph.base_graph.BaseGraph
-
kgx.cli.cli_utils.get_file_types() → Tuple
Get all file formats supported by KGX.
- Returns
A tuple of supported file formats
- Return type
Tuple
-
kgx.cli.cli_utils.get_transformer(file_format: str) → Any
Get a Transformer corresponding to a given file format.
Note
This method returns a reference to the kgx.Transformer class, not an instance of it. You will have to instantiate the class by calling its constructor.
- Parameters
file_format (str) – File format
- Returns
Reference to the kgx.Transformer class corresponding to file_format
- Return type
Any
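The distinction drawn in the note, a class reference versus an instance, is sketched below with a small registry. The placeholder classes, the registry, and the ValueError for unknown formats are all inventions of this example; they only stand in for the real transformer classes and say nothing about how KGX handles unsupported formats.

```python
from typing import Any, Dict


class PlaceholderTsvTransformer:
    """Stand-in for a format-specific transformer class."""


class PlaceholderJsonTransformer:
    """Stand-in for a format-specific transformer class."""


# Hypothetical mapping from file format to the class that handles it.
_REGISTRY: Dict[str, Any] = {
    "tsv": PlaceholderTsvTransformer,
    "json": PlaceholderJsonTransformer,
}


def lookup_transformer(file_format: str) -> Any:
    """Return the class (not an instance) registered for `file_format`."""
    try:
        return _REGISTRY[file_format]
    except KeyError:
        raise ValueError(f"Unsupported file format: {file_format}") from None


transformer_class = lookup_transformer("tsv")  # a class object
transformer = transformer_class()              # the caller instantiates it
print(type(transformer).__name__)
```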
-
kgx.cli.cli_utils.graph_summary(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str], node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None) → Dict
Load and summarize a knowledge graph from a set of input files.
- Parameters
inputs (List[str]) – Input files
input_format (str) – Input file format
input_compression (Optional[str]) – The input compression type
output (Optional[str]) – Where to write the output (stdout, by default)
node_facet_properties (Optional[List]) – A list of node properties for which to generate counts per value. For example, ['provided_by']
edge_facet_properties (Optional[List]) – A list of edge properties for which to generate counts per value. For example, ['provided_by']
- Returns
A dictionary with the graph stats
- Return type
Dict
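The phrase "counts per value" for the facet properties can be illustrated with a short sketch that tallies how often each value of a property occurs across a list of records. The helper name facet_counts, the record layout, and the shape of the returned dictionary are assumptions of this example and do not describe the structure of the summary that graph_summary returns.

```python
from collections import Counter
from typing import Any, Dict, List


def facet_counts(records: List[Dict[str, Any]], facet_properties: List[str]) -> Dict[str, Dict[str, int]]:
    """Count occurrences of each value for every requested property."""
    counts = {prop: Counter() for prop in facet_properties}
    for record in records:
        for prop in facet_properties:
            value = record.get(prop)
            if value is None:
                continue  # records without the property are not counted
            values = value if isinstance(value, (list, set, tuple)) else [value]
            counts[prop].update(values)
    return {prop: dict(counter) for prop, counter in counts.items()}


nodes = [
    {"id": "EX:1", "provided_by": "source_a"},
    {"id": "EX:2", "provided_by": ["source_a", "source_b"]},
    {"id": "EX:3"},
]
print(facet_counts(nodes, ["provided_by"]))
```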
-
kgx.cli.cli_utils.merge(merge_config: str, source: Optional[List] = None, destination: Optional[List] = None, processes: int = 1) → kgx.graph.base_graph.BaseGraph
Load nodes and edges from files and KGs, as defined in a config YAML, and merge them into a single graph. The merged graph can then be written to a local or remote Neo4j instance, or be serialized into a file.
- Parameters
merge_config (str) – Merge config YAML
source (Optional[List]) – A list of sources to load from the YAML
destination (Optional[List]) – A list of destinations to write to, as defined in the YAML
processes (int) – Number of processes to use
- Returns
The merged graph
- Return type
kgx.graph.base_graph.BaseGraph
-
kgx.cli.cli_utils.neo4j_download(uri: str, username: str, password: str, output: str, output_format: str, output_compression: Optional[str], node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) → kgx.transformers.transformer.Transformer
Download nodes and edges from a Neo4j database.
- Parameters
uri (str) – Neo4j URI. For example, https://localhost:7474
username (str) – Username for authentication
password (str) – Password for authentication
output (str) – Where to write the output (stdout, by default)
output_format (Optional[str]) – The output type (tsv, by default)
output_compression (Optional[str]) – The output compression type
node_filters (Optional[Tuple]) – Node filters
edge_filters (Optional[Tuple]) – Edge filters
- Returns
The NeoTransformer
- Return type
kgx.Transformer
-
kgx.cli.cli_utils.neo4j_upload(inputs: List[str], input_format: str, input_compression: Optional[str], uri: str, username: str, password: str, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) → kgx.transformers.transformer.Transformer
Upload a set of nodes/edges to a Neo4j database.
- Parameters
inputs (List[str]) – A list of files that contains nodes/edges
input_format (str) – The input format
input_compression (Optional[str]) – The input compression type
uri (str) – The full HTTP address for Neo4j database
username (str) – Username for authentication
password (str) – Password for authentication
node_filters (Optional[Tuple]) – Node filters
edge_filters (Optional[Tuple]) – Edge filters
- Returns
The NeoTransformer
- Return type
kgx.Transformer
-
kgx.cli.cli_utils.parse_source(key: str, source: dict, output_directory: str, curie_map: Optional[Dict[str, str]] = None, node_properties: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, checkpoint: bool = False)
Parse a source from a merge config YAML.
- Parameters
key (str) – Source key
source (Dict) – Source configuration
output_directory (str) – Location to write output to
curie_map (Dict[str, str]) – Non-canonical CURIE mappings
node_properties (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)
predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)
checkpoint (bool) – Whether to serialize each individual source to a TSV
- Returns
Returns an instance of BaseGraph corresponding to the source
- Return type
kgx.graph.base_graph.BaseGraph
-
kgx.cli.cli_utils.parse_source_input(key: Optional[str], source: Dict, output_directory: Optional[str], curie_map: Optional[Dict[str, str]] = None, node_properties: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, property_types=None, checkpoint: bool = False) → kgx.transformers.transformer.Transformer
Parse a source’s input from a transform config YAML.
- Parameters
key (Optional[str]) – Source key
source (Dict) – Source configuration
output_directory (Optional[str]) – Location to write output to
curie_map (Dict[str, str]) – Non-canonical CURIE mappings
node_properties (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)
predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)
property_types (Dict[str, str]) – The XML property type for properties whose type is other than xsd:string. Relevant for RDF export.
checkpoint (bool) – Whether to serialize each individual source to a TSV
- Returns
An instance of kgx.Transformer corresponding to the source format
- Return type
kgx.Transformer
-
kgx.cli.cli_utils.transform(inputs: Optional[List[str]], input_format: Optional[str] = None, input_compression: Optional[str] = None, output: Optional[str] = None, output_format: Optional[str] = None, output_compression: Optional[str] = None, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None, transform_config: Optional[str] = None, source: Optional[List] = None, destination: Optional[List] = None, processes: int = 1) → None
Transform a Knowledge Graph from one serialization form to another.
- Parameters
inputs (Optional[List[str]]) – A list of files that contains nodes/edges
input_format (Optional[str]) – The input format
input_compression (Optional[str]) – The input compression type
output (Optional[str]) – The output file
output_format (Optional[str]) – The output format
output_compression (Optional[str]) – The output compression type
node_filters (Optional[Tuple]) – Node filters
edge_filters (Optional[Tuple]) – Edge filters
transform_config (Optional[str]) – The transform config YAML
source (Optional[List]) – A list of sources to load from the YAML
destination (Optional[List]) – A list of destinations to write to, as defined in the YAML
processes (int) – Number of processes to use
-
kgx.cli.cli_utils.transform_source(key: str, source: Dict, output_directory: Optional[str], curie_map: Optional[Dict[str, str]] = None, node_properties: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, property_types=None, checkpoint: bool = False, preserve_graph: bool = True) → kgx.graph.base_graph.BaseGraph
Transform a source from a transform config YAML.
- Parameters
key (str) – Source key
source (Dict) – Source configuration
output_directory (Optional[str]) – Location to write output to
curie_map (Dict[str, str]) – Non-canonical CURIE mappings
node_properties (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)
predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)
property_types (Dict[str, str]) – The XML property type for properties whose type is other than xsd:string. Relevant for RDF export.
checkpoint (bool) – Whether to serialize each individual source to a TSV
preserve_graph (bool) – Whether or not to preserve the graph corresponding to the source
- Returns
Returns an instance of BaseGraph corresponding to the source
- Return type
kgx.graph.base_graph.BaseGraph
-
kgx.cli.cli_utils.validate(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str]) → List
Run the KGX validator on input files to check for Biolink Model compliance.
- Parameters
inputs (List[str]) – Input files
input_format (str) – The input format
input_compression (Optional[str]) – The input compression type
output (Optional[str]) – Path to output file (stdout, by default)
- Returns
Returns a list of errors, if any
- Return type
List