Validator
The Validator validates an instance of kgx.graph.base_graph.BaseGraph for Biolink Model compliance.
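To validate a graph:
from kgx.validator import Validator
# `graph` is an existing kgx.graph.base_graph.BaseGraph instance
v = Validator()
errors = v.validate(graph)  # list of validation errors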
Streaming Data Processing Mode
For very large graphs, the Validator can process graph data equally well in streaming mode (command flag --stream=True), which significantly reduces the memory footprint required to process such graphs.
Biolink Model Versioning
By default, the Validator validates against the latest Biolink Model release hosted by the current Biolink Model Toolkit; however, one may override this default at the Validator class level using Validator.set_biolink_model(version="#.#.#"), where #.#.# is the major.minor.patch semantic version of the desired Biolink Model release.
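As a small illustration of the #.#.# form described above, the following sketch checks that a string has the major.minor.patch shape before it is handed to the Validator. The helper name is_release_version is hypothetical and not part of KGX; whether KGX itself accepts other version spellings is not covered here.

```python
import re

# Matches exactly three dot-separated groups of digits, e.g. "2.2.0".
_RELEASE_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_release_version(version: str) -> bool:
    """Hypothetical helper: True if `version` looks like major.minor.patch."""
    return _RELEASE_PATTERN.fullmatch(version) is not None
```

A caller might guard the override with this check, for example only calling Validator.set_biolink_model(version=v) when is_release_version(v) is true.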
Each Validator() instance persistently adopts the class-level Biolink Model version most recently set at the time it is instantiated. Resetting the class-level Biolink Model version does not change the version of previously instantiated Validator() objects. In a multi-threaded environment that instantiates multiple validator objects, it may be necessary to wrap the Validator.set_biolink_model call and the Validator() object instantiation together within a single thread-locked block.
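One way to picture the thread-locked block is the sketch below. It assumes a module-level threading.Lock shared by all threads, and the helper name make_validator is hypothetical. The validator class is passed in as a parameter (expected to be kgx.validator.Validator) so that the sketch does not depend on KGX being importable.

```python
import threading

# Shared lock guarding the class-level Biolink Model version (an assumption of
# this sketch; KGX does not provide this lock).
_biolink_lock = threading.Lock()


def make_validator(validator_cls, version, **kwargs):
    """Hypothetical helper: set the class-level version and instantiate atomically.

    `validator_cls` is expected to expose set_biolink_model(version=...) as a
    class-level call, as described above for kgx.validator.Validator.
    """
    with _biolink_lock:
        # No other thread can change the class-level version between these
        # two statements, so the new instance gets the requested version.
        validator_cls.set_biolink_model(version=version)
        return validator_cls(**kwargs)
```

Each worker thread would then call something like make_validator(Validator, "2.2.0") instead of calling set_biolink_model and Validator() separately. The guarantee only holds if every thread goes through the same lock.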
Note that the kgx validate CLI operation also has an optional biolink_release argument for the same purpose.
kgx.validator
-
class kgx.validator.ValidationError(entity: str, error_type: kgx.validator.ErrorType, message: str, message_level: kgx.validator.MessageLevel)
Bases: object
ValidationError class that represents an error.
- Parameters
entity (str) – The node or edge entity that is failing validation
error_type (kgx.validator.ErrorType) – The nature of the error
message (str) – The error message
message_level (kgx.validator.MessageLevel) – The message level
-
class kgx.validator.Validator(verbose: bool = False, progress_monitor: Optional[Callable[[kgx.utils.kgx_utils.GraphEntityType, List], None]] = None, schema: Optional[str] = None)
Bases: object
Class for validating a property graph.
The optional ‘progress_monitor’ for the validator should be a lightweight Callable which is injected into the class ‘inspector’ Callable, designed to intercept node and edge records streaming through the Validator (inside a Transformer.process() call). The first (GraphEntityType) argument of the Callable tags the record as a NODE or an EDGE. The second argument given to the Callable is the current record itself. The intent of this Callable is to provide a hook for KGX applications that want to passively monitor the graph data stream. As such, the Callable could simply tally up the number of times it is called with a NODE or an EDGE, then provide a suitable (quick!) report of that count back to the KGX application. The Callable (function or callable class) is strictly meant to be procedural: it should not mutate the record and should be of low complexity, so as not to introduce a large computational overhead to validation.
- Parameters
verbose (bool) – Whether the generated report should be verbose or not (default: False)
progress_monitor (Optional[Callable[[GraphEntityType, List], None]]) – Function given a peek at the current record being processed by the class wrapped Callable.
schema (Optional[str]) – URL to (Biolink) Model Schema to be used for validation (default: None, use default Biolink Model Toolkit schema)
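The tallying progress monitor described above could look roughly like the following sketch. To keep it runnable without KGX, it defines a stand-in GraphEntityType enum; the NODE and EDGE members follow the description above, but their values here are assumptions. The class name RecordTally and its summary() method are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import List


class GraphEntityType(Enum):
    """Stand-in for kgx.utils.kgx_utils.GraphEntityType (values are assumed)."""
    NODE = "node"
    EDGE = "edge"


class RecordTally:
    """Callable that counts node and edge records without modifying them."""

    def __init__(self) -> None:
        self.counts = Counter()

    def __call__(self, entity_type: GraphEntityType, rec: List) -> None:
        # Only the entity tag is inspected; the record is left untouched.
        self.counts[entity_type] += 1

    def summary(self) -> str:
        """Quick report of how many records of each kind were seen."""
        return (
            f"nodes={self.counts[GraphEntityType.NODE]}, "
            f"edges={self.counts[GraphEntityType.EDGE]}"
        )
```

With KGX installed, the stand-in enum would be replaced by the real import, and an instance passed as Validator(progress_monitor=RecordTally()). A callable class is used rather than a bare function so that the counts remain available to the application after validation.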
-
__call__(entity_type: kgx.utils.kgx_utils.GraphEntityType, rec: List)
Transformer ‘inspector’ Callable
-
static get_all_prefixes(jsonld: Optional[Dict] = None) → set
Get all prefixes from Biolink Model JSON-LD context.
It also sets self.prefixes for subsequent access.
- Parameters
jsonld (Optional[Dict]) – The JSON-LD context
- Returns
A set of prefixes
- Return type
set
-
get_error_messages()
A direct Validator “instance” method version of report() that directly accesses the internal Validator self.errors list.
- Returns
A list of formatted error messages.
- Return type
List
-
static get_required_edge_properties(toolkit: Optional[bmt.toolkit.Toolkit] = None) → list
Get all properties for an edge that are required, as defined by Biolink Model.
- Parameters
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns
A list of required edge properties
- Return type
list
-
static get_required_node_properties(toolkit: Optional[bmt.toolkit.Toolkit] = None) → list
Get all properties for a node that are required, as defined by Biolink Model.
- Parameters
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns
A list of required node properties
- Return type
list
-
static report(errors: List[kgx.validator.ValidationError]) → List
Prepare error report.
- Parameters
errors (List[ValidationError]) – List of kgx.validator.ValidationError
- Returns
A list of formatted errors
- Return type
List
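To make the relationship between ValidationError objects and a formatted report concrete, here is a minimal sketch using stand-in types. The enum member names and the output layout are assumptions made for illustration, not the format KGX is documented to produce here; only the four ValidationError fields come from the class description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ErrorType(Enum):
    # Hypothetical member names, for illustration only.
    MISSING_NODE_PROPERTY = 1
    INVALID_CATEGORY = 2


class MessageLevel(Enum):
    # Hypothetical member names, for illustration only.
    ERROR = 1
    WARNING = 2


@dataclass
class ValidationError:
    """Stand-in carrying the four documented fields."""
    entity: str
    error_type: ErrorType
    message: str
    message_level: MessageLevel


def format_errors(errors: List[ValidationError]) -> List[str]:
    """Render one line per error; this layout is an assumption of the sketch."""
    return [
        f"[{e.message_level.name}][{e.error_type.name}] {e.entity} - {e.message}"
        for e in errors
    ]
```

The function takes the error list as an argument, mirroring the static report(errors) signature, so it can be applied to any collection of errors rather than only those held by one validator instance.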
-
validate(graph: kgx.graph.base_graph.BaseGraph) → list
Validate nodes and edges in a graph.
TODO: Support strict mode
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph to validate
- Returns
A list of errors for a given graph
- Return type
list
-
static validate_categories(node: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list
Validate category field of a given node.
- Parameters
node (str) – Node identifier
data (dict) – Node properties
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns
A list of errors for a given node
- Return type
list
-
static validate_edge_predicate(subject: str, object: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list
Validate edge_predicate field of a given edge.
- Parameters
subject (str) – Subject identifier
object (str) – Object identifier
data (dict) – Edge properties
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns
A list of errors for a given edge
- Return type
list
-
static validate_edge_properties(subject: str, object: str, data: dict, required_properties: list) → list
Checks if all the required edge properties exist for a given edge.
- Parameters
subject (str) – Subject identifier
object (str) – Object identifier
data (dict) – Edge properties
required_properties (list) – Required edge properties
- Returns
A list of errors for a given edge
- Return type
list
-
static validate_edge_property_types(subject: str, object: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list
Checks if edge properties have the expected value type.
- Parameters
subject (str) – Subject identifier
object (str) – Object identifier
data (dict) – Edge properties
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns
A list of errors for a given edge
- Return type
list
-
static validate_edge_property_values(subject: str, object: str, data: dict) → list
Validate an edge property’s value.
- Parameters
subject (str) – Subject identifier
object (str) – Object identifier
data (dict) – Edge properties
- Returns
A list of errors for a given edge
- Return type
list
-
validate_edges(graph: kgx.graph.base_graph.BaseGraph) → list
Validate all the edges in a graph.
This method validates the following: edge properties, edge property type, edge property value type, and edge predicate.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph to validate
- Returns
A list of errors for a given graph
- Return type
list
-
static validate_node_properties(node: str, data: dict, required_properties: list) → list
Checks if all the required node properties exist for a given node.
- Parameters
node (str) – Node identifier
data (dict) – Node properties
required_properties (list) – Required node properties
- Returns
A list of errors for a given node
- Return type
list
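The required-property check can be pictured with the short sketch below. It is not KGX's implementation: it returns the names of the missing properties rather than ValidationError objects, and the helper name missing_required_properties and the sample property names are illustrative assumptions.

```python
from typing import Dict, List


def missing_required_properties(data: Dict, required_properties: List[str]) -> List[str]:
    """Return the required property names that are absent from `data`."""
    return [prop for prop in required_properties if prop not in data]


# Illustrative node properties; "category" is required here but not supplied.
node_data = {"id": "HGNC:11603", "name": "TBX4"}
print(missing_required_properties(node_data, ["id", "category"]))  # ['category']
```

The same membership test applies to edges, with the edge property dictionary and the required edge properties in place of the node ones.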
-
static validate_node_property_types(node: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list
Checks if node properties have the expected value type.
- Parameters
node (str) – Node identifier
data (dict) – Node properties
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns
A list of errors for a given node
- Return type
list
-
static validate_node_property_values(node: str, data: dict) → list
Validate a node property’s value.
- Parameters
node (str) – Node identifier
data (dict) – Node properties
- Returns
A list of errors for a given node
- Return type
list
-
validate_nodes(graph: kgx.graph.base_graph.BaseGraph) → list
Validate all the nodes in a graph.
This method validates the following: node properties, node property type, node property value type, and node categories.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph to validate
- Returns
A list of errors for a given graph
- Return type
list