Validator

The Validator validates an instance of kgx.graph.base_graph.BaseGraph for Biolink Model compliance.

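To validate a graph:

from kgx.validator import Validator

# graph is an existing kgx.graph.base_graph.BaseGraph instance
v = Validator()
errors = v.validate(graph)  # list of validation errors found in the graph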

Streaming Data Processing Mode

For very large graphs, the Validator can also process graph data as a stream (command flag --stream=True), which significantly reduces the memory footprint needed to process such graphs.

kgx.validator

class kgx.validator.ErrorType

Bases: enum.Enum

Validation error types

class kgx.validator.MessageLevel

Bases: enum.Enum

Message level for validation reports

class kgx.validator.ValidationError(entity: str, error_type: kgx.validator.ErrorType, message: str, message_level: kgx.validator.MessageLevel)

Bases: object

ValidationError class that represents an error.

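Parameters
  • entity (str) – The node or edge entity the error refers to

  • error_type (kgx.validator.ErrorType) – The type of validation error

  • message (str) – The error message

  • message_level (kgx.validator.MessageLevel) – The message level of the error
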
class kgx.validator.Validator(verbose: bool = False, progress_monitor: Optional[Callable[[kgx.utils.kgx_utils.GraphEntityType, List], None]] = None, schema: Optional[str] = None)

Bases: object

Class for validating a property graph.

The optional ‘progress_monitor’ should be a lightweight Callable that is injected into the class's ‘inspector’ Callable, which intercepts node and edge records streaming through the Validator (inside a Transformer.process() call). The first argument (a GraphEntityType) tags the record as a NODE or an EDGE; the second argument is the current record itself. The Callable exists to give KGX applications a hook for passively monitoring the graph data stream. For example, it could tally the number of times it is called with a NODE or an EDGE and then give a quick report of that count back to the KGX application. The Callable (a function or callable class) must not modify the record and should be of low complexity, so that it does not add significant computational overhead to validation.

Parameters
  • verbose (bool) – Whether the generated report should be verbose or not (default: False)

  • progress_monitor (Optional[Callable[[GraphEntityType, List], None]]) – Function given a peek at the current record being processed by the class's wrapped Callable.

  • schema (Optional[str]) – URL of the (Biolink) Model Schema to be used for validation (default: None, use default Biolink Model Toolkit schema)
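
The tallying monitor described above could be sketched roughly as follows. The GraphEntityType enum here is a local stand-in for kgx.utils.kgx_utils.GraphEntityType so that the snippet runs on its own, and the RecordTally class and the sample records are hypothetical names and data used only for illustration.

```python
from collections import Counter
from enum import Enum
from typing import List


class GraphEntityType(Enum):
    """Local stand-in for kgx.utils.kgx_utils.GraphEntityType (assumption)."""
    NODE = "node"
    EDGE = "edge"


class RecordTally:
    """Hypothetical progress monitor: counts records, never modifies them."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, entity_type: GraphEntityType, rec: List) -> None:
        # Only the entity type is inspected; the record is left untouched.
        self.counts[entity_type] += 1

    def summary(self) -> str:
        nodes = self.counts[GraphEntityType.NODE]
        edges = self.counts[GraphEntityType.EDGE]
        return f"nodes={nodes} edges={edges}"


# Simulate a short stream of records (made-up contents).
tally = RecordTally()
tally(GraphEntityType.NODE, ["HGNC:11603", {"name": "TBX4"}])
tally(GraphEntityType.NODE, ["MONDO:0005002", {"name": "COPD"}])
tally(GraphEntityType.EDGE, ["HGNC:11603", "MONDO:0005002", {}])
print(tally.summary())
```

With KGX installed, an object like this would be passed as Validator(progress_monitor=tally), following the constructor signature above, and the real GraphEntityType would be imported instead of the stand-in.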

__call__(entity_type: kgx.utils.kgx_utils.GraphEntityType, rec: List)

Transformer ‘inspector’ Callable

static get_all_prefixes(jsonld: Optional[Dict] = None) → set

Get all prefixes from Biolink Model JSON-LD context.

It also sets self.prefixes for subsequent access.

Parameters

jsonld (Optional[Dict]) – The JSON-LD context

Returns

A set of prefixes

Return type

set
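
As a rough illustration of what collecting prefixes from a JSON-LD context involves, the sketch below walks a context dictionary and keeps the terms that look like namespace prefixes. The helper name collect_prefixes, the sample context, and the rule used to recognize a prefix entry are all assumptions made for illustration; this is not the KGX implementation.

```python
from typing import Any, Dict, Set


def collect_prefixes(context: Dict[str, Any]) -> Set[str]:
    """Hypothetical helper: return the terms in a JSON-LD context that look like prefixes."""
    prefixes = set()
    for term, definition in context.items():
        if term.startswith("@"):
            continue  # skip JSON-LD keywords such as @vocab
        if isinstance(definition, str):
            # Simple term definition, e.g. "prefix": "http://namespace/".
            # Simplification: plain term aliases would also be picked up here.
            prefixes.add(term)
        elif isinstance(definition, dict) and definition.get("@prefix") is True:
            # Expanded term definition explicitly flagged as a prefix.
            prefixes.add(term)
    return prefixes


# Made-up context fragment for demonstration only.
sample_context = {
    "@vocab": "https://w3id.org/biolink/vocab/",
    "biolink": "https://w3id.org/biolink/vocab/",
    "HGNC": {"@id": "http://identifiers.org/hgnc/", "@prefix": True},
    "category": {"@id": "biolink:category", "@type": "@id"},
}
print(sorted(collect_prefixes(sample_context)))
```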

get_error_messages()

Instance-method counterpart of report() that reads the Validator's internal self.errors list directly.

Returns

A list of formatted error messages.

Return type

List

static get_required_edge_properties(toolkit: Optional[bmt.toolkit.Toolkit] = None) → list

Get all properties for an edge that are required, as defined by Biolink Model.

Parameters

toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns

A list of required edge properties

Return type

list

static get_required_node_properties(toolkit: Optional[bmt.toolkit.Toolkit] = None) → list

Get all properties for a node that are required, as defined by Biolink Model.

Parameters

toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns

A list of required node properties

Return type

list

static report(errors: List[kgx.validator.ValidationError]) → List

Prepare error report.

Parameters

errors (List[ValidationError]) – List of kgx.validator.ValidationError

Returns

A list of formatted errors

Return type

List
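
To make the idea of a formatted error report concrete, here is a minimal sketch that turns error objects into one-line strings. The enums and the dataclass are local stand-ins that mirror the four constructor fields documented above; the enum member names, the format_errors helper, and the line layout are assumptions and may differ from what KGX actually produces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ErrorType(Enum):
    """Stand-in enum; member names are hypothetical."""
    MISSING_NODE_PROPERTY = 1
    INVALID_CATEGORY = 2


class MessageLevel(Enum):
    """Stand-in enum; member names are hypothetical."""
    WARNING = 1
    ERROR = 2


@dataclass
class ValidationError:
    """Stand-in carrying the four fields of the documented constructor."""
    entity: str
    error_type: ErrorType
    message: str
    message_level: MessageLevel


def format_errors(errors: List[ValidationError]) -> List[str]:
    """Hypothetical formatter: one line per error (layout is an assumption)."""
    return [
        f"[{e.message_level.name}][{e.error_type.name}] {e.entity} - {e.message}"
        for e in errors
    ]


errors = [
    ValidationError(
        "HGNC:11603",
        ErrorType.MISSING_NODE_PROPERTY,
        "Required node property 'name' is missing",
        MessageLevel.ERROR,
    ),
]
for line in format_errors(errors):
    print(line)
```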

validate(graph: kgx.graph.base_graph.BaseGraph) → list

Validate nodes and edges in a graph. TODO: Support strict mode

Parameters

graph (kgx.graph.base_graph.BaseGraph) – The graph to validate

Returns

A list of errors for a given graph

Return type

list

static validate_categories(node: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list

Validate category field of a given node.

Parameters
  • node (str) – Node identifier

  • data (dict) – Node properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns

A list of errors for a given node

Return type

list

static validate_edge_predicate(subject: str, object: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list

Validate edge_predicate field of a given edge.

Parameters
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns

A list of errors for a given edge

Return type

list

static validate_edge_properties(subject: str, object: str, data: dict, required_properties: list) → list

Checks if all the required edge properties exist for a given edge.

Parameters
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

  • required_properties (list) – Required edge properties

Returns

A list of errors for a given edge

Return type

list

static validate_edge_property_types(subject: str, object: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list

Checks if edge properties have the expected value type.

Parameters
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns

A list of errors for a given edge

Return type

list

static validate_edge_property_values(subject: str, object: str, data: dict) → list

Validate an edge property’s value.

Parameters
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

Returns

A list of errors for a given edge

Return type

list

validate_edges(graph: kgx.graph.base_graph.BaseGraph) → list

Validate all the edges in a graph.

This method validates the following:
  • Edge properties

  • Edge property type

  • Edge property value type

  • Edge predicate

Parameters

graph (kgx.graph.base_graph.BaseGraph) – The graph to validate

Returns

A list of errors for a given graph

Return type

list

static validate_node_properties(node: str, data: dict, required_properties: list) → list

Checks if all the required node properties exist for a given node.

Parameters
  • node (str) – Node identifier

  • data (dict) – Node properties

  • required_properties (list) – Required node properties

Returns

A list of errors for a given node

Return type

list
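
The required-property check can be pictured with a small stand-alone sketch. The function name missing_required_properties, the message wording, and the sample list of required properties are hypothetical; the real method returns validation errors rather than plain strings.

```python
from typing import Any, Dict, List


def missing_required_properties(
    node: str, data: Dict[str, Any], required_properties: List[str]
) -> List[str]:
    """Hypothetical check: report each required property absent from `data`."""
    return [
        f"{node}: required node property '{prop}' is missing"
        for prop in required_properties
        if prop not in data
    ]


# Made-up node record and an assumed list of required properties.
node_data = {"id": "HGNC:11603", "name": "TBX4"}
problems = missing_required_properties("HGNC:11603", node_data, ["id", "category"])
print(problems)
```

The same pattern applies to edges, with the subject and object identifiers taking the place of the node identifier.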

static validate_node_property_types(node: str, data: dict, toolkit: Optional[bmt.toolkit.Toolkit] = None) → list

Checks if node properties have the expected value type.

Parameters
  • node (str) – Node identifier

  • data (dict) – Node properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns

A list of errors for a given node

Return type

list

static validate_node_property_values(node: str, data: dict) → list

Validate a node property’s value.

Parameters
  • node (str) – Node identifier

  • data (dict) – Node properties

Returns

A list of errors for a given node

Return type

list

validate_nodes(graph: kgx.graph.base_graph.BaseGraph) → list

Validate all the nodes in a graph.

This method validates the following:
  • Node properties

  • Node property type

  • Node property value type

  • Node categories

Parameters

graph (kgx.graph.base_graph.BaseGraph) – The graph to validate

Returns

A list of errors for a given graph

Return type

list

write_report(outstream: TextIO) → None

Write error report to a file

Parameters

outstream (TextIO) – The stream to write to