CLI Utils

Utility functions used by the KGX command line interface.

kgx.cli.cli_utils

kgx.cli.cli_utils.apply_operations(source: dict, graph: kgx.graph.base_graph.BaseGraph) → kgx.graph.base_graph.BaseGraph

Apply operations as defined in the YAML.

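Parameters
  • source (dict) – The source configuration from the YAML

  • graph (kgx.graph.base_graph.BaseGraph) – The graph corresponding to the source

Returns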

The graph corresponding to the source

Return type

kgx.graph.base_graph.BaseGraph

kgx.cli.cli_utils.get_input_file_types() → Tuple

Get all input file formats supported by KGX.

Returns

A tuple of supported file formats

Return type

Tuple

kgx.cli.cli_utils.get_output_file_types() → Tuple

Get all output file formats supported by KGX.

Returns

A tuple of supported file formats

Return type

Tuple

kgx.cli.cli_utils.get_report_format_types() → Tuple

Get all graph summary report formats supported by KGX.

Returns

A tuple of supported report formats

Return type

Tuple
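
Because these three helpers return plain tuples, a caller can check a user-supplied format before doing any work. The following is a minimal sketch that relies only on the return types documented above. `require_supported` and `check_input_format` are hypothetical helpers, not part of KGX, and the actual tuple contents are whatever KGX reports at runtime.

```python
from typing import Tuple


def require_supported(fmt: str, supported: Tuple, kind: str = "input") -> str:
    """Return fmt unchanged if it is in supported, else raise ValueError.

    Hypothetical helper for illustration; not part of KGX.
    """
    if fmt not in supported:
        choices = ", ".join(str(s) for s in supported)
        raise ValueError(
            f"Unsupported {kind} format {fmt!r}; expected one of: {choices}"
        )
    return fmt


def check_input_format(fmt: str) -> str:
    """Validate fmt against the input formats reported by KGX."""
    # Imported lazily so require_supported stays usable without KGX installed.
    from kgx.cli.cli_utils import get_input_file_types

    return require_supported(fmt, get_input_file_types(), kind="input")
```

The same pattern should apply to `get_output_file_types()` and `get_report_format_types()` by swapping the function passed to `require_supported`.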

kgx.cli.cli_utils.graph_summary(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str], report_type: str, report_format: Optional[str] = None, stream: bool = False, graph_name: Optional[str] = None, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None, error_log: str = '') → Dict

Loads and summarizes a knowledge graph from a set of input files.

Parameters
  • inputs (List[str]) – Input files

  • input_format (str) – Input file format

  • input_compression (Optional[str]) – The input compression type

  • output (Optional[str]) – Where to write the output (stdout, by default)

  • report_type (str) – The summary report type

  • report_format (Optional[str]) – The summary report format: ‘yaml’ or ‘json’

  • stream (bool) – Whether to parse input as a stream

  • graph_name (Optional[str]) – User-specified name of the graph being summarized

  • node_facet_properties (Optional[List]) – A list of node properties from which to generate counts per value for those properties. For example, ['provided_by']

  • edge_facet_properties (Optional[List]) – A list of edge properties (e.g. knowledge_source tags) to facet on. For example, ['original_knowledge_source', 'aggregator_knowledge_source']

  • error_log (str) – Where to write any graph processing error message (stderr, by default)

Returns

A dictionary with the graph stats

Return type

Dict
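
The call below sketches how the parameters above might fit together for a two-file TSV graph. It is an illustration under stated assumptions, not a canonical invocation: `summary_kwargs` and `summarize` are hypothetical helpers, the file names are placeholders, and the `report_type` value `"kgx-map"` is assumed rather than taken from this page, so check which report types your KGX version accepts.

```python
from typing import Any, Dict, List, Optional


def summary_kwargs(
    inputs: List[str],
    input_format: str,
    report_type: str,
    output: Optional[str] = None,
    facet_on: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for graph_summary (hypothetical helper)."""
    return {
        "inputs": inputs,
        "input_format": input_format,
        "input_compression": None,  # plain, uncompressed files
        "output": output,           # None is assumed to fall back to stdout
        "report_type": report_type,
        "report_format": "yaml",    # documented choices: 'yaml' or 'json'
        "node_facet_properties": facet_on,
    }


def summarize(**kwargs: Any) -> Dict:
    """Run the summary; requires KGX to be installed."""
    # Lazy import keeps summary_kwargs usable without KGX.
    from kgx.cli.cli_utils import graph_summary

    return graph_summary(**kwargs)


# Placeholder file names and an assumed report type; substitute your own.
kwargs = summary_kwargs(
    ["nodes.tsv", "edges.tsv"],
    "tsv",
    report_type="kgx-map",
    facet_on=["provided_by"],
)
# stats = summarize(**kwargs)  # returns a dictionary with the graph stats
```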

kgx.cli.cli_utils.merge(merge_config: str, source: Optional[List] = None, destination: Optional[List] = None, processes: int = 1) → kgx.graph.base_graph.BaseGraph

Load nodes and edges from files and KGs, as defined in a config YAML, and merge them into a single graph. The merged graph can then be written to a local/remote Neo4j instance OR be serialized into a file.

Parameters
  • merge_config (str) – Merge config YAML

  • source (Optional[List]) – A list of sources to load, as defined in the YAML

  • destination (Optional[List]) – A list of destinations to write to, as defined in the YAML

  • processes (int) – Number of processes to use

Returns

The merged graph

Return type

kgx.graph.base_graph.BaseGraph

kgx.cli.cli_utils.neo4j_download(uri: str, username: str, password: str, output: str, output_format: str, output_compression: Optional[str], stream: bool, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) → kgx.transformer.Transformer

Download nodes and edges from a Neo4j database.

Parameters
  • uri (str) – Neo4j URI. For example, https://localhost:7474

  • username (str) – Username for authentication

  • password (str) – Password for authentication

  • output (str) – Where to write the output (stdout, by default)

  • output_format (Optional[str]) – The output type (tsv, by default)

  • output_compression (Optional[str]) – The output compression type

  • stream (bool) – Whether to parse input as a stream

  • node_filters (Optional[Tuple]) – Node filters

  • edge_filters (Optional[Tuple]) – Edge filters

Returns

The NeoTransformer

Return type

kgx.transformer.Transformer

kgx.cli.cli_utils.neo4j_upload(inputs: List[str], input_format: str, input_compression: Optional[str], uri: str, username: str, password: str, stream: bool, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) → kgx.transformer.Transformer

Upload a set of nodes/edges to a Neo4j database.

Parameters
  • inputs (List[str]) – A list of files that contain nodes/edges

  • input_format (str) – The input format

  • input_compression (Optional[str]) – The input compression type

  • uri (str) – The full HTTP address of the Neo4j database

  • username (str) – Username for authentication

  • password (str) – Password for authentication

  • stream (bool) – Whether to parse input as a stream

  • node_filters (Optional[Tuple]) – Node filters

  • edge_filters (Optional[Tuple]) – Edge filters

Returns

The NeoTransformer

Return type

kgx.transformer.Transformer
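
Both Neo4j functions take credentials as plain strings, so it can help to read them from the environment rather than hard-coding them. The sketch below assembles the arguments for `neo4j_upload` that way. The helper names and the environment variable names (`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`) are assumptions made for this example, and the fallback URI simply reuses the example URI given for `neo4j_download`.

```python
import os
from typing import Any, Dict, List, Mapping


def neo4j_upload_kwargs(
    inputs: List[str],
    input_format: str,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Build keyword arguments for neo4j_upload from environment variables.

    Hypothetical helper; the variable names are not defined by KGX.
    """
    return {
        "inputs": inputs,
        "input_format": input_format,
        "input_compression": None,  # uncompressed input files
        "uri": env.get("NEO4J_URI", "https://localhost:7474"),
        "username": env["NEO4J_USERNAME"],  # KeyError if unset: fail early
        "password": env["NEO4J_PASSWORD"],
        "stream": False,
    }


def upload(inputs: List[str], input_format: str):
    """Upload the files; requires KGX and a reachable Neo4j instance."""
    # Lazy import keeps neo4j_upload_kwargs usable without KGX.
    from kgx.cli.cli_utils import neo4j_upload

    return neo4j_upload(**neo4j_upload_kwargs(inputs, input_format))
```

Raising `KeyError` on a missing username or password is a deliberate choice here: a misconfigured environment fails before any connection attempt is made.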

kgx.cli.cli_utils.parse_source(key: str, source: dict, output_directory: str, prefix_map: Dict[str, str] = None, node_property_predicates: Set[str] = None, predicate_mappings: Dict[str, str] = None, checkpoint: bool = False) → kgx.sink.sink.Sink

Parse a source from a merge config YAML.

Parameters
  • key (str) – Source key

  • source (Dict) – Source configuration

  • output_directory (str) – Location to write output to

  • prefix_map (Dict[str, str]) – Non-canonical CURIE mappings

  • node_property_predicates (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)

  • predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)

  • checkpoint (bool) – Whether to serialize each individual source to a TSV

Returns

An instance of Sink

Return type

kgx.sink.sink.Sink

kgx.cli.cli_utils.prepare_input_args(key: str, source: Dict, output_directory: Optional[str], prefix_map: Dict[str, str] = None, node_property_predicates: Set[str] = None, predicate_mappings: Dict[str, str] = None) → Dict

Prepare input arguments for Transformer.

Parameters
  • key (str) – Source key

  • source (Dict) – Source configuration

  • output_directory (Optional[str]) – Location to write output to

  • prefix_map (Dict[str, str]) – Non-canonical CURIE mappings

  • node_property_predicates (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)

  • predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)

Returns

Input arguments as dictionary

Return type

Dict
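
The three RDF-related arguments are easier to picture with concrete values. The sketch below shows the shapes implied by the type hints above. Every value is a made-up placeholder under the `example.org` domain, and the reading of `prefix_map` as "CURIE prefix to IRI namespace" is an assumption about what "non-canonical CURIE mappings" means. `expand_curie` is a hypothetical helper that only illustrates that reading; it is not a KGX function.

```python
from typing import Dict, Set

# Placeholder values for illustration only.
prefix_map: Dict[str, str] = {"EX": "https://example.org/vocab/"}
node_property_predicates: Set[str] = {"https://example.org/vocab/has_synonym"}
predicate_mappings: Dict[str, str] = {
    "https://example.org/vocab/has_synonym": "synonym",
}


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand 'PREFIX:local' to a full IRI when the prefix is known.

    Assumed semantics of a prefix map; unknown prefixes pass through as-is.
    """
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie


iri = expand_curie("EX:has_synonym", prefix_map)
# Under these placeholders the expanded IRI is treated as a node property
# and would be stored under the property name "synonym".
property_name = predicate_mappings.get(iri)
```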

kgx.cli.cli_utils.prepare_output_args(key: str, source: Dict, output_directory: Optional[str], reverse_prefix_map: Dict = None, reverse_predicate_mappings: Dict = None, property_types: Dict = None) → Dict

Prepare output arguments for Transformer.

Parameters
  • key (str) – Source key

  • source (Dict) – Source configuration

  • output_directory (Optional[str]) – Location to write output to

  • reverse_prefix_map (Dict[str, str]) – Non-canonical CURIE mappings for export

  • reverse_predicate_mappings (Dict[str, str]) – A mapping of property names to predicate IRIs (This is applicable for RDF)

  • property_types (Dict[str, str]) – The XML Schema (xsd) type of each property whose type is other than xsd:string. Relevant for RDF export.

Returns

Output arguments as dictionary

Return type

Dict

kgx.cli.cli_utils.prepare_top_level_args(d: Dict) → Dict

Parse top-level configuration.

Parameters

d (Dict) – The configuration section from the transform/merge YAML

Returns

A parsed dictionary with parameters from configuration

Return type

Dict

kgx.cli.cli_utils.transform(inputs: Optional[List[str]], input_format: Optional[str] = None, input_compression: Optional[str] = None, output: Optional[str] = None, output_format: Optional[str] = None, output_compression: Optional[str] = None, stream: bool = False, node_filters: Optional[List[Tuple[str, str]]] = None, edge_filters: Optional[List[Tuple[str, str]]] = None, transform_config: str = None, source: Optional[List] = None, knowledge_sources: Optional[List[Tuple[str, str]]] = None, processes: int = 1, infores_catalog: Optional[str] = None) → None

Transform a Knowledge Graph from one serialization form to another.

Parameters
  • inputs (Optional[List[str]]) – A list of files that contain nodes/edges

  • input_format (Optional[str]) – The input format

  • input_compression (Optional[str]) – The input compression type

  • output (Optional[str]) – The output file

  • output_format (Optional[str]) – The output format

  • output_compression (Optional[str]) – The output compression type

  • stream (bool) – Whether to parse input as a stream

  • node_filters (Optional[List[Tuple[str, str]]]) – Node input filters

  • edge_filters (Optional[List[Tuple[str, str]]]) – Edge input filters

  • transform_config (Optional[str]) – The transform config YAML

  • source (Optional[List]) – A list of sources to load, as defined in the YAML

  • knowledge_sources (Optional[List[Tuple[str, str]]]) – A list of named knowledge sources with (string, boolean or tuple rewrite) specification

  • processes (int) – Number of processes to use

  • infores_catalog (Optional[str]) – Optional dump of a TSV file of InfoRes CURIE to Knowledge Source mappings (not yet available in transform_config calling mode)
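
The parameter list suggests two calling modes: a direct mode, where inputs and outputs are spelled out in the call, and a config mode driven by `transform_config`. The sketch below builds arguments for each. `transform_kwargs` and `run_transform` are hypothetical helpers, the file names are placeholders, and the rule that exactly one of `inputs` or `transform_config` is supplied is an assumption of this example rather than something stated on this page.

```python
from typing import Any, Dict, List, Optional, Tuple


def transform_kwargs(
    inputs: Optional[List[str]] = None,
    input_format: Optional[str] = None,
    output: Optional[str] = None,
    output_format: Optional[str] = None,
    node_filters: Optional[List[Tuple[str, str]]] = None,
    transform_config: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for transform (hypothetical helper)."""
    # Assumption: exactly one of inputs / transform_config should be given.
    if bool(inputs) == bool(transform_config):
        raise ValueError("Pass either inputs or transform_config, but not both")
    if transform_config:
        # Config mode: the YAML describes the sources and their outputs.
        return {"inputs": None, "transform_config": transform_config}
    # Direct mode: everything is spelled out in the call.
    return {
        "inputs": inputs,
        "input_format": input_format,
        "output": output,
        "output_format": output_format,
        "node_filters": node_filters,
    }


def run_transform(**kwargs: Any) -> None:
    """Run the transform; requires KGX to be installed."""
    # Lazy import keeps transform_kwargs usable without KGX.
    from kgx.cli.cli_utils import transform

    transform(**kwargs)


# Direct mode with one node filter given as a (property, value) tuple.
direct = transform_kwargs(
    inputs=["nodes.tsv", "edges.tsv"],
    input_format="tsv",
    output="graph.json",
    output_format="json",
    node_filters=[("category", "biolink:Gene")],
)
# Config mode: only the YAML path is needed.
from_config = transform_kwargs(transform_config="transform.yaml")
```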

kgx.cli.cli_utils.transform_source(key: str, source: Dict, output_directory: Optional[str], prefix_map: Dict[str, str] = None, node_property_predicates: Set[str] = None, predicate_mappings: Dict[str, str] = None, reverse_prefix_map: Dict = None, reverse_predicate_mappings: Dict = None, property_types: Dict = None, checkpoint: bool = False, preserve_graph: bool = True, stream: bool = False, infores_catalog: Optional[str] = None) → kgx.sink.sink.Sink

Transform a source from a transform config YAML.

Parameters
  • key (str) – Source key

  • source (Dict) – Source configuration

  • output_directory (Optional[str]) – Location to write output to

  • prefix_map (Dict[str, str]) – Non-canonical CURIE mappings

  • node_property_predicates (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)

  • predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)

  • reverse_prefix_map (Dict[str, str]) – Non-canonical CURIE mappings for export

  • reverse_predicate_mappings (Dict[str, str]) – A mapping of property names to predicate IRIs (This is applicable for RDF)

  • property_types (Dict[str, str]) – The XML Schema (xsd) type of each property whose type is other than xsd:string. Relevant for RDF export.

  • checkpoint (bool) – Whether to serialize each individual source to a TSV

  • preserve_graph (bool) – Whether to preserve the graph corresponding to the source

  • stream (bool) – Whether to parse input as a stream

  • infores_catalog (Optional[str]) – Optional dump of a TSV file of InfoRes CURIE to Knowledge Source mappings

Returns

An instance of Sink

Return type

kgx.sink.sink.Sink

kgx.cli.cli_utils.validate(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str], stream: bool, biolink_release: Optional[str] = None) → List

Run the KGX validator on a set of input files to check for Biolink Model compliance.

Parameters
  • inputs (List[str]) – Input files

  • input_format (str) – The input format

  • input_compression (Optional[str]) – The input compression type

  • output (Optional[str]) – Path to output file (stdout, by default)

  • stream (bool) – Whether to parse input as a stream

  • biolink_release (Optional[str]) – SemVer version of the Biolink Model release used for validation (default: latest Biolink Model Toolkit version)

Returns

A list of errors, if any

Return type

List
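
Since `validate` returns a list of errors, a script can turn that list into a process exit code. The sketch below shows one way to do so. `report_errors` and `validate_files` are hypothetical helpers, and the code makes no assumption about the type of the individual error items beyond their being printable.

```python
import sys
from typing import List, TextIO


def report_errors(errors: List, stream: TextIO = sys.stderr) -> int:
    """Print each validation error and return a process exit code."""
    for err in errors:
        print(err, file=stream)
    # Non-zero exit code signals that at least one error was reported.
    return 1 if errors else 0


def validate_files(paths: List[str], input_format: str = "tsv") -> int:
    """Validate the files with KGX and report the result."""
    # Lazy import keeps report_errors usable without KGX installed.
    from kgx.cli.cli_utils import validate

    errors = validate(
        inputs=paths,
        input_format=input_format,
        input_compression=None,
        output=None,   # documented default destination is stdout
        stream=False,
    )
    return report_errors(errors)
```

A caller could finish with `sys.exit(validate_files(["nodes.tsv", "edges.tsv"]))`, where the file names are placeholders.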