Source

A Source can be implemented for any file or any local or remote store that can contain a graph. A Source is responsible for reading nodes and edges from the graph.

A Source must subclass the kgx.source.source.Source class and implement the following methods:

  • parse

  • read_nodes

  • read_edges

parse method

  • Responsible for parsing a graph from a file/store

  • Must return a generator that iterates over the node and edge records from the graph

read_nodes method

  • Responsible for reading nodes from the file/store

  • Must return a generator that iterates over the node records

  • Each node record must be a 2-tuple (node_id, node_data) where,

    • node_id is the node CURIE

    • node_data is a dictionary that represents the node properties

read_edges method

  • Responsible for reading edges from the file/store

  • Must return a generator that iterates over the edge records

  • Each edge record must be a 4-tuple (subject_id, object_id, edge_key, edge_data) where,

    • subject_id is the subject node CURIE

    • object_id is the object node CURIE

    • edge_key is the unique key for the edge

    • edge_data is a dictionary that represents the edge properties
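The contract described above can be sketched as follows. This is a minimal, dependency-free illustration, not KGX code: InMemorySource is a hypothetical class that does not import KGX, and the nodes and edges arguments of parse, as well as the way the edge key is built, are assumptions made for the example. In real use the class would subclass kgx.source.source.Source.

```python
from typing import Any, Dict, Generator, List


class InMemorySource:
    """Hypothetical stand-in; a real source would subclass kgx.source.source.Source."""

    def __init__(self) -> None:
        self.nodes: List[Dict[str, Any]] = []
        self.edges: List[Dict[str, Any]] = []

    def parse(self, nodes: List[Dict], edges: List[Dict]) -> Generator:
        """Yield every node record, then every edge record."""
        self.nodes = nodes
        self.edges = edges
        yield from self.read_nodes()
        yield from self.read_edges()

    def read_nodes(self) -> Generator:
        """Yield 2-tuples of (node_id, node_data)."""
        for node in self.nodes:
            yield node["id"], node

    def read_edges(self) -> Generator:
        """Yield 4-tuples of (subject_id, object_id, edge_key, edge_data)."""
        for edge in self.edges:
            # Assumption: the edge key is built from subject, predicate and object.
            key = f"{edge['subject']}-{edge['predicate']}-{edge['object']}"
            yield edge["subject"], edge["object"], key, edge


source = InMemorySource()
records = list(
    source.parse(
        nodes=[
            {"id": "HGNC:11603", "name": "TBX4"},
            {"id": "MONDO:0005002", "name": "COPD"},
        ],
        edges=[
            {
                "subject": "HGNC:11603",
                "predicate": "biolink:related_to",
                "object": "MONDO:0005002",
            }
        ],
    )
)

# A consumer can tell the two record kinds apart by tuple length.
node_records = [r for r in records if len(r) == 2]
edge_records = [r for r in records if len(r) == 4]
```

Because parse yields a single stream of records, a consumer that needs nodes and edges separately can split them by tuple length, as the last two lines do.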

kgx.source.source

Base class for all Sources in KGX.

class kgx.source.source.Source[source]

Bases: object

A Source is responsible for reading data as records from a store where the store is a file or a database.

check_edge_filter(edge: Dict) → bool[source]

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool[source]

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool
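The idea behind check_node_filter and check_edge_filter can be sketched as below. This is an illustration under stated assumptions, not the KGX implementation: passes_filters is a hypothetical helper, and the rules that a record lacking a filtered key fails and that one matching value is enough to pass are assumptions.

```python
from typing import Any, Dict


def passes_filters(record: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Return True only if the record satisfies every filter."""
    for key, allowed in filters.items():
        # Assumption: a record without the filtered key does not pass.
        if key not in record:
            return False
        allowed_set = {allowed} if isinstance(allowed, str) else set(allowed)
        value = record[key]
        values = {value} if isinstance(value, str) else set(value)
        # The record passes this filter when at least one of its values is allowed.
        if not values & allowed_set:
            return False
    return True


node = {"id": "HGNC:11603", "category": ["biolink:Gene"]}
gene_filter = {"category": {"biolink:Gene", "biolink:Protein"}}
disease_filter = {"category": {"biolink:Disease"}}

keeps_gene = passes_filters(node, gene_filter)
keeps_disease = passes_filters(node, disease_filter)
```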

clear_graph_metadata()[source]

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str][source]

Return the InfoRes Context of the source

set_edge_filter(key: str, value: set) → None[source]

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None[source]

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)[source]

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None[source]

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
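The coupling between the ‘category’ node filter and the ‘subject_category’ and ‘object_category’ edge filters, as described in the notes for set_node_filter and set_edge_filter, can be sketched as follows. FilterState is a hypothetical class, not part of KGX, and whether KGX merges with or replaces previously set values is an assumption here.

```python
from typing import Any, Dict


class FilterState:
    """Hypothetical holder for node and edge filters (not a KGX class)."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, Any] = {}
        self.edge_filters: Dict[str, Any] = {}

    def set_node_filter(self, key: str, value: Any) -> None:
        if key == "category":
            # Category filters are expected to be sets.
            categories = set(value)
            self.node_filters[key] = categories
            # Mirror the category onto both edge endpoints.
            for edge_key in ("subject_category", "object_category"):
                self.edge_filters.setdefault(edge_key, set()).update(categories)
        else:
            self.node_filters[key] = value

    def set_edge_filter(self, key: str, value: Any) -> None:
        if key in ("subject_category", "object_category"):
            categories = set(value)
            self.edge_filters[key] = categories
            # Mirror the endpoint category onto the node filter.
            self.node_filters.setdefault("category", set()).update(categories)
        else:
            self.edge_filters[key] = value


state = FilterState()
state.set_node_filter("category", {"biolink:Gene"})
edge_filters_after_node_filter = {k: set(v) for k, v in state.edge_filters.items()}

state.set_edge_filter("object_category", {"biolink:Disease"})
node_categories = state.node_filters["category"]
```

Keeping the two filter sets in step means that every node referenced by a surviving edge also survives the node filter, which is the consistency the notes refer to.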

set_node_filters(filters: Dict) → None[source]

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)[source]

Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]

Update default prefix map.

Parameters

m (Dict) – A dictionary with prefix to IRI mappings
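What a prefix to IRI mapping is used for can be sketched as below. expand_curie is a hypothetical helper rather than a KGX function, and the IRIs are illustrative values; the sketch assumes that entries passed by the caller take precedence over the defaults.

```python
from typing import Dict


def expand_curie(curie: str, prefix_map: Dict[str, str]) -> str:
    """Expand a CURIE to an IRI, or return it unchanged if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + local_id
    return curie


default_map = {"HGNC": "http://identifiers.org/hgnc/"}
custom_map = {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}

# Assumption: updating the default map lets custom entries override defaults.
prefix_map = {**default_map, **custom_map}

mondo_iri = expand_curie("MONDO:0005002", prefix_map)
unknown = expand_curie("FOO:1", prefix_map)
```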

set_provenance_map(kwargs)[source]

Set up a provenance (Knowledge Source to InfoRes) map

kgx.source.graph_source

GraphSource is responsible for reading from an instance of kgx.graph.base_graph.BaseGraph and must use only the methods exposed by BaseGraph to access the graph.

class kgx.source.graph_source.GraphSource[source]

Bases: kgx.source.source.Source

GraphSource is responsible for reading data as records from an in memory graph representation.

The underlying store must be an instance of kgx.graph.base_graph.BaseGraph

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(graph: kgx.graph.base_graph.BaseGraph, **kwargs: Any) → Generator[source]

This method reads from a graph and yields records.

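Parameters
  • graph (kgx.graph.base_graph.BaseGraph) – The graph to read from

  • kwargs (Any) – Any additional arguments

Returns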

A generator for node and edge records read from the graph

Return type

Generator

read_edges() → Generator[source]

Read edges as records from the graph.

Returns

A generator for edges

Return type

Generator

read_nodes() → Generator[source]

Read nodes as records from the graph.

Returns

A generator for nodes

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None

Update default prefix map.

Parameters

m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

kgx.source.tsv_source

TsvSource is responsible for reading KGX formatted CSV or TSV files using Pandas. Every flat file is treated as a Pandas DataFrame, from which data are read in chunks.

KGX expects two separate files - one for nodes and another for edges.
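Chunked reading of a flat file can be sketched as follows. To keep the example free of third-party dependencies, the standard library csv module is swapped in for Pandas, so this shows the chunking idea only and is not how TsvSource is implemented. read_in_chunks is a hypothetical helper, and the sample rows and column names are illustrative.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO

NODES_TSV = (
    "id\tname\tcategory\n"
    "HGNC:11603\tTBX4\tbiolink:Gene\n"
    "MONDO:0005002\tCOPD\tbiolink:Disease\n"
    "HGNC:1100\tBRCA1\tbiolink:Gene\n"
)


def read_in_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


chunks = list(read_in_chunks(io.StringIO(NODES_TSV), chunk_size=2))

# Each row becomes a (node_id, node_data) record.
node_records = [(row["id"], row) for chunk in chunks for row in chunk]
```

Reading in chunks bounds memory use by the chunk size instead of the file size; an edges file would be processed in the same way.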

class kgx.source.tsv_source.TsvSource[source]

Bases: kgx.source.source.Source

TsvSource is responsible for reading data as records from a TSV/CSV.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from a TSV/CSV and yields records.

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (tsv, csv)

  • compression (Optional[str]) – The compression type (tar, tar.gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple][source]

Load an edge into an instance of BaseGraph.

Parameters

edge (Dict) – An edge

Returns

A tuple that contains subject id, object id, edge key, and edge data

Return type

Optional[Tuple]

read_edges(df: pandas.core.frame.DataFrame) → Generator[source]

Load edges from pandas.DataFrame into an instance of BaseGraph.

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent edges

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]][source]

Prepare a node.

Parameters

node (Dict) – A node

Returns

A tuple that contains node id and node data

Return type

Optional[Tuple[str, Dict]]

read_nodes(df: pandas.core.frame.DataFrame) → Generator[source]

Read records from pandas.DataFrame and yield records.

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent nodes

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]

Add or override default prefix to IRI map.

Parameters

m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None[source]

Add or override default IRI to prefix map.

Parameters

m (Dict) – IRI to prefix map
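An IRI to prefix map supports the opposite direction, turning an IRI back into a CURIE. The sketch below illustrates that use; contract_iri is a hypothetical helper, not a KGX function, and trying the longest namespace first is a design choice made for this example.

```python
from typing import Dict


def contract_iri(iri: str, reverse_prefix_map: Dict[str, str]) -> str:
    """Contract an IRI to a CURIE, or return it unchanged if no namespace matches."""
    # Try the longest namespace first so nested namespaces resolve correctly.
    for namespace in sorted(reverse_prefix_map, key=len, reverse=True):
        if iri.startswith(namespace):
            return f"{reverse_prefix_map[namespace]}:{iri[len(namespace):]}"
    return iri


reverse_prefix_map = {
    "http://purl.obolibrary.org/obo/": "OBO",
    "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
}

curie = contract_iri("http://purl.obolibrary.org/obo/MONDO_0005002", reverse_prefix_map)
```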

kgx.source.json_source

JsonSource is responsible for reading data from a KGX formatted JSON using the ijson library, which allows for streaming data from the file.

class kgx.source.json_source.JsonSource[source]

Bases: kgx.source.tsv_source.TsvSource

JsonSource is responsible for reading data as records from a JSON.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from a JSON and yields records.

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (json)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records read from the file

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple]

Load an edge into an instance of BaseGraph.

Parameters

edge (Dict) – An edge

Returns

A tuple that contains subject id, object id, edge key, and edge data

Return type

Optional[Tuple]

read_edges(filename: str) → Generator[source]

Read edge records from a JSON.

Parameters

filename (str) – The filename to read from

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]

Prepare a node.

Parameters

node (Dict) – A node

Returns

A tuple that contains node id and node data

Return type

Optional[Tuple[str, Dict]]

read_nodes(filename: str) → Generator[source]

Read node records from a JSON.

Parameters

filename (str) – The filename to read from

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None

Add or override default prefix to IRI map.

Parameters

m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None

Add or override default IRI to prefix map.

Parameters

m (Dict) – IRI to prefix map

kgx.source.jsonl_source

JsonlSource is responsible for reading data from a KGX formatted JSON Lines using the jsonlines library.

KGX expects two separate JSON Lines files - one for nodes and another for edges.
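Reading a nodes file and an edges file in JSON Lines format can be sketched as below. The standard library json module is swapped in for the jsonlines library to keep the example dependency-free, so this illustrates the file layout and not the JsonlSource implementation. read_jsonl is a hypothetical helper and the sample records are illustrative.

```python
import io
import json
from typing import Dict, Iterator, TextIO

NODES_JSONL = (
    '{"id": "HGNC:11603", "name": "TBX4"}\n'
    '{"id": "MONDO:0005002", "name": "COPD"}\n'
)
EDGES_JSONL = (
    '{"subject": "HGNC:11603", "predicate": "biolink:related_to", '
    '"object": "MONDO:0005002"}\n'
)


def read_jsonl(handle: TextIO) -> Iterator[Dict]:
    """Yield one decoded JSON object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# One file holds nodes and a separate file holds edges.
nodes = [(obj["id"], obj) for obj in read_jsonl(io.StringIO(NODES_JSONL))]
edges = list(read_jsonl(io.StringIO(EDGES_JSONL)))
```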

class kgx.source.jsonl_source.JsonlSource[source]

Bases: kgx.source.json_source.JsonSource

JsonlSource is responsible for reading data as records from JSON Lines.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(filename: str, format: str = 'jsonl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from JSON Lines and yields records.

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (jsonl)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple]

Load an edge into an instance of BaseGraph.

Parameters

edge (Dict) – An edge

Returns

A tuple that contains subject id, object id, edge key, and edge data

Return type

Optional[Tuple]

read_edges(filename: str) → Generator

Read edge records from a JSON.

Parameters

filename (str) – The filename to read from

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]

Prepare a node.

Parameters

node (Dict) – A node

Returns

A tuple that contains node id and node data

Return type

Optional[Tuple[str, Dict]]

read_nodes(filename: str) → Generator

Read node records from a JSON.

Parameters

filename (str) – The filename to read from

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None

Add or override default prefix to IRI map.

Parameters

m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None

Add or override default IRI to prefix map.

Parameters

m (Dict) – IRI to prefix map

kgx.source.trapi_source

TrapiSource is responsible for reading data from a Translator Reasoner API formatted JSON.

class kgx.source.trapi_source.TrapiSource[source]

Bases: kgx.source.json_source.JsonSource

TrapiSource is responsible for reading data as records from a TRAPI JSON.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

load_edge(edge: Dict) → Tuple[str, str, str, Dict][source]

Load an edge into an instance of BaseGraph

Note

This method transforms Reasoner Std API format fields to Biolink Model fields.

Parameters

edge (Dict) – An edge

load_node(node: Dict) → Tuple[str, Dict][source]

Load a node into an instance of BaseGraph

Note

This method transforms Reasoner Std API format fields to Biolink Model fields.

Parameters

node (Dict) – A node

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from a JSON and yields records.

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (trapi-json)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple]

Load an edge into an instance of BaseGraph.

Parameters

edge (Dict) – An edge

Returns

A tuple that contains subject id, object id, edge key, and edge data

Return type

Optional[Tuple]

read_edges(filename: str, compression: Optional[str] = None) → Generator[source]

Read edge records from a JSON.

Parameters
  • filename (str) – The filename to read from

  • compression (Optional[str]) – The compression type

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]

Prepare a node.

Parameters

node (Dict) – A node

Returns

A tuple that contains node id and node data

Return type

Optional[Tuple[str, Dict]]

read_nodes(filename: str, compression: Optional[str] = None) → Generator[source]

Read node records from a JSON.

Parameters
  • filename (str) – The filename to read from

  • compression (Optional[str]) – The compression type

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None

Add or override default prefix to IRI map.

Parameters

m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None

Add or override default IRI to prefix map.

Parameters

m (Dict) – IRI to prefix map

kgx.source.obograph_source

ObographSource is responsible for reading data from OBOGraphs in JSON.

class kgx.source.obograph_source.ObographSource[source]

Bases: kgx.source.json_source.JsonSource

ObographSource is responsible for reading data as records from an OBO Graph JSON.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_category(curie: str, node: dict) → Optional[str][source]

Get category for a given CURIE.

Parameters
  • curie (str) – CURIE for the node

  • node (dict) – Node data

Returns

Category for the given node CURIE.

Return type

Optional[str]

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from JSON and yields records.

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (json)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

parse_meta(node: str, meta: Dict) → Dict[source]

Parse ‘meta’ field of a node.

Parameters
  • node (str) – Node identifier

  • meta (Dict) – meta dictionary for the node

Returns

A dictionary that contains ‘description’, ‘synonyms’, ‘xrefs’, and ‘equivalent_nodes’.

Return type

Dict
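How a ‘meta’ block of an OBO Graph JSON node might be flattened can be sketched as below. parse_meta_sketch is a hypothetical function and not the KGX implementation. It assumes the usual OBO Graphs layout in which ‘definition’ is an object with a ‘val’ field and ‘synonyms’ and ‘xrefs’ are lists of objects with a ‘val’ field. ‘equivalent_nodes’ is left out because this section does not describe how it is derived, and the sample values are illustrative.

```python
from typing import Any, Dict


def parse_meta_sketch(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten selected parts of an OBO Graph 'meta' block into node properties."""
    properties: Dict[str, Any] = {}
    if "definition" in meta:
        properties["description"] = meta["definition"].get("val")
    if "synonyms" in meta:
        properties["synonyms"] = [s["val"] for s in meta["synonyms"] if "val" in s]
    if "xrefs" in meta:
        properties["xrefs"] = [x["val"] for x in meta["xrefs"] if "val" in x]
    return properties


meta = {
    "definition": {"val": "A disease of the lung."},
    "synonyms": [{"pred": "hasExactSynonym", "val": "lung disease"}],
    "xrefs": [{"val": "DOID:850"}],
}

node_properties = parse_meta_sketch(meta)
```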

read_edge(edge: Dict) → Dict[source]

Read and parse an edge record.

Parameters

edge (Dict) – The edge record

Returns

The processed edge

Return type

Dict

read_edges(filename: str, compression: Optional[str] = None) → Generator[source]

Read edge records from a JSON.

Parameters
  • filename (str) – The filename to read from

  • compression (Optional[str]) – The compression type

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Dict[source]

Read and parse a node record.

Parameters

node (Dict) – The node record

Returns

The processed node

Return type

Dict

read_nodes(filename: str, compression: Optional[str] = None) → Generator[source]

Read node records from a JSON.

Parameters
  • filename (str) – The filename to read from

  • compression (Optional[str]) – The compression type

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None

Add or override default prefix to IRI map.

Parameters

m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None

Add or override default IRI to prefix map.

Parameters

m (Dict) – IRI to prefix map

kgx.source.sssom_source

SssomSource is responsible for reading data from SSSOM formatted files.

KGX Source for Simple Standard for Sharing Ontology Mappings (“SSSOM”)

class kgx.source.sssom_source.SssomSource[source]

Bases: kgx.source.source.Source

SssomSource is responsible for reading data as records from an SSSOM file.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

load_edge(edge: Dict) → Generator[source]

Load an edge into an instance of BaseGraph

Parameters

edge (Dict) – An edge

Returns

A generator for node and edge records

Return type

Generator

load_edges(df: pandas.core.frame.DataFrame) → Generator[source]

Load edges from pandas.DataFrame into an instance of BaseGraph

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent edges

Returns

A generator for edge records

Return type

Generator

load_node(node: Dict) → Tuple[str, Dict][source]

Load a node into an instance of BaseGraph

Parameters

node (Dict) – A node

Returns

A tuple that contains node id and node data

Return type

Optional[Tuple[str, Dict]]

parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]

Parse an SSSOM TSV

Parameters
  • filename (str) – File to read from

  • format (str) – The input file format (tsv, by default)

  • compression (Optional[str]) – The compression (gz)

  • kwargs (Dict) – Any additional arguments

Returns

A generator for node and edge records

Return type

Generator
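
As a rough illustration of consuming such a generator, the sketch below separates node records from edge records by tuple length, relying on the record shapes described for Source (2-tuples for nodes, 4-tuples for edges). The `fake_parse` generator, the `split_records` helper, and the `EX:` identifiers are made up for this example and stand in for a real `parse` call.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def fake_parse() -> Iterator[Tuple[Any, ...]]:
    """Hypothetical stand-in for the generator returned by a Source's parse()."""
    yield ("EX:1", {"id": "EX:1", "name": "first node"})
    yield ("EX:2", {"id": "EX:2", "name": "second node"})
    yield (
        "EX:1",
        "EX:2",
        "edge-1",
        {"subject": "EX:1", "predicate": "skos:exactMatch", "object": "EX:2"},
    )


def split_records(
    records: Iterable[Tuple[Any, ...]]
) -> Tuple[Dict[str, Dict], List[Tuple[Any, ...]]]:
    """Sort records into nodes (2-tuples) and edges (4-tuples)."""
    nodes: Dict[str, Dict] = {}
    edges: List[Tuple[Any, ...]] = []
    for record in records:
        if len(record) == 2:
            node_id, node_data = record
            nodes[node_id] = node_data
        elif len(record) == 4:
            edges.append(record)
        else:
            raise ValueError(f"Unexpected record shape: {record!r}")
    return nodes, edges


nodes, edges = split_records(fake_parse())
```

Because the records arrive lazily, a consumer like this can also process them one at a time instead of collecting them, which keeps memory use low for large files.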

parse_header(filename: str, compression: Optional[str] = None) → None[source]

Parse metadata from SSSOM headers.

Parameters
  • filename (str) – Filename to parse

  • compression (Optional[str]) – Compression type

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
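
The interplay described in the note can be pictured with a small model. This is a simplified sketch under stated assumptions, not the KGX implementation: the `FilterSketch` class is hypothetical, and the matching rule (a node passes when each filtered property shares at least one value with the filter) is an assumption.

```python
from typing import Dict, Set, Union


class FilterSketch:
    """Simplified, hypothetical model of node and edge filters."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, Set[str]] = {}
        self.edge_filters: Dict[str, Set[str]] = {}

    def set_node_filter(self, key: str, value: Union[str, Set[str]]) -> None:
        values = {value} if isinstance(value, str) else set(value)
        self.node_filters.setdefault(key, set()).update(values)
        if key == "category":
            # Mirror the category filter onto both ends of each edge so that
            # nodes and edges describe a consistent subgraph
            for edge_key in ("subject_category", "object_category"):
                self.edge_filters.setdefault(edge_key, set()).update(values)

    def check_node_filter(self, node: Dict) -> bool:
        for key, allowed in self.node_filters.items():
            actual = node.get(key)
            if actual is None:
                return False
            actual_values = {actual} if isinstance(actual, str) else set(actual)
            if not (actual_values & allowed):
                return False
        return True


sketch = FilterSketch()
sketch.set_node_filter("category", {"biolink:Gene"})
gene_passes = sketch.check_node_filter({"id": "EX:1", "category": ["biolink:Gene"]})
disease_passes = sketch.check_node_filter({"id": "EX:2", "category": ["biolink:Disease"]})
```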

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]

Add or override default prefix to IRI map.

Parameters

m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None[source]

Add or override default IRI to prefix map.

Parameters

m (Dict) – IRI to prefix map
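
To make the two maps concrete, the sketch below shows how a prefix-to-IRI map expands a CURIE and how the reverse map contracts an IRI back into a CURIE. The `expand` and `contract` helpers and the `EX` prefix are hypothetical and only illustrate the idea. They are not KGX functions.

```python
from typing import Dict, Optional

# Made-up example maps
prefix_map: Dict[str, str] = {"EX": "http://example.org/"}
reverse_prefix_map: Dict[str, str] = {iri: prefix for prefix, iri in prefix_map.items()}


def expand(curie: str, pm: Dict[str, str]) -> Optional[str]:
    """Turn a CURIE such as 'EX:123' into a full IRI, if the prefix is known."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in pm:
        return None
    return pm[prefix] + local_id


def contract(iri: str, rpm: Dict[str, str]) -> Optional[str]:
    """Turn a full IRI into a CURIE, preferring the longest matching namespace."""
    for namespace in sorted(rpm, key=len, reverse=True):
        if iri.startswith(namespace):
            return f"{rpm[namespace]}:{iri[len(namespace):]}"
    return None


iri = expand("EX:123", prefix_map)
curie = contract("http://example.org/123", reverse_prefix_map)
```

Entries passed to `set_prefix_map` or `set_reverse_prefix_map` would play the role of the example dictionaries here, adding to or overriding the defaults.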

kgx.source.neo_source

NeoSource is responsible for reading data from a local or remote Neo4j instance.

class kgx.source.neo_source.NeoSource[source]

Bases: kgx.source.source.Source

NeoSource is responsible for reading data as records from a Neo4j instance.

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

count(is_directed: bool = True) → int[source]

Get the total count of records to be fetched from the Neo4j database.

Parameters

is_directed (bool) – Whether edges are directed or undirected (True, by default, since edges in most cases are directed)

Returns

The total count of records

Return type

int

static format_edge_filter(edge_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]

Get the value for edge filter as defined by key. This is used as a convenience method for generating cypher queries.

Parameters
  • edge_filters (Dict) – All edge filters

  • key (str) – Name of the edge filter

  • variable (Optional[str]) – Variable binding for cypher query

  • prefix (Optional[str]) – Prefix for the cypher

  • op (Optional[str]) – The operator

Returns

Value corresponding to the given edge filter key, formatted for CQL

Return type

str

static format_node_filter(node_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]

Get the value for node filter as defined by key. This is used as a convenience method for generating cypher queries.

Parameters
  • node_filters (Dict) – All node filters

  • key (str) – Name of the node filter

  • variable (Optional[str]) – Variable binding for cypher query

  • prefix (Optional[str]) – Prefix for the cypher

  • op (Optional[str]) – The operator

Returns

Value corresponding to the given node filter key, formatted for CQL

Return type

str

get_edges(skip: int = 0, limit: int = 0, is_directed: bool = True, **kwargs: Any) → List[source]

Get a page of edges from the Neo4j database.

Parameters
  • skip (int) – Records to skip

  • limit (int) – Total number of records to query for

  • is_directed (bool) – Are edges directed or undirected (True, by default, since edges in most cases are directed)

  • kwargs (Any) – Any additional arguments

Returns

A list of 3-tuples

Return type

List

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

get_nodes(skip: int = 0, limit: int = 0, **kwargs: Any) → List[source]

Get a page of nodes from the Neo4j database.

Parameters
  • skip (int) – Records to skip

  • limit (int) – Total number of records to query for

  • kwargs (Any) – Any additional arguments

Returns

A list of nodes

Return type

List

get_pages(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Any) → Iterator[source]

Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start)/page_size.

Parameters
  • query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges

  • start (int) – Start for pagination

  • end (Optional[int]) – End for pagination

  • page_size (int) – Size of each page (50000, by default)

  • kwargs (Dict) – Any additional arguments that might be relevant for query_function

Returns

An iterator for a list of records from Neo4j. The size of the list is page_size

Return type

Iterator
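
The paging pattern can be sketched as follows. This is a minimal illustration, not the NeoSource code: `get_pages_sketch` and `fake_query` are hypothetical, the `skip`/`limit` keywords follow the documented signatures of `get_nodes` and `get_edges`, and the stopping rule (an empty or short page means the store is exhausted) is an assumption.

```python
from typing import Any, Callable, Iterator, List, Optional


def get_pages_sketch(
    query_function: Callable[..., List],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List]:
    """Yield successive pages of records fetched via query_function."""
    offset = start
    while end is None or offset < end:
        # Never request records beyond `end` when an end is given
        limit = page_size if end is None else min(page_size, end - offset)
        page = query_function(skip=offset, limit=limit, **kwargs)
        if not page:
            break
        yield page
        if len(page) < limit:
            break  # a short page means there is nothing left to fetch
        offset += limit


DATA = list(range(10))  # made-up records standing in for a database


def fake_query(skip: int = 0, limit: int = 0, **kwargs: Any) -> List[int]:
    """Hypothetical query function backed by an in-memory list."""
    return DATA[skip : skip + limit]


pages = list(get_pages_sketch(fake_query, page_size=4))
window = list(get_pages_sketch(fake_query, start=2, end=7, page_size=3))
```

Yielding one page at a time means only `page_size` records need to be held in memory, whatever the total size of the result set.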

load_edge(edge_record: List) → Tuple[source]

Load an edge into an instance of BaseGraph

Parameters

edge_record (List) – A 4-tuple edge record

Returns

A tuple with subject ID, object ID, edge key, and edge data

Return type

Tuple

load_edges(edges: List) → None[source]

Load edges into an instance of BaseGraph

Parameters

edges (List) – A list of edge records

load_node(node: Dict) → Tuple[source]

Load a node into an instance of BaseGraph

Parameters

node (Dict) – A node

Returns

A tuple with node ID and node data

Return type

Tuple

load_nodes(nodes: List) → None[source]

Load nodes into an instance of BaseGraph

Parameters

nodes (List) – A list of nodes

parse(uri: str, username: str, password: str, node_filters: Dict = None, edge_filters: Dict = None, start: int = 0, end: int = None, is_directed: bool = True, page_size: int = 50000, **kwargs: Any) → Generator[source]

This method reads from a Neo4j instance and yields records.

Parameters
  • uri (str) – The URI for the Neo4j instance. For example, http://localhost:7474

  • username (str) – The username

  • password (str) – The password

  • node_filters (Dict) – Node filters

  • edge_filters (Dict) – Edge filters

  • start (int) – Number of records to skip before streaming

  • end (int) – Total number of records to fetch

  • is_directed (bool) – Whether or not the edges should be treated as directed

  • page_size (int) – The size of each page/batch fetched from Neo4j (50000)

  • kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_provenance(node_data)

Set a specific node provenance value.

set_prefix_map(m: Dict) → None

Update default prefix map.

Parameters

m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

kgx.source.rdf_source

RdfSource is responsible for reading data from RDF N-Triples.

This source makes use of a custom kgx.parsers.ntriples_parser.CustomNTriplesParser for parsing N-Triples, which extends rdflib.plugins.parsers.ntriples.NTriplesParser.

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

sort -k 1,2 -t ' ' data.nt > data_sorted.nt

class kgx.source.rdf_source.RdfSource[source]

Bases: kgx.source.source.Source

RdfSource is responsible for reading data as records from RDF.

Note

Currently only RDF N-Triples are supported.
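
One way to see why sorting by subject helps: once all triples for a subject are adjacent, a reader can handle one subject at a time and release it before moving on. The sketch below demonstrates that property with `itertools.groupby`. It is an illustration under stated assumptions, not the parser used by RdfSource, and the helper names and sample triples are made up. Python's `sorted` orders whole lines, which differs slightly from the `sort` command but still places equal subjects next to each other.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Made-up N-Triples lines in which the subject <.../a> is not contiguous
lines = [
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "<http://example.org/b> <http://example.org/p> <http://example.org/c> .",
    '<http://example.org/a> <http://example.org/q> "label" .',
]


def subject_of(line: str) -> str:
    """Return the subject term, i.e. everything before the first space."""
    return line.split(" ", 1)[0]


def group_by_subject(ordered: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subject, triples) for each run of adjacent lines sharing a subject."""
    for subject, group in groupby(ordered, key=subject_of):
        yield subject, list(group)


unsorted_groups = list(group_by_subject(lines))
sorted_groups = list(group_by_subject(sorted(lines)))
```

With the unsorted input the same subject turns up in two separate runs, so a reader would have to keep every subject cached until the end of the file. With sorted input each subject appears exactly once.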

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict[source]

Add an edge to cache.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict[source]

Add a node to cache.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None[source]

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict
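
The multi-valued handling mentioned above can be sketched like this. It is a simplified illustration, not the RdfSource logic: the `add_attribute` helper is hypothetical, and the `MULTIVALUED` set of property names is an assumption chosen for the example.

```python
from typing import Any, Dict, List, Set

# Assumed set of multi-valued property names, for illustration only
MULTIVALUED: Set[str] = {"category", "synonym"}


def add_attribute(node: Dict[str, Any], key: str, value: Any) -> Dict[str, Any]:
    """Add a value to a node, accumulating it when the key is multi-valued."""
    if key in MULTIVALUED:
        existing = node.get(key, [])
        if not isinstance(existing, list):
            existing = [existing]
        new_values: List[Any] = value if isinstance(value, list) else [value]
        for item in new_values:
            if item not in existing:  # avoid duplicates
                existing.append(item)
        node[key] = existing
    else:
        # Single-valued properties are simply overwritten
        node[key] = value
    return node


node: Dict[str, Any] = {"id": "EX:1"}
add_attribute(node, "category", "biolink:Gene")
add_attribute(node, "category", ["biolink:Gene", "biolink:NamedThing"])
add_attribute(node, "name", "first name")
add_attribute(node, "name", "second name")
```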

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None[source]

Dereify a node to create a corresponding edge.

Parameters
  • n (str) – Node identifier

  • node (Dict) – Node data
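
Dereification can be pictured as turning a node that describes a statement (with subject, predicate, and object properties) into an edge record. The sketch below is an illustration under stated assumptions rather than the RdfSource method: `dereify_sketch` is hypothetical, it returns the edge record instead of updating an internal cache, and the edge key format is made up.

```python
from typing import Dict, Optional, Tuple

EdgeRecord = Tuple[str, str, str, Dict]


def dereify_sketch(node_id: str, node: Dict) -> Optional[EdgeRecord]:
    """Build an edge record from a reified statement node, if it is complete."""
    required = ("subject", "predicate", "object")
    if not all(key in node for key in required):
        return None  # not a reified statement
    edge_data = dict(node)  # copy so the original node is left untouched
    edge_data.setdefault("id", node_id)
    # Assumed key format, chosen only for this example
    edge_key = f"{node['subject']}-{node['predicate']}-{node['object']}"
    return node["subject"], node["object"], edge_key, edge_data


reified = {
    "subject": "EX:1",
    "predicate": "biolink:related_to",
    "object": "EX:2",
    "relation": "skos:relatedMatch",
}
edge = dereify_sketch("EX:statement1", reified)
not_an_edge = dereify_sketch("EX:3", {"name": "plain node"})
```

The resulting tuple has the same 4-tuple shape as the edge records a Source yields.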

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from RDF N-Triples and yields records.

Note

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

`sort -k 1,2 -t ' ' data.nt > data_sorted.nt`

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (nt)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

process_predicate(p: Union[rdflib.term.URIRef, str, None]) → Tuple[source]

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_property_predicates(predicates) → None[source]

Set predicates that are to be treated as node properties.

Parameters

predicates (Set) – Set of predicates

set_node_provenance(node_data)

Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None[source]

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None

Update default prefix map.

Parameters

m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None[source]

Parse a triple.

Parameters
  • s (URIRef) – Subject

  • p (URIRef) – Predicate

  • o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict[source]

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict[source]

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

kgx.source.owl_source

OwlSource is responsible for parsing an OWL ontology.

When parsing an OWL ontology, this source also adds OwlStar annotations to certain OWL axioms.

class kgx.source.owl_source.OwlSource[source]

Bases: kgx.source.rdf_source.RdfSource

OwlSource is responsible for parsing an OWL ontology.

Note

This is a simple parser that loads direct class-class relationships. For more formal OWL parsing, refer to ROBOT: http://robot.obolibrary.org/

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

Add an edge to cache.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

Add a node to cache.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None

Dereify a node to create a corresponding edge.

Parameters
  • n (str) – Node identifier

  • node (Dict) – Node data

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

load_graph(rdfgraph: rdflib.graph.Graph, **kwargs: Any) → None[source]

Walk through the rdflib.Graph and load all triples into kgx.graph.base_graph.BaseGraph

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • kwargs (Any) – Any additional arguments

parse(filename: str, format: str = 'owl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]

This method reads from an OWL ontology file and yields records.

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (owl)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records read from the file

Return type

Generator

process_predicate(p: Union[rdflib.term.URIRef, str, None]) → Tuple

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_property_predicates(predicates) → None

Set predicates that are to be treated as node properties.

Parameters

predicates (Set) – Set of predicates

set_node_provenance(node_data)

Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None

Update default prefix map.

Parameters

m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None

Parse a triple.

Parameters
  • s (URIRef) – Subject

  • p (URIRef) – Predicate

  • o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

kgx.source.sparql_source

SparqlSource has yet to be implemented.

In principle, SparqlSource should be able to read data from a local or remote SPARQL endpoint.

class kgx.source.sparql_source.SparqlSource[source]

Bases: kgx.source.rdf_source.RdfSource

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

Add an edge to cache.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

Add a node to cache.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

clear_graph_metadata()

Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None

Dereify a node to create a corresponding edge.

Parameters
  • n (str) – Node identifier

  • node (Dict) – Node data

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

get_infores_catalog() → Dict[str, str]

Return the InfoRes Context of the source

parse(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator

This method reads from RDF N-Triples and yields records.

Note

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

`sort -k 1,2 -t ' ' data.nt > data_sorted.nt`

Parameters
  • filename (str) – The filename to parse

  • format (str) – The format (nt)

  • compression (Optional[str]) – The compression type (gz)

  • kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

process_predicate(p: Union[rdflib.term.URIRef, str, None]) → Tuple

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None

Set edge filters.

Parameters

filters (Dict) – Edge filters

set_edge_provenance(edge_data)

Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None

Set node filters.

Parameters

filters (Dict) – Node filters

set_node_property_predicates(predicates) → None

Set predicates that are to be treated as node properties.

Parameters

predicates (Set) – Set of predicates

set_node_provenance(node_data)

Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None

Update default prefix map.

Parameters

m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)

Set up a provenance (Knowledge Source to InfoRes) map

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None

Parse a triple.

Parameters
  • s (URIRef) – Subject

  • p (URIRef) – Predicate

  • o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict