Source

A Source can be implemented for any file, or any local and/or remote store, that can contain a graph. A Source is responsible for reading nodes and edges from the graph.

A source must subclass the kgx.source.source.Source class and implement the following methods:

parse
read_nodes
read_edges

parse method

Responsible for parsing a graph from a file/store
Must return a generator that iterates over the node and edge records from the graph

read_nodes method

Responsible for reading nodes from the file/store
Must return a generator that iterates over node records
Each node record must be a 2-tuple (node_id, node_data), where:
- node_id is the node CURIE
- node_data is a dictionary that represents the node properties

read_edges method

Responsible for reading edges from the file/store
Must return a generator that iterates over edge records
Each edge record must be a 4-tuple (subject_id, object_id, edge_key, edge_data), where:
- subject_id is the subject node CURIE
- object_id is the object node CURIE
- edge_key is the unique key for the edge
- edge_data is a dictionary that represents the edge properties
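The contract above can be sketched as follows. This is a minimal illustration rather than KGX code: `BaseSource` is a stand-in for kgx.source.source.Source so that the snippet runs without KGX installed, and `DictSource`, the `EX:` identifiers, and the edge key format are hypothetical.

```python
from typing import Any, Dict, Generator, Tuple

NodeRecord = Tuple[str, Dict[str, Any]]
EdgeRecord = Tuple[str, str, str, Dict[str, Any]]


class BaseSource:
    """Stand-in for kgx.source.source.Source so the sketch runs without KGX."""


class DictSource(BaseSource):
    """Hypothetical source that reads a graph held in a plain dictionary."""

    def parse(self, store: Dict[str, list], **kwargs: Any) -> Generator:
        # Keep a reference to the store, then yield nodes followed by edges.
        self.store = store
        yield from self.read_nodes()
        yield from self.read_edges()

    def read_nodes(self) -> Generator[NodeRecord, None, None]:
        for node in self.store.get("nodes", []):
            # 2-tuple: (node_id, node_data)
            yield node["id"], node

    def read_edges(self) -> Generator[EdgeRecord, None, None]:
        for edge in self.store.get("edges", []):
            # The edge key only has to be unique; this format is made up.
            key = f"{edge['subject']}-{edge['predicate']}-{edge['object']}"
            # 4-tuple: (subject_id, object_id, edge_key, edge_data)
            yield edge["subject"], edge["object"], key, edge


graph = {
    "nodes": [
        {"id": "EX:gene1", "category": ["biolink:Gene"]},
        {"id": "EX:disease1", "category": ["biolink:Disease"]},
    ],
    "edges": [
        {"subject": "EX:gene1", "predicate": "biolink:related_to", "object": "EX:disease1"},
    ],
}

records = list(DictSource().parse(graph))
```

A consumer can tell node records from edge records by tuple length: node records have two elements and edge records have four.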

kgx.source.source

Base class for all Sources in KGX.

class kgx.source.source.Source[source]¶

Bases: object

A Source is responsible for reading data as records from a store where the store is a file or a database.

check_edge_filter(edge: Dict) → bool[source]¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool[source]¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool
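The page does not spell out how a record is compared against the filters. One plausible reading, sketched below, is that a record passes when, for every filter key, at least one of its values appears in the allowed set. The function name `passes_filters` and this matching rule are assumptions for illustration, not the KGX implementation.

```python
from typing import Any, Dict, Set


def passes_filters(record: Dict[str, Any], filters: Dict[str, Set[str]]) -> bool:
    """Return True only if the record satisfies every filter.

    Assumed rule: for each filter key, the record must carry at least one
    value that appears in the filter's set of allowed values.
    """
    for key, allowed in filters.items():
        value = record.get(key)
        if value is None:
            return False
        # Normalise scalars and collections to a set so both can be compared.
        values = set(value) if isinstance(value, (list, set, tuple)) else {value}
        if not values & allowed:
            return False
    return True
```

With no filters defined, every record passes, which matches the idea that filters only narrow the search space.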

clear_graph_metadata()[source]¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str][source]¶: Return the InfoRes Context of the source

set_edge_filter(key: str, value: set) → None[source]¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None[source]¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)[source]¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None[source]¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None[source]¶

Set node filters.

Parameters: filters (Dict) – Node filters
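The notes on set_node_filter and set_edge_filter describe a coupling between node and edge filters. The sketch below illustrates that coupling with a hypothetical `FilterStore` class; how KGX stores and merges filter values internally is an assumption here.

```python
from typing import Dict, Set, Union


class FilterStore:
    """Hypothetical holder for node and edge filters (not the KGX class)."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, Set[str]] = {}
        self.edge_filters: Dict[str, Set[str]] = {}

    @staticmethod
    def _as_set(value: Union[str, Set[str]]) -> Set[str]:
        # Accept either a single string or a set of strings.
        return {value} if isinstance(value, str) else set(value)

    def set_node_filter(self, key: str, value: Union[str, Set[str]]) -> None:
        values = self._as_set(value)
        self.node_filters.setdefault(key, set()).update(values)
        if key == "category":
            # Mirror the category onto both ends of every edge.
            for edge_key in ("subject_category", "object_category"):
                self.edge_filters.setdefault(edge_key, set()).update(values)

    def set_edge_filter(self, key: str, value: Union[str, Set[str]]) -> None:
        values = self._as_set(value)
        self.edge_filters.setdefault(key, set()).update(values)
        if key in ("subject_category", "object_category"):
            # Keep the node filter in step with the edge filter.
            self.node_filters.setdefault("category", set()).update(values)


store = FilterStore()
store.set_node_filter("category", {"biolink:Gene"})
store.set_edge_filter("object_category", {"biolink:Disease"})
```

Keeping the two filter sets in step means the nodes that survive filtering are the same nodes that the surviving edges refer to.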

set_node_provenance(node_data)[source]¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]¶

Update default prefix map.

Parameters: m (Dict) – A dictionary with prefix to IRI mappings
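As a rough illustration of what a prefix to IRI mapping is for, the sketch below updates a map and uses it to expand a CURIE. The module-level `prefix_map`, the `expand` helper, and the `example.org` IRIs are hypothetical; the real default map ships with KGX and is not reproduced here.

```python
from typing import Dict

# Hypothetical defaults; the real default map ships with KGX.
prefix_map: Dict[str, str] = {
    "EX": "http://example.org/",
}


def set_prefix_map(m: Dict[str, str]) -> None:
    # Update the defaults: new prefixes are added, existing ones overridden.
    prefix_map.update(m)


def expand(curie: str) -> str:
    # Split "PREFIX:local_id" and look the prefix up in the map.
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


set_prefix_map({"EX2": "http://example.org/other/"})
```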

set_provenance_map(kwargs)[source]¶: Set up a provenance (Knowledge Source to InfoRes) map

kgx.source.graph_source

GraphSource is responsible for reading from an instance of kgx.graph.base_graph.BaseGraph and must use only the methods exposed by BaseGraph to access the graph.

class kgx.source.graph_source.GraphSource[source]¶

Bases: kgx.source.source.Source

GraphSource is responsible for reading data as records from an in-memory graph representation.

The underlying store must be an instance of kgx.graph.base_graph.BaseGraph

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(graph: kgx.graph.base_graph.BaseGraph, **kwargs: Any) → Generator[source]¶

This method reads from a graph and yields records.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – The graph to read from
kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records read from the graph

Return type

Generator

read_edges() → Generator[source]¶

Read edges as records from the graph.

Returns: A generator for edges
Return type: Generator

read_nodes() → Generator[source]¶

Read nodes as records from the graph.

Returns: A generator for nodes
Return type: Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None¶

Update default prefix map.

Parameters: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

kgx.source.tsv_source

TsvSource is responsible for reading KGX formatted CSV or TSV files using Pandas: every flat file is treated as a Pandas DataFrame, from which data are read in chunks.

KGX expects two separate files - one for nodes and another for edges.
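The chunked reading of a nodes file and an edges file can be sketched as below. To keep the snippet free of third-party dependencies it uses the standard library `csv` module in place of Pandas, so it is a stand-in for the idea rather than the TsvSource implementation; the column names and `EX:` identifiers are illustrative.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List

NODES_TSV = (
    "id\tcategory\tname\n"
    "EX:1\tbiolink:Gene\tgene one\n"
    "EX:2\tbiolink:Disease\tdisease one\n"
)
EDGES_TSV = (
    "subject\tpredicate\tobject\n"
    "EX:1\tbiolink:related_to\tEX:2\n"
)


def read_chunks(handle, chunk_size: int = 1) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of row dictionaries, at most chunk_size rows at a time."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Nodes and edges live in separate files, so each is read on its own.
node_rows = [row for chunk in read_chunks(io.StringIO(NODES_TSV)) for row in chunk]
edge_rows = [row for chunk in read_chunks(io.StringIO(EDGES_TSV)) for row in chunk]
```

Reading in chunks keeps memory use bounded by the chunk size rather than by the size of the file.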

class kgx.source.tsv_source.TsvSource[source]¶

Bases: kgx.source.source.Source

TsvSource is responsible for reading data as records from a TSV/CSV.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from a TSV/CSV and yields records.

Parameters

filename (str) – The filename to parse
format (str) – The format (tsv, csv)
compression (Optional[str]) – The compression type (tar, tar.gz)
kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple][source]¶

Load an edge into an instance of BaseGraph.

Parameters: edge (Dict) – An edge
Returns: A tuple that contains subject id, object id, edge key, and edge data
Return type: Optional[Tuple]

read_edges(df: pandas.core.frame.DataFrame) → Generator[source]¶

Load edges from pandas.DataFrame into an instance of BaseGraph.

Parameters: df (pandas.DataFrame) – Dataframe containing records that represent edges
Returns: A generator for edge records
Return type: Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]][source]¶

Prepare a node.

Parameters: node (Dict) – A node
Returns: A tuple that contains node id and node data
Return type: Optional[Tuple[str, Dict]]

read_nodes(df: pandas.core.frame.DataFrame) → Generator[source]¶

Read records from pandas.DataFrame and yield records.

Parameters: df (pandas.DataFrame) – Dataframe containing records that represent nodes
Returns: A generator for node records
Return type: Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]¶

Add or override default prefix to IRI map.

Parameters: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None[source]¶

Add or override default IRI to prefix map.

Parameters: m (Dict) – IRI to prefix map

kgx.source.json_source

JsonSource is responsible for reading data from a KGX formatted JSON using the ijson library, which allows for streaming data from the file.

class kgx.source.json_source.JsonSource[source]¶

Bases: kgx.source.tsv_source.TsvSource

JsonSource is responsible for reading data as records from a JSON.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from a JSON and yields records.

Parameters

filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records read from the file

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple]¶

Load an edge into an instance of BaseGraph.

Parameters: edge (Dict) – An edge
Returns: A tuple that contains subject id, object id, edge key, and edge data
Return type: Optional[Tuple]

read_edges(filename: str) → Generator[source]¶

Read edge records from a JSON.

Parameters: filename (str) – The filename to read from
Returns: A generator for edge records
Return type: Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]¶

Prepare a node.

Parameters: node (Dict) – A node
Returns: A tuple that contains node id and node data
Return type: Optional[Tuple[str, Dict]]

read_nodes(filename: str) → Generator[source]¶

Read node records from a JSON.

Parameters: filename (str) – The filename to read from
Returns: A generator for node records
Return type: Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None¶

Add or override default prefix to IRI map.

Parameters: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None¶

Add or override default IRI to prefix map.

Parameters: m (Dict) – IRI to prefix map

kgx.source.jsonl_source

JsonlSource is responsible for reading data from KGX formatted JSON Lines files using the jsonlines library.

KGX expects two separate JSON Lines files - one for nodes and another for edges.
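The idea of reading one record per line from separate node and edge files can be sketched as below. The snippet uses the standard library `json` module instead of jsonlines so that it runs without extra dependencies; `read_jsonl` is a hypothetical helper and the field names are illustrative.

```python
import io
import json
from typing import Any, Dict, Iterator

NODES_JSONL = (
    '{"id": "EX:1", "category": ["biolink:Gene"]}\n'
    '{"id": "EX:2", "category": ["biolink:Disease"]}\n'
)
EDGES_JSONL = (
    '{"subject": "EX:1", "predicate": "biolink:related_to", "object": "EX:2"}\n'
)


def read_jsonl(handle) -> Iterator[Dict[str, Any]]:
    """Yield one decoded object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# One file holds node records, the other holds edge records.
nodes = list(read_jsonl(io.StringIO(NODES_JSONL)))
edges = list(read_jsonl(io.StringIO(EDGES_JSONL)))
```

Because each line is a complete JSON document, records can be yielded one at a time without loading the whole file.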

class kgx.source.jsonl_source.JsonlSource[source]¶

Bases: kgx.source.json_source.JsonSource

JsonlSource is responsible for reading data as records from JSON Lines.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(filename: str, format: str = 'jsonl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from JSON Lines and yields records.

Parameters

filename (str) – The filename to parse
format (str) – The format (jsonl)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple]¶

Load an edge into an instance of BaseGraph.

Parameters: edge (Dict) – An edge
Returns: A tuple that contains subject id, object id, edge key, and edge data
Return type: Optional[Tuple]

read_edges(filename: str) → Generator¶

Read edge records from a JSON.

Parameters: filename (str) – The filename to read from
Returns: A generator for edge records
Return type: Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]¶

Prepare a node.

Parameters: node (Dict) – A node
Returns: A tuple that contains node id and node data
Return type: Optional[Tuple[str, Dict]]

read_nodes(filename: str) → Generator¶

Read node records from a JSON.

Parameters: filename (str) – The filename to read from
Returns: A generator for node records
Return type: Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None¶

Add or override default prefix to IRI map.

Parameters: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None¶

Add or override default IRI to prefix map.

Parameters: m (Dict) – IRI to prefix map

kgx.source.trapi_source

TrapiSource is responsible for reading data from a Translator Reasoner API formatted JSON.

class kgx.source.trapi_source.TrapiSource[source]¶

Bases: kgx.source.json_source.JsonSource

TrapiSource is responsible for reading data as records from a TRAPI JSON.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

load_edge(edge: Dict) → Tuple[str, str, str, Dict][source]¶

Load an edge into an instance of BaseGraph

Note

This method transforms Reasoner Std API format fields to Biolink Model fields.

Parameters: edge (Dict) – An edge

load_node(node: Dict) → Tuple[str, Dict][source]¶

Load a node into an instance of BaseGraph

Note

This method transforms Reasoner Std API format fields to Biolink Model fields.

Parameters: node (Dict) – A node

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from a JSON and yields records.

Parameters

filename (str) – The filename to parse
format (str) – The format (trapi-json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records

Return type

Generator

read_edge(edge: Dict) → Optional[Tuple]¶

Load an edge into an instance of BaseGraph.

Parameters: edge (Dict) – An edge
Returns: A tuple that contains subject id, object id, edge key, and edge data
Return type: Optional[Tuple]

read_edges(filename: str, compression: Optional[str] = None) → Generator[source]¶

Read edge records from a JSON.

Parameters

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]¶

Prepare a node.

Parameters: node (Dict) – A node
Returns: A tuple that contains node id and node data
Return type: Optional[Tuple[str, Dict]]

read_nodes(filename: str, compression: Optional[str] = None) → Generator[source]¶

Read node records from a JSON.

Parameters

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None¶

Add or override default prefix to IRI map.

Parameters: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None¶

Add or override default IRI to prefix map.

Parameters: m (Dict) – IRI to prefix map

kgx.source.obograph_source

ObographSource is responsible for reading data from OBOGraphs in JSON.

class kgx.source.obograph_source.ObographSource[source]¶

Bases: kgx.source.json_source.JsonSource

ObographSource is responsible for reading data as records from an OBO Graph JSON.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_category(curie: str, node: dict) → Optional[str][source]¶

Get category for a given CURIE.

Parameters

curie (str) – Curie for node
node (dict) – Node data

Returns

Category for the given node CURIE.

Return type

Optional[str]

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from JSON and yields records.

Parameters

filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

parse_meta(node: str, meta: Dict) → Dict[source]¶

Parse ‘meta’ field of a node.

Parameters

node (str) – Node identifier
meta (Dict) – meta dictionary for the node

Returns

A dictionary that contains ‘description’, ‘synonyms’, ‘xrefs’, and ‘equivalent_nodes’.

Return type

Dict

read_edge(edge: Dict) → Dict[source]¶

Read and parse an edge record.

Parameters: edge (Dict) – The edge record
Returns: The processed edge
Return type: Dict

read_edges(filename: str, compression: Optional[str] = None) → Generator[source]¶

Read edge records from a JSON.

Parameters

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns

A generator for edge records

Return type

Generator

read_node(node: Dict) → Dict[source]¶

Read and parse a node record.

Parameters: node (Dict) – The node record
Returns: The processed node
Return type: Dict

read_nodes(filename: str, compression: Optional[str] = None) → Generator[source]¶

Read node records from a JSON.

Parameters

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns

A generator for node records

Return type

Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None¶

Add or override default prefix to IRI map.

Parameters: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None¶

Add or override default IRI to prefix map.

Parameters: m (Dict) – IRI to prefix map

kgx.source.sssom_source

SssomSource is responsible for reading data from SSSOM formatted files.

KGX Source for Simple Standard for Sharing Ontology Mappings (“SSSOM”)

class kgx.source.sssom_source.SssomSource[source]¶

Bases: kgx.source.source.Source

SssomSource is responsible for reading data as records from an SSSOM file.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

load_edge(edge: Dict) → Generator[source]¶

Load an edge into an instance of BaseGraph

Parameters: edge (Dict) – An edge
Returns: A generator for node and edge records
Return type: Generator

load_edges(df: pandas.core.frame.DataFrame) → Generator[source]¶

Load edges from pandas.DataFrame into an instance of BaseGraph

Parameters: df (pandas.DataFrame) – Dataframe containing records that represent edges
Returns: A generator for edge records
Return type: Generator

load_node(node: Dict) → Tuple[str, Dict][source]¶

Load a node into an instance of BaseGraph

Parameters: node (Dict) – A node
Returns: A tuple that contains node id and node data
Return type: Optional[Tuple[str, Dict]]

parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

Parse an SSSOM TSV.

Parameters

filename (str) – File to read from
format (str) – The input file format (tsv, by default)
compression (Optional[str]) – The compression (gz)
kwargs (Dict) – Any additional arguments

Returns

A generator for node and edge records

Return type

Generator

parse_header(filename: str, compression: Optional[str] = None) → None[source]¶

Parse metadata from SSSOM headers.

Parameters

filename (str) – Filename to parse
compression (Optional[str]) – Compression type
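
As a rough illustration of what header parsing involves, the sketch below separates the leading metadata block from the table rows. It assumes the SSSOM TSV convention of a block of `#`-prefixed YAML lines at the top of the file. The function name `split_sssom_header` and the sample content are hypothetical and are not part of KGX; the real parse_header stores its results on the source rather than returning them.

```python
import io
from typing import Iterable, List, Tuple


def split_sssom_header(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate leading '#' metadata lines from the TSV body (illustrative only)."""
    header: List[str] = []
    body: List[str] = []
    in_header = True
    for line in lines:
        line = line.rstrip("\n")
        if in_header and line.startswith("#"):
            # Drop the comment marker and at most one following space,
            # so that YAML indentation inside the block is preserved.
            text = line[1:]
            header.append(text[1:] if text.startswith(" ") else text)
        else:
            # The first non-comment line ends the metadata block.
            in_header = False
            body.append(line)
    return header, body


# Hypothetical sample file content.
sample = io.StringIO(
    "# mapping_set_id: https://example.org/mappings\n"
    "# curie_map:\n"
    "#   HP: http://purl.obolibrary.org/obo/HP_\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "HP:0000001\tskos:exactMatch\tMP:0000001\n"
)
header_lines, table_lines = split_sssom_header(sample)
```

The header lines could then be handed to a YAML parser, and the remaining lines to a TSV reader.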

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
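
The behaviour described in the note can be pictured with a small stand-alone sketch. It is not KGX code: the class `FilterSketch`, the helper `as_set`, and the matching rule (a node passes when, for every filter key, at least one of its values is allowed) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Union

FilterValue = Union[str, Iterable[str]]


def as_set(value: FilterValue) -> Set[str]:
    """Normalise a single string or a collection of strings to a set."""
    return {value} if isinstance(value, str) else set(value)


class FilterSketch:
    """Hypothetical stand-in for the filter state kept by a Source."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, Set[str]] = {}
        self.edge_filters: Dict[str, Set[str]] = {}

    def set_node_filter(self, key: str, value: FilterValue) -> None:
        self.node_filters.setdefault(key, set()).update(as_set(value))
        if key == "category":
            # Mirror the category filter onto both ends of an edge, so that
            # the edges kept only touch nodes that are also kept.
            for edge_key in ("subject_category", "object_category"):
                self.edge_filters.setdefault(edge_key, set()).update(as_set(value))

    def check_node_filter(self, node: Dict) -> bool:
        # Assumed rule: every filter key must be present on the node and
        # share at least one value with the allowed set.
        for key, allowed in self.node_filters.items():
            if key not in node or not (as_set(node[key]) & allowed):
                return False
        return True


sketch = FilterSketch()
sketch.set_node_filter("category", {"biolink:Gene"})
gene = {"id": "HGNC:11603", "category": ["biolink:Gene"]}
disease = {"id": "MONDO:0005002", "category": ["biolink:Disease"]}
print(sketch.check_node_filter(gene), sketch.check_node_filter(disease))
```

Passing a set for 'category' keeps the node filter and the two mirrored edge filters in the same shape, which is presumably why the note recommends it.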

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]¶

Add or override default prefix to IRI map.

Parameters: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None[source]¶

Add or override default IRI to prefix map.

Parameters: m (Dict) – IRI to prefix map
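
To make the two maps concrete, the sketch below shows how a prefix-to-IRI map and its reverse are typically used to expand a CURIE and contract an IRI. The helpers `expand` and `contract` are hypothetical names for illustration; KGX's own prefix handling is not reproduced here.

```python
from typing import Dict, Optional

# Example prefix-to-IRI map and its reverse (IRI-to-prefix).
prefix_map: Dict[str, str] = {"HP": "http://purl.obolibrary.org/obo/HP_"}
reverse_prefix_map: Dict[str, str] = {iri: prefix for prefix, iri in prefix_map.items()}


def expand(curie: str, pm: Dict[str, str]) -> Optional[str]:
    """Expand a CURIE to an IRI, or return None if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in pm:
        return None
    return pm[prefix] + local_id


def contract(iri: str, rpm: Dict[str, str]) -> Optional[str]:
    """Contract an IRI to a CURIE, preferring the longest matching namespace."""
    for namespace in sorted(rpm, key=len, reverse=True):
        if iri.startswith(namespace):
            return f"{rpm[namespace]}:{iri[len(namespace):]}"
    return None


iri = expand("HP:0000118", prefix_map)
curie = contract("http://purl.obolibrary.org/obo/HP_0000118", reverse_prefix_map)
```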

kgx.source.neo_source¶

NeoSource is responsible for reading data from a local or remote Neo4j instance.

class kgx.source.neo_source.NeoSource[source]¶

Bases: kgx.source.source.Source

NeoSource is responsible for reading data as records from a Neo4j instance.

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears the Source graph’s internal graph_metadata. The value of this metadata is (now) generally a Callable. Call this method when the metadata is no longer needed; note that doing so may cause peculiar Python object persistence problems downstream.

count(is_directed: bool = True) → int[source]¶

Get the total count of records to be fetched from the Neo4j database.

Parameters: is_directed (bool) – Whether edges are directed. True by default, since edges are directed in most cases.
Returns: The total count of records
Return type: int

static format_edge_filter(edge_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]¶

Get the value for edge filter as defined by key. This is used as a convenience method for generating cypher queries.

Parameters

edge_filters (Dict) – All edge filters
key (str) – Name of the edge filter
variable (Optional[str]) – Variable binding for cypher query
prefix (Optional[str]) – Prefix for the cypher
op (Optional[str]) – The operator

Returns

Value corresponding to the given edge filter key, formatted for CQL

Return type

str

static format_node_filter(node_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]¶

Get the value for node filter as defined by key. This is used as a convenience method for generating cypher queries.

Parameters

node_filters (Dict) – All node filters
key (str) – Name of the node filter
variable (Optional[str]) – Variable binding for cypher query
prefix (Optional[str]) – Prefix for the cypher
op (Optional[str]) – The operator

Returns

Value corresponding to the given node filter key, formatted for CQL

Return type

str
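
This reference does not show the exact string that format_node_filter and format_edge_filter produce. Purely to illustrate the idea of turning a filter into a Cypher fragment, here is a hypothetical sketch; the function name, the default values for `variable` and `op`, and the output shape are all assumptions rather than KGX behaviour.

```python
from typing import Dict, Optional, Set


def format_filter_sketch(
    filters: Dict[str, Set[str]],
    key: str,
    variable: str = "n",      # assumed default binding name
    prefix: Optional[str] = None,
    op: str = "IN",           # assumed default operator
) -> str:
    """Render one filter as a Cypher-like clause (illustrative only)."""
    values = filters.get(key)
    if not values:
        # No filter defined for this key, so contribute nothing to the query.
        return ""
    # Sort for a deterministic clause, then quote each value.
    quoted = ", ".join(f"'{v}'" for v in sorted(values))
    clause = f"{variable}.{key} {op} [{quoted}]"
    return f"{prefix} {clause}" if prefix else clause


clause = format_filter_sketch({"provided_by": {"b", "a"}}, "provided_by", prefix="WHERE")
print(clause)
```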

get_edges(skip: int = 0, limit: int = 0, is_directed: bool = True, **kwargs: Any) → List[source]¶

Get a page of edges from the Neo4j database.

Parameters

skip (int) – Records to skip
limit (int) – Total number of records to query for
is_directed (bool) – Whether edges are directed (True by default, since edges are directed in most cases)
kwargs (Any) – Any additional arguments

Returns

A list of 3-tuples

Return type

List

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

get_nodes(skip: int = 0, limit: int = 0, **kwargs: Any) → List[source]¶

Get a page of nodes from the Neo4j database.

Parameters

skip (int) – Records to skip
limit (int) – Total number of records to query for
kwargs (Any) – Any additional arguments

Returns

A list of nodes

Return type

List

get_pages(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Any) → Iterator[source]¶

Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start)/page_size.

Parameters

query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges
start (int) – Start for pagination
end (Optional[int]) – End for pagination
page_size (int) – Size of each page (50000, by default)
kwargs (Dict) – Any additional arguments that might be relevant for query_function

Returns

An iterator for a list of records from Neo4j. The size of the list is page_size

Return type

Iterator
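
The pagination pattern described here can be sketched without a database. The function `get_pages_sketch` below is an illustrative stand-in, not the KGX implementation, and `fake_query` is a hypothetical query function that slices an in-memory list in place of self.get_nodes or self.get_edges.

```python
from typing import Any, Callable, Iterator, List, Optional


def get_pages_sketch(
    query_function: Callable[..., List],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List]:
    """Yield successive pages by advancing skip in steps of page_size."""
    skip = start
    while end is None or skip < end:
        # Never ask for more records than remain before `end`.
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if not page:
            break
        yield page
        if len(page) < limit:
            # A short page means the store has no more records.
            break
        skip += limit


records = list(range(10))


def fake_query(skip: int, limit: int, **kwargs: Any) -> List[int]:
    """Hypothetical query function backed by an in-memory list."""
    return records[skip:skip + limit]


pages = list(get_pages_sketch(fake_query, start=2, end=9, page_size=3))
print(pages)
```

With start=2, end=9 and page_size=3, the seven requested records arrive as two full pages and one partial page.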

load_edge(edge_record: List) → Tuple[source]¶

Load an edge into an instance of BaseGraph

Parameters: edge_record (List) – A 4-tuple edge record
Returns: A tuple with subject ID, object ID, edge key, and edge data
Return type: Tuple

load_edges(edges: List) → None[source]¶

Load edges into an instance of BaseGraph

Parameters: edges (List) – A list of edge records

load_node(node: Dict) → Tuple[source]¶

Load node into an instance of BaseGraph

Parameters: node (Dict) – A node
Returns: A tuple with node ID and node data
Return type: Tuple

load_nodes(nodes: List) → None[source]¶

Load nodes into an instance of BaseGraph

Parameters: nodes (List) – A list of nodes

parse(uri: str, username: str, password: str, node_filters: Dict = None, edge_filters: Dict = None, start: int = 0, end: int = None, is_directed: bool = True, page_size: int = 50000, **kwargs: Any) → Generator[source]¶

This method reads from a Neo4j instance and yields records.

Parameters

uri (str) – The URI for the Neo4j instance. For example, http://localhost:7474
username (str) – The username
password (str) – The password
node_filters (Dict) – Node filters
edge_filters (Dict) – Edge filters
start (int) – Number of records to skip before streaming
end (int) – Total number of records to fetch
is_directed (bool) – Whether or not the edges should be treated as directed
page_size (int) – The size of each page/batch fetched from Neo4j (50000)
kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_prefix_map(m: Dict) → None¶

Update default prefix map.

Parameters: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

kgx.source.rdf_source¶

RdfSource is responsible for reading data from RDF N-Triples.

This source makes use of a custom kgx.parsers.ntriples_parser.CustomNTriplesParser for parsing N-Triples, which extends rdflib.plugins.parsers.ntriples.NTriplesParser.

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

sort -k 1,2 -t ' ' data.nt > data_sorted.nt
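
One way to see why sorting helps: once all triples for a subject are adjacent, a reader can collect them, emit a complete record, and discard its state before moving on to the next subject. The sketch below illustrates this with itertools.groupby over simplified N-Triples lines. It is not the CustomNTriplesParser; the line splitting is deliberately naive, and the example.org IRIs are made up.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Naively split N-Triples lines into (subject, predicate, object)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no spaces; the remainder is the
        # object followed by the terminating ' .'.
        subject, predicate, rest = line.split(" ", 2)
        rest = rest.rstrip()
        if rest.endswith("."):
            rest = rest[:-1].rstrip()
        yield subject, predicate, rest


def by_subject(lines: Iterable[str]) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Group consecutive triples that share a subject (requires sorted input)."""
    for subject, group in groupby(triples(lines), key=lambda t: t[0]):
        yield subject, [(p, o) for _, p, o in group]


sorted_lines = [
    '<http://example.org/a> <http://example.org/name> "A" .',
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    '<http://example.org/b> <http://example.org/name> "B" .',
]
grouped = list(by_subject(sorted_lines))
```

On unsorted input the same subject would show up in several separate groups, so a reader would have to keep every partial record in memory until the end of the file.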

class kgx.source.rdf_source.RdfSource[source]¶

Bases: kgx.source.source.Source

RdfSource is responsible for reading data as records from RDF.

Note

Currently only RDF N-Triples are supported.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict[source]¶

Add an edge to cache.

Parameters

subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict[source]¶

Add a node to cache.

Parameters

iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None[source]¶

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters

iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict
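
The multi-valued handling mentioned above can be pictured as follows. This is a minimal sketch under stated assumptions: the set `MULTIVALUED` is a hypothetical stand-in for however KGX decides which properties take several values, and `add_attribute` is not the KGX method.

```python
from typing import Dict, List, Set, Union

# Hypothetical set of property names that may hold several values.
MULTIVALUED: Set[str] = {"category", "synonym"}


def add_attribute(node: Dict, key: str, value: Union[str, List[str]]) -> Dict:
    """Add a value to a node, appending for multi-valued keys (illustrative)."""
    values = value if isinstance(value, list) else [value]
    if not values:
        return node
    if key in MULTIVALUED:
        existing = node.setdefault(key, [])
        for v in values:
            # Keep insertion order and avoid duplicates.
            if v not in existing:
                existing.append(v)
    else:
        # Single-valued property: the most recent value wins.
        node[key] = values[-1]
    return node


node: Dict = {"id": "HGNC:11603"}
add_attribute(node, "category", "biolink:Gene")
add_attribute(node, "category", ["biolink:Gene", "biolink:NamedThing"])
add_attribute(node, "name", "TBX4")
```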

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears the Source graph’s internal graph_metadata. The value of this metadata is (now) generally a Callable. Call this method when the metadata is no longer needed; note that doing so may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None[source]¶

Dereify a node to create a corresponding edge.

Parameters

n (str) – Node identifier
node (Dict) – Node data
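
Dereification can be illustrated with a small sketch: a node that carries subject, predicate, and object properties (a reified statement) is turned into an edge record. The function below is hypothetical. Unlike dereify, which returns None, it returns the edge record so the result can be inspected, and the edge key format it uses is an assumption.

```python
from typing import Dict, Optional, Tuple

EdgeRecord = Tuple[str, str, str, Dict]


def dereify_sketch(node_id: str, node: Dict) -> Optional[EdgeRecord]:
    """Turn a reified statement node into an edge record (illustrative)."""
    required = ("subject", "predicate", "object")
    if not all(key in node for key in required):
        # Not a reified statement; leave it as an ordinary node.
        return None
    edge_data = dict(node)
    # Keep the identifier of the reified node as the edge identifier.
    edge_data.setdefault("id", node_id)
    # Hypothetical edge key built from the three statement parts.
    edge_key = f"{node['subject']}-{node['predicate']}-{node['object']}"
    return node["subject"], node["object"], edge_key, edge_data


statement = {
    "subject": "HGNC:11603",
    "predicate": "biolink:related_to",
    "object": "MONDO:0005002",
}
edge = dereify_sketch("ex:assoc1", statement)
```

The returned 4-tuple follows the (subject_id, object_id, edge_key, edge_data) shape used for edge records elsewhere in this reference.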

get_biolink_element(predicate: Any) → Optional[linkml_runtime.linkml_model.meta.Element][source]¶

Returns a Biolink Model element for a given predicate.

Parameters: predicate (Any) – The CURIE of a predicate
Returns: The corresponding Biolink Model element
Return type: Optional[Element]

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from RDF N-Triples and yields records.

Note

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

sort -k 1,2 -t ' ' data.nt > data_sorted.nt

Parameters

filename (str) – The filename to parse
format (str) – The format (nt)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

process_predicate(p: Union[rdflib.term.URIRef, str, None]) → Tuple[source]¶

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters: p (Optional[Union[URIRef, str]]) – The predicate
Returns: A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
Return type: Tuple

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_property_predicates(predicates) → None[source]¶

Set predicates that are to be treated as node properties.

Parameters: predicates (Set) – Set of predicates
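
The effect of marking predicates as node properties can be sketched like this: a triple whose predicate is in the set becomes a property on the subject node, and any other triple becomes an edge. The function `route_triples` and the example.org predicates are hypothetical; this is not how RdfSource is implemented internally.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def route_triples(
    triples: Iterable[Triple], node_property_predicates: Set[str]
) -> Tuple[Dict[str, Dict[str, str]], List[Triple]]:
    """Split triples into node properties and edges (illustrative only)."""
    nodes: Dict[str, Dict[str, str]] = {}
    edges: List[Triple] = []
    for s, p, o in triples:
        if p in node_property_predicates:
            # The object is a property value of the subject node.
            nodes.setdefault(s, {})[p] = o
        else:
            # The object is another node, so record an edge between them.
            nodes.setdefault(s, {})
            nodes.setdefault(o, {})
            edges.append((s, p, o))
    return nodes, edges


sample_triples = [
    ("ex:a", "http://example.org/hasLabel", "Node A"),
    ("ex:a", "http://example.org/relatedTo", "ex:b"),
]
nodes, edges = route_triples(sample_triples, {"http://example.org/hasLabel"})
```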

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None[source]¶

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters: m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
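
A predicate mapping of this shape might be applied as in the sketch below, where an IRI outside the Biolink Model is translated to a property name and unmapped IRIs pass through unchanged. The mapping entry and the helper `property_name` are made up for illustration.

```python
from typing import Dict

# Hypothetical mapping from predicate IRIs to property names.
predicate_mapping: Dict[str, str] = {
    "http://example.org/vocab/hasLabel": "name",
}


def property_name(predicate_iri: str, mapping: Dict[str, str]) -> str:
    """Return the mapped property name, or the IRI itself if unmapped."""
    return mapping.get(predicate_iri, predicate_iri)


mapped = property_name("http://example.org/vocab/hasLabel", predicate_mapping)
unmapped = property_name("http://example.org/vocab/other", predicate_mapping)
```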

set_prefix_map(m: Dict) → None¶

Update default prefix map.

Parameters: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None[source]¶

Parse a triple.

Parameters

s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict[source]¶

Update an edge with properties.

Parameters

subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict[source]¶

Update a node with properties.

Parameters

n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

kgx.source.owl_source¶

OwlSource is responsible for parsing an OWL ontology.

When parsing an OWL ontology, this source also adds OwlStar annotations to certain OWL axioms.

class kgx.source.owl_source.OwlSource[source]¶

Bases: kgx.source.rdf_source.RdfSource

OwlSource is responsible for parsing an OWL ontology.

Note

This is a simple parser that loads direct class-class relationships. For more formal OWL parsing, refer to Robot: http://robot.obolibrary.org/

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶

Add an edge to cache.

Parameters

subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶

Add a node to cache.

Parameters

iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None¶

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters

iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears the Source graph’s internal graph_metadata. The value of this metadata is (now) generally a Callable. Call this method when the metadata is no longer needed; note that doing so may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None¶

Dereify a node to create a corresponding edge.

Parameters

n (str) – Node identifier
node (Dict) – Node data

get_biolink_element(predicate: Any) → Optional[linkml_runtime.linkml_model.meta.Element]¶

Returns a Biolink Model element for a given predicate.

Parameters: predicate (Any) – The CURIE of a predicate
Returns: The corresponding Biolink Model element
Return type: Optional[Element]

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

load_graph(rdfgraph: rdflib.graph.Graph, **kwargs: Any) → None[source]¶

Walk through the rdflib.Graph and load all triples into kgx.graph.base_graph.BaseGraph

Parameters

rdfgraph (rdflib.Graph) – Graph containing nodes and edges
kwargs (Any) – Any additional arguments

parse(filename: str, format: str = 'owl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶

This method reads from an OWL ontology and yields records.

Parameters

filename (str) – The filename to parse
format (str) – The format (owl)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for node and edge records read from the file

Return type

Generator

process_predicate(p: Union[rdflib.term.URIRef, str, None]) → Tuple¶

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters: p (Optional[Union[URIRef, str]]) – The predicate
Returns: A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
Return type: Tuple

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_property_predicates(predicates) → None¶

Set predicates that are to be treated as node properties.

Parameters: predicates (Set) – Set of predicates

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None¶

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters: m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None¶

Update default prefix map.

Parameters: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None¶

Parse a triple.

Parameters

s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶

Update an edge with properties.

Parameters

subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶

Update a node with properties.

Parameters

n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

kgx.source.sparql_source¶

SparqlSource has yet to be implemented.

In principle, SparqlSource should be able to read data from a local or remote SPARQL endpoint.

class kgx.source.sparql_source.SparqlSource[source]¶

Bases: kgx.source.rdf_source.RdfSource

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶

Add an edge to cache.

Parameters

subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶

Add a node to cache.

Parameters

iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None¶

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters

iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

check_edge_filter(edge: Dict) → bool¶

Check if an edge passes defined edge filters.

Parameters: edge (Dict) – An edge
Returns: Whether the given edge has passed all defined edge filters
Return type: bool

check_node_filter(node: Dict) → bool¶

Check if a node passes defined node filters.

Parameters: node (Dict) – A node
Returns: Whether the given node has passed all defined node filters
Return type: bool

clear_graph_metadata()¶: Clears the Source graph’s internal graph_metadata. The value of this metadata is (now) generally a Callable. Call this method when the metadata is no longer needed; note that doing so may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None¶

Dereify a node to create a corresponding edge.

Parameters

n (str) – Node identifier
node (Dict) – Node data

get_biolink_element(predicate: Any) → Optional[linkml_runtime.linkml_model.meta.Element]¶

Returns a Biolink Model element for a given predicate.

Parameters: predicate (Any) – The CURIE of a predicate
Returns: The corresponding Biolink Model element
Return type: Optional[Element]

get_infores_catalog() → Dict[str, str]¶: Return the InfoRes Context of the source

parse(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator¶

This method reads from RDF N-Triples and yields records.

Note

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

sort -k 1,2 -t ' ' data.nt > data_sorted.nt

Parameters

filename (str) – The filename to parse
format (str) – The format (nt)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns

A generator for records

Return type

Generator

process_predicate(p: Union[rdflib.term.URIRef, str, None]) → Tuple¶

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters: p (Optional[Union[URIRef, str]]) – The predicate
Returns: A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
Return type: Tuple

set_edge_filter(key: str, value: set) → None¶

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None¶

Set edge filters.

Parameters: filters (Dict) – Edge filters

set_edge_provenance(edge_data)¶: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None¶

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None¶

Set node filters.

Parameters: filters (Dict) – Node filters

set_node_property_predicates(predicates) → None¶

Set predicates that are to be treated as node properties.

Parameters: predicates (Set) – Set of predicates

set_node_provenance(node_data)¶: Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None¶

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters: m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None¶

Update default prefix map.

Parameters: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)¶: Set up a provenance (Knowledge Source to InfoRes) map

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None¶

Parse a triple.

Parameters

s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶

Update an edge with properties.

Parameters

subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶

Update a node with properties.

Parameters

n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict