Source¶
A Source can be implemented for any file, local store, or remote store that contains a graph. A Source is responsible for reading nodes and edges from the graph.
A source must subclass the kgx.source.source.Source class and must implement the following methods:
parse
read_nodes
read_edges
parse method
Responsible for parsing a graph from a file/store
Must return a generator that iterates over the node and edge records from the graph
read_nodes method
Responsible for reading nodes from the file/store
Must return a generator that iterates over node records
Each node record must be a 2-tuple (node_id, node_data) where:
node_id is the node CURIE
node_data is a dictionary that represents the node properties
read_edges method
Responsible for reading edges from the file/store
Must return a generator that iterates over edge records
Each edge record must be a 4-tuple (subject_id, object_id, edge_key, edge_data) where:
subject_id is the subject node CURIE
object_id is the object node CURIE
edge_key is the unique key for the edge
edge_data is a dictionary that represents the edge properties
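The contract above can be sketched without KGX installed. The class below is a hypothetical, standalone stand-in: a real implementation would subclass kgx.source.source.Source, and the name ListSource, its constructor arguments, the example CURIEs, and the edge key format are all assumptions made for illustration.

```python
from typing import Any, Dict, Generator, List, Tuple


class ListSource:
    """Hypothetical stand-in; a real source would subclass kgx.source.source.Source."""

    def __init__(self, nodes: List[Dict[str, Any]], edges: List[Dict[str, Any]]) -> None:
        self._nodes = nodes
        self._edges = edges

    def parse(self) -> Generator:
        # Yield every node record first, then every edge record.
        yield from self.read_nodes()
        yield from self.read_edges()

    def read_nodes(self) -> Generator[Tuple[str, Dict], None, None]:
        for node in self._nodes:
            # 2-tuple: (node_id, node_data)
            yield node["id"], node

    def read_edges(self) -> Generator[Tuple[str, str, str, Dict], None, None]:
        for edge in self._edges:
            # The key only has to be unique; this format is an assumption.
            key = f"{edge['subject']}-{edge['predicate']}-{edge['object']}"
            # 4-tuple: (subject_id, object_id, edge_key, edge_data)
            yield edge["subject"], edge["object"], key, edge


source = ListSource(
    nodes=[{"id": "EX:1", "name": "gene one"}, {"id": "EX:2", "name": "disease two"}],
    edges=[{"subject": "EX:1", "predicate": "biolink:related_to", "object": "EX:2"}],
)
records = list(source.parse())
```

Consumers can tell the two record kinds apart by tuple length: node records have two elements and edge records have four.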
kgx.source.source¶
Base class for all Sources in KGX.
-
class
kgx.source.source.
Source
[source]¶ Bases:
object
A Source is responsible for reading data as records from a store where the store is a file or a database.
-
check_edge_filter
(edge: Dict) → bool[source]¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool[source]¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()[source]¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
set_edge_filter
(key: str, value: set) → None[source]¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None[source]¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_node_filter
(key: str, value: Union[str, set]) → None[source]¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None[source]¶ Set node filters.
- Parameters
filters (Dict) – Node filters
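How a filter is evaluated is not spelled out on this page. One plausible reading, offered as an assumption rather than as KGX's actual logic, is that a record passes when every filtered key is present and shares at least one value with the filter. The helper passes_filters below is hypothetical and does not call KGX.

```python
from typing import Any, Dict, Set, Union

FilterValue = Union[str, Set[str]]


def passes_filters(record: Dict[str, Any], filters: Dict[str, FilterValue]) -> bool:
    """Return True if the record satisfies every filter (assumed semantics)."""
    for key, wanted in filters.items():
        if key not in record:
            return False
        # Normalise both sides to sets so string and collection values compare alike.
        wanted_set = {wanted} if isinstance(wanted, str) else set(wanted)
        value = record[key]
        value_set = {value} if isinstance(value, str) else set(value)
        if not wanted_set & value_set:
            return False
    return True


node_filters = {"category": {"biolink:Gene"}}
gene = {"id": "EX:1", "category": ["biolink:Gene"]}
disease = {"id": "EX:2", "category": ["biolink:Disease"]}
```

Under these assumed semantics, gene passes node_filters and disease does not, and an empty filter dictionary lets every record through.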
-
kgx.source.graph_source¶
GraphSource
is responsible for reading from an instance of kgx.graph.base_graph.BaseGraph
and must use only
the methods exposed by BaseGraph
to access the graph.
-
class
kgx.source.graph_source.
GraphSource
[source]¶ Bases:
kgx.source.source.Source
GraphSource is responsible for reading data as records from an in memory graph representation.
The underlying store must be an instance of
kgx.graph.base_graph.BaseGraph
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(graph: kgx.graph.base_graph.BaseGraph, **kwargs: Any) → Generator[source]¶ This method reads from a graph and yields records.
- Parameters
graph (kgx.graph.base_graph.BaseGraph) – The graph to read from
kwargs (Any) – Any additional arguments
- Returns
A generator for node and edge records read from the graph
- Return type
Generator
-
read_edges
() → Generator[source]¶ Read edges as records from the graph.
- Returns
A generator for edges
- Return type
Generator
-
read_nodes
() → Generator[source]¶ Read nodes as records from the graph.
- Returns
A generator for nodes
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None¶ Update default prefix map.
- Parameters
m (Dict) – A dictionary with prefix to IRI mappings
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
kgx.source.tsv_source¶
TsvSource
is responsible for reading from KGX formatted CSV or TSV using Pandas where every flat file is treated as a
Pandas DataFrame and from which data are read in chunks.
KGX expects two separate files - one for nodes and another for edges.
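To make the chunked reading concrete, here is a sketch that uses only the standard library in place of Pandas. The column names (id, category, name), the helper read_node_chunks, and the chunk size are assumptions for illustration, not the behaviour of TsvSource itself.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, Tuple

# A tiny in-memory nodes file; the columns are illustrative.
NODES_TSV = (
    "id\tcategory\tname\n"
    "EX:1\tbiolink:Gene\tgene one\n"
    "EX:2\tbiolink:Disease\tdisease two\n"
    "EX:3\tbiolink:Gene\tgene three\n"
)


def read_node_chunks(handle, chunk_size: int = 2) -> Iterator[List[Tuple[str, Dict[str, str]]]]:
    """Yield lists of (node_id, node_data) records, chunk_size rows at a time."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield [(row["id"], dict(row)) for row in chunk]


chunks = list(read_node_chunks(io.StringIO(NODES_TSV)))
```

Reading in chunks keeps only chunk_size rows in memory at once, which is the same motivation as chunked DataFrame reads.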
-
class
kgx.source.tsv_source.
TsvSource
[source]¶ Bases:
kgx.source.source.Source
TsvSource is responsible for reading data as records from a TSV/CSV.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from a TSV/CSV and yields records.
- Parameters
filename (str) – The filename to parse
format (str) – The format (tsv, csv)
compression (Optional[str]) – The compression type (tar, tar.gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for node and edge records
- Return type
Generator
-
read_edge
(edge: Dict) → Optional[Tuple][source]¶ Load an edge into an instance of BaseGraph.
- Parameters
edge (Dict) – An edge
- Returns
A tuple that contains subject id, object id, edge key, and edge data
- Return type
Optional[Tuple]
-
read_edges
(df: pandas.core.frame.DataFrame) → Generator[source]¶ Load edges from pandas.DataFrame into an instance of BaseGraph.
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent edges
- Returns
A generator for edge records
- Return type
Generator
-
read_node
(node: Dict) → Optional[Tuple[str, Dict]][source]¶ Prepare a node.
- Parameters
node (Dict) – A node
- Returns
A tuple that contains node id and node data
- Return type
Optional[Tuple[str, Dict]]
-
read_nodes
(df: pandas.core.frame.DataFrame) → Generator[source]¶ Read records from pandas.DataFrame and yield records.
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent nodes
- Returns
A generator for node records
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None[source]¶ Add or override default prefix to IRI map.
- Parameters
m (Dict) – Prefix to IRI map
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
kgx.source.json_source¶
JsonSource
is responsible for reading data from a KGX formatted JSON using the ijson
library, which allows for streaming data from the file.
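As a rough illustration of the record shapes, the sketch below reads a small document with the standard json module. It assumes a top-level object with "nodes" and "edges" arrays and an "id" field on each edge; unlike ijson it loads the whole document into memory, so it is a stand-in for the streaming approach rather than an instance of it. The helper iter_records is hypothetical.

```python
import json
from typing import Iterator, Tuple

# Illustrative document; the layout is an assumption about KGX formatted JSON.
DOCUMENT = json.dumps(
    {
        "nodes": [
            {"id": "EX:1", "category": ["biolink:Gene"]},
            {"id": "EX:2", "category": ["biolink:Disease"]},
        ],
        "edges": [
            {"id": "e1", "subject": "EX:1", "predicate": "biolink:related_to", "object": "EX:2"}
        ],
    }
)


def iter_records(text: str) -> Iterator[Tuple]:
    """Yield node 2-tuples, then edge 4-tuples, from a JSON string."""
    data = json.loads(text)
    for node in data.get("nodes", []):
        yield node["id"], node
    for edge in data.get("edges", []):
        yield edge["subject"], edge["object"], edge["id"], edge


records = list(iter_records(DOCUMENT))
```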
-
class
kgx.source.json_source.
JsonSource
[source]¶ Bases:
kgx.source.tsv_source.TsvSource
JsonSource is responsible for reading data as records from a JSON.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from a JSON and yields records.
- Parameters
filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for node and edge records read from the file
- Return type
Generator
-
read_edge
(edge: Dict) → Optional[Tuple]¶ Load an edge into an instance of BaseGraph.
- Parameters
edge (Dict) – An edge
- Returns
A tuple that contains subject id, object id, edge key, and edge data
- Return type
Optional[Tuple]
-
read_edges
(filename: str) → Generator[source]¶ Read edge records from a JSON.
- Parameters
filename (str) – The filename to read from
- Returns
A generator for edge records
- Return type
Generator
-
read_node
(node: Dict) → Optional[Tuple[str, Dict]]¶ Prepare a node.
- Parameters
node (Dict) – A node
- Returns
A tuple that contains node id and node data
- Return type
Optional[Tuple[str, Dict]]
-
read_nodes
(filename: str) → Generator[source]¶ Read node records from a JSON.
- Parameters
filename (str) – The filename to read from
- Returns
A generator for node records
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None¶ Add or override default prefix to IRI map.
- Parameters
m (Dict) – Prefix to IRI map
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
set_reverse_prefix_map
(m: Dict) → None¶ Add or override default IRI to prefix map.
- Parameters
m (Dict) – IRI to prefix map
-
kgx.source.jsonl_source¶
JsonlSource
is responsible for reading data from a KGX formatted JSON Lines using the
jsonlines library.
KGX expects two separate JSON Lines files - one for nodes and another for edges.
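JSON Lines stores one JSON object per line, so a file can be read record by record. The sketch below uses the standard json module instead of the jsonlines library; the helper iter_jsonl and the field names in the sample data are assumptions for illustration.

```python
import io
import json
from typing import Dict, Iterator

# Two in-memory "files": one for nodes and one for edges.
NODES_JSONL = (
    '{"id": "EX:1", "category": ["biolink:Gene"]}\n'
    '{"id": "EX:2", "category": ["biolink:Disease"]}\n'
)
EDGES_JSONL = (
    '{"id": "e1", "subject": "EX:1", "predicate": "biolink:related_to", "object": "EX:2"}\n'
)


def iter_jsonl(handle) -> Iterator[Dict]:
    """Yield one parsed object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


nodes = list(iter_jsonl(io.StringIO(NODES_JSONL)))
edges = list(iter_jsonl(io.StringIO(EDGES_JSONL)))
```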
-
class
kgx.source.jsonl_source.
JsonlSource
[source]¶ Bases:
kgx.source.json_source.JsonSource
JsonlSource is responsible for reading data as records from JSON Lines.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(filename: str, format: str = 'jsonl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from JSON Lines and yields records.
- Parameters
filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for records
- Return type
Generator
-
read_edge
(edge: Dict) → Optional[Tuple]¶ Load an edge into an instance of BaseGraph.
- Parameters
edge (Dict) – An edge
- Returns
A tuple that contains subject id, object id, edge key, and edge data
- Return type
Optional[Tuple]
-
read_edges
(filename: str) → Generator¶ Read edge records from a JSON.
- Parameters
filename (str) – The filename to read from
- Returns
A generator for edge records
- Return type
Generator
-
read_node
(node: Dict) → Optional[Tuple[str, Dict]]¶ Prepare a node.
- Parameters
node (Dict) – A node
- Returns
A tuple that contains node id and node data
- Return type
Optional[Tuple[str, Dict]]
-
read_nodes
(filename: str) → Generator¶ Read node records from a JSON.
- Parameters
filename (str) – The filename to read from
- Returns
A generator for node records
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None¶ Add or override default prefix to IRI map.
- Parameters
m (Dict) – Prefix to IRI map
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
set_reverse_prefix_map
(m: Dict) → None¶ Add or override default IRI to prefix map.
- Parameters
m (Dict) – IRI to prefix map
-
kgx.source.trapi_source¶
TrapiSource
is responsible for reading data from a Translator Reasoner API
formatted JSON.
-
class
kgx.source.trapi_source.
TrapiSource
[source]¶ Bases:
kgx.source.json_source.JsonSource
TrapiSource is responsible for reading data as records from a TRAPI JSON.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
load_edge
(edge: Dict) → Tuple[str, str, str, Dict][source]¶ Load an edge into an instance of BaseGraph
Note
This method transforms Reasoner Std API format fields to Biolink Model fields.
- Parameters
edge (Dict) – An edge
-
load_node
(node: Dict) → Tuple[str, Dict][source]¶ Load a node into an instance of BaseGraph
Note
This method transforms Reasoner Std API format fields to Biolink Model fields.
- Parameters
node (Dict) – A node
-
parse
(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from a JSON and yields records.
- Parameters
filename (str) – The filename to parse
format (str) – The format (trapi-json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for node and edge records
- Return type
Generator
-
read_edge
(edge: Dict) → Optional[Tuple]¶ Load an edge into an instance of BaseGraph.
- Parameters
edge (Dict) – An edge
- Returns
A tuple that contains subject id, object id, edge key, and edge data
- Return type
Optional[Tuple]
-
read_edges
(filename: str, compression: Optional[str] = None) → Generator[source]¶ Read edge records from a JSON.
- Parameters
filename (str) – The filename to read from
compression (Optional[str]) – The compression type
- Returns
A generator for edge records
- Return type
Generator
-
read_node
(node: Dict) → Optional[Tuple[str, Dict]]¶ Prepare a node.
- Parameters
node (Dict) – A node
- Returns
A tuple that contains node id and node data
- Return type
Optional[Tuple[str, Dict]]
-
read_nodes
(filename: str, compression: Optional[str] = None) → Generator[source]¶ Read node records from a JSON.
- Parameters
filename (str) – The filename to read from
compression (Optional[str]) – The compression type
- Returns
A generator for node records
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None¶ Add or override default prefix to IRI map.
- Parameters
m (Dict) – Prefix to IRI map
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
set_reverse_prefix_map
(m: Dict) → None¶ Add or override default IRI to prefix map.
- Parameters
m (Dict) – IRI to prefix map
-
kgx.source.obograph_source¶
ObographSource
is responsible for reading data from OBOGraphs in JSON.
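For orientation, an OBO Graph JSON document holds a "graphs" array whose entries carry "nodes" (with "id" and "lbl") and "edges" (with "sub", "pred", and "obj"). The sketch below walks that layout with the standard json module. The helper iter_obograph, the mapping of "lbl" to a "name" property, and the edge key format are assumptions, and the IRI to CURIE conversion that a real reader would need is left out.

```python
import json
from typing import Iterator, Tuple

BASE = "http://purl.obolibrary.org/obo/"

# Illustrative document with made-up EX_ terms.
OBOGRAPH = json.dumps(
    {
        "graphs": [
            {
                "nodes": [
                    {"id": BASE + "EX_1", "lbl": "term one"},
                    {"id": BASE + "EX_2", "lbl": "term two"},
                ],
                "edges": [{"sub": BASE + "EX_1", "pred": "is_a", "obj": BASE + "EX_2"}],
            }
        ]
    }
)


def iter_obograph(text: str) -> Iterator[Tuple]:
    """Yield node 2-tuples and edge 4-tuples from an OBO Graph JSON string."""
    for graph in json.loads(text).get("graphs", []):
        for node in graph.get("nodes", []):
            yield node["id"], {"id": node["id"], "name": node.get("lbl")}
        for edge in graph.get("edges", []):
            # Assumed key format; it only needs to be unique per edge.
            key = f"{edge['sub']}-{edge['pred']}-{edge['obj']}"
            yield edge["sub"], edge["obj"], key, edge


records = list(iter_obograph(OBOGRAPH))
```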
-
class
kgx.source.obograph_source.
ObographSource
[source]¶ Bases:
kgx.source.json_source.JsonSource
ObographSource is responsible for reading data as records from an OBO Graph JSON.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_category
(curie: str, node: dict) → Optional[str][source]¶ Get category for a given CURIE.
- Parameters
curie (str) – Curie for node
node (dict) – Node data
- Returns
Category for the given node CURIE.
- Return type
Optional[str]
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from JSON and yields records.
- Parameters
filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for records
- Return type
Generator
-
parse_meta
(node: str, meta: Dict) → Dict[source]¶ Parse ‘meta’ field of a node.
- Parameters
node (str) – Node identifier
meta (Dict) – meta dictionary for the node
- Returns
A dictionary that contains ‘description’, ‘synonyms’, ‘xrefs’, and ‘equivalent_nodes’.
- Return type
Dict
-
read_edge
(edge: Dict) → Dict[source]¶ Read and parse an edge record.
- Parameters
edge (Dict) – The edge record
- Returns
The processed edge
- Return type
Dict
-
read_edges
(filename: str, compression: Optional[str] = None) → Generator[source]¶ Read edge records from a JSON.
- Parameters
filename (str) – The filename to read from
compression (Optional[str]) – The compression type
- Returns
A generator for edge records
- Return type
Generator
-
read_node
(node: Dict) → Dict[source]¶ Read and parse a node record.
- Parameters
node (Dict) – The node record
- Returns
The processed node
- Return type
Dict
-
read_nodes
(filename: str, compression: Optional[str] = None) → Generator[source]¶ Read node records from a JSON.
- Parameters
filename (str) – The filename to read from
compression (Optional[str]) – The compression type
- Returns
A generator for node records
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None¶ Add or override default prefix to IRI map.
- Parameters
m (Dict) – Prefix to IRI map
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
set_reverse_prefix_map
(m: Dict) → None¶ Add or override default IRI to prefix map.
- Parameters
m (Dict) – IRI to prefix map
-
kgx.source.sssom_source¶
SssomSource
is responsible for reading data from SSSOM formatted files.
KGX Source for Simple Standard for Sharing Ontology Mappings (“SSSOM”)
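An SSSOM TSV file typically starts with metadata lines prefixed by "#", followed by a tab-separated mapping table. The sketch below separates the two parts using only the standard library. The helper split_sssom, the sample metadata, and the reduced set of columns are assumptions for illustration; the header lines are returned as raw text that could be handed to a YAML parser.

```python
import csv
import io
from typing import Dict, List, Tuple

# Illustrative file content with a reduced set of columns.
SSSOM_TSV = (
    "#mapping_set_id: https://example.org/mappings/demo\n"
    "#license: https://creativecommons.org/publicdomain/zero/1.0/\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "EX:1\tskos:exactMatch\tEX:2\n"
)


def split_sssom(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (header lines without the leading '#', mapping rows as dicts)."""
    header: List[str] = []
    body: List[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Keep indentation after '#' so nested metadata stays intact.
            header.append(line[1:])
        else:
            body.append(line)
    rows = [dict(r) for r in csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t")]
    return header, rows


header_lines, mappings = split_sssom(SSSOM_TSV)
```

Each mapping row corresponds to one edge between subject_id and object_id, with predicate_id as the relation.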
-
class
kgx.source.sssom_source.
SssomSource
[source]¶ Bases:
kgx.source.source.Source
SssomSource is responsible for reading data as records from an SSSOM file.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
load_edge
(edge: Dict) → Generator[source]¶ Load an edge into an instance of BaseGraph
- Parameters
edge (Dict) – An edge
- Returns
A generator for node and edge records
- Return type
Generator
-
load_edges
(df: pandas.core.frame.DataFrame) → Generator[source]¶ Load edges from pandas.DataFrame into an instance of BaseGraph
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent edges
- Returns
A generator for edge records
- Return type
Generator
-
load_node
(node: Dict) → Tuple[str, Dict][source]¶ Load a node into an instance of BaseGraph
- Parameters
node (Dict) – A node
- Returns
A tuple that contains node id and node data
- Return type
Optional[Tuple[str, Dict]]
-
parse
(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ Parse an SSSOM TSV
- Parameters
filename (str) – File to read from
format (str) – The input file format (tsv, by default)
compression (Optional[str]) – The compression (gz)
kwargs (Dict) – Any additional arguments
- Returns
A generator for node and edge records
- Return type
Generator
-
parse_header
(filename: str, compression: Optional[str] = None) → None[source]¶ Parse metadata from SSSOM headers.
- Parameters
filename (str) – Filename to parse
compression (Optional[str]) – Compression type
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
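The coupling between the ‘category’ node filter and the ‘subject_category’ and ‘object_category’ edge filters can be illustrated with a small standalone sketch. `FilterState` and `_as_set` are hypothetical names and do not come from KGX, and the pass/fail rule in `check_node_filter` (every filtered key must overlap the node's value) is an assumption about how such a check might behave, not a description of the library's implementation.

```python
from typing import Any, Dict, Set


def _as_set(value: Any) -> Set[str]:
    """Normalise a string, an iterable of strings, or None into a set."""
    if value is None:
        return set()
    if isinstance(value, str):
        return {value}
    return set(value)


class FilterState:
    """Hypothetical holder for node and edge filters (not the KGX Source class)."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, Set[str]] = {}
        self.edge_filters: Dict[str, Set[str]] = {}

    def set_node_filter(self, key: str, value: Any) -> None:
        self.node_filters[key] = _as_set(value)
        if key == "category":
            # Documented coupling: the same categories are applied to both
            # ends of every edge so that nodes and edges stay consistent.
            self.edge_filters["subject_category"] = _as_set(value)
            self.edge_filters["object_category"] = _as_set(value)

    def check_node_filter(self, node: Dict) -> bool:
        # Assumption: a node passes only if, for every filtered key,
        # its value shares at least one element with the filter value.
        for key, wanted in self.node_filters.items():
            if not wanted & _as_set(node.get(key)):
                return False
        return True


state = FilterState()
state.set_node_filter("category", {"biolink:Gene"})

gene = {"id": "HGNC:11603", "category": ["biolink:Gene"]}
disease = {"id": "MONDO:0005002", "category": ["biolink:Disease"]}
print(state.check_node_filter(gene), state.check_node_filter(disease))
```

Normalising every value to a set up front keeps the string and set cases of `Union[str, set]` on a single code path.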
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None[source]¶ Add or override default prefix to IRI map.
- Parameters
m (Dict) – Prefix to IRI map
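As a rough illustration of what a prefix to IRI map is used for, the sketch below merges caller-supplied prefixes over a default map and then expands a CURIE and contracts an IRI. All names here (`DEFAULT_PREFIX_MAP`, `merged_prefix_map`, `expand`, `contract`) and the `EX` prefix are hypothetical and not part of KGX; the longest-namespace-wins rule in `contract` is an assumption.

```python
from typing import Dict, Optional

# Hypothetical default map; a real one would hold many more prefixes.
DEFAULT_PREFIX_MAP: Dict[str, str] = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def merged_prefix_map(overrides: Dict[str, str]) -> Dict[str, str]:
    """Return the defaults with caller-supplied prefixes added or overridden."""
    merged = dict(DEFAULT_PREFIX_MAP)
    merged.update(overrides)
    return merged


def expand(curie: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Expand 'prefix:reference' into a full IRI, or None if the prefix is unknown."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + reference


def contract(iri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Contract an IRI into a CURIE, preferring the longest matching namespace."""
    best: Optional[str] = None
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(prefix_map[best]):
                best = prefix
    if best is None:
        return None
    return f"{best}:{iri[len(prefix_map[best]):]}"


prefix_map = merged_prefix_map({"EX": "http://example.org/"})
print(expand("EX:123", prefix_map))
print(contract("http://www.w3.org/2000/01/rdf-schema#label", prefix_map))
```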
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
kgx.source.neo_source¶
NeoSource
is responsible for reading data from a local or remote Neo4j instance.
-
class
kgx.source.neo_source.
NeoSource
[source]¶ Bases:
kgx.source.source.Source
NeoSource is responsible for reading data as records from a Neo4j instance.
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
count
(is_directed: bool = True) → int[source]¶ Get the total count of records to be fetched from the Neo4j database.
- Parameters
is_directed (bool) – Whether edges are directed or undirected (True, by default, since edges in most cases are directed).
- Returns
The total count of records
- Return type
int
-
static
format_edge_filter
(edge_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]¶ Get the value for the edge filter as defined by key. This is used as a convenience method for generating Cypher queries.
- Parameters
edge_filters (Dict) – All edge filters
key (str) – Name of the edge filter
variable (Optional[str]) – Variable binding for the Cypher query
prefix (Optional[str]) – Prefix for the Cypher query
op (Optional[str]) – The operator
- Returns
Value corresponding to the given edge filter key, formatted for CQL
- Return type
str
-
static
format_node_filter
(node_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]¶ Get the value for the node filter as defined by key. This is used as a convenience method for generating Cypher queries.
- Parameters
node_filters (Dict) – All node filters
key (str) – Name of the node filter
variable (Optional[str]) – Variable binding for the Cypher query
prefix (Optional[str]) – Prefix for the Cypher query
op (Optional[str]) – The operator
- Returns
Value corresponding to the given node filter key, formatted for CQL
- Return type
str
-
get_edges
(skip: int = 0, limit: int = 0, is_directed: bool = True, **kwargs: Any) → List[source]¶ Get a page of edges from the Neo4j database.
- Parameters
skip (int) – Records to skip
limit (int) – Total number of records to query for
is_directed (bool) – Whether edges are directed or undirected (True, by default, since edges in most cases are directed)
kwargs (Any) – Any additional arguments
- Returns
A list of 3-tuples
- Return type
List
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
get_nodes
(skip: int = 0, limit: int = 0, **kwargs: Any) → List[source]¶ Get a page of nodes from the Neo4j database.
- Parameters
skip (int) – Records to skip
limit (int) – Total number of records to query for
kwargs (Any) – Any additional arguments
- Returns
A list of nodes
- Return type
List
-
get_pages
(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Any) → Iterator[source]¶ Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.
- Parameters
query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges
start (int) – Start for pagination
end (Optional[int]) – End for pagination
page_size (int) – Size of each page (50000, by default, per the signature)
kwargs (Dict) – Any additional arguments that might be relevant for query_function
- Returns
An iterator for a list of records from Neo4j. The size of the list is page_size
- Return type
Iterator
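The paging scheme described for get_pages can be sketched without Neo4j or KGX. In the sketch below, `iter_pages` and `fake_query` are hypothetical names: `fake_query` stands in for a query function such as get_nodes (it accepts `skip` and `limit`), and stopping on the first empty page is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, Iterator, List, Optional


def iter_pages(
    query_function: Callable[..., List[Any]],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List[Any]]:
    """Yield lists of records, requesting at most page_size records per call."""
    skip = start
    while end is None or skip < end:
        # Never request records past `end` when an upper bound is given.
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if not page:
            # Assumption: an empty page means the store is exhausted.
            break
        yield page
        skip += limit


# Hypothetical stand-in for a store holding ten records.
RECORDS = list(range(10))


def fake_query(skip: int = 0, limit: int = 0, **kwargs: Any) -> List[int]:
    """Return the slice of RECORDS selected by skip and limit."""
    return RECORDS[skip : skip + limit]


pages = list(iter_pages(fake_query, start=0, end=None, page_size=4))
bounded = list(iter_pages(fake_query, start=2, end=7, page_size=3))
print(pages)
print(bounded)
```

Because `iter_pages` is a generator, only one page is held in memory at a time, which is the usual motivation for paging through a large store.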
-
load_edge
(edge_record: List) → Tuple[source]¶ Load an edge into an instance of BaseGraph
- Parameters
edge_record (List) – A 4-tuple edge record
- Returns
A tuple with subject ID, object ID, edge key, and edge data
- Return type
Tuple
-
load_edges
(edges: List) → None[source]¶ Load edges into an instance of BaseGraph
- Parameters
edges (List) – A list of edge records
-
load_node
(node: Dict) → Tuple[source]¶ Load node into an instance of BaseGraph
- Parameters
node (Dict) – A node
- Returns
A tuple with node ID and node data
- Return type
Tuple
-
load_nodes
(nodes: List) → None[source]¶ Load nodes into an instance of BaseGraph
- Parameters
nodes (List) – A list of nodes
-
parse
(uri: str, username: str, password: str, node_filters: Dict = None, edge_filters: Dict = None, start: int = 0, end: int = None, is_directed: bool = True, page_size: int = 50000, **kwargs: Any) → Generator[source]¶ This method reads from Neo4j instance and yields records
- Parameters
uri (str) – The URI for the Neo4j instance. For example, http://localhost:7474
username (str) – The username
password (str) – The password
node_filters (Dict) – Node filters
edge_filters (Dict) – Edge filters
start (int) – Number of records to skip before streaming
end (int) – Total number of records to fetch
is_directed (bool) – Whether or not the edges should be treated as directed
page_size (int) – The size of each page/batch fetched from Neo4j (50000)
kwargs (Any) – Any additional arguments
- Returns
A generator for records
- Return type
Generator
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_prefix_map
(m: Dict) → None¶ Update default prefix map.
- Parameters
m (Dict) – A dictionary with prefix to IRI mappings
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
kgx.source.rdf_source¶
RdfSource
is responsible for reading data from RDF N-Triples.
This source makes use of a custom kgx.parsers.ntriples_parser.CustomNTriplesParser
for parsing N-Triples,
which extends rdflib.plugins.parsers.ntriples.NTriplesParser
.
To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.
sort -k 1,2 -t ' ' data.nt > data_sorted.nt
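One plausible reason sorting helps is that all triples for a subject then arrive consecutively, so a reader can emit a complete record as soon as the subject changes instead of caching every subject until the end of the file. The sketch below illustrates that idea only; it is not the KGX parser. `split_triple` and `stream_by_subject` are hypothetical names, and the whitespace split is a deliberate simplification of N-Triples syntax.

```python
from itertools import groupby
from typing import Dict, Iterator, List, Tuple

# Example input, already sorted by subject.
SORTED_LINES = [
    '<http://example.org/A> <http://example.org/name> "a" .',
    "<http://example.org/A> <http://example.org/related_to> <http://example.org/B> .",
    '<http://example.org/B> <http://example.org/name> "b" .',
]


def split_triple(line: str) -> Tuple[str, str, str]:
    """Naively split one N-Triples line (assumes no spaces in subject or predicate)."""
    subject, predicate, rest = line.split(" ", 2)
    # Drop the terminating ' .' from the object term.
    obj = rest.rstrip().rstrip(".").rstrip()
    return subject, predicate, obj


def stream_by_subject(lines: List[str]) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Yield (subject, properties) once all consecutive lines for a subject are read."""
    triples = (split_triple(line) for line in lines)
    # groupby only merges adjacent items, which is why the input must be sorted.
    for subject, group in groupby(triples, key=lambda t: t[0]):
        props: Dict[str, List[str]] = {}
        for _, predicate, obj in group:
            props.setdefault(predicate, []).append(obj)
        yield subject, props


records = list(stream_by_subject(SORTED_LINES))
for subject, props in records:
    print(subject, props)
```

With unsorted input, the same subject would appear in several groups and its properties would be split across multiple partial records.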
-
class
kgx.source.rdf_source.
RdfSource
[source]¶ Bases:
kgx.source.source.Source
RdfSource is responsible for reading data as records from RDF.
Note
Currently only RDF N-Triples are supported.
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict[source]¶ Add an edge to cache.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict[source]¶ Add a node to cache.
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None[source]¶ Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
dereify
(n: str, node: Dict) → None[source]¶ Dereify a node to create a corresponding edge.
- Parameters
n (str) – Node identifier
node (Dict) – Node data
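Dereification can be pictured as turning a node that describes a statement (it carries its own subject, predicate, and object) into an edge record of the form (subject_id, object_id, edge_key, edge_data). The sketch below is a standalone illustration, not the KGX implementation: `dereify_node` is a hypothetical function, and both the field names it looks for and the way it builds the edge key are assumptions.

```python
from typing import Dict, Optional, Tuple

EdgeRecord = Tuple[str, str, str, Dict]


def dereify_node(node_id: str, node: Dict) -> Optional[EdgeRecord]:
    """Turn a reified statement node into an edge record, if it has the core fields."""
    required = ("subject", "predicate", "object")
    if not all(field in node for field in required):
        # Not a reified statement; leave it as an ordinary node.
        return None
    edge_data = dict(node)
    # Assumption: the reified node's identifier is kept as the edge id.
    edge_data.setdefault("id", node_id)
    # Assumption: the edge key is derived from the triple itself.
    edge_key = f"{node['subject']}-{node['predicate']}-{node['object']}"
    return node["subject"], node["object"], edge_key, edge_data


association = {
    "subject": "HGNC:11603",
    "predicate": "biolink:related_to",
    "object": "MONDO:0005002",
    "provided_by": "example",
}
edge = dereify_node("urn:uuid:1234", association)
plain = dereify_node("EX:1", {"name": "an ordinary node"})
print(edge)
print(plain)
```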
-
get_biolink_element
(predicate: Any) → Optional[linkml_runtime.linkml_model.meta.Element][source]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from RDF N-Triples and yields records.
Note
To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.
`sort -k 1,2 -t ' ' data.nt > data_sorted.nt`
- Parameters
filename (str) – The filename to parse
format (str) – The format (nt)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for records
- Return type
Generator
-
process_predicate
(p: Union[rdflib.term.URIRef, str, None]) → Tuple[source]¶ Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_property_predicates
(predicates) → None[source]¶ Set predicates that are to be treated as node properties.
- Parameters
predicates (Set) – Set of predicates
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_predicate_mapping
(m: Dict) → None[source]¶ Set predicate mappings.
Use this method to update mappings for predicates that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
-
set_prefix_map
(m: Dict) → None¶ Update default prefix map.
- Parameters
m (Dict) – A dictionary with prefix to IRI mappings
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
triple
(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None[source]¶ Parse a triple.
- Parameters
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict[source]¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
kgx.source.owl_source¶
OwlSource
is responsible for parsing an OWL ontology.
When parsing an OWL ontology, this source also adds OwlStar annotations to certain OWL axioms.
-
class
kgx.source.owl_source.
OwlSource
[source]¶ Bases:
kgx.source.rdf_source.RdfSource
OwlSource is responsible for parsing an OWL ontology.
Note
This is a simple parser that loads direct class-class relationships. For more formal OWL parsing, refer to Robot: http://robot.obolibrary.org/
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ Add an edge to cache.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ Add a node to cache.
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None¶ Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
dereify
(n: str, node: Dict) → None¶ Dereify a node to create a corresponding edge.
- Parameters
n (str) – Node identifier
node (Dict) – Node data
-
get_biolink_element
(predicate: Any) → Optional[linkml_runtime.linkml_model.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
load_graph
(rdfgraph: rdflib.graph.Graph, **kwargs: Any) → None[source]¶ Walk through the rdflib.Graph and load all triples into kgx.graph.base_graph.BaseGraph
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
kwargs (Any) – Any additional arguments
-
parse
(filename: str, format: str = 'owl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]¶ This method reads from an OWL ontology and yields records.
- Parameters
filename (str) – The filename to parse
format (str) – The format (owl)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for node and edge records read from the file
- Return type
Generator
-
process_predicate
(p: Union[rdflib.term.URIRef, str, None]) → Tuple¶ Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_property_predicates
(predicates) → None¶ Set predicates that are to be treated as node properties.
- Parameters
predicates (Set) – Set of predicates
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_predicate_mapping
(m: Dict) → None¶ Set predicate mappings.
Use this method to update mappings for predicates that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
-
set_prefix_map
(m: Dict) → None¶ Update default prefix map.
- Parameters
m (Dict) – A dictionary with prefix to IRI mappings
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
triple
(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None¶ Parse a triple.
- Parameters
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
kgx.source.sparql_source¶
SparqlSource
has yet to be implemented.
In principle, SparqlSource
should be able to read data from a local or remote SPARQL endpoint.
-
class
kgx.source.sparql_source.
SparqlSource
[source]¶ Bases:
kgx.source.rdf_source.RdfSource
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ Add an edge to cache.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ Add a node to cache.
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → None¶ Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
clear_graph_metadata
()¶ Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
-
dereify
(n: str, node: Dict) → None¶ Dereify a node to create a corresponding edge.
- Parameters
n (str) – Node identifier
node (Dict) – Node data
-
get_biolink_element
(predicate: Any) → Optional[linkml_runtime.linkml_model.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
get_infores_catalog
() → Dict[str, str]¶ Return the InfoRes Context of the source
-
parse
(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator¶ This method reads from RDF N-Triples and yields records.
Note
To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.
`sort -k 1,2 -t ' ' data.nt > data_sorted.nt`
- Parameters
filename (str) – The filename to parse
format (str) – The format (nt)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments
- Returns
A generator for records
- Return type
Generator
-
process_predicate
(p: Union[rdflib.term.URIRef, str, None]) → Tuple¶ Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_edge_filters
(filters: Dict) → None¶ Set edge filters.
- Parameters
filters (Dict) – Edge filters
-
set_edge_provenance
(edge_data)¶ Set a specific edge provenance value.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_node_filters
(filters: Dict) → None¶ Set node filters.
- Parameters
filters (Dict) – Node filters
-
set_node_property_predicates
(predicates) → None¶ Set predicates that are to be treated as node properties.
- Parameters
predicates (Set) – Set of predicates
-
set_node_provenance
(node_data)¶ Set a specific node provenance value.
-
set_predicate_mapping
(m: Dict) → None¶ Set predicate mappings.
Use this method to update mappings for predicates that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
-
set_prefix_map
(m: Dict) → None¶ Update default prefix map.
- Parameters
m (Dict) – A dictionary with prefix to IRI mappings
-
set_provenance_map
(kwargs)¶ Set up a provenance (Knowledge Source to InfoRes) map
-
triple
(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None¶ Parse a triple.
- Parameters
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-