Transformers
Transformers are classes in KGX that allow you to read and write data in a particular format.
Transformer
The base class for all Transformers in KGX.
-
class
kgx.transformers.transformer.
Transformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None)[source]¶ Bases:
object
Base class for performing a transformation.
- This can be either,
from a source to an in-memory property graph (networkx.MultiDiGraph), or
from an in-memory property graph to a target format or database (Neo4j, CSV, RDF Triple Store, TTL)
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph[source]¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None[source]¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool[source]¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph[source]¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None[source]¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None[source]¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
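The interplay between node and edge filters described in the two notes above can be sketched in plain Python. This is an illustration only, not KGX's implementation: the `FilterStore` class, its attribute names, and the decision to merge categories with a set union are assumptions made for the example.

```python
from typing import Dict, Set, Union

FilterValue = Union[str, Set[str]]


class FilterStore:
    """Hypothetical stand-in for the filter state kept by a Transformer."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, FilterValue] = {}
        self.edge_filters: Dict[str, FilterValue] = {}

    def set_node_filter(self, key: str, value: FilterValue) -> None:
        self.node_filters[key] = value
        if key == "category":
            # Mirror the node categories onto both edge endpoints so that
            # edges and nodes describe the same subgraph.
            self.edge_filters["subject_category"] = set(value)
            self.edge_filters["object_category"] = set(value)

    def set_edge_filter(self, key: str, value: FilterValue) -> None:
        self.edge_filters[key] = value
        if key in {"subject_category", "object_category"}:
            # Assumption: widen the node filter with a set union.
            current = self.node_filters.get("category", set())
            self.node_filters["category"] = set(current) | set(value)


store = FilterStore()
store.set_node_filter("category", {"biolink:Gene"})
store.set_edge_filter("object_category", {"biolink:Disease"})
```

After these two calls the node filter holds both categories, which is why the notes ask for a `set` rather than a string for category-based filters.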
-
static
validate_edge
(edge: dict) → dict[source]¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict[source]¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
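As a rough sketch of what "check for required properties" and "default assumptions" might look like for `validate_node`: the example below assumes that `id` is the required property and that a missing `category` falls back to a default. Both choices, and the value of `DEFAULT_NODE_CATEGORY`, are assumptions for illustration and are not taken from the KGX source.

```python
DEFAULT_NODE_CATEGORY = "named_thing"  # hypothetical default value


def validate_node(node: dict) -> dict:
    """Return a copy of node with a default category applied, if missing."""
    if "id" not in node:
        # Assumption: 'id' is the one property a node cannot do without.
        raise KeyError("node does not have the required 'id' property")
    validated = dict(node)  # leave the caller's dictionary untouched
    validated.setdefault("category", [DEFAULT_NODE_CATEGORY])
    return validated


node = validate_node({"id": "HGNC:11603", "name": "TBX4"})
```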
NeoTransformer
-
class
kgx.transformers.neo_transformer.
NeoTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, uri: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None)[source]¶ Bases:
kgx.transformers.transformer.Transformer
Transformer for reading from and writing to a Neo4j database.
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
uri (Optional[str]) – The Neo4j URI (with port)
username (Optional[str]) – The Neo4j username for authentication
password (Optional[str]) – The Neo4j password for authentication
-
count
(is_directed: bool = True) → int[source]¶ Get the total count of records to be fetched from the Neo4j database.
- Parameters
is_directed (bool) – Whether edges are directed or undirected (True, by default, since edges in most cases are directed)
- Returns
The total count of records
- Return type
int
-
static
create_constraint_query
(category: str) → str[source]¶ Create a Cypher CONSTRAINT query
- Parameters
category (str) – The category to create a constraint on
- Returns
The Cypher CONSTRAINT query
- Return type
str
-
create_constraints
(categories: Union[set, list]) → None Create a unique constraint on node ‘id’ for all categories in Neo4j.
- Parameters
categories (Union[set, list]) – Set or list of categories
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
static
generate_unwind_edge_query
(edge_label: str) → str[source]¶ Generate UNWIND cypher query for saving edges into Neo4j.
The query uses self.DEFAULT_NODE_CATEGORY to quickly look up the required subject and object nodes.
- Parameters
edge_label (str) – Edge label as string
- Returns
The UNWIND cypher query
- Return type
str
-
static
generate_unwind_node_query
(category: str) → str[source]¶ Generate UNWIND cypher query for saving nodes into Neo4j.
There should be a CONSTRAINT in Neo4j for self.DEFAULT_NODE_CATEGORY. The query uses self.DEFAULT_NODE_CATEGORY as the node label to increase speed for adding nodes. The query also sets label to self.DEFAULT_NODE_CATEGORY for any node to make sure that the CONSTRAINT applies.
- Parameters
category (str) – Node category
- Returns
The UNWIND cypher query
- Return type
str
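To make the description above concrete, here is one way such an UNWIND query could be assembled as a string. This is a sketch and not the query KGX generates: the function name, the `$nodes` parameter name, and the default label value are assumptions. It only shows the pattern described, matching on the default label (where the constraint lives) and then adding the category as an extra label.

```python
def unwind_node_query(category: str, default_label: str = "named_thing") -> str:
    """Build a Cypher UNWIND query that upserts a batch of nodes."""
    return (
        "UNWIND $nodes AS node\n"
        # MERGE on the default label so the uniqueness constraint is used.
        f"MERGE (n:`{default_label}` {{id: node.id}})\n"
        # Copy all properties and attach the category as an extra label.
        f"SET n += node, n:`{category}`"
    )


query = unwind_node_query("biolink:Gene")
```

The backticks around the labels matter because category names such as `biolink:Gene` contain a colon, which Cypher would otherwise read as a label separator.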
-
get_edge_filter
(key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str Get the value for edge filter as defined by key. This is used as a convenience method for generating cypher queries.
- Parameters
key (str) – Name of the edge filter
variable (Optional[str]) – Variable binding for cypher query
prefix (Optional[str]) – Prefix for the cypher
op (Optional[str]) – The operator
- Returns
Value corresponding to the given edge filter key, formatted for CQL
- Return type
str
-
get_edges
(skip: int = 0, limit: int = 0, is_directed: bool = True) → List[source]¶ Get a page of edges from the Neo4j database.
- Parameters
skip (int) – Records to skip
limit (int) – Total number of records to query for
is_directed (bool) – Whether edges are directed or undirected (True, by default, since edges in most cases are directed)
- Returns
A list of 3-tuples
- Return type
list
-
get_node_filter
(key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str Get the value for node filter as defined by key. This is used as a convenience method for generating cypher queries.
- Parameters
key (str) – Name of the node filter
variable (Optional[str]) – Variable binding for cypher query
prefix (Optional[str]) – Prefix for the cypher
op (Optional[str]) – The operator
- Returns
Value corresponding to the given node filter key, formatted for CQL
- Return type
str
-
get_nodes
(skip: int = 0, limit: int = 0) → List[source]¶ Get a page of nodes from the Neo4j database.
- Parameters
skip (int) – Records to skip
limit (int) – Total number of records to query for
- Returns
A list of nodes
- Return type
list
-
get_pages
(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Dict) → Iterator Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.
- Parameters
query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges
start (int) – Start for pagination
end (Optional[int]) – End for pagination
page_size (int) – Size of each page (50000, by default)
kwargs (Dict) – Any additional arguments that might be relevant for query_function
- Returns
An iterator for a list of records from Neo4j. The size of the list is page_size
- Return type
Iterator
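The pagination scheme described for get_pages can be sketched as a generator that repeatedly calls a query function with `skip` and `limit`. This is a minimal sketch under assumptions, not KGX's code: the stopping rules (empty page or short page) and the in-memory `fetch` stand-in are illustrative choices.

```python
from typing import Any, Callable, Iterator, List, Optional


def get_pages(
    query_function: Callable[..., List[Any]],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List[Any]]:
    """Yield pages of at most page_size records between start and end."""
    skip = start
    while end is None or skip < end:
        # Never ask for more records than remain before `end`.
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if not page:
            break
        yield page
        if len(page) < limit:
            break  # the source ran out of records
        skip += limit


# Stand-in for self.get_nodes / self.get_edges: slices an in-memory list.
records = list(range(25))


def fetch(skip: int = 0, limit: int = 0) -> List[int]:
    return records[skip:skip + limit]


pages = list(get_pages(fetch, page_size=10))
```

Because the function is a generator, only one page is held in memory at a time, which is the point of paging through a large database.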
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load
(start: int = 0, end: Optional[int] = None, is_directed: bool = True, page_size: int = 50000, provided_by: Optional[str] = None) → None[source]¶ Read nodes and edges from a Neo4j database and create a networkx.MultiDiGraph
- Parameters
start (int) – Start for pagination
end (Optional[int]) – End for pagination
is_directed (bool) – Whether edges are directed or undirected (True, by default, since edges in most cases are directed)
page_size (int) – Size of page (or chunk) to fetch from Neo4j
provided_by (Optional[str]) – Define the source providing the data
-
load_edge
(edge_record: List) → None[source]¶ Load an edge into networkx.MultiDiGraph
- Parameters
edge_record (List) – A 3-tuple edge record
-
load_edges
(edges: List) → None[source]¶ Load edges into networkx.MultiDiGraph
- Parameters
edges (List) – A list of edge records
-
load_node
(node: Dict) → None[source]¶ Load node into networkx.MultiDiGraph
- Parameters
node (Dict) – A node
-
load_nodes
(nodes: List) → None[source]¶ Load nodes into networkx.MultiDiGraph
- Parameters
nodes (List) – A list of nodes
-
neo4j_report
() → None[source]¶ Give a summary on the number of nodes and edges in the Neo4j database.
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
sanitize_category
(category: List) → List[source]¶ Sanitize category for use in UNWIND cypher clause. This method adds escape characters to each element in category list to ensure the category is processed correctly.
- Parameters
category (List) – Category
- Returns
Sanitized category list
- Return type
List
-
save
() → None[source]¶ Save all nodes and edges from networkx.MultiDiGraph into Neo4j using the UNWIND cypher clause.
-
save_edge
(edges_by_edge_label: Dict[str, list], batch_size: int = 10000) → None[source]¶ Save all edges into Neo4j using the UNWIND cypher clause.
- Parameters
edges_by_edge_label (dict) – A dictionary where edge label is the key and the value is a list of edges with that edge label
batch_size (int) – Size of batch per transaction (default: 10000)
-
save_node
(nodes_by_category: Dict[str, list], batch_size: int = 10000) → None[source]¶ Save all nodes into Neo4j using the UNWIND cypher clause.
- Parameters
nodes_by_category (Dict[str, list]) – A dictionary where node category is the key and the value is a list of nodes of that category
batch_size (int) – Size of batch per transaction (default: 10000)
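The batch_size parameter of save_node and save_edge implies that records are written in fixed-size chunks, one transaction per chunk. A minimal sketch of that chunking, with a hypothetical `batches` helper and made-up node identifiers, might look like this:

```python
from typing import Dict, Iterator, List


def batches(items: List[dict], batch_size: int = 10000) -> Iterator[List[dict]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    for offset in range(0, len(items), batch_size):
        yield items[offset:offset + batch_size]


# Same shape as the nodes_by_category argument: category -> list of nodes.
nodes_by_category: Dict[str, List[dict]] = {
    "biolink:Gene": [{"id": f"EXAMPLE:{i}"} for i in range(5)],
}

for category, nodes in nodes_by_category.items():
    for batch in batches(nodes, batch_size=2):
        # A real implementation would run one UNWIND query per batch here.
        print(category, [node["id"] for node in batch])
```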
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
PandasTransformer
-
class
kgx.transformers.pandas_transformer.
PandasTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None)[source]¶ Bases:
kgx.transformers.transformer.Transformer
Transformer that parses a TSV/CSV, and loads nodes and edges into a networkx.MultiDiGraph
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
-
check_edge_filter
(edge: Dict) → bool[source]¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool[source]¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
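One plausible reading of "passes defined node filters" is that every filter key must be present on the node and share at least one value with the filter. The sketch below implements that reading as a standalone function; the matching rule and the explicit `node_filters` argument are assumptions, and the actual KGX logic may differ.

```python
from typing import Dict, Set, Union


def check_node_filter(node: Dict, node_filters: Dict[str, Union[str, Set[str]]]) -> bool:
    """Return True if the node matches every filter in node_filters."""
    for key, allowed in node_filters.items():
        if key not in node:
            return False
        value = node[key]
        # Normalise both sides to sets so single values and lists compare alike.
        values = set(value) if isinstance(value, (list, set, tuple)) else {value}
        allowed_set = allowed if isinstance(allowed, set) else {allowed}
        if not values & allowed_set:
            return False
    return True


filters = {"category": {"biolink:Gene"}, "provided_by": "example_source"}
gene = {"id": "HGNC:11603", "category": ["biolink:Gene"], "provided_by": "example_source"}
disease = {"id": "MONDO:0005148", "category": ["biolink:Disease"], "provided_by": "example_source"}
```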
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export_edges
(filename: str, delimiter: str) → None[source]¶ Export edges from networkx.MultiDiGraph
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_neo4j_edges
(filename: str, delimiter: str) → None Export edges from networkx.MultiDiGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_neo4j_nodes
(filename: str, delimiter: str) → None Export nodes from networkx.MultiDiGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
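For orientation, the neo4j-admin import tool expects typed header fields such as `id:ID` and `:LABEL` for nodes, and `:START_ID`, `:END_ID` and `:TYPE` for relationships. The sketch below writes such files with the standard csv module. The function names and the choice of columns (`name`, `category`, `subject`, `object`, `edge_label`) are assumptions for illustration and may not match the columns KGX exports.

```python
import csv
import io
from typing import Dict, List


def neo4j_nodes_csv(nodes: List[Dict], delimiter: str = ",") -> str:
    """Render nodes with a neo4j-admin style header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["id:ID", "name", ":LABEL"])
    for node in nodes:
        # Multiple labels are joined with ';', the tool's default array delimiter.
        labels = ";".join(node.get("category", []))
        writer.writerow([node["id"], node.get("name", ""), labels])
    return buf.getvalue()


def neo4j_edges_csv(edges: List[Dict], delimiter: str = ",") -> str:
    """Render edges with a neo4j-admin style header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for edge in edges:
        writer.writerow([edge["subject"], edge["object"], edge["edge_label"]])
    return buf.getvalue()


node_text = neo4j_nodes_csv([{"id": "HGNC:11603", "name": "TBX4", "category": ["biolink:Gene"]}])
edge_text = neo4j_edges_csv([{"subject": "A:1", "object": "A:2", "edge_label": "related_to"}], delimiter="\t")
```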
-
export_nodes
(filename: str, delimiter: str) → None[source]¶ Export nodes from networkx.MultiDiGraph
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
static
get_all_edge_properties
(graph: networkx.classes.multidigraph.MultiDiGraph) → Set[source]¶ Given a graph, get all possible property names for edges.
- Parameters
graph (networkx.MultiDiGraph) – A graph
- Returns
A set of edge properties
- Return type
Set
-
static
get_all_node_properties
(graph: networkx.classes.multidigraph.MultiDiGraph) → Set[source]¶ Given a graph, get all possible property names for nodes.
- Parameters
graph (networkx.MultiDiGraph) – A graph
- Returns
A set of node properties
- Return type
Set
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
static
is_null
(item: Any) → bool Check whether a given item is null or corresponds to null.
This method checks for: None, numpy.nan, pandas.NA, pandas.NaT, "", and " "
- Parameters
item (Any) – The item to check
- Returns
Whether the given item is null or not
- Return type
bool
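A standard-library approximation of this check is shown below. It covers None, NaN (numpy.nan is an ordinary Python float NaN), the empty string, and a single space. It deliberately leaves out pandas.NA and pandas.NaT to stay dependency-free; a pandas-based version could presumably lean on `pandas.isna` for those. The function body is an assumption about the behaviour, not the KGX source.

```python
import math
from typing import Any


def is_null(item: Any) -> bool:
    """Return True for None, NaN, an empty string, or a single space."""
    if item is None:
        return True
    if isinstance(item, float) and math.isnan(item):
        return True
    if isinstance(item, str) and item in ("", " "):
        return True
    return False


results = [is_null(value) for value in (None, float("nan"), "", " ", "TBX4", 0)]
```

Note that `0` and other falsy values are not treated as null here, which is why the check is explicit rather than a plain truthiness test.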
-
load_edge
(edge: Dict) → None[source]¶ Load an edge into a networkx.MultiDiGraph
- Parameters
edge (Dict) – An edge
-
load_edges
(df: pandas.core.frame.DataFrame) → None[source]¶ Load edges from pandas.DataFrame into a networkx.MultiDiGraph
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent edges
-
load_node
(node: Dict) → None[source]¶ Load a node into a networkx.MultiDiGraph
- Parameters
node (Dict) – A node
-
load_nodes
(df: pandas.core.frame.DataFrame) → None[source]¶ Load nodes from pandas.DataFrame into a networkx.MultiDiGraph
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent nodes
-
parse
(filename: str, input_format: str = 'csv', compression: Optional[str] = None, provided_by: Optional[str] = None, **kwargs: Dict) → None Parse a CSV/TSV (or plain text) file.
The file can represent either nodes (nodes.csv) or edges (edges.csv) or both (data.tar), where the tar archive contains nodes.csv and edges.csv.
The file can also be data.tar.gz or data.tar.bz2.
- Parameters
filename (str) – File to read from
input_format (str) – The input file format (csv, by default)
compression (Optional[str]) – The compression. For example, tar
provided_by (Optional[str]) – Define the source providing the input file
kwargs (Dict) – Any additional arguments
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, output_format: str = 'csv', compression: Optional[str] = None, **kwargs: Dict) → str Write two files representing the node set and edge set of a networkx.MultiDiGraph, and add them to a .tar archive.
Note
If your node/edge properties are likely to contain commas, export to TSV instead of CSV.
- Parameters
filename (str) – Name of tar archive file to create
output_format (str) – The output file format (csv, by default)
compression (Optional[str]) – The compression. For example, tar
kwargs (Dict) – Any additional arguments
- Returns
The filename
- Return type
str
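The overall shape of this operation (two delimited files bundled into one tar archive) can be sketched with the standard library alone. This is not the KGX implementation: the `save_as_tar` name, the member file names, and the way columns are collected are assumptions. The tab delimiter reflects the note about property values that contain commas.

```python
import csv
import os
import tarfile
import tempfile
from typing import Dict, List


def save_as_tar(nodes: List[Dict], edges: List[Dict], archive_path: str,
                delimiter: str = "\t") -> str:
    """Write nodes and edges to delimited files and bundle them in a tar."""
    ext = "tsv" if delimiter == "\t" else "csv"
    with tempfile.TemporaryDirectory() as tmp:
        members = []
        for name, rows in (("nodes", nodes), ("edges", edges)):
            path = os.path.join(tmp, f"{name}.{ext}")
            # Use the union of all property names as the column set.
            fields = sorted({key for row in rows for key in row})
            with open(path, "w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=fields, delimiter=delimiter)
                writer.writeheader()
                writer.writerows(rows)
            members.append(path)
        with tarfile.open(archive_path, "w") as tar:
            for path in members:
                tar.add(path, arcname=os.path.basename(path))
    return archive_path
```

A call such as `save_as_tar(nodes, edges, "graph.tar")` would return the archive path, mirroring the "Returns: The filename" contract described above.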
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
JsonTransformer
-
class
kgx.transformers.json_transformer.
JsonTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None)[source]¶ Bases:
kgx.transformers.pandas_transformer.PandasTransformer
Transformer that parses a JSON, and loads nodes and edges into a networkx.MultiDiGraph
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export
() → Dict[source]¶ Export networkx.MultiDiGraph as a dictionary.
- Returns
A dictionary with a list of nodes and a list of edges
- Return type
dict
-
export_edges
(filename: str, delimiter: str) → None¶ Export edges from networkx.MultiDiGraph
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_neo4j_edges
(filename: str, delimiter: str) → None Export edges from networkx.MultiDiGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_neo4j_nodes
(filename: str, delimiter: str) → None Export nodes from networkx.MultiDiGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_nodes
(filename: str, delimiter: str) → None¶ Export nodes from networkx.MultiDiGraph
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
static
get_all_edge_properties
(graph: networkx.classes.multidigraph.MultiDiGraph) → Set¶ Given a graph, get all possible property names for edges.
- Parameters
graph (networkx.MultiDiGraph) – A graph
- Returns
A set of edge properties
- Return type
Set
-
static
get_all_node_properties
(graph: networkx.classes.multidigraph.MultiDiGraph) → Set¶ Given a graph, get all possible property names for nodes.
- Parameters
graph (networkx.MultiDiGraph) – A graph
- Returns
A set of node properties
- Return type
Set
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
static
is_null
(item: Any) → bool Check whether a given item is null or corresponds to null.
This method checks for: None, numpy.nan, pandas.NA, pandas.NaT, "", and " "
- Parameters
item (Any) – The item to check
- Returns
Whether the given item is null or not
- Return type
bool
-
load
(obj: Dict[str, Any]) → None[source]¶ Load a JSON object, containing nodes and edges, into a networkx.MultiDiGraph
- Parameters
obj (Dict[str, Any]) – JSON Object with all nodes and edges
-
load_edge
(edge: Dict) → None¶ Load an edge into a networkx.MultiDiGraph
- Parameters
edge (Dict) – An edge
-
load_edges
(edges: List[Dict]) → None[source]¶ Load a list of edges into a networkx.MultiDiGraph
- Parameters
edges (list) – List of edges
-
load_node
(node: Dict) → None¶ Load a node into a networkx.MultiDiGraph
- Parameters
node (Dict) – A node
-
load_nodes
(nodes: List[Dict]) → None[source]¶ Load a list of nodes into a networkx.MultiDiGraph
- Parameters
nodes (list) – List of nodes
-
parse
(filename: str, input_format: str = 'json', compression: Optional[str] = None, provided_by: Optional[str] = None, **kwargs) → None Parse a JSON file of the format,
{
    "nodes" : [...],
    "edges" : [...]
}
- Parameters
filename (str) – JSON file to read from
input_format (str) – The input file format (json, by default)
compression (Optional[str]) – The compression type. For example, gz
provided_by (Optional[str]) – Define the source providing the input file
kwargs (dict) – Any additional arguments
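To illustrate the two-key layout described above, the following sketch reads such a document with the standard json module and indexes nodes by identifier. The `read_nodes_and_edges` helper and the sample property names (`subject`, `object`, `edge_label`) are assumptions for the example. The real parse method loads the records into a networkx.MultiDiGraph instead of returning plain structures.

```python
import json
from typing import Dict, List, Tuple


def read_nodes_and_edges(text: str) -> Tuple[Dict[str, dict], List[dict]]:
    """Split a {"nodes": [...], "edges": [...]} document into its two parts."""
    doc = json.loads(text)
    nodes = {node["id"]: node for node in doc.get("nodes", [])}
    edges = list(doc.get("edges", []))
    return nodes, edges


sample = json.dumps({
    "nodes": [{"id": "A:1", "name": "first"}, {"id": "A:2", "name": "second"}],
    "edges": [{"subject": "A:1", "object": "A:2", "edge_label": "related_to"}],
})
nodes, edges = read_nodes_and_edges(sample)
```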
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, output_format: str = 'json', compression: Optional[str] = None, **kwargs) → str[source]¶ Write networkx.MultiDiGraph to a file as JSON.
- Parameters
filename (str) – Filename to write to
output_format (str) – The output file format (json, by default)
compression (Optional[str]) – The compression type. For example, gz
kwargs (dict) – Any additional arguments
- Returns
The filename
- Return type
str
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
class
kgx.transformers.json_transformer.
ObographJsonTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None)[source]¶ Bases:
kgx.transformers.json_transformer.JsonTransformer
Transformer that parses an Obograph JSON, and loads nodes and edges into a networkx.MultiDiGraph
-
check_edge_filter
(edge: Dict) → bool¶ Check if an edge passes defined edge filters.
- Parameters
edge (Dict) – An edge
- Returns
Whether the given edge has passed all defined edge filters
- Return type
bool
-
check_node_filter
(node: Dict) → bool¶ Check if a node passes defined node filters.
- Parameters
node (Dict) – A node
- Returns
Whether the given node has passed all defined node filters
- Return type
bool
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export
() → Dict¶ Export networkx.MultiDiGraph as a dictionary.
- Returns
A dictionary with a list of nodes and a list of edges
- Return type
dict
-
export_edges
(filename: str, delimiter: str) → None¶ Export edges from networkx.MultiDiGraph
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_neo4j_edges
(filename: str, delimiter: str) → None Export edges from networkx.MultiDiGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_neo4j_nodes
(filename: str, delimiter: str) → None Export nodes from networkx.MultiDiGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
export_nodes
(filename: str, delimiter: str) → None¶ Export nodes from networkx.MultiDiGraph
- Parameters
filename (str) – The filename
delimiter (str) – The delimiter to use as a separator
-
static
get_all_edge_properties
(graph: networkx.classes.multidigraph.MultiDiGraph) → Set¶ Given a graph, get all possible property names for edges.
- Parameters
graph (networkx.MultiDiGraph) – A graph
- Returns
A set of edge properties
- Return type
Set
-
static
get_all_node_properties
(graph: networkx.classes.multidigraph.MultiDiGraph) → Set¶ Given a graph, get all possible property names for nodes.
- Parameters
graph (networkx.MultiDiGraph) – A graph
- Returns
A set of node properties
- Return type
Set
-
get_category
(curie: str, node: dict) → Optional[str][source]¶ Get category for a given CURIE.
- Parameters
curie (str) – Curie for node
node (dict) – Node data
- Returns
Category for the given node CURIE.
- Return type
Optional[str]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
static
is_null
(item: Any) → bool¶ Check whether a given item is null or corresponds to a null value.
This method checks for: None, numpy.nan, pandas.NA, pandas.NaT, “” and “ ”
- Parameters
item (Any) – The item to check
- Returns
Whether the given item is null or not
- Return type
bool
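A minimal standard-library sketch of this kind of null check is shown below. It is not the KGX implementation: it covers None, float NaN (numpy.nan is itself a Python float), and empty or whitespace-only strings, but it does not handle pandas.NA or pandas.NaT. The name is_null_sketch is hypothetical.

```python
import math
from typing import Any


def is_null_sketch(item: Any) -> bool:
    """Return True for None, float NaN, and empty or whitespace-only strings."""
    if item is None:
        return True
    # NaN is the only float that math.isnan reports as True.
    if isinstance(item, float) and math.isnan(item):
        return True
    if isinstance(item, str) and item.strip() == "":
        return True
    return False
```

Falsy values such as 0 or an empty list are deliberately not treated as null here.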
-
load
(obj: Dict[str, Any]) → None¶ Load a JSON object, containing nodes and edges, into a networkx.MultiDiGraph
- Parameters
obj (Dict[str, Any]) – JSON Object with all nodes and edges
-
load_edge
(edge: dict) → None[source]¶ Load an edge from Obograph JSON into a networkx.MultiDiGraph
- Parameters
edge (dict) – An edge
-
load_edges
(edges: List[Dict]) → None¶ Load a list of edges into a networkx.MultiDiGraph
- Parameters
edges (list) – List of edges
-
load_node
(node: Dict) → None[source]¶ Load a node into a networkx.MultiDiGraph
- Parameters
node (dict) – A node
-
load_nodes
(nodes: List[Dict]) → None¶ Load a list of nodes into a networkx.MultiDiGraph
- Parameters
nodes (list) – List of nodes
-
parse
(filename: str, input_format: str = 'json', compression: Optional[str] = None, provided_by: Optional[str] = None, **kwargs) → None[source]¶ Parse Obograph JSON file of the format,
    {
      "graphs": [
        {
          "nodes": [
            {
              "id": "UBERON:0002102", "lbl": "forelimb"
            }, {
              "id": "UBERON:0002101", "lbl": "limb"
            }
          ], "edges": [
            {
              "subj": "UBERON:0002102", "pred": "is_a", "obj": "UBERON:0002101"
            }
          ]
        }
      ]
    }
- Parameters
filename (str) – JSON file to read from
input_format (str) – The input file format (json, by default)
compression (Optional[str]) – The compression type. For example, gz
provided_by (Optional[str]) – Define the source providing the input file
kwargs (dict) – Any additional arguments
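The Obograph layout described for parse can be walked with the standard library alone. The sketch below is not the transformer's own code; read_obograph is a hypothetical helper that collects the node and edge records from every entry under "graphs".

```python
import json
from typing import Dict, List, Tuple

OBOGRAPH_TEXT = """
{
  "graphs": [
    {
      "nodes": [
        {"id": "UBERON:0002102", "lbl": "forelimb"},
        {"id": "UBERON:0002101", "lbl": "limb"}
      ],
      "edges": [
        {"subj": "UBERON:0002102", "pred": "is_a", "obj": "UBERON:0002101"}
      ]
    }
  ]
}
"""


def read_obograph(text: str) -> Tuple[List[Dict], List[Dict]]:
    """Collect node and edge records from all graphs in an Obograph document."""
    document = json.loads(text)
    nodes: List[Dict] = []
    edges: List[Dict] = []
    for graph in document.get("graphs", []):
        # A graph may omit either list, so default to an empty one.
        nodes.extend(graph.get("nodes", []))
        edges.extend(graph.get("edges", []))
    return nodes, edges


nodes, edges = read_obograph(OBOGRAPH_TEXT)
```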
-
parse_meta
(node: str, meta: dict)[source]¶ Parse ‘meta’ field of a node.
- Parameters
node (str) – Node identifier
meta (dict) – meta dictionary for the node
- Returns
A dictionary that contains ‘description’, ‘synonyms’, ‘xrefs’, and ‘equivalent_nodes’.
- Return type
dict
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, output_format: str = 'json', compression: Optional[str] = None, **kwargs) → str¶ Write networkx.MultiDiGraph to a file as JSON.
- Parameters
filename (str) – Filename to write to
output_format (str) – The output file format (json, by default)
compression (Optional[str]) – The compression type. For example, gz
kwargs (dict) – Any additional arguments
- Returns
The filename
- Return type
str
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
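To make the idea of a key and value filter concrete, the sketch below applies set-valued filters to plain node dictionaries. It is an assumption about how such a filter could behave, not a description of KGX internals: normalise and passes are hypothetical helpers and the EX: identifiers are made up.

```python
from typing import Dict, Iterable, Set, Union


def normalise(value: Union[str, Iterable[str]]) -> Set[str]:
    """Turn a single string or an iterable of strings into a set."""
    return {value} if isinstance(value, str) else set(value)


def passes(node: Dict, filters: Dict[str, Set[str]]) -> bool:
    """Keep a node only if, for every filter key, it shares a value with the filter."""
    for key, allowed in filters.items():
        if key not in node:
            return False
        if not (normalise(node[key]) & allowed):
            return False
    return True


# Hypothetical filter: keep nodes whose 'category' includes biolink:Gene.
node_filters = {"category": {"biolink:Gene"}}
nodes = [
    {"id": "EX:gene1", "category": ["biolink:Gene"]},
    {"id": "EX:disease1", "category": ["biolink:Disease"]},
    {"id": "EX:unknown1"},
]
kept = [n["id"] for n in nodes if passes(n, node_filters)]
```

Storing each filter value as a set keeps the membership test the same whether one or several values are allowed.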
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
LogicTermTransformer¶
-
class
kgx.transformers.logicterm_transformer.
LogicTermTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, output_format: Optional[str] = None, **kwargs: Dict)[source]¶ Bases:
kgx.transformers.transformer.Transformer
TODO
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
NxTransformer¶
RdfGraphMixin¶
A mixin for handling operations on RDF-stores.
-
class
kgx.transformers.rdf_graph_mixin.
RdfGraphMixin
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, curie_map: Optional[Dict] = None)[source]¶ Bases:
object
- A mixin that defines the following methods,
load_networkx_graph(): template method that all deriving classes should implement
add_node(): method to add a node from an RDF form to property graph form
add_node_attribute(): method to add a node attribute from an RDF form to property graph form
add_edge(): method to add an edge from an RDF form to property graph form
add_edge_attribute(): method to add an edge attribute from an RDF form to property graph form
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict[source]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict[source]¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict[source]¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
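The step of turning an IRI into a CURIE while keeping the original IRI as a property can be sketched without rdflib. The code below is illustrative only: PREFIX_MAP, contract and make_node are hypothetical names, and KGX uses its own prefix handling.

```python
from typing import Dict, Optional

# Hypothetical prefix map; a real one would hold many more namespaces.
PREFIX_MAP: Dict[str, str] = {
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def contract(iri: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> Optional[str]:
    """Return the CURIE for an IRI, preferring the longest matching namespace."""
    best = None
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return None
    return f"{best[0]}:{iri[len(best[1]):]}"


def make_node(iri: str) -> Dict[str, str]:
    """Build a node record keyed by CURIE, keeping the source IRI."""
    curie = contract(iri)
    # Fall back to the IRI itself when no prefix matches.
    return {"id": curie if curie is not None else iri, "iri": iri}


node = make_node("http://purl.obolibrary.org/obo/UBERON_0002101")
```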
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict[source]¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
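The rule that multi-valued properties never contain duplicates can be sketched as follows. This is an assumption-laden illustration rather than the mixin's code: the MULTIVALUED set and the add_attribute helper are hypothetical, and single-valued properties are simply overwritten here.

```python
from typing import Any, Dict, List, Set, Union

# Hypothetical set of property names that are treated as multi-valued.
MULTIVALUED: Set[str] = {"synonym", "xrefs"}


def add_attribute(record: Dict[str, Any], key: str, value: Union[str, List[str]]) -> Dict[str, Any]:
    """Add a value to a record, de-duplicating multi-valued properties."""
    values = value if isinstance(value, list) else [value]
    if not values:
        return record
    if key in MULTIVALUED:
        existing = record.setdefault(key, [])
        for item in values:
            # Append only values that are not already present, keeping order.
            if item not in existing:
                existing.append(item)
    else:
        # Single-valued property: the last value written wins.
        record[key] = values[-1]
    return record


record: Dict[str, Any] = {}
add_attribute(record, "synonym", "arm")
add_attribute(record, "synonym", ["arm", "forelimb"])
add_attribute(record, "name", "fore limb")
add_attribute(record, "name", "forelimb")
```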
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element][source]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None[source]¶ This method should be overridden and be implemented by the derived class, and should load all desired nodes and edges from rdflib.Graph into networkx.MultiDiGraph
It is preferred that this method does not use the networkx API directly when adding nodes, edges, and their attributes.
- Instead, use the following methods,
add_node()
add_node_attribute()
add_edge()
add_edge_attribute()
to ensure that nodes, edges, and their attributes are added in conformance with the BioLink Model, and that URIRefs are translated into CURIEs or BioLink Model elements whenever appropriate.
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form
kwargs (Dict) – Any additional arguments
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str][source]¶ Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple[str, str, str, str]
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict[source]¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
RdfTransformer¶
-
class
kgx.transformers.rdf_transformer.
ObanRdfTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, curie_map: Optional[Dict] = None)[source]¶ Bases:
kgx.transformers.rdf_transformer.RdfTransformer
Transformer that parses a ‘turtle’ file and loads triples, as nodes and edges, into a networkx.MultiDiGraph
This Transformer supports OBAN style of modeling where,
- it dereifies OBAN.association triples into a property graph form
- it reifies property graph into OBAN.association triples
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
curie_map (Optional[Dict]) – A curie map that maps non-canonical CURIEs to IRIs
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
dereify
(nodes: Set[str]) → None¶ Dereify a set of nodes where each node has all the properties necessary to create an edge.
- Parameters
nodes (Set[str]) – A set of nodes
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export_edges
(reify_all_edges: bool = False) → Iterator¶ Export edges and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).
- Parameters
reify_all_edges (bool) – Whether to reify all edges in the graph
- Returns
An iterator
- Return type
Iterator
-
export_nodes
() → Iterator¶ Export nodes and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).
- Returns
An iterator
- Return type
Iterator
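The shape of such an export can be sketched with a plain generator. The code below works on ordinary dictionaries rather than on a networkx or rdflib graph, and node_triples is a hypothetical name; it only shows how node properties can be flattened into (subject, predicate, object) tuples.

```python
from typing import Any, Dict, Iterator, Tuple

Triple = Tuple[str, str, Any]


def node_triples(nodes: Dict[str, Dict[str, Any]]) -> Iterator[Triple]:
    """Yield one (subject, predicate, object) tuple per property value."""
    for node_id, properties in nodes.items():
        for key, value in properties.items():
            # A list-valued property produces one triple per element.
            values = value if isinstance(value, list) else [value]
            for item in values:
                yield (node_id, key, item)


# Hypothetical node store keyed by identifier.
example_nodes = {"EX:1": {"name": "limb", "synonym": ["arm", "leg"]}}
triples = list(node_triples(example_nodes))
```

Because the function is a generator, a caller can stream triples to a serializer without holding them all in memory.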
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None¶ Walk through the rdflib.Graph and load all required triples into networkx.MultiDiGraph
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form
kwargs (Dict) – Any additional arguments
-
parse
(filename: str, input_format: Optional[str] = None, compression: Optional[str] = None, provided_by: Optional[str] = None, node_property_predicates: Optional[Set[str]] = None) → None¶ Parse a file, containing triples, into a rdflib.Graph
The file can be either a ‘turtle’ file or any other format supported by rdflib.
- Parameters
filename (Optional[str]) – File to read from.
input_format (Optional[str]) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()
compression (Optional[str]) – The compression type. For example, gz
provided_by (Optional[str]) – Define the source providing the input file.
node_property_predicates (Optional[Set[str]]) – A set of rdflib.URIRef representing predicates that are to be treated as node properties
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]¶ Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple[str, str, str, str]
-
reify
(u: str, v: str, k: str, data: Dict) → Dict¶ Create a node representation of an edge.
- Parameters
u (str) – Subject
v (str) – Object
k (str) – Edge key
data (Dict) – Edge data
- Returns
The reified node
- Return type
Dict
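Reification, meaning the representation of an edge as a node, can be illustrated with plain dictionaries. The sketch below is not the transformer's implementation: reify_sketch and dereify_sketch are hypothetical, and the property names id, subject and object are assumptions chosen for the example.

```python
from typing import Any, Dict, Tuple

RESERVED = ("id", "subject", "object")


def reify_sketch(u: str, v: str, k: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Represent the edge (u, v, k) as a node that points at its endpoints."""
    node = dict(data)  # copy so that the caller's edge data is not mutated
    node.update({"id": k, "subject": u, "object": v})
    return node


def dereify_sketch(node: Dict[str, Any]) -> Tuple[str, str, str, Dict[str, Any]]:
    """Recover (u, v, k, data) from a reified node."""
    data = {key: value for key, value in node.items() if key not in RESERVED}
    return node["subject"], node["object"], node["id"], data


edge_data = {"edge_label": "related_to"}
reified = reify_sketch("EX:a", "EX:b", "EX:a-related_to-EX:b", edge_data)
```

A node built this way carries everything needed to recreate the edge, which is the precondition that dereify places on its input nodes.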
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, output_format: str = 'turtle', compression: Optional[str] = None, reify_all_edges: bool = False, **kwargs) → None¶ Transform networkx.MultiDiGraph into rdflib.Graph and export this graph as a file (turtle, by default).
- Parameters
filename (str) – Filename to write to
output_format (str) – The output format; default: turtle
compression (Optional[str]) – The compression type. Not yet supported.
reify_all_edges (bool) – Whether to reify all edges in the graph
kwargs (Dict) – Any additional arguments
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_predicate_mapping
(m: Dict) → None¶ Set predicate mappings.
Use this method to update predicate mappings for predicates that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
-
set_property_types
(m: Dict) → None¶ Set property types.
Use this method to populate type information for properties that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are property URI and values are the type
-
triple
(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None¶ Parse a triple.
- Parameters
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-
uriref
(identifier: str) → rdflib.term.URIRef¶ Generate a rdflib.URIRef for a given string.
- Parameters
identifier (str) – Identifier as string.
- Returns
URIRef form of the input identifier
- Return type
rdflib.URIRef
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
class
kgx.transformers.rdf_transformer.
RdfOwlTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, curie_map: Optional[Dict] = None)[source]¶ Bases:
kgx.transformers.rdf_transformer.RdfTransformer
Transformer that parses an OWL ontology.
Note
This is a simple parser that loads direct class-class relationships.
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
curie_map (Optional[Dict]) – A curie map that maps non-canonical CURIEs to IRIs
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
dereify
(nodes: Set[str]) → None¶ Dereify a set of nodes where each node has all the properties necessary to create an edge.
- Parameters
nodes (Set[str]) – A set of nodes
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export_edges
(reify_all_edges: bool = False) → Iterator¶ Export edges and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).
- Parameters
reify_all_edges (bool) – Whether to reify all edges in the graph
- Returns
An iterator
- Return type
Iterator
-
export_nodes
() → Iterator¶ Export nodes and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).
- Returns
An iterator
- Return type
Iterator
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None[source]¶ Walk through the rdflib.Graph and load all triples into networkx.MultiDiGraph
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (Optional[Set[URIRef]]) – A set of rdflib.URIRef representing predicates to be loaded
kwargs (Dict) – Any additional arguments
-
parse
(filename: str, input_format: Optional[str] = None, compression: Optional[str] = None, provided_by: Optional[str] = None, node_property_predicates: Optional[Set[str]] = None) → None[source]¶ Parse an OWL file and load it into a rdflib.Graph
- Parameters
filename (str) – File to read from.
input_format (Optional[str]) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()
compression (Optional[str]) – The compression type. For example, gz
provided_by (Optional[str]) – Define the source providing the input file.
node_property_predicates (Optional[Set[str]]) – A set of rdflib.URIRef representing predicates that are to be treated as node properties
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]¶ Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple[str, str, str, str]
-
reify
(u: str, v: str, k: str, data: Dict) → Dict¶ Create a node representation of an edge.
- Parameters
u (str) – Subject
v (str) – Object
k (str) – Edge key
data (Dict) – Edge data
- Returns
The reified node
- Return type
Dict
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
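The dump/restore round trip amounts to writing a dictionary as JSON and reading it back. The sketch below shows this with the standard library only; the `nodes`/`edges` layout of the dictionary and the helper names `dump_dict` and `restore_dict` are assumptions, and no networkx graph is constructed.

```python
import json
import os
import tempfile
from typing import Any, Dict


def dump_dict(data: Dict[str, Any], filename: str) -> None:
    """Write a graph-like dictionary to filename as JSON."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


def restore_dict(filename: str) -> Dict[str, Any]:
    """Read the dictionary back from a JSON file."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical layout: a list of node records and a list of edge records.
graph_dict = {
    "nodes": [{"id": "HGNC:11603"}, {"id": "MONDO:0005002"}],
    "edges": [
        {"subject": "HGNC:11603", "object": "MONDO:0005002", "edge_label": "biolink:related_to"}
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "graph.json")
    dump_dict(graph_dict, path)
    restored = restore_dict(path)

print(restored == graph_dict)
```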
-
save
(filename: str, output_format: str = 'turtle', compression: Optional[str] = None, reify_all_edges: bool = False, **kwargs) → None¶ Transform networkx.MultiDiGraph into rdflib.Graph and export this graph as a file (turtle, by default).
- Parameters
filename (str) – Filename to write to
output_format (str) – The output format; default: turtle
compression (Optional[str]) – The compression type. Not yet supported.
reify_all_edges (bool) – Whether to reify all edges in the graph
kwargs (Dict) – Any additional arguments
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
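The coupling described in the note can be pictured with a small stand-alone sketch. `FilterStore` is a hypothetical class written for this example only; it mimics the documented behaviour (a ‘category’ node filter is mirrored into the ‘subject_category’ and ‘object_category’ edge filters) and does not reproduce KGX internals.

```python
from typing import Dict, Set, Union

FilterValue = Union[str, Set[str]]


class FilterStore:
    """Toy container for node and edge filters (illustration only)."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, FilterValue] = {}
        self.edge_filters: Dict[str, FilterValue] = {}

    def set_node_filter(self, key: str, value: FilterValue) -> None:
        self.node_filters[key] = value
        if key == "category":
            # Normalise to a set, then mirror it onto both edge endpoints so
            # that the selected nodes and edges stay consistent.
            categories = set(value) if isinstance(value, set) else {value}
            self.edge_filters["subject_category"] = set(categories)
            self.edge_filters["object_category"] = set(categories)


store = FilterStore()
store.set_node_filter("category", {"biolink:Gene"})
print(store.edge_filters)
```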
-
set_predicate_mapping
(m: Dict) → None¶ Set predicate mappings.
Use this method to update predicate mappings for predicates that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
-
set_property_types
(m: Dict) → None¶ Set property types.
Use this method to populate type information for properties that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are property URIs and the values are their types
-
triple
(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None¶ Parse a triple.
- Parameters
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-
uriref
(identifier: str) → rdflib.term.URIRef¶ Generate a rdflib.URIRef for a given string.
- Parameters
identifier (str) – Identifier as string.
- Returns
URIRef form of the input identifier
- Return type
rdflib.URIRef
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
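A minimal sketch of the "check required properties, then apply defaults" pattern described above. The required property (`id`) and the defaults used here (`name` falling back to the identifier, `category` falling back to `biolink:NamedThing`) are assumptions chosen for illustration; consult the KGX source for the actual rules.

```python
from typing import Any, Dict


def validate_node_sketch(node: Dict[str, Any]) -> Dict[str, Any]:
    """Check a node dict and return a copy with assumed defaults filled in."""
    if "id" not in node:
        raise KeyError("node is missing the required 'id' property")
    checked = dict(node)                                      # leave the input untouched
    checked.setdefault("name", checked["id"])                 # assumed default
    checked.setdefault("category", ["biolink:NamedThing"])    # assumed default
    return checked


print(validate_node_sketch({"id": "HGNC:11603"}))
```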
-
class
kgx.transformers.rdf_transformer.
RdfTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, curie_map: Optional[Dict] = None)[source]¶ Bases:
kgx.transformers.rdf_graph_mixin.RdfGraphMixin
,kgx.transformers.transformer.Transformer
Transformer that parses RDF and loads triples, as nodes and edges, into a networkx.MultiDiGraph
This is the base class which is used to implement other RDF-based transformers.
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
curie_map (Optional[Dict]) – A curie map that maps non-canonical CURIEs to IRIs
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping. If the nodes in the edge do not exist, they are created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping. If the node does not exist, it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
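The "multi-valued without duplicates" rule can be sketched independently of KGX as follows. The set of multi-valued keys and the helper name `add_attribute` are hypothetical; how KGX decides that a property is multi-valued is not shown in this documentation.

```python
from typing import Any, Dict, List, Union

# Assumption for this sketch: these properties hold lists of values.
MULTI_VALUED = {"category", "synonym"}


def add_attribute(node: Dict[str, Any], key: str, value: Union[str, List[str]]) -> Dict[str, Any]:
    """Add value(s) under key, appending without duplicates for multi-valued keys."""
    values = value if isinstance(value, list) else [value]
    if key in MULTI_VALUED:
        existing = node.setdefault(key, [])
        for item in values:
            if item not in existing:   # skip values that are already present
                existing.append(item)
    elif values:
        node[key] = values[-1]         # single-valued: the last value wins
    return node


node = {"id": "HGNC:11603"}
add_attribute(node, "category", "biolink:Gene")
add_attribute(node, "category", ["biolink:Gene", "biolink:NamedThing"])
add_attribute(node, "name", "TBX4")
print(node)
```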
-
dereify
(nodes: Set[str]) → None[source]¶ Dereify a set of nodes where each node has all the properties necessary to create an edge.
- Parameters
nodes (Set[str]) – A set of nodes
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to serialize as JSON
filename (str) – File to write the JSON
-
export_edges
(reify_all_edges: bool = False) → Iterator[source]¶ Export edges and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).
- Parameters
reify_all_edges (bool) – Whether to reify all edges in the graph
- Returns
An iterator
- Return type
Iterator
-
export_nodes
() → Iterator[source]¶ Export nodes and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).
- Returns
An iterator
- Return type
Iterator
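To make the 3-tuple contract concrete, here is a small generator over plain dictionaries. It is only a sketch: the documented method takes no arguments and so works on the transformer's own graph, whereas `export_node_triples` (a hypothetical name) takes a dict of nodes and emits plain strings.

```python
from typing import Any, Dict, Iterator, Tuple


def export_node_triples(nodes: Dict[str, Dict[str, Any]]) -> Iterator[Tuple[str, str, Any]]:
    """Yield one (subject, predicate, object) tuple per node attribute value."""
    for node_id, attributes in nodes.items():
        for key, value in attributes.items():
            # Multi-valued attributes produce one triple per value.
            values = value if isinstance(value, list) else [value]
            for item in values:
                yield (node_id, key, item)


nodes = {"HGNC:11603": {"name": "TBX4", "category": ["biolink:Gene", "biolink:NamedThing"]}}
triples = list(export_node_triples(nodes))
print(triples)
```

Because the function is a generator, triples are produced lazily; wrapping the call in `list()` is only needed when all of them are wanted at once.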
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None[source]¶ Walk through the rdflib.Graph and load all required triples into networkx.MultiDiGraph
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form
kwargs (Dict) – Any additional arguments
-
parse
(filename: str, input_format: Optional[str] = None, compression: Optional[str] = None, provided_by: Optional[str] = None, node_property_predicates: Optional[Set[str]] = None) → None[source]¶ Parse a file, containing triples, into a rdflib.Graph
The file can be either a ‘turtle’ file or any other format supported by rdflib.
- Parameters
filename (str) – File to read from.
input_format (Optional[str]) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()
compression (Optional[str]) – The compression type. For example, gz
provided_by (Optional[str]) – Define the source providing the input file.
node_property_predicates (Optional[Set[str]]) – A set of rdflib.URIRef representing predicates that are to be treated as node properties
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]¶ Process a predicate, checking whether it has a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple[str, str, str, str]
-
reify
(u: str, v: str, k: str, data: Dict) → Dict[source]¶ Create a node representation of an edge.
- Parameters
u (str) – Subject
v (str) – Object
k (str) – Edge key
data (Dict) – Edge data
- Returns
The reified node
- Return type
Dict
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, output_format: str = 'turtle', compression: Optional[str] = None, reify_all_edges: bool = False, **kwargs) → None[source]¶ Transform networkx.MultiDiGraph into rdflib.Graph and export this graph as a file (turtle, by default).
- Parameters
filename (str) – Filename to write to
output_format (str) – The output format; default: turtle
compression (Optional[str]) – The compression type. Not yet supported.
reify_all_edges (bool) – Whether to reify all edges in the graph
kwargs (Dict) – Any additional arguments
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
set_predicate_mapping
(m: Dict) → None[source]¶ Set predicate mappings.
Use this method to update predicate mappings for predicates that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
-
set_property_types
(m: Dict) → None[source]¶ Set property types.
Use this method to populate type information for properties that are not in Biolink Model.
- Parameters
m (Dict) – A dictionary where the keys are property URIs and the values are their types
-
triple
(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None[source]¶ Parse a triple.
- Parameters
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-
uriref
(identifier: str) → rdflib.term.URIRef[source]¶ Generate a rdflib.URIRef for a given string.
- Parameters
identifier (str) – Identifier as string.
- Returns
URIRef form of the input identifier
- Return type
rdflib.URIRef
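One common way to go from an identifier string to an IRI is CURIE expansion against a prefix map. The sketch below shows that idea with plain strings; the prefix map and the function name `expand_curie` are assumptions for this example, and the result would still need to be wrapped in rdflib.URIRef.

```python
from typing import Dict

# Hypothetical prefix map; real maps are much larger.
PREFIX_MAP: Dict[str, str] = {
    "HGNC": "http://identifiers.org/hgnc/",
    "biolink": "https://w3id.org/biolink/vocab/",
}


def expand_curie(identifier: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Expand prefix:reference to a full IRI; return the input unchanged otherwise."""
    prefix, sep, reference = identifier.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + reference
    # Unknown prefix (or already an IRI such as http://...): leave as-is.
    return identifier


print(expand_curie("HGNC:11603"))
print(expand_curie("http://example.org/thing"))
```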
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
SparqlTransformer¶
-
class
kgx.transformers.sparql_transformer.
MonarchSparqlTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None)[source]¶ Bases:
kgx.transformers.sparql_transformer.SparqlTransformer
TODO
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping. If the nodes in the edge do not exist, they are created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping. If the node does not exist, it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to serialize as JSON
filename (str) – File to write the JSON
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None¶ Fetch triples from the SPARQL endpoint and load them as edges.
- Parameters
rdfgraph (rdflib.Graph) – A rdflib Graph (unused)
predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form
kwargs (Dict) – Any additional arguments.
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]¶ Process a predicate, checking whether it has a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple[str, str, str, str]
-
query
(q: str) → Dict¶ Query a SPARQL endpoint.
- Parameters
q (str) – The query string
- Returns
A dictionary containing results from the query
- Return type
Dict
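If the returned dictionary follows the W3C SPARQL 1.1 Query Results JSON layout (an assumption; the description above only says "a dictionary"), the bindings can be walked as in the sketch below. The variable names `s`, `p`, `o`, the sample data and the helper `bindings_to_triples` are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hand-written sample in the SPARQL 1.1 JSON results layout (not real output).
sample_result: Dict[str, Any] = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/a"},
                "p": {"type": "uri", "value": "http://example.org/related_to"},
                "o": {"type": "uri", "value": "http://example.org/b"},
            }
        ]
    },
}


def bindings_to_triples(result: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Extract (s, p, o) value strings from each row of bindings."""
    rows = result.get("results", {}).get("bindings", [])
    return [(row["s"]["value"], row["p"]["value"], row["o"]["value"]) for row in rows]


triples = bindings_to_triples(sample_result)
print(triples)
```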
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
-
class
kgx.transformers.sparql_transformer.
RedSparqlTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, url: str = 'http://graphdb.dumontierlab.com/repositories/ncats-red-kg')[source]¶ Bases:
kgx.transformers.sparql_transformer.SparqlTransformer
Transformer for communicating with Data2Services Knowledge Graph, a.k.a. Translator Red KG.
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping. If the nodes in the edge do not exist, they are created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping. If the node does not exist, it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
-
categorize
() → None[source]¶ Check each node’s category property and assign a category from Biolink Model. TODO: categorize for edges?
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to serialize as JSON
filename (str) – File to write the JSON
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None[source]¶ Fetch all triples using the specified predicates and add them to networkx.MultiDiGraph.
- Parameters
rdfgraph (rdflib.Graph) – A rdflib Graph (unused)
predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form
kwargs (dict) – Any additional arguments. For example, specifying the ‘limit’ argument caps the number of triples fetched.
-
load_nodes
(node_set: Set) → None[source]¶ Load nodes into networkx.MultiDiGraph.
This method queries the SPARQL endpoint for all triples in which a node from node_set is the subject.
- Parameters
node_set (Set) – A set of node CURIEs
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]¶ Process a predicate, checking whether it has a mapping in Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type
Tuple[str, str, str, str]
-
query
(q: str) → Dict¶ Query a SPARQL endpoint.
- Parameters
q (str) – The query string
- Returns
A dictionary containing results from the query
- Return type
Dict
-
report
() → None¶ Print a summary report about self.graph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
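As an illustration of "check for required properties, then apply default assumptions", here is a minimal sketch for the node case. The source does not say which properties are required or which defaults are applied, so the required `id` key and the default category `biolink:NamedThing` below are assumptions, and `validate_node_sketch` is a hypothetical name.

```python
from typing import Any, Dict

# Assumed default; the documentation does not name the defaults it applies
DEFAULT_NODE_CATEGORY = "biolink:NamedThing"


def validate_node_sketch(node: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `node` with an assumed default category filled in."""
    if "id" not in node:
        # Assumption: 'id' is the one property a node cannot do without
        raise KeyError("node is missing the required 'id' property")
    validated = dict(node)  # leave the caller's dictionary untouched
    validated.setdefault("category", [DEFAULT_NODE_CATEGORY])
    return validated


validated_node = validate_node_sketch({"id": "HGNC:11603", "name": "TBX4"})
```

An edge validator could follow the same pattern with its own set of required keys.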
-
-
class
kgx.transformers.sparql_transformer.
SparqlTransformer
(source_graph: Optional[networkx.classes.multidigraph.MultiDiGraph] = None, url: Optional[str] = None)[source]¶ Bases:
kgx.transformers.rdf_graph_mixin.RdfGraphMixin, kgx.transformers.transformer.Transformer
Transformer for communicating with a SPARQL endpoint.
- Parameters
source_graph (Optional[networkx.MultiDiGraph]) – The source graph
url (Optional[str]) – The URL to a SPARQL endpoint
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This method ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns
The edge data
- Return type
Dict
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict¶ Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist, they will be created.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
- Returns
The edge data
- Return type
Dict
-
add_node
(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns
The node data
- Return type
Dict
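The idea of keying a node by its CURIE while retaining the original IRI can be sketched without any RDF library. Everything named here is hypothetical: the two-entry `PREFIX_MAP`, the helper `contract`, and `add_node_sketch`, which stores nodes in a plain dictionary instead of a networkx.MultiDiGraph. It is an illustration under those assumptions, not the mixin's actual code.

```python
from typing import Any, Dict, Optional

# Hypothetical prefix map; a real one would be far larger
PREFIX_MAP = {
    "HGNC": "http://identifiers.org/hgnc/",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def contract(iri: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Contract an IRI to a CURIE, preferring the longest matching namespace."""
    best_prefix, best_namespace = None, ""
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and len(namespace) > len(best_namespace):
            best_prefix, best_namespace = prefix, namespace
    if best_prefix is None:
        return iri  # no known namespace: fall back to the IRI itself
    return f"{best_prefix}:{iri[len(best_namespace):]}"


def add_node_sketch(
    nodes: Dict[str, Dict[str, Any]],
    iri: str,
    data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Store a node under its CURIE and record the IRI it came from."""
    curie = contract(iri)
    record = nodes.setdefault(curie, {})
    record.update(data or {})
    record["id"] = curie
    record["iri"] = iri
    return record


nodes: Dict[str, Dict[str, Any]] = {}
go_node = add_node_sketch(
    nodes,
    "http://purl.obolibrary.org/obo/GO_0008150",
    {"name": "biological_process"},
)
```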
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (Union[str, List]) – The value of the attribute
- Returns
The node data
- Return type
Dict
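The multi-valued behaviour described for add_node_attribute and add_edge_attribute (append to a list, never store a duplicate) can be sketched as follows. The set `MULTIVALUED_KEYS` and the function `add_attribute_sketch` are hypothetical, and the "last write wins" rule for single-valued keys is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List, Union

# Hypothetical set of property names treated as multi-valued
MULTIVALUED_KEYS = {"category", "synonym", "xref"}


def add_attribute_sketch(
    record: Dict[str, Any], key: str, value: Union[str, List[str]]
) -> Dict[str, Any]:
    """Add `value` under `key`, keeping multi-valued properties duplicate-free."""
    values = value if isinstance(value, list) else [value]
    if key in MULTIVALUED_KEYS:
        existing = record.setdefault(key, [])
        for item in values:
            if item not in existing:  # skip values already recorded
                existing.append(item)
    elif values:
        # Assumption: a single-valued property keeps the latest value
        record[key] = values[-1]
    return record


gene = {"id": "HGNC:11603"}
add_attribute_sketch(gene, "synonym", "TBX4")
add_attribute_sketch(gene, "synonym", ["TBX4", "T-box 4"])
add_attribute_sketch(gene, "name", "T-box transcription factor 4")
```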
-
static
deserialize
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize a networkx.MultiDiGraph as JSON and write it to a file.
- Parameters
g (networkx.MultiDiGraph) – Graph to serialize
filename (str) – File to write the JSON to
-
get_biolink_element
(predicate: Any) → Optional[biolinkml.meta.Element]¶ Returns a Biolink Model element for a given predicate.
- Parameters
predicate (Any) – The CURIE of a predicate
- Returns
The corresponding Biolink Model element
- Return type
Optional[Element]
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None[source]¶ Fetch triples from the SPARQL endpoint and load them as edges.
- Parameters
rdfgraph (rdflib.Graph) – A rdflib Graph (unused)
predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form
kwargs (Dict) – Any additional arguments.
-
process_predicate
(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]¶ Process a predicate, checking whether it has a mapping in the Biolink Model.
- Parameters
p (Optional[Union[URIRef, str]]) – The predicate
- Returns
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, and the reference of p.
- Return type
Tuple[str, str, str, str]
-
query
(q: str) → Dict[source]¶ Query a SPARQL endpoint.
- Parameters
q (str) – The query string
- Returns
A dictionary containing results from the query
- Return type
Dict
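The source only says that query returns "a dictionary containing results". Assuming that dictionary follows the standard SPARQL 1.1 JSON results layout (`head.vars` plus `results.bindings`), the sketch below shows how such a result could be turned into subject, predicate, object triples before being loaded as edges. The function `iter_triples`, the variable names `s`, `p`, `o`, and the sample data are hypothetical.

```python
from typing import Dict, Iterator, Tuple


def iter_triples(
    results: Dict, s: str = "s", p: str = "p", o: str = "o"
) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) values from SPARQL JSON results."""
    for binding in results.get("results", {}).get("bindings", []):
        # Skip rows where any of the three variables is unbound
        if all(var in binding for var in (s, p, o)):
            yield (binding[s]["value"], binding[p]["value"], binding[o]["value"])


# Hand-written sample in the SPARQL 1.1 JSON results layout
sample_results = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/gene/1"},
                "p": {"type": "uri", "value": "http://example.org/related_to"},
                "o": {"type": "uri", "value": "http://example.org/disease/2"},
            },
            {
                # Incomplete row: no object bound, so it is skipped
                "s": {"type": "uri", "value": "http://example.org/gene/3"},
                "p": {"type": "uri", "value": "http://example.org/related_to"},
            },
        ]
    },
}

triples = list(iter_triples(sample_results))
```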
-
report
() → None¶ Print a summary report about self.graph.
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
serialize
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert a networkx.MultiDiGraph to a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert to a dictionary
- Returns
A dictionary
- Return type
dict
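The relationship between serialize, dump_to_file, restore_from_file, and deserialize is a round trip through a JSON-compatible dictionary. The sketch below shows that round trip with the standard library only. The dictionary layout (top-level `nodes` and `edges` lists) is an assumption based on the phrase "dictionary containing nodes and edges", and `dump_sketch` and `restore_sketch` are hypothetical helpers, not the KGX methods.

```python
import json
import os
import tempfile
from typing import Any, Dict


def dump_sketch(graph_dict: Dict[str, Any], filename: str) -> None:
    """Write a graph dictionary to `filename` as JSON."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(graph_dict, handle, indent=2)


def restore_sketch(filename: str) -> Dict[str, Any]:
    """Read a graph dictionary back from a JSON file."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


# Assumed layout: one list of node records and one list of edge records
graph_dict = {
    "nodes": [
        {"id": "HGNC:11603", "category": ["biolink:Gene"]},
        {"id": "MONDO:0005148", "category": ["biolink:Disease"]},
    ],
    "edges": [
        {
            "subject": "HGNC:11603",
            "object": "MONDO:0005148",
            "edge_label": "biolink:related_to",
        }
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "graph.json")
    dump_sketch(graph_dict, path)
    restored = restore_sketch(path)
```

Because the dictionary holds only strings and lists, the restored value compares equal to the original. Property values that JSON cannot represent, such as Python sets, would need converting before the dump.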
-
set_edge_filter
(key: str, value: set) → None¶ Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.
-
set_node_filter
(key: str, value: Union[str, set]) → None¶ Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- Parameters
key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
-
update_edge
(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict¶ Update an edge with properties.
- Parameters
subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties
- Returns
The edge data
- Return type
Dict
-
update_node
(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict¶ Update a node with properties.
- Parameters
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns
The node data
- Return type
Dict
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict