Transformers

Transformers are classes in KGX that allow you to read and write data of a particular form.

Transformer

The base class for all Transformers in KGX.

class kgx.transformers.transformer.Transformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None)

Bases: object

Base class for performing a transformation.

This can be:
  • from a source to an in-memory property graph (kgx.graph.base_graph.BaseGraph)

  • from an in-memory property graph to a target format or database (Neo4j, CSV, RDF Triple Store, TTL)

Parameters

source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

report() → None

Print a summary report about self.graph

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
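The notes above describe a two-way coupling: a node 'category' filter also constrains edge endpoints, and edge endpoint category filters also constrain nodes. The sketch below models that bookkeeping with a hypothetical `FilterState` class (not part of KGX). It assumes categories are merged by set union; KGX's actual merge behavior may differ.

```python
from typing import Dict, Set, Union

FilterValue = Union[str, Set[str]]
CATEGORY_EDGE_KEYS = ("subject_category", "object_category")


class FilterState:
    """Hypothetical stand-in for the node and edge filters kept by a Transformer."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, FilterValue] = {}
        self.edge_filters: Dict[str, FilterValue] = {}

    def set_node_filter(self, key: str, value: FilterValue) -> None:
        if key == "category" and not isinstance(value, set):
            raise TypeError("the 'category' filter must be a set")
        self.node_filters[key] = value
        if key == "category":
            # Assumption: mirror node categories onto both edge endpoints (set union)
            for edge_key in CATEGORY_EDGE_KEYS:
                existing = self.edge_filters.get(edge_key, set())
                self.edge_filters[edge_key] = set(existing) | value

    def set_edge_filter(self, key: str, value: FilterValue) -> None:
        if key in CATEGORY_EDGE_KEYS and not isinstance(value, set):
            raise TypeError(f"the '{key}' filter must be a set")
        self.edge_filters[key] = value
        if key in CATEGORY_EDGE_KEYS:
            # Assumption: every endpoint category must also pass the node filter (set union)
            existing = self.node_filters.get("category", set())
            self.node_filters["category"] = set(existing) | value
```

Under union semantics, adding an object_category filter widens the node 'category' filter so that the new object nodes are not dropped from the subgraph.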

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict
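The "check required properties, then apply defaults" pattern of validate_node and validate_edge can be sketched as follows. The required keys ('id' for nodes; 'subject', 'predicate', 'object' for edges) and the default category 'biolink:NamedThing' are assumptions for illustration, not a statement of what KGX enforces.

```python
from typing import Any, Dict

# Assumed default; check your KGX version for the value it actually applies.
DEFAULT_NODE_CATEGORY = "biolink:NamedThing"
REQUIRED_EDGE_PROPERTIES = ("subject", "predicate", "object")  # assumption


def validate_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Check required node properties and return a copy with defaults applied."""
    if "id" not in node:
        raise KeyError("node is missing required property 'id'")
    validated = dict(node)  # do not mutate the caller's dict
    if not validated.get("category"):
        validated["category"] = [DEFAULT_NODE_CATEGORY]
    return validated


def validate_edge(edge: Dict[str, Any]) -> Dict[str, Any]:
    """Check required edge properties and return a copy."""
    missing = [key for key in REQUIRED_EDGE_PROPERTIES if key not in edge]
    if missing:
        raise KeyError(f"edge is missing required properties: {missing}")
    return dict(edge)
```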

NeoTransformer

class kgx.transformers.neo_transformer.NeoTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, uri: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None)

Bases: kgx.transformers.transformer.Transformer

Transformer for reading from and writing to a Neo4j database.

Parameters
  • source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

  • uri (Optional[str]) – The Neo4j URI (with port)

  • username (Optional[str]) – The Neo4j username for authentication

  • password (Optional[str]) – The Neo4j password for authentication

count(is_directed: bool = True) → int

Get the total count of records to be fetched from the Neo4j database.

Parameters

is_directed (bool) – Whether edges are directed (True by default, since most edges are directed)

Returns

The total count of records

Return type

int
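In Cypher, the is_directed distinction usually comes down to whether the relationship pattern carries an arrow. The hypothetical helper below (not KGX's internal query, and ignoring any filters) shows the two forms. Note that an undirected pattern such as (s)-[p]-(o) matches each relationship once from each end, so its count is roughly double.

```python
def build_edge_count_query(is_directed: bool = True) -> str:
    """Hypothetical helper: count relationships, with or without direction."""
    # "-[p]->" matches each relationship once; "-[p]-" matches it from both ends
    pattern = "(s)-[p]->(o)" if is_directed else "(s)-[p]-(o)"
    return f"MATCH {pattern} RETURN COUNT(*) AS count"
```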

static create_constraint_query(category: str) → str

Create a Cypher CONSTRAINT query

Parameters

category (str) – The category to create a constraint on

Returns

The Cypher CONSTRAINT query

Return type

str

create_constraints(categories: Union[set, list]) → None

Create a unique constraint on node ‘id’ for all categories in Neo4j.

Parameters

categories (Union[set, list]) – A set or list of categories
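A unique constraint on node 'id' per category label could be built as below. This is a sketch, not KGX's code: it uses the `CREATE CONSTRAINT ON ... ASSERT` form accepted by Neo4j 3.x and 4.x (Neo4j 5 requires `CREATE CONSTRAINT FOR (n:Label) REQUIRE n.id IS UNIQUE` instead), and it wraps labels in backticks because Biolink categories contain a colon.

```python
from typing import Iterable, List


def create_constraint_query(category: str) -> str:
    """Unique-id constraint for one label (Neo4j 3.x/4.x syntax)."""
    # Backticks are required because labels such as biolink:Gene contain ':'
    return f"CREATE CONSTRAINT ON (n:`{category}`) ASSERT n.id IS UNIQUE"


def create_constraint_queries(categories: Iterable[str]) -> List[str]:
    """One query per distinct category, in a stable (sorted) order."""
    return [create_constraint_query(c) for c in sorted(set(categories))]
```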

static generate_unwind_edge_query(edge_predicate: str) → str

Generate an UNWIND Cypher query for saving edges into Neo4j.

The query uses self.DEFAULT_NODE_CATEGORY to quickly look up the required subject and object nodes.

Parameters

edge_predicate (str) – Edge predicate (label) as a string

Returns

The UNWIND cypher query

Return type

str

static generate_unwind_node_query(category: str) → str

Generate an UNWIND Cypher query for saving nodes into Neo4j.

There should be a CONSTRAINT in Neo4j for self.DEFAULT_NODE_CATEGORY. The query uses self.DEFAULT_NODE_CATEGORY as the node label to speed up adding nodes. The query also adds the self.DEFAULT_NODE_CATEGORY label to every node to make sure that the CONSTRAINT applies.

Parameters

category (str) – Node category

Returns

The UNWIND cypher query

Return type

str
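The two UNWIND queries described above can be sketched as follows. The shared label value 'biolink:NamedThing' and the parameter names $nodes and $relationships are assumptions; the point is that both queries look nodes up through one constrained (and therefore indexed) label, then add the category-specific label or relationship type.

```python
# Assumed value of the shared label that carries the uniqueness constraint
DEFAULT_NODE_CATEGORY = "biolink:NamedThing"


def generate_unwind_node_query(category: str) -> str:
    # MERGE on the shared label so the unique 'id' constraint is used,
    # then copy all properties and add the category-specific label
    return (
        "UNWIND $nodes AS node "
        f"MERGE (n:`{DEFAULT_NODE_CATEGORY}` {{id: node.id}}) "
        f"SET n += node, n:`{category}`"
    )


def generate_unwind_edge_query(edge_predicate: str) -> str:
    # Look up both endpoints through the shared label, then merge the relationship
    return (
        "UNWIND $relationships AS relationship "
        f"MATCH (s:`{DEFAULT_NODE_CATEGORY}` {{id: relationship.subject}}), "
        f"(o:`{DEFAULT_NODE_CATEGORY}` {{id: relationship.object}}) "
        f"MERGE (s)-[r:`{edge_predicate}`]->(o) "
        "SET r += relationship"
    )
```

With the official neo4j Python driver, such a query would be run once per batch, passing the batch as the matching parameter (for example, nodes=batch in session.run). `SET n += node` only works when every property value is a primitive or a list of primitives.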

get_edge_filter(key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str

Get the value for the edge filter defined by key. This is a convenience method for generating Cypher queries.

Parameters
  • key (str) – Name of the edge filter

  • variable (Optional[str]) – Variable binding for the Cypher query

  • prefix (Optional[str]) – Prefix for the Cypher query

  • op (Optional[str]) – The operator

Returns

Value corresponding to the given edge filter key, formatted for CQL

Return type

str

get_edges(skip: int = 0, limit: int = 0, is_directed: bool = True) → List

Get a page of edges from the Neo4j database.

Parameters
  • skip (int) – Records to skip

  • limit (int) – Total number of records to query for

  • is_directed (bool) – Whether edges are directed (True by default, since most edges are directed)

Returns

A list of 3-tuples

Return type

list

get_node_filter(key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str

Get the value for the node filter defined by key. This is a convenience method for generating Cypher queries.

Parameters
  • key (str) – Name of the node filter

  • variable (Optional[str]) – Variable binding for the Cypher query

  • prefix (Optional[str]) – Prefix for the Cypher query

  • op (Optional[str]) – The operator

Returns

Value corresponding to the given node filter key, formatted for CQL

Return type

str

get_nodes(skip: int = 0, limit: int = 0) → List

Get a page of nodes from the Neo4j database.

Parameters
  • skip (int) – Records to skip

  • limit (int) – Total number of records to query for

Returns

A list of nodes

Return type

list

get_pages(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Any) → Iterator

Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.

Parameters
  • query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges

  • start (int) – Start for pagination

  • end (Optional[int]) – End for pagination

  • page_size (int) – Size of each page (50000, by default)

  • kwargs (Dict) – Any additional arguments that might be relevant for query_function

Returns

An iterator over lists of records from Neo4j, where each list contains at most page_size records

Return type

Iterator
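The paging contract above can be sketched as a generator that calls query_function with an increasing skip until the source runs dry or end is reached. This is a hypothetical stand-alone version, not KGX's code. It assumes query_function accepts skip and limit keyword arguments, as get_nodes and get_edges do. When end is given, it yields at most ceil((end - start) / page_size) pages.

```python
from typing import Any, Callable, Iterator, List, Optional


def get_pages(
    query_function: Callable[..., List[Any]],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List[Any]]:
    """Yield successive pages from query_function(skip=..., limit=..., **kwargs)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    skip = start
    while end is None or skip < end:
        # Never request records beyond `end`
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if page:
            yield page
        if len(page) < limit:
            break  # a short (or empty) page means the source is exhausted
        skip += limit
```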

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load(start: int = 0, end: Optional[int] = None, is_directed: bool = True, page_size: int = 50000, provided_by: Optional[str] = None) → None

Read nodes and edges from a Neo4j database and populate an instance of BaseGraph.

Parameters
  • start (int) – Start for pagination

  • end (Optional[int]) – End for pagination

  • is_directed (bool) – Whether edges are directed (True by default, since most edges are directed)

  • page_size (int) – Size of page (or chunk) to fetch from Neo4j

  • provided_by (Optional[str]) – Define the source providing the data

load_edge(edge_record: List) → None

Load an edge into an instance of BaseGraph

Parameters

edge_record (List) – A 3-tuple edge record

load_edges(edges: List) → None

Load edges into an instance of BaseGraph

Parameters

edges (List) – A list of edge records

load_node(node: Dict) → None

Load a node into an instance of BaseGraph

Parameters

node (Dict) – A node

load_nodes(nodes: List) → None

Load nodes into an instance of BaseGraph

Parameters

nodes (List) – A list of nodes
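This reference does not spell out the layout of a 3-tuple edge record. The sketch below assumes each record is (subject node, edge properties, object node) as plain dicts and uses a hypothetical dict-based `SimpleGraph` in place of BaseGraph. It shows why loading an edge from such a record also has to register both endpoint nodes.

```python
from typing import Any, Dict, Tuple

# Assumed record layout: (subject node, edge properties, object node)
EdgeRecord = Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]


class SimpleGraph:
    """Hypothetical dict-based stand-in for BaseGraph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[Tuple[str, str, str], Dict[str, Any]] = {}

    def load_node(self, node: Dict[str, Any]) -> None:
        # Merge properties if the node has already been seen
        self.nodes.setdefault(node["id"], {}).update(node)

    def load_edge(self, record: EdgeRecord) -> None:
        subject_node, edge, object_node = record
        # Endpoints arrive with the edge, so make sure both nodes exist
        self.load_node(subject_node)
        self.load_node(object_node)
        key = (subject_node["id"], edge["predicate"], object_node["id"])
        self.edges[key] = {**edge, "subject": subject_node["id"], "object": object_node["id"]}
```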

neo4j_report() → None

Give a summary on the number of nodes and edges in the Neo4j database.

report() → None

Print a summary report about self.graph

static sanitize_category(category: List) → List

Sanitize a category list for use in an UNWIND Cypher clause. This method adds escape characters to each element of the category list so that each category is processed correctly.

Parameters

category (List) – Category

Returns

Sanitized category list

Return type

List
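One plausible form of the escaping described above is to quote each category as a Cypher identifier, so that a colon in a CURIE-style label is not read as a label separator. This sketch assumes backtick quoting. It relies on the Cypher rule that a literal backtick inside a backtick-quoted identifier is written as two backticks. It is not necessarily KGX's exact escaping.

```python
from typing import List


def sanitize_category(category: List[str]) -> List[str]:
    """Quote each category as a Cypher identifier (assumed escaping scheme)."""
    # Inside backtick-quoted identifiers, a literal backtick is escaped by doubling it
    return [f"`{c.replace('`', '``')}`" for c in category]
```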

save() → None

Save all nodes and edges from an instance of BaseGraph into Neo4j using the UNWIND cypher clause.

save_edge(edges_by_edge_predicate: Dict[str, list], batch_size: int = 10000) → None

Save all edges into Neo4j using the UNWIND cypher clause.

Parameters
  • edges_by_edge_predicate (Dict[str, list]) – A dictionary where the edge predicate is the key and the value is a list of edges with that predicate

  • batch_size (int) – Size of batch per transaction (default: 10000)

save_node(nodes_by_category: Dict[str, list], batch_size: int = 10000) → None

Save all nodes into Neo4j using the UNWIND cypher clause.

Parameters
  • nodes_by_category (Dict[str, list]) – A dictionary where node category is the key and the value is a list of nodes of that category

  • batch_size (int) – Size of batch per transaction (default: 10000)
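save_node and save_edge receive records that are already grouped (by category or by predicate) and write them batch_size records per transaction. The grouping and chunking steps can be sketched with the standard library alone. `group_by_category` and `batches` are hypothetical helpers, and filing a multi-category node under its first category is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List


def group_by_category(nodes: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Build a nodes_by_category-style dict."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for node in nodes:
        # Assumption: a node with several categories is filed under its first one
        category = (node.get("category") or ["biolink:NamedThing"])[0]
        groups[category].append(node)
    return dict(groups)


def batches(items: List[Any], batch_size: int = 10000) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Each batch would then be sent as the list parameter of a single UNWIND query, which keeps each transaction to a bounded size.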

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

PandasTransformer

class kgx.transformers.pandas_transformer.PandasTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None)

Bases: kgx.transformers.transformer.Transformer

Transformer that parses a TSV/CSV file and loads nodes and edges into an instance of kgx.graph.base_graph.BaseGraph

Parameters

source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool
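A record "passes" the defined filters when every filter key is satisfied. The hypothetical `check_filters` below assumes AND semantics across keys and OR semantics within a set value, and it treats list-valued properties (such as 'category') as passing if any element is allowed. Edge filters like subject_category, which require looking up the endpoint nodes, are outside this sketch.

```python
from typing import Any, Dict, Set, Union

FilterValue = Union[str, Set[str]]


def _matches(actual: Any, allowed: FilterValue) -> bool:
    allowed_set = {allowed} if isinstance(allowed, str) else set(allowed)
    # Properties such as 'category' may be lists; pass if any element is allowed
    values = actual if isinstance(actual, (list, set, tuple)) else [actual]
    return any(v in allowed_set for v in values)


def check_filters(record: Dict[str, Any], filters: Dict[str, FilterValue]) -> bool:
    """True only if the record satisfies every filter (logical AND across keys)."""
    return all(key in record and _matches(record[key], value) for key, value in filters.items())
```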

export_edges(filename: str, delimiter: str) → None

Export edges from an instance of BaseGraph

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_neo4j_edges(filename: str, delimiter: str) → None

Export edges from an instance of BaseGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_neo4j_nodes(filename: str, delimiter: str) → None

Export nodes from an instance of BaseGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_nodes(filename: str, delimiter: str) → None

Export nodes from an instance of BaseGraph

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

static get_all_edge_properties(graph: kgx.graph.base_graph.BaseGraph) → Set

Given a graph, get all possible property names for edges.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – A graph

Returns

A set of edge properties

Return type

Set

static get_all_node_properties(graph: kgx.graph.base_graph.BaseGraph) → Set

Given a graph, get all possible property names for nodes.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – A graph

Returns

A set of node properties

Return type

Set
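Collecting every property name matters for delimited export: each row needs a value (possibly empty) for every column, so the header is the union of property names across all records. The sketch below combines that union with a csv-based writer. `get_all_properties` and `export_records` are hypothetical names. Joining list values with '|' is an assumption about the multivalued-field convention.

```python
import csv
from typing import Any, Dict, List, Set, TextIO


def get_all_properties(records: List[Dict[str, Any]]) -> Set[str]:
    """Union of property names across all records."""
    return set().union(*(r.keys() for r in records))


def export_records(records: List[Dict[str, Any]], out: TextIO, delimiter: str = "\t") -> None:
    # Put 'id' first (if present) and sort the rest, so the header is stable
    props = get_all_properties(records)
    fieldnames = (["id"] if "id" in props else []) + sorted(props - {"id"})
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Assumption: multivalued properties are joined with '|'
        row = {k: "|".join(v) if isinstance(v, list) else v for k, v in record.items()}
        writer.writerow(row)  # missing keys become empty cells (restval defaults to "")
```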

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

static is_null(item: Any) → bool

Check whether a given item is null or corresponds to null.

This method checks for: None, numpy.nan, pandas.NA, pandas.NaT, "" (empty string), and " " (single space)

Parameters

item (Any) – The item to check

Returns

Whether the given item is null or not

Return type

bool
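The null checks listed above can be approximated with the standard library. To keep the sketch dependency-free, it covers None, float NaN (which includes numpy.nan), the empty string, and a single space. It omits pandas.NA and pandas.NaT. If pandas is available, pandas.isna handles those two for scalar values.

```python
import math
from typing import Any


def is_null(item: Any) -> bool:
    """Stdlib-only approximation of the null checks described above."""
    if item is None:
        return True
    # numpy.nan is a plain Python float, so math.isnan covers it
    if isinstance(item, float) and math.isnan(item):
        return True
    if isinstance(item, str) and item in ("", " "):
        return True
    return False
```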

load_edge(edge: Dict) → None

Load an edge into an instance of BaseGraph

Parameters

edge (Dict) – An edge

load_edges(df: pandas.core.frame.DataFrame) → None

Load edges from pandas.DataFrame into an instance of BaseGraph

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent edges

load_node(node: Dict) → None

Load a node into an instance of BaseGraph

Parameters

node (Dict) – A node

load_nodes(df: pandas.core.frame.DataFrame) → None

Load nodes from pandas.DataFrame into an instance of BaseGraph

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent nodes

parse(filename: str, input_format: str = 'tsv', compression: Optional[str] = None, provided_by: Optional[str] = None, **kwargs: Dict) → None

Parse a CSV/TSV (or plain text) file.

The file can represent nodes (nodes.tsv), edges (edges.tsv), or both (data.tar), where the tar archive contains nodes.tsv and edges.tsv.

The archive can also be compressed as data.tar.gz or data.tar.bz2.

Parameters
  • filename (str) – File to read from

  • input_format (str) – The input file format (tsv, by default)

  • compression (Optional[str]) – The compression. For example, tar

  • provided_by (Optional[str]) – Define the source providing the input file

  • kwargs (Dict) – Any additional arguments

report() → None

Print a summary report about self.graph

save(filename: str, output_format: str = 'tsv', compression: Optional[str] = None, **kwargs: Dict) → str

Write two files representing the node set and edge set of an instance of BaseGraph and add them to a .tar archive.

Note

If your node/edge properties are likely to contain commas, it is recommended to export to TSV instead of CSV.

Parameters
  • filename (str) – Name of tar archive file to create

  • output_format (str) – The output file format (tsv, by default)

  • compression (Optional[str]) – The compression. For example, tar

  • kwargs (Dict) – Any additional arguments

Returns

The filename

Return type

str
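The archive layout used by save and parse (nodes.tsv and edges.tsv inside data.tar, optionally gzip- or bzip2-compressed) can be reproduced with Python's tarfile module. The helpers below are a hypothetical round-trip sketch working on in-memory text, not KGX's implementation. Mode "w" writes a plain tar, "w:gz" or "w:bz2" adds compression, and "r:*" detects compression automatically when reading.

```python
import io
import tarfile
from typing import Dict


def save_tar(filename: str, files: Dict[str, str], mode: str = "w") -> str:
    """Write in-memory text files (e.g. nodes.tsv, edges.tsv) into a tar archive."""
    with tarfile.open(filename, mode) as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before addfile
            archive.addfile(info, io.BytesIO(data))
    return filename


def read_tar(filename: str) -> Dict[str, str]:
    """Read every regular file in a (possibly compressed) tar archive."""
    contents: Dict[str, str] = {}
    with tarfile.open(filename, "r:*") as archive:  # "r:*" auto-detects gz/bz2
        for member in archive.getmembers():
            if member.isfile():
                handle = archive.extractfile(member)
                contents[member.name] = handle.read().decode("utf-8")
    return contents
```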

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

JsonTransformer

class kgx.transformers.json_transformer.JsonTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None)

Bases: kgx.transformers.pandas_transformer.PandasTransformer

Transformer that parses a JSON file and loads nodes and edges into an instance of BaseGraph

Parameters

source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

export() → Dict

Export an instance of BaseGraph as a dictionary.

Returns

A dictionary with a list of nodes and a list of edges

Return type

dict

export_edges(filename: str, delimiter: str) → None

Export edges from an instance of BaseGraph

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_neo4j_edges(filename: str, delimiter: str) → None

Export edges from an instance of BaseGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_neo4j_nodes(filename: str, delimiter: str) → None

Export nodes from an instance of BaseGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_nodes(filename: str, delimiter: str) → None

Export nodes from an instance of BaseGraph

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

static get_all_edge_properties(graph: kgx.graph.base_graph.BaseGraph) → Set

Given a graph, get all possible property names for edges.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – A graph

Returns

A set of edge properties

Return type

Set

static get_all_node_properties(graph: kgx.graph.base_graph.BaseGraph) → Set

Given a graph, get all possible property names for nodes.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – A graph

Returns

A set of node properties

Return type

Set

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

static is_null(item: Any) → bool

Check whether a given item is null or corresponds to null.

This method checks for: None, numpy.nan, pandas.NA, pandas.NaT, "" (empty string), and " " (single space)

Parameters

item (Any) – The item to check

Returns

Whether the given item is null or not

Return type

bool

load(obj: Dict[str, Any]) → None

Load a JSON object, containing nodes and edges, into an instance of BaseGraph

Parameters

obj (Dict[str, Any]) – JSON Object with all nodes and edges

load_edge(edge: Dict) → None

Load an edge into an instance of BaseGraph

Parameters

edge (Dict) – An edge

load_edges(edges: List[Dict]) → None

Load a list of edges into an instance of BaseGraph

Parameters

edges (list) – List of edges

load_node(node: Dict) → None

Load a node into an instance of BaseGraph

Parameters

node (Dict) – A node

load_nodes(nodes: List[Dict]) → None

Load a list of nodes into an instance of BaseGraph

Parameters

nodes (list) – List of nodes

parse(filename: str, input_format: str = 'json', compression: Optional[str] = None, provided_by: Optional[str] = None, **kwargs) → None

Parse a JSON file of the format:

    {
        "nodes": [...],
        "edges": [...]
    }

Parameters
  • filename (str) – JSON file to read from

  • input_format (str) – The input file format (json, by default)

  • compression (Optional[str]) – The compression type. For example, gz

  • provided_by (Optional[str]) – Define the source providing the input file

  • kwargs (dict) – Any additional arguments

report() → None

Print a summary report about self.graph

save(filename: str, output_format: str = 'json', compression: Optional[str] = None, **kwargs) → str

Write an instance of BaseGraph to a file as JSON.

Parameters
  • filename (str) – Filename to write to

  • output_format (str) – The output file format (json, by default)

  • compression (Optional[str]) – The compression type. For example, gz

  • kwargs (dict) – Any additional arguments

Returns

The filename

Return type

str
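The JSON layout accepted by parse and written by save ({"nodes": [...], "edges": [...]}, optionally gzip-compressed) can be round-tripped with the standard library. `save_json` and `parse_json` are hypothetical helpers for illustration. Treating a missing top-level list as empty is a choice made in this sketch, not documented KGX behavior.

```python
import gzip
import json
from typing import Any, Dict, List, Optional


def save_json(
    filename: str,
    nodes: List[Dict[str, Any]],
    edges: List[Dict[str, Any]],
    compression: Optional[str] = None,
) -> str:
    """Write {"nodes": [...], "edges": [...]}, gzip-compressed if compression == "gz"."""
    opener = gzip.open if compression == "gz" else open
    with opener(filename, "wt", encoding="utf-8") as out:
        json.dump({"nodes": nodes, "edges": edges}, out, indent=2)
    return filename


def parse_json(filename: str, compression: Optional[str] = None) -> Dict[str, Any]:
    opener = gzip.open if compression == "gz" else open
    with opener(filename, "rt", encoding="utf-8") as handle:
        obj = json.load(handle)
    # Tolerate files that omit one of the two top-level lists
    return {"nodes": obj.get("nodes", []), "edges": obj.get("edges", [])}
```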

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.json_transformer.ObographJsonTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None)

Bases: kgx.transformers.json_transformer.JsonTransformer

Transformer that parses an Obograph JSON file and loads nodes and edges into an instance of BaseGraph

check_edge_filter(edge: Dict) → bool

Check if an edge passes defined edge filters.

Parameters

edge (Dict) – An edge

Returns

Whether the given edge has passed all defined edge filters

Return type

bool

check_node_filter(node: Dict) → bool

Check if a node passes defined node filters.

Parameters

node (Dict) – A node

Returns

Whether the given node has passed all defined node filters

Return type

bool

export() → Dict

Export an instance of BaseGraph as a dictionary.

Returns

A dictionary with a list of nodes and a list of edges

Return type

dict

export_edges(filename: str, delimiter: str) → None

Export edges from an instance of BaseGraph

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_neo4j_edges(filename: str, delimiter: str) → None

Export edges from an instance of BaseGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_neo4j_nodes(filename: str, delimiter: str) → None

Export nodes from an instance of BaseGraph in Neo4j compatible format. This format is meant for use with the neo4j-admin import tool.

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

export_nodes(filename: str, delimiter: str) → None

Export nodes from an instance of BaseGraph

Parameters
  • filename (str) – The filename

  • delimiter (str) – The delimiter to use as a separator

static get_all_edge_properties(graph: kgx.graph.base_graph.BaseGraph) → Set

Given a graph, get all possible property names for edges.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – A graph

Returns

A set of edge properties

Return type

Set

static get_all_node_properties(graph: kgx.graph.base_graph.BaseGraph) → Set

Given a graph, get all possible property names for nodes.

Parameters

graph (kgx.graph.base_graph.BaseGraph) – A graph

Returns

A set of node properties

Return type

Set

get_category(curie: str, node: dict) → Optional[str]

Get category for a given CURIE.

Parameters
  • curie (str) – CURIE for the node

  • node (dict) – Node data

Returns

Category for the given node CURIE.

Return type

Optional[str]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

static is_null(item: Any) → bool

Check whether a given item is null or corresponds to null.

This method checks for: None, numpy.nan, pandas.NA, pandas.NaT, "" (empty string), and " " (single space)

Parameters

item (Any) – The item to check

Returns

Whether the given item is null or not

Return type

bool

load(obj: Dict[str, Any]) → None

Load a JSON object, containing nodes and edges, into an instance of BaseGraph

Parameters

obj (Dict[str, Any]) – JSON Object with all nodes and edges

load_edge(edge: dict) → None

Load an edge from Obograph JSON into an instance of BaseGraph

Parameters

edge (dict) – An edge

load_edges(edges: List[Dict]) → None

Load a list of edges into an instance of BaseGraph

Parameters

edges (list) – List of edges

load_node(node: Dict) → None

Load a node into an instance of BaseGraph

Parameters

node (dict) – A node

load_nodes(nodes: List[Dict]) → None

Load a list of nodes into an instance of BaseGraph

Parameters

nodes (list) – List of nodes

parse(filename: str, input_format: str = 'json', compression: Optional[str] = None, provided_by: Optional[str] = None, **kwargs) → None

Parse an Obograph JSON file of the format:

    {
        "graphs": [
            {
                "nodes": [
                    {"id": "UBERON:0002102", "lbl": "forelimb"},
                    {"id": "UBERON:0002101", "lbl": "limb"}
                ],
                "edges": [
                    {"subj": "UBERON:0002102", "pred": "is_a", "obj": "UBERON:0002101"}
                ]
            }
        ]
    }

Parameters
  • filename (str) – JSON file to read from

  • input_format (str) – The input file format (json, by default)

  • compression (Optional[str]) – The compression type. For example, gz

  • provided_by (Optional[str]) – Define the source providing the input file

  • kwargs (dict) – Any additional arguments
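The Obograph structure shown above maps fairly directly onto KGX-style node and edge dicts: 'lbl' becomes a name, and 'subj'/'pred'/'obj' become subject/predicate/object. The sketch below is a hypothetical illustration of that mapping, not ObographJsonTransformer's implementation. The 'is_a' to 'biolink:subclass_of' mapping is an assumption. The example above uses CURIEs, but real Obograph files usually carry full IRIs, which would first need to be contracted to CURIEs.

```python
from typing import Any, Dict, List, Tuple

# Assumed Obograph-to-Biolink predicate mapping (illustrative only)
PREDICATE_MAP = {"is_a": "biolink:subclass_of"}


def obograph_to_kgx(obj: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Flatten all graphs in an Obograph document into node and edge dicts."""
    nodes: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []
    for graph in obj.get("graphs", []):
        for n in graph.get("nodes", []):
            node = {"id": n["id"]}
            if "lbl" in n:
                node["name"] = n["lbl"]
            nodes.append(node)
        for e in graph.get("edges", []):
            edges.append({
                "subject": e["subj"],
                # Unmapped predicates are passed through unchanged
                "predicate": PREDICATE_MAP.get(e["pred"], e["pred"]),
                "object": e["obj"],
            })
    return nodes, edges
```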

parse_meta(node: str, meta: dict)

Parse the ‘meta’ field of a node.

Parameters
  • node (str) – Node identifier

  • meta (dict) – meta dictionary for the node

Returns

A dictionary that contains ‘description’, ‘synonyms’, ‘xrefs’, and ‘equivalent_nodes’.

Return type

dict
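In Obograph JSON, a node's 'meta' block typically holds a 'definition' object plus 'synonyms' and 'xrefs' lists, each entry carrying its text under 'val'. The sketch below extracts those into the keys listed above. It is hypothetical. It omits the node argument of the real method and leaves 'equivalent_nodes' empty, because how KGX derives equivalent nodes is not described here.

```python
from typing import Any, Dict, List


def parse_meta(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Pull description, synonyms, and xrefs out of an Obograph 'meta' block."""
    definition = meta.get("definition") or {}
    synonyms: List[str] = [s["val"] for s in meta.get("synonyms", []) if "val" in s]
    xrefs: List[str] = [x["val"] for x in meta.get("xrefs", []) if "val" in x]
    return {
        "description": definition.get("val"),
        "synonyms": synonyms,
        "xrefs": xrefs,
        # Not derived in this sketch
        "equivalent_nodes": [],
    }
```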

report() → None

Print a summary report about self.graph

save(filename: str, output_format: str = 'json', compression: Optional[str] = None, **kwargs) → str

Write an instance of BaseGraph to a file as JSON.

Parameters
  • filename (str) – Filename to write to

  • output_format (str) – The output file format (json, by default)

  • compression (Optional[str]) – The compression type. For example, gz

  • kwargs (dict) – Any additional arguments

Returns

The filename

Return type

str

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

LogicTermTransformer

class kgx.transformers.logicterm_transformer.LogicTermTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, output_format: Optional[str] = None, **kwargs: Dict)

Bases: kgx.transformers.transformer.Transformer

TODO

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

report() → None

Print a summary report about self.graph

save(filename: str, output_format='sxpr', zipmode='w', **kwargs)

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

NxTransformer

RdfGraphMixin

A mixin for handling operations on RDF-stores.

class kgx.transformers.rdf_graph_mixin.RdfGraphMixin(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, curie_map: Optional[Dict] = None)

Bases: object

A mixin that defines the following methods:
  • load_graph(): a template method that all derived classes should implement

  • add_node(): adds a node from its RDF form to property graph form

  • add_node_attribute(): adds a node attribute from its RDF form to property graph form

  • add_edge(): adds an edge from its RDF form to property graph form

  • add_edge_attribute(): adds an edge attribute from its RDF form to property graph form

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. It ensures that the subject and object identifiers are CURIEs, and that the predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Add an attribute to an edge, taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If either node of the edge does not exist, it will be created.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict
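The behaviour described for add_edge_attribute(), where missing endpoint nodes are created and multi-valued properties never hold duplicates, can be sketched with plain dictionaries. Everything here is hypothetical: TinyGraph stands in for BaseGraph, identifiers are plain strings rather than rdflib.URIRef, and the set of multi-valued properties (MULTIVALUED) is assumed.

```python
from typing import Any, Dict, Tuple

# Hypothetical set of multi-valued edge properties (illustration only).
MULTIVALUED = {"provided_by", "publications"}


class TinyGraph:
    """Minimal stand-in for a property graph; edges are keyed by (s, p, o)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[Tuple[str, str, str], Dict[str, Any]] = {}

    def add_edge_attribute(self, subject: str, obj: str, predicate: str,
                           key: str, value: Any) -> Dict[str, Any]:
        # Create either endpoint node if it does not exist yet.
        for node_id in (subject, obj):
            self.nodes.setdefault(node_id, {"id": node_id})
        edge = self.edges.setdefault(
            (subject, predicate, obj),
            {"subject": subject, "predicate": predicate, "object": obj},
        )
        if key in MULTIVALUED:
            values = edge.setdefault(key, [])
            if value not in values:  # multi-valued properties hold no duplicates
                values.append(value)
        else:
            edge[key] = value  # assumed: a single-valued property is overwritten
        return edge


g = TinyGraph()
g.add_edge_attribute("EX:a", "EX:b", "biolink:related_to", "provided_by", "src1")
edge = g.add_edge_attribute("EX:a", "EX:b", "biolink:related_to", "provided_by", "src1")
print(edge["provided_by"])  # ['src1']
```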

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. It ensures that the node’s identifier is a CURIE and that its iri property is set.

The node is stored in the kgx.graph.base_graph.BaseGraph under its CURIE identifier.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict
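A rough sketch of the IRI-to-CURIE step that add_node() performs, assuming a hypothetical prefix map (PREFIXES) in place of KGX's own CURIE handling. Falling back to the full IRI when no prefix matches is also an assumption made for this example.

```python
from typing import Dict, Optional

# Hypothetical prefix map (prefix -> namespace IRI), illustration only.
PREFIXES: Dict[str, str] = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "EX": "http://example.org/",
}


def contract(iri: str) -> Optional[str]:
    """Return a CURIE for the IRI, preferring the longest matching namespace."""
    matches = [(p, ns) for p, ns in PREFIXES.items() if iri.startswith(ns)]
    if not matches:
        return None
    prefix, ns = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{iri[len(ns):]}"


def add_node(graph: Dict[str, dict], iri: str, data: Optional[dict] = None) -> dict:
    """Store a node under its CURIE (or the IRI if none) and record the IRI."""
    curie = contract(iri) or iri
    node = graph.setdefault(curie, {"id": curie})
    node["iri"] = iri
    node.update(data or {})
    return node


graph: Dict[str, dict] = {}
node = add_node(graph, "http://purl.obolibrary.org/obo/MONDO_0005002")
print(node["id"])  # MONDO:0005002
```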

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist, it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict
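The key-mapping step of add_node_attribute() can be illustrated as follows. PROPERTY_MAPPING is a hypothetical stand-in for rdf_utils.property_mapping with two sample entries, and MULTIVALUED is an assumed set of multi-valued property names; neither reflects the actual KGX tables.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-in for rdf_utils.property_mapping (URI -> property name).
PROPERTY_MAPPING: Dict[str, str] = {
    "http://www.w3.org/2000/01/rdf-schema#label": "name",
    "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym": "synonym",
}
# Hypothetical set of multi-valued property names.
MULTIVALUED = {"synonym", "category"}


def add_node_attribute(nodes: Dict[str, Dict[str, Any]], node_id: str, key: str,
                       value: Union[str, List[str]]) -> Dict[str, Any]:
    name = PROPERTY_MAPPING.get(key, key)  # map a URI key onto a property name
    node = nodes.setdefault(node_id, {"id": node_id})  # create the node if missing
    if name in MULTIVALUED:
        values = node.setdefault(name, [])
        for v in value if isinstance(value, list) else [value]:
            if v not in values:  # skip duplicates
                values.append(v)
    else:
        node[name] = value
    return node


SYN = "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym"
nodes: Dict[str, Dict[str, Any]] = {}
add_node_attribute(nodes, "EX:1", SYN, ["a", "b"])
print(add_node_attribute(nodes, "EX:1", SYN, "a")["synonym"])  # ['a', 'b']
```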

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

This method should be overridden by the derived class and should load all desired nodes and edges from an rdflib.Graph into an instance of BaseGraph.

Preferably, this method should not use the BaseGraph API directly when adding nodes, edges, and their attributes. Instead, it should use the following methods:
  • add_node()

  • add_node_attribute()

  • add_edge()

  • add_edge_attribute()

These ensure that nodes, edges, and their attributes are added in conformance with the Biolink Model, and that URIRefs are translated into CURIEs or Biolink Model elements whenever appropriate.

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form

  • kwargs (Dict) – Any additional arguments
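The template-method arrangement described for load_graph() can be sketched as a base class that leaves load_graph() unimplemented and a subclass that implements it in terms of the add_* helpers. The class names GraphLoaderBase and SimpleLoader are hypothetical, and plain (subject, predicate, object) string tuples stand in for an rdflib.Graph.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class GraphLoaderBase:
    """Hypothetical base: subclasses must implement load_graph()."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        self.edges: List[Triple] = []

    def add_node(self, iri: str) -> dict:
        return self.nodes.setdefault(iri, {"id": iri})

    def add_edge(self, subject: str, obj: str, predicate: str) -> Triple:
        # Route through add_node() so both endpoints always exist.
        self.add_node(subject)
        self.add_node(obj)
        self.edges.append((subject, predicate, obj))
        return subject, predicate, obj

    def load_graph(self, triples: Iterable[Triple],
                   predicates: Optional[Set[str]] = None) -> None:
        raise NotImplementedError("derived classes must implement load_graph()")


class SimpleLoader(GraphLoaderBase):
    def load_graph(self, triples: Iterable[Triple],
                   predicates: Optional[Set[str]] = None) -> None:
        for s, p, o in triples:
            # Load only the requested predicates when a set is given.
            if predicates is None or p in predicates:
                self.add_edge(s, o, p)


loader = SimpleLoader()
loader.load_graph([("EX:a", "EX:p", "EX:b"), ("EX:b", "EX:q", "EX:c")], {"EX:p"})
print(loader.edges)  # [('EX:a', 'EX:p', 'EX:b')]
```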

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, and the reference of p

Return type

Tuple[str, str, str, str]
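A sketch of the 4-tuple that process_predicate() returns, assuming a hypothetical prefix table and a hypothetical Biolink lookup table, both keyed on an example.org namespace. Returning None for parts that are not available is a choice made for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables (illustration only).
PREFIXES: Dict[str, str] = {"EX": "http://example.org/vocab/"}
BIOLINK: Dict[str, Tuple[str, str]] = {
    # predicate CURIE -> (Biolink CURIE, Biolink slot_uri CURIE)
    "EX:treats": ("biolink:treats", "biolink:treats"),
}

Result = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def to_curie(iri: str) -> Optional[str]:
    for prefix, ns in PREFIXES.items():
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return None


def process_predicate(p: Optional[str]) -> Result:
    """Return (Biolink CURIE, slot_uri CURIE, CURIE form of p, reference of p)."""
    if p is None:
        return None, None, None, None
    if p.startswith(("http://", "https://")):
        curie = to_curie(p)
    else:
        curie = p  # assume p is already a CURIE
    if curie is None:
        # No prefix matched: fall back to the full IRI for both parts.
        return None, None, p, p
    reference = curie.split(":", 1)[-1]  # local part after the prefix
    biolink, slot_uri = BIOLINK.get(curie, (None, None))
    return biolink, slot_uri, curie, reference


print(process_predicate("http://example.org/vocab/treats"))
# ('biolink:treats', 'biolink:treats', 'EX:treats', 'treats')
```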

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

RdfTransformer

class kgx.transformers.rdf_transformer.ObanRdfTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, curie_map: Optional[Dict] = None)

Bases: kgx.transformers.rdf_transformer.RdfTransformer

Transformer that parses a ‘turtle’ file and loads triples, as nodes and edges, into an instance of BaseGraph.

This Transformer supports the OBAN style of modeling, where:
  • it dereifies OBAN.association triples into a property graph form

  • it reifies a property graph into OBAN.association triples

Parameters
  • source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

  • curie_map (Optional[Dict]) – A curie map that maps non-canonical CURIEs to IRIs

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. It ensures that the subject and object identifiers are CURIEs, and that the predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Add an attribute to an edge, taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If either node of the edge does not exist, it will be created.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. It ensures that the node’s identifier is a CURIE and that its iri property is set.

The node is stored in the kgx.graph.base_graph.BaseGraph under its CURIE identifier.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist, it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

dereify(nodes: Set[str]) → None

Dereify a set of nodes where each node has all the properties necessary to create an edge.

Parameters

nodes (Set[str]) – A set of nodes

export_edges(reify_all_edges: bool = False) → Iterator

Export edges and their attributes as triples. This method yields a 3-tuple of (subject, predicate, object).

Parameters

reify_all_edges (bool) – Whether to reify all edges in the graph

Returns

An iterator

Return type

Iterator

export_nodes() → Iterator

Export nodes and their attributes as triples. This method yields a 3-tuple of (subject, predicate, object).

Returns

An iterator

Return type

Iterator
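To illustrate the triple streams produced by export_nodes() and export_edges(), the generators below walk plain dictionaries and yield (subject, predicate, object) tuples. This is a simplified sketch: it uses strings instead of rdflib terms, emits one triple per value of a list-valued property (an assumption), and ignores reify_all_edges entirely.

```python
from typing import Any, Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def export_nodes(nodes: Dict[str, Dict[str, Any]]) -> Iterator[Triple]:
    """Yield (subject, predicate, object) for every node property value."""
    for node_id, props in nodes.items():
        for key, value in props.items():
            if key == "id":
                continue  # the id becomes the subject, not a property triple
            # Assumed: a list-valued property yields one triple per value.
            for v in value if isinstance(value, list) else [value]:
                yield node_id, key, str(v)


def export_edges(edges: List[Dict[str, str]]) -> Iterator[Triple]:
    """Yield the direct (subject, predicate, object) triple of each edge."""
    for e in edges:
        yield e["subject"], e["predicate"], e["object"]


nodes = {"EX:1": {"id": "EX:1", "name": "foo", "category": ["biolink:Gene"]}}
for triple in export_nodes(nodes):
    print(triple)
# ('EX:1', 'name', 'foo')
# ('EX:1', 'category', 'biolink:Gene')
```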

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

Walk through the rdflib.Graph and load all required triples into an instance of BaseGraph

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form

  • kwargs (Dict) – Any additional arguments

parse(filename: str, input_format: Optional[str] = None, compression: Optional[str] = None, provided_by: Optional[str] = None, node_property_predicates: Optional[Set[str]] = None) → None

Parse a file containing triples into an rdflib.Graph.

The file can be either a ‘turtle’ file or any other format supported by rdflib.

Parameters
  • filename (str) – File to read from.

  • input_format (Optional[str]) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()

  • compression (Optional[str]) – The compression type. For example, gz

  • provided_by (Optional[str]) – Define the source providing the input file.

  • node_property_predicates (Optional[Set[str]]) – A set of rdflib.URIRef representing predicates that are to be treated as node properties
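One way the input_format and compression parameters of parse() might interact before the content reaches an RDF parser is sketched below. The extension table (FORMATS) is a hypothetical stand-in for rdflib.util.guess_format(), and read_rdf_text() is a hypothetical helper; only the 'gz' compression value comes from the text above.

```python
import gzip
import os
from typing import Optional, Tuple

# Hypothetical extension table; rdflib.util.guess_format() has its own mapping.
FORMATS = {".ttl": "turtle", ".nt": "nt", ".rdf": "xml", ".owl": "xml"}


def guess_format(filename: str) -> Optional[str]:
    """Guess the RDF format from the extension, ignoring a trailing .gz."""
    name = filename[:-3] if filename.endswith(".gz") else filename
    return FORMATS.get(os.path.splitext(name)[1].lower())


def read_rdf_text(filename: str, input_format: Optional[str] = None,
                  compression: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (format, text), decompressing the file when compression == 'gz'."""
    fmt = input_format or guess_format(filename)
    opener = gzip.open if compression == "gz" else open
    with opener(filename, "rt", encoding="utf-8") as fh:
        return fmt, fh.read()
```

In the documented method, the resulting content would then be parsed into an rdflib.Graph; that step is omitted here.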

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, and the reference of p

Return type

Tuple[str, str, str, str]

reify(u: str, v: str, k: str, data: Dict) → Dict

Create a node representation of an edge.

Parameters
  • u (str) – Subject

  • v (str) – Object

  • k (str) – Edge key

  • data (Dict) – Edge data

Returns

The reified node

Return type

Dict
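The reify()/dereify() round trip can be sketched with dictionaries. The OBAN property names (OBAN:association_has_subject, OBAN:association_has_predicate, OBAN:association_has_object), the 'OBAN:association' category, and reusing the edge key as the association node id are assumptions for illustration; the function signatures also differ from the ones documented here.

```python
from typing import Dict, Set

# OBAN property names used here are assumptions for illustration.
HAS_SUBJECT = "OBAN:association_has_subject"
HAS_PREDICATE = "OBAN:association_has_predicate"
HAS_OBJECT = "OBAN:association_has_object"
CORE = {"subject", "predicate", "object"}


def reify(u: str, v: str, k: str, data: Dict[str, str]) -> Dict[str, str]:
    """Create a node representation of the edge (u, v, k)."""
    node = {key: val for key, val in data.items() if key not in CORE}
    node.update({
        "id": k,  # hypothetical: reuse the edge key as the node id
        "category": "OBAN:association",
        HAS_SUBJECT: u,
        HAS_PREDICATE: data["predicate"],
        HAS_OBJECT: v,
    })
    return node


def dereify(nodes: Dict[str, Dict[str, str]], ids: Set[str]) -> Dict[str, Dict[str, str]]:
    """Rebuild an edge from each listed node that has subject, predicate and object."""
    skip = {HAS_SUBJECT, HAS_PREDICATE, HAS_OBJECT, "id", "category"}
    edges: Dict[str, Dict[str, str]] = {}
    for node_id in ids:
        n = nodes[node_id]
        if HAS_SUBJECT in n and HAS_PREDICATE in n and HAS_OBJECT in n:
            edge = {key: val for key, val in n.items() if key not in skip}
            edge.update(subject=n[HAS_SUBJECT], predicate=n[HAS_PREDICATE], object=n[HAS_OBJECT])
            edges[node_id] = edge
    return edges


edge = {"subject": "EX:a", "predicate": "biolink:related_to", "object": "EX:b", "provided_by": "src"}
node = reify("EX:a", "EX:b", "EX:e1", edge)
restored = dereify({"EX:e1": node}, {"EX:e1"})["EX:e1"]  # equals the original edge
```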

report() → None

Print a summary report about self.graph

save(filename: str, output_format: str = 'turtle', compression: Optional[str] = None, reify_all_edges: bool = False, **kwargs) → None

Transform an instance of BaseGraph into rdflib.Graph and export this graph as a file (turtle, by default).

Parameters
  • filename (str) – Filename to write to

  • output_format (str) – The output format; default: turtle

  • compression (Optional[str]) – The compression type. Not yet supported.

  • reify_all_edges (bool) – Whether to reify all edges in the graph

  • kwargs (Dict) – Any additional arguments

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_predicate_mapping(m: Dict) → None

Set predicate mappings.

Use this method to update predicate mappings for predicates that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_property_types(m: Dict) → None

Set property types.

Use this method to populate type information for properties that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are property URIs and the values are their types
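A small sketch of how the mappings registered through set_predicate_mapping() and set_property_types() could be combined when a literal is read. PropertyRegistry, convert(), and the CASTS table of XSD type names are hypothetical; merging new entries into existing ones (rather than replacing them) is also an assumption.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical casts for a few XSD type names (illustration only).
CASTS: Dict[str, Callable[[str], Any]] = {
    "xsd:float": float,
    "xsd:integer": int,
    "xsd:string": str,
}


class PropertyRegistry:
    def __init__(self) -> None:
        self.predicate_mapping: Dict[str, str] = {}  # IRI -> property name
        self.property_types: Dict[str, str] = {}     # property URI -> type

    def set_predicate_mapping(self, m: Dict[str, str]) -> None:
        self.predicate_mapping.update(m)  # assumed: merge, not replace

    def set_property_types(self, m: Dict[str, str]) -> None:
        self.property_types.update(m)

    def convert(self, iri: str, raw: str) -> Tuple[str, Any]:
        """Return (property name, typed value) for a raw literal."""
        name = self.predicate_mapping.get(iri, iri)  # unmapped IRIs keep their IRI
        cast = CASTS.get(self.property_types.get(iri, "xsd:string"), str)
        return name, cast(raw)


reg = PropertyRegistry()
reg.set_predicate_mapping({"http://example.org/score": "score"})
reg.set_property_types({"http://example.org/score": "xsd:float"})
print(reg.convert("http://example.org/score", "0.5"))  # ('score', 0.5)
```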

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None

Parse a triple.

Parameters
  • s (URIRef) – Subject

  • p (URIRef) – Predicate

  • o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

uriref(identifier: str) → rdflib.term.URIRef

Generate an rdflib.URIRef for a given string.

Parameters

identifier (str) – Identifier as string.

Returns

URIRef form of the input identifier

Return type

rdflib.URIRef
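The CURIE expansion behind uriref() can be sketched with a hypothetical curie_map. To stay within the standard library, the sketch returns a plain str where the documented method returns an rdflib.URIRef; leaving unknown prefixes untouched is an assumption.

```python
from typing import Dict

# Hypothetical curie_map (prefix -> namespace IRI), illustration only.
CURIE_MAP: Dict[str, str] = {
    "biolink": "https://w3id.org/biolink/vocab/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def uriref(identifier: str) -> str:
    """Expand a CURIE to a full IRI; leave IRIs and unknown prefixes untouched."""
    if identifier.startswith(("http://", "https://")):
        return identifier
    prefix, sep, reference = identifier.partition(":")
    if sep and prefix in CURIE_MAP:
        return CURIE_MAP[prefix] + reference
    return identifier


print(uriref("biolink:Gene"))  # https://w3id.org/biolink/vocab/Gene
```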

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.rdf_transformer.RdfOwlTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, curie_map: Optional[Dict] = None)

Bases: kgx.transformers.rdf_transformer.RdfTransformer

Transformer that parses an OWL ontology.

Note

This is a simple parser that loads direct class-class relationships.

Parameters
  • source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

  • curie_map (Optional[Dict]) – A curie map that maps non-canonical CURIEs to IRIs
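One plausible reading of 'direct class-class relationships' is: keep rdfs:subClassOf triples whose subject and object are both declared owl:Class resources, and skip those that point at anonymous restrictions. The sketch below implements that reading over plain string triples; the function name is hypothetical and this is not necessarily what RdfOwlTransformer does internally.

```python
from typing import Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

Triple = Tuple[str, str, str]


def direct_subclass_edges(triples: Iterable[Triple]) -> List[Triple]:
    """Keep rdfs:subClassOf triples whose subject and object are declared classes."""
    triples = list(triples)
    classes: Set[str] = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    return [(s, p, o) for s, p, o in triples
            if p == SUBCLASS_OF and s in classes and o in classes]


sample = [
    ("EX:A", RDF_TYPE, OWL_CLASS),
    ("EX:B", RDF_TYPE, OWL_CLASS),
    ("EX:A", SUBCLASS_OF, "EX:B"),
    ("EX:A", SUBCLASS_OF, "_:restriction1"),  # anonymous restriction: skipped
]
print(len(direct_subclass_edges(sample)))  # 1
```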

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. It ensures that the subject and object identifiers are CURIEs, and that the predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Add an attribute to an edge, taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If either node of the edge does not exist, it will be created.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. It ensures that the node’s identifier is a CURIE and that its iri property is set.

The node is stored in the kgx.graph.base_graph.BaseGraph under its CURIE identifier.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist, it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

dereify(nodes: Set[str]) → None

Dereify a set of nodes where each node has all the properties necessary to create an edge.

Parameters

nodes (Set[str]) – A set of nodes

export_edges(reify_all_edges: bool = False) → Iterator

Export edges and their attributes as triples. This method yields a 3-tuple of (subject, predicate, object).

Parameters

reify_all_edges (bool) – Whether to reify all edges in the graph

Returns

An iterator

Return type

Iterator

export_nodes() → Iterator

Export nodes and their attributes as triples. This method yields a 3-tuple of (subject, predicate, object).

Returns

An iterator

Return type

Iterator

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

Walk through the rdflib.Graph and load all triples into kgx.graph.base_graph.BaseGraph

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Optional[Set[URIRef]]) – A set of rdflib.URIRef representing predicates to be loaded

  • kwargs (Dict) – Any additional arguments

parse(filename: str, input_format: Optional[str] = None, compression: Optional[str] = None, provided_by: Optional[str] = None, node_property_predicates: Optional[Set[str]] = None) → None

Parse an OWL file and load it into an rdflib.Graph.

Parameters
  • filename (str) – File to read from.

  • input_format (Optional[str]) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()

  • compression (Optional[str]) – The compression type. For example, gz

  • provided_by (Optional[str]) – Define the source providing the input file.

  • node_property_predicates (Optional[Set[str]]) – A set of rdflib.URIRef representing predicates that are to be treated as node properties

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate, checking whether it has a mapping in the Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, and the reference of p

Return type

Tuple[str, str, str, str]

reify(u: str, v: str, k: str, data: Dict) → Dict

Create a node representation of an edge.

Parameters
  • u (str) – Subject

  • v (str) – Object

  • k (str) – Edge key

  • data (Dict) – Edge data

Returns

The reified node

Return type

Dict

report() → None

Print a summary report about self.graph

save(filename: str, output_format: str = 'turtle', compression: Optional[str] = None, reify_all_edges: bool = False, **kwargs) → None

Transform an instance of BaseGraph into rdflib.Graph and export this graph as a file (turtle, by default).

Parameters
  • filename (str) – Filename to write to

  • output_format (str) – The output format; default: turtle

  • compression (Optional[str]) – The compression type. Not yet supported.

  • reify_all_edges (bool) – Whether to reify all edges in the graph

  • kwargs (Dict) – Any additional arguments

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_predicate_mapping(m: Dict) → None

Set predicate mappings.

Use this method to update predicate mappings for predicates that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_property_types(m: Dict) → None

Set property types.

Use this method to populate type information for properties that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are property URIs and the values are their types

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None

Parse a triple.

Parameters
  • s (URIRef) – Subject

  • p (URIRef) – Predicate

  • o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

uriref(identifier: str) → rdflib.term.URIRef

Generate an rdflib.URIRef for a given string.

Parameters

identifier (str) – Identifier as string.

Returns

URIRef form of the input identifier

Return type

rdflib.URIRef

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.rdf_transformer.RdfTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, curie_map: Optional[Dict] = None)

Bases: kgx.transformers.rdf_graph_mixin.RdfGraphMixin, kgx.transformers.transformer.Transformer

Transformer that parses RDF and loads triples, as nodes and edges, into an instance of BaseGraph.

This is the base class which is used to implement other RDF-based transformers.

Parameters
  • source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

  • curie_map (Optional[Dict]) – A curie map that maps non-canonical CURIEs to IRIs

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. It ensures that the subject and object identifiers are CURIEs, and that the predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Add an attribute to an edge, taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If either node of the edge does not exist, it will be created.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. It ensures that the node’s identifier is a CURIE and that its iri property is set.

The node is stored in the kgx.graph.base_graph.BaseGraph under its CURIE identifier.

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist, it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

dereify(nodes: Set[str]) → None

Dereify a set of nodes where each node has all the properties necessary to create an edge.

Parameters

nodes (Set[str]) – A set of nodes

export_edges(reify_all_edges: bool = False) → Iterator

Export edges and their attributes as triples. This method yields a 3-tuple of (subject, predicate, object).

Parameters

reify_all_edges (bool) – Whether to reify all edges in the graph

Returns

An iterator

Return type

Iterator
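A rough sketch of what exporting edges as triples can look like, with and without reification. It uses the W3C RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) written as CURIE strings; the exact terms KGX emits for reified edges are an assumption here and may differ (for example, Biolink association slots).

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, Any]
CORE_KEYS = ("id", "subject", "predicate", "object")


def export_edges(edges: Iterable[Dict[str, Any]], reify_all_edges: bool = False) -> Iterator[Triple]:
    """Yield (subject, predicate, object) triples for each edge dict."""
    for edge in edges:
        s, p, o = edge["subject"], edge["predicate"], edge["object"]
        if not reify_all_edges:
            yield (s, p, o)  # direct triple; other edge properties are dropped in this sketch
            continue
        # Reified form: the edge id becomes a resource that carries every property.
        eid = edge["id"]
        yield (eid, "rdf:type", "rdf:Statement")
        yield (eid, "rdf:subject", s)
        yield (eid, "rdf:predicate", p)
        yield (eid, "rdf:object", o)
        for key, value in edge.items():
            if key not in CORE_KEYS:
                yield (eid, key, value)


edge = {
    "id": "X:e1", "subject": "X:1", "predicate": "biolink:related_to",
    "object": "X:2", "provided_by": "source-a",
}
print(list(export_edges([edge])))  # [('X:1', 'biolink:related_to', 'X:2')]
```

Being a generator, the function never materializes the full triple list, which matters for large graphs.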

export_nodes() → Iterator

Export nodes and their attributes as triples. This method yields 3-tuples of (subject, predicate, object).

Returns

An iterator

Return type

Iterator

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

Walk through the rdflib.Graph and load all required triples into an instance of BaseGraph

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form

  • kwargs (Dict) – Any additional arguments

parse(filename: str, input_format: Optional[str] = None, compression: Optional[str] = None, provided_by: Optional[str] = None, node_property_predicates: Optional[Set[str]] = None) → None

Parse a file containing triples into an rdflib.Graph.

The file can be either a ‘turtle’ file or any other format supported by rdflib.

Parameters
  • filename (str) – File to read from.

  • input_format (Optional[str]) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()

  • compression (Optional[str]) – The compression type. For example, gz

  • provided_by (Optional[str]) – Define the source providing the input file.

  • node_property_predicates (Optional[Set[str]]) – A set of rdflib.URIRef representing predicates that are to be treated as node properties
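When input_format is None, the format is guessed with rdflib.util.guess_format(). The standalone sketch below only approximates that idea with a small, hypothetical extension table (rdflib's own table is larger) and shows how a gz-compressed input can be opened with the standard library. guess_input_format and open_input are illustrative names, not part of KGX.

```python
import gzip
from typing import IO, Optional

# Hypothetical subset of extension-to-format names, in rdflib's naming style.
FORMATS = {"ttl": "turtle", "nt": "nt", "rdf": "xml", "owl": "xml", "jsonld": "json-ld"}


def guess_input_format(filename: str, compression: Optional[str] = None) -> Optional[str]:
    """Guess an RDF serialization from the file extension, ignoring a trailing .gz."""
    name = filename.lower()
    if compression == "gz" and name.endswith(".gz"):
        name = name[: -len(".gz")]
    if "." not in name:
        return None
    return FORMATS.get(name.rsplit(".", 1)[1])


def open_input(filename: str, compression: Optional[str] = None) -> IO[bytes]:
    """Open the input as a binary stream, decompressing gzip transparently."""
    if compression == "gz":
        return gzip.open(filename, "rb")
    return open(filename, "rb")


print(guess_input_format("data.ttl.gz", compression="gz"))  # turtle
```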

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate where the method checks if there is a mapping in Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple[str, str, str, str]
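A hypothetical sketch of the four-part result described for process_predicate. The lookup tables are invented for illustration (KGX consults the Biolink Model), and treating "the reference of p" as the local part after the namespace, or after the last '#' or '/', is an assumption about that term.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tables; KGX derives mappings from the Biolink Model.
PREFIXES: Dict[str, str] = {"EX": "http://example.org/rel/"}
BIOLINK_MAPPING: Dict[str, Tuple[str, str]] = {
    # predicate IRI -> (Biolink CURIE, Biolink slot_uri CURIE)
    "http://example.org/rel/associatedWith": ("biolink:related_to", "biolink:related_to"),
}
Result = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def process_predicate(p: Optional[str]) -> Result:
    """Return (biolink CURIE, slot_uri CURIE, CURIE form of p, reference of p)."""
    if p is None:
        return None, None, None, None
    curie, reference = p, None
    for prefix, namespace in PREFIXES.items():
        if p.startswith(namespace):
            reference = p[len(namespace):]
            curie = f"{prefix}:{reference}"
            break
    if reference is None:
        # Unknown namespace: fall back to the fragment or last path segment.
        reference = p.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
    element, slot_uri = BIOLINK_MAPPING.get(p, (None, None))
    return element, slot_uri, curie, reference


print(process_predicate("http://example.org/rel/associatedWith"))
# ('biolink:related_to', 'biolink:related_to', 'EX:associatedWith', 'associatedWith')
```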

reify(u: str, v: str, k: str, data: Dict) → Dict

Create a node representation of an edge.

Parameters
  • u (str) – Subject

  • v (str) – Object

  • k (str) – Edge key

  • data (Dict) – Edge data

Returns

The reified node

Return type

Dict
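A minimal sketch of reification as described above: the edge (u, v, k) with its data becomes a node dict that records the subject and object explicitly. Falling back to the edge key as the node identifier and defaulting the category to biolink:Association are assumptions made for this illustration, not confirmed KGX behavior.

```python
from typing import Any, Dict


def reify(u: str, v: str, k: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Represent the edge (u, v, k) as a standalone node dict."""
    node = dict(data)  # copy so the original edge data is left untouched
    node.setdefault("id", k)  # assumption: use the edge key when the edge has no id
    node["subject"] = u
    node["object"] = v
    node.setdefault("category", ["biolink:Association"])  # assumed default category
    return node


print(reify("X:1", "X:2", "e1", {"predicate": "biolink:related_to"}))
# {'predicate': 'biolink:related_to', 'id': 'e1', 'subject': 'X:1', 'object': 'X:2', 'category': ['biolink:Association']}
```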

report() → None

Print a summary report about self.graph

save(filename: str, output_format: str = 'turtle', compression: Optional[str] = None, reify_all_edges: bool = False, **kwargs) → None

Transform an instance of BaseGraph into rdflib.Graph and export this graph as a file (turtle, by default).

Parameters
  • filename (str) – Filename to write to

  • output_format (str) – The output format; default: turtle

  • compression (Optional[str]) – The compression type. Not yet supported.

  • reify_all_edges (bool) – Whether to reify all edges in the graph

  • kwargs (Dict) – Any additional arguments

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.
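The coupling between node and edge filters described in the notes above can be sketched as follows. The Filters class is hypothetical, and merging values with set union (rather than replacing them) is an assumption made for this sketch.

```python
from typing import Dict, Set, Union

FilterValue = Union[str, Set[str]]


class Filters:
    """Hypothetical holder for node and edge filters that keeps them consistent."""

    def __init__(self) -> None:
        self.node_filters: Dict[str, FilterValue] = {}
        self.edge_filters: Dict[str, FilterValue] = {}

    @staticmethod
    def _union(filters: Dict[str, FilterValue], key: str, values: Set[str]) -> None:
        current = filters.get(key, set())
        if isinstance(current, str):
            current = {current}
        filters[key] = set(current) | values

    def set_node_filter(self, key: str, value: FilterValue) -> None:
        if key == "category":
            values = {value} if isinstance(value, str) else set(value)
            self._union(self.node_filters, "category", values)
            # Edges must connect nodes of these categories on both ends.
            self._union(self.edge_filters, "subject_category", values)
            self._union(self.edge_filters, "object_category", values)
        else:
            self.node_filters[key] = value

    def set_edge_filter(self, key: str, value: FilterValue) -> None:
        if key in ("subject_category", "object_category"):
            values = {value} if isinstance(value, str) else set(value)
            self._union(self.edge_filters, key, values)
            # Nodes at either end of a kept edge must also pass the node filter.
            self._union(self.node_filters, "category", values)
        else:
            self.edge_filters[key] = value


f = Filters()
f.set_node_filter("category", {"biolink:Gene"})
print(f.edge_filters)
# {'subject_category': {'biolink:Gene'}, 'object_category': {'biolink:Gene'}}
```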

set_predicate_mapping(m: Dict) → None

Set predicate mappings.

Use this method to update predicate mappings for predicates that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_property_types(m: Dict) → None

Set property types.

Use this method to populate type information for properties that are not in Biolink Model.

Parameters

m (Dict) – A dictionary where the keys are property URIs and the values are their types
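To make the shape of the two dictionaries concrete, here is a sketch with hypothetical IRIs that could be passed to set_predicate_mapping and set_property_types. The convert helper, the CASTS table, and the "xsd:float" style of type names are assumptions used only to show how such mappings might be applied; they are not KGX code.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical argument for set_predicate_mapping: IRI -> property name.
predicate_mapping: Dict[str, str] = {
    "http://example.org/vocab/geneSymbol": "symbol",
}
# Hypothetical argument for set_property_types: property IRI -> type.
property_types: Dict[str, str] = {
    "http://example.org/vocab/score": "xsd:float",
}
CASTS: Dict[str, Callable[[str], Any]] = {"xsd:float": float, "xsd:integer": int, "xsd:string": str}


def convert(property_iri: str, raw: str) -> Tuple[str, Any]:
    """Rename a property and cast its raw literal value using the two mappings."""
    name = predicate_mapping.get(property_iri, property_iri)
    cast = CASTS.get(property_types.get(property_iri, "xsd:string"), str)
    return name, cast(raw)


print(convert("http://example.org/vocab/score", "0.75"))
# ('http://example.org/vocab/score', 0.75)
```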

triple(s: rdflib.term.URIRef, p: rdflib.term.URIRef, o: rdflib.term.URIRef) → None

Parse a triple.

Parameters
  • s (URIRef) – Subject

  • p (URIRef) – Predicate

  • o (URIRef) – Object
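One way to picture parsing a single triple is to route it either to a node property (when the predicate is among the node_property_predicates accepted by parse) or to an edge. The handle_triple helper below is a hypothetical sketch using plain strings instead of rdflib.URIRef; the real method also performs predicate mapping and CURIE contraction, which are omitted here.

```python
from typing import Dict, List, Set, Tuple

Nodes = Dict[str, Dict[str, List[str]]]
Edges = List[Tuple[str, str, str]]


def handle_triple(s: str, p: str, o: str, nodes: Nodes, edges: Edges,
                  node_property_predicates: Set[str]) -> None:
    """Route one triple either to a node property or to an edge."""
    if p in node_property_predicates:
        # The object is a value describing the subject node.
        nodes.setdefault(s, {}).setdefault(p, []).append(o)
    else:
        # The object is another node: make sure both ends exist, then record the edge.
        nodes.setdefault(s, {})
        nodes.setdefault(o, {})
        edges.append((s, p, o))


nodes: Nodes = {}
edges: Edges = []
label = "http://www.w3.org/2000/01/rdf-schema#label"
handle_triple("X:1", label, "example gene", nodes, edges, {label})
handle_triple("X:1", "biolink:related_to", "X:2", nodes, edges, {label})
print(edges)  # [('X:1', 'biolink:related_to', 'X:2')]
```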

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

uriref(identifier: str) → rdflib.term.URIRef

Generate a rdflib.URIRef for a given string.

Parameters

identifier (str) – Identifier as string.

Returns

URIRef form of the input identifier

Return type

rdflib.URIRef
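A minimal sketch of turning an identifier into an IRI: full IRIs pass through, known CURIE prefixes are expanded, and anything else is placed under a fallback base. The prefix table and the fallback base are hypothetical, and the sketch returns a plain str where the documented method returns an rdflib.URIRef (itself a str subclass).

```python
from typing import Dict

# Hypothetical prefix table and fallback base for unrecognized identifiers.
PREFIXES: Dict[str, str] = {
    "biolink": "https://w3id.org/biolink/vocab/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}
FALLBACK_BASE = "http://example.org/unknown/"


def uriref(identifier: str) -> str:
    """Expand a CURIE to an IRI; pass full IRIs through unchanged."""
    if identifier.startswith(("http://", "https://", "urn:")):
        return identifier
    prefix, sep, local = identifier.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return FALLBACK_BASE + identifier


print(uriref("biolink:Gene"))  # https://w3id.org/biolink/vocab/Gene
```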

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict
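A sketch of what "checking required properties and applying default assumptions" could look like for nodes and edges. The required keys and the defaults (biolink:NamedThing for a missing node category, biolink:related_to for a missing edge predicate) are assumptions for illustration, not a statement of what KGX enforces. The sketch returns copies rather than mutating its input.

```python
from typing import Any, Dict

DEFAULT_NODE_CATEGORY = "biolink:NamedThing"   # assumed default
DEFAULT_EDGE_PREDICATE = "biolink:related_to"  # assumed default


def validate_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Require an id and normalize category to a non-empty list."""
    if "id" not in node:
        raise KeyError("node is missing required property 'id'")
    node = dict(node)
    category = node.get("category")
    if not category:
        node["category"] = [DEFAULT_NODE_CATEGORY]
    elif isinstance(category, str):
        node["category"] = [category]
    return node


def validate_edge(edge: Dict[str, Any]) -> Dict[str, Any]:
    """Require subject and object and fill in a default predicate."""
    for key in ("subject", "object"):
        if key not in edge:
            raise KeyError(f"edge is missing required property '{key}'")
    edge = dict(edge)
    edge.setdefault("predicate", DEFAULT_EDGE_PREDICATE)
    return edge


print(validate_node({"id": "X:1"}))  # {'id': 'X:1', 'category': ['biolink:NamedThing']}
```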

SparqlTransformer

class kgx.transformers.sparql_transformer.MonarchSparqlTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None)

Bases: kgx.transformers.sparql_transformer.SparqlTransformer

TODO

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. This method ensures that the subject and object identifiers are CURIEs, and that predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they will be created.

Parameters
  • subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the kgx.graph.base_graph.BaseGraph

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

Fetch triples from the SPARQL endpoint and load them as edges.

Parameters
  • rdfgraph (rdflib.Graph) – A rdflib Graph (unused)

  • predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form

  • kwargs (Dict) – Any additional arguments.

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate where the method checks if there is a mapping in Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple[str, str, str, str]

query(q: str) → Dict

Query a SPARQL endpoint.

Parameters

q (str) – The query string

Returns

A dictionary containing results from the query

Return type

Dict

report() → None

Print a summary report about self.graph

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.sparql_transformer.RedSparqlTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, url: str = 'http://graphdb.dumontierlab.com/repositories/ncats-red-kg')

Bases: kgx.transformers.sparql_transformer.SparqlTransformer

Transformer for communicating with the Data2Services Knowledge Graph, a.k.a. the Translator Red KG.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. This method ensures that the subject and object identifiers are CURIEs, and that predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they will be created.

Parameters
  • subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the kgx.graph.base_graph.BaseGraph

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

categorize() → None

Check each node’s category property and assign a category from the Biolink Model. TODO: categorize edges as well?
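A hypothetical sketch of the categorization step for nodes: values already in the biolink: namespace are kept, source-specific type IRIs are mapped through a lookup table, and nodes left without a recognizable category fall back to biolink:NamedThing. The TYPE_TO_CATEGORY table and the fallback are assumptions, not the Red KG mapping KGX uses.

```python
from typing import Any, Dict, List

# Hypothetical mapping from source-specific type IRIs to Biolink categories.
TYPE_TO_CATEGORY: Dict[str, str] = {
    "http://example.org/types/Gene": "biolink:Gene",
    "http://example.org/types/Drug": "biolink:Drug",
}
DEFAULT_CATEGORY = "biolink:NamedThing"  # assumed fallback


def categorize(nodes: Dict[str, Dict[str, Any]]) -> None:
    """Replace each node's category values with Biolink categories, in place."""
    for node in nodes.values():
        raw = node.get("category") or []
        if isinstance(raw, str):
            raw = [raw]
        mapped: List[str] = []
        for value in raw:
            category = value if value.startswith("biolink:") else TYPE_TO_CATEGORY.get(value)
            if category and category not in mapped:
                mapped.append(category)
        node["category"] = mapped or [DEFAULT_CATEGORY]


nodes = {
    "a": {"category": "http://example.org/types/Gene"},
    "b": {},
    "c": {"category": ["biolink:Protein", "http://example.org/types/Unknown"]},
}
categorize(nodes)
print({k: v["category"] for k, v in nodes.items()})
# {'a': ['biolink:Gene'], 'b': ['biolink:NamedThing'], 'c': ['biolink:Protein']}
```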

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

Fetch all triples using the specified predicates and add them to an instance of BaseGraph.

Parameters
  • rdfgraph (rdflib.Graph) – A rdflib Graph (unused)

  • predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form

  • kwargs (dict) – Any additional arguments. For example, specifying a ‘limit’ argument limits the number of triples fetched.

load_nodes(node_set: Set) → None

Load nodes into an instance of BaseGraph.

This method queries the SPARQL endpoint for all triples whose subject is a node in node_set.

Parameters

node_set (Set) – A set of node CURIEs
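The two Red KG loading methods above boil down to SPARQL queries: one restricts the predicate (optionally with a ‘limit’), the other restricts the subject to a set of nodes. The builders below are a sketch using standard SPARQL 1.1 VALUES and LIMIT syntax; the function names are hypothetical, the exact queries KGX sends may differ, and node CURIEs are assumed to have been expanded to IRIs beforehand.

```python
from typing import Iterable, Optional


def values_clause(var: str, iris: Iterable[str]) -> str:
    """Build a SPARQL VALUES block binding ?var to each IRI (sorted for stable output)."""
    terms = " ".join(f"<{iri}>" for iri in sorted(iris))
    return f"VALUES ?{var} {{ {terms} }}"


def edge_query(predicates: Iterable[str], limit: Optional[int] = None) -> str:
    """All triples that use one of the given predicates, optionally capped by LIMIT."""
    q = f"SELECT ?s ?p ?o WHERE {{ {values_clause('p', predicates)} ?s ?p ?o . }}"
    if limit is not None:
        q += f" LIMIT {int(limit)}"
    return q


def node_query(node_iris: Iterable[str]) -> str:
    """All triples whose subject is one of the given nodes."""
    return f"SELECT ?s ?p ?o WHERE {{ {values_clause('s', node_iris)} ?s ?p ?o . }}"


print(edge_query({"http://e.org/p1"}, limit=10))
# SELECT ?s ?p ?o WHERE { VALUES ?p { <http://e.org/p1> } ?s ?p ?o . } LIMIT 10
```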

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate where the method checks if there is a mapping in Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple[str, str, str, str]

query(q: str) → Dict

Query a SPARQL endpoint.

Parameters

q (str) – The query string

Returns

A dictionary containing results from the query

Return type

Dict

report() → None

Print a summary report about self.graph

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.sparql_transformer.SparqlTransformer(source_graph: Optional[kgx.graph.base_graph.BaseGraph] = None, url: Optional[str] = None)

Bases: kgx.transformers.rdf_graph_mixin.RdfGraphMixin, kgx.transformers.transformer.Transformer

Transformer for communicating with a SPARQL endpoint.

Parameters
  • source_graph (Optional[kgx.graph.base_graph.BaseGraph]) – The source graph

  • url (Optional[str]) – The URL to a SPARQL endpoint

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, data: Optional[Dict[Any, Any]] = None) → Dict

This method should be used by all derived classes when adding an edge to the kgx.graph.base_graph.BaseGraph. This method ensures that the subject and object identifiers are CURIEs, and that predicate is in the correct form.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

  • data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns

The edge data

Return type

Dict

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → Dict

Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they will be created.

Parameters
  • subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

Returns

The edge data

Return type

Dict

add_node(iri: rdflib.term.URIRef, data: Optional[Dict] = None) → Dict

This method should be used by all derived classes when adding a node to the kgx.graph.base_graph.BaseGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the kgx.graph.base_graph.BaseGraph

Parameters
  • iri (rdflib.URIRef) – IRI of a node

  • data (Optional[Dict]) – Additional node properties

Returns

The node data

Return type

Dict

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: Union[str, List]) → Dict

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (Union[str, List]) – The value of the attribute

Returns

The node data

Return type

Dict

Returns a Biolink Model element for a given predicate.

Parameters

predicate (Any) – The CURIE of a predicate

Returns

The corresponding Biolink Model element

Return type

Optional[Element]

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_graph(rdfgraph: rdflib.graph.Graph, predicates: Optional[Set[rdflib.term.URIRef]] = None, **kwargs: Dict) → None

Fetch triples from the SPARQL endpoint and load them as edges.

Parameters
  • rdfgraph (rdflib.Graph) – A rdflib Graph (unused)

  • predicates (Optional[Set[URIRef]]) – A set containing predicates in rdflib.URIRef form

  • kwargs (Dict) – Any additional arguments.

process_predicate(p: Optional[Union[rdflib.term.URIRef, str]]) → Tuple[str, str, str, str]

Process a predicate where the method checks if there is a mapping in Biolink Model.

Parameters

p (Optional[Union[URIRef, str]]) – The predicate

Returns

A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p

Return type

Tuple[str, str, str, str]

query(q: str) → Dict

Query a SPARQL endpoint.

Parameters

q (str) – The query string

Returns

A dictionary containing results from the query

Return type

Dict
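For readers unfamiliar with what querying a SPARQL endpoint involves, the following standard-library sketch builds a SPARQL 1.1 Protocol GET request (the query goes in the query parameter) that asks for the SPARQL JSON results format, and flattens the returned bindings into rows. This is not how KGX itself issues requests; the helper names are hypothetical, and the query function needs network access to a live endpoint.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_query_request(endpoint: str, q: str) -> urllib.request.Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": q})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def query(endpoint: str, q: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the query and return the decoded JSON results document."""
    with urllib.request.urlopen(build_query_request(endpoint, q), timeout=timeout) as resp:
        return json.load(resp)


def bindings(results: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten the SPARQL JSON results format into {variable: value} rows."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in results["results"]["bindings"]]


sample = {"head": {"vars": ["s"]},
          "results": {"bindings": [{"s": {"type": "uri", "value": "http://e.org/a"}}]}}
print(bindings(sample))  # [{'s': 'http://e.org/a'}]
```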

report() → None

Print a summary report about self.graph

set_edge_filter(key: str, value: set) → None

Set an edge filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching edges from a source.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for edge filter

  • value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_node_filter(key: str, value: Union[str, set]) → None

Set a node filter, as defined by a key and value pair. These filters are used to create a subgraph or reduce the search space when fetching nodes from a source.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters
  • key (str) – The key for node filter

  • value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict

Update an edge with properties.

Parameters
  • subject_curie (str) – Subject CURIE

  • object_curie (str) – Object CURIE

  • edge_key (str) – Edge key

  • data (Optional[Dict[Any, Any]]) – Edge properties

Returns

The edge data

Return type

Dict

update_node(n: Union[rdflib.term.URIRef, str], data: Optional[Dict] = None) → Dict

Update a node with properties.

Parameters
  • n (Union[URIRef, str]) – Node identifier

  • data (Optional[Dict]) – Node properties

Returns

The node data

Return type

Dict

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict