
A Source can be implemented for any file or any local or remote store that contains a graph. A Source is responsible for reading nodes and edges from the graph.

A Source must subclass the kgx.source.source.Source class and implement the following methods:

  • parse

  • read_nodes

  • read_edges

parse method

  • Responsible for parsing a graph from a file/store

  • Must return a generator that iterates over the node and edge records from the graph

read_nodes method

  • Responsible for reading nodes from the file/store

  • Must return a generator that iterates over node records

  • Each node record must be a 2-tuple (node_id, node_data), where:

    • node_id is the node CURIE

    • node_data is a dictionary that represents the node properties

read_edges method

  • Responsible for reading edges from the file/store

  • Must return a generator that iterates over edge records

  • Each edge record must be a 4-tuple (subject_id, object_id, edge_key, edge_data), where:

    • subject_id is the subject node CURIE

    • object_id is the object node CURIE

    • edge_key is the unique key for the edge

    • edge_data is a dictionary that represents the edge properties
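The contract above can be sketched with a small in-memory source. This is a minimal illustration, not KGX's own implementation. `Source` below is a stand-in base class so that the snippet runs on its own; a real implementation would subclass kgx.source.source.Source instead. `ListSource`, its constructor arguments, the `EX:` CURIEs, and the fallback rule for building `edge_key` are all hypothetical.

```python
from typing import Any, Dict, Generator, Iterable, Tuple

NodeRecord = Tuple[str, Dict[str, Any]]
EdgeRecord = Tuple[str, str, str, Dict[str, Any]]


class Source:
    """Stand-in for kgx.source.source.Source (assumption: only here so the sketch runs alone)."""

    def parse(self, *args: Any, **kwargs: Any) -> Generator:
        raise NotImplementedError

    def read_nodes(self, *args: Any, **kwargs: Any) -> Generator:
        raise NotImplementedError

    def read_edges(self, *args: Any, **kwargs: Any) -> Generator:
        raise NotImplementedError


class ListSource(Source):
    """Hypothetical source that reads a graph held in two Python lists of dicts."""

    def __init__(self, nodes: Iterable[Dict[str, Any]], edges: Iterable[Dict[str, Any]]):
        self._nodes = list(nodes)
        self._edges = list(edges)

    def parse(self) -> Generator:
        # Yield every node record first, then every edge record.
        yield from self.read_nodes()
        yield from self.read_edges()

    def read_nodes(self) -> Generator[NodeRecord, None, None]:
        for node in self._nodes:
            # 2-tuple: (node_id, node_data)
            yield node["id"], node

    def read_edges(self) -> Generator[EdgeRecord, None, None]:
        for edge in self._edges:
            # Assumption: use the edge's own "id" as the key when present,
            # otherwise derive one from subject, predicate, and object.
            key = edge.get("id") or f"{edge['subject']}-{edge['predicate']}-{edge['object']}"
            # 4-tuple: (subject_id, object_id, edge_key, edge_data)
            yield edge["subject"], edge["object"], key, edge


source = ListSource(
    nodes=[{"id": "EX:n1", "name": "node one"}, {"id": "EX:n2", "name": "node two"}],
    edges=[{"subject": "EX:n1", "predicate": "biolink:related_to", "object": "EX:n2"}],
)
records = list(source.parse())
```

Because each method returns a generator, a caller can consume records one at a time without the source having to materialize the whole graph first.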


Base class for all Sources in KGX.


GraphSource is responsible for reading from an instance of kgx.graph.base_graph.BaseGraph and must use only the methods exposed by BaseGraph to access the graph.


TsvSource is responsible for reading KGX-formatted CSV or TSV files using Pandas. Each flat file is treated as a Pandas DataFrame, from which data are read in chunks.

KGX expects two separate files - one for nodes and another for edges.
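Chunked reading with Pandas can be sketched as follows. This only illustrates the general technique and is not TsvSource's actual code. The function name `read_nodes_in_chunks`, the column names, the `EX:` CURIEs, and the chunk size are hypothetical. The sample data is read from an in-memory string so that the snippet needs no files.

```python
import io
from typing import Any, Dict, Iterator, Tuple

import pandas as pd

# Hypothetical nodes file content; real files would be opened from disk.
NODES_TSV = (
    "id\tname\tcategory\n"
    "EX:n1\tnode one\tbiolink:NamedThing\n"
    "EX:n2\tnode two\tbiolink:NamedThing\n"
    "EX:n3\tnode three\tbiolink:NamedThing\n"
)


def read_nodes_in_chunks(handle, chunksize: int = 2) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (node_id, node_data) records, loading `chunksize` rows at a time."""
    reader = pd.read_csv(
        handle,
        sep="\t",
        dtype=str,              # keep every value as a string
        keep_default_na=False,  # do not turn empty cells into NaN
        chunksize=chunksize,    # makes read_csv return an iterator of DataFrames
    )
    for chunk in reader:
        for row in chunk.to_dict("records"):
            yield row["id"], row


nodes = list(read_nodes_in_chunks(io.StringIO(NODES_TSV)))
```

With `chunksize` set, only one DataFrame of that many rows is held in memory at a time, which is what keeps large node and edge files manageable.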


JsonSource is responsible for reading data from KGX-formatted JSON using the ijson library, which allows data to be streamed from the file.


JsonlSource is responsible for reading data from KGX-formatted JSON Lines files using the jsonlines library.

KGX expects two separate JSON Lines files - one for nodes and another for edges.
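The two-file layout can be illustrated with a short sketch. To stay dependency-free it uses the standard `json` module rather than the jsonlines library that JsonlSource uses, so it shows the file format, not the actual implementation. The helper `iter_jsonl`, the field names, and the `EX:` CURIEs are hypothetical, and in-memory strings stand in for the two files.

```python
import io
import json
from typing import Any, Dict, Iterator

# Hypothetical contents of a nodes file and an edges file: one JSON object per line.
NODES_JSONL = (
    '{"id": "EX:n1", "name": "node one"}\n'
    '{"id": "EX:n2", "name": "node two"}\n'
)
EDGES_JSONL = (
    '{"id": "e1", "subject": "EX:n1", "predicate": "biolink:related_to", "object": "EX:n2"}\n'
)


def iter_jsonl(handle) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Node records as (node_id, node_data); edge records as
# (subject_id, object_id, edge_key, edge_data).
node_records = [(n["id"], n) for n in iter_jsonl(io.StringIO(NODES_JSONL))]
edge_records = [
    (e["subject"], e["object"], e["id"], e) for e in iter_jsonl(io.StringIO(EDGES_JSONL))
]
```

Since every line is a complete JSON document, the reader never needs more than one record in memory at a time.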


TrapiSource is responsible for reading data from Translator Reasoner API (TRAPI) formatted JSON.


ObographSource is responsible for reading data from OBOGraphs in JSON.


SssomSource is responsible for reading data from SSSOM-formatted files.


NeoSource is responsible for reading data from a local or remote Neo4j instance.


RdfSource is responsible for reading data from RDF N-Triples.

This source makes use of a custom kgx.parsers.ntriples_parser.CustomNTriplesParser for parsing N-Triples, which extends rdflib.plugins.parsers.ntriples.NTriplesParser.

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted by subject IRI, for example:

sort -k 1,2 -t ' ' data.nt > data_sorted.nt
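One way to see why sorted input helps is the sketch below. It is an illustration only and does not describe how CustomNTriplesParser works internally. When all triples for a subject are adjacent, a reader can assemble one subject's statements, emit them, and discard them before moving on. The helpers `split_triple` and `iter_subject_groups` and the example.org IRIs are hypothetical, and the line splitting is deliberately naive (it is not a full N-Triples parser).

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Hypothetical, already-sorted N-Triples lines.
SORTED_NT = [
    '<http://example.org/a> <http://example.org/name> "A" .',
    '<http://example.org/a> <http://example.org/related_to> <http://example.org/b> .',
    '<http://example.org/b> <http://example.org/name> "B" .',
]


def split_triple(line: str) -> Tuple[str, str, str]:
    """Naively split one N-Triples line into (subject, predicate, object)."""
    subject, predicate, rest = line.strip().split(" ", 2)
    # Drop the terminating " ." from the object term.
    obj = rest.rstrip().rstrip(".").rstrip()
    return subject, predicate, obj


def iter_subject_groups(lines: Iterable[str]) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (subject, [(predicate, object), ...]) for each run of adjacent triples."""
    triples = (split_triple(line) for line in lines if line.strip())
    # groupby only merges *adjacent* items, so it relies on the input being sorted.
    for subject, group in groupby(triples, key=lambda t: t[0]):
        yield subject, [(p, o) for _, p, o in group]


groups = list(iter_subject_groups(SORTED_NT))
```

On unsorted input the same subject would show up in several separate groups, so a reader would either have to buffer everything or emit incomplete records.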


OwlSource is responsible for parsing an OWL ontology.

When parsing an OWL ontology, this source also adds OwlStar annotations to certain OWL axioms.


SparqlSource has yet to be implemented.

In principle, SparqlSource should be able to read data from a local or remote SPARQL endpoint.