In-memory immutable graphs on Vineyard
Vineyard is a distributed immutable in-memory data manager that is used as the storage backend for immutable graphs in GraphScope. Vineyard provides zero-copy data sharing using memory mapping, and different compute engines in GraphScope can run on the same vineyard cluster to efficiently share the graph data.
Graphs in Vineyard
Vineyard supports immutable property graphs and abstracts them as the vineyard::ArrowFragment class, which consists of a CSR (compressed sparse row) structure for edges and uses tables to store edge and vertex properties. On top of ArrowFragment, vineyard abstracts a distributed graph as vineyard::ArrowFragmentGroup, which consists of a set of fragments spread across the cluster.
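To make the CSR layout concrete, here is a minimal sketch of a CSR edge index whose entries point at rows of an edge property table. The names `SimpleCSR` and `BuildCSR` are hypothetical and exist only for illustration; this is not how vineyard::ArrowFragment is implemented, only the general technique it is described as using.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration of a CSR edge index; not vineyard's ArrowFragment.
struct SimpleCSR {
  // offsets[v] .. offsets[v + 1] delimits the out-edges of vertex v.
  std::vector<std::size_t> offsets;
  // Destination vertex of each edge, grouped by source vertex.
  std::vector<std::size_t> neighbors;
  // Row in an edge property table for each entry of `neighbors`.
  std::vector<std::size_t> edge_rows;

  std::size_t OutDegree(std::size_t v) const {
    assert(v + 1 < offsets.size());
    return offsets[v + 1] - offsets[v];
  }
};

// Builds a CSR from (src, dst) pairs; edge i keeps property row i.
SimpleCSR BuildCSR(
    std::size_t vertex_num,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  SimpleCSR csr;
  csr.offsets.assign(vertex_num + 1, 0);
  // Count the out-degree of every source vertex.
  for (const auto& e : edges) {
    assert(e.first < vertex_num && e.second < vertex_num);
    ++csr.offsets[e.first + 1];
  }
  // A prefix sum turns the counts into start offsets.
  for (std::size_t v = 0; v < vertex_num; ++v) {
    csr.offsets[v + 1] += csr.offsets[v];
  }
  csr.neighbors.resize(edges.size());
  csr.edge_rows.resize(edges.size());
  // cursor[v] is the next free slot in the range owned by vertex v.
  std::vector<std::size_t> cursor(csr.offsets.begin(), csr.offsets.end() - 1);
  for (std::size_t i = 0; i < edges.size(); ++i) {
    const std::size_t pos = cursor[edges[i].first]++;
    csr.neighbors[pos] = edges[i].second;
    csr.edge_rows[pos] = i;
  }
  return csr;
}
```

Keeping the property row index next to each neighbor is one way to let the edge index and the property table stay separate, so the table can be stored column by column.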
Loading Graphs to Vineyard
Vineyard can be deployed as a standalone service or launched along with GraphScope.
A command-line tool, vineyard-graph-loader, is provided to load fragments into vineyard. It first accepts an optional argument --socket <vineyard-ipc-socket>, which points to the IPC socket that the loader will connect to. If omitted, the value is resolved from the environment variable VINEYARD_IPC_SOCKET. The loader then takes either a set of command-line arguments or a JSON file as configuration.
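The lookup order for the socket path can be sketched as follows. `ResolveIPCSocket` is a hypothetical helper written for this example, not a function of the loader; it only assumes what is stated above, namely that `--socket` comes first and that VINEYARD_IPC_SOCKET is the fallback.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the documented lookup order:
// an explicit `--socket <path>` wins, otherwise the environment
// variable VINEYARD_IPC_SOCKET is used. Returns an empty string
// when neither is available.
std::string ResolveIPCSocket(int argc, const char* const* argv) {
  assert(argc >= 1 && argv != nullptr);  // argv[0] is the program name
  if (argc >= 3 && std::string(argv[1]) == "--socket") {
    return argv[2];
  }
  const char* env = std::getenv("VINEYARD_IPC_SOCKET");
  return env != nullptr ? std::string(env) : std::string();
}
```

A caller would typically treat an empty result as an error and report that no socket was configured.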
$ vineyard-graph-loader --help
Usage: loading vertices and edges as vineyard graph.

    - ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] \
          <e_label_num> <efiles...> <v_label_num> <vfiles...> \
          [directed] [generate_eid] [retain_oid] [string_oid]

    - or: ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] --config <config.json>

          The config is a json file and should look like

          {
              "vertices": [
                  {
                      "data_path": "....",
                      "label": "...",
                      "options": "...."
                  },
                  ...
              ],
              "edges": [
                  {
                      "data_path": "",
                      "label": "",
                      "src_label": "",
                      "dst_label": "",
                      "options": ""
                  },
                  ...
              ],
              "directed": 1, # 0 or 1
              "generate_eid": 1, # 0 or 1
              "retain_oid": 1, # 0 or 1
              "string_oid": 0, # 0 or 1
              "local_vertex_map": 0 # 0 or 1
          }
Some of the options that specify how the graph will be constructed are:
directed: whether the graph is directed or undirected.
generate_eid: whether to generate a globally unique edge id for each edge.
retain_oid: whether to retain the original vertex id in the final vertex property table.
string_oid: whether the vertex id is a string.
local_vertex_map: whether to use a local vertex map during graph construction, which is usually used to reduce memory usage.
The modern graph can be loaded with vineyard-graph-loader in either of the following ways:
using command line arguments
The vineyard-graph-loader accepts a sequence of command line arguments to specify the edge files and vertex files, e.g.,
$ ./vineyard-graph-loader 2 "modern_graph/knows.csv#header_row=true&src_label=person&dst_label=person&label=knows&delimiter=|" \
      "modern_graph/created.csv#header_row=true&src_label=person&dst_label=software&label=created&delimiter=|" \
      2 "modern_graph/person.csv#header_row=true&label=person&delimiter=|" \
      "modern_graph/software.csv#header_row=true&label=software&delimiter=|"
using a JSON configuration file
$ ./vineyard-graph-loader --config config.json
The JSON configuration file could be (using the “modern graph” as an example):
{
    "vertices": [
        {
            "data_path": "modern_graph/person.csv",
            "label": "person",
            "options": "header_row=true&delimiter=|"
        },
        {
            "data_path": "modern_graph/software.csv",
            "label": "software",
            "options": "header_row=true&delimiter=|"
        }
    ],
    "edges": [
        {
            "data_path": "modern_graph/knows.csv",
            "label": "knows",
            "src_label": "person",
            "dst_label": "person",
            "options": "header_row=true&delimiter=|"
        },
        {
            "data_path": "modern_graph/created.csv",
            "label": "created",
            "src_label": "person",
            "dst_label": "software",
            "options": "header_row=true&delimiter=|"
        }
    ],
    "directed": 1,
    "generate_eid": 1,
    "string_oid": 0,
    "local_vertex_map": 0
}
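In the command-line form above, each file argument packs a path and its options into one string of the shape `path#key=value&key=value`, while the JSON form keeps the path and the options apart. The sketch below splits such an argument into those two parts. `FileSpec` and `ParseFileSpec` are hypothetical names for this example, and the loader's own parsing may differ in details such as escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical container for one "<path>#<key>=<value>&..." argument.
struct FileSpec {
  std::string path;
  std::map<std::string, std::string> options;
};

// Splits a loader-style argument into its path and key/value options.
// Illustrative only; the real loader's parsing may differ.
FileSpec ParseFileSpec(const std::string& arg) {
  assert(!arg.empty());  // precondition: a file argument was given
  FileSpec spec;
  const std::size_t hash = arg.find('#');
  spec.path = arg.substr(0, hash);  // whole string when there is no '#'
  if (hash == std::string::npos) {
    return spec;  // no options given
  }
  std::size_t begin = hash + 1;
  while (begin < arg.size()) {
    std::size_t end = arg.find('&', begin);
    if (end == std::string::npos) {
      end = arg.size();
    }
    const std::string item = arg.substr(begin, end - begin);
    const std::size_t eq = item.find('=');
    if (eq != std::string::npos) {
      spec.options[item.substr(0, eq)] = item.substr(eq + 1);
    }
    begin = end + 1;
  }
  return spec;
}
```

Applied to the knows.csv argument from the example, this would yield the path `modern_graph/knows.csv` and five options, including `label` and `delimiter`.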
Using Loaded Graphs
After being loaded into vineyard, the fragments can be accessed through vineyard’s IPC client (vineyard::Client in the C++ code below):
void WriteOut(vineyard::Client& client, const grape::CommSpec& comm_spec,
              vineyard::ObjectID fragment_group_id) {
  LOG(INFO) << "Loaded graph to vineyard: " << fragment_group_id;
  // Fetch the fragment group by its object id.
  std::shared_ptr<vineyard::ArrowFragmentGroup> fg =
      std::dynamic_pointer_cast<vineyard::ArrowFragmentGroup>(
          client.GetObject(fragment_group_id));

  // List every fragment in the group as (fragment id, object id).
  for (const auto& pair : fg->Fragments()) {
    LOG(INFO) << "[frag-" << pair.first << "]: " << pair.second;
  }

  // NB: only retrieve local fragments.
  auto locations = fg->FragmentLocations();
  for (const auto& pair : fg->Fragments()) {
    // Skip fragments that live on another vineyard instance.
    if (locations.at(pair.first) != client.instance_id()) {
      continue;
    }
    auto frag_id = pair.second;
    Traverse(client, frag_id);
  }
}
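The local-fragment filter in WriteOut can also be tried in isolation, without a running vineyard instance. The sketch below reproduces only that selection step on plain standard containers. `FragmentId`, `ObjectId`, `InstanceId` and `LocalFragments` are hypothetical stand-ins chosen for this example and are not vineyard types or functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the id types used by the fragment group.
using FragmentId = std::uint32_t;
using ObjectId = std::uint64_t;
using InstanceId = std::uint64_t;

// Returns the object ids of the fragments located on `local_instance`,
// sorted so that the result does not depend on hash map iteration order.
std::vector<ObjectId> LocalFragments(
    const std::unordered_map<FragmentId, ObjectId>& fragments,
    const std::unordered_map<FragmentId, InstanceId>& locations,
    InstanceId local_instance) {
  std::vector<ObjectId> local;
  for (const auto& pair : fragments) {
    const auto it = locations.find(pair.first);
    // Every fragment is expected to have a recorded location.
    assert(it != locations.end());
    if (it->second == local_instance) {
      local.push_back(pair.second);
    }
  }
  std::sort(local.begin(), local.end());
  return local;
}
```

Each worker would call this with its own instance id, so that every fragment is visited by exactly one worker as long as each fragment has a single location.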
The local fragment can be traversed using the vineyard::ArrowFragment’s API:
// GraphType and LabelType are assumed to be aliases defined elsewhere in the
// program: an instantiation of vineyard::ArrowFragment and its label id type.
void Traverse(vineyard::Client& client, vineyard::ObjectID frag_id) {
  auto frag = std::dynamic_pointer_cast<GraphType>(client.GetObject(frag_id));

  // Vertex and edge counts.
  LOG(INFO) << "graph total node number: " << frag->GetTotalNodesNum();
  LOG(INFO) << "fragment edge number: " << frag->GetEdgeNum();
  LOG(INFO) << "fragment in edge number: " << frag->GetInEdgeNum();
  LOG(INFO) << "fragment out edge number: " << frag->GetOutEdgeNum();

  // Schema of the property table of every vertex label and edge label.
  for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
    LOG(INFO) << "vertex table: " << vlabel << " -> "
              << frag->vertex_data_table(vlabel)->schema()->ToString();
  }
  for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
    LOG(INFO) << "edge table: " << elabel << " -> "
              << frag->edge_data_table(elabel)->schema()->ToString();
  }

  LOG(INFO) << "--------------- consolidate vertex/edge table columns ...";

  // Print the schemas again when the first table has at least four columns.
  if (frag->vertex_data_table(0)->columns().size() >= 4) {
    for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
      LOG(INFO) << "vertex table: " << vlabel << " -> "
                << frag->vertex_data_table(vlabel)->schema()->ToString();
    }
  }
  if (frag->edge_data_table(0)->columns().size() >= 4) {
    for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
      LOG(INFO) << "edge table: " << elabel << " -> "
                << frag->edge_data_table(elabel)->schema()->ToString();
    }
  }
}