Tutorial: Run Giraph Applications on GraphScope
Apache Giraph is one of the best-known graph computing frameworks, built on top of Apache Hadoop. Through its Pregel interface, users can write vertex-centric graph algorithms.
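To make "vertex-centric" concrete, here is a minimal pure-Python sketch of how a Pregel-style SSSP proceeds in synchronous supersteps. It is an illustration only: it uses neither the Giraph API nor GraphScope, and the function name pregel_sssp and the toy graph are made up for this example.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Simulate vertex-centric SSSP in synchronous supersteps.

    edges maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict mapping every vertex to its distance from source.
    """
    # Collect every vertex that appears as a source or a destination.
    vertices = set(edges)
    for outs in edges.values():
        for dst, _ in outs:
            vertices.add(dst)

    dist = {v: math.inf for v in vertices}
    # Superstep 0: only the source vertex receives a message (distance 0).
    inbox = {source: [0.0]}

    # The computation ends when no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            # A vertex only acts when it learns of a shorter path.
            if best < dist[v]:
                dist[v] = best
                for dst, weight in edges.get(v, []):
                    outbox[dst].append(best + weight)
        inbox = outbox
    return dist


# Hypothetical toy graph: vertex 4 cannot be reached from vertex 1.
toy_graph = {1: [(2, 1.0), (3, 4.0)], 2: [(3, 2.0)], 3: [], 4: [(1, 1.0)]}
distances = pregel_sssp(toy_graph, source=1)
```

Each vertex reacts only to the messages it received in the previous superstep, which is the programming model a Giraph computation class follows.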
GraphScope aims to provide a one-stop graph processing framework, which includes integrating with popular open-source graph computing frameworks. In fact, Giraph algorithms can run on GraphScope without any adaptation.
Try some example Giraph apps
We provide some example Giraph algorithms, e.g. SSSP and PageRank, in grape-demo.jar. You can try running these Giraph algorithms on GraphScope.
Because Giraph allows users to load graphs with a customized loader, we support Giraph VertexInputFormat and Giraph EdgeInputFormat through the session.load_from method.
vformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
eformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PEdgeInputFormat"
# Sample input files are available at https://github.com/GraphScope/gstest.
# graphscope_session is an already created GraphScope session.
graph = graphscope_session.load_from(
    vertices="/path/to/vertex-input",
    vformat=vformat,
    edges="/path/to/edge-input",
    eformat=eformat,
)
vertices and edges should point to the vertex input file and the edge input file. We also provide some example datasets in the gstest repository at GraphScope/gstest.
In this tutorial we only need the p2p dataset. You can download it with:
wget -O /home/graphscope/p2p-31.e https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.e
wget -O /home/graphscope/p2p-31.v https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.v
Then you can load the graph via the GraphScope Python client and query it with a Giraph app.
import graphscope
import os
from graphscope.framework.app import load_app
# Or launch the session in a k8s cluster.
sess = graphscope.session(cluster_type='hosts')
sess.add_lib("/home/graphscope/grape-demo-0.19.0-shaded.jar")
# Remember to put giraph: before class name.
vformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
eformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PEdgeInputFormat"
# Replace the paths to p2p-31.v and p2p-31.e with your own paths.
graph = sess.load_from(
    vertices=os.path.expandvars("/home/graphscope/p2p-31.v"),
    vformat=vformat,
    edges=os.path.expandvars("/home/graphscope/p2p-31.e"),
    eformat=eformat,
)
graph = graph._project_to_simple(v_prop="vdata", e_prop="data")
giraph_sssp = load_app(algo="giraph:com.alibaba.graphscope.example.giraph.SSSP")
ctx = giraph_sssp(graph, sourceId=6)
ctx.to_numpy('r')
Run your own Giraph apps
After successfully running the example Giraph SSSP algorithm, you may want to try your own Giraph algorithm on GraphScope (which runs much faster than Giraph itself).
Develop Giraph algorithm
You can implement your algorithm against Giraph's original API. For example, you can use the official Giraph example apps.
git clone https://github.com/apache/giraph.git
cd giraph/
mvn package -pl :giraph-examples
You can then find giraph-examples-1.4.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar in the directory giraph-examples/target.
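Because the jar name embeds the Giraph and Hadoop versions, it may differ in your build. The following sketch locates the jar by pattern instead of hard-coding the name. The helper find_examples_jar is hypothetical and not part of GraphScope or Giraph; it relies only on the Python standard library.

```python
import glob
import os


def find_examples_jar(target_dir):
    """Return the absolute path of the giraph-examples fat jar in target_dir."""
    pattern = os.path.join(
        target_dir, "giraph-examples-*-jar-with-dependencies.jar"
    )
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            f"no jar matching {pattern}; did the Maven build succeed?"
        )
    # If several builds are present, pick the last one in sorted order.
    return os.path.abspath(matches[-1])


# Example usage, assuming the clone location from the commands above:
# jar_path = find_examples_jar("giraph/giraph-examples/target")
# sess.add_lib(jar_path)
```

The returned path can be passed to the session's add_lib method in the same way as the grape-demo jar in the earlier example.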
Although almost all APIs are supported, Giraph-on-GraphScope has some limitations.
Currently, the graph modification API is not supported.
Using complex Writable types causes performance degradation.
Submit to GraphScope
The procedure is almost the same as above, except that you need to replace the submitted jar and choose the right InputFormat classes.
import os
import graphscope
from graphscope.framework.app import load_app

# Or launch the session in a k8s cluster.
sess = graphscope.session(cluster_type='hosts')
# Path to the local jar file (replace it with your own jar); it will be distributed over the cluster.
sess.add_lib("path/to/grape-demo.jar")
# Remember to put giraph: before each class name.
vformat = "giraph:${vertex-input-format-class-full-name}"
eformat = "giraph:${edge-input-format-class-full-name}"
graph = sess.load_from(
    vertices=os.path.expandvars("${path-to-vertex-file}"),  # path to local vertex file, will be distributed over cluster
    vformat=vformat,
    edges=os.path.expandvars("${path-to-edge-file}"),  # path to local edge file, will be distributed over cluster
    eformat=eformat,
)
graph = graph._project_to_simple(v_prop="vdata", e_prop="data")
giraph_app = load_app(algo="giraph:${giraph-computation-class-full-name}")
ctx = giraph_app(graph, "${a=1,b=2...}")
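A common slip when filling in the template above is to leave a ${...} placeholder in place or to forget the giraph: prefix. The sketch below shows one way to guard against both. The helper giraph_class is hypothetical and not part of the GraphScope API; it only checks that the string looks like a fully qualified Java class name and prepends the prefix.

```python
import re

# Dot-separated Java identifiers, e.g. com.example.MyComputation.
_JAVA_CLASS = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$")


def giraph_class(name):
    """Return name with exactly one giraph: prefix after validating it."""
    prefix = "giraph:"
    bare = name[len(prefix):] if name.startswith(prefix) else name
    if not _JAVA_CLASS.match(bare):
        raise ValueError(f"not a fully qualified Java class name: {bare!r}")
    return prefix + bare


vformat = giraph_class(
    "com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
)
```

An unfilled placeholder such as ${vertex-input-format-class-full-name} contains characters that are not valid in a Java identifier, so the helper raises ValueError on the client side before anything is submitted to the session.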