Frequently Asked Questions
If you don’t find an answer to your question here, feel free to file an issue in Issues or post it to Discussions.
What are the minimum resources and system requirements to run GraphScope?
To use the GraphScope Python interface, Python >= 3.7 and pip >= 19.0 are required. The GraphScope engine can be deployed in standalone mode or distributed mode. For standalone deployment, the minimum requirement is a 4-core CPU and 8 GB of memory.
GraphScope is tested and supported on the following systems:
CentOS 7+
Ubuntu 18.04+
macOS 10.15+
For distributed deployment, a cluster managed by Kubernetes is required. GraphScope has been tested on Kubernetes v1.12.0 and later.
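The Python and pip requirements above can be checked before installing. The following is a minimal sketch, not part of GraphScope itself: the helper names parse_version, meets_minimum and check_environment are hypothetical, and the version parsing is deliberately simple (it ignores suffixes such as b1 or rc2).

```python
import re
import sys

MIN_PYTHON = (3, 7)   # from the requirement stated above
MIN_PIP = (19, 0)     # from the requirement stated above


def parse_version(text):
    """Turn '19.0.3' into (19, 0, 3), ignoring suffixes such as 'b1'."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Compare a version string against a minimum given as a tuple of ints."""
    parsed = parse_version(version_text)
    # Pad with zeros so that '19' is treated like '19.0'.
    parsed = parsed + (0,) * (len(minimum) - len(parsed))
    return parsed >= minimum


def check_environment():
    """Print whether the running interpreter and its pip meet the minimums."""
    print("Python OK:", sys.version_info[:2] >= MIN_PYTHON)
    try:
        import pip  # only available if pip is installed for this interpreter
        print("pip OK:", meets_minimum(pip.__version__, MIN_PIP))
    except ImportError:
        print("pip is not importable from this interpreter")


if __name__ == "__main__":
    check_environment()
```

Run it with the same interpreter you plan to install GraphScope into, since each interpreter has its own pip.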
Is Kubernetes essential to use GraphScope?
No. GraphScope supports running in standalone mode on a single machine. The pre-compiled GraphScope package is distributed as a Python package and can be easily installed with pip: pip3 install graphscope.
How to debug or get detailed information when using GraphScope?
By default, GraphScope runs in a silent mode, following the convention of Python applications. To enable verbose logging, turn it on as follows:
graphscope.set_option(show_log=True)
If you are running GraphScope in k8s, you can use kubectl describe/logs to check the log/status of the GraphScope Pods. If the disk storage is accessible (locally or via Pods), you may also find logs in /var/log/graphscope/current or $HOME/.local/log/graphscope.
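To see whether either of the log locations mentioned above exists on the current machine, a small script can list the files in them. This is only a sketch: the helper name existing_log_files is hypothetical, and it assumes that $HOME matches what Path.home() returns.

```python
from pathlib import Path

# The two log locations named in the answer above.
CANDIDATE_LOG_DIRS = [
    Path("/var/log/graphscope/current"),
    Path.home() / ".local" / "log" / "graphscope",
]


def existing_log_files(directories):
    """Return every regular file found under the directories that exist."""
    found = []
    for directory in directories:
        if directory.is_dir():
            found.extend(sorted(p for p in directory.rglob("*") if p.is_file()))
    return found


if __name__ == "__main__":
    files = existing_log_files(CANDIDATE_LOG_DIRS)
    if not files:
        print("No GraphScope log files found in the default locations.")
    for path in files:
        print(path)
```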
Why do I find more Pods than expected with the command kubectl get pod?
For the failed Pods, you may need to delete them manually with kubectl delete pod <pod-names>. This case is observed when using GraphScope with helm. If users did not correctly set the role and rolebinding, the command helm uninstall <release-name> may not correctly recycle allocated resources. For more details, please refer to Helm Support.
Is GraphScope a graph database?
GraphScope is not a graph database. However, it includes a persistent storage component called graphscope-store that can serve as a database.
What’s the compatibility of Gremlin in GraphScope?
GraphScope supports most querying operators in Gremlin. You may check the compatibility at this link.
The system seems to get stuck, what are the possible reasons?
If GraphScope seems to get stuck, possible causes include:
In the session launching stage, it is most often waiting for the Pods to become ready. The delay may be caused by a poor network connection while pulling the image, or by a failure to acquire the resources requested to launch a session.
In the graph loading stage, it is time consuming to load and build a large graph.
When running a user-defined or built-in analytical algorithm, it may take time to compile the algorithm over the loaded graph.
Why do I get a No such file or directory error when loading a graph?
This mostly occurs when you are deploying GraphScope in a Kubernetes cluster. The file must be visible to the engine Pod of GraphScope. You may need to mount a volume to the Pods or use cloud storage providers. Specifically, if your cluster is deployed with kind, you may need to set up extra-mounts to mount your local directory to the kind nodes.
What’s the relationship between k8s_vineyard_mem, vineyard_shared_mem and k8s_engine_mem?
k8s_vineyard_mem: The memory allocated for the vineyard container. It stores the metadata of blobs managed by vineyard, such as the shape, id, name, and so forth. As the metadata is much smaller than the datasets, the default configuration is sufficient in most cases. It’s equivalent to vineyard.resources.memory.requests and vineyard.resources.memory.limits in the graphscope helm charts.
vineyard_shared_mem: The memory into which the data is loaded. Its value needs to be adjusted according to the size of the datasets. We found that setting it to 5 times the size of the datasets on disk is usually reasonable. It’s equivalent to vineyard.shared_mem in the graphscope helm charts.
k8s_engine_mem: The memory of the engine Pods. It can simply be set equal to the value of vineyard_shared_mem. It’s equivalent to engines.resources.memory.requests and engines.resources.memory.limits in the graphscope helm charts.
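The sizing rule of thumb above (roughly 5 times the on-disk dataset size for vineyard_shared_mem, with k8s_engine_mem set to the same value) can be turned into a small calculation. This is a sketch under assumptions: the helper names dataset_size_bytes and suggest_memory_settings are hypothetical, the result is rounded up to whole Gi, and the factor of 5 is only the starting point the answer describes, not a guarantee.

```python
import os

GIB = 1024 ** 3


def dataset_size_bytes(paths):
    """Sum the on-disk size of the given dataset files."""
    return sum(os.path.getsize(path) for path in paths)


def suggest_memory_settings(size_bytes, factor=5):
    """Suggest values following the '5x the dataset size on disk' rule of thumb.

    Returns quantity strings such as '10Gi', rounded up, with a 1Gi floor.
    """
    needed = size_bytes * factor
    whole_gib = max(1, -(-needed // GIB))  # ceiling division
    quantity = f"{whole_gib}Gi"
    return {
        "vineyard_shared_mem": quantity,
        # The answer above notes this can be set equal to vineyard_shared_mem.
        "k8s_engine_mem": quantity,
    }


if __name__ == "__main__":
    # Example: a dataset that occupies 2 GiB on disk.
    print(suggest_memory_settings(2 * GIB))
```

The resulting values are meant to be supplied through the parameters of the same names, and adjusted if loading still runs out of memory.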
Why does installing GraphScope fail on Apple M1 with Python 3.8?
Compiling grpcio failed: You can try to use the system openssl by running export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True. See more details in the grpc issue.
Compiling scipy failed: You can follow this to build scipy from source, or try pip3 install --pre -i https://pypi.anaconda.org/scipy-wheels-nightly/simple scipy to work around this problem.
If you encounter errors like ERROR: Dependency “OpenBLAS” not found, tried pkgconfig, framework and cmake while installing scipy on macOS, try:
export CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH:$(brew --prefix openblas)
and run pip3 install scipy to install scipy again.
How to resolve the Permission denied error when allocating PV on NFS volumes?
ENV: Use helm to install graphscope-store, with NFS supplying the PV.
Appearance: The Pods graphscope-store-kafka-0 and graphscope-store-zookeeper-0 report CrashLoopBackOff status.
Check: First use kubectl logs graphscope-store-zookeeper-0 to check the log, and see whether it shows mkdir: cannot create directory '/bitnami/zookeeper/data': Permission denied.
Reason: Normally, the NFS directories we create are owned by root with permission 755 (depending on your specific environment), but the default user of graphscope-store is graphscope(1001), so these Pods have no permission to write to NFS.
Solution: There are two ways to solve this.
The brute-force one is to run chmod 777 on all related PV directories. This is quick, but not recommended in a production environment.
The elegant one is to create the graphscope user and user group first, and then grant that user access permission to the related NFS directories.
Why is a Timeout Exception raised when launching a GraphScope instance on a Kubernetes cluster?
It takes a few minutes to pull the image the first time a GraphScope instance is launched, so the Timeout Exception may be caused by a poor network connection. You can increase the value of the timeout_seconds parameter as needed with graphscope.set_option(timeout_seconds=600).
Why does GraphScope fail to run (either on a single machine or in a docker container) because it cannot connect to building blocks like etcd?
Your machine may be in an enterprise network that requires proxy configuration to access the network properly. This may lead to wrong address resolution and port occupancy. You can try adding addresses such as the output of hostname -i and 0.0.0.0 to your environment variable no_proxy or NO_PROXY (be aware of the prefix/suffix policy of no_proxy).
How to print debug info in GAE Cython SDK Algorithms?
The Python 3 print function is a convenient way to show useful debug info. Call print with the parameter flush=True so that the stream is forcibly flushed.
For more details, please refer to the Python Documentation.
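As a minimal sketch of the advice above, a small wrapper can make sure every debug message is flushed immediately. The helper name debug and the example values are hypothetical and not part of the GAE Cython SDK; inside an algorithm, a plain print(..., flush=True) call does the same thing.

```python
import sys


def debug(*values, stream=None):
    """Print values and flush right away so the output is not left in a buffer."""
    target = stream if stream is not None else sys.stdout
    print(*values, file=target, flush=True)


# Hypothetical values, only to show the call shape.
debug("superstep", 3, "active vertices", 128)
```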
I do have many other questions…
Please feel free to contact us. You may reach us through Issues, ask questions in Discussions, or drop a message in Slack or DingTalk. We are happy to answer your questions promptly.