Deploying a Graph Indexer on Kubernetes

What is The Graph?

The Graph is a decentralised SaaS. It provides a blockchain search service to DApps and incentivises network participants to index the data. It’s like being a miner/validator, but for data rather than block confirmations.
Graph Indexer Overview
The Graph Indexer stack consists of several applications:
- graph-node – the main component, which does the heavy lifting of indexing blockchain data. Often referred to as the indexing node.
- graph-query – essentially the same application as graph-node, but configured to query the indexes
- graph-agent – the component that allows your indexer to participate in the Graph Network and manages the graph-node
- index-service – the externally facing component, which handles client queries and passes them to the graph-nodes
- postgres database – the storage for the indexed data
First, we will set up a standalone and isolated graph node. This can be handy for subgraph development or for any internal use of the graph service. We will use Kubernetes as a deployment mechanism.
Setting up Postgres database
A Postgres database is the only prerequisite for the graph node installation. We won’t spend too much time on this as it’s a standard task. We will install Postgres on Kubernetes using the Bitnami Helm chart. For our experiments, we will not enable persistence on the database, but if you plan to use this instance for a prolonged time, we recommend enabling it.
kubectl create namespace graph
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres --set persistence.enabled=false bitnami/postgresql -n graph
Make sure you can see “database system is ready to accept connections” in the Postgres logs.
Before we go ahead with the graph node deployment, we need to create a Postgres database and user for the graph-node application.
Log in to your newly installed Postgres and run the following queries to create the database and the user, and to grant the user the necessary privileges. graph-node requires some extra extensions to work; we will create those as the privileged “postgres” user so we don’t have to grant the graph user any extra privileges.
postgres=# create database graph;
CREATE DATABASE
postgres=# create user graph with password 'graph';
CREATE ROLE
postgres=# grant ALL PRIVILEGES on database graph to graph;
GRANT
\c graph
CREATE EXTENSION postgres_fdw;
CREATE EXTENSION pg_stat_statements;
Deploying the graph-node
The application will need a connection to the database and access to the RPC endpoint of the Ethereum client. For this, we will use the Erigon node we set up in the previous post.
Graph node software is distributed as pre-built Docker images and can be pulled directly from Docker Hub. Using the “latest” tag of the software is generally OK.
Some reasonable defaults in the manifest to deploy a single node using StatefulSet:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: graph-node
  namespace: graph
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: graph
      app.kubernetes.io/name: graph-node
  serviceName: graph-node
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8040"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/instance: graph
        app.kubernetes.io/name: graph-node
    spec:
      containers:
        - env:
            - name: node_id
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: GRAPH_LOG
              value: debug
            - name: RUST_LOG
              value: INFO
            - name: BLOCK_INGESTOR
              value: graph-node-0
            - name: GRAPH_ETH_CALL_GAS
              value: "50000000"
            - name: GRAPH_GETH_ETH_CALL_ERRORS
              value: out of gas
            - name: ETHEREUM_TRACE_STREAM_STEP_SIZE
              value: "50"
            - name: ETHEREUM_BLOCK_BATCH_SIZE
              value: "10"
            - name: ETHEREUM_RPC_MAX_PARALLEL_REQUESTS
              value: "64"
            - name: GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE
              value: "500"
            - name: GRAPH_ETHEREUM_TARGET_TRIGGERS_PER_BLOCK_RANGE
              value: "200"
            - name: GRAPH_KILL_IF_UNRESPONSIVE
              value: "true"
            - name: GRAPH_ALLOW_NON_DETERMINISTIC_FULLTEXT_SEARCH
              value: "true"
            - name: GRAPH_ALLOW_NON_DETERMINISTIC_IPFS
              value: "true"
            - name: EXPERIMENTAL_SUBGRAPH_VERSION_SWITCHING_MODE
              value: synced
            - name: node_role
              value: index-node
            - name: ipfs
              value: https://ipfs.network.thegraph.com
            - name: ethereum
              value: mainnet:http://10.0.0.2:8545
            - name: postgres_host
              value: postgres-postgresql
            - name: postgres_user
              value: graph
            - name: postgres_pass
              value: graph
            - name: postgres_db
              value: graph
          image: graphprotocol/graph-node:latest
          imagePullPolicy: Always
          name: graph-node
          ports:
            - containerPort: 8000
              name: http
              protocol: TCP
            - containerPort: 8020
              name: json-rpc
              protocol: TCP
            - containerPort: 8040
              name: metrics
              protocol: TCP
            - containerPort: 8030
              name: status
              protocol: TCP
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: graph-node
  namespace: graph
  labels:
    app.kubernetes.io/name: graph-node
    app.kubernetes.io/instance: graph
    app.kubernetes.io/version: "0.26.0"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
      name: http
    - port: 8020
      targetPort: 8020
      protocol: TCP
      name: rpc
    - port: 8030
      targetPort: 8030
      protocol: TCP
      name: status
  selector:
    app.kubernetes.io/name: graph-node
    app.kubernetes.io/instance: graph
Some interesting configuration values to keep in mind:
- node_id – used by graph-node to identify each instance of the application and to manage the allocation of indexing jobs when multiple indexing nodes are deployed. In Kubernetes, we set it to the name of the pod itself. It must stay the same between restarts, which is why we use a StatefulSet rather than a Deployment.
- ethereum – the name of the network and the URL of the RPC endpoint. This should be your Ethereum archive node or a node from Infura (or another provider).
- BLOCK_INGESTOR – the name of the graph node responsible for polling the blockchain for new blocks. When multiple replicas are deployed, it is the actual name of one of the pods.
- node_role – the role the node will perform. We are deploying an index-node.
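To make the relationship between these values concrete, here is a minimal Python sketch. It assumes the simple `network:url` form of the ethereum value used in this post, and relies on the fact that StatefulSet pods are named `<name>-<ordinal>`. The helper names are hypothetical and are not part of graph-node.

```python
def parse_ethereum_setting(value: str) -> tuple:
    """Split 'network:url' on the first colon only, since the URL contains colons too.

    Assumption: only the simple form used in this post is handled.
    """
    network, sep, url = value.partition(":")
    if not sep or not network or not url:
        raise ValueError(f"expected 'network:url', got {value!r}")
    return network, url


def statefulset_pod_names(name: str, replicas: int) -> list:
    """StatefulSet pods are named <name>-<ordinal>, with ordinals starting at 0."""
    return [f"{name}-{i}" for i in range(replicas)]


def block_ingestor_is_valid(block_ingestor: str, name: str, replicas: int) -> bool:
    """BLOCK_INGESTOR should match the name (and so the node_id) of one of the pods."""
    return block_ingestor in statefulset_pod_names(name, replicas)


print(parse_ethereum_setting("mainnet:http://10.0.0.2:8545"))
print(block_ingestor_is_valid("graph-node-0", "graph-node", 1))
```

A check like this can catch a BLOCK_INGESTOR value that no longer matches any pod after the StatefulSet has been renamed.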
If your deployment is successful, you should be able to see in the pod logs that graph-node started and connected to the Postgres DB, IPFS and the RPC endpoint. Generally, the absence of ERROR lines means the node has started OK. Make sure you see the log lines indicating that the block ingestor is running.
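If the logs suggest that the RPC endpoint is unreachable, it can help to query it directly. The sketch below uses only the Python standard library and the standard Ethereum JSON-RPC method `eth_blockNumber`. The function names are hypothetical, and the URL in the usage note is the one from the manifest above.

```python
import json
import urllib.request


def block_number_request(rpc_url: str) -> urllib.request.Request:
    """Build a JSON-RPC request asking the node for its current head block."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    return urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_block_number(response_body: bytes) -> int:
    """The result is a hex string such as '0xe4e1c0'; convert it to an int."""
    reply = json.loads(response_body)
    if "error" in reply:
        raise RuntimeError(f"RPC error: {reply['error']}")
    return int(reply["result"], 16)


def fetch_head_block(rpc_url: str, timeout: float = 10.0) -> int:
    """Send the request and return the head block number."""
    with urllib.request.urlopen(block_number_request(rpc_url), timeout=timeout) as resp:
        return parse_block_number(resp.read())
```

Calling `fetch_head_block("http://10.0.0.2:8545")` from a pod that can reach the Ethereum client should return a recent block number. A timeout or connection error points at networking rather than at graph-node.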
Deploying the subgraph
Now that our indexer is running, we can deploy the first subgraph and see it being indexed.
As an example, we will use the KnownOrigin subgraph (in the graph-node manifest above we set GRAPH_ALLOW_NON_DETERMINISTIC_IPFS=true because this specific subgraph depends on IPFS). We took the hash “QmXLwn6GUbvjs6JPGvAGdMpzfB8LdQbGp8jsLt5as2KpPt” for the KnownOrigin subgraph from its Hosted Service URL; this is the address of the subgraph in IPFS.
There are several ways to deploy the subgraph into an indexer.
Deploying the subgraph using graph-cli
We will launch a shell pod to interact with the indexer. Installation of graph-cli is straightforward, but to save time we use the graph_cli image we built for testing and developing subgraphs.
apiVersion: v1
kind: Pod
metadata:
  name: shell
  namespace: graph
spec:
  containers:
    - image: metaops/graph_cli:latest
      name: shell
      command:
        - /bin/sh
        - "-c"
        - "sleep infinity"
      imagePullPolicy: Always
      env:
        - name: ethereum
          value: mainnet:http://10.0.0.2:8545
        - name: postgres_host
          value: postgres-postgresql
        - name: postgres_user
          value: graph
        - name: postgres_pass
          value: graph
        - name: postgres_db
          value: graph
Graph CLI is intended for use when developing your own subgraphs, but it can also be used to deploy an existing subgraph into a local indexer node. (This method was picked up from the Graph Discord chat and shouldn’t be used in production.)
graph@shell:~$ graph create -g http://graph-node:8020 knownorigin
Created subgraph: knownorigin
graph@shell:~$ http post graph-node:8020 jsonrpc="2.0" id="1" method="subgraph_deploy" params:='{"ipfs_hash": "QmXLwn6GUbvjs6JPGvAGdMpzfB8LdQbGp8jsLt5as2KpPt", "name":"knownorigin"}'
HTTP/1.1 200 OK
content-length: 193
content-type: application/json; charset=utf-8
date: Thu, 15 Sep 2022 15:07:13 GMT
{
    "id": "1",
    "jsonrpc": "2.0",
    "result": {
        "playground": ":8000/subgraphs/name/knownorigin/graphql",
        "queries": ":8000/subgraphs/name/knownorigin",
        "subscriptions": ":8001/subgraphs/name/knownorigin"
    }
}
The same can be done using “graphman”, a utility that ships as part of the graph-node Docker image.
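For scripting, the `subgraph_deploy` call shown above can also be issued from Python. This is a minimal sketch using only the standard library. It mirrors the method name and parameters of the HTTPie command above and assumes the subgraph name has already been created with `graph create`. The function names are hypothetical.

```python
import json
import urllib.request


def subgraph_deploy_payload(name: str, ipfs_hash: str, request_id: str = "1") -> dict:
    """Build the same JSON-RPC body that the HTTPie command above sends."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "subgraph_deploy",
        "params": {"name": name, "ipfs_hash": ipfs_hash},
    }


def deploy_subgraph(admin_url: str, name: str, ipfs_hash: str) -> dict:
    """POST the payload to the graph-node admin port (8020 in this post)."""
    request = urllib.request.Request(
        admin_url,
        data=json.dumps(subgraph_deploy_payload(name, ipfs_hash)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        reply = json.loads(resp.read())
    if "error" in reply:
        raise RuntimeError(f"deploy failed: {reply['error']}")
    return reply["result"]
```

From the shell pod this would be called as `deploy_subgraph("http://graph-node:8020", "knownorigin", "QmXLwn6GUbvjs6JPGvAGdMpzfB8LdQbGp8jsLt5as2KpPt")`. Like the method above, it is meant for testing rather than production.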
Deploying the subgraph via graph-node startup parameter
Another way of deploying a subgraph to an indexer is to pass the SUBGRAPH environment variable to graph-node itself. Adding the following lines to the graph-node StatefulSet manifest has the same effect as the API call above.
- name: SUBGRAPH
  value: knownorigin:QmXLwn6GUbvjs6JPGvAGdMpzfB8LdQbGp8jsLt5as2KpPt
Once the subgraph is deployed, the above environment variable can be removed or changed to deploy another subgraph. We are not sure whether this was the intended use of this feature, but it works well for testing.
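If you switch subgraphs often, the `name:hash` value is easy to mistype. The sketch below builds the env entry and applies a light sanity check. It assumes the hash is an IPFS CIDv0 (46 characters starting with `Qm`), as in the example above. The helper name is hypothetical.

```python
def subgraph_env_entry(name: str, ipfs_hash: str) -> dict:
    """Return the env entry for the StatefulSet manifest as a plain dict.

    Assumption: the deployment hash is an IPFS CIDv0, i.e. 46 characters
    starting with 'Qm', like the KnownOrigin hash used in this post.
    """
    if ":" in name or not name:
        raise ValueError("subgraph name must be non-empty and must not contain ':'")
    if not (ipfs_hash.startswith("Qm") and len(ipfs_hash) == 46):
        raise ValueError(f"{ipfs_hash!r} does not look like a CIDv0 hash")
    return {"name": "SUBGRAPH", "value": f"{name}:{ipfs_hash}"}


entry = subgraph_env_entry("knownorigin", "QmXLwn6GUbvjs6JPGvAGdMpzfB8LdQbGp8jsLt5as2KpPt")
print(entry["value"])
```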
Query subgraph
Once the subgraph is deployed, you should be able to see in the logs that it has started indexing. To start, the node responsible for ingestion will scan the entire blockchain in search of blocks relevant to the new subgraph. Blocks will be stored in Postgres so that the subgraph can be indexed. You can immediately check the indexing progress via the Graph Query web interface.
Set up port forwarding to access the graph node web UI and point your browser to http://localhost:8030/graphql/playground
kubectl port-forward pod/graph-node-0 8030 -n graph
Type the following query into the GraphQL query window (this is a standard query which returns the status of any subgraph):
{
  indexingStatusForCurrentVersion(subgraphName: "knownorigin") {
    synced
    health
    fatalError {
      message
      block {
        number
        hash
      }
      handler
    }
    chains {
      chainHeadBlock {
        number
      }
      latestBlock {
        number
      }
    }
  }
}
Depending on the state of the subgraph and how much time has passed since deployment, it might take several hours to sync. Eventually, the response will report synced as true, with latestBlock close to chainHeadBlock.
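While waiting, the two block numbers in the status response give a rough idea of progress. The sketch below works on the `indexingStatusForCurrentVersion` object returned by the query above. The sample numbers are made up for illustration, and the percentage is crude because it counts from block 0 rather than from the block where the subgraph starts.

```python
def sync_progress(status: dict) -> list:
    """Summarise how far each chain of a subgraph is behind the chain head."""
    rows = []
    for chain in status["chains"]:
        head_block = chain.get("chainHeadBlock")
        latest_block = chain.get("latestBlock")
        # Block numbers may arrive as strings; int() accepts both strings and ints.
        head = int(head_block["number"]) if head_block else 0
        latest = int(latest_block["number"]) if latest_block else 0
        rows.append({
            "latest": latest,
            "head": head,
            "behind": max(head - latest, 0),
            "percent": round(100 * latest / head, 2) if head else 0.0,
        })
    return rows


# Hypothetical response fragment, not real output from the KnownOrigin subgraph.
sample_status = {
    "synced": False,
    "health": "healthy",
    "chains": [
        {"chainHeadBlock": {"number": "15000000"}, "latestBlock": {"number": "7500000"}}
    ],
}
print(sync_progress(sample_status))
```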

When the subgraph is fully synced, we can test it by querying real business data. We will do this via the subgraph API. Set up port forwarding and navigate to http://localhost:8000/subgraphs/name/knownorigin
kubectl port-forward pod/graph-node-0 8000 -n graph
Submit the actual query:
{
  mostEth: artists(
    first: 100
    orderBy: totalValueInEth
    orderDirection: desc
    where: {totalValueInEth_gt: 0}
  ) {
    address
    totalValueInEth
    issuedCount
    salesCount
    editionsCount
    firstEditionTimestamp
    lastEditionTimestamp
    supply
    highestSaleValueInEth
  }
}
The result should be a list of up to 100 artists, ordered by total value in ETH in descending order.
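The same kind of query can be sent without a browser, since a GraphQL request is an HTTP POST with a JSON body containing a `query` field. This is a minimal standard-library sketch. It assumes the port-forward to port 8000 from above is active, uses a shortened version of the query, and the function names are hypothetical.

```python
import json
import urllib.request

# Shortened version of the query above, using fields that appear in it.
TOP_ARTISTS_QUERY = """
{
  mostEth: artists(first: 5, orderBy: totalValueInEth, orderDirection: desc) {
    address
    totalValueInEth
  }
}
"""


def graphql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Wrap a GraphQL query in the JSON body expected over HTTP."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and return the 'data' part of the response."""
    with urllib.request.urlopen(graphql_request(endpoint, query), timeout=timeout) as resp:
        reply = json.loads(resp.read())
    if reply.get("errors"):
        raise RuntimeError(f"query failed: {reply['errors']}")
    return reply["data"]
```

With the port-forward running, `run_query("http://localhost:8000/subgraphs/name/knownorigin", TOP_ARTISTS_QUERY)` should return a dictionary with a `mostEth` list.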

In the next post, we will look into scaling our indexer and putting it out there on the Graph Network!