Running an indexer at scale on the Graph Network

In the previous blog post, we deployed a single copy of Graph Node in a container and manually allocated a single subgraph. It was an excellent exercise for understanding how indexing works and what is required to get started, but it is not the setup you want if you plan to index at scale and contribute to true decentralisation.
In this blog post, we will look into adding a few more features to our Graph Indexer:
- scaling indexing
- deploying indexer-agent and using it for subgraph deployment
- deploying the query mechanism
- rule-based subgraph allocation
Scaling indexing
If you have read the requirements to become an indexer, you probably already have in mind what size of hardware you need and what volumes of data to expect.
As a first step toward a production service, it is a good idea to scale your graph-node to run multiple instances. If you used the StatefulSet from the previous blog post, you can simply increase the number of replicas in the YAML configuration or scale it directly with kubectl, as shown below. From that moment on, any of the nodes can index subgraphs, but block ingestion is still performed by only one node (the one named in the BLOCK_INGESTOR environment variable).
kubectl scale statefulset/graph-node --replicas=2 -n graph
statefulset.apps/graph-node scaled
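Once there is more than one index node, the indexer-agent we deploy later in this post has to be told about each of them through INDEXER_AGENT_INDEX_NODE_IDS. The sketch below derives that comma-separated list from a replica count. It rests on two assumptions you should verify against your own deployment: that the StatefulSet is named index-node (matching the index_node_0 ID used later), and that the graph-node container derives its node ID from the pod name by replacing hyphens with underscores. The function name is hypothetical.

```python
from __future__ import annotations


def index_node_ids(statefulset_name: str, replicas: int) -> list[str]:
    """Return the graph-node IDs for pods <name>-0 .. <name>-(replicas-1)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Kubernetes names StatefulSet pods <name>-<ordinal>, starting at 0.
    pod_names = [f"{statefulset_name}-{i}" for i in range(replicas)]
    # Assumption: the node ID is the pod name with "-" replaced by "_".
    return [name.replace("-", "_") for name in pod_names]


if __name__ == "__main__":
    # Value to paste into INDEXER_AGENT_INDEX_NODE_IDS for two replicas.
    print(",".join(index_node_ids("index-node", 2)))
```

If the IDs reported by your nodes differ from this pattern, use the reported ones instead of the derived ones.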
Query
It is recommended to run a separate “query” component for your Indexer.
The query-node Kubernetes Deployment uses the same image as the graph-node StatefulSet, but in this case it only serves GraphQL queries, so it can be autoscaled with a Kubernetes HorizontalPodAutoscaler (HPA).
In query mode it requires less configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-node
  namespace: graph
spec:
  selector:
    matchLabels:
      app: query-node
  replicas: 1
  template:
    metadata:
      labels:
        app: query-node
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8040"
    spec:
      containers:
        - name: graph-node
          image: graphprotocol/graph-node:latest
          ports:
            - name: http
              containerPort: 8000
            - name: ws
              containerPort: 8001
            - name: index-node
              containerPort: 8030
          env:
            - name: ipfs
              value: https://ipfs.network.thegraph.com
            - name: EXPERIMENTAL_SUBGRAPH_VERSION_SWITCHING_MODE
              value: synced
            - name: GRAPH_KILL_IF_UNRESPONSIVE
              value: "true"
            - name: postgres_host
              value: postgres-postgresql
            - name: postgres_user
              value: graph
            - name: postgres_pass
              value: graph
            - name: postgres_db
              value: graph
            - name: node_role
              value: query-node
            - name: node_id
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
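To make the autoscaling mentioned above concrete, here is a minimal sketch that builds a HorizontalPodAutoscaler manifest for the query-node Deployment and prints it as JSON, which kubectl accepts as well as YAML. The replica bounds and the 70% CPU target are illustrative assumptions, not recommendations. Note that a CPU utilization target only works if the container declares CPU requests (the Deployment above does not) and the cluster runs a metrics server.

```python
import json


def query_node_hpa(min_replicas: int = 1, max_replicas: int = 5,
                   cpu_percent: int = 70) -> dict:
    """Build an autoscaling/v2 HPA manifest targeting the query-node Deployment."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": "query-node", "namespace": "graph"},
        "spec": {
            # The workload to scale: the Deployment defined above.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": "query-node",
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization across the pods (assumed target).
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(query_node_hpa(), indent=2))
```

Assuming the script is saved as hpa.py (a hypothetical file name), the output can be piped straight into the cluster with: python hpa.py | kubectl apply -f -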
indexer-service
Another service that supports the query mechanism is indexer-service. It is the only externally facing component: it receives client queries from The Graph Gateway and passes them on to the query nodes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: indexer-service
  namespace: graph
spec:
  serviceName: indexer-service
  selector:
    matchLabels:
      app: indexer-service
  replicas: 1
  template:
    metadata:
      labels:
        app: indexer-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "7300"
    spec:
      containers:
        - name: indexer-service
          image: ghcr.io/graphprotocol/indexer-service:v0.20.5-alpha.1
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 600m
          ports:
            - name: http
              containerPort: 7600
            - name: metrics
              containerPort: 7300
          env:
            - name: INDEXER_SERVICE_PORT
              value: "7600"
            - name: INDEXER_SERVICE_MNEMONIC
              value: "<YOUR MNEMONIC>"
            - name: INDEXER_SERVICE_INDEXER_ADDRESS
              value: "<YOUR INDEXER ADDRESS>"
            - name: INDEXER_SERVICE_ETHEREUM_NETWORK
              value: goerli
            - name: INDEXER_SERVICE_ETHEREUM
              value: "<<RPC Endpoint of the GOERLI Archive Node. Infura Free Tier is sufficient for this https://goerli.infura.io/v3/...>>"
            - name: INDEXER_SERVICE_GRAPH_NODE_QUERY_ENDPOINT
              value: http://query-node:8000/
            - name: INDEXER_SERVICE_GRAPH_NODE_STATUS_ENDPOINT
              value: http://index-node:8030/graphql
            - name: INDEXER_SERVICE_NETWORK_SUBGRAPH_ENDPOINT
              value: "https://gateway.testnet.thegraph.com/network"
            - name: INDEXER_SERVICE_POSTGRES_HOST
              value: postgres-postgresql
            - name: INDEXER_SERVICE_POSTGRES_USERNAME
              value: graph
            - name: INDEXER_SERVICE_POSTGRES_PASSWORD
              value: graph
            - name: INDEXER_SERVICE_POSTGRES_DATABASE
              value: graph_agent
            - name: INDEXER_SERVICE_FREE_QUERY_AUTH_TOKEN
              value: free
            - name: INDEXER_SERVICE_CLIENT_SIGNER_ADDRESS
              value: "0x5CDBbD99EFEd374B732735C1C32A1735a55daF47"
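A quick way to check the query path from inside the cluster is to send a free query through indexer-service. The sketch below only builds the HTTP request. It assumes that indexer-service is reachable at http://indexer-service:7600 (no Service for it is defined in this post, so the hostname is hypothetical), that it serves deployments under /subgraphs/id/<deployment-id>, and that the free query token configured above is sent as a Bearer token. Check these against the indexer-service version you run.

```python
import json
import urllib.request


def build_free_query(base_url: str, deployment_id: str,
                     query: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a free GraphQL query for indexer-service."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/subgraphs/id/{deployment_id}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the free query token is passed as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = build_free_query(
        "http://indexer-service:7600",  # hypothetical in-cluster address
        "QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9",
        "{ _meta { block { number } } }",
        "free",
    )
    print(req.get_method(), req.full_url)
    # To actually send it from a pod inside the cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```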
Introducing Graph Agent
To truly see the power of a distributed indexing model we need our indexer to participate in the Graph Network.
indexer-agent is a component responsible for your indexer participating in the Network.
To participate in the Graph Network you need a few things:
- Indexer address and mnemonic
- operator address
- some GRT tokens (participating on mainnet requires 100k GRT, roughly $7,000 at the exchange rate at the time of writing)
It is best to start learning the indexing ecosystem on the testnet. The Graph uses Goerli as its test network. All the necessary steps to set up Indexer/Operator addresses and get enough GRT tokens to participate in the testnet are well documented in the Graph Docs, so we will not repeat them here.
indexer-agent requires its own Postgres database to manage the configuration of the Indexer.
A new database can be created on the same Postgres instance we used for the index-node:
postgres=# create database graph_agent;
CREATE DATABASE
postgres=# grant ALL PRIVILEGES on database graph_agent to graph;
GRANT
Once you have completed all the prerequisites, you are ready to deploy the indexer agent. It should always be deployed as a single instance and must not be exposed to the external world.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: indexer-agent
  namespace: graph
spec:
  serviceName: indexer-agent
  selector:
    matchLabels:
      app: indexer-agent
  replicas: 1 # Must not be changed, there should only be one indexer-agent
  template:
    metadata:
      labels:
        app: indexer-agent
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "7300"
    spec:
      containers:
        - name: indexer-agent
          image: ghcr.io/graphprotocol/indexer-agent:latest
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 600m
          ports:
            - name: http
              containerPort: 7600
            - name: metrics
              containerPort: 7300
            - name: management
              containerPort: 8000
            - name: syncing
              containerPort: 8001
          env:
            - name: INDEXER_AGENT_MNEMONIC
              value: "<<YOUR MNEMONIC>>"
            - name: INDEXER_AGENT_INDEXER_ADDRESS
              value: "<<YOUR INDEXER ADDRESS>>"
            - name: INDEXER_AGENT_ETHEREUM_NETWORK
              value: goerli
            - name: INDEXER_AGENT_ETHEREUM
              value: "<<RPC Endpoint of the GOERLI Archive Node. Infura Free Tier is sufficient for this https://goerli.infura.io/v3/...>>"
            - name: INDEXER_AGENT_PUBLIC_INDEXER_URL
              value: "<<YOUR External URL, We will create this later>>"
            - name: INDEXER_AGENT_INDEX_NODE_IDS
              value: index_node_0
            - name: INDEXER_AGENT_GRAPH_NODE_QUERY_ENDPOINT
              value: http://query-node:8000
            - name: INDEXER_AGENT_GRAPH_NODE_ADMIN_ENDPOINT
              value: http://index-node.graph.svc.cluster.local:8020/graphql
            - name: INDEXER_AGENT_GRAPH_NODE_STATUS_ENDPOINT
              value: http://index-node.graph.svc.cluster.local:8030/graphql
            - name: INDEXER_AGENT_POSTGRES_HOST
              value: postgres-postgresql
            - name: INDEXER_AGENT_POSTGRES_USERNAME
              value: graph
            - name: INDEXER_AGENT_POSTGRES_PASSWORD
              value: graph
            - name: INDEXER_AGENT_POSTGRES_DATABASE
              value: graph_agent
            - name: INDEXER_AGENT_COLLECT_RECEIPTS_ENDPOINT
              value: "https://gateway.testnet.thegraph.com/collect-receipts"
            - name: INDEXER_AGENT_NETWORK_SUBGRAPH_ENDPOINT
              value: "https://gateway.testnet.thegraph.com/network"
            - name: INDEXER_AGENT_DAI_CONTRACT
              value: "0x9e7e607afd22906f7da6f1ec8f432d6f244278be"
            - name: INDEXER_AGENT_EPOCH_SUBGRAPH_ENDPOINT
              value: https://api.thegraph.com/subgraphs/name/graphprotocol/goerli-epoch-block-oracle
We also need a basic Service to communicate with the agent:
apiVersion: v1
kind: Service
metadata:
  name: indexer-agent
  namespace: graph
spec:
  type: ClusterIP
  selector:
    app: indexer-agent
  ports:
    - name: syncing
      protocol: TCP
      port: 8001
      targetPort: 8001
    - name: management
      protocol: TCP
      port: 8000
      targetPort: 8000
You should be able to see in the logs whether your Indexer was successfully registered on the Network. You should also be able to find it at https://testnet.thegraph.com/explorer/participants/indexers
Selecting a subgraph to index
Let’s consider which subgraph to index. Since our Indexer is connected to the testnet but will index data from mainnet, we will search for “Mainnet” subgraphs in https://testnet.thegraph.com/explorer .
A quick search turns up “rocketpool eth mainnet”, which appears to satisfy both criteria.
Now we are ready to start indexing this subgraph.
Indexing subgraphs with indexer-agent
To interact with the Indexer, let’s deploy a shell container. We will use our own custom build:
apiVersion: v1
kind: Pod
metadata:
  name: shell
  namespace: graph
  labels:
    app: shell
spec:
  containers:
    - image: metaops/graph_cli:latest
      name: shell
      command:
        - /bin/sh
        - "-c"
        - "sleep infinity"
      imagePullPolicy: Always
We will perform all interactions with the agent by running “exec” into this container.
Indexing and allocating
To start indexing the subgraph, we need to tell the index-node about it. In the previous blog post we did this directly via the index management API. Now we can do it via indexer-agent:
graph@shell-0:~$ graph indexer rules prepare QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9
┌────────────────────────────────────────────────┬────────────────┬──────────────────┬────────────────────┬─────────────┬─────────────────────┬─────────────────────────┬───────────┬───────────┬──────────┬─────────────────────┬────────┬───────────────┬──────────────────┐
│ identifier │ identifierType │ allocationAmount │ allocationLifetime │ autoRenewal │ parallelAllocations │ maxAllocationPercentage │ minSignal │ maxSignal │ minStake │ minAverageQueryFees │ custom │ decisionBasis │ requireSupported │
├────────────────────────────────────────────────┼────────────────┼──────────────────┼────────────────────┼─────────────┼─────────────────────┼─────────────────────────┼───────────┼───────────┼──────────┼─────────────────────┼────────┼───────────────┼──────────────────┤
│ QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9 │ deployment │ null │ null │ true │ null │ null │ null │ null │ null │ null │ null │ offchain │ true │
└────────────────────────────────────────────────┴────────────────┴──────────────────┴────────────────────┴─────────────┴─────────────────────┴─────────────────────────┴───────────┴───────────┴──────────┴─────────────────────┴────────┴───────────────┴──────────────────┘
This will start indexing the subgraph QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9 off-chain, i.e. the fact that you are indexing this subgraph will not be reflected on the Graph test network. This is the recommended way to start indexing a subgraph.
You can monitor the progress of indexing by running a query:
graph@shell-0:~$ http -b post http://index-node:8030/graphql query='{ indexingStatuses(subgraphs: ["QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9"]) { subgraph synced health fatalError {message deterministic block { number }} chains {latestBlock {number}chainHeadBlock {number}}}}'
{
"data": {
"indexingStatuses": [
{
"chains": [
{
"chainHeadBlock": {
"number": "16019676"
},
"latestBlock": {
"number": "13420822"
}
}
],
"fatalError": null,
"health": "healthy",
"subgraph": "QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9",
"synced": false
}
]
}
}
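The response above can be turned into a simple progress figure. The sketch below parses one indexingStatuses entry and reports how many blocks the subgraph is behind the chain head. Block numbers arrive as strings, so they are converted to integers first. The function name and the shape of the returned summary are our own choices for illustration, not part of any Graph tooling.

```python
def sync_progress(status: dict) -> dict:
    """Summarise one entry of the indexingStatuses response."""
    chain = status["chains"][0]
    # latestBlock can be null before the first block has been indexed.
    latest_block = chain.get("latestBlock") or {"number": "0"}
    latest = int(latest_block["number"])
    head = int(chain["chainHeadBlock"]["number"])
    return {
        "blocks_behind": head - latest,
        "percent": round(100 * latest / head, 2),
        "healthy": status["health"] == "healthy" and status["fatalError"] is None,
        "synced": bool(status["synced"]),
    }


if __name__ == "__main__":
    # The entry from the sample response shown above.
    sample = {
        "chains": [{
            "chainHeadBlock": {"number": "16019676"},
            "latestBlock": {"number": "13420822"},
        }],
        "fatalError": None,
        "health": "healthy",
        "subgraph": "QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9",
        "synced": False,
    }
    print(sync_progress(sample))
```

For the sample response, the subgraph is 2598854 blocks behind the chain head.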
When latestBlock matches chainHeadBlock, the subgraph is synced and we can allocate some funds towards it. The allocation makes the subgraph appear on the Graph Network, and the Gateway will start sending queries to your indexer:
graph@shell-0:~$ graph indexer allocations create QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9 10000
✔ Allocation created
┌────────────────────────────────────────────┬────────────────────────────────────────────────┬─────────────────┐
│ allocation │ deployment │ allocatedTokens │
├────────────────────────────────────────────┼────────────────────────────────────────────────┼─────────────────┤
│ 0xcE6F15Be0538193C9741Dd4102caf2a3F08B48a9 │ QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9 │ 10000.0 │
└────────────────────────────────────────────┴────────────────────────────────────────────────┴─────────────────┘
You can verify by checking the indexers of the subgraph you just deployed via https://testnet.thegraph.com/explorer/subgraphs/DY3QPkCxnaUpDEP3iVkARbBviNnKRcKWKk6vQwVQHGAe?view=Indexers .
If all worked well you will see your indexer and allocations on the network.

Indexing Rules
Allocating subgraphs manually works well for a small indexer, but if you want to run indexing at scale, you should learn to use Indexing Rules.
Indexing Rules can be created via the same graph indexer CLI.
Example: Checking all existing Rules
graph@shell-0:~$ graph indexer rules get all
┌────────────────────────────────────────────────┬────────────────┬──────────────────┬────────────────────┬─────────────┬─────────────────────┬─────────────────────────┬───────────┬───────────┬──────────┬─────────────────────┬────────┬───────────────┬──────────────────┐
│ identifier │ identifierType │ allocationAmount │ allocationLifetime │ autoRenewal │ parallelAllocations │ maxAllocationPercentage │ minSignal │ maxSignal │ minStake │ minAverageQueryFees │ custom │ decisionBasis │ requireSupported │
├────────────────────────────────────────────────┼────────────────┼──────────────────┼────────────────────┼─────────────┼─────────────────────┼─────────────────────────┼───────────┼───────────┼──────────┼─────────────────────┼────────┼───────────────┼──────────────────┤
│ global │ group │ 0.01 │ null │ true │ 1 │ null │ null │ null │ null │ null │ null │ rules │ true │
├────────────────────────────────────────────────┼────────────────┼──────────────────┼────────────────────┼─────────────┼─────────────────────┼─────────────────────────┼───────────┼───────────┼──────────┼─────────────────────┼────────┼───────────────┼──────────────────┤
│ QmbfLf4JfYwBfJeFKZ83CToRr5RpYnX25y4wQytqy2bthe │ deployment │ 10,000.0 │ null │ true │ null │ null │ null │ null │ null │ null │ null │ always │ true │
├────────────────────────────────────────────────┼────────────────┼──────────────────┼────────────────────┼─────────────┼─────────────────────┼─────────────────────────┼───────────┼───────────┼──────────┼─────────────────────┼────────┼───────────────┼──────────────────┤
│ QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9 │ deployment │ 10,000.0 │ null │ true │ null │ null │ null │ null │ null │ null │ null │ always │ true │
└────────────────────────────────────────────────┴────────────────┴──────────────────┴────────────────────┴─────────────┴─────────────────────┴─────────────────────────┴───────────┴───────────┴──────────┴─────────────────────┴────────┴───────────────┴──────────────────┘
You can operate the indexer in auto mode, where indexer-agent finds and allocates to subgraphs based on the criteria you set, or in manual mode, where you submit individual actions and approve them manually.
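To build some intuition for how rules drive allocation decisions, here is a deliberately simplified sketch. It assumes that a deployment-specific rule takes precedence over the global rule, that decisionBasis always and never (and offchain, which indexes without allocating) short-circuit the decision, and that with decisionBasis rules a deployment qualifies as soon as any non-null threshold is met. The real agent has further checks (for example requireSupported and maxSignal), so treat the class and function names here as hypothetical and the logic as an approximation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    identifier: str                      # deployment ID or "global"
    decision_basis: str = "rules"        # rules | always | never | offchain
    min_stake: Optional[float] = None
    min_signal: Optional[float] = None
    min_average_query_fees: Optional[float] = None


@dataclass
class Deployment:
    id: str
    staked_tokens: float
    signalled_tokens: float
    average_query_fees: float


def should_allocate(dep: Deployment, rules: list) -> bool:
    """Simplified allocation decision for one deployment."""
    by_id = {rule.identifier: rule for rule in rules}
    # A deployment-specific rule wins over the global one.
    rule = by_id.get(dep.id) or by_id.get("global")
    if rule is None:
        return False
    if rule.decision_basis == "always":
        return True
    if rule.decision_basis in ("never", "offchain"):
        return False
    # decisionBasis "rules": any non-null threshold that is met qualifies.
    checks = [
        (rule.min_stake, dep.staked_tokens),
        (rule.min_signal, dep.signalled_tokens),
        (rule.min_average_query_fees, dep.average_query_fees),
    ]
    return any(limit is not None and value >= limit for limit, value in checks)


if __name__ == "__main__":
    rules = [
        Rule("global", "rules", min_signal=500),
        Rule("QmSCkfLwsaQoT6nPKX9nE8TFdzDjaueUtELCK1db1aFWE9", "always"),
    ]
    popular = Deployment("QmExampleDeployment", 0, 1200, 0)  # hypothetical ID
    print(should_allocate(popular, rules))
```

With the rules listed in the table above, the global rule has no thresholds set, so only the two deployments marked always would receive allocations under this model.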