SkyWalking oap-server init mode
While experimenting with SkyWalking in our lab, every now and then we come back to a broken installation.
All components appear to be running, but no data is displayed in the UI. Further investigation turns up the following message in the oap-server log:
2022-07-22 15:28:51,809 org.apache.skywalking.oap.server.core.storage.model.ModelInstaller 43 [main] INFO  - table: alarm_record does not exist. OAP is running in 'no-init' mode, waiting... retry 3s later.
This points us to the “no-init” mode of running the oap-server.
Digging further into the documentation, it turns out that the SkyWalking OAP server should run in init mode when it starts for the first time against a new datastore. In init mode, the OAP server creates all the indexes and structures it needs to store data, and then exits.
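For contrast, the long-running OAP pods are the ones started in no-init mode, which is why they sit and wait for the tables instead of creating them. A minimal sketch of how that might look in the pod spec of the regular OAP Deployment, assuming the mode is passed through JAVA_OPTS; the container name `oap` is a placeholder, not taken from our setup:

```yaml
# Fragment of a Deployment's pod spec (hypothetical container name "oap")
containers:
  - name: oap
    env:
      - name: JAVA_OPTS
        value: "-Dmode=no-init"  # wait for existing tables instead of creating them
```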
By the looks of it, the Elasticsearch deployment we use in our lab does not persist data by default. We use it mostly for POCs, which rarely live longer than a few hours.
At some point, Kubernetes restarted Elasticsearch on a different node, and all its data was lost along with the indexes and metadata.
It’s easy enough to recover from this situation. We just need to run the oap-server in init mode again to create all the necessary elements in Elasticsearch.
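If the data should survive a reschedule, one option is to back the Elasticsearch data directory with a persistent volume. This is only a sketch under assumptions: it presumes Elasticsearch runs as a StatefulSet using the official image, and the volume name `es-data` and the size are placeholders rather than values from our lab:

```yaml
# Fragment of a hypothetical Elasticsearch StatefulSet spec
spec:
  template:
    spec:
      containers:
        - name: elasticsearch
          volumeMounts:
            - name: es-data                              # placeholder volume name
              mountPath: /usr/share/elasticsearch/data   # data directory of the official image
  volumeClaimTemplates:
    - metadata:
        name: es-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi                                # placeholder size
```

For short-lived POCs this is probably overkill, and rerunning the init step is the cheaper fix.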
Kubernetes Job to the rescue:
apiVersion: batch/v1
kind: Job
metadata:
  name: oap-init-job # @feature: cluster; set up an init job to initialize ES templates and indices
spec:
  template:
    metadata:
      name: oap-init-job
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccountName: skywalking-oap-sa-cluster
      restartPolicy: Never
      initContainers:
        - name: wait-for-es
          image: busybox:1.30
          command:
            - 'sh'
            - '-c'
            - 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1'
      containers:
        - name: oap-init
          image: ghcr.io/apache/skywalking/oap:b695983fc58ae17bc6993898afb671ff1e19be12
          imagePullPolicy: Always
          env: # @feature: cluster; make sure all env vars are the same with the cluster nodes as this will affect templates / indices
            - name: JAVA_OPTS
              value: "-Dmode=init" # @feature: cluster; set the OAP mode to "init" so the job can complete
            - name: SW_OTEL_RECEIVER
              value: default
            - name: SW_OTEL_RECEIVER_ENABLED_OC_RULES
              value: vm,oap
            - name: SW_STORAGE
              value: elasticsearch
            - name: SW_STORAGE_ES_CLUSTER_NODES
              value: elasticsearch:9200
            - name: SW_STORAGE_ES_INDEX_REPLICAS_NUMBER
              value: "0"
            - name: SW_TELEMETRY
              value: prometheus
          volumeMounts:
            - name: config-volume
              mountPath: /skywalking/ext-config
      volumes:
        - name: config-volume
          configMap:
            name: oap-static-config
After the job completes:
oap-init-job-kbn9s 0/1 Completed 0 20m
SkyWalking is running again and storing data in Elasticsearch!