Troubleshoot MariaDB pods
Let's say we have a Kubernetes cluster in Google Cloud and we've deployed a MariaDB cluster with 1 primary instance and 2 read replicas.
After some time of activity, we notice that the main MariaDB pod keeps crashing and restarting, and we wonder: what could it be?
The very first step in troubleshooting our pods is to check the logs.
To do so, first identify the pod name in our namespace:
kubectl get pods -n mariadb-1
NAME                                  READY   STATUS      RESTARTS   AGE
mariadb-1-deployer-1225393596-czs5w   0/1     Completed   0          13d
mariadb-1-mariadb-0                   2/2     Running     0          96m
mariadb-1-mariadb-secondary-0         1/1     Running     1          6d4h
mariadb-1-mariadb-secondary-1         1/1     Running     1          13d
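As a quick sanity check, we can also watch the pods in real time to catch a restart as it happens, using the -w (watch) flag:

kubectl get pods -n mariadb-1 -w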
Then, get the logs by using:
kubectl logs mariadb-1-mariadb-0 -n mariadb-1
2020-03-04 1:36:17 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2626073630
2020-03-04 1:36:18 0 [Note] InnoDB: Last binlog file './mysql-bin.000057', position 718622
2020-03-04 1:36:19 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2020-03-04 1:36:19 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2020-03-04 1:36:19 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-03-04 1:36:19 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2020-03-04 1:36:19 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-03-04 1:36:19 0 [Note] InnoDB: 10.3.22 started; log sequence number 2626073639; transaction id 73045993
We can see clearly on the first line that our database is recovering from a crash. However, this log won't tell us why it crashed.
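Since the container has been restarting, the current logs may not cover the crash itself. We can also pull the logs of the previous (crashed) container instance with the --previous flag:

kubectl logs mariadb-1-mariadb-0 -n mariadb-1 --previous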
The next step would be to connect to our pod and try to get more information. To do so:
kubectl exec -it mariadb-1-mariadb-0 -n mariadb-1 -- /bin/bash
Once inside the pod, we can perform some basic Linux troubleshooting. We can start with dmesg:
[1178727.643114] Memory cgroup out of memory: Kill process 1338764 (k8s_metadata) score 119 or sacrifice child
[1178727.653058] Killed process 1338764 (k8s_metadata) total-vm:525388kB, anon-rss:23952kB, file-rss:2796kB, shmem-rss:0kB
Interesting fact: we can see a process being killed due to lack of memory, an out-of-memory (OOM) kill triggered by the memory cgroup.
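While we are inside the pod, we can also compare the container's memory usage against its cgroup limit. A minimal sketch, assuming the node uses cgroup v1 (the common default at the time of writing):

cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # memory limit applied to this container
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # memory currently in use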
Additionally, we can check the namespace events and describe the pod to gather extra information:
kubectl describe pod/mariadb-1-mariadb-0 -n mariadb-1
kubectl get events -n mariadb-1
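If the container was indeed OOM-killed, the describe output should reflect it in the Last State section. It will look something like this (values are illustrative):

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137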
If we go to our cloud provider's monitoring tool (or whichever monitoring tool we have defined) and look for our pod in the period of time when it crashed, we can compare the Used Memory against the Requested Memory.
As per the official documentation: "Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource."
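In manifest form, requests and limits sit under the container spec; for example (illustrative values):

resources:
  requests:
    cpu: 100m
    memory: 800Mi
  limits:
    cpu: 250m
    memory: 1Gi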
Taking this into consideration, we need to guarantee that our Pod will have at least our current usage plus room for any possible spike.
If we check the official MariaDB documentation or use the MySQL calculator with the default parameters (https://www.mysqlcalculator.com/), the minimum required to run is about 576 MB.
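To cross-check this against what the pod is actually consuming, we can query the live metrics (this assumes metrics-server is available in the cluster, as it is by default on GKE):

kubectl top pod mariadb-1-mariadb-0 -n mariadb-1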
Now that we have identified the problem, it's time to fix our Deployment or StatefulSet.
To do so, we will first identify the StatefulSet or Deployment name:
kubectl get statefulset -n mariadb-1
NAME                          READY   AGE
mariadb-1-mariadb             1/1     13d
mariadb-1-mariadb-secondary   2/2     13d
Then, we can edit by typing:
kubectl edit statefulset mariadb-1-mariadb -n mariadb-1
Once in edit mode, we need to locate the requests parameters:
resources:
  requests:
    cpu: 100m
    memory: 100Mi
We just need to adjust the value of the memory parameter:
resources:
  requests:
    cpu: 100m
    memory: 800Mi
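Alternatively, if we prefer to avoid the interactive editor, the same change can be applied non-interactively. A sketch using kubectl set resources (by default it updates every container in the pod template; add -c <container-name> to target a single one):

kubectl set resources statefulset mariadb-1-mariadb -n mariadb-1 --requests=memory=800Mi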
Once we save the changes, be aware that the current pods will be terminated and new pods will be deployed with the updated resource requests.
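We can then follow the rollout and confirm that the pods come back healthy:

kubectl rollout status statefulset mariadb-1-mariadb -n mariadb-1
kubectl get pods -n mariadb-1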