Troubleshoot MariaDB pods

  • Let's say we have a Kubernetes cluster in Google Cloud and we've deployed a MariaDB cluster with 1 primary instance and 2 read replicas.

    After some time of activity, we notice that the MariaDB primary pod keeps crashing and restarting, and we wonder: what could it be?

    Review the logs

    The very first step of troubleshooting our pods is to check the logs.
    To do so, first identify the pod name in our namespace:

    kubectl get pods -n mariadb-1
    NAME                                  READY   STATUS      RESTARTS   AGE
    mariadb-1-deployer-1225393596-czs5w   0/1     Completed   0          13d
    mariadb-1-mariadb-0                   2/2     Running     0          96m
    mariadb-1-mariadb-secondary-0         1/1     Running     1          6d4h
    mariadb-1-mariadb-secondary-1         1/1     Running     1          13d

    Then, get the logs by using:

    kubectl logs mariadb-1-mariadb-0 -n mariadb-1
    2020-03-04  1:36:17 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2626073630
    2020-03-04  1:36:18 0 [Note] InnoDB: Last binlog file './mysql-bin.000057', position 718622
    2020-03-04  1:36:19 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
    2020-03-04  1:36:19 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
    2020-03-04  1:36:19 0 [Note] InnoDB: Creating shared tablespace for temporary tables
    2020-03-04  1:36:19 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
    2020-03-04  1:36:19 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
    2020-03-04  1:36:19 0 [Note] InnoDB: 10.3.22 started; log sequence number 2626073639; transaction id 73045993

    We can see clearly on the first line that our database is recovering from a crash. However, this log doesn't give us any further detail about the cause.
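    If the container has already restarted, `kubectl logs` only shows the output of the current instance; the logs of the previously crashed instance are available with the `--previous` flag:

```shell
# Logs of the last terminated container instance (useful during a crash
# loop, before the fresh container's output replaces the picture).
kubectl logs mariadb-1-mariadb-0 -n mariadb-1 --previous
```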

    The next step is to connect to our pod and try to gather more information. To do so:

    kubectl exec -it mariadb-1-mariadb-0 -n mariadb-1 -- /bin/bash

    Once inside the pod, we can perform some basic Linux troubleshooting. We can start with dmesg:

    [1178727.643114] Memory cgroup out of memory: Kill process 1338764 (k8s_metadata) score 119 or sacrifice child
    [1178727.653058] Killed process 1338764 (k8s_metadata) total-vm:525388kB, anon-rss:23952kB, file-rss:2796kB, shmem-rss:0kB

    Interestingly, we can see a process being killed by the kernel due to a lack of memory.
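    A quick way to surface these messages is to filter the kernel log for OOM-killer activity. Note that unprivileged containers may not be allowed to read the kernel ring buffer, in which case the same check can be run on the node:

```shell
# Filter kernel messages for cgroup OOM kills; errors are silenced because
# reading the ring buffer may be restricted inside the container.
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' || true
```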

    Additionally, we can describe the pod and check the namespace events to gather extra information:

    kubectl describe pod/mariadb-1-mariadb-0 -n mariadb-1
    kubectl get events -n mariadb-1
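    From outside the pod, the container status also records why the last instance was terminated; an OOM-killed container reports the reason `OOMKilled`. This is a sketch: the container name `mariadb` below is an assumption and may differ in your deployment.

```shell
# Last termination reason of the mariadb container; prints "OOMKilled" when
# the kernel OOM killer terminated it.
kubectl get pod mariadb-1-mariadb-0 -n mariadb-1 \
  -o jsonpath='{.status.containerStatuses[?(@.name=="mariadb")].lastState.terminated.reason}'
```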

    Review the metrics

    If we go to our cloud provider's monitoring tool and look for our pod in the period of time when it crashed,
    we can observe the Used Memory versus the Requested memory.


    As per the official documentation: Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.

    Taking this into consideration, we need to guarantee that our pod will have at least our current usage plus room for any possible spike.

    If we check the official MariaDB documentation or use the MySQL memory calculator with the default parameters, the minimum needed to run is about 576 MB.
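    Following that sizing, a reasonable request is the baseline plus headroom for spikes; as a rough sketch, about 40% on top of 576 MiB lands close to the 800Mi value used below:

```shell
# Rough sizing sketch: calculator baseline plus ~40% headroom for spikes.
baseline=576                          # MiB, per the calculator defaults
headroom=$((baseline * 40 / 100))     # integer arithmetic: 230 MiB
echo $((baseline + headroom))         # 806 MiB, so a request of 800Mi is in range
```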

    Update the requests

    Now that we have identified the problem, it's time to fix our Deployment or StatefulSet.
    To do so, we first identify the StatefulSet or Deployment name:

    kubectl get statefulset -n mariadb-1
    NAME                          READY   AGE
    mariadb-1-mariadb             1/1     13d
    mariadb-1-mariadb-secondary   2/2     13d

    Then, we can edit by typing:

    kubectl edit statefulset mariadb-1-mariadb -n mariadb-1

    Once in edit mode, we need to locate the requests parameters:

                cpu: 100m
                memory: 100Mi

    We just need to adjust the values for the memory parameter:

                cpu: 100m
                memory: 800Mi

    Once we save the changes, be aware that the current pods will be terminated and new pods will be deployed with the updated requests.
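    As a non-interactive alternative to `kubectl edit`, the request can be changed with `kubectl set resources`, and the rollout watched until the new pods are ready. By default this applies to every container in the pod template; use `-c` to target a single one:

```shell
# Bump the memory request and watch the rolling restart of the StatefulSet.
kubectl set resources statefulset mariadb-1-mariadb -n mariadb-1 \
  --requests=memory=800Mi
kubectl rollout status statefulset/mariadb-1-mariadb -n mariadb-1
```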
