Cluster troubleshooting: the approach and the three diagnostic techniques (33)


This chapter deals with handling failures and shares a few tricks, namely the three diagnostic techniques: 1) View logs

2) View resource details and events

3) View resource configuration (YAML)

If that is still not enough for a proper analysis, fall back on a power tool: kubectl-debug.

Finally, once the cause is known, apply the remedy that fits the problem.


Table of Contents

  • Further diagnostic analysis: the three diagnostic techniques

  • Debugging containers

  • Remedy


Further diagnostic analysis: the three diagnostic techniques

During the initial assessment we usually obtain only surface-level information, such as a node being down, a Pod crashing, or the network being unreachable. We then need tools to dig deeper in the direction and scope suggested by the initial diagnosis, combined with the specific logs.

Here I recommend three diagnostic techniques:

  • View logs

  • View resource details and events

  • View resource configuration

View logs

In most cases, viewing the logs is the most direct way to find the specific cause, so we need to learn how to view them.


1. Use journalctl to view service logs

Mainstream Linux distributions mostly use systemd to manage and configure the system centrally. If your system uses systemd, you can view service logs with the journalctl command.

For example, for docker:

journalctl -u docker

View and follow the kubelet log:

journalctl -u kubelet -f
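
On a busy node the full kubelet log can be very long. The following is a minimal sketch for limiting it to a recent time window; the helper name `kubelet_recent` is an assumption made for illustration, while `-u`, `--since` and `--no-pager` are standard journalctl options.

```shell
# kubelet_recent is a hypothetical helper name, not a standard command.
# Usage: kubelet_recent [SINCE]   e.g. kubelet_recent "30 min ago"
kubelet_recent() {
  since="${1:-1 hour ago}"
  # -u selects the systemd unit, --since limits the time window,
  # --no-pager prints directly so the output can be piped to grep.
  journalctl -u kubelet --since "$since" --no-pager
}
```

For example, `kubelet_recent "30 min ago" | grep -i error` would show only the lines from the last half hour that mention an error.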




2. Use “kubectl logs” to view container logs

Our applications run in Pods, and k8s components such as kube-apiserver, coredns, etcd, kube-controller-manager, kube-proxy and kube-scheduler also run in Pods (some of them as static Pods). How do we view the logs of these components and of our applications? This is where the “kubectl logs” command mentioned earlier comes in.

The syntax is as follows:

kubectl logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER] [options]

The main parameters are described below:



-f, --follow

Whether to follow the log stream. Defaults to false; when set, new log output keeps being printed as it arrives.

-p, --previous

Print the logs of the previous, already terminated instance of the container in the Pod.

-c, --container

Container name.

--since

Return only logs newer than a relative duration (e.g. 5s, 2m or 3h). Defaults to all logs.

--since-time

Return only logs after a specific time. Defaults to all logs. Only one of since and since-time may be used.

--tail

Number of most recent log lines to display. Defaults to -1, which shows all lines.

--timestamps

Include a timestamp on each line of the log output.

-l, --selector

Filter by label selector.

With the main parameters covered, let's look at a few examples:

  • View the log of Pod “mssql-58b6bff865-xdxx8”

kubectl logs mssql-58b6bff865-xdxx8

  • View the log of the last 24 hours

kubectl logs mssql-58b6bff865-xdxx8 --since 24h

  • View Pod logs by label

kubectl logs -l app=mssql

  • View the log of a Pod in a specific namespace (note that system components live in the “kube-system” namespace)

kubectl logs kube-apiserver-k8s-master -f -n kube-system
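
The flags described above can be combined. As a minimal sketch, assuming a hypothetical helper named `crash_logs`, the following prints the last lines of the previous (terminated) container instance with timestamps, which is often what we want when a container keeps restarting.

```shell
# crash_logs is a hypothetical helper name, not part of kubectl.
# Usage: crash_logs POD [NAMESPACE] [LINES]
crash_logs() {
  pod="$1"
  ns="${2:-default}"
  lines="${3:-100}"
  # --previous reads the terminated instance of the container,
  # --tail limits the number of lines, --timestamps prefixes each line.
  kubectl logs "$pod" -n "$ns" --previous --tail="$lines" --timestamps
}
```

For example, `crash_logs mssql-58b6bff865-xdxx8 default 50` shows the last 50 lines written before the container last exited.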

View resource details and events

Besides viewing logs, we sometimes need to look at the details of a resource instance to help solve a problem. This is where the “kubectl describe” command mentioned earlier comes in.

The “kubectl describe” command shows the details of one or more resources, including related resources and events. The syntax is as follows:

kubectl describe (-f FILENAME | TYPE [NAME_PREFIX | -l label] | TYPE/NAME)

The main parameters are described below:



--all-namespaces

View resources across all namespaces.

-f, --filename

View the resources identified by a resource description file, directory, or URL.

-R, --recursive

Recursively process the directory given with -f.

-l, --selector

Filter by label selector.

--show-events

Show events.

With the main parameters covered, let's walk through some examples:

1. View nodes

View a specific node:

kubectl describe nodes k8s-node1

View all nodes:

kubectl describe nodes

View a specific node together with its events:

kubectl describe nodes k8s-node1 --show-events

Note: if a node's status is NotReady, looking at the node's events can help us troubleshoot the problem.
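
When only the events are of interest, they can also be listed on their own. The following is a sketch under the assumption that the events are still retained by the cluster; `node_events` is a hypothetical helper name, while the field selectors and `--sort-by` are standard kubectl options.

```shell
# node_events is a hypothetical helper name, not part of kubectl.
# Usage: node_events NODE_NAME
node_events() {
  node="$1"
  # The field selector restricts the list to events about one Node object;
  # --sort-by orders them by creation time, oldest first.
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$node" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, `node_events k8s-node1` lists the recorded events for the node used in the examples above.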


2. View Pods

View a specific Pod:

kubectl describe pods gitlab-84754bd77f-7tqcb

View all resources defined in a given description file:

kubectl describe -f teamcity.yaml

View resource configuration

Many application errors are caused by our own configuration. How do we view the configuration of resources that are already deployed? This is where the powerful “kubectl get” command comes in.

We have used “kubectl get” often, so far mainly to list resources. How do we use it to view resource configuration? Let's look at the syntax:

kubectl get [(-o|--output=)json|yaml|wide|custom-columns=...|custom-columns-file=...|go-template=...|go-template-file=...|jsonpath=...|jsonpath-file=...] (TYPE[.VERSION][.GROUP] [NAME | -l label] | TYPE[.VERSION][.GROUP]/NAME ...) [flags] [options]


As the syntax above shows, “kubectl get” has powerful formatted-output support, including “json”, “yaml” and more, which we covered in the earlier kubectl article. Here we mainly use “-o” to view resource configuration, as the following examples show:

  • View the configuration of a specific Pod

kubectl get pods mssql-58b6bff865-xdxx8 -o yaml

  • If the YAML is hard to read, view the JSON version instead:

kubectl get pods mssql-58b6bff865-xdxx8 -o json

  • View all Pods:

kubectl get pods -o json

  • View a Service's configuration

kubectl get svc mssql -o yaml

  • View a Deployment's configuration

kubectl get deployments mssql -o yaml




Note: once you are comfortable with “-o”, you no longer need to worry about not knowing how to write YAML.
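
The syntax above also lists `jsonpath` as an output format, which is handy when only one field of the configuration matters. As a minimal sketch (the helper name `pod_images` is an assumption for illustration), the following prints just the container images a Pod is configured to run.

```shell
# pod_images is a hypothetical helper name, not part of kubectl.
# Usage: pod_images POD [NAMESPACE]
pod_images() {
  pod="$1"
  ns="${2:-default}"
  # The JSONPath expression selects the image field of every container
  # in the Pod spec instead of printing the whole object.
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].image}'
}
```

For example, `pod_images mssql-58b6bff865-xdxx8` is a quick way to confirm which image version a Pod was actually deployed with.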


Debugging containers

Sometimes the logs and configuration alone do not yield a specific diagnosis, and we have to get hands-on with further checks or debugging to confirm our guess. I recommend the following approaches:


Use “kubectl exec” to debug inside the container

We can use “kubectl exec” to get into a container and debug there. This command is very similar to “docker exec”. The syntax is as follows:

kubectl exec (POD | TYPE/NAME) [-c CONTAINER] [flags] -- COMMAND [args...] [options]

The main parameters are described below:



-c, --container

Container name.

-i, --stdin

Keep standard input open.

-t, --tty

Allocate a pseudo-TTY (terminal device).

Let's look at some examples:

  • View a configuration file inside the container

kubectl exec mssql-58b6bff865-xdxx8 -- cat /etc/resolv.conf

  • Allocate a pseudo-terminal, attach standard input, and start bash in the container

kubectl exec -it mssql-58b6bff865-xdxx8 -- bash

Once inside the MSSQL database container, we can use the sqlcmd tool to run queries. If anything about this step is unclear, refer to the earlier article on running databases in containers.


Use the kubectl-debug tool to debug containers

kubectl-debug is a simple open-source kubectl plugin that makes it easy to diagnose and troubleshoot Pods on Kubernetes. What it does behind the scenes is simple: it starts an additional new container and joins it to the pid, network, user and ipc namespaces of the target container. We can then use familiar tools such as netstat and tcpdump directly in the new container to diagnose and solve the problem, while the original container can stay minimal, with no troubleshooting tools preinstalled.

GitHub address: https: //

The installation script is as follows (CentOS 7):

export PLUGIN_VERSION=0.1.1
# Linux x86_64: download the archive
curl -Lo kubectl-debug.tar.gz${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_linux_amd64.tar.gz
tar -zxvf kubectl-debug.tar.gz kubectl-debug
sudo mv kubectl-debug /usr/local/bin/

To make debugging faster and more convenient, we also need to install the debug-agent DaemonSet. The install command is as follows:

kubectl apply -f

It is very simple to use. Common examples:

# Print help
kubectl debug -h
# Start debugging
kubectl debug (POD | NAME)
# If the Pod is in the CrashLoopBackOff state and cannot be connected to, fork an identical copy of the Pod for diagnosis
kubectl debug (POD | NAME) --fork
# If the Node has no public IP or cannot be accessed directly (firewall, etc.), use port-forward mode
kubectl debug (POD | NAME) --port-forward --daemonset-ns=kube-system --daemonset-name=debug-agent

Next, we use the tool to debug an existing Pod:

kubectl debug teamcity-5997d4fc7f-ldt8w

After the command runs, it automatically pulls the required image, creates a container, opens a TTY, and drops us inside the container, which ships with a number of commonly used tools. Here we use the nslookup command to test whether an external domain name resolves from inside the Pod.

This way we do not have to install assorted tools every time we debug a network or application problem, which is time-consuming and sometimes painful when the network is poor.



Remedy

Following the diagnostic steps above, we need concrete information before we can fix a problem. For example, why can a Pod not be scheduled? Are resources (CPU, memory, etc.) insufficient, or does no node satisfy the scheduling requirements (for example, “nodeName” forces the Pod onto a specific node, and that node is down)? Only when we know the specific cause can we adjust and handle the case until the problem is resolved.

In general, Pod problems are the ones we encounter most often, so here is a summary of lessons learned.

  • The Pod stays in the Pending state, diagnosed as a lack of resources

Pending generally means the Pod has not been scheduled to a node. This is usually caused by insufficient resources.

Solutions include:

  1. Add worker nodes

  2. Remove some Pods to free up resources

  3. Lower the resource requirements (requests and limits) of the current Pod
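
Before picking one of these solutions, it helps to confirm which Pods are pending and what the scheduler reported. The following is a minimal sketch; the helper names `pending_pods` and `scheduling_failures` are assumptions for illustration, while the field selectors themselves are standard kubectl usage.

```shell
# pending_pods and scheduling_failures are hypothetical helper names.

# List every Pod, in all namespaces, whose phase is Pending.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending -o wide
}

# List the events the scheduler recorded when it could not place a Pod.
scheduling_failures() {
  kubectl get events --all-namespaces --field-selector reason=FailedScheduling
}
```

The message column of the FailedScheduling events usually states why no node was suitable, which tells us whether adding nodes or lowering the Pod's requirements is the better fix.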


  • The Pod stays in the Waiting state, diagnosed as an image pull failure

If a Pod is stuck in the Waiting state, it has been scheduled to a node but cannot start running there.

Solutions include:

  1. Check the network. If it is a network problem, make sure the network is reachable, and consider using a proxy or an international network (some domain names are not reachable from the domestic network, such as “”)

  2. If the pull times out, consider using an image accelerator (registry mirror), such as the ones offered by Alibaba Cloud or Tencent Cloud; adjusting the timeout appropriately is also an option

  3. Try docker pull to verify that the image can be pulled properly
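
To confirm that an image pull is really the cause, the waiting reason can be read from the Pod status. A minimal sketch, assuming a hypothetical helper named `waiting_reason`:

```shell
# waiting_reason is a hypothetical helper name, not part of kubectl.
# Usage: waiting_reason POD [NAMESPACE]
waiting_reason() {
  pod="$1"
  ns="${2:-default}"
  # Prints the reason recorded for each container that is currently waiting;
  # the output is empty when no container is in the waiting state.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}'
}
```

Reasons such as ErrImagePull or ImagePullBackOff point to the image pull; the events shown by “kubectl describe pods” then give the exact registry error.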


  • The Pod stays in the CrashLoopBackOff state, diagnosed as the container exiting because the health check timed out during startup

CrashLoopBackOff means the container did start but then exited abnormally. The restart count of such a Pod is usually greater than zero.

Solutions include:

  1. Set appropriate thresholds and retry counts for the health check

  2. Optimize the container's performance to speed up startup

  3. Disable the health check
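
Before changing the health check, it is worth looking at how often each container restarted and why it last terminated. The following is a sketch under the assumption that the Pod status still holds the last termination record; `restart_info` is a hypothetical helper name.

```shell
# restart_info is a hypothetical helper name, not part of kubectl.
# Usage: restart_info POD [NAMESPACE]
restart_info() {
  pod="$1"
  ns="${2:-default}"
  # For each container, print its name, restart count and the reason of
  # its last termination, separated by tabs, one container per line.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

Together with “kubectl logs --previous”, this helps distinguish a container that crashes on its own from one that is being restarted because its probe fails.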


    Previous articles in this series

    Docker + Kubernetes has become the mainstream of cloud computing (25)

    How to save costs after moving containers to the cloud? (26)

    Learn the main structure of Kubernetes (27)

    Use Minikube to deploy a local Kubernetes cluster (28)

    Use kubectl to manage a k8s cluster (29)

    Use kubeadm to create a k8s cluster: deployment plan (30)

    Use kubeadm to create a k8s cluster: deploying nodes (31)

    Cluster troubleshooting: the approach and health checks (32)


