Part 2: Deploying the Resource Metrics Component metrics-server on Kubernetes
Before Kubernetes v1.12, resource metrics were collected mainly by heapster. From v1.12 onward, heapster was gradually deprecated and replaced by metrics-server.
metrics-server is deployed inside the cluster as a Kubernetes add-on and provides resource metrics data. Its YAML deployment files are here.
This manifest cannot be applied as-is, however; it throws a number of errors. Below I record the problems I ran into and how to solve them.
Download the YAML deployment manifests
mkdir metrics-server
cd metrics-server
for file in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml; do
wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$file
done
metrics-server-deployment.yaml references two images hosted on k8s.gcr.io, which is unreachable from mainland China, so change them to the Alibaba Cloud mirror:
- registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
- registry.cn-hangzhou.aliyuncs.com/google_containers/addon-resizer:1.8.11
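If you would rather not edit the file by hand, a single sed substitution does the job (a sketch; it assumes both images in your copy are referenced directly under k8s.gcr.io):
# Rewrite the registry prefix in place, keeping a backup of the original
sed -i.bak 's#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g' metrics-server-deployment.yaml
grep 'image:' metrics-server-deployment.yaml    # verify the result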
The startup command of the metrics-server-nanny container in metrics-server-deployment.yaml contains several unexpanded template variables and cannot be used directly; change it to the following:
command:
- /pod_nanny
- --config-dir=/etc/config
- --cpu=100m
- --extra-cpu=20m
- --memory=100Mi
- --extra-memory=10Mi
- --threshold=5
- --deployment=metrics-server-v0.3.6
- --container=metrics-server
- --poll-period=300000
- --estimator=exponential
# Specifies the smallest cluster (defined in number of nodes)
# resources will be scaled to.
- --minClusterSize=2
# Use kube-apiserver metrics to avoid periodically listing nodes.
- --use-metrics=true
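For reference, the nanny (addon-resizer) sets metrics-server's resource requests from the cluster size: roughly base + extra × node count, where the exponential estimator first rounds the node count up to the nearest power of two, never below --minClusterSize. A sketch of the arithmetic for the four-node cluster used in this post (my reading of the flags above, not tool output):
# 4 nodes is already a power of two, so the estimator uses it unchanged
nodes=4
echo "cpu:    $((100 + 20 * nodes))m"    # --cpu + --extra-cpu * nodes  -> 180m
echo "memory: $((100 + 10 * nodes))Mi"   # --memory + --extra-memory * nodes -> 140Mi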
If your Kubernetes cluster is v1.18 or earlier, you also need to modify metrics-server-deployment.yaml (the seccompProfile field only exists from v1.19 on):
# Remove this block
securityContext:
  seccompProfile:
    type: RuntimeDefault
# and add this annotation to the pod template metadata (spec.template.metadata)
annotations:
  seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
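If you are not sure which case applies, check the server version first:
kubectl version --short | grep Server    # e.g. "Server Version: v1.19.3"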
Apply the YAML manifests
[root@k8s-master01 metrics-server]# kubectl apply -f .
The first problem
metrics-server fails to start at this point, logging the following errors:
[root@k8s-master01 metrics-server]# kubectl -n kube-system logs metrics-server-v0.3.6-5c99fdbfd9-wmfzv metrics-server
Flag --deprecated-kubelet-completely-insecure has been deprecated, This is rarely the right option, since it leaves kubelet communication completely insecure. If you encounter auth errors, make sure you've enabled token webhook auth on the Kubelet, and if you're in a test cluster with self-signed Kubelet certificates, consider using kubelet-insecure-tls instead.
I1028 01:21:02.451736 1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1028 01:21:03.243108 1 secure_serving.go:116] Serving securely on [::]:443
E1028 01:21:33.287357 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-node03: unable to fetch metrics from Kubelet k8s-node03 (172.16.135.13): Get http://172.16.135.13:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.13:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:k8s-node01: unable to fetch metrics from Kubelet k8s-node01 (172.16.135.11): Get http://172.16.135.11:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.11:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:k8s-master01: unable to fetch metrics from Kubelet k8s-master01 (172.16.135.10): Get http://172.16.135.10:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.10:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:k8s-node02: unable to fetch metrics from Kubelet k8s-node02 (172.16.135.12): Get http://172.16.135.12:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.12:10255: connect: connection refused]
(the same error repeats every 30 seconds for all four nodes)
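The connection refused errors all point at port 10255, the kubelet's read-only port, which current clusters disable by default. You can confirm this from any machine that reaches the nodes (IPs taken from the logs above; a quick diagnostic sketch):
# The read-only port refuses connections ...
curl -s http://172.16.135.11:10255/stats/summary || echo "10255 is closed"
# ... while the secure port (10250) is serving TLS; a 401/403 here still proves it is up
curl -sk -o /dev/null -w '%{http_code}\n' https://172.16.135.11:10250/stats/summary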
Solution:
Edit metrics-server-deployment.yaml so that the metrics-server container scrapes the kubelet's secure port instead of the read-only port, skipping verification of the kubelet's self-signed certificate:
containers:
- name: metrics-server
  image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
  command:
  - /metrics-server
  - --metric-resolution=30s
  - --kubelet-insecure-tls=true
  - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Apply the YAML:
kubectl apply -f metrics-server-deployment.yaml
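You can watch the rollout finish before checking the logs again (the Deployment name matches the --deployment flag set for the nanny above):
kubectl -n kube-system rollout status deployment metrics-server-v0.3.6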
The second problem
metrics-server still fails to come up; the logs now show:
[root@k8s-master01 metrics-server]# kubectl -n kube-system logs -f metrics-server-v0.3.6-5fd56d6f48-h54pb metrics-server
I1028 01:34:09.685892 1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1028 01:34:10.270012 1 secure_serving.go:116] Serving securely on [::]:443
E1028 01:34:40.330891 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-node03: unable to fetch metrics from Kubelet k8s-node03 (172.16.135.13): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:k8s-node01: unable to fetch metrics from Kubelet k8s-node01 (172.16.135.11): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:k8s-node02: unable to fetch metrics from Kubelet k8s-node02 (172.16.135.12): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:k8s-master01: unable to fetch metrics from Kubelet k8s-master01 (172.16.135.10): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
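The kubelet delegates this authorization decision to the API server, so you can reproduce the 403 without digging through logs (a sketch using kubectl's impersonation flag):
# Answers "no" before the RBAC fix below and "yes" after it
kubectl auth can-i get nodes --subresource=stats --as=system:serviceaccount:kube-system:metrics-server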
Solution:
The metrics-server service account is not authorized to read the nodes/stats subresource. Edit resource-reader.yaml and add nodes/stats to the ClusterRole's rules:
# ... (lines above omitted)
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
# ... (lines below omitted)
Apply the YAML:
kubectl apply -f resource-reader.yaml
Wait a moment for the pod to restart, or delete it so that the Deployment recreates it; the pod then comes up Running:
[root@k8s-master01 metrics-server]# kubectl -n kube-system get pod
NAME                                     READY   STATUS    RESTARTS   AGE
metrics-server-v0.3.6-7d8d945c9c-cwkj7   2/2     Running   0          4m44s
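Rather than waiting, you can force the restart yourself and then confirm the metrics API is registered (a sketch, assuming the addon's usual k8s-app=metrics-server pod label):
# Either restart the Deployment ...
kubectl -n kube-system rollout restart deployment metrics-server-v0.3.6
# ... or delete the pod and let the Deployment recreate it
kubectl -n kube-system delete pod -l k8s-app=metrics-server
# AVAILABLE should turn True once metrics-server is serving
kubectl get apiservice v1beta1.metrics.k8s.io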
View the metrics
Wait a few minutes for metrics-server to collect data, then view the metrics with kubectl top:
[root@k8s-master01 metrics-server]# kubectl top nodes
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master01   267m         13%    1180Mi          62%
k8s-node01     67m          1%     704Mi           8%
k8s-node02     72m          1%     724Mi           9%
k8s-node03     56m          1%     696Mi           8%
[root@k8s-master01 metrics-server]# kubectl top pods -n kube-system
NAME                                     CPU(cores)   MEMORY(bytes)
coredns-f9fd979d6-5tvpb                  4m           14Mi
coredns-f9fd979d6-zvb6h                  2m           16Mi
etcd-k8s-master01                        8m           125Mi
kube-apiserver-k8s-master01              35m          357Mi
kube-controller-manager-k8s-master01     6m           67Mi
kube-flannel-ds-259c7                    2m           20Mi
kube-flannel-ds-6bct7                    1m           20Mi
kube-flannel-ds-827cg                    1m           20Mi
kube-flannel-ds-8s2b5                    1m           10Mi
kube-proxy-pwkb2                         1m           22Mi
kube-proxy-q6gs2                         1m           18Mi
kube-proxy-r5ns5                         1m           29Mi
kube-proxy-rxvz4                         1m           19Mi
kube-scheduler-k8s-master01              2m           28Mi
metrics-server-v0.3.6-7d8d945c9c-cwkj7   13m          30Mi
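kubectl top is just a client for the metrics.k8s.io API that metrics-server registers, so the raw endpoints can also be queried directly, which is handy for scripting:
# NodeMetrics for every node, as JSON
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# PodMetrics for a single pod (names taken from the output above)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/metrics-server-v0.3.6-7d8d945c9c-cwkj7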