Before Kubernetes v1.12, resource metrics monitoring was mainly handled by heapster. From v1.12 on, heapster was gradually deprecated in favor of metrics-server.

metrics-server is deployed inside the cluster as a k8s add-on and serves resource metrics data. Its yaml manifests live in the Kubernetes repository under cluster/addons/metrics-server.

These manifests can't be applied as-is, though; they throw a handful of errors. Below I document each problem I ran into and how to fix it.

Download the yaml manifests

mkdir metrics-server
cd metrics-server

for file in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml; do
  wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$file
done

metrics-server-deployment.yaml references two images hosted on k8s.gcr.io. That registry is unreachable from mainland China, so switch to the Aliyun mirrors:

  • registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
  • registry.cn-hangzhou.aliyuncs.com/google_containers/addon-resizer:1.8.11
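If you'd rather script the swap than edit by hand, a sed one-liner can rewrite the registry prefix. This is a sketch that assumes the manifest's image lines carry the plain `k8s.gcr.io/` prefix; check with grep first, then run the commented `sed -i` form against the real file. The demo below applies the same expression to two sample lines:

```shell
# The real edit (run from the metrics-server directory, after checking with grep):
#   sed -i 's#image: k8s.gcr.io/#image: registry.cn-hangzhou.aliyuncs.com/google_containers/#' metrics-server-deployment.yaml
# Demonstrated here on the two image lines so the rewrite is visible:
sed 's#image: k8s.gcr.io/#image: registry.cn-hangzhou.aliyuncs.com/google_containers/#' <<'EOF'
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        image: k8s.gcr.io/addon-resizer:1.8.11
EOF
```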

In metrics-server-deployment.yaml, the startup arguments of the metrics-server-nanny container contain several template variables and can't be used directly; replace them with concrete values like so:

        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --cpu=100m
          - --extra-cpu=20m
          - --memory=100Mi
          - --extra-memory=10Mi
          - --threshold=5
          - --deployment=metrics-server-v0.3.6
          - --container=metrics-server
          - --poll-period=300000
          - --estimator=exponential
          # Specifies the smallest cluster (defined in number of nodes)
          # resources will be scaled to.
          - --minClusterSize=2
          # Use kube-apiserver metrics to avoid periodically listing nodes.
          - --use-metrics=true
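For sizing context: addon-resizer (the /pod_nanny binary above) scales metrics-server's resource requests linearly with cluster size, requests = base + extra × node count, which is what the --cpu/--extra-cpu and --memory/--extra-memory flags express. A quick arithmetic sketch for a 4-node cluster like the one in this post (the node count is taken from the kubectl top output later):

```shell
# addon-resizer sizing: requests = base + extra * nodes
# (clusters smaller than --minClusterSize=2 are treated as 2 nodes)
nodes=4                          # 1 master + 3 workers in this cluster
cpu_m=$((100 + 20 * nodes))      # --cpu=100m, --extra-cpu=20m
mem_mi=$((100 + 10 * nodes))     # --memory=100Mi, --extra-memory=10Mi
echo "metrics-server requests: ${cpu_m}m CPU, ${mem_mi}Mi memory"
```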

If your Kubernetes cluster is version 1.18 or earlier, you also need to modify metrics-server-deployment.yaml (the securityContext.seccompProfile field only became available in v1.19; older clusters use the alpha annotation instead):

# Delete this block:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
# and add this annotation under the pod template metadata (spec.template.metadata):
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
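Whether you need this tweak depends on the server version, and `sort -V` gives a quick way to compare version strings in a script. A small sketch; the `server_ver` value is an example you would take from `kubectl version`:

```shell
# Decide whether the cluster needs the legacy seccomp annotation (<= 1.18).
server_ver="1.18.6"   # example value; read the real one from kubectl version
needs_annotation() {
  # true when $1 sorts at or below 1.18.99 in version order
  [ "$(printf '%s\n' "$1" "1.18.99" | sort -V | head -n1)" = "$1" ]
}
if needs_annotation "$server_ver"; then
  echo "use the seccomp annotation"
else
  echo "keep securityContext.seccompProfile"
fi
```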

Apply the yaml manifests

[root@k8s-master01 metrics-server]# kubectl apply -f .

The first problem

At this point metrics-server fails to start, logging the following errors:

[root@k8s-master01 metrics-server]# kubectl -n kube-system logs metrics-server-v0.3.6-5c99fdbfd9-wmfzv metrics-server
Flag --deprecated-kubelet-completely-insecure has been deprecated, This is rarely the right option, since it leaves kubelet communication completely insecure.  If you encounter auth errors, make sure you've enabled token webhook auth on the Kubelet, and if you're in a test cluster with self-signed Kubelet certificates, consider using kubelet-insecure-tls instead.
I1028 01:21:02.451736       1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1028 01:21:03.243108       1 secure_serving.go:116] Serving securely on [::]:443
E1028 01:21:33.287357       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-node03: unable to fetch metrics from Kubelet k8s-node03 (172.16.135.13): Get http://172.16.135.13:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.13:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:k8s-node01: unable to fetch metrics from Kubelet k8s-node01 (172.16.135.11): Get http://172.16.135.11:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.11:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:k8s-master01: unable to fetch metrics from Kubelet k8s-master01 (172.16.135.10): Get http://172.16.135.10:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.10:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:k8s-node02: unable to fetch metrics from Kubelet k8s-node02 (172.16.135.12): Get http://172.16.135.12:10255/stats/summary?only_cpu_and_memory=true: dial tcp 172.16.135.12:10255: connect: connection refused]

The fix:

The manifest's default arguments have metrics-server scrape the kubelet read-only port (10255), which is disabled by default on newer kubelets. Modify metrics-server-deployment.yaml so the metrics-server container uses the secure port instead, skipping TLS verification since the kubelet serving certificates here are self-signed:

      containers:
      - name: metrics-server
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls=true
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP

Apply the yaml:

kubectl apply -f metrics-server-deployment.yaml

The second problem

metrics-server still won't come up; the logs now show 403 Forbidden errors:

[root@k8s-master01 metrics-server]# kubectl -n kube-system logs -f metrics-server-v0.3.6-5fd56d6f48-h54pb metrics-server
I1028 01:34:09.685892       1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1028 01:34:10.270012       1 secure_serving.go:116] Serving securely on [::]:443
E1028 01:34:40.330891       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-node03: unable to fetch metrics from Kubelet k8s-node03 (172.16.135.13): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:k8s-node01: unable to fetch metrics from Kubelet k8s-node01 (172.16.135.11): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:k8s-node02: unable to fetch metrics from Kubelet k8s-node02 (172.16.135.12): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:k8s-master01: unable to fetch metrics from Kubelet k8s-master01 (172.16.135.10): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]

The fix:

The metrics-server ServiceAccount lacks permission on the nodes/stats subresource. Modify resource-reader.yaml and add nodes/stats to the ClusterRole's resource list:

# ... preceding lines omitted
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
# ... remaining lines omitted
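Before re-applying, a quick grep confirms the rule actually made it into the file. The sketch below runs the check against an inline copy of the rules block; run the same grep on your resource-reader.yaml:

```shell
# Verify the ClusterRole rules now include nodes/stats.
# Against the real file: grep -n 'nodes/stats' resource-reader.yaml
cat <<'EOF' | grep -q 'nodes/stats' && echo "nodes/stats present" || echo "nodes/stats missing"
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
EOF
```

Once the updated ClusterRole is applied, `kubectl auth can-i get nodes/stats --as=system:serviceaccount:kube-system:metrics-server` should answer yes.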

Apply the yaml:

kubectl apply -f resource-reader.yaml

Wait a moment for the pod to restart (or delete it so it gets recreated); it should then show as Running:

[root@k8s-master01 metrics-server]# kubectl -n kube-system get pod
NAME                                     READY   STATUS    RESTARTS   AGE
metrics-server-v0.3.6-7d8d945c9c-cwkj7   2/2     Running   0          4m44s

View the metrics data

Wait a few minutes for metrics-server to collect data, and the kubectl top commands will return metrics:

[root@k8s-master01 metrics-server]# kubectl top nodes
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-master01   267m         13%    1180Mi          62%       
k8s-node01     67m          1%     704Mi           8%        
k8s-node02     72m          1%     724Mi           9%        
k8s-node03     56m          1%     696Mi           8%
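The kubectl top output is plain columns, so standard text tools work on it; for instance, sorting nodes by CPU usage highest-first. A sketch using the sample data above (against a live cluster you would pipe `kubectl top nodes --no-headers` instead of the heredoc):

```shell
# Sort nodes by the CPU(cores) column, highest first.
# Live cluster equivalent: kubectl top nodes --no-headers | sort -k2 -nr
sort -k2 -nr <<'EOF'
k8s-master01   267m         13%    1180Mi          62%
k8s-node01     67m          1%     704Mi           8%
k8s-node02     72m          1%     724Mi           9%
k8s-node03     56m          1%     696Mi           8%
EOF
```

This works because `sort -n` parses the leading digits of `267m` and ignores the `m` suffix, which is safe here since every value is in millicores.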

[root@k8s-master01 metrics-server]# kubectl top pods -n kube-system
NAME                                     CPU(cores)   MEMORY(bytes)   
coredns-f9fd979d6-5tvpb                  4m           14Mi            
coredns-f9fd979d6-zvb6h                  2m           16Mi            
etcd-k8s-master01                        8m           125Mi           
kube-apiserver-k8s-master01              35m          357Mi           
kube-controller-manager-k8s-master01     6m           67Mi            
kube-flannel-ds-259c7                    2m           20Mi            
kube-flannel-ds-6bct7                    1m           20Mi            
kube-flannel-ds-827cg                    1m           20Mi            
kube-flannel-ds-8s2b5                    1m           10Mi            
kube-proxy-pwkb2                         1m           22Mi            
kube-proxy-q6gs2                         1m           18Mi            
kube-proxy-r5ns5                         1m           29Mi            
kube-proxy-rxvz4                         1m           19Mi            
kube-scheduler-k8s-master01              2m           28Mi            
metrics-server-v0.3.6-7d8d945c9c-cwkj7   13m          30Mi
