Switching Kubernetes to Containerd

1. Environment Preparation

  • Ubuntu 20.04 x5
  • Etcd 3.4.16
  • Kubernetes 1.21.1
  • Containerd 1.3.3

1.1 Handling IPVS

Since the Service implementation is switched to IPVS in newer Kubernetes versions (kube-proxy is configured in IPVS mode later in this article), the kernel must have the IPVS modules loaded. The following commands configure the system to load the IPVS-related modules automatically at boot; a reboot is required after running them.

# Kernel modules
cat > /etc/modules-load.d/50-kubernetes.conf <<EOF
# Load some kernel modules needed by kubernetes at boot
nf_conntrack
br_netfilter
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
EOF

# sysctl
cat > /etc/sysctl.d/50-kubernetes.conf <<EOF
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
fs.inotify.max_user_watches=525000
EOF
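
If you would rather not wait for the reboot, the same settings can usually be applied immediately as a hedged shortcut (standard systemd/sysctl commands on Ubuntu 20.04; a reboot remains the safest way to confirm everything loads at boot):

# Optionally apply the new modules and sysctl settings right away
systemctl restart systemd-modules-load.service
sysctl --system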

After rebooting, be sure to check that the relevant modules are loaded and that the kernel parameters took effect:

# check ipvs modules
➜ ~ lsmod | grep ip_vs
ip_vs_sed 16384 0
ip_vs_nq 16384 0
ip_vs_fo 16384 0
ip_vs_sh 16384 0
ip_vs_dh 16384 0
ip_vs_lblcr 16384 0
ip_vs_lblc 16384 0
ip_vs_wrr 16384 0
ip_vs_rr 16384 0
ip_vs_wlc 16384 0
ip_vs_lc 16384 0
ip_vs 155648 22 ip_vs_wlc,ip_vs_rr,ip_vs_dh,ip_vs_lblcr,ip_vs_sh,ip_vs_fo,ip_vs_nq,ip_vs_lblc,ip_vs_wrr,ip_vs_lc,ip_vs_sed
nf_conntrack 139264 1 ip_vs
nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs
libcrc32c 16384 5 nf_conntrack,btrfs,xfs,raid456,ip_vs

# check sysctl
➜ ~ sysctl -a | grep ip_forward
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0

➜ ~ sysctl -a | grep bridge-nf-call
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

1.2 Installing Containerd

Containerd is included in the default official repositories of Ubuntu 20.04, so a plain apt install is all that is needed:

# The other packages may come in handy later, so install them while we are at it
apt install containerd bridge-utils nfs-common tree -y

After installation, run ctr images ls to verify that it works. This section does not go into Containerd configuration; the Containerd configuration file is set up later as part of the Kubernetes installation.
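
A quick sanity check after installing (a minimal sketch; containerd is the systemd unit shipped by the Ubuntu package):

systemctl enable --now containerd
systemctl status containerd --no-pager
ctr version
ctr images ls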

2. Installing Kubernetes

2.1 Installing the Etcd Cluster

For Kubernetes, Etcd is the core of the core, so I personally prefer installing it directly on the host. To make host installation easier I packaged a few *-pack tool bundles for quick setup:

Install CFSSL and Etcd

# Download the installer packages
wget https://github.com/mritd/etcd-pack/releases/download/v3.4.16/etcd_v3.4.16.run
wget https://github.com/mritd/cfssl-pack/releases/download/v1.5.0/cfssl_v1.5.0.run

# Install cfssl and etcd
chmod +x *.run
./etcd_v3.4.16.run install
./cfssl_v1.5.0.run install

After installation, adjust the IP addresses in /etc/cfssl/etcd/etcd-csr.json as needed, then run create.sh in the same directory to generate the certificates.
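
For example (a short sketch; create.sh is the script shipped by cfssl-pack, and the CSR being edited is shown below):

# Adjust the hosts/IPs in the CSR, then generate the certificates
vim /etc/cfssl/etcd/etcd-csr.json
cd /etc/cfssl/etcd && ./create.sh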

➜ ~ cat /etc/cfssl/etcd/etcd-csr.json
{
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "O": "etcd",
      "OU": "etcd Security",
      "L": "Beijing",
      "ST": "Beijing",
      "C": "CN"
    }
  ],
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "localhost",
    "*.etcd.node",
    "*.kubernetes.node",
    "10.0.0.11",
    "10.0.0.12",
    "10.0.0.13"
  ]
}

# Copy the certificates to the 3 masters
➜ ~ for ip in `seq 1 3`; do scp /etc/cfssl/etcd/*.pem root@10.0.0.1$ip:/etc/etcd/ssl; done

Once the certificates are generated, adjust the Etcd configuration file on each machine, then fix the permissions and start the service.

# Copy the configuration
for ip in `seq 1 3`; do scp /etc/etcd/etcd.cluster.yaml root@10.0.0.1$ip:/etc/etcd/etcd.yaml; done

# Fix permissions
for ip in `seq 1 3`; do ssh root@10.0.0.1$ip chown -R etcd:etcd /etc/etcd; done

# Start on each machine
systemctl start etcd

After startup, verify the cluster status with etcdctl:

# To be safe you should really run etcdctl endpoint health
➜ ~ etcdctl member list
55fcbe0adaa45350, started, etcd3, https://10.0.0.13:2380, https://10.0.0.13:2379, false
cebdf10928a06f3c, started, etcd1, https://10.0.0.11:2380, https://10.0.0.11:2379, false
f7a9c20602b8532e, started, etcd2, https://10.0.0.12:2380, https://10.0.0.12:2379, false
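
A hedged example of the endpoint health check with explicit TLS flags (the certificate paths match the ones referenced later in the kubeadm configuration; adjust them if your layout differs):

etcdctl --endpoints=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
        --cacert=/etc/etcd/ssl/etcd-ca.pem \
        --cert=/etc/etcd/ssl/etcd.pem \
        --key=/etc/etcd/ssl/etcd-key.pem \
        endpoint health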

2.2 Installing kubeadm

For users in mainland China, the Aliyun mirror is recommended for installing kubeadm:

# kubeadm
apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt update

# ebtables and ethtool may be needed by kubelet; I forget the details, but they are listed in the official docs
apt install kubelet kubeadm kubectl ebtables ethtool -y

2.3 Installing kube-apiserver-proxy

kube-apiserver-proxy is an Nginx build I compiled myself with only the layer-4 (stream) proxy enabled; its job is to listen on local port 6443 and load-balance to all of the API Server addresses (which listen on 0.0.0.0:5443):

wget https://github.com/mritd/kube-apiserver-proxy-pack/releases/download/v1.20.0/kube-apiserver-proxy_v1.20.0.run
chmod +x *.run
./kube-apiserver-proxy_v1.20.0.run install

After installation, adjust the Nginx configuration file to match your IP addresses, then start the service:

➜ ~ cat /etc/kubernetes/apiserver-proxy.conf
error_log syslog:server=unix:/dev/log notice;

worker_processes auto;
events {
    multi_accept on;
    use epoll;
    worker_connections 1024;
}

stream {
    upstream kube_apiserver {
        least_conn;
        server 10.0.0.11:5443;
        server 10.0.0.12:5443;
        server 10.0.0.13:5443;
    }

    server {
        listen 0.0.0.0:6443;
        proxy_pass kube_apiserver;
        proxy_timeout 10m;
        proxy_connect_timeout 1s;
    }
}

systemctl start kube-apiserver-proxy
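
To confirm the proxy actually came up and is listening on port 6443 (a simple check; ss ships with iproute2 on Ubuntu):

ss -lntp | grep 6443
systemctl status kube-apiserver-proxy --no-pager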

2.4 Installing kubeadm-config

kubeadm-config is a bundle of configuration files plus the images required for a kubeadm installation; once it is installed, Containerd, crictl, and so on are configured automatically:

wget https://github.com/mritd/kubeadm-config-pack/releases/download/v1.21.1/kubeadm-config_v1.21.1.run
chmod +x *.run

# The --load option loads the images required by kubeadm into containerd
./kubeadm-config_v1.21.1.run install --load
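
To double-check that the --load step covered everything, kubeadm itself can list the images it expects for this release, and ctr can show what is already in the k8s.io namespace (standard commands; nothing specific to the pack):

kubeadm config images list --kubernetes-version v1.21.1
ctr -n k8s.io images ls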

2.4.1 containerd Configuration

The Containerd configuration lives in /etc/containerd/config.toml and looks like this:

version = 2
# Storage root directory
root = "/data/containerd"
state = "/run/containerd"
# OOM score
oom_score = -999

[grpc]
  address = "/run/containerd/containerd.sock"

[metrics]
  address = "127.0.0.1:1234"

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    # Sandbox image
    sandbox_image = "k8s.gcr.io/pause:3.4.1"
    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          # Enable the systemd cgroup driver
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
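
After changing this file, containerd needs a restart to pick up the new settings; the effective CRI configuration can then be inspected (this assumes the crictl configuration described in the next section is in place):

# Restart to pick up the new configuration
systemctl restart containerd
# Dump the CRI runtime status and config as seen through the CRI socket
crictl info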

2.4.2 crictl Configuration

After the switch to Containerd the old docker commands are no longer available. Containerd ships with its own ctr command, and the CRI tooling (cri-tools) provides a crictl command; the crictl configuration file lives at /etc/crictl.yaml:

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
pull-image-on-create: true
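
With that in place, crictl covers most day-to-day docker-style inspection; a few common sub-commands from standard cri-tools (the container ID placeholders are just examples):

crictl pods                     # list pod sandboxes
crictl ps -a                    # list containers
crictl images                   # list images
crictl logs <container-id>      # container logs
crictl exec -it <container-id> sh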

2.4.3 kubeadm Configuration

There are currently two kubeadm configurations: an init configuration used for the initial bootstrap, and a join configuration used by the other nodes to join the master. The more important init configuration looks like this:

# /etc/kubernetes/kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
# kubeadm token create
bootstrapTokens:
- token: "c2t0rj.cofbfnwwrb387890"
nodeRegistration:
  # CRI socket address (Containerd)
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    runtime-cgroups: "/system.slice/containerd.service"
    rotate-server-certificates: "true"
localAPIEndpoint:
  advertiseAddress: "10.0.0.11"
  bindPort: 5443
# kubeadm certs certificate-key
certificateKey: 31f1e534733a1607e5ba67b2834edd3a7debba41babb1fac1bee47072a98d88b
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
clusterName: "kubernetes"
kubernetesVersion: "v1.21.1"
certificatesDir: "/etc/kubernetes/pki"
# Other components of the current control plane only connect to the apiserver on the current host.
# This is the expected behavior, see: https://github.com/kubernetes/kubeadm/issues/2271
controlPlaneEndpoint: "127.0.0.1:6443"
etcd:
  external:
    endpoints:
    - "https://10.0.0.11:2379"
    - "https://10.0.0.12:2379"
    - "https://10.0.0.13:2379"
    caFile: "/etc/etcd/ssl/etcd-ca.pem"
    certFile: "/etc/etcd/ssl/etcd.pem"
    keyFile: "/etc/etcd/ssl/etcd-key.pem"
networking:
  serviceSubnet: "10.66.0.0/16"
  podSubnet: "10.88.0.1/16"
  dnsDomain: "cluster.local"
apiServer:
  extraArgs:
    v: "4"
    alsologtostderr: "true"
    # audit-log-maxage: "21"
    # audit-log-maxbackup: "10"
    # audit-log-maxsize: "100"
    # audit-log-path: "/var/log/kube-audit/audit.log"
    # audit-policy-file: "/etc/kubernetes/audit-policy.yaml"
    authorization-mode: "Node,RBAC"
    event-ttl: "720h"
    runtime-config: "api/all=true"
    service-node-port-range: "30000-50000"
    service-cluster-ip-range: "10.66.0.0/16"
    # insecure-bind-address: "0.0.0.0"
    # insecure-port: "8080"
    # The fraction of requests that will be closed gracefully(GOAWAY) to prevent
    # HTTP/2 clients from getting stuck on a single apiserver.
    goaway-chance: "0.001"
  # extraVolumes:
  # - name: "audit-config"
  #   hostPath: "/etc/kubernetes/audit-policy.yaml"
  #   mountPath: "/etc/kubernetes/audit-policy.yaml"
  #   readOnly: true
  #   pathType: "File"
  # - name: "audit-log"
  #   hostPath: "/var/log/kube-audit"
  #   mountPath: "/var/log/kube-audit"
  #   pathType: "DirectoryOrCreate"
  certSANs:
  - "*.kubernetes.node"
  - "10.0.0.11"
  - "10.0.0.12"
  - "10.0.0.13"
  timeoutForControlPlane: 1m
controllerManager:
  extraArgs:
    v: "4"
    node-cidr-mask-size: "19"
    deployment-controller-sync-period: "10s"
    experimental-cluster-signing-duration: "8670h"
    node-monitor-grace-period: "20s"
    pod-eviction-timeout: "2m"
    terminated-pod-gc-threshold: "30"
scheduler:
  extraArgs:
    v: "4"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
oomScoreAdj: -900
cgroupDriver: "systemd"
kubeletCgroups: "/system.slice/kubelet.service"
nodeStatusUpdateFrequency: 5s
rotateCertificates: true
evictionSoft:
  "imagefs.available": "15%"
  "memory.available": "512Mi"
  "nodefs.available": "15%"
  "nodefs.inodesFree": "10%"
evictionSoftGracePeriod:
  "imagefs.available": "3m"
  "memory.available": "1m"
  "nodefs.available": "3m"
  "nodefs.inodesFree": "1m"
evictionHard:
  "imagefs.available": "10%"
  "memory.available": "256Mi"
  "nodefs.available": "10%"
  "nodefs.inodesFree": "5%"
evictionMaxPodGracePeriod: 30
imageGCLowThresholdPercent: 70
imageGCHighThresholdPercent: 80
kubeReserved:
  "cpu": "500m"
  "memory": "512Mi"
  "ephemeral-storage": "1Gi"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# kube-proxy specific options here
clusterCIDR: "10.88.0.1/16"
mode: "ipvs"
oomScoreAdj: -900
ipvs:
  minSyncPeriod: 5s
  syncPeriod: 5s
  scheduler: "wrr"

Please refer to the official documentation for what each init option means. Compared with the init configuration, the join configuration is fairly simple; note that if the node is to join as a master the controlPlane section is required, otherwise comment controlPlane out.

# /etc/kubernetes/kubeadm-join.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
controlPlane:
  localAPIEndpoint:
    advertiseAddress: "10.0.0.12"
    bindPort: 5443
  certificateKey: 31f1e534733a1607e5ba67b2834edd3a7debba41babb1fac1bee47072a98d88b
discovery:
  bootstrapToken:
    apiServerEndpoint: "127.0.0.1:6443"
    token: "c2t0rj.cofbfnwwrb387890"
    # Please replace with the "--discovery-token-ca-cert-hash" value printed
    # after the kubeadm init command is executed successfully
    caCertHashes:
    - "sha256:97590810ae34a82501717e33acfca76f16044f1a365c5ad9a1c66433c386c75c"
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    runtime-cgroups: "/system.slice/containerd.service"
    rotate-server-certificates: "true"

2.5 Bringing Up the First Master

With the configuration in place, bringing up the master node takes a single command:

kubeadm init --config /etc/kubernetes/kubeadm.yaml --upload-certs --ignore-preflight-errors=Swap

Once it is up, remember to save the relevant tokens for later use.
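
If the token or certificate key is lost later, they can be regenerated instead of re-initializing (standard kubeadm sub-commands):

# Print a fresh join command (includes a new token and the CA cert hash)
kubeadm token create --print-join-command
# Re-upload the control-plane certificates and print a new certificate key
kubeadm init phase upload-certs --upload-certs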

2.6 Bringing Up the Other Masters

After the first master is up, the other masters only need to join with the join command; note that the caCertHashes value in kubeadm-join.yaml must be replaced with the discovery-token-ca-cert-hash value printed when the first master came up.
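
If that hash was not saved, it can be recomputed from the cluster CA on the first master (the standard command from the kubeadm documentation); then run the join command below as usual:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'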

kubeadm join 127.0.0.1:6443 --config /etc/kubernetes/kubeadm-join.yaml --ignore-preflight-errors=Swap

2.7 Bringing Up the Other Nodes

Bringing up a worker node works the same way as joining another master; the only difference is that the controlPlane section in the configuration must be commented out.

# /etc/kubernetes/kubeadm-join.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
#controlPlane:
#  localAPIEndpoint:
#    advertiseAddress: "10.0.0.12"
#    bindPort: 5443
#  certificateKey: 31f1e534733a1607e5ba67b2834edd3a7debba41babb1fac1bee47072a98d88b
discovery:
  bootstrapToken:
    apiServerEndpoint: "127.0.0.1:6443"
    token: "c2t0rj.cofbfnwwrb387890"
    # Please replace with the "--discovery-token-ca-cert-hash" value printed
    # after the kubeadm init command is executed successfully
    caCertHashes:
    - "sha256:97590810ae34a82501717e33acfca76f16044f1a365c5ad9a1c66433c386c75c"
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    runtime-cgroups: "/system.slice/containerd.service"
    rotate-server-certificates: "true"

kubeadm join 127.0.0.1:6443 --config /etc/kubernetes/kubeadm-join.yaml --ignore-preflight-errors=Swap

2.8 Additional Steps

Because kubelet has certificate rotation enabled, a new cluster will produce a large number of CSRs; just approve them in bulk:

kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve

To let the master nodes schedule pods as well, adjust the taints:

kubectl taint nodes --all node-role.kubernetes.io/master-

Follow-up steps such as installing a CNI are outside the scope of this article.

3. Common Containerd Operations

# List images
ctr images ls

# List Kubernetes images (k8s.io namespace)
ctr -n k8s.io images ls

# Import an image
ctr -n k8s.io images import xxxx.tar

# Export an image
ctr -n k8s.io images export kube-scheduler.tar k8s.gcr.io/kube-scheduler:v1.21.1
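
A couple of other operations that come up often (standard ctr sub-commands; the nginx image reference is only an example):

# Pull an image into the k8s.io namespace (full reference required)
ctr -n k8s.io images pull docker.io/library/nginx:1.21

# Delete an image
ctr -n k8s.io images rm docker.io/library/nginx:1.21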

4. Resource Repositories

All of the *-pack repositories used in this article:

  • https://github.com/mritd/etcd-pack
  • https://github.com/mritd/cfssl-pack
  • https://github.com/mritd/kube-apiserver-proxy-pack
  • https://github.com/mritd/kubeadm-config-pack

