IP Address | Hostname | Minimum Spec | Kernel Version
--- | --- | --- | ---
192.168.74.101 | k8s-unode1 | 2 CPU 4G | Linux k8s-unode1 5.15.0-43-generic
192.168.74.102 | k8s-unode2 | 2 CPU 4G | Linux k8s-unode2 5.15.0-43-generic
192.168.74.103 | k8s-unode3 | 2 CPU 4G | Linux k8s-unode3 5.15.0-43-generic
192.168.74.104 | k8s-unode4 | 2 CPU 4G | Linux k8s-unode4 5.15.0-43-generic
For the preparation work before installing on Ubuntu, see the earlier article: Optimizations after a minimal install of Ubuntu 22.04.
Architecture diagram
CoreDNS forwards queries for names outside the cluster (i.e., a Pod resolving a public domain) to the nameservers listed in the host's /etc/resolv.conf. On Ubuntu, that file points at a local stub resolver on 127.0.0.53, which proxies all DNS requests on the host to upstream servers. Inside a Pod that loopback address is unreachable, so Pods cannot resolve external domain names.
We therefore need to disable the 127.0.0.53 stub listener that systemd-resolved sets up in resolv.conf on Ubuntu.
PS: simply writing nameserver 114.114.114.114 into /etc/resolv.conf does not help, because the file is reset to 127.0.0.53 after every reboot.
Override Ubuntu 20.04 DNS using systemd-resolved
Open /etc/systemd/resolved.conf and change it to:
[Resolve]
DNS=114.114.114.114
#FallbackDNS=
#Domains=
LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
DNSStubListener=no
LLMNR=no disables LLMNR (Link-Local Multicast Name Resolution); otherwise systemd-resolved would also listen on port 5355.
rm /etc/resolv.conf
ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl restart systemd-resolved
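After restarting systemd-resolved, it's worth confirming that the stub listener is really gone and that the resolver file Pods will inherit now points at the real upstream. A quick check (the expected values are an assumption based on the config above):

# /etc/resolv.conf should now be a symlink to the real resolver list
$ ls -l /etc/resolv.conf
$ grep nameserver /etc/resolv.conf     # expect 114.114.114.114, not 127.0.0.53
# nothing should be listening on the stub address any more
$ ss -lntup | grep 127.0.0.53 || echo "stub listener disabled"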
Reference: https://unix.stackexchange.com/questions/588658/override-ubuntu-20-04-dns-using-systemd-resolved
Download the installation package. Here we use the latest version at the time of writing, 1.6.6.
We download the package that bundles the CRI plugin and containerd.
We do not download the package that also bundles CNI, because the CNI plugins it ships are version 0.8.6, while we need 1.0.0 or newer.
At the end of this article I describe the problem I ran into when using the CNI plugins bundled with containerd.
$ cd /usr/local/src/
$ wget https://github.com/containerd/containerd/releases/download/v1.6.6/cri-containerd-1.6.6-linux-amd64.tar.gz
$ mkdir cri-containerd
$ tar zxvf cri-containerd-1.6.6-linux-amd64.tar.gz -C cri-containerd/
$ cd cri-containerd/ && ls -l
total 12
drwxr-xr-x 3 root root 4096 Aug  4 17:01 etc
drwxr-xr-x 3 root root 4096 Aug  4 17:01 opt
drwxr-xr-x 3 root root 4096 Jun  7 01:34 usr
Copy the files to their destinations. Looking at the opt directory, its contents appear to be related to Google's GCE; we don't need them, so we skip copying them.
$ mv etc/crictl.yaml /etc/
$ mv etc/systemd/system/containerd.service /etc/systemd/system/
$ mv usr/local/bin/* /usr/local/bin/
$ mv usr/local/sbin/runc /usr/local/sbin/
Generate the containerd configuration file
$ mkdir -p /etc/containerd
$ containerd config default > /etc/containerd/config.toml
Modify the configuration file as follows (the three changes are marked with numbered comments):
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
[cgroup]
  path = ""
[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0
[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0
[metrics]
  address = ""
  grpc_histogram = false
[plugins]
  [plugins."io.containerd.gc.v1.scheduler"]
    deletion_threshold = 0
    mutation_threshold = 100
    pause_threshold = 0.02
    schedule_delay = "0s"
    startup_delay = "100ms"
  [plugins."io.containerd.grpc.v1.cri"]
    device_ownership_from_security_context = false
    disable_apparmor = false
    disable_cgroup = false
    disable_hugetlb_controller = true
    disable_proc_mount = false
    disable_tcp_service = true
    enable_selinux = false
    enable_tls_streaming = false
    enable_unprivileged_icmp = false
    enable_unprivileged_ports = false
    ignore_image_defined_volumes = false
    max_concurrent_downloads = 3
    max_container_log_line_size = 16384
    netns_mounts_under_state_dir = false
    restrict_oom_score_adj = false
    # 1. Change the Kubernetes sandbox (pause) image to a domestic mirror
    sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"
    selinux_category_range = 1024
    stats_collect_period = 10
    stream_idle_timeout = "4h0m0s"
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    systemd_cgroup = false
    tolerate_missing_hugetlb_controller = true
    unset_seccomp_profile = ""
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      conf_template = ""
      ip_pref = ""
      max_conf_num = 1
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "overlayfs"
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            # 2. Use the systemd cgroup driver with runc, see:
            # https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#containerd
            SystemdCgroup = true
      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = "node"
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = ""
      [plugins."io.containerd.grpc.v1.cri".registry.auths]
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
      [plugins."io.containerd.grpc.v1.cri".registry.headers]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        # 3. Configure registry mirrors for faster image pulls
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://fz5yth0r.mirror.aliyuncs.com","https://dockerhub.mirrors.nwafu.edu.cn","https://docker.mirrors.ustc.edu.cn","https://registry.docker-cn.com"]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"]
          endpoint = ["https://gcr.mirrors.ustc.edu.cn"]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
          endpoint = ["https://registry.cn-hangzhou.aliyuncs.com/google_containers"]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]
          endpoint = ["https://quay.mirrors.ustc.edu.cn"]
    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"
  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"
  [plugins."io.containerd.internal.v1.tracing"]
    sampling_ratio = 1.0
    service_name = "containerd"
  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"
  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false
  [plugins."io.containerd.runtime.v1.linux"]
    no_shim = false
    runtime = "runc"
    runtime_root = ""
    shim = "containerd-shim"
    shim_debug = false
  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
    sched_core = false
  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]
  [plugins."io.containerd.service.v1.tasks-service"]
    rdt_config_file = ""
  [plugins."io.containerd.snapshotter.v1.aufs"]
    root_path = ""
  [plugins."io.containerd.snapshotter.v1.btrfs"]
    root_path = ""
  [plugins."io.containerd.snapshotter.v1.devmapper"]
    async_remove = false
    base_image_size = ""
    discard_blocks = false
    fs_options = ""
    fs_type = ""
    pool_name = ""
    root_path = ""
  [plugins."io.containerd.snapshotter.v1.native"]
    root_path = ""
  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    upperdir_label = false
  [plugins."io.containerd.snapshotter.v1.zfs"]
    root_path = ""
  [plugins."io.containerd.tracing.processor.v1.otlp"]
    endpoint = ""
    insecure = false
    protocol = ""
[proxy_plugins]
[stream_processors]
  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar"
  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar+gzip"
[timeouts]
  "io.containerd.timeout.bolt.open" = "0s"
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"
[ttrpc]
  address = ""
  gid = 0
  uid = 0
Start containerd
# Start containerd and enable it at boot
$ systemctl daemon-reload && systemctl enable containerd && systemctl start containerd
# Check that it started successfully
$ systemctl status containerd
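Beyond systemctl status, a minimal sanity check is to talk to containerd over the CRI socket with crictl (installed above together with crictl.yaml). A sketch, assuming the default socket path from config.toml:

# Verify the CRI endpoint answers and reports the expected runtime version
$ crictl --runtime-endpoint unix:///run/containerd/containerd.sock version
# Dump the runtime status/config as seen through CRI (cgroup driver, CNI dirs, ...)
$ crictl --runtime-endpoint unix:///run/containerd/containerd.sock info | head -n 20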
Here we use an nginx local proxy to provide HA for the Kubernetes API server. This works without an SLB and can also be deployed inside a public-cloud VPC; nginx must run on every machine.
Append the following to the hosts file on all nodes:
$ cat>>/etc/hosts <
You can also skip the hosts entries and use IP addresses directly in the nginx local proxy, but then every IP change requires editing and reloading the config, so domain names are recommended; a plausible set of entries is sketched below.
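The heredoc body isn't shown above. Based on the nginx upstream names and the certSANs used later in initconfig.yml, the entries would plausibly look like this (mapping apiserver.k8s.local to 127.0.0.1 so each node talks to its own local proxy is an assumption, adjust to your topology):

$ cat >> /etc/hosts <<EOF
127.0.0.1      apiserver.k8s.local
192.168.74.101 apiserver01.k8s.local etcd01.k8s.local
192.168.74.102 apiserver02.k8s.local etcd02.k8s.local
192.168.74.103 apiserver03.k8s.local etcd03.k8s.local
EOF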
$ apt install nginx
$ mv /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak
$ mkdir -p /etc/kubernetes
$ vim /etc/kubernetes/kube-nginx.conf
The configuration file content is as follows:
error_log /var/log/nginx/error.log notice;
user www-data;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
worker_processes 2;
worker_rlimit_nofile 130048;
worker_shutdown_timeout 10s;

events {
  multi_accept on;
  use epoll;
  worker_connections 16384;
}

stream {
  upstream kube_apiserver {
    least_conn;
    server apiserver01.k8s.local:6443 max_fails=3 fail_timeout=10s;
    server apiserver02.k8s.local:6443 max_fails=3 fail_timeout=10s;
    server apiserver03.k8s.local:6443 max_fails=3 fail_timeout=10s;
  }
  server {
    listen 8443;
    proxy_pass kube_apiserver;
    proxy_timeout 10m;
    proxy_connect_timeout 1s;
  }
}

http {
  aio threads;
  aio_write on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 75s;
  keepalive_requests 100;
  reset_timedout_connection on;
  server_tokens off;
  autoindex off;
  server {
    listen 8081;
    location /healthz {
      access_log off;
      return 200;
    }
    location /stub_status {
      stub_status on;
      access_log off;
    }
  }
}
Start the nginx local proxy
$ ln -s /etc/kubernetes/kube-nginx.conf /etc/nginx/nginx.conf
$ nginx -t
$ systemctl daemon-reload
$ systemctl restart nginx
$ ss -lntp
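With the proxy running, ports 8443 (stream proxy) and 8081 (status server) should be listening; a quick check, assuming the config above:

$ ss -lntp | grep -E '8443|8081'
# the /healthz location simply returns 200
$ curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8081/healthz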
On worker nodes, kubectl is optional.
$ apt install -y kubeadm kubectl kubelet
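Since the cluster is pinned to a specific Kubernetes version, it is common practice (not something this article does explicitly) to hold the packages so a routine apt upgrade doesn't move them unexpectedly:

$ apt-mark hold kubeadm kubelet kubectl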
We need to replace the CNI plugins installed from the package repository: the packaged version is 0.8.6, which causes the "incompatible CNI versions" problem, so we have to update them. See the last section of this article for details.
Install the CNI plugins
Download: here we use CNI plugins v1.1.1.
$ wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
$ mkdir cni-plugins/
$ tar zxvf cni-plugins-linux-amd64-v1.1.1.tgz -C cni-plugins/
$ mkdir -p /opt/cni/bin
$ mv cni-plugins/* /opt/cni/bin/
Print the default configuration
$ kubeadm config print init-defaults > initconfig.yml
The default configuration file looks like this:
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.24.3
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
We mainly care about, and keep only, the ClusterConfiguration section, then modify it. The final version is:
---
# Customize components via the kubeadm API
# https://kubernetes.io/zh-cn/docs/reference/config-api/kubeadm-config.v1beta3/
# General configuration
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
kubernetesVersion: 1.24.3
controlPlaneEndpoint: apiserver.k8s.local:8443 # with a single master, use the master's IP or leave this out
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
# kube-apiserver customization
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    runtime-config: api/all=true
    storage-backend: etcd3
    etcd-servers: https://192.168.74.101:2379,https://192.168.74.102:2379,https://192.168.74.103:2379
  certSANs:
  - 10.96.0.1 # first IP of the service CIDR
  - 127.0.0.1 # with multiple masters, lets you fall back to localhost for debugging if the load balancer breaks
  - localhost
  - apiserver.k8s.local # load-balancer domain name or VIP
  - 192.168.74.101
  - 192.168.74.102
  - 192.168.74.103
  - apiserver01.k8s.local
  - apiserver02.k8s.local
  - apiserver03.k8s.local
  - master
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
  - kubernetes.default.svc.cluster.local
  extraVolumes: # keep the same time zone as the host
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
# kube-controller-manager customization
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0" # if the host has multiple IPs, better to pin a specific one
    experimental-cluster-signing-duration: 876000h
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
etcd:
  local:
    dataDir: /var/lib/etcd
    serverCertSANs: # localhost, 127.0.0.1 and ::1 are included by default for server and peer certs, no need to list them
    - master
    - 192.168.74.101
    - 192.168.74.102
    - 192.168.74.103
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    peerCertSANs:
    - master
    - 192.168.74.101
    - 192.168.74.102
    - 192.168.74.103
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    extraArgs: # upstream does not support extraVolumes for etcd yet
      auto-compaction-retention: "1h"
      max-request-bytes: "33554432"
      quota-backend-bytes: "8589934592"
      enable-v2: "false" # disable etcd v2 api
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration # https://godoc.org/k8s.io/kube-proxy/config/v1alpha1#KubeProxyConfiguration
mode: ipvs # or iptables
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr" # scheduling algorithm
  syncPeriod: 15s
iptables:
  masqueradeAll: true
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration # https://godoc.org/k8s.io/kubelet/config/v1beta1#KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true # set to false if swap is enabled
Validate the configuration file
$ kubeadm init --config initconfig.yml --dry-run
The warning below appears because we use port 8443, where the nginx local proxy does the local load balancing; it overrides the default port 6443 and can be ignored:
port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
Pre-pull the images
$ kubeadm config images pull --config initconfig.yml
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.24.3
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.24.3
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.24.3
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.24.3
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.7
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.3-0
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.6
Run the following init command only on the first master:
$ kubeadm init --config initconfig.yml --upload-certs
The key parts of the output are:
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
W0803 16:16:22.731226   40287 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

----------------------------- configure kubectl ----------------------------------
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

------------------------------ join a master node --------------------------------
  kubeadm join apiserver.k8s.local:8443 --token asexb7.8yfzenkoco264qf5 \
        --discovery-token-ca-cert-hash sha256:a207febabb4d9fecc98f76cb4cc4f93563c7369e0d5cf9a970c8ffcf780601dc \
        --control-plane --certificate-key 183cef7625aec22a0153aa603bcdb36e5c2dd6267fa0e0ee2875f1f02c69709e

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

---------------------------- join a worker node -----------------------------------
kubeadm join apiserver.k8s.local:8443 --token asexb7.8yfzenkoco264qf5 \
        --discovery-token-ca-cert-hash sha256:a207febabb4d9fecc98f76cb4cc4f93563c7369e0d5cf9a970c8ffcf780601dc
Using the commands above, configure kubectl and join the remaining nodes.
Optional configuration
The kubeadm init config (the YAML used to initialize the cluster) is stored in a ConfigMap in the cluster and can be retrieved with:
$ kubectl -n kube-system get configmap kubeadm-config -o yaml
The following command removes the taint from the master nodes so they can run regular workloads; do not do this in production:
$ kubectl taint nodes --all node-role.kubernetes.io/master-
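Note that in v1.24 kubeadm applies both the legacy node-role.kubernetes.io/master taint and the newer node-role.kubernetes.io/control-plane taint, so to actually schedule regular workloads on the masters you would likely need to remove both (again, not for production):

$ kubectl taint nodes --all node-role.kubernetes.io/master-
$ kubectl taint nodes --all node-role.kubernetes.io/control-plane-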
If the token is lost or its TTL has expired, list existing tokens or create a new one with:
$ kubeadm token list
$ kubeadm token create --print-join-command
The sha256 value can be obtained with:
$ openssl x509 -pubkey -in \
    /etc/kubernetes/pki/ca.crt | \
    openssl rsa -pubin -outform der 2>/dev/null | \
    openssl dgst -sha256 -hex | sed 's/^.* //'
Set up kubectl shell completion
$ kubectl completion bash > /etc/bash_completion.d/kubectl
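The completion script only takes effect in new login shells; to enable it in the current session you can source it directly (assuming the bash-completion package is installed):

$ source /etc/bash_completion.d/kubectl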
Follow the steps described earlier.
The join command can be obtained on any one of the masters.
Run the join command on the node that is actually joining.
1: Basic system initialization
2: Set the hostname, time synchronization, etc.
3: Install containerd and start the nginx local proxy
4: Configure the package repositories; installing only kubeadm and kubelet is enough
5: Join the node with the command shown above
# The token is visible here, but it expires; the TTL column shows about 23 hours left
$ kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION                                           EXTRA GROUPS
asexb7.8yfzenkoco264qf5   23h   2022-08-04T08:16:22Z   authentication,signing                                                         system:bootstrappers:kubeadm:default-node-token
iqgk0q.uh25jj8wytrjiyr9   1h    2022-08-03T10:16:21Z                            Proxy for managing TTL for the kubeadm-certs secret

# Get the sha256 value with the following command
$ openssl x509 -pubkey -in \
    /etc/kubernetes/pki/ca.crt | \
    openssl rsa -pubin -outform der 2>/dev/null | \
    openssl dgst -sha256 -hex | sed 's/^.* //'
a207febabb4d9fecc98f76cb4cc4f93563c7369e0d5cf9a970c8ffcf780601dc

# The assembled join command then looks like this
kubeadm join apiserver.k8s.local:8443 --token asexb7.8yfzenkoco264qf5 \
    --discovery-token-ca-cert-hash sha256:a207febabb4d9fecc98f76cb4cc4f93563c7369e0d5cf9a970c8ffcf780601dc
If the token has already expired, we can get a complete join command directly with:
$ kubeadm token create --print-join-command
kubeadm join apiserver.k8s.local:8443 --token qek2d8.b4v1w3mxzp00mn84 --discovery-token-ca-cert-hash sha256:a207febabb4d9fecc98f76cb4cc4f93563c7369e0d5cf9a970c8ffcf780601dc

# A new token has now been generated
$ kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION                                           EXTRA GROUPS
asexb7.8yfzenkoco264qf5   23h   2022-08-04T08:16:22Z   authentication,signing                                                         system:bootstrappers:kubeadm:default-node-token
iqgk0q.uh25jj8wytrjiyr9   1h    2022-08-03T10:16:21Z                            Proxy for managing TTL for the kubeadm-certs secret
qek2d8.b4v1w3mxzp00mn84   23h   2022-08-04T09:04:57Z   authentication,signing                                                         system:bootstrappers:kubeadm:default-node-token
Add ROLES
ROLES is just a label; set node-role.kubernetes.io/xxxx to make the ROLES column show whatever string you want.
kubectl label node k8s-unode4 node-role.kubernetes.io/worker=""
Verify ROLES
$ kubectl get node
NAME         STATUS     ROLES           AGE   VERSION
k8s-unode1   NotReady   control-plane   60m   v1.24.3
k8s-unode2   NotReady   control-plane   56m   v1.24.3
k8s-unode3   NotReady   control-plane   42m   v1.24.3
k8s-unode4   NotReady   <none>          41m   v1.24.3

$ kubectl label node k8s-unode4 node-role.kubernetes.io/worker=""
node/k8s-unode4 labeled

$ kubectl get node
NAME         STATUS     ROLES           AGE   VERSION
k8s-unode1   NotReady   control-plane   60m   v1.24.3
k8s-unode2   NotReady   control-plane   56m   v1.24.3
k8s-unode3   NotReady   control-plane   43m   v1.24.3
k8s-unode4   NotReady   worker          42m   v1.24.3
Run this on any one of the masters:
$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
$ kubectl apply -f kube-flannel.yml
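If the podSubnet in initconfig.yml were anything other than the flannel default 10.244.0.0/16, the Network value in the manifest's net-conf.json would have to be edited to match before applying. A quick way to inspect it, assuming the manifest keeps its usual ConfigMap layout:

$ grep -A 6 'net-conf.json' kube-flannel.yml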
Check the flannel pod status. Newer flannel releases create their own namespace (kube-flannel); previously the pods ran in kube-system. The pods start out in an Init state and switch to Running after a short while:
$ kubectl -n kube-flannel get pod
NAME                    READY   STATUS     RESTARTS   AGE
kube-flannel-ds-7r7g6   0/1     Init:0/2   0          90s
kube-flannel-ds-rsmh5   0/1     Init:0/2   0          90s
kube-flannel-ds-w98q8   0/1     Init:0/2   0          90s
kube-flannel-ds-zvfpl   0/1     Init:0/2   0          90s
$ kubectl -n kube-flannel get pod
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-gwxqj   1/1     Running   0          18m
kube-flannel-ds-k9jc5   1/1     Running   0          18m
kube-flannel-ds-pclv4   1/1     Running   0          18m
kube-flannel-ds-tf6kv   1/1     Running   0          18m
$ kubectl -n kube-system get pod -o wide
Once all pods in the kube-system namespace are Running, we test cluster usability:
$ cat<
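The manifest itself isn't shown above; a minimal sketch of what an nginx-test.yml for these checks could look like (a busybox pod for DNS tests plus an nginx pod and Service; names match the commands below, but this is a reconstruction, not the author's original file):

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28   # 1.28 chosen because its nslookup behaves predictably
    command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:alpine
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80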
Confirm the pods started correctly
$ kubectl get pod -o wide -w
$ kubectl get po,svc -o wide
Verify cluster DNS resolution
$ kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
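It's also worth resolving an external name from inside a Pod, which is exactly what the systemd-resolved change at the start of this article was about; a sketch using the same busybox pod (any public domain will do):

$ kubectl exec -ti busybox -- nslookup www.baidu.com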
# Delete the test resources
$ kubectl delete -f nginx-test.yml
Official documentation: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/high-availability/
At this point the flannel network plugin is running normally, but when checking whether CoreDNS is working correctly, we find the CoreDNS pods stuck in ContainerCreating.
$ kubectl -n kube-system get pod
NAME                                 READY   STATUS              RESTARTS      AGE
coredns-7f74c56694-btdnp             0/1     ContainerCreating   0             20m
coredns-7f74c56694-rwcr8             0/1     ContainerCreating   0             20m
etcd-k8s-unode1                      1/1     Running             8             126m
etcd-k8s-unode2                      1/1     Running             5             121m
etcd-k8s-unode3                      1/1     Running             0             108m
kube-apiserver-k8s-unode1            1/1     Running             11            126m
kube-apiserver-k8s-unode2            1/1     Running             4             121m
kube-apiserver-k8s-unode3            1/1     Running             0             108m
kube-controller-manager-k8s-unode1   1/1     Running             0             29m
kube-controller-manager-k8s-unode2   1/1     Running             1 (25m ago)   25m
kube-controller-manager-k8s-unode3   1/1     Running             0             27m
kube-proxy-5pkvz                     1/1     Running             0             122m
kube-proxy-fm6xc                     1/1     Running             0             126m
kube-proxy-jqqm2                     1/1     Running             0             107m
kube-proxy-ww9w7                     1/1     Running             0             108m
kube-scheduler-k8s-unode1            1/1     Running             2             126m
kube-scheduler-k8s-unode2            1/1     Running             1             122m
kube-scheduler-k8s-unode3            1/1     Running             0             108m
Describe the pod to see why it cannot start:
$ kubectl -n kube-system describe pod coredns-7f74c56694-btdnp
..............
  Warning  FailedCreatePodSandBox  3m9s (x61 over 16m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "60e0230279bd5b31031f82edbdabebd970412acb216998728b1c52152511ea80": plugin type="bridge" failed (add): incompatible CNI versions; config is "1.0.0", plugin supports ["0.1.0" "0.2.0" "0.3.0" "0.3.1" "0.4.0"]
The core of the message is: incompatible CNI versions.
Searching online, most answers say the containerd version being used is at fault; in other words, the CNI plugin version really is the problem.
The official documentation describes the fix; see the reference link on the official site.
The currently installed containerd version is v1.6.6:
$ containerd -v
containerd github.com/containerd/containerd v1.6.6
In the containerd issue tracker, someone hit the same incompatible CNI versions problem with v1.6.4:
https://github.com/containerd/containerd/issues/6876
Reading through it, the fix is to install a newer bridge plugin that supports CNI spec version 1.0.0.
Replace the CNI plugins
# Download the package
$ wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
# Back up the existing plugins
$ cp -ra /opt/cni /opt/cni.bak
# Extract into the target directory
$ tar -zxvf cni-plugins-linux-amd64-v1.1.1.tgz -C /opt/cni/bin
# Restart kubelet
$ systemctl restart kubelet
Run the steps above on all nodes, and the problem is resolved.
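After replacing the plugins and restarting kubelet on every node, the stuck CoreDNS pods should get their sandboxes created and move to Running; one way to watch for that (k8s-app=kube-dns is the standard CoreDNS label):

$ kubectl -n kube-system get pod -l k8s-app=kube-dns -o wide -w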