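K3s High-Availability etcd¶
Note
For detailed deployment and troubleshooting practice, see "Deploying an etcd Cluster with TLS Authentication".
etcd (a distributed key-value store) is the mainstream persistent data store for Kubernetes and provides distributed storage. For a high-availability K3s deployment, an external etcd cluster is the most stable and reliable deployment model.
In the Raspberry Pi stack environment, three Raspberry Pi 3 boards are used to deploy a 3-node etcd cluster: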
Host IP | Hostname |
---|---|
192.168.7.11 | x-k3s-m-1 |
192.168.7.12 | x-k3s-m-2 |
192.168.7.13 | x-k3s-m-3 |
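Download etcd¶
The etcd-io/etcd Releases page provides the latest version. The latest release when this was written was 3.5.2; the script below pins ETCD_VER=v3.5.4, so set that variable to the release you actually want to install: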
ETCD_VER=v3.5.4
KERNEL=$(uname -s) # Linux / Darwin
ARCH=$(uname -m) # x86_64 / aarch64
if [ "${KERNEL}" = "Linux" ]; then
    KERNEL="linux"
elif [ "${KERNEL}" = "Darwin" ]; then
    KERNEL="darwin"
else
    echo "Not Linux or macOS, exit!"
    exit 1
fi
if [ "${ARCH}" = "x86_64" ]; then
    ARCH="amd64"
elif [ "${ARCH}" = "aarch64" ]; then
    ARCH="arm64"
else
    echo "Not x86_64 or aarch64, exit!"
    exit 1
fi
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz -o /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz
/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version
sudo mv /tmp/etcd-download-test/etcd /usr/local/bin
sudo mv /tmp/etcd-download-test/etcdctl /usr/local/bin
sudo mv /tmp/etcd-download-test/etcdutl /usr/local/bin
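The platform detection above can be isolated into a small function, which makes it possible to check the download URL before fetching anything. The following is a minimal sketch, not part of the original script: the function name `etcd_release_url` is hypothetical, the URL layout is copied from the script above, and only Linux is handled because the Raspberry Pi nodes run Linux.

```shell
# Hypothetical helper: build the etcd release tarball URL for Linux.
# Arguments: version (e.g. v3.5.4), kernel (uname -s), machine (uname -m),
# and an optional base URL (defaults to the Google storage mirror).
etcd_release_url() {
    ver=$1 kernel=$2 arch=$3
    base=${4:-https://storage.googleapis.com/etcd}
    case "$kernel" in
        Linux) kernel=linux ;;
        *) echo "unsupported kernel: $kernel" >&2; return 1 ;;
    esac
    case "$arch" in
        x86_64)  arch=amd64 ;;
        aarch64) arch=arm64 ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    echo "${base}/${ver}/etcd-${ver}-${kernel}-${arch}.tar.gz"
}

# Example: print the URL for the current machine
# etcd_release_url v3.5.4 "$(uname -s)" "$(uname -m)"
```

Returning a non-zero status for an unknown platform lets a calling script stop before it runs curl against a URL that does not exist.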
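Generate and Distribute Server Certificates¶
Certificates are issued with cfssl. Alpine Linux, however, only provides cfssl in the edge repository. I currently use the Alpine Linux stable repository, and stable and edge cannot be enabled at the same time.
cfssl officially provides a Linux amd64 build, and it can also be installed on macOS with brew. Because I want to be able to complete all the work inside the Raspberry Pi stack environment, there are two ways to install cfssl:
Method 1: on the Alpine Linux node x-k3s-a-0, create a container running a development environment x-dev, then install cfssl by following "etcd Cluster TLS Setup".
Method 2: install it directly from the Alpine Linux edge/testing repository with the apk package manager: apk add cfssl --update-cache --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing/ --allow-untrusted
Note
The etcd image used by Red Hat OpenShift Atlas is the upstream etcd image (based on Alpine Linux); see "install: use origin-v4.0 etcd image #511".
For the complete certificate creation and distribution procedure, see "etcd Cluster TLS Setup" and "Deploying an etcd Cluster with TLS Authentication".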
Generate Certificates¶
Create the cfssl configuration files:
mkdir ~/cfssl
cd ~/cfssl
cfssl print-defaults config > ca-config.json
cfssl print-defaults csr > ca-csr.json
Edit ca-config.json to extend the expiry to 10 years:
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"server": {
"expiry": "87600h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
},
"client": {
"expiry": "87600h",
"usages": [
"signing",
"key encipherment",
"client auth"
]
},
"peer": {
"expiry": "87600h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
}
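The value 87600h is simply 10 years expressed in hours (10 × 365 × 24). A minimal sketch of that arithmetic, assuming 365-day years; the function name `expiry_for_years` is hypothetical and not part of cfssl:

```shell
# Hypothetical helper: print a cfssl-style expiry string for N years,
# counting 365 days per year (leap days are ignored).
expiry_for_years() {
    echo "$(( $1 * 365 * 24 ))h"
}

expiry_for_years 10   # prints 87600h
```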
Configure the CSR (Certificate Signing Request) file ca-csr.json:
{
"CN": "edge k3s etcd",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"L": "Shanghai",
"O": "huatai.me",
"ST": "cloud-atlas",
"OU": "edge"
}
]
}
Generate the CA from the configuration above:
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
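Prepare the peer certificate configuration for each of the 3 servers, saved as x-k3s-m-1.json, x-k3s-m-2.json and x-k3s-m-3.json: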
{
"CN": "x-k3s-m-1",
"hosts": [
"x-k3s-m-1.edge.huatai.me",
"x-k3s-m-1",
"192.168.7.11",
"127.0.0.1"
],
"key": {
"algo": "ecdsa",
"size": 256
},
"names": [
{
"C": "CN",
"L": "Shanghai",
"ST": "cloud-atlas"
}
]
}
{
"CN": "x-k3s-m-2",
"hosts": [
"x-k3s-m-2.edge.huatai.me",
"x-k3s-m-2",
"192.168.7.12",
"127.0.0.1"
],
"key": {
"algo": "ecdsa",
"size": 256
},
"names": [
{
"C": "CN",
"L": "Shanghai",
"ST": "cloud-atlas"
}
]
}
{
"CN": "x-k3s-m-3",
"hosts": [
"x-k3s-m-3.edge.huatai.me",
"x-k3s-m-3",
"192.168.7.13",
"127.0.0.1"
],
"key": {
"algo": "ecdsa",
"size": 256
},
"names": [
{
"C": "CN",
"L": "Shanghai",
"ST": "cloud-atlas"
}
]
}
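The three files above differ only in the hostname and IP address, so they can also be generated from one template. The following is a sketch under that assumption; the function names `write_peer_csr` and `write_all_peer_csrs` are hypothetical, and the field values are copied from the JSON shown above.

```shell
# Hypothetical helper: write the peer CSR JSON for one node.
# Arguments: short hostname, IP address, DNS domain, output directory.
write_peer_csr() {
    name=$1 ip=$2 domain=$3 outdir=${4:-.}
    cat > "${outdir}/${name}.json" << EOF
{
    "CN": "${name}",
    "hosts": [
        "${name}.${domain}",
        "${name}",
        "${ip}",
        "127.0.0.1"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
EOF
}

# Write x-k3s-m-1.json .. x-k3s-m-3.json for 192.168.7.11 .. 192.168.7.13.
write_all_peer_csrs() {
    dest=${1:-.}
    for sn in 1 2 3; do
        write_peer_csr "x-k3s-m-${sn}" "192.168.7.1${sn}" "edge.huatai.me" "$dest"
    done
}
```

Calling `write_all_peer_csrs ~/cfssl` would place the three files next to ca-config.json, where the signing commands on this page expect them.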
Generate the certificate for each of the 3 hosts:
for sn in `seq 3`; do
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer x-k3s-m-${sn}.json | cfssljson -bare x-k3s-m-${sn}
done
Prepare client.json:
{
"CN": "edge k3s etcd client",
"hosts": [""],
"key": {
"algo": "ecdsa",
"size": 256
},
"names": [
{
"C": "CN",
"L": "Shanghai",
"ST": "cloud-atlas"
}
]
}
Generate the client certificate:
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client
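The distribution script and conf.yml on this page also use server.pem and server-key.pem, but the step that creates them is not shown here. The following is only a sketch of how such a certificate might be requested with the server profile from ca-config.json. The function name `write_server_csr`, the CN, and the whole host list (including etcd.edge.huatai.me) are assumptions, not the author's actual file.

```shell
# Hypothetical helper: write server.json for a certificate shared by all
# three members. The CN and host list are assumptions for illustration.
write_server_csr() {
    out=${1:-server.json}
    cat > "$out" << EOF
{
    "CN": "edge k3s etcd server",
    "hosts": [
        "etcd.edge.huatai.me",
        "x-k3s-m-1.edge.huatai.me", "x-k3s-m-1", "192.168.7.11",
        "x-k3s-m-2.edge.huatai.me", "x-k3s-m-2", "192.168.7.12",
        "x-k3s-m-3.edge.huatai.me", "x-k3s-m-3", "192.168.7.13",
        "127.0.0.1"
    ],
    "key": { "algo": "ecdsa", "size": 256 },
    "names": [ { "C": "CN", "L": "Shanghai", "ST": "cloud-atlas" } ]
}
EOF
}

# Usage (requires cfssl and the CA files generated above):
#   write_server_csr server.json
#   cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json \
#       -profile=server server.json | cfssljson -bare server
```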
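Distribute Certificates¶
Distribute the certificates with a script, deploy_etcd_certificates.sh. Note that the script also copies server.pem and server-key.pem, whose generation is not shown on this page; create them with the server profile before running it: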
cat << EOF > etcd_hosts
x-k3s-m-1
x-k3s-m-2
x-k3s-m-3
EOF
cat << EOF > prepare_etcd.sh
if [ -d /tmp/etcd_tls ];then
rm -rf /tmp/etcd_tls
mkdir /tmp/etcd_tls
else
mkdir /tmp/etcd_tls
fi
if [ ! -d /etc/etcd/ ];then
sudo mkdir /etc/etcd
fi
EOF
for host in `cat etcd_hosts`;do
scp prepare_etcd.sh $host:/tmp/
ssh $host 'sh /tmp/prepare_etcd.sh'
done
for host in `cat etcd_hosts`;do
scp ${host}.pem ${host}:/tmp/etcd_tls/
scp ${host}-key.pem ${host}:/tmp/etcd_tls/
scp ca.pem ${host}:/tmp/etcd_tls/
scp server.pem ${host}:/tmp/etcd_tls/
scp server-key.pem ${host}:/tmp/etcd_tls/
scp client.csr ${host}:/tmp/etcd_tls/
scp client.pem ${host}:/tmp/etcd_tls/
scp client-key.pem ${host}:/tmp/etcd_tls/
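# The etcd user and group must already exist on the host (deploy_etcd_service.sh creates them); otherwise chown fails
ssh $host 'sudo cp /tmp/etcd_tls/* /etc/etcd/;sudo chown etcd:etcd /etc/etcd/*'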
done
Run the script:
sh deploy_etcd_certificates.sh
Each etcd host now has its own certificate files under /etc/etcd (the following example is from x-k3s-m-1):
ca.pem
server-key.pem
server.pem
x-k3s-m-1-key.pem
x-k3s-m-1.pem
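Because scp fails one file at a time, it can help to confirm that every file the distribution loop expects actually exists before copying anything. A minimal sketch, assuming the file names used by the script above; the function name `missing_cert_files` is hypothetical.

```shell
# Hypothetical helper: print the files the distribution loop expects but
# that are missing from the given directory (default: current directory).
# The first argument is a file with one hostname per line (etcd_hosts).
missing_cert_files() {
    hosts_file=$1 dir=${2:-.}
    for f in ca.pem server.pem server-key.pem client.pem client-key.pem; do
        [ -f "${dir}/${f}" ] || echo "$f"
    done
    while read -r host; do
        [ -n "$host" ] || continue
        for f in "${host}.pem" "${host}-key.pem"; do
            [ -f "${dir}/${f}" ] || echo "$f"
        done
    done < "$hosts_file"
}

# Example: run in ~/cfssl before deploy_etcd_certificates.sh
# missing_cert_files etcd_hosts .
```

An empty result means every expected file is present; any printed name is a certificate or key that still has to be generated.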
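OpenRC Startup Script for etcd¶
On Alpine Linux, etcd is controlled by an OpenRC service script, and the service is managed through a configuration file:
Prepare the configuration file conf.yml. It is based on /etc/etcd/conf.yml from the etcd package in the edge/testing repository, with placeholders added so that a script can fill them in later: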
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'NODENAME'
# Path to the data directory.
data-dir: /var/lib/etcd
# Path to the dedicated wal directory.
wal-dir:
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: https://NODEIP:2380
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: https://NODEIP:2379,https://127.0.0.1:2379
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: https://NODENAME.DOMAIN:2380
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: https://NODENAME.DOMAIN:2379
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: NODE1=https://NODE1.DOMAIN:2380,NODE2=https://NODE2.DOMAIN:2380,NODE3=https://NODE3.DOMAIN:2380
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'INITTOKEN'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# Path to the client server TLS cert file.
cert-file: /etc/etcd/server.pem
# Path to the client server TLS key file.
key-file: /etc/etcd/server-key.pem
# Enable client cert authentication.
client-cert-auth: true
# Path to the client server TLS trusted CA cert file.
trusted-ca-file: /etc/etcd/ca.pem
# Client TLS using generated certificates
auto-tls: true
peer-transport-security:
# Path to the peer server TLS cert file.
cert-file: /etc/etcd/NODENAME.pem
# Path to the peer server TLS key file.
key-file: /etc/etcd/NODENAME-key.pem
# Enable peer client cert authentication.
client-cert-auth: true
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file: /etc/etcd/ca.pem
# Peer TLS using generated certificates.
auto-tls: true
# Enable debug-level logging for etcd.
debug: false
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [stderr]
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: periodic
auto-compaction-retention: "1"
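The initial-cluster value in this file is a comma-separated list of NAME=https://NAME.DOMAIN:2380 entries, one per member. As an illustration of that format, the following sketch builds the string from a domain and a list of member names; the function name `initial_cluster` is hypothetical and the sketch is not part of the deployment scripts on this page.

```shell
# Hypothetical helper: print an initial-cluster value for the given domain
# and member names, in the form NAME=https://NAME.DOMAIN:2380,...
initial_cluster() {
    domain=$1; shift
    result=""
    for name in "$@"; do
        result="${result:+${result},}${name}=https://${name}.${domain}:2380"
    done
    echo "$result"
}

initial_cluster edge.huatai.me x-k3s-m-1 x-k3s-m-2 x-k3s-m-3
```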
The script config_etcd.sh fills in the placeholders in the etcd configuration:
NODENAME=`hostname -s`
NODEIP=`ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`
NODE1="x-k3s-m-1"
NODE2="x-k3s-m-2"
NODE3="x-k3s-m-3"
DOMAIN="edge.huatai.me"
INITTOKEN="x-k3s"
cd /tmp/etcd_config
sed -i "s/NODENAME/$NODENAME/g" conf.yml
sed -i "s/NODEIP/$NODEIP/g" conf.yml
sed -i "s/INITTOKEN/$INITTOKEN/g" conf.yml
sed -i "s/NODE1/$NODE1/g" conf.yml
sed -i "s/NODE2/$NODE2/g" conf.yml
sed -i "s/NODE3/$NODE3/g" conf.yml
sed -i "s/DOMAIN/$DOMAIN/g" conf.yml
Run the following deployment script, deploy_etcd_config.sh:
cat << EOF > etcd_hosts
x-k3s-m-1
x-k3s-m-2
x-k3s-m-3
EOF
cat << EOF > prepare_etcd_config.sh
if [ -d /tmp/etcd_config ];then
rm -rf /tmp/etcd_config
mkdir /tmp/etcd_config
else
mkdir /tmp/etcd_config
fi
if [ ! -d /etc/etcd/ ];then
sudo mkdir /etc/etcd
fi
EOF
for host in `cat etcd_hosts`;do
scp prepare_etcd_config.sh $host:/tmp/
ssh $host 'sh /tmp/prepare_etcd_config.sh'
done
for host in `cat etcd_hosts`;do
scp config_etcd.sh $host:/tmp/etcd_config/
scp conf.yml $host:/tmp/etcd_config/
ssh $host 'sh /tmp/etcd_config/config_etcd.sh'
ssh $host 'sudo cp /tmp/etcd_config/conf.yml /etc/etcd/'
done
sh deploy_etcd_config.sh
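Then verify on each control-plane server that the placeholders in /etc/etcd/conf.yml have been replaced correctly. If everything worked, every placeholder in /etc/etcd/conf.yml has been replaced with the host's name, IP address or domain name.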
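This verification can be scripted by searching the deployed file for any placeholder name that survived the sed substitutions. A minimal sketch, assuming the placeholder names used in conf.yml on this page; the function name `unreplaced_placeholders` is hypothetical.

```shell
# Hypothetical helper: print any non-comment line of a config file that
# still contains one of the placeholders used in conf.yml.
# Returns 0 when the file is clean and 1 when a placeholder is left.
unreplaced_placeholders() {
    if grep -v '^[[:space:]]*#' "$1" \
        | grep -E 'NODENAME|NODEIP|NODE[123]|DOMAIN|INIT-?TOKEN'; then
        return 1
    fi
    return 0
}

# Example: check the deployed file on one host
# unreplaced_placeholders /etc/etcd/conf.yml && echo "conf.yml is clean"
```

Comment lines are skipped so that explanatory text in the file does not trigger a false alarm.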
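Prepare the files conf.d-etcd and init.d-etcd (extracted from the etcd-openrc package in the Alpine Linux repository). conf.d-etcd consists of the two variable assignments below; init.d-etcd starts at the #!/sbin/openrc-run line: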
LOGPATH=/var/log/${RC_SVCNAME}
ETCD_CONFIG_FILE=/etc/etcd/conf.yml
#!/sbin/openrc-run
# Copyright 2016 Alpine Linux
# Distributed under the terms of the GNU General Public License v2
# $Id$
supervisor=supervise-daemon
name="$RC_SVCNAME"
description="Highly-available key-value store"
ETCD_DATA_DIR=$(sed -nr 's/^data-dir:\s*(\/.*)/\1/p' $ETCD_CONFIG_FILE)
command=/usr/local/bin/etcd
command_args="--config-file=${ETCD_CONFIG_FILE}"
: ${output_log:=$LOGPATH/$RC_SVCNAME.log}
: ${error_log:=$LOGPATH/$RC_SVCNAME.log}
command_user="etcd:etcd"
supervise_daemon_args="--chdir $ETCD_DATA_DIR"
depend() {
need net
}
start_pre() {
checkpath -d -m 0775 -o "$command_user" "$LOGPATH"
checkpath -d -m 0700 -o "$command_user" "$ETCD_DATA_DIR"
}
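The init script derives ETCD_DATA_DIR by extracting the data-dir value from conf.yml with sed, and its pattern only matches an absolute path. The following sketch reproduces that extraction so it can be tried against a config file before the service is started. The function name `etcd_data_dir` is hypothetical, and `-E` with `[[:space:]]` is used here instead of the script's `-r` and `\s` for portability.

```shell
# Sketch: extract the data-dir value from an etcd YAML config the same way
# the init script does. Prints nothing if data-dir is missing or relative.
etcd_data_dir() {
    sed -nE 's/^data-dir:[[:space:]]*(\/.*)/\1/p' "$1"
}

# Example:
# etcd_data_dir /etc/etcd/conf.yml
```

If this prints nothing, ETCD_DATA_DIR in the init script would be empty, so data-dir in conf.yml should stay an absolute path.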
Then run the following deploy_etcd_service.sh:
cat << EOF > etcd_hosts
x-k3s-m-1
x-k3s-m-2
x-k3s-m-3
EOF
cat << EOF > prepare_etcd_service.sh
if [ -d /tmp/etcd_service ];then
rm -rf /tmp/etcd_service
mkdir /tmp/etcd_service
else
mkdir /tmp/etcd_service
fi
EOF
for host in `cat etcd_hosts`;do
scp prepare_etcd_service.sh $host:/tmp/
ssh $host 'sh /tmp/prepare_etcd_service.sh'
done
for host in `cat etcd_hosts`;do
scp conf.d-etcd $host:/tmp/etcd_service/
scp init.d-etcd $host:/tmp/etcd_service/
ssh $host 'sudo cp /tmp/etcd_service/conf.d-etcd /etc/conf.d/etcd'
ssh $host 'sudo cp /tmp/etcd_service/init.d-etcd /etc/init.d/etcd && sudo chmod 755 /etc/init.d/etcd'
ssh $host 'sudo addgroup -g 1001 etcd && sudo adduser -u 1001 -G etcd -h /dev/null -s /sbin/nologin -D etcd'
done
sh deploy_etcd_service.sh
Start the service on the 3 control-plane servers:
sudo service etcd start
Configure the service to start automatically at boot:
sudo rc-update add etcd
Verify the etcd Cluster¶
The etcd cluster is now running. Check that it works with the following command:
curl --cacert ca.pem --cert client.pem --key client-key.pem https://etcd.edge.huatai.me:2379/health
The response should be:
{"health":"true","reason":""}
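For use in scripts, the response can be reduced to an exit status. A minimal sketch, assuming the response format shown above; the function name `is_healthy` is hypothetical, and it uses a plain text match rather than a JSON parser.

```shell
# Hypothetical helper: succeed only if the /health response read from
# standard input reports "health":"true".
is_healthy() {
    grep -q '"health"[[:space:]]*:[[:space:]]*"true"'
}

# Usage (certificate files and endpoint as in the curl command above):
#   curl -s --cacert ca.pem --cert client.pem --key client-key.pem \
#       https://etcd.edge.huatai.me:2379/health | is_healthy && echo OK
```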
For day-to-day maintenance, add the etcdctl environment variables to /etc/profile:
export ETCDCTL_API=3
#export ETCDCTL_ENDPOINTS='https://etcd.edge.huatai.me:2379'
export ETCDCTL_ENDPOINTS='https://192.168.7.11:2379,https://192.168.7.12:2379,https://192.168.7.13:2379'
export ETCDCTL_CACERT=/etc/etcd/ca.pem
export ETCDCTL_CERT=/etc/etcd/client.pem
export ETCDCTL_KEY=/etc/etcd/client-key.pem
Check the status of the cluster members:
etcdctl --write-out=table endpoint status
The output looks like this:
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.7.11:2379 | 7e8d94ba496c072d | 3.5.2 | 4.5 MB | false | false | 10 | 13295290 | 13295290 | |
| https://192.168.7.12:2379 | a01cb65343e64610 | 3.5.2 | 4.4 MB | true | false | 10 | 13295290 | 13295290 | |
| https://192.168.7.13:2379 | 9bfd4ef1e72d26 | 3.5.2 | 4.5 MB | false | false | 10 | 13295290 | 13295290 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
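In the output above, the IS LEADER column shows that 192.168.7.12 currently leads the cluster. When only that fact is needed, the table can be filtered with awk. A minimal sketch, assuming the column order shown above; the function name `leader_endpoint` is hypothetical.

```shell
# Hypothetical helper: read `etcdctl --write-out=table endpoint status`
# output on standard input and print the endpoint whose IS LEADER column
# is "true". Assumes ENDPOINT is column 1 and IS LEADER is column 5.
leader_endpoint() {
    awk -F'|' '
        NF > 6 {
            endpoint = $2; leader = $6
            gsub(/ /, "", endpoint)
            gsub(/ /, "", leader)
            if (leader == "true") print endpoint
        }'
}

# Usage:
#   etcdctl --write-out=table endpoint status | leader_endpoint
```

Because each row starts with a `|`, awk sees an empty first field, so the ENDPOINT and IS LEADER columns are fields 2 and 6.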
References¶
Generate self-signed certificates: the guide provided by CoreOS, the company that developed etcd