Private cloud etcd service

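Note

This article is a renewed, consolidated walkthrough that draws on several earlier practice documents.

Note

The steps in this article are fairly tedious, mainly because generating the etcd certificates takes many steps. When I deploy a new cluster later, I will rewrite them as a script for fast deployment.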

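After building three virtual machines in the private cloud KVM environment and deploying private cloud data-layer LVM volume management, etcd can be built on top of the separately allocated storage mounted at /var/lib/etcd. This gives etcd (the distributed key-value store) high-performance virtualized storage.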

Private cloud KVM virtual machines

Host IP          Hostname
192.168.6.204    z-b-data-1
192.168.6.205    z-b-data-2
192.168.6.206    z-b-data-3

Generating the etcd cluster certificates

Installing cfssl from the distribution

  • Install Cloudflare's cfssl tools:

Ubuntu packages Cloudflare's cfssl tools
sudo apt install golang-cfssl -y

Initializing the certificate authority

  • Prepare ca-config.json (validity period of 10 years):

ca-config.json with the certificate validity set to 10 years
{
    "signing": {
        "default": {
            "expiry": "87600h"
        },
        "profiles": {
            "server": {
                "expiry": "87600h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            },
            "client": {
                "expiry": "87600h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "client auth"
                ]
            },
            "peer": {
                "expiry": "87600h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}

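Note

In this CA configuration, the "server" profile must include "client auth"; otherwise newer versions of etcd report connection errors at startup. For details see Deploying an etcd cluster with TLS authentication.

  • Create the CSR (Certificate Signing Request) configuration file ca-csr.json:

ca-csr.json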
{
    "CN": "priv k8s etcd",
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "O": "huatai.me",
            "ST": "cloud-atlas",
            "OU": "staging"
        }
    ]
}
  • Generate the CA from the configuration defined above:

Generate the CA
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -

This produces three files:

ca-key.pem
ca.csr
ca.pem

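Warning

Keep ca-key.pem safe: anyone who holds this file can use the CA to issue any certificate.

  • Prepare the server certificate request by editing server.json directly:

server.json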
{
    "CN": "priv k8s etcd",
    "hosts": [
        "etcd.staging.huatai.me",
        "192.168.6.204",
        "192.168.6.205",
        "192.168.6.206",
        "127.0.0.1"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
  • Generate the server certificate and private key:

Generate the server certificate and private key
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server

This produces three files:

server-key.pem
server.csr
server.pem
  • Peer certificate requests (one per server, named after the corresponding host name):

Peer certificate request for server z-b-data-1.staging.huatai.me (z-b-data-1.json)
{
    "CN": "z-b-data-1",
    "hosts": [
        "z-b-data-1.staging.huatai.me",
        "z-b-data-1",
        "192.168.6.204",
        "127.0.0.1"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
Peer certificate request for server z-b-data-2.staging.huatai.me (z-b-data-2.json)
{
    "CN": "z-b-data-2",
    "hosts": [
        "z-b-data-2.staging.huatai.me",
        "z-b-data-2",
        "192.168.6.205",
        "127.0.0.1"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
Peer certificate request for server z-b-data-3.staging.huatai.me (z-b-data-3.json)
{
    "CN": "z-b-data-3",
    "hosts": [
        "z-b-data-3.staging.huatai.me",
        "z-b-data-3",
        "192.168.6.206",
        "127.0.0.1"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
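
The three peer request files differ only in host name and IP address, so they could be generated from one template instead of being edited by hand. A minimal sketch, assuming the files are written to the current directory as z-b-data-N.json (the names the cfssl loop in this section reads); the helper name write_peer_csr is hypothetical and not part of the original procedure:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the peer CSR file for one host.
# $1 = short host name, $2 = host IP address
write_peer_csr() {
    local name="$1" ip="$2"
    cat > "${name}.json" << EOF
{
    "CN": "${name}",
    "hosts": [
        "${name}.staging.huatai.me",
        "${name}",
        "${ip}",
        "127.0.0.1"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
EOF
}

# Host names and IP addresses from the virtual machine table in this article
write_peer_csr z-b-data-1 192.168.6.204
write_peer_csr z-b-data-2 192.168.6.205
write_peer_csr z-b-data-3 192.168.6.206
```

Each call overwrites any existing file of the same name, so the sketch is best run in the directory that holds the other certificate request files.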

Then generate the peer certificates for the three hosts:

Generate the peer certificates for the three hosts
for sn in `seq 3`; do
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer z-b-data-${sn}.json | cfssljson -bare z-b-data-${sn}
done

The resulting files are:

z-b-data-1-key.pem
z-b-data-1.csr
z-b-data-1.pem
...
  • Client certificate request client.json (the key point is that the hosts list stays empty):

client.json
{
    "CN": "private k8s etcd client",
    "hosts": [""],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "CN",
            "L": "Shanghai",
            "ST": "cloud-atlas"
        }
    ]
}
  • Now generate the client certificate:

Generate the client certificate
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

This produces the following files:

client-key.pem
client.csr
client.pem
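
Before the files are distributed, it may be worth confirming what each certificate actually contains. A minimal sketch using openssl x509, which is assumed to be installed and is not part of the original steps; the helper name cert_summary is hypothetical, and the loop skips any file that does not exist:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the subject and expiry date of one PEM certificate.
# $1 = path to the certificate file
cert_summary() {
    openssl x509 -in "$1" -noout -subject -enddate
}

# Certificates generated in this section; missing files are skipped
for cert in ca.pem server.pem client.pem z-b-data-1.pem z-b-data-2.pem z-b-data-3.pem; do
    if [ -f "${cert}" ]; then
        echo "== ${cert}"
        cert_summary "${cert}"
    fi
done
```

For a closer look, `openssl x509 -in server.pem -noout -text` prints the whole certificate, including the Subject Alternative Name entries, which should correspond to the hosts list in server.json.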

Installing the etcd package

install_etcd.sh: script that downloads the etcd release for Linux
ETCD_VER=v3.5.4
KERNEL=$(uname -s) # Linux / Darwin
ARCH=$(uname -m) # x86_64 / aarch64

# Map the kernel name to the one used in etcd release file names
if [ "${KERNEL}" = "Linux" ];then
    KERNEL="linux"
elif [ "${KERNEL}" = "Darwin" ];then
    KERNEL="darwin"
else
    echo "Not Linux or macOS, exit!"
    exit 1
fi

# Map the machine architecture to the one used in etcd release file names
if [ "${ARCH}" = "x86_64" ];then
    ARCH="amd64"
elif [ "${ARCH}" = "aarch64"  ];then
    ARCH="arm64"
else
    echo "Not x86_64 or aarch64, exit!"
    exit 1
fi

# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz -o /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-${KERNEL}-${ARCH}.tar.gz

/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version

sudo mv /tmp/etcd-download-test/etcd /usr/local/bin
sudo mv /tmp/etcd-download-test/etcdctl /usr/local/bin
sudo mv /tmp/etcd-download-test/etcdutl /usr/local/bin
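  • On each node where etcd is installed, create the etcd directories, user and group (if you use the lv-etcd volume built in private cloud data-layer LVM volume management, skip creating the data directory):

Add the etcd user account with useradd
sudo mkdir -p /etc/etcd /var/lib/etcd
sudo groupadd -f -g 1501 etcd
sudo useradd -c "etcd user" -d /var/lib/etcd -s /bin/false -g etcd -u 1501 etcd
sudo chown -R etcd:etcd /var/lib/etcd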

Distributing the certificates

deploy_etcd_certificates.sh: script that distributes the certificates
cat << EOF > etcd_hosts
z-b-data-1
z-b-data-2
z-b-data-3
EOF

cat << EOF > prepare_etcd.sh
if [ -d /tmp/etcd_tls ];then
    rm -rf /tmp/etcd_tls
    mkdir /tmp/etcd_tls
else
    mkdir /tmp/etcd_tls
fi

if  [ ! -d /etc/etcd/ ];then
    sudo mkdir /etc/etcd
fi
EOF

for host in `cat etcd_hosts`;do
    scp prepare_etcd.sh $host:/tmp/
    ssh $host 'sh /tmp/prepare_etcd.sh'
done

for host in `cat etcd_hosts`;do
    scp ${host}.pem ${host}:/tmp/etcd_tls/
    scp ${host}-key.pem ${host}:/tmp/etcd_tls/
    scp ca.pem ${host}:/tmp/etcd_tls/
    scp server.pem ${host}:/tmp/etcd_tls/
    scp server-key.pem ${host}:/tmp/etcd_tls/
    scp client.csr ${host}:/tmp/etcd_tls/
    scp client.pem ${host}:/tmp/etcd_tls/
    scp client-key.pem ${host}:/tmp/etcd_tls/
    ssh $host 'sudo cp /tmp/etcd_tls/* /etc/etcd/;sudo chown etcd:etcd /etc/etcd/*'
done

Run the script:

sh deploy_etcd_certificates.sh

Each etcd host now has the files for that host under the /etc/etcd directory.
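
The distribution script copies the private keys with their default permissions. One possible hardening step, which is not part of the original procedure, is to make the server and peer keys readable only by their owner. A sketch under these assumptions: the function name lock_down_keys is hypothetical, the directory defaults to /etc/etcd, and client-key.pem is deliberately left untouched because etcdctl reads it from a regular user account later in this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: restrict server and peer private keys to owner-only access.
# $1 = directory holding the PEM files (defaults to /etc/etcd)
lock_down_keys() {
    local dir="${1:-/etc/etcd}"
    local key
    for key in "${dir}"/*-key.pem; do
        # Skip the unexpanded pattern when no key file matches
        [ -f "${key}" ] || continue
        # Leave the client key alone so that non-etcd users can still run etcdctl
        if [ "$(basename "${key}")" = "client-key.pem" ]; then
            continue
        fi
        chmod 600 "${key}"
    done
}
```

On each etcd host it would be called from a root shell as `lock_down_keys /etc/etcd`. The keys stay owned by etcd:etcd, so the etcd service can still read them.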

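Configuring etcd

  • Run the generate_etcd_service script as root (it writes to /etc/etcd and /lib/systemd/system) to generate the /etc/etcd/conf.yml configuration file and the /lib/systemd/system/etcd.service unit file that the Systemd process manager uses to start etcd:

Create the conf.yml configuration used to start etcd and the systemd unit file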
ETCD_HOST_IP=$(ip addr show enp1s0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
ETCD_NAME=$(hostname -s)
ETCD_HOST_1=z-b-data-1
ETCD_HOST_2=z-b-data-2
ETCD_HOST_3=z-b-data-3
ETCD_HOST_1_IP=192.168.6.204
ETCD_HOST_2_IP=192.168.6.205
ETCD_HOST_3_IP=192.168.6.206
INIT_TOKEN=initpasswd

cat << EOF > /etc/etcd/conf.yml
# This is the configuration file for the etcd server.

# Human-readable name for this member.
name: ${ETCD_NAME}

# Path to the data directory.
data-dir: /var/lib/etcd

# Path to the dedicated wal directory.
wal-dir:

# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000

# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100

# Time (in milliseconds) for an election to timeout.
election-timeout: 1000

# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0

# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: https://${ETCD_HOST_IP}:2380

# List of comma separated URLs to listen on for client traffic.
listen-client-urls: https://${ETCD_HOST_IP}:2379,https://127.0.0.1:2379

# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5

# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5

# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:

# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: https://${ETCD_HOST_IP}:2380

# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: https://${ETCD_HOST_IP}:2379

# Discovery URL used to bootstrap the cluster.
discovery:

# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'

# HTTP proxy to use for traffic to discovery service.
discovery-proxy:

# DNS domain used to bootstrap initial cluster.
discovery-srv:

# Initial cluster configuration for bootstrapping.
initial-cluster: ${ETCD_HOST_1}=https://${ETCD_HOST_1_IP}:2380,${ETCD_HOST_2}=https://${ETCD_HOST_2_IP}:2380,${ETCD_HOST_3}=https://${ETCD_HOST_3_IP}:2380

# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: ${INIT_TOKEN}

# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'

# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false

# Accept etcd V2 client requests
enable-v2: true

# Enable runtime profiling data via HTTP server
enable-pprof: true

# Valid values include 'on', 'readonly', 'off'
proxy: 'off'

# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000

# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000

# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000

# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000

# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0

client-transport-security:
  # Path to the client server TLS cert file.
  cert-file: /etc/etcd/server.pem

  # Path to the client server TLS key file.
  key-file: /etc/etcd/server-key.pem

  # Enable client cert authentication.
  client-cert-auth: true

  # Path to the client server TLS trusted CA cert file.
  trusted-ca-file: /etc/etcd/ca.pem

  # Client TLS using self-signed generated certificates; not needed here
  # because cert-file and key-file are set above.
  auto-tls: false

peer-transport-security:
  # Path to the peer server TLS cert file.
  cert-file: /etc/etcd/${ETCD_NAME}.pem

  # Path to the peer server TLS key file.
  key-file: /etc/etcd/${ETCD_NAME}-key.pem

  # Enable peer client cert authentication.
  client-cert-auth: true

  # Path to the peer server TLS trusted CA cert file.
  trusted-ca-file: /etc/etcd/ca.pem

  # Peer TLS using self-signed generated certificates; not needed here
  # because cert-file and key-file are set above.
  auto-tls: false

# Enable debug-level logging for etcd.
debug: false

logger: zap

# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [stderr]

# Force to create a new one member cluster.
force-new-cluster: false

auto-compaction-mode: periodic
auto-compaction-retention: "1"
EOF

cat << EOF > /lib/systemd/system/etcd.service
[Unit]
Description=etcd service
Documentation=https://github.com/etcd-io/etcd

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \\
 --config-file=/etc/etcd/conf.yml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
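
etcd expects the member name to appear among the entries of initial-cluster, and in this script the name comes from `hostname -s`. A small pre-flight check could catch a mismatch before the service is started. A minimal sketch; the function name member_in_cluster is hypothetical, and the cluster string repeats the host names and IP addresses used in this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the member name has an entry in the
# initial-cluster string ("name1=url1,name2=url2,...").
# $1 = member name, $2 = initial-cluster string
member_in_cluster() {
    local name="$1" cluster="$2"
    case ",${cluster}," in
        *",${name}="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Same value as initial-cluster in the generated conf.yml
INITIAL_CLUSTER="z-b-data-1=https://192.168.6.204:2380,z-b-data-2=https://192.168.6.205:2380,z-b-data-3=https://192.168.6.206:2380"

if member_in_cluster "$(hostname -s)" "${INITIAL_CLUSTER}"; then
    echo "member name is listed in initial-cluster"
else
    echo "hostname is not listed in initial-cluster" >&2
fi
```

The leading comma in the pattern keeps a partial name such as data-1 from matching the entry for z-b-data-1.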
  • Enable the service:

    sudo systemctl enable etcd.service
    
  • Start the service:

    sudo systemctl start etcd.service
    

Checks

  • After starting etcd, check the service process:

    ps aux | grep etcd
    
  • Check the logs:

    journalctl -u etcd.service
    

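Verifying the etcd cluster

  • For easier maintenance, configure the etcdctl environment variables by adding them to your own shell profile:

ETCDCTL_ENDPOINTS and related environment variables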
export ETCDCTL_API=3
#export ETCDCTL_ENDPOINTS='https://etcd.staging.huatai.me:2379'
export ETCDCTL_ENDPOINTS=https://192.168.6.204:2379,https://192.168.6.205:2379,https://192.168.6.206:2379
export ETCDCTL_CACERT=/etc/etcd/ca.pem
export ETCDCTL_CERT=/etc/etcd/client.pem
export ETCDCTL_KEY=/etc/etcd/client-key.pem

The cluster can then be checked:

  • Check the status of each member:

etcdctl: check endpoint status (table output)
etcdctl --write-out=table endpoint status
  • Check the health of each member:

etcdctl: check endpoint health (whether each node responds)
etcdctl endpoint health
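
Right after the services start, the members may need a few seconds to elect a leader, so a single health check can fail even though the cluster is coming up. A minimal retry sketch, assuming that etcdctl endpoint health exits with a non-zero status while any endpoint is unhealthy; the function name retry is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# $1 = maximum number of attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run
retry() {
    local attempts="$1" delay="$2" n=0
    shift 2
    while [ "${n}" -lt "${attempts}" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        sleep "${delay}"
    done
    return 1
}
```

With the environment variables above in place, it could be used as `retry 10 3 etcdctl endpoint health`, which tries up to ten times with three seconds between attempts.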
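  • (Important step) Now that etcd is deployed, the cluster state configured earlier in /etc/etcd/conf.yml needs to be changed from new to existing, to indicate that the cluster has already been built: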

    # Initial cluster state ('new' or 'existing').
    initial-cluster-state: 'existing'
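
The same change could be applied on every node with sed rather than by hand. A minimal sketch, assuming GNU sed (for the -i option) and a conf.yml in the layout generated earlier in this article; the function name mark_cluster_existing is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch initial-cluster-state to 'existing' in place.
# $1 = path to the etcd configuration file
mark_cluster_existing() {
    local conf="$1"
    # Only the initial-cluster-state line matches; other keys are left as they are
    sed -i "s/^initial-cluster-state:.*/initial-cluster-state: 'existing'/" "${conf}"
}
```

On each node it would be called from a root shell as `mark_cluster_existing /etc/etcd/conf.yml`.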