Deploying Gluster 11 on CentOS 7 on Software RAID10 + LVM

Note

This iterates on and improves the deployment plan from "Deploying Gluster 11 on CentOS 7".

In "Deploying Gluster 11 on CentOS 7" I used a rather ugly deployment scheme: the 12 bricks (disks) on each physical server were all assigned to the same volume, and only a forced brick ordering made the data spread across different servers. In practice this scheme limits cluster expansion and shrinking, as the problems in "Adding servers to a Gluster 11 cluster deployed on CentOS 7" show. For a lean distributed storage architecture like GlusterFS (see "Thoughts on Gluster storage best practices"), the improved approach is to use Linux software RAID at the bottom to unify the storage disks into one very large device, and then use Linux LVM logical volume management on top to provide flexible volume division and management.

Preparation

Building the disk storage pool

  • Building RAID10 with mdadm: because the storage disks on these servers are very plentiful, I did not go with RAID6, which saves more disks (while still offering high data safety). A layering sketch follows this item.
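A minimal sketch of the planned layering (software RAID10 with mdadm, LVM on top, then XFS). The disk names /dev/nvme0n1 through /dev/nvme11n1 and the vg_gluster / lv_bricks names are assumptions, not taken from the original deployment:

# Assemble the 12 NVMe disks into a single RAID10 array (device names are assumptions)
mdadm --create /dev/md0 --level=10 --raid-devices=12 /dev/nvme{0..11}n1

# Layer LVM on top of the array for flexible volume division and management
pvcreate /dev/md0
vgcreate vg_gluster /dev/md0
lvcreate -n lv_bricks -l 100%FREE vg_gluster

# Format with XFS (512-byte inodes are commonly recommended for Gluster bricks)
mkfs.xfs -i size=512 /dev/vg_gluster/lv_bricks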

Warning

Due to an adjustment of the project plan, I now build an XFS filesystem directly on the mdadm RAID10 array to finish the deployment quickly (sketched after this warning).

Please ignore the part below for now; I will start over once there is another chance to put it into practice…
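A minimal sketch of the simplified approach described in this warning (XFS built directly on the RAID10 array). It assumes the array is /dev/md0 and that /data is used as the brick parent directory; both names are assumptions:

# Format the RAID10 array directly with XFS and mount it persistently
mkfs.xfs -i size=512 /dev/md0
mkdir -p /data
echo '/dev/md0  /data  xfs  defaults,noatime  0 0' >> /etc/fstab
mount /data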

Installing and starting the service

Install GlusterFS on CentOS
yum install glusterfs-server

The output shows that the following packages will be installed:

Output from installing GlusterFS on CentOS
Dependencies resolved.
=================================================================================
 Package                   Arch    Version          Repository              Size
=================================================================================
Installing:
 glusterfs-server          x86_64  11.0-0.0.alios7  centos-gluster11       1.2 M
Installing dependencies:
 glusterfs                 x86_64  11.0-0.0.alios7  centos-gluster11       600 k
 glusterfs-cli             x86_64  11.0-0.0.alios7  centos-gluster11       187 k
 glusterfs-client-xlators  x86_64  11.0-0.0.alios7  centos-gluster11       731 k
 glusterfs-fuse            x86_64  11.0-0.0.alios7  centos-gluster11       150 k
 libgfapi0                 x86_64  11.0-0.0.alios7  centos-gluster11       114 k
 libgfchangelog0           x86_64  11.0-0.0.alios7  centos-gluster11        53 k
 libgfrpc0                 x86_64  11.0-0.0.alios7  centos-gluster11        74 k
 libgfxdr0                 x86_64  11.0-0.0.alios7  centos-gluster11        52 k
 libglusterfs0             x86_64  11.0-0.0.alios7  centos-gluster11       314 k
 rpcbind                   x86_64  0.2.0-47.alios7  alios.7u2.base.x86_64   59 k
 userspace-rcu             x86_64  0.10.0-3.el7     centos-gluster11        93 k

Transaction Summary
==================================================================================
Install  12 Packages

Total download size: 3.6 M
Installed size: 15 M
Is this ok [y/N]:
  • Start the GlusterFS management service:

Start and enable the GlusterFS management service
systemctl enable --now glusterd
  • Check the glusterd service status:

Check the GlusterFS management service
systemctl status glusterd

The output shows the service is running normally:

Checking the GlusterFS management service shows it running normally
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-06-29 11:59:08 CST; 1min 17s ago
     Docs: man:glusterd(8)
  Process: 7319 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 7324 (glusterd)
    Tasks: 8
   Memory: 7.5M
   CGroup: /system.slice/glusterd.service
           └─7324 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Jun 29 11:59:08 l59c11165.alipay.ea133 systemd[1]: Starting GlusterFS, a clustered file-system se......
Jun 29 11:59:08 l59c11165.alipay.ea133 systemd[1]: Started GlusterFS, a clustered file-system server.
Hint: Some lines were ellipsized, use -l to show in full.
  • Install the glusterfs-server package above on all required CentOS 7.2 server nodes (using pssh - parallel SSH):

Batch-install glusterfs-server with pssh - parallel SSH
# copy the repo file in parallel with pscp (may be installed as pscp.pssh)
pscp -h hosts glusterfs-11.repo /tmp/glusterfs-11.repo
pssh -ih hosts 'sudo cp /tmp/glusterfs-11.repo /etc/yum.repos.d/'

pssh -ih hosts 'sudo yum update -y && sudo yum install glusterfs-server -y'
pssh -ih hosts 'sudo systemctl enable --now glusterd'
pssh -ih hosts 'sudo systemctl status glusterd'

If you prefer plain ssh, the same can be done as follows:

Install glusterfs-server sequentially in a loop using plain ssh
for ip in `cat hosts`;do scp glusterfs-11.repo ${ip}:/tmp/glusterfs-11.repo;done

for ip in `cat hosts`;do ssh $ip 'sudo cp /tmp/glusterfs-11.repo /etc/yum.repos.d/';done

for ip in `cat hosts`;do ssh $ip 'sudo yum update -y && sudo yum install glusterfs-server -y';done

for ip in `cat hosts`;do ssh $ip 'sudo systemctl enable --now glusterd && sudo systemctl status glusterd';done

Configuring the service

  • CentOS 7 enables the firewall by default (depending on the specific deployment); make sure the correct communication ports are open on the servers (see also the brick-port note after the commands below):

Open CentOS firewall ports for GlusterFS
firewall-cmd --zone=public --add-port=24007-24008/tcp --permanent
firewall-cmd --reload
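Note that 24007-24008 only covers glusterd management traffic; since GlusterFS 3.4 each brick process listens on its own port starting at 49152, so on a firewalled node a range sized to the number of bricks per server usually needs to be opened as well. A sketch, assuming at most 100 bricks per server:

firewall-cmd --zone=public --add-port=49152-49251/tcp --permanent
firewall-cmd --reload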

Note

  • When configuring a distributed replicated volume, make sure the number of bricks is an integer multiple of the replica count. For example, with replica 3 the brick count must be 3, 6, 9 and so on.

  • In this deployment there are 6 servers, each with 12 NVMe disks: 12*6 = 72 bricks, an integer multiple of 3 (i.e. 24 replica sets).

  • Configure gluster peering (this only needs to be executed once, on a single server):

Run gluster peer probe once, on a single server
server1=192.168.1.1
server2=192.168.1.2
server3=192.168.1.3
server4=192.168.1.4
server5=192.168.1.5
server6=192.168.1.6

gluster peer probe ${server2}
gluster peer probe ${server3}
gluster peer probe ${server4}
gluster peer probe ${server5}
gluster peer probe ${server6}
  • Once done, check the gluster peer status:

Run gluster peer status on one server to check whether the peers were created and connected successfully
gluster peer status
The gluster peer status output showing every peer in the Connected state indicates the cluster was formed successfully
Number of Peers: 5

Hostname: 192.168.1.2
Uuid: c664761a-5973-4e2e-8506-9c142c657297
State: Peer in Cluster (Connected)

Hostname: 192.168.1.3
Uuid: 901b8027-5eab-4f6b-8cf4-aafa4463ca13
State: Peer in Cluster (Connected)

Hostname: 192.168.1.4
Uuid: 5ff667dd-5f45-4daf-900e-913e78e52297
State: Peer in Cluster (Connected)

Hostname: 192.168.1.5
Uuid: ebd1d002-0719-4704-a59d-b4e8b3b28c29
State: Peer in Cluster (Connected)

Hostname: 192.168.1.6
Uuid: 1f958e31-2d55-4904-815a-89f6ade360fe
State: Peer in Cluster (Connected)

Creating a GlusterFS volume

  • Create a simple script, create_gluster, to make it easy to build a replica 3 distributed volume:

create_gluster script: pass the volume name as the argument to create a replica 3 distributed replicated volume
#!/usr/bin/env bash
# Usage: ./create_gluster <volume-name>
volume=$1
server1=192.168.1.1
server2=192.168.1.2
server3=192.168.1.3
server4=192.168.1.4
server5=192.168.1.5
server6=192.168.1.6

gluster volume create ${volume} replica 3 \
        ${server1}:/data/brick0/${volume} \
        ${server2}:/data/brick0/${volume} \
        ${server3}:/data/brick0/${volume} \
        ${server4}:/data/brick0/${volume} \
        ${server5}:/data/brick0/${volume} \
        ${server6}:/data/brick0/${volume} \
        \
        ${server1}:/data/brick1/${volume} \
        ${server2}:/data/brick1/${volume} \
        ${server3}:/data/brick1/${volume} \
        ${server4}:/data/brick1/${volume} \
        ${server5}:/data/brick1/${volume} \
        ${server6}:/data/brick1/${volume} \
        \
        ${server1}:/data/brick2/${volume} \
        ${server2}:/data/brick2/${volume} \
        ${server3}:/data/brick2/${volume} \
        ${server4}:/data/brick2/${volume} \
        ${server5}:/data/brick2/${volume} \
        ${server6}:/data/brick2/${volume} \
        \
        ${server1}:/data/brick3/${volume} \
        ${server2}:/data/brick3/${volume} \
        ${server3}:/data/brick3/${volume} \
        ${server4}:/data/brick3/${volume} \
        ${server5}:/data/brick3/${volume} \
        ${server6}:/data/brick3/${volume} \
        \
        ${server1}:/data/brick4/${volume} \
        ${server2}:/data/brick4/${volume} \
        ${server3}:/data/brick4/${volume} \
        ${server4}:/data/brick4/${volume} \
        ${server5}:/data/brick4/${volume} \
        ${server6}:/data/brick4/${volume} \
        \
        ${server1}:/data/brick5/${volume} \
        ${server2}:/data/brick5/${volume} \
        ${server3}:/data/brick5/${volume} \
        ${server4}:/data/brick5/${volume} \
        ${server5}:/data/brick5/${volume} \
        ${server6}:/data/brick5/${volume} \
        \
        ${server1}:/data/brick6/${volume} \
        ${server2}:/data/brick6/${volume} \
        ${server3}:/data/brick6/${volume} \
        ${server4}:/data/brick6/${volume} \
        ${server5}:/data/brick6/${volume} \
        ${server6}:/data/brick6/${volume} \
        \
        ${server1}:/data/brick7/${volume} \
        ${server2}:/data/brick7/${volume} \
        ${server3}:/data/brick7/${volume} \
        ${server4}:/data/brick7/${volume} \
        ${server5}:/data/brick7/${volume} \
        ${server6}:/data/brick7/${volume} \
        \
        ${server1}:/data/brick8/${volume} \
        ${server2}:/data/brick8/${volume} \
        ${server3}:/data/brick8/${volume} \
        ${server4}:/data/brick8/${volume} \
        ${server5}:/data/brick8/${volume} \
        ${server6}:/data/brick8/${volume} \
        \
        ${server1}:/data/brick9/${volume} \
        ${server2}:/data/brick9/${volume} \
        ${server3}:/data/brick9/${volume} \
        ${server4}:/data/brick9/${volume} \
        ${server5}:/data/brick9/${volume} \
        ${server6}:/data/brick9/${volume} \
        \
        ${server1}:/data/brick10/${volume} \
        ${server2}:/data/brick10/${volume} \
        ${server3}:/data/brick10/${volume} \
        ${server4}:/data/brick10/${volume} \
        ${server5}:/data/brick10/${volume} \
        ${server6}:/data/brick10/${volume} \
        \
        ${server1}:/data/brick11/${volume} \
        ${server2}:/data/brick11/${volume} \
        ${server3}:/data/brick11/${volume} \
        ${server4}:/data/brick11/${volume} \
        ${server5}:/data/brick11/${volume} \
        ${server6}:/data/brick11/${volume}

Note

When the number of bricks is an integer multiple of the replica count (2x or more), a Distributed Replicated GlusterFS Volume is created automatically, providing high availability and high performance at the same time. However, the ordering of the bricks matters: replica first, then distribute, i.e. each consecutive group of replica bricks in the list forms one replica set.

So, to spread the data across different servers, I used a specific ordering here: A:0, B:0, C:0, A:1, B:1, C:1, A:2, B:2, C:2, ... so that each replica 3 set lands exactly on three different servers (a compact way to generate this ordering is sketched after this note).

This deployment approach has both advantages and drawbacks; I will explore them in detail in "Gluster storage best practices".
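For reference, the long brick list above can also be generated with a nested loop that preserves the same "brick index outer, server inner" ordering. A compact sketch of an equivalent create_gluster, under the same assumptions as above (6 servers, 12 bricks per server):

#!/usr/bin/env bash
# Equivalent to the explicit listing: every 3 consecutive bricks sit on 3 different servers
volume=$1
servers="192.168.1.1 192.168.1.2 192.168.1.3 192.168.1.4 192.168.1.5 192.168.1.6"

bricks=""
for b in $(seq 0 11); do
    for s in ${servers}; do
        bricks="${bricks} ${s}:/data/brick${b}/${volume}"
    done
done

gluster volume create ${volume} replica 3 ${bricks}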

  • Make the script executable:

    chmod 755 create_gluster
    
  • Create a volume, for example backup:

Create the backup replica 3 distributed volume with the create_gluster script
volume=backup
./create_gluster ${volume}
gluster volume start ${volume}
  • If the volume was created incorrectly, delete it as follows:

Delete the backup volume
volume=backup
gluster volume stop ${volume} --mode=script
gluster volume delete ${volume} --mode=script

Note

The --mode=script parameter is added to the gluster volume stop and gluster volume delete commands here to avoid interactive prompts, which makes them convenient to run from scripts. For day-to-day operations you can omit this parameter and confirm interactively.

  • Once done, check the volume status:

Check the status of the backup volume
volume=backup
gluster volume status ${volume}

Mounting the gluster volume

  • On client servers you only need to install the glusterfs-fuse package:

Install the GlusterFS client glusterfs-fuse
yum install glusterfs-fuse
  • Edit /etc/fstab and add the following entry:

The GlusterFS client's /etc/fstab
192.168.1.1:/backup  /data/backup  glusterfs    defaults,_netdev,direct-io-mode=enable,backupvolfile-server=192.168.1.2    0    0
  • Mount the storage volume (then verify, as sketched after the commands below):

Mount the GlusterFS volume
mkdir -p /data/backup
mount /data/backup
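After mounting, a quick sanity check confirms the volume is reachable and writable (the test file name below is arbitrary, not from the original deployment):

df -h /data/backup
touch /data/backup/.mount_test && rm -f /data/backup/.mount_test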