Adding Ceph OSDs (LVM volumes)

Note

In Adding Ceph OSDs (raw disks) I hit a problem where /var/lib/ceph/osd/ceph-0 would not mount correctly after a reboot, and I could not solve it for the time being. So I am going back to the most standard approach, using Linux LVM logical volumes as the underlying storage, to retry and finish the Ceph deployment quickly and move on to the next stage of testing. I will refine the solution later.

Once the initial ceph-mon installation is complete, OSDs can be added. The cluster can only reach the active + clean state after enough OSDs are deployed to satisfy the replica count (for example, osd pool default size = 2 requires the cluster to have at least 2 OSDs). After bootstrapping the Ceph monitor, the cluster has a default CRUSH map, but the CRUSH map does not yet map any Ceph OSD Daemons to a Ceph node.

Ceph provides the ceph-volume tool to prepare a logical volume, disk, or partition for Ceph. It creates OSD IDs by incrementing an index, and adds the new OSD to the CRUSH map. The tool must be run on every node where OSDs are to be added.

Note

I have 3 server nodes providing storage, so the OSD service needs to be deployed on each of these 3 nodes.

Note

The examples in the official Ceph documentation all use ceph-volume lvm, which builds a Linux LVM logical volume underneath the Ceph OSD. The advantage is that the underlying storage capacity can be extended at any time, which is a great convenience for later operations. For production deployments, lvm volumes are recommended.

In my test environment here, I use a simplified disk partition to back the Ceph BlueStore storage engine, because I want to keep the configuration simple, and my test servers have no hardware for later expansion anyway.

bluestore

The Ceph BlueStore storage backend is the default high-performance storage engine in recent Ceph releases; it no longer sits on top of the OS filesystem and manages the disk hardware directly.

Each server that will run an OSD first needs storage prepared. LVM volumes are the usual choice for the underlying block device, because logical volumes make it easy to resize the device later (capacity may need to grow as stored data accumulates).

In my lab, each Open Virtual Machine Firmware (OVMF) virtual machine on the HPE ProLiant DL360 Gen9 servers has only a single pass-through PCIe NVMe device, so I do not split block / block.db / block.wal across separate storage devices. Since LVM lets me keep growing the underlying storage, starting with a small partition is fine; for this exercise I carve out 500GB as the initial partition and will practice online expansion later.
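The online expansion I plan to validate later usually follows the sequence below. This is a hedged dry-run sketch: the VG/LV names are placeholders, and the script only prints each command instead of executing it, so it is safe to run anywhere.

```shell
# Dry-run sketch: grow a BlueStore OSD that sits on LVM (names are placeholders).
run() { echo "+ $*"; }   # print instead of execute

run sudo parted /dev/nvme0n1 resizepart 1 100%        # grow the partition backing the PV
run sudo pvresize /dev/nvme0n1p1                      # make LVM see the new PV size
run sudo lvextend -l +100%FREE ceph-VG/osd-block-LV   # grow the OSD logical volume
run sudo systemctl stop ceph-osd@0
run sudo ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
run sudo systemctl start ceph-osd@0
```

The bluefs-bdev-expand step is what tells BlueStore to take over the newly added space; without it the OSD keeps using its old size.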

Using LVM underneath BlueStore

  • Running ceph-volume --help shows three supported storage backends:

    lvm                      Use LVM and LVM-based technologies to deploy OSDs
    simple                   Manage already deployed OSDs with ceph-volume
    raw                      Manage single-device OSDs on raw block devices
    

My build here uses ceph-volume lvm, which automatically creates the underlying Linux LVM logical volumes.

Note

For production, use LVM volumes as the underlying devices - see Ceph BlueStore configuration.

My deployment runs on 3 virtual machines, z-b-data-1 / z-b-data-2 / z-b-data-3, partitioned identically.

  • Prepare the underlying block device; here we create GPT partition 1:

    sudo parted /dev/nvme0n1 mklabel gpt
    sudo parted -a optimal /dev/nvme0n1 mkpart primary 0% 500GB
    

Afterwards, fdisk -l shows:

Disk /dev/nvme0n1: 953.89 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVL21T0HCLR-00B00
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: BF78F6A8-7654-4646-83B7-8331F77921E1

Device         Start       End   Sectors   Size Type
/dev/nvme0n1p1  2048 976562175 976560128 465.7G Linux filesystem

Note

The partitioning above is done on all 3 storage VMs.
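The fdisk numbers can be cross-checked with a little sector arithmetic (512-byte sectors, as reported above):

```shell
# 976560128 sectors (976562175 - 2048 + 1) of 512 bytes each.
sectors=976560128
bytes=$((sectors * 512))
echo "bytes: $bytes"
awk -v b="$bytes" 'BEGIN { printf "GiB: %.1f\n", b / (1024 ^ 3) }'
# -> GiB: 465.7, matching the 465.7G size fdisk prints for /dev/nvme0n1p1
```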

  • Create the first OSD. Note that I use a single data volume to hold everything, including block.db and block.wal:

    sudo ceph-volume lvm create --bluestore --data /dev/nvme0n1p1
    

Note

ceph-volume raw -h lists the subcommands:

list                     list BlueStore OSDs on raw devices
prepare                  Format a raw device and associate it with a (BlueStore) OSD
activate                 Discover and prepare a data directory for a (BlueStore) OSD on a raw device

ceph-volume lvm -h lists the subcommands:

activate                 Discover and mount the LVM device associated with an OSD ID and start the Ceph OSD
deactivate               Deactivate OSDs
batch                    Automatically size devices for multi-OSD provisioning with minimal interaction
prepare                  Format an LVM device and associate it with an OSD
create                   Create a new OSD from an LVM device
trigger                  systemd helper to activate an OSD
list                     list logical volumes and devices associated with Ceph
zap                      Removes all data and filesystems from a logical volume or partition.
migrate                  Migrate BlueFS data from one LVM device to another
new-wal                  Allocate new WAL volume for OSD at specified Logical Volume
new-db                   Allocate new DB volume for OSD at specified Logical Volume

The raw command requires the steps to be done one by one, unlike the lvm command, which provides richer combined commands.
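To make the difference concrete, here is a dry-run sketch contrasting the two backends. The commands are only printed, not executed; the exact raw activate flags vary by Ceph release, so treat them as an assumption and confirm with ceph-volume raw activate -h.

```shell
run() { echo "+ $*"; }   # print instead of execute

# raw: two separate steps
run sudo ceph-volume raw prepare --bluestore --data /dev/nvme0n1p1
run sudo ceph-volume raw activate --device /dev/nvme0n1p1

# lvm: one step that wraps prepare + activate
run sudo ceph-volume lvm create --bluestore --data /dev/nvme0n1p1
```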

The output messages:

ceph-volume lvm create output
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 33b7d928-8075-4531-9177-9253a71dec84
Running command: /usr/sbin/vgcreate --force --yes ceph-b7d91a2a-72ca-488b-948f-c42613698cca /dev/nvme0n1p1
 stdout: Wiping ceph_bluestore signature on /dev/nvme0n1p1.
 stdout: Physical volume "/dev/nvme0n1p1" successfully created.
 stdout: Volume group "ceph-b7d91a2a-72ca-488b-948f-c42613698cca" successfully created
Running command: /usr/sbin/lvcreate --yes -l 119208 -n osd-block-33b7d928-8075-4531-9177-9253a71dec84 ceph-b7d91a2a-72ca-488b-948f-c42613698cca
 stdout: Logical volume "osd-block-33b7d928-8075-4531-9177-9253a71dec84" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Executable selinuxenabled not in PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Running command: /usr/bin/ln -s /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: 2021-12-01T17:42:34.192+0800 7f9e1bb61700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-01T17:42:34.192+0800 7f9e1bb61700 -1 AuthRegistry(0x7f9e140592a0) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
 stderr: got monmap epoch 2
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQCJQ6dhbRclCxAARfLlWBvCjGmfbOx7ElaEDA==
 stdout: creating /var/lib/ceph/osd/ceph-0/keyring
added entity osd.0 auth(key=AQCJQ6dhbRclCxAARfLlWBvCjGmfbOx7ElaEDA==)
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 33b7d928-8075-4531-9177-9253a71dec84 --setuser ceph --setgroup ceph
 stderr: 2021-12-01T17:42:34.656+0800 7fe1b49dfd80 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
 stderr: 2021-12-01T17:42:34.700+0800 7fe1b49dfd80 -1 freelist read_size_meta_from_db missing size meta in DB
--> ceph-volume lvm prepare successful for: /dev/nvme0n1p1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/systemctl enable ceph-volume@lvm-0-33b7d928-8075-4531-9177-9253a71dec84
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-33b7d928-8075-4531-9177-9253a71dec84.service → /lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@0
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service → /lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0
--> ceph-volume lvm create successful for: /dev/nvme0n1p1
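The -l 119208 passed to lvcreate in the log above is a count of LVM physical extents; assuming the default 4 MiB extent size, it accounts for the LV capacity reported below:

```shell
extents=119208
mib=$((extents * 4))   # default LVM physical extent size is 4 MiB
echo "MiB: $mib"
awk -v m="$mib" 'BEGIN { printf "GiB: %.2f\n", m / 1024 }'
# -> GiB: 465.66, i.e. the whole 465.7G partition minus LVM metadata overhead
```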
  • Check the OSD volume devices:

    sudo ceph-volume lvm list
    

The device files are shown as follows:

====== osd.0 =======

  [block]       /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84

      block device              /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84
      block uuid                T3vB57-w3fx-7g7r-Zgk6-ZqJK-Ijrc-zy3LZW
      cephx lockbox secret
      cluster fsid              0e6c8b6f-0d32-4cdb-a45d-85f8c7997c17
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  33b7d928-8075-4531-9177-9253a71dec84
      osd id                    0
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/nvme0n1p1

The ceph-volume lvm create command completes prepare and activate in a single step, so the OSD is already up at this point.

  • Check the cluster status:

    sudo ceph -s
    

The OSD is up and running:

cluster:
  id:     0e6c8b6f-0d32-4cdb-a45d-85f8c7997c17
  health: HEALTH_WARN
          Reduced data availability: 1 pg inactive
          Degraded data redundancy: 1 pg undersized
          OSD count 1 < osd_pool_default_size 3

services:
  mon: 1 daemons, quorum z-b-data-1 (age 47m)
  mgr: z-b-data-1(active, since 36m)
  osd: 1 osds: 1 up (since 6m), 1 in (since 6m)

data:
  pools:   1 pools, 1 pgs
  objects: 0 objects, 0 B
  usage:   1.0 GiB used, 465 GiB / 466 GiB avail
  pgs:     100.000% pgs not active
           1 undersized+peered
  • Check the OSD status:

    sudo ceph osd tree
    

The output:

ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.45470  root default
-3         0.45470      host z-b-data-1
 0    ssd  0.45470          osd.0            up   1.00000  1.00000

Note that only one OSD is running at this point, which does not satisfy the 3-replica requirement in the configuration; we need to add OSD nodes.
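As an aside, the WEIGHT column above is ceph-volume's default CRUSH weight, which is the device capacity expressed in TiB (the last digit differs slightly because CRUSH stores weights in a fixed-point encoding):

```shell
# The LV size, 465.65625 GiB, expressed in TiB ~ the CRUSH weight 0.45470
awk 'BEGIN { printf "weight ~ %.4f\n", 465.65625 / 1024 }'
# -> weight ~ 0.4547
```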

Verifying across an OS reboot

Reboot the operating system with sudo shutdown -r now

  • After the reboot, check:

    sudo ceph -s
    

The ceph-volume lvm defaults prove very convenient: after the reboot the system services come back on their own and the OSD runs normally:

cluster:
  id:     0e6c8b6f-0d32-4cdb-a45d-85f8c7997c17
  health: HEALTH_WARN
          Reduced data availability: 1 pg inactive
          Degraded data redundancy: 1 pg undersized
          OSD count 1 < osd_pool_default_size 3

services:
  mon: 1 daemons, quorum z-b-data-1 (age 82m)
  mgr: z-b-data-1(active, since 81m)
  osd: 1 osds: 1 up (since 82m), 1 in (since 100m)

data:
  pools:   1 pools, 1 pgs
  objects: 0 objects, 0 B
  usage:   1.0 GiB used, 465 GiB / 466 GiB avail
  pgs:     100.000% pgs not active
           1 undersized+peered

The HEALTH_WARN above is nothing to worry about for now; it only means the OSD count does not yet satisfy the configured 3 replicas, which the following steps will fix. The output shows all three services are up:

services:
  mon: 1 daemons, quorum z-b-data-1 (age 82m)
  mgr: z-b-data-1(active, since 81m)
  osd: 1 osds: 1 up (since 82m), 1 in (since 100m)
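What brings the OSD back after the reboot are the two systemd units ceph-volume enabled during create (unit names taken from the create log earlier). A dry-run sketch of how to check them - the commands are only printed, not executed:

```shell
run() { echo "+ $*"; }   # print instead of execute

# Enabled permanently by "systemctl enable" during ceph-volume lvm create:
run systemctl is-enabled ceph-volume@lvm-0-33b7d928-8075-4531-9177-9253a71dec84
# Enabled with --runtime; re-activated on each boot by the unit above:
run systemctl status ceph-osd@0
```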

Adding OSDs

To satisfy the 3-replica requirement, we need to add OSDs either locally or on other servers. For redundancy, I run both ceph-mon and ceph-osd on each of the cluster's 3 servers, so next we will complete that setup.
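The OSD part of that plan can be sketched as follows - a dry-run that only prints the commands, assuming the other two nodes have the same /dev/nvme0n1p1 partition and their bootstrap-osd keyrings already in place:

```shell
run() { echo "+ $*"; }   # print instead of execute

# Repeat the OSD creation on the remaining storage nodes
for node in z-b-data-2 z-b-data-3; do
  run ssh "$node" sudo ceph-volume lvm create --bluestore --data /dev/nvme0n1p1
done

# Afterwards the tree should show one OSD per host
run sudo ceph osd tree
```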

