Adding Ceph OSDs (LVM volumes)¶
Note
In Adding Ceph OSDs (RAW disks) I ran into a problem where /var/lib/ceph/osd/ceph-0 could not be mounted correctly after a reboot, and I could not resolve it at the time. So here I fall back to the most standard approach, using Linux LVM (Logical Volume Manager) volumes as the underlying storage, so that I can finish deploying Ceph as soon as possible and move on to the next stage of testing. I will refine the solution later…
After the initial installation of ceph-mon is complete, OSDs can be added. The cluster only reaches the active + clean state once enough OSDs have been deployed to satisfy the replication requirement (for example, osd pool default size = 2 requires the cluster to have at least 2 OSDs). After the Ceph monitor has been bootstrapped, the cluster has a default CRUSH map, but at this point the CRUSH map does not yet have any Ceph OSD Daemons mapped to a Ceph node.
Ceph provides the ceph-volume tool to prepare a logical volume, disk, or partition for Ceph. It creates the OSD ID by incrementing an index and adds the new OSD to the CRUSH map. The tool needs to be run on every node where an OSD is to be added.
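Before preparing anything, it can help to confirm which devices ceph-volume itself considers usable on a node. A minimal check, assuming ceph-volume is already installed (the device path /dev/nvme0n1 is the one used later in this article):
# List block devices and whether ceph-volume considers them available for an OSD
sudo ceph-volume inventory
# Inspect a single device in more detail
sudo ceph-volume inventory /dev/nvme0n1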
Note
I have 3 server nodes providing storage, so the OSD service needs to be deployed on each of these 3 nodes.
Note
The examples in the official Ceph documentation all use ceph-volume lvm. This command builds a Linux LVM (Logical Volume Manager) layer underneath the Ceph OSD, with the advantage that the underlying storage capacity can be expanded at any time, which greatly simplifies later operations. For production deployments, lvm volumes are recommended.
In my test environment here I use a plain disk partition to back the Ceph BlueStore storage engine, because I want to keep the configuration simple and my test servers have no spare hardware for later expansion.
bluestore¶
BlueStore, the Ceph back-end storage engine, is the default high-performance storage engine in recent Ceph releases. It no longer uses the OS filesystem underneath and manages the disk hardware directly.
Servers that will host OSDs first need their storage prepared. An LVM volume is usually used as the underlying block device, so that the block device size can be adjusted flexibly through LVM logical volumes (the device may need to grow as the stored data grows).
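If you want to control the LVM layout yourself instead of letting ceph-volume create the volume group automatically (as it does later in this article), the volume group and logical volume can be created first and then handed to ceph-volume as vg/lv. A sketch under assumed names: ceph-vg and osd-lv are hypothetical, and /dev/nvme0n1p1 is the partition used below:
# Create the PV/VG/LV by hand (names ceph-vg and osd-lv are only examples)
sudo pvcreate /dev/nvme0n1p1
sudo vgcreate ceph-vg /dev/nvme0n1p1
sudo lvcreate -l 100%FREE -n osd-lv ceph-vg
# Pass the existing logical volume to ceph-volume as vg_name/lv_name
sudo ceph-volume lvm create --bluestore --data ceph-vg/osd-lv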
In my lab environment (HPE ProLiant DL360 Gen9 servers), each Open Virtual Machine Firmware (OVMF) virtual machine has only a single pass-through PCIe NVMe device, so I did not split block, block.db, and block.wal onto separate storage devices. Since LVM lets the underlying storage grow over time, it does not matter that the initial disk allocation is small. For this exercise I carve out 500GB as the initial partition and will practice online expansion later.
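For reference, if a node did have separate faster devices, ceph-volume lvm create accepts dedicated devices for the RocksDB metadata and the write-ahead log. This is only a sketch; /dev/nvme1n1 and /dev/nvme2n1 are hypothetical devices that do not exist in my single-NVMe VMs:
# Data on the large device, DB and WAL on (hypothetical) faster devices
sudo ceph-volume lvm create --bluestore \
    --data /dev/nvme0n1p1 \
    --block.db /dev/nvme1n1 \
    --block.wal /dev/nvme2n1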
Using LVM as the bluestore backing layer¶
Run:
ceph-volume --help
You can see that 3 types of underlying storage are supported:
lvm     Use LVM and LVM-based technologies to deploy OSDs
simple  Manage already deployed OSDs with ceph-volume
raw     Manage single-device OSDs on raw block devices
For this build I use ceph-volume lvm, which automatically creates the underlying Linux LVM (Logical Volume Manager) layer.
Note
For production, use LVM volumes as the underlying devices - see Ceph BlueStore configuration
My deployment is done on the 3 virtual machines z-b-data-1 / z-b-data-2 / z-b-data-3, with identical partitioning on each.
Prepare the underlying block device; here I create GPT partition 1:
sudo parted /dev/nvme0n1 mklabel gpt
sudo parted -a optimal /dev/nvme0n1 mkpart primary 0% 500GB
When finished, check with fdisk -l and you can see:
Disk /dev/nvme0n1: 953.89 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVL21T0HCLR-00B00
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: BF78F6A8-7654-4646-83B7-8331F77921E1
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 976562175 976560128 465.7G Linux filesystem
Note
The partitioning above is done on all 3 storage virtual machines.
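Rather than typing the commands three times, the same partitioning can be pushed to all three nodes from one shell. A small sketch, assuming passwordless SSH and sudo to z-b-data-1/2/3 and an identical /dev/nvme0n1 on each host:
# Label the disk and create the 500GB partition on every storage node
for host in z-b-data-1 z-b-data-2 z-b-data-3; do
    ssh "$host" "sudo parted /dev/nvme0n1 mklabel gpt && \
        sudo parted -a optimal /dev/nvme0n1 mkpart primary 0% 500GB"
done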
Create the first OSD. Note that I use a single data store for all data, including block.db and block.wal:
sudo ceph-volume lvm create --bluestore --data /dev/nvme0n1p1
Note
ceph-volume raw -h includes the subcommands:
list list BlueStore OSDs on raw devices
prepare Format a raw device and associate it with a (BlueStore) OSD
activate Discover and prepare a data directory for a (BlueStore) OSD on a raw device
ceph-volume lvm -h includes the subcommands:
activate Discover and mount the LVM device associated with an OSD ID and start the Ceph OSD
deactivate Deactivate OSDs
batch Automatically size devices for multi-OSD provisioning with minimal interaction
prepare Format an LVM device and associate it with an OSD
create Create a new OSD from an LVM device
trigger systemd helper to activate an OSD
list list logical volumes and devices associated with Ceph
zap Removes all data and filesystems from a logical volume or partition.
migrate Migrate BlueFS data from to another LVM device
new-wal Allocate new WAL volume for OSD at specified Logical Volume
new-db Allocate new DB volume for OSD at specified Logical Volume
The raw command requires doing these steps one at a time; unlike lvm, it does not provide the richer batch-oriented commands.
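For comparison, ceph-volume lvm create is roughly the combination of prepare followed by activate. A sketch of the two-step form; the OSD id and fsid passed to activate must be the ones reported by prepare (here I reuse the values that appear in the output below):
# Step 1: build the LVM volume, format it for bluestore and register the OSD
sudo ceph-volume lvm prepare --bluestore --data /dev/nvme0n1p1
# Step 2: mount and start the OSD, using the id and fsid printed by prepare
sudo ceph-volume lvm activate 0 33b7d928-8075-4531-9177-9253a71dec84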
The output:
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 33b7d928-8075-4531-9177-9253a71dec84
Running command: /usr/sbin/vgcreate --force --yes ceph-b7d91a2a-72ca-488b-948f-c42613698cca /dev/nvme0n1p1
 stdout: Wiping ceph_bluestore signature on /dev/nvme0n1p1.
 stdout: Physical volume "/dev/nvme0n1p1" successfully created.
 stdout: Volume group "ceph-b7d91a2a-72ca-488b-948f-c42613698cca" successfully created
Running command: /usr/sbin/lvcreate --yes -l 119208 -n osd-block-33b7d928-8075-4531-9177-9253a71dec84 ceph-b7d91a2a-72ca-488b-948f-c42613698cca
 stdout: Logical volume "osd-block-33b7d928-8075-4531-9177-9253a71dec84" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Executable selinuxenabled not in PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Running command: /usr/bin/ln -s /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: 2021-12-01T17:42:34.192+0800 7f9e1bb61700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-01T17:42:34.192+0800 7f9e1bb61700 -1 AuthRegistry(0x7f9e140592a0) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
 stderr: got monmap epoch 2
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQCJQ6dhbRclCxAARfLlWBvCjGmfbOx7ElaEDA==
 stdout: creating /var/lib/ceph/osd/ceph-0/keyring
added entity osd.0 auth(key=AQCJQ6dhbRclCxAARfLlWBvCjGmfbOx7ElaEDA==)
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 33b7d928-8075-4531-9177-9253a71dec84 --setuser ceph --setgroup ceph
 stderr: 2021-12-01T17:42:34.656+0800 7fe1b49dfd80 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
 stderr: 2021-12-01T17:42:34.700+0800 7fe1b49dfd80 -1 freelist read_size_meta_from_db missing size meta in DB
--> ceph-volume lvm prepare successful for: /dev/nvme0n1p1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/systemctl enable ceph-volume@lvm-0-33b7d928-8075-4531-9177-9253a71dec84
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-33b7d928-8075-4531-9177-9253a71dec84.service → /lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@0
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service → /lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0
--> ceph-volume lvm create successful for: /dev/nvme0n1p1
Check the OSD volume devices:
sudo ceph-volume lvm list
The device files are as follows:
====== osd.0 =======
[block] /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84
block device /dev/ceph-b7d91a2a-72ca-488b-948f-c42613698cca/osd-block-33b7d928-8075-4531-9177-9253a71dec84
block uuid T3vB57-w3fx-7g7r-Zgk6-ZqJK-Ijrc-zy3LZW
cephx lockbox secret
cluster fsid 0e6c8b6f-0d32-4cdb-a45d-85f8c7997c17
cluster name ceph
crush device class None
encrypted 0
osd fsid 33b7d928-8075-4531-9177-9253a71dec84
osd id 0
osdspec affinity
type block
vdo 0
devices /dev/nvme0n1p1
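The volume group and logical volume that ceph-volume created can also be inspected with the ordinary LVM and block-device tools, which is handy when planning the later online expansion:
# Volume group and logical volume created by ceph-volume
sudo vgs
sudo lvs
# How the LV stacks on top of the NVMe partition
lsblk /dev/nvme0n1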
The ceph-volume lvm create command has the following advantages:
The OSD is activated and running automatically
The corresponding systemd service configuration is added automatically, so after an operating system reboot you do not hit the problem from my earlier Adding Ceph OSDs (RAW disks) attempt, where the volume could not be mounted and the OSD could not run (see the check sketched below)
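The second point can be verified directly against systemd. A quick check, where the ceph-volume unit instance name embeds the OSD id and fsid shown in the create output above (substitute your own fsid):
# The OSD daemon itself
sudo systemctl status ceph-osd@0
# The activation unit that ceph-volume enabled (suffix is <osd id>-<osd fsid>)
sudo systemctl is-enabled ceph-volume@lvm-0-33b7d928-8075-4531-9177-9253a71dec84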
Check the cluster status:
sudo ceph -s
You can see that the OSD is already running:
cluster:
id: 0e6c8b6f-0d32-4cdb-a45d-85f8c7997c17
health: HEALTH_WARN
Reduced data availability: 1 pg inactive
Degraded data redundancy: 1 pg undersized
OSD count 1 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum z-b-data-1 (age 47m)
mgr: z-b-data-1(active, since 36m)
osd: 1 osds: 1 up (since 6m), 1 in (since 6m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 1.0 GiB used, 465 GiB / 466 GiB avail
pgs: 100.000% pgs not active
1 undersized+peered
Check the OSD status:
sudo ceph osd tree
You can see:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.45470 root default
-3 0.45470 host z-b-data-1
0 ssd 0.45470 osd.0 up 1.00000 1.00000
Note that only one OSD is running right now, which does not satisfy the 3-replica requirement in the configuration, so we need to add OSD nodes.
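To see where the 3-replica requirement comes from, the configured default and the per-pool size can be queried. This is only a sketch; the ceph config get form below assumes a reasonably recent Ceph release:
# Default replica count applied to newly created pools
sudo ceph config get mon osd_pool_default_size
# Replica size of each existing pool
sudo ceph osd dump | grep 'replicated size'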
Verifying across an operating system reboot¶
Reboot the operating system:
sudo shutdown -r now
After the system comes back up, check:
sudo ceph -s
You can see that the default ceph-volume lvm configuration is very convenient: after the reboot the system services are healthy and the OSD runs normally:
cluster:
id: 0e6c8b6f-0d32-4cdb-a45d-85f8c7997c17
health: HEALTH_WARN
Reduced data availability: 1 pg inactive
Degraded data redundancy: 1 pg undersized
OSD count 1 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum z-b-data-1 (age 82m)
mgr: z-b-data-1(active, since 81m)
osd: 1 osds: 1 up (since 82m), 1 in (since 100m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 1.0 GiB used, 465 GiB / 466 GiB avail
pgs: 100.000% pgs not active
1 undersized+peered
The HEALTH_WARN above is nothing to worry about for now; it only means the number of OSDs does not yet satisfy the configured 3-replica requirement, which will be fixed later. According to the current output, all 3 services have started:
services:
mon: 1 daemons, quorum z-b-data-1 (age 82m)
mgr: z-b-data-1(active, since 81m)
osd: 1 osds: 1 up (since 82m), 1 in (since 100m)
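If you want the monitor's own explanation of the warning rather than the one-line summary, ceph health detail lists each issue individually:
sudo ceph health detail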
Adding OSDs¶
To satisfy the 3-replica requirement, we need to add OSDs either on this server or on other servers. For redundancy, I run both ceph-mon and ceph-osd on each of the cluster's 3 servers, so next we will complete that:
Then execute: