Mobile Cloud Computing Ceph: Adding Ceph OSDs (LVM volumes)
After completing the initial Mobile Cloud Computing Ceph ceph-mon deployment, OSDs can be added. Only once enough OSDs are deployed to hold the required number of object replicas (for example, osd pool default size = 2
requires the cluster to have at least 2 OSDs) can the cluster reach the active + clean
state. After bootstrapping the Ceph monitor, the cluster already has a default CRUSH
map, but at this point the CRUSH map does not yet have any Ceph OSD Daemons mapped to a Ceph node.
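To check what replica count the running cluster actually requires, the effective setting can be read back with the standard ceph CLI; a minimal sketch (the option may also be pinned in ceph.conf rather than the config database):

# Prints the replica count new pools will get by default (3 in this deployment)
sudo ceph config get mon osd_pool_default_size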
Ceph provides the ceph-volume
tool to prepare a logical volume, disk, or partition for use by Ceph. It creates the OSD ID by incrementing an index and adds the new OSD to the CRUSH map. The tool has to be run on every node where an OSD is to be added.
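Before preparing a device, it can help to confirm what ceph-volume sees on the node. A minimal sketch using the inventory subcommand (available in recent ceph-volume releases; output format varies by version):

# List local block devices and whether ceph-volume considers them usable
sudo ceph-volume inventory
# Show details for a single device
sudo ceph-volume inventory /dev/vdb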
Note
I have 3 server nodes providing storage, and the OSD service needs to be deployed on each of these 3 nodes.
Note
The examples in the official Ceph documentation all use ceph-volume lvm
, which builds a Linux LVM (Logical Volume Manager) layer underneath the Ceph OSD. The advantage is that the underlying storage capacity can be expanded at any time, which greatly simplifies later operations. For production deployments, lvm
volumes are recommended.
bluestore
BlueStore, the Ceph back-end storage engine, is the default high-performance storage engine in current Ceph releases. It no longer goes through the OS file system and manages the disk hardware directly.
The servers that will host OSDs first need their storage prepared. LVM volumes are normally used as the underlying block devices, so the block device size can be adjusted flexibly through LVM logical volumes (the devices may need to grow as the stored data grows).
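For reference, this is roughly the LVM layering that ceph-volume lvm builds for you; a manual sketch with hypothetical names ceph-vg / osd-lv (not the names used later in this article):

# Turn the partition into an LVM physical volume
sudo pvcreate /dev/vdb1
# Create a volume group on it (hypothetical name)
sudo vgcreate ceph-vg /dev/vdb1
# Carve out a logical volume spanning all free extents
sudo lvcreate -l 100%FREE -n osd-lv ceph-vg
# The resulting LV could then be handed to ceph-volume as ceph-vg/osd-lv

In this article ceph-volume is pointed directly at /dev/vdb1 and creates the volume group and logical volume itself, which is exactly what the vgcreate / lvcreate lines in the create output below show.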
Using LVM as the BlueStore backing layer
Running
ceph-volume --help
shows that three kinds of underlying storage are supported:
lvm     Use LVM and LVM-based technologies to deploy OSDs
simple  Manage already deployed OSDs with ceph-volume
raw     Manage single-device OSDs on raw block devices
My build here uses ceph-volume lvm
, which automatically creates the underlying Linux LVM logical volumes.
Preparing the vdb virtual disk partition
Note
For production, use LVM volumes as the underlying devices - see the Ceph BlueStore configuration article.
My deployment is done on the 3 virtual machines a-b-data-1
/ a-b-data-2
/ a-b-data-3
, with identical partitioning on each.
Prepare the underlying block device (the virtual machine has 2 disks, of which
/dev/vdb
is used for the Ceph data; disk space is limited, so 50G is allocated). Create GPT partition 1:
sudo parted /dev/vdb mklabel gpt
sudo parted -a optimal /dev/vdb mkpart primary 0% 50G
When done, check with fdisk -l, which shows:
Disk /dev/vdb: 55 GiB, 59055800320 bytes, 115343360 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A1497C37-CA99-431B-9AF4-DC99FBBDC2B9
Device Start End Sectors Size Type
/dev/vdb1 2048 97656831 97654784 46.6G Linux filesystem
Note
The partitioning above is done on all 3 storage virtual machines.
Creating the BlueStore storage used by the OSD
Create the first OSD. Note that I use a single unified
data
store to hold everything, including block.db
and block.wal
:
sudo ceph-volume lvm create --bluestore --data /dev/vdb1
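If faster devices were available, the DB and WAL could instead be placed on separate devices. A hedged sketch with hypothetical NVMe partitions /dev/nvme0n1p1 and /dev/nvme0n1p2 (not part of this deployment):

# Data on the slow disk, RocksDB metadata and WAL on faster partitions
sudo ceph-volume lvm create --bluestore --data /dev/vdb1 \
    --block.db /dev/nvme0n1p1 \
    --block.wal /dev/nvme0n1p2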
Note
ceph-volume raw -h
has the following subcommands:
list list BlueStore OSDs on raw devices
prepare Format a raw device and associate it with a (BlueStore) OSD
activate Discover and prepare a data directory for a (BlueStore) OSD on a raw device
ceph-volume lvm -h
has the following subcommands:
activate Discover and mount the LVM device associated with an OSD ID and start the Ceph OSD
deactivate Deactivate OSDs
batch Automatically size devices for multi-OSD provisioning with minimal interaction
prepare Format an LVM device and associate it with an OSD
create Create a new OSD from an LVM device
trigger systemd helper to activate an OSD
list list logical volumes and devices associated with Ceph
zap Removes all data and filesystems from a logical volume or partition.
migrate Migrate BlueFS data from to another LVM device
new-wal Allocate new WAL volume for OSD at specified Logical Volume
new-db Allocate new DB volume for OSD at specified Logical Volume
The raw
subcommands have to be run step by step; unlike lvm
, raw does not provide the richer all-in-one commands.
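For comparison, create is roughly prepare followed by activate. A minimal sketch of the two-step lvm flow, using the OSD id and fsid that prepare reports (the values below are the ones that appear in the output that follows):

# Step 1: format the device and register the OSD without starting it
sudo ceph-volume lvm prepare --bluestore --data /dev/vdb1
# Step 2: mount the OSD directory and start the systemd unit
sudo ceph-volume lvm activate 0 2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931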
The output of the create command above:
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931
Running command: vgcreate --force --yes ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532 /dev/vdb1
 stdout: Physical volume "/dev/vdb1" successfully created.
 stdout: Volume group "ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532" successfully created
Running command: lvcreate --yes -l 11920 -n osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931 ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532
 stdout: Logical volume "osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532/osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Running command: /usr/bin/ln -s /dev/ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532/osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: 2022-12-08T23:59:38.809+0800 ffff88d4f1a0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2022-12-08T23:59:38.809+0800 ffff88d4f1a0 -1 AuthRegistry(0xffff840601e0) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
 stderr: got monmap epoch 2
--> Creating keyring file for osd.0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931 --setuser ceph --setgroup ceph
 stderr: 2022-12-08T23:59:39.079+0800 ffff8ca8b040 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/vdb1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532/osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532/osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/systemctl enable ceph-volume@lvm-0-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931.service → /usr/lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@0
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service → /usr/lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0
--> ceph-volume lvm create successful for: /dev/vdb1
This run produced some errors, unlike my earlier Adding Ceph OSDs (LVM volumes) practice, which completed without any errors. It looks as if a ceph.keyring
for the osd is missing. However, there is a /var/lib/ceph/osd/ceph-0/keyring
in the OSD directory, and the errors did not affect operation.
Check the OSD volume devices:
sudo ceph-volume lvm list
The device files are as follows:
====== osd.0 =======
[block] /dev/ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532/osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931
block device /dev/ceph-97fa0d8e-9538-462c-98a0-7d95fe2d4532/osd-block-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931
block uuid W4kXl5-5v9W-yhE3-MOXW-YzoP-HT8X-xaE1tV
cephx lockbox secret
cluster fsid 598dc69c-5b43-4a3b-91b8-f36fc403bcc5
cluster name ceph
crush device class
encrypted 0
osd fsid 2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931
osd id 0
osdspec affinity
type block
vdo 0
devices /dev/vdb1
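For scripting, the same information can be emitted as JSON; a small sketch:

# Machine-readable listing of the OSD volumes on this node
sudo ceph-volume lvm list --format json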
Using the ceph-volume lvm create
command has the following advantages:
The OSD is activated and running automatically
The corresponding Systemd service units are added automatically, so an operating system restart does not run into the problem from my earlier Adding Ceph OSDs (RAW disks) practice, where the volume could not be mounted and the OSD could not start correctly (see the check sketched below)
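To confirm the units ceph-volume enabled on this node, something like the following can be used (unit names taken from the create output above):

# Per-OSD daemon unit started by ceph-volume
systemctl status ceph-osd@0
# Activation unit created during `lvm create`
systemctl is-enabled ceph-volume@lvm-0-2bcd1d3d-c9bf-4276-8fe2-b6f1e3efe931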
Check the cluster status:
sudo ceph -s
The OSD is now running:
cluster:
id: 598dc69c-5b43-4a3b-91b8-f36fc403bcc5
health: HEALTH_WARN
2 mgr modules have recently crashed
OSD count 1 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum a-b-data-1 (age 66m)
mgr: a-b-data-1(active, since 53m)
osd: 1 osds: 1 up (since 28m), 1 in (since 28m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 19 MiB used, 47 GiB / 47 GiB avail
pgs:
Check the OSD status:
sudo ceph osd tree
which shows:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.04549 root default
-3 0.04549 host a-b-data-1
0 hdd 0.04549 osd.0 up 1.00000 1.00000
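Per-OSD utilization and weights can also be checked with the standard CLI; a small sketch:

# Combines the CRUSH tree with per-OSD usage figures
sudo ceph osd df tree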
Note that only one OSD is running at the moment, which does not satisfy the configured 3-replica requirement; we need to add more OSD nodes.
Completing it quickly with a consolidated script
Installing an OSD is very simple, so the consolidated script is correspondingly simple:
#!/usr/bin/env bash

# Cluster-wide variables (shared with the earlier deployment scripts in this series)
ceph_env() {
    CLUSTER=ceph
    # FSID=$(cat /proc/sys/kernel/random/uuid)
    FSID=598dc69c-5b43-4a3b-91b8-f36fc403bcc5

    HOST=$(hostname -s)
    HOST_IP=$(hostname -i)

    HOST_1=a-b-data-1
    HOST_2=a-b-data-2
    HOST_3=a-b-data-3

    HOST_1_IP=192.168.8.204
    HOST_2_IP=192.168.8.205
    HOST_3_IP=192.168.8.206

    HOST_NET=192.168.8.0/24
}

# Partition /dev/vdb with a single GPT partition for the OSD
parted_vdb() {
    sudo parted /dev/vdb mklabel gpt
    sudo parted -a optimal /dev/vdb mkpart primary 0% 50G
}

create_dir() {
    local dir="$1"

    if [ ! -d "$dir" ]; then
        sudo mkdir -p "$dir"
        sudo chown ceph:ceph "$dir"
    fi
}

# Create the BlueStore OSD on the partition and show the result
create_ceph_osd() {
    create_dir /var/lib/ceph/osd/ceph-0

    sudo ceph-volume lvm create --bluestore --data /dev/vdb1
    sleep 1

    echo "ceph-volume:"
    sudo ceph-volume lvm list
}

ceph_env
parted_vdb
create_ceph_osd
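A hedged sketch of pushing the script to all three storage nodes and running it there (assuming it is saved as create_osd.sh and passwordless ssh is configured; each node must already have the bootstrap-osd keyring in place):

for host in a-b-data-1 a-b-data-2 a-b-data-3; do
    # Copy the script and run it remotely; each node creates its own OSD
    scp create_osd.sh ${host}:/tmp/
    ssh ${host} "bash /tmp/create_osd.sh"
done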
Verifying with an operating system reboot
Reboot the operating system: sudo shutdown -r now
After the system comes back up, check:
sudo ceph -s
It shows that the ceph-volume lvm
defaults are very convenient: after the reboot the system services come up normally and the OSD runs correctly:
cluster:
id: 598dc69c-5b43-4a3b-91b8-f36fc403bcc5
health: HEALTH_WARN
mon is allowing insecure global_id reclaim
1 monitors have not enabled msgr2
OSD count 1 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum a-b-data-1 (age 60s)
mgr: a-b-data-1(active, since 47s)
osd: 1 osds: 1 up (since 55s), 1 in (since 17m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 5.5 MiB used, 47 GiB / 47 GiB avail
pgs:
The HEALTH_WARN
above is nothing to worry about for now. The OSD count warning is there because the number of OSDs does not yet satisfy the configured 3-replica requirement, and it will disappear once the remaining OSDs are added; the other two warnings concern monitor settings (global_id reclaim and msgr2). Based on the current output, all 3 services (mon, mgr, osd) have started.
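If desired, the two monitor-related warnings can be cleared with standard ceph commands; a hedged sketch (both change cluster-wide monitor behavior, so apply them deliberately):

# Enable the v2 messenger protocol on the monitors
sudo ceph mon enable-msgr2
# Disallow insecure global_id reclaim (requires sufficiently up-to-date clients)
sudo ceph config set mon auth_allow_insecure_global_id_reclaim false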
Adding OSDs
To satisfy the 3-replica requirement, we need to add OSDs, either on this server or on the other servers. For redundancy, I run both ceph-mon
and ceph-osd
on each of the cluster's 3 servers, so let's complete that next:
Then run: