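Comparing IOMMU NVMe and Native NVMe Storage Performance

I used Open Virtual Machine Firmware (OVMF) to assign a Samsung PM9A1 NVMe drive to a KVM virtual machine. This pass-through technique can greatly improve a VM's storage performance. In this article I run fio storage benchmarks to compare the passed-through drive against the same drive on the physical host and to see whether it meets my performance requirements. IOMMU performance tuning will follow later.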

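Note

Before an NVMe device can be assigned to an Open Virtual Machine Firmware (OVMF) VM, the host kernel must first be configured to block the NVMe device so that the host does not use it. Tests on the physical server therefore have to run while the kernel is not blocking the device.

Because I had already assigned the NVMe device to the VM z-iommu, I ran the fio tests inside the VM first, then shut the VM down and removed the kernel setting that blocks the NVMe device. I then rebooted the server so that the physical host could access the NVMe device, and ran the comparison tests there.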
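Before each run it can help to confirm which side currently owns the drive. The sketch below reads the driver bound to a PCI function from sysfs. It assumes that "blocking" the device on the host means it is held by `vfio-pci` rather than `nvme` (the text above does not name the mechanism), and the PCI address is a hypothetical placeholder.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        # No "driver" symlink means no driver currently claims the device.
        return None
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    # Hypothetical address; look up the real one with `lspci -D`.
    addr = "0000:05:00.0"
    # Assumption: "vfio-pci" while passed through, "nvme" when the host owns it.
    print(addr, "->", bound_driver(addr))
```

If the function reports `nvme` on the host, the drive is visible as a block device there and the host-side fio run can proceed.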

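Disk performance tests

VM configuration

  • The test VM's vCPU count should match the fio numjobs value. The random I/O commands below use -numjobs=4, so the VM is configured with vcpu=4.

  • The VM is allocated 16 GB of memory (optional).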
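The vCPU rule above can be checked inside the guest before launching fio. This is a minimal sketch; the helper name `enough_cpus` is made up for illustration.

```python
import os


def enough_cpus(numjobs, cpus=None):
    """Return True when each fio job can have a CPU of its own."""
    if cpus is None:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        cpus = os.cpu_count() or 1
    return cpus >= numjobs


if __name__ == "__main__":
    numjobs = 4  # matches -numjobs=4 in the random I/O tests
    if not enough_cpus(numjobs):
        print(f"warning: fewer than {numjobs} CPUs, fio jobs will compete for CPU time")
```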

Random write IOPS

  • Test command:

    fio -direct=1 -iodepth=32 -rw=randwrite -ioengine=libaio -bs=4k -numjobs=4 -time_based=1 -runtime=60 -group_reporting -filename=/dev/nvme0n1 -name=test
    
  • VM result:

IOMMU VM, random write IOPS

    fio-3.16
    Starting 4 processes
    Jobs: 4 (f=4): [w(4)][100.0%][w=1669MiB/s][w=427k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=4): err= 0: pid=1155: Thu Nov 18 23:24:45 2021
      write: IOPS=629k, BW=2457MiB/s (2576MB/s)(144GiB/60025msec); 0 zone resets
        slat (nsec): min=1915, max=27716k, avg=3359.01, stdev=5297.16
        clat (usec): min=65, max=30781, avg=198.89, stdev=200.16
         lat (usec): min=121, max=30783, avg=202.45, stdev=200.17
        clat percentiles (usec):
         |  1.00th=[  157],  5.00th=[  169], 10.00th=[  172], 20.00th=[  174],
         | 30.00th=[  176], 40.00th=[  176], 50.00th=[  178], 60.00th=[  180],
         | 70.00th=[  180], 80.00th=[  184], 90.00th=[  210], 95.00th=[  306],
         | 99.00th=[  594], 99.50th=[  807], 99.90th=[ 1287], 99.95th=[ 2704],
         | 99.99th=[ 3916]
       bw (  MiB/s): min= 1577, max= 2736, per=100.00%, avg=2457.95, stdev=108.84, samples=480
       iops        : min=403930, max=700596, avg=629234.87, stdev=27864.27, samples=480
      lat (usec)   : 100=0.01%, 250=92.67%, 500=5.69%, 750=1.05%, 1000=0.35%
      lat (msec)   : 2=0.20%, 4=0.05%, 10=0.01%, 20=0.01%, 50=0.01%
      cpu          : usr=44.65%, sys=45.51%, ctx=703251, majf=0, minf=46
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,37756393,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=32

    Run status group 0 (all jobs):
      WRITE: bw=2457MiB/s (2576MB/s), 2457MiB/s-2457MiB/s (2576MB/s-2576MB/s), io=144GiB (155GB), run=60025-60025msec

    Disk stats (read/write):
      nvme0n1: ios=49/37694388, merge=0/0, ticks=1/2314471, in_queue=41020, util=100.00%

  • Physical host result:

Physical host, random write IOPS

    fio-3.16
    Starting 4 processes
    Jobs: 4 (f=4): [w(4)][100.0%][w=1605MiB/s][w=411k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=4): err= 0: pid=3079: Wed Nov 17 00:38:52 2021
      write: IOPS=669k, BW=2614MiB/s (2741MB/s)(153GiB/60002msec); 0 zone resets
        slat (nsec): min=1735, max=13104k, avg=2479.76, stdev=3624.75
        clat (usec): min=17, max=31428, avg=188.19, stdev=229.37
         lat (usec): min=100, max=31430, avg=190.78, stdev=229.47
        clat percentiles (usec):
         |  1.00th=[  135],  5.00th=[  143], 10.00th=[  149], 20.00th=[  151],
         | 30.00th=[  151], 40.00th=[  151], 50.00th=[  151], 60.00th=[  153],
         | 70.00th=[  157], 80.00th=[  169], 90.00th=[  253], 95.00th=[  375],
         | 99.00th=[  725], 99.50th=[  906], 99.90th=[ 1369], 99.95th=[ 3294],
         | 99.99th=[ 4293]
       bw (  MiB/s): min= 1427, max= 3243, per=99.98%, avg=2613.49, stdev=188.57, samples=480
       iops        : min=365372, max=830288, avg=669052.77, stdev=48273.91, samples=480
      lat (usec)   : 20=0.01%, 100=0.01%, 250=89.78%, 500=7.60%, 750=1.70%
      lat (usec)   : 1000=0.58%
      lat (msec)   : 2=0.28%, 4=0.04%, 10=0.03%, 20=0.01%, 50=0.01%
      cpu          : usr=34.03%, sys=43.44%, ctx=6636228, majf=0, minf=23552
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,40151535,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=32

    Run status group 0 (all jobs):
      WRITE: bw=2614MiB/s (2741MB/s), 2614MiB/s-2614MiB/s (2741MB/s-2741MB/s), io=153GiB (164GB), run=60002-60002msec

    Disk stats (read/write):
      nvme0n1: ios=49/40095228, merge=0/0, ticks=3/7015916, in_queue=86932, util=99.93%
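
The IOPS and bandwidth figures in these reports describe the same measurement: bandwidth is IOPS multiplied by the block size. The short check below recomputes the bandwidth from the rounded IOPS values reported for the 4 KiB random write runs. The helper name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def bandwidth_mib_s(iops, block_size_bytes):
    """Convert an IOPS figure at a fixed block size into MiB/s."""
    return iops * block_size_bytes / MIB


# Rounded IOPS values taken from the fio summaries (bs=4k).
vm_bw = bandwidth_mib_s(629_000, 4 * KIB)
host_bw = bandwidth_mib_s(669_000, 4 * KIB)

print(f"VM:   {vm_bw:.0f} MiB/s")    # fio reported 2457 MiB/s
print(f"host: {host_bw:.0f} MiB/s")  # fio reported 2614 MiB/s
```

The small difference on the host side comes from using the rounded 669k IOPS value rather than the exact count.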

Random read IOPS

  • Test command:

    fio -direct=1 -iodepth=32 -rw=randread -ioengine=libaio -bs=4k -numjobs=4 -time_based=1 -runtime=60 -group_reporting -filename=/dev/nvme0n1 -name=test
    
  • VM result:

IOMMU VM, random read IOPS

    fio-3.16
    Starting 4 processes
    Jobs: 4 (f=4): [r(4)][100.0%][r=2824MiB/s][r=723k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=4): err= 0: pid=1163: Thu Nov 18 23:27:48 2021
      read: IOPS=700k, BW=2733MiB/s (2866MB/s)(160GiB/60001msec)
        slat (nsec): min=1845, max=1873.3k, avg=3211.29, stdev=2745.61
        clat (usec): min=21, max=3663, avg=178.53, stdev=59.84
         lat (usec): min=29, max=3667, avg=181.96, stdev=59.88
        clat percentiles (usec):
         |  1.00th=[   91],  5.00th=[  114], 10.00th=[  127], 20.00th=[  141],
         | 30.00th=[  151], 40.00th=[  159], 50.00th=[  165], 60.00th=[  174],
         | 70.00th=[  186], 80.00th=[  206], 90.00th=[  241], 95.00th=[  285],
         | 99.00th=[  412], 99.50th=[  482], 99.90th=[  611], 99.95th=[  676],
         | 99.99th=[  832]
       bw (  MiB/s): min= 2509, max= 2825, per=99.97%, avg=2732.07, stdev=21.69, samples=476
       iops        : min=642448, max=723388, avg=699410.37, stdev=5552.17, samples=476
      lat (usec)   : 50=0.01%, 100=2.16%, 250=89.18%, 500=8.24%, 750=0.40%
      lat (usec)   : 1000=0.02%
      lat (msec)   : 2=0.01%, 4=0.01%
      cpu          : usr=50.35%, sys=48.28%, ctx=329452, majf=0, minf=169
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
         issued rwts: total=41979562,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=32

    Run status group 0 (all jobs):
       READ: bw=2733MiB/s (2866MB/s), 2733MiB/s-2733MiB/s (2866MB/s-2866MB/s), io=160GiB (172GB), run=60001-60001msec

    Disk stats (read/write):
      nvme0n1: ios=41875509/0, merge=0/0, ticks=4605799/0, in_queue=0, util=99.94%

  • Physical host result:

Physical host, random read IOPS

    fio-3.16
    Starting 4 processes
    Jobs: 4 (f=4): [r(4)][100.0%][r=3358MiB/s][r=860k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=4): err= 0: pid=3165: Wed Nov 17 00:41:21 2021
      read: IOPS=765k, BW=2988MiB/s (3134MB/s)(175GiB/60001msec)
        slat (nsec): min=1731, max=517537, avg=2676.86, stdev=2086.72
        clat (usec): min=11, max=2254, avg=163.93, stdev=91.78
         lat (usec): min=13, max=2257, avg=166.72, stdev=91.78
        clat percentiles (usec):
         |  1.00th=[   65],  5.00th=[   79], 10.00th=[   88], 20.00th=[  102],
         | 30.00th=[  117], 40.00th=[  128], 50.00th=[  141], 60.00th=[  155],
         | 70.00th=[  174], 80.00th=[  204], 90.00th=[  265], 95.00th=[  334],
         | 99.00th=[  519], 99.50th=[  611], 99.90th=[  873], 99.95th=[  996],
         | 99.99th=[ 1270]
       bw (  MiB/s): min= 2590, max= 3393, per=99.88%, avg=2984.89, stdev=56.96, samples=476
       iops        : min=663154, max=868642, avg=764132.39, stdev=14582.55, samples=476
      lat (usec)   : 20=0.01%, 50=0.14%, 100=18.40%, 250=69.60%, 500=10.68%
      lat (usec)   : 750=0.97%, 1000=0.15%
      lat (msec)   : 2=0.05%, 4=0.01%
      cpu          : usr=40.12%, sys=46.57%, ctx=4938168, majf=0, minf=23686
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
         issued rwts: total=45902275,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=32

    Run status group 0 (all jobs):
       READ: bw=2988MiB/s (3134MB/s), 2988MiB/s-2988MiB/s (3134MB/s-3134MB/s), io=175GiB (188GB), run=60001-60001msec

    Disk stats (read/write):
      nvme0n1: ios=45797167/0, merge=0/0, ticks=6875887/0, in_queue=0, util=100.00%

Sequential write throughput

  • Test command:

    fio -direct=1 -iodepth=128 -rw=write -ioengine=libaio -bs=128k -numjobs=1 -time_based=1 -runtime=60 -group_reporting -filename=/dev/nvme0n1 -name=test
    
  • VM result:

IOMMU VM, sequential write throughput

    fio-3.16
    Starting 1 process
    Jobs: 1 (f=1): [W(1)][100.0%][w=1672MiB/s][w=13.4k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=1170: Thu Nov 18 23:30:39 2021
      write: IOPS=20.8k, BW=2605MiB/s (2732MB/s)(153GiB/60010msec); 0 zone resets
        slat (usec): min=4, max=630, avg= 7.12, stdev= 4.17
        clat (usec): min=1319, max=48194, avg=6132.70, stdev=2535.96
         lat (usec): min=1327, max=48201, avg=6140.03, stdev=2536.39
        clat percentiles (usec):
         |  1.00th=[ 4752],  5.00th=[ 4817], 10.00th=[ 4817], 20.00th=[ 4817],
         | 30.00th=[ 4817], 40.00th=[ 4817], 50.00th=[ 4883], 60.00th=[ 4883],
         | 70.00th=[ 4883], 80.00th=[ 8979], 90.00th=[ 9634], 95.00th=[10028],
         | 99.00th=[12518], 99.50th=[14484], 99.90th=[39584], 99.95th=[40633],
         | 99.99th=[45351]
       bw (  MiB/s): min= 1415, max= 3282, per=99.99%, avg=2605.19, stdev=796.73, samples=120
       iops        : min=11322, max=26256, avg=20841.52, stdev=6373.88, samples=120
      lat (msec)   : 2=0.01%, 4=0.01%, 10=94.45%, 20=5.38%, 50=0.15%
      cpu          : usr=13.17%, sys=17.81%, ctx=1237797, majf=0, minf=11
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
         issued rwts: total=0,1250832,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=128

    Run status group 0 (all jobs):
      WRITE: bw=2605MiB/s (2732MB/s), 2605MiB/s-2605MiB/s (2732MB/s-2732MB/s), io=153GiB (164GB), run=60010-60010msec

    Disk stats (read/write):
      nvme0n1: ios=49/1249053, merge=0/0, ticks=3/7649919, in_queue=6350160, util=99.91%

  • Physical host result:

Physical host, sequential write throughput

    fio-3.16
    Starting 1 process
    Jobs: 1 (f=1): [W(1)][100.0%][w=1703MiB/s][w=13.6k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=3236: Wed Nov 17 00:43:35 2021
      write: IOPS=20.0k, BW=2624MiB/s (2752MB/s)(154GiB/60011msec); 0 zone resets
        slat (usec): min=5, max=567, avg=11.20, stdev= 5.09
        clat (usec): min=1237, max=45813, avg=6084.24, stdev=2514.30
         lat (usec): min=1247, max=45827, avg=6095.64, stdev=2515.80
        clat percentiles (usec):
         |  1.00th=[ 4752],  5.00th=[ 4752], 10.00th=[ 4752], 20.00th=[ 4752],
         | 30.00th=[ 4752], 40.00th=[ 4752], 50.00th=[ 4752], 60.00th=[ 4752],
         | 70.00th=[ 4817], 80.00th=[ 8979], 90.00th=[ 9634], 95.00th=[10028],
         | 99.00th=[11600], 99.50th=[13435], 99.90th=[39584], 99.95th=[40633],
         | 99.99th=[43779]
       bw (  MiB/s): min= 1350, max= 3318, per=100.00%, avg=2624.36, stdev=810.62, samples=120
       iops        : min=10806, max=26544, avg=20994.88, stdev=6484.91, samples=120
      lat (msec)   : 2=0.01%, 4=0.01%, 10=94.74%, 20=5.10%, 50=0.16%
      cpu          : usr=17.24%, sys=21.52%, ctx=1245901, majf=0, minf=18
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
         issued rwts: total=0,1259822,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=128

    Run status group 0 (all jobs):
      WRITE: bw=2624MiB/s (2752MB/s), 2624MiB/s-2624MiB/s (2752MB/s-2752MB/s), io=154GiB (165GB), run=60011-60011msec

    Disk stats (read/write):
      nvme0n1: ios=49/1258174, merge=0/0, ticks=3/7648325, in_queue=6389084, util=99.89%

Sequential read throughput

  • Test command:

    fio -direct=1 -iodepth=128 -rw=read -ioengine=libaio -bs=128k -numjobs=1 -time_based=1 -runtime=60 -group_reporting -filename=/dev/nvme0n1 -name=test
    
  • VM result:

IOMMU VM, sequential read throughput

    test: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=128
    fio-3.16
    Starting 1 process
    Jobs: 1 (f=1): [R(1)][100.0%][r=3399MiB/s][r=27.2k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=1175: Thu Nov 18 23:32:49 2021
      read: IOPS=27.1k, BW=3394MiB/s (3559MB/s)(199GiB/60004msec)
        slat (usec): min=4, max=537, avg= 5.67, stdev= 3.28
        clat (usec): min=1089, max=12459, avg=4707.56, stdev=1147.66
         lat (usec): min=1098, max=12630, avg=4713.40, stdev=1147.66
        clat percentiles (usec):
         |  1.00th=[ 3818],  5.00th=[ 4015], 10.00th=[ 4047], 20.00th=[ 4113],
         | 30.00th=[ 4146], 40.00th=[ 4178], 50.00th=[ 4228], 60.00th=[ 4293],
         | 70.00th=[ 4621], 80.00th=[ 4948], 90.00th=[ 6128], 95.00th=[ 7832],
         | 99.00th=[ 8979], 99.50th=[ 9241], 99.90th=[ 9503], 99.95th=[ 9634],
         | 99.99th=[ 9634]
       bw (  MiB/s): min= 3330, max= 3406, per=99.99%, avg=3393.32, stdev= 7.40, samples=120
       iops        : min=26642, max=27248, avg=27146.56, stdev=59.22, samples=120
      lat (msec)   : 2=0.01%, 4=4.82%, 10=95.17%, 20=0.01%
      cpu          : usr=8.33%, sys=23.95%, ctx=1272560, majf=0, minf=4106
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
         issued rwts: total=1629113,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=128

    Run status group 0 (all jobs):
       READ: bw=3394MiB/s (3559MB/s), 3394MiB/s-3394MiB/s (3559MB/s-3559MB/s), io=199GiB (214GB), run=60004-60004msec

    Disk stats (read/write):
      nvme0n1: ios=1626219/0, merge=0/0, ticks=7648322/0, in_queue=6432852, util=99.93%

  • Physical host result:

Physical host, sequential read throughput

    fio-3.16
    Starting 1 process
    Jobs: 1 (f=1): [R(1)][100.0%][r=3444MiB/s][r=27.5k IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=3314: Wed Nov 17 00:45:32 2021
      read: IOPS=27.5k, BW=3437MiB/s (3604MB/s)(201GiB/60004msec)
        slat (usec): min=4, max=556, avg= 8.88, stdev= 4.45
        clat (usec): min=1348, max=12643, avg=4644.51, stdev=1082.97
         lat (usec): min=1371, max=12656, avg=4653.56, stdev=1082.95
        clat percentiles (usec):
         |  1.00th=[ 3785],  5.00th=[ 3949], 10.00th=[ 3982], 20.00th=[ 4047],
         | 30.00th=[ 4113], 40.00th=[ 4146], 50.00th=[ 4228], 60.00th=[ 4293],
         | 70.00th=[ 4555], 80.00th=[ 4883], 90.00th=[ 5997], 95.00th=[ 7570],
         | 99.00th=[ 8717], 99.50th=[ 8979], 99.90th=[ 9241], 99.95th=[ 9372],
         | 99.99th=[ 9503]
       bw (  MiB/s): min= 3313, max= 3451, per=100.00%, avg=3437.09, stdev=13.50, samples=120
       iops        : min=26504, max=27614, avg=27496.70, stdev=107.98, samples=120
      lat (msec)   : 2=0.01%, 4=11.45%, 10=88.54%, 20=0.01%
      cpu          : usr=12.83%, sys=28.07%, ctx=1178237, majf=0, minf=4112
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
         issued rwts: total=1649929,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=128

    Run status group 0 (all jobs):
       READ: bw=3437MiB/s (3604MB/s), 3437MiB/s-3437MiB/s (3604MB/s-3604MB/s), io=201GiB (216GB), run=60004-60004msec

    Disk stats (read/write):
      nvme0n1: ios=1647401/0, merge=0/0, ticks=7644604/0, in_queue=6008424, util=99.87%
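
The headline numbers in these reports were read off by hand. When repeating the runs, fio can emit machine-readable results with `--output-format=json`, and a small script can pull out IOPS and bandwidth. The sketch below assumes the JSON layout of fio 3.x as I understand it (a top-level `jobs` list whose entries carry `read` and `write` objects with `io_bytes`, `iops`, and `bw` in KiB/s). Verify the field names against your own fio output before relying on it.

```python
import json


def headline(fio_json_text):
    """Extract (jobname, direction, IOPS, MiB/s) rows from fio JSON output."""
    data = json.loads(fio_json_text)
    rows = []
    for job in data["jobs"]:
        for direction in ("read", "write"):
            stats = job[direction]
            if stats["io_bytes"] == 0:
                # This direction did no I/O in the job; skip it.
                continue
            # Assumption: "bw" is reported in KiB/s.
            rows.append((job["jobname"], direction, stats["iops"], stats["bw"] / 1024))
    return rows


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not real fio output.
    sample = json.dumps({"jobs": [{
        "jobname": "test",
        "read": {"io_bytes": 0, "iops": 0.0, "bw": 0},
        "write": {"io_bytes": 4096000, "iops": 1000.0, "bw": 4000},
    }]})
    for name, direction, iops, mib_s in headline(sample):
        print(f"{name} {direction}: {iops:.0f} IOPS, {mib_s:.1f} MiB/s")
```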

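Results

Note

In my first test of the Open Virtual Machine Firmware (OVMF) VM I used a 1c2g configuration (1 vCPU, 2 GB of memory). The random read/write tests above run 4 jobs, and I observed that they fully load 4 CPU cores. With only 1 CPU, the 1c2g VM could not keep up with 4 concurrent I/O processes: its measured read/write performance was less than 1/4 of the physical host's.

For the second test I gave the VM 4 vCPUs. As expected, once the 4 jobs could saturate 4 vCPUs, the VM's storage performance came close to that of the physical host.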

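IOMMU VM vs. physical host NVMe performance

Test                    IOMMU VM   Physical host  VM/Physical  PM9A1 rated  Physical/Rated  VM/Rated
----------------------  ---------  -------------  -----------  -----------  --------------  --------
4K random write IOPS    629k       669k           94%          850k         78.7%           74%
4K random read IOPS     700k       765k           91.5%        1000k        76.5%           70%
Sequential write        2732MB/s   2752MB/s       99.3%        5100MB/s     54%             53.6%
Sequential read         3559MB/s   3604MB/s       98.8%        7000MB/s     51.5%           50.8%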
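
The percentage columns follow directly from the measured and rated values. The sketch below recomputes them from the fio summaries and the PM9A1 rated figures quoted in the comparison. The dictionary layout and helper name are made up for illustration.

```python
# (IOMMU VM, physical host, PM9A1 rated); IOPS in thousands, throughput in MB/s.
RESULTS = {
    "4K random write IOPS": (629, 669, 850),
    "4K random read IOPS": (700, 765, 1000),
    "Sequential write": (2732, 2752, 5100),
    "Sequential read": (3559, 3604, 7000),
}


def pct(value, reference):
    """Return value as a percentage of reference, rounded to one decimal."""
    return round(100 * value / reference, 1)


for test, (vm, host, rated) in RESULTS.items():
    print(f"{test:22s} VM/host={pct(vm, host)}%  "
          f"host/rated={pct(host, rated)}%  VM/rated={pct(vm, rated)}%")
```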

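  • Passing the NVMe drive through to the VM with IOMMU, combined with an Open Virtual Machine Firmware (OVMF) (UEFI) VM and IOMMU tuning (CPU pinning), comes close to accessing the NVMe drive directly on the physical host: 4K random read/write reaches about 92% to 94% of host performance, and sequential read/write about 99%.

  • My second-hand HPE ProLiant DL360 Gen9 server only provides PCIe 3.0, and a PCIe 3.0 x4 link supports at most roughly 3500 MB/s. Judging from my fio storage benchmarks:

    • On the physical host, NVMe performance is limited by the PCIe 3.0 interface: sequential read/write reaches only about 50% of the drive's rated figure (already close to the ideal for this link), and random read/write about 77%.

    • Virtualization costs little storage performance, so the VM also reaches about 50% of the rated sequential figure and about 72% of the rated random figure.

    • For my simulated test environment, IOMMU-virtualized storage should be able to meet the needs of a large-scale cloud computing deployment.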
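
The PCIe ceiling mentioned above can be estimated from the link parameters: PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding. The sketch below computes the resulting raw payload bandwidth for an x4 link and compares it with the measured sequential read on the physical host. Packet and protocol overhead are not modeled, which is one reason the usable rate is lower than this figure.

```python
def pcie_bandwidth_mb_s(gt_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Raw link bandwidth in MB/s after line encoding, before protocol overhead."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6


pcie3_x4 = pcie_bandwidth_mb_s(8, 4)   # about 3938 MB/s
measured_seq_read = 3604               # MB/s, physical host fio result

print(f"PCIe 3.0 x4 raw bandwidth: {pcie3_x4:.0f} MB/s")
print(f"measured sequential read:  {100 * measured_seq_read / pcie3_x4:.1f}% of raw link")
```

By this estimate the host's sequential read already uses more than 90% of the raw link bandwidth, which matches the observation that the result is close to the ideal for PCIe 3.0 x4.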

References