西部数据Passport SSD移动硬盘

为了解决 Raspberry Pi 存储 性能低下的短板,我尝试采用高速TF卡,但是发现性能依然无法和主流的Intel主机所配置的SSD磁盘性能相提并论。目前,可行的方式是通过树莓派USB 3.0接口连接SSD移动硬盘来提高存储性能。

我选择的是 西部数据my passport ssd固态移动硬盘,原因:

  • 颜值高,非常小巧适合配合树莓派设备

  • 虽然不是NvME固态硬盘,但是树莓派只有USB 3.0接口,最高只支持500MB/s,购买NvME的固态移动硬盘无法发挥性能。所以,这款SSD移动硬盘配合树莓派较好

  • 价格相对较低

备注

实际全盘连续写入测试,最高写入速度超过410MB/s,但是持续写入性能有波动,平均得到的性能大约260MB/s。我在考虑后续是不是可以采用NvME的固态硬盘,看看能否达到稳定的写入性能。

备注

第三次购买的Passport SSD,我遇到一个非常奇怪的 Linux SSD分区对齐 问题

文件系统格式化

为初步测试性能,并对比TF卡性能,采用先划分 128G 磁盘空间:

# parted -a optimal /dev/sda
GNU Parted 3.3
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: WD My Passport 25F3 (scsi)
Disk /dev/sda: 1024GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name         Flags
 1      1049kB  1024GB  1024GB               My Passport  msftdata

 (parted) rm 1
 (parted) print
 Model: WD My Passport 25F3 (scsi)
 Disk /dev/sda: 1024GB
 Sector size (logical/physical): 512B/4096B
 Partition Table: gpt
 Disk Flags:

 Number  Start  End  Size  File system  Name  Flags

 (parted) mkpart primary ext4 0 128G
 Warning: The resulting partition is not properly aligned for best performance:
 34s % 2048s != 0s
 Ignore/Cancel? c
 (parted) mkpart primary ext4 2048s 128G
 (parted) align-check optimal 1
 1 aligned
 (parted) quit
 Information: You may need to update /etc/fstab.
  • 文件系统格式化:

    mkfs.ext4 /dev/sda1
    
  • 挂载磁盘:

    mount /dev/sda1 /mnt
    

性能测试

随机写IOPS

  • 随机写IOPS:

    fio -direct=1 -iodepth=32 -rw=randwrite -ioengine=libaio -bs=4k \
    -numjobs=4 -time_based=1 -runtime=1000 -group_reporting \
    -filename=fio.img -size=1g -name=test_fio
    

从测试时性能数据来看,SSD移动硬盘的写入IOPS确实非常高,能够达到 3w IOPS,并且带宽达到 120+MB/s

测试是性能显示:

test_fio: Laying out IO file (1 file / 1024MiB)
Jobs: 4 (f=4): [w(4)][100.0%][w=120MiB/s][w=30.7k IOPS][eta 00m:00s]
test_fio: (groupid=0, jobs=4): err= 0: pid=2438: Wed Sep 23 23:05:51 2020
  write: IOPS=30.2k, BW=118MiB/s (124MB/s)(115GiB/1000001msec); 0 zone resets
    slat (usec): min=19, max=24485, avg=120.78, stdev=358.41
    clat (usec): min=26, max=95528, avg=4116.96, stdev=2997.23
     lat (usec): min=128, max=95733, avg=4238.47, stdev=3067.58
    clat percentiles (usec):
     |  1.00th=[ 1729],  5.00th=[ 1778], 10.00th=[ 2540], 20.00th=[ 2606],
     | 30.00th=[ 2638], 40.00th=[ 2999], 50.00th=[ 3392], 60.00th=[ 3785],
     | 70.00th=[ 4080], 80.00th=[ 4424], 90.00th=[ 5932], 95.00th=[10683],
     | 99.00th=[18482], 99.50th=[20841], 99.90th=[27395], 99.95th=[29492],
     | 99.99th=[36439]
   bw (  KiB/s): min=29136, max=165440, per=99.97%, avg=120660.79, stdev=9165.37, samples=7996
   iops        : min= 7284, max=41360, avg=30165.04, stdev=2291.35, samples=7996
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=7.22%, 4=60.33%, 10=26.85%, 20=4.93%, 50=0.66%
  lat (msec)   : 100=0.01%
  cpu          : usr=5.79%, sys=79.52%, ctx=8968401, majf=0, minf=84
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,30172855,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
  WRITE: bw=118MiB/s (124MB/s), 118MiB/s-118MiB/s (124MB/s-124MB/s), io=115GiB (124GB), run=1000001-1000001msec
Disk stats (read/write):
  sda: ios=0/30170065, merge=0/8989, ticks=0/6428786, in_queue=35324, util=100.00%

测试时top显示:

top - 22:50:28 up 7 min,  2 users,  load average: 3.89, 2.38, 1.00
Tasks: 636 total,   6 running, 630 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.0 us, 29.5 sy,  0.0 ni,  5.6 id,  0.0 wa,  0.0 hi, 62.0 si,  0.0 st
%Cpu1  :  6.2 us, 89.3 sy,  0.0 ni,  4.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  5.9 us, 86.5 sy,  0.0 ni,  7.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  6.5 us, 87.3 sy,  0.0 ni,  6.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7811.3 total,   6976.8 free,    200.1 used,    634.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   7094.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2438 root      20   0  790132   4912    864 R  95.1   0.1   1:03.47 fio
   2441 root      20   0  790144   4932    880 R  92.5   0.1   1:02.84 fio
   2439 root      20   0  790136   4892    844 R  84.6   0.1   1:01.11 fio
   2440 root      20   0  790140   4928    876 R  75.5   0.1   0:58.63 fio
      9 root      20   0       0      0      0 R   5.6   0.0   0:04.06 ksoftirqd/0
      6 root       0 -20       0      0      0 I   2.0   0.0   0:03.66 kworker/0:0H-kblockd
   1894 root      20   0   11228   3676   2588 R   2.0   0.0   0:05.09 top
   2436 root      20   0  790140 428312 424288 S   1.6   5.4   0:02.75 fio
     10 root      20   0       0      0      0 I   0.3   0.0   0:00.21 rcu_preempt
   2156 root      20   0       0      0      0 I   0.3   0.0   0:00.73 kworker/0:12-events

备注

测试时注意到 cpu0 的软中断极高,达到 62% ,说明存在瓶颈。而测试时,几乎没有 iowait ,显示SSD存储性能有余量未达到最高性能,树莓派的CPU瓶颈导致未能充分发挥SSD存储性能。

备注

在测试随机写IOPS时,我发现树莓派(2G版)突然重启,所以参考 排查系统crash 排查Raspberry Pi 4存储测试fio出现crash 。详见排查文档。

不过,最近购买的8G版本,并且升级内核之后,该项测试顺利通过。

  • 对比测试SanDisk的128 TF卡(高速卡,官方参数达到90MB/s写入),相同检测命令,获得4k写入性能: 2.7MB/s,659IOPS:

    test_fio: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
    ...
    fio-3.16
    Starting 4 processes
    Jobs: 4 (f=4): [w(4)][6.4%][w=124KiB/s][w=31 IOPS][eta 24m:54s]
    test_fio: (groupid=0, jobs=4): err= 0: pid=2561: Wed Sep 23 23:18:33 2020
      write: IOPS=659, BW=2638KiB/s (2702kB/s)(261MiB/101377msec); 0 zone resets
        slat (usec): min=28, max=1584.7k, avg=748.82, stdev=16731.37
        clat (msec): min=2, max=5550, avg=193.21, stdev=276.96
         lat (msec): min=2, max=5550, avg=193.96, stdev=278.90
        clat percentiles (msec):
         |  1.00th=[    6],  5.00th=[   18], 10.00th=[   32], 20.00th=[   60],
         | 30.00th=[   88], 40.00th=[  118], 50.00th=[  148], 60.00th=[  178],
         | 70.00th=[  207], 80.00th=[  241], 90.00th=[  279], 95.00th=[  468],
         | 99.00th=[ 1720], 99.50th=[ 2165], 99.90th=[ 2970], 99.95th=[ 3272],
         | 99.99th=[ 4463]
       bw (  KiB/s): min=   32, max= 4162, per=100.00%, avg=2696.47, stdev=315.15, samples=792
       iops        : min=    8, max= 1040, avg=674.02, stdev=78.80, samples=792
      lat (msec)   : 4=0.36%, 10=1.97%, 20=3.57%, 50=10.65%, 100=17.58%
      lat (msec)   : 250=48.96%, 500=12.25%, 750=2.03%, 1000=0.72%, 2000=1.18%
      lat (msec)   : >=2000=0.74%
      cpu          : usr=0.34%, sys=1.64%, ctx=65062, majf=0, minf=83
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.8%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,66869,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=32
    
    Run status group 0 (all jobs):
      WRITE: bw=2638KiB/s (2702kB/s), 2638KiB/s-2638KiB/s (2702kB/s-2702kB/s), io=261MiB (274MB), run=101377-101377msec
    

备注

4k写入性能: SSD存储随机写4k性能是TF卡的 45.8 倍(IOPS),接近46倍的差距。

随机读IOPS

顺序写吞吐量(写带宽)

  • 测试命令:

    fio -direct=1 -iodepth=128 -rw=write -ioengine=libaio \
    -bs=128k -numjobs=4 -time_based=1 -runtime=1000 \
    -group_reporting -filename=/mnt/fio.img -name=test
    

顺序写入性能达到 319MB/s ,2550 IOPS ,比较稳定。另外,测试发现,并发 --jobs 是1还是4,实际获得的总带宽基本相同。不过,并发4个jobs,则系统load较高(load>=4),所以负载还是比单个jobs要大很多,总体来看这块SSD的顺序读写能力稳定。

top显示:

top - 08:39:01 up  8:33,  4 users,  load average: 4.13, 3.13, 1.51
Tasks: 151 total,   1 running, 150 sleeping,   0 stopped,   0 zombie
%Cpu0  :  4.6 us, 18.1 sy,  0.0 ni, 66.0 id, 10.6 wa,  0.0 hi,  0.7 si,  0.0 st
%Cpu1  :  4.9 us,  9.8 sy,  0.0 ni, 70.9 id, 14.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  3.3 us, 10.5 sy,  0.0 ni, 74.0 id, 12.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  2.6 us, 10.5 sy,  0.0 ni, 75.0 id, 11.8 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1848.2 total,    759.8 free,    263.3 used,    825.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1148.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   3196 root      20   0  806544  21300    888 D  12.6   1.1   0:49.07 fio
   3197 root      20   0  806548  21312    900 D  12.3   1.1   0:49.08 fio
   3194 root      20   0  806536  21308    888 D  11.9   1.1   0:49.01 fio
   3195 root      20   0  806540  21308    888 D  11.9   1.1   0:49.00 fio
   3208 root       0 -20       0      0      0 I   8.9   0.0   0:06.89 kworker/0:0H-kblockd
   3192 root      20   0  790140 428580 424560 S   1.3  22.6   0:06.18 fio
   3162 root      20   0   10684   3008   2592 R   0.7   0.2   0:04.11 top

顺序写入没有出现异常重启现象。

测试结果显示写入带宽达到 320MB/s , 2560 IOPS:

test_serial_write: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=128
...
fio-3.16
Starting 4 processes
Jobs: 4 (f=4): [W(4)][100.0%][w=320MiB/s][w=2560 IOPS][eta 00m:00s]
test_serial_write: (groupid=0, jobs=4): err= 0: pid=3194: Mon Sep 21 08:48:56 2020
  write: IOPS=2531, BW=316MiB/s (332MB/s)(309GiB/1000091msec); 0 zone resets
    slat (usec): min=43, max=97299, avg=1562.52, stdev=3709.91
    clat (msec): min=53, max=619, avg=200.67, stdev=20.54
     lat (msec): min=54, max=628, avg=202.24, stdev=20.57
    clat percentiles (msec):
     |  1.00th=[  146],  5.00th=[  163], 10.00th=[  184], 20.00th=[  192],
     | 30.00th=[  194], 40.00th=[  197], 50.00th=[  201], 60.00th=[  203],
     | 70.00th=[  205], 80.00th=[  209], 90.00th=[  218], 95.00th=[  241],
     | 99.00th=[  271], 99.50th=[  279], 99.90th=[  296], 99.95th=[  309],
     | 99.99th=[  334]
   bw (  KiB/s): min=244992, max=384574, per=99.97%, avg=323936.04, stdev=2701.86, samples=8000
   iops        : min= 1914, max= 3004, avg=2530.50, stdev=21.11, samples=8000
  lat (msec)   : 100=0.04%, 250=96.58%, 500=3.37%, 750=0.01%
  cpu          : usr=3.10%, sys=9.01%, ctx=572236, majf=0, minf=87
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,2531704,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=316MiB/s (332MB/s), 316MiB/s-316MiB/s (332MB/s-332MB/s), io=309GiB (332GB), run=1000091-1000091msec

Disk stats (read/write):
  sda: ios=0/649953, merge=0/1877425, ticks=0/54335318, in_queue=53030424, util=100.00%

顺序读吞吐量(读带宽)

  • 顺序读吞吐量(读带宽):

    fio -direct=1 -iodepth=128 -rw=read -ioengine=libaio \
    -bs=128k -numjobs=1 -time_based=1 -runtime=1000 \
    -group_reporting -filename=/mnt/fio.img -name=test_serial_read
    

测试结果显示顺序读带宽 379MB/s, 3032 IOPS,相对顺序写快20%:

test_serial_read: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=128
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=379MiB/s][r=3032 IOPS][eta 00m:00s]
test_serial_read: (groupid=0, jobs=1): err= 0: pid=3749: Mon Sep 21 13:23:25 2020
  read: IOPS=3026, BW=378MiB/s (397MB/s)(369GiB/1000042msec)
    slat (usec): min=24, max=860, avg=53.87, stdev= 8.65
    clat (msec): min=7, max=519, avg=42.23, stdev=29.75
     lat (msec): min=7, max=519, avg=42.29, stdev=29.75
    clat percentiles (msec):
     |  1.00th=[   12],  5.00th=[   19], 10.00th=[   26], 20.00th=[   35],
     | 30.00th=[   42], 40.00th=[   43], 50.00th=[   43], 60.00th=[   43],
     | 70.00th=[   43], 80.00th=[   43], 90.00th=[   43], 95.00th=[   43],
     | 99.00th=[  218], 99.50th=[  296], 99.90th=[  342], 99.95th=[  359],
     | 99.99th=[  363]
   bw (  KiB/s): min=286720, max=388864, per=99.98%, avg=387312.22, stdev=2570.62, samples=2000
   iops        : min= 2240, max= 3038, avg=3025.77, stdev=20.10, samples=2000
  lat (msec)   : 10=0.37%, 20=5.87%, 50=90.67%, 100=0.97%, 250=1.37%
  lat (msec)   : 500=0.76%, 750=0.01%
  cpu          : usr=3.19%, sys=19.09%, ctx=759378, majf=0, minf=4122
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=3026664,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=378MiB/s (397MB/s), 378MiB/s-378MiB/s (397MB/s-397MB/s), io=369GiB (397GB), run=1000042-1000042msec

Disk stats (read/write):
  sda: ios=756532/3, merge=2269603/1, ticks=31953025/280, in_queue=30273944, util=100.00%

参考