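S.M.A.R.T. Monitoring for Storage Devices¶
My second-hand HPE ProLiant DL360 Gen9 server uses an Intel SATA SSD that I bought a long time ago, and from time to time this SSD leaves alarming error records in the system log: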
[Sun Aug 6 11:05:54 2023] ata5.00: exception Emask 0x0 SAct 0x80080000 SErr 0x0 action 0x6 frozen
[Sun Aug 6 11:05:54 2023] ata5.00: failed command: READ FPDMA QUEUED
[Sun Aug 6 11:05:54 2023] ata5.00: cmd 60/08:98:98:20:9c/00:00:02:00:00/40 tag 19 ncq dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[Sun Aug 6 11:05:54 2023] ata5.00: status: { DRDY }
[Sun Aug 6 11:05:54 2023] ata5.00: failed command: READ FPDMA QUEUED
[Sun Aug 6 11:05:54 2023] ata5.00: cmd 60/08:f8:e8:e4:8c/00:00:00:00:00/40 tag 31 ncq dma 4096 in
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[Sun Aug 6 11:05:54 2023] ata5.00: status: { DRDY }
[Sun Aug 6 11:05:54 2023] ata5: hard resetting link
[Sun Aug 6 11:05:54 2023] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Sun Aug 6 11:05:54 2023] ata5.00: configured for UDMA/133
[Sun Aug 6 11:05:54 2023] sd 4:0:0:0: [sdb] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=30s
[Sun Aug 6 11:05:54 2023] sd 4:0:0:0: [sdb] tag#31 Sense Key : Illegal Request [current]
[Sun Aug 6 11:05:54 2023] sd 4:0:0:0: [sdb] tag#31 Add. Sense: Unaligned write command
[Sun Aug 6 11:05:54 2023] sd 4:0:0:0: [sdb] tag#31 CDB: Read(10) 28 00 00 8c e4 e8 00 00 08 00
[Sun Aug 6 11:05:54 2023] blk_update_request: I/O error, dev sdb, sector 9233640 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Sun Aug 6 11:05:54 2023] ata5: EH complete
[Sun Aug 6 11:05:54 2023] ata5.00: Enabling discard_zeroes_data
[Sun Aug 6 11:06:24 2023] ata5.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
[Sun Aug 6 11:06:24 2023] ata5.00: failed command: READ FPDMA QUEUED
[Sun Aug 6 11:06:24 2023] ata5.00: cmd 60/08:c0:70:1f:ce/00:00:00:00:00/40 tag 24 ncq dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[Sun Aug 6 11:06:24 2023] ata5.00: status: { DRDY }
[Sun Aug 6 11:06:24 2023] ata5: hard resetting link
[Sun Aug 6 11:06:24 2023] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Sun Aug 6 11:06:24 2023] ata5.00: configured for UDMA/133
[Sun Aug 6 11:06:24 2023] ata5.00: device reported invalid CHS sector 0
[Sun Aug 6 11:06:24 2023] sd 4:0:0:0: [sdb] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=30s
[Sun Aug 6 11:06:24 2023] sd 4:0:0:0: [sdb] tag#24 Sense Key : Illegal Request [current]
[Sun Aug 6 11:06:24 2023] sd 4:0:0:0: [sdb] tag#24 Add. Sense: Unaligned write command
[Sun Aug 6 11:06:24 2023] sd 4:0:0:0: [sdb] tag#24 CDB: Read(10) 28 00 00 ce 1f 70 00 00 08 00
[Sun Aug 6 11:06:24 2023] blk_update_request: I/O error, dev sdb, sector 13508464 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Sun Aug 6 11:06:24 2023] ata5: EH complete
[Sun Aug 6 11:06:24 2023] ata5.00: Enabling discard_zeroes_data
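To see how often these events occur, the kernel log can be summarized per ATA port. The following is a minimal sketch, not a complete log parser: the function name summarize_ata_errors is hypothetical, and it only counts the two line types visible in the log above ("failed command" and "hard resetting link").

```shell
# Summarize ATA error-handling events per port from kernel log text on stdin.
# Assumption: lines look like "[timestamp] ata5.00: ..." or "ata5: ...".
summarize_ata_errors() {
    awk '
        match($0, /(^| )ata[0-9]+(\.[0-9]+)?: /) {
            port = substr($0, RSTART, RLENGTH)
            sub(/^ /, "", port)         # drop the leading space, if any
            sub(/[.:].*$/, "", port)    # "ata5.00: " -> "ata5"
            seen[port] = 1
            if ($0 ~ /hard resetting link/) resets[port]++
            if ($0 ~ /failed command:/)     failed[port]++
        }
        END {
            # Iteration order over ports is unspecified in awk.
            for (p in seen)
                printf "%s resets=%d failed_commands=%d\n", p, resets[p], failed[p]
        }
    '
}
```

A possible invocation is: sudo dmesg -T | summarize_ata_errors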
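Note
I suspect the firmware of this Intel 545s Series SSD may be at fault. According to Latest Firmware For Solidigm™ (Formerly Intel®) Solid State Drives, the latest firmware for the Intel 545s Series is 004C (for the 512GB model) and 0B3C (for the 1TB model), while smartctl reports LHF002C on my drive. I plan to upgrade the firmware to try to fix this link-reset problem.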
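I want to use S.M.A.R.T. to detect and monitor disk anomalies, in two ways:
The smartctl command-line checks covered in this article (the basic capability)
The Node Exporter smartctl textfile plugin, so that the data can be observed visually through my self-hosted Prometheus + Grafana monitoring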
Installing smartmontools¶
On Ubuntu Linux, install it with the APT package manager:
sudo apt install smartmontools
SMART info¶
Check whether the disk supports SMART and has it enabled:
sudo smartctl -i /dev/sda
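The SMART information for my SanDisk CloudSpeed Eco Gen. II SATA enterprise SSD (/dev/sda) is shown first, followed by the same output for the Intel 545s (/dev/sdb):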
=== START OF INFORMATION SECTION ===
Model Family: Sandisk SATA Cloudspeed Max and GEN2 ESS SSDs
Device Model: SDLF1CRR-019T-1HA1
Serial Number: A007C9D9
LU WWN Device Id: 5 001173 100a88424
Firmware Version: ZR11RPA1
User Capacity: 1,920,383,410,176 bytes [1.92 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Aug 23 11:43:03 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF INFORMATION SECTION ===
Model Family: Intel 545s Series SSDs
Device Model: INTEL SSDSC2KW512G8
Serial Number: BTLA7513037S512DGN
LU WWN Device Id: 5 5cd2e4 14eea7536
Firmware Version: LHF002C
User Capacity: 512,110,190,592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Aug 23 11:42:31 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
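The two "SMART support is:" lines are what matters in this output. As a rough sketch (the helper name smart_is_enabled is hypothetical), the check can be scripted by reading the smartctl -i output from stdin:

```shell
# Succeed (exit status 0) only if `smartctl -i` output on stdin reports
# SMART as both available and enabled.
smart_is_enabled() {
    awk '
        /^SMART support is:[ \t]+Available/ { available = 1 }
        /^SMART support is:[ \t]+Enabled/   { enabled = 1 }
        END { exit !(available && enabled) }
    '
}
```

A possible invocation is: sudo smartctl -i /dev/sda | smart_is_enabled && echo "SMART is enabled"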
SMART test¶
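SMART provides two different test modes:
Background Mode: the test runs at low priority, so the drive keeps processing regular commands. If the drive is busy, the test is paused and later continues at a lower load, so normal operation is not interrupted.
Foreground Mode: during the test the drive answers every command with a CHECK CONDITION status, so this mode should only be used on drives that are not in use.
As a rule of thumb, background mode is recommended.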
Tests common to ATA and SCSI¶
Short Test¶
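The short test is meant to identify a defective drive quickly, so its maximum duration is about 2 minutes. It checks the disk in three stages:
Electrical Properties: the controller tests its own electronics. Because this test is manufacturer-specific, it is not possible to say exactly what is tested; examples include the internal RAM, the read/write circuitry, and the head electronics.
Mechanical Properties: the servo system and positioning mechanism are tested; the exact sequence also differs between manufacturers.
Read/Verify: an area of the disk is read and certain data is verified; the size and location of that area are manufacturer-specific as well.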
Long Test¶
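The long test was designed as the final test in production. It is the same as the short test, with two differences:
It has no time limit.
Its Read/Verify stage covers the entire disk rather than only a small part.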
ATA-specific tests¶
Conveyance Tests¶
The conveyance test can determine within a few minutes whether a drive was damaged in transport.
Select Tests¶
A select test takes an LBA range and scans only that region:
sudo smartctl -t select,10-20 /dev/sdc #LBA 10 to LBA 20 (incl.)
sudo smartctl -t select,10+11 /dev/sdc #LBA 10 to LBA 20 (incl.)
Multiple ranges (up to 5) can also be scanned in one run:
sudo smartctl -t select,0-10 -t select,5-15 -t select,10-20 /dev/sdc
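The two span notations used above are equivalent: START+SIZE covers SIZE blocks beginning at START, so its last LBA is START + SIZE - 1. A small sketch of that arithmetic (span_to_range is a hypothetical helper, not part of smartmontools):

```shell
# Convert a "START+SIZE" selective-test span into the equivalent
# "START-END" form, where END = START + SIZE - 1 (both ends inclusive).
span_to_range() {
    local span=$1
    local start=${span%%+*}   # text before the "+"
    local size=${span#*+}     # text after the "+"
    echo "${start}-$(( start + size - 1 ))"
}
```

With this helper, span_to_range 10+11 prints 10-20, matching the comment on the second select command.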
Testing with smartctl¶
Checking the SMART capabilities of a storage device¶
Before testing, you can check the estimated duration of each test:
sudo smartctl -c /dev/sda
The estimated test times for /dev/sda (the SanDisk CloudSpeed Eco Gen. II SATA enterprise SSD):
=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20160) seconds.
Offline data collection
capabilities: (0x5d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 1) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
My other disk, /dev/sdb (Intel 545s series):
=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
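The only figures needed from this long capability listing are the two recommended polling times. The sketch below pulls them out of smartctl -c output read from stdin; the function name polling_minutes is hypothetical, and it assumes the two-line layout shown above (a "... self-test routine" line followed by a "recommended polling time:" line).

```shell
# Print "short=<minutes> extended=<minutes>" from `smartctl -c` output on stdin.
polling_minutes() {
    awk '
        /^Short self-test routine/    { kind = "short" }
        /^Extended self-test routine/ { kind = "extended" }
        /recommended polling time:/ && kind != "" {
            line = $0
            gsub(/[^0-9]/, "", line)   # keep only the digits, "( 30) minutes." -> "30"
            minutes[kind] = line
            kind = ""
        }
        END { printf "short=%s extended=%s\n", minutes["short"], minutes["extended"] }
    '
}
```

A possible invocation is: sudo smartctl -c /dev/sdb | polling_minutes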
Tests¶
/dev/sda¶
Run a long test (-C selects captive, i.e. foreground, mode):
sudo smartctl -t long -C /dev/sda
Long test output
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-78-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in captive mode".
Drive command "Execute SMART Extended self-test routine immediately in captive mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Wed Aug 23 15:05:27 2023 CST
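This SanDisk CloudSpeed Eco Gen. II SATA enterprise SSD claims to need only 1 minute for the long test. (Seriously? That is the same time as the short test; I wonder whether the test is real.)
View the test results (-a option):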
sudo smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Sandisk SATA Cloudspeed Max and GEN2 ESS SSDs
Device Model: SDLF1CRR-019T-1HA1
Serial Number: A007C9D9
LU WWN Device Id: 5 001173 100a88424
Firmware Version: ZR11RPA1
User Capacity: 1,920,383,410,176 bytes [1.92 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Aug 23 15:13:40 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20160) seconds.
Offline data collection
capabilities: (0x5d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 1) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 046 046 000 Old_age Always - 47763 (2 27 0)
13 Lifetime_UECC_Ct 0x0012 100 100 001 Old_age Always - 0
32 Lifetime_Write_AmpFctr 0x0002 100 100 000 Old_age Always - 0
33 Write_AmpFctr 0x0002 100 100 000 Old_age Always - 100
170 Reserve_Erase_BlkCt 0x0032 100 100 000 Old_age Always - 18218
171 Program_Fail_Ct 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Ct 0x0032 100 100 000 Old_age Always - 0
175 Lifetime_Die_Failure_Ct 0x0032 100 100 000 Old_age Always - 0
178 SSD_LifeLeft(0.01%) 0x0012 100 100 000 Old_age Always - 9126
183 LT_Link_Rate_DwnGrd_Ct 0x0032 100 100 000 Old_age Always - 0
191 Clean_Shutdown_Ct 0x0032 100 100 000 Old_age Always - 46
192 Unclean_Shutdown_Ct 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 068 059 030 Old_age Always - 32 (Min/Max 19/41)
196 Lifetime_Retried_Blk_Ct 0x001b 100 100 010 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
211 Read_Disturb_ReallocEvt 0x0032 100 100 000 Old_age Always - 0
233 Lifetime_Nand_Writes 0x0032 100 100 000 Old_age Always - 1347968
235 Capacitor_Health 0x0032 100 100 000 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 806144
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 923840
244 Therm_Throt_Activation 0x0032 100 100 000 Old_age Always - 0
245 Drive_Life_Remaining% 0x0012 092 092 002 Old_age Always - 92
253 SPI_Test_Remaining 0x0012 100 100 001 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended captive Completed without error 00% 47763 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Here SSD_LifeLeft(0.01%) reports its raw value in units of 0.01%: the raw value 9126 corresponds to 91.26%, which is consistent with the value 92 shown for Drive_Life_Remaining% (apparently rounded up).
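The unit conversion is simply raw value / 100. A minimal sketch that reads the attribute table from stdin; it assumes attribute ID 178 is SSD_LifeLeft(0.01%) as on this SanDisk drive (other vendors use ID 178 for different things), and ssd_life_left_percent is a hypothetical name.

```shell
# Read `smartctl -A` (or -a) output on stdin and print the raw value of
# attribute 178 converted from units of 0.01% to a percentage.
ssd_life_left_percent() {
    awk '$1 == 178 { printf "%.2f\n", $NF / 100 }'
}
```

A possible invocation is: sudo smartctl -A /dev/sda | ssd_life_left_percent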
/dev/sdb¶
Run a long test:
sudo smartctl -t long -C /dev/sdb
Long test output
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-78-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in captive mode".
Drive command "Execute SMART Extended self-test routine immediately in captive mode" successful.
Testing has begun.
Please wait 30 minutes for test to complete.
Test will complete after Wed Aug 23 16:10:44 2023 CST
The long test on the Intel SSD looks like a real test: it takes 30 minutes to complete.
View the test results (-a option):
sudo smartctl -a /dev/sdb
=== START OF INFORMATION SECTION ===
Model Family: Intel 545s Series SSDs
Device Model: INTEL SSDSC2KW512G8
Serial Number: BTLA7513037S512DGN
LU WWN Device Id: 5 5cd2e4 14eea7536
Firmware Version: LHF002C
User Capacity: 512,110,190,592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Aug 23 22:41:29 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x05) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 41) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 24193
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 160
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Unknown_Attribute 0x0033 079 079 005 Pre-fail Always - 1413069406491
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 36
183 SATA_Downshift_Count 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Temperature_Case 0x0032 027 044 000 Old_age Always - 27 (Min/Max 13/44)
192 Unsafe_Shutdown_Count 0x0032 100 100 000 Old_age Always - 36
199 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
225 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1787678
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 0
227 Workld_Host_Reads_Perc 0x0032 100 100 000 Old_age Always - 0
228 Workload_Minutes 0x0032 100 100 000 Old_age Always - 0
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 079 079 000 Old_age Always - 0
236 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1787678
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 47693
249 NAND_Writes_1GiB 0x0032 100 100 000 Old_age Always - 168517
252 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 329
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended captive Interrupted (host reset) 90% 24191 -
# 2 Extended captive Interrupted (host reset) 90% 24191 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
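Oddly, the SMART output of this Intel SSD shows no health value (remaining life, ID #245), and the test did not complete: the status is Interrupted (host reset). I ran the test twice with the same result (see the two Extended captive entries in the self-test log).
My guess: /dev/sdb is in use (mounted as the system disk), so the foreground test gets interrupted by regular disk I/O.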
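The number in parentheses after "Self-test execution status" can be decoded by hand. As far as I understand the ATA SMART data layout, the high four bits are a result code (0 = completed without error or never run, 2 = interrupted by a host reset, 15 = in progress) and the low four bits give the remaining work in tenths. The sketch below applies that reading; decode_selftest_status is a hypothetical helper, and the bit layout should be treated as an assumption to verify against the ATA specification.

```shell
# Decode the numeric "Self-test execution status" value printed by smartctl.
# Assumed layout: high 4 bits = result code, low 4 bits = tenths remaining.
decode_selftest_status() {
    local value=$1
    local code=$(( value >> 4 ))
    local remaining=$(( (value & 15) * 10 ))
    echo "code=${code} remaining=${remaining}%"
}
```

For the value 41 reported above, decode_selftest_status 41 prints code=2 remaining=90%, which agrees with the "Interrupted (host reset) 90%" entries in the self-test log.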
So I switched to a background-mode long test (dropping the -C option):
sudo smartctl -t long /dev/sdb
This time the command returns to the prompt immediately (with -C it blocks for a while):
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 30 minutes for test to complete.
Test will complete after Wed Aug 23 23:15:21 2023 CST
Use smartctl -X to abort test.
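The estimated time is still 30 minutes, but the message now says off-line mode (with -C it said captive mode).
Sure enough, with the off-line mode scan the test completes normally. The results are as follows: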
=== START OF INFORMATION SECTION ===
Model Family: Intel 545s Series SSDs
Device Model: INTEL SSDSC2KW512G8
Serial Number: BTLA7513037S512DGN
LU WWN Device Id: 5 5cd2e4 14eea7536
Firmware Version: LHF002C
User Capacity: 512,110,190,592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Aug 24 00:33:08 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 24193
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 160
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Unknown_Attribute 0x0033 079 079 005 Pre-fail Always - 1413069406491
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 36
183 SATA_Downshift_Count 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Temperature_Case 0x0032 028 045 000 Old_age Always - 28 (Min/Max 13/45)
192 Unsafe_Shutdown_Count 0x0032 100 100 000 Old_age Always - 36
199 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
225 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1787849
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 0
227 Workld_Host_Reads_Perc 0x0032 100 100 000 Old_age Always - 0
228 Workload_Minutes 0x0032 100 100 000 Old_age Always - 0
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 079 079 000 Old_age Always - 0
236 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1787849
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 47693
249 NAND_Writes_1GiB 0x0032 100 100 000 Old_age Always - 168542
252 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 329
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 24193 -
# 2 Extended captive Interrupted (host reset) 90% 24191 -
# 3 Extended captive Interrupted (host reset) 90% 24191 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The LifeTime(hours) value here, 24193, is the Power_On_Hours value at the time of the test, i.e. how long the drive has been powered on.
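Rather than reading the whole -a output, the most recent self-test entry can be picked out of the log table. A rough sketch (latest_selftest is a hypothetical name) that assumes the "# 1" row layout shown above, with a two-word test description followed by the status text and an "NN%" column:

```shell
# Print status, remaining percentage and lifetime hours of the newest
# ("# 1") entry in the SMART self-test log read from stdin.
latest_selftest() {
    awk '
        !found && $1 == "#" && $2 == "1" {
            pct = 0
            # Locate the "NN%" column; the status text sits between the
            # two-word test description (fields 3-4) and that column.
            for (i = 5; i <= NF; i++) if ($i ~ /^[0-9]+%$/) { pct = i; break }
            if (pct == 0) next
            status = ""
            for (j = 5; j < pct; j++) status = status (status == "" ? "" : " ") $j
            printf "status=%s remaining=%s hours=%s\n", status, $pct, $(pct + 1)
            found = 1
        }
    '
}
```

A possible invocation is: sudo smartctl -l selftest /dev/sdb | latest_selftest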
What puzzles me is why the Intel SSD does not report Drive_Life_Remaining%.
After some searching, it turns out Intel has its own diagnostic tool: How to Perform Quick/Full Diagnostic of Intel® SSDs Using Intel® Memory and Storage Tool (Intel® MAS) GUI (a diagnostic tool for Intel Optane SSDs / Memory devices).
For details, see `Support for Intel® Memory and Storage Tool <https://www.intel.com/content/www/us/en/support/products/202249/memory-and-storage/ssd-management-tools/intel-memory-and-storage-tool.html>`_
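Attribute names and IDs are vendor-specific, so the absence of ID 245 on the Intel drive is not necessarily a sign that something is missing. The closest counterpart in the Intel output above appears to be attribute 233, Media_Wearout_Indicator. My understanding is that on Intel SSDs its normalized VALUE starts at 100 and decreases as the NAND wears, so the 079 shown above would suggest roughly 79% of rated endurance remaining; this interpretation is an assumption and is worth confirming with Intel MAS. A minimal sketch for reading that column (media_wearout_value is a hypothetical name):

```shell
# Print the normalized VALUE (4th column) of attribute 233 from
# `smartctl -A` output on stdin. Assumption: on this drive, ID 233 is
# Media_Wearout_Indicator (on the SanDisk it is Lifetime_Nand_Writes).
media_wearout_value() {
    awk '$1 == 233 { print $4 + 0 }'
}
```

A possible invocation is: sudo smartctl -A /dev/sdb | media_wearout_value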