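SSDs mainly use one of two protocols: SATA and NVMe. For a SATA SSD, run `smartctl -a /dev/sdX` to view its health data. For an NVMe SSD, besides smartctl you can also run `nvme smart-log /dev/nvme0n1`, which requires the nvme-cli package to be installed on Linux.
SATA example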
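The choice between the two tools can be sketched in a few lines of Python. This is only an illustration: the helper names `smart_command` and `read_smart`, and the rule of recognising NVMe devices by the `/dev/nvme` path prefix, are assumptions made for this example and are not part of smartmontools or nvme-cli. The two command lines are the ones named above.

```python
import subprocess


def smart_command(device: str, prefer_nvme_cli: bool = True) -> list[str]:
    """Return the argv used to read SMART data from `device`.

    Assumption: NVMe namespaces show up as /dev/nvme*; anything else is
    treated as a SATA/ATA device and handled by smartctl.
    """
    if device.startswith("/dev/nvme") and prefer_nvme_cli:
        return ["nvme", "smart-log", device]
    return ["smartctl", "-a", device]


def read_smart(device: str) -> str:
    """Run the chosen tool (normally needs root) and return its text output."""
    # check=False: smartctl encodes drive status in its exit code, so a
    # non-zero exit does not necessarily mean the command itself failed.
    result = subprocess.run(
        smart_command(device), capture_output=True, text=True, check=False
    )
    return result.stdout


print(smart_command("/dev/sdc"))
print(smart_command("/dev/nvme0n1"))
print(smart_command("/dev/nvme0n1", prefer_nvme_cli=False))
```

Only `smart_command` is exercised here, so the sketch runs without either tool installed; `read_smart` would need real hardware and root privileges.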
root@jacky-office:/home/jacky# smartctl -a /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.45-amd64-desktop] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Silicon Motion based SSDs
Device Model: TS480GSSD220S
Serial Number: C990271114
Firmware Version: P0330AA
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed May 15 10:35:23 2024 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 6753
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 1494
160 Uncorrectable_Error_Cnt 0x0000 100 100 000 Old_age Offline - 0
161 Valid_Spare_Block_Cnt 0x0000 100 100 000 Old_age Offline - 50
163 Initial_Bad_Block_Count 0x0000 100 100 000 Old_age Offline - 500
164 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 84538
165 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 106
166 Min_Erase_Count 0x0000 100 100 000 Old_age Offline - 21
167 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 65
168 Max_Erase_Count_of_Spec 0x0000 100 100 000 Old_age Offline - 1000
169 Remaining_Lifetime_Perc 0x0000 100 100 001 Old_age Offline - 100
175 Program_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
176 Erase_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 76
178 Runtime_Invalid_Blk_Cnt 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Cnt_Total 0x0000 100 100 000 Old_age Offline - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
192 Power-Off_Retract_Count 0x0000 100 100 000 Old_age Offline - 193
194 Temperature_Celsius 0x0000 100 100 070 Old_age Offline - 34 (42 43 41 41 0)
195 Hardware_ECC_Recovered 0x0000 100 100 000 Old_age Offline - 0
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 0
198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0000 100 100 000 Old_age Offline - 0
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
241 Host_Writes_32MiB 0x0000 100 100 000 Old_age Offline - 1037797
242 Host_Reads_32MiB 0x0000 100 100 000 Old_age Offline - 1053254
245 TLC_Writes_32MiB 0x0000 100 100 000 Old_age Offline - 998605
SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 2
ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error -1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 ec 00 00 00 00 00 Device Fault
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 00 00 00 00 00 00:00:00.000 READ DMA
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Offline Completed without error 00% 97 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
6 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Analysis
The attribute table in the output above lists the vendor-specific S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) attributes of the drive, a TS480GSSD220S SATA SSD, which are used to monitor and report on its health and performance.
Here is a brief interpretation of the key attributes and their current status:
| Attribute Name | Current Value | Raw Value | Interpretation |
|-------------------------------|---------------|-----------|-------------------------------------------------------------------|
| **Raw_Read_Error_Rate** | 100 | 0 | No read errors have been detected. |
| **Reallocated_Sector_Ct** | 100 | 0 | No sectors have been reallocated, indicating healthy NAND cells. |
| **Power_On_Hours** | 100 | 6753 | The drive has been powered on for 6753 hours. |
| **Power_Cycle_Count** | 100 | 1494 | The drive has been power cycled 1494 times. |
| **Uncorrectable_Error_Cnt** | 100 | 0 | No uncorrectable errors have occurred. |
| **Valid_Spare_Block_Cnt** | 100 | 50 | 50 spare blocks are available. |
| **Initial_Bad_Block_Count** | 100 | 500 | 500 bad blocks were present initially. |
| **Total_Erase_Count** | 100 | 84538 | Total erase cycles performed. |
| **Max_Erase_Count** | 100 | 106 | Maximum erase count of any block is 106. |
| **Average_Erase_Count** | 100 | 65 | Average erase count is 65. |
| **Remaining_Lifetime_Perc** | 100 | 100 | 100% of the drive’s lifetime remains. |
| **Temperature_Celsius** | 100 | 34 | Current operating temperature is 34°C. |
| **Power-Off_Retract_Count**   | 100           | 193       | An SSD has no heads to retract; this counter typically records unexpected power losses (193 here). |
| **Available_Reservd_Space** | 100 | 100 | Reserved space available remains at 100%. |
| **Host_Writes_32MiB**         | 100           | 1037797   | Host writes of about 31.7 TiB, or 34.8 TB (1037797 × 32 MiB).     |
| **Host_Reads_32MiB**          | 100           | 1053254   | Host reads of about 32.1 TiB, or 35.3 TB (1053254 × 32 MiB).      |
| **TLC_Writes_32MiB**          | 100           | 998605    | TLC writes of about 30.5 TiB, or 33.5 TB (998605 × 32 MiB).       |
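The three `*_32MiB` rows are plain unit conversions, and the short sketch below reproduces them. It assumes, as the attribute names suggest, that each raw count is one 32 MiB unit; the function names are made up for this example.

```python
MIB = 1024 ** 2  # bytes in one mebibyte


def units_32mib_to_bytes(raw_value: int) -> int:
    """Convert a raw counter expressed in 32 MiB units to bytes."""
    return raw_value * 32 * MIB


def as_tib(num_bytes: int) -> float:
    """Binary terabytes (2**40 bytes)."""
    return num_bytes / 1024 ** 4


def as_tb(num_bytes: int) -> float:
    """Decimal terabytes (10**12 bytes), the unit drive vendors quote."""
    return num_bytes / 1000 ** 4


# Raw values copied from the smartctl output above.
counters = {
    "Host_Writes_32MiB": 1037797,
    "Host_Reads_32MiB": 1053254,
    "TLC_Writes_32MiB": 998605,
}

for name, raw in counters.items():
    size = units_32mib_to_bytes(raw)
    print(f"{name}: {as_tib(size):.1f} TiB = {as_tb(size):.1f} TB")
```

Keeping TiB and TB separate matters here: the two differ by about 10%, which is enough to make a hand-computed total look wrong.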
**Additional Notes:**
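1. The drive is in good health, with no significant issues flagged by the attributes. The two warnings in the output (an ATA error count inconsistent with the error log pointer, and an invalid self-test log checksum) concern the log structures rather than any attribute value.
2. `Remaining_Lifetime_Perc` reports 100%. The erase counters (`Average_Erase_Count` of 65 against `Max_Erase_Count_of_Spec` of 1000) suggest that roughly 6.5% of the rated erase cycles have been consumed, so the drive is lightly worn rather than unused.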
3. All critical error counts such as `Reallocated_Sector_Ct`, `Uncorrectable_Error_Cnt`, and `Erase_Fail_Count_Total` are at zero.
Given the values, this SSD appears to be in excellent condition with no current indicators of imminent failure. Regular monitoring should continue to ensure ongoing reliability.
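Checks like the ones in these notes can be automated. The sketch below parses a few attribute rows copied from the output above, verifies that the critical counters are zero, and estimates wear from the erase counters. The parser assumes the ten-column layout smartctl printed here, the `CRITICAL` tuple is just the three attributes named in note 3, and dividing `Average_Erase_Count` by `Max_Erase_Count_of_Spec` is a rough estimate, not a vendor-documented formula.

```python
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
160 Uncorrectable_Error_Cnt 0x0000 100 100 000 Old_age Offline - 0
167 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 65
168 Max_Erase_Count_of_Spec 0x0000 100 100 000 Old_age Offline - 1000
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
194 Temperature_Celsius 0x0000 100 100 070 Old_age Offline - 34 (42 43 41 41 0)
"""

# The counters that note 3 expects to be zero.
CRITICAL = (
    "Reallocated_Sector_Ct",
    "Uncorrectable_Error_Cnt",
    "Erase_Fail_Count_Total",
)


def parse_attributes(text: str) -> dict[str, int]:
    """Map ATTRIBUTE_NAME to the leading integer of its RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if raw.isdigit():
            attrs[name] = int(raw)
    return attrs


attrs = parse_attributes(SAMPLE)
nonzero = {name: attrs[name] for name in CRITICAL if attrs.get(name, 0) != 0}
wear = attrs["Average_Erase_Count"] / attrs["Max_Erase_Count_of_Spec"]

print("non-zero critical counters:", nonzero)
print(f"rated erase cycles consumed (estimate): {wear:.1%}")
```

Attribute names and raw-value encodings differ between vendors, so the names used here apply to this drive's firmware and may not exist on others.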
NVMe example
root@jacky-office:/home/jacky# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.45-amd64-desktop] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Fanxiang S690 2TB
Serial Number: FX2310096093
Firmware Version: SN07443
PCI Vendor/Subsystem ID: 0x1e4b
IEEE OUI Identifier: 0x000000
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 0
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000000 2310096093
Local Time is: Wed May 15 10:42:31 2024 CST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 120 Celsius
Critical Comp. Temp. Threshold: 130 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.50W - - 0 0 0 0 0 0
1 + 5.80W - - 1 1 1 1 0 0
2 + 3.60W - - 2 2 2 2 0 0
3 - 0.7460W - - 3 3 3 3 5000 10000
4 - 0.7260W - - 4 4 4 4 8000 45000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 44 Celsius
Available Spare: 100%
Available Spare Threshold: 1%
Percentage Used: 1%
Data Units Read: 47,421,007 [24.2 TB]
Data Units Written: 57,307,699 [29.3 TB]
Host Read Commands: 933,166,322
Host Write Commands: 1,280,285,379
Controller Busy Time: 1,551
Power Cycles: 152
Power On Hours: 6,840
Unsafe Shutdowns: 109
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,147
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 44 Celsius
Temperature Sensor 2: 52 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1147 0 0x2014 0x4004 - 0 0 -
1 1146 0 0x1016 0x4004 0x028 0 0 -
Analysis of the key fields
The SMART data above is for an NVMe SSD (a Fanxiang S690 2TB), and overall it indicates that the drive is in good health. The key aspects are:
### Overall Health:
- **SMART overall-health self-assessment test result:** PASSED
- The drive is passing its internal health checks, suggesting it is functioning well.
### Critical Factors:
- **Critical Warning:** 0x00
- No critical warnings are present.
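The Critical Warning byte is a bit field, so a non-zero value is easier to read once decoded. The sketch below uses the bit meanings as I understand them from the NVMe SMART/Health log definition; the short descriptions are paraphrased, and the function name is made up for this example.

```python
# Bit positions of the Critical Warning byte (descriptions paraphrased).
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "reliability degraded (media or internal errors)",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
    5: "persistent memory region read-only or unreliable",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return the description of every warning bit set in `value`."""
    return [
        text
        for bit, text in CRITICAL_WARNING_BITS.items()
        if value & (1 << bit)
    ]


print(decode_critical_warning(0x00))  # the drive above: no bits set
print(decode_critical_warning(0x05))  # hypothetical value with bits 0 and 2 set
```

For the drive above the value is 0x00, so the decoded list is empty; the second call uses a made-up value purely to show a non-empty result.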
### Temperature:
- **Temperature:** 44 Celsius
- This is within a normal operating range.
- **Temperature Sensor 1:** 44 Celsius
- **Temperature Sensor 2:** 52 Celsius
- Both sensors are far below the warning (120 Celsius) and critical (130 Celsius) composite temperature thresholds that the drive reports.
### Usage and Life Expectancy:
- **Available Spare:** 100%
- **Available Spare Threshold:** 1%
- The spare space available for reallocating bad blocks is at 100%, so no spare blocks have been used.
- **Percentage Used:** 1%
- Only 1% of the drive's estimated lifespan has been used, indicating significant remaining life.
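To monitor these fields over time, the `Key: value` lines of the smartctl output can be parsed into numbers. The sketch below is a minimal parser that keeps only decimal values and skips hexadecimal ones such as `Critical Warning: 0x00`; the function name and the sample subset are chosen for illustration. The final comparison follows the rule that the spare becomes a concern once it falls below the threshold.

```python
import re

# A few lines copied from the SMART/Health section above.
SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Available Spare Threshold: 1%
Percentage Used: 1%
Data Units Read: 47,421,007 [24.2 TB]
Power On Hours: 6,840
"""


def parse_health(text: str) -> dict[str, int]:
    """Map each 'Key: value' line to the first decimal integer in its value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or value.lower().startswith("0x"):
            continue  # not a key/value line, or a hex field we do not decode
        match = re.match(r"\d[\d,]*", value)
        if match:
            # Drop thousands separators before converting.
            fields[key.strip()] = int(match.group(0).replace(",", ""))
    return fields


health = parse_health(SAMPLE)
spare_ok = health["Available Spare"] >= health["Available Spare Threshold"]

print(health)
print("spare above threshold:", spare_ok)
```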
### Data Transfer:
- **Data Units Read:** 47,421,007 (24.2 TB)
- **Data Units Written:** 57,307,699 (29.3 TB)
- The drive has handled substantial amounts of data, but not excessively high amounts for many modern SSDs.
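The terabyte figures in brackets follow from how NVMe defines a data unit: one unit is 1000 blocks of 512 bytes, so 512,000 bytes. A small sketch of the conversion, with a function name invented for this example:

```python
DATA_UNIT_BYTES = 1000 * 512  # one NVMe data unit = 1000 blocks of 512 bytes


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe 'Data Units Read/Written' counter to bytes."""
    return units * DATA_UNIT_BYTES


# Counters copied from the smartctl output above.
read_bytes = data_units_to_bytes(47_421_007)
written_bytes = data_units_to_bytes(57_307_699)

print(f"read:    {read_bytes / 1e12:.2f} TB")
print(f"written: {written_bytes / 1e12:.2f} TB")
```

The results agree with the bracketed values to within rounding; smartctl prints them with one decimal place.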
### Operational Metrics:
- **Host Read Commands:** 933,166,322
- **Host Write Commands:** 1,280,285,379
- **Controller Busy Time:** 1,551
- The NVMe specification reports this field in minutes, so the controller has been busy processing I/O commands for a cumulative 1,551 minutes (roughly 26 hours).
### Power and Shutdowns:
- **Power Cycles:** 152
- **Power On Hours:** 6,840
- This equates to approximately 285 days of continuous operation.
- **Unsafe Shutdowns:** 109
- The drive has lost power 109 times without a clean shutdown notification. Set against 152 power cycles, that is a high share (about 72%) and a point of concern if it continues.
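The power figures can be cross-checked with simple arithmetic. The sketch below derives the running time in days, the average session length, and the share of unsafe shutdowns; treating unsafe shutdowns divided by power cycles as a ratio is a rough indicator of my own choosing, since the two counters are maintained independently.

```python
def power_summary(power_on_hours: int, power_cycles: int, unsafe_shutdowns: int) -> dict[str, float]:
    """Derive a few rough indicators from the NVMe power counters."""
    return {
        "days_powered_on": power_on_hours / 24,
        "hours_per_cycle": power_on_hours / power_cycles,
        "unsafe_ratio": unsafe_shutdowns / power_cycles,
    }


# Values copied from the smartctl output above.
summary = power_summary(power_on_hours=6840, power_cycles=152, unsafe_shutdowns=109)

print(f"powered on: {summary['days_powered_on']:.0f} days")
print(f"average session: {summary['hours_per_cycle']:.0f} hours")
print(f"unsafe shutdowns per power cycle: {summary['unsafe_ratio']:.0%}")
```

An average session of 45 hours alongside such a high unsafe share would fit a machine that is often switched off at the wall or loses power, though the counters alone cannot confirm the cause.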
### Errors:
- **Media and Data Integrity Errors:** 0
- **Error Information Log Entries:** 1,147
- While the drive has logged 1,147 errors, without additional context, it's hard to gauge severity. These errors could range from minor recoverable errors to something more significant, but the absence of media and data integrity errors indicates none of these errors have resulted in data loss or corruption.
### Specific Error Details:
- **Error Information (NVMe Log 0x01, max 64 entries):**
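- Both entries show status `0x4004`. Assuming smartctl prints the raw status field with the phase tag in bit 0, this decodes to generic status code 0x02 (Invalid Field in Command), meaning the drive rejected a parameter of a command it was sent rather than failing a read or write.
- Both entries have SQId 0 (the admin queue), LBA 0 and NSID 0, so they are not tied to a block address or a namespace. This points to rejected admin commands rather than media failures.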
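The status value can be unpacked bit by bit. The sketch below assumes the layout of the 16-bit status field in an NVMe error log entry as I understand it: bit 0 is the phase tag, bits 8:1 the status code, bits 11:9 the status code type, bit 14 "more" and bit 15 "do not retry". It also assumes smartctl prints that field unshifted. The lookup table covers only a few generic status codes, and the function name is made up for this example.

```python
# A few generic (status code type 0) status codes.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_status(raw: int) -> dict[str, int]:
    """Split a raw 16-bit NVMe status field into its components."""
    return {
        "phase_tag": raw & 0x1,
        "status_code": (raw >> 1) & 0xFF,
        "status_code_type": (raw >> 9) & 0x7,
        "more": (raw >> 14) & 0x1,
        "do_not_retry": (raw >> 15) & 0x1,
    }


status = decode_status(0x4004)  # value from both error log entries above
print(status)
if status["status_code_type"] == 0:
    print(GENERIC_STATUS.get(status["status_code"], "unknown generic status"))
```

Under these assumptions `0x4004` is a generic-type status with code 0x02 and the "more" bit set, which matches the reading of rejected commands rather than media errors.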
### Conclusion:
This NVMe SSD is in good health overall, with no critical warnings and only 1% of its rated life used. The temperature is within the normal operating range. The unsafe shutdown count is worth watching, because repeated unclean power-offs can cause data corruption or other issues over time. The logged errors should also be monitored, but with zero media and data integrity errors they are probably non-critical. Regular backups remain good practice against future failures.