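SSDs mainly use one of two protocols: SATA or NVMe. For a SATA SSD, run `smartctl -a /dev/sdx` (where x is the drive letter) to view its SMART data. For an NVMe SSD, besides smartctl you can also run `nvme smart-log /dev/nvme0n1`, which requires the nvme-cli package to be installed on Linux.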
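
The choice between the two tools can be sketched in a few lines of Python. This is only an illustration: `smart_commands` is a hypothetical helper name, and the check assumes the usual Linux naming `/dev/nvme<N>n<M>` for NVMe namespaces. It only builds the command lines quoted above and does not run them.

```python
import re


def smart_commands(device: str) -> list[list[str]]:
    """Return the argv lists of the SMART queries that apply to a device."""
    # smartctl -a works for both SATA and NVMe devices.
    commands = [["smartctl", "-a", device]]
    # Assumption: NVMe namespaces are named /dev/nvme<N>n<M> (Linux default).
    if re.fullmatch(r"/dev/nvme\d+n\d+", device):
        # nvme smart-log needs the nvme-cli package.
        commands.append(["nvme", "smart-log", device])
    return commands


if __name__ == "__main__":
    for dev in ("/dev/sdc", "/dev/nvme0n1"):
        for argv in smart_commands(dev):
            print(" ".join(argv))
```

Both tools normally need root privileges, which is why the sessions below run as root.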

SATA example

root@jacky-office:/home/jacky# smartctl -a /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.45-amd64-desktop] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Silicon Motion based SSDs
Device Model:     TS480GSSD220S
Serial Number:    C990271114
Firmware Version: P0330AA
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed May 15 10:35:23 2024 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x71) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (   2) minutes.
Conveyance self-test routine
recommended polling time:      (   1) minutes.
SCT capabilities:            (0x0035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       6753
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       1494
160 Uncorrectable_Error_Cnt 0x0000   100   100   000    Old_age   Offline      -       0
161 Valid_Spare_Block_Cnt   0x0000   100   100   000    Old_age   Offline      -       50
163 Initial_Bad_Block_Count 0x0000   100   100   000    Old_age   Offline      -       500
164 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       84538
165 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       106
166 Min_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       21
167 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       65
168 Max_Erase_Count_of_Spec 0x0000   100   100   000    Old_age   Offline      -       1000
169 Remaining_Lifetime_Perc 0x0000   100   100   001    Old_age   Offline      -       100
175 Program_Fail_Count_Chip 0x0000   100   100   000    Old_age   Offline      -       0
176 Erase_Fail_Count_Chip   0x0000   100   100   000    Old_age   Offline      -       0
177 Wear_Leveling_Count     0x0000   100   100   050    Old_age   Offline      -       76
178 Runtime_Invalid_Blk_Cnt 0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       193
194 Temperature_Celsius     0x0000   100   100   070    Old_age   Offline      -       34 (42 43 41 41 0)
195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       0
196 Reallocated_Event_Count 0x0000   100   100   016    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0000   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       0
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
241 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Offline      -       1037797
242 Host_Reads_32MiB        0x0000   100   100   000    Old_age   Offline      -       1053254
245 TLC_Writes_32MiB        0x0000   100   100   000    Old_age   Offline      -       998605

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 2

ATA Error Count: 0
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error -1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 ec 00 00 00 00 00  Device Fault

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 00 00 00 00 00      00:00:00.000  READ DMA

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Offline             Completed without error       00%        97         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
    6        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Analysis

The attribute table above lists the vendor-specific S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) attributes that this SSD reports about its health and wear. Every normalized value on this drive reads 100, so the raw values carry the useful information.

Here's a brief interpretation of some key attributes and their current status:

| Attribute Name                | Current Value | Raw Value | Interpretation                                                   |
|-------------------------------|---------------|-----------|-------------------------------------------------------------------|
| **Raw_Read_Error_Rate**       | 100           | 0         | No read errors have been detected.                                |
| **Reallocated_Sector_Ct**     | 100           | 0         | No sectors have been reallocated, indicating healthy NAND cells.  |
| **Power_On_Hours**            | 100           | 6753      | The drive has been powered on for 6753 hours.                     |
| **Power_Cycle_Count**         | 100           | 1494      | The drive has been power cycled 1494 times.                       |
| **Uncorrectable_Error_Cnt**   | 100           | 0         | No uncorrectable errors have occurred.                            |
| **Valid_Spare_Block_Cnt**     | 100           | 50        | 50 spare blocks are available.                                    |
| **Initial_Bad_Block_Count**   | 100           | 500       | 500 bad blocks were present initially.                            |
| **Total_Erase_Count**         | 100           | 84538     | Total erase cycles performed.                                     |
| **Max_Erase_Count**           | 100           | 106       | Maximum erase count of any block is 106.                          |
| **Average_Erase_Count**       | 100           | 65        | Average erase count is 65.                                        |
| **Remaining_Lifetime_Perc**   | 100           | 100       | 100% of the drive’s lifetime remains.                             |
| **Temperature_Celsius**       | 100           | 34        | Current operating temperature is 34°C.                             |
| **Power-Off_Retract_Count**   | 100           | 193       | An SSD has no heads to retract; this counter usually tracks unexpected power losses (193 here). |
| **Available_Reservd_Space**   | 100           | 100       | Reserved space available remains at 100%.                         |
| **Host_Writes_32MiB**         | 100           | 1037797   | Host writes of about 31.7 TiB, or 34.8 TB (1037797 × 32 MiB).     |
| **Host_Reads_32MiB**          | 100           | 1053254   | Host reads of about 32.1 TiB, or 35.3 TB (1053254 × 32 MiB).      |
| **TLC_Writes_32MiB**          | 100           | 998605    | TLC writes of about 30.5 TiB, or 33.5 TB (998605 × 32 MiB).       |
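
The unit conversion for the `*_32MiB` attributes is easy to get wrong, so here is a minimal Python sketch of it. It assumes the ten-column attribute layout shown in the smartctl output above and takes the leading integer of `RAW_VALUE`; `parse_attributes` and `units_32mib_to_bytes` are hypothetical helper names, not part of smartmontools.

```python
MIB = 1024 ** 2  # bytes per MiB


def parse_attributes(text: str) -> dict[str, int]:
    """Map ATTRIBUTE_NAME to the leading integer of its RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def units_32mib_to_bytes(raw: int) -> int:
    """Convert a raw counter expressed in 32 MiB units to bytes."""
    return raw * 32 * MIB


sample = """\
241 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Offline      -       1037797
242 Host_Reads_32MiB        0x0000   100   100   000    Old_age   Offline      -       1053254
245 TLC_Writes_32MiB        0x0000   100   100   000    Old_age   Offline      -       998605
"""

for name, raw in parse_attributes(sample).items():
    size = units_32mib_to_bytes(raw)
    print(f"{name}: {size / 1024**4:.1f} TiB ({size / 1e12:.1f} TB)")
```

Printing both TiB (binary) and TB (decimal) avoids mixing the two units, which differ by roughly 10% at this scale.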

**Additional Notes:**
1. The drive is in good health, with no significant issues flagged by the attributes.
2. The `Remaining_Lifetime_Perc` being at 100% suggests that the drive still has its full rated lifetime remaining.
3. All critical error counts such as `Reallocated_Sector_Ct`, `Uncorrectable_Error_Cnt`, and `Erase_Fail_Count_Total` are at zero.

Given the values, this SSD appears to be in excellent condition with no current indicators of imminent failure. Regular monitoring should continue to ensure ongoing reliability.
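
As a rough cross-check on `Remaining_Lifetime_Perc`, the erase counters can be compared directly. The sketch below assumes that `Max_Erase_Count_of_Spec` (raw 1000) is the rated program/erase cycle count per block and that `Average_Erase_Count` (raw 65) is the mean number of cycles consumed so far. The vendor does not confirm either meaning here, and `wear_fraction` is a hypothetical helper name.

```python
def wear_fraction(average_erase_count: int, rated_erase_count: int) -> float:
    """Estimate the share of rated erase cycles already consumed."""
    if rated_erase_count <= 0:
        raise ValueError("rated erase count must be positive")
    return average_erase_count / rated_erase_count


# Raw values of attributes 167 and 168 from the smartctl output above.
used = wear_fraction(65, 1000)
print(f"estimated wear: {used:.1%}, estimated remaining: {1 - used:.1%}")
```

Under these assumptions roughly 6.5% of the rated endurance has been consumed, so the reported 100% may be a coarse or vendor-specific figure and is best read alongside the erase counters.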

NVMe example

root@jacky-office:/home/jacky# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.45-amd64-desktop] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Fanxiang S690 2TB
Serial Number:                      FX2310096093
Firmware Version:                   SN07443
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 2310096093
Local Time is:                      Wed May 15 10:42:31 2024 CST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     120 Celsius
Critical Comp. Temp. Threshold:     130 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.7460W       -        -    3  3  3  3     5000   10000
 4 -   0.7260W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Data Units Read:                    47,421,007 [24.2 TB]
Data Units Written:                 57,307,699 [29.3 TB]
Host Read Commands:                 933,166,322
Host Write Commands:                1,280,285,379
Controller Busy Time:               1,551
Power Cycles:                       152
Power On Hours:                     6,840
Unsafe Shutdowns:                   109
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,147
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               52 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       1147     0  0x2014  0x4004      -            0     0     -
  1       1146     0  0x1016  0x4004  0x028            0     0     -

Key points

The SMART data above is for an NVMe SSD and indicates that the drive is in good health overall. The key aspects are broken down below:

### Overall Health:
- **SMART overall-health self-assessment test result:** PASSED
  - The drive is passing its internal health checks, suggesting it is functioning well.

### Critical Factors:
- **Critical Warning:** 0x00
  - No critical warnings are present.

### Temperature:
- **Temperature:** 44 Celsius
  - This is within a normal operating range.
- **Temperature Sensor 1:** 44 Celsius
- **Temperature Sensor 2:** 52 Celsius
  - Both temperatures are within acceptable ranges for an NVMe SSD.

### Usage and Life Expectancy:
- **Available Spare:** 100%
- **Available Spare Threshold:** 1%
  - The spare space available for reallocating bad blocks is at 100%, so no spare blocks have been used.
- **Percentage Used:** 1%
  - Only 1% of the drive's estimated lifespan has been used, indicating significant remaining life.

### Data Transfer:
- **Data Units Read:** 47,421,007 (24.2 TB)
- **Data Units Written:** 57,307,699 (29.3 TB)
  - The drive has handled a substantial amount of data, but nothing unusual for a modern 2 TB SSD.
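
The terabyte figures in brackets follow from the NVMe definition of a data unit: one unit is 1000 blocks of 512 bytes, or 512,000 bytes. A minimal Python sketch of the conversion (`data_units_to_bytes` is a hypothetical helper name):

```python
DATA_UNIT_BYTES = 1000 * 512  # NVMe reports data units in thousands of 512-byte blocks


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe 'Data Units Read/Written' counter to bytes."""
    return units * DATA_UNIT_BYTES


# Counters taken from the smartctl output above.
for label, units in (("read", 47_421_007), ("written", 57_307_699)):
    size = data_units_to_bytes(units)
    print(f"{label}: {size / 1e12:.2f} TB ({size / 1024**4:.2f} TiB)")
```

The last digit can differ slightly from what smartctl prints, depending on how each tool rounds.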

### Operational Metrics:
- **Host Read Commands:** 933,166,322
- **Host Write Commands:** 1,280,285,379
- **Controller Busy Time:** 1,551
  - The controller has been busy processing I/O commands for a cumulative 1,551 minutes (about 26 hours); the NVMe specification reports this field in minutes.

### Power and Shutdowns:
- **Power Cycles:** 152
- **Power On Hours:** 6,840
  - This equates to approximately 285 days of continuous operation.
- **Unsafe Shutdowns:** 109
  - There have been 109 instances where the drive has been powered off in an unsafe manner, which can be a point of concern if it continues.

### Errors:
- **Media and Data Integrity Errors:** 0
- **Error Information Log Entries:** 1,147
  - While the drive has logged 1,147 errors, without additional context, it's hard to gauge severity. These errors could range from minor recoverable errors to something more significant, but the absence of media and data integrity errors indicates none of these errors have resulted in data loss or corruption.

### Specific Error Details:
- **Error Information (NVMe Log 0x01, max 64 entries):**
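  - Both entries show status `0x4004`. Read as a raw NVMe status field (phase bit in bit 0), this decodes to generic status code 0x02, Invalid Field in Command, with the More bit set. That pattern usually means the host sent a command or field the drive does not support, not that the media failed.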
  - The details provided do not indicate LBAs (Logical Block Addresses) or specific namespaces of failures, suggesting errors are more operational or command-specific rather than indicating physical address failures.
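
To see what the `Status` column encodes, the value can be split into the sub-fields defined by the NVMe status field layout. The sketch below assumes that smartctl 7.1 prints the raw 16-bit field with the phase bit in bit 0. `decode_status` is a hypothetical helper name, and the lookup table covers only a few generic status codes.

```python
# A few generic command status codes (status code type 0) from the NVMe spec.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_status(raw: int) -> dict:
    """Split a raw 16-bit NVMe status value into its sub-fields."""
    sc = (raw >> 1) & 0xFF   # bits 8:1, status code
    sct = (raw >> 9) & 0x7   # bits 11:9, status code type
    return {
        "phase": raw & 0x1,                  # bit 0, phase tag
        "sc": sc,
        "sct": sct,
        "crd": (raw >> 12) & 0x3,            # bits 13:12 (reserved before NVMe 1.4)
        "more": bool((raw >> 14) & 0x1),     # bit 14, more info in the error log
        "dnr": bool((raw >> 15) & 0x1),      # bit 15, do not retry
        "meaning": GENERIC_STATUS.get(sc, "unknown") if sct == 0 else "non-generic",
    }


print(decode_status(0x4004))
```

For `0x4004` this yields status code 0x02 with status code type 0 and the More bit set, which is the reading used in the analysis above.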

### Conclusion:
This NVMe SSD is in good health overall, with no critical warnings. With only 1% of its rated life used, it has a long remaining lifespan, and its temperature is within the normal operating range. Keep an eye on the unsafe shutdown count, since repeated unsafe shutdowns can cause data corruption or other issues over time. The logged errors are also worth monitoring, but with no media or data integrity errors they are probably non-critical. Regular backups remain good practice to safeguard against any future failure.
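
The checks behind this conclusion can be automated. The following Python sketch parses the `Key: value` lines of the SMART/Health section and flags the three conditions discussed above. The function names and the exact set of checks are illustrative assumptions, not a standard policy.

```python
import re


def parse_health(text: str) -> dict[str, int]:
    """Parse 'Key: value' lines, keeping the first number of each value."""
    health = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        match = re.search(r"0x[0-9a-fA-F]+|\d[\d,]*", value)
        if sep and match:
            token = match.group().replace(",", "")
            is_hex = token.lower().startswith("0x")
            health[key.strip()] = int(token, 16) if is_hex else int(token)
    return health


def health_warnings(health: dict[str, int]) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    if health.get("Critical Warning", 0) != 0:
        warnings.append("critical warning bits are set")
    if health.get("Available Spare", 100) <= health.get("Available Spare Threshold", 0):
        warnings.append("available spare is at or below its threshold")
    if health.get("Media and Data Integrity Errors", 0) > 0:
        warnings.append("media or data integrity errors were recorded")
    return warnings


sample = """\
Critical Warning:                   0x00
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Unsafe Shutdowns:                   109
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,147
"""

print(health_warnings(parse_health(sample)))
```

Counters such as `Unsafe Shutdowns` are better tracked as a trend over time than compared with a fixed limit, so they are parsed here but not flagged.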
Last modified: May 15, 2024