Howto SMART

Documentation: https://www.smartmontools.org/wiki/TocDoc

SMART (Self-Monitoring, Analysis and Reporting Technology) is built into most hard drives and exposes diagnostic indicators. On Linux/Unix, smartmontools is the standard tool for using SMART, chiefly through the smartctl command and the smartd daemon.

Installation

# apt install smartmontools

$ /usr/sbin/smartctl -V
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
[...]
smartmontools release 6.6 dated 2016-05-07 at 11:17:46 UTC
smartmontools SVN rev 4324 dated 2016-05-31 at 20:45:50
smartmontools build host: x86_64-pc-linux-gnu
smartmontools build with: C++98, GCC 5.4.0 20160609
[...]

# systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
     Docs: man:smartd(8)
           man:smartd.conf(5)

Basic usage

A few basic command examples:

# smartctl --scan
# smartctl -a /dev/sda
# smartctl -a /dev/sda | grep Power_On_Hours
# smartctl -a /dev/sda | grep Power_Cycle_Count
# smartctl -a /dev/sda -d megaraid,0
# smartctl -i /dev/sg0
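
The grep examples above print the whole attribute line. When only the raw value is needed (for a monitoring script, say), a small awk filter can pick it out. The sketch below assumes the ATA attribute table layout, where RAW_VALUE is the 10th column; the function name smart_attr_raw is hypothetical. For raw values that contain extra text, only the first word is printed.

```shell
#!/bin/sh
# smart_attr_raw NAME: read `smartctl -A` output on stdin and print the
# RAW_VALUE column (10th field) of the attribute called NAME.
# Exits with status 1 if the attribute is not found.
smart_attr_raw() {
    awk -v name="$1" '$2 == name { print $10; found = 1 }
                      END { exit !found }'
}

# Usage (as root, on a real ATA disk):
#   smartctl -A /dev/sda | smart_attr_raw Power_On_Hours
```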

smartctl

You can make sure all SMART features are enabled on a disk (-s on enables SMART itself, -o on enables automatic offline testing, -S on enables attribute autosave) with:

# smartctl -s on -o on -S on /dev/sda

Listing disks

On a machine with a single disk:

# smartctl --scan

/dev/sda -d scsi # /dev/sda, SCSI device

On a machine with hardware RAID:

# smartctl --scan

/dev/hdd -d ata # /dev/hdd, ATA device
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
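
Each line of the scan output already carries the device path and the matching -d type, so it can be turned into one smartctl command per disk. The sketch below does that for a health check (-H); the function name scan_to_commands is hypothetical, and it only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# scan_to_commands: read `smartctl --scan` output on stdin and print one
# health-check command per detected device. Each input line looks like
#   /dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
# so the device path is field 1 and the device type is field 3.
scan_to_commands() {
    awk '$2 == "-d" { printf "smartctl -H -d %s %s\n", $3, $1 }'
}

# Usage (as root):
#   smartctl --scan | scan_to_commands        # review the commands
#   smartctl --scan | scan_to_commands | sh   # then run them
```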

Viewing information about a disk

The -i option displays identification information about a disk:

# smartctl -i /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop Thin HDD
Device Model:     ST500LM021-1KJ152
Serial Number:    XXXXXXXX
LU WWN Device Id: 5 000c50 09cbac333
Firmware Version: 0005SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 28 16:19:49 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The -l error option displays any errors logged by a disk (the sample output below comes from an NVMe drive):

# smartctl -l error /dev/sda

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        120     0  0x0008  0x4004      -            0     0     -
  1        119     0  0x0018  0x4004  0x02c            0     0     -
  2        118     0  0x0017  0x4004  0x02c            0     0     -
  3        117     0  0x0008  0x4004      -            0     0     -
  4        116     0  0x0018  0x4004  0x02c            0     0     -
  5        115     0  0x0017  0x4004  0x02c            0     0     -
  6        114     0  0x0008  0x4004      -            0     0     -
  7        113     0  0x0018  0x4004  0x02c            0     0     -
  8        112     0  0x0017  0x4004  0x02c            0     0     -
  9        111     0  0x0008  0x4004      -            0     0     -
 10        110     0  0x0008  0x4004      -            0     0     -
 11        109     0  0x0008  0x4004  0x02c            0     0     -
 12        108     0  0x0008  0x4004  0x02c            0     0     -
 13        107     0  0x0018  0x4004  0x02c            0     0     -
 14        106     0  0x0017  0x4004  0x02c            0     0     -
 15        105     0  0x0008  0x4004  0x02c            0     0     -
... (48 entries not shown)

The -a option displays all SMART information:

# smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZVLW256HEHP-000L7
Serial Number:                      XXXXXXXX
Firmware Version:                   4L7QCXB7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 256 060 514 304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256 060 514 304 [256 GB]
Namespace 1 Utilization:            208 604 237 824 [208 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Dec  4 00:16:33 2017 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Warning  Comp. Temp. Threshold:     69 Celsius
Critical Comp. Temp. Threshold:     72 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.60W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     5.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1500
 4 -   0.0050W       -        -    4  4  4  4     2200    6000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  23) The self-test routine was aborted by
                                        the host.
Total time to complete Offline 
data collection:                (    1) seconds.
Offline data collection
capabilities:                    (0x75) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       49872
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0030   100   100   000    Old_age   Offline      -       0
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline      -       5
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       575610
226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       18829
227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       0
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2992332
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   082   082   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       575610
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       581199

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed without error       10%     49872         -
# 2  Extended offline    Completed without error       00%     49872         -
# 3  Reserved (0x20)     Completed without error       00%     49872         -
# 4  Reserved (0x20)     Completed without error       10%        14         -
# 5  Reserved (0x20)     Completed without error       10%         4         -
# 6  Reserved (0x20)     Completed without error       10%         4         -
# 7  Vendor (0x58)       Completed without error       10%         4         -

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
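
For scripts, the full report is often less convenient than the exit status of smartctl, which smartctl(8) documents as a bit mask (bits 0 to 7). The sketch below decodes that mask into short messages; the function name explain_smartctl_status is hypothetical and the messages are paraphrased from the man page.

```shell
#!/bin/sh
# explain_smartctl_status STATUS: print one line per bit set in the exit
# status of smartctl (bit meanings paraphrased from smartctl(8)).
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problem reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed or device is in low-power mode" \
        "a SMART or ATA command failed, or checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "some attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the status and report it if set.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$(( bit + 1 ))
    done
    return 0
}

# Usage (as root):
#   smartctl -H /dev/sda >/dev/null; explain_smartctl_status $?
```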

If the device is not a physical disk but a volume exposed by a hardware RAID controller, specify the controller type and the number of the physical disk you want:

# smartctl -i /dev/sda -d megaraid,0

=== START OF INFORMATION SECTION ===
Device Model:     SSDSC2BB480G7R
Serial Number:    XXXXXXXXXXXXXXXXXX
LU WWN Device Id: 5 5cd2e4 14d52d0aa
Add. Product Id:  DELL(tm)
Firmware Version: N201DL41
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 28 16:27:57 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

In some cases, the RAID controller also lets you reach the disk through the generic SCSI module (sg):

# modprobe sg

# smartctl -i /dev/sg0

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" MG03ACAxxx(Y) Enterprise HDD
Device Model:     TOSHIBA MG03ACA100
Serial Number:    XXXXX
LU WWN Device Id: 5 000039 4eb981078
Add. Product Id:  DELL(tm)
Firmware Version: FL1D
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Dec  1 11:57:19 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Testing a disk

You can start a short self-test on a disk:

# smartctl -t short /dev/sda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Dec  7 02:51:10 2017

You can view the test results with:

# smartctl -l selftest /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     49872         -
# 2  Reserved (0x20)     Completed without error       00%     49872         -
# 3  Reserved (0x20)     Completed without error       10%        14         -
# 4  Reserved (0x20)     Completed without error       10%         4         -
# 5  Reserved (0x20)     Completed without error       10%         4         -
# 6  Vendor (0x58)       Completed without error       10%         4         -
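
To check this log from a script, one option is to flag every entry whose status is not "Completed without error". The sketch below assumes the ATA self-test log layout shown above; the function name selftest_failures is hypothetical. It also reports aborted or still-running tests, since their status differs from the expected string.

```shell
#!/bin/sh
# selftest_failures: read `smartctl -l selftest` output on stdin, print the
# log entries (lines starting with "# <number>") that are not marked
# "Completed without error", and exit with status 1 if any were found.
selftest_failures() {
    awk '/^# *[0-9]+ / && !/Completed without error/ { print; bad = 1 }
         END { exit bad + 0 }'
}

# Usage (as root):
#   smartctl -l selftest /dev/sda | selftest_failures || echo "check /dev/sda"
```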

You can also start a long (extended) self-test:

# smartctl -t long /dev/sda

To abort the test in progress:

# smartctl -X /dev/sda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!

smartd

Enable smartd by listing the relevant devices in /etc/default/smartmontools:

enable_smart="/dev/sda /dev/sdb"
start_smartd=yes
smartd_opts="--interval=1800"

Then set the email address that receives alerts in /etc/smartd.conf:

DEVICESCAN -d removable -n standby -m monitoring@example.com -M exec /usr/share/smartmontools/smartd-runner
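
DEVICESCAN applies the same directives to every detected disk. As a sketch of an alternative, a per-device entry can also schedule self-tests with the -s directive. The fragment below is only an illustration: the device name and schedule are assumptions, and the -s pattern follows the format described in smartd.conf(5). Note that smartd ignores every line after a DEVICESCAN entry, so that entry is commented out here.

```
# /etc/smartd.conf (sketch; device name and schedule are examples)
#DEVICESCAN -d removable -n standby -m monitoring@example.com -M exec /usr/share/smartmontools/smartd-runner

# Monitor /dev/sda with the default checks (-a), skip polling while the disk
# is in standby (-n standby), run a short self-test every day between 02:00
# and 03:00 and a long one every Saturday between 03:00 and 04:00 (-s).
/dev/sda -a -n standby -s (S/../.././02|L/../../6/03) -m monitoring@example.com -M exec /usr/share/smartmontools/smartd-runner
```

After editing the file, restart smartd (systemctl restart smartd) so the new configuration is read.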

FAQ

See https://www.smartmontools.org/wiki/FAQ

Device does not support SMART

Some disks do not support SMART. Example:

# smartctl -a /dev/sda

Device: ATA      Maxtor 7Y250M0   Version: YAR5
Serial number: XXXXXX
Device type: disk
Local Time is: Thu Dec  7 01:59:43 2017 CET
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
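
Before adding a disk to monitoring, it can help to check automatically whether SMART is usable on it. The sketch below classifies the output of smartctl -i using the "SMART support is:" lines and the "Device does not support SMART" message shown in this page; the function name smart_support_state and the returned keywords are hypothetical.

```shell
#!/bin/sh
# smart_support_state: read `smartctl -i` output on stdin and print one of
# enabled, disabled, unsupported or unknown.
smart_support_state() {
    awk '/^SMART support is: +Enabled/  { state = "enabled" }
         /^SMART support is: +Disabled/ { state = "disabled" }
         /does not support SMART/       { state = "unsupported" }
         END { print (state == "" ? "unknown" : state) }'
}

# Usage (as root):
#   smartctl -i /dev/sda | smart_support_state
```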