Tuesday, January 15, 2019

Replacing Corrupt Hard Drive for FreeNAS

An ATA hard drive can fail without being detected by the SMART system in time. On a FreeNAS 11 system, I have observed that when this happens, the drive disappears from the zpool disk list and FreeNAS reports an alert like the following:


The volume MyZpool state is DEGRADED: One or more devices has been taken offline 
by the administrator. Sufficient replicas exist for the pool to continue 
functioning in a degraded state.

Because the drive disappears from the zpool disk list, the replacement procedure in the FreeNAS guide does not work: after you replace the failed hard drive, bringing the new disk online leaves it "UNAVAIL", and if you attempt to replace it through the FreeNAS web interface, there is no disk to choose from. If you run "zpool status", you will see a long sequence of digits in place of the partition name, as in the following example:

$ zpool status
  pool: MyZpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q

config:

        NAME                                            STATE     READ WRITE CKSUM
        MyZpool                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/11111111-2222-3333-4444-555555555555  ONLINE       0     0     0
            gptid/22222222-2222-3333-4444-555555555555  ONLINE       0     0     0
            99999999999999999999                        UNAVAIL      0     0     0  was /dev/ada2
            gptid/33333333-2222-3333-4444-555555555555  ONLINE       0     0     0
            gptid/44444444-2222-3333-4444-555555555555  ONLINE       0     0     0
            gptid/55555555-2222-3333-4444-555555555555  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:10:03 with 0 errors on Tue Jan 15 03:55:03 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

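The long number is what you later pass to "zpool replace", so it helps to copy it without typos. Below is a minimal sketch of a helper that pulls it out of the "zpool status" output. The function name unavail_devices is my own invention, and it assumes the column layout shown in the example above (device name first, state second, then the error counters).

```shell
# Hypothetical helper: print the name of every pool member whose STATE
# column is UNAVAIL. It reads "zpool status" output on stdin, so it can
# also be tried on saved output.
unavail_devices() {
    # NF >= 5 skips a short header line such as "state: UNAVAIL".
    awk 'NF >= 5 && $2 == "UNAVAIL" { print $1 }'
}

# Intended use on the NAS itself (MyZpool is the pool name from the example):
#   zpool status MyZpool | unavail_devices
```

For the example output above, this prints 99999999999999999999.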

The way to resolve this is via the command line. Below are the steps, starting with replacing the physical hard drive.

  1. Replace the physical hard drive. Shut down the machine first if necessary.
  2. Partition the new hard drive and put it into the pool. The following example assumes the disk is /dev/ada2.
    
    # create a GPT partition table on ada2
    sudo gpart create -s gpt ada2
    # create a 2G swap partition (it will be ada2p1)
    sudo gpart add -i 1 -b 128 -t freebsd-swap -s 2G ada2
    # create a second partition using the rest of the space (it will be ada2p2)
    sudo gpart add -i 2 -t freebsd-zfs ada2
    # replace the device listed as "99999999999999999999" with ada2p2 (see the zpool status example above)
    sudo zpool replace MyZpool 99999999999999999999 ada2p2
    

Once the above is complete, FreeNAS immediately starts resilvering the zpool.
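To see whether the resilver is still running, you can look at the "scan:" line of "zpool status". Below is a minimal sketch, assuming that line contains the phrase "resilver in progress" while the resilver is active. The function name resilver_in_progress is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the "zpool status"
# output read from stdin reports an active resilver.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Intended use on the NAS itself: poll once a minute until it finishes.
#   while zpool status MyZpool | resilver_in_progress; do sleep 60; done
#   zpool status MyZpool
```

When the loop ends, the final "zpool status" should list the new partition in place of the long number.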

2 comments:

  1. Thanks a lot! This has saved me quite a few times already!

  2. Thanks for sharing, this looks exactly like the problem I'm facing now. I have just one question: while doing all these steps, should I and the other users stop using the server, or will the users be affected? I have some people constantly reading from and writing to it.
