Replacing a broken disk in SVM RAID5
Sooner or later the metacheck script (I strongly encourage you to use it if you don’t already) will report a metadevice problem. In nine cases out of ten the root cause will be a failed disk.
# metastat d90
d90: RAID
    State: Needs Maintenance
    Invoke: metareplace d90 c0t13d0s2
    Interlace: 512 blocks
    Size: 335421567 blocks (159 GB)
Original device:
    Size: 335424000 blocks (159 GB)
        Device      Start Block  Dbase        State  Reloc  Hot Spare
        c0t11d0s2          8019  No            Okay  Yes
        c0t12d0s2          7874  No            Okay  Yes
        c0t13d0s2         10218  No     Maintenance  Yes
        c0t14d0s2         10218  No            Okay  Yes
        c0t15d0s2          8019  No            Okay  Yes
        c0t10d0s2          8019  No            Okay  Yes
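To give an idea of the kind of check such a monitoring script performs, here is a minimal sketch. It is not the metacheck script itself; the function name failed_components is made up for illustration, and it assumes the RAID5 column layout shown above (device, start block, dbase, state, reloc).

```shell
#!/bin/sh
# Hypothetical helper: read `metastat` output on stdin and print every
# component row whose State column is anything other than "Okay".
# Assumes the RAID5 layout above, where State is the 4th field.
# A two-word state such as "Last Erred" is reported by its first word only.
failed_components() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 != "Okay" { print $1, $4 }'
}
```

On a live system it could be fed with something like metastat d90 | failed_components, and an empty result would mean every component is Okay.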
It is no big deal to fix this properly without unmounting the file system: just plug a new disk into an empty slot and run metareplace. But what if all slots are occupied? Well, in theory that shouldn’t be a problem either:
- Run cfgadm -c unconfigure c#t#d# to unconfigure the disk.
- Replace it with a new one and run cfgadm -c configure c#t#d# to make it visible to the system.
- Don’t forget metadevadm -u c#t#d#s# to update the device ID that SVM has recorded for the disk.
- Finally, run metareplace to start the resync.
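The steps above can be sketched as a small dry-run helper. It only prints the commands and never executes them. The function name print_replace_plan and its four arguments are hypothetical; the attachment point form (c0::dsk/c0t13d0) and the -e flag of metareplace are taken from the commands used later in this post, and the metadevadm argument follows the list above, so check metadevadm(1M) on your release before relying on it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, but do not run, the commands for an
# in-place disk replacement.
# Usage: print_replace_plan <controller> <disk> <slice> <metadevice>
print_replace_plan() {
    ctrl=$1; disk=$2; slice=$3; md=$4
    echo "cfgadm -c unconfigure ${ctrl}::dsk/${disk}"
    echo "# physically swap the disk now"
    echo "cfgadm -c configure ${ctrl}::dsk/${disk}"
    # Argument form follows the post; verify it against the man page.
    echo "metadevadm -u ${disk}${slice}"
    echo "metareplace -e ${md} ${disk}${slice}"
}
```

For the disk in this post the call would be print_replace_plan c0 c0t13d0 s2 d90. Printing the plan first makes it easy to review each command before typing it on a production box.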
But in my case these steps just didn’t work, and I understand why: my broken disk was still part of an SVM metadevice. In fact, the error message explained it clearly:
# cfgadm -c unconfigure c0::dsk/c0t13d0
cfgadm: Component system is busy, try again: failed to offline:
     Resource              Information
------------------  ---------------------------------------
/dev/dsk/c0t13d0s2  component of RAID "/dev/md/dsk/d90"
/dev/md/dsk/d90     mounted filesystem "/usr/fsrv/archive"
Since I couldn’t take the file system offline and cause downtime, my last resort was to skip the cfgadm steps and simply pull the disk out of its slot. No sooner said than done. Thankfully, it went smoothly, and once the new disk had found its home inside the server the rest was trivial…
# metareplace -e d90 c0t13d0s2
# metastat d90
d90: RAID
    State: Resyncing
    Resync in progress: 0.0% done
    Interlace: 512 blocks
    Size: 335421567 blocks (159 GB)
Original device:
    Size: 335424000 blocks (159 GB)
        Device      Start Block  Dbase        State  Reloc  Hot Spare
        c0t11d0s2          8019  No            Okay  Yes
        c0t12d0s2          7874  No            Okay  Yes
        c0t13d0s2         10218  No       Resyncing  Yes
        c0t14d0s2         10218  No            Okay  Yes
        c0t15d0s2          8019  No            Okay  Yes
        c0t10d0s2          8019  No            Okay  Yes
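If you want to keep an eye on the resync without rereading the whole metastat output, the percentage can be pulled out of the "Resync in progress" line. This is only a sketch: the function name resync_percent is made up, and it assumes the line is worded exactly as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `metastat` output on stdin and print the
# resync percentage, e.g. "0.0". Prints nothing if no resync line is found.
resync_percent() {
    sed -n 's/.*Resync in progress:[ ]*\([0-9.]*\)[ ]*% done.*/\1/p'
}
```

A typical use would be metastat d90 | resync_percent, run by hand or from a loop with a sleep between checks.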