How to live with disabled disks

Recently, I was working with a customer’s Sun StorageTek 2540 array which mysteriously had all twelve disks in the Disabled state.

Details
Name: t85d01
ID: Tray.85.Drive.1
Array:
Array Type: 2540
Tray: 85
Slot Number: 1
Role: Unassigned
Virtual Disk: –
State: Disabled
Status: Optimal
Capacity: 279.396 GB
Type: SAS
Speed (RPM): 15000 RPM
Firmware: SA04
Serial number: 000834C6K8PC JHV6K8PC
Disk World Wide Name: 50:00:CC:A0:0D:0B:EC:60

The funniest thing was that the information in SupportData said the opposite. So if you find yourself in the same situation, don’t waste your time googling or resetting the controllers and the array’s configuration: it won’t help. All you’ll see is the following error:

“The operation cannot be completed because either there is a problem communicating with one or more of the disks in the storage array, or no disks are currently connected. Correct the problem, then retry the operation.”

The only way out was to open a support case with Sun. The engineer then simply wiped out all the data by running sysReset and sysReboot from VxWorks, the OS that runs this disk array. To get into it, you need to know a super-duper password, which Sun keeps secret.

What was the root cause? My take is that during the initial configuration someone did a really bad thing to this array, e.g. abruptly terminated a firmware upgrade or tried to downgrade it from CAM, which corrupted the drive configuration databases (DACstore). So be careful and prudent, and double-check that your backups are up to date.

P.S. This also applies to 6140/6540 arrays.

Posted on November 25, 2009 at 11:39 am by sergeyt
In: Sun

Brendan Gregg talks about DTrace

Another fantastic presentation from Brendan and, just like all the previous ones, it’s a must-see video.

Posted on November 20, 2009 at 9:59 pm by sergeyt
In: Solaris

The end of “Fair Play”

After Thierry’s handball there is no trust left, neither in the pompous gentry at FIFA nor in their already fscked slogan. It’s such a shame…

Posted on November 20, 2009 at 1:48 pm by sergeyt
In: Life

Plugging HP-UX into SAN

Our task for today is to connect an HP-UX (11.31 release) server to an MSA2312fc through a SAN with two Brocade switches in between. First, we need to find all the FC cards installed in our server; once we have that information, we can dig out their WWNs and later map them to the disk array.

Let’s start from the beginning. ioscan is our best friend in revealing the inner life of our server:

bash-4.0# ioscan -funC fc
Class     I  H/W Path     Driver S/W State   H/W Type     Description
===================================================================
fc        0  0/3/0/0/0/0  fcd   CLAIMED     INTERFACE    HP 4Gb Dual Port PCIe Fibre Channel Mezzanine (FC Port 1)
                         /dev/fcd0
fc        1  0/3/0/0/0/1  fcd   CLAIMED     INTERFACE    HP 4Gb Dual Port PCIe Fibre Channel Mezzanine (FC Port 2)
                         /dev/fcd1

The output above says everything we need to know. So now, I’m going to use fcmsutil to display WWN information from our cards:

bash-4.0# fcmsutil /dev/fcd0

                           Vendor ID is = 0x1077
                           Device ID is = 0x2432
            PCI Sub-system Vendor ID is = 0x103C
                   PCI Sub-system ID is = 0x1705
                               PCI Mode = PCI Express x4
                       ISP Code version = 4.2.2
                       ISP Chip version = 3
                               Topology = PTTOPT_FABRIC
                             Link Speed = 4Gb
                     Local N_Port_id is = 0x020b01
                  Previous N_Port_id is = None
            N_Port Node World Wide Name = 0x5001438004c2e159
            N_Port Port World Wide Name = 0x5001438004c2e158
            Switch Port World Wide Name = 0x200b00051e868762
            Switch Node World Wide Name = 0x100000051e868762
            N_Port Symbolic Port Name = oamdwh1_fcd0
            N_Port Symbolic Node Name = oamdwh1_HP-UX_B.11.31
                           Driver state = ONLINE
                       Hardware Path is = 0/3/0/0/0/0
                     Maximum Frame Size = 2048
         Driver-Firmware Dump Available = NO
         Driver-Firmware Dump Timestamp = N/A
                         Driver Version = @(#) fcd B.11.31.0803 Jan 20 2008

bash-4.0#
bash-4.0# fcmsutil /dev/fcd1

                           Vendor ID is = 0x1077
                           Device ID is = 0x2432
            PCI Sub-system Vendor ID is = 0x103C
                   PCI Sub-system ID is = 0x1705
                               PCI Mode = PCI Express x4
                       ISP Code version = 4.2.2
                       ISP Chip version = 3
                               Topology = PTTOPT_FABRIC
                             Link Speed = 4Gb
                     Local N_Port_id is = 0x010b01
                  Previous N_Port_id is = None
            N_Port Node World Wide Name = 0x5001438004c2e15b
            N_Port Port World Wide Name = 0x5001438004c2e15a
            Switch Port World Wide Name = 0x200b00051e855e8b
            Switch Node World Wide Name = 0x100000051e855e8b
            N_Port Symbolic Port Name = oamdwh1_fcd1
            N_Port Symbolic Node Name = oamdwh1_HP-UX_B.11.31
                           Driver state = ONLINE
                       Hardware Path is = 0/3/0/0/0/1
                     Maximum Frame Size = 2048
         Driver-Firmware Dump Available = NO
         Driver-Firmware Dump Timestamp = N/A
                         Driver Version = @(#) fcd B.11.31.0803 Jan 20 2008

Using this information, we can proceed with the zone configuration on the FC switches.

alicreate "hpux_fcd1", "50:01:43:80:04:c2:e1:5a"
alicreate "MSA2312_A1", "20:70:00:c0:ff:d8:bb:a4"
zonecreate "hpux_msa2312", "hpux_fcd1;MSA2312_A1"
cfgadd "HPUX_cfg", "hpux_msa2312"

For brevity I omitted the steps required to configure the second switch since they are almost the same. The only differences are the alias and zone names and the WWNs.
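
Note that fcmsutil prints WWNs as 0x-prefixed hex, while the aliases above use the colon-separated form. The conversion can be sketched with two small helpers; port_wwn and wwn_colon are hypothetical names of my own, not part of HP-UX or Fabric OS, and the sketch assumes the fcmsutil output layout shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the fcmsutil layout shown above.

# port_wwn: read fcmsutil output on stdin and print the N_Port Port WWN.
port_wwn() {
    awk -F'= ' '/N_Port Port World Wide Name/ { print $2 }'
}

# wwn_colon: turn 0x5001438004c2e158 into 50:01:43:80:04:c2:e1:58.
wwn_colon() {
    printf '%s\n' "$1" | sed -e 's/^0[xX]//' -e 's/../&:/g' -e 's/:$//'
}

# Possible usage on the HP-UX host:
#   wwn_colon "$(fcmsutil /dev/fcd1 | port_wwn)"
```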

Don’t forget to save and enable the newly created configuration on the switch:

cfgsave
cfgenable HPUX_cfg

All that we have to do next, apart from creating and mapping LUNs on our storage, is to tell the system to scan for new disks and to create special files for them:

# insf -C disk

To double-check that the new disks have been added successfully, do the following:

# ioscan -m dsf
Persistent DSF           Legacy DSF(s)
========================================
/dev/rdisk/disk1         /dev/rdsk/c0t0d0
/dev/rdisk/disk1_p1      /dev/rdsk/c0t0d0s1
/dev/rdisk/disk1_p2      /dev/rdsk/c0t0d0s2
/dev/rdisk/disk1_p3      /dev/rdsk/c0t0d0s3
/dev/pt/pt4              /dev/rscsi/c3t0d0
/dev/pt/pt5              /dev/rscsi/c2t0d0
/dev/rdisk/disk6         /dev/rdsk/c4t0d1
                         /dev/rdsk/c5t0d1
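
In a longer listing, one way to spot multipathed LUNs is to count the legacy DSFs mapped to each persistent DSF. The following is only a sketch: multipath_dsfs is a hypothetical helper name, and it assumes the two-column layout above, where extra legacy paths appear on continuation lines.

```shell
#!/bin/sh
# Hypothetical helper: read "ioscan -m dsf" output on stdin and print every
# persistent DSF that maps to more than one legacy DSF, with the path count.
multipath_dsfs() {
    awk '
        $1 !~ /^\/dev\// { next }    # skip the header, separator and blank lines
        NF == 2 { cur = $1; n[cur] = 1; order[++k] = cur; next }
        NF == 1 && cur != "" { n[cur]++ }    # continuation line: one more legacy path
        END {
            for (i = 1; i <= k; i++)
                if (n[order[i]] > 1) print order[i], n[order[i]]
        }
    '
}

# Possible usage on the HP-UX host:
#   ioscan -m dsf | multipath_dsfs
```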

/dev/rdisk/disk6 is our new lovely friend. To confirm that, use the scsimgr command:

# scsimgr inquiry -D /dev/rdisk/disk6

           INQUIRY INFORMATION FOR LUN: /dev/rdisk/disk6

                Peripheral Device Type: 0 (Direct Access)
                  Peripheral Qualifier: 0 (Peripheral Device Connected)
                       Removable Media: No
                          ANSI Version: 5 (Complies to SPC-3)
                    Normal ACA Support: No
                  Hierarchical Support: 0x1 (Hierarchical addressing model used)
                  Response Data Format: 2 (SPC-3)
                     Additional Length: 155
                           SCC Support: 0 (No Embedded Storage Array Controller)
           Access Controls Coordinator: No
             Target Port Group Support: 0x1 (Implicit Asymmetric Access Support)
                      Third-Party Copy: No
                               Protect: 0 (Protection Information NOT Supported)
                         Basic Queuing: 0
                       Command Queuing: 0x1 (Full Task Management Model (SAM-3))
                    Enclosure Services: No
             Multi-port Device Support: Yes
                Medium Changer Support: No
   Supports 16-bit wide SCSI addresses: No
          Support for 16 Bit Transfers: No
            Synchronous Data Transfers: No
                Linked Command Support: No
                 Vendor Identification: "HP      "
                Product Identification: "MSA2312fc       "
                Product Revision Level: "M110"
                  Vendor Specific Data: 20 20 20 20 20 20 20 20 "        "
                                        43 41 50 49 20 20 41 41 "CAPI  AA"
                                        66 20 20 20             "f   "
                              Clocking: 0 (Supports only ST)
 Quick Arbitration & Selection support: No
    Information Unit transfers Support: No

Finally, when you decide to create an LVM configuration on top of the newly added disk, don’t use the legacy DSFs (all those c#t#d# names). Use the persistent DSF instead, i.e. /dev/rdisk/disk6, and HP-UX will deal with multipathing on its own:

# scsimgr lun_map -D /dev/rdisk/disk6

        LUN PATH INFORMATION FOR LUN : /dev/rdisk/disk6

Total number of LUN paths     = 2
World Wide Identifier(WWID)    = 0x600c0ff000d8d17c1b1bf84a01000000

LUN path : lunpath4
Class                         = lunpath
Instance                      = 4
Hardware path                 = 0/3/0/0/0/0.0x247000c0ffd8bba4.0x4001000000000000
SCSI transport protocol       = fibre_channel
State                         = STANDBY
Last Open or Close state      = STANDBY

LUN path : lunpath5
Class                         = lunpath
Instance                      = 5
Hardware path                 = 0/3/0/0/0/1.0x207000c0ffd8bba4.0x4001000000000000
SCSI transport protocol       = fibre_channel
State                         = ACTIVE
Last Open or Close state      = ACTIVE
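
For a quick health check, the state of each path can be pulled out of that report. This is a rough sketch that assumes the lun_map layout shown above; lun_path_states is a hypothetical helper, not an HP-UX command.

```shell
#!/bin/sh
# Hypothetical helper: read "scsimgr lun_map" output on stdin and print one
# "<lunpath> <state>" line per LUN path.
lun_path_states() {
    awk -F'= ' '
        /^[ \t]*LUN path :/ {
            path = $0
            sub(/^[ \t]*LUN path :[ \t]*/, "", path)
            sub(/[ \t]+$/, "", path)
            next
        }
        /^[ \t]*State[ \t]/ {
            state = $2
            sub(/[ \t]+$/, "", state)
            print path, state
        }
    '
}

# Possible usage on the HP-UX host, warning if no path is ACTIVE:
#   scsimgr lun_map -D /dev/rdisk/disk6 | lun_path_states | grep -q ' ACTIVE$' \
#       || echo "no active path to disk6"
```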

Enjoy.

Posted on November 17, 2009 at 5:30 pm by sergeyt
In: HP-UX

Stranger in HP-UX

For the last couple of days I’ve been playing heavily with HP-UX, an operating system that was unknown to me and that I had never seen before. That’s true: I had never touched it. But those days are over, and now I’m filled with new experience and… mixed feelings. You see, on the one hand it’s UNIX, and most of the tools and concepts have a lot in common with other UNIX-based OSes. But… on the flip side, there is a lot of stuff that works differently. I’m not talking about things that are HP-UX specific, i.e. pseudo-swap, LVM or persistent DSFs, but about things widely used and adopted by the others. For example, ifconfig: you can’t use it to list your network devices. Or another one, the -h option for the ls or df commands: no luck. Not to mention the fact that features that come for free in, let’s say, Solaris, Linux or *BSD require a special license, e.g. MirrorDisk/UX if you want to mirror disks. Even kmeminfo, the analog of ::memstat from Solaris mdb, is not publicly available. By no means am I trying to pin a label on it; I just want to show that HP-UX is a bit different from the other UNIXes that I, at least, have gotten used to.

Anyway, with this post I’m creating a new category devoted to HP-UX, cherishing the hope that one day someone will learn from my pain. I must admit that at times it’s quite amusing to plunge into something new, because it usually comes with a mix of pure anger and coarse language, especially when something well-known, as I mentioned before, doesn’t work as expected, and childish joy when you finally understand the logic and the idea behind it. And that brings enormous fun into life. So let it keep prevailing!

Posted on November 11, 2009 at 11:35 pm by sergeyt
In: HP-UX, Life

Little Shop of Performance Horrors

Watch a fantastic presentation by Brendan Gregg, who dwells upon different performance aspects and issues with real-life examples. This 2.5-hour discussion has been split into three parts: Part 1, Part 2 and Part 3.

Don’t be frightened off by its length: with Brendan’s energy, charisma and ingenious way of giving speeches you definitely won’t notice it.

Posted on November 7, 2009 at 3:01 pm by sergeyt
In: Solaris

Replacing broken disk in SVM RAID5

It’s inevitable that one day the metacheck script (I strongly encourage you to use it if you don’t already) will report a metadevice problem. In nine cases out of ten the root cause will be a failed disk.

# metastat d90
d90: RAID
    State: Needs Maintenance 
    Invoke: metareplace d90 c0t13d0s2 
    Interlace: 512 blocks
    Size: 335421567 blocks (159 GB)
Original device:
    Size: 335424000 blocks (159 GB)
        Device      Start Block  Dbase        State Reloc  Hot Spare
        c0t11d0s2       8019        No         Okay   Yes 
        c0t12d0s2       7874        No         Okay   Yes 
        c0t13d0s2      10218        No  Maintenance   Yes 
        c0t14d0s2      10218        No         Okay   Yes 
        c0t15d0s2       8019        No         Okay   Yes 
        c0t10d0s2       8019        No         Okay   Yes 
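
When a metadevice has many components, the unhealthy ones can be picked out of this table mechanically. The sketch below assumes the RAID5 column layout shown above (device, start block, dbase, state); failed_components is a hypothetical helper, and for a two-word state such as "Last Erred" it prints only the first word.

```shell
#!/bin/sh
# Hypothetical helper: read metastat output for a RAID5 metadevice on stdin
# and print "<component> <state>" for every component that is not Okay.
failed_components() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 != "Okay" { print $1, $4 }'
}

# Possible usage on the Solaris host:
#   metastat d90 | failed_components
```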

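It’s not a big deal to fix it properly without unmounting the file system: just plug a new disk into an empty slot and run metareplace. But what if all the slots are occupied? Well, theoretically it shouldn’t be a problem either: unconfigure the failed disk with cfgadm, swap it for a new one, configure the new disk and run metareplace -e.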
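
The usual in-place swap can be sketched as a dry run that only prints the commands, so nothing is touched until you have reviewed them. svm_swap_plan is a hypothetical helper, and it assumes the SCSI attachment point has the form c0::dsk/c0t13d0, as in the cfgadm error message further down.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands for replacing a failed SVM
# component in the same slot. Nothing is executed.
svm_swap_plan() {
    meta=$1                      # metadevice, e.g. d90
    comp=$2                      # failed component, e.g. c0t13d0s2
    disk=${comp%s[0-9]*}         # strip the slice:       c0t13d0
    ctrl=${disk%%t*}             # keep the controller:   c0
    ap="${ctrl}::dsk/${disk}"    # assumed ap_id form:    c0::dsk/c0t13d0
    echo "cfgadm -c unconfigure $ap"
    echo "# physically replace the disk, then:"
    echo "cfgadm -c configure $ap"
    echo "metareplace -e $meta $comp"
}

# Possible usage:
#   svm_swap_plan d90 c0t13d0s2
```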

But in my case these steps just didn’t work and I understand why – my broken disk was still a part of SVM. Actually, it was clearly explained to me in the following error message:

# cfgadm -c unconfigure c0::dsk/c0t13d0
cfgadm: Component system is busy, try again: failed to offline: 
     Resource                     Information              
------------------  ---------------------------------------
/dev/dsk/c0t13d0s2  component of RAID "/dev/md/dsk/d90"    
/dev/md/dsk/d90     mounted filesystem "/usr/fsrv/archive" 

Since I couldn’t take the filesystem offline and cause downtime, my last resort was to skip the cfgadm steps and bluntly pull the disk from its slot. No sooner said than done. Thankfully, it worked smoothly, and once the new disk found its shelter inside the server the rest was trivial…

# metareplace -e d90 c0t13d0s2

# metastat d90
d90: RAID
    State: Resyncing    
    Resync in progress:  0.0% done
    Interlace: 512 blocks
    Size: 335421567 blocks (159 GB)
Original device:
    Size: 335424000 blocks (159 GB)
        Device      Start Block  Dbase        State Reloc  Hot Spare
        c0t11d0s2       8019        No         Okay   Yes 
        c0t12d0s2       7874        No         Okay   Yes 
        c0t13d0s2      10218        No    Resyncing   Yes 
        c0t14d0s2      10218        No         Okay   Yes 
        c0t15d0s2       8019        No         Okay   Yes 
        c0t10d0s2       8019        No         Okay   Yes 

Posted on October 27, 2009 at 2:35 pm by sergeyt
In: Solaris

Studying HDS 9990 and 9985 (THI1570)

Tomorrow I’m leaving for Saint Petersburg for a 5-day training course titled “Installing, Configuring and Maintaining Hitachi Universal Storage Platform™ V and Universal Storage Platform™ VM”. Once I’m done with it, my CTS profile will be 100% complete. Not bad at all ;-)

Posted on October 18, 2009 at 2:07 pm by sergeyt
In: Life

Back to childhood

Having just returned from a children’s theater (sorry, only in Russian), I can’t wait to express the feelings that filled me to the brim. The play, called “A Hedgehog in the Fog”, was awesome and brilliantly performed. And to tell the truth, sitting there in the dark with my kid on my lap, I felt like a little boy myself, as if those grown-up years had never happened at all. And my eyes were almost wet… Maybe I’m too sentimental, or maybe it’s because it’s my birthday today, or maybe both. God knows!

Posted on October 18, 2009 at 1:58 pm by sergeyt
In: Life

Veritas romp

Just came across a really funny piece of code in OpenSolaris.

/*
  * XXX - Don't port this to new architectures
  * A 3rd party volume manager driver (vxdm) depends on the symbol romp.
  * 'romp' has no use with a prom with an IEEE 1275 client interface.
  * The driver doesn't use the value, but it depends on the symbol.
  */
 void *romp;		/* veritas driver won't load without romp 4154976 */

Posted on October 16, 2009 at 9:29 am by sergeyt
In: Solaris, Veritas