Dirty Cache – Dell EqualLogic Storage Array

I hope you never run into this issue, but if you are searching for a way to get your array back online, you're in luck.

Symptoms:

  • EqualLogic Storage Array no longer responds to pings
  • All iSCSI-attached volumes have gone offline
  • Unable to access the EqualLogic Storage Array using SAN HQ
  • Unable to access the EqualLogic Storage Array via its web interface
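The symptoms above can be confirmed quickly from a management host. The following is a minimal sketch, not a tool from Dell: it only attempts TCP connections using the Python standard library. The address and the web interface ports (80 and 443) are assumptions for illustration; 3260 is the standard iSCSI port.

```python
import socket

# Hypothetical address: replace with your group or member IP.
ARRAY_ADDRESS = "192.0.2.10"
# 3260 is the standard iSCSI port; 80 and 443 are assumed for the web interface.
PORTS = {"iscsi": 3260, "http": 80, "https": 443}


def probe(host, ports=PORTS, timeout=3.0, connect=socket.create_connection):
    """Return {service_name: bool}, True when a TCP connection succeeds."""
    results = {}
    for name, port in ports.items():
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[name] = False
        else:
            conn.close()
            results[name] = True
    return results
```

Calling probe(ARRAY_ADDRESS) returns a dictionary with one entry per service. If every entry is False and the array also ignores pings, you are most likely looking at the situation described here.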

By this point you have probably been alerted and know that your EqualLogic Storage Array is offline.

If you have already connected to the console with the serial cable, you will see a message like this: Logger daemon is losing messages because offline disks are generating more events than the daemon can handle.

Actions to Take:

  • Connect the serial interface cable
  • Have your grpadmin password ready
  • Have PuTTY or your terminal emulator of choice ready for use

Now that you are ready, connect to the system via the serial interface on one of the controllers.

Log in to the SAN using the grpadmin account.

You will see the following message:

Login to account grpadmin succeeded, using local authentication. User privilege is group-admin.

It appears that the storage array has not been configured.
Would you like to configure the array now ? (y/n) [n]

Choose n.

The following message will be displayed:

Please run setup before executing management commands
It appears that the storage array has not been configured. Please run setup before executing management commands

Do not run setup. We are skipping it because it will destroy your data.

 

Now that we have logged into the EqualLogic Storage Array, we need to drop into the command shell.

To do this we type: su ex sh

You will see the following message:

You are running a support command, which is normally restricted to PS Series Technical Support personnel. Do not use a support command without instruction from Technical Support.

Run the following command: raidtool
In my case the following message was displayed:

Driver Status: *Admin Intervention Requested*

Next we drop into ecli by typing: ecli
Once in ecli, type: hs
The following message may be displayed:

Health Status (0x0000000800000000): RED Conditions:
RAID_LOST_CACHE_CONDITION

* This confirms that the RAID cache is corrupted.
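If you capture the console session to a file, the health status line can be picked apart programmatically. This is a rough sketch that assumes the output always looks like the single sample shown above; other firmware versions may format it differently. The function names are hypothetical, and the meaning of individual bits in the mask is not documented here, so the code only reports which bits are set.

```python
import re

# Pattern assumed from the one sample in this post; not an official format.
HEALTH_RE = re.compile(r"Health Status \((0x[0-9A-Fa-f]+)\):\s*(\w+)\s+Conditions:")


def parse_health_status(text):
    """Parse captured `hs` output into (mask, level, condition names)."""
    match = HEALTH_RE.search(text)
    if match is None:
        raise ValueError("no 'Health Status' line found")
    mask = int(match.group(1), 16)
    level = match.group(2)
    # Treat UPPER_CASE identifiers after the header as condition names.
    conditions = re.findall(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b", text[match.end():])
    return mask, level, conditions


def set_bits(mask):
    """List the bit positions that are set in the status mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


sample = """Health Status (0x0000000800000000): RED Conditions:
RAID_LOST_CACHE_CONDITION
"""
mask, level, conditions = parse_health_status(sample)
print(level, conditions, set_bits(mask))
```

For the sample above this reports level RED, the single condition RAID_LOST_CACHE_CONDITION, and one set bit (position 35).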

Type quit to return to the CLI> prompt.

Then issue the following command: clearlostdata

This will display the following:

The clearlostdata command will gather information about the
state of this array for support and troubleshooting purposes.
No user information will be included in this data.

E-mail notification is not available, so you must retrieve the results
by using the “text capture” feature of your terminal emulator
or Telnet program.

You will be given information to help you do this at the end of this procedure.

Finally, please remember to include your Dell Technical Support case or incident number in the subject line of any e-mail that you send to Dell Support. This will help ensure that the message is routed correctly.

Do you wish to proceed with data collection? (y/n) [y]:

Select y.

Next you will see:

Starting data collection on …

Section 1 of 1: ..
Finished in 2 seconds

You also have the option to capture the output by using the “text capture” feature of your Telnet or terminal emulator program.
Do you wish to do this (y/n) [n]: y

The configuration data will now be sent to the console. Please enable text capture in your terminal emulator or Telnet program, and submit the resulting file with your problem report.

Please press the Enter key when you are ready to proceed.

Once this completes, your array will come back online.
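To see when the array is actually reachable again, you can poll the iSCSI port from another machine instead of retrying by hand. This is a minimal sketch using only the Python standard library. The function names, the retry count, and the delay are arbitrary choices for illustration, and the address you pass in is your own group or member IP.

```python
import socket
import time


def is_port_open(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout):
            return True
    except OSError:
        return False


def wait_until_online(host, port=3260, attempts=30, delay=10.0,
                      check=is_port_open, sleep=time.sleep):
    """Poll until check(host, port) succeeds.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        if check(host, port):
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return None
```

For example, wait_until_online("192.0.2.10") (a placeholder address) checks port 3260 up to 30 times, ten seconds apart. A return value of None means the array never answered, in which case the volumes are still offline.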

I can't stress this enough: get your data off that system now.

In my case we replaced both controllers and the issue still occurs. Be on the safe side and evacuate your data now.

Other Tech Info:

Model: 70-0011
Family: PS100
Chassis: 1403
Disks: SATA HDD
Firmware: V5.2.4 (R255063)