For my day job I build storage systems. Much of what I do at present involves caring about how different OSes deal with things like new LUNs being presented from a SCSI target, or errors along a subset of the available paths to a device.

It will come as no surprise to you to discover that they all suck (for values of all equal to Linux, Solaris, Windows and VMware). New LUNs are particularly annoying. I’m in a situation where creating and removing a LUN is exceptionally easy.

Hmmm. Maybe I need to back up here a bit first. SCSI has the concept of a target (think device, e.g. a hard drive). Each target can present multiple logical units, each of which is assigned a number: a Logical Unit Number. Most devices (a hard drive, or a CD-ROM drive) will present a single LUN. A storage array will tend to present multiple LUNs, one for each volume that is exported to the host. At the host level each LUN really just looks like a separate device (on Linux, for example, /dev/sda and /dev/sdb may well be separate LUNs on the same array rather than 2 separate arrays/hard drives; at the block device level you usually don’t care about the difference).
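To make the target/LUN split a bit more concrete: Linux addresses each SCSI device with a host:channel:target:lun tuple (the bit lsscsi prints in square brackets). Here's a minimal Python sketch that groups such addresses by target. The function names and the sample inventory are made up for illustration, not taken from any real tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_hctl(address: str) -> Tuple[int, int, int, int]:
    """Split a Linux 'host:channel:target:lun' string into four ints."""
    parts = address.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected host:channel:target:lun, got {address!r}")
    host, channel, target, lun = (int(part) for part in parts)
    return host, channel, target, lun


def luns_by_target(
    devices: Dict[str, str],
) -> Dict[Tuple[int, int, int], List[Tuple[int, str]]]:
    """Group block devices by the (host, channel, target) they sit behind."""
    grouped = defaultdict(list)
    for name, address in devices.items():
        host, channel, target, lun = parse_hctl(address)
        grouped[(host, channel, target)].append((lun, name))
    # Sort each target's devices by LUN so the output is stable.
    return {key: sorted(value) for key, value in grouped.items()}


# Hypothetical inventory: a local disk plus two LUNs on the same array.
EXAMPLE_DEVICES = {
    "sda": "0:0:0:0",
    "sdb": "2:0:1:0",
    "sdc": "2:0:1:1",
}

if __name__ == "__main__":
    for target, luns in sorted(luns_by_target(EXAMPLE_DEVICES).items()):
        print(target, luns)
```

In this made-up inventory sdb and sdc share a target and differ only in LUN, which is exactly the "same array, separate volumes" case.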

Anyway. For various reasons I end up adding and removing LUNs quite often. And there are ways for the array to indicate to the host that this has happened (the UNIT ATTENTION/REPORTED LUNS DATA HAS CHANGED check condition seems to be favoured these days, as a complete Fibre Channel LIP can be disruptive). What I’d like to happen in that case is for the host to pick up the check condition and drop and/or add the devices that have changed. Instead everything wants a manual rescan. rescan-scsi-bus tends to be simplest for Linux. Windows wants a manual refresh in Disk Management. VMware wants a “Rescan HBAs” from vSphere. Solaris wants a “devfsadm -C” and possibly a “cfgadm -al” first. And all of these can be temperamental about picking up the changes.
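For the curious, here's roughly what "pick up the check condition and rescan" could look like on Linux. This is a sketch under assumptions, not how any existing tool does it: it assumes you already have the raw sense bytes from somewhere, and the function names are mine. The sense layouts (fixed and descriptor format), the UNIT ATTENTION sense key (6h) and the 3Fh/0Eh additional sense code are from the SCSI specs; writing "- - -" to a host's sysfs scan file is the standard Linux wildcard rescan and needs root.

```python
import glob
import os
from typing import List, Optional, Tuple

UNIT_ATTENTION = 0x6
# ASC/ASCQ pair for "REPORTED LUNS DATA HAS CHANGED".
REPORTED_LUNS_DATA_HAS_CHANGED = (0x3F, 0x0E)


def decode_sense(sense: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (sense key, ASC, ASCQ) from raw sense data, or None if unusable."""
    if not sense:
        return None
    response_code = sense[0] & 0x7F
    if response_code in (0x70, 0x71):  # fixed format
        if len(sense) < 14:
            return None
        return sense[2] & 0x0F, sense[12], sense[13]
    if response_code in (0x72, 0x73):  # descriptor format
        if len(sense) < 4:
            return None
        return sense[1] & 0x0F, sense[2], sense[3]
    return None


def lun_inventory_changed(sense: bytes) -> bool:
    """True if the sense data says the target's LUN list has changed."""
    decoded = decode_sense(sense)
    if decoded is None:
        return False
    key, asc, ascq = decoded
    return key == UNIT_ATTENTION and (asc, ascq) == REPORTED_LUNS_DATA_HAS_CHANGED


def rescan_all_hosts(sysfs_root: str = "/sys/class/scsi_host") -> List[str]:
    """Write the wildcard '- - -' to every host's scan file and list them."""
    scanned = []
    for scan_file in sorted(glob.glob(os.path.join(sysfs_root, "host*", "scan"))):
        with open(scan_file, "w") as handle:
            handle.write("- - -\n")
        scanned.append(scan_file)
    return scanned
```

Note that this only covers half the problem: the sysfs scan finds new LUNs but does not remove ones that have gone away (that takes a write to the device's own delete file), and sysfs_root is a parameter purely so the function can be pointed at a scratch directory for testing.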

We’ve done a lot about hotplug for the desktop user experience, without putting the same level of effort into the server experience. I appreciate that there are situations where you don’t want your server to reconfigure things without being told to, but the current situation can be detrimental (for example, Linux multipathing will hold a device open even after it’s disappeared and is returning an “INVALID LUN” response; it would be much better if it could cleanly close that device and wait for it to return). Storage is capable of being much more than just a single block device these days, and it’s a shame that nothing seems to deal fully with that fact.

(Yes, yes, I should write and submit patches, but I appreciate that there’s not always a simple answer, nor necessarily an answer that works for all situations automatically. Plus, y’know, not enough hours in the day and I hope you all appreciate I’ve taken a break from watching BSG to write this.)