> > Now if these vendors could somehow eliminate downtime due to human error
> > we'd be talking *serious* reliability.
>
> You mean making the OS smart enough to know when clearing the arp
> cache is a bonehead operation, or just making the hardware smart
> enough to realise that the keyswitch really shouldn't be turned
> while 40 people are logged in? (Either way, I agree this'd be an
> improvement. It'd sure make colocation a lot less painful.)

Well I was joking really, but those are two very good examples! Yes,
machines should require extra confirmation for operations like those.
Hell, even a simple 'init 0' would be well served by a prompt that says
"There are currently 400 network sockets open, 50 remote users logged
in, and 25 disk IOs per second. What's more, there's nobody logged in at
the console to boot me up again afterwards - are you _sure_ you want to
shut the machine down?". It's also crazy that there's no prompt after an
'rm -rf' (we could have 'rm -rf --iacceptfullresponsibility' for an
unprompted version).
Stuff like that would have saved me from a few embarrassments in the past for sure ;-)
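
For what it's worth, the kind of 'init 0' prompt described above could
be sketched in a few lines of Python. Everything here is made up for
illustration: SystemLoad, shutdown_warning and confirm_shutdown are
hypothetical names, the counts would have to be gathered from the OS
somehow, and the force flag just stands in for the
'--iacceptfullresponsibility' idea. It is not a real init or shutdown
interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SystemLoad:
    """Hypothetical snapshot of what a shutdown would interrupt."""
    open_sockets: int
    remote_users: int
    disk_ios_per_sec: int
    console_user_present: bool


def is_busy(load: SystemLoad) -> bool:
    """True if anything at all would be interrupted by a shutdown."""
    return (load.open_sockets > 0
            or load.remote_users > 0
            or load.disk_ios_per_sec > 0)


def shutdown_warning(load: SystemLoad) -> str:
    """Build the 'are you sure?' text from the current load."""
    parts = [
        f"There are currently {load.open_sockets} network sockets open, "
        f"{load.remote_users} remote users logged in, "
        f"and {load.disk_ios_per_sec} disk IOs per second."
    ]
    if not load.console_user_present:
        parts.append("Nobody is logged in at the console to boot me up "
                     "again afterwards.")
    parts.append("Are you sure you want to shut the machine down? [yes/NO] ")
    return " ".join(parts)


def confirm_shutdown(load: SystemLoad,
                     ask: Callable[[str], str] = input,
                     force: bool = False) -> bool:
    """Return True only if the shutdown should go ahead.

    force=True skips the prompt (the 'I accept full responsibility'
    case); an idle machine is not prompted for either.
    """
    if force or not is_busy(load):
        return True
    # Anything other than an explicit "yes" counts as a refusal.
    return ask(shutdown_warning(load)).strip().lower() == "yes"
```

Passing the prompt function in as 'ask' is only there so the thing can
be tried out without a terminal, e.g.
confirm_shutdown(SystemLoad(400, 50, 25, False), ask=lambda p: "no")
comes back False.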

It drives me absolutely nuts every time I see a $staggeringly_expensive
clustered server whose sysadmins are scared to do a failover test in
case something goes wrong! Or which has worse uptime than my desktop PC
because the cluster software's poorly set up or administered. Or which
has both machines on the same circuit breaker. I could go on, but it's
depressing me.

Favourite anecdote: a project manager friend of mine had a new 'lights
out' datacenter to set up. The engineers, admins and operators swore
blind that everything had been tested in every possible way, and that
incredible uptime was guaranteed. 'So if I just pull this disk out
everything will keep working?' he asked, and then pulled the disk out
without waiting for an answer...

Ever since he told me that story I've done exactly that with every piece
of so-called 'redundant' hardware a vendor tries to flog me. Ask them to
set it up, then just do nasty things to it without asking for
permission. Less than half the gear makes it through that filter, and
actually you can almost tell from the look on the technical sales rep's
face as you reach for the drive/cable/card/whatever whether it will or
won't.

M