On Wed, Feb 28, 2018 at 8:03 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> A huge +1 from me for the idea. I can't even count the number of black box
> "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has
> turned out to be down to the user doing something very silly and not saying
> anything about it.
+1 from me, too.
My favourite remains an organisation that kept "fixing" an issue by kill -9'ing the postmaster and removing postmaster.pid to make it start up again. Without killing all the leftover backends. Of course, the system kept getting more unstable and broken, so they did it more and more often. They were working on scripting it when they gave up and asked for help.
The data recovery effort on that one was truly exciting. I remember looking at bash history and having to take a short break to figure out how on earth to communicate what was going on.