* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> But today I thought of another way: suppose that we teach the postmaster
> to commit hara-kiri if the $PGDATA directory goes away. Since the
> buildfarm script definitely does remove all the temporary data directories
> it creates, this ought to get the job done.
Yes, please.
> An easy way to do that would be to have it check every so often if
> pg_control can still be read. We should not have it fail on ENFILE or
> EMFILE, since that would create a new failure hazard under heavy load,
> but ENOENT or similar would be reasonable grounds for deciding that
> something is horribly broken. (At least on Windows, failing on EPERM
> doesn't seem wise either, since we've seen antivirus products randomly
> causing such errors.)
Sounds pretty reasonable to me.
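
To make sure we mean the same thing, here is a rough sketch of that
check (not a patch). The helper names are made up for illustration, and
counting ENOTDIR as "similar" to ENOENT is my assumption. Only the
global/pg_control location is real.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: does this errno mean the data directory is gone?
 * Only "path does not exist" errors qualify.  ENFILE/EMFILE (descriptor
 * exhaustion under load) and EPERM/EACCES (e.g. antivirus interference
 * on Windows) are deliberately treated as transient.
 */
static bool
errno_means_datadir_gone(int err)
{
    switch (err)
    {
        case ENOENT:
        case ENOTDIR:           /* assumption: counts as "or similar" */
            return true;
        default:
            return false;
    }
}

/*
 * Hypothetical helper: try to open pg_control for reading and report
 * whether the failure, if any, indicates a vanished data directory.
 */
static bool
datadir_is_gone(const char *datadir)
{
    char        path[1024];
    int         fd;

    snprintf(path, sizeof(path), "%s/global/pg_control", datadir);

    fd = open(path, O_RDONLY);
    if (fd >= 0)
    {
        close(fd);
        return false;           /* still readable, nothing to do */
    }

    return errno_means_datadir_gone(errno);
}
```

The point of splitting out the errno test is that the list of "fatal"
codes can be argued about separately from the I/O.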
> I wouldn't want to do this every time through the postmaster's main loop,
> but we could do this once an hour for no added cost by adding the check
> where it does TouchSocketLockFiles; or once every few minutes if we
> carried a separate variable like last_touch_time. Once an hour would be
> plenty to fix the buildfarm's problem, I should think.
I have a bad (?) habit of doing exactly this (removing the data
directory out from under a running postmaster) during development and
would really like the check to run a bit more often than once an hour,
unless there's a particular problem with that.
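
Something along these lines is what I have in mind for the separate
variable. This is only a sketch: the variable name, the function name,
and the five-minute interval are all placeholders of mine, not anything
that exists in the tree.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed interval; "every few minutes" per the proposal above. */
#define DATADIR_CHECK_INTERVAL_SECS (5 * 60)

/* Hypothetical counterpart to last_touch_time. */
static time_t last_datadir_check_time = 0;

/*
 * Returns true, and records the time, if at least one interval has
 * elapsed since the previous check.  Meant to be called once per trip
 * through the main loop; it costs a comparison when no check is due.
 */
static bool
datadir_check_is_due(time_t now)
{
    if (now - last_datadir_check_time < DATADIR_CHECK_INTERVAL_SECS)
        return false;

    last_datadir_check_time = now;
    return true;
}
```

The main loop would call datadir_check_is_due(time(NULL)) and only then
try to read pg_control, so the extra open() happens once per interval
rather than on every iteration.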
> Another question is what exactly "commit hara-kiri" should consist of.
> We could just abort() or _exit(1) and leave it to child processes to
> notice that the postmaster is gone, or we could make an effort to clean
> up. I'd be a bit inclined to treat it like a SIGQUIT situation, i.e.,
> kill all the children and exit. The children are probably having
> problems of their own if the data directory's gone, so forcing
> termination might be best to keep them from getting stuck.
I like the idea of killing all the children and then exiting.
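
Roughly, and only as a sketch under my own assumptions (the function
names and the plain array of child PIDs are invented here; the real
postmaster tracks its children differently):

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send signo to each PID in the array and return
 * how many kill() calls succeeded.  Failures (e.g. a child that has
 * already exited) are skipped rather than treated as fatal.
 */
static size_t
signal_children(const pid_t *pids, size_t npids, int signo)
{
    size_t      delivered = 0;
    size_t      i;

    for (i = 0; i < npids; i++)
    {
        if (kill(pids[i], signo) == 0)
            delivered++;
    }
    return delivered;
}

/*
 * Hypothetical "data directory is gone" exit path: tell every child to
 * quit immediately, then leave without running any cleanup that might
 * touch the missing directory.
 */
static void
quit_children_and_exit(const pid_t *pids, size_t npids)
{
    (void) signal_children(pids, npids, SIGQUIT);
    _exit(1);
}
```

Using _exit() rather than exit() here is deliberate in the sketch: any
atexit-style cleanup would likely try to touch files that are no longer
there.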
> Also, perhaps we'd only enable this behavior in --enable-cassert builds,
> to avoid any risk of a postmaster incorrectly choosing to suicide in a
> production scenario. Or maybe that's overly conservative.
That would work for my use case. Perhaps restrict it to --enable-cassert
builds in the back branches, but enable it unconditionally in master and
see how things go for 9.6? I agree that it feels overly conservative,
but given our recent history, we should be overly cautious with the back
branches.
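
For the back branches, the gating could be as small as the following
sketch. USE_ASSERT_CHECKING is the symbol that --enable-cassert defines;
the function name is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical switch for the back-branch variant: the data directory
 * check is compiled in only for assert-enabled builds.  In master the
 * function would simply return true.
 */
static bool
datadir_check_enabled(void)
{
#ifdef USE_ASSERT_CHECKING
    return true;
#else
    return false;
#endif
}
```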
> Thoughts?
Thanks!
Stephen