Re: [GENERAL] postmaster deadlock while logging after syslogger exited - Mailing list pgsql-general
From | Andres Freund |
---|---|
Subject | Re: [GENERAL] postmaster deadlock while logging after syslogger exited |
Date | |
Msg-id | 20171117025438.vjvgmrrnkpnojleq@alap3.anarazel.de |
In response to | Re: [GENERAL] postmaster deadlock while logging after syslogger exited (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [GENERAL] postmaster deadlock while logging after syslogger exited
|
List | pgsql-general |
On 2017-11-16 21:39:49 -0500, Tom Lane wrote:
> > We could work around a situation like that if we made postmaster use a
> > *different* pipe as stderr than the one we're handing to normal
> > backends. If postmaster created a new pipe and closed the read end
> > whenever forking a syslogger, we should get EPIPEs when writing after
> > syslogger died and could fall back to proper stderr or such.
>
> I think that's nonsense, unfortunately.

Nice phrasing.

> If the postmaster had its own pipe, that would reduce the risk of this
> deadlock because only the postmaster would be filling that pipe, not
> the postmaster and all its other children --- but it wouldn't
> eliminate the risk.

The deadlock happens because postmaster is waiting for syslogger to accept a message, and syslogger waits for postmaster to restart it. To resolve the deadlock, postmaster needs to not wait for a dead syslogger, even if it hasn't yet received and processed the SIGCLD; what other postmaster children do or don't do doesn't matter for resolving that cycle. The reason postmaster currently blocks on writing to the pipe, instead of getting EPIPE, is that both ends of the pipe still exist. That in turn is the case because we need to be able to restart syslogger without passing a new file descriptor to all subprocesses. If postmaster instead writes to a different pipe, it won't block anymore; it will get EPIPE instead and can continue towards starting a new syslogger. So I don't think the described deadlock exists if we were to apply my proposed fix.

What this obviously would *not* guarantee is being able to start a new syslogger, but it seems fairly impossible to guarantee that. So sure, other processes would still block until syslogger has successfully restarted, but that's a resolvable situation rather than a hard deadlock, which the described situation appears to be.

Note that there are plenty of cases where you could run into this even without being unable to fork new processes.
You could, e.g., also run into this deadlock while logging the exit of some other subprocess or such; there are enough ereports in postmaster.

> I doubt the increase in reliability would be enough to justify the
> extra complexity and cost.

I'm doubtful about that too.

> What might be worth thinking about is allowing the syslogger process to
> inherit the postmaster's OOM-kill-proofness setting, instead of dropping
> down to the same vulnerability as the postmaster's other child processes.
> That presumes that this was an otherwise-unjustified OOM kill, which
> I'm not quite sure of ... but it does seem like a situation that could
> arise from time to time.

Hm. I'm a bit scared about that: it doesn't seem that inconceivable that various backends log humongous multi-line messages, leading to syslogger *actually* taking up a fair amount of memory. Note that we're using plain StringInfos, which ereport(ERROR) in out-of-memory situations rather than failing more gracefully.

- Andres