Thread: [GENERAL] postmaster deadlock while logging after syslogger exited
Hello,
I ran into what appears to be a deadlock in the logging subsystem. It looks like what happened was that the syslogger process exited because it ran out of memory. But before the postmaster got a chance to handle the SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker. That also failed, and the postmaster went to log a message about it, but it's blocked on the pipe that's normally connected to the syslogger, presumably because the pipe is full because the syslogger is gone and hasn't read from it. The net result is that new connections to PostgreSQL hang, and any query that causes PostgreSQL to log appears to hang, though queries appear to work on existing connections as long as they do not cause the backend to log anything. I don't believe the system would ever get out of this state.
I ran into this with version 9.2.4 on SmartOS (illumos), but it doesn't look to me like any of the relevant code has changed in 9.6 or 10, nor is it platform-specific. Version 10.0 still:
- restarts the syslogger in response to a SIGCHLD signal (in "reaper")
- starts autovacuum workers in response to SIGUSR1 (in "sigusr1_handler")
- masks SIGCHLD at the beginning of the SIGUSR1 handler
- potentially logs from StartAutoVacWorker (and, of course, many other places in the postmaster)
which I think means the problem is still possible. Am I missing something? I couldn't find anything about this issue. Is this just a rarely seen bug?
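To make the failure mode concrete, here is a minimal standalone sketch (toy code I wrote for illustration, not the actual postmaster source) of the shape of the problem: the process keeps the read end of its own stderr pipe open so a future syslogger can be re-attached, the only real reader has exited, and a signal-handler-style critical section with SIGCHLD blocked then does a blocking write to the now-full pipe, which never returns:

/*
 * Illustration only (hypothetical toy program, not PostgreSQL code).
 */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	int		fds[2];
	char		buf[1024];
	sigset_t	mask;

	memset(buf, 'x', sizeof(buf));
	if (pipe(fds) != 0)			/* fds[0]: read end kept open, as the postmaster does */
		return 1;

	/* sigusr1_handler-style critical section: SIGCHLD is blocked here */
	sigemptyset(&mask);
	sigaddset(&mask, SIGCHLD);
	sigprocmask(SIG_BLOCK, &mask, NULL);

	/* fill the pipe, as chatty backends would once the real reader is gone */
	fcntl(fds[1], F_SETFL, O_NONBLOCK);
	while (write(fds[1], buf, sizeof(buf)) > 0)
		;

	/* the postmaster's ereport(): a blocking write to a full pipe */
	fcntl(fds[1], F_SETFL, 0);
	fprintf(stderr, "about to block forever...\n");
	write(fds[1], buf, sizeof(buf));	/* hangs forever, mirroring the blocked postmaster */
	return 0;
}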
--------------------
Here's the raw data: my initial observation was that when I ran "psql" to connect to the database, it would hang (for hours, it turned out). My postmaster process was blocked here, writing to its own stderr pipe:
$ mdb core.41349
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> $C
fffffd7fffdf9df0 libc.so.1`__write+0xa()
fffffd7fffdfb220 write_pipe_chunks+0x142()
fffffd7fffdfb280 send_message_to_server_log+0x55b()
fffffd7fffdfb2a0 EmitErrorReport+0xe6()
fffffd7fffdfb390 errfinish+0x1ac()
fffffd7fffdfb3b0 StartAutoVacWorker+0x5c()
fffffd7fffdfb3e0 StartAutovacuumWorker+0x6f()
fffffd7fffdfb410 sigusr1_handler+0x185()
fffffd7fffdfb420 libc.so.1`__sighndlr+6()
fffffd7fffdfb4b0 libc.so.1`call_user_handler+0x1db(10, 0, fffffd7fffdfb520)
fffffd7fffdfb500 libc.so.1`sigacthandler+0x116(10, 0, fffffd7fffdfb520)
fffffd7fffdfb970 libc.so.1`__pollsys+0xa()
fffffd7fffdfba90 libc.so.1`pselect+0x1cb(5, fffffd7fffdfbaf0, 0, 0, fffffd7fffdfbaa0, 0)
fffffd7fffdfbae0 libc.so.1`select+0x5a(5, fffffd7fffdfbaf0, 0, 0, fffffd7fffdfdaf0)
fffffd7fffdffb40 ServerLoop+0xb5()
fffffd7fffdffbe0 PostmasterMain+0xec2()
fffffd7fffdffc00 main+0x23a()
fffffd7fffdffc10 _start+0x6c()
From inspecting kernel state, I found that the only other file descriptor on the system that corresponds to the other side of this pipe is in the postmaster itself, which it keeps open in case it needs to restart the syslogger. Based on the code, I expected the syslogger subprocess to have the pipe open, but that process appears to have exited due to a fatal error (out of memory). (I know it exited because the process still exists in the kernel -- it hasn't been reaped yet -- and I think it ran out of memory based on a log message I found from around the time when the process exited.) Then I expected the postmaster would have restarted it, but I found that it currently has SIGCHLD both pending and masked:
$ pflags 41349
41349: /opt/postgresql/9.2.4/bin/postgres -D /manatee/pg/data
data model = _LP64 flags = ORPHAN|MSACCT|MSFORK
sigpend = 0x00028000,0x00000000,0x00000000
/1: flags = ASLEEP write(0x2,0xfffffd7fffdf9e10,0x65)
sigmask = 0xfebff047,0xfffffff7,0x000003ff
The signal mask may be system-specific, but I've decoded it and confirmed that SIGCHLD is in "sigpend" as well as "sigmask". And that makes sense because the stack above indicates we're in "sigusr1_handler", which masks SIGCHLD when it starts.
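For reference, the decoding amounts to checking bit N-1 for signal N in the first word; on illumos, SIGUSR1 is 16 and SIGCHLD is 18, so 0x00028000 is exactly {SIGUSR1, SIGCHLD}. A throwaway sketch of that check (my own toy code, not illumos or PostgreSQL tooling):

#include <stdio.h>

/* Decode the first 32-bit word of pflags' sigpend/sigmask output. */
static void
decode(const char *label, unsigned int word)
{
	printf("%s:", label);
	for (int sig = 1; sig <= 32; sig++)
	{
		if (word & (1u << (sig - 1)))
			printf(" %d", sig);
	}
	printf("\n");
}

int
main(void)
{
	decode("sigpend", 0x00028000);	/* prints 16 (SIGUSR1) and 18 (SIGCHLD) */
	decode("sigmask", 0xfebff047);	/* includes 18 (SIGCHLD) */
	return 0;
}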
If it wasn't clear from the above, the logging configuration looks like this:
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/pg/'
log_filename = 'postgresql.log'
Thanks,
Dave
David Pacheco <dap@joyent.com> writes:
> I ran into what appears to be a deadlock in the logging subsystem. It
> looks like what happened was that the syslogger process exited because it
> ran out of memory. But before the postmaster got a chance to handle the
> SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker.
> That also failed, and the postmaster went to log a message about it, but
> it's blocked on the pipe that's normally connected to the syslogger,
> presumably because the pipe is full because the syslogger is gone and
> hasn't read from it.

Ugh.

> ... that process appears to have exited due to a fatal error
> (out of memory). (I know it exited because the process still exists in the
> kernel -- it hasn't been reaped yet -- and I think it ran out of memory
> based on a log message I found from around the time when the process
> exited.)

Could we see the exact log message(s) involved? It's pretty hard to
believe that the logger would have consumed much memory.

regards, tom lane
On Mon, Nov 6, 2017 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Pacheco <dap@joyent.com> writes:
> ... that process appears to have exited due to a fatal error
> (out of memory). (I know it exited because the process still exists in the
> kernel -- it hasn't been reaped yet -- and I think it ran out of memory
> based on a log message I found from around the time when the process
> exited.)
Could we see the exact log message(s) involved? It's pretty hard to
believe that the logger would have consumed much memory.
Thanks for the quick reply!
Based on kernel state about the dead but unreaped syslogger process, I believe the process exited at 2017-10-27T23:46:21.258Z. Here are all of the entries in the PostgreSQL log from 23:19:12 until the top of the next hour:
There's no log entry at exactly 23:46:21 or even immediately before that, but there are a lot of "out of memory" errors and a FATAL one at 23:47:28. Unfortunately, we haven't configured logging to include the pid, so I can't be sure which messages came from the syslogger.
There are also many log entries for some very long SQL queries. I'm sure that contributed to this problem by filling up the pipe. I was able to extract the contents of the pipe while the system was hung, and it was more of these giant query strings.
I think it's likely that this database instance was running in a container with way too small a memory cap for the number of processes configured. (This was a zone (a lightweight container) allocated with 2GB of memory and configured with 512MB of shared_buffers and up to 200 connections.) I expect that the system got to a persistent state of having basically no memory available, at which point nearly any attempt to allocate memory could fail. The syslogger itself may not have been using much memory.
So I'm not so much worried about the memory usage itself, but it would be nice if this condition were handled better. Handling out-of-memory is obviously hard, especially when it means being unable to fork, but even crashing would have been better for our use-case. And of course, there are other reasons that the syslogger could exit prematurely besides being low on memory, and those might be more recoverable.
Thanks,
Dave
On Mon, Nov 6, 2017 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Pacheco <dap@joyent.com> writes:
> I ran into what appears to be a deadlock in the logging subsystem. It
> looks like what happened was that the syslogger process exited because it
> ran out of memory. But before the postmaster got a chance to handle the
> SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker.
> That also failed, and the postmaster went to log a message about it, but
> it's blocked on the pipe that's normally connected to the syslogger,
> presumably because the pipe is full because the syslogger is gone and
> hasn't read from it.
Ugh.
Should I file a bug on this issue?
Thanks,
Dave
On 2017-11-06 15:35:03 -0500, Tom Lane wrote:
> David Pacheco <dap@joyent.com> writes:
> > I ran into what appears to be a deadlock in the logging subsystem. It
> > looks like what happened was that the syslogger process exited because it
> > ran out of memory. But before the postmaster got a chance to handle the
> > SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker.
> > That also failed, and the postmaster went to log a message about it, but
> > it's blocked on the pipe that's normally connected to the syslogger,
> > presumably because the pipe is full because the syslogger is gone and
> > hasn't read from it.
>
> Ugh.

I'm somewhat inclined to say that one has to live with this if the
system is so resource constrained that processes barely using memory
get killed.

We could work around a situation like that if we made postmaster use a
*different* pipe as stderr than the one we're handing to normal
backends. If postmaster created a new pipe and closed the read end
whenever forking a syslogger, we should get EPIPEs when writing after
syslogger died and could fall back to proper stderr or such.

Greetings,

Andres Freund
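To illustrate the shape of that suggestion (a hedged toy sketch, not an actual patch): the postmaster would log through a private pipe whose read end only the syslogger holds, so once the syslogger is gone the postmaster's write fails immediately with EPIPE (SIGPIPE being ignored) instead of blocking on the shared, full pipe:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	int		pm_pipe[2];			/* postmaster-only logging pipe */
	pid_t	logger;

	signal(SIGPIPE, SIG_IGN);	/* the postmaster already ignores SIGPIPE */
	if (pipe(pm_pipe) != 0)
		return 1;

	logger = fork();
	if (logger == 0)
	{
		/* "syslogger": would read pm_pipe[0]; here it just dies early */
		close(pm_pipe[1]);
		_exit(1);
	}

	/* postmaster closes the read end whenever it forks a syslogger ... */
	close(pm_pipe[0]);
	waitpid(logger, NULL, 0);	/* make sure the only reader is really gone */

	/* ... so this write fails immediately rather than blocking forever */
	if (write(pm_pipe[1], "log line\n", 9) < 0 && errno == EPIPE)
		fprintf(stderr, "EPIPE: fall back to real stderr, restart syslogger\n");
	return 0;
}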
On Fri, Nov 17, 2017 at 10:50 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-06 15:35:03 -0500, Tom Lane wrote:
>> David Pacheco <dap@joyent.com> writes:
>> > I ran into what appears to be a deadlock in the logging subsystem. It
>> > looks like what happened was that the syslogger process exited because it
>> > ran out of memory. But before the postmaster got a chance to handle the
>> > SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker.
>> > That also failed, and the postmaster went to log a message about it, but
>> > it's blocked on the pipe that's normally connected to the syslogger,
>> > presumably because the pipe is full because the syslogger is gone and
>> > hasn't read from it.
>>
>> Ugh.
>
> I'm somewhat inclined to say that one has to live with this if the
> system is so resource constrained that processes barely using memory
> get killed.
>
> We could work around a situation like that if we made postmaster use a
> *different* pipe as stderr than the one we're handing to normal
> backends. If postmaster created a new pipe and closed the read end
> whenever forking a syslogger, we should get EPIPEs when writing after
> syslogger died and could fall back to proper stderr or such.

I don't have the code on top of my mind, but isn't a custom fd causing
a small penalty when redirection_done is switched to true because the first
process generating a message to the syslogger pipe needs to open it first if
not done yet? So you'd need proper locking to save from race conditions. Or
is the first redirected message always generated by the postmaster or the
syslogger? I don't recall that this is actually true..
--
Michael
On 2017-11-17 11:09:56 +0900, Michael Paquier wrote:
> On Fri, Nov 17, 2017 at 10:50 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-11-06 15:35:03 -0500, Tom Lane wrote:
> >> David Pacheco <dap@joyent.com> writes:
> >> > I ran into what appears to be a deadlock in the logging subsystem. It
> >> > looks like what happened was that the syslogger process exited because it
> >> > ran out of memory. But before the postmaster got a chance to handle the
> >> > SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker.
> >> > That also failed, and the postmaster went to log a message about it, but
> >> > it's blocked on the pipe that's normally connected to the syslogger,
> >> > presumably because the pipe is full because the syslogger is gone and
> >> > hasn't read from it.
> >>
> >> Ugh.
> >
> > I'm somewhat inclined to say that one has to live with this if the
> > system is so resource constrained that processes barely using memory
> > get killed.
> >
> > We could work around a situation like that if we made postmaster use a
> > *different* pipe as stderr than the one we're handing to normal
> > backends. If postmaster created a new pipe and closed the read end
> > whenever forking a syslogger, we should get EPIPEs when writing after
> > syslogger died and could fall back to proper stderr or such.
>
> I don't have the code on top of my mind, but isn't a custom fd causing
> a small penalty

Sure, there's some minor overhead because every process would need to close
another fd after forking.

> when redirection_done is switched to true because the first process
> generating a message to the syslogger pipe needs to open it first if
> not done yet?

I can't follow. The syslogger pipe is created when the first syslogger
is started (before it's forked!). Which happens before other processes
are created, because they all need to inherit that file descriptor.

> So you'd need proper locking to save from race conditions.

I completely fail to see why this'd be the case. All I'm talking about is
using another pipe between syslogger and postmaster than between
other-processes and syslogger.

Greetings,

Andres Freund
On Fri, Nov 17, 2017 at 11:14 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-17 11:09:56 +0900, Michael Paquier wrote:
>> when redirection_done is switched to true because the first process
>> generating a message to the syslogger pipe needs to open it first if
>> not done yet?
>
> I can't follow. The syslogger pipe is created when the first syslogger
> is started (before it's forked!). Which happens before other processes
> are created, because they all need to inherit that file descriptor.

Ah, OK. I didn't recall this dependency. Sorry for the confusion.
--
Michael
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-06 15:35:03 -0500, Tom Lane wrote:
>> David Pacheco <dap@joyent.com> writes:
>>> I ran into what appears to be a deadlock in the logging subsystem. It
>>> looks like what happened was that the syslogger process exited because it
>>> ran out of memory. But before the postmaster got a chance to handle the
>>> SIGCLD to restart it, it handled a SIGUSR1 to start an autovacuum worker.
>>> That also failed, and the postmaster went to log a message about it, but
>>> it's blocked on the pipe that's normally connected to the syslogger,
>>> presumably because the pipe is full because the syslogger is gone and
>>> hasn't read from it.

>> Ugh.

> I'm somewhat inclined to say that one has to live with this if the
> system is so resource constrained that processes barely using memory
> get killed.

David's report isn't too clear: did the syslogger process actually run
out of memory and exit of its own volition after an ENOMEM, or did it get
killed by the dreaded OOM killer? In either case, it's unclear whether
it was really using an excessive amount of memory. We have not heard
reports suggesting a memory leak in the syslogger, but maybe there is
one under unusual circumstances?

I think you're probably right that the real cause here is the OOM killer
just randomly seizing on the syslogger as a victim process; although since
the syslogger disconnects from shared memory, it's not very clear why it
would score high on the OOM killer's metrics. The whole thing is
definitely odd.

> We could work around a situation like that if we made postmaster use a
> *different* pipe as stderr than the one we're handing to normal
> backends. If postmaster created a new pipe and closed the read end
> whenever forking a syslogger, we should get EPIPEs when writing after
> syslogger died and could fall back to proper stderr or such.

I think that's nonsense, unfortunately. If the postmaster had its own
pipe, that would reduce the risk of this deadlock because only the
postmaster would be filling that pipe, not the postmaster and all its
other children --- but it wouldn't eliminate the risk. I doubt the
increase in reliability would be enough to justify the extra complexity
and cost.

What might be worth thinking about is allowing the syslogger process to
inherit the postmaster's OOM-kill-proofness setting, instead of dropping
down to the same vulnerability as the postmaster's other child processes.
That presumes that this was an otherwise-unjustified OOM kill, which
I'm not quite sure of ... but it does seem like a situation that could
arise from time to time.

regards, tom lane
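For what it's worth, "inheriting the postmaster's OOM-kill-proofness" could look roughly like the following on Linux (a toy sketch with a hypothetical helper, not PostgreSQL's actual fork path): ordinary children make themselves eligible for the OOM killer again after fork by raising /proc/self/oom_score_adj, and the syslogger child would simply skip that reset and keep whatever setting the postmaster was started with:

#include <stdio.h>

/* Hypothetical helper: write a new oom_score_adj for the current process. */
static void
reset_oom_score_adj(int value)
{
	FILE   *f = fopen("/proc/self/oom_score_adj", "w");

	if (f == NULL)
		return;					/* not Linux, or not permitted: nothing to do */
	fprintf(f, "%d\n", value);
	fclose(f);
}

int
main(void)
{
	int		is_syslogger_child = 0;		/* imagine this is decided after fork() */

	if (!is_syslogger_child)
		reset_oom_score_adj(0);	/* ordinary children become killable again */
	/* a syslogger child would keep the postmaster's protected setting */
	return 0;
}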
On 2017-11-16 21:39:49 -0500, Tom Lane wrote:
> > We could work around a situation like that if we made postmaster use a
> > *different* pipe as stderr than the one we're handing to normal
> > backends. If postmaster created a new pipe and closed the read end
> > whenever forking a syslogger, we should get EPIPEs when writing after
> > syslogger died and could fall back to proper stderr or such.
>
> I think that's nonsense, unfortunately.

Nice phrasing.

> If the postmaster had its own pipe, that would reduce the risk of this
> deadlock because only the postmaster would be filling that pipe, not
> the postmaster and all its other children --- but it wouldn't
> eliminate the risk.

The deadlock happens because postmaster is waiting for syslogger to accept a
message, and syslogger waits for postmaster to restart it. To resolve the
deadlock, postmaster needs to not wait for a dead syslogger, even if it
hasn't yet received & processed the SIGCLD - what other postmaster children
do or don't do doesn't matter for resolving that cycle.

The reason postmaster currently blocks on writing to the pipe, instead of
getting EPIPE, is because both ends of the pipe still exist. Which in turn is
the case because we need to be able to restart syslogger without passing a
new file descriptor to all subprocesses. If postmaster instead uses a
different pipe to write to, it'll not block anymore, instead getting EPIPE,
and can continue towards starting a new syslogger. So I don't think the
described deadlock exists if we were to apply my proposed fix.

What this obviously would *not* guarantee is being able to start a new
syslogger, but it seems fairly impossible to guarantee that. So sure, other
processes would still block until syslogger has successfully restarted - but
it's a resolvable situation rather than a hard deadlock, which the described
situation appears to be.

Note that there's plenty of cases where you could run into this even without
being unable to fork new processes. You could e.g. also run into this while
logging the exit of some other subprocess or such; there's enough ereports
in postmaster.

> I doubt the increase in reliability would be enough to justify the
> extra complexity and cost.

I'm doubtful about that too.

> What might be worth thinking about is allowing the syslogger process to
> inherit the postmaster's OOM-kill-proofness setting, instead of dropping
> down to the same vulnerability as the postmaster's other child processes.
> That presumes that this was an otherwise-unjustified OOM kill, which
> I'm not quite sure of ... but it does seem like a situation that could
> arise from time to time.

Hm. I'm a bit scared about that - it doesn't seem that inconceivable
that various backends log humongous multi-line messages, leading to
syslogger *actually* taking up a fair amount of memory. Note that we're
using plain stringinfos that ereport(ERROR) in out-of-memory situations,
rather than failing more gracefully.

- Andres
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-16 21:39:49 -0500, Tom Lane wrote:
>> What might be worth thinking about is allowing the syslogger process to
>> inherit the postmaster's OOM-kill-proofness setting, instead of dropping
>> down to the same vulnerability as the postmaster's other child processes.

> Hm. I'm a bit scared about that - it doesn't seem that inconceivable
> that various backends log humongous multi-line messages, leading to
> syslogger *actually* taking up a fair amount of memory. Note that we're
> using plain stringinfos that ereport(ERROR) in out-of-memory situations,
> rather than failing more gracefully.

True, but there's no hard limits on the postmaster's memory consumption
either ... and if the syslogger does get killed on such a basis, we
have at the least lost a bunch of log output. On the whole I think we'd be
better off trying to prevent OOM kills on the syslogger. (That doesn't
preclude other mitigation measures.)

regards, tom lane
On November 16, 2017 7:06:23 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-11-16 21:39:49 -0500, Tom Lane wrote:
>>> What might be worth thinking about is allowing the syslogger process to
>>> inherit the postmaster's OOM-kill-proofness setting, instead of dropping
>>> down to the same vulnerability as the postmaster's other child processes.
>
>> Hm. I'm a bit scared about that - it doesn't seem that inconceivable
>> that various backends log humongous multi-line messages, leading to
>> syslogger *actually* taking up a fair amount of memory. Note that we're
>> using plain stringinfos that ereport(ERROR) in out-of-memory situations,
>> rather than failing more gracefully.
>
> True, but there's no hard limits on the postmaster's memory consumption
> either ...

Is there a credible scenario where it'd allocate many gigabytes of memory?

> and if the syslogger does get killed on such a basis, we have at the
> least lost a bunch of log output. On the whole I think we'd be better
> off trying to prevent OOM kills on the syslogger. (That doesn't
> preclude other mitigation measures.)

It doesn't seem impossible to get into a situation where syslogger is the
source of the OOM. Just enabling a lot of logging in a workload with many
large query strings might do it. So making it less likely to be killed
might make the problem worse...

Andres
Andres Freund <andres@anarazel.de> writes:
> It doesn't seem impossible to get into a situation where syslogger is
> the source of the OOM. Just enabling a lot of logging in a workload with
> many large query strings might do it. So making it less likely to be
> killed might make the problem worse...

Hm, so that's another angle David didn't report on: is it possible that
his workload could have resulted in a very large volume of incomplete
in-progress log messages?

regards, tom lane
Responding to several points below.
Tom Lane wrote:
> David's report isn't too clear: did the syslogger process actually run
> out of memory and exit of its own volition after an ENOMEM, or did it get
> killed by the dreaded OOM killer? In either case, it's unclear whether
> it was really using an excessive amount of memory. We have not heard
> reports suggesting a memory leak in the syslogger, but maybe there is
> one under unusual circumstances?
I'm running on illumos, where there is no OOM killer.
I expect what happened is that the syslogger process attempted to allocate
memory, failed because the system was low, and explicitly exited. That's
consistent with an exited process, no core file generated, and the "FATAL" "out
of memory" entries in the log I posted.
Of course, that doesn't mean the syslogger was using much memory. It means only
that it attempted to allocate any memory in a context where it could not handle
failing.
Andres Freund wrote:
> I'm somewhat inclined to say that one has to live with this if the
> system is so resource constrainted that processes barely using memory
> get killed.
Again, there's no OOM killer here, so the process has the opportunity to respond
differently. I can understand the database not functioning well when the system
is out of memory, but this half-functioning, half-deadlocked state is quite
disruptive. By comparison, while not ideal, if the syslogger had raised SIGABRT
instead, that would have been significantly better for my use-case. Our process
manager would have terminated all of the processes associated with the database
and restarted it.
Tom Lane wrote:
> Hm, so that's another angle David didn't report on: is it possible that
> his workload could have resulted in a very large volume of incomplete
> in-progress log messages?
Yes. I mentioned in my Nov 6 mail that large log messages over a short period
appear to have been a major contributing factor:
> Here are all of the entries in the PostgreSQL log from 23:19:12 until the top of
> the next hour:
...
> There are also many log entries for some very long SQL queries. I'm sure that
> contributed to this problem by filling up the pipe. I was able to extract the
> contents of the pipe while the system was hung, and it was more of these giant
> query strings.
Andres Freund wrote:
> Note that there's plenty of cases where you could run into this even
> without being unable to fork new processes. You'd e.g. could also run
> into this while logging the exit of some other subprocess or such,
> there's enough ereports in postmaster.
This is a critical point. As far as I can tell, all that's necessary for this
deadlock to occur is:
- the syslogger is unable to make forward progress (e.g., because it segfaulted)
- enough other processes write to the pipe that it fills up before the
postmaster can restart the syslogger
- the postmaster attempts to write to the log as part of any of its signal
handling operations
It seems to me that the same thing can happen if there were a bug in the
syslogger that resulted in a segfault around the same time that the postmaster
attempted to start an autovacuum worker, for example (if there was also
reasonably high log traffic).
Thanks,
Dave
Hi,

On 2017-11-20 10:13:57 -0800, David Pacheco wrote:
> I expect what happened is that the syslogger process attempted to allocate
> memory, failed because the system was low, and explicitly exited. That's
> consistent with an exited process, no core file generated, and the "FATAL"
> "out of memory" entries in the log I posted.

Well unfortunately we're presumably not going to see output from
syslogger's death itself, chicken and egg etc.

> Tom Lane wrote:
> > Hm, so that's another angle David didn't report on: is it possible that
> > his workload could have resulted in a very large volume of incomplete
> > in-progress log messages?
>
> Yes. I mentioned in my Nov 6 mail that large log messages over a
> short period appear to have been a major contributing factor:

So it's actually quite possible that syslogger did use a fair amount of
memory. Could you show its memory usage, just via ps?

> This is a critical point. As far as I can tell, all that's necessary
> for this deadlock to occur is:
>
> - the syslogger is unable to make forward progress (e.g., because it
>   segfaulted)

This specific case I don't care that much about, but ...

> - enough other processes write to the pipe that it fills up before the
>   postmaster can restart the syslogger
> - the postmaster attempts to write to the log as part of any of its signal
>   handling operations
>
> It seems to me that the same thing can happen if there were a bug in the
> syslogger that resulted in a segfault around the same time that the
> postmaster attempted to start an autovacuum worker, for example (if there
> was also reasonably high log traffic).

Greetings,

Andres Freund
Thanks again for looking at this issue.
On Mon, Nov 20, 2017 at 10:34 AM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-11-20 10:13:57 -0800, David Pacheco wrote:
> I expect what happened is that the syslogger process attempted to allocate
> memory, failed because the system was low, and explicitly exited. That's
> consistent with an exited process, no core file generated, and the "FATAL"
> "out
> of memory" entries in the log I posted.
Well unfortunately we're presumably not going to see output from
syslogger's death itself, chicken and egg etc.
If that's true, I think that's a separate bug. If a process receives ENOMEM from an attempt to allocate memory, it can still write out a log message to the already-open file descriptor indicating that it's exiting because of that failure. For all I know, that's what happened here, as there are FATAL "out of memory" entries in the log. (I'd be able to tell for sure if we had configured logging to include pids in the log messages.)
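To illustrate the point (a toy sketch with a made-up wrapper, not PostgreSQL's elog machinery): reporting an out-of-memory exit needs no further allocation as long as the message is a static string written straight to the descriptor that is already open:

#include <stdlib.h>
#include <unistd.h>

/* Hypothetical allocation wrapper that reports ENOMEM without allocating. */
static void *
xmalloc(size_t n)
{
	void	   *p = malloc(n);

	if (p == NULL)
	{
		static const char msg[] =
		"FATAL: out of memory, exiting (written with no allocation)\n";

		(void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
		_exit(1);
	}
	return p;
}

int
main(void)
{
	/* absurd request to force the failure path for demonstration */
	(void) xmalloc((size_t) -1);
	return 0;
}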
> Tom Lane wrote:
> > Hm, so that's another angle David didn't report on: is it possible that
> > his workload could have resulted in a very large volume of incomplete
> > in-progress log messages?
>
> Yes. I mentioned in my Nov 6 mail that large log messages over a
> short period appear to have been a major contributing factor:
So it's actually quite possible that syslogger did use a fair amount of
memory. Could you show its memory usage, just via ps?
I don't have information from the problematic case because the process had already exited. Right now, for the same database (actually a different instance, because this incident triggered a failover):
$ ps -opid,rss,vsz,args -p 37627
  PID  RSS   VSZ COMMAND
37627 2980 14968 /opt/postgresql/9.2.4/bin/postgres -D /manatee/pg/data
I'm not sure what we can infer from that, as this is a different system, and the workload that generates the very large query strings only runs occasionally. Those strings are also not logged unless something's gone wrong.
> This is a critical point. As far as I can tell, all that's necessary
> for this deadlock to occur is:
>
> - the syslogger is unable to make forward progress (e.g., because it
> segfaulted)
This specific case I don't care that much about, but ...
I understand if the community isn't interested in fixing this case if other users aren't seeing it much, but surely it's still a bug that this unusual case can result in a deadlock?
-- Dave
Hi,

On 2017-11-20 11:12:08 -0800, David Pacheco wrote:
> > > This is a critical point. As far as I can tell, all that's necessary
> > > for this deadlock to occur is:
> > >
> > > - the syslogger is unable to make forward progress (e.g., because it
> > >   segfaulted)
> >
> > This specific case I don't care that much about, but ...
>
> I understand if the community isn't interested in fixing this case if other
> users aren't seeing it much, but surely it's still a bug that this unusual
> case can result in a deadlock?

I care about the deadlock issue if triggered by out of memory cases. That's
something that can be expected during operation. Being fully resistant
against segfaults in arbitrary places however is not unlikely to introduce
more bugs than cases where it saves the day.

Greetings,

Andres Freund
On Mon, Nov 20, 2017 at 11:12 AM, David Pacheco wrote:
> I understand if the community isn't interested in fixing this case if
> other users aren't seeing it much, but surely it's still a bug that this
> unusual case can result in a deadlock?
>
I've filed bug 14945 to cover this issue:
https://www.postgresql.org/message-id/20171204201055.27108.18283%40wrigleys.postgresql.org
While the underlying cause in this case was likely operator error (i.e.,
too little memory provisioned), the deadlock seems clearly like an issue
that could be hit by others and requires operator intervention to restore
service. I understand it might still be sufficiently rare, and the fix
sufficiently complex, that the community doesn't end up prioritizing it.
Thanks,
Dave
Hi,

On 2017-11-20 11:12:08 -0800, David Pacheco wrote:
> $ ps -opid,rss,vsz,args -p 37627
>   PID  RSS   VSZ COMMAND
> 37627 2980 14968 /opt/postgresql/9.2.4/bin/postgres -D /manatee/pg/data
>
> I'm not sure what we can infer from that, as this is a different system,
> and the workload that generates the very large query strings only runs
> occasionally. Those strings are also not logged unless something's gone
> wrong.

FWIW, I'd like to see a report of this around the time the issue
occurred before doing anything further here.

Greetings,

Andres Freund
On Mon, Dec 4, 2017 at 12:23 PM, Andres Freund wrote:
> Hi,
>
> On 2017-11-20 11:12:08 -0800, David Pacheco wrote:
> > $ ps -opid,rss,vsz,args -p 37627
> > PID RSS VSZ COMMAND
> > 37627 2980 14968 /opt/postgresql/9.2.4/bin/postgres -D /manatee/pg/data
> >
> > I'm not sure what we can infer from that, as this is a different system,
> > and the workload that generates the very large query strings only runs
> > occasionally. Those strings are also not logged unless something's gone
> > wrong.
>
> FWIW, I'd like to see a report of this around the time the issue
> occurred before doing anything further here.
>
This failure begins when this process exits, so the best you could get is
memory in use immediately before it exited. I obviously can't get that now
for the one instance I saw weeks ago, but maybe PostgreSQL could log
information about current memory usage when it's about to exit because of
ENOMEM? That way if anybody hits a similar condition in the future, the
data will be available to answer your question.
That said, I think the deadlock itself is pretty well explained by the data
we have already.
-- Dave
On 2017-12-04 13:57:52 -0800, David Pacheco wrote:
> On Mon, Dec 4, 2017 at 12:23 PM, Andres Freund <andres@anarazel.de> wrote:
> > FWIW, I'd like to see a report of this around the time the issue
> > occurred before doing anything further here.
>
> This failure begins when this process exits, so the best you could get is
> memory in use immediately before it exited. I obviously can't get that now
> for the one instance I saw weeks ago, but maybe PostgreSQL could log
> information about current memory usage when it's about to exit because of
> ENOMEM?

It already does so. What I was wondering about was the memory usage some
time before it dies. In particular while the workload with the long query
strings is running. ps output would be good, gdb'ing into the process and
issuing MemoryContextStats(TopMemoryContext) would be better.

> That way if anybody hits a similar condition in the future, the
> data will be available to answer your question.
>
> That said, I think the deadlock itself is pretty well explained by the data
> we have already.

Well, it doesn't really explain the root cause, and thus the extent of
the fixes required. If the root cause is the amount of memory used by
syslogger, we can remove the deadlock, but the experience is still going
to be bad. Obviously better, but still bad.

Greetings,

Andres Freund
Thanks again for helping out.
On Mon, Dec 4, 2017 at 2:12 PM, Andres Freund wrote:
> On 2017-12-04 13:57:52 -0800, David Pacheco wrote:
> > On Mon, Dec 4, 2017 at 12:23 PM, Andres Freund wrote:
> > > FWIW, I'd like to see a report of this around the time the issue
> > > occurred before doing anything further here.
> > >
> >
> >
> > This failure begins when this process exits, so the best you could get is
> > memory in use immediately before it exited. I obviously can't get that now
> > for the one instance I saw weeks ago, but maybe PostgreSQL could log
> > information about current memory usage when it's about to exit because of
> > ENOMEM?
>
> It already does so.
>
In that case, do you have the information you need in the log that I posted
earlier in the thread?
(
https://gist.githubusercontent.com/davepacheco/c5541bb464532075f2da761dd990a457/raw/2ba242055aca2fb374e9118045a830d08c590e0a/gistfile1.txt
)
> What I was wondering about was the memory usage some time before it
> dies. In particular while the workload with the long query strings is
> running. ps output would be good, gdb'ing into the process and issuing
> MemoryContextStats(TopMemoryContext) would be better.
>
Would it make sense for PostgreSQL to periodically sample the memory used
by the current process, keep a small ringbuffer of recent samples, and then
log all of that when it exits due to ENOMEM?
One does not know that one is going to run into this problem before it
happens, and it may not happen very often. (I've only seen it once.) The
more PostgreSQL can keep the information needed to understand something
like this after the fact, the better -- particularly since the overhead
required to maintain this information should not be that substantial.
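Roughly what I have in mind, as a rough sketch with made-up names (not a patch):

#include <stdio.h>
#include <time.h>

#define MEM_SAMPLES 16

typedef struct
{
	time_t	when;
	long	rss_kb;
} MemSample;

static MemSample samples[MEM_SAMPLES];
static int	next_sample = 0;

/* called from some periodic maintenance point in each process */
static void
record_memory_sample(long rss_kb)
{
	samples[next_sample].when = time(NULL);
	samples[next_sample].rss_kb = rss_kb;
	next_sample = (next_sample + 1) % MEM_SAMPLES;
}

/* called from the out-of-memory exit path, before the process goes away */
static void
dump_memory_samples(FILE *out)
{
	for (int i = 0; i < MEM_SAMPLES; i++)
	{
		int			idx = (next_sample + i) % MEM_SAMPLES;

		if (samples[idx].when != 0)
			fprintf(out, "rss at %ld: %ld kB\n",
					(long) samples[idx].when, samples[idx].rss_kb);
	}
}

int
main(void)
{
	/* toy usage: pretend RSS grew over time, then report the history */
	record_memory_sample(2980);
	record_memory_sample(5200);
	dump_memory_samples(stderr);
	return 0;
}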
> > That way if anybody hits a similar condition in the future, the
> > data will be available to answer your question.
> >
> > That said, I think the deadlock itself is pretty well explained by the
> > data we have already.
>
> Well, it doesn't really explain the root cause, and thus the extent of
> the fixes required. If the root cause is the amount of memory used by
> syslogger, we can remove the deadlock, but the experience is still going
> to be bad. Obviously better, but still bad.
>
Fair enough. But we only know about one problem for sure, which is the
deadlock. There may be a second problem of the syslogger using too much
memory, but I don't think there's any evidence to point in that direction.
Once the whole system is out of memory (and it clearly was, based on the
log entries), anything that tried to allocate would fail, and the log
reflects that a number of different processes did fail to allocate memory.
I'd help investigate this question, but I have no more data about it, and
I'm not sure when I will run into this again.
Thanks,
Dave