Thread: BUG: Infinite syslog() loop

BUG: Infinite syslog() loop

From
Erik Walthinsen
Date:
I woke up this morning to a filled root partition on one of my servers.
I found that both debug and syslog were filled with over 50MB each of
postgres errors:

[what appears to be a standard cleanup cron job]
. . . .
Apr 21 04:00:03 alpha postgres[3371]: [85] DEBUG:  --Relation
pg_toast_16408--
Apr 21 04:00:03 alpha postgres[3371]: [86-1] DEBUG:  Pages 0: Changed 0,
Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Apr 21 04:00:03 alpha postgres[3371]: [86-2] ^ITotal CPU 0.00s/0.00u sec
elapsed 0.00 sec.
Apr 22 04:00:01 alpha postgres[17926]: [1] DEBUG:  connection:
host=[local] user=postgres database=template1
Apr 22 04:00:02 alpha postgres[17941]: [1] DEBUG:  connection:
host=[local] user=postgres database=template1
Apr 22 04:00:02 alpha postgres[362]: [1] DEBUG:  statistics collector
process (pid 364) exited with exit code 1
Apr 22 04:00:02 alpha postgres[362]: [2] DEBUG:  statistics collector
process (pid 17945) exited with exit code 1
. . . .
Apr 22 04:16:32 alpha postgres[362]: [445836] DEBUG:  statistics collector
process (pid 1629) exited with exit code 1

The 'statistics collector exited' error message goes on to repeat for
almost half a million times (as shown), before the files filled the root
fs and truncated.  It took about 16.5 minutes to spew this much, to both
logs, before it wedged the machine and finally fully gave up.

As far as I know the machine did not have zero reported space available
(after minfree) at the time this started, but even if it did, an infinite
loop of syslogs is only going to exacerbate the problem, hence this bug
report.

I've shut down postgres on the machine (since we're not using it at this
instant afaik) and have kept the logs in question, so if someone can tell
me what to look at, I'll do what I can to gather it.  Email me directly,
as I'm not on any pgsql lists.

The machine is a stock Debian Woody box, up to date, with no local changes
to the postgres configuration at all.  It's running 7.2.1-2woody2.

      Erik Walthinsen <omega@temple-baptist.com> - System Administrator
        __
       /  \                GStreamer - The only way to stream!
      |    | M E G A        ***** http://gstreamer.net/ *****
      _\  /_

Re: BUG: Infinite syslog() loop

From
Oliver Elphick
Date:
On Tue, 2003-04-22 at 20:41, Erik Walthinsen wrote:
> I woke up this morning to a filled root partition on one of my servers.
> I found that both debug and syslog were filled with over 50MB each of
> postgres errors:
>
> [what appears to be a standard cleanup cron job]
> . . . .
> Apr 21 04:00:03 alpha postgres[3371]: [85] DEBUG:  --Relation
...
> The machine is a stock Debian Woody box, up to date, with no local changes
> to the postgres configuration at all.  It's running 7.2.1-2woody2.

Change your syslog configuration to throw away debug messages from
postgres (user0)

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "Nay, in all these things we are more than conquerors
      through him that loved us."    Romans 8:37

Re: BUG: Infinite syslog() loop

From
Tom Lane
Date:
Erik Walthinsen <omega@temple-baptist.com> writes:
> Apr 22 04:00:02 alpha postgres[362]: [1] DEBUG:  statistics collector
> process (pid 364) exited with exit code 1
> Apr 22 04:00:02 alpha postgres[362]: [2] DEBUG:  statistics collector
> process (pid 17945) exited with exit code 1
> . . . .
> Apr 22 04:16:32 alpha postgres[362]: [445836] DEBUG:  statistics collector
> process (pid 1629) exited with exit code 1

> The machine is a stock Debian Woody box, up to date, with no local changes
> to the postgres configuration at all.  It's running 7.2.1-2woody2.

Hmm.  Evidently the pgstat subprocess is dying immediately on launch.
Checking the archives, I see that back then pgstat was in the habit of
reporting its woes on stderr no matter what the elog destination was
supposed to be :-(.  So the actually useful message went to postmaster
stderr, which more than likely is routed to /dev/null by your postmaster
startup script.

If you want to try to figure out what went wrong, you could alter the
start script to enable collection of stderr output and then try to
reproduce the problem.

            regards, tom lane

Re: BUG: Infinite syslog() loop

From
Tom Lane
Date:
Erik Walthinsen <omega@temple-baptist.com> writes:
> The problem though is still that whatever code is spawning pgstat is doing
> so forever.  Are you suggesting that because the error output isn't going
> to the right place that postmaster is not detecting that it died?  I'd
> think that postmaster would instead check the return value (else how does
> it determine atm that it failed?) and not loop once it's failed.

Actually, the postmaster is deliberately designed to restart pgstat.
On the whole I think that's a net plus for reliability, not a minus.

There might be some value in limiting the rate at which we try to spawn
it, though.

            regards, tom lane

Re: BUG: Infinite syslog() loop

From
Tom Lane
Date:
I said:
> Actually, the postmaster is deliberately designed to restart pgstat.
> On the whole I think that's a net plus for reliability, not a minus.

> There might be some value in limiting the rate at which we try to spawn
> it, though.

As of CVS tip, the postmaster will not try to launch pgstat oftener than
once a minute.

            regards, tom lane

Re: BUG: Infinite syslog() loop

From
Oliver Elphick
Date:
On Fri, 2003-04-25 at 23:27, Erik Walthinsen wrote:
> On Fri, 25 Apr 2003, Oliver Elphick wrote:
>
> > > I woke up this morning to a filled root partition on one of my servers.
> > > I found that both debug and syslog were filled with over 50MB each of
> > > postgres errors:
> > Change your syslog configuration to throw away debug messages from
> > postgres (user0)
>
> Um, NO.
>
> I will not configure my system to throw away logging data that's required
> to keep track of normal system operations, because there's a bug causing
> infinite looping.

Oops.  I hadn't read your message properly.  I've frequently seen
complaints of debug messages in syslog, because people don't realise
they need to configure syslog rahter than postgresql to control them.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "Submit yourselves therefore to God. Resist the devil,
      and he will flee from you."          James 4:7

Re: BUG: Infinite syslog() loop

From
Erik Walthinsen
Date:
On Fri, 25 Apr 2003, Tom Lane wrote:

> As of CVS tip, the postmaster will not try to launch pgstat oftener than
> once a minute.

Cool!  Now I just have to convince Debian to bring that change into woody
<g>

Thanx,
     Omega

      Erik Walthinsen <omega@temple-baptist.com> - System Administrator
        __
       /  \                GStreamer - The only way to stream!
      |    | M E G A        ***** http://gstreamer.net/ *****
      _\  /_

Re: BUG: Infinite syslog() loop

From
Erik Walthinsen
Date:
On Fri, 25 Apr 2003, Oliver Elphick wrote:

> > I woke up this morning to a filled root partition on one of my servers.
> > I found that both debug and syslog were filled with over 50MB each of
> > postgres errors:
> Change your syslog configuration to throw away debug messages from
> postgres (user0)

Um, NO.

I will not configure my system to throw away logging data that's required
to keep track of normal system operations, because there's a bug causing
infinite looping.

This 'solution' in no way solves the problem, which in actuality has very
little to do with syslog and everything to do with the fact that
postgresql managed to get into an infinite loop.  Infinite loops typically
cause the process to eat up all available processing time, hence the
ability to write nearly half a million log messages in just over 16
minutes.  My server has better things to do.

Throwing away the log messages is equivalent to sticking your fingers in
your ears to hide from a nuclear air-burst.

I still welcome any advice that will help me track down the cause of this
bug and get it fixed.

      Erik Walthinsen <omega@temple-baptist.com> - System Administrator
        __
       /  \                GStreamer - The only way to stream!
      |    | M E G A        ***** http://gstreamer.net/ *****
      _\  /_

Re: BUG: Infinite syslog() loop

From
Erik Walthinsen
Date:
On Fri, 25 Apr 2003, Tom Lane wrote:

> Hmm.  Evidently the pgstat subprocess is dying immediately on launch.
> Checking the archives, I see that back then pgstat was in the habit of
> reporting its woes on stderr no matter what the elog destination was
> supposed to be :-(.  So the actually useful message went to postmaster
> stderr, which more than likely is routed to /dev/null by your postmaster
> startup script.
>
> If you want to try to figure out what went wrong, you could alter the
> start script to enable collection of stderr output and then try to
> reproduce the problem.

It is possible that the disk was beyond minfree at that point, which would
explain why pgstat died.  I'll have to test that scenario on a UML box to
avoid squishing my production server.

The problem though is still that whatever code is spawning pgstat is doing
so forever.  Are you suggesting that because the error output isn't going
to the right place that postmaster is not detecting that it died?  I'd
think that postmaster would instead check the return value (else how does
it determine atm that it failed?) and not loop once it's failed.

      Erik Walthinsen <omega@temple-baptist.com> - System Administrator
        __
       /  \                GStreamer - The only way to stream!
      |    | M E G A        ***** http://gstreamer.net/ *****
      _\  /_