Re: Hung postmaster (8.3.9) - Mailing list pgsql-general

From Ed L.
Subject Re: Hung postmaster (8.3.9)
Date
Msg-id 201003011640.55944.pgsql@bluepolka.net
In response to Re: Hung postmaster (8.3.9)  ("Ed L." <pgsql@bluepolka.net>)
Responses Re: Hung postmaster (8.3.9)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Monday 01 March 2010 @ 16:03, Ed L. wrote:
> On Monday 01 March 2010 @ 15:59, Ed L. wrote:
> > > This just happened again ~24 hours after full reload from
> > >  backup. Arrrgh.
> > >
> > > Backtrace looks the same again, same file, same
> > > __read_nocancel().  $PGDATA/global/pg_auth looks fine to
> > > me, permissions are 600, entries are 3 or more
> > > double-quoted items per line each separated by a space,
> > > items 3 and beyond being groups.
> > >
> > > Any clues?
>
> Also seeing lots of postmaster zombies (190 and growing)...
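
For anyone wanting to repeat the pg_auth sanity check described in the quoted message, here is a minimal sketch. It assumes only the layout stated above (three or more double-quoted items per line, separated by single spaces); it is not taken from the PostgreSQL source. The function names are hypothetical, and treating a doubled quote inside an item as a literal quote is an assumption.

```python
def split_quoted_items(line):
    """Split one line into its double-quoted items; return None if malformed."""
    items = []
    i, n = 0, len(line)
    while i < n:
        if line[i] != '"':
            return None                # every item must start with a quote
        i += 1
        chars = []
        while True:
            if i >= n:
                return None            # unterminated quote
            if line[i] == '"':
                if i + 1 < n and line[i + 1] == '"':
                    chars.append('"')  # assumption: "" is a literal quote
                    i += 2
                    continue
                i += 1                 # closing quote
                break
            chars.append(line[i])
            i += 1
        items.append("".join(chars))
        if i < n:
            if line[i] != " ":
                return None            # items must be separated by a space
            i += 1
            if i >= n:
                return None            # trailing space with nothing after it
    return items


def check_auth_lines(lines, min_items=3):
    """Return (line_number, reason) for each line that breaks the layout."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        items = split_quoted_items(raw.rstrip("\n"))
        if items is None:
            problems.append((lineno, "not a space-separated list of quoted items"))
        elif len(items) < min_items:
            problems.append((lineno, "fewer than %d items" % min_items))
    return problems
```

Something like `check_auth_lines(open(path))` on a copy of the file would list suspicious lines. An empty result only means the file matches the layout described here, not that the postmaster will parse it the same way.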

While new connections are hanging, top shows postmaster using
100% of cpu.  SIGTERM/SIGQUIT do nothing.  Here's a backtrace
of this busy postmaster:

(gdb) bt
#0  0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
#1  0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3  0x000000346f8689cb in getc () from /lib64/libc.so.6
#4  0x0000000000531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 "", bufsz=4096) at hba.c:128
#5  0x0000000000532233 in tokenize_file (filename=0x10359b70 "global", file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232
#6  0x00000000005322e9 in tokenize_file (filename=0x2b1c8cbf5800 "global/pg_auth", file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7  0x00000000005327ff in load_role () at hba.c:959
#8  0x000000000057f878 in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:3830
#9  <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
#11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
#12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x1033f000) at postmaster.c:1031
#13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188

...and more from the server logs, fwiw:

2010-03-01 17:30:24.213 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:30:31.250 CST [32236]    DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:31:24.216 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:32:24.219 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:33:24.222 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:34:24.225 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:19.061 CST [32236]    LOG:  checkpoint starting: time
2010-03-01 17:35:19.185 CST [32236]    DEBUG:  recycled transaction log file "000000010000001C00000071"
2010-03-01 17:35:19.185 CST [32236]    LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.028 s, sync=0.000 s, total=0.124 s
2010-03-01 17:35:24.328 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:31.224 CST [32236]    DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:36:44.332 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:44.434 CST [32238]    WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  could not receive data from client: Connection timed out
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  unexpected EOF on client connection
2010-03-01 17:37:47.380 CST [3692] dba 10....(42816) dba LOG:  disconnection: session time: 2:11:15.303 user=dba database=dba host=... port=428
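
To see how regularly the "worker took too long to start" warnings recur in a longer log, a small script along these lines could help. It is only a sketch: the regular expression assumes the line prefix visible in the excerpt above (timestamp, time zone, bracketed PID, optional session fields, then the severity), and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed prefix shape, taken from the excerpt above:
#   2010-03-01 17:30:24.213 CST [32238] ... WARNING:  message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ \[(\d+)\].*?"
    r"\b(DEBUG|LOG|WARNING|ERROR|FATAL|PANIC):\s+(.*)$"
)


def parse_line(line):
    """Return (timestamp, pid, severity, message), or None if no match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return stamp, int(m.group(2)), m.group(3), m.group(4)


def warning_gaps(lines, needle="worker took too long to start"):
    """Seconds between consecutive WARNING lines whose message contains needle."""
    stamps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2] == "WARNING" and needle in parsed[3]:
            stamps.append(parsed[0])
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Calling `warning_gaps(open(logfile))` on the full server log returns the list of gaps in seconds. Continuation lines that do not start with a timestamp are skipped rather than merged.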
