Thread: pgstat wait timeout
I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is currently in test/dev mode. I'm currently seeing the following messages occurring every few seconds: ... Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout Dec 27 17:43:59 foo postgres[23697]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:44:04 foo postgres[27324]: [71402-1] : WARNING: pgstat wait timeout Dec 27 17:44:09 foo postgres[23715]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:44:17 foo postgres[27324]: [71403-1] : WARNING: pgstat wait timeout Dec 27 17:44:22 foo postgres[23716]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:44:27 foo postgres[27324]: [71404-1] : WARNING: pgstat wait timeout Dec 27 17:44:33 foo postgres[23718]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:44:54 foo postgres[27324]: [71405-1] : WARNING: pgstat wait timeout Dec 27 17:44:59 foo postgres[23824]: [6-1] : WARNING: pgstat wait timeout Dec 27 17:45:04 foo postgres[27324]: [71406-1] : WARNING: pgstat wait timeout I can't correlate events exactly, but the messages seem to have started shortly after I dropped a pgbench user and database. My Googling turned up various requests for debugging info on "hackers". Since the system isn't live, I haven't touched it in case anyone wants me to collect debugging info. Otherwise, I plan on just blowing the install away and replacing it with 9.1 Cheers, Steve
Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: > I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is > currently in test/dev mode. I'm currently seeing the following messages > occurring every few seconds: > > ... > Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout > Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait > timeout > Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout > Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait > timeout Hm, so can you strace the stats collector to see what it's doing? Maybe grab a backtrace with GDB from it before anything else. My guess is 27324 is the autovac launcher and the others are autovac workers just as they die. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 12/28/2011 05:05 AM, Alvaro Herrera wrote: > Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: >> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is >> currently in test/dev mode. I'm currently seeing the following messages >> occurring every few seconds: >> >> ... >> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout >> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait >> timeout >> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout >> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait >> timeout > Hm, so can you strace the stats collector to see what it's doing? Maybe > grab a backtrace with GDB from it before anything else. > > My guess is 27324 is the autovac launcher and the others are autovac > workers just as they die. > You are correct. 27324 is the launcher and the others are autovac workers. Here's the strace of the stats collector process: getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) getppid() = 27320 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) ....rinse...lather...repeat...ad nauseum... And the backtrace: #0 0x00007ff4d2e80f58 in poll () from /lib/libc.so.6 #1 0x00007ff4d4e6f465 in ?? () #2 0x00007ff4d4e6fd83 in pgstat_start () #3 0x00007ff4d4e73475 in ?? () #4 <signal handler called> #5 0x00007ff4d2e85fd3 in select () from /lib/libc.so.6 #6 0x00007ff4d4e71b93 in ?? () #7 0x00007ff4d4e74b01 in PostmasterMain () #8 0x00007ff4d4e193b3 in main () Cheers, Steve
Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011: > On 12/28/2011 05:05 AM, Alvaro Herrera wrote: > > Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: > >> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is > >> currently in test/dev mode. I'm currently seeing the following messages > >> occurring every few seconds: > >> > >> ... > >> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout > >> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait > >> timeout > >> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout > >> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait > >> timeout > > Hm, so can you strace the stats collector to see what it's doing? Maybe > > grab a backtrace with GDB from it before anything else. > > > > My guess is 27324 is the autovac launcher and the others are autovac > > workers just as they die. > > > You are correct. 27324 is the launcher and the others are autovac > workers. Here's the strace of the stats collector process: > > getppid() = 27320 > poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) > getppid() = 27320 > poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) > getppid() = 27320 > poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) > ....rinse...lather...repeat...ad nauseum... Weird ... even across more "pgstat wait timeout" messages? It's like it's not getting the "inquiry" messages that would tell it to write the file ... something wrong with the UDP socket perhaps? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 12/28/2011 09:34 AM, Alvaro Herrera wrote: > Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011: >> On 12/28/2011 05:05 AM, Alvaro Herrera wrote: >>> Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011: >>>> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is >>>> currently in test/dev mode. I'm currently seeing the following messages >>>> occurring every few seconds: >>>> >>>> ... >>>> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout >>>> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait >>>> timeout >>>> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout >>>> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait >>>> timeout >>> Hm, so can you strace the stats collector to see what it's doing? Maybe >>> grab a backtrace with GDB from it before anything else. >>> >>> My guess is 27324 is the autovac launcher and the others are autovac >>> workers just as they die. >>> >> You are correct. 27324 is the launcher and the others are autovac >> workers. Here's the strace of the stats collector process: >> >> getppid() = 27320 >> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) >> getppid() = 27320 >> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) >> getppid() = 27320 >> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout) >> ....rinse...lather...repeat...ad nauseum... > Weird ... even across more "pgstat wait timeout" messages? It's like > it's not getting the "inquiry" messages that would tell it to write the > file ... something wrong with the UDP socket perhaps? > Bingo! postgres 27325 postgres 8u *IPv6* 5379428 0t0 UDP localhost:47204->localhost:47204 In working on diagnosing a network timeout issue over an IPv4 to IPv4 VPN I disabled IPv6 via sysctl on this machine and pretty much forgot about it since we are still IPv4 internally. But PostgreSQL had already established a (now non-functional) IPv6 local connection. Re-enabling IPv6, as it was not related to the VPN timeouts, corrected the "pgstat wait timeout" issue. Cheers, Steve