Re: pgstat wait timeout - Mailing list pgsql-hackers

From Steve Crawford
Subject Re: pgstat wait timeout
Msg-id 4EFB5A7D.70904@pinpointresearch.com
In response to Re: pgstat wait timeout  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
On 12/28/2011 09:34 AM, Alvaro Herrera wrote:
> Excerpts from Steve Crawford's message of Wed Dec 28 13:24:37 -0300 2011:
>> On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
>>> Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:
>>>> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
>>>> currently in test/dev mode. I'm currently seeing the following messages
>>>> occurring every few seconds:
>>>>
>>>> ...
>>>> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
>>>> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
>>>> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
>>>> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout
>>> Hm, so can you strace the stats collector to see what it's doing?  Maybe
>>> grab a backtrace with GDB from it before anything else.
>>>
>>> My guess is 27324 is the autovac launcher and the others are autovac
>>> workers just as they die.
>>>
>> You are correct. 27324 is the launcher and the others are autovac
>> workers. Here's the strace of the stats collector process:
>>
>> getppid()                               = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid()                               = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid()                               = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> ....rinse...lather...repeat...ad nauseam...
> Weird ... even across more "pgstat wait timeout" messages?  It's like
> it's not getting the "inquiry" messages that would tell it to write the
> file ... something wrong with the UDP socket perhaps?
>
Bingo!

postgres  27325 postgres    8u     *IPv6*            5379428        0t0        UDP localhost:47204->localhost:47204

While diagnosing a network timeout issue over an IPv4-to-IPv4 VPN, I
disabled IPv6 via sysctl on this machine and pretty much forgot about
it, since we are still IPv4 internally. But PostgreSQL had already
established a (now non-functional) IPv6 local connection. Re-enabling
IPv6, which turned out to be unrelated to the VPN timeouts, corrected
the "pgstat wait timeout" issue.
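
For anyone who wants to test for this failure mode from outside
PostgreSQL, here is a minimal sketch. It is not PostgreSQL code. It only
imitates the kind of self-connected UDP loopback socket seen in the lsof
output above, and the function name udp_loopback_works is made up for
illustration. It assumes that a broken loopback shows up either as a
socket error or as a receive timeout.

```python
import socket


def udp_loopback_works(family=socket.AF_INET6, timeout=2.0):
    """Return True if one byte survives a round trip through a UDP
    socket that is bound to loopback and connected to itself.

    Hypothetical helper for illustration only; it is not how
    PostgreSQL itself is implemented.
    """
    host = "::1" if family == socket.AF_INET6 else "127.0.0.1"
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError:
        # The address family is not available at all.
        return False
    try:
        sock.settimeout(timeout)
        sock.bind((host, 0))              # let the kernel pick a free port
        sock.connect(sock.getsockname())  # connect the socket to itself
        sock.send(b"x")
        return sock.recv(1) == b"x"
    except OSError:
        # Covers send/bind errors as well as the receive timeout.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print("IPv4 loopback UDP ok:", udp_loopback_works(socket.AF_INET))
    print("IPv6 loopback UDP ok:", udp_loopback_works(socket.AF_INET6))
```

If the IPv6 check fails while the IPv4 check succeeds, the sysctl
setting for IPv6 would be the first thing to look at. Note that this
opens a fresh socket, so it reports the current state of the loopback
interface, not the state of a socket a running server opened earlier.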

Cheers,
Steve


