Thread: Re: pgstat wait timeout (by Robert Schnabel)

Re: pgstat wait timeout (by Robert Schnabel)

From
Alexey Luchko
Date:
Hi!

We've got the same issue with PostgreSQL 8.4.1 on 32 bit Windows 2003
Server RC2. Every minute the message "pgstat wait timeout" appears in Event
Viewer / Application Log.

We've tried to restart the service. It was looking like the service stopped
successfully, but two postgres.exe processes left running. One of them was
utilizing one core. It had no tcp connections in any state. It was
"C:/Program Files/PostgreSQL/8.4/bin/postgres.exe" "--forkcol" "1228"

sysinternals' procexp.exe showes the following stack for this process:
"""
ntkrnlpa.exe+0x8dace
ntkrnlpa.exe+0x29a62
ntkrnlpa.exe+0x33178
hal.dll+0x6199
hal.dll+0x63d9
hal.dll+0x6577
hal.dll+0x3902
ntkrnlpa.exe+0x89721
ntdll.dll!KiFastSystemCallRet
mswsock.dll!ServiceMain+0x2949
mswsock.dll!NSPStartup+0x1893
WS2_32.dll!WSAUnhookBlockingHook+0x486
WS2_32.dll!WSAUnhookBlockingHook+0x660
WS2_32.dll!WSAEventSelect+0x2f
postgres.exe!mdpostckpt+0x181b38
postgres.exe!mdpostckpt+0x18bbdb
"""

After it was killed the following appeared in the application log:
LOG:  statistics collector process (PID 2680) exited with exit code 1


The service was started again and was working for aprox. 18 hours with no
errors. Then the message started appearing again.

The plan is to try PG 8.4.2.

Advices and questions are welcome ;)


--
Alexey

sms stands for save my soul

Re: pgstat wait timeout (by Robert Schnabel)

From
Alvaro Herrera
Date:
Alexey Luchko wrote:

> We've tried to restart the service. It was looking like the service
> stopped successfully, but two postgres.exe processes left running.
> One of them was utilizing one core. It had no tcp connections in any
> state. It was
> "C:/Program Files/PostgreSQL/8.4/bin/postgres.exe" "--forkcol" "1228"
>
> sysinternals' procexp.exe showes the following stack for this process:
> """
> ntkrnlpa.exe+0x8dace
> ntkrnlpa.exe+0x29a62
> ntkrnlpa.exe+0x33178
> hal.dll+0x6199
> hal.dll+0x63d9
> hal.dll+0x6577
> hal.dll+0x3902
> ntkrnlpa.exe+0x89721
> ntdll.dll!KiFastSystemCallRet
> mswsock.dll!ServiceMain+0x2949
> mswsock.dll!NSPStartup+0x1893
> WS2_32.dll!WSAUnhookBlockingHook+0x486
> WS2_32.dll!WSAUnhookBlockingHook+0x660
> WS2_32.dll!WSAEventSelect+0x2f
> postgres.exe!mdpostckpt+0x181b38
> postgres.exe!mdpostckpt+0x18bbdb
> """

This stack trace seems bogus.  The stats collector should not be calling
mdpostckpt at all.


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pgstat wait timeout (by Robert Schnabel)

From
Alexey Luchko
Date:
On 16-12-2009 16:28, Alvaro Herrera wrote:
> Alexey Luchko wrote:
>
>> We've tried to restart the service. It was looking like the service
>> stopped successfully, but two postgres.exe processes left running.
>> One of them was utilizing one core. It had no tcp connections in any
>> state. It was
>> "C:/Program Files/PostgreSQL/8.4/bin/postgres.exe" "--forkcol" "1228"
>>
>> sysinternals' procexp.exe showes the following stack for this process:
>> """
>> ntkrnlpa.exe+0x8dace
>> ntkrnlpa.exe+0x29a62
>> ntkrnlpa.exe+0x33178
>> hal.dll+0x6199
>> hal.dll+0x63d9
>> hal.dll+0x6577
>> hal.dll+0x3902
>> ntkrnlpa.exe+0x89721
>> ntdll.dll!KiFastSystemCallRet
>> mswsock.dll!ServiceMain+0x2949
>> mswsock.dll!NSPStartup+0x1893
>> WS2_32.dll!WSAUnhookBlockingHook+0x486
>> WS2_32.dll!WSAUnhookBlockingHook+0x660
>> WS2_32.dll!WSAEventSelect+0x2f
>> postgres.exe!mdpostckpt+0x181b38
>> postgres.exe!mdpostckpt+0x18bbdb
>> """
>
> This stack trace seems bogus.  The stats collector should not be calling
> mdpostckpt at all.

It seems that the process left and the stack trace is from the process
waiting for the stats collector.

--
Alexey

Re: pgstat wait timeout (by Robert Schnabel)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Alexey Luchko wrote:
>> postgres.exe!mdpostckpt+0x181b38
>> postgres.exe!mdpostckpt+0x18bbdb

> This stack trace seems bogus.  The stats collector should not be calling
> mdpostckpt at all.

Given the size of the offsets, it's pretty clear that it's not actually
in mdpostckpt at all.  Probably this is a stripped executable and
mdpostckpt is just the nearest global symbol.

IOW, not necessarily bogus, but still useless :-(

            regards, tom lane

Re: pgstat wait timeout (by Robert Schnabel)

From
Magnus Hagander
Date:
On Wed, Dec 16, 2009 at 16:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Alexey Luchko wrote:
>>> postgres.exe!mdpostckpt+0x181b38
>>> postgres.exe!mdpostckpt+0x18bbdb
>
>> This stack trace seems bogus. =A0The stats collector should not be calli=
ng
>> mdpostckpt at all.
>
> Given the size of the offsets, it's pretty clear that it's not actually
> in mdpostckpt at all. =A0Probably this is a stripped executable and
> mdpostckpt is just the nearest global symbol.

If that is so, the OP should follow:
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreS=
QL_backend_on_Windows

specifically the part about symbols.

Which it certainly looks like :)


--=20
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pgstat wait timeout (by Robert Schnabel)

From
Robert Haas
Date:
On Wed, Dec 16, 2009 at 10:45 AM, Magnus Hagander <magnus@hagander.net> wro=
te:
> On Wed, Dec 16, 2009 at 16:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Alexey Luchko wrote:
>>>> postgres.exe!mdpostckpt+0x181b38
>>>> postgres.exe!mdpostckpt+0x18bbdb
>>
>>> This stack trace seems bogus. =A0The stats collector should not be call=
ing
>>> mdpostckpt at all.
>>
>> Given the size of the offsets, it's pretty clear that it's not actually
>> in mdpostckpt at all. =A0Probably this is a stripped executable and
>> mdpostckpt is just the nearest global symbol.
>
> If that is so, the OP should follow:
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_Postgr=
eSQL_backend_on_Windows
>
> specifically the part about symbols.
>
> Which it certainly looks like :)

We should really take some of the vast quantity of really useful
information that is in the wiki and try to index it somehow or
incorporate it into the docs.  I'm always learning about new pages
that have good stuff on them, but I never seem to stumble across them
organically.

...Robert

Re: pgstat wait timeout (by Robert Schnabel)

From
Alvaro Herrera
Date:
Robert Haas escribió:

> We should really take some of the vast quantity of really useful
> information that is in the wiki and try to index it somehow or
> incorporate it into the docs.  I'm always learning about new pages
> that have good stuff on them, but I never seem to stumble across them
> organically.

Maybe we should have a new wiki category, say "pages", that groups all
the existant pages in the wiki.  That way you could browse it and find
everything you'd ever want.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pgstat wait timeout (by Robert Schnabel)

From
Robert Haas
Date:
On Wed, Dec 16, 2009 at 11:05 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Robert Haas escribi=F3:
>
>> We should really take some of the vast quantity of really useful
>> information that is in the wiki and try to index it somehow or
>> incorporate it into the docs. =A0I'm always learning about new pages
>> that have good stuff on them, but I never seem to stumble across them
>> organically.
>
> Maybe we should have a new wiki category, say "pages", that groups all
> the existant pages in the wiki. =A0That way you could browse it and find
> everything you'd ever want.

Unfortunately, I'd likely find everything I didn't want, too.  :-(

...Robert

Re: pgstat wait timeout (by Robert Schnabel)

From
Alexey Luchko
Date:
On 16-12-2009 17:45, Magnus Hagander wrote:
> If that is so, the OP should follow:
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
>
> specifically the part about symbols.
>
> Which it certainly looks like :)

How can I find the stats collector's process?
What are others processes of interest?

Note. The problem appeared in a production system.
It's preferably not to install lot's of additional software.


Thanks in advance!

--
Alexey

sms stands for save my soul

Re: pgstat wait timeout (by Robert Schnabel)

From
Magnus Hagander
Date:
On Wed, Dec 16, 2009 at 17:43, Alexey Luchko <luch@ank-sia.com> wrote:
> On 16-12-2009 17:45, Magnus Hagander wrote:
>>
>> If that is so, the OP should follow:
>>
>> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
>>
>> specifically the part about symbols.
>>
>> Which it certainly looks like :)
>
> How can I find the stats collector's process?

Process explorer will let you see a Mutex named with a description of
the process. Try that.


> Note. The problem appeared in a production system.
> It's preferably not to install lot's of additional software.

The symbols aren't additional software, they are a part of postgresql.

Something like process explorer or windbg is necessary though (but
it's from microsoft, if that helps), because Windows doesn't ship with
any usable debugging tools.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pgstat wait timeout (by Robert Schnabel)

From
Craig Ringer
Date:
On 16/12/2009 11:55 PM, Robert Haas wrote:
> On Wed, Dec 16, 2009 at 10:45 AM, Magnus Hagander<magnus@hagander.net>  wrote:
>> On Wed, Dec 16, 2009 at 16:12, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> Alvaro Herrera<alvherre@commandprompt.com>  writes:
>>>> Alexey Luchko wrote:
>>>>> postgres.exe!mdpostckpt+0x181b38
>>>>> postgres.exe!mdpostckpt+0x18bbdb
>>>
>>>> This stack trace seems bogus.  The stats collector should not be calling
>>>> mdpostckpt at all.
>>>
>>> Given the size of the offsets, it's pretty clear that it's not actually
>>> in mdpostckpt at all.  Probably this is a stripped executable and
>>> mdpostckpt is just the nearest global symbol.
>>
>> If that is so, the OP should follow:
>> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
>>
>> specifically the part about symbols.
>>
>> Which it certainly looks like :)
>
> We should really take some of the vast quantity of really useful
> information that is in the wiki and try to index it somehow or
> incorporate it into the docs.  I'm always learning about new pages
> that have good stuff on them, but I never seem to stumble across them
> organically.

MediaWiki offers features to make this a lot easier. The categories
feature in particular is great: go to the page:

    Category:Blah

and you'll see a listing of all pages containing a link to
[[Category:Blah]]. The existing list of all categories can be found via
the "special pages" link in the sidebar:

   http://wiki.postgresql.org/wiki/Special:Categories

A "favourite" or "highlight" category would be one way to do what you're
talking about.

--
Craig Ringer

Re: pgstat wait timeout (by Robert Schnabel)

From
Robert Haas
Date:
On Wed, Dec 16, 2009 at 9:03 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 16/12/2009 11:55 PM, Robert Haas wrote:
>>
>> On Wed, Dec 16, 2009 at 10:45 AM, Magnus Hagander<magnus@hagander.net>
>> =A0wrote:
>>>
>>> On Wed, Dec 16, 2009 at 16:12, Tom Lane<tgl@sss.pgh.pa.us> =A0wrote:
>>>>
>>>> Alvaro Herrera<alvherre@commandprompt.com> =A0writes:
>>>>>
>>>>> Alexey Luchko wrote:
>>>>>>
>>>>>> postgres.exe!mdpostckpt+0x181b38
>>>>>> postgres.exe!mdpostckpt+0x18bbdb
>>>>
>>>>> This stack trace seems bogus. =A0The stats collector should not be
>>>>> calling
>>>>> mdpostckpt at all.
>>>>
>>>> Given the size of the offsets, it's pretty clear that it's not actually
>>>> in mdpostckpt at all. =A0Probably this is a stripped executable and
>>>> mdpostckpt is just the nearest global symbol.
>>>
>>> If that is so, the OP should follow:
>>>
>>> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_Post=
greSQL_backend_on_Windows
>>>
>>> specifically the part about symbols.
>>>
>>> Which it certainly looks like :)
>>
>> We should really take some of the vast quantity of really useful
>> information that is in the wiki and try to index it somehow or
>> incorporate it into the docs. =A0I'm always learning about new pages
>> that have good stuff on them, but I never seem to stumble across them
>> organically.
>
> MediaWiki offers features to make this a lot easier. The categories featu=
re
> in particular is great: go to the page:
>
> =A0 Category:Blah
>
> and you'll see a listing of all pages containing a link to
> [[Category:Blah]]. The existing list of all categories can be found via t=
he
> "special pages" link in the sidebar:
>
> =A0http://wiki.postgresql.org/wiki/Special:Categories
>
> A "favourite" or "highlight" category would be one way to do what you're
> talking about.

I don't think that's it exactly.  What I think would be really useful
is a page that is linked off the main page or some other reasonably
prominent place that lists useful resources by topic.  I guess the
category interface could do that if it were maintained, but just
"which pages are good" is not quite what I had in mind.

...Robert

Re: pgstat wait timeout

From
Alexey Luchko
Date:
Hi!

On 16-12-2009 17:45, Magnus Hagander wrote:
 >
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
Thanks, it helps!

Yesterday evening Windows on the server were restarted.
Today the messages still appear, but not so regularly as before.
There was one message per minute. Now the rate is lower.

While I was looking at it the WARNING:  pgstat wait timeout
appeared at ~16:48, 16:54, 17:10, 17:15, 17:16. Then I disconnected.

I've found the stats collector process by the stack trace:
"C:/Program Files/PostgreSQL/8.4/bin/postgres.exe" "--forkcol" "1228"

17:00
"""
ntkrnlpa.exe!ExAllocatePoolWithTag+0xfa0
ntkrnlpa.exe!KeInitializeMutant+0x1c
ntkrnlpa.exe!MiFlushSectionInternal+0x89c
ntkrnlpa.exe!MiFlushSectionInternal+0xba4
ntkrnlpa.exe!KeI386Call16BitCStyleFunction+0x55
ntkrnlpa.exe!PopFlushVolumeWorker+0x12e
ntkrnlpa.exe!PopFlushVolumes+0x72
ntkrnlpa.exe!MiReserveSystemPtes+0x1ca
ntdll.dll!KiFastSystemCall+0x3
ntdll.dll!KiFastSystemCallRet
ntdll.dll!NtWaitForMultipleObjects+0xc
kernel32.dll!WaitForMultipleObjectsEx+0x11a
postgres.exe!pgwin32_waitforsinglesocket+0x1f0
postgres.exe!PgstatCollectorMain+0x175
postgres.exe!SubPostmasterMain+0x352
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23
"""

17:12
"""
ntkrnlpa.exe!KiSwapContext+0x26
ntkrnlpa.exe!KiSwapThread+0x2e5
ntkrnlpa.exe!KeWaitForSingleObject+0x346
ntkrnlpa.exe!KiSuspendThread+0x18
ntkrnlpa.exe!KiDeliverApc+0x117
ntkrnlpa.exe!KiSwapThread+0x300
ntkrnlpa.exe!KeWaitForMultipleObjects+0x3d7
ntkrnlpa.exe!ObpWaitForMultipleObjects+0x202
ntkrnlpa.exe!NtWaitForMultipleObjects+0xc8
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntdll.dll!KiFastSystemCallRet
ntdll.dll!NtWaitForMultipleObjects+0xc
kernel32.dll!WaitForMultipleObjectsEx+0x11a
postgres.exe!pgwin32_waitforsinglesocket+0x1f0
postgres.exe!PgstatCollectorMain+0x175
postgres.exe!SubPostmasterMain+0x352
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23
"""

17:18
"""
ntkrnlpa.exe!KiSwapContext+0x26
ntkrnlpa.exe!KiSwapThread+0x2e5
ntkrnlpa.exe!KeWaitForSingleObject+0x346
ntkrnlpa.exe!KiSuspendThread+0x18
ntkrnlpa.exe!KiDeliverApc+0x117
ntkrnlpa.exe!KiSwapThread+0x300
ntkrnlpa.exe!KeWaitForMultipleObjects+0x3d7
ntkrnlpa.exe!ObpWaitForMultipleObjects+0x202
ntkrnlpa.exe!NtWaitForMultipleObjects+0xc8
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntdll.dll!KiFastSystemCallRet
ntdll.dll!NtWaitForMultipleObjects+0xc
kernel32.dll!WaitForMultipleObjectsEx+0x11a
postgres.exe!pgwin32_waitforsinglesocket+0x1f0
postgres.exe!PgstatCollectorMain+0x175
postgres.exe!SubPostmasterMain+0x352
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23
"""

And autovacuum launcher:
"C:/Program Files/PostgreSQL/8.4/bin/postgres.exe" "--forkavlauncher" "1232"
"""
ntkrnlpa.exe!ExAllocatePoolWithTag+0xfa0
ntkrnlpa.exe!KeInitializeMutant+0x1c
ntkrnlpa.exe!MiFlushSectionInternal+0x89c
ntkrnlpa.exe!MiFlushSectionInternal+0xba4
ntkrnlpa.exe!KeInitializeMutant+0x1c
ntkrnlpa.exe!PoGetLightestSystemStateForEject+0x132
ntkrnlpa.exe!MiReserveSystemPtes+0x1ca
ntdll.dll!KiFastSystemCall+0x3
ntdll.dll!KiFastSystemCallRet
ntdll.dll!NtWaitForSingleObject+0xc
kernel32.dll!WaitForSingleObjectEx+0xac
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x36
postgres.exe!AutoVacLauncherMain+0x24c
postgres.exe!SubPostmasterMain+0x2b5
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23
"""

17:12
"""
ntkrnlpa.exe!KiSwapContext+0x26
ntkrnlpa.exe!KiSwapThread+0x2e5
ntkrnlpa.exe!KeWaitForSingleObject+0x346
ntkrnlpa.exe!KiSuspendThread+0x18
ntkrnlpa.exe!KiDeliverApc+0x117
ntkrnlpa.exe!KiSwapThread+0x300
ntkrnlpa.exe!KeWaitForSingleObject+0x346
ntkrnlpa.exe!NtWaitForSingleObject+0x9a
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntdll.dll!KiFastSystemCallRet
ntdll.dll!NtWaitForSingleObject+0xc
kernel32.dll!WaitForSingleObjectEx+0xac
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x36
postgres.exe!AutoVacLauncherMain+0x24c
postgres.exe!SubPostmasterMain+0x2b5
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23
"""

17:17
"""
ntkrnlpa.exe!KiSwapContext+0x26
ntkrnlpa.exe!KiSwapThread+0x2e5
ntkrnlpa.exe!KeWaitForSingleObject+0x346
ntkrnlpa.exe!KiSuspendThread+0x18
ntkrnlpa.exe!KiDeliverApc+0x117
ntkrnlpa.exe!KiSwapThread+0x300
ntkrnlpa.exe!KeWaitForSingleObject+0x346
ntkrnlpa.exe!NtWaitForSingleObject+0x9a
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntdll.dll!KiFastSystemCallRet
ntdll.dll!NtWaitForSingleObject+0xc
kernel32.dll!WaitForSingleObjectEx+0xac
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x36
postgres.exe!AutoVacLauncherMain+0x24c
postgres.exe!SubPostmasterMain+0x2b5
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23
"""


Process Explorer showed the following for both processes
in process properties / threads:
"""
postgres.exe!mainCRTStartup
biolsp.dll!GetLspGuid+0xc00
biolsp.dll!GetLspGuid+0xc00  (second time)
WS2HELP.dll!ApcThread
postgres.exe!pg    signal_thread
"""

Please, ask if one needs more details.


--
Alexey

sms stands for save my soul