Win XP SP2 SMP locking (8.1.4)

From
Oleg Bartunov
Date:
Hi there,

I'm looking into strange locking that happens on a WinXP SP2 SMP
machine running 8.1.4 with stats_row_level=on. This is the only
combination (number of CPUs and stats_row_level) that shows the
problem: SMP + stats_row_level.

The same test runs fine with one CPU (machine restarted with /numproc=1),
regardless of the stats_row_level setting.

The customer's application loads data into the database, and sometimes the
process stops: no CPU, no I/O activity. PgAdmin shows the current query is
'COMMIT'. I tried to attach gdb to the postgres and client processes, but
the backtraces look useless (see below). Running vacuum analyze on this
database in a separate process causes the loading process to continue!
Weird.

Interestingly, there is no problem with 8.2beta1 in any combination.
Any idea which changes between 8.1.4 and 8.2beta1 could affect the
problem?


postgres.exe:

(gdb) bt
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2  0x00000005 in ?? ()
#3  0x00000004 in ?? ()
#4  0x00000001 in ?? ()
#5  0x019effd0 in ?? ()
#6  0xf784e548 in ?? ()
#7  0xffffffff in ?? ()
#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9  0x7c9507c8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x19f0000


application:
(gdb) bt
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2  0x00000005 in ?? ()
#3  0x00000004 in ?? ()
#4  0x00000001 in ?? ()
#5  0x0196ffd0 in ?? ()
#6  0x7c97c0d8 in ntdll!NtAccessCheckByTypeResultListAndAuditAlarm ()
#7  0xffffffff in ?? ()
#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9  0x7c9507c8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x1970000
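
One rough way to quantify how little these backtraces say is to count the frames gdb managed to resolve to a symbol. Below is a minimal Python sketch, assuming the `bt` output has been saved as text with one frame per line. The function name `summarize_backtrace` and the regular expression are illustrative only; they are not part of gdb or PostgreSQL.

```python
import re

# Matches one gdb frame line such as
#   "#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll"
# or "#2  0x00000005 in ?? ()". The pattern is an assumption based on
# the output shown above, not a full grammar for gdb backtraces.
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in (\S+) \(\)")


def summarize_backtrace(text):
    """Return (resolved, unresolved) lists of (frame_no, symbol_or_address)."""
    resolved, unresolved = [], []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip continuation lines such as the DLL path
        number, address, symbol = match.groups()
        if symbol == "??":
            unresolved.append((int(number), address))
        else:
            resolved.append((int(number), symbol))
    return resolved, unresolved


# A few frames copied from the postgres.exe trace above.
sample = r"""
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2  0x00000005 in ?? ()
#3  0x00000004 in ?? ()
#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
"""

resolved, unresolved = summarize_backtrace(sample)
print("resolved:", [name for _, name in resolved])
print("unresolved:", len(unresolved))
```

In both traces the only resolved frames are attributed to ntdll.dll, so nothing from postgres.exe or the client code itself is visible.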

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: Win XP SP2 SMP locking (8.1.4)

From
"Joshua D. Drake"
Date:
> 
> It's interesting, that there is no problem with 8.2beta1 in all
> combinations !  Any idea what changes from 8.1.4 to 8.2beta1 could
> affect the problem ?

What do you mean by locking? Do you mean the PostgreSQL process locks up?
E.g., can you still connect to PostgreSQL from another connection? If not,
is there an error?

Joshua D. Drake




-- 
  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
            http://www.commandprompt.com/
 




Re: Win XP SP2 SMP locking (8.1.4)

From
Oleg Bartunov
Date:
On Thu, 5 Oct 2006, Joshua D. Drake wrote:

>>
>> It's interesting, that there is no problem with 8.2beta1 in all
>> combinations !  Any idea what changes from 8.1.4 to 8.2beta1 could
>> affect the problem ?
>
> What do you mean locking? Do you mean the postgresql process locks up?
> E.g; can you still connect to PostgreSQL from another connection? If not
> is there an error?

It looks like the application is waiting for something from PostgreSQL, but
PostgreSQL thinks it has already done the job. Running vacuum analyze gets
things moving again. I can still connect to PostgreSQL from another
connection; for example, pgAdmin still works with this database.
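
Since a second connection still works, one check that could be run from it is whether any backend has a lock request that has not been granted. Below is a minimal Python sketch, assuming rows from the `pg_locks` view (for example from `SELECT pid, mode, granted FROM pg_locks`) have already been fetched into dictionaries. The helper name `waiting_backends` and the sample rows are made up for illustration; they are not output from the affected machine.

```python
def waiting_backends(lock_rows):
    """Return the sorted pids that have at least one ungranted lock request."""
    return sorted({row["pid"] for row in lock_rows if not row["granted"]})


# Hypothetical rows: every request is granted, so nobody is waiting
# on a heavyweight lock.
all_granted = [
    {"pid": 1204, "mode": "RowExclusiveLock", "granted": True},
    {"pid": 1204, "mode": "ExclusiveLock", "granted": True},
    {"pid": 1316, "mode": "AccessShareLock", "granted": True},
]

# Hypothetical rows: pid 1316 is waiting for a lock.
one_waiting = all_granted + [
    {"pid": 1316, "mode": "ShareLock", "granted": False},
]

print(waiting_backends(all_granted))
print(waiting_backends(one_waiting))
```

An empty result would suggest the stalled backend is not waiting on an ordinary lock visible in `pg_locks`, which would point elsewhere, though it would not prove anything on its own.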


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: Win XP SP2 SMP locking (8.1.4)

From
"Magnus Hagander"
Date:
> Hi there,
>
> I'm looking into strange locking, which happens on WinXP SP2
> SMP machine running 8.1.4 with stats_row_level=on. This is
> the only combination (# of cpu and stats_row_level) which has
> problem - SMP + stats_row_level.
>
> The same test runs fine with one cpu (restarted machine with
> /numproc=1) disregarding to stats_row_level option.
>
> Customer's application loads data into database and sometimes
> process stopped, no cpu, no io activity. PgAdmin shows
> current query is 'COMMIT'.
> I tried to attach gdb to postgres and client processes, but
> backtrace looks useless (see below). Running vacuum analyze
> of this database in separate process cause loading process to
> continue ! Weird.
>
> It's interesting, that there is no problem with 8.2beta1 in
> all combinations !  Any idea what changes from 8.1.4 to
> 8.2beta1 could affect the problem ?

There is a new implementation of semaphores in 8.2. That could possibly
be it.


//Magnus


Re: Win XP SP2 SMP locking (8.1.4)

From
Oleg Bartunov
Date:
On Thu, 5 Oct 2006, Magnus Hagander wrote:

>> Hi there,
>>
>> I'm looking into strange locking, which happens on WinXP SP2
>> SMP machine running 8.1.4 with stats_row_level=on. This is
>> the only combination (# of cpu and stats_row_level) which has
>> problem - SMP + stats_row_level.
>>
>> The same test runs fine with one cpu (restarted machine with
>> /numproc=1) disregarding to stats_row_level option.
>>
>> Customer's application loads data into database and sometimes
>> process stopped, no cpu, no io activity. PgAdmin shows
>> current query is 'COMMIT'.
>> I tried to attach gdb to postgres and client processes, but
>> backtrace looks useless (see below). Running vacuum analyze
>> of this database in separate process cause loading process to
>> continue ! Weird.
>>
>> It's interesting, that there is no problem with 8.2beta1 in
>> all combinations !  Any idea what changes from 8.1.4 to
>> 8.2beta1 could affect the problem ?
>
> There is a new implementations of semaphores in 8.2. That could possibly
> be it.

I backported them to REL8_1_STABLE, but it didn't help. Any other ideas on
what to do, or how to debug the situation?


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: Win XP SP2 SMP locking (8.1.4)

From
"Magnus Hagander"
Date:
> >> I'm looking into strange locking, which happens on WinXP SP2 SMP
> >> machine running 8.1.4 with stats_row_level=on. This is the only
> >> combination (# of cpu and stats_row_level) which has problem -
> >> SMP + stats_row_level.
> >>
> >> The same test runs fine with one cpu (restarted machine with
> >> /numproc=1) disregarding to stats_row_level option.
> >>
> >> Customer's application loads data into database and sometimes
> >> process stopped, no cpu, no io activity. PgAdmin shows current
> >> query is 'COMMIT'.
> >> I tried to attach gdb to postgres and client processes, but
> >> backtrace looks useless (see below). Running vacuum analyze of
> >> this database in separate process cause loading process to
> >> continue ! Weird.
> >>
> >> It's interesting, that there is no problem with 8.2beta1 in all
> >> combinations !  Any idea what changes from 8.1.4 to
> >> 8.2beta1 could affect the problem ?
> >
> > There is a new implementations of semaphores in 8.2. That could
> > possibly be it.
>
> I backported them to REL8_1_STABLE but it doesn't helped. Any other
> idea what to do, or how to debug the situation ?

Unfortunately, the debugger support for mingw is absolutely horrible. But
you can try Process Explorer from www.sysinternals.com and see if it'll
give you a decent backtrace. Sometimes it works when others don't.
Either that, or try the Visual Studio or Windows debuggers; they can
usually at least show you if it's stuck waiting on something in the
kernel.

//Magnus


Re: Win XP SP2 SMP locking (8.1.4)

From
"Rocco Altier"
Date:
Didn't the stats communication process get redone for 8.2?

Or at least some timeout-related stuff.

Since the problem seems to be related to stats_row_level being on, I
wonder if the problem might be in that subsystem. I am guessing that
vacuum pushes some more stats through, which might explain how it
unfreezes things.
-rocco
