Thread: Win XP SP2 SMP locking (8.1.4)
Hi there,

I'm looking into strange locking, which happens on a WinXP SP2 SMP machine running 8.1.4 with stats_row_level=on. This is the only combination (number of CPUs and stats_row_level) that has the problem: SMP + stats_row_level.

The same test runs fine with one CPU (machine restarted with /numproc=1), regardless of the stats_row_level setting.

The customer's application loads data into the database, and sometimes the process just stops: no CPU, no I/O activity. pgAdmin shows the current query as 'COMMIT'. I tried attaching gdb to the postgres and client processes, but the backtraces look useless (see below). Running vacuum analyze on this database in a separate process causes the loading process to continue! Weird.

Interestingly, there is no problem with 8.2beta1 in any combination! Any idea what changes from 8.1.4 to 8.2beta1 could affect the problem?

postgres.exe:

(gdb) bt
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from C:\WINDOWS\system32\ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
#2  0x00000005 in ?? ()
#3  0x00000004 in ?? ()
#4  0x00000001 in ?? ()
#5  0x019effd0 in ?? ()
#6  0xf784e548 in ?? ()
#7  0xffffffff in ?? ()
#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9  0x7c9507c8 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x19f0000

application:

(gdb) bt
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from C:\WINDOWS\system32\ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
#2  0x00000005 in ?? ()
#3  0x00000004 in ?? ()
#4  0x00000001 in ?? ()
#5  0x0196ffd0 in ?? ()
#6  0x7c97c0d8 in ntdll!NtAccessCheckByTypeResultListAndAuditAlarm ()
#7  0xffffffff in ?? ()
#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9  0x7c9507c8 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x1970000

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
> It's interesting, that there is no problem with 8.2beta1 in all
> combinations ! Any idea what changes from 8.1.4 to 8.2beta1 could
> affect the problem ?

What do you mean by locking? Do you mean the PostgreSQL process locks up? E.g., can you still connect to PostgreSQL from another connection? If not, is there an error?

Joshua D. Drake

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/
On Thu, 5 Oct 2006, Joshua D. Drake wrote:

>> It's interesting, that there is no problem with 8.2beta1 in all
>> combinations ! Any idea what changes from 8.1.4 to 8.2beta1 could
>> affect the problem ?
>
> What do you mean locking? Do you mean the postgresql process locks up?
> E.g; can you still connect to PostgreSQL from another connection? If not
> is there an error?

It looks like the application is waiting for something from PostgreSQL, but PostgreSQL thinks it has already done the job. Running vacuum analyze gets things moving again. I could connect to PostgreSQL from another connection; for example, pgAdmin still works with this database.

Regards,
Oleg
> It's interesting, that there is no problem with 8.2beta1 in
> all combinations ! Any idea what changes from 8.1.4 to
> 8.2beta1 could affect the problem ?

There is a new implementation of semaphores in 8.2. That could possibly be it.

//Magnus
On Thu, 5 Oct 2006, Magnus Hagander wrote:

>> It's interesting, that there is no problem with 8.2beta1 in
>> all combinations ! Any idea what changes from 8.1.4 to
>> 8.2beta1 could affect the problem ?
>
> There is a new implementations of semaphores in 8.2. That could possibly
> be it.

I backported them to REL8_1_STABLE, but it didn't help. Any other ideas on what to do, or how to debug the situation?

Regards,
Oleg
> > There is a new implementations of semaphores in 8.2. That could
> > possibly be it.
>
> I backported them to REL8_1_STABLE but it doesn't helped. Any other
> idea what to do, or how to debug the situation ?

Unfortunately, the debugger support for MinGW is absolutely horrible. But you can try Process Explorer from www.sysinternals.com and see if it gives you a decent backtrace; sometimes it works when others don't. Either that, or try the Visual Studio or Windows debuggers: they can usually at least show you whether it's stuck waiting on something in the kernel.

//Magnus
Didn't the stats communication process get redone for 8.2? Or at least some timeout-related stuff. Since the problem seems to be related to stats_row_level being on, I wonder if the problem might be in that subsystem. I am guessing that vacuum pushes some more stats through, which might explain how it allows the load to unfreeze.

-rocco

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
> Magnus Hagander
> Sent: Friday, October 06, 2006 6:00 AM
> To: Oleg Bartunov
> Cc: Pgsql Hackers
> Subject: Re: [HACKERS] Win XP SP2 SMP locking (8.1.4)