Thread: Unresolved Win32 bug reports

Unresolved Win32 bug reports

From
Bruce Momjian
Date:
Folks, my mailbox is filling with unresolved Win32 bug reports,
specifically:
    integer division
    shared memory
    statistics collector
    rename
    fsync

I have put the emails at the bottom of the patches_hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Unresolved Win32 bug reports

From
"Jim C. Nasby"
Date:
Here's one to add to the list: running pgbench with a moderately heavy
load on an SMP box likes to trigger a state where the database (or
pgbench) just stops doing work (CPU usage drops to nothing, as does disk
activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
one a 4 way), and a dual Opteron, all running the latest windows binary.
A 50 connection test running 1000 transactions is pretty much ensured to
fail.

I've been unable to produce the same behavior on a single-proc machine.

Please let me know if there's any more info that would be helpful.

On Thu, Apr 20, 2006 at 07:02:01AM -0400, Bruce Momjian wrote:
> Folks, my mailbox is filling with unresolved Win32 bug reports,
> specifically:
> 
>     integer division
>     shared memory
>     statistics collector
>     rename
>     fsync
> 
> I have put the emails at the bottom of the patches_hold queue:
> 
>     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
> 
> -- 
>   Bruce Momjian   http://candle.pha.pa.us
>   EnterpriseDB    http://www.enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Unresolved Win32 bug reports

From
Martijn van Oosterhout
Date:
On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> Here's one to add to the list: running pgbench with a moderately heavy
> load on an SMP box likes to trigger a state where the database (or
> pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
> one a 4 way), and a dual Opteron, all running the latest windows binary.
> A 50 connection test running 1000 transactions is pretty much ensured to
> fail.

Well, this sounds like a deadlock; the obvious step would be to
attach gdb to both and get a stack trace...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
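
A minimal sketch of the "attach gdb and get a stack trace" step suggested above, written in Python. It only builds the gdb command lines; the helper names and the example process IDs are hypothetical and not from this thread. The gdb options used (`-p`, `-batch`, `-ex`) are standard gdb options, but whether the mingw build of gdb handles them reliably on Windows is not established here.

```python
def gdb_backtrace_argv(pid, gdb="gdb"):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace for every thread, and then exits (batch mode)."""
    return [gdb, "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def backtrace_commands(pids):
    """One gdb command line per process, e.g. a stuck backend and pgbench."""
    return [gdb_backtrace_argv(pid) for pid in pids]


# Hypothetical PIDs, purely for illustration.
for argv in backtrace_commands([1234, 5678]):
    print(" ".join(argv))
```

Each printed line can be run from a shell that has gdb on its PATH; the process IDs would come from the task list on the affected machine.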

Re: Unresolved Win32 bug reports

From
"Jim C. Nasby"
Date:
On Thu, Apr 20, 2006 at 07:25:15PM +0200, Martijn van Oosterhout wrote:
> On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> > Here's one to add to the list: running pgbench with a moderately heavy
> > load on an SMP box likes to trigger a state where the database (or
> > pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> > activity). I've been able to repro this on 2 Intel boxes (one a 2 way,
> > one a 4 way), and a dual Opteron, all running the latest windows binary.
> > A 50 connection test running 1000 transactions is pretty much ensured to
> > fail.
> 
> Well, this sounds like a deadlock; the obvious step would be to
> attach gdb to both and get a stack trace...

Any pointers on how to get that setup? Is gdb part of the mingw runtime?

BTW, this appears to be readily reproducible, so it might be a lot more
productive for one of the windows hackers to test this themselves...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Unresolved Win32 bug reports

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
>> Here's one to add to the list: running pgbench with a moderately heavy
>> load on an SMP box likes to trigger a state where the database (or
>> pgbench) just stops doing work (CPU usage drops to nothing, as does disk
>> activity).

> Well, this sounds like a deadlock; the obvious step would be to
> attach gdb to both and get a stack trace...

Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
windows semaphore code?  It's clearly windows-specific since no one's
ever reported any such thing on Unixen.
        regards, tom lane


Re: Unresolved Win32 bug reports

From
"Magnus Hagander"
Date:
> > > pgbench) just stops doing work (CPU usage drops to nothing, as does
> > > disk activity). I've been able to repro this on 2 Intel boxes (one a
> > > 2 way, one a 4 way), and a dual Opteron, all running the latest
> > > windows binary.
> > > A 50 connection test running 1000 transactions is pretty much
> > > ensured to fail.
> >
> > Well, this sounds like a deadlock; the obvious step would be to
> > attach gdb to both and get a stack trace...
>
> Any pointers on how to get that setup? Is gdb part of the
> mingw runtime?

Yes. It's quite crappy compared to on unix though - I've never been able
to make it do the right thing all the way :-(


> BTW, this appears to be readily reproducible, so it might be
> a lot more productive for one of the windows hackers to test
> this themselves...

It requires a multi-CPU box, right? I don't have one with pgwin32 on
ATM. Do you know if it's enough with hyperthreading?

//Magnus


Re: Unresolved Win32 bug reports

From
"Jim C. Nasby"
Date:
On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:
> It requires a multi-CPU box, right? I don't have one with pgwin32 on
> ATM. Do you know if it's enough with hyperthreading?

Hrm... not sure. Let me see if I can find a box with HT here and test
it. Running the following batch file with arguments of 40 40 1000 is
almost guaranteed to trigger the problem, though...

@echo off
dropdb bench
createdb bench
pgbench -i -s %1 bench
pgbench -t %3 -c %2 -n bench
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
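
A rough Python equivalent of the batch file above, as a sketch only. It runs the same four commands with the same pgbench options (`-i`, `-s`, `-t`, `-c`, `-n`) and adds a timeout so that a run which stops making progress is reported instead of waiting forever. The function names and the one-hour default timeout are assumptions for illustration, not something from this thread. A timeout alone cannot tell a true hang from a run that is merely slow (for example, disk bound), so CPU and disk activity still need to be checked by hand.

```python
import subprocess


def build_repro_commands(scale, clients, transactions, dbname="bench"):
    """Return the batch file's four commands as argv lists."""
    return [
        ["dropdb", dbname],
        ["createdb", dbname],
        ["pgbench", "-i", "-s", str(scale), dbname],
        ["pgbench", "-t", str(transactions), "-c", str(clients), "-n", dbname],
    ]


def run_repro(scale, clients, transactions, timeout_s=3600):
    """Run the commands in order.

    Returns True if every command finished within timeout_s seconds,
    False as soon as one of them times out. timeout_s is an arbitrary
    assumption; pick a value well above a normal run time.
    """
    for argv in build_repro_commands(scale, clients, transactions):
        try:
            # check=False: like the batch file, keep going if dropdb
            # fails because the database does not exist yet.
            subprocess.run(argv, timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            print("timed out:", " ".join(argv))
            return False
    return True
```

Calling `run_repro(40, 40, 1000)` corresponds to running the batch file with arguments of 40 40 1000.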


Re: Unresolved Win32 bug reports

From
"Larry Rosenman"
Date:
Jim C. Nasby wrote:
> On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:
>> It requires a multi-CPU box, right? I don't have one with pgwin32 on
>> ATM. Do you know if it's enough with hyperthreading?
>
> Hrm... not sure. Let me see if I can find a box with HT here and test
> it. Running the following batch file with arguments of 40 40 1000 is
> almost guaranteed to trigger the problem, though...
>
> @echo off
> dropdb bench
> createdb bench
> pgbench -i -s %1 bench
> pgbench -t %3 -c %2 -n bench

It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.

:(

LER


--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX  78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com


Re: Unresolved Win32 bug reports

From
"Larry Rosenman"
Date:
Larry Rosenman wrote:
> Jim C. Nasby wrote:
>> On Thu, Apr 20, 2006 at 08:06:30PM +0200, Magnus Hagander wrote:
>>> It requires a multi-CPU box, right? I don't have one with pgwin32 on
>>> ATM. Do you know if it's enough with hyperthreading?
>>
>> Hrm... not sure. Let me see if I can find a box with HT here and test
>> it. Running the following batch file with arguments of 40 40 1000 is
>> almost guaranteed to trigger the problem, though...
>>
>> @echo off
>> dropdb bench
>> createdb bench
>> pgbench -i -s %1 bench
>> pgbench -t %3 -c %2 -n bench
>
> It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
>
> :(
>
> LER

I may have spoken too soon :(

More in a bit.

LER


--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX  78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com


Re: Unresolved Win32 bug reports

From
"Jim C. Nasby"
Date:
On Thu, Apr 20, 2006 at 02:17:35PM -0500, Larry Rosenman wrote:
> > It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
> > 
> > :(
> > 
> > LER
> 
> I may have spoken too soon :(

I took a look and in fact the machine was just disk bound, so it appears
that either HT doesn't exhibit this behavior, or XP doesn't exhibit it
(all the machines I produced the error on are running w2k3 server).

I'll try and pin down better exactly what hardware/software will
reproduce this. In the meantime, if anyone has any good info for getting
a dump of one of these processes...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Unresolved Win32 bug reports

From
"Bort, Paul"
Date:
Some of the SysInternals tools might be a start.

ProcessExplorer provides information about processes:
http://www.sysinternals.com/Utilities/ProcessExplorer.html

DebugView shows Debugging output (not sure if PG uses this):
http://www.sysinternals.com/Utilities/DebugView.html

Also, I haven't used it, but this looks like the Windows equivalent of
gdb:
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx



> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jim C. Nasby
> Sent: Thursday, April 20, 2006 4:14 PM
> To: Larry Rosenman
> Cc: Magnus Hagander; Martijn van Oosterhout; Bruce Momjian;
> PostgreSQL-development
> Subject: Re: [HACKERS] Unresolved Win32 bug reports
>
> On Thu, Apr 20, 2006 at 02:17:35PM -0500, Larry Rosenman wrote:
> > > It seems to hang up just fine on my XPSP2, PG 8.1.2 HTT box.
> > >
> > > :(
> > >
> > > LER
> >
> > I may have spoken too soon :(
>
> I took a look and in fact the machine was just disk bound, so it appears
> that either HT doesn't exhibit this behavior, or XP doesn't exhibit it
> (all the machines I produced the error on are running w2k3 server).
>
> I'll try and pin down better exactly what hardware/software will
> reproduce this. In the meantime, if anyone has any good info for getting
> a dump of one of these processes...
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


Re: Unresolved Win32 bug reports

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
> Folks, my mailbox is filling with unresolved Win32 bug reports,
> specifically:
>
>     integer division
>     shared memory
>     statistics collector
>     rename
>     fsync
>
> I have put the emails at the bottom of the patches_hold queue:
>
>     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>

There's also a pg_config buglet that David Fetter found that still needs to
be fixed.

I am currently travelling on family business, but when I return home in a
couple of weeks will be working on getting my new machine built, and
installing a permanent Windows VM (among others), which will make it easier
for me to look at Windows issues within my realm of competence.

cheers

andrew




Re: Unresolved Win32 bug reports

From
"Qingqing Zhou"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> >> Here's one to add to the list: running pgbench with a moderately heavy
> >> load on an SMP box likes to trigger a state where the database (or
> >> pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> >> activity).
>
> > Well, this sounds like a deadlock; the obvious step would be to
> > attach gdb to both and get a stack trace...
>
> Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
> windows semaphore code?  It's clearly windows-specific since no one's
> ever reported any such thing on Unixen.
>

I also suspect the EAGAIN error reports are related to the semaphore code.
So if possible, I suggest we patch the code and test it.

Regards,
Qingqing




Re: Unresolved Win32 bug reports

From
"Jim C. Nasby"
Date:
On Mon, Apr 24, 2006 at 10:23:07AM +0800, Qingqing Zhou wrote:
> 
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > On Thu, Apr 20, 2006 at 12:17:07PM -0500, Jim C. Nasby wrote:
> > >> Here's one to add to the list: running pgbench with a moderately heavy
> > >> load on an SMP box likes to trigger a state where the database (or
> > >> pgbench) just stops doing work (CPU usage drops to nothing, as does disk
> > >> activity).
> >
> > > Well, this sounds like a deadlock; the obvious step would be to
> > > attach gdb to both and get a stack trace...
> >
> > Yeah, I wonder if it's related to that apparent bug Qingqing saw in the
> > windows semaphore code?  It's clearly windows-specific since no one's
> > ever reported any such thing on Unixen.
> >
> 
> I also suspect the EAGAIN error reports are related to the semaphore code.
> So if possible, I suggest we patch the code and test it.

Is there a patched build available for testing? (I'd rather not have to
figure out how to get windows builds working, unless there's some kind
of instructions somewhere...)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Unresolved Win32 bug reports

From
"Qingqing Zhou"
Date:
""Jim C. Nasby"" <jnasby@pervasive.com> wrote
>
> Is there a patched build available for testing? (I'd rather not have to
> figure out how to get windows builds working, unless there's some kind
> of instructions somewhere...)
> -- 

Not yet - the patch is still pending.

Regards,
Qingqing