Thread: Re: Hanging queries on dual CPU windows

Re: Hanging queries on dual CPU windows

From: "Magnus Hagander"
> > > Could it be they broke it when they did that????
> >
> > In theory, yes, but it still seems a bit far fetched :-(
>
> Well, I rolled back SP1 and am running my test again. Looking
> much better, hasn't locked up in 45mins now, whereas before
> it would lock up within 5mins.
>
> So I think they broke something.

Wow. I guess I was lucky that I didn't say it was impossible :-)


But what is really happening? What other thread is actually holding the
critical section at this point, causing us to block? The only place it
is held is while looping over the signal queue, and it is released while
calling the signal function itself...

But they obviously *have* been messing with critical sections, so maybe
they accidentally changed something else as well...

What bothers me is that nobody else has reported this. It could be that
this was exposed by the changes to the signal handling done for 8.1, and
the people with this level of concurrency are either still on 8.0 or just
not on SP1 for their Windows boxes yet... Do you have any other software
installed on the machine that might possibly interfere in some way?

But let's have it run for a bit longer to confirm this does help. If so,
we could perhaps recode that part using a Mutex instead of a critical
section - since it's not a performance critical path, the difference
shouldn't be large. If I code up a patch for that, can you re-apply SP1
and test it? Or is this a production system you can't really touch?

//Magnus
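
For readers following the thread, the locking pattern described above
(hold the lock only while scanning the signal queue, release it around
the handler call) can be sketched roughly as follows. This is an
illustration only, not the PostgreSQL source. It uses a POSIX pthread
mutex so that it builds anywhere, and the names queue_signal,
dispatch_signals, set_handler, counting_handler and MAX_SIGNALS are
hypothetical. On Windows the lock/unlock pairs would correspond to
EnterCriticalSection/LeaveCriticalSection in the current approach, or to
WaitForSingleObject/ReleaseMutex on a handle from CreateMutex in the
proposed one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SIGNALS 32   /* hypothetical queue size, one bit per signal */

typedef void (*signal_handler)(int signum);

/* Hypothetical stand-ins for the backend's signal state. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int    pending_mask;            /* bit i set => signal i queued */
static signal_handler  handlers[MAX_SIGNALS];
static int             handled_count;           /* used by the demo handler only */

/* Demo handler: just counts how many times it was called. */
void counting_handler(int signum)
{
    (void) signum;
    handled_count++;
}

/* Install a handler for one signal number. */
void set_handler(int signum, signal_handler fn)
{
    assert(signum >= 0 && signum < MAX_SIGNALS);
    pthread_mutex_lock(&queue_lock);
    handlers[signum] = fn;
    pthread_mutex_unlock(&queue_lock);
}

/* Queue a signal; may be called from any thread. */
void queue_signal(int signum)
{
    if (signum < 0 || signum >= MAX_SIGNALS)
        return;
    pthread_mutex_lock(&queue_lock);
    pending_mask |= 1u << signum;
    pthread_mutex_unlock(&queue_lock);
}

/* Deliver all pending signals; returns how many handlers actually ran. */
int dispatch_signals(void)
{
    int delivered = 0;

    pthread_mutex_lock(&queue_lock);
    for (int i = 0; i < MAX_SIGNALS; i++)
    {
        if ((pending_mask & (1u << i)) == 0)
            continue;

        signal_handler fn = handlers[i];
        pending_mask &= ~(1u << i);

        /* Release the lock around the handler call, so other threads
         * can keep queueing signals while the handler runs. */
        pthread_mutex_unlock(&queue_lock);
        if (fn != NULL)
        {
            fn(i);
            delivered++;
        }
        pthread_mutex_lock(&queue_lock);
    }
    pthread_mutex_unlock(&queue_lock);
    return delivered;
}
```

With this shape the lock is only ever held for a few instructions at a
time, which is why a thread blocking on it for long is surprising, and
why swapping the lock primitive should cost little on this path.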

Re: Hanging queries on dual CPU windows

From: Jan de Visser
On Friday 10 March 2006 13:25, Magnus Hagander wrote:
> > > > Could it be they broke it when they did that????
> > >
> > > In theory, yes, but it still seems a bit far fetched :-(
> >
> > Well, I rolled back SP1 and am running my test again. Looking
> > much better, hasn't locked up in 45mins now, whereas before
> > it would lock up within 5mins.
> >
> > So I think they broke something.
>
> Wow. I guess I was lucky that I didn't say it was impossible :-)
>
>
> But what is really happening? What other thread is actually holding the
> critical section at this point, causing us to block? The only place it
> is held is while looping over the signal queue, and it is released while
> calling the signal function itself...
>
> But they obviously *have* been messing with critical sections, so maybe
> they accidentally changed something else as well...
>
> What bothers me is that nobody else has reported this. It could be that
> this was exposed by the changes to the signal handling done for 8.1, and
> the people with this level of concurrency are either still on 8.0 or just
> not on SP1 for their Windows boxes yet... Do you have any other software
> installed on the machine that might possibly interfere in some way?

Just a JDK, JBoss, cygwin (running sshd), and a VNC Server. I don't think that
interferes.

>
> But let's have it run for a bit longer to confirm this does help.

I turned it off after 2.5 hr. The longest I had to wait for a hang before,
with less load, was 1.45 hr.

> If so,
> we could perhaps recode that part using a Mutex instead of a critical
> section - since it's not a performance critical path, the difference
> shouldn't be large. If I code up a patch for that, can you re-apply SP1
> and test it? Or is this a production system you can't really touch?

I can do whatever the hell I want with it, so if you could cook up a patch
that would be great.

As a BTW: I reinstalled SP1 and turned stats collection off. That also seems
to work, but is not really a solution since we want to use autovacuuming.

>
> //Magnus

jan

--
--------------------------------------------------------------
Jan de Visser                     jdevisser@digitalfairway.com

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------

Re: Hanging queries on dual CPU windows

From: Jan de Visser
On Friday 10 March 2006 14:27, Jan de Visser wrote:
> As a BTW: I reinstalled SP1 and turned stats collection off. That also
> seems to work, but is not really a solution since we want to use
> autovacuuming.

I lied. It hangs now. It just takes a lot longer...

jan
