Re: Hanging queries on dual CPU windows

From: Magnus Hagander
Subject: Re: Hanging queries on dual CPU windows
Date: ,
Msg-id: 6BCB9D8A16AC4241919521715F4D8BCEA35109@algol.sollentuna.se
Responses: Re: Hanging queries on dual CPU windows  (Jan de Visser)
List: pgsql-performance

> > > >  I dunno
> > > >
> > > > > if you've got anything gdb-equivalent under Windows, but that's
> > > > > the first thing I'd be interested in ...
> > > >
> > > > Here ya go:
> > > >
> > > > http://www.devisser-siderius.com/stack1.jpg
> > > > http://www.devisser-siderius.com/stack2.jpg
> > > > http://www.devisser-siderius.com/stack3.jpg
> > > >
> > > > There are three threads in the process. I guess thread 1
> > > > (stack1.jpg) is the most interesting.
> > > >
> > > > I also noted that cranking up concurrency in my app reproduces the
> > > > problem in about 4 minutes ;-)
> >
> > Just reproduced again.
> >
> > > Actually, stack2 looks very interesting. Does it "stay stuck" in
> > > pg_queue_signal? That's really not supposed to happen.
> >
> > Yes it does.
>
> An update on that: There are actually *two* processes in this
> state, both hanging in pg_queue_signal. I've looked at the
> source of that, and the obvious candidate for hanging is
> EnterCriticalSection. I also found this:
>
> http://blogs.msdn.com/larryosterman/archive/2005/03/02/383685.aspx
>
> where they say:
>
> "
> In addition, for Windows 2003, SP1, the EnterCriticalSection
> API has a subtle change that's intended to resolve many of
> the lock convoy issues.  Before Win2003 SP1, if 10 threads
> were blocked on EnterCriticalSection and all 10 threads had
> the same priority, then EnterCriticalSection would service
> those threads in a FIFO (first-in, first-out) basis.  Starting in
> Windows 2003 SP1, the EnterCriticalSection will wake up a
> random thread from the waiting threads.  If all the threads
> are doing the same thing (like a thread pool) this won't make
> much of a difference, but if the different threads are doing
> different work (like the critical section protecting a widely
> accessed object), this will go a long way towards removing
> lock convoy semantics.
> "
>
> Could it be they broke it when they did that????

In theory, yes, but it still seems a bit far-fetched :-(

If you have an environment to rebuild in, can you try swapping the order of these two lines:
    ResetEvent(pgwin32_signal_event);
    LeaveCriticalSection(&pg_signal_crit_sec);

in backend/port/win32/signal.c
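
To make the swap concrete, here is a rough portable sketch of the two orderings. It is a model only, not the code in signal.c: the types and the helper functions (enter_crit_sec, leave_crit_sec, set_event, reset_event, tail_current_order, tail_swapped_order) are hypothetical stand-ins for the Win32 calls, and only the two variable names are taken from the lines above. All it shows is whether the event gets reset while the critical section is still held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Win32 objects; NOT the PostgreSQL source. */
typedef struct { bool held; } CritSec;
typedef struct { bool signaled; } Event;

static CritSec pg_signal_crit_sec;    /* name taken from the mail */
static Event   pgwin32_signal_event;  /* name taken from the mail */

/* Recorded by reset_event(): was the lock held at the moment of the reset? */
static bool reset_done_under_lock;

static void enter_crit_sec(CritSec *cs) { assert(!cs->held); cs->held = true; }
static void leave_crit_sec(CritSec *cs) { assert(cs->held);  cs->held = false; }
static void set_event(Event *ev)        { ev->signaled = true; }

static void reset_event(Event *ev)
{
    reset_done_under_lock = pg_signal_crit_sec.held;
    ev->signaled = false;
}

/* Order quoted in the mail: reset the event, then leave the critical section. */
bool tail_current_order(void)
{
    enter_crit_sec(&pg_signal_crit_sec);
    set_event(&pgwin32_signal_event);      /* pretend a signal was pending */
    reset_event(&pgwin32_signal_event);
    leave_crit_sec(&pg_signal_crit_sec);
    return reset_done_under_lock;
}

/* Swapped order for the experiment: leave first, then reset the event. */
bool tail_swapped_order(void)
{
    enter_crit_sec(&pg_signal_crit_sec);
    set_event(&pgwin32_signal_event);      /* pretend a signal was pending */
    leave_crit_sec(&pg_signal_crit_sec);
    reset_event(&pgwin32_signal_event);    /* now happens outside the lock */
    return reset_done_under_lock;
}
```

In this model tail_current_order() reports that the reset happened under the lock and tail_swapped_order() reports that it did not. The swap is meant as a diagnostic experiment to see whether the hang moves or goes away, not as a proposed fix.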


And if not, can you also try disabling the stats collector to see if that makes a difference? (It could be a workaround.)


//Magnus

