Thread: Anyone seen this kind of lock pileup?

Anyone seen this kind of lock pileup?

From
Josh Berkus
Date:
All,

Having an interesting issue on one 8.4 database.  Due to poor
application design, the application is requesting 8-15 exclusive
(update) locks on the same row on parallel connections pretty much
simultaneously (i.e. < 50ms apart).

What's odd about this is that the resulting "lock pileup" takes a
mysterious 2-3.5 seconds to clear, despite the fact that none of the
connections are *doing* anything during that time, nor are there
deadlock errors.  In theory at least, the locks should clear out in
reverse order in less than a second; none of the individual statements
takes more than 10ms to execute.

Has anyone else seen something like this?  Any idea what causes it?

--
                                   -- Josh Berkus
                                      PostgreSQL Experts Inc.
                                      http://www.pgexperts.com

Re: Anyone seen this kind of lock pileup?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Having an interesting issue on one 8.4 database.  Due to poor
> application design, the application is requesting 8-15 exclusive
> (update) locks on the same row on parallel connections pretty much
> simultaneously (i.e. < 50ms apart).

> What's odd about this is that the resulting "lock pileup" takes a
> mysterious 2-3.5 seconds to clear, despite the fact that none of the
> connections are *doing* anything during that time, nor are there
> deadlock errors.  In theory at least, the locks should clear out in
> reverse order in less than a second; none of the individual statements
> takes more than 10ms to execute.

Hmm ... can you extract a test case?  Or at least strace the backends
involved?

            regards, tom lane

Re: Anyone seen this kind of lock pileup?

From
Josh Berkus
Date:
> Hmm ... can you extract a test case?  Or at least strace the backends
> involved?

No, and no.  Strace was the first thing I thought of, but I'd have to
somehow catch one of these backends in the 3 seconds it's locked.  Not
really feasible.

It might be possible to construct a test case, depending on how much the
user wants to spend on the problem.  I'd estimate that a test case would
take 8-12 hours of my time to get working, given the level of activity
and concurrency required.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Anyone seen this kind of lock pileup?

From
Ivan Voras
Date:
On 11/17/10 18:37, Josh Berkus wrote:
> All,
>
> Having an interesting issue on one 8.4 database. Due to poor application
> design, the application is requesting 8-15 exclusive (update) locks on
> the same row on parallel connections pretty much simultaneously (i.e. <
> 50ms apart).
>
> What's odd about this is that the resulting "lock pileup" takes a
> mysterious 2-3.5 seconds to clear, despite the fact that none of the
> connections are *doing* anything during that time, nor are there
> deadlock errors. In theory at least, the locks should clear out in
> reverse order in less than a second; none of the individual statements
> takes more than 10ms to execute.

Just a random guess: a timeout-supported livelock? (of course if there
is any timeout-and-retry protocol going on and the timeout intervals are
non-randomized).


Re: Anyone seen this kind of lock pileup?

From
Josh Berkus
Date:
>> What's odd about this is that the resulting "lock pileup" takes a
>> mysterious 2-3.5 seconds to clear, despite the fact that none of the
>> connections are *doing* anything during that time, nor are there
>> deadlock errors.  In theory at least, the locks should clear out in
>> reverse order in less than a second; none of the individual statements
>> takes more than 10ms to execute.

Ok, I've collected more data.   Looks like the case I was examining was
idiosyncratic; most of these lock pile-ups involve 400 or more locks
waiting held by around 20 different backends.  Given this, taking 3
seconds to sort that all out doesn't seem that unreasonable.
Presumably there's a poll cycle of some sort for waiting statements?

Anyway, the obvious answer is for the user to fix their application.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Anyone seen this kind of lock pileup?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Ok, I've collected more data.   Looks like the case I was examining was
> idiosyncratic; most of these lock pile-ups involve 400 or more locks
> waiting held by around 20 different backends.  Given this, taking 3
> seconds to sort that all out doesn't seem that unreasonable.
> Presumably there's a poll cycle of some sort for waiting statements?

No ... but if the lock requests were mutually exclusive, I could believe
it taking 3 seconds for all of the waiting backends to get their turn
with the lock, do whatever they were gonna do, commit, and release the
lock to the next guy.

> Anyway, the obvious answer is for the user to fix their application.

Probably.

            regards, tom lane