Thread: Lock pileup causes server to stall

Lock pileup causes server to stall

From

Josh Berkus

Date:

10 November 2014, 19:50:32

All,

pg version: 9.3.5
RHEL 6.5
128GB/32 cores
Configured with shared_buffers=16GB
Java/Tomcat/JDBC application

Server has an issue that whenever we get lock waits (transaction lock
waits, usually on an FK dependancy) lasting over a minute or more than
10 at once, *all* queries on the server slow to a crawl, taking 100X to
400X normal execution times.

Other info:
* This applies even to queries which are against other databases, so
it's not purely a lock blocking issue.
* this database routinely has a LOT of lock conlicts, churning through 1
million multixacts per day
* pgBouncer is also involved in this stack, and may be contributing to
the problem in some way
* at no time is the DB server out of CPU (max usage = 38%), RAM, or
doing major IO (max %util = 22%).
* BIND statements can be slow as well as EXECUTEs.

I don't have full query logs from a stall period yet, so I'll have more
information when I do: for example, is it ALL queries which are slow or
just some of them?

However, I thought this list would have some other ideas where to look.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Lock pileup causes server to stall

From

Alvaro Herrera

Date:

10 November 2014, 21:41:40

Josh Berkus wrote:
> All,
>
> pg version: 9.3.5
> RHEL 6.5
> 128GB/32 cores
> Configured with shared_buffers=16GB
> Java/Tomcat/JDBC application
>
> Server has an issue that whenever we get lock waits (transaction lock
> waits, usually on an FK dependancy) lasting over a minute or more than
> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
> 400X normal execution times.

Current FK checking makes you wait if the referenced tuple is modified
on any indexed column, not just those that are actually used in
foreign keys.  Maybe this case would be sped up if we optimized that.

> * This applies even to queries which are against other databases, so
> it's not purely a lock blocking issue.

Oh.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Lock pileup causes server to stall

From

Josh Berkus

Date:

11 November 2014, 17:11:09

On 11/10/2014 01:40 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> All,
>>
>> pg version: 9.3.5
>> RHEL 6.5
>> 128GB/32 cores
>> Configured with shared_buffers=16GB
>> Java/Tomcat/JDBC application
>>
>> Server has an issue that whenever we get lock waits (transaction lock
>> waits, usually on an FK dependancy) lasting over a minute or more than
>> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
>> 400X normal execution times.
>
> Current FK checking makes you wait if the referenced tuple is modified
> on any indexed column, not just those that are actually used in
> foreign keys.  Maybe this case would be sped up if we optimized that.
>
>> * This applies even to queries which are against other databases, so
>> it's not purely a lock blocking issue.
>
> Oh.

Yeah, I think this is more likely a problem with the general lock table
and shared_buffers than anything to do with actual lock-blocks.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Lock pileup causes server to stall

From

Jeff Janes

Date:

11 November 2014, 17:33:17

On Tue, Nov 11, 2014 at 9:11 AM, Josh Berkus <josh@agliodbs.com> wrote:

On 11/10/2014 01:40 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> All,
>>
>> pg version: 9.3.5
>> RHEL 6.5
>> 128GB/32 cores
>> Configured with shared_buffers=16GB
>> Java/Tomcat/JDBC application
>>
>> Server has an issue that whenever we get lock waits (transaction lock
>> waits, usually on an FK dependancy) lasting over a minute or more than
>> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
>> 400X normal execution times.
>
> Current FK checking makes you wait if the referenced tuple is modified
> on any indexed column, not just those that are actually used in
> foreign keys. Maybe this case would be sped up if we optimized that.
>
>> * This applies even to queries which are against other databases, so
>> it's not purely a lock blocking issue.
>
> Oh.

Yeah, I think this is more likely a problem with the general lock table
and shared_buffers than anything to do with actual lock-blocks.

Any chance you can run 'perf record -a' on it?

Cheers,

Jeff

Re: Lock pileup causes server to stall

From

Jesper Krogh

Date:

12 November 2014, 06:49:51

> On 10/11/2014, at 22.40, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Josh Berkus wrote:
>> All,
>>
>> pg version: 9.3.5
>> RHEL 6.5
>> 128GB/32 cores
>> Configured with shared_buffers=16GB
>> Java/Tomcat/JDBC application
>>
>> Server has an issue that whenever we get lock waits (transaction lock
>> waits, usually on an FK dependancy) lasting over a minute or more than
>> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
>> 400X normal execution times.
>
> Current FK checking makes you wait if the referenced tuple is modified
> on any indexed column, not just those that are actually used in
> foreign keys.  Maybe this case would be sped up if we optimized that.

Even if it is an gin index that is being modified?   seems like a harsh limitation to me.

Jesper

Re: Lock pileup causes server to stall

From

Alvaro Herrera

Date:

12 November 2014, 13:51:17

Jesper Krogh wrote:
>
> > On 10/11/2014, at 22.40, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > Josh Berkus wrote:
> >> All,
> >>
> >> pg version: 9.3.5
> >> RHEL 6.5
> >> 128GB/32 cores
> >> Configured with shared_buffers=16GB
> >> Java/Tomcat/JDBC application
> >>
> >> Server has an issue that whenever we get lock waits (transaction lock
> >> waits, usually on an FK dependancy) lasting over a minute or more than
> >> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
> >> 400X normal execution times.
> >
> > Current FK checking makes you wait if the referenced tuple is modified
> > on any indexed column, not just those that are actually used in
> > foreign keys.  Maybe this case would be sped up if we optimized that.
>
> Even if it is an gin index that is being modified?   seems like a harsh limitation to me.

Well, as I recall it's only unique indexes, so it's not *that* harsh.

Anyway, the fklocks patch was stupidly complex (and still got much stuff
wrong).  I didn't want to add more ground to objections by additionally
breaking the abstraction between heapam and the concept of "columns
referenced by a foreign key constraint".  So it was discussed and
decided we'd leave that for future improvement.  Patches are welcome,
particularly if they come from the future.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Lock pileup causes server to stall

From

Josh Berkus

Date:

12 November 2014, 17:04:13

On 11/12/2014 05:51 AM, Alvaro Herrera wrote:
> Anyway, the fklocks patch was stupidly complex (and still got much stuff
> wrong).  I didn't want to add more ground to objections by additionally
> breaking the abstraction between heapam and the concept of "columns
> referenced by a foreign key constraint".

Oh, come on.  We had hardly any problems with that patch!

;-)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Lock pileup causes server to stall

From

Jesper Krogh

Date:

13 November 2014, 06:54:03

>>>>
>>>
>>> Current FK checking makes you wait if the referenced tuple is modified
>>> on any indexed column, not just those that are actually used in
>>> foreign keys.  Maybe this case would be sped up if we optimized that.
>>
>> Even if it is an gin index that is being modified?   seems like a harsh limitation to me.
>
> Well, as I recall it's only unique indexes, so it's not *that* harsh.
>
Sounds good.   Indices are there for all kinds of reasons, unique ones are more related to referential integrity, so
evennot 100% accurate, at least 90% of the way in my world. 

We do have an "star"-schema in the db with some amount of information needed in the center that needs updates, apart
fromthat a massive update activity on the sorrounding columns, locks on the center entity has quite high impact on the
sorroundingupdates. (9.2 moving to 9.3 reallly soon and looking forward for this enhancement. 

Jesper