Thread: Contention on LWLock buffer_content, due to SHARED lock(?)

Contention on LWLock buffer_content, due to SHARED lock(?)

From
Jens-Wolfhard Schicke-Uffmann
Date:
2019-12-09 23:10:36 +0100
Hi,

today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
cores) lock contention on a buffer content lock due to taking of a
SHARED lock (I think):

Three tables were involved, simplified case:

CREATE TABLE global_config (id BIGINT PRIMARY KEY);

CREATE TABLE b (
  id BIGINT PRIMARY KEY,
  config_id BIGINT REFERENCES global_config (id)
);

CREATE TABLE c (
  id BIGINT PRIMARY KEY,
  config_id BIGINT REFERENCES global_config (id)
);

(I suppose having both b + c doesn't make a difference, but
maybe it was relevant, so I'm including it.)

Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
SELECTs on global_config (again by id).
As the name suggests, there were only very few rows in
global_config, specifically only one was referenced by all
INSERT + UPDATEs on b + c.
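
For concreteness, the statements looked roughly like this (values are
placeholders, not the actual queries); the FK checks on b and c take a
row-level lock on the referenced global_config row:

INSERT INTO b (id, config_id) VALUES (12345, 1);  -- FK check locks global_config row 1
UPDATE b SET config_id = 1 WHERE id = 12345;      -- likewise when config_id is (re)assigned
SELECT id FROM global_config WHERE id = 1;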

On lighter load, all three types of queries were taking <1ms (as
expected); as load grew, all three went to ~50ms avg. execution time
together. AWS RDS console showed wait on LWLock:buffer_content as the
main contribution to that time.

Checking the code, I concluded that I observed lock contention
on the lock taken at the beginning of heap_lock_tuple, where
an exclusive buffer content lock is held while recording the
SHARE lock into the tuple and the WAL and the multiXact. I don't know
the actual number, but potentially up to 7000 active
transactions were holding a SHARE lock on that row, which could have
performance implications while scanning for multiXact memberships.
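
The same picture should be visible by directly sampling pg_stat_activity
(which is presumably where the RDS console gets its wait-event data), e.g.:

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1, 2
ORDER BY 3 DESC;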


Semantically, all that lock traffic was superfluous, as the
global_config row's key was in no danger of being changed.

As this situation (some global, essentially static, entity is referenced
by a heavily written table) seems not uncommon, I wonder:

1. Does the above analysis sound about right?

2. If so, would it be worthwhile to develop a solution?
   I was thinking along the lines of introducing a multiXact
   representation of "everyone": Instead of meticulously recording every
   locking + completing transaction in a multiXact, after a certain
   number of transactions has accumulated in a single multiXact, it is
   approximated as "everyone". If later a transaction finds that a SHARE
   lock is held by "everyone", the tuple would need no further modification
   (not sure if this could even be checked without taking an exclusive
   buffer lock). The hard part would probably be to ensure that an
   attempt to obtain an EXCLUSIVE lock would finally succeed against a
   SHARE lock held by "everyone".


Regards,
  Drahflow


Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Jeff Janes
Date:
On Mon, Dec 9, 2019 at 5:10 PM Jens-Wolfhard Schicke-Uffmann <drahflow@gmx.de> wrote:
Hi,

today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
cores) lock contention on a buffer content lock due to taking of a
SHARED lock (I think):

What version of PostgreSQL are you using?

Cheers,

Jeff

Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Andres Freund
Date:
2019-12-10 08:44:17 -0800
Hi,

On 2019-12-09 23:10:36 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):

> Three tables were involved, simplified case:
>
> CREATE TABLE global_config (id BIGINT PRIMARY KEY);
>
> CREATE TABLE b (
>   id BIGINT PRIMARY KEY,
>   config_id BIGINT REFERENCES global_config (id)
> );
>
> CREATE TABLE c (
>   id BIGINT PRIMARY KEY,
>   config_id BIGINT REFERENCES global_config (id)
> );
>
> (I suppose having both b + c doesn't make a difference, but
> maybe it was relevant, so I'm including it.)
>
> Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
> SELECTs on global_config (again by id).
> As the name suggests, there were only very few rows in
> global_config, specifically only one was referenced by all
> INSERT + UPDATEs on b + c.
>
> On lighter load, all three types of queries were taking <1ms (as
> expected); as load grew, all three went to ~50ms avg. execution time
> together. AWS RDS console showed wait on LWLock:buffer_content as the
> main contribution to that time.
>
> Checking the code, I concluded that I observed lock contention
> on the lock taken at the beginning of heap_lock_tuple, where
> an exclusive buffer content lock is held while recording the
> SHARE lock into the tuple and the WAL and the multiXact. I don't know
> the actual number, but potentially up to 7000 active
> transactions were holding a SHARE lock on that row, which could have
> performance implications while scanning for multiXact memberships.

When you say "7000 active transactions" - do you mean to say that you
have set max_connections to something higher than that, and you actually
have that many concurrent transactions?


> Semantically, all that lock traffic was superfluous, as the
> global_config row's key was in no danger of being changed.

Well, postgres can't know that.


> As this situation (some global, essentially static, entity is referenced
> by a heavily written table) seems not uncommon, I wonder:
>
> 1. Does the above analysis sound about right?

Hard to know without additional data.


> 2. If so, would it be worthwhile to develop a solution?

Possible, but I'm not sure it's worth the complexity.

I'd definitely like to see a proper reproducer and profile for this,
before investigating further.


>    I was thinking along the lines of introducing a multiXact
>    representation of "everyone": Instead of meticulously recording every
>    locking + completing transaction in a multiXact, after a certain
>    number of transactions has accumulated in a single multiXact, it is
>    approximated as "everyone". If later a transaction finds that a SHARE
>    lock is held by "everyone", the tuple would need no further modification

I think the big problem with a strategy like this is that it's prone to
generate deadlocks that aren't present in the "original" scheduling.


>    (not sure if this could even be checked without taking an exclusive
>    buffer lock).

It should only require a share lock.


>    The hard part would probably be to ensure that an
>    attempt to obtain an EXCLUSIVE lock would finally succeed against a
>    SHARE lock held by "everyone".

Note that this is a seriously complicated area of the code. It's very
easy to create new bugs that aren't easily testable. I think we'd need a
very convincing use-case for improvements around the problem you outline,
and a relatively simple solution, to counter stability concerns.


Greetings,

Andres Freund



Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Alvaro Herrera
Date:
2019-12-10 15:07:05 -0300
On 2019-Dec-10, Andres Freund wrote:

> >    The hard part would probably be to ensure that an
> >    attempt to obtain an EXCLUSIVE lock would finally succeed against a
> >    SHARE lock held by "everyone".
> 
> Note that this is a seriously complicated area of the code. It's very
> easy to create new bugs that aren't easily testable. I think we'd need a
> very convincing use-case for improvements around the problem you outline,
> and a relatively simple solution, to counter stability concerns.

I'd rather have the ability to mark a table READ ONLY (or similar).
Then any FK references can skip the row locks altogether.  For the rare
cases where you need to modify the referenced table, have it marked READ
WRITE, and any row locks are registered normally from that point on,
until you set it back to READ ONLY again.
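
Purely to illustrate what I mean (this syntax does not exist today; it's a
hypothetical sketch):

ALTER TABLE global_config SET READ ONLY;   -- hypothetical: FK checks could skip row locks from here on
ALTER TABLE global_config SET READ WRITE;  -- for the rare modification; row locks taken normally again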

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Jens-Wolfhard Schicke-Uffmann
Date:
2019-12-10 22:44:17 +0100
Hi,

On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
> > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > cores) lock contention on a buffer content lock due to taking of a
> > SHARED lock (I think):
> When you say "7000 active transactions" - do you mean to say that you
> have set max_connections to something higher than that, and you actually
> have that many concurrent transactions?
Yes, max connections was 20000, active connections around 7000 at that
time. Unfortunately, I don't have actual numbers of connections in
transactions for that point in time. (We were trying to establish
maximum performance of a larger system.)
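
For a future run, I'd plan to periodically sample something along these
lines (rough sketch) to record the actual number of active and
in-transaction connections:

SELECT count(*) FILTER (WHERE state = 'active')                  AS active,
       count(*) FILTER (WHERE state LIKE 'idle in transaction%') AS idle_in_tx
FROM pg_stat_activity;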

> > Semantically, all that lock traffic was superfluous, as the
> > global_config row's key was in no danger of being changed.
> Well, postgres can't know that.
I am aware; it's just an argument for why it might be possible to
shove some optimization there.

> > 1. Does the above analysis sound about right?
> Hard to know without additional data.
What data would be worth recording next time? (Except number of
active transactions, obviously.)


> > 2. If so, would it be worthwhile to develop a solution?
> Possible, but I'm not sure it's worth the complexity.
>
> I'd definitely like to see a proper reproducer and profile for this,
> before investigating further.
I'll see if and when I can include this in my client's project
schedule. Might be a while, but I'll get back to you when I have
a reproducer + profile data (of an up-to-date vanilla Postgres,
not 10.7 + AWS Aurora patches).
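
Roughly, I'd expect the reproducer to be a pgbench custom script along
these lines (untested sketch, names and numbers are placeholders), run
with a few hundred clients against the schema above after inserting a
single global_config row:

\set rid random(1, 1000000)
INSERT INTO b (id, config_id) VALUES (:rid, 1)
  ON CONFLICT (id) DO UPDATE SET config_id = 1;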


> I think we'd need a very convincing use-case for improvements around the problem
> you outline.
Understood. I'll try to get an iron-clad profile of the problematic case
first.


Regards,
  Drahflow


Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Jens-Wolfhard Schicke-Uffmann
Date:
Hi,

On Tue, Dec 10, 2019 at 03:07:05PM -0300, Alvaro Herrera wrote:
> I'd rather have the ability to mark a table READ ONLY (or similar).
> Then any FK references can skip the row locks altogether.  For the rare
> cases where you need to modify the referenced table, have it marked READ
> WRITE, and any row locks are registered normally from that point on,
> until you set it back to READ ONLY again.
However, that would require changes to applications writing to the table
and a good understanding of performance characteristics by everyone
trying to get to that scale. (OTOH, there is certainly an argument to be
made that whoever hits this kind of problem had better also have an idea of
postgres performance tuning anyway.)

More troubling (to me) is that I already know of another table in the
system which should be next-in-line for the same problem, but only on
some rows: It represents accounting entities, of which a very (nearly
static) few are payment processors and all others are customers. From
the application's perspective there's not too much difference between
those, but any customer row will typically only be share locked once,
whereas share locks on payment processor rows will be held by most of
the transactions currently active.

That use-case is not very uncommon, I think, so it might be worthwhile
to implement a solution which does not require all rows of a table to
share similar lock contention characteristics, or writability.


Regards,
  Drahflow


Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Alvaro Herrera
Date:
On 2019-Dec-10, Jens-Wolfhard Schicke-Uffmann wrote:

> More troubling (to me) is that I already know of another table in the
> system which should be next-in-line for the same problem, but only on
> some rows: It represents accounting entities, of which a very (nearly
> static) few are payment processors and all others are customers. From
> the application's perspective there's not too much difference between
> those, but any customer row will typically only be share locked once,
> whereas share locks on payment processor rows will be held by most of
> the transactions currently active.

Well, you could partition that table.  This probably means you'll need
to improve the Postgres implementation of PKs on partitioned tables, though.
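
Something along these lines, purely illustratively (made-up names; note
that the PK has to include the partition key, and FKs referencing a
partitioned table need a recent enough release):

CREATE TABLE account (
  id   BIGINT NOT NULL,
  kind TEXT   NOT NULL,
  PRIMARY KEY (id, kind)
) PARTITION BY LIST (kind);

CREATE TABLE account_processor PARTITION OF account FOR VALUES IN ('processor');
CREATE TABLE account_customer  PARTITION OF account FOR VALUES IN ('customer');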

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From
Andres Freund
Date:
Hi,

On 2019-12-10 22:44:17 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
> > > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > > cores) lock contention on a buffer content lock due to taking of a
> > > SHARED lock (I think):
> > When you say "7000 active transactions" - do you mean to say that you
> > have set max_connections to something higher than that, and you actually
> > have that many concurrent transactions?
> Yes, max connections was 20000, active connections around 7000 at that
> time. Unfortunately, I don't have actual numbers of connections in
> transactions for that point in time. (We were trying to establish
> maximum performance of a larger system.)

I'd strongly recommend changing your architecture. There's *severe*
overhead in that many concurrent active connections (and some in such a
high max_connections setting). It's likely that you'd be much, much
better off putting a query pooler in front that limits active
transactions to a significantly smaller number. There are only so many CPU
cores, so at some point adding more concurrency just increases the
overall amount of work that needs to be done (due to the overhead of
managing concurrency and context switches).


> > > 1. Does the above analysis sound about right?
> > Hard to know without additional data.
> What data would be worth recording next time? (Except number of
> active transactions, obviously.)

I think we'd need a CPU profile for starters. But that unfortunately
won't be possible on RDS...

Greetings,

Andres Freund