Thread: Contention on LWLock buffer_content, due to SHARED lock(?)
Hi,

today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
cores) lock contention on a buffer content lock due to taking of a
SHARED lock (I think):

Three tables were involved, simplified case:

CREATE TABLE global_config (id BIGINT PRIMARY KEY);

CREATE TABLE b (
  id BIGINT PRIMARY KEY,
  config_id BIGINT REFERENCES global_config (id)
);

CREATE TABLE c (
  id BIGINT PRIMARY KEY,
  config_id BIGINT REFERENCES global_config (id)
);

(I suppose having both b + c doesn't make a difference, but
maybe it was relevant, so I'm including it.)

Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
SELECTs on global_config (again by id).
As the name suggests, there were only very few rows in
global_config, specifically only one was referenced by all
INSERT + UPDATEs on b + c.

On lighter load, all three types of queries were taking <1ms (as
expected), as load grew, all three went to ~50ms avg. execution time
together. The AWS RDS console showed waits on LWLock:buffer_content as
the main contribution to that time.

Checking the code, I concluded that I observed lock contention
on the lock taken at the beginning of heap_lock_tuple, where
an exclusive buffer content lock is held while recording the
SHARE lock into the tuple and the WAL and the multiXact. I don't know
the actual number, but potentially up to 7000 active
transactions were holding a SHARE lock on that row, which could have
performance implications while scanning for multiXact memberships.

Semantically, all that lock traffic was superfluous, as the
global_config row's key was in no danger of being changed.

As this situation (some global, essentially static, entity is referenced
by a much written table) seems not uncommon, I wonder:

1. Does the above analysis sound about right?

2. If so, would it be worthwhile to develop a solution?

I was thinking along the lines of introducing a multiXact
representation of "everyone": Instead of meticulously recording every
locking + completing transaction in a multiXact, after a certain
number of transactions has accumulated in a single multiXact, it is
approximated as "everyone". If later a transaction finds that a SHARE
lock is held by "everyone", the tuple would need no further modification
(not sure if this could even be checked without taking an exclusive
buffer lock). The hard part would probably be to ensure that an
attempt to obtain an EXCLUSIVE lock would finally succeed against a
SHARE lock held by "everyone".

Regards,
Drahflow
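For context on the report above: the SHARE row locks come from the RI
(foreign key) triggers behind the REFERENCES clauses. A minimal sketch
of what each INSERT/UPDATE on b or c effectively runs against the
parent row, assuming current PostgreSQL RI-trigger behaviour
(simplified, not the literal internal query):

    -- FK check fired for every INSERT/UPDATE of b.config_id / c.config_id:
    SELECT 1 FROM ONLY global_config x
      WHERE x.id = $1
      FOR KEY SHARE OF x;
    -- With a single hot parent row, every concurrent writer records a KEY SHARE
    -- lock on that row, so its xmax becomes a growing multixact and each locker
    -- needs the exclusive buffer content lock described above.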
On Mon, Dec 9, 2019 at 5:10 PM Jens-Wolfhard Schicke-Uffmann <drahflow@gmx.de> wrote:
> Hi,
>
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):
What version of PostgreSQL are you using?
Cheers,
Jeff
Hi,

On 2019-12-09 23:10:36 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):
>
> Three tables were involved, simplified case:
>
> CREATE TABLE global_config (id BIGINT PRIMARY KEY);
>
> CREATE TABLE b (
>   id BIGINT PRIMARY KEY,
>   config_id BIGINT REFERENCES global_config (id)
> );
>
> CREATE TABLE c (
>   id BIGINT PRIMARY KEY,
>   config_id BIGINT REFERENCES global_config (id)
> );
>
> (I suppose having both b + c doesn't make a difference, but
> maybe it was relevant, so I'm including it.)
>
> Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
> SELECTs on global_config (again by id).
> As the name suggests, there were only very few rows in
> global_config, specifically only one was referenced by all
> INSERT + UPDATEs on b + c.
>
> On lighter load, all three types of queries were taking <1ms (as
> expected), as load grew, all three went to ~50ms avg. execution time
> together. The AWS RDS console showed waits on LWLock:buffer_content as
> the main contribution to that time.
>
> Checking the code, I concluded that I observed lock contention
> on the lock taken at the beginning of heap_lock_tuple, where
> an exclusive buffer content lock is held while recording the
> SHARE lock into the tuple and the WAL and the multiXact. I don't know
> the actual number, but potentially up to 7000 active
> transactions were holding a SHARE lock on that row, which could have
> performance implications while scanning for multiXact memberships.

When you say "7000 active transactions" - do you mean to say that you
have set max_connections to something higher than that, and you
actually have that many concurrent transactions?

> Semantically, all that lock traffic was superfluous, as the
> global_config row's key was in no danger of being changed.

Well, postgres can't know that.

> As this situation (some global, essentially static, entity is referenced
> by a much written table) seems not uncommon, I wonder:
>
> 1. Does the above analysis sound about right?

Hard to know without additional data.

> 2. If so, would it be worthwhile to develop a solution?

Possible, but I'm not sure it's worth the complexity.

I'd definitely like to see a proper reproducer and profile for this,
before investigating further.

> I was thinking along the lines of introducing a multiXact
> representation of "everyone": Instead of meticulously recording every
> locking + completing transaction in a multiXact, after a certain
> number of transactions has accumulated in a single multiXact, it is
> approximated as "everyone". If later a transaction finds that a SHARE
> lock is held by "everyone", the tuple would need no further modification

I think the big problem with a strategy like this is that it's prone to
generate deadlocks that aren't present in the "original" scheduling.

> (not sure if this could even be checked without taking an exclusive
> buffer lock).

It should only require a share lock.

> The hard part would probably be to ensure that an
> attempt to obtain an EXCLUSIVE lock would finally succeed against a
> SHARE lock held by "everyone".

Note that this is a seriously complicated area of the code. It's very
easy to create new bugs that aren't easily testable. I think we'd need
a very convincing use-case for improvements around the problem you
outline, and a relatively simple solution, to counter stability
concerns.

Greetings,

Andres Freund
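For completeness, on a self-hosted instance the same symptom can be
confirmed by sampling wait events; a minimal sketch, assuming
PostgreSQL 9.6 or later (the wait_event name shown matches 9.6-12;
later releases spell it BufferContent):

    -- How many backends are currently waiting, and on what
    SELECT wait_event_type, wait_event, state, count(*)
      FROM pg_stat_activity
     WHERE wait_event IS NOT NULL
     GROUP BY 1, 2, 3
     ORDER BY count(*) DESC;
    -- Expect rows like ('LWLock', 'buffer_content', 'active', <large count>)
    -- if the contention described in this thread is occurring.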
On 2019-Dec-10, Andres Freund wrote:

> > The hard part would probably be to ensure that an
> > attempt to obtain an EXCLUSIVE lock would finally succeed against a
> > SHARE lock held by "everyone".
>
> Note that this is a seriously complicated area of the code. It's very
> easy to create new bugs that aren't easily testable. I think we'd need
> a very convincing use-case for improvements around the problem you
> outline, and a relatively simple solution, to counter stability
> concerns.

I'd rather have the ability to mark a table READ ONLY (or similar).
Then any FK references can skip the row locks altogether. For the rare
cases where you need to modify the referenced table, have it marked
READ WRITE, and any row locks are registered normally from that point
on, until you set it back to READ ONLY again.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
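To make the idea concrete, such a switch might be used roughly like
this; the DDL below is purely hypothetical syntax and does not exist in
any PostgreSQL release:

    -- Hypothetical: FK checks against a READ ONLY table skip row locks
    ALTER TABLE global_config SET READ ONLY;

    -- rare maintenance window:
    ALTER TABLE global_config SET READ WRITE;
    UPDATE global_config SET id = 2 WHERE id = 1;  -- the rare key change
    ALTER TABLE global_config SET READ ONLY;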
Hi,

On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
> > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > cores) lock contention on a buffer content lock due to taking of a
> > SHARED lock (I think):
> When you say "7000 active transactions" - do you mean to say that you
> have set max_connections to something higher than that, and you
> actually have that many concurrent transactions?
Yes, max_connections was 20000, active connections around 7000 at that
time. Unfortunately, I don't have actual numbers of connections in
transactions for that point in time. (We were trying to establish
maximum performance of a larger system.)

> > Semantically, all that lock traffic was superfluous, as the
> > global_config row's key was in no danger of being changed.
> Well, postgres can't know that.
I am aware; it's just an argument for why it might be possible to shove
some optimization in there.

> > 1. Does the above analysis sound about right?
> Hard to know without additional data.
What data would be worth recording next time? (Except number of active
transactions, obviously.)

> > 2. If so, would it be worthwhile to develop a solution?
> Possible, but I'm not sure it's worth the complexity.
>
> I'd definitely like to see a proper reproducer and profile for this,
> before investigating further.
I'll see if and when I can include this in my client's project
schedule. Might be a while, but I'll get back to you when I have a
reproducer + profile data (of an up-to-date vanilla Postgres, not
10.7 + AWS Aurora patches).

> I think we'd need a very convincing use-case for improvements around
> the problem you outline.
Understood. I'll try to get an iron-clad profile of the problematic
case first.

Regards,
Drahflow
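In case it is useful as a starting point, a reproducer could presumably
look like the sketch below (file names and client counts are made up,
and it is untested at the scale discussed in this thread):

    -- setup.sql: schema plus the single hot parent row
    CREATE TABLE global_config (id BIGINT PRIMARY KEY);
    INSERT INTO global_config VALUES (1);
    CREATE TABLE b (
      id        BIGSERIAL PRIMARY KEY,
      config_id BIGINT REFERENCES global_config (id)
    );

    -- workload.sql: every client hammers the same FK target, e.g.
    --   pgbench -n -f workload.sql -c 500 -j 32 -T 60
    INSERT INTO b (config_id) VALUES (1);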
Hi,

On Tue, Dec 10, 2019 at 03:07:05PM -0300, Alvaro Herrera wrote:
> I'd rather have the ability to mark a table READ ONLY (or similar).
> Then any FK references can skip the row locks altogether. For the rare
> cases where you need to modify the referenced table, have it marked
> READ WRITE, and any row locks are registered normally from that point
> on, until you set it back to READ ONLY again.
However, that would require changes to applications writing to the
table and a good understanding of performance characteristics by
everyone trying to get to that scale. (OTOH, there is certainly an
argument to be made that whoever hits this kind of problem had better
also have an idea of Postgres performance tuning anyway.)

More troubling (to me) is that I already know of another table in the
system which should be next in line for the same problem, but only on
some rows: It represents accounting entities, of which a very few
(nearly static) are payment processors and all others are customers.
From the application's perspective there's not too much difference
between those, but any customer row will typically only be share locked
once, whereas share locks on payment processor rows will be held by
most of the transactions currently active.

That use-case is not very uncommon I think, so it might be worthwhile
to implement a solution which does not require all rows of a table to
share similar lock contention characteristics, or writability.

Regards,
Drahflow
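Next time this happens, the lockers of such a hot row could presumably
be inspected directly with the pgrowlocks contrib module (a sketch;
note that pgrowlocks scans the whole table, so it is only reasonable on
small tables like the ones discussed here):

    CREATE EXTENSION IF NOT EXISTS pgrowlocks;

    -- Which transactions hold row locks on the hot row(s), and in which modes;
    -- 'multi' indicates the locker is a multixact rather than a single xid.
    SELECT locked_row, locker, multi, xids, modes
      FROM pgrowlocks('global_config');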
On 2019-Dec-10, Jens-Wolfhard Schicke-Uffmann wrote:

> More troubling (to me) is that I already know of another table in the
> system which should be next in line for the same problem, but only on
> some rows: It represents accounting entities, of which a very few
> (nearly static) are payment processors and all others are customers.
> From the application's perspective there's not too much difference
> between those, but any customer row will typically only be share locked
> once, whereas share locks on payment processor rows will be held by
> most of the transactions currently active.

Well, you could partition that table. This probably means you'll need
to improve Postgres' implementation of PKs on partitioned tables,
though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
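A sketch of what that partitioning could look like, assuming a list
partition key that separates the hot payment-processor rows from the
customer rows (and keeping the caveat above in mind: the partition key
has to be part of the primary key, and FK references to partitioned
tables were not yet supported at the time of this thread):

    CREATE TABLE accounting_entity (
      id   BIGINT NOT NULL,
      kind TEXT   NOT NULL,            -- 'customer' or 'payment_processor'
      PRIMARY KEY (id, kind)           -- partition key must be included
    ) PARTITION BY LIST (kind);

    CREATE TABLE accounting_entity_customer
      PARTITION OF accounting_entity FOR VALUES IN ('customer');
    CREATE TABLE accounting_entity_processor
      PARTITION OF accounting_entity FOR VALUES IN ('payment_processor');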
Hi,

On 2019-12-10 22:44:17 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
> > > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96
> > > logical cores) lock contention on a buffer content lock due to
> > > taking of a SHARED lock (I think):
> > When you say "7000 active transactions" - do you mean to say that
> > you have set max_connections to something higher than that, and you
> > actually have that many concurrent transactions?
> Yes, max_connections was 20000, active connections around 7000 at that
> time. Unfortunately, I don't have actual numbers of connections in
> transactions for that point in time. (We were trying to establish
> maximum performance of a larger system.)

I'd strongly recommend changing your architecture. There's *severe*
overhead in that many concurrent active connections (and some in such a
high max_connections setting). It's likely that you'd be much much
better off by putting a query pooler in front that limits active
transactions to a significantly smaller number. There are only so many
CPU cores, so at some point adding more concurrency just increases the
overall amount of work that needs to be done (due to the overhead of
managing concurrency and context switches).

> > > 1. Does the above analysis sound about right?
> > Hard to know without additional data.
> What data would be worth recording next time? (Except number of
> active transactions, obviously.)

I think we'd need a CPU profile for starters. But that unfortunately
won't be possible on RDS...

Greetings,

Andres Freund
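To size such a pooler, the split between connections that are actually
working and connections that are merely open can be sampled; a minimal
sketch, assuming PostgreSQL 10 or later for the backend_type column:

    -- Active vs. idle client backends; the pooler's limit on concurrently
    -- active transactions usually only needs to cover the 'active' bucket.
    SELECT state, count(*)
      FROM pg_stat_activity
     WHERE backend_type = 'client backend'
     GROUP BY state
     ORDER BY count(*) DESC;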