Thread: Understanding max_locks_per_transaction
We're using PostgreSQL 13.10, installed on CentOS 7 from PGDG RPMs.
Recently we've run into "out of shared memory" issues with a hint at increasing max_locks_per_transaction. The problem is well described in the PostgreSQL documentation and in various blog posts around the internet, and the solution is straightforward: touch fewer objects per transaction or increase the size of the lock table. The error occurs when joining partitioned tables, and changing the query structure is something we'd like to avoid, so we are going the route of increasing the size of the lock table by raising max_locks_per_transaction.
Many blog posts advise against using an arbitrarily large value for max_locks_per_transaction to avoid excess memory usage by the lock table. Contrary to that is an email from Tom Lane [1] indicating that a lock table with several million slots wouldn't be so bad. A Stack Overflow answer from Laurenz Albe [2] tells us that a lock entry consumes 168 bytes, so a table with 1 million slots would consume somewhere around 160MB (assuming we are on the same architecture, but if not then probably still within an order of magnitude). I suppose I could set max_locks_per_transaction really high, but, on the other hand, if I can make a reasonably good estimate of how many locks are actually needed then I can set a lower value and let some of that RAM be used for other things like disk caching instead. So here I am, trying to estimate the number of locks that would actually be needed, but I have some questions and am running into some behavior that I do not expect.
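(That 160MB figure is easy to sanity check right in psql; it's just the 168 bytes per entry from [2] multiplied by one million slots:)

select pg_size_pretty((168 * 1000000)::bigint) as size_of_1m_lock_entries;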
Most discussions regarding the lock table say that the size of the lock table determines how many locks can be held. The documentation for max_locks_per_transaction [3] reads slightly differently though, and in particular this phrase stands out to me:
> no more than this many distinct objects can be locked at any one time
To me, that seems to be saying that multiple locks for the same object (e.g. for a single table) would only consume a single lock table entry. Finally, on to my first question: am I interpreting the documentation correctly, that multiple locks for the same object only consume a single lock table entry, or am I reading too much into this and the size of the lock table really does dictate the total number of locks regardless of whether those locks point to the same object? If my interpretation is correct, then I can size the lock table to be slightly above the total number of database objects and I should be safe.
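(For what it's worth, my rough way of getting at the "total number of database objects" would be something like the query below. It's only a sketch: it counts relation-level lock targets in the current database and ignores other lockable things such as transaction IDs and advisory locks.)

select count(*) as relation_lock_targets
from pg_class
where relkind in ('r', 'p', 'i', 'I', 'v', 'm', 'S', 't');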
I tried to test my understanding by creating two database connections and in each connection issuing a query that touched a large number of tables (enough to almost reach the size of the lock table). My theory was that if multiple locks for the same object only consume a single lock table entry, then multiple connections could each issue the same big query that used a large number of locks and no issue would arise. For example, with the default settings (max_connections = 100, max_prepared_transactions = 0, max_locks_per_transaction = 64) I should have a lock table with 6400 slots, so two connections that both touch 6000 database objects should be able to run concurrently. That test succeeded, but led me to the next observation...
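(A query along these lines, run from a third session, is one way to watch that happen. It's just a sketch; pg_locks reports roughly one row per lock holder per locked object, so a relation locked by both connections shows up with a count of 2.)

select relation::regclass as rel, count(*) as holders
from pg_locks
where locktype = 'relation' and granted
group by relation
order by holders desc
limit 10;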
To my surprise, during my testing I was able to issue queries that acquired well more locks than the maximum I was expecting, and those queries completed without issue. Below are the relevant database settings and the results of a statement that locks many objects.
archive=# show max_connections ;
max_connections
-----------------
100
(1 row)
archive=# show max_prepared_transactions ;
max_prepared_transactions
---------------------------
0
(1 row)
archive=# show max_locks_per_transaction ;
max_locks_per_transaction
---------------------------
64
(1 row)
archive=# begin;
BEGIN
archive=*# explain <large select statement that joins many partitions>
<explain output omitted for brevity>
archive=*# select count(*) as total, count(*) filter (where granted) as granted from pg_locks;
total | granted
-------+---------
7568 | 7568
(1 row)
archive=*# select count(distinct relation) from pg_locks ;
count
-------
7576
(1 row)
According to the documentation, I should have a lock table with 6400 entries but somehow have been able to obtain 7576 locks. So my second and last question: how is this possible - shouldn't I have received an "out of shared memory" error since I exceeded 6400 distinct locked objects?
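(For reference, the 6400 figure is just the formula from the documentation, max_locks_per_transaction * (max_connections + max_prepared_transactions), evaluated against the settings shown above:)

select current_setting('max_locks_per_transaction')::int
     * (current_setting('max_connections')::int
        + current_setting('max_prepared_transactions')::int) as nominal_lock_table_slots;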
Craig
Craig McIlwee <craigm@vt.edu> writes:
> Most discussions regarding the lock table say that the size of the lock
> table determines how many locks can be held. The documentation for
> max_locks_per_transaction [3] reads slightly differently though, and in
> particular this phrase stands out to me:
>> no more than this many distinct objects can be locked at any one time
> To me, that seems to be saying that multiple locks for the same object
> (e.g. for a single table) would only consume a single lock table entry.
> Finally on to my first question: Am I interpreting the documentation
> correctly, that multiple locks for the same object only consume a single
> lock table entry,

Yes ... however it's a good deal more complicated than that.

What actually happens under the hood is that we allocate enough shared
memory space for (MaxBackends + max_prepared_transactions) *
max_locks_per_transaction LOCK structs (which are the per-locked-object
entries) and twice that many PROCLOCK structs, which are
per-lock-per-holder information. The 2X multiplier assumes that on
average about two sessions will be holding/requesting locks on any
specific locked object.

Now, MaxBackends is more than max_connections, because it also
accounts for autovacuum workers, parallel workers, etc. So that's
one of the sources of the fuzzy limit you noticed. The other source
is that we allocate about 100K more shared memory space than we think
we need, and it's possible for the lock tables to expand into that
"slop" space. I've not checked the sizes of these structs lately,
but the slop space could surely accommodate several hundred more
locks than the initial estimate allows.

Certainly it's safe to raise max_locks_per_transaction a good deal on
modern machines, but I'm not sure that you can reasonably get to a place
where there is a mathematical guarantee that you won't run out of shared
memory. Even if you know how many lockable objects your installation has
(which I bet you don't, or at least the number isn't likely to hold still
for long) it's pretty hard to say exactly how many PROCLOCK entries you
might need. And bloating the lock table size by max_connections/2 or so
to try to brute-force that doesn't seem like a good plan. I'd just raise
max_locks_per_transaction until you stop seeing problems, and then maybe
add a factor of two safety margin.

                        regards, tom lane
On Mon, Oct 16, 2023 at 2:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig McIlwee <craigm@vt.edu> writes:
> > Most discussions regarding the lock table say that the size of the lock
> > table determines how many locks can be held. The documentation for
> > max_locks_per_transaction [3] reads slightly differently though, and in
> > particular this phrase stands out to me:
> >> no more than this many distinct objects can be locked at any one time
> > To me, that seems to be saying that multiple locks for the same object
> > (e.g. for a single table) would only consume a single lock table entry.
> > Finally on to my first question: Am I interpreting the documentation
> > correctly, that multiple locks for the same object only consume a single
> > lock table entry,
> Yes ... however it's a good deal more complicated than that.
>
> What actually happens under the hood is that we allocate enough shared
> memory space for (MaxBackends + max_prepared_transactions) *
> max_locks_per_transaction LOCK structs (which are the per-locked-object
> entries) and twice that many PROCLOCK structs, which are
> per-lock-per-holder information. The 2X multiplier assumes that on
> average about two sessions will be holding/requesting locks on any
> specific locked object.
I think that explains why I could successfully issue the same large statement from 2 concurrent transactions but then encounter an error when trying a third.
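(If I follow, the rows in pg_locks correspond roughly to the per-holder PROCLOCK side rather than the per-object LOCK side, so something like the following gives a rough feel for that 2X assumption. Only a sketch, and not exact - for example, fast-path locks appear in pg_locks without occupying the shared lock table, hence the fastpath filter.)

select count(*) filter (where not fastpath) as rows_in_shared_table,
       count(distinct relation) filter (where not fastpath) as distinct_relations
from pg_locks
where locktype = 'relation';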
> Now, MaxBackends is more than max_connections, because it also
> accounts for autovacuum workers, parallel workers, etc. So that's
> one of the sources of the fuzzy limit you noticed. The other source
> is that we allocate about 100K more shared memory space than we think
> we need, and it's possible for the lock tables to expand into that
> "slop" space. I've not checked the sizes of these structs lately,
> but the slop space could surely accommodate several hundred more
> locks than the initial estimate allows.
OK, those 2 bits of fuzz look like they could increase the lock table size by at least 1000 entries, maybe even 2000, which explains how I could get ~7500 locks without an error. 100K / 168 = 609 extra slots. Assuming MaxBackends is 10 higher than max_connections (wild guess there, but maybe it's even higher?), then that's another 640 slots.
> Even if you know how many lockable objects your installation has (which I bet you don't, or at least the number isn't likely to hold still for long)
Not exactly, but we have time based (monthly) partitioning and can make a reasonable estimate on how many years an installation will run so our guess wouldn't be too far off. My plan was to make an estimate then increase by 50% or maybe even 100% to account for new objects that could be added later.
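(Concretely, the kind of formula I have in mind looks something like the sketch below. The numbers are placeholders rather than our real figures, and this ignores the MaxBackends and PROCLOCK caveats above: estimated lockable relations, times a headroom multiplier, divided by the (max_connections + max_prepared_transactions) factor that the lock table size formula multiplies back in.)

select ceil( 8000        -- hypothetical count of partitions, indexes, etc.
           * 2.0         -- 100% headroom for future partitions / schema changes
           / (100 + 0)   -- max_connections + max_prepared_transactions
           ) as candidate_max_locks_per_transaction;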
> it's pretty hard to say exactly how many PROCLOCK entries you might need.
Yeah, good point there, but I suppose our connection pool's relatively low max connection limit could be used as an upper bound.
> And bloating the lock table size by max_connections/2 or so to try
> to brute-force that doesn't seem like a good plan.
I'm not following - where does max_connections/2 come from?
> I'd just raise max_locks_per_transaction until you stop seeing
> problems, and then maybe add a factor of two safety margin.
That's what we've already done for the short term solution. It is somewhat in conflict with your statement regarding the number of lockable objects not holding still for long, though. As time goes on and our scheduled jobs automatically create new monthly partitions, or as our schema evolves, we may eventually hit the limits again. That's why we'd like to create some formula that can estimate the max_locks_per_transaction value we should configure (with the previously mentioned multiplier for safety / future proofing). An alternative would be to precreate all partitions we anticipate needing so we don't get surprises down the line, but then we incur extra planning cost for tables that will stay empty for months or even years.
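(For anyone finding this thread later, the bump itself is nothing fancier than something like the following - the value here is illustrative, not a recommendation - and since max_locks_per_transaction can only be set at server start, it needs a full restart rather than a reload.)

alter system set max_locks_per_transaction = 256;
-- then restart, e.g. on our PGDG / CentOS 7 install:
--   systemctl restart postgresql-13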
Craig
On 10/16/23 14:31, Craig McIlwee wrote:
> That's what we've already done for the short term solution. It is
> somewhat in conflict with your statement regarding the number of lockable
> objects not holding still for long, though. As time goes on and our
> scheduled jobs automatically create new monthly partitions, or as our
> schema evolves, we may eventually hit the limits again. That's why we'd
> like to create some formula that can estimate the
> max_locks_per_transaction value we should configure (with the previously
> mentioned multiplier for safety / future proofing). An alternative would
> be to precreate all partitions we anticipate needing so we don't get
> surprises down the line, but then we incur extra planning cost for tables
> that will stay empty for months or even years.

Just set max_locks_per_transaction to Something Big, and go on to other business. 16384 worked for us, after slowly inching up towards 1000. Tom's response let me not worry about setting "so big", and we haven't had any problems since.

--
Born in Arizona, moved to Babylonia.