Thread: Assistance with an out of shared memory error

From: Gautam Bellary
Hi Postgres group,

I'm reaching out for some help with an "ERROR: out of shared memory. HINT: You might need to increase max_locks_per_transaction" issue:

I've got a PL/pgSQL function (regenerate_gateway_last_seen, attached) that loops through all partitions of 2 tables ("Measure" and "ValuelessMeasure", schema attached) selecting data from each into another table ("GatewayLastSeenNew"). Occasionally the function runs to completion, but usually it errors with the message copied below. I'd appreciate any advice to help understand why I'm seeing this and if increasing max_locks_per_transaction, changing another configuration value, or changing how the function works would improve reliability.
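For context, the loop has roughly this shape (a simplified sketch for illustration, not the attached function itself; the column names are taken from the error context below):

```sql
-- Simplified sketch of the loop (illustration only, not the attached code):
-- visit every partition of both parent tables and upsert each gateway's
-- most recent Time into GatewayLastSeenNew.
DO $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT inhrelid::regclass
        FROM pg_catalog.pg_inherits
        WHERE inhparent IN ('measure'::regclass, 'valuelessmeasure'::regclass)
    LOOP
        EXECUTE format(
            'INSERT INTO GatewayLastSeenNew (GatewayId, Time)
                 SELECT GatewayId, MAX(Time) FROM %s GROUP BY GatewayId
             ON CONFLICT (GatewayId) DO UPDATE SET Time = EXCLUDED.Time
                 WHERE GatewayLastSeenNew.Time < EXCLUDED.Time', part);
    END LOOP;
END $$;
```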

At the moment I have three theories as to why this might be happening:
1. The machine is out of memory - this seems unlikely because shared_buffers is '5400MB' and I don't see the machine use more of the available swap when I run the function.

2. max_locks_per_transaction is being hit in the transaction - this also seems unlikely because max_locks_per_transaction is set to the default value of 64, but there are ~8000 partitions to consider and I expect the resulting GatewayLastSeenNew table to have thousands of rows. If I were taking locks that counted towards that ceiling of 64 I'd expect this to fail every time, instead of failing often but not always as I observe.

3. The max size of the lock table is being exceeded - my understanding is that the lock table has room for max_locks_per_transaction * (max_connections + max_prepared_transactions) locks, which would be 64 * (200 + 0) = 12800 for my current configuration. I ran 'SELECT COUNT(*) FROM PG_LOCKS' while the function was running and observed values as high as 21758, so if this is the issue it seems that either I'm not estimating the max size of the lock table correctly or only specific locks count towards it.
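For reference, that estimate can be read straight out of pg_settings:

```sql
-- Estimated shared lock-table capacity:
-- max_locks_per_transaction * (max_connections + max_prepared_transactions)
SELECT (SELECT setting::int FROM pg_settings WHERE name = 'max_locks_per_transaction')
     * ((SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
      + (SELECT setting::int FROM pg_settings WHERE name = 'max_prepared_transactions'))
       AS estimated_lock_slots;   -- 64 * (200 + 0) = 12800 here
```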

Thanks,
Gautam


Error:
SELECT regenerate_gateway_lastseen();
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
CONTEXT:  SQL statement "INSERT INTO GatewayLastSeenNew(GatewayId, Time)
        SELECT GatewayId, MAX(Time) AS max_time
        FROM valuelessmeasure_39137
        GROUP BY GatewayId
      ON CONFLICT (GatewayId) DO UPDATE SET Time = EXCLUDED.Time WHERE GatewayLastSeenNew.GatewayId = EXCLUDED.GatewayId AND GatewayLastSeenNew.Time < EXCLUDED.Time;"
PL/pgSQL function regenerate_gateway_lastseen() line 37 at EXECUTE


SHOW max_locks_per_transaction;
 max_locks_per_transaction
---------------------------
 64
(1 row)

SHOW max_connections;
 max_connections
-----------------
 200
(1 row)

SHOW max_prepared_transactions;
 max_prepared_transactions
---------------------------
 0
(1 row)

SELECT COUNT(*) FROM PG_LOCKS;
 count
-------
 21758
(1 row)

SELECT COUNT(INHRELID::REGCLASS) FROM PG_CATALOG.PG_INHERITS WHERE INHPARENT = 'measure'::REGCLASS;
 count
-------
  3672
(1 row)

SELECT COUNT(INHRELID::REGCLASS) FROM PG_CATALOG.PG_INHERITS WHERE INHPARENT = 'valuelessmeasure'::REGCLASS;
 count
-------
  3672
(1 row)


Re: Assistance with an out of shared memory error

From: Tom Lane
Gautam Bellary <gautam@pulsasensors.com> writes:
> I've got a PL/pgSQL function (regenerate_gateway_last_seen, attached) that
> loops through all partitions of 2 tables ("Measure" and "ValuelessMeasure",
> schema attached) selecting data from each into another table
> ("GatewayLastSeenNew"). Occasionally the function runs to completion, but
> usually it errors with the message copied below. I'd appreciate any advice
> to help understand why I'm seeing this and if increasing
> max_locks_per_transaction, changing another configuration value, or
> changing how the function works would improve reliability.

Yes, you ought to raise max_locks_per_transaction ...
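For example (the value here is only a guess - pick one big enough that a single transaction can lock all ~8000 partitions at once; the setting takes effect only after a server restart):

```sql
-- Illustrative value only; requires a full server restart to take effect.
ALTER SYSTEM SET max_locks_per_transaction = 256;
```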

> 2. max_locks_per_transaction is being hit in the transaction - this also
> seems unlikely because max_locks_per_transaction is set to the default
> value of 64, but there are ~8000 partitions to consider and I expect the
> resulting GatewayLastSeenNew table to have thousands of rows.

... or else reduce the number of partitions you're using.  (People
frequently seem to think that more partitions are free.  That is
extremely not true.  I generally think that if you're using more
than a few dozen partitions per table, you are making a mistake.)

> If I was
> taking locks that would contribute towards that ceiling of 64 I'd expect
> this to fail every time, instead of failing often but not always as I
> observe.

You misinterpret what that parameter does: it is not a hard per-session
limit.  This error appears when the shared lock-table pool overflows, so
you can (usually) take a lot more than 64 locks before running into
trouble.  It matters what else is going on in the database.

> 3. The max size of the lock table is being exceeded - my understanding is
> that the lock table has room for max_locks_per_transaction *
> (max_connections + max_prepared_transactions) locks, which would be 64 *
> (200 + 0) = 12800 for my current configuration. I used 'SELECT COUNT(*)
> FROM PG_LOCKS' while the function was running and observe values as high as
> 21758, so if this is the issue it seems like I might not be estimating the
> max size of the lock table correctly or only specific locks contribute to
> that.

I don't recall for sure, but I think that the lock table has one entry per
lockable object, while the pg_locks view shows separate entries for
different lock modes on the same object.
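An untested way to see the difference is to group pg_locks rows by the
columns that identify the locked object:

```sql
-- Rows in pg_locks vs. distinct lockable objects; several rows with
-- different modes can refer to the same object, which (if memory serves)
-- occupies a single lock-table entry.
SELECT count(*) AS lock_rows,
       count(DISTINCT (locktype, database, relation, page, tuple,
                       virtualxid, transactionid, classid, objid, objsubid))
           AS distinct_objects
FROM pg_locks;
```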

            regards, tom lane



Re: Assistance with an out of shared memory error

From: Gautam Bellary
Hi Tom,

This was very helpful, thanks for the tips.

Gautam

On Thu, Jan 13, 2022 at 3:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [full quote of the reply above trimmed]

Re: Assistance with an out of shared memory error

From: Mladen Gogala
On 1/13/22 18:35, Tom Lane wrote:
> .. or else reduce the number of partitions you're using.  (People
> frequently seem to think that more partitions are free.  That is
> extremely not true.  I generally think that if you're using more
> than a few dozen partitions per table, you are making a mistake.)

Interesting. What resources do partitions consume? Is it documented 
anywhere? Why do you think that more than a few dozen partitions is a 
mistake?

Thanks in advance for your explanation.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com




Re: Assistance with an out of shared memory error

From: Tom Lane
Mladen Gogala <gogala.mladen@gmail.com> writes:
> On 1/13/22 18:35, Tom Lane wrote:
>> .. or else reduce the number of partitions you're using.  (People
>> frequently seem to think that more partitions are free.  That is
>> extremely not true.  I generally think that if you're using more
>> than a few dozen partitions per table, you are making a mistake.)

> Interesting. What resources do partitions consume?

Locks, planning time, execution time, cache-management overhead.
To the extent that all of your queries against a table can be
"pruned" to only touch a few partitions, you might come out ahead on
execution time; but I think few workloads are completely stylized
like that.

The poster-child use case for partitions is where you can make
them fit a recurring bulk-deletion scenario; for instance, once
a month you want to drop the oldest month's worth of data.
If you've got terabytes of data then there's also something to
be said for being able to split up vacuuming and other background
overhead.  But I think a lot of people are way too eager to
apply partitioning where it doesn't fit, or to overuse it even
where it does fit.
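(In the monthly case the bulk delete becomes a cheap metadata operation
rather than a scan - hypothetical partition names below:)

```sql
-- Drop a whole month of data without scanning it (names are made up):
ALTER TABLE measure DETACH PARTITION measure_y2021m12;
DROP TABLE measure_y2021m12;
```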

            regards, tom lane