Thread: Out of Shared Memory: max_locks_per_transaction

Out of Shared Memory: max_locks_per_transaction

From: Eliot Gable
I have a PGSQL 9.0.1 database which is on the back-end of an app I was stress testing last night. This morning, when I try to run psql, I get:

psql: FATAL:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

I believe something in the app is failing to release resources -- maybe a 3-way or 4-way deadlock between triggers writing to tables inside PGSQL, a deadlock between multiple processes talking to the app and the database, or a leak in the app itself that is causing locks not to be freed.

How do I track down what is going on if I cannot even run psql to get into the DB to run troubleshooting queries?

Thanks in advance for any suggestions.


--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

Re: Out of Shared Memory: max_locks_per_transaction

From: Merlin Moncure
On Fri, Nov 9, 2012 at 8:43 AM, Eliot Gable
<egable+pgsql-general@gmail.com> wrote:
> I have a PGSQL 9.0.1 database which is on the back-end of an app I was
> stress testing last night. This morning, when I try to run psql, I get:
>
> psql: FATAL:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction.
>
> I believe something in the app is failing to release resources -- maybe a
> 3-way or 4-way deadlock between triggers writing to tables inside PGSQL, a
> deadlock between multiple processes talking to the app and the database, or
> a leak in the app itself that is causing locks not to be freed.

most likely possibility: you have a transaction being left open and
accumulating locks.  of course, you have to rule out the possibility
that you simply need to increase max_locks_per_transaction: if you
have a lot of tables, it might be reasonable to have to extend this
on a stock config.
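
for what it's worth, both checks can be done with plain sql once you can connect. a rough sketch, using the pre-9.2 column names (procpid / current_query) since the server in this thread is 9.0:

    -- a backend stuck "idle in transaction" keeps its locks; on 9.0 it shows up as:
    SELECT procpid, usename, xact_start, current_query
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
     ORDER BY xact_start;

    -- the shared lock table holds roughly
    --   max_locks_per_transaction * (max_connections + max_prepared_transactions)
    -- entries, so check how it is currently sized:
    SHOW max_locks_per_transaction;    -- 64 by default
    SHOW max_connections;
    SHOW max_prepared_transactions;
    -- raising max_locks_per_transaction means editing postgresql.conf and restarting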

one thing that can cause this unfortunately is advisory locks eating
up exactly the amount of shared memory you have.  that's another thing
to rule out.
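
ruling that out is a single query against pg_locks (minimal sketch):

    -- advisory locks appear in pg_locks with locktype = 'advisory'
    SELECT pid, classid, objid, granted
      FROM pg_locks
     WHERE locktype = 'advisory';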

> How do I track down what is going on if I cannot even run psql to get into
> the DB to run troubleshooting queries?

it's a nasty problem.  if shared memory is exhausted and stuck i think
the only practical thing to do is to restart the database or nuke the
attached clients.  maybe try restarting the test, but keep an open
session *with an open transaction* that has previously queried both
pg_locks and pg_stat_activity.  it's just a hunch, but perhaps this
might allow you to query said views and gather some details.
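
concretely, the spare session could look something like this (just a sketch of the hunch, no guarantee it survives once the lock table is completely full):

    -- open before the stress test starts, and leave the transaction open
    BEGIN;
    SELECT count(*) FROM pg_locks;          -- acquires the view/catalog locks now,
    SELECT count(*) FROM pg_stat_activity;  -- while there is still room for them
    -- later, when the problem reproduces, re-run the selects in this same session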

merlin


Re: Out of Shared Memory: max_locks_per_transaction

From: Eliot Gable

> most likely possibility: you have a transaction being left open and
> accumulating locks.  of course, you have to rule out the possibility
> that you simply need to increase max_locks_per_transaction: if you
> have a lot of tables, it might be reasonable to have to extend this
> on a stock config.


We allow 500 connections to the DB with 64 locks per transaction = 32,000 locks. During the stress testing, I had roughly 40 simultaneous operations going through the test application. The test application uses a separate set of threads for accessing the database along with a shared connection pool and a FIFO queue attached to each connection. Queries are mostly insert, update, and delete, so they are batched into transactions in blocks of 100 - 1000 queries per transaction. At the start of the transaction, a stored procedure is called which acquires locks on 8 tables in a specific order to prevent triggers on the associated tables from deadlocking with each other and with other things accessing the database. In total, there might be 15 tables (at most) touched by the batch of queries. 
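
(Conceptually, the lock-ordering call at the start of each batch does something like the following; the table names and lock mode here are placeholders, not the real procedure, and the real version locks 8 tables:)

    -- hypothetical sketch: lock the tables in one fixed order before the batch runs
    BEGIN;
    LOCK TABLE accounts, orders, order_items IN SHARE ROW EXCLUSIVE MODE;
    -- ... 100-1000 INSERT/UPDATE/DELETE statements ...
    COMMIT;
    -- each locked table costs one entry in the shared lock table until COMMIT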

Another process comes along and processes records which are being inserted into the database. It pulls up to 10 records from a table, processes them, and moves those records into a "processed" table. The processing of the records is rather complex. To facilitate the processing, 6 temporary tables are created during the processing of each record, and then dropped after that record is completed. 8 additional tables are accessed in some way during the processing of each record. Each call to the processing stored procedure is run in its own transaction and handles only those 10 records at a time. This is done to keep the length of the transaction short so it does not block other activity in the database.
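
(One detail worth keeping in mind for the lock arithmetic: every table created or dropped inside a transaction, temporary or not, holds an AccessExclusiveLock entry until commit. A single record's worth of scratch tables, with made-up names, looks roughly like this:)

    -- hypothetical per-record scratch table; the real procedure creates six of these
    CREATE TEMP TABLE scratch_totals (record_id bigint, amount numeric);
    -- ... processing queries against the 8 permanent tables ...
    DROP TABLE scratch_totals;
    -- the locks from CREATE and DROP are only released when the transaction commits,
    -- so 10 records x 6 temp tables still adds up within one transaction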
 
> one thing that can cause this unfortunately is advisory locks eating
> up exactly the amount of shared memory you have.  that's another thing
> to rule out.

How would I rule this out?
 

>> How do I track down what is going on if I cannot even run psql to get into
>> the DB to run troubleshooting queries?

> it's a nasty problem.  if shared memory is exhausted and stuck i think
> the only practical thing to do is to restart the database or nuke the
> attached clients.  maybe try restarting the test, but keep an open
> session *with an open transaction* that has previously queried both
> pg_locks and pg_stat_activity.  it's just a hunch, but perhaps this
> might allow you to query said views and gather some details.


That is an interesting suggestion. I will definitely give that a try.

Is the pg_locks table the table I would query to see what is eating up those 32,000 locks? Is there some other table or query I could run which might provide additional information about those 32,000 locks and who / what is using them?
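
(For reference, a query along these lines -- column names per 9.0, where pg_stat_activity still uses procpid and current_query -- shows which backends hold the most lock entries:)

    SELECT l.pid, count(*) AS lock_entries, a.current_query
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.procpid = l.pid
     GROUP BY l.pid, a.current_query
     ORDER BY lock_entries DESC
     LIMIT 20;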

Thanks for the info and your suggestions! 

Re: Out of Shared Memory: max_locks_per_transaction

From: Eliot Gable

> Another process comes along and processes records which are being inserted into the database. It pulls up to 10 records from a table, processes them, and moves those records into a "processed" table. The processing of the records is rather complex. To facilitate the processing, 6 temporary tables are created during the processing of each record, and then dropped after that record is completed. 8 additional tables are accessed in some way during the processing of each record. Each call to the processing stored procedure is run in its own transaction and handles only those 10 records at a time. This is done to keep the length of the transaction short so it does not block other activity in the database.

>> one thing that can cause this unfortunately is advisory locks eating
>> up exactly the amount of shared memory you have.  that's another thing
>> to rule out.

> How would I rule this out?
 

It really was filling the locks table.

Using your suggestion, I managed to catch it in the process of the bad behavior, before it exhausted all lock entries. After some sleuthing through the resulting pg_locks output and my other code, I was able to isolate and resolve the issue. Basically, there was a call going on which tried to materialize a stats table based on thousands of records instead of 10 at a time. It was supposed to just be materializing the base rows in that table, all zeroed out, not based on any of the records. However, it does so using the same function which actually crunches numbers for the records, and it was coded to try all records from start of day until the function was run!

Thanks for the assist.

-Eliot

Re: Out of Shared Memory: max_locks_per_transaction

From: Merlin Moncure
On Fri, Nov 9, 2012 at 2:50 PM, Eliot Gable
<egable+pgsql-general@gmail.com> wrote:
>>> one thing that can cause this unfortunately is advisory locks eating
>>> up exactly the amount of shared memory you have.  that's another thing
>>> to rule out.
>>
>> How would I rule this out?
>
> It really was filling the locks table.
>
> Using your suggestion, I managed to catch it in the process of the bad
> behavior, before it exhausted all lock entries. After some sleuthing through
> the resulting pg_locks output and my other code, I was able to isolate and
> resolve the issue. Basically, there was a call going on which tried to
> materialize a stats table based on thousands of records instead of 10 at a
> time. It was supposed to just be materializing the base rows in that table,
> all zeroed out, not based on any of the records. However, it does so using
> the same function which actually crunches numbers for the records, and it
> was coded to try all records from start of day until the function was run!

awesome...glad I could help.  in the case of advisory locks, to help
reduce the likelihood of things like this happening, it's always
better to use the recently added 'xact' flavor of the functions that
release the lock at 'end of transaction' when possible.
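
for example (pg_advisory_xact_lock and friends appeared in 9.1, so they are not available on the 9.0 server in this thread; the key value is arbitrary):

    -- session-level advisory lock: held until explicitly unlocked or the session ends
    SELECT pg_advisory_lock(42);
    SELECT pg_advisory_unlock(42);

    -- transaction-level ('xact') flavor: released automatically at commit or rollback
    BEGIN;
    SELECT pg_advisory_xact_lock(42);
    COMMIT;  -- lock released here, nothing to forget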

merlin