Thread: query against pg_locks leads to large memory alloc
Hi,

I have a tool that is trying to collect stats from postgres (v9.1.13). postgres attempts to allocate more memory than is allowed:

SELECT mode, count(mode) AS count FROM pg_locks
GROUP BY mode ORDER BY mode;
ERROR:  invalid memory alloc request size 1459291560

Memory-related configs from the server:

shared_buffers = 10000MB
work_mem = 15MB
maintenance_work_mem = 400MB
effective_cache_size = 50000MB
max_locks_per_transaction = 9000
max_pred_locks_per_transaction = 40000

The machine is running CentOS 6, a 32-core AMD 6276 processor, and is configured with 64GB of memory. Transparent Huge Pages are disabled :-)

Thanks in advance for your time and expertise.

Dave Owens
On Mon, Aug 18, 2014 at 6:01 PM, Dave Owens <dave@teamunify.com> wrote:
> max_locks_per_transaction = 9000
> max_pred_locks_per_transaction = 40000
Do you really need such large values? What is your max_connections value?
Regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
On Mon, Aug 18, 2014 at 4:21 PM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:
>
> On Mon, Aug 18, 2014 at 6:01 PM, Dave Owens <dave@teamunify.com> wrote:
>>
>> max_locks_per_transaction = 9000
>> max_pred_locks_per_transaction = 40000

Performance of any query to pg_locks is proportional to the setting of max_locks_per_transaction. Still, something is awry here. Can you 'explain' that query?

Also, what's the answer you get when you run:

SELECT COUNT(*) FROM pg_locks;

merlin
On Mon, Aug 18, 2014 at 2:21 PM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:
> Do you really need such large values? What is your max_connections value?

max_connections = 450 ...we have found that we run out of shared memory when max_pred_locks_per_transaction is less than 30k.

On Mon, Aug 18, 2014 at 2:29 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> performance of any query to pg_locks is proportional to the setting of
> max_locks_per_transaction.  still, something is awry here.  can you
> 'explain' that query?

tudb=# explain SELECT mode, count(mode) AS count FROM pg_locks GROUP BY mode ORDER BY mode;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Sort  (cost=0.63..0.65 rows=200 width=32)
   Sort Key: l.mode
   ->  HashAggregate  (cost=0.30..0.32 rows=200 width=32)
         ->  Function Scan on pg_lock_status l  (cost=0.00..0.10 rows=1000 width=32)
(4 rows)

> SELECT COUNT(*) from pg_locks;

ERROR:  invalid memory alloc request size 1562436816
Dave Owens <dave@teamunify.com> wrote:

> max_connections = 450 ...we have found that we run out of shared
> memory when max_pred_locks_per_transaction is less than 30k.

>> SELECT COUNT(*) from pg_locks;
>
> ERROR:  invalid memory alloc request size 1562436816

It gathers the information in memory to return for all those locks (I think both the normal heavyweight locks and the predicate locks do that). 450 * 30000 is 13.5 million predicate locks you could have, so they don't need a very big structure per lock to start adding up. I guess we should refactor that to use a tuplestore, so it can spill to disk when it gets to be more than work_mem.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
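A rough back-of-the-envelope check is consistent with that explanation. This is only an illustration, using the failed alloc size from the error above and the 450 * 30000 ceiling mentioned here; the per-row byte figure is an estimate, not a measured structure size:

-- sanity check: how big is the failed allocation per potential lock row?
SELECT pg_size_pretty(1562436816::bigint) AS failed_alloc,        -- ~1490 MB
       450 * 30000                        AS potential_pred_locks, -- 13.5 million
       1562436816 / (450 * 30000)         AS approx_bytes_per_lock_row;  -- ~115 bytes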
Kevin Grittner <kgrittn@ymail.com> writes:
> Dave Owens <dave@teamunify.com> wrote:
>> max_connections = 450 ...we have found that we run out of shared
>> memory when max_pred_locks_per_transaction is less than 30k.

> It gathers the information in memory to return for all those locks
> (I think both the normal heavyweight locks and the predicate locks
> do that).  450 * 30000 is 13.5 million predicate locks you could
> have, so they don't need a very big structure per lock to start
> adding up.  I guess we should refactor that to use a tuplestore, so
> it can spill to disk when it gets to be more than work_mem.

Seems to me the bigger issue is why does he need such a huge max_pred_locks_per_transaction setting? It's hard to believe that performance wouldn't tank with 10 million predicate locks active. Whether you can do "select * from pg_locks" seems pretty far down the list of concerns about this setting.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Dave Owens <dave@teamunify.com> wrote:
>>> max_connections = 450 ...we have found that we run out of shared
>>> memory when max_pred_locks_per_transaction is less than 30k.
>
>> It gathers the information in memory to return for all those locks
>> (I think both the normal heavyweight locks and the predicate locks
>> do that).  450 * 30000 is 13.5 million predicate locks you could
>> have, so they don't need a very big structure per lock to start
>> adding up.  I guess we should refactor that to use a tuplestore, so
>> it can spill to disk when it gets to be more than work_mem.
>
> Seems to me the bigger issue is why does he need such a huge
> max_pred_locks_per_transaction setting?  It's hard to believe that
> performance wouldn't tank with 10 million predicate locks active.
> Whether you can do "select * from pg_locks" seems pretty far down
> the list of concerns about this setting.

It would be interesting to know more about the workload which is capable of that, but it would be a lot easier to analyze what's going on if we could look at where those locks are being used (in summary, of course -- nobody can make sense of 10 million detail lines).

About all I can think to ask at this point is: how many total tables and indexes are there in all databases in this cluster (counting each partition of a partitioned table as a separate table)? With the promotion of finer-grained locks to coarser ones, this should be pretty hard to hit without a very large number of tables.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Kevin,

Looking at pg_stat_all_tables and pg_stat_all_indexes on our four databases we have:

1358 tables
1808 indexes

The above totals do not include template1, template0, or postgres databases. We do not use partitioned tables. Only one database has a meaningful level of concurrency (New Relic reports about 30k calls per minute, from our main webapp). That database alone consists of 575 tables and 732 indexes.

Dave Owens
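Something along these lines would produce those per-database counts from the views Dave mentions (a sketch only; the exact query and schema filter are guesses, run once in each of the four databases):

-- count user tables and indexes in the current database
SELECT (SELECT count(*) FROM pg_stat_all_tables
         WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) AS tables,
       (SELECT count(*) FROM pg_stat_all_indexes
         WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) AS indexes;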
Dave Owens <dave@teamunify.com> wrote:

> 1358 tables
> 1808 indexes

Hmm, that's not outrageous. How about long-running transactions? Please check pg_stat_activity and pg_prepared_xacts for xact_start or prepared (respectively) values older than a few minutes. Since predicate locks may need to be kept until an overlapping transaction completes, a single long-running transaction can bloat the lock count.

Also, could you show us the output from:

SELECT version();

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
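A sketch of the two checks being asked for (column names as they exist in 9.1; the three-minute cutoff is an arbitrary choice):

-- long-running transactions (9.1 uses procpid / current_query)
SELECT procpid, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '3 minutes'
ORDER BY xact_start;

-- prepared transactions that have been sitting around
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
WHERE prepared < now() - interval '3 minutes'
ORDER BY prepared;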
On Tue, Aug 19, 2014 at 9:40 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Hmm, that's not outrageous.  How about long-running transactions?
> Please check pg_stat_activity and pg_prepared_xacts for xact_start
> or prepared (respectively) values older than a few minutes.  Since
> predicate locks may need to be kept until an overlapping
> transaction completes, a single long-running transaction can bloat
> the lock count.

I do see a handful of backends that like to stay IDLE in transaction for minutes at a time. We are refactoring the application responsible for these long IDLE times, which will hopefully reduce the duration of their connections.

# select backend_start, xact_start, query_start, waiting, current_query
  from pg_stat_activity where xact_start < now() - interval '3 minutes';

         backend_start         |          xact_start           |          query_start          | waiting |     current_query
-------------------------------+-------------------------------+-------------------------------+---------+-----------------------
 2014-08-19 09:48:00.398498-07 | 2014-08-19 09:49:19.157478-07 | 2014-08-19 10:03:04.99303-07  | f       | <IDLE> in transaction
 2014-08-19 09:38:00.493924-07 | 2014-08-19 09:53:47.00614-07  | 2014-08-19 10:03:05.003496-07 | f       | <IDLE> in transaction
(2 rows)

... now() was 2014-08-19 10:03 in the above query. I do not see anything in pg_prepared_xacts; we do not use two-phase commit.

> Also, could you show us the output from:
>
> SELECT version();

                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)
On 2014-08-18 14:36:52 -0700, Dave Owens wrote:
> On Mon, Aug 18, 2014 at 2:21 PM, Matheus de Oliveira
> <matioli.matheus@gmail.com> wrote:
> > Do you really need such large values? What is your max_connections value?
>
> max_connections = 450 ...we have found that we run out of shared
> memory when max_pred_locks_per_transaction is less than 30k.

What was the precise error message when that happened?

Greetings,

Andres Freund

--
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Andres,

On Tue, Aug 19, 2014 at 10:17 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> max_connections = 450 ...we have found that we run out of shared
>> memory when max_pred_locks_per_transaction is less than 30k.
>
> What was the precise error message when that happened?

2014-07-31 15:00:25 PDT 53dabbea.29c7 ERROR:  53200: out of shared memory
2014-07-31 15:00:25 PDT 53dabbea.29c7 HINT:  You might need to increase max_pred_locks_per_transaction.
2014-07-31 15:00:25 PDT 53dabbea.29c7 LOCATION:  CreatePredicateLock, predicate.c:2247
2014-07-31 15:00:25 PDT 53dabbea.29c7 STATEMENT:  SELECT member_id, SUM(credit_quarters) FROM ondeck_tallies_x WHERE team_id = $1 AND credit_quarters > 0 AND EXTRACT(day from current_timestamp - dt_attendance_taken) <= $2 GROUP BY member_id
I wonder if it would be helpful to restart the database, then begin gathering information from pg_locks while it can still respond to queries. I speculate that this is possible because the amount of memory needed to query pg_locks continues to grow (around 1900MB now).

Dave Owens
Dave Owens <dave@teamunify.com> wrote:

> I do see a handful of backends that like to stay IDLE in
> transaction for minutes at a time.  We are refactoring the
> application responsible for these long IDLE times, which will
> hopefully reduce the duration of their connections.

That may help some. Other things to consider:

- If you can use a connection pooler in transaction mode to reduce the number of active connections you may be able to improve performance all around, and dodge this problem in the process. Very few systems can make efficient use of hundreds of concurrent connections, but for various reasons fixing that with a connection pooler is sometimes difficult.

- If you have transactions (or SELECT statements that you run outside of explicit transactions) which you know will not be modifying any data, flagging them as READ ONLY will help contain the number of predicate locks and will help overall performance. (If the SELECT statements are not in explicit transactions, you may have to put them in one to allow the READ ONLY property to be set, or set default_transaction_read_only in the session to accomplish this; see the sketch below.)

- Due to the heuristics used for thresholds for combining fine-grained locks into coarser ones, you might be able to work around this by boosting max_connections above the number you are going to use. Normally when you increase max_pred_locks_per_transaction it increases the number of page locks it will allow in a table or index before it combines them into a relation lock; increasing max_connections doesn't affect the granularity promotion threshold, but it increases the total number of predicate locks allowed, so if you boost that and reduce max_pred_locks_per_transaction in proportion, you may be able to dodge the problem. It's an ugly workaround, but it might get you into better shape. If that does work, it's good evidence that we should tweak those heuristics.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
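A minimal sketch of the READ ONLY approaches described above (the table and column names are made up for illustration):

-- explicit read-only serializable transaction: takes no predicate locks on writes it will never do
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
SELECT count(*) FROM some_report_table;   -- hypothetical query
COMMIT;

-- or make read-only the session default and opt in explicitly for writers
SET default_transaction_read_only = on;
BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;   -- explicit opt-in for a writing transaction
UPDATE some_report_table SET processed = true WHERE id = 1;   -- hypothetical statement
COMMIT;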
Dave Owens <dave@teamunify.com> wrote:

> I wonder if it would be helpful to restart the database, then begin
> gathering information from pg_locks while it can still respond to
> queries.  I speculate that this is possible because the amount of
> memory needed to query pg_locks continues to grow (around 1900MB now).

If restart is an option, that sounds like a great idea. If you could capture the data into tables where we can summarize to analyze it in a meaningful way, that would be ideal. Something like:

CREATE TABLE activity_snap_1 AS SELECT * FROM pg_stat_activity;

Of course, boost the number for each subsequent run.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 19, 2014 at 11:01 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> If restart is an option, that sounds like a great idea.  If you
> could capture the data into tables where we can summarize to
> analyze it in a meaningful way, that would be ideal.  Something
> like:
>
> CREATE TABLE activity_snap_1 AS SELECT * FROM pg_stat_activity;
>
> Of course, boost the number for each subsequent run.

Kevin - Would you or the list be interested in snapshots of pg_locks as well? I can take a restart tonight and get this going on a half-hourly basis (unless you think more frequent snaps would be useful).
Dave Owens <dave@teamunify.com> wrote:
> On Tue, Aug 19, 2014 at 11:01 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> CREATE TABLE activity_snap_1 AS SELECT * FROM pg_stat_activity;

> Would you or the list be interested in snapshots of pg_locks as well?

Most definitely! I'm sorry I copied/pasted the pg_stat_activity example; I was playing with both. pg_locks is definitely the more important one, but it might be useful to try to match some of these locks up against process information as we drill down from the summary to see examples of what makes up those numbers.

> I can take a restart tonight and get this going on a half-hourly basis
> (unless you think more frequent snaps would be useful).

Each half-hour should be fine as long as that gives at least three or four samples before you are unable to query pg_locks.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
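A sketch of the kind of paired snapshots being discussed (the table names and the join are illustrative, not Dave's actual collection script):

-- snapshot both views with matching numbers; bump the suffix for each run
CREATE TABLE locks_snap_1    AS SELECT * FROM pg_locks;
CREATE TABLE activity_snap_1 AS SELECT * FROM pg_stat_activity;

-- later, match locks up against backend activity (9.1 calls the column procpid)
SELECT a.procpid, a.current_query, l.locktype, l.mode, count(*)
FROM locks_snap_1 l
JOIN activity_snap_1 a ON a.procpid = l.pid
GROUP BY a.procpid, a.current_query, l.locktype, l.mode
ORDER BY count(*) DESC;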
I now have 8 hours worth of snapshots from pg_stat_activity and pg_locks (16 snapshots from each table/view). I have turned off collection at this point, but I am still able to query pg_locks:

# SELECT mode, count(mode) AS count FROM pg_locks GROUP BY mode ORDER BY mode;
       mode       |  count
------------------+---------
 AccessShareLock  |     291
 ExclusiveLock    |      19
 RowExclusiveLock |       4
 RowShareLock     |       1
 SIReadLock       | 7287531
(5 rows)

SIReadLocks continue to grow. It seems, in general, that our application code overuses Serializable... we have produced a patch that demotes some heavy-hitting queries down to Read Committed, and we will see if this makes an impact on the number of SIReadLocks.

Is it interesting that only 101557 out of 7 million SIReadLocks have a pid associated with them?

-Dave Owens
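Presumably a query along these lines produced that last figure (a guess at the statement, not Dave's exact query):

-- SIReadLocks that still have a backend pid attached
SELECT count(*) FROM pg_locks
WHERE mode = 'SIReadLock' AND pid IS NOT NULL;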
Dave Owens <dave@teamunify.com> wrote:

> I now have 8 hours worth of snapshots from pg_stat_activity and
> pg_locks (16 snapshots from each table/view).  I have turned off
> collection at this point, but I am still able to query pg_locks

Could you take the earliest one after activity started, and the latest one before you stopped collecting them, compress them, and email them to me off-list, please?

> SIReadLocks continue to grow.  It seems, in general, that our
> application code overuses Serializable... we have produced a patch
> that demotes some heavy-hitting queries down to Read Committed, and we
> will see if this makes an impact on the number of SIReadLocks.

Do all of those modify data? If not, you may get nearly the same benefit from declaring them READ ONLY instead, and that would get better protection against seeing transient invalid states. One example of that is here:

http://wiki.postgresql.org/wiki/SSI#Deposit_Report

> Is it interesting that only 101557 out of 7 million SIReadLocks have a
> pid associated with them?

I would need to double-check that I'm not forgetting another case, but the two cases I can think of where the pid is NULL are if the transaction is PREPARED (for two-phase commit) or if committed transactions are summarized (so they can be combined) to try to limit RAM usage. We might clear the pid if the connection is closed, but (without having checked yet) I don't think we did that. Since you don't use prepared transactions, they are probably from the summarization. But you would not normally accumulate much there unless you have a long-running transaction which is not flagged as READ ONLY.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> wrote:
> Dave Owens <dave@teamunify.com> wrote:
>
>> I now have 8 hours worth of snapshots from pg_stat_activity and
>> pg_locks (16 snapshots from each table/view).  I have turned off
>> collection at this point, but I am still able to query pg_locks
>
> Could you take the earliest one after activity started, and the
> latest one before you stopped collecting them, compress them, and
> email them to me off-list, please?

Dave did this, off-list. There is one transaction which has been running for over 20 minutes, which seems to be the cause of the accumulation. I note that this query does not hold any of the locks it would need to take before modifying data, and it has not been assigned a transactionid -- both signs that it has (so far) not modified any data. If it is not going to modify any, it would not have caused this accumulation of locks if it was flagged as READ ONLY. This is very important to do if you are using serializable transactions in PostgreSQL.

To quantify that, I show the number of SIReadLocks in total:

test=# select count(*) from locks_snap_16 where mode = 'SIReadLock';
  count
---------
 3910257
(1 row)

... and the number of those which are only around because there is an open overlapping transaction, not flagged as read only:

test=# select count(*) from locks_snap_16 l
test-#   where mode = 'SIReadLock'
test-#     and not exists (select * from locks_snap_16 a
test(#                       where a.locktype = 'virtualxid'
test(#                         and a.virtualxid = l.virtualtransaction);
  count
---------
 3565155
(1 row)

I can't stress enough how important it is that the advice near the bottom of this section of the documentation is heeded:

http://www.postgresql.org/docs/9.2/interactive/transaction-iso.html#XACT-SERIALIZABLE

Those bullet points are listed roughly in order of importance; there is a reason this one is listed first:

- Declare transactions as READ ONLY when possible.

In some shops using SERIALIZABLE transactions, I have seen them set default_transaction_read_only = on, and explicitly set it off for transactions which will (or might) modify data.

If you have a long-running report that might itself grab a lot of predicate locks (a/k/a SIReadLocks), you can avoid that by declaring the transaction as READ ONLY DEFERRABLE. If you do that, the transaction will wait to begin execution until it can acquire a snapshot guaranteed not to show any anomalies (like the example referenced in an earlier post can show). It then runs without acquiring any predicate locks, just like a REPEATABLE READ transaction. In fairly busy benchmarks, we never saw it take more than six seconds to acquire such a snapshot, although the wait time is not bounded. Again, getting such a snapshot will be possible sooner if you declare transactions as READ ONLY when possible.  :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
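A minimal sketch of the READ ONLY DEFERRABLE pattern described above (the report query is a placeholder; ondeck_tallies_x is just borrowed from the statement quoted earlier in the thread):

-- long-running report: waits for a safe snapshot, then runs without taking any predicate locks
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
SELECT team_id, count(*) FROM ondeck_tallies_x GROUP BY team_id;   -- placeholder report query
COMMIT;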