Thread: capturing/viewing sort_mem utilization on a per query basis
Greetings,
I've got a pesky J2EE-based app that is using PostgreSQL-7.4.x on the
backend. Lately I've been getting a disturbingly large number of
PostgreSQL out-of-memory exceptions on the J2EE side when running
queries that have some huge joins in them. I've tried increasing the
sort_mem value for the DB, but that hasn't had any noticeable impact.
Rather than fumbling blindly in the dark, I'd like to get a better idea
of how much memory this query really needs. Is there a way for me to
capture that in the log?

Note: I'm not the person who wrote this app or the query; I'm just the
guy responsible for the DB. I've already castigated the folks who wrote
this resource hog.

thanks,
Lonni

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                             netllama@gmail.com
LlamaLand                               http://netllama.linux-sxs.org
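[Aside: 7.4 has no switch that logs a query's sort/hash memory consumption per
se. The closest built-in hook I'm aware of is the per-backend resource
accounting sketched below; note that it reports getrusage() figures to the
server log, not sort_mem utilization per plan node, and the sort_mem value
shown is purely illustrative:]

    -- Superuser-only; output appears in the server log, not the client:
    SET log_executor_stats = true;
    -- ... run the suspect query here ...
    SET log_executor_stats = false;

    -- sort_mem (units of kB) can also be raised for one session only,
    -- which is safer to experiment with than a server-wide change:
    SET sort_mem = 65536;   -- ~64 MB allowed per sort/hash step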
Lonni J Friedman <netllama@gmail.com> writes:
> I've got a pesky J2EE-based app that is using PostgreSQL-7.4.x on the
> backend. Lately I've been getting a disturbingly large number of
> PostgreSQL out-of-memory exceptions on the J2EE side when running
> queries that have some huge joins in them.

Such an error on the client side would indicate that the query retrieved
more rows than the client app had memory for. Perhaps your problem is
incorrectly-constrained joins that are returning way more rows than
you expect?

			regards, tom lane
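[Aside: if the failure really were client-side, the standard mitigation is to
page through the result set with a cursor rather than materializing it all at
once. A minimal sketch, with a hypothetical cursor name and the query elided;
newer PostgreSQL JDBC drivers can do the equivalent transparently when a fetch
size is set on the statement:]

    BEGIN;
    DECLARE huge_join CURSOR FOR
        SELECT ... ;                      -- the offending join goes here
    FETCH FORWARD 1000 FROM huge_join;    -- repeat until no rows are returned
    CLOSE huge_join;
    COMMIT;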
Lonni J Friedman <netllama@gmail.com> writes:
> Although, looking in the log postgresql is generating, I'm seeing the
> following at the same time as that OOM above, so it certainly looks
> like the DB is barfing:

OK, then it's a backend issue, not a client-side issue.

> HashBatchContext: -1934683632 total in 295 blocks; 135760 free (879
> chunks); -1934819392 used

Hmm, looks like a hash join ran out of memory. What PG version is this
again, and what do you have sort_mem set to? Can you show an EXPLAIN
for the query that is failing like this?

			regards, tom lane
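[Aside: the negative totals quoted above are informative in themselves. The
memory-context byte counters are C longs, 32 bits wide on a typical build of
this era, so a context that grows past 2 GB wraps negative. Undoing the wrap,
on the assumption that exactly one wrap occurred, recovers the true size:]

    -- Add 2^32 to the wrapped counter to recover the real value:
    SELECT -1934683632 + 4294967296.0 AS total_bytes;
    -- => 2360283664, i.e. the hash table had grown to roughly 2.2 GB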
On Wed, 02 Feb 2005 11:17:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > I've got a pesky J2EE-based app that is using PostgreSQL-7.4.x on the
> > backend. Lately I've been getting a disturbingly large number of
> > PostgreSQL out-of-memory exceptions on the J2EE side when running
> > queries that have some huge joins in them.
>
> Such an error on the client side would indicate that the query retrieved
> more rows than the client app had memory for. Perhaps your problem is
> incorrectly-constrained joins that are returning way more rows than
> you expect?

You might be right, I'm honestly not sure. What we're seeing on the
client side is:

org.postgresql.util.PSQLException: ERROR: out of memory

Although, looking in the log postgresql is generating, I'm seeing the
following at the same time as that OOM above, so it certainly looks
like the DB is barfing:

###############
TopMemoryContext: 71448 total in 7 blocks; 22544 free (120 chunks); 48904 used
TopTransactionContext: 8192 total in 1 blocks; 8136 free (0 chunks); 56 used
DeferredTriggerXact: 0 total in 0 blocks; 0 free (0 chunks); 0 used
MessageContext: 1040384 total in 7 blocks; 395024 free (1021 chunks); 645360 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
ExecutorState: 2121752 total in 9 blocks; 538864 free (12 chunks); 1582888 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 451796 total in 2 blocks; 260800 free (0 chunks); 190996 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: -1934683632 total in 295 blocks; 135760 free (879 chunks); -1934819392 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 24576 total in 2 blocks; 11128 free (4 chunks); 13448 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
CacheMemoryContext: 2088960 total in 8 blocks; 1595032 free (3190 chunks); 493928 used
relationship_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
relation_type: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
relation_target: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
relation_origin: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
field_value_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
field_class_idx: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
field_value_idx: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
f_val_fid_val_idx: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
report_param_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
report_parm_report: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
projectmember_pk: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
projmember_member: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_opclass_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_user_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_user_proj_idx: 2048 total in 1 blocks; 768 free (0 chunks); 1280 used
role_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
sfuser_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
sfuser_username: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_operation_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_oper_obj_oper: 2048 total in 1 blocks; 768 free (0 chunks); 1280 used
artifact_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
artifact_status: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
project_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
project_path: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_parent: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_project: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_path: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
item_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
item_name: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
item_folder: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_index_indrelid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_proc_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_cast_source_target_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_type_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_operator_oprname_l_r_n_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_shadow_usesysid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_operator_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amproc_opc_procnum_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_amop_opc_strategy_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_class_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amproc_opc_procnum_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_operator_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amop_opc_strategy_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
MdSmgr: 8192 total in 1 blocks; 6120 free (0 chunks); 2072 used
DynaHash: 8192 total in 1 blocks; 6912 free (0 chunks); 1280 used
DynaHashTable: 8192 total in 1 blocks; 2008 free (0 chunks); 6184 used
DynaHashTable: 8192 total in 1 blocks; 5080 free (0 chunks); 3112 used
DynaHashTable: 8192 total in 1 blocks; 2008 free (0 chunks); 6184 used
DynaHashTable: 24576 total in 2 blocks; 15256 free (4 chunks); 9320 used
DynaHashTable: 8192 total in 1 blocks; 1960 free (0 chunks); 6232 used
DynaHashTable: 24576 total in 2 blocks; 13240 free (4 chunks); 11336 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ErrorContext: 49152 total in 4 blocks; 49088 free (27 chunks); 64 used
2005-02-02 01:33:51 ERROR:  out of memory
DETAIL:  Failed on request of size 364.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                             netllama@gmail.com
LlamaLand                               http://netllama.linux-sxs.org
On Wed, 02 Feb 2005 12:13:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > Although, looking in the log postgresql is generating, I'm seeing the
> > following at the same time as that OOM above, so it certainly looks
> > like the DB is barfing:
>
> OK, then it's a backend issue, not a client-side issue.
>
> > HashBatchContext: -1934683632 total in 295 blocks; 135760 free (879
> > chunks); -1934819392 used
>
> Hmm, looks like a hash join ran out of memory. What PG version is this
> again, and what do you have sort_mem set to? Can you show an EXPLAIN
> for the query that is failing like this?

This is PostgreSQL-7.4.6 running on RHES-3 (using the RPMs), with
sort_mem=100000. I've attached the EXPLAIN for the query that is
blowing up. Thanks for looking at this.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                             netllama@gmail.com
LlamaLand                               http://netllama.linux-sxs.org
[Attachment: EXPLAIN output for the failing query]
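[Aside: for scale, sort_mem is expressed in kB, so the setting reported above
allows roughly 98 MB per sort or hash step, and a plan with several hash joins
can use a multiple of that. The gap between that budget and the ~2.2 GB hash
table in the log suggests a row-estimation failure rather than a simple
sort_mem misconfiguration:]

    SHOW sort_mem;                          -- 7.4 reports the value in kB
    SELECT 100000 / 1024.0 AS sort_mem_mb;  -- ~97.7 MB allowed per sort/hash step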
Lonni J Friedman <netllama@gmail.com> writes:
> On Wed, 02 Feb 2005 12:13:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, looks like a hash join ran out of memory. What PG version is this
>> again, and what do you have sort_mem set to? Can you show an EXPLAIN
>> for the query that is failing like this?

> I've attached the explain for the query that is blowing up.

One of the three Hash steps must be absorbing a whole lot more rows than
the planner is expecting, but all of them look like fairly
straightforward estimation situations:

   ->  Hash  (cost=108.96..108.96 rows=28 width=24)
         ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
               Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))

   ->  Hash  (cost=701.44..701.44 rows=34444 width=24)
         ->  Seq Scan on field_value tracker_artifact_group0  (cost=0.00..701.44 rows=34444 width=24)

   ->  Hash  (cost=5.74..5.74 rows=1 width=80)
         ->  Index Scan using project_path on project tracker_artifact_extension_f1  (cost=0.00..5.74 rows=1 width=80)
               Index Cond: (("path")::text = 'projects.meeting_broker_v3'::text)
               Filter: ((("path")::text = 'projects.meeting_broker_v3'::text) OR (("path")::text ~~ 'projects.meeting_broker_v3.%'::text))

Perhaps one of these tables hasn't been vacuumed/analyzed and is way
bigger than the planner thinks? Can you check into how many rows
actually meet the identified conditions?

			regards, tom lane
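[Aside: a concrete way to compare the planner's notion of table size against
reality. reltuples and relpages are the estimates the planner works from, and
they are refreshed only by VACUUM/ANALYZE, so a stale value here would explain
a bad plan. Table names are taken from the plan above:]

    SELECT relname, reltuples, relpages
    FROM pg_class
    WHERE relname IN ('mntr_subscription', 'field_value', 'project');

    ANALYZE field_value;   -- refresh the statistics for one suspect table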
On Wed, 02 Feb 2005 12:58:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > On Wed, 02 Feb 2005 12:13:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Hmm, looks like a hash join ran out of memory. What PG version is this
> >> again, and what do you have sort_mem set to? Can you show an EXPLAIN
> >> for the query that is failing like this?
>
> > I've attached the explain for the query that is blowing up.
>
> One of the three Hash steps must be absorbing a whole lot more rows than
> the planner is expecting, but all of them look like fairly
> straightforward estimation situations:
>
>    ->  Hash  (cost=108.96..108.96 rows=28 width=24)
>          ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
>                Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))
>
>    ->  Hash  (cost=701.44..701.44 rows=34444 width=24)
>          ->  Seq Scan on field_value tracker_artifact_group0  (cost=0.00..701.44 rows=34444 width=24)
>
>    ->  Hash  (cost=5.74..5.74 rows=1 width=80)
>          ->  Index Scan using project_path on project tracker_artifact_extension_f1  (cost=0.00..5.74 rows=1 width=80)
>                Index Cond: (("path")::text = 'projects.meeting_broker_v3'::text)
>                Filter: ((("path")::text = 'projects.meeting_broker_v3'::text) OR (("path")::text ~~ 'projects.meeting_broker_v3.%'::text))
>
> Perhaps one of these tables hasn't been vacuumed/analyzed and is way
> bigger than the planner thinks?

I'm doing a 'vacuumdb -a -z' every 15 minutes, and a full vacuum
once/day.

> Can you check into how many rows actually meet the identified
> conditions?

I'm afraid I'm not clear on what I'm supposed to be checking here.
Which conditions should I be looking at? Thanks.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                             netllama@gmail.com
LlamaLand                               http://netllama.linux-sxs.org
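[Aside: if routine ANALYZE runs but a column's estimates stay bad, the usual
next step is to raise that column's statistics target and re-analyze. A sketch
in 7.4 syntax, where the default per-column target is 10; the column name here
is purely hypothetical:]

    ALTER TABLE field_value ALTER COLUMN value_class SET STATISTICS 100;
    ANALYZE field_value;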
Lonni J Friedman <netllama@gmail.com> writes:
> I'm afraid I'm not clear on what I'm supposed to be checking here.
> Which conditions should I be looking at? Thanks.

Well, for instance,

>> ->  Hash  (cost=108.96..108.96 rows=28 width=24)
>>       ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
>>             Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))

select count(*) from mntr_subscription
where (((user_id)::text = 'user1187'::text) AND
       ((event_operation)::text = 'update'::text));

			regards, tom lane
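[Aside: EXPLAIN ANALYZE makes this estimate-vs-actual comparison in one step,
annotating every plan node with the rows it really produced. For a query that
exhausts memory it may itself fall over, so running it on the cheap
sub-conditions, as Tom does above with a plain count(*), is the safer route:]

    EXPLAIN ANALYZE
    SELECT count(*) FROM mntr_subscription
    WHERE user_id = 'user1187' AND event_operation = 'update';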
On Wed, 02 Feb 2005 13:23:11 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > I'm afraid I'm not clear on what I'm supposed to be checking here.
> > Which conditions should I be looking at? Thanks.
>
> Well, for instance,
>
> select count(*) from mntr_subscription where (((user_id)::text =
> 'user1187'::text) AND ((event_operation)::text = 'update'::text));

OK, I think I see what you're requesting. How's this:

select count(*) from mntr_subscription where (((user_id)::text =
'user1187'::text) AND ((event_operation)::text = 'update'::text));
 count
-------
     9
(1 row)

select count(*) from project where (("path")::text =
'projects.meeting_broker_v3'::text);
 count
-------
     1
(1 row)

select count(*) from project where ((("path")::text =
'projects.meeting_broker_v3'::text) OR (("path")::text ~~
'projects.meeting_broker_v3.%'::text));
 count
-------
     1
(1 row)

Let me know if you need more. Thanks.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                             netllama@gmail.com
LlamaLand                               http://netllama.linux-sxs.org
Lonni J Friedman <netllama@gmail.com> writes:
> OK, I think I see what you're requesting. How's this:

What about field_value, does it in fact have circa 34444 rows?

These tables are surely nowhere near big enough to bloat a hash table
to upwards of 2Gb, as your log shows happened. I'm thinking that you
must have found some sort of bug, or memory leak maybe. Is there a way
I can get access to your data set to trace through the behavior with a
debugger? (We should probably take the discussion offlist at this
point.)

			regards, tom lane
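[Aside: the check being asked for, spelled out as queries -- the actual row
count versus the planner's stored estimate for field_value:]

    SELECT count(*) FROM field_value;     -- actual rows
    SELECT reltuples, relpages
    FROM pg_class
    WHERE relname = 'field_value';        -- planner's estimate (the plan assumed 34444)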