Thread: capturing/viewing sort_mem utilization on a per query basis

capturing/viewing sort_mem utilization on a per query basis

From: Lonni J Friedman
Greetings,
I've got a pesky J2EE-based app that is using PostgreSQL-7.4.x on the
backend.  Lately I've been getting a disturbingly large number of
PostgreSQL out of memory exceptions on the J2EE side when running
queries that have some huge joins in them.  I've tried increasing the
sort_mem value for the DB, but that hasn't had any noticeable impact.
Rather than fumbling blindly in the dark, I'd like to get a better
idea of how much memory this query really needs.

Is there a way for me to capture that in the log?

Note, I'm not the person who wrote this app or the query; I'm just the
guy responsible for the DB.  I've already castigated the folks who
wrote this resource hog.

thanks,
Lonni

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: capturing/viewing sort_mem utilization on a per query basis

From: Tom Lane
Lonni J Friedman <netllama@gmail.com> writes:
> I've got a pesky J2EE-based app that is using PostgreSQL-7.4.x on the
> backend.  Lately I've been getting a disturbingly large number of
> PostgreSQL out of memory exceptions on the J2EE side when running
> queries that have some huge joins in them.

Such an error on the client side would indicate that the query retrieved
more rows than the client app had memory for.  Perhaps your problem is
incorrectly-constrained joins that are returning way more rows than
you expect?
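
A quick way to test that theory, assuming the full query text can be recovered from the application, is to wrap it in a count so the server reports just the result-set size instead of shipping all the rows back; the subselect below is only a placeholder:

select count(*)
  from (
        select 1   -- placeholder: substitute the application's actual join here
       ) as big_join;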

            regards, tom lane

Re: capturing/viewing sort_mem utilization on a per query basis

From: Tom Lane
Lonni J Friedman <netllama@gmail.com> writes:
> Although, looking in the log postgresql is generating, I'm seeing the
> following at the same time as that OOM above so it certainly looks
> like the DB is barfing:

OK, then it's a backend issue not a client-side issue.

> HashBatchContext: -1934683632 total in 295 blocks; 135760 free (879
> chunks); -1934819392 used

Hmm, looks like a hash join ran out of memory.  What PG version is this
again, and what do you have sort_mem set to?  Can you show an EXPLAIN
for the query that is failing like this?
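
For reference, all three pieces of information can be gathered from a single psql session; the statement being explained is only a placeholder here:

select version();
show sort_mem;
explain select 1;   -- substitute the actual failing query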

            regards, tom lane

Re: capturing/viewing sort_mem utilization on a per query basis

From: Lonni J Friedman
On Wed, 02 Feb 2005 11:17:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > I've got a pesky J2EE-based app that is using PostgreSQL-7.4.x on the
> > backend.  Lately I've been getting a disturbingly large number of
> > PostgreSQL out of memory exceptions on the J2EE side when running
> > queries that have some huge joins in them.
>
> Such an error on the client side would indicate that the query retrieved
> more rows than the client app had memory for.  Perhaps your problem is
> incorrectly-constrained joins that are returning way more rows than
> you expect?

You might be right; I'm honestly not sure.  What we're seeing on the
client side is:
org.postgresql.util.PSQLException: ERROR: out of memory

Although, looking in the log postgresql is generating, I'm seeing the
following at the same time as that OOM above so it certainly looks
like the DB is barfing:
###############
TopMemoryContext: 71448 total in 7 blocks; 22544 free (120 chunks); 48904 used
TopTransactionContext: 8192 total in 1 blocks; 8136 free (0 chunks); 56 used
DeferredTriggerXact: 0 total in 0 blocks; 0 free (0 chunks); 0 used
MessageContext: 1040384 total in 7 blocks; 395024 free (1021 chunks);
645360 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
ExecutorState: 2121752 total in 9 blocks; 538864 free (12 chunks); 1582888 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 451796 total in 2 blocks; 260800 free (0 chunks); 190996 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: -1934683632 total in 295 blocks; 135760 free (879
chunks); -1934819392 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 24576 total in 2 blocks; 11128 free (4 chunks); 13448 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
CacheMemoryContext: 2088960 total in 8 blocks; 1595032 free (3190
chunks); 493928 used
relationship_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
relation_type: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
relation_target: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
relation_origin: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
field_value_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
field_class_idx: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
field_value_idx: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
f_val_fid_val_idx: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
report_param_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
report_parm_report: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
projectmember_pk: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
projmember_member: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_opclass_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_user_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_user_proj_idx: 2048 total in 1 blocks; 768 free (0 chunks); 1280 used
role_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
sfuser_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
sfuser_username: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_operation_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
role_oper_obj_oper: 2048 total in 1 blocks; 768 free (0 chunks); 1280 used
artifact_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
artifact_status: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
project_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
project_path: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_parent: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_project: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
folder_path: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
item_pk: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
item_name: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
item_folder: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_index_indrelid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_proc_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_cast_source_target_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_type_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_operator_oprname_l_r_n_index: 2048 total in 1 blocks; 704 free (0
chunks); 1344 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 640 free (0
chunks); 384 used
pg_shadow_usesysid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_operator_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amproc_opc_procnum_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_amop_opc_strategy_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_class_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amproc_opc_procnum_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_operator_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amop_opc_strategy_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
MdSmgr: 8192 total in 1 blocks; 6120 free (0 chunks); 2072 used
DynaHash: 8192 total in 1 blocks; 6912 free (0 chunks); 1280 used
DynaHashTable: 8192 total in 1 blocks; 2008 free (0 chunks); 6184 used
DynaHashTable: 8192 total in 1 blocks; 5080 free (0 chunks); 3112 used
DynaHashTable: 8192 total in 1 blocks; 2008 free (0 chunks); 6184 used
DynaHashTable: 24576 total in 2 blocks; 15256 free (4 chunks); 9320 used
DynaHashTable: 8192 total in 1 blocks; 1960 free (0 chunks); 6232 used
DynaHashTable: 24576 total in 2 blocks; 13240 free (4 chunks); 11336 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ErrorContext: 49152 total in 4 blocks; 49088 free (27 chunks); 64 used
2005-02-02 01:33:51 ERROR:  out of memory
DETAIL:  Failed on request of size 364.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: capturing/viewing sort_mem utilization on a per query basis

From: Lonni J Friedman
On Wed, 02 Feb 2005 12:13:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > Although, looking in the log postgresql is generating, I'm seeing the
> > following at the same time as that OOM above so it certainly looks
> > like the DB is barfing:
>
> OK, then it's a backend issue not a client-side issue.
>
> > HashBatchContext: -1934683632 total in 295 blocks; 135760 free (879
> > chunks); -1934819392 used
>
> Hmm, looks like a hash join ran out of memory.  What PG version is this
> again, and what do you have sort_mem set to?  Can you show an EXPLAIN
> for the query that is failing like this?

This is PostgreSQL-7.4.6 running on RHES-3 (using the RPMs).
sort_mem=100000

I've attached the explain for the query that is blowing up.  Thanks
for looking at this.
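
For context, sort_mem is specified in kilobytes and is allotted per sort or hash step in a plan, so 100000 allows roughly 100MB for each such step, and a query with several sorts and hashes can consume a multiple of that.  It can also be adjusted per session while experimenting, e.g.:

set sort_mem = 16384;   -- illustrative value, roughly 16MB per sort/hash step
show sort_mem;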


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

[Attachment: EXPLAIN output for the failing query]

Re: capturing/viewing sort_mem utilization on a per query basis

From: Tom Lane
Lonni J Friedman <netllama@gmail.com> writes:
> On Wed, 02 Feb 2005 12:13:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, looks like a hash join ran out of memory.  What PG version is this
>> again, and what do you have sort_mem set to?  Can you show an EXPLAIN
>> for the query that is failing like this?

> I've attached the explain for the query that is blowing up.

One of the three Hash steps must be absorbing a whole lot more rows than
the planner is expecting, but all of them look like fairly
straightforward estimation situations:

               ->  Hash  (cost=108.96..108.96 rows=28 width=24)
                 ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
                       Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))

   ->  Hash  (cost=701.44..701.44 rows=34444 width=24)
         ->  Seq Scan on field_value tracker_artifact_group0  (cost=0.00..701.44 rows=34444 width=24)

             ->  Hash  (cost=5.74..5.74 rows=1 width=80)
                   ->  Index Scan using project_path on project tracker_artifact_extension_f1  (cost=0.00..5.74 rows=1 width=80)
                         Index Cond: (("path")::text = 'projects.meeting_broker_v3'::text)
                         Filter: ((("path")::text = 'projects.meeting_broker_v3'::text) OR (("path")::text ~~ 'projects.meeting_broker_v3.%'::text))

Perhaps one of these tables hasn't been vacuumed/analyzed and is way
bigger than the planner thinks?  Can you check into how many rows
actually meet the identified conditions?
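
As a sketch of the first check, the statistics can be refreshed and the planner's stored size estimates inspected directly; reltuples and relpages in pg_class hold the planner's notion of each table's size, and the table names below come from the plan above:

analyze mntr_subscription;
analyze field_value;
analyze project;

select relname, relpages, reltuples
  from pg_class
 where relname in ('mntr_subscription', 'field_value', 'project');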

            regards, tom lane

Re: capturing/viewing sort_mem utilization on a per query basis

From: Lonni J Friedman
On Wed, 02 Feb 2005 12:58:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > On Wed, 02 Feb 2005 12:13:59 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Hmm, looks like a hash join ran out of memory.  What PG version is this
> >> again, and what do you have sort_mem set to?  Can you show an EXPLAIN
> >> for the query that is failing like this?
>
> > I've attached the explain for the query that is blowing up.
>
> One of the three Hash steps must be absorbing a whole lot more rows than
> the planner is expecting, but all of them look like fairly
> straightforward estimation situations:
>
>                ->  Hash  (cost=108.96..108.96 rows=28 width=24)
>                  ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
>                        Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))
>
>    ->  Hash  (cost=701.44..701.44 rows=34444 width=24)
>          ->  Seq Scan on field_value tracker_artifact_group0  (cost=0.00..701.44 rows=34444 width=24)
>
>              ->  Hash  (cost=5.74..5.74 rows=1 width=80)
>                    ->  Index Scan using project_path on project tracker_artifact_extension_f1  (cost=0.00..5.74 rows=1 width=80)
>                          Index Cond: (("path")::text = 'projects.meeting_broker_v3'::text)
>                          Filter: ((("path")::text = 'projects.meeting_broker_v3'::text) OR (("path")::text ~~ 'projects.meeting_broker_v3.%'::text))
>
> Perhaps one of these tables hasn't been vacuumed/analyzed and is way

I'm doing a 'vacuumdb -a -z' every 15 minutes, and a full vacuum once/day.

> bigger than the planner thinks?  Can you check into how many rows
> actually meet the identified conditions?

I'm afraid I'm not clear on what I'm supposed to be checking here.
Which conditions should I be looking at?  Thanks.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: capturing/viewing sort_mem utilization on a per query basis

From: Tom Lane
Lonni J Friedman <netllama@gmail.com> writes:
> I'm afraid I'm not clear on what I'm supposed to be checking here.
> Which conditions should I be looking at?  Thanks.

Well, for instance,

>> ->  Hash  (cost=108.96..108.96 rows=28 width=24)
>> ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
>> Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))

select count(*) from mntr_subscription where (((user_id)::text =
'user1187'::text) AND ((event_operation)::text = 'update'::text));

            regards, tom lane

Re: capturing/viewing sort_mem utilization on a per query basis

From: Lonni J Friedman
On Wed, 02 Feb 2005 13:23:11 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
> > I'm afraid I'm not clear on what I'm supposed to be checking here.
> > Which conditions should I be looking at?  Thanks.
>
> Well, for instance,
>
> >> ->  Hash  (cost=108.96..108.96 rows=28 width=24)
> >> ->  Index Scan using mntr_subscr_usrevt on mntr_subscription sfmain_monitoringsubscriptio0  (cost=0.00..108.96 rows=28 width=24)
> >> Index Cond: (((user_id)::text = 'user1187'::text) AND ((event_operation)::text = 'update'::text))
>
> select count(*) from mntr_subscription where (((user_id)::text =
> 'user1187'::text) AND ((event_operation)::text = 'update'::text));

OK, I think I see what you're requesting.  How's this:

select count(*) from mntr_subscription where (((user_id)::text =
'user1187'::text) AND ((event_operation)::text = 'update'::text));
 count
-------
     9
(1 row)

select count(*) from project where (("path")::text =
'projects.meeting_broker_v3'::text);
 count
-------
     1
(1 row)

select count(*) from project where ((("path")::text =
'projects.meeting_broker_v3'::text) OR (("path")::text ~~
'projects.meeting_broker_v3.%'::text));
 count
-------
     1
(1 row)

Let me know if you need more.  Thanks.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: capturing/viewing sort_mem utilization on a per query basis

From: Tom Lane
Lonni J Friedman <netllama@gmail.com> writes:
> OK, I think I see what you're requesting.  How's this:

What about field_value, does it in fact have circa 34444 rows?

These tables are surely nowhere near big enough to bloat a hash table to
upwards of 2GB, as your log shows happened.  I'm thinking that you must
have found some sort of bug, or perhaps a memory leak.  Is there a way I can
get access to your data set to trace through the behavior with a debugger?
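
A count along the lines already shown answers the field_value question directly; if it comes back in the tens of thousands, the planner's 34444-row estimate was in the right ballpark and the blowup must be happening elsewhere:

select count(*) from field_value;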

(We should probably take the discussion offlist at this point.)

            regards, tom lane