Re: Out of Memory errors are frustrating as heck! - Mailing list pgsql-performance

From Jeremy Schneider
Subject Re: Out of Memory errors are frustrating as heck!
Date
Msg-id 2EC37EC7-EA2C-467B-9240-3778B227514D@ardentperf.com
In response to Re: Out of Memory errors are frustrating as heck!  (Gunther <raj@gusw.net>)
List pgsql-performance
pg_hint_plan extension might be able to force a plan.
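Something like this might do it (an untested sketch; it assumes pg_hint_plan is installed and loadable, and borrows the tmp_q/tmp_r names from your mail quoted below):

LOAD 'pg_hint_plan';
-- Leading((q r)) asks for q as the outer side and r as the inner (hashed)
-- side; HashJoin(q r) asks for a hash join between the two.
/*+ Leading((q r)) HashJoin(q r) */
SELECT *
FROM tmp_q q
RIGHT OUTER JOIN tmp_r r USING (documentinternalid, actinternalid);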

Also, I don’t know if perf probes & perf record/script could be useful for creating a log of all the memory allocation calls, along with their unwound call stacks, and then analyzing that file? At least this can be done for a single process, and just while the problematic SQL is running.
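Roughly like this (a sketch only, Linux-specific; the binary path and PID are placeholders, and probing palloc assumes the postgres binary still has its symbols):

# add a userspace probe on palloc in the postgres binary
perf probe -x /usr/pgsql-11/bin/postgres palloc
# record every hit with unwound call stacks, for just the backend running the query
perf record -e probe_postgres:palloc --call-graph dwarf -p <backend_pid>
# after stopping the run, dump the samples plus stacks to a text file for analysis
perf script > palloc-calls.txt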

-Jeremy

Sent from my TI-83

On Apr 19, 2019, at 20:34, Gunther <raj@gusw.net> wrote:

On 4/19/2019 17:01, Justin Pryzby wrote:
Were you able to reproduce the issue in some minimized way? Like after
joining fewer tables or changing to join with fewer join conditions?

On Thu, Apr 18, 2019 at 05:21:28PM +0200, Tomas Vondra wrote:
It would be possible to do at least one of these two things:

Thanks, and sorry for my pushiness. Yes, I have pinpointed the HashJoin, and I have created the two tables involved.

As for the data distribution of the join keys: they are all essentially UUIDs, and essentially random.

I am sharing this data with you. However, only someone who can actually control the planner can use it to reproduce the problem; I have tried but not succeeded. Still, I am sure this material reproduces the problem.

Here is the part of the plan that generates this massive number of allocation calls:

        ->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 width=1197)
              Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
              ->  Hash Right Join  (cost=1341541.37..2612134.36 rows=13 width=341)
                    Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
                    ... let's call this tmp_q ...
              ->  Hash  (cost=2908913.87..2908913.87 rows=34619 width=930)
                    ->  Gather Merge  (cost=2892141.40..2908913.87 rows=34619 width=930)
                          ... let's call this tmp_r ...

This can be logically reduced to the following query

SELECT * FROM tmp_q q RIGHT OUTER JOIN tmp_r r   USING(documentInternalId, actInternalId);

with the following two tables

CREATE TABLE tmp_q (
    documentinternalid        character varying(255),
    operationqualifiercode    character varying(512),
    operationqualifiername    character varying(512),
    actinternalid             character varying(255),
    approvalinternalid        character varying(255),
    approvalnumber            character varying(555),
    approvalnumbersystem      character varying(555),
    approvalstatecode         character varying(512),
    approvalstatecodesystem   character varying(512),
    approvaleffectivetimelow  character varying(512),
    approvaleffectivetimehigh character varying(512),
    approvalstatuscode        character varying(32),
    licensecode               character varying(512),
    agencyid                  character varying(555),
    agencyname                text
);

CREATE TABLE tmp_r (
    documentinternalid      character varying(255),
    is_current              character(1),
    documentid              character varying(555),
    documenttypecode        character varying(512),
    subjectroleinternalid   character varying(255),
    subjectentityinternalid character varying(255),
    subjectentityid         character varying(555),
    subjectentityidroot     character varying(555),
    subjectentityname       character varying,
    subjectentitytel        text,
    subjectentityemail      text,
    otherentityinternalid   character varying(255),
    confidentialitycode     character varying(512),
    actinternalid           character varying(255),
    operationcode           character varying(512),
    operationname           text,
    productitemcode         character varying(512),
    productinternalid       character varying(255)
    ..
);

You can download the data here (URLs just a tiny bit obfuscated):

The small table http:// gusw dot net/tmp_q.gz

The big table is in the form of 9 parts of 20 MB each, http:// gusw dot net/tmp_r.gz.00, .01, .02, ..., .09; maybe you need only the first part.

Download as many as you have patience to grab, and then import the data like this:

\copy tmp_q from program 'zcat tmp_q.gz'
\copy tmp_r from program 'cat tmp_r.gz.* |zcat'

The only problem is that I can't test that this actually would trigger the memory problem, because I can't force the plan to use the right join; it always reverts to the left join, hashing tmp_q:

        ->  Hash Left Join  (cost=10.25..5601401.19 rows=5505039 width=12118)
              Hash Cond: (((r.documentinternalid)::text = (q.documentinternalid)::text) AND ((r.actinternalid)::text = (q.actinternalid)::text))
              ->  Seq Scan on tmp_r r  (cost=0.00..5560089.39 rows=5505039 width=6844)
              ->  Hash  (cost=10.10..10.10 rows=10 width=6306)
                    ->  Seq Scan on tmp_q q  (cost=0.00..10.10 rows=10 width=6306)

which is of course much better. But when tmp_q and tmp_r are the results of complex subplans whose sizes the planner can't estimate, it gets it wrong, and the issue is triggered because we end up hashing the big tmp_r, not tmp_q.

It would be so nice if there were a way to force a specific plan for testing purposes. I tried planting false data in pg_class's reltuples and relpages:

foo=# analyze tmp_q;
ANALYZE
foo=# analyze tmp_r;
ANALYZE
foo=# select relname, relpages, reltuples from pg_class where relname in ('tmp_q', 'tmp_r');
 relname | relpages |  reltuples
---------+----------+-------------
 tmp_r   |  5505039 | 1.13467e+08
 tmp_q   |        7 |         236
(2 rows)

foo=# update pg_class set (relpages, reltuples) = (5505039, 1.13467e+08) where relname = 'tmp_q';
UPDATE 1
foo=# update pg_class set (relpages, reltuples) = (7, 236) where relname = 'tmp_r';
UPDATE 1

but that didn't help. Somehow the planner outsmarts every such trick, so I can't get it to follow my right outer join plan where the big table is hashed.  I am sure y'all know some way to force it.
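(The enable_* planner GUCs seem like another blunt instrument here, but as far as I can tell they only rule out whole strategies and cannot pick which side gets hashed, so they probably won't reproduce the bad plan either. A sketch of that approach:)

-- rule out the alternatives the planner prefers, then re-check the plan
SET enable_mergejoin = off;
SET enable_nestloop = off;
EXPLAIN SELECT * FROM tmp_q q
RIGHT OUTER JOIN tmp_r r USING (documentinternalid, actinternalid);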

regards,
-Gunther
