Thread: surprisingly expensive join planning query

surprisingly expensive join planning query

From
Tomas Vondra
Date:
Hi,

while evaluating one of the CF patches (the incremental sort one, but
that's mostly irrelevant), I ran into a strange issue with join planning
for a fairly simple query. I needed to asses how the patch affects query
planning for different GUCs, so I ran a group of queries and stashed the
results into a table with this structure

   CREATE TABLE plans (
     query text,
     index text,
     option text,
     plan text,
     type text,
     force boolean,
     parallel boolean);

Essentially all the columns are 'dimensions' with the exception of the
'plan' column storing the explain plan generated.

The results (~60k rows / 30MB) is available here:

   https://drive.google.com/open?id=1Q4oR1KtaAil87lbMo-xUvvw_0wf_zDx-

   copy plans from '/tmp/results-100M.data';

To evaluate results, I needed to see which GUCs result in a different
plan compared to the master, so I did a query like this:

   with
     master                             AS (select * from plans where option = ''),
     create_ordered_paths_parallel      AS (select * from plans where option = 'devel_create_ordered_paths_parallel'),
     create_partial_grouping_paths_2    AS (select * from plans where option =
'devel_create_partial_grouping_paths_2'),
     create_partial_grouping_paths      AS (select * from plans where option = 'devel_create_partial_grouping_paths'),
     standard_join_search               AS (select * from plans where option = 'devel_standard_join_search'),
     add_paths_to_grouping_rel          AS (select * from plans where option = 'devel_add_paths_to_grouping_rel'),
     gather_grouping_paths              AS (select * from plans where option = 'devel_gather_grouping_paths'),
     create_ordered_paths               AS (select * from plans where option = 'devel_create_ordered_paths'),
     add_paths_to_grouping_rel_parallel AS (select * from plans where option =
'devel_add_paths_to_grouping_rel_parallel'),
     set_rel_pathlist                   AS (select * from plans where option = 'devel_set_rel_pathlist'),
     apply_scanjoin_target_to_paths     AS (select * from plans where option = 'devel_apply_scanjoin_target_to_paths')
   select
     master.query,
     master.index,
     master.type,
     master.force,
     master.parallel,
     md5(master.plan),
     (CASE WHEN (master.plan = r1.plan) THEN NULL ELSE 'DIFF' END) guc1,
     (CASE WHEN (master.plan = r2.plan) THEN NULL ELSE 'DIFF' END) guc2,
     (CASE WHEN (master.plan = r3.plan) THEN NULL ELSE 'DIFF' END) guc3,
     (CASE WHEN (master.plan = r4.plan) THEN NULL ELSE 'DIFF' END) guc4,
     (CASE WHEN (master.plan = r5.plan) THEN NULL ELSE 'DIFF' END) guc5,
     (CASE WHEN (master.plan = r6.plan) THEN NULL ELSE 'DIFF' END) guc6,
     (CASE WHEN (master.plan = r7.plan) THEN NULL ELSE 'DIFF' END) guc7,
     (CASE WHEN (master.plan = r8.plan) THEN NULL ELSE 'DIFF' END) guc8,
     (CASE WHEN (master.plan = r9.plan) THEN NULL ELSE 'DIFF' END) guc9,
     (CASE WHEN (master.plan = r10.plan) THEN NULL ELSE 'DIFF' END) guc10
   from
     master
       join create_ordered_paths_parallel r1 using (query, index, type, force, parallel)
       join create_partial_grouping_paths r2 using (query, index, type, force, parallel)
       join create_partial_grouping_paths_2 r3 using (query, index, type, force, parallel)
       join standard_join_search r4 using (query, index, type, force, parallel)
       join add_paths_to_grouping_rel r5 using (query, index, type, force, parallel)
       join gather_grouping_paths r6 using (query, index, type, force, parallel)
       join create_ordered_paths r7 using (query, index, type, force, parallel)
       join add_paths_to_grouping_rel_parallel r8 using (query, index, type, force, parallel)
       join set_rel_pathlist r9 using (query, index, type, force, parallel)
       join apply_scanjoin_target_to_paths r10 using (query, index, type, force, parallel);

This however causes pretty serious issues during planning. Firstly, it
consumes insane amounts of memory, to the extent that on my machine it
crashes due to OOM.

If I lover the join_collapse_limit to 1, it works just fine, but once I
increase it too much, the memory consumption and planning time go
through the roof and eventually crashes.

I did a bit of investigation, and after instrumenting aset.c a bit I got
a statistic like this:

    size  | alloc count | alloc sum | free count | free sum |      diff
   -------+-------------+-----------+------------+----------+-----------
       64 |     5606157 | 358794048 |        118 |     7552 | 358786496

i.e. there's a lot of 64B chunks allocated, but almost none of them are
freed, resulting in ~350MB leak. There are various other sizes with a
lot of allocated chunks, but nowhere close to this.

It seems most of this comesfrom find_mergeclauses_for_outer_pathkeys()
which builds matched_restrictinfos and then just leaves it allocated.
After pfreeing this (see attached patch), the memory usage gets way down
and the query completes. I'm sure there are other things we could pfree
to reduce the memory usage even more.

That being said, I wonder if this is worth it - the query is a bit
strange and probably could be rewritten, and we generally don't free
stuff in the planner very aggressively. OTOH consuming gigabytes of
memory and crashing with OOM is no fun.

The other issue is planning time, which looks like this:

     join_collapse_limit = 1        3.698 ms
     join_collapse_limit = 2        3.109 ms
     join_collapse_limit = 3        5.244 ms
     join_collapse_limit = 4        9.025 ms
     join_collapse_limit = 5       29.371 ms
     join_collapse_limit = 6       83.190 ms
     join_collapse_limit = 7       93.693 ms
     join_collapse_limit = 8      253.369 ms
     join_collapse_limit = 9      760.415 ms
     join_collapse_limit = 10    2385.519 ms
     join_collapse_limit = 11    7880.276 ms

compared to the execution time (consistently ~800ms) this is pretty
high, and it'd get considerably worse with more tables (it'd start
crashing because of OOM too).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: surprisingly expensive join planning query

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> It seems most of this comesfrom find_mergeclauses_for_outer_pathkeys()
> which builds matched_restrictinfos and then just leaves it allocated.
> After pfreeing this (see attached patch), the memory usage gets way down
> and the query completes.

Interesting.  The memory leak was probably much less bad before
commit 1cff1b95a, since in the old List implementation this code
would have leaked only a list header.  It makes sense to me to
add the list_free.

Alternatively, it'd be possible to get rid of the separate List
altogether, and just add the rinfo's to "mergeclauses" immediately.
The functionality of the separate list could be replaced by a
bool variable remembering whether we found any matches in this
pass through the loop.  I think the code would be a little less
clear that way, but this report makes it clear that it's a
performance bottleneck, so maybe we should just change it.

            regards, tom lane



Re: surprisingly expensive join planning query

From
Tomas Vondra
Date:
On Sun, Dec 01, 2019 at 01:27:04PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> It seems most of this comesfrom find_mergeclauses_for_outer_pathkeys()
>> which builds matched_restrictinfos and then just leaves it allocated.
>> After pfreeing this (see attached patch), the memory usage gets way down
>> and the query completes.
>
>Interesting.  The memory leak was probably much less bad before
>commit 1cff1b95a, since in the old List implementation this code
>would have leaked only a list header.  It makes sense to me to
>add the list_free.
>

I forgot to mention I tried on older releases, up to 9.5 (I suspected it
might be related to parallel queries), and I get OOM crashes there too.
I can't say if the memory is leaking slower/faster, though.

I tried fixing 9.5 - a simple pfree(matched_restrictinfos) triggers some
sort of list_concat error for me, seemed a bit weird TBH.

>Alternatively, it'd be possible to get rid of the separate List
>altogether, and just add the rinfo's to "mergeclauses" immediately.
>The functionality of the separate list could be replaced by a
>bool variable remembering whether we found any matches in this
>pass through the loop.  I think the code would be a little less
>clear that way, but this report makes it clear that it's a
>performance bottleneck, so maybe we should just change it.
>

Yes, that might be an option. And it works even on 9.5 for me (per the
attached patch). I don't think it's much less clear compared to just
doing an explicit free at the end.

It does fix cases with up to join_collapse_limit = 10, but with 11 it
still OOM-crashes. That definitely depends on available memory, of
course.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: surprisingly expensive join planning query

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Sun, Dec 01, 2019 at 01:27:04PM -0500, Tom Lane wrote:
>> Alternatively, it'd be possible to get rid of the separate List
>> altogether, and just add the rinfo's to "mergeclauses" immediately.
>> The functionality of the separate list could be replaced by a
>> bool variable remembering whether we found any matches in this
>> pass through the loop.  I think the code would be a little less
>> clear that way, but this report makes it clear that it's a
>> performance bottleneck, so maybe we should just change it.

> Yes, that might be an option. And it works even on 9.5 for me (per the
> attached patch). I don't think it's much less clear compared to just
> doing an explicit free at the end.

LGTM.

            regards, tom lane



Re: surprisingly expensive join planning query

From
Tomas Vondra
Date:
On Sun, Dec 01, 2019 at 02:17:15PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Sun, Dec 01, 2019 at 01:27:04PM -0500, Tom Lane wrote:
>>> Alternatively, it'd be possible to get rid of the separate List
>>> altogether, and just add the rinfo's to "mergeclauses" immediately.
>>> The functionality of the separate list could be replaced by a
>>> bool variable remembering whether we found any matches in this
>>> pass through the loop.  I think the code would be a little less
>>> clear that way, but this report makes it clear that it's a
>>> performance bottleneck, so maybe we should just change it.
>
>> Yes, that might be an option. And it works even on 9.5 for me (per the
>> attached patch). I don't think it's much less clear compared to just
>> doing an explicit free at the end.
>
>LGTM.
>

Thanks. Do you think this is backpatch-worthy? I'm leaning to yes, but
maybe tweaking this just in master is fine. The query is somewhat
artificial and there are probably ways to rewrite it.

The thing that annoys me a bit is that this fix is only partial. It gets
rid of maybe 80% of the allocations, but there's plenty of unnecessary
stuff left allocated ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: surprisingly expensive join planning query

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Sun, Dec 01, 2019 at 02:17:15PM -0500, Tom Lane wrote:
>> LGTM.

> Thanks. Do you think this is backpatch-worthy? I'm leaning to yes, but
> maybe tweaking this just in master is fine. The query is somewhat
> artificial and there are probably ways to rewrite it.

I don't object to back-patching.

> The thing that annoys me a bit is that this fix is only partial. It gets
> rid of maybe 80% of the allocations, but there's plenty of unnecessary
> stuff left allocated ...

Meh.  I'm not that excited about getting rid of retail space wastage,
unless there are single dominant points such as you found here.  For
small stuff it's far better to worry about memory context management.
(Speaking of which, I don't quite see why this would have been a problem
once you got past geqo_threshold; the context resets that GEQO does
should've kept things under control.)

            regards, tom lane



Re: surprisingly expensive join planning query

From
Tomas Vondra
Date:
On Mon, Dec 02, 2019 at 03:21:47PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Sun, Dec 01, 2019 at 02:17:15PM -0500, Tom Lane wrote:
>>> LGTM.
>
>> Thanks. Do you think this is backpatch-worthy? I'm leaning to yes, but
>> maybe tweaking this just in master is fine. The query is somewhat
>> artificial and there are probably ways to rewrite it.
>
>I don't object to back-patching.
>

OK

>> The thing that annoys me a bit is that this fix is only partial. It gets
>> rid of maybe 80% of the allocations, but there's plenty of unnecessary
>> stuff left allocated ...
>
>Meh.  I'm not that excited about getting rid of retail space wastage,
>unless there are single dominant points such as you found here.  For
>small stuff it's far better to worry about memory context management.

I agree.

>(Speaking of which, I don't quite see why this would have been a problem
>once you got past geqo_threshold; the context resets that GEQO does
>should've kept things under control.)
>

Not sure I follow. geqo_threshold is 12 by default, and the OOM issues
are hapenning way before that.

It might be that one reason why this example is so bad is that the CTEs
have *exactly* the different join orders are bound to be costed exactly
the same I think.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: surprisingly expensive join planning query

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> (Speaking of which, I don't quite see why this would have been a problem
>> once you got past geqo_threshold; the context resets that GEQO does
>> should've kept things under control.)

> Not sure I follow. geqo_threshold is 12 by default, and the OOM issues
> are hapenning way before that.

Ah, right.  But would the peak memory usage keep growing with more than 12
rels?

> It might be that one reason why this example is so bad is that the CTEs
> have *exactly* the different join orders are bound to be costed exactly
> the same I think.

Hmm.  I didn't really look into exactly why this example is so awful.

            regards, tom lane