surprisingly expensive join planning query - Mailing list pgsql-hackers

From Tomas Vondra
Subject surprisingly expensive join planning query
Date
Msg-id 20191201170112.kqxq6b7ctkupm3so@development
Responses Re: surprisingly expensive join planning query
List pgsql-hackers
Hi,

while evaluating one of the CF patches (the incremental sort one, but
that's mostly irrelevant), I ran into a strange issue with join planning
for a fairly simple query. I needed to assess how the patch affects query
planning for different GUCs, so I ran a group of queries and stashed the
results into a table with this structure:

   CREATE TABLE plans (
     query text,
     index text,
     option text,
     plan text,
     type text,
     force boolean,
     parallel boolean);

Essentially all the columns are 'dimensions' with the exception of the
'plan' column storing the explain plan generated.

The results (~60k rows / 30MB) are available here:

   https://drive.google.com/open?id=1Q4oR1KtaAil87lbMo-xUvvw_0wf_zDx-

   copy plans from '/tmp/results-100M.data';

To evaluate results, I needed to see which GUCs result in a different
plan compared to the master, so I did a query like this:

   with
     master                             AS (select * from plans where option = ''),
     create_ordered_paths_parallel      AS (select * from plans where option = 'devel_create_ordered_paths_parallel'),
     create_partial_grouping_paths_2    AS (select * from plans where option = 'devel_create_partial_grouping_paths_2'),
     create_partial_grouping_paths      AS (select * from plans where option = 'devel_create_partial_grouping_paths'),
     standard_join_search               AS (select * from plans where option = 'devel_standard_join_search'),
     add_paths_to_grouping_rel          AS (select * from plans where option = 'devel_add_paths_to_grouping_rel'),
     gather_grouping_paths              AS (select * from plans where option = 'devel_gather_grouping_paths'),
     create_ordered_paths               AS (select * from plans where option = 'devel_create_ordered_paths'),
     add_paths_to_grouping_rel_parallel AS (select * from plans where option = 'devel_add_paths_to_grouping_rel_parallel'),
     set_rel_pathlist                   AS (select * from plans where option = 'devel_set_rel_pathlist'),
     apply_scanjoin_target_to_paths     AS (select * from plans where option = 'devel_apply_scanjoin_target_to_paths')
   select
     master.query,
     master.index,
     master.type,
     master.force,
     master.parallel,
     md5(master.plan),
     (CASE WHEN (master.plan = r1.plan) THEN NULL ELSE 'DIFF' END) guc1,
     (CASE WHEN (master.plan = r2.plan) THEN NULL ELSE 'DIFF' END) guc2,
     (CASE WHEN (master.plan = r3.plan) THEN NULL ELSE 'DIFF' END) guc3,
     (CASE WHEN (master.plan = r4.plan) THEN NULL ELSE 'DIFF' END) guc4,
     (CASE WHEN (master.plan = r5.plan) THEN NULL ELSE 'DIFF' END) guc5,
     (CASE WHEN (master.plan = r6.plan) THEN NULL ELSE 'DIFF' END) guc6,
     (CASE WHEN (master.plan = r7.plan) THEN NULL ELSE 'DIFF' END) guc7,
     (CASE WHEN (master.plan = r8.plan) THEN NULL ELSE 'DIFF' END) guc8,
     (CASE WHEN (master.plan = r9.plan) THEN NULL ELSE 'DIFF' END) guc9,
     (CASE WHEN (master.plan = r10.plan) THEN NULL ELSE 'DIFF' END) guc10
   from
     master
       join create_ordered_paths_parallel r1 using (query, index, type, force, parallel)
       join create_partial_grouping_paths r2 using (query, index, type, force, parallel)
       join create_partial_grouping_paths_2 r3 using (query, index, type, force, parallel)
       join standard_join_search r4 using (query, index, type, force, parallel)
       join add_paths_to_grouping_rel r5 using (query, index, type, force, parallel)
       join gather_grouping_paths r6 using (query, index, type, force, parallel)
       join create_ordered_paths r7 using (query, index, type, force, parallel)
       join add_paths_to_grouping_rel_parallel r8 using (query, index, type, force, parallel)
       join set_rel_pathlist r9 using (query, index, type, force, parallel)
       join apply_scanjoin_target_to_paths r10 using (query, index, type, force, parallel);

This, however, causes pretty serious issues during planning. Firstly, it
consumes an insane amount of memory, to the extent that on my machine
the backend crashes due to OOM.

If I lower the join_collapse_limit to 1, it works just fine, but once I
increase it too much, the memory consumption and planning time go
through the roof and the backend eventually crashes.

I did a bit of investigation, and after instrumenting aset.c a bit I got
a statistic like this:

    size  | alloc count | alloc sum | free count | free sum |      diff
   -------+-------------+-----------+------------+----------+-----------
       64 |     5606157 | 358794048 |        118 |     7552 | 358786496

i.e. there are a lot of 64B chunks allocated, but almost none of them
are freed, resulting in a ~350MB leak. There are various other sizes
with a lot of allocated chunks, but nowhere close to this.
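As a sanity check, the 'diff' column is just the surviving chunks times
the chunk size:

```python
# Numbers from the instrumented aset.c statistic above.
alloc_count, free_count, chunk_size = 5606157, 118, 64

leaked = (alloc_count - free_count) * chunk_size
print(leaked)  # 358786496 bytes, i.e. ~342 MiB
```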

It seems most of this comes from find_mergeclauses_for_outer_pathkeys(),
which builds matched_restrictinfos and then just leaves it allocated.
After pfreeing this (see attached patch), the memory usage gets way down
and the query completes. I'm sure there are other things we could pfree
to reduce the memory usage even more.

That being said, I wonder if this is worth it - the query is a bit
strange and probably could be rewritten, and we generally don't free
stuff in the planner very aggressively. OTOH consuming gigabytes of
memory and crashing with OOM is no fun.

The other issue is planning time, which looks like this:

     join_collapse_limit = 1        3.698 ms
     join_collapse_limit = 2        3.109 ms
     join_collapse_limit = 3        5.244 ms
     join_collapse_limit = 4        9.025 ms
     join_collapse_limit = 5       29.371 ms
     join_collapse_limit = 6       83.190 ms
     join_collapse_limit = 7       93.693 ms
     join_collapse_limit = 8      253.369 ms
     join_collapse_limit = 9      760.415 ms
     join_collapse_limit = 10    2385.519 ms
     join_collapse_limit = 11    7880.276 ms

Compared to the execution time (consistently ~800ms) this is pretty
high, and it'd get considerably worse with more tables (it'd start
crashing because of OOM too).
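For what it's worth, past limit ~7 the numbers above grow by roughly 3x
per step, i.e. roughly exponentially in the collapse limit, which is
why the extrapolation to more tables looks grim:

```python
# Planning times (ms) per join_collapse_limit, from the table above.
times = {1: 3.698, 2: 3.109, 3: 5.244, 4: 9.025, 5: 29.371,
         6: 83.190, 7: 93.693, 8: 253.369, 9: 760.415,
         10: 2385.519, 11: 7880.276}

# Growth factor between consecutive limits; it settles around ~3x.
ratios = {n: round(times[n] / times[n - 1], 2) for n in range(8, 12)}
print(ratios)
```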


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
