Make Append Cost aware of some run time partition prune case

From: Andy Fan
Currently the cost model for an Append path sums the cost/rows of all its
subpaths. This usually works well, until we run into run-time partition
pruning. The first consequence is that generic plans are rarely chosen in
some cases. For instance, with SELECT * FROM p WHERE pkey = $1; the custom
plan counts the cost of only one partition, while the generic plan counts
the cost of all partitions, even though we know only one partition will
survive. Another impact is that the planner may choose a bad plan. For
example, with SELECT * FROM t1, p WHERE t1.a = p.pkey; the cost/rows of t1
nested-loop-joined to p is estimated very poorly. This patch aims to help
these cases to some extent.

The basic idea is to estimate how many partitions will survive run-time
partition pruning, and then adjust the Append path's cost/rows accordingly.
IIUC, this follows Robert's idea at [1]. However, there are too many special
cases to handle them all. Luckily the most common case is something like
partkey = $1, which we can estimate well, so this patch handles that case
specially. For the other cases I arbitrarily assume 75% of the partitions
will survive; actually I wanted to use 100% to match the current behavior.
If we decide to handle those cases differently, PartitionedRelQual, the main
data structure in this implementation, has to be redefined a bit (see the
comments around it).
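As a rough illustration of that adjustment (a hypothetical Python model, not
the patch's actual C code; the function and parameter names are invented here,
and the 75% fallback is the arbitrary ratio mentioned above):

```python
# Hypothetical model of the proposed adjustment: scale an Append path's
# total cost and rows by the fraction of partitions expected to survive
# run-time pruning.  Not the patch's actual code.

DEFAULT_SURVIVAL_RATIO = 0.75  # arbitrary fallback for quals we can't estimate

def adjusted_append_estimates(subpaths, partkey_eq_qual, n_surviving=1):
    """subpaths: list of (startup_cost, total_cost, rows), one per partition.
    partkey_eq_qual: True when the qual is something like partkey = $1,
    where we know how many partitions can survive."""
    startup = min(p[0] for p in subpaths)  # the real startup_cost handling
                                           # must be more careful than this
    total = sum(p[1] for p in subpaths)
    rows = sum(p[2] for p in subpaths)
    if partkey_eq_qual:
        ratio = n_surviving / len(subpaths)  # e.g. partkey = $1 -> 1 partition
    else:
        ratio = DEFAULT_SURVIVAL_RATIO       # give up and guess
    return (startup, total * ratio, int(rows * ratio))
```

For four equally sized partitions and a partkey = $1 qual, this scales the
summed cost and rows down to one quarter instead of charging for every
partition.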

Attached is a minimal runnable patch. There are some known issues:

1. MergeAppend paths are not handled, on purpose; they will be done last.
2. The cost of a Parallel Append path is not adjusted; I'm not sure what the
   best way to do that is. Two test cases in sql/partition_prune.out fail
   because of this; I just copied 'results' to 'expected' so that you can
   see which ones are affected.


Here are some simple test results.

create table a (a int, b int, c int) partition by list (a);

create table a1 partition of a for values in (1) partition by range (b, c);
create table a1_1 partition of a1 for values from (1, 0) to (1, 10);
create table a1_2 partition of a1 for values from (2, 0) to (2, 10);

create table a2 partition of a for values in (2) partition by range (b, c);
create table a2_1 partition of a2 for values from (1, 0) to (1, 10);
create table a2_2 partition of a2 for values from (2, 0) to (2, 10);


insert into a select 1, i%2 + 1, i % 10 from generate_series(1, 10000) i;
insert into a select 2, i%2 + 1, i % 10 from generate_series(1, 10000) i;

analyze a;

set plan_cache_mode to force_generic_plan;

prepare s as select * from a where a = $1;
PREPARE
explain execute s(1);
                            QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.00..231.00 rows=10000 width=12)  << both rows and cost are adjusted.
   Subplans Removed: 2
   ->  Seq Scan on a1_1 a_1  (cost=0.00..90.50 rows=5000 width=12)
         Filter: (a = $1)
   ->  Seq Scan on a1_2 a_2  (cost=0.00..90.50 rows=5000 width=12)
         Filter: (a = $1)
(6 rows)

prepare s4 as select * from a where a = $1 and b = $2 and c = $3;
PREPARE
explain execute s4(1, 1, 2);
                             QUERY PLAN
--------------------------------------------------------------------
 Append  (cost=0.00..120.50 rows=1000 width=12)  << Here
   Subplans Removed: 3
   ->  Seq Scan on a1_1 a_1  (cost=0.00..115.50 rows=1000 width=12)
         Filter: ((a = $1) AND (b = $2) AND (c = $3))
(4 rows)

prepare s2 as select * from a where a = $1 union all select * from a where a = $2;
PREPARE
explain execute s2(1, 1);
                               QUERY PLAN
-------------------------------------------------------------------------
 Append  (cost=0.00..762.00 rows=20000 width=12)
   ->  Append  (cost=0.00..231.00 rows=10000 width=12)  << Here
         Subplans Removed: 2
         ->  Seq Scan on a1_1 a_1  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $1)
         ->  Seq Scan on a1_2 a_2  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $1)
   ->  Append  (cost=0.00..231.00 rows=10000 width=12) << Here
         Subplans Removed: 2
         ->  Seq Scan on a1_1 a_4  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $2)
         ->  Seq Scan on a1_2 a_5  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $2)
(13 rows)

prepare s3 as select * from a where a = $1 union select * from a where a = $2;
PREPARE
explain execute s3(1, 1);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=912.00..1112.00 rows=20000 width=12)
   Group Key: a.a, a.b, a.c
   ->  Append  (cost=0.00..762.00 rows=20000 width=12)
         ->  Append  (cost=0.00..231.00 rows=10000 width=12) << Here
               Subplans Removed: 2
               ->  Seq Scan on a1_1 a_1  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $1)
               ->  Seq Scan on a1_2 a_2  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $1)
         ->  Append  (cost=0.00..231.00 rows=10000 width=12) << Here
               Subplans Removed: 2
               ->  Seq Scan on a1_1 a_4  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $2)
               ->  Seq Scan on a1_2 a_5  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $2)
(15 rows)

-- add a limit to make sure run-time partition pruning happens.
explain select * from generate_series(1, 10) i(i) join lateral (select * from a
where a.a = (i.i % 2 + 1) and a.b = (i.i % 2 + 1) and a.c = (i.i % 10) limit 10000000) m on true;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2030.10 rows=10000 width=16)
   ->  Function Scan on generate_series i  (cost=0.00..0.10 rows=10 width=4)
   ->  Limit  (cost=0.00..183.00 rows=1000 width=12)
         ->  Append  (cost=0.00..183.00 rows=1000 width=12)  << Here
               ->  Seq Scan on a1_1 a_1  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
               ->  Seq Scan on a1_2 a_2  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
               ->  Seq Scan on a2_1 a_3  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
               ->  Seq Scan on a2_2 a_4  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
(12 rows)

Any thoughts on this?  Thanks.

--
Best Regards
Andy Fan
Attachment

Re: Make Append Cost aware of some run time partition prune case

From: vignesh C
On Thu, Mar 4, 2021 at 9:51 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
>>
>> I have implemented a new one, which handles only one level of partitioned table
>> and only one partition key, and handles only eq operators like partkey = $1 /
>> partkey in ($1, $2) / partkey = $1 or partkey = $2.  The patch works well in my
>> use case.  I can send one against the latest master shortly, and hope it is
>> helpful for you as well.
>>
>
> Hi:
>
> Here is the new patch for this topic, which has proven to work in my limited use
> case (applying similar logic on pg11). It is incomplete, since more cases should
> be covered but are not. I'm uploading this patch now just for discussion and
> testing.
>
> Design principles:
>
> 1). The cost of an AppendPath should be reduced for both initial partition pruning
> and run-time partition pruning. All of startup_cost, total_cost, and rows should
> be adjusted. The startup_cost in particular must be adjusted carefully, or else we
> can end up with a run-time cost of less than 0.
>
> 2). When we merge the subpaths from a sub-partitioned rel via
> accumulate_append_subpath, we currently just merge the subpaths and discard the
> cost/rows of the AppendPath.  Once this feature is in, we may want to use the
> AppendPath's cost as well during this stage.
>
> 3). When a join is involved, adjusting the AppendPath is not enough, since the
> estimated rows for a join relation considers only rel1->rows and rel2->rows; the
> Path.rows is not used. So we need to adjust rel->rows as well, and only for the
> initial partition prune case.  (appendrel->rows for planning-time partition
> pruning is handled already.)
>
> The biggest problem with the above is that I don't know how to adjust the cost of
> a Parallel Append path. Currently I just ignore it, which would cause some
> queries that should use a Parallel Append path to not do so.
>
> Some things I don't want to handle unless we can demonstrate their value:
> 1. Operators like >, <, BETWEEN ... AND.
> 2. An incomplete partition key appearing in the prunequals, e.g. we have
>    partition key (a, b) but the restrictinfo is just a = 1.
>
> The main reasons I don't want to handle them are: a) they would be uncommon;
> b) they introduce extra complexity; and c) we can't estimate them well anyway;
> for partkey > $1, what would the prune ratio be?
>
> Some things I don't handle so far: 1) the accumulate_append_subpath
> stuff; 2) MergeAppend; 3) multiple partition keys.
>
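The eq-only restriction in the quoted mail (partkey = $1, partkey IN ($1, $2),
partkey = $1 OR partkey = $2) lends itself to a simple surviving-partition
estimate. A hypothetical sketch (names invented here, not taken from the
patch):

```python
# Rough model of counting surviving partitions for the eq-style prune quals
# the quoted mail restricts itself to.  Hypothetical code, not the patch's.

def surviving_partition_count(qual, n_partitions):
    """qual: ('eq',) for partkey = $1, ('in', n) for an n-value IN list,
    ('or', n) for n OR'ed equalities; anything else is left unpruned."""
    kind = qual[0]
    if kind == 'eq':
        return min(1, n_partitions)        # one value hits at most one partition
    if kind in ('in', 'or'):
        return min(qual[1], n_partitions)  # at most one partition per value
    return n_partitions                    # >, <, BETWEEN etc.: assume no pruning
```

Operators like partkey > $1 fall through to the no-pruning default, matching
the mail's decision not to try to estimate a prune ratio for them.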

The patch no longer applies on HEAD; could you rebase and post an updated
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh



Re: Make Append Cost aware of some run time partition prune case

From: Daniel Gustafsson
> On 14 Jul 2021, at 17:55, vignesh C <vignesh21@gmail.com> wrote:

> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

As the thread has stalled and this patch still doesn't apply (judging by the log,
it's likely not too hard to resolve), I'm marking this patch Returned with
Feedback. Feel free to open a new entry for an updated patch.

--
Daniel Gustafsson        https://vmware.com/