Make Append Cost aware of some run time partition prune case - Mailing list pgsql-hackers

From Andy Fan
Subject Make Append Cost aware of some run time partition prune case
Date
Msg-id CAKU4AWp6Z087YCQj4uN69Ct61arCtcABYNgU_7Mv1Gu-grVjHw@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Currently the cost model of append path sums the cost/rows for all the
subpaths, it usually works well until we run into the run-time partition prune
case.  The first result is that generic plans will rarely be used for some cases.
For instance, SELECT * FROM p WHERE pkey = $1;  The custom plan will only 
count the cost of one partition, however generic plan will count the cost for all the
partitions even we are sure that only 1 partition will survive. Another impact
is that planners may choose a wrong plan.  for example,  SELECT * FROM t1, p  
WHERE t1.a = p.pkey;  The cost/rows of t1 nest loop p is estimated highly
improperly. This patch wants to help this case to some extent.

The basic idea is we need to estimate how many partitions will survive after
considering the runtime partition prune.  After that,  we will adjust the
cost/rows accordingly. IIUC, this follows Robert's idea at [1]. 
However there are too many special cases which is hard
to handle, but luckily the most common case would be sth like  partykey = $1,
which we can estimate well, this patch is aimed to handle that part specially.  
I supposed 75% partitions will survive for other cases arbitrarily, actually I
want to use 100% to be the same as the current situation. If we decide to 
handle their special case  differently, PartitionedRelQual has to be redefined 
a bit (see comments around that.)  which is the main data struct in the implementation.

The attached is the minimum runnable patch. There are some obvious issue here:

1. MergeAppend path is not handled on purpose, it will be done at last.
2. The cost for Parallel Append Path is not adjusted, I'm not sure about what is
   the best way to do it.  So there are 2 test cases in sql/partition_prune.out
   failed due to this, I just echo the 'results' to expected' so that you can
   know which one is impacted.


Here are some simple test results.

create table a1 partition of a for values in (1) partition by range (b, c);
create table a1_1 partition of a1 for values from (1, 0) to (1, 10);
create table a1_2 partition of a1 for values from (2, 0) to (2, 10);

create table a2 partition of a for values in (2) partition by range (b, c);
create table a2_1 partition of a2 for values from (1, 0) to (1, 10);
create table a2_2 partition of a2 for values from (2, 0) to (2, 10);


insert into a select 1, i%2 + 1, i % 10 from generate_series(1, 10000) i;
insert into a select 2, i%2 + 1, i % 10 from generate_series(1, 10000) i;

analyze a;

set plan_cache_mode to force_generic_plan;

prepare s as select * from a where a = $1;
PREPARE
explain execute s(1);
                            QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.00..231.00 rows=10000 width=12)  << both rows/cost is adjusted.
   Subplans Removed: 2
   ->  Seq Scan on a1_1 a_1  (cost=0.00..90.50 rows=5000 width=12)
         Filter: (a = $1)
   ->  Seq Scan on a1_2 a_2  (cost=0.00..90.50 rows=5000 width=12)
         Filter: (a = $1)
(6 rows)

prepare s4 as select * from a where a = $1 and b = $2 and c = $3;
PREPARE
explain execute s4(1, 1, 2);
                             QUERY PLAN
--------------------------------------------------------------------
 Append  (cost=0.00..120.50 rows=1000 width=12)  << Here
   Subplans Removed: 3
   ->  Seq Scan on a1_1 a_1  (cost=0.00..115.50 rows=1000 width=12)
         Filter: ((a = $1) AND (b = $2) AND (c = $3))
(4 rows)

prepare s2 as select * from a where a = $1 union all select * from a where a = $2;
PREPARE
explain execute s2(1, 1);
                               QUERY PLAN
-------------------------------------------------------------------------
 Append  (cost=0.00..762.00 rows=20000 width=12)
   ->  Append  (cost=0.00..231.00 rows=10000 width=12)  << Here
         Subplans Removed: 2
         ->  Seq Scan on a1_1 a_1  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $1)
         ->  Seq Scan on a1_2 a_2  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $1)
   ->  Append  (cost=0.00..231.00 rows=10000 width=12) << Here
         Subplans Removed: 2
         ->  Seq Scan on a1_1 a_4  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $2)
         ->  Seq Scan on a1_2 a_5  (cost=0.00..90.50 rows=5000 width=12)
               Filter: (a = $2)
(13 rows)

prepare s3 as select * from a where a = $1 union select * from a where a = $2;
PREPARE
explain execute s3(1, 1);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=912.00..1112.00 rows=20000 width=12)
   Group Key: a.a, a.b, a.c
   ->  Append  (cost=0.00..762.00 rows=20000 width=12)
         ->  Append  (cost=0.00..231.00 rows=10000 width=12) << Here
               Subplans Removed: 2
               ->  Seq Scan on a1_1 a_1  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $1)
               ->  Seq Scan on a1_2 a_2  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $1)
         ->  Append  (cost=0.00..231.00 rows=10000 width=12) << Here
               Subplans Removed: 2
               ->  Seq Scan on a1_1 a_4  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $2)
               ->  Seq Scan on a1_2 a_5  (cost=0.00..90.50 rows=5000 width=12)
                     Filter: (a = $2)
(15 rows)

-- add a limit to make sure the runtime partitions prune.
explain select * from generate_series(1, 10) i(i) join lateral (select * from a
where a.a = (i.i % 2 + 1) and a.b = (i.i % 2 + 1) and a.c = (i.i % 10) limit 10000000) m on true;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2030.10 rows=10000 width=16)
   ->  Function Scan on generate_series i  (cost=0.00..0.10 rows=10 width=4)
   ->  Limit  (cost=0.00..183.00 rows=1000 width=12)
         ->  Append  (cost=0.00..183.00 rows=1000 width=12)  << Here
               ->  Seq Scan on a1_1 a_1  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
               ->  Seq Scan on a1_2 a_2  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
               ->  Seq Scan on a2_1 a_3  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
               ->  Seq Scan on a2_2 a_4  (cost=0.00..178.00 rows=1000 width=12)
                     Filter: ((c = (i.i % 10)) AND (a = ((i.i % 2) + 1)) AND (b = ((i.i % 2) + 1)))
(12 rows)

Any thoughts about this?  Thanks

--
Best Regards
Andy Fan
Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: MultiXact\SLRU buffers configuration
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: POC: postgres_fdw insert batching