Thread: Cost model improvement for run-time partition prune

From: Andy Fan
Currently the cost model completely ignores both initial partition pruning and
run-time partition pruning. The impacts include: 1) the cost of a Nested Loop
path is highly overestimated, and 2) the row estimates can be very wrong as
well. We can use the following cases to demonstrate.

CREATE TABLE p (c_type INT, v INT) partition by list(c_type);
SELECT 'create table p_'|| i || ' partition of p for values in ('|| i|| ');'
from generate_series(1, 100) i; \gexec
SELECT 'insert into p select ' || i ||', v from generate_series(1, 1000000) v;'
from generate_series(1, 100) i; \gexec
ANALYZE p;
CREATE INDEX on p(v);

Case 1: 

PREPARE s AS
SELECT * FROM generate_series(1, 10) i JOIN p ON i = p.v;
EXPLAIN execute s;

postgres=# EXPLAIN  execute s;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=0.43..8457.60 rows=1029 width=12)
   ->  Function Scan on generate_series i  (cost=0.00..0.10 rows=10 width=4)
   ->  Append  (cost=0.42..844.75 rows=100 width=8)
         ->  Index Scan using p_1_v_idx on p_1  (cost=0.42..8.44 rows=1 width=8)
               Index Cond: (v = i.i)
         ->  Index Scan using p_2_v_idx on p_2  (cost=0.42..8.44 rows=1 width=8)
               Index Cond: (v = i.i)
         ->  Index Scan using p_3_v_idx on p_3  (cost=0.42..8.44 rows=1 width=8)
               Index Cond: (v = i.i)

         ...

We can see the cost/rows of the Append path are highly overestimated: the rows
should be 1 rather than 100, and the cost should be 8.44 rather than 844.75.

Case 2: 
PREPARE s2 AS
SELECT * FROM p a JOIN p b ON a.v = b.v and a.c_type = $1 and a.v < 10;
EXPLAIN execute s2(3);

postgres=# EXPLAIN execute s2(3);
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Gather  (cost=1000.85..5329.91 rows=926 width=16)
   Workers Planned: 1
   ->  Nested Loop  (cost=0.85..4237.31 rows=545 width=16)
         ->  Parallel Index Scan using p_3_v_idx on p_3 a  (cost=0.42..8.56 rows=5 width=8)
               Index Cond: (v < 10)
               Filter: (c_type = 3)
         ->  Append  (cost=0.42..844.75 rows=100 width=8)
               ->  Index Scan using p_1_v_idx on p_1 b_1  (cost=0.42..8.44 rows=1 width=8)
                     Index Cond: (v = a.v)
               ->  Index Scan using p_2_v_idx on p_2 b_2  (cost=0.42..8.44 rows=1 width=8)
                     Index Cond: (v = a.v)
               ...

set plan_cache_mode = force_generic_plan;
EXPLAIN ANALYZE execute s2(3);

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Gather  (cost=1000.85..312162.57 rows=93085 width=16)
   Workers Planned: 2
   ->  Nested Loop  (cost=0.85..301854.07 rows=38785 width=16)
         ->  Parallel Append  (cost=0.42..857.94 rows=400 width=8)
               Subplans Removed: 99
               ->  Parallel Index Scan using p_3_v_idx on p_3 a_1  (cost=0.42..8.56 rows=5 width=8)
                     Index Cond: (v < 10)
                     Filter: (c_type = $1)
         ->  Append  (cost=0.42..751.49 rows=100 width=8)
             ...

We can see the rows for the Gather node changed from 926 to 93085, while the
actual row count is 900. The reason for case 2 is that we adjust rel->tuples for
plan-time partition pruning, but we do nothing for initial partition pruning. I
would like to fix both of these issues.

The scope I want to cover in the first stage is:
1. Only support limited operators like '=', 'IN', and 'partkey = $1 OR partkey =
   $2'; operators like '>', '<', and 'BETWEEN .. AND ..' are not supported.
2. Only support the case where all the partition keys are used in the prune
   quals; for example, if we have partkey (p1, p2) but the user only has
   p1 = $1 in the quals, that is not supported.
3. All the other cases should be supported.

The reason I put these limits in place is that 1) those cases are not common,
and 2) there is no way to guess a reasonable ratio for them.

The design principles are:
1) Adjust the AppendPath's cost and rows for both initial partition pruning and
run-time partition pruning in cost_append. The ratio is just 1/nparts for all
the supported cases, even for partkey IN ($1, $2, $3).

2) Adjust rel->tuples for initial partition pruning only.
3) Use the adjusted AppendPath's cost/rows for the sub-partitioned case, around
accumulate_append_subpath.

I have implemented this for the 1-level-partition, single-partition-key case at
[1], and I have tested it on my real user case; the algorithm appears to work
well. I am planning to implement the full version soon. Any suggestions on the
design/scope?



--
Best Regards