Thread: Equality of columns isn't taken in account when performing partitionpruning

Equality of columns isn't taken in account when performing partitionpruning

From
Konstantin Knizhnik
Date:
I noticed some limitation of Postgres optimizer: it doesn't take in 
account information about equality of columns of joined tables when 
performs partition pruning:

create table i (pk integer primary key) partition by range(pk);
create table i_1 partition of i for values from (0) to (10);
create table i_2 partition of i for values from (10) to (20);
create table o (pk integer primary key);

explain select * from o join i on o.pk = i.pk where i.pk between 0 and 9;
explain select * from o join i on o.pk = i.pk where o.pk between 0 and 9;


Plan for the first query is optimal and access only affected partition i_1:
  Nested Loop  (cost=4.44..53.26 rows=13 width=8)
    ->  Bitmap Heap Scan on i_1 i  (cost=4.29..15.02 rows=13 width=4)
          Recheck Cond: ((pk >= 0) AND (pk <= 9))
          ->  Bitmap Index Scan on i_1_pkey  (cost=0.00..4.29 rows=13 
width=0)
                Index Cond: ((pk >= 0) AND (pk <= 9))
    ->  Index Only Scan using o_pkey on o  (cost=0.15..2.94 rows=1 width=4)
          Index Cond: (pk = i.pk)


But plan for the second query (although it is doing actually the same 
things) affects both partitions:

  Nested Loop  (cost=4.44..91.89 rows=26 width=8)
    ->  Bitmap Heap Scan on o  (cost=4.29..15.02 rows=13 width=4)
          Recheck Cond: ((pk >= 0) AND (pk <= 9))
          ->  Bitmap Index Scan on o_pkey  (cost=0.00..4.29 rows=13 width=0)
                Index Cond: ((pk >= 0) AND (pk <= 9))
    ->  Append  (cost=0.15..5.89 rows=2 width=4)
          ->  Index Only Scan using i_1_pkey on i_1 (cost=0.15..2.94 
rows=1 width=4)
                Index Cond: (pk = o.pk)
          ->  Index Only Scan using i_2_pkey on i_2 (cost=0.15..2.94 
rows=1 width=4)
                Index Cond: (pk = o.pk)


I have not investigated yet how difficult it will be to fix it.
May be there is some person more familiar with this part of optimizer 
than I.
But if nobody wants to look at it, I can try to investigate it myself.





On Wed, 29 Apr 2020 at 20:15, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
>
> I noticed some limitation of Postgres optimizer: it doesn't take in
> account information about equality of columns of joined tables when
> performs partition pruning:
>
> create table i (pk integer primary key) partition by range(pk);
> create table i_1 partition of i for values from (0) to (10);
> create table i_2 partition of i for values from (10) to (20);
> create table o (pk integer primary key);
>
> explain select * from o join i on o.pk = i.pk where i.pk between 0 and 9;
> explain select * from o join i on o.pk = i.pk where o.pk between 0 and 9;
>
>
> Plan for the first query is optimal and access only affected partition i_1:

It's not a bug. It's just an optimisation that we don't do. Also, it's
not limited to partition pruning, as if the "i" table had been a
normal table with an index then you may also want to complain that
when your hash join hashes on "i" (perhaps the planner thinks it still
has fewer rows than "o") that it does not perform the "i.pk BETWEEN 0
AND 9" during the scan with the 2nd query.

I did talk about a possible way to fix this back in [1], but I think
we'll need to build some infrastructure to optimise finding matching
expressions in a List beforehand.  The code I had put together was a
bit inefficient due to our lack of an efficient way to find an
expression in a List.

David

[1] https://www.postgresql.org/message-id/flat/30810.1449335261%40sss.pgh.pa.us#906319f5e212fc3a6a682f16da079f04