Re: A problem about partitionwise join - Mailing list pgsql-hackers

From Richard Guo
Subject Re: A problem about partitionwise join
Date
Msg-id CAMbWs48AOLfx+weJ6=2U1DuxBsWDnC3LBrQr9QXoDq6TBq+H6w@mail.gmail.com
Whole thread Raw
In response to Re: A problem about partitionwise join  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: A problem about partitionwise join  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers

On Tue, Mar 19, 2024 at 3:40 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Mar 19, 2024 at 8:18 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Mar 7, 2024 at 7:13 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Approach
--------
The equijoin condition between partition keys doesn't appear in the join's restrictilist because of 'best_score' strategy as you explained well in [2]. What if we add an extra score for clauses between partition keys and give preference to equijoin between partition keys? Have you given it a thought? I feel that having an equijoin clause involving partition keys has more usages compared to a clause with any random column. E.g. nextloop may be able to prune partitions from inner relation if the clause contains a partition key.

Hmm, I think this approach won't work in cases where one certain pair of
partition keys has formed an EC that contains pseudoconstants.  In such
cases, the EC machinery will generate restriction clauses like 'pk =
const' rather than any join clauses.

That should be ok and more desirable. Clauses like pk = const will leave only one partition around in each of the joining relations thus PWJ won't be required OR it will be automatic - whichever way you see it.

No, that's not true.  There could be multiple partition keys, and the
particular key involved in the pushed-down restriction 'pk = const' may
not be able to prune away any partitions.  To be concrete, consider the
query:

create table p (k1 int, k2 int, val int) partition by range(k1, k2);
create table p_1 partition of p for values from (1,1) to (10,100);
create table p_2 partition of p for values from (10,100) to (20,200);

set enable_partitionwise_join to on;

explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 5;
               QUERY PLAN
-----------------------------------------
 Hash Join
   Hash Cond: (foo.k1 = bar.k1)
   ->  Append
         ->  Seq Scan on p_1 foo_1
               Filter: (k2 = 5)
         ->  Seq Scan on p_2 foo_2
               Filter: (k2 = 5)
   ->  Hash
         ->  Append
               ->  Seq Scan on p_1 bar_1
                     Filter: (k2 = 5)
               ->  Seq Scan on p_2 bar_2
                     Filter: (k2 = 5)
(13 rows)

Thanks
Richard

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Amit Kapila
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation