Re: A problem about partitionwise join - Mailing list pgsql-hackers

From Richard Guo
Subject Re: A problem about partitionwise join
Date
Msg-id CAMbWs48EPxJeErxDcV-gSTYeSwsd1ri37Q_ZNxt5NSWEpP1NiQ@mail.gmail.com
Whole thread Raw
In response to Re: A problem about partitionwise join  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: A problem about partitionwise join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Sun, Apr 5, 2020 at 4:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <riguo@pivotal.io> writes:
> Rebased the patch with latest master and also addressed the test case
> failure reported by PostgreSQL Patch Tester.

I looked this patch over, but I don't like it too much: it seems very
brute-force (and badly under-commented).  Creating all those extra
RestrictInfos isn't too cheap in itself, plus they'll jam up the
equivalence-class machinery for future tests.

Thanks for the review.
 

There is already something in equivclass.c that would almost do what
we want here: exprs_known_equal() would tell us whether the partkeys
can be found in the same eclass, without having to generate data
structures along the way.  The current implementation is not watertight
because it doesn't check opclass semantics, but that consideration
can be bolted on readily enough.  So that leads me to something like
the attached.

I looked through this patch and it's much more elegant than the previous
one. Thank you for working on it.

For partkeys which fail to be identified as equal by looking through
restrictlist, it's a good idea to check them in ECs with the help of
exprs_known_equal().

I have some concern about we only check non-nullable partexprs. Is it
possible that two nullable partexprs come from the same EC? I tried to
give an example but failed.
 

One argument that could be made against this approach is that if there
are a lot of partkey expressions, this requires O(N^2) calls to
exprs_known_equal, something that's already not too cheap.  I think
that that's not a big problem because the number of partkey expressions
would only be equal to the join degree (ie it doesn't scale with the
number of partitions of the baserels) ... but maybe I'm wrong about
that?

You are right. According to how partexpr is formed for joinrel in
set_joinrel_partition_key_exprs(), each base relation within the join
contributes one partexpr, so the number of partexprs would be equal to
the join degree.
 
  I also wonder if it's really necessary to check every pair
of partkey expressions.  It seems at least plausible that in the
cases we care about, all the partkeys on each side would be in the same
eclasses anyway, so that comparing the first members of each list would
be sufficient.  But I haven't beat on that point.

Not sure about it. But cannot come out with a counterexample.

Thanks
Richard 

pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: FETCH FIRST clause WITH TIES option
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] advanced partition matching algorithm forpartition-wise join