Re: A problem about partitionwise join - Mailing list pgsql-hackers

From Richard Guo
Subject Re: A problem about partitionwise join
Date
Msg-id CAMbWs48C0ux9mc+AS7D7DP6CyJxjQS=2mxQTpMJJKEeTua5wbQ@mail.gmail.com
Whole thread Raw
In response to Re: A problem about partitionwise join  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: A problem about partitionwise join  (Jaime Casanova <jcasanov@systemguards.com.ec>)
List pgsql-hackers

On Fri, Nov 27, 2020 at 8:05 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Nov 10, 2020 at 2:43 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Thanks Anastasia. I've rebased the patch with latest master.
>
> To recap, the problem we are fixing here is when generating join clauses
> from equivalence classes, we only select the joinclause with the 'best
> score', or the first joinclause with a score of 3. This may cause us to
> miss some joinclause on partition keys and thus fail to generate
> partitionwise join.
>
> The initial idea for the fix is to create all the RestrictInfos from ECs
> in order to check whether there exist equi-join conditions involving
> pairs of matching partition keys of the relations being joined for all
> partition keys. And then Tom proposed a much better idea which leverages
> function exprs_known_equal() to tell whether the partkeys can be found
> in the same eclass, which is the current implementation in the latest
> patch.
>

In the example you gave earlier, the equi join on partition key was
there but it was replaced by individual constant assignment clauses.
So if we keep the original restrictclause in there with a new flag
indicating that it's redundant, have_partkey_equi_join will still be
able to use it without much change. Depending upon where all we need
to use avoid restrictclauses with the redundant flag, this might be an
easier approach. However, with Tom's idea partition-wise join may be
used even when there is no equi-join between partition keys but there
are clauses like pk = const for all tables involved and const is the
same for all such tables.

Correct. So with Tom's idea partition-wise join can cope with clauses
such as 'foo.k1 = bar.k1 and foo.k2 = 16 and bar.k2 = 16'.
 

In the spirit of small improvement made to the performance of
have_partkey_equi_join(), pk_has_clause should be renamed as
pk_known_equal and pks_known_equal as num_equal_pks.

Thanks for the suggestion. Will do that in the new version of patch.
 

The loop traversing the partition keys at a given position, may be
optimized further if we pass lists to exprs_known_equal() which in
turns checks whether one expression from each list is member of a
given EC. This will avoid traversing all equivalence classes for each
partition key expression, which can be a huge improvement when there
are many ECs. But I think if one of the partition key expression at a
given position is member of an equivalence class all the other
partition key expressions at that position should be part of that
equivalence class since there should be an equi-join between those. So
the loop in loop may not be required to start with.

Good point. Quote from one of Tom's earlier emails,
 "It seems at least plausible that in the cases we care about, all the
 partkeys on each side would be in the same eclasses anyway, so that
 comparing the first members of each list would be sufficient."

But I'm not sure if this holds true in all cases. However, since each
base relation within the join contributes only one partexpr, the number
of partexprs would only be equal to the join degree. Thus the loop in
loop may not be a big problem?

PS. Sorry for delaying so long time!

Thanks
Richard
Attachment

pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: Have I found an interval arithmetic bug?
Next
From: Ronan Dunklau
Date:
Subject: Re: ORDER BY pushdowns seem broken in postgres_fdw