Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different. - Mailing list pgsql-hackers

From Tender Wang
Subject Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.
Date
Msg-id CAHewXNk0U+FJAXT1zm6LtZrojYCmnhc_0=3DjvJ3LvZTsjsKAw@mail.gmail.com
Whole thread Raw
In response to Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.  (Tender Wang <tndrwang@gmail.com>)
List pgsql-hackers


Amit Langote <amitlangote09@gmail.com> 于2024年10月31日周四 21:09写道:
On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> On Wed, Oct 30, 2024 at 11:58 AM jian he <jian.universality@gmail.com> wrote:
> >
> > I missed a case when column collation and partition key collation are
> > the same and indeterministic.
> > that should be fine for partition-wise join.
> > so v2 attached.
> >
> > have_partkey_equi_join, match_expr_to_partition_keys didn't do any
> > collation related check.
> > propose v2 change disallow partitionwise join for  case when
> > column collation is indeterministic *and* is differ from partition
> > key's collation.
> >
> > the attached partition_wise_join_collation.sql is the test script.
> > you may use it to compare with the master behavior.
>
> What if the partkey collation and column collation are both deterministic,
> but with different sort order?
>
> I'm not familiar with this part of the code base, but it seems to me the
> partition wise join should use partkey collation instead of column collation,
> because it's the partkey collation that decides which partition a row to
> be dispatched.
>
> What Jian proposed is also reasonable but seems another aspect of $subject?

I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.

+           {
+               Oid     colloid =  exprCollation((Node *) expr);
+
+               if ((partcoll != colloid) &&
+                    OidIsValid(colloid) &&
+                    !get_collation_isdeterministic(colloid))
+                   *coll_incompatiable = true;

I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.

Me, too. 

Attached 0002 is what I came up with.  One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1

Agree.  

(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.

In have_partkey_equi_join()
...
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
{
         Oid partcoll2 = rel1->part_scheme->partcollation[ipk];
         ....

I think we should use rel2 here, not rel1.


--
Thanks,
Tender Wang

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Eager aggregation, take 3
Next
From: Tender Wang
Date:
Subject: Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.