Re: Enable partitionwise join for partition keys wrapped by RelabelType - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: Enable partitionwise join for partition keys wrapped by RelabelType
Date
Msg-id 0132f1c4-f701-4e2d-9022-e3e95cdb01d5@gmail.com
Whole thread Raw
In response to Re: Enable partitionwise join for partition keys wrapped by RelabelType  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
On 25/02/26 09:46, jian he wrote:
> On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> Although this make sense to me I see difference in row estimation using
>> your v2 patch for the following example:
>>
>> ...
>>
>>
>> V1 patch:
>>
>> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
>>   estimated | actual
>> -----------+--------
>>          12 |     12
>>
>> V2 patch:
>>
>> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
>>   estimated | actual
>> -----------+--------
>>         144 |     12
>>
>> I've also tried to make the partitions of t1 and t2 as foreign tables
>> and I got the same row estimation difference.
>>
>> I'm just wondering if we are missing something?
>>
> Hi.
> 
> in function process_equivalence, we have:
>      /*
>       * Ensure both input expressions expose the desired collation (their types
>       * should be OK already); see comments for canonicalize_ec_expression.
>       */
>      item1 = canonicalize_ec_expression(item1,
>                                         exprType((Node *) item1),
>                                         collation);
>      item2 = canonicalize_ec_expression(item2,
>                                         exprType((Node *) item2),
>                                         collation);
> 
> 
> Let's simplify the test case.
> CREATE COLLATION case_insensitive (provider = icu, locale =
> '@colStrength=secondary', deterministic = false);
> CREATE DOMAIN d_txt1 AS text collate case_insensitive;
> CREATE TABLE t3 (a int, b int, c text);
> INSERT INTO t3 SELECT i  % 12, i, to_char(i/50, 'FM0000') FROM
> generate_series(0, 599, 2) i;
> ANALYZE t3;
> CREATE TABLE t4 (a int, b int, c d_txt1);
> INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM
> generate_series(0, 599, 3) i;
> ANALYZE t4;
> EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c;
> 
> The ``WHERE t3.c = t4.c `` after the function process_equivalence, it will
> produce 2 RELABELTYPE node in EquivalenceClass->ec_members->em_expr and your v1
> uncondition strip these two RELABELTYPE, exprs_known_equal will retrun true,
> therefore for numdistinct it will think "GROUP BY t3.c, t4.c" is the same as
> ""GROUP BY t3.c".
> 
> However if we not unconditionly strip RELABELTYPE, exprs_known_equal will return
> false, therefore "GROUP BY t3.c, t4.c", "t3.c", "t4.c" is not identical, so it
> multiply these two distinct numbers.  IMHO, That's the reason for
> estimate number 144 versus 12.
> 

Ok, that make sense to me, thanks for pointing this out.

So do you think that v3 attached on [1] correctly address this?

> Please also see the comments in canonicalize_ec_expression.
> Actually, I think the comments in canonicalize_ec_expression discourage strip
> RelabelType nodes when RelabelType->resultcollid differs from the collation of
> RelabelType->arg.
> 

Agree, thanks.

[1] 
https://www.postgresql.org/message-id/DFZHIGROJHVS.25OYGENTHBLSM%40gmail.com

--
Matheus Alcantara
EDB: https://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: AIX support
Next
From: Masahiko Sawada
Date:
Subject: Re: Initial COPY of Logical Replication is too slow