Re: Enable partitionwise join for partition keys wrapped by RelabelType - Mailing list pgsql-hackers
| From | Matheus Alcantara |
|---|---|
| Subject | Re: Enable partitionwise join for partition keys wrapped by RelabelType |
| Date | |
| Msg-id | 0132f1c4-f701-4e2d-9022-e3e95cdb01d5@gmail.com Whole thread Raw |
| In response to | Re: Enable partitionwise join for partition keys wrapped by RelabelType (jian he <jian.universality@gmail.com>) |
| List | pgsql-hackers |
On 25/02/26 09:46, jian he wrote: > On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara > <matheusssilv97@gmail.com> wrote: >> >> Although this make sense to me I see difference in row estimation using >> your v2 patch for the following example: >> >> ... >> >> >> V1 patch: >> >> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$); >> estimated | actual >> -----------+-------- >> 12 | 12 >> >> V2 patch: >> >> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$); >> estimated | actual >> -----------+-------- >> 144 | 12 >> >> I've also tried to make the partitions of t1 and t2 as foreign tables >> and I got the same row estimation difference. >> >> I'm just wondering if we are missing something? >> > Hi. > > in function process_equivalence, we have: > /* > * Ensure both input expressions expose the desired collation (their types > * should be OK already); see comments for canonicalize_ec_expression. > */ > item1 = canonicalize_ec_expression(item1, > exprType((Node *) item1), > collation); > item2 = canonicalize_ec_expression(item2, > exprType((Node *) item2), > collation); > > > Let's simplify the test case. > CREATE COLLATION case_insensitive (provider = icu, locale = > '@colStrength=secondary', deterministic = false); > CREATE DOMAIN d_txt1 AS text collate case_insensitive; > CREATE TABLE t3 (a int, b int, c text); > INSERT INTO t3 SELECT i % 12, i, to_char(i/50, 'FM0000') FROM > generate_series(0, 599, 2) i; > ANALYZE t3; > CREATE TABLE t4 (a int, b int, c d_txt1); > INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM > generate_series(0, 599, 3) i; > ANALYZE t4; > EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c; > > The ``WHERE t3.c = t4.c `` after the function process_equivalence, it will > produce 2 RELABELTYPE node in EquivalenceClass->ec_members->em_expr and your v1 > uncondition strip these two RELABELTYPE, exprs_known_equal will retrun true, > therefore for numdistinct it will think "GROUP BY t3.c, t4.c" is the same as > ""GROUP BY t3.c". > > However if we not unconditionly strip RELABELTYPE, exprs_known_equal will return > false, therefore "GROUP BY t3.c, t4.c", "t3.c", "t4.c" is not identical, so it > multiply these two distinct numbers. IMHO, That's the reason for > estimate number 144 versus 12. > Ok, that make sense to me, thanks for pointing this out. So do you think that v3 attached on [1] correctly address this? > Please also see the comments in canonicalize_ec_expression. > Actually, I think the comments in canonicalize_ec_expression discourage strip > RelabelType nodes when RelabelType->resultcollid differs from the collation of > RelabelType->arg. > Agree, thanks. [1] https://www.postgresql.org/message-id/DFZHIGROJHVS.25OYGENTHBLSM%40gmail.com -- Matheus Alcantara EDB: https://www.enterprisedb.com
pgsql-hackers by date: