Thread: BUG #18568: BUG: Result wrong when do group by on partition table!
The following bug has been logged on the website: Bug reference: 18568 Logged by: Webbo Han Email address: 1105066510@qq.com PostgreSQL version: 16.3 Operating system: centos 7.6 Description: First, we create one case-insensitive collation use ICU: ```sql CREATE COLLATION case_insensitive ( provider = icu, locale = 'und-u-ks-level2', deterministic = false ); ``` Then, we create the partition table, meanwhile we set the collation of column c to `case_insensitive`, and set partkey's collation to 'C'. ```sql SET enable_partitionwise_aggregate TO true; SET enable_partitionwise_join TO true; SET max_parallel_workers_per_gather TO 0; SET enable_incremental_sort TO off; CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c collate "C"); CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b', 'c', 'd'); CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f', 'A'); CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C', 'D', 'E'); INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM generate_series(0, 2999) i; ANALYZE pagg_tab; ``` We do group by on the table pagg_tab use `case_insensitive` collation, we hope group key is case-insensitive. but we find the execution result is not what we expected. ```shell postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP BY c collate case_insensitive; c | count ---+------- A | 300 e | 300 E | 300 D | 300 C | 300 B | 300 d | 300 c | 300 b | 300 a | 300 (10 rows) ``` The reason is the function group_by_has_partkey() do not check partkey's collation, that lead to explain error. ```shell postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP BY c collate case_insensitive ; QUERY PLAN -------------------------------------------------------------------------------------- Append (cost=12.00..60.15 rows=10 width=10) -> HashAggregate (cost=12.00..12.02 rows=2 width=10) Group Key: pagg_tab.c -> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00 rows=600 width=2) -> HashAggregate (cost=24.00..24.04 rows=4 width=10) Group Key: pagg_tab_1.c -> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00 rows=1200 width=2) -> HashAggregate (cost=24.00..24.04 rows=4 width=10) Group Key: pagg_tab_2.c -> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00 rows=1200 width=2) (10 rows) ``` So, group_by_has_partkey() need to verify if the partkey's collation matches the groupkey, meanwhile, if groupkey is RelabelType node and it's collation equal to partkey's, it should also set variable `found` to true.
PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道:
The following bug has been logged on the website:
Bug reference: 18568
Logged by: Webbo Han
Email address: 1105066510@qq.com
PostgreSQL version: 16.3
Operating system: centos 7.6
Description:
First, we create one case-insensitive collation use ICU:
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
```
Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
ANALYZE pagg_tab;
```
We do group by on the table pagg_tab use `case_insensitive` collation, we
hope group key is case-insensitive.
but we find the execution result is not what we expected.
```shell
postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP
BY c collate case_insensitive;
c | count
---+-------
A | 300
e | 300
E | 300
D | 300
C | 300
B | 300
d | 300
c | 300
b | 300
a | 300
(10 rows)
```
The reason is the function group_by_has_partkey() do not check partkey's
collation, that lead to explain error.
```shell
postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
pagg_tab GROUP BY c collate case_insensitive ;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=12.00..60.15 rows=10 width=10)
-> HashAggregate (cost=12.00..12.02 rows=2 width=10)
Group Key: pagg_tab.c
-> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00 rows=600
width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_1.c
-> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00
rows=1200 width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_2.c
-> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00
rows=1200 width=2)
(10 rows)
```
So, group_by_has_partkey() need to verify if the partkey's collation matches
the groupkey,
meanwhile, if groupkey is RelabelType node and it's collation equal to
partkey's, it should
also set variable `found` to true.
But I found this when debug into group_by_has_partkey(), as below:
call nodeToString(groupexprs):
VAR : varcollid 16384
call nodeToString(partexpr):
VAR: varcollid 16384
So the collid of partkey and groupexpr is same, so add check here may not fix this issue.
I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info is
set in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.
And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.
Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't find
failed regress.
--
Tender Wang
Attachment
Re: BUG #18568: BUG: Result wrong when do group by on partition table!
From
Aleksander Alekseev
Date:
Hi, > [...] > I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info is > set in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value. > > And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data. > Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, andI don't find > failed regress. ``` +SELECT c collate case_insensitive, count(c) FROM +pagg_tab_col GROUP BY c collate case_insensitive; + c | count +---+------- + e | 600 + D | 600 + C | 600 + B | 600 + A | 600 +(5 rows) ``` Shouldn't we use UPPER(c) and ORDER BY in the test case to make the results deterministic? -- Best regards, Aleksander Alekseev
Tender Wang <tndrwang@gmail.com> 于2024年8月6日周二 16:42写道:
Yeah, I can reproduce $subject on HEAD.But I found this when debug into group_by_has_partkey(), as below:call nodeToString(groupexprs):VAR : varcollid 16384call nodeToString(partexpr):VAR: varcollid 16384So the collid of partkey and groupexpr is same, so add check here may not fix this issue.I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info isset in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't findfailed regress.
The plan seems not what we want, because I forgot to set enable_partitionwise_aggregate to true.
Tender Wang
Attachment
Partkey's collation stored in RelOptInfo->part_scheme, and I use it to fix the bug.
The attachment is my solution!
------------------ 原始邮件 ------------------
发件人: "Tender Wang" <tndrwang@gmail.com>;
发送时间: 2024年8月6日(星期二) 下午4:42
收件人: "狂奔的蜗牛"<1105066510@qq.com>;"pgsql-bugs"<pgsql-bugs@lists.postgresql.org>;
主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道:
The following bug has been logged on the website:
Bug reference: 18568
Logged by: Webbo Han
Email address: 1105066510@qq.com
PostgreSQL version: 16.3
Operating system: centos 7.6
Description:
First, we create one case-insensitive collation use ICU:
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
```
Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
ANALYZE pagg_tab;
```
We do group by on the table pagg_tab use `case_insensitive` collation, we
hope group key is case-insensitive.
but we find the execution result is not what we expected.
```shell
postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP
BY c collate case_insensitive;
c | count
---+-------
A | 300
e | 300
E | 300
D | 300
C | 300
B | 300
d | 300
c | 300
b | 300
a | 300
(10 rows)
```
The reason is the function group_by_has_partkey() do not check partkey's
collation, that lead to explain error.
```shell
postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
pagg_tab GROUP BY c collate case_insensitive ;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=12.00..60.15 rows=10 width=10)
-> HashAggregate (cost=12.00..12.02 rows=2 width=10)
Group Key: pagg_tab.c
-> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00 rows=600
width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_1.c
-> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00
rows=1200 width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_2.c
-> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00
rows=1200 width=2)
(10 rows)
```
So, group_by_has_partkey() need to verify if the partkey's collation matches
the groupkey,
meanwhile, if groupkey is RelabelType node and it's collation equal to
partkey's, it should
also set variable `found` to true.
But I found this when debug into group_by_has_partkey(), as below:
call nodeToString(groupexprs):
VAR : varcollid 16384
call nodeToString(partexpr):
VAR: varcollid 16384
So the collid of partkey and groupexpr is same, so add check here may not fix this issue.
I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info is
set in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.
And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.
Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't find
failed regress.
--
Tender Wang
Attachment
Hi,
I am working to compatible with Oracle in PG,
in oracle, the result used case-insensitive collation is not unique.
There are two oracle's test case:
first case:
```sql
CREATE TABLE group_by_ci_test_1 (c varchar2(20));
INSERT INTO group_by_ci_test_1 VALUES ('a');
INSERT INTO group_by_ci_test_1 VALUES ('a');
INSERT INTO group_by_ci_test_1 VALUES ('A');
SELECT c collate binary_ci, count(c) FROM group_by_ci_test_1 GROUP BY c collate binary_ci;
```
first result
```shell
CCOLLATEBINARY_CI COUNT(C)
-------------------- ----------
a 3
```
second case:
```sql
CREATE TABLE group_by_ci_test_2 (c varchar2(20));
INSERT INTO group_by_ci_test_2 VALUES ('A');
INSERT INTO group_by_ci_test_2 VALUES ('a');
INSERT INTO group_by_ci_test_2 VALUES ('a');
SELECT c collate binary_ci, count(c) FROM group_by_ci_test_2 GROUP BY c collate binary_ci;
```
second result:
```shell
CCOLLATEBINARY_CI COUNT(C)
-------------------- ----------
A 6
```
------------------ 原始邮件 ------------------
发件人: "Aleksander Alekseev" <aleksander@timescale.com>;
发送时间: 2024年8月6日(星期二) 下午5:09
收件人: "pgsql-bugs"<pgsql-bugs@lists.postgresql.org>;
抄送: "Tender Wang"<tndrwang@gmail.com>;"狂奔的蜗牛"<1105066510@qq.com>;
主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
> [...]
> I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info is
> set in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.
>
> And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.
> Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't find
> failed regress.
```
+SELECT c collate case_insensitive, count(c) FROM
+pagg_tab_col GROUP BY c collate case_insensitive;
+ c | count
+---+-------
+ e | 600
+ D | 600
+ C | 600
+ B | 600
+ A | 600
+(5 rows)
```
Shouldn't we use UPPER(c) and ORDER BY in the test case to make the
results deterministic?
--
Best regards,
Aleksander Alekseev
PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道:
The following bug has been logged on the website:
Bug reference: 18568
Logged by: Webbo Han
Email address: 1105066510@qq.com
PostgreSQL version: 16.3
Operating system: centos 7.6
Description:
First, we create one case-insensitive collation use ICU:
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
```
Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
I think above create table again. Should we allow this CREATE TABLE? The partition key
definition are not same with column definiton. Is it better to report error for users?
Tender Wang
狂奔的蜗牛 <1105066510@qq.com> 于2024年8月6日周二 19:33写道:
Partkey's collation stored in RelOptInfo->part_scheme, and I use it to fix the bug.The attachment is my solution!
I look through your patch. Some code is not native for PG. For example:
if (nodeTag(groupexpr) == T_RelabelType)
usually, we use IsA() func.
Tender Wang
I think what I have done in v2 is not the right way. Because partition prune logic first checks whether it is equal
or not, then it will check the collation match or not. So I should not change the set_baserel_partition_key_exprs() logic.
I look through your patch and make some changes.
1.
git am your patch, report warnings, some code format issue.
2.
I removed this branch: if (IsA(groupexpr, RelabelType)).
Because in your added test case, it didn't enter this branch. I didn't figure out what kind of group by clause is RelableType.
You can provide a test case based on the v3 patch.
3. Tweek a little about the test case.
By the way, your last two emails only sent to me, please cc the pgsql-bugs.
狂奔的蜗牛 <1105066510@qq.com> 于2024年8月6日周二 19:33写道:
Partkey's collation stored in RelOptInfo->part_scheme, and I use it to fix the bug.The attachment is my solution!------------------ 原始邮件 ------------------发件人: "Tender Wang" <tndrwang@gmail.com>;发送时间: 2024年8月6日(星期二) 下午4:42收件人: "狂奔的蜗牛"<1105066510@qq.com>;"pgsql-bugs"<pgsql-bugs@lists.postgresql.org>;主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道:The following bug has been logged on the website:
Bug reference: 18568
Logged by: Webbo Han
Email address: 1105066510@qq.com
PostgreSQL version: 16.3
Operating system: centos 7.6
Description:
First, we create one case-insensitive collation use ICU:
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
```
Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
ANALYZE pagg_tab;
```
We do group by on the table pagg_tab use `case_insensitive` collation, we
hope group key is case-insensitive.
but we find the execution result is not what we expected.
```shell
postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP
BY c collate case_insensitive;
c | count
---+-------
A | 300
e | 300
E | 300
D | 300
C | 300
B | 300
d | 300
c | 300
b | 300
a | 300
(10 rows)
```
The reason is the function group_by_has_partkey() do not check partkey's
collation, that lead to explain error.
```shell
postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
pagg_tab GROUP BY c collate case_insensitive ;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=12.00..60.15 rows=10 width=10)
-> HashAggregate (cost=12.00..12.02 rows=2 width=10)
Group Key: pagg_tab.c
-> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00 rows=600
width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_1.c
-> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00
rows=1200 width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_2.c
-> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00
rows=1200 width=2)
(10 rows)
```
So, group_by_has_partkey() need to verify if the partkey's collation matches
the groupkey,
meanwhile, if groupkey is RelabelType node and it's collation equal to
partkey's, it should
also set variable `found` to true.Yeah, I can reproduce $subject on HEAD.But I found this when debug into group_by_has_partkey(), as below:call nodeToString(groupexprs):VAR : varcollid 16384call nodeToString(partexpr):VAR: varcollid 16384So the collid of partkey and groupexpr is same, so add check here may not fix this issue.I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info isset in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't findfailed regress.--Tender Wang
Tender Wang
Attachment
this case will enter `if (IsA(groupexpr, RelabelType))` branch.
We set "C" as groupkey's collation, and it's not equal to column's.
In transformGroupClause(), syntax `COLLATE xxx` will create CollateExpr node, and CollateExpr's arg is Var node.
And then, planner will call eval_const_expressions_mutator() to transform CollateExpr to RelableType if CollateExpr->collOid not equal to Collate->arg's collation.
About V3 patch, PartCollMatchesExprColl() may be not suitable, because collation of partkey must be equal to groupkey's, even though they are all InvalidOid.
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
drop table pagg_tab;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b', 'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f', 'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C', 'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;
EXPLAIN SELECT c collate "C", count(c) FROM pagg_tab GROUP BY c collate "C" ;
SELECT c collate "C", count(c) FROM pagg_tab GROUP BY c collate "C";
```
------------------ 原始邮件 ------------------
发件人: "Tender Wang" <tndrwang@gmail.com>;
发送时间: 2024年8月7日(星期三) 中午12:01
收件人: "狂奔的蜗牛"<1105066510@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@lists.postgresql.org>;
主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
I think what I have done in v2 is not the right way. Because partition prune logic first checks whether it is equal
or not, then it will check the collation match or not. So I should not change the set_baserel_partition_key_exprs() logic.
I look through your patch and make some changes.
1.
git am your patch, report warnings, some code format issue.
2.
I removed this branch: if (IsA(groupexpr, RelabelType)).
Because in your added test case, it didn't enter this branch. I didn't figure out what kind of group by clause is RelableType.
You can provide a test case based on the v3 patch.
3. Tweek a little about the test case.
By the way, your last two emails only sent to me, please cc the pgsql-bugs.
狂奔的蜗牛 <1105066510@qq.com> 于2024年8月6日周二 19:33写道:
Partkey's collation stored in RelOptInfo->part_scheme, and I use it to fix the bug.The attachment is my solution!------------------ 原始邮件 ------------------发件人: "Tender Wang" <tndrwang@gmail.com>;发送时间: 2024年8月6日(星期二) 下午4:42收件人: "狂奔的蜗牛"<1105066510@qq.com>;"pgsql-bugs"<pgsql-bugs@lists.postgresql.org>;主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道:The following bug has been logged on the website:
Bug reference: 18568
Logged by: Webbo Han
Email address: 1105066510@qq.com
PostgreSQL version: 16.3
Operating system: centos 7.6
Description:
First, we create one case-insensitive collation use ICU:
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
```
Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
ANALYZE pagg_tab;
```
We do group by on the table pagg_tab use `case_insensitive` collation, we
hope group key is case-insensitive.
but we find the execution result is not what we expected.
```shell
postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP
BY c collate case_insensitive;
c | count
---+-------
A | 300
e | 300
E | 300
D | 300
C | 300
B | 300
d | 300
c | 300
b | 300
a | 300
(10 rows)
```
The reason is the function group_by_has_partkey() do not check partkey's
collation, that lead to explain error.
```shell
postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
pagg_tab GROUP BY c collate case_insensitive ;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=12.00..60.15 rows=10 width=10)
-> HashAggregate (cost=12.00..12.02 rows=2 width=10)
Group Key: pagg_tab.c
-> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00 rows=600
width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_1.c
-> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00
rows=1200 width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_2.c
-> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00
rows=1200 width=2)
(10 rows)
```
So, group_by_has_partkey() need to verify if the partkey's collation matches
the groupkey,
meanwhile, if groupkey is RelabelType node and it's collation equal to
partkey's, it should
also set variable `found` to true.Yeah, I can reproduce $subject on HEAD.But I found this when debug into group_by_has_partkey(), as below:call nodeToString(groupexprs):VAR : varcollid 16384call nodeToString(partexpr):VAR: varcollid 16384So the collid of partkey and groupexpr is same, so add check here may not fix this issue.I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info isset in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't findfailed regress.--Tender Wang
Tender Wang
狂奔的蜗牛 <1105066510@qq.com> 于2024年8月7日周三 13:35写道:
this case will enter `if (IsA(groupexpr, RelabelType))` branch.We set "C" as groupkey's collation, and it's not equal to column's.In transformGroupClause(), syntax `COLLATE xxx` will create CollateExpr node, and CollateExpr's arg is Var node.And then, planner will call eval_const_expressions_mutator() to transform CollateExpr to RelableType if CollateExpr->collOid not equal to Collate->arg's collation.
Yech. Thanks for the explanation.
Because exprCollation() return the resultcollid if the nodetag is Relabletype, the expr() && PartKeyCollMatchesExprColl() can handle this situation.
So we don't need "if (IsA(groupexpr, RelableTyple) " this branch.
About V3 patch, PartCollMatchesExprColl() may be not suitable, because collation of partkey must be equal to groupkey's, even though they are all InvalidOid.
I'm not sure about what you said. I uesd it because partition prune do this way. I keep it temporarily.
The test case in v3 patch only has EXPLAIN statement. I add SELECT statement and make the result more stable according to Aleksander advices.
Tender Wang
Attachment
PartCollMatchesExprColl() is not strict enough.
If partcoll == InvalidOid and groupcoll != InvalidOid, PartCollMatchesExprColl() return true always.
Just determine whether groupcoll is equal to partcoll, like this "partcoll == groupcoll".
We cannot delete "if (IsA(groupexpr, RelableTyple) " branch,
becasuse if groupexpr is RelabelType and partcoll equal to groupcoll, the "equal(groupexpr, partexpr) && PartKeyCollMatchesExprColl(partcoll, groupexpr_coll)" condition return false!!!
This is not what we expect.
We could rename "groupexpr_coll" to groupcoll, it looks more elegant.
------------------ 原始邮件 ------------------
发件人: "Tender Wang" <tndrwang@gmail.com>;
发送时间: 2024年8月7日(星期三) 下午2:57
收件人: "狂奔的蜗牛"<1105066510@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@lists.postgresql.org>;"aleksander"<aleksander@timescale.com>;
主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
狂奔的蜗牛 <1105066510@qq.com> 于2024年8月7日周三 13:35写道:
this case will enter `if (IsA(groupexpr, RelabelType))` branch.We set "C" as groupkey's collation, and it's not equal to column's.In transformGroupClause(), syntax `COLLATE xxx` will create CollateExpr node, and CollateExpr's arg is Var node.And then, planner will call eval_const_expressions_mutator() to transform CollateExpr to RelableType if CollateExpr->collOid not equal to Collate->arg's collation.
Yech. Thanks for the explanation.
Because exprCollation() return the resultcollid if the nodetag is Relabletype, the expr() && PartKeyCollMatchesExprColl() can handle this situation.
So we don't need "if (IsA(groupexpr, RelableTyple) " this branch.
About V3 patch, PartCollMatchesExprColl() may be not suitable, because collation of partkey must be equal to groupkey's, even though they are all InvalidOid.
I'm not sure about what you said. I uesd it because partition prune do this way. I keep it temporarily.
The test case in v3 patch only has EXPLAIN statement. I add SELECT statement and make the result more stable according to Aleksander advices.
Tender Wang
狂奔的蜗牛 <1105066510@qq.com> 于2024年8月7日周三 16:06写道:
PartCollMatchesExprColl() is not strict enough.If partcoll == InvalidOid and groupcoll != InvalidOid, PartCollMatchesExprColl() return true always.Just determine whether groupcoll is equal to partcoll, like this "partcoll == groupcoll".
I understand what you said. Actually, I keep it just curious when partcoll is InvalidOid. Why it works for partition prune.
Is it not same between check partkey is equal to groupexpr and check partkey is equal to qualclause?
We cannot delete "if (IsA(groupexpr, RelableTyple) " branch,becasuse if groupexpr is RelabelType and partcoll equal to groupcoll, the "equal(groupexpr, partexpr) && PartKeyCollMatchesExprColl(partcoll, groupexpr_coll)" condition return false!!!This is not what we expect.
" if groupexpr is RelabelType and partcoll equal to groupcoll ", according to original logic, will return false in this situation.
Now you think we can support above situation. Am I understand correctly?
We're better to add more test case to cover the code if we're going to support this. The test cases now seem not going
into the RelableTyple branch.
We could rename "groupexpr_coll" to groupcoll, it looks more elegant.
Tender Wang
Hi, On Tue, Aug 6, 2024 at 10:24 PM Tender Wang <tndrwang@gmail.com> wrote: > PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道: >> >> The following bug has been logged on the website: >> >> Bug reference: 18568 >> Logged by: Webbo Han >> Email address: 1105066510@qq.com >> PostgreSQL version: 16.3 >> Operating system: centos 7.6 >> Description: >> >> First, we create one case-insensitive collation use ICU: >> ```sql >> CREATE COLLATION case_insensitive ( >> provider = icu, >> locale = 'und-u-ks-level2', >> deterministic = false >> ); >> ``` >> >> Then, we create the partition table, meanwhile we set the collation of >> column c to `case_insensitive`, >> and set partkey's collation to 'C'. >> ```sql >> SET enable_partitionwise_aggregate TO true; >> SET enable_partitionwise_join TO true; >> SET max_parallel_workers_per_gather TO 0; >> SET enable_incremental_sort TO off; >> CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c >> collate "C"); > > > I think above create table again. Should we allow this CREATE TABLE? The partition key > definition are not same with column definiton. Is it better to report error for users? Not really. As the documentation says, collation can be specified per column or per operation: https://www.postgresql.org/docs/current/collation.html In this case, the operation is partitioning. When you specify the COLLATE clause for a partition key, it means that the partitioning logic, such as partition tuple routing, will use that collation instead of the column-specified or the column type's collation. That's similar to how you can create an index on a column using a different collation than the column's own: create table foo (a text collate case_insensitive); create index on foo (a collate "C"); I do notice that the CREATE TABLE documentation does not describe what the COLLATE clause does when mentioned in a PARTITION BY clause. We'll need to fix that. -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道:
Hi,
On Tue, Aug 6, 2024 at 10:24 PM Tender Wang <tndrwang@gmail.com> wrote:
> PG Bug reporting form <noreply@postgresql.org> 于2024年8月6日周二 15:01写道:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference: 18568
>> Logged by: Webbo Han
>> Email address: 1105066510@qq.com
>> PostgreSQL version: 16.3
>> Operating system: centos 7.6
>> Description:
>>
>> First, we create one case-insensitive collation use ICU:
>> ```sql
>> CREATE COLLATION case_insensitive (
>> provider = icu,
>> locale = 'und-u-ks-level2',
>> deterministic = false
>> );
>> ```
>>
>> Then, we create the partition table, meanwhile we set the collation of
>> column c to `case_insensitive`,
>> and set partkey's collation to 'C'.
>> ```sql
>> SET enable_partitionwise_aggregate TO true;
>> SET enable_partitionwise_join TO true;
>> SET max_parallel_workers_per_gather TO 0;
>> SET enable_incremental_sort TO off;
>> CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
>> collate "C");
>
>
> I think above create table again. Should we allow this CREATE TABLE? The partition key
> definition are not same with column definiton. Is it better to report error for users?
Not really. As the documentation says, collation can be specified per
column or per operation:
https://www.postgresql.org/docs/current/collation.html
In this case, the operation is partitioning. When you specify the
COLLATE clause for a partition key, it means that the partitioning
logic, such as partition tuple routing, will use that collation
instead of the column-specified or the column type's collation.
That's similar to how you can create an index on a column using a
different collation than the column's own:
create table foo (a text collate case_insensitive);
create index on foo (a collate "C");
Thanks for your explanation.
I do notice that the CREATE TABLE documentation does not describe what
the COLLATE clause does when mentioned in a PARTITION BY clause.
We'll need to fix that.
Agree.
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道:
Not really. As the documentation says, collation can be specified per
column or per operation:
https://www.postgresql.org/docs/current/collation.html
In this case, the operation is partitioning. When you specify the
COLLATE clause for a partition key, it means that the partitioning
logic, such as partition tuple routing, will use that collation
instead of the column-specified or the column type's collation.
Since you said partition key had its own collation, and but we used column type's collation in
set_baserel_partition_key_exprs() as below:
partexpr = (Expr *) makeVar(varno, attno,
partkey->parttypid[cnt],
partkey->parttypmod[cnt],
partkey->parttypcoll[cnt], 0);
partkey->parttypid[cnt],
partkey->parttypmod[cnt],
partkey->parttypcoll[cnt], 0);
I think why not we directly use the partition key collation(e.g. partcollation).
Thanks,
Tender Wang
On Tue, Oct 22, 2024 at 5:30 PM Tender Wang <tndrwang@gmail.com> wrote: > Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道: >> Not really. As the documentation says, collation can be specified per >> column or per operation: >> >> https://www.postgresql.org/docs/current/collation.html >> >> In this case, the operation is partitioning. When you specify the >> COLLATE clause for a partition key, it means that the partitioning >> logic, such as partition tuple routing, will use that collation >> instead of the column-specified or the column type's collation. > > > Since you said partition key had its own collation, and but we used column type's collation in > set_baserel_partition_key_exprs() as below: > > partexpr = (Expr *) makeVar(varno, attno, > partkey->parttypid[cnt], > partkey->parttypmod[cnt], > partkey->parttypcoll[cnt], 0); > > I think why not we directly use the partition key collation(e.g. partcollation). That's a good question but I don't immediately know the answer. It seems like it has been like this since the beginning or since the commit that added the RelOptInfo.partexprs field (9140cf8269). -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 17:25写道:
On Tue, Oct 22, 2024 at 5:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道:
>> Not really. As the documentation says, collation can be specified per
>> column or per operation:
>>
>> https://www.postgresql.org/docs/current/collation.html
>>
>> In this case, the operation is partitioning. When you specify the
>> COLLATE clause for a partition key, it means that the partitioning
>> logic, such as partition tuple routing, will use that collation
>> instead of the column-specified or the column type's collation.
>
>
> Since you said partition key had its own collation, and but we used column type's collation in
> set_baserel_partition_key_exprs() as below:
>
> partexpr = (Expr *) makeVar(varno, attno,
> partkey->parttypid[cnt],
> partkey->parttypmod[cnt],
> partkey->parttypcoll[cnt], 0);
>
> I think why not we directly use the partition key collation(e.g. partcollation).
That's a good question but I don't immediately know the answer.
It seems like it has been like this since the beginning or since the
commit that added the RelOptInfo.partexprs field (9140cf8269).
When I looked at the commit(9140cf8269), I found that in 9140cf8269, the PartitionSchemeData struct had
a filed: Oid *parttypcoll; whose value was equal to column type's collation. But now HEAD this field has changed to
Oid *partcollation; whose value is equal to partition key collation. This commit 2af28e6 made above change.
commit 2af28e603319224e87fd35ab62f36ef6de45eaac
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Feb 28 12:16:09 2018 -0500
For partitionwise join, match on partcollation, not parttypcoll.
The previous code considered two tables to have the partition scheme
if the underlying columns had the same collation, but what we
actually need to compare is not the collations associated with the
column but the collation used for partitioning. Fix that.
Robert Haas and Amit Langote
Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Feb 28 12:16:09 2018 -0500
For partitionwise join, match on partcollation, not parttypcoll.
The previous code considered two tables to have the partition scheme
if the underlying columns had the same collation, but what we
actually need to compare is not the collations associated with the
column but the collation used for partitioning. Fix that.
Robert Haas and Amit Langote
Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp
Now I suspect the commit 2af28e6 forgot to fix RelOptInfo's partexprs collation.
If we change the partexprs collation to Partition Key collation, as I said in [1], this bug will not happen.
But I'm not sure this fix will affect other codes that use RelOptInfo's partexprs.
Any thoughts?
Thanks,
Tender Wang
Hi, On Tue, Oct 22, 2024 at 8:14 PM Tender Wang <tndrwang@gmail.com> wrote: > Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 17:25写道: >> On Tue, Oct 22, 2024 at 5:30 PM Tender Wang <tndrwang@gmail.com> wrote: >> > Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道: >> >> Not really. As the documentation says, collation can be specified per >> >> column or per operation: >> >> >> >> https://www.postgresql.org/docs/current/collation.html >> >> >> >> In this case, the operation is partitioning. When you specify the >> >> COLLATE clause for a partition key, it means that the partitioning >> >> logic, such as partition tuple routing, will use that collation >> >> instead of the column-specified or the column type's collation. >> > >> > >> > Since you said partition key had its own collation, and but we used column type's collation in >> > set_baserel_partition_key_exprs() as below: >> > >> > partexpr = (Expr *) makeVar(varno, attno, >> > partkey->parttypid[cnt], >> > partkey->parttypmod[cnt], >> > partkey->parttypcoll[cnt], 0); >> > >> > I think why not we directly use the partition key collation(e.g. partcollation). >> >> That's a good question but I don't immediately know the answer. >> >> It seems like it has been like this since the beginning or since the >> commit that added the RelOptInfo.partexprs field (9140cf8269). > > When I looked at the commit(9140cf8269), I found that in 9140cf8269, the PartitionSchemeData struct had > a filed: Oid *parttypcoll; whose value was equal to column type's collation. But now HEAD this field has changed to > Oid *partcollation; whose value is equal to partition key collation. This commit 2af28e6 made above change. > > commit 2af28e603319224e87fd35ab62f36ef6de45eaac > Author: Robert Haas <rhaas@postgresql.org> > Date: Wed Feb 28 12:16:09 2018 -0500 > > For partitionwise join, match on partcollation, not parttypcoll. > > The previous code considered two tables to have the partition scheme > if the underlying columns had the same collation, but what we > actually need to compare is not the collations associated with the > column but the collation used for partitioning. Fix that. > > Robert Haas and Amit Langote > > Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp Good find. > Now I suspect the commit 2af28e6 forgot to fix RelOptInfo's partexprs collation. > If we change the partexprs collation to Partition Key collation, as I said in [1], this bug will not happen. > But I'm not sure this fix will affect other codes that use RelOptInfo's partexprs. Yeah, I suspect we could maybe just put the collation from partcollation into varcollid of the partition key Var, but what about partition keys that are not simple column references? Would they carry the correct collation such that exprCollation() returns the right collation OID? Also, we'll need to be sure that the semantic of anything that's using partexprs is not broken because of this. -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 20:11写道:
Hi,
On Tue, Oct 22, 2024 at 8:14 PM Tender Wang <tndrwang@gmail.com> wrote:
> Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 17:25写道:
>> On Tue, Oct 22, 2024 at 5:30 PM Tender Wang <tndrwang@gmail.com> wrote:
>> > Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道:
>> >> Not really. As the documentation says, collation can be specified per
>> >> column or per operation:
>> >>
>> >> https://www.postgresql.org/docs/current/collation.html
>> >>
>> >> In this case, the operation is partitioning. When you specify the
>> >> COLLATE clause for a partition key, it means that the partitioning
>> >> logic, such as partition tuple routing, will use that collation
>> >> instead of the column-specified or the column type's collation.
>> >
>> >
>> > Since you said partition key had its own collation, and but we used column type's collation in
>> > set_baserel_partition_key_exprs() as below:
>> >
>> > partexpr = (Expr *) makeVar(varno, attno,
>> > partkey->parttypid[cnt],
>> > partkey->parttypmod[cnt],
>> > partkey->parttypcoll[cnt], 0);
>> >
>> > I think why not we directly use the partition key collation(e.g. partcollation).
>>
>> That's a good question but I don't immediately know the answer.
>>
>> It seems like it has been like this since the beginning or since the
>> commit that added the RelOptInfo.partexprs field (9140cf8269).
>
> When I looked at the commit(9140cf8269), I found that in 9140cf8269, the PartitionSchemeData struct had
> a filed: Oid *parttypcoll; whose value was equal to column type's collation. But now HEAD this field has changed to
> Oid *partcollation; whose value is equal to partition key collation. This commit 2af28e6 made above change.
>
> commit 2af28e603319224e87fd35ab62f36ef6de45eaac
> Author: Robert Haas <rhaas@postgresql.org>
> Date: Wed Feb 28 12:16:09 2018 -0500
>
> For partitionwise join, match on partcollation, not parttypcoll.
>
> The previous code considered two tables to have the partition scheme
> if the underlying columns had the same collation, but what we
> actually need to compare is not the collations associated with the
> column but the collation used for partitioning. Fix that.
>
> Robert Haas and Amit Langote
>
> Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp
Good find.
> Now I suspect the commit 2af28e6 forgot to fix RelOptInfo's partexprs collation.
> If we change the partexprs collation to Partition Key collation, as I said in [1], this bug will not happen.
> But I'm not sure this fix will affect other codes that use RelOptInfo's partexprs.
Yeah, I suspect we could maybe just put the collation from
partcollation into varcollid of the partition key Var, but what about
partition keys that are not simple column references? Would they
carry the correct collation such that exprCollation() returns the
right collation OID?
The value of RelOptInfo's partexprs actually are assigned from RelationData's rd_partkey.
And the rd_partkey is filled from pg_partitioned_table. The partcollation also gets from pg_partitioned_table,
even though the partition key is not simple column reference, the collation is correct.
Also, we'll need to be sure that the semantic of anything that's using
partexprs is not broken because of this.
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 17:25写道:
On Tue, Oct 22, 2024 at 5:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道:
>> Not really. As the documentation says, collation can be specified per
>> column or per operation:
>>
>> https://www.postgresql.org/docs/current/collation.html
>>
>> In this case, the operation is partitioning. When you specify the
>> COLLATE clause for a partition key, it means that the partitioning
>> logic, such as partition tuple routing, will use that collation
>> instead of the column-specified or the column type's collation.
>
>
> Since you said partition key had its own collation, and but we used column type's collation in
> set_baserel_partition_key_exprs() as below:
>
> partexpr = (Expr *) makeVar(varno, attno,
> partkey->parttypid[cnt],
> partkey->parttypmod[cnt],
> partkey->parttypcoll[cnt], 0);
>
> I think why not we directly use the partition key collation(e.g. partcollation).
That's a good question but I don't immediately know the answer.
It seems like it has been like this since the beginning or since the
commit that added the RelOptInfo.partexprs field (9140cf8269).
Hi Amit,
I have another case that can confirm what I suspect. I now think in set_baserel_partition_key_exprs(),
we should use partkey->partcollation[cnt] instead of partkey->parttypcoll[cnt].
I find a wrong result when enable partitionwise_join, as below:
postgres=# set enable_partitionwise_join = on;
SET
postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=17010.00..17010.01 rows=1 width=8)
-> Append (cost=16.50..14760.00 rows=900000 width=0)
-> Hash Join (cost=16.50..2052.00 rows=180000 width=0)
Hash Cond: (t1_1.c = t2_1.c)
-> Seq Scan on pagg_tab_p2 t1_1 (cost=0.00..9.00 rows=600 width=2)
-> Hash (cost=9.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p2 t2_1 (cost=0.00..9.00 rows=600 width=2)
-> Hash Join (cost=33.00..4104.00 rows=360000 width=0)
Hash Cond: (t1_2.c = t2_2.c)
-> Seq Scan on pagg_tab_p3 t1_2 (cost=0.00..18.00 rows=1200 width=2)
-> Hash (cost=18.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p3 t2_2 (cost=0.00..18.00 rows=1200 width=2)
-> Hash Join (cost=33.00..4104.00 rows=360000 width=0)
Hash Cond: (t1_3.c = t2_3.c)
-> Seq Scan on pagg_tab_p1 t1_3 (cost=0.00..18.00 rows=1200 width=2)
-> Hash (cost=18.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t2_3 (cost=0.00..18.00 rows=1200 width=2)
(17 rows)
postgres=# select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
count
--------
900000
(1 row)
SET
postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=17010.00..17010.01 rows=1 width=8)
-> Append (cost=16.50..14760.00 rows=900000 width=0)
-> Hash Join (cost=16.50..2052.00 rows=180000 width=0)
Hash Cond: (t1_1.c = t2_1.c)
-> Seq Scan on pagg_tab_p2 t1_1 (cost=0.00..9.00 rows=600 width=2)
-> Hash (cost=9.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p2 t2_1 (cost=0.00..9.00 rows=600 width=2)
-> Hash Join (cost=33.00..4104.00 rows=360000 width=0)
Hash Cond: (t1_2.c = t2_2.c)
-> Seq Scan on pagg_tab_p3 t1_2 (cost=0.00..18.00 rows=1200 width=2)
-> Hash (cost=18.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p3 t2_2 (cost=0.00..18.00 rows=1200 width=2)
-> Hash Join (cost=33.00..4104.00 rows=360000 width=0)
Hash Cond: (t1_3.c = t2_3.c)
-> Seq Scan on pagg_tab_p1 t1_3 (cost=0.00..18.00 rows=1200 width=2)
-> Hash (cost=18.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t2_3 (cost=0.00..18.00 rows=1200 width=2)
(17 rows)
postgres=# select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
count
--------
900000
(1 row)
postgres=# set enable_partitionwise_join = off;
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=1269.02..1269.03 rows=1 width=8)
-> Merge Join (cost=466.52..1156.52 rows=45000 width=0)
Merge Cond: (t1.c = t2.c)
-> Sort (cost=233.26..240.76 rows=3000 width=2)
Sort Key: t1.c COLLATE case_insensitive
-> Append (cost=0.00..60.00 rows=3000 width=2)
-> Seq Scan on pagg_tab_p2 t1_1 (cost=0.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p3 t1_2 (cost=0.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t1_3 (cost=0.00..18.00 rows=1200 width=2)
-> Sort (cost=233.26..240.76 rows=3000 width=2)
Sort Key: t2.c COLLATE case_insensitive
-> Append (cost=0.00..60.00 rows=3000 width=2)
-> Seq Scan on pagg_tab_p2 t2_1 (cost=0.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p3 t2_2 (cost=0.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t2_3 (cost=0.00..18.00 rows=1200 width=2)
(15 rows)
postgres=# select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
count
---------
1800000
(1 row)
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=1269.02..1269.03 rows=1 width=8)
-> Merge Join (cost=466.52..1156.52 rows=45000 width=0)
Merge Cond: (t1.c = t2.c)
-> Sort (cost=233.26..240.76 rows=3000 width=2)
Sort Key: t1.c COLLATE case_insensitive
-> Append (cost=0.00..60.00 rows=3000 width=2)
-> Seq Scan on pagg_tab_p2 t1_1 (cost=0.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p3 t1_2 (cost=0.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t1_3 (cost=0.00..18.00 rows=1200 width=2)
-> Sort (cost=233.26..240.76 rows=3000 width=2)
Sort Key: t2.c COLLATE case_insensitive
-> Append (cost=0.00..60.00 rows=3000 width=2)
-> Seq Scan on pagg_tab_p2 t2_1 (cost=0.00..9.00 rows=600 width=2)
-> Seq Scan on pagg_tab_p3 t2_2 (cost=0.00..18.00 rows=1200 width=2)
-> Seq Scan on pagg_tab_p1 t2_3 (cost=0.00..18.00 rows=1200 width=2)
(15 rows)
postgres=# select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
count
---------
1800000
(1 row)
I gave a try that use partkey->partcollation[cnt] then the query result is correct when enable_partitionwise_join is on.
And I found some codes only use equal not check the collation, for example, in match_expr_to_partition_keys():
if (equal(lfirst(lc), expr))
lfirst(lc) is partkeys.
I thinks other codes assume that the collation of partexprs in RelOptInfo is same with pg_partitioned_table. But it is not.
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2024年10月23日周三 14:25写道:
And I found some codes only use equal not check the collation, for example, in match_expr_to_partition_keys():if (equal(lfirst(lc), expr))lfirst(lc) is partkeys.I thinks other codes assume that the collation of partexprs in RelOptInfo is same with pg_partitioned_table. But it is not.
match_clause_to_partition_key():
/*
* Partition key match also requires collation match. There may be
* multiple partkeys with the same expression but different
* collations, so failure is NOMATCH.
*/
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
* Partition key match also requires collation match. There may be
* multiple partkeys with the same expression but different
* collations, so failure is NOMATCH.
*/
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
Others only do equal(), as far as I know.
I tried the patch I provided in [1], and the regression test cases all passed.
--
Thanks,
Tender Wang
On Wed, Oct 23, 2024 at 6:43 PM Tender Wang <tndrwang@gmail.com> wrote: > > I tried the patch I provided in [1], and the regression test cases all passed. > //////////////////////// ComputePartitionAttrs code snippet ELSE { /* Expression */ Node *expr = pelem->expr; char partattname[16]; Assert(expr != NULL); atttype = exprType(expr); attcollation = exprCollation(expr); } /* * Apply collation override if any */ if (pelem->collation) attcollation = get_collation_oid(pelem->collation, false); partcollation[attn] = attcollation; //////////////////////// create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C"); PartitionElem->expr only cover "substr(a,1)". PartitionElem->collation is for explicitly COLLATION clauses. you can also see https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L4556 From the above "collation override" comments, we can say exprCollation(PartitionElem->expr) does not always equal PartitionElem->collation PartitionElem->collation is the true collation OID. so you change in but didn't cover the ELSE branch. else { if (lc == NULL) elog(ERROR, "wrong number of partition key expressions"); /* Re-stamp the expression with given varno. */ partexpr = (Expr *) copyObject(lfirst(lc)); ChangeVarNodes((Node *) partexpr, 1, varno, 0); lc = lnext(partkey->partexprs, lc); } as you mentioned partkey->partcollation is correct collation for PartitionKey. but the ELSE branch, we cannot do else { if (lc == NULL) elog(ERROR, "wrong number of partition key expressions"); /* Re-stamp the expression with given varno. */ partexpr = (Expr *) copyObject(lfirst(lc)); ChangeVarNodes((Node *) partexpr, 1, varno, 0); exprSetCollation(Node *partexpr, Oid collation) lc = lnext(partkey->partexprs, lc); } because in struct inPartitionElem, collation and expr is seperated. that means after set_baserel_partition_key_exprs we still cannot be sure that RelOptInfo->partexprs have the correct PartitionKey collation information. I doubt [1] your change will solve all the problems. [1] https://postgr.es/m/CAHewXNnKLrZYG4iqaYw=uB3XWRrYRZHo7VtcMsbUEbdbajQg2Q@mail.gmail.com
jian he <jian.universality@gmail.com> 于2024年10月23日周三 22:18写道:
On Wed, Oct 23, 2024 at 6:43 PM Tender Wang <tndrwang@gmail.com> wrote:
>
> I tried the patch I provided in [1], and the regression test cases all passed.
>
////////////////////////
ComputePartitionAttrs code snippet
ELSE
{
/* Expression */
Node *expr = pelem->expr;
char partattname[16];
Assert(expr != NULL);
atttype = exprType(expr);
attcollation = exprCollation(expr);
}
/*
* Apply collation override if any
*/
if (pelem->collation)
attcollation = get_collation_oid(pelem->collation, false);
partcollation[attn] = attcollation;
////////////////////////
create table coll_pruning_multi (a text) partition by range (substr(a,
1) collate "POSIX", substr(a, 1) collate "C");
PartitionElem->expr only cover "substr(a,1)".
PartitionElem->collation is for explicitly COLLATION clauses.
you can also see
https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L4556
From the above "collation override" comments, we can say
exprCollation(PartitionElem->expr)
does not always equal PartitionElem->collation
PartitionElem->collation is the true collation OID.
so you change in but didn't cover the ELSE branch.
else
{
if (lc == NULL)
elog(ERROR, "wrong number of partition key expressions");
/* Re-stamp the expression with given varno. */
partexpr = (Expr *) copyObject(lfirst(lc));
ChangeVarNodes((Node *) partexpr, 1, varno, 0);
lc = lnext(partkey->partexprs, lc);
}
as you mentioned partkey->partcollation is correct collation for PartitionKey.
but the ELSE branch, we cannot do
else
{
if (lc == NULL)
elog(ERROR, "wrong number of partition key expressions");
/* Re-stamp the expression with given varno. */
partexpr = (Expr *) copyObject(lfirst(lc));
ChangeVarNodes((Node *) partexpr, 1, varno, 0);
exprSetCollation(Node *partexpr, Oid collation)
lc = lnext(partkey->partexprs, lc);
}
because in struct inPartitionElem, collation and expr is seperated.
that means after set_baserel_partition_key_exprs
we still cannot be sure that RelOptInfo->partexprs have the correct
PartitionKey collation information.
Yeah, you're right. I confirm this again. In set_baserel_partition_key_exprs(),
we copy partkey->partexprs not including partcollation, if it is not simple column reference.
So I think how we can fix this thread issue and the [1] I reported by me using a uniform solution.
By the way, I re-started a new thread [2] to track the issue I found in [1]. I will reply to an email reflecting what you said here and cc you.
Thanks,
Tender Wang
On Thu, Oct 24, 2024 at 11:04 AM Tender Wang <tndrwang@gmail.com> wrote: > jian he <jian.universality@gmail.com> 于2024年10月23日周三 22:18写道: >> >> On Wed, Oct 23, 2024 at 6:43 PM Tender Wang <tndrwang@gmail.com> wrote: >> > >> > I tried the patch I provided in [1], and the regression test cases all passed. >> > >> >> //////////////////////// >> ComputePartitionAttrs code snippet >> ELSE >> { >> /* Expression */ >> Node *expr = pelem->expr; >> char partattname[16]; >> Assert(expr != NULL); >> atttype = exprType(expr); >> attcollation = exprCollation(expr); >> } >> /* >> * Apply collation override if any >> */ >> if (pelem->collation) >> attcollation = get_collation_oid(pelem->collation, false); >> partcollation[attn] = attcollation; >> //////////////////////// >> >> create table coll_pruning_multi (a text) partition by range (substr(a, >> 1) collate "POSIX", substr(a, 1) collate "C"); >> PartitionElem->expr only cover "substr(a,1)". >> PartitionElem->collation is for explicitly COLLATION clauses. >> you can also see >> https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L4556 >> >> From the above "collation override" comments, we can say >> exprCollation(PartitionElem->expr) >> does not always equal PartitionElem->collation >> PartitionElem->collation is the true collation OID. >> >> so you change in but didn't cover the ELSE branch. >> >> else >> { >> if (lc == NULL) >> elog(ERROR, "wrong number of partition key expressions"); >> /* Re-stamp the expression with given varno. */ >> partexpr = (Expr *) copyObject(lfirst(lc)); >> ChangeVarNodes((Node *) partexpr, 1, varno, 0); >> lc = lnext(partkey->partexprs, lc); >> } >> >> as you mentioned partkey->partcollation is correct collation for PartitionKey. >> but the ELSE branch, we cannot do >> else >> { >> if (lc == NULL) >> elog(ERROR, "wrong number of partition key expressions"); >> /* Re-stamp the expression with given varno. */ >> partexpr = (Expr *) copyObject(lfirst(lc)); >> ChangeVarNodes((Node *) partexpr, 1, varno, 0); >> exprSetCollation(Node *partexpr, Oid collation) >> lc = lnext(partkey->partexprs, lc); >> } >> >> because in struct inPartitionElem, collation and expr is seperated. >> that means after set_baserel_partition_key_exprs >> we still cannot be sure that RelOptInfo->partexprs have the correct >> PartitionKey collation information. > > > Yeah, you're right. I confirm this again. In set_baserel_partition_key_exprs(), > we copy partkey->partexprs not including partcollation, if it is not simple column reference. > > So I think how we can fix this thread issue and the [1] I reported by me using a uniform solution. > By the way, I re-started a new thread [2] to track the issue I found in [1]. I will reply to an email reflecting what yousaid here and cc you. > > [1] https://www.postgresql.org/message-id/CAHewXNnyWUEmdHrRK3yg4k2TzSbb5WnkKLWxyO%2BOVZPhPFX7ew%40mail.gmail.com > > [2] https://www.postgresql.org/message-id/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr%2BPS2Dwg%40mail.gmail.com FTR, the patch to fix the bug reported here is being discussed at [2]. -- Thanks, Amit Langote
On Tue, Nov 5, 2024 at 4:57 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Thu, Oct 24, 2024 at 11:04 AM Tender Wang <tndrwang@gmail.com> wrote: > > jian he <jian.universality@gmail.com> 于2024年10月23日周三 22:18写道: > >> > >> On Wed, Oct 23, 2024 at 6:43 PM Tender Wang <tndrwang@gmail.com> wrote: > >> > > >> > I tried the patch I provided in [1], and the regression test cases all passed. > >> > > >> > >> //////////////////////// > >> ComputePartitionAttrs code snippet > >> ELSE > >> { > >> /* Expression */ > >> Node *expr = pelem->expr; > >> char partattname[16]; > >> Assert(expr != NULL); > >> atttype = exprType(expr); > >> attcollation = exprCollation(expr); > >> } > >> /* > >> * Apply collation override if any > >> */ > >> if (pelem->collation) > >> attcollation = get_collation_oid(pelem->collation, false); > >> partcollation[attn] = attcollation; > >> //////////////////////// > >> > >> create table coll_pruning_multi (a text) partition by range (substr(a, > >> 1) collate "POSIX", substr(a, 1) collate "C"); > >> PartitionElem->expr only cover "substr(a,1)". > >> PartitionElem->collation is for explicitly COLLATION clauses. > >> you can also see > >> https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L4556 > >> > >> From the above "collation override" comments, we can say > >> exprCollation(PartitionElem->expr) > >> does not always equal PartitionElem->collation > >> PartitionElem->collation is the true collation OID. > >> > >> so you change in but didn't cover the ELSE branch. > >> > >> else > >> { > >> if (lc == NULL) > >> elog(ERROR, "wrong number of partition key expressions"); > >> /* Re-stamp the expression with given varno. */ > >> partexpr = (Expr *) copyObject(lfirst(lc)); > >> ChangeVarNodes((Node *) partexpr, 1, varno, 0); > >> lc = lnext(partkey->partexprs, lc); > >> } > >> > >> as you mentioned partkey->partcollation is correct collation for PartitionKey. > >> but the ELSE branch, we cannot do > >> else > >> { > >> if (lc == NULL) > >> elog(ERROR, "wrong number of partition key expressions"); > >> /* Re-stamp the expression with given varno. */ > >> partexpr = (Expr *) copyObject(lfirst(lc)); > >> ChangeVarNodes((Node *) partexpr, 1, varno, 0); > >> exprSetCollation(Node *partexpr, Oid collation) > >> lc = lnext(partkey->partexprs, lc); > >> } > >> > >> because in struct inPartitionElem, collation and expr is seperated. > >> that means after set_baserel_partition_key_exprs > >> we still cannot be sure that RelOptInfo->partexprs have the correct > >> PartitionKey collation information. > > > > > > Yeah, you're right. I confirm this again. In set_baserel_partition_key_exprs(), > > we copy partkey->partexprs not including partcollation, if it is not simple column reference. > > > > So I think how we can fix this thread issue and the [1] I reported by me using a uniform solution. > > By the way, I re-started a new thread [2] to track the issue I found in [1]. I will reply to an email reflecting whatyou said here and cc you. > > > > [1] https://www.postgresql.org/message-id/CAHewXNnyWUEmdHrRK3yg4k2TzSbb5WnkKLWxyO%2BOVZPhPFX7ew%40mail.gmail.com > > > > [2] https://www.postgresql.org/message-id/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr%2BPS2Dwg%40mail.gmail.com > > FTR, the patch to fix the bug reported here is being discussed at [2]. I've pushed the fix for this issue to all the branches down to 12. Thanks for the report and the analyses. -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2024年11月8日周五 17:28写道:
I've pushed the fix for this issue to all the branches down to 12.
Thanks for the report and the analyses.
Thanks for pushing these collation-related fixed patches.
Thanks,
Tender Wang