BUG #18568: BUG: Result wrong when do group by on partition table! - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18568: BUG: Result wrong when do group by on partition table!
Date
Msg-id 18568-2a9afb6b9f7e6ed3@postgresql.org
Whole thread Raw
Responses Re: BUG #18568: BUG: Result wrong when do group by on partition table!
Re: BUG #18568: BUG: Result wrong when do group by on partition table!
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18568
Logged by:          Webbo Han
Email address:      1105066510@qq.com
PostgreSQL version: 16.3
Operating system:   centos 7.6
Description:

First, we create one case-insensitive collation use ICU:
```sql
    CREATE COLLATION case_insensitive (
        provider = icu,
        locale = 'und-u-ks-level2',
        deterministic = false
    );
```

Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
    SET enable_partitionwise_aggregate TO true;
    SET enable_partitionwise_join TO true;
    SET max_parallel_workers_per_gather TO 0;
    SET enable_incremental_sort TO off;
    CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
    CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
    CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
    CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
    INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
    ANALYZE pagg_tab;
```

We do group by on the table pagg_tab use `case_insensitive` collation, we
hope group key is case-insensitive.
but we find the execution result is not what we expected.
```shell
    postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP
BY c collate case_insensitive;
     c | count
    ---+-------
     A |   300
     e |   300
     E |   300
     D |   300
     C |   300
     B |   300
     d |   300
     c |   300
     b |   300
     a |   300
    (10 rows)
```

The reason is the function group_by_has_partkey() do not check partkey's
collation, that lead to explain error.
```shell
    postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
pagg_tab GROUP BY c collate case_insensitive ;
                                          QUERY PLAN
    --------------------------------------------------------------------------------------
     Append  (cost=12.00..60.15 rows=10 width=10)
       ->  HashAggregate  (cost=12.00..12.02 rows=2 width=10)
             Group Key: pagg_tab.c
             ->  Seq Scan on pagg_tab_p2 pagg_tab  (cost=0.00..9.00 rows=600
width=2)
       ->  HashAggregate  (cost=24.00..24.04 rows=4 width=10)
             Group Key: pagg_tab_1.c
             ->  Seq Scan on pagg_tab_p3 pagg_tab_1  (cost=0.00..18.00
rows=1200 width=2)
       ->  HashAggregate  (cost=24.00..24.04 rows=4 width=10)
             Group Key: pagg_tab_2.c
             ->  Seq Scan on pagg_tab_p1 pagg_tab_2  (cost=0.00..18.00
rows=1200 width=2)
    (10 rows)
```

So, group_by_has_partkey() need to verify if the partkey's collation matches
the groupkey,
meanwhile, if groupkey is RelabelType node and it's collation equal to
partkey's, it should 
also set variable `found` to true.


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18567: ERROR: cache lookup failed for attribute 1 of relation 74580
Next
From: PG Bug reporting form
Date:
Subject: BUG #18569: Memory leak in Postgres Enterprise server