And finally, I've got the synthetic test:
CREATE TABLE mess_grouping (x integer, y integer, z integer, w integer,
f integer);
INSERT INTO mess_grouping (x,y,z,w,f) (SELECT x%10, x % 2, x%2, 2, x%10
FROM generate_series(1,100) AS x);
ANALYZE mess_grouping;
SET enable_nestloop = 'off';
SET enable_hashjoin = 'off';
SET enable_hashagg = 'off';
SET enable_group_by_reordering = 'on';
SELECT c1.z, c1.w, string_agg(''::text, repeat(''::text, c1.f) ORDER BY
c1.x,c1.y)
FROM mess_grouping c1 JOIN mess_grouping c2 ON (c1.x = c2.f)
GROUP BY c1.w, c1.z;
SET enable_group_by_reordering = 'off';
SELECT c1.z, c1.w, string_agg(''::text, repeat(''::text, c1.f) ORDER BY
c1.x,c1.y)
FROM mess_grouping c1 JOIN mess_grouping c2 ON (c1.x = c2.f)
GROUP BY c1.w, c1.z;
DROP TABLE IF EXISTS mess_grouping CASCADE;
You can see here, that first query execution produces:
z | w | string_agg
---+---+------------
0 | 2 |
1 | 2 |
0 | 2 |
1 | 2 |
0 | 2 |
1 | 2 |
0 | 2 |
1 | 2 |
0 | 2 |
1 | 2 |
(10 rows)
and second execution gives correct result:
z | w | string_agg
---+---+------------
0 | 2 |
1 | 2 |
(2 rows)
The simple fix is in the attachment. But I'm not sure we should fix
GROUP-BY optimization instead of the more general issue.
The source of the problem is root->group_pathkeys, which contains
grouping pathkeys and aggregate pathkeys. For now, their 'sortref'
values could intersect, and we can differ which one references the query
target list and which one the target list of the aggregate.
So, I would like to get advice here: should we make a quick fix here, or
is such a mess in the sortref values not a mess and designed for some
purposes?
--
regards,
Andrei Lepikhov
Postgres Professional