Re: Order of columns in GROUP BY is significant to the planner. - Mailing list pgsql-bugs

From David Rowley
Subject Re: Order of columns in GROUP BY is significant to the planner.
Date
Msg-id CAKJS1f-ic8+DVSt1GJCG1njkbS5ui4d4-tmmkhSt0Aq2q+B_uQ@mail.gmail.com
Whole thread Raw
In response to Order of columns in GROUP BY is significant to the planner.  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Order of columns in GROUP BY is significant to the planner.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On 21 December 2017 at 19:16, Jeff Janes <jeff.janes@gmail.com> wrote:
> EXPLAIN (BUFFERS, ANALYZE, timing off) SELECT SUM(col5), col2, col1 FROM foo
> WHERE col3 = '4' AND col4 <= '0.9' GROUP BY col2,col1;
>
> EXPLAIN (BUFFERS, ANALYZE, timing off) SELECT SUM(col5), col2, col1 FROM foo
> WHERE col3 = '4' AND col4 <= '0.9' GROUP BY col1,col2;
>
> The first one inserts a sort node on col1,col2 before doing the Group
> Aggregate.  The second one uses the ordering of the tuples derived from the
> index scan to do the Group Aggregate directly.  Isn't it surprising that the
> order of the columns in the GROUP BY has to be same as the order in the
> index definition in order to make maximal use of the index?  Is that a bug?

Not a bug, just the number of combinations to try could end up growing
very large and then you'd likely want or need to re-perform the join
search with each order and keep the cheapest one. Likely it would just
be too slow, especially when there are many tables in the join search
and many columns in the GROUP BY.

There's a comment at the top of preprocess_groupclause() that explains
that we don't do it, it just does not explain why we don't. It really
just mentions that hash agg is probably better in most cases, which
seems like a bit of a cop-out. It likely should just explain that we
err on the side of caution as the planning effort may often outweigh
the benefits we get during execution.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-bugs by date:

Previous
From: Jeff Janes
Date:
Subject: Order of columns in GROUP BY is significant to the planner.
Next
From: PG Bug reporting form
Date:
Subject: BUG #14988: application server couldnot contacted