Thread: GROUP BY ALL

GROUP BY ALL

From
Andrey Borodin
Date:
Hi hackers!

I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.
I always was writing something like
    select datname, usename, count(*) from pg_stat_activity group by 1,2;
and then rewriting to
    select datname, usename, query, count(*) from pg_stat_activity group by 1,2;
and then "aaahhhh, add a number at the end".

With the proposed feature I can write just
    select datname, usename, count(*) from pg_stat_activity group by all;

PFA very dummy implementation just for a discussion. I think we can
add all non-aggregating targets.

What do you think?


Best regards, Andrey Borodin.

[0] https://www.linkedin.com/posts/mosha_duckdb-firebolt-snowflake-activity-7009615821006131200-VQ0o/

Attachment

Re: GROUP BY ALL

From
Tom Lane
Date:
Andrey Borodin <amborodin86@gmail.com> writes:
> I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.

Isn't that just a nonstandard spelling of SELECT DISTINCT?

What would happen if there are aggregate functions in the tlist?
I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".

            regards, tom lane



Re: GROUP BY ALL

From
Andrey Borodin
Date:
On Sun, Dec 18, 2022 at 8:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I'm not especially on board with "ALL" meaning "ALL (oh, but not
> aggregates)".

Yes, that's the weak part of the proposal. I even thought about
renaming it to "GROUP BY SOMEHOW" or even "GROUP BY SURPRISE ME".
I mean I see some cases when it's useful and much less cases when it's
dangerously ambiguous. E.g. grouping by result of a subquery looks way
too complex and unpredictable. But with simple Vars... what could go
wrong?

Best regards, Andrey Borodin.



Re: GROUP BY ALL

From
"David G. Johnston"
Date:
On Sunday, December 18, 2022, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrey Borodin <amborodin86@gmail.com> writes:
> I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.

Isn't that just a nonstandard spelling of SELECT DISTINCT?

What would happen if there are aggregate functions in the tlist?
I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".


IIUC some systems treat any non-aggregated column as an implicit group by column.  This proposal is an explicit way to enable that implicit behavior in PostgreSQL.  It is, as you note, an odd meaning for the word ALL.

We tend to not accept non-standard usability syntax extensions even if others systems implement them.  I don’t see this one ending up being an exception…

David J.

Re: GROUP BY ALL

From
Isaac Morland
Date:
On Sun, 18 Dec 2022 at 23:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrey Borodin <amborodin86@gmail.com> writes:
> I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.

Isn't that just a nonstandard spelling of SELECT DISTINCT?

In a pure relational system, yes; but since Postgres allows duplicate rows, both in actual table data and in intermediate and final result sets, no. Although I'm pretty sure no aggregates other than count() are useful - any other aggregate would always just combine count() copies of the duplicated value in some way.

What would happen if there are aggregate functions in the tlist?
I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".

The requested behaviour can be accomplished by an invocation something like:

select (t).*, count(*) from (select (…field1, field2, …) as t from …tables…) s group by t;

So we collect all the required fields as a tuple, group by the tuple, and then unpack it into separate columns in the outer query.

Re: GROUP BY ALL

From
Vik Fearing
Date:
On 12/19/22 05:19, Andrey Borodin wrote:
> Hi hackers!
> 
> I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.
> I always was writing something like
>      select datname, usename, count(*) from pg_stat_activity group by 1,2;
> and then rewriting to
>      select datname, usename, query, count(*) from pg_stat_activity group by 1,2;
> and then "aaahhhh, add a number at the end".
> 
> With the proposed feature I can write just
>      select datname, usename, count(*) from pg_stat_activity group by all;


We already have GROUP BY ALL, but it doesn't do this.


> PFA very dummy implementation just for a discussion. I think we can
> add all non-aggregating targets.
> 
> What do you think?


I think this is a pretty terrible idea.  If we want that kind of 
behavior, we should just allow the GROUP BY to be omitted since without 
grouping sets, it is kind of redundant anyway.

I don't know what my opinion is on that.
-- 
Vik Fearing




Re: GROUP BY ALL

From
Bruce Momjian
Date:
On Mon, Dec 19, 2022 at 05:53:46PM +0100, Vik Fearing wrote:
> I think this is a pretty terrible idea.  If we want that kind of behavior,
> we should just allow the GROUP BY to be omitted since without grouping sets,
> it is kind of redundant anyway.
> 
> I don't know what my opinion is on that.

This is a very interesting concept.  Because Postgres requires GROUP BY
of all non-aggregate columns of a target list, Postgres could certainly
automatically generate the GROUP BY.  However, readers of the query
might not easily distinguish function calls from aggregates, so in a way
the GROUP BY is for the reader, not for the database server.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: GROUP BY ALL

From
Andrey Borodin
Date:
On Fri, Jan 6, 2023 at 1:56 PM Bruce Momjian <bruce@momjian.us> wrote:
> Because Postgres requires GROUP BY
> of all non-aggregate columns of a target list, Postgres could certainly
> automatically generate the GROUP BY.  However, readers of the query
> might not easily distinguish function calls from aggregates, so in a way
> the GROUP BY is for the reader, not for the database server.
>

How about "SELECT a,b, count(*) FROM t GROUP AUTOMATICALLY;" ? And
then a shorthand for "SELECT a,b, count(*) FROM t GROUP;".

Anyway, the problem is not in clever syntax, but in the fact that it's
an SQL extension, not a standard...

Best regards, Andrey Borodin.