Re: Document aggregate functions better w.r.t. ORDER BY - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Document aggregate functions better w.r.t. ORDER BY
Date
Msg-id ZTnHdk833s1UG-Vi@momjian.us
Whole thread Raw
In response to Re: Document aggregate functions better w.r.t. ORDER BY  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Wed, Oct 25, 2023 at 05:10:17PM -0700, David G. Johnston wrote:
> The paragraph leading into the last added example needs to be tweaked:
> 
> If DISTINCT is specified within an aggregate, the data is sorted in ascending
> order while extracting unique values.  You can add an ORDER BY clause, limited
> to expressions matching the regular arguments of the aggregate, to sort the
> output in descending order.
> 
> (show existing - DISTINCT only - example here)
> 
> <programlisting>
> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
> SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
>  string_agg
> -----------
>   4;3;2;1
> </programlisting>
> 
> (existing note)

I see what you mean.  I added an example that doesn't match the existing
paragraph.  I have rewritten the paragraph and used a relevant example;
patch attached.

> Question: Do you know whether we for certain always sort ascending here to
> compute the unique values or whether if, say, there is an index on the column
> in descending order (or ascending and traversed backwards) that the data within
> the aggregate could, with an order by, be returned in descending order?  If it
> is ascending, is that part of the SQL Standard (since it doesn't even allow an
> order by to give the user the ability the control the output ordering) or does
> the SQL Standard expect that even a random order would be fine since there are
> algorithms that can be used that do not involve sorting the input?

I don't think order is ever guaranteed in the standard without an ORDER
BY.

> It seems redundant to first say "regular arguments" then negate it in order to
> say "DISTINCT list".  Using the positive form with "DISTINCT list" should get
> the point across sufficiently and succinctly.  It also avoids me feeling like
> there should be an example of what happens when you do "sort on an expression
> that is not included in the DISTINCT list".

Agreed, I rewrote that.

> Interestingly:
> 
> WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
> SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;
> 
> ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in
> argument list
> LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...
> 
> But both expressions in the argument list (el and semicolon) do appear in the
> ORDER BY...

I think ORDER BY has to match DISTINCT columns, while you are using ';'.
I used a simpler example with array_agg() in my patch to avoid the issue.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

pgsql-hackers by date:

Previous
From: Daniele Varrazzo
Date:
Subject: Re: libpq async connection and multiple hosts
Next
From: Michael Paquier
Date:
Subject: Re: Introduce a new view for checkpointer related stats