Home > mailing lists

Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

From	David G. Johnston
Subject	Re: ANY_VALUE aggregate
Date	December 8, 2022 08:48:16
Msg-id	CAKFQuwYK_UeLFUrfzJxsdmzMv-0uYQO8v3X96jYp=vkzRzwvmg@mail.gmail.com Whole thread Raw
In response to	Re: ANY_VALUE aggregate (Vik Fearing <vik@postgresfriends.org>)
Responses	Re: ANY_VALUE aggregate
List	pgsql-hackers

Tree view

On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 12/7/22 04:22, David G. Johnston wrote:
> On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org> wrote:
>
>> On 12/6/22 05:57, David G. Johnston wrote:
>>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org>
>> wrote:
>>>
>>>> I can imagine an optimization that would remove an ORDER BY clause
>>>> because it isn't needed for any other aggregate.
>>>
>>>
>>> I'm referring to the query:
>>>
>>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
>>> // produces 1, per the documented implementation-defined behavior.
>>
>> Implementation-dependent. It is NOT implementation-defined, per spec.
>
> I really don't care all that much about the spec here given that ORDER BY
> in an aggregate call is non-spec.

Well, this is demonstrably wrong.

<array aggregate function> ::=
ARRAY_AGG <left paren>
<value expression>
[ ORDER BY <sort specification list> ]
<right paren>

Demoable only by you and a few others...

We should update our documentation - the source of SQL Standard knowledge for mere mortals.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

"Note: The ability to specify both DISTINCT and ORDER BY in an aggregate function is a PostgreSQL extension."

Apparently only DISTINCT remains as our extension.

> You are de-facto creating a first_value aggregate (which is by definition
> non-standard) whether you like it or not.

I am de jure creating an any_value aggregate (which is by definition
standard) whether you like it or not.

Yes, both statements seem true. At least until we decide to start ignoring a user's explicit order by clause.

>> If you care about which value you get back, use something else.
>
> There isn't a "something else" to use so that isn't presently an option.

The query

SELECT proposed_first_value(x ORDER BY y) FROM ...

is equivalent to

SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...

so I am not very sympathetic to your claim of "no other option".

Semantically, yes, in terms of performance, not so much, for any non-trivial sized group.

I'm done, and apologize for getting too emotionally invested in this. I hope to get others to voice enough +1s to get a first_value function into core along-side this one (which makes the above discussion either moot or deferred until there is a concrete use case for ignoring an explicit ORDER BY). If that doesn't happen, well, it isn't going to make or break us either way.

David J.

pgsql-hackers by date:

From: Peter Smith
Date: 08 December 2022, 08:20:04
Subject: Re: PGDOCS - Logical replication GUCs - added some xrefs

From: Jeff Davis
Date: 08 December 2022, 08:56:38
Subject: Re: Collation version tracking for macOS

Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

Previous

Next