Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: ANY_VALUE aggregate
Date
Msg-id CAKFQuwYXaQ9nY-drJ0fd-qDiaKqTfVUpEjxj=56FXxOF6vFogg@mail.gmail.com
Whole thread Raw
In response to Re: ANY_VALUE aggregate  (Vik Fearing <vik@postgresfriends.org>)
Responses Re: ANY_VALUE aggregate  (Vik Fearing <vik@postgresfriends.org>)
List pgsql-hackers
On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org> wrote:
On 12/6/22 05:57, David G. Johnston wrote:
> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org> wrote:
>
>> I can imagine an optimization that would remove an ORDER BY clause
>> because it isn't needed for any other aggregate.
>
>
> I'm referring to the query:
>
> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
> // produces 1, per the documented implementation-defined behavior.

Implementation-dependent.  It is NOT implementation-defined, per spec.

I really don't care all that much about the spec here given that ORDER BY in an aggregate call is non-spec.

We often loosen the spec rules when they don't make technical sense to
us, but I don't know of any example of when we have tightened them.

The function has to choose some row from among its inputs, and the system has to obey an order by specification added to the function call.  You are de-facto creating a first_value aggregate (which is by definition non-standard) whether you like it or not.  I'm just saying to be upfront and honest about it - our users do want such a capability so maybe accept that there is a first time for everything.  Not that picking an advantageous "implementation-dependent" implementation should be considered deviating from the spec.


> Someone writing:
>
> select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;
>
> Is not presently, nor am I saying, promised the value 1.
>
> I'm assuming you are thinking of the second query form, while the guarantee
> only needs to apply to the first.

I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
bestow upon ANY_VALUE the ability to make those two queries equivalent.

That theoretical idea should not be entertained.  Removing a user's explicitly added ORDER BY should be off-limits.  Any approach at optimization here should simply look at whether an ORDER BY is specified and pass that information to the function.  If the function itself really believes that ordering matters it can emit its own runtime exception stating that fact and the user can fix their query.


If you care about which value you get back, use something else.


There isn't a "something else" to use so that isn't presently an option.

I suppose it comes down to what level of belief and care you have that people will simply mis-use this function if it is added in its current form to get the desired first_value effect that it produces.

David J.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Commit fest 2022-11
Next
From: Andres Freund
Date:
Subject: Re: [RFC] building postgres with meson - v13