Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

From Vik Fearing
Subject Re: ANY_VALUE aggregate
Date
Msg-id e66347b4-670f-eed6-8809-efd9184e0dd1@postgresfriends.org
Whole thread Raw
In response to Re: ANY_VALUE aggregate  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: ANY_VALUE aggregate  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On 12/7/22 04:22, David G. Johnston wrote:
> On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org> wrote:
> 
>> On 12/6/22 05:57, David G. Johnston wrote:
>>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org>
>> wrote:
>>>
>>>> I can imagine an optimization that would remove an ORDER BY clause
>>>> because it isn't needed for any other aggregate.
>>>
>>>
>>> I'm referring to the query:
>>>
>>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
>>> // produces 1, per the documented implementation-defined behavior.
>>
>> Implementation-dependent.  It is NOT implementation-defined, per spec.
>
> I really don't care all that much about the spec here given that ORDER BY
> in an aggregate call is non-spec.


Well, this is demonstrably wrong.

<array aggregate function> ::=
   ARRAY_AGG <left paren>
     <value expression>
     [ ORDER BY <sort specification list> ]
     <right paren>


>> We often loosen the spec rules when they don't make technical sense to
>> us, but I don't know of any example of when we have tightened them.
> 
> The function has to choose some row from among its inputs,


True.


> and the system has to obey an order by specification added to the function call.


False.


> You are de-facto creating a first_value aggregate (which is by definition
> non-standard) whether you like it or not.


I am de jure creating an any_value aggregate (which is by definition 
standard) whether you like it or not.


> I'm just saying to be upfront
> and honest about it - our users do want such a capability so maybe accept
> that there is a first time for everything.  Not that picking an
> advantageous "implementation-dependent" implementation should be considered
> deviating from the spec.
> 
> 
>>> Someone writing:
>>>
>>> select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;
>>>
>>> Is not presently, nor am I saying, promised the value 1.
>>>
>>> I'm assuming you are thinking of the second query form, while the
>> guarantee
>>> only needs to apply to the first.
>>
>> I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
>> bestow upon ANY_VALUE the ability to make those two queries equivalent.
>>
> 
> That theoretical idea should not be entertained.  Removing a user's
> explicitly added ORDER BY should be off-limits.  Any approach at
> optimization here should simply look at whether an ORDER BY is specified
> and pass that information to the function.  If the function itself really
> believes that ordering matters it can emit its own runtime exception
> stating that fact and the user can fix their query.


It absolutely should be entertained, and I plan on doing so in an 
upcoming thread.  Whether it errors or ignores is something that should 
be discussed on that thread.


>> If you care about which value you get back, use something else.
>
> There isn't a "something else" to use so that isn't presently an option.


The query

     SELECT proposed_first_value(x ORDER BY y) FROM ...

is equivalent to

     SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...

so I am not very sympathetic to your claim of "no other option".


> I suppose it comes down to what level of belief and care you have that
> people will simply mis-use this function if it is added in its current form
> to get the desired first_value effect that it produces.


People who rely on explicitly undefined behavior get what they deserve 
when the implementation changes.
-- 
Vik Fearing




pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Avoid streaming the transaction which are skipped (in corner cases)
Next
From: vignesh C
Date:
Subject: Re: Support logical replication of DDLs