Re: ANY_VALUE aggregate - Mailing list pgsql-hackers
From | Vik Fearing |
---|---|
Subject | Re: ANY_VALUE aggregate |
Date | |
Msg-id | e66347b4-670f-eed6-8809-efd9184e0dd1@postgresfriends.org Whole thread Raw |
In response to | Re: ANY_VALUE aggregate ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses |
Re: ANY_VALUE aggregate
|
List | pgsql-hackers |
On 12/7/22 04:22, David G. Johnston wrote: > On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik@postgresfriends.org> wrote: > >> On 12/6/22 05:57, David G. Johnston wrote: >>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org> >> wrote: >>> >>>> I can imagine an optimization that would remove an ORDER BY clause >>>> because it isn't needed for any other aggregate. >>> >>> >>> I'm referring to the query: >>> >>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v); >>> // produces 1, per the documented implementation-defined behavior. >> >> Implementation-dependent. It is NOT implementation-defined, per spec. > > I really don't care all that much about the spec here given that ORDER BY > in an aggregate call is non-spec. Well, this is demonstrably wrong. <array aggregate function> ::= ARRAY_AGG <left paren> <value expression> [ ORDER BY <sort specification list> ] <right paren> >> We often loosen the spec rules when they don't make technical sense to >> us, but I don't know of any example of when we have tightened them. > > The function has to choose some row from among its inputs, True. > and the system has to obey an order by specification added to the function call. False. > You are de-facto creating a first_value aggregate (which is by definition > non-standard) whether you like it or not. I am de jure creating an any_value aggregate (which is by definition standard) whether you like it or not. > I'm just saying to be upfront > and honest about it - our users do want such a capability so maybe accept > that there is a first time for everything. Not that picking an > advantageous "implementation-dependent" implementation should be considered > deviating from the spec. > > >>> Someone writing: >>> >>> select any_value(v) from (values (2),(1),(3)) as vals (v) order by v; >>> >>> Is not presently, nor am I saying, promised the value 1. >>> >>> I'm assuming you are thinking of the second query form, while the >> guarantee >>> only needs to apply to the first. >> >> I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could >> bestow upon ANY_VALUE the ability to make those two queries equivalent. >> > > That theoretical idea should not be entertained. Removing a user's > explicitly added ORDER BY should be off-limits. Any approach at > optimization here should simply look at whether an ORDER BY is specified > and pass that information to the function. If the function itself really > believes that ordering matters it can emit its own runtime exception > stating that fact and the user can fix their query. It absolutely should be entertained, and I plan on doing so in an upcoming thread. Whether it errors or ignores is something that should be discussed on that thread. >> If you care about which value you get back, use something else. > > There isn't a "something else" to use so that isn't presently an option. The query SELECT proposed_first_value(x ORDER BY y) FROM ... is equivalent to SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ... so I am not very sympathetic to your claim of "no other option". > I suppose it comes down to what level of belief and care you have that > people will simply mis-use this function if it is added in its current form > to get the desired first_value effect that it produces. People who rely on explicitly undefined behavior get what they deserve when the implementation changes. -- Vik Fearing
pgsql-hackers by date: