Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: ANY_VALUE aggregate
Date
Msg-id CAKFQuwZio6maUZ_SxzudV9vP+DF8+fYWNAn0M-fxh=iKnAnSYA@mail.gmail.com
Whole thread Raw
In response to Re: ANY_VALUE aggregate  (Vik Fearing <vik@postgresfriends.org>)
Responses Re: ANY_VALUE aggregate  (Vik Fearing <vik@postgresfriends.org>)
Re: ANY_VALUE aggregate  (Pantelis Theodosiou <ypercube@gmail.com>)
List pgsql-hackers
On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik@postgresfriends.org> wrote:
On 12/6/22 05:22, David G. Johnston wrote:
> On Mon, Dec 5, 2022 at 8:46 PM Vik Fearing <vik@postgresfriends.org> wrote:
>
>> On 12/5/22 18:56, David G. Johnston wrote:
>>> Also, maybe we should have any_value do something like compute a 50/50
>>> chance that any new value seen replaces the existing chosen value,
>> instead
>>> of simply returning the first value all the time.  Maybe even prohibit
>> the
>>> first value from being chosen so long as a second value appears.
>>
>> The spec says the result is implementation-dependent meaning we don't
>> even need to document how it is obtained, but surely behavior like this
>> would preclude future optimizations like the ones I mentioned?
>>
>
> So, given the fact that we don't actually want to name a function
> first_value (because some users are readily confused as to when the concept
> of first is actually valid or not) but some users do actually wish for this
> functionality - and you are proposing to implement it here anyway - how
> about we actually do document that we promise to return the first non-null
> value encountered by the aggregate.  We can then direct people to this
> function and just let them know to pretend the function is really named
> first_value in the case where they specify an order by. (last_value comes
> for basically free with descending sorting).

I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.

I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.

Someone writing:

select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;

Is not presently, nor am I saying, promised the value 1.

I'm assuming you are thinking of the second query form, while the guarantee only needs to apply to the first.

David J.

pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: ANY_VALUE aggregate
Next
From: Tom Lane
Date:
Subject: Re: move some bitmapset.c macros to bitmapset.h