Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

From Vik Fearing
Subject Re: ANY_VALUE aggregate
Date
Msg-id 46173800-4658-afb5-b83f-4365d2fd50e9@postgresfriends.org
Whole thread Raw
In response to Re: ANY_VALUE aggregate  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: ANY_VALUE aggregate  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On 12/6/22 05:22, David G. Johnston wrote:
> On Mon, Dec 5, 2022 at 8:46 PM Vik Fearing <vik@postgresfriends.org> wrote:
> 
>> On 12/5/22 18:56, David G. Johnston wrote:
>>> Also, maybe we should have any_value do something like compute a 50/50
>>> chance that any new value seen replaces the existing chosen value,
>> instead
>>> of simply returning the first value all the time.  Maybe even prohibit
>> the
>>> first value from being chosen so long as a second value appears.
>>
>> The spec says the result is implementation-dependent meaning we don't
>> even need to document how it is obtained, but surely behavior like this
>> would preclude future optimizations like the ones I mentioned?
>>
> 
> So, given the fact that we don't actually want to name a function
> first_value (because some users are readily confused as to when the concept
> of first is actually valid or not) but some users do actually wish for this
> functionality - and you are proposing to implement it here anyway - how
> about we actually do document that we promise to return the first non-null
> value encountered by the aggregate.  We can then direct people to this
> function and just let them know to pretend the function is really named
> first_value in the case where they specify an order by. (last_value comes
> for basically free with descending sorting).

I can imagine an optimization that would remove an ORDER BY clause 
because it isn't needed for any other aggregate.  There is no reason to 
cause an extra sort when the user has requested *any value*.

>> I once wrote a random_agg() for a training course that used reservoir
>> sampling to get an evenly distributed value from the inputs.  Something
>> like that seems to be what you are looking for here.  I don't see the
>> use case for adding it to core, though.
>>
>>
> The use case was basically what Tom was saying - I don't want our users
> that don't understand the necessity of order by, and don't read the
> documentation, to observe that we consistently return the first non-null
> value and assume that this is what the function promises when we are not
> making any such promise to them.  

Documenting something for the benefit of those who do not read the 
documentation is a ridiculous proposal.

> As noted above, my preference at this point would be to just make that promise.

I see no reason to paint ourselves into a corner here.
-- 
Vik Fearing




pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Generate pg_stat_get_* functions with Macros
Next
From: "David G. Johnston"
Date:
Subject: Re: ANY_VALUE aggregate