Re: ANY_VALUE aggregate - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: ANY_VALUE aggregate |
Date | |
Msg-id | CAApHDvoCMki1NWLyeijUj_P5uxymWejGf5r2zX7o_JOEku_55Q@mail.gmail.com Whole thread Raw |
In response to | Re: ANY_VALUE aggregate (Vik Fearing <vik@postgresfriends.org>) |
Responses |
Re: ANY_VALUE aggregate
|
List | pgsql-hackers |
On Thu, 19 Jan 2023 at 06:01, Vik Fearing <vik@postgresfriends.org> wrote: > Thank you for the review. Attached is a new version rebased to d540a02a72. I've only a bunch of nit-picks, personal preferences and random thoughts to offer as a review: 1. I'd be inclined *not* to mention the possible future optimisation in: + * Currently this just returns the first value, but in the future it might be + * able to signal to the aggregate that it does not need to be called anymore. I think it's unlikely that the transfn would "signal" such a thing. It seems more likely if we did anything about it that nodeAgg.c would maybe have some additional knowledge not to call that function if the agg state already has a value. Just so we're not preempting how we might do such a thing in the future, it seems best just to remove the mention of it. I don't really think it serves as a good reminder that we might want to do this one day anyway. 2. +any_value_trans(PG_FUNCTION_ARGS) Many of transition function names end in "transfn", not "trans". I think it's better to follow the existing (loosely followed) naming pattern that a few aggregates seem to follow rather than invent a new one. 3. I tend to try to copy the capitalisation of keywords from the surrounding regression tests. I see the following breaks that. +SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v); (obviously, ideally, we'd always just follow the same capitalisation of keywords everywhere in each .sql file, but we've long broken that and the best way can do is be consistent with surrounding tests) 4. I think I'd use the word "Returns" instead of "Chooses" in: + Chooses a non-deterministic value from the non-null input values. 5. I've not managed to find a copy of the 2023 draft, so I'm assuming you've got the ignoring of NULLs correct. I tried to see what other databases do using https://www.db-fiddle.com/ . I was surprised to see MySQL 8.0 returning NULL with: create table a (a int, b int); insert into a values(1,null),(1,2),(1,null); select any_value(b) from a group by a; I'd have expected "2" to be returned. (It gets even weirder without the GROUP BY clause, so I'm not too hopeful any useful information can be obtained from looking here) I know MySQL doesn't follow the spec quite as closely as we do, so I might not be that surprised if they didn't pay attention to the wording when implementing this, however, I've not seen the spec, so I can only speculate what value should be returned. Certainly not doing any aggregation for any_value() when there is no GROUP BY seems strange. I see they don't do the same with sum(). Perhaps this is just a side effect of their loose standards when it came to columns in the SELECT clause that are not in the GROUP BY clause. 6. Is it worth adding a WindowFunc test somewhere in window.sql with an any_value(...) over (...)? Is what any_value() returns as a WindowFunc equally as non-deterministic as when it's used as an Aggref? Can we assume there's no guarantee that it'll return the same value for each partition in each row? Does the spec mention anything about that? 7. I wondered if it's worth adding a SupportRequestOptimizeWindowClause support function for this aggregate. I'm thinking that it might not be as likely people would use something more specific like first_value/nth_value/last_value instead of using any_value as a WindowFunc. Also, I'm currently thinking that a SupportRequestWFuncMonotonic for any_value() is not worth the dozen or so lines of code it would take to write it. I'm assuming it would always be a MONOTONICFUNC_BOTH function. It seems unlikely that someone would have a subquery with a WHERE clause in the upper-level query referencing the any_value() aggregate. Thought I'd mention both of these things anyway as someone else might think of some good reason we should add them that I didn't think of. David
pgsql-hackers by date: