Re: ANY_VALUE aggregate - Mailing list pgsql-hackers
From | Isaac Morland |
---|---|
Subject | Re: ANY_VALUE aggregate |
Date | |
Msg-id | CAMsGm5fb8nRjTSJfAQ4LAyxDyY9NLj2=TnzQC2V9jO8zf710Eg@mail.gmail.com Whole thread Raw |
In response to | Re: ANY_VALUE aggregate (Vik Fearing <vik@postgresfriends.org>) |
List | pgsql-hackers |
On Mon, 5 Dec 2022 at 22:52, Vik Fearing <vik@postgresfriends.org> wrote:
On 12/5/22 20:31, Corey Huinker wrote:
>
> Adding to the pile of wanted aggregates: in the past I've lobbied for
> only_value() which is like first_value() but it raises an error on
> encountering a second value.
I have had use for this in the past, but I can't remember why. What is
your use case for it? I will happily write a patch for it, and also
submit it to the SQL Committee for inclusion in the standard. I need to
justify why it's a good idea, though, and we would need to consider what
to do with nulls now that there is <unique null treatment>.
I have this in my local library of "stuff that I really wish came with Postgres", although I call it same_agg and it just goes to NULL if there are more than one distinct value.
I sometimes use it when normalizing non-normalized data, but more commonly I use it when the query planner isn't capable of figuring out that a column I want to use in the output depends only on the grouping columns. For example, something like:
SELECT group_id, group_name, count(*) from group_group as gg natural join group_member as gm group by group_id
I think that exact example actually does or is supposed to work now, since it realizes that I'm grouping on the primary key of group_group so the group_name field in the same table can't differ between rows of a group, but most of the time when I expect that feature to allow me to use a field it actually doesn't.
I have a vague notion that part of the issue may be the distinction between gg.group_id, gm.group_id, and group_id; maybe the above doesn't work but it does work if I group by gg.group_id instead of by group_id. But obviously there should be no difference because in this query those 3 values cannot differ (outer joins are another story).
For reference, here is my definition:
CREATE OR REPLACE FUNCTION same_sfunc (
a anyelement,
b anyelement
) RETURNS anyelement
LANGUAGE SQL IMMUTABLE STRICT
SET search_path FROM CURRENT
AS $$
SELECT CASE WHEN $1 = $2 THEN $1 ELSE NULL END
$$;
COMMENT ON FUNCTION same_sfunc (anyelement, anyelement) IS 'SFUNC for same_agg aggregate; returns common value of parameters, or NULL if they differ';
DROP AGGREGATE IF EXISTS same_agg (anyelement);
CREATE AGGREGATE same_agg (anyelement) (
SFUNC = same_sfunc,
STYPE = anyelement
);
COMMENT ON AGGREGATE same_agg (anyelement) IS 'Return the common non-NULL value of all non-NULL aggregated values, or NULL if some values differ';
a anyelement,
b anyelement
) RETURNS anyelement
LANGUAGE SQL IMMUTABLE STRICT
SET search_path FROM CURRENT
AS $$
SELECT CASE WHEN $1 = $2 THEN $1 ELSE NULL END
$$;
COMMENT ON FUNCTION same_sfunc (anyelement, anyelement) IS 'SFUNC for same_agg aggregate; returns common value of parameters, or NULL if they differ';
DROP AGGREGATE IF EXISTS same_agg (anyelement);
CREATE AGGREGATE same_agg (anyelement) (
SFUNC = same_sfunc,
STYPE = anyelement
);
COMMENT ON AGGREGATE same_agg (anyelement) IS 'Return the common non-NULL value of all non-NULL aggregated values, or NULL if some values differ';
You can tell I've had this for a while - there are several newer Postgres features that could be used to clean this up noticeably.
I also have a repeat_agg which returns the last value (not so interesting) but which is sometimes useful as a window function (more interesting: replace NULLs with the previous non-NULL value in the column).
pgsql-hackers by date: