Re: ANY_VALUE aggregate - Mailing list pgsql-hackers

From Isaac Morland
Subject Re: ANY_VALUE aggregate
Date
Msg-id CAMsGm5fb8nRjTSJfAQ4LAyxDyY9NLj2=TnzQC2V9jO8zf710Eg@mail.gmail.com
Whole thread Raw
In response to Re: ANY_VALUE aggregate  (Vik Fearing <vik@postgresfriends.org>)
List pgsql-hackers
On Mon, 5 Dec 2022 at 22:52, Vik Fearing <vik@postgresfriends.org> wrote:
On 12/5/22 20:31, Corey Huinker wrote:
>
> Adding to the pile of wanted aggregates: in the past I've lobbied for
> only_value() which is like first_value() but it raises an error on
> encountering a second value.

I have had use for this in the past, but I can't remember why.  What is
your use case for it?  I will happily write a patch for it, and also
submit it to the SQL Committee for inclusion in the standard.  I need to
justify why it's a good idea, though, and we would need to consider what
to do with nulls now that there is <unique null treatment>.

I have this in my local library of "stuff that I really wish came with Postgres", although I call it same_agg and it just goes to NULL if there are more than one distinct value.

I sometimes use it when normalizing non-normalized data, but more commonly I use it when the query planner isn't capable of figuring out that a column I want to use in the output depends only on the grouping columns. For example, something like:

SELECT group_id, group_name, count(*) from group_group as gg natural join group_member as gm group by group_id

I think that exact example actually does or is supposed to work now, since it realizes that I'm grouping on the primary key of group_group so the group_name field in the same table can't differ between rows of a group, but most of the time when I expect that feature to allow me to use a field it actually doesn't.

I have a vague notion that part of the issue may be the distinction between gg.group_id, gm.group_id, and group_id; maybe the above doesn't work but it does work if I group by gg.group_id instead of by group_id. But obviously there should be no difference because in this query those 3 values cannot differ (outer joins are another story).

For reference, here is my definition:

CREATE OR REPLACE FUNCTION same_sfunc (
    a anyelement,
    b anyelement
) RETURNS anyelement
    LANGUAGE SQL IMMUTABLE STRICT
    SET search_path FROM CURRENT
AS $$
    SELECT CASE WHEN $1 = $2 THEN $1 ELSE NULL END
$$;
COMMENT ON FUNCTION same_sfunc (anyelement, anyelement) IS 'SFUNC for same_agg aggregate; returns common value of parameters, or NULL if they differ';

DROP AGGREGATE IF EXISTS same_agg (anyelement);
CREATE AGGREGATE same_agg (anyelement) (
    SFUNC = same_sfunc,
    STYPE = anyelement
);
COMMENT ON AGGREGATE same_agg (anyelement) IS 'Return the common non-NULL value of all non-NULL aggregated values, or NULL if some values differ';

You can tell I've had this for a while - there are several newer Postgres features that could be used to clean this up noticeably.

I also have a repeat_agg which returns the last value (not so interesting) but which is sometimes useful as a window function (more interesting: replace NULLs with the previous non-NULL value in the column).

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Generate pg_stat_get_* functions with Macros
Next
From: John Naylor
Date:
Subject: Re: move some bitmapset.c macros to bitmapset.h