Thread: Creating a non-strict custom aggregate that initializes to the first value

Creating a non-strict custom aggregate that initializes to the first value

From
Timothy Garnett
Date:
Hi all,

I'm trying to create a custom aggregate function that returns the value from the first row (possibly null).

For ex.

Table t
a | b
-----
1 | A
2 | NULL


SELECT my_first(b order by a) => A
SELECT my_first(b order by a DESC) => NULL

The straightforward way would seem to be something like

CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
        SELECT $1;
$$;

but if that is declared strict then it would take the first non-null value and return A in my second example, if declared non-strict then the initial state would be fed as null rather then the first value. Is there a way to declare the function non-strict (so that null values are passed) but still have it initialize to the first value like it would if it was strict?


Thanks for the help,
Tim

Re: Creating a non-strict custom aggregate that initializes to the first value

From
"David G. Johnston"
Date:
On Thu, Mar 26, 2015 at 1:49 PM, Timothy Garnett <tgarnett@panjiva.com> wrote:

but if that is declared strict then it would take the first non-null value and return A in my second example, if declared non-strict then the initial state would be fed as null rather then the first value. Is there a way to declare the function non-strict (so that null values are passed) but still have it initialize to the first value like it would if it was strict?

​Late night pondering here but...

Because of the way SQL null works, and your desired to handle "anyelement", you are stuck determining whether you are currently evaluating the first row of your input - or not.  For the first row you always take the "new" value while for all subsequent rows you take the "state" value.  So, your state needs to encompass both "prior row number" and "active value", which suggests you need to create a custom type for your state variable.

You want NULL to both mean "not initialized" and "unknown value" which is impossible and SQL does not provide any other universal literal that means one or the other.

I'm not sure how you deal with "anyelement" in a custom type that could be used as a state variable...

David J.

Re: Creating a non-strict custom aggregate that initializes to the first value

From
Francisco Olarte
Date:
Hi Timothy:

On Thu, Mar 26, 2015 at 9:49 PM, Timothy Garnett <tgarnett@panjiva.com> wrote:
> I'm trying to create a custom aggregate function that returns the value from
> the first row (possibly null).
...
> The straightforward way would seem to be something like
>
> CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
> RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
>         SELECT $1;
> $$;
> but if that is declared strict then it would take the first non-null value
> and return A in my second example, if declared non-strict then the initial
> state would be fed as null rather then the first value. Is there a way to
> declare the function non-strict (so that null values are passed) but still
> have it initialize to the first value like it would if it was strict?

First, your pasted code is incomplete, you should post all relevant
code ( I suspect a highly relevant create aggregate statement is
missing ).

Next, it's already been pointed you are trying to use null for two
meanings, which is not gonna work.

I think your problem is you are using the one function form of
aggregates. Aggregates, in postgres and lots of other places, are done
with two functions, an state transition function called for each row (
more or less, strictness asside ) and a final function. Think on
average, you need an state consisting of sum, count, and state
transition function adding to sum and incremementing count and a final
function to do the division.

Generally, you start with STATE_TYPE state=initial_state; foreach
value state=transition(state, current_val); AGG_TYPE agg_value =
final(state). STATE_TYPE and AGG_TYPE need not be the same.

 When STATE_TYPE and AGG_TYPE are the same many systems, postgres
included,  have a shortcut for using just an state transition function
, using indentity as the final function, as this is adequate for
things like count, min/max, sum, and this seems to be the variant you
are using.

In your case you should probably use the two form agregate, use a
composite type for the state ( first_time, val ), an state transition
function ( if ft true store val, else skip ) and a final function (
return val from state ) ( there are posibly other methods, like using
an array for state, initializing to empty, storing val if empty or
skipping, just play around ).

Also, you may be having an http://xyproblem.info/ , do you want an
aggregate to return the first value, or a function?. ( I suspect it is
the first case, as you are pasting a create function for first_agg and
a select for my_first, but not having posted a complete example makes
it very hard ).

Regards.

   Francisco Olarte.


"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thu, Mar 26, 2015 at 1:49 PM, Timothy Garnett <tgarnett@panjiva.com>
> wrote:
>> but if that is declared strict then it would take the first non-null value
>> and return A in my second example, if declared non-strict then the initial
>> state would be fed as null rather then the first value. Is there a way to
>> declare the function non-strict (so that null values are passed) but still
>> have it initialize to the first value like it would if it was strict?

> You want NULL to both mean "not initialized" and "unknown value" which is
> impossible and SQL does not provide any other universal literal that means
> one or the other.

Yeah.  You need distinct representations for "nothing seen yet" and "saw a
NULL"; the built-in behavior doesn't suffice for this.

One idea is for the state value to be of anyarray type: initially null,
and a one-element array containing the first input value once you've seen
that.

It strikes me though that this aggregate is ill-defined by nature.  In
particular, if you're going to treat NULL as being a real data value,
then what're you gonna return when there were no input rows?  You won't be
able to distinguish "no input rows" from "first input row had a NULL".
Maybe you should rethink whatever activity you were wanting it for.

            regards, tom lane