Thread: Strict min and max aggregate functions

Strict min and max aggregate functions

From
Jeff Janes
Date:
I need "strict" MIN and MAX aggregate functions, meaning they return NULL upon any NULL input, and behave like the built-in aggregates if none of the input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I can't find other discussion of it.  Perhaps because none of the words here are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about 50 fold slower than the built-ins; for my current purpose this is not a big problem but I would love it to be faster if that could be done easily).  

So here is my approach.  Any suggestions to improve it?  Or are there better canned solutions I've failed to find?


-- If no values have been delivered to the aggregate, the internal state is the
-- NULL array.  If a null values has been delivered, the internal status is an
-- array with one element, which is NULL.  Otherwise, it is an array with one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE
            WHEN $1 IS NULL THEN ARRAY[$2]
            WHEN $1[1] IS NULL THEN $1
            WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve type
            ELSE ARRAY[least($1[1],$2)] END ;
$$;


CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
        sfunc     = strict_min_agg,
        stype     = anyarray,
        finalfunc = strict_min_final
);


Cheers,

Jeff

Re: Strict min and max aggregate functions

From
Kim Rose Carlsen
Date:

> CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )

RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE
            WHEN $1 IS NULL THEN ARRAY[$2]
            WHEN $1[1] IS NULL THEN $1
            WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve  type
            ELSE ARRAY[least($1[1],$2)] END ;
$$;


CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
        sfunc     = strict_min_agg,
        stype     = anyarray,
        finalfunc = strict_min_final
);


It seems like this should be possible to do in something more close to O(log n). But I'm not sure how to fix the semantics with aggregates.

SELECT max(<column>) FROM <table>;
SELECT true FROM <table> WHERE <column> IS NULL LIMIT 1;

Both these queries can be resolved with a index lookup (if one is available).


Re: Strict min and max aggregate functions

From
Pavel Stehule
Date:


2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:
I need "strict" MIN and MAX aggregate functions, meaning they return NULL upon any NULL input, and behave like the built-in aggregates if none of the input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I can't find other discussion of it.  Perhaps because none of the words here are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about 50 fold slower than the built-ins; for my current purpose this is not a big problem but I would love it to be faster if that could be done easily).  

So here is my approach.  Any suggestions to improve it?  Or are there better canned solutions I've failed to find?


-- If no values have been delivered to the aggregate, the internal state is the
-- NULL array.  If a null values has been delivered, the internal status is an
-- array with one element, which is NULL.  Otherwise, it is an array with one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE
            WHEN $1 IS NULL THEN ARRAY[$2]
            WHEN $1[1] IS NULL THEN $1
            WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve type
            ELSE ARRAY[least($1[1],$2)] END ;
$$;


CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
        sfunc     = strict_min_agg,
        stype     = anyarray,
        finalfunc = strict_min_final
);

can you use plpgsql instead sql?

you can use composite type instead array too.

Regards

Pavel


Cheers,

Jeff

Re: Strict min and max aggregate functions

From
Thomas Kellerer
Date:
Jeff Janes schrieb am 19.11.2016 um 22:12:
> I need "strict" MIN and MAX aggregate functions, meaning they return
> NULL upon any NULL input, and behave like the built-in aggregates if
> none of the input values are NULL.
>
> This doesn't seem like an outlandish thing to want, and I'm surprised
> I can't find other discussion of it. Perhaps because none of the
> words here are very effective as search terms as they are so
> individually common.
>
> I've hit upon a solution that works, but it is both ugly and slow
> (about 50 fold slower than the built-ins; for my current purpose this
> is not a big problem but I would love it to be faster if that could
> be done easily).

This is not really pretty as well, but might be faster:

     select a,
            case when group_count = nn_count then min_b end as min_b
     from (
       select a,
              min(b) as min_b,
              count(b) as nn_count,
              count(*) as group_count
       from x
       group by a
     ) t;

As the expensive part is the group by I wouldn't expect the additional aggregates to make a big difference.

Alternatively:

     select a, case when no_nulls then min_b end as min_b
     from (
       select a,
              min(b) as min_b,
              bool_and(b is not null) as no_nulls
       from x
       group by a
     ) t;

Re: Strict min and max aggregate functions

From
Jeff Janes
Date:
On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:
I need "strict" MIN and MAX aggregate functions, meaning they return NULL upon any NULL input, and behave like the built-in aggregates if none of the input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I can't find other discussion of it.  Perhaps because none of the words here are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about 50 fold slower than the built-ins; for my current purpose this is not a big problem but I would love it to be faster if that could be done easily).  

So here is my approach.  Any suggestions to improve it?  Or are there better canned solutions I've failed to find?


-- If no values have been delivered to the aggregate, the internal state is the
-- NULL array.  If a null values has been delivered, the internal status is an
-- array with one element, which is NULL.  Otherwise, it is an array with one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE
            WHEN $1 IS NULL THEN ARRAY[$2]
            WHEN $1[1] IS NULL THEN $1
            WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve type
            ELSE ARRAY[least($1[1],$2)] END ;
$$;


CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
        sfunc     = strict_min_agg,
        stype     = anyarray,
        finalfunc = strict_min_final
);

can you use plpgsql instead sql?

I can.  Would there be an advantage?

you can use composite type instead array too.

I tried a composite type of (flag int, value anyelement) but you can't use anyelement in a composite type.  So the aggregate function couldn't be polymorphic.  Or, that was my conclusion after making a few attempts. Maybe I need to give on polymorphism if I want to get performance?

Cheers,

Jeff

Re: Strict min and max aggregate functions

From
Pavel Stehule
Date:


2016-11-20 20:18 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:
On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:
I need "strict" MIN and MAX aggregate functions, meaning they return NULL upon any NULL input, and behave like the built-in aggregates if none of the input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I can't find other discussion of it.  Perhaps because none of the words here are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about 50 fold slower than the built-ins; for my current purpose this is not a big problem but I would love it to be faster if that could be done easily).  

So here is my approach.  Any suggestions to improve it?  Or are there better canned solutions I've failed to find?


-- If no values have been delivered to the aggregate, the internal state is the
-- NULL array.  If a null values has been delivered, the internal status is an
-- array with one element, which is NULL.  Otherwise, it is an array with one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE
            WHEN $1 IS NULL THEN ARRAY[$2]
            WHEN $1[1] IS NULL THEN $1
            WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve type
            ELSE ARRAY[least($1[1],$2)] END ;
$$;


CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
        sfunc     = strict_min_agg,
        stype     = anyarray,
        finalfunc = strict_min_final
);

can you use plpgsql instead sql?

I can.  Would there be an advantage?

PLpgSQL uses prepared statements - the expressions should be evaluated faster. In this case there are not possible SQL inlining.

you can use composite type instead array too.

I tried a composite type of (flag int, value anyelement) but you can't use anyelement in a composite type.  So the aggregate function couldn't be polymorphic.  Or, that was my conclusion after making a few attempts. Maybe I need to give on polymorphism if I want to get performance?

Cheers,

Jeff

Re: Strict min and max aggregate functions

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>> you can use composite type instead array too.

> I tried a composite type of (flag int, value anyelement) but you can't use
> anyelement in a composite type.  So the aggregate function couldn't be
> polymorphic.  Or, that was my conclusion after making a few attempts. Maybe
> I need to give on polymorphism if I want to get performance?

I would bet on a composite type being as slow or slower than the array
solution.  You could do a quick test with a non-polymorphic definition
just to see, but I doubt it's a promising avenue.

            regards, tom lane