NULL-handling in aggregate(DISTINCT ...) - Mailing list pgsql-hackers

From Andrew Gierth
Subject NULL-handling in aggregate(DISTINCT ...)
Date
Msg-id 873a4kv8rn.fsf@news-spur.riddles.org.uk
Whole thread Raw
Responses Re: NULL-handling in aggregate(DISTINCT ...)
List pgsql-hackers
Quoth the comments in nodeAgg.c:
* We don't currently implement DISTINCT aggs for aggs having more* than one argument.  This isn't required for anything
inthe SQL* spec, but really it ought to be implemented for* feature-completeness.  FIXME someday.
 

and:
* DISTINCT always suppresses nulls, per SQL spec, regardless of the* transition function's strictness.

(What the SQL spec actually says is that aggregate calls which are
<general set operation> ignore all nulls regardless of whether they
are ALL or DISTINCT. Other kinds of aggregates are not permitted by
the spec to use ALL or DISTINCT.)

Currently we have this behaviour:

postgres=# select array_agg(all a) from (values (1),(null)) v(a);array_agg 
-----------{1,NULL}
(1 row)

postgres=# select array_agg(distinct a) from (values (1),(null)) v(a);array_agg 
-----------{1}
(1 row)

which personally I feel is somewhat wrong, since 1 and NULL are in
fact distinct, but which is due to the logic expressed in the second
comment above. (The spec does not allow array_agg(distinct a) so it
is no help here.)

Now the question: If the limit of one argument for DISTINCT aggs were
removed (which I'm considering doing as part of an update to the
aggregate ORDER BY patch I posted a while back), what should be the
behaviour of agg(distinct x,y) where one or both of x or y is null?
And should it depend on the strictness of the transition function?

-- 
Andrew (irc:RhodiumToad)


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Partitioning option for COPY
Next
From: Tom Lane
Date:
Subject: Re: NULL-handling in aggregate(DISTINCT ...)