Thread: Contradictory behavior of array_agg(distinct) aggregate.

Contradictory behavior of array_agg(distinct) aggregate.

From
Konstantin Knizhnik
Date:
Hi hackers!


Is it only me who consider that current behavior of array_agg(distinct) 
contradicts to interpretation of nulls in other cases ("null" is 
something like "unknown" which means that we can not say weather two 
nulls are the same or not). This is why it is allowed to insert multiple 
nulls in the unique column.

postgres=# create table t(x integer unique);
CREATE TABLE
postgres=# insert into t values (null),(null);
INSERT 0 2
postgres=# select count(distinct x) from t;
  count
-------
      0
(1 row)

postgres=# select array_agg(distinct x) from t;
  array_agg
-----------
  {NULL}
(1 row)

postgres=# select array_agg(x) from t;
   array_agg
-------------
  {NULL,NULL}
(1 row)


So what is the number of distinct "x" values in this case? I think that 
according to SQL model - 0 (as count(distinct) returns).
Why in this case array_agg(distinct x) returns non-empty array?

Yes, unlike most other aggregates, `array_agg` is not ignoring null values.
But is it correct to treat two null values as the same (non-distinct)?
IMHO correct result in this case should be either {} or NULL, either 
{NULL,NULL}.






Re: Contradictory behavior of array_agg(distinct) aggregate.

From
Tom Lane
Date:
Konstantin Knizhnik <knizhnik@garret.ru> writes:
> postgres=# create table t(x integer unique);
> CREATE TABLE
> postgres=# insert into t values (null),(null);
> INSERT 0 2
> postgres=# select count(distinct x) from t;
>   count
> -------
>       0
> (1 row)

> postgres=# select array_agg(distinct x) from t;
>   array_agg
> -----------
>   {NULL}
> (1 row)

> postgres=# select array_agg(x) from t;
>    array_agg
> -------------
>   {NULL,NULL}
> (1 row)

I see nothing contradictory here.  "array_agg(distinct x)"
combines the two NULLs into one, which is the normal
behavior of DISTINCT.  "count(distinct x)" does the same
thing --- but count() only counts non-null inputs, so
you end with zero.

            regards, tom lane



Re: Contradictory behavior of array_agg(distinct) aggregate.

From
Konstantin Knizhnik
Date:
On 04/12/2024 9:03 am, Tom Lane wrote:
> Konstantin Knizhnik <knizhnik@garret.ru> writes:
>> postgres=# create table t(x integer unique);
>> CREATE TABLE
>> postgres=# insert into t values (null),(null);
>> INSERT 0 2
>> postgres=# select count(distinct x) from t;
>>    count
>> -------
>>        0
>> (1 row)
>> postgres=# select array_agg(distinct x) from t;
>>    array_agg
>> -----------
>>    {NULL}
>> (1 row)
>> postgres=# select array_agg(x) from t;
>>     array_agg
>> -------------
>>    {NULL,NULL}
>> (1 row)
> I see nothing contradictory here.  "array_agg(distinct x)"
> combines the two NULLs into one, which is the normal
> behavior of DISTINCT.


Sorry.
It is actually inconsistency in basic SQL model in interpretation of 
NULL by UNIQUE and DISTINCT, and array_agg just follows this rule.




Re: Contradictory behavior of array_agg(distinct) aggregate.

From
"David G. Johnston"
Date:
On Tuesday, December 3, 2024, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
Is it only me who consider that current behavior of array_agg(distinct) contradicts to interpretation of nulls in other cases ("null" is something like "unknown" which means that we can not say weather two nulls are the same or not).

Null value handling has a few varied behaviors related to it.  This particular one is best thought of as desirably being consistent with group by.
 

This is why it is allowed to insert multiple nulls in the unique column.

Nowadays the DBA gets to choose which of the two behaviors a unique index applies, which allows indexes to get on the same page as group by et al., thus fixing your inconsistency claim here.


postgres=# create table t(x integer unique);
CREATE TABLE
postgres=# insert into t values (null),(null);
INSERT 0 2
postgres=# select count(distinct x) from t;
 count
-------
     0
(1 row) 
postgres=# select array_agg(distinct x) from t;
 array_agg
-----------
 {NULL}
(1 row)

postgres=# select array_agg(x) from t;
  array_agg
-------------
 {NULL,NULL}
(1 row)


So what is the number of distinct "x" values in this case? I think that according to SQL model - 0 (as count(distinct) returns).

1, but getting this answer computed is a non-trivial expression as the count aggregate can’t do the counting.
 
Why in this case array_agg(distinct x) returns non-empty array?

I can be convinced to see an inconsistency here.  In count though, not array_agg.  The inability for count to see and thus count null values cannot be worked around while you can always apply a filter clause to ignore null values you don’t want.


Yes, unlike most other aggregates, `array_agg` is not ignoring null values.
But is it correct to treat two null values as the same (non-distinct)?

Yes, in most contexts where null values are forced to be compared to each other they do so by defining all null values to be representationally identical.  See group by for the most authoritative reference case.
 
IMHO correct result in this case should be either {} or NULL, either {NULL,NULL}.

You have a typo here somewhere… 

If you want the empty array use a filter clause to make it behave “strictly”.  Producing a null output seems indefensible.

A policy that all nulls values are indistinct from one another, which is the most prevalent one in SQL, makes the most sense to me.  My gut says that “group by col nulls [not] distinct” would be an undesirable thing to add to the language.  It was probably added to unique indexes because they went the wrong way and needed a way to course-correct.

David J.

Re: Contradictory behavior of array_agg(distinct) aggregate.

From
Dagfinn Ilmari Mannsåker
Date:
Konstantin Knizhnik <knizhnik@garret.ru> writes:

> On 04/12/2024 9:03 am, Tom Lane wrote:
>> Konstantin Knizhnik <knizhnik@garret.ru> writes:
>>> postgres=# create table t(x integer unique);
>>> CREATE TABLE
>>> postgres=# insert into t values (null),(null);
>>> INSERT 0 2
>>> postgres=# select count(distinct x) from t;
>>>    count
>>> -------
>>>        0
>>> (1 row)
>>> postgres=# select array_agg(distinct x) from t;
>>>    array_agg
>>> -----------
>>>    {NULL}
>>> (1 row)
>>> postgres=# select array_agg(x) from t;
>>>     array_agg
>>> -------------
>>>    {NULL,NULL}
>>> (1 row)
>> I see nothing contradictory here.  "array_agg(distinct x)"
>> combines the two NULLs into one, which is the normal
>> behavior of DISTINCT.
>
>
> Sorry.
> It is actually inconsistency in basic SQL model in interpretation of
> NULL by UNIQUE and DISTINCT, and array_agg just follows this rule.

The behaviour of DISTINCT in aggregates matches the behaviour of the IS
(NOT) DISTINCT FROM predicate, which considers NULLs NOT DISTINCT from
eachother.

UNIQUE constraints leave it implementation-defined whether NULLs are
considered distinct (PostgreSQL defaults to NULLS DISTINCT), but that
can be overridden in the constraint definition.

- ilmari