Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN - Mailing list pgsql-bugs

From sulfinu@gmail.com
Subject Re: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN
Date
Msg-id CAGH1kmwPpDUCe7-0fN0aRDkUuFgv2HEXh59XQY3k06We193PCg@mail.gmail.com
Whole thread Raw
In response to Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN  (sulfinu@gmail.com)
List pgsql-bugs
As it always happens, I've had a better idea for an workaround after sending the e-mail:
select * from
a,
    lateral (select json_agg(b) from
             b
             where b.a = a.id) q;


Hello,

I'm sorry if this message brings up once more an already settled issue, but there's no public list of  bug reports for PostgreSQL (the release notes contain only acknowledged, solved bugs).

This problem was noticed in version 15.1, as I don't have the latest version at hand, but I've read the release notes for 15.2 and 15.3 and there's no sign of any related change. It's about the outcome of a SELECT statement which involves computing aggregate functions like json_agg, jsonb_agg and array_agg over columns that originate from outer-joined entities, when these entities are missing.

Here's a very simple schema to illustrate the problem:
create table a (
  id serial2 primary key,
  name text
);

create table b (
  name text primary key,
  a int2 references a(id)
);

insert into a (name) values ('a1'), ('a2');
insert into b (name, a) values
  ('b for a1', 1), ('another b for a1', 1);
Notice that entity named a2 in table a has no corresponding bs. Now run this query:
select a.*, json_agg(b) from
  a
  left join b on b.a = a.id
  group by a.id;
The aggregation column for entity a2 has value [null] (if array_agg() was used, the obtained value would be {NULL}).

Since the query without aggregation
select a.*, b.* from
  a
  left join b on b.a = a.id;
produces an entry for entity a2 with NULL values for columns belonging to table b, I expect the aggregation to produce either [{"name":null,"a":null}] or the SQL NULL for it, preferrably the latter.

I suppose there's a dillema regarding what to return: the aggregation is computed over one row, but its values are actually missing. The curent choice of PostgreSQL is somewhere in-between, which is inappropriate from both perspectives - a political compromise. Moreover, in the case of array_agg(), the returned value could have also originated from a single-row single-column actual NULL value!

Returning a simple SQL NULL is the most appropriate choice, in line with the general rule that aggregations over zero rows return NULL. For the database client, it has the benefit of being both cheap to detect and distinguishable from other cases.

Although there is a workaround for this problem, it implies checking a stupid condition for absolutely every row in the JOIN result:
select a.*, json_agg(b) filter (where b.name is not null) from
  a
  left join b on b.a = a.id
  group by a.id;

Thoughts?

pgsql-bugs by date:

Previous
From: sulfinu@gmail.com
Date:
Subject: Aggregation results with json(b)_agg and array_agg in a SELECT with OUTER JOIN
Next
From: Dippu Kumar
Date:
Subject: Re: Need Support to Upgrade from 13.6 to 15.3