Thread: BUG #15884: json_object_agg errors on null in field name

BUG #15884: json_object_agg errors on null in field name

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15884
Logged by:          Tim Möhlmann
Email address:      muhlemmer@gmail.com
PostgreSQL version: 11.3
Operating system:   Gentoo linux (stable)
Description:

According to the documentation on aggregate expressions:

https://www.postgresql.org/docs/11/sql-expressions.html#SYNTAX-AGGREGATES
"Most aggregate functions ignore null inputs, so that rows in which one or
more of the expression(s) yield null are discarded. This can be assumed to
be true, unless otherwise specified, for all built-in aggregates."

On aggregate function documentation no specific mention is made for
json_object_agg().
https://www.postgresql.org/docs/11/functions-aggregate.html

However, json_object_agg() throws an error in case of null in the
argument(s): "error: field name must not be null". And I get it, JSON keys
need to be unique strings and null is not that. However, this error is also
thrown if both keys and values are an empty CTE result.

In the following example there are pages. Each page has sections and each
sections has text fields (texts). It is a simplified version of my app's
query. When there is a page without sections json_object_agg() gets the
empty result for "s.title" and the nested json_build_object() call.

create table pages (
    page_id serial primary key,
    domain text unique not null
);

create table sections (
    section_id serial primary key,
    title text not null,
    page_id int references pages
);

create table texts (
    section_id int references sections,
    pos int not null,
    content text not null,
    primary key (section_id, pos)
);

-- spanac.com will have 3 sections with texts and images in each, various
amounts
insert into pages (domain) values ('spanac.com');
-- foo.com has 1 empty section
insert into pages (domain) values ('foo.com');
-- bar.com has no sections
insert into pages (domain) values ('bar.com');

-- spanac.com

with s as (
    insert into sections (page_id, title) select page_id, 'first' from pages
where domain = 'spanac.com' returning section_id
),
t1 as (
    insert into texts (section_id, pos, content) select section_id, 1,
'spanac one.one' from s
)
insert into texts (section_id, pos, content) select section_id, 2, 'spanac
one.two' from s;

with s as (
    insert into sections (page_id, title) select page_id, 'second' from
pages where domain = 'spanac.com' returning section_id
),
t1 as (
    insert into texts (section_id, pos, content) select section_id, 1,
'spanac two.one' from s
)
insert into texts (section_id, pos, content) select section_id, 2, 'spanac
two.two' from s;

-- foo.com

insert into sections (page_id, title) select page_id, 'empty' from pages
where domain = 'foo.com';

And this is the query that triggers the error:

with secs as (
    select p.page_id, p.domain, s.section_id as sid, s.title as title
    from pages p
    left join sections s on p.page_id = s.page_id
    where p.domain = 'bar.com' -- 'foo.com' and 'spanac.com' work fine
),
txt as (
    select
        sid,
        json_agg(
            json_build_object(
                'Pos', pos,
                'Text', content
            )
            order by pos asc
        ) as txts
    from texts
    join secs on sid = section_id
    group by sid
)
select
    json_build_object(
        'ID', s.page_id,
        'Domain', domain,
        'Sections', json_object_agg ( -- Error occurs here
            s.title,
              json_build_object(
                'ID', s.sid,
                'Texts', t.txts
            )
            order by s.sid asc
        )
    )
from secs s
left join txt t on s.sid = t.sid
group by s.page_id, domain;

The above is also available in a fiddle, although it does not match the
PostgreSQL version: https://www.db-fiddle.com/f/nzZz7jjrDSAYKtYK53bz7L/1

I've asked a question regarding this on:
https://dba.stackexchange.com/q/241541/150398. The answer I got is that this
might be a bug, hence I'm posting here. Two notes:
1. I cannot reproduce this when I simplify the query into not using a CTE
and nested json_build_object()
2. I "blame" json_object_agg()", because I used a regular json_agg() without
the s.title fields before and it worked fine. As in, the resulting json
document just had "Sections": null.

Best regards, Tim Mohlmann (muhlemmer)


Re: BUG #15884: json_object_agg errors on null in field name

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> According to the documentation on aggregate expressions:
> https://www.postgresql.org/docs/11/sql-expressions.html#SYNTAX-AGGREGATES
> "Most aggregate functions ignore null inputs, so that rows in which one or
> more of the expression(s) yield null are discarded. This can be assumed to
> be true, unless otherwise specified, for all built-in aggregates."

> On aggregate function documentation no specific mention is made for
> json_object_agg().
> https://www.postgresql.org/docs/11/functions-aggregate.html

> However, json_object_agg() throws an error in case of null in the
> argument(s): "error: field name must not be null". And I get it, JSON keys
> need to be unique strings and null is not that. However, this error is also
> thrown if both keys and values are an empty CTE result.

I'm inclined to think this is just a documentation deficiency, ie the
functions-aggregate page needs to mention that the keys input isn't
allowed to be null.

The function does (and should, I think) accept rows that have non-null key
and null value, so that's already a deviation from the "default" aggregate
behavior that should be documented.  And giving a null key with non-null
value probably needs to be an error, because silently ignoring non-null
input doesn't seem nice.  You could make an argument that rows in which
both are null should be silently ignored, but I think that's at best a
judgment call.  While it'd be convenient in some cases, you could get the
same behavior by excluding such rows with a WHERE test.  On the other
side, it's not very orthogonal with the other two cases, and arguably
it could mask mistakes.

Given that it's been working this way since the function was introduced
in 9.4, I'm disinclined to change it now.  I'll go see about improving
the docs, though.

            regards, tom lane