Thread: BUG #13680: PostgreSQL backend process crashes on jsonb_object_agg() in plpgsql recursive function

The following bug has been logged on the website:

Bug reference:      13680
Logged by:          Boyko
Email address:      b.yordanov2@gmail.com
PostgreSQL version: 9.5beta1
Operating system:   CentOS release 6.4 (Final)
Description:

Hi,

I wrote the following function:

CREATE OR REPLACE FUNCTION public.recursive_stats_daily(d jsonb, i
interval)
 RETURNS TABLE(day_time_col timestamp without time zone, data_col jsonb)
 LANGUAGE plpgsql
AS $function$
begin
    return query select stats.day_time_col,jsonb_object_agg(z,(d->z ||
stats.data_col->z)) as dd from stats,jsonb_object_keys(stats.data_col) z
where stats.day_time_col::date = current_date and stats.day_time_col::time =
'00:00:00'::time + i group by stats.day_time_col;
    if day_time_col::time < '23:00:00'::time then
        return query select * from
recursive_stats_daily(dd,i+'1h'::interval);
    end if;
end $function$

The idea is to recursively concat nested json data, which is generated
hourly (24 jsons per day) into single "daily" json object.

Structure of stats:

CREATE TABLE stats (
    id integer NOT NULL,
    day_time_col timestamp without time zone DEFAULT now() NOT NULL,
    at time without time zone DEFAULT '00:00:00'::time without time zone NOT
NULL,
    data_col jsonb DEFAULT '{}'::jsonb NOT NULL
);

Example json data:

=> select * from stats where day_time_col::date = current_date;
-[ RECORD 1

]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 209
day_time_col | 2015-10-14 00:00:00
at           | 00:00:00
data_col     | {"36": {"2285": {"c": 1}}, "61": {"4023": {"c": 1}}, "71":
{"3572": {"c": 1}}, "143": {"4916": {"c": 1}}, "166": {"1171": {"c": 1}},
"185": {"604": {"c": 1}}, "264": {"6535": {"c": 1}}, "321": {"7817": {"c":
1}}, "383": {"4031": {"c": 1}}, "462": {"8303": {"c": 1}}, "486": {"9795":
{"c": 1}}, "497": {"2745": {"c": 1}}}
-[ RECORD 2

]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id           | 210
day_time_col | 2015-10-14 01:00:00
at           | 00:00:00
data_col     | {"36": {"1135": {"c": 1}}, "47": {"1364": {"c": 1}}, "54":
{"5920": {"c": 1}, "8402": {"c": 1}}, "71": {"386": {"c": 1}}, "90":
{"7145": {"c": 1}}, "94": {"8625": {"c": 1}}, "168": {"3814": {"c": 1}},
"326": {"722": {"c": 1}}, "339": {"4888": {"c": 1}}, "420": {"6447": {"c":
1}}, "430": {"5701": {"c": 1}}, "460": {"861": {"c": 1}}, "466": {"1722":
{"c": 1}}, "496": {"6945": {"c": 1}}, "498": {"6929": {"c": 1}}}

Calling the function like this causes the crash:

sravni_hstore=> select * from
recursive_stats_daily('{}'::jsonb,'1h'::interval);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Log entry:

< 2015-10-14 19:01:54.640 EEST >LOG:  statement: select * from
recursive_stats_daily('{}'::jsonb,'1h'::interval);
< 2015-10-14 19:01:54.645 EEST >LOG:  server process (PID 15993) was
terminated by signal 11: Segmentation fault
< 2015-10-14 19:01:54.645 EEST >DETAIL:  Failed process was running: select
* from recursive_stats_daily('{}'::jsonb,'1h'::interval);
< 2015-10-14 19:01:54.645 EEST >LOG:  terminating any other active server
processes
< 2015-10-14 19:01:54.645 EEST >WARNING:  terminating connection because of
crash of another server process
< 2015-10-14 19:01:54.645 EEST >DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
< 2015-10-14 19:01:54.645 EEST >HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
< 2015-10-14 19:01:54.645 EEST >FATAL:  the database system is in recovery
mode
< 2015-10-14 19:01:54.645 EEST >LOG:  all server processes terminated;
reinitializing
< 2015-10-14 19:01:54.654 EEST >LOG:  database system was interrupted; last
known up at 2015-10-14 19:01:12 EEST
< 2015-10-14 19:01:54.671 EEST >LOG:  database system was not properly shut
down; automatic recovery in progress
< 2015-10-14 19:01:54.672 EEST >LOG:  invalid record length at 0/7561438
< 2015-10-14 19:01:54.672 EEST >LOG:  redo is not required
< 2015-10-14 19:01:54.678 EEST >LOG:  MultiXact member wraparound
protections are now enabled
< 2015-10-14 19:01:54.678 EEST >LOG:  database system is ready to accept
connections
< 2015-10-14 19:01:54.678 EEST >LOG:  autovacuum launcher started
On Thu, Oct 15, 2015 at 1:08 AM,  <b.yordanov2@gmail.com> wrote:
> Calling the function like this causes the crash:
> sravni_hstore=> select * from
> recursive_stats_daily('{}'::jsonb,'1h'::interval);
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Indeed. Reusing your example with the script attached I can reproduce
the crash easily:
* thread #1: tid = 0x0000, 0x0000000106674720
postgres`JsonbValueToJsonb(val=0x0000000000000000) + 16 at
jsonb_util.c:82, stop reason = signal SIGSTOP
  * frame #0: 0x0000000106674720
postgres`JsonbValueToJsonb(val=0x0000000000000000) + 16 at
jsonb_util.c:82
    frame #1: 0x0000000106670a05
postgres`jsonb_object_agg_transfn(fcinfo=0x00007f9fc8958710) + 1061 at
jsonb.c:1824
    frame #2: 0x0000000106401bb2
postgres`advance_transition_function(aggstate=0x00007f9fc8950788,
pertrans=0x00007f9fc8958638, pergroupstate=0x00007f9fc89637b0) + 386
at nodeAgg.c:7

I am looking into it in more details, for now I have added an open item for 9.5.
Regards,
--
Michael

Attachment
On Thu, Oct 15, 2015 at 10:44 AM, Michael Paquier wrote:
> I am looking into it in more details, for now I have added an open item for 9.5.
> Regards,

This simple query reproduces the crash as well:
=# select json_object_agg(1, NULL::json);
 json_object_agg
-----------------
 { "1" : null }
(1 row)
=# select jsonb_object_agg(1, NULL::jsonb);
server closed the connection unexpectedly

It happens that jsonb_object_agg_transfn is not able to manage
correctly NULL values in the context of a JSONB value, and it seems to
me that this is caused by an oversight in datum_to_jsonb regarding the
handling of NULL values.

Attached is a patch with some regression tests for master and
REL9_5_STABLE where the bug has been introduced.
Thoughts?
--
Michael

Attachment
On Thu, Oct 15, 2015 at 7:07 PM, boyko yordanov wrote:
> sravni_hstore=# select ('{"3": {"9703": {"c": 1}}}'::jsonb->'3') || ('{"3":
> {"8309": {"c": 1}}}'::jsonb->'3');
> -[ RECORD 1 ]----------------------------------
> ?column? | {"8309": {"c": 1}, "9703": {"c": 1}}
>
> sravni_hstore=# select '{"3": {"9703": {"c": 1}}}'::jsonb->'3' || '{"3":
> {"8309": {"c": 1}}}'::jsonb->'3';
> -[ RECORD 1 ]----------------
> ?column? | {"8309": {"c": 1}}
>
> What I expect is that there should be no difference in the output of the
> last two queries. Not sure if I should submit this as a separate bug?

The last one looks correct to me, the concat operator is using on the
right side '{"3": {"8309": {"c": 1}}}'::jsonb in your last query, and
not '{"3": {"8309": {"c": 1}}}'::jsonb->'3', so || takes precedence on
->. When applying the parenthesis, ('{"3": {"8309": {"c":
1}}}'::jsonb->'3') is completely taken into account.
--
Michael
Now I get it, its about operators precedence. Thanks once again!

2015-10-15 14:31 GMT+03:00 Michael Paquier <michael.paquier@gmail.com>:

> On Thu, Oct 15, 2015 at 7:07 PM, boyko yordanov wrote:
> > sravni_hstore=# select ('{"3": {"9703": {"c": 1}}}'::jsonb->'3') ||
> ('{"3":
> > {"8309": {"c": 1}}}'::jsonb->'3');
> > -[ RECORD 1 ]----------------------------------
> > ?column? | {"8309": {"c": 1}, "9703": {"c": 1}}
> >
> > sravni_hstore=# select '{"3": {"9703": {"c": 1}}}'::jsonb->'3' || '{"3":
> > {"8309": {"c": 1}}}'::jsonb->'3';
> > -[ RECORD 1 ]----------------
> > ?column? | {"8309": {"c": 1}}
> >
> > What I expect is that there should be no difference in the output of the
> > last two queries. Not sure if I should submit this as a separate bug?
>
> The last one looks correct to me, the concat operator is using on the
> right side '{"3": {"8309": {"c": 1}}}'::jsonb in your last query, and
> not '{"3": {"8309": {"c": 1}}}'::jsonb->'3', so || takes precedence on
> ->. When applying the parenthesis, ('{"3": {"8309": {"c":
> 1}}}'::jsonb->'3') is completely taken into account.
> --
> Michael
>
Applied the patch and the crash is gone, looks good, was able to debug my
function and it now works as expected, posting it for the record:

create or replace function recursive_stats_daily (d jsonb, i interval)
returns table (daily_json_for date, data_col jsonb)
language plpgsql
as $$
begin
    if i::time < '23:00:00'::time then
        return query
        select * from recursive_stats_daily(
        (select d || jsonb_object_agg(z,(coalesce(d->z,'{}'::jsonb)::jsonb
|| (stats.data_col->z)::jsonb)) from
stats,jsonb_object_keys(stats.data_col) z where stats.day_time_col::date =
current_date and stats.day_time_col::time = '00:00:00'::time + i group by
stats.day_time_col),
        (i+'1h'::interval)
        );
    else
    return query select current_date, d;
    end if;
end $$;

As a side note - I notice this weird behavior of the concat operator -
below queries show what I mean:

sravni_hstore=# select ('{"3": {"9703": {"c": 1}}}'::jsonb->'3');
-[ RECORD 1 ]----------------
?column? | {"9703": {"c": 1}}

sravni_hstore=# select '{"3": {"9703": {"c": 1}}}'::jsonb->'3';
-[ RECORD 1 ]----------------
?column? | {"9703": {"c": 1}}

sravni_hstore=# select ('{"3": {"8309": {"c": 1}}}'::jsonb->'3');
-[ RECORD 1 ]----------------
?column? | {"8309": {"c": 1}}

sravni_hstore=# select '{"3": {"8309": {"c": 1}}}'::jsonb->'3';
-[ RECORD 1 ]----------------
?column? | {"8309": {"c": 1}}

sravni_hstore=# select ('{"3": {"9703": {"c": 1}}}'::jsonb->'3') || ('{"3":
{"8309": {"c": 1}}}'::jsonb->'3');
-[ RECORD 1 ]----------------------------------
?column? | {"8309": {"c": 1}, "9703": {"c": 1}}

sravni_hstore=# select '{"3": {"9703": {"c": 1}}}'::jsonb->'3' || '{"3":
{"8309": {"c": 1}}}'::jsonb->'3';
-[ RECORD 1 ]----------------
?column? | {"8309": {"c": 1}}

What I expect is that there should be no difference in the output of the
last two queries. Not sure if I should submit this as a separate bug?

Thanks for the effort! :)
Boyko

2015-10-15 8:41 GMT+03:00 Michael Paquier <michael.paquier@gmail.com>:

> On Thu, Oct 15, 2015 at 10:44 AM, Michael Paquier wrote:
> > I am looking into it in more details, for now I have added an open item
> for 9.5.
> > Regards,
>
> This simple query reproduces the crash as well:
> =# select json_object_agg(1, NULL::json);
>  json_object_agg
> -----------------
>  { "1" : null }
> (1 row)
> =# select jsonb_object_agg(1, NULL::jsonb);
> server closed the connection unexpectedly
>
> It happens that jsonb_object_agg_transfn is not able to manage
> correctly NULL values in the context of a JSONB value, and it seems to
> me that this is caused by an oversight in datum_to_jsonb regarding the
> handling of NULL values.
>
> Attached is a patch with some regression tests for master and
> REL9_5_STABLE where the bug has been introduced.
> Thoughts?
> --
> Michael
>
Michael Paquier <michael.paquier@gmail.com> writes:
> It happens that jsonb_object_agg_transfn is not able to manage
> correctly NULL values in the context of a JSONB value, and it seems to
> me that this is caused by an oversight in datum_to_jsonb regarding the
> handling of NULL values.

> Attached is a patch with some regression tests for master and
> REL9_5_STABLE where the bug has been introduced.

This looks sane to me, although since the function's header comment
specifies that tcategory is not to be relied on when is_null is true,
I think it would be cleaner to test is_null before looking at tcategory.
Will adjust and push.

            regards, tom lane
On Fri, Oct 16, 2015 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> It happens that jsonb_object_agg_transfn is not able to manage
>> correctly NULL values in the context of a JSONB value, and it seems to
>> me that this is caused by an oversight in datum_to_jsonb regarding the
>> handling of NULL values.
>
>> Attached is a patch with some regression tests for master and
>> REL9_5_STABLE where the bug has been introduced.
>
> This looks sane to me, although since the function's header comment
> specifies that tcategory is not to be relied on when is_null is true,
> I think it would be cleaner to test is_null before looking at tcategory.
> Will adjust and push.

Thanks!
--
Michael