Thread: to_json(NULL) should return JSON null instead of NULL
Hi

currently JSON supports a NULL value, but we don't convert NULL to JSON correctly:

postgres=# select to_json(NULL::text) IS NULL;
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
probably it should be json 'null', like:
postgres=# select json 'null' is null;
┌──────────┐
│ ?column? │
╞══════════╡
│ f │
└──────────┘
(1 row)
Regards

See:
http://stackoverflow.com/questions/32283125/why-cant-null-be-converted-to-jsons-null-in-postgresql
http://stackoverflow.com/questions/21120999/representing-null-in-json
On Sat, Aug 29, 2015 at 8:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

currently JSON supports a NULL value, but we don't convert NULL to JSON correctly:

postgres=# select to_json(NULL::text) IS NULL;
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)

probably should be json 'null';

like
postgres=# select json 'null' is null;
┌──────────┐
│ ?column? │
╞══════════╡
│ f │
└──────────┘
(1 row)
I agree that for correctness reasons it should, but just don't think it would be OK to change this behavior -- the function was there since 9.3...
Given there were no loud complaints about this, the current behavior is appropriate for most users, the rest can still work around using coalesce(to_json(...), json 'null').
--
Alex
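The coalesce() workaround mentioned above, spelled out as a sketch (results as described in this thread):

```sql
-- to_json() is strict, so it returns SQL NULL for NULL input;
-- coalesce() substitutes a JSON null in that case.
SELECT coalesce(to_json(NULL::text), json 'null') IS NULL;  -- f
SELECT coalesce(to_json('abc'::text), json 'null');         -- "abc"
```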
On 08/29/2015 08:47 AM, Shulgin, Oleksandr wrote:
> On Sat, Aug 29, 2015 at 8:39 AM, Pavel Stehule
> <pavel.stehule@gmail.com> wrote:
>> [examples trimmed]
>> probably should be json "null";
>
> I agree that for correctness reasons it should, but just don't think
> it would be OK to change this behavior -- the function was there since
> 9.3...
>
> Given there were no loud complaints about this, the current behavior
> is appropriate for most users, the rest can still work around using
> coalesce(to_json(...), json 'null').

I don't think it's necessarily more correct. But I do agree that it's not a good idea to change the behaviour unless there is major unhappiness with it.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/29/2015 08:47 AM, Shulgin, Oleksandr wrote:
>> Given there were no loud complaints about this, the current behavior
>> is appropriate for most users, the rest can still work around using
>> coalesce(to_json(...), json 'null').

> I don't think it's necessarily more correct. But I do agree that it's
> not a good idea to change the behaviour unless there is major
> unhappiness with it.

I'm not entirely convinced that JSON NULL and SQL NULL should be treated
as the same concept, so I would say that the current behavior is fine ---
at least when you think about it in isolation. However, haven't we
already bought into that equivalence in these examples?

regression=# select row_to_json(row(1,null,2));
        row_to_json
---------------------------
 {"f1":1,"f2":null,"f3":2}
(1 row)

regression=# select array_to_json(array[1,null,2]);
 array_to_json
---------------
 [1,null,2]
(1 row)

or even in to_json itself:

regression=# select to_json(array[1,null,2]);
  to_json
------------
 [1,null,2]
(1 row)

The scalar case is definitely failing to be consistent with these.
Is consistency a sufficient reason to change it?

regards, tom lane
On Sat, Aug 29, 2015 at 3:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/29/2015 08:47 AM, Shulgin, Oleksandr wrote:
>> Given there were no loud complaints about this, the current behavior
>> is appropriate for most users, the rest can still work around using
>> coalesce(to_json(...), json 'null').
> I don't think it's necessarily more correct. But I do agree that it's
> not a good idea to change the behaviour unless there is major
> unhappiness with it.
I'm not entirely convinced that JSON NULL and SQL NULL should be treated
as the same concept, so I would say that the current behavior is fine ---
at least when you think about it in isolation. However, haven't we
already bought into that equivalence in these examples?
regression=# select row_to_json(row(1,null,2));
row_to_json
---------------------------
{"f1":1,"f2":null,"f3":2}
(1 row)
regression=# select array_to_json(array[1,null,2]);
array_to_json
---------------
[1,null,2]
(1 row)
or even in to_json itself:
regression=# select to_json(array[1,null,2]);
to_json
------------
[1,null,2]
(1 row)
The scalar case is definitely failing to be consistent with these.
Yes, that's my argument for correctness also: to_json() on a composite object should behave like distribution of to_json() calls over object/array elements.
Is consistency a sufficient reason to change it?
Not for me.
--
Alex
2015-08-29 15:43 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Sat, Aug 29, 2015 at 3:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [quoted examples trimmed]
> The scalar case is definitely failing to be consistent with these.

Yes, that's my argument for correctness also: to_json() on a composite object should behave like distribution of to_json() calls over object/array elements.

> Is consistency a sufficient reason to change it?

Not for me.
It is a bug and it should be fixed. I agree that this change is too strong for fixing in a minor version, but we can change it in unreleased major versions - 9.5 and master.
Regards
Pavel
--
Alex
On 8/29/15 11:02 AM, Pavel Stehule wrote:
> It is bug - and it should be fixed. I agree, so this change is too
> strong for fixing in minor version - but we can change it in unreleased
> major versions - 9.5 and master.

+1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 08/29/2015 12:02 PM, Pavel Stehule wrote:
> [nested quotes trimmed]
>
> It is bug - and it should be fixed. I agree, so this change is too
> strong for fixing in minor version - but we can change it in
> unreleased major versions - 9.5 and master.

No, frankly that's being far too free with the word bug.
It's not even unambiguously incorrect. Note that all the to_json functions are strict. In this sense it's quite consistent. If we change it to being called on null input, what should we return if a null non-scalar is passed in?

cheers

andrew
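To make the question concrete, here is a sketch of the cases a non-strict to_json() would have to decide; the commented alternatives are hypothetical, not current behavior:

```sql
-- Today all three return SQL NULL because to_json() is strict.
-- If it were changed to handle NULL input, what should each yield?
SELECT to_json(NULL::text);    -- presumably json 'null'
SELECT to_json(NULL::int[]);   -- json 'null'? an empty array?
SELECT to_json(NULL::record);  -- json 'null'? there is no field info to build an object
```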
2015-08-29 19:11 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:
On 08/29/2015 12:02 PM, Pavel Stehule wrote:
[nested quotes trimmed]
It is bug - and it should be fixed. I agree, so this change is too strong for fixing in minor version - but we can change it in unreleased major versions - 9.5 and master.
No, frankly that's being far too free with the word bug. It's not even unambiguously incorrect.
Note that all the to_json functions are strict. In this sense it's quite consistent. If we change it to being called on null input, what should we return if a null non-scalar is passed in?
what is correct from JSON perspective? All fields with NULL
Pavel
cheers
andrew
On 8/29/15 12:29 PM, Pavel Stehule wrote:
>> Note that all the to_json functions are strict. In this sense it's
>> quite consistent. If we change it to being called on null input,
>> what should we return if a null non-scalar is passed in?
>
> what is correct from JSON perspective? All fields with NULL

ISTM that the whole purpose of to_json is to properly jsonify something, and the proper json form for "undefined" is 'null', is it not?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-08-29 22:06 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 8/29/15 12:29 PM, Pavel Stehule wrote:
Note that all the to_json functions are strict. In this sense it's
quite consistent. If we change it to being called on null input,
what should we return if a null non-scalar is passed in?
what is correct from JSON perspective? All fields with NULL
ISTM that the whole purpose of to_json is to properly jsonify something, and the proper json form for "undefined" is 'null', is it not?
it is probably equivalent
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 8/29/15 12:29 PM, Pavel Stehule wrote:
>> what is correct from JSON perspective? All fields with NULL

> ISTM that the whole purpose of to_json is to properly jsonify something,
> and the proper json form for "undefined" is 'null', is it not?

What's not entirely clear is what we should do with cases like

regression=# select array_to_json(null::int[]);
 array_to_json
---------------

(1 row)

regression=# select row_to_json(null::record);
 row_to_json
-------------

(1 row)

If we leave those alone (and in the latter case, in particular, there is
not enough information available to do much else) then it's not so clear
that changing to_json() is really improving consistency overall.
For instance, do we really want row_to_json(null::record) and
to_json(null::record) giving different results? Or if we make them
both return "null", that breaks the previous invariant that row_to_json
always yields a JSON object.

An advantage of leaving these things as strict is that the user can easily
substitute whatever specific behavior she wants for NULLs via coalesce(),
as was shown upthread. If we put in a different behavior, then the
only way to override it would be with a CASE, which is tedious and creates
multiple-evaluation issues.

I'm not necessarily against changing it --- but it doesn't seem entirely
black-and-white to me, and we do now have a couple of versions worth
of precedent we'd be breaking with.

If we do vote to change it, I'd want to do so now (ie in 9.5) rather than
create yet another year's worth of precedent.

regards, tom lane
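The CASE-based override mentioned above would look roughly like this; a sketch only, where `expr` is a hypothetical placeholder expression. Note that it appears twice, which is the multiple-evaluation issue:

```sql
-- Sketch: 'expr' stands in for an arbitrary expression. Because it is
-- written twice, any side effects or expensive work happen twice.
SELECT CASE WHEN expr IS NULL THEN json 'null'
            ELSE to_json(expr)
       END;
```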
On 8/29/15 3:27 PM, Tom Lane wrote:
> [examples trimmed]
>
> If we leave those alone (and in the latter case, in particular, there is
> not enough information available to do much else) then it's not so clear
> that changing to_json() is really improving consistency overall.
> For instance, do we really want row_to_json(null::record) and
> to_json(null::record) giving different results? Or if we make them
> both return "null", that breaks the previous invariant that row_to_json
> always yields a JSON object.

The tricky part is that if you're calling any of those functions to find the value to stick inside a JSON array or object then NULL is definitely incorrect. IE: if you were trying to turn the results of this

create table t(a int, b text);
insert into t(a) values(1);

into a JSON object, you'd want '{"a":1,"b":null}', not NULL. Of course you'd just use row_to_json() for something that simple, but if you were doing something more complex you might have to code a transform yourself.

> An advantage of leaving these things as strict is that the user can easily
> substitute whatever specific behavior she wants for NULLs via coalesce(),
> as was shown upthread. If we put in a different behavior, then the
> only way to override it would be with a CASE, which is tedious and creates
> multiple-evaluation issues.

Certainly true. The downside to leaving it alone is this will probably be hard to debug if you're using it to build a complex JSON object. One NULL ends up in the right place and suddenly your whole output becomes NULL. I think this is why %s works the way it does in format as well.

If we do change it I think it best to add an argument to control what it does with a NULL so you can get whichever you need. It might be worth adding to (array|record)_to_json as well, though I think the use case for those is markedly different than for plain to_json, so maybe not.

> I'm not necessarily against changing it --- but it doesn't seem entirely
> black-and-white to me, and we do now have a couple of versions worth
> of precedent we'd be breaking with.
>
> If we do vote to change it, I'd want to do so now (ie in 9.5) rather than
> create yet another year's worth of precedent.

I wonder how much people are actually using to_json(). I've done some amount of JSON massaging and don't recall needing it yet.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 8/29/15 12:29 PM, Pavel Stehule wrote:
>> what is correct from JSON perspective? All fields with NULL
> ISTM that the whole purpose of to_json is to properly jsonify something,
> and the proper json form for "undefined" is 'null', is it not?
What's not entirely clear is what we should do with cases like
regression=# select array_to_json(null::int[]);
array_to_json
---------------
(1 row)
regression=# select row_to_json(null::record);
row_to_json
-------------
(1 row)
If we leave those alone (and in the latter case, in particular, there is
not enough information available to do much else) then it's not so clear
that changing to_json() is really improving consistency overall.
For instance, do we really want row_to_json(null::record) and
to_json(null::record) giving different results? Or if we make them
both return "null", that breaks the previous invariant that row_to_json
always yields a JSON object.
An advantage of leaving these things as strict is that the user can easily
substitute whatever specific behavior she wants for NULLs via coalesce(),
as was shown upthread. If we put in a different behavior, then the
only way to override it would be with a CASE, which is tedious and creates
multiple-evaluation issues.
Wouldn't NULLIF(any, any) be another means of overriding?
I'm not necessarily against changing it --- but it doesn't seem entirely
black-and-white to me, and we do now have a couple of versions worth
of precedent we'd be breaking with.
If we do vote to change it, I'd want to do so now (ie in 9.5) rather than
create yet another year's worth of precedent.
I don't see COALESCE as being particularly problematic since we are going to JSON which means that in (nearly?) all situations all of the potential input values will be homogeneous and a single coalesce can apply the relevant logic at the point of conversion. Since row_to_json and record_to_json effectively deal with heterogeneous input types the same opportunity is not available to them and simply incorporating null into the output for that position is the only reasonable thing to do.
I'm tending to favor removing the strict modifier and having NULL => json 'null' with the advice that the old behavior can be obtained by writing "NULLIF(to_json(...), json 'null')".
If we go this route we should also consider returning json 'null' for (record|array)_to_json(NULL) at the same time.
My experience with the json feature has strictly been in the json_to_record direction... but it seems like a wasted opportunity not to be useful in the default case, when JSON provides a null concept that is semantically similar to the concept in SQL. While I'm hesitant to change this without representative use cases to work from, the theory argument holds enough weight to strongly consider making the change.
David J.
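Spelled out, the NULLIF escape hatch suggested above; a sketch only, assuming the hypothetical non-strict behavior where to_json(NULL) would return json 'null':

```sql
-- Hypothetical: if to_json(NULL) returned json 'null', the old strict
-- behavior could be recovered by mapping json 'null' back to SQL NULL.
-- Note this relies on an equality comparison being available for the
-- json type.
SELECT nullif(to_json(NULL::text), json 'null');
```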
On 08/29/2015 04:27 PM, Tom Lane wrote:
> [discussion trimmed]
>
> I'm not necessarily against changing it --- but it doesn't seem entirely
> black-and-white to me, and we do now have a couple of versions worth
> of precedent we'd be breaking with.
>
> If we do vote to change it, I'd want to do so now (ie in 9.5) rather than
> create yet another year's worth of precedent.

I agree with pretty much all of this. My fairly strong inclination is to leave it as it is and document the behaviour more clearly. Changing it seems likely to introduce a different inconsistency which is harder to understand.

cheers

andrew
2015-08-30 4:57 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:
On 08/29/2015 04:27 PM, Tom Lane wrote:
[quoted discussion trimmed]

I agree with pretty much all of this. My fairly strong inclination is to leave it as it is and document the behaviour more clearly. Changing it seems likely to introduce a different inconsistency which is harder to understand.
I agree that there is no clear solution - both possible solutions have a real basis. On the other hand, the fix with COALESCE, NULLIF, etc. is not hard and has no performance impact - so better documentation can be a good enough fix. A custom solution is the ugly-named but simple function to_json2:
CREATE OR REPLACE FUNCTION to_json2(anyelement)
RETURNS json AS $$
SELECT COALESCE(to_json($1), json 'null')
$$ LANGUAGE sql;
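A quick usage sketch of the to_json2() wrapper defined above (results as implied by the thread):

```sql
-- With the wrapper, NULL input yields a JSON null rather than SQL NULL.
SELECT to_json2(NULL::text) IS NULL;  -- f
SELECT to_json2('hello'::text);       -- "hello"
```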
Regards
Pavel
cheers
andrew
On 30/08/15 04:57, Andrew Dunstan wrote:
> On 08/29/2015 04:27 PM, Tom Lane wrote:
>> I'm not necessarily against changing it --- but it doesn't seem entirely
>> black-and-white to me, and we do now have a couple of versions worth
>> of precedent we'd be breaking with.
>>
>> If we do vote to change it, I'd want to do so now (ie in 9.5) rather than
>> create yet another year's worth of precedent.
>
> I agree with pretty much all of this. My fairly strong inclination is to
> leave it as it is and document the behaviour more clearly. Changing it
> seems likely to introduce a different inconsistency which is harder to
> understand.

This thread reminds me of the decision we had to make when implementing ISO healthcare datatypes, with their own NullFlavors, in PostgreSQL; see section 3.2 of http://arxiv.org/pdf/1003.3370v1.pdf for a discussion.

TLDR: even though semantically there might be overlap in the two kinds of nulls, there are two mechanisms to represent 'value at present unknown': in the heaptuple and in the datatype varlena Datum storage, each with their own IS NULL / STRICT vs isnull and other functions. We decided that trying to merge both null-representing mechanisms would probably lead to an incomplete merge, and thus many unexpected problems, and that therefore a clean separation would be easiest to explain and work with.

regards,
Yeb Havinga