Thread: BUG #9519: Allows storing scalar json, but fails when querying
The following bug has been logged on the website: Bug reference: 9519 Logged by: Alf Kristian Støyle Email address: alf.kristian@gmail.com PostgreSQL version: 9.3.2 Operating system: Red Hat 4.6.3-2 Description: Steps to reproduce: create table jtest (data json); => CREATE TABLE insert into jtest (data) values ('1'); => INSERT 0 1 select data->>'foo' from jtest; => ERROR: cannot extract element from a scalar I think the insert should fail, since '1' is not valid JSON. After the data is in the database every query using the ->> operator, hitting the row containing '1' will fail. Our database runs in Amazon RDS, 'select version();' returns: PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
On 3/10/14, 4:09 PM, alf.kristian@gmail.com wrote: > I think the insert should fail, since '1' is not valid JSON. It's a valid "JSON value", it's just not an object. If you only want to store JSON objects in the table, consider using a CHECK constraint. Regards, Marko Tiikkaja
On 3/10/2014 12:52 PM, Marko Tiikkaja wrote: > On 3/10/14, 4:09 PM, alf.kristian@gmail.com wrote: >> I think the insert should fail, since '1' is not valid JSON. > > It's a valid "JSON value", it's just not an object. then why does querying it fail ? -- john r pierce 37N 122W somewhere on the middle of the left coast
If the JSON datatype accepts JSON values (not just objects), which I supposed is a valid approach, then shouldn't the JSON query operators be able to deal with that?
select data->>'foo' from jtest;
=> ERROR: cannot extract element from a scalar
Having to add an extra check constraint here is not obvious, imho.
For the record, we are working around this problem, through a small hack. Some other system is storing this kind of data in our database. We have reported a bug with them to fix it.
This is not a big problem for us, but we love Postgres, so we thought we should report this.
Cheers,
Alf
Alf
On 10 March 2014 20:52, Marko Tiikkaja <marko@joh.to> wrote:
On 3/10/14, 4:09 PM, alf.kristian@gmail.com wrote:I think the insert should fail, since '1' is not valid JSON.
It's a valid "JSON value", it's just not an object. If you only want to store JSON objects in the table, consider using a CHECK constraint.
Regards,
Marko Tiikkaja
On 3/10/2014 1:39 PM, Alf Kristian Støyle wrote: > This is the "unfortunate" part: > /select data->>'foo' from jtest; > => ERROR: cannot extract element from a scalar > / > So either, only accept JSON object (not simple JSON values), or make > the JSON query operators work with JSON values as well. the json ->> 'fieldname' operator has no meaning when applied to a scalar value. what would you suggest it should do? return a NULL ? -- john r pierce 37N 122W somewhere on the middle of the left coast
alf.kristian wrote > If the JSON datatype accepts JSON values (not just objects), which I > supposed is a valid approach, then shouldn't the JSON query operators be > able to deal with that? > > This is the "unfortunate" part: > > > *select data->>'foo' from jtest; => ERROR: cannot extract element from a > scalar* > So either, only accept JSON object (not simple JSON values), or make the > JSON query operators work with JSON values as well. > > Having to add an extra check constraint here is not obvious, imho. > > For the record, we are working around this problem, through a small hack. > Some other system is storing this kind of data in our database. We have > reported a bug with them to fix it. > > This is not a big problem for us, but we love Postgres, so we thought we > should report this. > > Cheers, > Alf > > > > On 10 March 2014 20:52, Marko Tiikkaja < > marko@ > > wrote: > >> On 3/10/14, 4:09 PM, > alf.kristian@ > wrote: >> >>> I think the insert should fail, since '1' is not valid JSON. >>> >> >> It's a valid "JSON value", it's just not an object. If you only want to >> store JSON objects in the table, consider using a CHECK constraint. >> >> >> Regards, >> Marko Tiikkaja >> Not really sure how robust you expect the system to be in the face of polymorphic columns. SELECT ('["a","b","c"]'::json)->>'not_a_key' -- ERROR: cannot extract field from a non-object The system supposes that, at a structural level, you are dealing with column-consistent data and so if you ask for something that does not make sense (i.e., an object key when you have an array or a scalar) it will warn you. I guess, in theory, any de-referencing that does not find a valid target could return NULL...though I'm not sure that is an improvement. This is a relational database and so it is expected that a column defines a single thing and that thing naturally can be one-of a scalar, object, or array. Given that underlying assumption - though likely never truly spelled out anywhere in the documentation - allowing and then throwing a run-time error when the specific sub-type of json does not match the given operator makes sense. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-9519-Allows-storing-scalar-json-but-fails-when-querying-tp5795417p5795449.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
John R Pierce wrote > On 3/10/2014 1:39 PM, Alf Kristian St=C3=B8yle wrote: >> This is the "unfortunate" part: >> /select data->>'foo' from jtest; >> =3D> ERROR: cannot extract element from a scalar >> / >> So either, only accept JSON object (not simple JSON values), or make=20 >> the JSON query operators work with JSON values as well. >=20 > the json ->> 'fieldname' operator has no meaning when applied to a=20 > scalar value. what would you suggest it should do? return a NULL ? "key does not exist" is represented by NULL if the operator is applied to a= n object so, in theory, since the key does not exist in a scalar the same value - NULL - would not be unexpected. From a theory perspective, and based upon typical database normalization rules, the current behavior makes the most sense but it does force the user to be consistent in defining of their JSON models - a little big-brother-is= h but if you are only going to support a single set of rules the more-strict ones are generally preferable. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-9= 519-Allows-storing-scalar-json-but-fails-when-querying-tp5795417p5795451.ht= ml Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
On Mon, Mar 10, 2014 at 8:09 AM, <alf.kristian@gmail.com> wrote: > The following bug has been logged on the website: > > Bug reference: 9519 > Logged by: Alf Kristian St=F8yle > Email address: alf.kristian@gmail.com > PostgreSQL version: 9.3.2 > Operating system: Red Hat 4.6.3-2 > Description: > > Steps to reproduce: > create table jtest (data json); > =3D> CREATE TABLE > > insert into jtest (data) values ('1'); > =3D> INSERT 0 1 > > select data->>'foo' from jtest; > =3D> ERROR: cannot extract element from a scalar > > > I think the insert should fail, since '1' is not valid JSON. > > After the data is in the database every query using the ->> operator, > hitting the row containing '1' will fail. > Lets say the value was instead {"a":1}. Now every query using data->'a'->>'b' will fail when it hits that row. So forbidding values does not fix the problem, it just moves it down a level. A possible solution is to make ->> return NULL (like it does for accessing values of non-existent keys) rather than raise an error when used on a scalar. Whether this would be an improvement, I don't know. Note that the construct: data #> '{a,b}' does return null in this case, and does not raise an error. You could argue that that is an inconsistency. On the other hand, you could argue it provides you with the flexibility to accomplish different things depending on which you desire. So if you want the NULL behavior, you could use this to get it: data #>> '{foo}' Cheers, Jeff
Hi, On 10/03/14 14:09, David Johnston wrote: > The system supposes that, at a structural level, you are dealing with > column-consistent data and so if you ask for something that does not make > sense (i.e., an object key when you have an array or a scalar) it will wa= rn > you. Looking at the common usage of JSON it doesn't seem to be a good idea to imply column-consistent JSON. Normally when JSON is used the application ensures consistency and when a non-existent key is accessed NULL is returned. I think this is expected behavior and we should do it like that, too; if one wants it different, he should use CHECK constraints. Best regards, --=20 Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
The #> operator works in SELECT, e.g. does not fail on when JSON column contains JSON values. Thanks for the tip!
However when using it in the WHERE clause I get no result.
select * from jtest;
data
-------------
1
1
{"a" : "b"}
(3 rows)
select data #> '{"a"}' from jtest;
?column?
----------
"b"
(3 rows)
select data #> '{"a"}' from jtest where (data #> '{"a"}') = 'b';
ERROR: operator does not exist: json = unknown
LINE 1: ...CT data #> '{"a"}' from jtest where (data #> '{"a"}') = 'b';
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
select data #> '{"a"}' from jtest where (data #> '{"a"}')::text = 'b';
?column?
----------
(0 rows)
select * from jtest;
data
-------------
1
1
{"a" : "b"}
(3 rows)
select data #> '{"a"}' from jtest;
?column?
----------
"b"
(3 rows)
select data #> '{"a"}' from jtest where (data #> '{"a"}') = 'b';
ERROR: operator does not exist: json = unknown
LINE 1: ...CT data #> '{"a"}' from jtest where (data #> '{"a"}') = 'b';
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
select data #> '{"a"}' from jtest where (data #> '{"a"}')::text = 'b';
?column?
----------
(0 rows)
Am I doing a wrong conversion here, or is something else going on? If the data in the database did not contain scalar values, then ->> works fine in WHERE. The following is almost the query we are actually trying run (checking for existence):
select data->>'a' from jtest where data->>'a' = 'b';
?column?
----------
b
(1 row)
select data->>'a' from jtest where data->>'a' = 'b';
?column?
----------
b
(1 row)
Regarding the ->> operator, I think it is unfortunate behavior it fails like that, I suppose we were expecting NULL behavior. However we are working around this, so if you don't think this should change, then we are fine with that :)
Just a note though. It took us a while to track down the problem. We have a table with several million rows, and suddenly our queries started failing, since someone had started to insert scalars. Others might also struggle to figure out what is wrong if they bump into this behavior.
Cheers,
Alf
Alf
On 10 March 2014 22:42, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Mar 10, 2014 at 8:09 AM, <alf.kristian@gmail.com> wrote:The following bug has been logged on the website:
Bug reference: 9519
Logged by: Alf Kristian Støyle
Email address: alf.kristian@gmail.com
PostgreSQL version: 9.3.2
Operating system: Red Hat 4.6.3-2
Description:
Steps to reproduce:
create table jtest (data json);
=> CREATE TABLE
insert into jtest (data) values ('1');
=> INSERT 0 1
select data->>'foo' from jtest;
=> ERROR: cannot extract element from a scalarI think the insert should fail, since '1' is not valid JSON.After the data is in the database every query using the ->> operator,
hitting the row containing '1' will fail.Lets say the value was instead {"a":1}.Now every query using data->'a'->>'b' will fail when it hits that row.So forbidding values does not fix the problem, it just moves it down a level.A possible solution is to make ->> return NULL (like it does for accessing values of non-existent keys) rather than raise an error when used on a scalar. Whether this would be an improvement, I don't know.Note that the construct:data #> '{a,b}'does return null in this case, and does not raise an error. You could argue that that is an inconsistency. On the other hand, you could argue it provides you with the flexibility to accomplish different things depending on which you desire.So if you want the NULL behavior, you could use this to get it:data #>> '{foo}'Cheers,Jeff
Christian Kruse-4 wrote > Hi, > > On 10/03/14 14:09, David Johnston wrote: > >> The system supposes that, at a structural level, you are dealing with >> column-consistent data and so if you ask for something that does not make >> sense (i.e., an object key when you have an array or a scalar) it will >> warn >> you. > > Looking at the common usage of JSON it doesn't seem to be a good idea > to imply column-consistent JSON. Normally when JSON is used the > application ensures consistency and when a non-existent key is > accessed NULL is returned. I think this is expected behavior and we > should do it like that, too; if one wants it different, he should use > CHECK constraints. The database is the application. It's job is to store data so that, multiple, other applications can access it and in such a way that those applications do not have to enforce data integrity since the database has taken care of that responsibility. I can see it both ways and so maybe a set of null-returning functions and operators need to be made standard so the user can choose which behavior is desired. I cannot see changing the behavior of the current operators since these are in the wild - though turning errors into non-errors isn't really a risk since no existing code can reasonably be said to rely on such behavior. The advantage of avoiding data-specific errors is probably worth it. It makes finding problems harder, not easier, to locate if one doesn't make liberal use of check constraints but personally I'd rather have some wrong data interspersed with the good data instead of throwing out al the data because one single value is bad - a value that may never make it to the final result anyway. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-9519-Allows-storing-scalar-json-but-fails-when-querying-tp5795417p5795536.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Alf Kristian St=C3=B8yle wrote > select data #> '{"a"}' from jtest; > ?column? > ---------- >=20 >=20 > "b" > (3 rows) >=20 >=20 > select data #> '{"a"}' from jtest where (data #> '{"a"}')::text =3D 'b'; > ?column? > ---------- > (0 rows) >=20 >=20 > Am I doing a wrong conversion here, or is something else going on? If the > data in the database did not contain scalar values, then ->> works fine i= n > WHERE. The following is almost the query we are actually trying run > (checking for existence): Side not - please follow the example set by others and trim your quoting an= d bottom-post. As to your comment - the example above shows that the where clause is evaluating to: Where '"b"' =3D 'b' -- which is false. Maybe try #>> which is defined to return text instead of an object. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-9= 519-Allows-storing-scalar-json-but-fails-when-querying-tp5795417p5795538.ht= ml Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
On Tue, Mar 11, 2014 at 1:32 AM, Alf Kristian St=F8yle <alf.kristian@gmail.= com > wrote: > > > select data #> '{"a"}' from jtest where (data #> '{"a"}') =3D 'b'; > ERROR: operator does not exist: json =3D unknown > LINE 1: ...CT data #> '{"a"}' from jtest where (data #> '{"a"}') =3D 'b'; > ^ > HINT: No operator matches the given name and argument type(s). You might > need to add explicit type casts. > "#>>" returns text directly, just like ->> vs ->. > > select data #> '{"a"}' from jtest where (data #> '{"a"}')::text =3D 'b'; > ?column? > ---------- > (0 rows) > > > Am I doing a wrong conversion here, or is something else going on? > If you put the (data #> '{"a"}')::text construct in the select, you can see what is going on. pulling out the element as JSON quotes the value (if it not a number), because that is what JSON values are supposed to be, and then converting to text leaves the quotes in place. So you are comparing the 3 character '"b"' to the one character string 'b', and they are not equal. If you use #>>, it is pulled out as text in the first place and the quotes are not put on there. > If the data in the database did not contain scalar values, then ->> works > fine in WHERE. The following is almost the query we are actually trying r= un > (checking for existence): > > select data->>'a' from jtest where data->>'a' =3D 'b'; > ?column? > ---------- > b > (1 row) > > > Regarding the ->> operator, I think it is unfortunate behavior it fails > like that, I suppose we were expecting NULL behavior. However we are > working around this, so if you don't think this should change, then we ar= e > fine with that :) > I don't really have an opinion on that, it just isn't obvious which way is better--I can see times I would want either one. There are people who have thought about this much more deeply than I have, but they haven't shown up on this thread yet. (I think they are too busy over on the hackers list, arguing over what behavior the next generation of json operators should have.) You can create a new operator with the behavior you want. I would like some simple notation one could add to an operator or function invocation which means "catch errors and convert to null", as I have several plperl functions which I have created in two forms, one with an eval block and one without. It would be nice to have one function with a run-time notation to distinguish the behavior. Cheers, Jeff
Thank you for your help, and thorough explanation.
Creating a new operator, is way out of my league, and we now have a good way to work around our problems. So we are very pleased :)
I suppose if no one else complains, this shouldn't be fixed/changed, and you shouldn't spend time creating a new operator.Creating a new operator, is way out of my league, and we now have a good way to work around our problems. So we are very pleased :)
Alf
On 11 March 2014 17:03, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Mar 11, 2014 at 1:32 AM, Alf Kristian Støyle <alf.kristian@gmail.com> wrote:
select data #> '{"a"}' from jtest where (data #> '{"a"}') = 'b';
ERROR: operator does not exist: json = unknown
LINE 1: ...CT data #> '{"a"}' from jtest where (data #> '{"a"}') = 'b';
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts."#>>" returns text directly, just like ->> vs ->.
select data #> '{"a"}' from jtest where (data #> '{"a"}')::text = 'b';
?column?
----------
(0 rows)Am I doing a wrong conversion here, or is something else going on?If you put the (data #> '{"a"}')::text construct in the select, you can see what is going on. pulling out the element as JSON quotes the value (if it not a number), because that is what JSON values are supposed to be, and then converting to text leaves the quotes in place. So you are comparing the 3 character '"b"' to the one character string 'b', and they are not equal. If you use #>>, it is pulled out as text in the first place and the quotes are not put on there.If the data in the database did not contain scalar values, then ->> works fine in WHERE. The following is almost the query we are actually trying run (checking for existence):
select data->>'a' from jtest where data->>'a' = 'b';
?column?
----------
b
(1 row)Regarding the ->> operator, I think it is unfortunate behavior it fails like that, I suppose we were expecting NULL behavior. However we are working around this, so if you don't think this should change, then we are fine with that :)I don't really have an opinion on that, it just isn't obvious which way is better--I can see times I would want either one. There are people who have thought about this much more deeply than I have, but they haven't shown up on this thread yet. (I think they are too busy over on the hackers list, arguing over what behavior the next generation of json operators should have.)You can create a new operator with the behavior you want. I would like some simple notation one could add to an operator or function invocation which means "catch errors and convert to null", as I have several plperl functions which I have created in two forms, one with an eval block and one without. It would be nice to have one function with a run-time notation to distinguish the behavior.Cheers,Jeff