Thread: jsonb array-style subscripting
Hi,
Some time ago, array-style subscripting for the jsonb data type was discussed on this mailing list. I think it would be quite convenient to have such a nice syntax for updating jsonb objects, so I'm trying to implement it. I created a patch that allows doing something like this:
=# create TEMP TABLE test_jsonb_subscript (
id int,
test_json jsonb
);
=# insert into test_jsonb_subscript values
(1, '{}'),
(2, '{}');
=# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
=# select * from test_jsonb_subscript;
 id |         test_json
----+--------------------------
  1 | {"a": {"a1": {"a2": 42}}}
  2 | {"a": {"a1": {"a2": 42}}}
(2 rows)
=# select test_json['a']['a1'] from test_jsonb_subscript;
test_json
------------
{"a2": 42}
{"a2": 42}
(2 rows)
This patch has "work in progress" status, of course. Generally speaking, this implementation extends the `ArrayRef` usage to jsonb.
And I need some sort of advice about several questions:
* is it interesting for the community?
* is that a good idea to extend the `ArrayRef` for jsonb? If it's appropriate, probably we can rename it to `ArrayJsonbRef` or something.
* what can be improved in the code at the top level (function placement, probably, functionality duplication, etc.)?
* are there any special cases, that I should take care of in this implementation?
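Spelled out in Python, the UPDATE above behaves like the following sketch, where intermediate objects spring to life as needed (`deep_set` is a hypothetical helper for illustration only, not part of the patch):

```python
def deep_set(obj, keys, value):
    """Create intermediate objects on the way down, then assign --
    the same effect as the proposed
    UPDATE ... SET test_json['a']['a1']['a2'] = 42."""
    node = obj
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # vivify missing objects
    node[keys[-1]] = value
    return obj

row = {}  # corresponds to the '{}' rows inserted above
deep_set(row, ["a", "a1", "a2"], 42)
# row is now {"a": {"a1": {"a2": 42}}}
```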
Attachment
On 8/17/15 12:57 PM, Dmitry Dolgov wrote:
> * is it interesting for the community?

We definitely need better ways to manipulate JSON.

> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
> * what can be improved in the code at the top level (function placement,
> probably, functionality duplication, etc.)?
> * are there any special cases, that I should take care of in this
> implementation?

How would this work when you have a JSON array? Postgres array syntax
suddenly becoming key/value syntax for JSON seems like a pretty bad idea
to me. Could a different syntax (maybe {}) be used instead?

I'm not sure having the UPDATE you show cause objects to spring to life
is so great either.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Mon, Aug 17, 2015 at 12:57 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Hi,
>
> Some time ago the array-style subscripting for the jsonb data type was
> discussed in this mailing list. I think it will be quite convenient to have
> a such nice syntax to update jsonb objects, so I'm trying to implement this.
> I created a patch, that allows doing something like this:
>
> =# create TEMP TABLE test_jsonb_subscript (
>     id int,
>     test_json jsonb
> );
>
> =# insert into test_jsonb_subscript values
> (1, '{}'),
> (2, '{}');
>
> =# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
> =# select * from test_jsonb_subscript;
>  id |         test_json
> ----+--------------------------
>   1 | {"a": {"a1": {"a2": 42}}}
>   2 | {"a": {"a1": {"a2": 42}}}
> (2 rows)
>
> =# select test_json['a']['a1'] from test_jsonb_subscript;
>  test_json
> ------------
>  {"a2": 42}
>  {"a2": 42}
> (2 rows)
>
> This patch has a status "work in progress" of course. Generally speaking,
> this implementation extends the `ArrayRef` usage for the jsonb.
> And I need some sort of advice about several questions:
>
> * is it interesting for the community?
> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
> * what can be improved in the code at the top level (function placement,
> probably, functionality duplication, etc.)?
> * are there any special cases, that I should take care of in this
> implementation?

I'm not sure if this:

update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;

...is a good idea. postgres operators tend to return immutable copies
of the item they are referring to. In other words, you'd never see a
column operator on the 'left' side of the equals in an update
statement. I think you need to look at a function to get the behavior
you want:

update test_jsonb_subscript set test_json = jsonb_modify(test_json, '[a][a1][a2] = 42');

...as a hypothetical example. The idea is you need to make a function
that provides the ability to make the complete json you want. Update
statements always make a copy of the record anyways.

merlin
Dmitry Dolgov wrote:
> Some time ago the array-style subscripting for the jsonb data type was
> discussed in this mailing list. I think it will be quite convenient to have
> a such nice syntax to update jsonb objects, so I'm trying to implement
> this. I created a patch, that allows doing something like this:
>
> =# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;

FWIW we discussed exactly this during the unconference
https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Unconference#Direction_of_json_and_jsonb
and Andrew and Tom seemed okay with this syntax.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 17, 2015 at 12:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I'm not sure if this:
> update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
>
> ...is a good idea.

This kind of "targetlist indirection" is already possible with arrays
and composite types.

--
Peter Geoghegan
On 08/17/2015 03:26 PM, Merlin Moncure wrote:
> I'm not sure if this:
> update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
>
> ...is a good idea. postgres operators tend to return immutable copies
> of the item they are referring to. In other words, you'd never see a
> column operator on the 'left' side of the equals in an update
> statement. I think you need to look at a function to get the behavior
> you want:
>
> update test_jsonb_subscript set test_json = jsonb_modify(test_json,
> '[a][a1][a2] = 42');
>
> ...as a hypothetical example. The idea is you need to make a
> function that provides the ability to make the complete json you want.
> Update statements always make a copy of the record anyways.

Why should jsonb be different from an array? You can assign to an array
element, using exactly this syntax, except that the index expressions
have to be integers.

This was discussed at pgcon and generally met with approval. There is
some demand for it. If Dmitry hadn't done this I would probably have
done it myself.

cheers

andrew
On Mon, Aug 17, 2015 at 12:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> ...is a good idea. postgres operators tend to return immutable copies
> of the item they are referring to.

This patch does not add an operator at all, actually. It feels like
there ought to be an operator, but in fact there is not. The parser is
hard-coded to recognize array-style subscripts, which this uses.

While I'm certainly glad that Dmitry took the time to work on this, I
think we will need an operator, too. Or, more accurately, there should
probably be a way to make something like this use some available GIN
index:

postgres=# explain analyze select * from testjsonb where p['a'] = '[1]';
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on testjsonb  (cost=0.00..27.00 rows=7 width=32) (actual
time=0.022..0.023 rows=1 loops=1)
   Filter: (p['a'] = '[1]'::jsonb)
 Planning time: 0.070 ms
 Execution time: 0.054 ms
(4 rows)

This doesn't really matter with arrays, but ISTM that it matters here.
I have no strong feelings on how it should work, but certain things do
seem to suggest themselves. For example, maybe the parser can be made
to create a query tree that uses an indexable operator based on
special-case logic. Although maybe that's a kludge too far, since I
can imagine it breaking other legitimate things. My sense is that this
will need to be discussed.

--
Peter Geoghegan
On 08/17/2015 10:57 AM, Dmitry Dolgov wrote:
> Hi,
>
> Some time ago the array-style subscripting for the jsonb data type was
> discussed in this mailing list. I think it will be quite convenient to
> have a such nice syntax to update jsonb objects, so I'm trying to
> implement this. I created a patch, that allows doing something like this:

Yaaay!

> =# create TEMP TABLE test_jsonb_subscript (
>     id int,
>     test_json jsonb
> );
>
> =# insert into test_jsonb_subscript values
> (1, '{}'),
> (2, '{}');
>
> =# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
> =# select * from test_jsonb_subscript;
>  id |         test_json
> ----+--------------------------
>   1 | {"a": {"a1": {"a2": 42}}}
>   2 | {"a": {"a1": {"a2": 42}}}
> (2 rows)

So, both perl and python do not allow "deep nesting" of assignments.
For example:

>>> d = { "a" : { } }
>>> d["a"]["a1"]["a2"] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a1'

... you have to append one key level at a time. Your approach, on the
other hand, feels more user-friendly to me; I can't tell you the number
of "if 'a2' in dic[key]" tests I've written. So, is there any reason
why consistency with perl/python behavior would be more desirable than
user-friendliness? I'm thinking no, but figured that it's something
which needs to come up.

There is one ambiguous case you need to address:

testjson = '{ "a" : { } }'
SET testjson['a']['a1']['1'] = 42

... so in this case, is '1' a key, or the first item of an array? How
do we determine that? How does the user assign something to an array?

> =# select test_json['a']['a1'] from test_jsonb_subscript;
>  test_json
> ------------
>  {"a2": 42}
>  {"a2": 42}
> (2 rows)

Again, how do we handle missing keys? Just return NULL? or ERROR? I'd
prefer the former, but there will be arguments the other way.

> This patch has a status "work in progress" of course. Generally
> speaking, this implementation extends the `ArrayRef` usage for the jsonb.
> And I need some sort of advice about several questions:
>
> * is it interesting for the community?
> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
> * what can be improved in the code at the top level (function placement,
> probably, functionality duplication, etc.)?
> * are there any special cases, that I should take care of in this
> implementation?

array/key ambiguity is going to be painful.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, Aug 17, 2015 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 08/17/2015 03:26 PM, Merlin Moncure wrote:
>> I'm not sure if this:
>> update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
>>
>> ...is a good idea. postgres operators tend to return immutable copies
>> of the item they are referring to. In other words, you'd never see a
>> column operator on the 'left' side of the equals in an update
>> statement. I think you need to look at a function to get the behavior
>> you want:
>>
>> update test_jsonb_subscript set test_json = jsonb_modify(test_json,
>> '[a][a1][a2] = 42');
>>
>> ...as a hypothetical example. The idea is you need to make a
>> function that provides the ability to make the complete json you want.
>> Update statements always make a copy of the record anyways.
>
> Why should jsonb be different from an array? You can assign to an array
> element, using exactly this syntax, except that the index expressions have
> to be integers.
>
> This was discussed at pgcon and generally met with approval. There is some
> demand for it. If Dmitry hadn't done this I would probably have done it
> myself.

Yeah, this all makes sense. I withdraw the statement.

merlin
On 8/17/15 3:33 PM, Josh Berkus wrote:
> Again, how do we handle missing keys? Just return NULL? or ERROR? I'd
> prefer the former, but there will be arguments the other way.

I've been wondering if we should add some kind of "strict" JSON. My big
concern is throwing an error if you try to provide duplicate keys, but
it seems reasonable that json_strict would throw an error if you try to
reference something that doesn't exist.

>> This patch has a status "work in progress" of course. Generally
>> speaking, this implementation extends the `ArrayRef` usage for the jsonb.
>> And I need some sort of advice about several questions:
>>
>> * is it interesting for the community?
>> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
>> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
>> * what can be improved in the code at the top level (function placement,
>> probably, functionality duplication, etc.)?
>> * are there any special cases, that I should take care of in this
>> implementation?
> array/key ambiguity is going to be painful.

JSON keys are required to be strings, so maybe it's OK to differentiate
based on whether the index is a string or a number. Or perhaps we use
different nomenclature (ie: {} for objects).

We should also think about what would work for hstore. Adding this for
hstore is clearly separate work, but it'd be bad to close that door here.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> * is that a good idea to extend the `ArrayRef` for jsonb?

No. Make a new expression node type. (Salesforce did something similar
for an internal feature, and it was a disaster both for code modularity
and performance. We had to change it to a separate node type, which I
just got finished doing. Don't go down that path. While you're at it,
I'd advise that fetch and assignment be two different node types rather
than copying ArrayRef's bad precedent of using only one.)

			regards, tom lane
On 08/17/2015 02:18 PM, Jim Nasby wrote:
> On 8/17/15 3:33 PM, Josh Berkus wrote:
>> Again, how do we handle missing keys? Just return NULL? or ERROR? I'd
>> prefer the former, but there will be arguments the other way.
>
> I've been wondering if we should add some kind of "strict" JSON. My big
> concern is throwing an error if you try to provide duplicate keys, but
> it seems reasonable that json_strict would throw an error if you try to
> reference something that doesn't exist.

Only if there's demand for it. Is there?

>> array/key ambiguity is going to be painful.
>
> JSON keys are required to be strings, so maybe it's OK to differentiate
> based on whether the index is a string or a number. Or perhaps we use
> different nomenclature (ie: {} for objects).

Well, we did get rid of all of those implicit conversions for a reason.
So maybe that's good enough? i.e.

json['a']['b'][1] = 5
	"assign 5 as the first element in the array 'b' of object 'a'"

json['a']['b']['1'] = 5
	"assign 5 to key '1' of object 'b' of object 'a'"

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
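The disambiguation rule Josh sketches here — an integer subscript means an array index, a string subscript means an object key — can be illustrated with a short Python sketch (`subscript_set` is a hypothetical helper showing the proposed rule, not anything from the patch):

```python
def subscript_set(doc, path, value):
    """Choose container type by subscript type: int subscripts
    create/index lists, str subscripts create/index dicts."""
    node = doc
    for key, nxt in zip(path, path[1:]):
        if isinstance(key, int):
            while len(node) <= key:  # extend the array as needed
                node.append([] if isinstance(nxt, int) else {})
            node = node[key]
        else:
            node = node.setdefault(key, [] if isinstance(nxt, int) else {})
    last = path[-1]
    if isinstance(last, int):
        while len(node) <= last:
            node.append(None)
    node[last] = value
    return doc

doc1 = {}
subscript_set(doc1, ["a", "b", 0], 5)    # int 0 -> array element
# doc1 == {"a": {"b": [5]}}

doc2 = {}
subscript_set(doc2, ["a", "b", "1"], 5)  # str '1' -> object key
# doc2 == {"a": {"b": {"1": 5}}}
```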
Peter Geoghegan <pg@heroku.com> writes:
> This patch does not add an operator at all, actually. If feels like
> there ought to be an operator, but in fact there is not. The parser is
> hard-coded to recognize array-style subscripts, which this uses.

> While I'm certainly glad that Dmitry took the time to work on this, I
> think we will need an operator, too. Or, more accurately, there should
> probably be a way to make something like this use some available GIN
> index:

> postgres=# explain analyze select * from testjsonb where p['a'] = '[1]';

Hm. There is definitely attraction to the idea that x[y] is some sort
of operator-like syntactic sugar for invocation of an underlying
function, rather than necessarily a hard-coded behavior. That would
provide a framework for extending the feature to all container-like
datatypes, whereas the approach Dmitry has prototyped doesn't look like
it offers much help at all for other datatypes.

But I do not think that that will change things very much as far as
making it possible for things like the above to be indexed. You'd
still have an expression tree that's a nest of two operators, which
doesn't fit into our ideas about index opclasses. (That is, it still
has to be a special case, so it doesn't matter that much if one of the
nodes is some kind of NotAnArrayRef rather than a FuncExpr with some
new CoercionForm value to make it print differently.) Also, the
syntactic sugar would have to be able to deal with multiple subscripts,
which makes things a lot more complicated.

			regards, tom lane
On 2015-08-17 22:33, Josh Berkus wrote:
> So, both perl and python do not allow "deep nesting" of assignments.
> For example:
>>>> d = { "a" : { } }
>>>> d["a"]["a1"]["a2"] = 42
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'a1'

Not sure I understand what you mean. In Perl you'd do

$ perl -e '%d = (a => {}); $d{a}{a1}{a2} = 42; print $d{a}{a1}{a2}'
42

which looks pretty much like what's proposed.

/kaare
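For comparison, the Perl one-liner above has a rough Python counterpart using `collections.defaultdict` from the standard library — intermediate dicts spring into existence on first access, much like Perl's autovivification (a stdlib trick for illustration, unrelated to the patch):

```python
from collections import defaultdict

def autoviv():
    """Nested defaultdict: accessing a missing key creates a new
    level, analogous to Perl's autovivification."""
    return defaultdict(autoviv)

d = autoviv()
d["a"]["a1"]["a2"] = 42    # no KeyError; intermediate levels appear
print(d["a"]["a1"]["a2"])  # prints 42
```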
Hi
2015-08-17 21:12 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 8/17/15 12:57 PM, Dmitry Dolgov wrote:
>> * is it interesting for the community?
>
> We definitely need better ways to manipulate JSON.
>
>> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
>> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
>> * what can be improved in the code at the top level (function placement,
>> probably, functionality duplication, etc.)?
>> * are there any special cases, that I should take care of in this
>> implementation?
>
> How would this work when you have a JSON array? Postgres array syntax
> suddenly becoming key/value syntax for JSON seems like a pretty bad idea
> to me. Could a different syntax (maybe {}) be used instead?
I don't understand why '{}' should be better than '[]'?

A lot of modern languages don't differentiate between arrays and hashes.
Regards
Pavel
> I'm not sure having the UPDATE you show cause objects to spring to life
> is so great either.
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Aug 17, 2015 at 11:26 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Aug 17, 2015 at 12:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> ...is a good idea. postgres operators tend to return immutable copies
>> of the item they are referring to.
>
> This patch does not add an operator at all, actually. If feels like
> there ought to be an operator, but in fact there is not. The parser is
> hard-coded to recognize array-style subscripts, which this uses.
>
> While I'm certainly glad that Dmitry took the time to work on this, I
> think we will need an operator, too. Or, more accurately, there should
> probably be a way to make something like this use some available GIN
> index:
>
> postgres=# explain analyze select * from testjsonb where p['a'] = '[1]';
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Seq Scan on testjsonb  (cost=0.00..27.00 rows=7 width=32) (actual
> time=0.022..0.023 rows=1 loops=1)
>    Filter: (p['a'] = '[1]'::jsonb)
>  Planning time: 0.070 ms
>  Execution time: 0.054 ms
> (4 rows)
>
> This doesn't really matter with arrays, but ISTM that it matters here.
> I have no strong feelings on how it should work, but certain things do
> seem to suggest themselves. For example, maybe the parser can be made
> to create a query tree that uses an indexable operator based on
> special-case logic. Although maybe that's a kludge too far, since I
> can imagine it breaking other legitimate things. My sense is that this
> will need to be discussed.

Peter, we are thinking about better indexing of subselects, let's first
have the syntax sugar in core, which Dmitry implemented.

> --
> Peter Geoghegan
On 08/18/2015 01:32 AM, Pavel Stehule wrote:
> Hi
>
> 2015-08-17 21:12 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>> On 8/17/15 12:57 PM, Dmitry Dolgov wrote:
>>> * is it interesting for the community?
>>
>> We definitely need better ways to manipulate JSON.
>>
>>> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
>>> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
>>> * what can be improved in the code at the top level (function placement,
>>> probably, functionality duplication, etc.)?
>>> * are there any special cases, that I should take care of in this
>>> implementation?
>>
>> How would this work when you have a JSON array? Postgres array syntax
>> suddenly becoming key/value syntax for JSON seems like a pretty bad idea
>> to me. Could a different syntax (maybe {}) be used instead?
>
> I don't understand why '{}' should be better than '[]' ?
>
> The lot of modern languages doesn't different between arrays and hash.

What is more, it would be a lot more intrusive to the parser, I suspect.

We currently allow json path expressions to contain both object key and
array index values, and I don't see any reason to do this differently.
This is the syntax that was discussed at pgcon.

cheers

andrew
On 08/18/2015 01:11 AM, Kaare Rasmussen wrote:
> On 2015-08-17 22:33, Josh Berkus wrote:
>> So, both perl and python do not allow "deep nesting" of assignments.
>> For example:
>>>>> d = { "a" : { } }
>>>>> d["a"]["a1"]["a2"] = 42
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> KeyError: 'a1'
>
> Not sure I understand what you mean. In Perl you'd do
>
> $ perl -e '%d = (a => {}); $d{a}{a1}{a2} = 42; print $d{a}{a1}{a2}'
> 42
>
> which looks pretty much like what's proposed.

Indeed, I mentioned recently that perl auto-vivifies intermediate paths
like this. But we don't do that in jsonb_set, as was discussed back in
May, and as JS doesn't either I think that decision is sound.

cheers

andrew
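The distinction Andrew draws — assignment that does not create intermediate paths — is how plain Python dicts already behave. A minimal sketch of non-vivifying assignment follows (assumed, simplified semantics for illustration; jsonb_set's actual handling of a missing path may differ):

```python
def set_path_strict(doc, path, value):
    """Assign only when every intermediate container already exists;
    a missing intermediate key raises KeyError instead of being
    created (contrast with the auto-vivifying UPDATE proposal)."""
    node = doc
    for key in path[:-1]:
        node = node[key]  # KeyError if an intermediate key is missing
    node[path[-1]] = value
    return doc

doc = {"a": {}}
set_path_strict(doc, ["a", "b"], 1)           # ok: 'a' exists
try:
    set_path_strict(doc, ["a", "x", "y"], 2)  # 'x' missing -> KeyError
except KeyError:
    pass  # doc is left unchanged
```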
On 8/17/15 4:25 PM, Josh Berkus wrote:
> On 08/17/2015 02:18 PM, Jim Nasby wrote:
>> On 8/17/15 3:33 PM, Josh Berkus wrote:
>>> Again, how do we handle missing keys? Just return NULL? or ERROR? I'd
>>> prefer the former, but there will be arguments the other way.
>>
>> I've been wondering if we should add some kind of "strict" JSON. My big
>> concern is throwing an error if you try to provide duplicate keys, but
>> it seems reasonable that json_strict would throw an error if you try to
>> reference something that doesn't exist.
>
> Only if there's demand for it. Is there?

I'm certainly worried (paranoid?) about it. Postgres is very good about
not silently dropping data and this seems a glaring departure from
that. I haven't looked yet but I'm hoping this could at least be added
as an extension without duplicating a bunch of the existing JSON stuff.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 08/20/2015 03:24 PM, Jim Nasby wrote:
> On 8/17/15 4:25 PM, Josh Berkus wrote:
>> On 08/17/2015 02:18 PM, Jim Nasby wrote:
>>> On 8/17/15 3:33 PM, Josh Berkus wrote:
>>>> Again, how do we handle missing keys? Just return NULL? or ERROR? I'd
>>>> prefer the former, but there will be arguments the other way.
>>>
>>> I've been wondering if we should add some kind of "strict" JSON. My big
>>> concern is throwing an error if you try to provide duplicate keys, but
>>> it seems reasonable that json_strict would throw an error if you try to
>>> reference something that doesn't exist.
>>
>> Only if there's demand for it. Is there?
>
> I'm certainly worried (paranoid?) about it. Postgres is very good
> about not silently dropping data and this seems a glaring departure
> from that. I haven't looked yet but I'm hoping this could at least be
> added as an extension without duplicating a bunch of the existing JSON
> stuff.

What on earth does this have to do with duplicate keys? The jsonb input
rules on duplicate keys are very clear, incidentally, and follow the
practice of most JSON processors (duplicates are discarded, last one
wins.) But that has nothing at all to do with this feature, really.

This feature can NOT be added as an extension, since it requires
grammar processing modification.

cheers

andrew
On 08/20/2015 12:24 PM, Jim Nasby wrote:
> On 8/17/15 4:25 PM, Josh Berkus wrote:
>> On 08/17/2015 02:18 PM, Jim Nasby wrote:
>>> On 8/17/15 3:33 PM, Josh Berkus wrote:
>>>> Again, how do we handle missing keys? Just return NULL? or ERROR? I'd
>>>> prefer the former, but there will be arguments the other way.
>>>
>>> I've been wondering if we should add some kind of "strict" JSON. My big
>>> concern is throwing an error if you try to provide duplicate keys, but
>>> it seems reasonable that json_strict would throw an error if you try to
>>> reference something that doesn't exist.
>>
>> Only if there's demand for it. Is there?
>
> I'm certainly worried (paranoid?) about it. Postgres is very good about
> not silently dropping data and this seems a glaring departure from that.
> I haven't looked yet but I'm hoping this could at least be added as an
> extension without duplicating a bunch of the existing JSON stuff.

There's a big difference between silently dropping data and implicitly
creating it. As I said, the only reason I could see wanting a strict
mode is because AppDev users expect it to be consistent with their
programming languages. Otherwise, from a user perspective, being able
to create a whole nested chain in one statement is a big win.

What could be added as an extension?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 8/20/15 3:44 PM, Josh Berkus wrote:
> What could be added as an extension?

A method for preventing duplicate object keys. Since I'm in the
minority here let's just drop it. :)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com