Further issues with jsonb semantics, documentation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Further issues with jsonb semantics, documentation
Date
Msg-id CAM3SWZQFSWMi2aVi-Lun_JBYh-RfHQ3-0fm8TXpW8OLc+v8ZnQ@mail.gmail.com
Whole thread Raw
Responses Re: Further issues with jsonb semantics, documentation  (Peter Geoghegan <pg@heroku.com>)
Re: Further issues with jsonb semantics, documentation  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
I've noticed some more issues with the jsonb documentation, and the
new jsonb stuff generally. I didn't set out to give Andrew feedback on
the semantics weeks after feature freeze, but unfortunately this feels
like another discussion that we need to have now rather than later.

"operator jsonb - integer"
===================

Summary: I think that this operator has a problem, but a problem that
can easily be fixed.


I think it was a bad idea to allow array-style removal of object
key/value pairs. ISTM that it implies a level of stability in the
ordering that doesn't make sense. Besides, is it really all that
useful?

Consider this case:

postgres=# select '{"c":5, "a":6, "b":7}'::jsonb - 1;
     ?column?
------------------
 {"a": 6, "c": 5}
(1 row)

Clearly anyone expecting the value "a" to be removed here would be in
for a surprise. Moreover, it is inconsistent with the established
behavior of the corresponding array-wise subscript operator:

postgres=# select '{"c":5, "a":6, "b":7}'::jsonb -> 1;
 ?column?
----------
 [null]
(1 row)

I suggest, given that this is conceptually a data-modifying operator,
that the minus operator/jsonb_delete() case raise an error rather than
matching "operator jsonb -> integer" and returning NULL. I say this as
the person who successfully argued that the -> operator case above
should return NULL during the 9.4 beta period; returning SQL NULL for
the delete/minus operator feels like going too far in the direction of
permissiveness, even for jsonb; my expression index argument does not
apply here as it did for the "operator jsonb -> integer" case.

"operator jsonb - text"
================

Summary: I think that this operator is fine.


Documentation needs work, though. The "operator jsonb - text" operator
ought to be documented as in the attached patch, which is closer to
the equivalent hstore operator, and emphasizes the "operator jsonb ?
text" definition of a key. It should emphasize its similarity to the
established "operator jsonb ? text" operator, and in particular that
array elements behave as keys *iff* they're strings.

"operator jsonb - text[]" (and *nested* deletion more generally)
===============================================

Summary: I think that this operator has many problems, and should be
scraped (although only as an operator). IMV nested deletion should
only be handled by functions, and the way that nested deletion works
in general should be slightly adjusted.


The new "operator jsonb - text[]" operator is confusingly inconsistent with:

A) "operator jsonb text"

and:

B) the established "operator hstore - text[]" operator, since that
operator deletes all key/value pairs that have keys that match any of
the right operand text array values. In contrast, this new operator is
passed as its right operand an array of text elements that constitute
a "path" (so the order in the rhs text[] operand matters). If the text
element in the rhs text[] operand happens to be what would pass for a
Postgres integer literal, it can be used to traverse lhs array values
through subscripting at that nesting level.

Regarding nested deletion behavior more generally, consider this
example of how this can work out badly:

postgres=# select jsonb_delete(jsonb_set('["a"]', '{5}', '"b"'), '{5}')  ;
 jsonb_delete
--------------
 ["a", "b"]
(1 row)

Here, we're adding and then deleting an array element at offset 5 (the
string "b"). But the element is never deleted by the outer
jsonb_delete(), because we can't rely on the element actually being
stored at offset 5. Seems a bit fragile.

More importantly, consider the inconsistency with "operator jsonb
text" ("point A" above):

postgres=# select '["a"]'::jsonb  ?| '{a}'::text[]; -- historic/9.4 behavior
 ?column?
----------
 t
(1 row)

postgres=# select '["a"]'::jsonb  - '{a}'::text[]; -- new to 9.5
operator, does not delete!
 ?column?
----------
 ["a"]
(1 row)

Perhaps most questionably of all, the non-array based minus/delete
operator (which I like) *does* have the same idea of matching a key as
the established "operator jsonb ?| text[]" operator (and "operator
jsonb ? text", etc):

postgres=# select '["a"]'::jsonb  - 'a'::text; -- new to 9.5 operator,
*does* delete!
 ?column?
----------
 []
(1 row)

This conceptual model for manipulating jsonb is entirely new and novel
to this new operator "operator text[]" (and jsonb_set()).

"operator jsonb - text[]" categorization/conceptual model
==========================================

Operators like the established "operator jsonb -> integer" operator (a
jsonb "array-wise" operator) always seemed okay to me because the rhs
operand really was a Postgres integer, and because it's explicitly an
array-wise operator (just like "operator -> text" is explicitly
object-wise). But now, with these new operators, you've added a
"shadow type" system to certain rhs text[] operands, consisting of
types not explicitly delineated by JSON-style double quotes (for
strings, say). So there is kind of a second shadow type system in
play, similar to that of jsonb except that text[] "shadow types"
cannot be distinguished -- '{0}'::text[] could be intended to affect
an array or an object with the key value "0" in one of its pairs.

Accordingly, I would vote for changing this, and making the nested
deletion stuff only care about object key values and array *string*
elements. This backs out of the idea of a new "shadow type" system for
text arrays. I think that this is conceptually a lot cleaner, while
actually not being significantly less useful for most use cases
(nesting tends to involve only objects anyway). As noted in my summary
of the previous section, I would also vote for scraping "operator
jsonb - text[]" as a jsonb_delete() wrapper (see closing summary below
for more on why).

While I appreciate that Andrew wanted to make deletion as flexible as
possible, these inconsistencies feel arbitrary to me.

Closing summary
=============

At the very least, some of these new jsonb operators need to decide if they're:

1) Specifically "array-wise" or "object-wise", like the existing
operators "operator -> integer", or  "operator -> text".

or:

2) "Key" operators.  Operators that share the "operator jsonb ? text"
idea of a key, and operate on jsonb datums accordingly. "Keys" here
include array elements that are strings.

As I've said, I prefer option 2 for deletion, which is why I like
"operator jsonb - text" but not "operator jsonb - text[]", but either
way it needs to be *a lot* clearer than it is at the moment.

You might also say that there is a category 3 jsonb operator, of which
the containment operator ("operator @>") is the best example. It takes
jsonb on the rhs, and so naturally cares about types including
container types on the lhs. Clearly these new operators are far enough
away from category 3 that we cannot really contemplate moving them
into category 3, particularly post feature freeze (even though, as I
said, I think that would be a superior approach).

Some of what I've said here is just my opinion, but I feel pretty
confident that for example we don't want to add a fourth "operator
category" to jsonb, a weird hybrid between category 2 and category 3.
Having 3 "operator categories" seems quite enough. And by making
*nested* deletion only possible through a function call to
jsonb_delete() (by scrapping "operator jsonb - text[]" as I suggest),
you also avoid having an *operator* that is still somewhat not like
other operators in category 2 (since the other operators in category 2
don't care about nesting). That feels cleaner -- IMV the oddness of
walking a path based on a text[] value ought to be confined to a well
named function. Plus you can then add a new "operator jsonb - text[]"
that matches "operator hstore - text[]" and comports with the existing
"operator jsonb - text".

If you don't like my taxonomy for jsonb operators, then feel free to
come up with your own. As things stand, it is hard to describe a
taxonomy that makes the operators easy to understand and
non-surprising -- there'd be more exceptions than rules. I feel we
need to be disciplined about this stuff, or jsonb will become much
harder to use than it needs to be.

Opinions?
--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Add error handling to byteaout.
Next
From: Amit Kapila
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file