Thread: Avoiding possible future conformance headaches in JSON work
Hi, We had a short conversation about this on Friday but I didn't have time to think of a constructive suggestion, and now I've had more time to think about it. Regarding the proposed PG 13 jsonpath extensions (array, map, and sequence constructors, lambdas, map/fold/reduce, user-defined functions), literally all this stuff is in XPath/XQuery 3.1, and clearly the SQL committee is imitating XPath/XQuery in the design of jsonpath. Therefore it would not be surprising at all if the committee eventually adds those features in jsonpath. At that point, if the syntax matches what we've added, we are happy, and if not, we have a multi-year, multi-release, standard_conforming_strings-style headache. So, a few ideas fall out.... First, with Peter being a participant, if there are any rumblings in the SQL committee about adding those features, we should know the proposed syntax as soon as we can and try to follow that. If such rumblings are entirely absent, we should see what we can do to start some, proposing the syntax we've got. In either case, perhaps we should immediately add a way to identify a jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can be accepted at the start in addition to any lax/strict, so you could have 'pg lax $.map(x => x + 10)'. If we initially /require/ 'pg' for the extensions to be recognized, then we can relax the requirement for whichever ones later appear in the spec using the same syntax. If they appear in the spec with a different syntax, then by requiring 'pg' already for our variant, we already have avoided the standard_conforming_strings kind of multi-release reconciliation effort. In the near term, there is already one such potential conflict in 12beta: the like_regex using POSIX REs instead of XQuery ones as the spec requires. Of course we don't currently have an XQuery regex engine, but if we ever have one, we then face a headache if we want to move jsonpath toward using it. (Ties in to conversation [1].) Maybe we could avoid that by recognizing now an extra P in flags, to specify a POSIX re. Or, as like_regex has a named-parameter-like syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be an extra keyword in that grammar: like-regex("abc" posix). That would be safe from the committee adding a P flag that means something else. The conservative approach would be to simply require the 'posix' keyword in all cases now, simply because we don't have the XQuery regex engine. Alternatively, if there's a way to analyze a regex for the use of any constructs with different meanings in POSIX and XQuery REs (and if that's appreciably easier than writing an XQuery regex engine), then the 'posix' keyword could be required only when it matters. But the conservative approach sounds easier, and sufficient. The finer-grained analysis would have to catch not just constructs that are in one RE style and not the other, but any subtleties in semantics, and I certainly wouldn't trust myself to write that. -Chap [1] https://www.postgresql.org/message-id/5CF2754F.7000702%40anastigmatix.net
On 2019-Jun-01, Chapman Flack wrote: > In either case, perhaps we should immediately add a way to identify a > jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can > be accepted at the start in addition to any lax/strict, so you could > have 'pg lax $.map(x => x + 10)'. > > If we initially /require/ 'pg' for the extensions to be recognized, then > we can relax the requirement for whichever ones later appear in the spec > using the same syntax. If they appear in the spec with a different > syntax, then by requiring 'pg' already for our variant, we already have > avoided the standard_conforming_strings kind of multi-release > reconciliation effort. I agree we should do this (or something similar) now, to avoid future pain. It seems a similar problem to E'' strings vs. SQL-standard ''-ones, which was a painful transition. We have an opportunity to do better this time. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 1 Jun 2019, 16:41 Chapman Flack, <chap@anastigmatix.net> wrote:
Hi,
We had a short conversation about this on Friday but I didn't have time
to think of a constructive suggestion, and now I've had more time to
think about it.
Regarding the proposed PG 13 jsonpath extensions (array, map, and
sequence constructors, lambdas, map/fold/reduce, user-defined
functions), literally all this stuff is in XPath/XQuery 3.1, and
clearly the SQL committee is imitating XPath/XQuery in the design
of jsonpath.
Therefore it would not be surprising at all if the committee eventually
adds those features in jsonpath. At that point, if the syntax matches
what we've added, we are happy, and if not, we have a multi-year,
multi-release, standard_conforming_strings-style headache.
So, a few ideas fall out....
First, with Peter being a participant, if there are any rumblings in the
SQL committee about adding those features, we should know the proposed
syntax as soon as we can and try to follow that.
AFAIK, there is rumour about 'native json data type' and 'dot style syntax' for json, but not about jsonpath.
If such rumblings are entirely absent, we should see what we can do to
start some, proposing the syntax we've got.
In either case, perhaps we should immediately add a way to identify a
jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can
be accepted at the start in addition to any lax/strict, so you could
have 'pg lax $.map(x => x + 10)'.
This is exactly what we were thinking about !
If we initially /require/ 'pg' for the extensions to be recognized, then
we can relax the requirement for whichever ones later appear in the spec
using the same syntax. If they appear in the spec with a different
syntax, then by requiring 'pg' already for our variant, we already have
avoided the standard_conforming_strings kind of multi-release
reconciliation effort.
In the near term, there is already one such potential conflict in
12beta: the like_regex using POSIX REs instead of XQuery ones as the
spec requires. Of course we don't currently have an XQuery regex
engine, but if we ever have one, we then face a headache if we want to
move jsonpath toward using it. (Ties in to conversation [1].)
Maybe we could avoid that by recognizing now an extra P in flags, to
specify a POSIX re. Or, as like_regex has a named-parameter-like
syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be
an extra keyword in that grammar: like-regex("abc" posix). That would
be safe from the committee adding a P flag that means something else.
The conservative approach would be to simply require the 'posix' keyword
in all cases now, simply because we don't have the XQuery regex engine.
Alternatively, if there's a way to analyze a regex for the use of any
constructs with different meanings in POSIX and XQuery REs (and if
that's appreciably easier than writing an XQuery regex engine), then
the 'posix' keyword could be required only when it matters. But the
conservative approach sounds easier, and sufficient. The finer-grained
analysis would have to catch not just constructs that are in one RE
style and not the other, but any subtleties in semantics, and I
certainly wouldn't trust myself to write that.
We didn't think about regex, I don't know anybody working on xquery.
On 6/18/19 12:51 PM, Oleg Bartunov wrote: >> have 'pg lax $.map(x => x + 10)'. > > This is exactly what we were thinking about ! Perfect! >> specify a POSIX re. Or, as like_regex has a named-parameter-like >> syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be >> an extra keyword in that grammar: like-regex("abc" posix). That would >> be safe from the committee adding a P flag that means something else. > > We didn't think about regex, I don't know anybody working on xquery. I do. :) But is that even the point? It's already noted in [1] that the standard calls for one style of regexps and we're providing another. It's relatively uncomplicated now to add some kind of distinguishing syntax for our posix flavor of like_regex. Yes, it would be a change between beta1 and final release, but that doesn't seem unheard-of. In contrast, if such a distinction is not added now, we know that will be a headache for any future effort to more closely conform to the standard. Whether such a future effort seems near-term or far off, it doesn't seem strategic to make current choices that avoidably make it harder. Aside: I just looked over the 12 doco to see if the note in [1] is in there, and all I see is that 'like_regex' is documented as "Tests pattern matching with POSIX regular expressions." In my opinion, that ought to have a note flagging that as different from the standard. The user experience is not so good if someone comes assuming we conform to the standard, writes code, then has to learn why it didn't work. The whole doc section [2] about XML is intended to spare people from unwelcome discoveries of that sort, but it was written after the fact. I think it's better to have it from the start. [1] https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#sqljson-conformance [2] https://www.postgresql.org/docs/12/xml-limits-conformance.html
Chapman Flack <chap@anastigmatix.net> writes: > But is that even the point? It's already noted in [1] that the standard > calls for one style of regexps and we're providing another. > It's relatively uncomplicated now to add some kind of distinguishing > syntax for our posix flavor of like_regex. Yes, it would be a change > between beta1 and final release, but that doesn't seem unheard-of. > In contrast, if such a distinction is not added now, we know that will > be a headache for any future effort to more closely conform to the > standard. Whether such a future effort seems near-term or far off, it > doesn't seem strategic to make current choices that avoidably make it > harder. Just to not leave this thread hanging --- the discussion was picked up in this other thread: https://www.postgresql.org/message-id/flat/CAPpHfdvDci4iqNF9fhRkTqhe-5_8HmzeLt56drH%2B_Rv2rNRqfg%40mail.gmail.com and I think we've come to the conclusion that the only really awful regex compatibility problem is differing interpretations of the 'x' flag, which we solved temporarily by treating that as unimplemented in jsonpath. There are some other unimplemented features that we can consider adding later, too. (Fortunately, Spencer's engine throws errors for all of those, so adding them won't create new compatibility issues.) And we did add some documentation: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0a97edb12ec44f8d2d8828cbca6dd7639408ac88 There remains the question of whether we should do something like requiring a "pg" prefix to allow access to the other nonstandard features we added to jsonpath. I see the point that the SQL committee might well add something pretty similar in future. But I'm not too concerned about that, on two grounds: (1) the same argument could be raised against *every* non-spec feature we have or ever will have; (2) now that Peter's in on SQL committee deliberations, we have a chance to push for any future spec changes to not be unnecessarily incompatible. So my inclination is to close this open item as sufficiently done, once the minor lexer issues raised in the other thread are done. regards, tom lane
On 09/19/19 18:35, Tom Lane wrote: > There remains the question of whether we should do something like > requiring a "pg" prefix to allow access to the other nonstandard > features we added to jsonpath. I see the point that the SQL committee > might well add something pretty similar in future. But I'm not too > concerned about that, on two grounds: (1) the same argument could be > raised against *every* non-spec feature we have or ever will have; This should not be read as a violent objection, but I do think that point (1) glosses over a, well, possibly salient difference in likelihood: Sure, against *every* non-spec feature we have or ever will have, someone /could/ raise a generic "what if SQL committee might add something pretty similar in future". But what we have in this case are specific non-spec features (array, map, and sequence constructors, lambdas, map/fold/reduce, user-defined functions) that are flat-out already present in the current version of the language that the SQL committee is clearly modeling jsonpath on. That might raise the likelihood of collision in this case above its usual, universal cosmic background level. Regards, -Chap
Chapman Flack <chap@anastigmatix.net> writes: > Sure, against *every* non-spec feature we have or ever will have, someone > /could/ raise a generic "what if SQL committee might add something pretty > similar in future". > But what we have in this case are specific non-spec features (array, map, > and sequence constructors, lambdas, map/fold/reduce, user-defined > functions) that are flat-out already present in the current version of > the language that the SQL committee is clearly modeling jsonpath on. Sure. But we also modeled those features on the same language that the committee is looking at (or at least I sure hope we did). So it's reasonable to assume that they would come out at the same spot without any prompting. And we can prompt them ;-). regards, tom lane
On 2019-09-20 01:14, Tom Lane wrote: > Chapman Flack <chap@anastigmatix.net> writes: >> Sure, against *every* non-spec feature we have or ever will have, someone >> /could/ raise a generic "what if SQL committee might add something pretty >> similar in future". >> But what we have in this case are specific non-spec features (array, map, >> and sequence constructors, lambdas, map/fold/reduce, user-defined >> functions) that are flat-out already present in the current version of >> the language that the SQL committee is clearly modeling jsonpath on. > > Sure. But we also modeled those features on the same language that the > committee is looking at (or at least I sure hope we did). So it's > reasonable to assume that they would come out at the same spot without > any prompting. And we can prompt them ;-). Also, I understand these are features proposed for PG13, not in PG12. So while this is an important discussion, it's not relevant to the PG12 release, right? (If so, I'm content to close these open items.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-09-20 01:14, Tom Lane wrote: >> Sure. But we also modeled those features on the same language that the >> committee is looking at (or at least I sure hope we did). So it's >> reasonable to assume that they would come out at the same spot without >> any prompting. And we can prompt them ;-). > Also, I understand these are features proposed for PG13, not in PG12. > So while this is an important discussion, it's not relevant to the PG12 > release, right? > (If so, I'm content to close these open items.) I took a quick look to compare our jsonpath documentation with ISO/IEC TR_19075-6_2017 (I did *not* try to see if the code agrees with the docs ;-)). As far as I can see, everything described in our docs appears in the TR, with the exception of two things that are already documented as Postgres extensions: 1. Recursive (multilevel) wildcards, ie .** and .**{level [to level]} accessors, per table 8.25. 2. We allow a path expression to be a Boolean predicate, although the TR allows predicates only in filters, per example in 9.16.1: '$.track.segments[*].HR < 70' (It's not exactly clear to me why this syntax is necessary; what's it do that you can't do more verbosely with a filter?) I have no opinion on whether we're opening ourselves to significant spec-compliance risks through these two features. I am, however, unexcited about adding some kind of "PG only" marker to the language, for a couple of reasons. First, I really doubt that a single boolean flag would get us far in terms of dealing with future compliance issues. As soon as we have two extension features (i.e., already) we have the question of what happens if one gets standardized and the other doesn't; and that risk gets bigger if we're going to add half a dozen more things. Second, we've procrastinated too long and thereby effectively made a decision already. At this point I don't see how we could push in any such change without delaying the release. So my vote at this point is "ship it as-is". regards, tom lane