Thread: Avoiding possible future conformance headaches in JSON work

Avoiding possible future conformance headaches in JSON work

From
Chapman Flack
Date:
Hi,

We had a short conversation about this on Friday but I didn't have time
to think of a constructive suggestion, and now I've had more time to
think about it.

Regarding the proposed PG 13 jsonpath extensions (array, map, and
sequence constructors, lambdas, map/fold/reduce, user-defined
functions), literally all this stuff is in XPath/XQuery 3.1, and
clearly the SQL committee is imitating XPath/XQuery in the design
of jsonpath.

Therefore it would not be surprising at all if the committee eventually
adds those features in jsonpath. At that point, if the syntax matches
what we've added, we are happy, and if not, we have a multi-year,
multi-release, standard_conforming_strings-style headache.

So, a few ideas fall out....

First, with Peter being a participant, if there are any rumblings in the
SQL committee about adding those features, we should know the proposed
syntax as soon as we can and try to follow that.

If such rumblings are entirely absent, we should see what we can do to
start some, proposing the syntax we've got.

In either case, perhaps we should immediately add a way to identify a
jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can
be accepted at the start in addition to any lax/strict, so you could
have 'pg lax $.map(x => x + 10)'.

If we initially /require/ 'pg' for the extensions to be recognized, then
we can relax the requirement for whichever ones later appear in the spec
using the same syntax. If they appear in the spec with a different
syntax, then by requiring 'pg' already for our variant, we already have
avoided the standard_conforming_strings kind of multi-release
reconciliation effort.

In the near term, there is already one such potential conflict in
12beta: the like_regex using POSIX REs instead of XQuery ones as the
spec requires. Of course we don't currently have an XQuery regex
engine, but if we ever have one, we then face a headache if we want to
move jsonpath toward using it. (Ties in to conversation [1].)

Maybe we could avoid that by recognizing now an extra P in flags, to
specify a POSIX re. Or, as like_regex has a named-parameter-like
syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be
an extra keyword in that grammar: like-regex("abc" posix). That would
be safe from the committee adding a P flag that means something else.

The conservative approach would be to simply require the 'posix' keyword
in all cases now, simply because we don't have the XQuery regex engine.

Alternatively, if there's a way to analyze a regex for the use of any
constructs with different meanings in POSIX and XQuery REs (and if
that's appreciably easier than writing an XQuery regex engine), then
the 'posix' keyword could be required only when it matters. But the
conservative approach sounds easier, and sufficient. The finer-grained
analysis would have to catch not just constructs that are in one RE
style and not the other, but any subtleties in semantics, and I
certainly wouldn't trust myself to write that.

-Chap


[1]
https://www.postgresql.org/message-id/5CF2754F.7000702%40anastigmatix.net



Re: Avoiding possible future conformance headaches in JSON work

From
Alvaro Herrera
Date:
On 2019-Jun-01, Chapman Flack wrote:

> In either case, perhaps we should immediately add a way to identify a
> jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can
> be accepted at the start in addition to any lax/strict, so you could
> have 'pg lax $.map(x => x + 10)'.
> 
> If we initially /require/ 'pg' for the extensions to be recognized, then
> we can relax the requirement for whichever ones later appear in the spec
> using the same syntax. If they appear in the spec with a different
> syntax, then by requiring 'pg' already for our variant, we already have
> avoided the standard_conforming_strings kind of multi-release
> reconciliation effort.

I agree we should do this (or something similar) now, to avoid future
pain.  It seems a similar problem to E'' strings vs. SQL-standard
''-ones, which was a painful transition.  We have an opportunity to do
better this time.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Avoiding possible future conformance headaches in JSON work

From
Oleg Bartunov
Date:


On Sat, 1 Jun 2019, 16:41 Chapman Flack, <chap@anastigmatix.net> wrote:
Hi,

We had a short conversation about this on Friday but I didn't have time
to think of a constructive suggestion, and now I've had more time to
think about it.

Regarding the proposed PG 13 jsonpath extensions (array, map, and
sequence constructors, lambdas, map/fold/reduce, user-defined
functions), literally all this stuff is in XPath/XQuery 3.1, and
clearly the SQL committee is imitating XPath/XQuery in the design
of jsonpath.

Therefore it would not be surprising at all if the committee eventually
adds those features in jsonpath. At that point, if the syntax matches
what we've added, we are happy, and if not, we have a multi-year,
multi-release, standard_conforming_strings-style headache.

So, a few ideas fall out....

First, with Peter being a participant, if there are any rumblings in the
SQL committee about adding those features, we should know the proposed
syntax as soon as we can and try to follow that.

AFAIK, there is rumour about 'native json data type' and 'dot style syntax' for json, but not about jsonpath.


If such rumblings are entirely absent, we should see what we can do to
start some, proposing the syntax we've got.

In either case, perhaps we should immediately add a way to identify a
jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can
be accepted at the start in addition to any lax/strict, so you could
have 'pg lax $.map(x => x + 10)'.

This is exactly what we were thinking about !

If we initially /require/ 'pg' for the extensions to be recognized, then
we can relax the requirement for whichever ones later appear in the spec
using the same syntax. If they appear in the spec with a different
syntax, then by requiring 'pg' already for our variant, we already have
avoided the standard_conforming_strings kind of multi-release
reconciliation effort.

In the near term, there is already one such potential conflict in
12beta: the like_regex using POSIX REs instead of XQuery ones as the
spec requires. Of course we don't currently have an XQuery regex
engine, but if we ever have one, we then face a headache if we want to
move jsonpath toward using it. (Ties in to conversation [1].)

Maybe we could avoid that by recognizing now an extra P in flags, to
specify a POSIX re. Or, as like_regex has a named-parameter-like
syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be
an extra keyword in that grammar: like-regex("abc" posix). That would
be safe from the committee adding a P flag that means something else.

The conservative approach would be to simply require the 'posix' keyword
in all cases now, simply because we don't have the XQuery regex engine.

Alternatively, if there's a way to analyze a regex for the use of any
constructs with different meanings in POSIX and XQuery REs (and if
that's appreciably easier than writing an XQuery regex engine), then
the 'posix' keyword could be required only when it matters. But the
conservative approach sounds easier, and sufficient. The finer-grained
analysis would have to catch not just constructs that are in one RE
style and not the other, but any subtleties in semantics, and I
certainly wouldn't trust myself to write that.

We didn't think about regex, I don't know anybody working on xquery. 

Re: Avoiding possible future conformance headaches in JSON work

From
Chapman Flack
Date:
On 6/18/19 12:51 PM, Oleg Bartunov wrote:
>> have 'pg lax $.map(x => x + 10)'.
> 
> This is exactly what we were thinking about !

Perfect!

>> specify a POSIX re. Or, as like_regex has a named-parameter-like
>> syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be
>> an extra keyword in that grammar: like-regex("abc" posix). That would
>> be safe from the committee adding a P flag that means something else.
> 
> We didn't think about regex, I don't know anybody working on xquery.

I do. :)

But is that even the point? It's already noted in [1] that the standard
calls for one style of regexps and we're providing another.

It's relatively uncomplicated now to add some kind of distinguishing
syntax for our posix flavor of like_regex. Yes, it would be a change
between beta1 and final release, but that doesn't seem unheard-of.

In contrast, if such a distinction is not added now, we know that will
be a headache for any future effort to more closely conform to the
standard. Whether such a future effort seems near-term or far off, it
doesn't seem strategic to make current choices that avoidably make it
harder.

Aside: I just looked over the 12 doco to see if the note in [1] is
in there, and all I see is that 'like_regex' is documented as "Tests
pattern matching with POSIX regular expressions."

In my opinion, that ought to have a note flagging that as different
from the standard. The user experience is not so good if someone comes
assuming we conform to the standard, writes code, then has to learn
why it didn't work. The whole doc section [2] about XML is intended
to spare people from unwelcome discoveries of that sort, but it was
written after the fact. I think it's better to have it from the start.

[1]
https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#sqljson-conformance

[2] https://www.postgresql.org/docs/12/xml-limits-conformance.html



Re: Avoiding possible future conformance headaches in JSON work

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> But is that even the point? It's already noted in [1] that the standard
> calls for one style of regexps and we're providing another.

> It's relatively uncomplicated now to add some kind of distinguishing
> syntax for our posix flavor of like_regex. Yes, it would be a change
> between beta1 and final release, but that doesn't seem unheard-of.

> In contrast, if such a distinction is not added now, we know that will
> be a headache for any future effort to more closely conform to the
> standard. Whether such a future effort seems near-term or far off, it
> doesn't seem strategic to make current choices that avoidably make it
> harder.

Just to not leave this thread hanging --- the discussion was picked up
in this other thread:

https://www.postgresql.org/message-id/flat/CAPpHfdvDci4iqNF9fhRkTqhe-5_8HmzeLt56drH%2B_Rv2rNRqfg%40mail.gmail.com

and I think we've come to the conclusion that the only really awful regex
compatibility problem is differing interpretations of the 'x' flag, which
we solved temporarily by treating that as unimplemented in jsonpath.
There are some other unimplemented features that we can consider adding
later, too.  (Fortunately, Spencer's engine throws errors for all of
those, so adding them won't create new compatibility issues.)  And we did
add some documentation:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0a97edb12ec44f8d2d8828cbca6dd7639408ac88

There remains the question of whether we should do something like
requiring a "pg" prefix to allow access to the other nonstandard
features we added to jsonpath.  I see the point that the SQL committee
might well add something pretty similar in future.  But I'm not too
concerned about that, on two grounds: (1) the same argument could be
raised against *every* non-spec feature we have or ever will have;
(2) now that Peter's in on SQL committee deliberations, we have a
chance to push for any future spec changes to not be unnecessarily
incompatible.  So my inclination is to close this open item as
sufficiently done, once the minor lexer issues raised in the other
thread are done.

            regards, tom lane



Re: Avoiding possible future conformance headaches in JSON work

From
Chapman Flack
Date:
On 09/19/19 18:35, Tom Lane wrote:

> There remains the question of whether we should do something like
> requiring a "pg" prefix to allow access to the other nonstandard
> features we added to jsonpath.  I see the point that the SQL committee
> might well add something pretty similar in future.  But I'm not too
> concerned about that, on two grounds: (1) the same argument could be
> raised against *every* non-spec feature we have or ever will have;

This should not be read as a violent objection, but I do think that
point (1) glosses over a, well, possibly salient difference in likelihood:

Sure, against *every* non-spec feature we have or ever will have, someone
/could/ raise a generic "what if SQL committee might add something pretty
similar in future".

But what we have in this case are specific non-spec features (array, map,
and sequence constructors, lambdas, map/fold/reduce, user-defined
functions) that are flat-out already present in the current version of
the language that the SQL committee is clearly modeling jsonpath on.

That might raise the likelihood of collision in this case above its
usual, universal cosmic background level.

Regards,
-Chap



Re: Avoiding possible future conformance headaches in JSON work

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> Sure, against *every* non-spec feature we have or ever will have, someone
> /could/ raise a generic "what if SQL committee might add something pretty
> similar in future".
> But what we have in this case are specific non-spec features (array, map,
> and sequence constructors, lambdas, map/fold/reduce, user-defined
> functions) that are flat-out already present in the current version of
> the language that the SQL committee is clearly modeling jsonpath on.

Sure.  But we also modeled those features on the same language that the
committee is looking at (or at least I sure hope we did).  So it's
reasonable to assume that they would come out at the same spot without
any prompting.  And we can prompt them ;-).

            regards, tom lane



Re: Avoiding possible future conformance headaches in JSON work

From
Peter Eisentraut
Date:
On 2019-09-20 01:14, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> Sure, against *every* non-spec feature we have or ever will have, someone
>> /could/ raise a generic "what if SQL committee might add something pretty
>> similar in future".
>> But what we have in this case are specific non-spec features (array, map,
>> and sequence constructors, lambdas, map/fold/reduce, user-defined
>> functions) that are flat-out already present in the current version of
>> the language that the SQL committee is clearly modeling jsonpath on.
> 
> Sure.  But we also modeled those features on the same language that the
> committee is looking at (or at least I sure hope we did).  So it's
> reasonable to assume that they would come out at the same spot without
> any prompting.  And we can prompt them ;-).

Also, I understand these are features proposed for PG13, not in PG12.
So while this is an important discussion, it's not relevant to the PG12
release, right?

(If so, I'm content to close these open items.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Avoiding possible future conformance headaches in JSON work

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-09-20 01:14, Tom Lane wrote:
>> Sure.  But we also modeled those features on the same language that the
>> committee is looking at (or at least I sure hope we did).  So it's
>> reasonable to assume that they would come out at the same spot without
>> any prompting.  And we can prompt them ;-).

> Also, I understand these are features proposed for PG13, not in PG12.
> So while this is an important discussion, it's not relevant to the PG12
> release, right?
> (If so, I'm content to close these open items.)

I took a quick look to compare our jsonpath documentation with
ISO/IEC TR_19075-6_2017 (I did *not* try to see if the code agrees
with the docs ;-)).  As far as I can see, everything described in
our docs appears in the TR, with the exception of two things
that are already documented as Postgres extensions:

1. Recursive (multilevel) wildcards, ie .** and .**{level [to level]}
accessors, per table 8.25.

2. We allow a path expression to be a Boolean predicate, although the TR
allows predicates only in filters, per example in 9.16.1:
    '$.track.segments[*].HR < 70'
(It's not exactly clear to me why this syntax is necessary; what's
it do that you can't do more verbosely with a filter?)

I have no opinion on whether we're opening ourselves to significant
spec-compliance risks through these two features.  I am, however,
unexcited about adding some kind of "PG only" marker to the language,
for a couple of reasons.  First, I really doubt that a single boolean
flag would get us far in terms of dealing with future compliance
issues.  As soon as we have two extension features (i.e., already)
we have the question of what happens if one gets standardized and
the other doesn't; and that risk gets bigger if we're going to add
half a dozen more things.  Second, we've procrastinated too long
and thereby effectively made a decision already.  At this point
I don't see how we could push in any such change without delaying
the release.

So my vote at this point is "ship it as-is".

            regards, tom lane