Avoiding possible future conformance headaches in JSON work - Mailing list pgsql-hackers

From Chapman Flack
Subject Avoiding possible future conformance headaches in JSON work
Date
Msg-id 5CF28EA0.80902@anastigmatix.net
List pgsql-hackers
Hi,

We had a short conversation about this on Friday but I didn't have time
to think of a constructive suggestion, and now I've had more time to
think about it.

Regarding the proposed PG 13 jsonpath extensions (array, map, and
sequence constructors, lambdas, map/fold/reduce, user-defined
functions), literally all this stuff is in XPath/XQuery 3.1, and
clearly the SQL committee is imitating XPath/XQuery in the design
of jsonpath.

Therefore it would not be surprising at all if the committee eventually
adds those features to jsonpath. At that point, if the syntax matches
what we've added, we are happy, and if not, we have a multi-year,
multi-release, standard_conforming_strings-style headache.

So, a few ideas fall out....

First, with Peter being a participant, if there are any rumblings in the
SQL committee about adding those features, we should know the proposed
syntax as soon as we can and try to follow that.

If such rumblings are entirely absent, we should see what we can do to
start some, proposing the syntax we've got.

In either case, perhaps we should immediately add a way to identify a
jsonpath as being PostgreSQL-extended. Maybe a keyword 'pg' that can
be accepted at the start in addition to any lax/strict, so you could
have 'pg lax $.map(x => x + 10)'.

If we initially /require/ 'pg' for the extensions to be recognized, then
we can relax the requirement for whichever ones later appear in the spec
using the same syntax. If they appear in the spec with a different
syntax, then by requiring 'pg' already for our variant, we already have
avoided the standard_conforming_strings kind of multi-release
reconciliation effort.
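
To make the prefix idea concrete, here is a minimal sketch in Python of how a leading 'pg' marker could be split off ahead of any lax/strict mode. The function name split_jsonpath_prefix and the exact prefix grammar are assumptions for illustration only; nothing like this exists in the jsonpath parser today.

```python
import re

# Hypothetical prefix grammar: an optional 'pg' marker, then an optional
# 'lax' or 'strict' mode, then the path expression itself.
_PREFIX = re.compile(r"\s*(?:(pg)\b\s*)?(?:(lax|strict)\b\s*)?")

def split_jsonpath_prefix(path):
    """Return (extended, mode, rest) for a jsonpath string."""
    m = _PREFIX.match(path)
    extended = m.group(1) is not None   # True when 'pg' was given
    mode = m.group(2) or "lax"          # lax is the default mode
    return extended, mode, path[m.end():]
```

A parser built this way would accept the extension syntax only when extended is true, and could later drop that requirement for any feature the spec adopts with the same syntax.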

In the near term, there is already one such potential conflict in
12beta: the like_regex using POSIX REs instead of XQuery ones as the
spec requires. Of course we don't currently have an XQuery regex
engine, but if we ever have one, we then face a headache if we want to
move jsonpath toward using it. (Ties in to conversation [1].)

Maybe we could avoid that by recognizing now an extra P in flags, to
specify a POSIX re. Or, as like_regex has a named-parameter-like
syntax--like_regex("abc" flag "i")--perhaps 'posix' should just be
an extra keyword in that grammar: like_regex("abc" posix). That would
be safe from the committee adding a P flag that means something else.

The conservative approach would be to require the 'posix' keyword in
all cases for now, since we don't have an XQuery regex engine.
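
As a rough illustration of that rule, the Python sketch below accepts a like_regex predicate only when the 'posix' keyword is present. It follows the notation used in this message, not the real jsonpath grammar; the function name parse_like_regex is hypothetical, and patterns containing escaped quotes are not handled.

```python
import re

# Hypothetical grammar, following the notation used in this message:
#   like_regex("pattern" [flag "flags"] [posix])
_LIKE_REGEX = re.compile(
    r'like_regex\(\s*"(?P<pattern>[^"]*)"'
    r'(?:\s+flag\s+"(?P<flags>[^"]*)")?'
    r'(?P<posix>\s+posix)?\s*\)'
)

def parse_like_regex(text):
    """Return (pattern, flags); reject predicates that omit 'posix'."""
    m = _LIKE_REGEX.fullmatch(text.strip())
    if m is None:
        raise ValueError("not a like_regex predicate in the assumed notation")
    if m.group("posix") is None:
        # Conservative rule: only POSIX REs are available, so say so explicitly.
        raise ValueError("the 'posix' keyword is required")
    return m.group("pattern"), m.group("flags") or ""
```

Because every stored jsonpath would then carry the keyword, a later XQuery engine could take over the unmarked form without changing the meaning of existing expressions.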

Alternatively, if there's a way to analyze a regex for the use of any
constructs with different meanings in POSIX and XQuery REs (and if
that's appreciably easier than writing an XQuery regex engine), then
the 'posix' keyword could be required only when it matters. But the
conservative approach sounds easier, and sufficient. The finer-grained
analysis would have to catch not just constructs that are in one RE
style and not the other, but any subtleties in semantics, and I
certainly wouldn't trust myself to write that.

-Chap


[1]
https://www.postgresql.org/message-id/5CF2754F.7000702%40anastigmatix.net
