Re: Q: Escapes in jsonpath Idents - Mailing list pgsql-hackers

From Erik Wienhold
Subject Re: Q: Escapes in jsonpath Idents
Date
Msg-id 2udxavv5raiz4kgire33io7xr7gtf4d7fwvv6oerqia5gjpwp5@gebfvx6k342j
Whole thread Raw
In response to Re: Q: Escapes in jsonpath Idents  ("David E. Wheeler" <david@justatheory.com>)
Responses Re: Q: Escapes in jsonpath Idents
Re: Q: Escapes in jsonpath Idents
List pgsql-hackers
On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
> On Mar 17, 2024, at 15:12, Erik Wienhold <ewie@ewie.name> wrote:
> > So I think it makes sense to reword the entire backslash part of the
> > paragraph and remove references to JSON entirely.  The attached patch
> > does that and also formats the backslash escapes as a bulleted list for
> > readability.
> 
> Ah, it’s JavaScript format, not JSON! This does clarify things quite
> nicely, thank you. Happy to add my review once it’s in a commit fest.

Thanks.  https://commitfest.postgresql.org/48/4899/

> > The first case ($.$foo) is in line with the restriction on member
> > accessors that you quoted first.
> 
> Huh, that’s now how I read it. Here it is again:
> 
> >> Member accessor that returns an object member with the specified
> >> key. If the key name matches some named variable starting with $ or
> >> does not meet the JavaScript rules for an identifier, it must be
> >> enclosed in double quotes to make it a string literal.
> 
> 
> Note that in my example `$foo` does not match a variable. I mean it
> looks like a variable, but none is used here. I guess it’s being
> conservative because it might be used in one of the functions, like
> jsonb_path_exists(), to which variables might be passed.

I had the same reasoning while writing my first reply but scrapped that
part because I found it obvious:  That jsonpath is parsed before calling
jsonb_path_exists() and therefore the parser has no context about any
variables, which might not even be hardcoded but may result from a
query.

> > The error message 'syntax error at or near "$oo" of jsonpath input' for
> > the second case ($.f$oo), however, looks as if the scanner identifies
> > '$oo' as a variable instead of contiuing the scan of identifier (f$oo)
> > for the member accessor.  Looks like a bug to me because a variable
> > doesn't even make sense in that place.
> 
> Right. Maybe the docs should be updated to say that a literal dollar
> sign isn’t supported in identifiers, unlike in JavaScript, except
> through escapes like this:

Unfortunately, I don't have access to that part of the SQL spec.  So I
don't know how the jsonpath grammar is specified.

I had a look into Oracle, MySQL, and SQLite docs to see what they
implement:

* Oracle requires the unquoted field names to match [A-Za-z][A-Za-z0-9]*
  (see "object steps").  It also supports variables.
  https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-path-expressions.html

* MySQL refers to ECMAScript identifiers but does not say anything about
  variables: https://dev.mysql.com/doc/refman/8.3/en/json.html#json-path-syntax

* SQLite skimps on details and does not document a grammar:
  https://sqlite.org/json1.html#path_arguments
  But it looks as if it strives for compatibility with MySQL and our dear
  Postgres: https://sqlite.org/src/doc/json-in-core/doc/json-enhancements.md

Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
insights but haven't found any yet.

-- 
Erik



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: Add pg_basetype() function to obtain a DOMAIN base type
Next
From: Tom Lane
Date:
Subject: Re: Support json_errdetail in FRONTEND builds