Re: Q: Escapes in jsonpath Idents - Mailing list pgsql-hackers

From Erik Wienhold
Subject Re: Q: Escapes in jsonpath Idents
Date
Msg-id hozm6ofykt3bup7xaequgpf47y5hf2xca55rurazdt2gvhqn7u@d4gnloqbuo53
Whole thread Raw
In response to Q: Escapes in jsonpath Idents  ("David E. Wheeler" <david@justatheory.com>)
Responses Re: Q: Escapes in jsonpath Idents
Re: Q: Escapes in jsonpath Idents
List pgsql-hackers
Hi David,

On 2024-03-16 19:39 +0100, David E. Wheeler wrote:
> The jsonpath doc[1] has an excellent description of the format of
> strings, but for unquoted path keys, it simply says:
> 
> > Member accessor that returns an object member with the specified
> > key. If the key name matches some named variable starting with $ or
> > does not meet the JavaScript rules for an identifier, it must be
> > enclosed in double quotes to make it a string literal.
> 
> I went looking for the JavaScript rules for an identifier and found
> this in the MDN docs[2]:
> 
> > In JavaScript, identifiers can contain Unicode letters, $, _, and
> > digits (0-9), but may not start with a digit. An identifier differs
> > from a string in that a string is data, while an identifier is part
> > of the code. In JavaScript, there is no way to convert identifiers
> > to strings, but sometimes it is possible to parse strings into
> > identifiers.
> 
> 
> However, the Postgres parsing of jsonpath keys appears to follow the
> same rules as strings, allowing backslash escapes:
> 
> david=# select '$.fo\u00f8 == $x'::jsonpath;
>      jsonpath       -------------------
>  ($."foø" == $"x")
> 
> This would seem to contradict the documentation. Is this behavior
> required by the SQL standard? Do the docs need updating? Or should the
> code actually follow the JSON identifier behavior?

That quoted MDN page does not give the whole picture.  ECMAScript and JS
do allow Unicode escape sequences in identifier names:

https://262.ecma-international.org/#sec-identifier-names
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers

> PS: Those excellent docs on strings mentions support for \v, but the
> grammar in the right nav of https://www.json.org/json-en.html does
> not. Another bonus feature?

You refer to that sentence: "Other special backslash sequences include
those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various
ASCII control characters, and \uNNNN for a Unicode character identified
by its 4-hex-digit code point."

Mentioning JSON and \v in the same sentence is wrong: JavaScript allows
that escape in strings but JSON doesn't.  I think the easiest is to just
replace "JSON" with "JavaScript" in that sentence to make it right.  The
paragraph also already says "embedded string literals follow JavaScript/
ECMAScript conventions", so mentioning JSON seems unnecessary to me.

The last sentence also mentions backslash escapes \xNN and \u{N...} as
deviations from JSON when in fact those are valid escape sequences from
ECMA-262: https://262.ecma-international.org/#prod-HexEscapeSequence
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely.  The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.

> [1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
> [2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier

On 2024-03-16 21:33 +0100, David E. Wheeler wrote:
> On Mar 16, 2024, at 14:39, David E. Wheeler <david@justatheory.com>
> wrote:
> 
> > I went looking for the JavaScript rules for an identifier and found
> > this in the MDN docs[2]:
> > 
> >> In JavaScript, identifiers can contain Unicode letters, $, _, and
> >> digits (0-9), but may not start with a digit. An identifier differs
> >> from a string in that a string is data, while an identifier is part
> >> of the code. In JavaScript, there is no way to convert identifiers
> >> to strings, but sometimes it is possible to parse strings into
> >> identifiers.
> 
> Coda: Dollar signs don’t work at all outside double-quoted string
> identifiers:
> 
> david=# select '$.$foo'::jsonpath;
> ERROR:  syntax error at or near "$foo" of jsonpath input
> LINE 1: select '$.$foo'::jsonpath;
>                ^
> 
> david=# select '$.f$oo'::jsonpath;
> ERROR:  syntax error at or near "$oo" of jsonpath input
> LINE 1: select '$.f$oo'::jsonpath;
>                ^
> 
> david=# select '$."$foo"'::jsonpath;
>  jsonpath 
> ----------
>  $."$foo"
> 
> This, too, contradicts the MDM definition an identifier (and some
> quick browser tests).

The first case ($.$foo) is in line with the restriction on member
accessors that you quoted first.

The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor.  Looks like a bug to me because a variable
doesn't even make sense in that place.

What works though, besides double quoting, is escaping the dollar sign:

    regress=# select '$.\u0024foo'::jsonpath;
     jsonpath
    ----------
     $."$foo"
    (1 row)

And we've come full circle :)

-- 
Erik

Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Autogenerate some wait events code and documentation
Next
From: Tomas Vondra
Date:
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring