Re: Q: Escapes in jsonpath Idents - Mailing list pgsql-hackers
From | Erik Wienhold |
---|---|
Subject | Re: Q: Escapes in jsonpath Idents |
Date | |
Msg-id | hozm6ofykt3bup7xaequgpf47y5hf2xca55rurazdt2gvhqn7u@d4gnloqbuo53 Whole thread Raw |
In response to | Q: Escapes in jsonpath Idents ("David E. Wheeler" <david@justatheory.com>) |
Responses |
Re: Q: Escapes in jsonpath Idents
Re: Q: Escapes in jsonpath Idents |
List | pgsql-hackers |
Hi David, On 2024-03-16 19:39 +0100, David E. Wheeler wrote: > The jsonpath doc[1] has an excellent description of the format of > strings, but for unquoted path keys, it simply says: > > > Member accessor that returns an object member with the specified > > key. If the key name matches some named variable starting with $ or > > does not meet the JavaScript rules for an identifier, it must be > > enclosed in double quotes to make it a string literal. > > I went looking for the JavaScript rules for an identifier and found > this in the MDN docs[2]: > > > In JavaScript, identifiers can contain Unicode letters, $, _, and > > digits (0-9), but may not start with a digit. An identifier differs > > from a string in that a string is data, while an identifier is part > > of the code. In JavaScript, there is no way to convert identifiers > > to strings, but sometimes it is possible to parse strings into > > identifiers. > > > However, the Postgres parsing of jsonpath keys appears to follow the > same rules as strings, allowing backslash escapes: > > david=# select '$.fo\u00f8 == $x'::jsonpath; > jsonpath ------------------- > ($."foø" == $"x") > > This would seem to contradict the documentation. Is this behavior > required by the SQL standard? Do the docs need updating? Or should the > code actually follow the JSON identifier behavior? That quoted MDN page does not give the whole picture. ECMAScript and JS do allow Unicode escape sequences in identifier names: https://262.ecma-international.org/#sec-identifier-names https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers > PS: Those excellent docs on strings mentions support for \v, but the > grammar in the right nav of https://www.json.org/json-en.html does > not. Another bonus feature? You refer to that sentence: "Other special backslash sequences include those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various ASCII control characters, and \uNNNN for a Unicode character identified by its 4-hex-digit code point." Mentioning JSON and \v in the same sentence is wrong: JavaScript allows that escape in strings but JSON doesn't. I think the easiest is to just replace "JSON" with "JavaScript" in that sentence to make it right. The paragraph also already says "embedded string literals follow JavaScript/ ECMAScript conventions", so mentioning JSON seems unnecessary to me. The last sentence also mentions backslash escapes \xNN and \u{N...} as deviations from JSON when in fact those are valid escape sequences from ECMA-262: https://262.ecma-international.org/#prod-HexEscapeSequence So I think it makes sense to reword the entire backslash part of the paragraph and remove references to JSON entirely. The attached patch does that and also formats the backslash escapes as a bulleted list for readability. > [1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH > [2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier On 2024-03-16 21:33 +0100, David E. Wheeler wrote: > On Mar 16, 2024, at 14:39, David E. Wheeler <david@justatheory.com> > wrote: > > > I went looking for the JavaScript rules for an identifier and found > > this in the MDN docs[2]: > > > >> In JavaScript, identifiers can contain Unicode letters, $, _, and > >> digits (0-9), but may not start with a digit. An identifier differs > >> from a string in that a string is data, while an identifier is part > >> of the code. In JavaScript, there is no way to convert identifiers > >> to strings, but sometimes it is possible to parse strings into > >> identifiers. > > Coda: Dollar signs don’t work at all outside double-quoted string > identifiers: > > david=# select '$.$foo'::jsonpath; > ERROR: syntax error at or near "$foo" of jsonpath input > LINE 1: select '$.$foo'::jsonpath; > ^ > > david=# select '$.f$oo'::jsonpath; > ERROR: syntax error at or near "$oo" of jsonpath input > LINE 1: select '$.f$oo'::jsonpath; > ^ > > david=# select '$."$foo"'::jsonpath; > jsonpath > ---------- > $."$foo" > > This, too, contradicts the MDM definition an identifier (and some > quick browser tests). The first case ($.$foo) is in line with the restriction on member accessors that you quoted first. The error message 'syntax error at or near "$oo" of jsonpath input' for the second case ($.f$oo), however, looks as if the scanner identifies '$oo' as a variable instead of contiuing the scan of identifier (f$oo) for the member accessor. Looks like a bug to me because a variable doesn't even make sense in that place. What works though, besides double quoting, is escaping the dollar sign: regress=# select '$.\u0024foo'::jsonpath; jsonpath ---------- $."$foo" (1 row) And we've come full circle :) -- Erik
Attachment
pgsql-hackers by date: