Thread: [MASSMAIL]❓ JSON Path Dot Precedence
Hello Hackers, A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers: > Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON in someminor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON. In other words, this is valid: david=# select '2.'::jsonpath; jsonpath ---------- 2 But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should thatbe parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followedby the path segment “p10”? Here’s the parser’s answer: david=# select '0x2.p10'::jsonpath; jsonpath ----------- (2)."p10" So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimalexpression. Is that intentional/correct? Discovered while writing my Go lexer and throwing all of Go’s floating point literal examples[2] at it and comparing to thePostgres path parser. Curiously, this is only an issue for 0x/0o/0b numeric expressions; a decimal expression does notbehave in the same way: david=# select '2.p10'::jsonpath; ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input LINE 1: select '2.p10'::jsonpath; Which maybe seems a bit inconsistent. Thoughts on what the “correct” behavior should be? Best, David [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH [2]: https://tip.golang.org/ref/spec#Floating-point_literals
On 2024-04-07 18:13 +0200, David E. Wheeler wrote: > A question about the behavior of the JSON Path parser. The docs[1] > have this to say about numbers: > > > Numeric literals in SQL/JSON path expressions follow JavaScript > > rules, which are different from both SQL and JSON in some minor > > details. For example, SQL/JSON path allows .1 and 1., which are > > invalid in JSON. > > In other words, this is valid: > > david=# select '2.'::jsonpath; > jsonpath > ---------- > 2 > > But this feature creates a bit of a conflict with the use of a dot for > path expressions. Consider `0x2.p10`. How should that be parsed? As an > invalid decimal expression ("trailing junk after numeric literal”), or > as a valid integer 2 followed by the path segment “p10”? Here’s the > parser’s answer: > > david=# select '0x2.p10'::jsonpath; > jsonpath > ----------- > (2)."p10" > > So it would seem that, other things being equal, a path key expression > (`.foo`) is slightly higher precedence than a decimal expression. Is > that intentional/correct? I guess jsonpath assumes that hex, octal, and binary literals are integers. So there's no ambiguity about any fractional part that might follow. > Discovered while writing my Go lexer and throwing all of Go’s floating > point literal examples[2] at it and comparing to the Postgres path > parser. Curiously, this is only an issue for 0x/0o/0b numeric > expressions; a decimal expression does not behave in the same way: > > david=# select '2.p10'::jsonpath; > ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input > LINE 1: select '2.p10'::jsonpath; It scans the decimal "2." and then finds junks "p10". Works with a full decimal: test=# select '3.14.p10'::jsonpath; jsonpath -------------- (3.14)."p10" (1 row) And with extra whitespace to resolve the ambiguity: test=# select '2 .p10'::jsonpath; jsonpath ----------- (2)."p10" (1 row) > Which maybe seems a bit inconsistent. > > Thoughts on what the “correct” behavior should be? I'd say a member accessor after a number doesn't really make sense because object keys are strings. One could argue that path "$.2.p10" should match JSON '{"2":{"p10":42}}', i.e. the numeric accessor is converted to a string. For example, in nodejs I can do: > var x = {2: {p10: 42}} > x[2].p10 42 But that's JavaScript, not JSON. Also, is there even a use case for path "0x2.p10"? The path has to start with "$" or ("@" in case of a filter expression), doesn't it? And it that case it doesn't parse: test=# select '$.0x2.p10'::jsonpath; ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath input LINE 1: select '$.0x2.p10'::jsonpath; Even with extra whitespace: test=# select '$ . 0x2 . p10'::jsonpath; ERROR: syntax error at or near "0x2" of jsonpath input LINE 1: select '$ . 0x2 . p10'::jsonpath; Or should it behave like an array accessor? Similar to: test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath); jsonb_path_query ------------------ 42 (1 row) > [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH > [2]: https://tip.golang.org/ref/spec#Floating-point_literals -- Erik
On Apr 7, 2024, at 15:46, Erik Wienhold <ewie@ewie.name> wrote: > I guess jsonpath assumes that hex, octal, and binary literals are > integers. So there's no ambiguity about any fractional part that might > follow. Yeah, that’s what the comment in the flex file says: https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105 > Also, is there even a use case for path "0x2.p10"? The path has to > start with "$" or ("@" in case of a filter expression), doesn't it? And > it that case it doesn't parse: > > test=# select '$.0x2.p10'::jsonpath; > ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath input > LINE 1: select '$.0x2.p10'::jsonpath; > > Even with extra whitespace: > > test=# select '$ . 0x2 . p10'::jsonpath; > ERROR: syntax error at or near "0x2" of jsonpath input > LINE 1: select '$ . 0x2 . p10'::jsonpath; > > Or should it behave like an array accessor? Similar to: > > test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath); > jsonb_path_query > ------------------ > 42 > (1 row) I too am curious why these parse successfully, but don’t appear to be useful. Best, David
On 07.04.24 18:13, David E. Wheeler wrote: > Hello Hackers, > > A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers: > >> Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON insome minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON. > > In other words, this is valid: > > david=# select '2.'::jsonpath; > jsonpath > ---------- > 2 > > But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How shouldthat be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followedby the path segment “p10”? Here’s the parser’s answer: > > david=# select '0x2.p10'::jsonpath; > jsonpath > ----------- > (2)."p10" > > So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimalexpression. Is that intentional/correct? I think the derivation would be like this: (I'm not sure what the top-level element would be, so let's start somewhere in the middle ...) <JSON unary expression> ::= <JSON accessor expression> <JSON accessor expression> ::= <JSON path primary> <JSON accessor op> <JSON path primary> ::= <JSON path literal> <JSON accessor op> ::= <JSON member accessor> <JSON member accessor> ::= <period> <JSON path key name> So the whole thing is <JSON path literal> <period> <JSON path key name> The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1. 0x2 is a HexIntegerLiteral. (There can be no dots in that.) p10 is an Identifier. So I think this is all correct.
On Apr 10, 2024, at 10:29, Peter Eisentraut <peter@eisentraut.org> wrote: > So the whole thing is > > <JSON path literal> <period> <JSON path key name> > > The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1. > > 0x2 is a HexIntegerLiteral. (There can be no dots in that.) > > p10 is an Identifier. > > So I think this is all correct. That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: whatpath can `'0x2.p10` even select? Best, David