Thread: [MASSMAIL]❓ JSON Path Dot Precedence
Hello Hackers, A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers: > Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON in someminor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON. In other words, this is valid: david=# select '2.'::jsonpath; jsonpath ---------- 2 But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should thatbe parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followedby the path segment “p10”? Here’s the parser’s answer: david=# select '0x2.p10'::jsonpath; jsonpath ----------- (2)."p10" So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimalexpression. Is that intentional/correct? Discovered while writing my Go lexer and throwing all of Go’s floating point literal examples[2] at it and comparing to thePostgres path parser. Curiously, this is only an issue for 0x/0o/0b numeric expressions; a decimal expression does notbehave in the same way: david=# select '2.p10'::jsonpath; ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input LINE 1: select '2.p10'::jsonpath; Which maybe seems a bit inconsistent. Thoughts on what the “correct” behavior should be? Best, David [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH [2]: https://tip.golang.org/ref/spec#Floating-point_literals
On 2024-04-07 18:13 +0200, David E. Wheeler wrote: > A question about the behavior of the JSON Path parser. The docs[1] > have this to say about numbers: > > > Numeric literals in SQL/JSON path expressions follow JavaScript > > rules, which are different from both SQL and JSON in some minor > > details. For example, SQL/JSON path allows .1 and 1., which are > > invalid in JSON. > > In other words, this is valid: > > david=# select '2.'::jsonpath; > jsonpath > ---------- > 2 > > But this feature creates a bit of a conflict with the use of a dot for > path expressions. Consider `0x2.p10`. How should that be parsed? As an > invalid decimal expression ("trailing junk after numeric literal”), or > as a valid integer 2 followed by the path segment “p10”? Here’s the > parser’s answer: > > david=# select '0x2.p10'::jsonpath; > jsonpath > ----------- > (2)."p10" > > So it would seem that, other things being equal, a path key expression > (`.foo`) is slightly higher precedence than a decimal expression. Is > that intentional/correct? I guess jsonpath assumes that hex, octal, and binary literals are integers. So there's no ambiguity about any fractional part that might follow. > Discovered while writing my Go lexer and throwing all of Go’s floating > point literal examples[2] at it and comparing to the Postgres path > parser. Curiously, this is only an issue for 0x/0o/0b numeric > expressions; a decimal expression does not behave in the same way: > > david=# select '2.p10'::jsonpath; > ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input > LINE 1: select '2.p10'::jsonpath; It scans the decimal "2." and then finds junks "p10". Works with a full decimal: test=# select '3.14.p10'::jsonpath; jsonpath -------------- (3.14)."p10" (1 row) And with extra whitespace to resolve the ambiguity: test=# select '2 .p10'::jsonpath; jsonpath ----------- (2)."p10" (1 row) > Which maybe seems a bit inconsistent. > > Thoughts on what the “correct” behavior should be? I'd say a member accessor after a number doesn't really make sense because object keys are strings. One could argue that path "$.2.p10" should match JSON '{"2":{"p10":42}}', i.e. the numeric accessor is converted to a string. For example, in nodejs I can do: > var x = {2: {p10: 42}} > x[2].p10 42 But that's JavaScript, not JSON. Also, is there even a use case for path "0x2.p10"? The path has to start with "$" or ("@" in case of a filter expression), doesn't it? And it that case it doesn't parse: test=# select '$.0x2.p10'::jsonpath; ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath input LINE 1: select '$.0x2.p10'::jsonpath; Even with extra whitespace: test=# select '$ . 0x2 . p10'::jsonpath; ERROR: syntax error at or near "0x2" of jsonpath input LINE 1: select '$ . 0x2 . p10'::jsonpath; Or should it behave like an array accessor? Similar to: test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath); jsonb_path_query ------------------ 42 (1 row) > [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH > [2]: https://tip.golang.org/ref/spec#Floating-point_literals -- Erik
On Apr 7, 2024, at 15:46, Erik Wienhold <ewie@ewie.name> wrote: > I guess jsonpath assumes that hex, octal, and binary literals are > integers. So there's no ambiguity about any fractional part that might > follow. Yeah, that’s what the comment in the flex file says: https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105 > Also, is there even a use case for path "0x2.p10"? The path has to > start with "$" or ("@" in case of a filter expression), doesn't it? And > it that case it doesn't parse: > > test=# select '$.0x2.p10'::jsonpath; > ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath input > LINE 1: select '$.0x2.p10'::jsonpath; > > Even with extra whitespace: > > test=# select '$ . 0x2 . p10'::jsonpath; > ERROR: syntax error at or near "0x2" of jsonpath input > LINE 1: select '$ . 0x2 . p10'::jsonpath; > > Or should it behave like an array accessor? Similar to: > > test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath); > jsonb_path_query > ------------------ > 42 > (1 row) I too am curious why these parse successfully, but don’t appear to be useful. Best, David
On 07.04.24 18:13, David E. Wheeler wrote: > Hello Hackers, > > A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers: > >> Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON insome minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON. > > In other words, this is valid: > > david=# select '2.'::jsonpath; > jsonpath > ---------- > 2 > > But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How shouldthat be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followedby the path segment “p10”? Here’s the parser’s answer: > > david=# select '0x2.p10'::jsonpath; > jsonpath > ----------- > (2)."p10" > > So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimalexpression. Is that intentional/correct? I think the derivation would be like this: (I'm not sure what the top-level element would be, so let's start somewhere in the middle ...) <JSON unary expression> ::= <JSON accessor expression> <JSON accessor expression> ::= <JSON path primary> <JSON accessor op> <JSON path primary> ::= <JSON path literal> <JSON accessor op> ::= <JSON member accessor> <JSON member accessor> ::= <period> <JSON path key name> So the whole thing is <JSON path literal> <period> <JSON path key name> The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1. 0x2 is a HexIntegerLiteral. (There can be no dots in that.) p10 is an Identifier. So I think this is all correct.
On Apr 10, 2024, at 10:29, Peter Eisentraut <peter@eisentraut.org> wrote: > So the whole thing is > > <JSON path literal> <period> <JSON path key name> > > The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1. > > 0x2 is a HexIntegerLiteral. (There can be no dots in that.) > > p10 is an Identifier. > > So I think this is all correct. That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: whatpath can `'0x2.p10` even select? Best, David
Hi, following up on some old threads. > On Apr 10, 2024, at 16:44, David E. Wheeler <david@justatheory.com> wrote: > > That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: whatpath can `'0x2.p10` even select? I’m wondering whether the jsonpath parser should be updated to reject cases like this. I think it will always return no results.AFAICT, there’s no way to navigate to an object identifier immediately after a number: david=# select '0x2.p10'::jsonpath; jsonpath ----------- (2)."p10" (1 row) david=# select jsonb_path_query(target => '[0, 1, {"p10": true}]', path => '0x2.p10'); jsonb_path_query ------------------ (0 rows) david=# select jsonb_path_query(target => '{"0x2": {"p10": true}}', path => '0x2.p10'); jsonb_path_query ------------------ (0 rows) It’s just inherently meaningless. BTW, it’s not limited to hex numbers: david=# select '(2).p10'::jsonpath; jsonpath ----------- (2)."p10" OTOH, maybe that’s a corner case we can live with. Best, David
On Mon, Jul 8, 2024 at 8:27 AM David E. Wheeler <david@justatheory.com> wrote:
Hi, following up on some old threads.
> On Apr 10, 2024, at 16:44, David E. Wheeler <david@justatheory.com> wrote:
>
> That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select?
I’m wondering whether the jsonpath parser should be updated to reject cases like this. I think it will always return no results. AFAICT, there’s no way to navigate to an object identifier immediately after a number:
If we go down this path wouldn't the correct framing be: do not allow accessors after scalars ? The same argument applies to false/"john" and other scalar types since by definition none of them have subcomponents to be accessed.
That said, the parser has a lax mode which somewhat implies it doesn't expect the jsonpath type to perform much in the way of validation of the semantic correctness of the encoded path expression.
I like the idea of a smarter expression-holding type and would even wish to have had this on day one. Retrofitting is less appealing. We document a similarity with regular expressions here where we, for better and worse, have lived without a regexppath data type forever and leave it to the executor to tell the user their pattern is invalid. Leaning on that precedence here makes accepting the status quo more reasonable. Though strict/lax modes and, I think, variables, motivates me to put my vote toward the "do more validation" group.
Does the standard even have a separate type here or is that our implementation detail invention?
David J.
On Jul 8, 2024, at 12:05, David G. Johnston <david.g.johnston@gmail.com> wrote: > If we go down this path wouldn't the correct framing be: do not allow accessors after scalars ? The same argument appliesto false/"john" and other scalar types since by definition none of them have subcomponents to be accessed. Yes, excellent point. > That said, the parser has a lax mode which somewhat implies it doesn't expect the jsonpath type to perform much in theway of validation of the semantic correctness of the encoded path expression. My understanding is that lax mode means it ignores where the JSON doesn’t abide by expectations of the path expression, notthat the path parsing is lax. > I like the idea of a smarter expression-holding type and would even wish to have had this on day one. Retrofitting isless appealing. We document a similarity with regular expressions here where we, for better and worse, have lived withouta regexppath data type forever and leave it to the executor to tell the user their pattern is invalid. Leaning onthat precedence here makes accepting the status quo more reasonable. Though strict/lax modes and, I think, variables,motivates me to put my vote toward the "do more validation" group. This feels different from a documented difference in behavior as an implementation choice, like path regex vs. Spencer. Inthis case, the expression is technically meaningless, but there’s never so much as an error thrown. > Does the standard even have a separate type here or is that our implementation detail invention? Sorry, separate type for what? Best, David
On Mon, Jul 8, 2024 at 9:12 AM David E. Wheeler <david@justatheory.com> wrote:
On Jul 8, 2024, at 12:05, David G. Johnston <david.g.johnston@gmail.com> wrote:
> Does the standard even have a separate type here or is that our implementation detail invention?
Sorry, separate type for what?
We created a data type named: jsonpath. Does the standard actually have that data type and defined parsing behavior or does it just have functions where one of the inputs is text whose contents are a path expression?
David J.
On Jul 8, 2024, at 12:17, David G. Johnston <david.g.johnston@gmail.com> wrote: > We created a data type named: jsonpath. Does the standard actually have that data type and defined parsing behavior ordoes it just have functions where one of the inputs is text whose contents are a path expression? Ah, got it. D