Thread: [MASSMAIL]❓ JSON Path Dot Precedence

[MASSMAIL]❓ JSON Path Dot Precedence

From
"David E. Wheeler"
Date:
Hello Hackers,

A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers:

>  Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON in
someminor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON. 

In other words, this is valid:

david=# select '2.'::jsonpath;
 jsonpath
----------
 2

But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should
thatbe parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2
followedby the path segment “p10”? Here’s the parser’s answer: 

david=# select '0x2.p10'::jsonpath;
 jsonpath
-----------
 (2)."p10"

So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a
decimalexpression. Is that intentional/correct? 

Discovered while writing my Go lexer and throwing all of Go’s floating point literal examples[2] at it and comparing to
thePostgres path parser. Curiously, this is only an issue for 0x/0o/0b numeric expressions; a decimal expression does
notbehave in the same way: 

david=# select '2.p10'::jsonpath;
ERROR:  trailing junk after numeric literal at or near "2.p" of jsonpath input
LINE 1: select '2.p10'::jsonpath;

Which maybe seems a bit inconsistent.

Thoughts on what the “correct” behavior should be?

Best,

David

  [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
  [2]: https://tip.golang.org/ref/spec#Floating-point_literals


Re: ❓ JSON Path Dot Precedence

From
Erik Wienhold
Date:
On 2024-04-07 18:13 +0200, David E. Wheeler wrote:
> A question about the behavior of the JSON Path parser. The docs[1]
> have this to say about numbers:
> 
> >  Numeric literals in SQL/JSON path expressions follow JavaScript
> >  rules, which are different from both SQL and JSON in some minor
> >  details. For example, SQL/JSON path allows .1 and 1., which are
> >  invalid in JSON.
> 
> In other words, this is valid:
> 
> david=# select '2.'::jsonpath;
>  jsonpath 
> ----------
>  2
> 
> But this feature creates a bit of a conflict with the use of a dot for
> path expressions. Consider `0x2.p10`. How should that be parsed? As an
> invalid decimal expression ("trailing junk after numeric literal”), or
> as a valid integer 2 followed by the path segment “p10”? Here’s the
> parser’s answer:
> 
> david=# select '0x2.p10'::jsonpath;
>  jsonpath  
> -----------
>  (2)."p10"
> 
> So it would seem that, other things being equal, a path key expression
> (`.foo`) is slightly higher precedence than a decimal expression. Is
> that intentional/correct?

I guess jsonpath assumes that hex, octal, and binary literals are
integers.  So there's no ambiguity about any fractional part that might
follow.

> Discovered while writing my Go lexer and throwing all of Go’s floating
> point literal examples[2] at it and comparing to the Postgres path
> parser. Curiously, this is only an issue for 0x/0o/0b numeric
> expressions; a decimal expression does not behave in the same way:
> 
> david=# select '2.p10'::jsonpath;
> ERROR:  trailing junk after numeric literal at or near "2.p" of jsonpath input
> LINE 1: select '2.p10'::jsonpath;

It scans the decimal "2." and then finds junks "p10".

Works with a full decimal:

    test=# select '3.14.p10'::jsonpath;
       jsonpath
    --------------
     (3.14)."p10"
    (1 row)

And with extra whitespace to resolve the ambiguity:

    test=# select '2 .p10'::jsonpath;
     jsonpath
    -----------
     (2)."p10"
    (1 row)

> Which maybe seems a bit inconsistent.
> 
> Thoughts on what the “correct” behavior should be?

I'd say a member accessor after a number doesn't really make sense
because object keys are strings.  One could argue that path "$.2.p10"
should match JSON '{"2":{"p10":42}}', i.e. the numeric accessor is
converted to a string.  For example, in nodejs I can do:

    > var x = {2: {p10: 42}}
    > x[2].p10
    42

But that's JavaScript, not JSON.

Also, is there even a use case for path "0x2.p10"?  The path has to
start with "$" or ("@" in case of a filter expression), doesn't it?  And
it that case it doesn't parse:

    test=# select '$.0x2.p10'::jsonpath;
    ERROR:  trailing junk after numeric literal at or near ".0x" of jsonpath input
    LINE 1: select '$.0x2.p10'::jsonpath;

Even with extra whitespace:

    test=# select '$ . 0x2 . p10'::jsonpath;
    ERROR:  syntax error at or near "0x2" of jsonpath input
    LINE 1: select '$ . 0x2 . p10'::jsonpath;

Or should it behave like an array accessor?  Similar to:

    test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath);
     jsonb_path_query
    ------------------
     42
    (1 row)

>   [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
>   [2]: https://tip.golang.org/ref/spec#Floating-point_literals

-- 
Erik



Re: ❓ JSON Path Dot Precedence

From
"David E. Wheeler"
Date:
On Apr 7, 2024, at 15:46, Erik Wienhold <ewie@ewie.name> wrote:

> I guess jsonpath assumes that hex, octal, and binary literals are
> integers.  So there's no ambiguity about any fractional part that might
> follow.

Yeah, that’s what the comment in the flex file says:

https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105


> Also, is there even a use case for path "0x2.p10"?  The path has to
> start with "$" or ("@" in case of a filter expression), doesn't it?  And
> it that case it doesn't parse:
>
>    test=# select '$.0x2.p10'::jsonpath;
>    ERROR:  trailing junk after numeric literal at or near ".0x" of jsonpath input
>    LINE 1: select '$.0x2.p10'::jsonpath;
>
> Even with extra whitespace:
>
>    test=# select '$ . 0x2 . p10'::jsonpath;
>    ERROR:  syntax error at or near "0x2" of jsonpath input
>    LINE 1: select '$ . 0x2 . p10'::jsonpath;
>
> Or should it behave like an array accessor?  Similar to:
>
>    test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath);
>     jsonb_path_query
>    ------------------
>     42
>    (1 row)

I too am curious why these parse successfully, but don’t appear to be useful.

Best,

David




Re: ❓ JSON Path Dot Precedence

From
Peter Eisentraut
Date:
On 07.04.24 18:13, David E. Wheeler wrote:
> Hello Hackers,
> 
> A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers:
> 
>>   Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON
insome minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON.
 
> 
> In other words, this is valid:
> 
> david=# select '2.'::jsonpath;
>   jsonpath
> ----------
>   2
> 
> But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How
shouldthat be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2
followedby the path segment “p10”? Here’s the parser’s answer:
 
> 
> david=# select '0x2.p10'::jsonpath;
>   jsonpath
> -----------
>   (2)."p10"
> 
> So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a
decimalexpression. Is that intentional/correct?
 

I think the derivation would be like this:

(I'm not sure what the top-level element would be, so let's start 
somewhere in the middle ...)

<JSON unary expression> ::= <JSON accessor expression>

<JSON accessor expression> ::= <JSON path primary> <JSON accessor op>

<JSON path primary> ::= <JSON path literal>

<JSON accessor op> ::= <JSON member accessor>

<JSON member accessor> ::= <period> <JSON path key name>

So the whole thing is

<JSON path literal> <period> <JSON path key name>

The syntax of <JSON path literal> and <JSON path key name> is then 
punted to ECMAScript 5.1.

0x2 is a HexIntegerLiteral.  (There can be no dots in that.)

p10 is an Identifier.

So I think this is all correct.




Re: ❓ JSON Path Dot Precedence

From
"David E. Wheeler"
Date:
On Apr 10, 2024, at 10:29, Peter Eisentraut <peter@eisentraut.org> wrote:

> So the whole thing is
>
> <JSON path literal> <period> <JSON path key name>
>
> The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1.
>
> 0x2 is a HexIntegerLiteral.  (There can be no dots in that.)
>
> p10 is an Identifier.
>
> So I think this is all correct.

That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point:
whatpath can `'0x2.p10` even select? 

Best,

David