Thread: [MASSMAIL]❓ JSON Path Dot Precedence

[MASSMAIL]❓ JSON Path Dot Precedence

From

"David E. Wheeler"

Date:

07 April 2024, 16:13:45

Hello Hackers,

A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers:

> Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON in
someminor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON.

In other words, this is valid:

david=# select '2.'::jsonpath;
jsonpath
----------
2

But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should
thatbe parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2
followedby the path segment “p10”? Here’s the parser’s answer:

david=# select '0x2.p10'::jsonpath;
jsonpath
-----------
(2)."p10"

So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a
decimalexpression. Is that intentional/correct?

Discovered while writing my Go lexer and throwing all of Go’s floating point literal examples[2] at it and comparing to
thePostgres path parser. Curiously, this is only an issue for 0x/0o/0b numeric expressions; a decimal expression does
notbehave in the same way:

david=# select '2.p10'::jsonpath;
ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input
LINE 1: select '2.p10'::jsonpath;

Which maybe seems a bit inconsistent.

Thoughts on what the “correct” behavior should be?

Best,

David

[1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
[2]: https://tip.golang.org/ref/spec#Floating-point_literals

Re: ❓ JSON Path Dot Precedence

From

Erik Wienhold

Date:

07 April 2024, 19:46:58

On 2024-04-07 18:13 +0200, David E. Wheeler wrote:
> A question about the behavior of the JSON Path parser. The docs[1]
> have this to say about numbers:
> 
> >  Numeric literals in SQL/JSON path expressions follow JavaScript
> >  rules, which are different from both SQL and JSON in some minor
> >  details. For example, SQL/JSON path allows .1 and 1., which are
> >  invalid in JSON.
> 
> In other words, this is valid:
> 
> david=# select '2.'::jsonpath;
>  jsonpath 
> ----------
>  2
> 
> But this feature creates a bit of a conflict with the use of a dot for
> path expressions. Consider `0x2.p10`. How should that be parsed? As an
> invalid decimal expression ("trailing junk after numeric literal”), or
> as a valid integer 2 followed by the path segment “p10”? Here’s the
> parser’s answer:
> 
> david=# select '0x2.p10'::jsonpath;
>  jsonpath  
> -----------
>  (2)."p10"
> 
> So it would seem that, other things being equal, a path key expression
> (`.foo`) is slightly higher precedence than a decimal expression. Is
> that intentional/correct?

I guess jsonpath assumes that hex, octal, and binary literals are
integers.  So there's no ambiguity about any fractional part that might
follow.

> Discovered while writing my Go lexer and throwing all of Go’s floating
> point literal examples[2] at it and comparing to the Postgres path
> parser. Curiously, this is only an issue for 0x/0o/0b numeric
> expressions; a decimal expression does not behave in the same way:
> 
> david=# select '2.p10'::jsonpath;
> ERROR:  trailing junk after numeric literal at or near "2.p" of jsonpath input
> LINE 1: select '2.p10'::jsonpath;

It scans the decimal "2." and then finds junks "p10".

Works with a full decimal:

    test=# select '3.14.p10'::jsonpath;
       jsonpath
    --------------
     (3.14)."p10"
    (1 row)

And with extra whitespace to resolve the ambiguity:

    test=# select '2 .p10'::jsonpath;
     jsonpath
    -----------
     (2)."p10"
    (1 row)

> Which maybe seems a bit inconsistent.
> 
> Thoughts on what the “correct” behavior should be?

I'd say a member accessor after a number doesn't really make sense
because object keys are strings.  One could argue that path "$.2.p10"
should match JSON '{"2":{"p10":42}}', i.e. the numeric accessor is
converted to a string.  For example, in nodejs I can do:

    > var x = {2: {p10: 42}}
    > x[2].p10
    42

But that's JavaScript, not JSON.

Also, is there even a use case for path "0x2.p10"?  The path has to
start with "$" or ("@" in case of a filter expression), doesn't it?  And
it that case it doesn't parse:

    test=# select '$.0x2.p10'::jsonpath;
    ERROR:  trailing junk after numeric literal at or near ".0x" of jsonpath input
    LINE 1: select '$.0x2.p10'::jsonpath;

Even with extra whitespace:

    test=# select '$ . 0x2 . p10'::jsonpath;
    ERROR:  syntax error at or near "0x2" of jsonpath input
    LINE 1: select '$ . 0x2 . p10'::jsonpath;

Or should it behave like an array accessor?  Similar to:

    test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath);
     jsonb_path_query
    ------------------
     42
    (1 row)

>   [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
>   [2]: https://tip.golang.org/ref/spec#Floating-point_literals

-- 
Erik

Re: ❓ JSON Path Dot Precedence

From

"David E. Wheeler"

Date:

07 April 2024, 22:11:32

On Apr 7, 2024, at 15:46, Erik Wienhold <ewie@ewie.name> wrote:

> I guess jsonpath assumes that hex, octal, and binary literals are
> integers.  So there's no ambiguity about any fractional part that might
> follow.

Yeah, that’s what the comment in the flex file says:

https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105


> Also, is there even a use case for path "0x2.p10"?  The path has to
> start with "$" or ("@" in case of a filter expression), doesn't it?  And
> it that case it doesn't parse:
>
>    test=# select '$.0x2.p10'::jsonpath;
>    ERROR:  trailing junk after numeric literal at or near ".0x" of jsonpath input
>    LINE 1: select '$.0x2.p10'::jsonpath;
>
> Even with extra whitespace:
>
>    test=# select '$ . 0x2 . p10'::jsonpath;
>    ERROR:  syntax error at or near "0x2" of jsonpath input
>    LINE 1: select '$ . 0x2 . p10'::jsonpath;
>
> Or should it behave like an array accessor?  Similar to:
>
>    test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath);
>     jsonb_path_query
>    ------------------
>     42
>    (1 row)

I too am curious why these parse successfully, but don’t appear to be useful.

Best,

David

Re: ❓ JSON Path Dot Precedence

From

Peter Eisentraut

Date:

10 April 2024, 14:29:28

On 07.04.24 18:13, David E. Wheeler wrote:
> Hello Hackers,
> 
> A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers:
> 
>>   Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON
insome minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON.
 
> 
> In other words, this is valid:
> 
> david=# select '2.'::jsonpath;
>   jsonpath
> ----------
>   2
> 
> But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How
shouldthat be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2
followedby the path segment “p10”? Here’s the parser’s answer:
 
> 
> david=# select '0x2.p10'::jsonpath;
>   jsonpath
> -----------
>   (2)."p10"
> 
> So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a
decimalexpression. Is that intentional/correct?
 

I think the derivation would be like this:

(I'm not sure what the top-level element would be, so let's start 
somewhere in the middle ...)

<JSON unary expression> ::= <JSON accessor expression>

<JSON accessor expression> ::= <JSON path primary> <JSON accessor op>

<JSON path primary> ::= <JSON path literal>

<JSON accessor op> ::= <JSON member accessor>

<JSON member accessor> ::= <period> <JSON path key name>

So the whole thing is

<JSON path literal> <period> <JSON path key name>

The syntax of <JSON path literal> and <JSON path key name> is then 
punted to ECMAScript 5.1.

0x2 is a HexIntegerLiteral.  (There can be no dots in that.)

p10 is an Identifier.

So I think this is all correct.

Re: ❓ JSON Path Dot Precedence

From

"David E. Wheeler"

Date:

10 April 2024, 20:44:51

On Apr 10, 2024, at 10:29, Peter Eisentraut <peter@eisentraut.org> wrote:

> So the whole thing is
>
> <JSON path literal> <period> <JSON path key name>
>
> The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1.
>
> 0x2 is a HexIntegerLiteral.  (There can be no dots in that.)
>
> p10 is an Identifier.
>
> So I think this is all correct.

That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point:
whatpath can `'0x2.p10` even select? 

Best,

David

Re: ❓ JSON Path Dot Precedence

From

"David E. Wheeler"

Date:

08 July 2024, 15:27:36

Hi, following up on some old threads.

> On Apr 10, 2024, at 16:44, David E. Wheeler <david@justatheory.com> wrote:
>
> That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point:
whatpath can `'0x2.p10` even select? 

I’m wondering whether the jsonpath parser should be updated to reject cases like this. I think it will always return no
results.AFAICT, there’s no way to navigate to an object identifier immediately after a number: 

david=# select '0x2.p10'::jsonpath;
 jsonpath
-----------
 (2)."p10"
(1 row)

david=# select jsonb_path_query(target => '[0, 1, {"p10": true}]', path => '0x2.p10');
 jsonb_path_query
------------------
(0 rows)

david=# select jsonb_path_query(target => '{"0x2": {"p10": true}}', path => '0x2.p10');
 jsonb_path_query
------------------
(0 rows)

It’s just inherently meaningless. BTW, it’s not limited to hex numbers:

david=# select '(2).p10'::jsonpath;
 jsonpath
-----------
 (2)."p10"

OTOH, maybe that’s a corner case we can live with.

Best,

David

Re: ❓ JSON Path Dot Precedence

From

"David G. Johnston"

Date:

08 July 2024, 16:05:30

On Mon, Jul 8, 2024 at 8:27 AM David E. Wheeler <david@justatheory.com> wrote:

Hi, following up on some old threads.

> On Apr 10, 2024, at 16:44, David E. Wheeler <david@justatheory.com> wrote:
>
> That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select?

I’m wondering whether the jsonpath parser should be updated to reject cases like this. I think it will always return no results. AFAICT, there’s no way to navigate to an object identifier immediately after a number:

If we go down this path wouldn't the correct framing be: do not allow accessors after scalars ? The same argument applies to false/"john" and other scalar types since by definition none of them have subcomponents to be accessed.

That said, the parser has a lax mode which somewhat implies it doesn't expect the jsonpath type to perform much in the way of validation of the semantic correctness of the encoded path expression.

I like the idea of a smarter expression-holding type and would even wish to have had this on day one. Retrofitting is less appealing. We document a similarity with regular expressions here where we, for better and worse, have lived without a regexppath data type forever and leave it to the executor to tell the user their pattern is invalid. Leaning on that precedence here makes accepting the status quo more reasonable. Though strict/lax modes and, I think, variables, motivates me to put my vote toward the "do more validation" group.

Does the standard even have a separate type here or is that our implementation detail invention?

David J.

Re: ❓ JSON Path Dot Precedence

From

"David E. Wheeler"

Date:

08 July 2024, 16:11:58

On Jul 8, 2024, at 12:05, David G. Johnston <david.g.johnston@gmail.com> wrote:

> If we go down this path wouldn't the correct framing be:  do not allow accessors after scalars ?  The same argument
appliesto false/"john" and other scalar types since by definition none of them have subcomponents to be accessed. 

Yes, excellent point.

> That said, the parser has a lax mode which somewhat implies it doesn't expect the jsonpath type to perform much in
theway of validation of the semantic correctness of the encoded path expression. 

My understanding is that lax mode means it ignores where the JSON doesn’t abide by expectations of the path expression,
notthat the path parsing is lax. 

> I like the idea of a smarter expression-holding type and would even wish to have had this on day one.  Retrofitting
isless appealing.  We document a similarity with regular expressions here where we, for better and worse, have lived
withouta regexppath data type forever and leave it to the executor to tell the user their pattern is invalid.  Leaning
onthat precedence here makes accepting the status quo more reasonable.  Though strict/lax modes and, I think,
variables,motivates me to put my vote toward the "do more validation" group. 

This feels different from a documented difference in behavior as an implementation choice, like path regex vs. Spencer.
Inthis case, the expression is technically meaningless, but there’s never so much as an error thrown. 

> Does the standard even have a separate type here or is that our implementation detail invention?

Sorry, separate type for what?

Best,

David

Re: ❓ JSON Path Dot Precedence

From

"David G. Johnston"

Date:

08 July 2024, 16:17:09

On Mon, Jul 8, 2024 at 9:12 AM David E. Wheeler <david@justatheory.com> wrote:

On Jul 8, 2024, at 12:05, David G. Johnston <david.g.johnston@gmail.com> wrote:

> Does the standard even have a separate type here or is that our implementation detail invention?

Sorry, separate type for what?

We created a data type named: jsonpath. Does the standard actually have that data type and defined parsing behavior or does it just have functions where one of the inputs is text whose contents are a path expression?

David J.

Re: ❓ JSON Path Dot Precedence

From

"David E. Wheeler"

Date:

08 July 2024, 16:54:48

On Jul 8, 2024, at 12:17, David G. Johnston <david.g.johnston@gmail.com> wrote:

> We created a data type named: jsonpath.  Does the standard actually have that data type and defined parsing behavior
ordoes it just have functions where one of the inputs is text whose contents are a path expression? 

Ah, got it.

D