Thread: Patch bug: Fix jsonpath .* on Arrays
Hackers, The behavior of the .* jpiAnyKey jsonpath selector seems incorrect. ``` select jsonb_path_query('[1,2,3]', '$.*'); jsonb_path_query ------------------ (0 rows) select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', '$.*'); jsonb_path_query ------------------ [3, 4, 5] ``` The first example might be expected, since .* is intended for object keys, but the handing of `jpiAnyKey` has a branch forunwrapping arrays. The second example, however, just seems weird: this is .*, not .**. The attached patch fixes it by passing the next node to `executeAnyItem()` (via `executeItemUnwrapTargetArray()`) and thenproperly setting `jperOk` when `executeAnyItem()` finds values and there is no current (next) node. I took this approach given what appears to be the intended behavior or $* on arrays in lax mode. However, I could see anargument that .* should not apply to arrays at all. If so, I can submit a new patch removing the branch that unwraps anarray with .*. Best, David
Attachment
On Tuesday, June 4, 2024, David E. Wheeler <david@justatheory.com> wrote:
Hackers,
The behavior of the .* jpiAnyKey jsonpath selector seems incorrect.
```
select jsonb_path_query('[1,2,3]', '$.*');
jsonb_path_query
------------------
(0 rows)
select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', '$.*');
jsonb_path_query
------------------
[3, 4, 5]
```
The first example might be expected, since .* is intended for object keys, but the handing of `jpiAnyKey` has a branch for unwrapping arrays. The second example, however, just seems weird: this is .*, not .**.
This seems to be working correctly. Lax mode causes the first array level to unwrap and produce new context item values. Then the wildcard member accessor is applied to each. Numbers don’t have members so no matches exist in the first example. The object in the second indeed has a single member and so matches the wildcard and its value, the array, is returned.
David J.
On Jun 4, 2024, at 12:28 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: > This seems to be working correctly. Lax mode causes the first array level to unwrap and produce new context item values. Then the wildcard member accessor is applied to each. Numbers don’t have members so no matches exist in the firstexample. The object in the second indeed has a single member and so matches the wildcard and its value, the array,is returned. Oh FFS, unwrapping still breaks my brain. You’re right, of course. Here’s a new patch that demonstrates that behavior, sincethat code path is not currently represented in tests AFAICT (I would have expected to have broken it with this patch). D
Attachment
On Jun 4, 2024, at 20:45, David E. Wheeler <david@justatheory.com> wrote: > Here’s a new patch that demonstrates that behavior, since that code path is not currently represented in tests AFAICT (Iwould have expected to have broken it with this patch). Commitfest link: https://commitfest.postgresql.org/48/5017/ D
On Jun 4, 2024, at 20:45, David E. Wheeler <david@justatheory.com> wrote: > Oh FFS, unwrapping still breaks my brain. You’re right, of course. Here’s a new patch that demonstrates that behavior,since that code path is not currently represented in tests AFAICT (I would have expected to have broken it withthis patch). Rebased and moved the new tests to the end of the file. D
Attachment
On Jun 7, 2024, at 10:23, David E. Wheeler <david@justatheory.com> wrote: > Rebased and moved the new tests to the end of the file. Bah, sorry, that was the previous patch. Here’s v3. D
Attachment
Вторник, 25 июня 2024, 11:17 +07:00 от David E. Wheeler <david@justatheory.com>:
On Jun 7, 2024, at 10:23, David E. Wheeler <david@justatheory.com> wrote:
> Rebased and moved the new tests to the end of the file.
Bah, sorry, that was the previous patch. Here’s v3.
D
Hi! Looks good to me, but I have several comments.
Your patch improves tests, but why did you change formatting in jsonpath_exec.c? What's the motivation?
[1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default.
Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 'strict $.*';
I expected an error like in test [1]. This behavior is not obvious to me.
Everything else is cool. Thanks to the patch and the discussion above, I began to understand better how wildcards in JSON work.
Best regards, Stepan Neretin.
On Jun 25, 2024, at 12:46 AM, Степан Неретин <fenixrnd@mail.ru> wrote: > Hi! Looks good to me, but I have several comments. Thanks for your review! > Your patch improves tests, but why did you change formatting in jsonpath_exec.c? What's the motivation? It’s not just formatting. From the commit message: > While at it, teach `executeAnyItem()` to return `jperOk` when `found` > exist, not because it will be used (the result and `found` are inspected > by different functions), but because it seems like the proper thing to > return from `executeAnyItem()` when considered in isolation. I have since realized it’s not a complete fix for the issue, and hacked around it in my Go version. Would be fine to removethat bit, but IIRC this was the only execution function that would return `jperNotFound` when it in fact adds itemsto the `found` list. The current implementation only looks at one or the other, so it’s not super important, but I foundthe inconsistency annoying and sometimes confusing. > [1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*'); > I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax$.*'); to show what lax mode is set by default. Very few of the other tests do so; I can add it if it’s important for this case, though. > Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 'strict $.*'; > I expected an error like in test [1]. This behavior is not obvious to me. @? suppresses a number of errors. Perhaps I should add a variant of the error-raising query that passes the silent arg, whichwould also suppress the error. > Everything else is cool. Thanks to the patch and the discussion above, I began to understand better how wildcards in JSONwork. Yeah, it’s kind of wild TBH. Best, David
On Jun 25, 2024, at 13:48, David E. Wheeler <david@justatheory.com> wrote: > I have since realized it’s not a complete fix for the issue, and hacked around it in my Go version. Would be fine to removethat bit, but IIRC this was the only execution function that would return `jperNotFound` when it in fact adds itemsto the `found` list. The current implementation only looks at one or the other, so it’s not super important, but I foundthe inconsistency annoying and sometimes confusing. I’ve removed this change. >> [1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*'); >> I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]','lax $.*'); to show what lax mode is set by default. > > Very few of the other tests do so; I can add it if it’s important for this case, though. Went ahead and added lax. > @? suppresses a number of errors. Perhaps I should add a variant of the error-raising query that passes the silent arg,which would also suppress the error. Added a variant where the silent param suppresses the error, too. V2 attached and the PR updated: https://github.com/theory/postgres/pull/4/files Best, David
Attachment
On Thu, Jun 27, 2024 at 1:16 AM David E. Wheeler <david@justatheory.com> wrote:
On Jun 25, 2024, at 13:48, David E. Wheeler <david@justatheory.com> wrote:
> I have since realized it’s not a complete fix for the issue, and hacked around it in my Go version. Would be fine to remove that bit, but IIRC this was the only execution function that would return `jperNotFound` when it in fact adds items to the `found` list. The current implementation only looks at one or the other, so it’s not super important, but I found the inconsistency annoying and sometimes confusing.
I’ve removed this change.
>> [1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
>> I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default.
>
> Very few of the other tests do so; I can add it if it’s important for this case, though.
Went ahead and added lax.
> @? suppresses a number of errors. Perhaps I should add a variant of the error-raising query that passes the silent arg, which would also suppress the error.
Added a variant where the silent param suppresses the error, too.
V2 attached and the PR updated:
https://github.com/theory/postgres/pull/4/files
Best,
David
HI! Now it looks good for me.
Best regards, Stepan Neretin.
On Thu, Jun 27, 2024 at 11:53:14AM +0700, Stepan Neretin wrote: > HI! Now it looks good for me. The tests of jsonb_jsonpath.sql include a lot of patterns for @? and jsonb_path_query with the lax and strict modes, so shouldn't these additions be grouped closer to the existing tests rather than added at the end of the file? -- Michael
Attachment
On Jun 27, 2024, at 04:17, Michael Paquier <michael@paquier.xyz> wrote: > The tests of jsonb_jsonpath.sql include a lot of patterns for @? and > jsonb_path_query with the lax and strict modes, so shouldn't these > additions be grouped closer to the existing tests rather than added at > the end of the file? I think you could argue that they should go with other tests for array unwrapping, though it’s kind of mixed throughout.But that’s more the bit I was testing; almost all the tests are lax, and it’s less the strict behavior to testhere than the explicit behavior of unwrapping in lax mode. But ultimately I don’t care where they go, just that we have them. D
On Jun 27, 2024, at 04:17, Michael Paquier <michael@paquier.xyz> wrote: > The tests of jsonb_jsonpath.sql include a lot of patterns for @? and > jsonb_path_query with the lax and strict modes, so shouldn't these > additions be grouped closer to the existing tests rather than added at > the end of the file? I’ve moved them closer to other tests for unwrapping behavior in the attached updated and rebased v3 patch. Best, David
Attachment
On Mon, Jul 8, 2024 at 11:09 PM David E. Wheeler <david@justatheory.com> wrote:
On Jun 27, 2024, at 04:17, Michael Paquier <michael@paquier.xyz> wrote:
> The tests of jsonb_jsonpath.sql include a lot of patterns for @? and
> jsonb_path_query with the lax and strict modes, so shouldn't these
> additions be grouped closer to the existing tests rather than added at
> the end of the file?
I’ve moved them closer to other tests for unwrapping behavior in the attached updated and rebased v3 patch.
Best,
David
Hi! Looks good to me now! Please, register a patch in CF.
Best regards, Stepan Neretin.
On Jul 15, 2024, at 07:07, Stepan Neretin <sncfmgg@gmail.com> wrote: > Hi! Looks good to me now! Please, register a patch in CF. > Best regards, Stepan Neretin. It’s here: https://commitfest.postgresql.org/48/5017/ Best, David
On Mon, Jul 15, 2024 at 10:29:32AM -0400, David E. Wheeler wrote: > It’s here: > > https://commitfest.postgresql.org/48/5017/ Sorry for the delay. Finally came back to it, and applied the tests. Thanks! It was fun to see that HEAD was silenced with the first patch of this thread that tweaked the behavior with arrays. Regarding the comments, I have left them out for now. That may be a good start, but it also feels like we should do a much better job overall with the area of jsonpath_exec.c. One thing that may help as a start is to reorganize the routines of the file and divide them into sub-categories, or even go through a deeper refactoring to help readers go through the existing 4.5k lines of code that are in this single file.. -- Michael
Attachment
On Jul 19, 2024, at 01:42, Michael Paquier <michael@paquier.xyz> wrote: > Sorry for the delay. Finally came back to it, and applied the tests. > Thanks! Awesome, thank you! > It was fun to see that HEAD was silenced with the first patch of this > thread that tweaked the behavior with arrays. Uh, what? Sorry I don’t follow. > Regarding the comments, I have left them out for now. That may be a > good start, but it also feels like we should do a much better job > overall with the area of jsonpath_exec.c. I put them in because it took me a bit to track down that they were among the implementors of JsonPathGetVarCallback as Iwas porting the code. > One thing that may help as > a start is to reorganize the routines of the file and divide them into > sub-categories, or even go through a deeper refactoring to help > readers go through the existing 4.5k lines of code that are in this > single file.. After I got all the tests passing in the port to Go, I split it up into 14 implementation and 15 test files (with all ofjsonb_jsonpath.(sql.out) in one file). Was much easier to reason about that way. https://github.com/theory/sqljson/tree/main/path/exec Best, David
On Fri, Jul 19, 2024 at 09:49:50AM -0400, David E. Wheeler wrote: >> It was fun to see that HEAD was silenced with the first patch of this >> thread that tweaked the behavior with arrays. > > Uh, what? Sorry I don’t follow. What I mean is that the main regression test suite did not complain on your original patch posted here: https://www.postgresql.org/message-id/A95346F9-6147-46E0-809E-532A485D71D6%40justatheory.com But the new tests showed a difference, and that's enough for me to see value in the new tests. -- Michael
Attachment
On Jul 21, 2024, at 20:54, Michael Paquier <michael@paquier.xyz> wrote: > What I mean is that the main regression test suite did not complain on > your original patch posted here: > https://www.postgresql.org/message-id/A95346F9-6147-46E0-809E-532A485D71D6%40justatheory.com > > But the new tests showed a difference, and that's enough for me to see > value in the new tests. Oh, got it, nice! Thanks, David