Thread: Re: [PATCH] WIP: replace method for jsonpath
On Sep 16, 2024, at 18:39, Florents Tselai <florents.tselai@gmail.com> wrote: > Here’s an updated version of this patch. Oh, nice function. But a broader question for hackers: Is replace() specified in the SQL/JSON spec? If not, what’s the process for evaluatingwhether or not to add features not specified by the spec? > As a future note: > It’s worth noting that both this newly added jspItem and other ones like (jpiDecimal, jpiString) > use jspGetRightArg and jspGetLeftArg. > left and right can be confusing if more complex methods are added in the future. > i.e. jsonpath methods with nargs>=3 . > I was wondering if we’d like something like JSP_GETARG(n) So far I think we have only functions defined by the spec, which tend to be unary or binary, so left and right haven’t beenan issue. Best, David
> On 17 Sep 2024, at 9:40 PM, David E. Wheeler <david@justatheory.com> wrote: > > On Sep 16, 2024, at 18:39, Florents Tselai <florents.tselai@gmail.com> wrote: > >> Here’s an updated version of this patch. > > Oh, nice function. > > But a broader question for hackers: Is replace() specified in the SQL/JSON spec? If not, what’s the process for evaluatingwhether or not to add features not specified by the spec? That’s the first argument I was expecting, and it’s a valid one. From a user’s perspective the answer is: Why not? The more text-processing facilities I have in jsonb, The less back-and-forth-parentheses-fu I do, The easier my life is. From a PG gatekeeping it’s a more complicated issue. It’s not part of the spec, But I think the jsonb infrastructure in PG is really powerful already and we can built on it, And can evolve into a superset DSL of jsonpath. For example, apache/age have lift-and-shifted this infra and built another DSL (cypher) on top of it. > >> As a future note: >> It’s worth noting that both this newly added jspItem and other ones like (jpiDecimal, jpiString) >> use jspGetRightArg and jspGetLeftArg. >> left and right can be confusing if more complex methods are added in the future. >> i.e. jsonpath methods with nargs>=3 . >> I was wondering if we’d like something like JSP_GETARG(n) > > So far I think we have only functions defined by the spec, which tend to be unary or binary, so left and right haven’tbeen an issue. If the answer to the Q above is: “we stick to the spec” then there’s no thinking about this. But tbh, I’ve already started experimenting with other text methods in text $.strip() / trim() / upper() / lower() etc. Fallback scenario: make this an extension, but in a first pass I didn’t find any convenient hooks. One has to create a whole new scanner, grammar etc. > > Best, > > David >
On Sep 17, 2024, at 15:03, Florents Tselai <florents.tselai@gmail.com> wrote: > Fallback scenario: make this an extension, but in a first pass I didn’t find any convenient hooks. > One has to create a whole new scanner, grammar etc. Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension" feature[1], which allows users to add functions.Would be cool to have a way to register jsonpath functions somehow, but I would imagine it’d need quite a bit ofspecification similar to RFC-9535. Wouldn’t surprise me to see something like that appear in a future version of the spec,with an interface something like CREATE OPERATOR. I don’t have a strong feeling about what should be added that’s not in the spec; my main interest is not having to constantlysync my port[2]. I’m already behind, and’t just been a couple months! 😂 Best, David [1]: https://www.rfc-editor.org/rfc/rfc9535.html#name-function-extensions [2]: https://github.com/theory/sqljson
On 17.09.24 21:16, David E. Wheeler wrote: > On Sep 17, 2024, at 15:03, Florents Tselai <florents.tselai@gmail.com> wrote: > >> Fallback scenario: make this an extension, but in a first pass I didn’t find any convenient hooks. >> One has to create a whole new scanner, grammar etc. > > Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension" feature[1], which allows users to add functions.Would be cool to have a way to register jsonpath functions somehow, but I would imagine it’d need quite a bit ofspecification similar to RFC-9535. Wouldn’t surprise me to see something like that appear in a future version of the spec,with an interface something like CREATE OPERATOR. Why can't we "just" call any suitable pg_proc-registered function from JSON path? The proposed patch routes the example '$.replace("hello","bye")' internally to the internal implementation of the SQL function replace(..., 'hello', 'bye'). Why can't we do this automatically for any function call in a JSON path expression?
On 18 Sep 2024, at 11:23 AM, Peter Eisentraut <peter@eisentraut.org> wrote:On 17.09.24 21:16, David E. Wheeler wrote:On Sep 17, 2024, at 15:03, Florents Tselai <florents.tselai@gmail.com> wrote:Fallback scenario: make this an extension, but in a first pass I didn’t find any convenient hooks.Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension" feature[1], which allows users to add functions. Would be cool to have a way to register jsonpath functions somehow, but I would imagine it’d need quite a bit of specification similar to RFC-9535. Wouldn’t surprise me to see something like that appear in a future version of the spec, with an interface something like CREATE OPERATOR.
One has to create a whole new scanner, grammar etc.
Why can't we "just" call any suitable pg_proc-registered function from JSON path? The proposed patch routes the example '$.replace("hello","bye")' internally to the internal implementation of the SQL function replace(..., 'hello', 'bye'). Why can't we do this automatically for any function call in a JSON path expression?
A couple of weeks ago, I discovered transform_jsonb_string_values, which is already available in jsonfuncs.h
and that gave me the idea for this extension https://github.com/Florents-Tselai/jsonb_apply
It does exactly what you’re saying: searches for a suitable pg_proc in the catalog, and directly applies it.
select jsonb_apply('{
"id": 1,
"name": "John",
"messages": [
"hello"
]
}', 'replace', 'hello', 'bye’);
select jsonb_filter_apply('{
"id": 1,
"name": "John",
"messages": [
"hello"
]
}', '{messages}', 'md5’);
But, I don't know… jsonb_apply? That seemed “too fancy”/LISPy for standard Postgres.
Now that you mention it, though, there’s an alternative of tweaking the grammar and calling the suitable text proc.
On 2024-09-18 We 4:23 AM, Peter Eisentraut wrote: > On 17.09.24 21:16, David E. Wheeler wrote: >> On Sep 17, 2024, at 15:03, Florents Tselai >> <florents.tselai@gmail.com> wrote: >> >>> Fallback scenario: make this an extension, but in a first pass I >>> didn’t find any convenient hooks. >>> One has to create a whole new scanner, grammar etc. >> >> Yeah, it got me thinking about the RFC-9535 JSONPath "Function >> Extension" feature[1], which allows users to add functions. Would be >> cool to have a way to register jsonpath functions somehow, but I >> would imagine it’d need quite a bit of specification similar to >> RFC-9535. Wouldn’t surprise me to see something like that appear in a >> future version of the spec, with an interface something like CREATE >> OPERATOR. > > Why can't we "just" call any suitable pg_proc-registered function from > JSON path? The proposed patch routes the example > '$.replace("hello","bye")' internally to the internal implementation > of the SQL function replace(..., 'hello', 'bye'). Why can't we do this > automatically for any function call in a JSON path expression? > > > That might work. The thing that bothers me about the original proposal is this: what if we add a new non-spec jsonpath method and then a new version of the spec adds a method with the same name, but not compatible with our method? We'll be in a nasty place. At the very least I think we need to try hard to avoid that. Maybe we should prefix non-spec method names with "pg_", or maybe use an initial capital letter. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 18 Sep 2024, at 3:47 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-09-18 We 4:23 AM, Peter Eisentraut wrote:On 17.09.24 21:16, David E. Wheeler wrote:On Sep 17, 2024, at 15:03, Florents Tselai <florents.tselai@gmail.com> wrote:Fallback scenario: make this an extension, but in a first pass I didn’t find any convenient hooks.
One has to create a whole new scanner, grammar etc.
Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension" feature[1], which allows users to add functions. Would be cool to have a way to register jsonpath functions somehow, but I would imagine it’d need quite a bit of specification similar to RFC-9535. Wouldn’t surprise me to see something like that appear in a future version of the spec, with an interface something like CREATE OPERATOR.
Why can't we "just" call any suitable pg_proc-registered function from JSON path? The proposed patch routes the example '$.replace("hello","bye")' internally to the internal implementation of the SQL function replace(..., 'hello', 'bye'). Why can't we do this automatically for any function call in a JSON path expression?
That might work. The thing that bothers me about the original proposal is this: what if we add a new non-spec jsonpath method and then a new version of the spec adds a method with the same name, but not compatible with our method? We'll be in a nasty place. At the very least I think we need to try hard to avoid that. Maybe we should prefix non-spec method names with "pg_", or maybe use an initial capital letter.
Having said that, “pg_” is probably too long for a jsonpath expression,
Most importantly though, “pg_” in my mind is a prefix for things like catalog lookup and system monitoring.
Not a function that the average user would use.
Thus, I lean towards initial-capital.
The more general case would look like:
A new jsonpath item of the format $.Func(arg1, …, argn) can be applied (recursively or not) to a json object.
As a first iteration/version only pg_proc-registered functions of the format func(text, ...,) -> text are available.
We can focus on oid(arg0) = TEXTOID and rettype = TEXTOID fist.
The first arg0 has to be TEXTOID (the current json string) while subsequent args are provided by the user
in the jsonpath expression.
The functions we want to support will be part of jsonpath grammar
and during execution we'll have enough info to find the appropriate PGFunction to call.
What I'm missing yet is how we could handle vars jsonb,
in case the user doesn't want to just hardcode the actual function arguments.
Then resolving argtypes1...n is a bit more complex:
The signature for jsonb_apply(doc jsonb, func text[, variadic "any" args1_n]); [0]
Here, variadic "any" works beautifully, but that's a brand-new function.
In existing jsonb{path} facilities, vars are jsonb objects which could work as well I think.
Unless I'm missing something.