Thread: Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Tom Lane
Date:
Florents Tselai <florents.tselai@gmail.com> writes: > This patch is a follow-up and generalization to [0]. > It adds the following jsonpath methods: lower, upper, initcap, l/r/btrim, > replace, split_part. How are you going to deal with the fact that this makes jsonpath operations not guaranteed immutable? (See commit cb599b9dd for some context.) Those are all going to have behavior that's dependent on the underlying locale. We have the kluge of having separate "_tz" functions to support non-immutable datetime operations, but that way doesn't seem like it's going to scale well to multiple sources of mutability. regards, tom lane
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Alexander Korotkov
Date:
On Thu, Sep 26, 2024 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Florents Tselai <florents.tselai@gmail.com> writes: > > This patch is a follow-up and generalization to [0]. > > It adds the following jsonpath methods: lower, upper, initcap, l/r/btrim, > > replace, split_part. > > How are you going to deal with the fact that this makes jsonpath > operations not guaranteed immutable? (See commit cb599b9dd > for some context.) Those are all going to have behavior that's > dependent on the underlying locale. > > We have the kluge of having separate "_tz" functions to support > non-immutable datetime operations, but that way doesn't seem like > it's going to scale well to multiple sources of mutability. While inventing "_tz" functions I was thinking about jsonpath methods and operators defined in standard then. Now I see huge interest on extending that. I wonder if we can introduce a notion of flexible mutability? Imagine that jsonb_path_query() function (and others) has another function which analyzes arguments and reports mutability. If jsonpath argument is constant and all methods inside are safe then jsonb_path_query() is immutable otherwise it is stable. I was thinking about that back working on jsonpath, but that time problem seemed too limited for this kind of solution. Now, it's possibly time to shake off the dust from this idea. What do you think? ------ Regards, Alexander Korotkov Supabase
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Florents Tselai
Date:
On Thu, Sep 26, 2024 at 1:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Sep 26, 2024 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florents Tselai <florents.tselai@gmail.com> writes:
> > This patch is a follow-up and generalization to [0].
> > It adds the following jsonpath methods: lower, upper, initcap, l/r/btrim,
> > replace, split_part.
>
> How are you going to deal with the fact that this makes jsonpath
> operations not guaranteed immutable? (See commit cb599b9dd
> for some context.) Those are all going to have behavior that's
> dependent on the underlying locale.
>
> We have the kluge of having separate "_tz" functions to support
> non-immutable datetime operations, but that way doesn't seem like
> it's going to scale well to multiple sources of mutability.
While inventing "_tz" functions I was thinking about jsonpath methods
and operators defined in standard then. Now I see huge interest on
extending that. I wonder if we can introduce a notion of flexible
mutability? Imagine that jsonb_path_query() function (and others) has
another function which analyzes arguments and reports mutability. If
jsonpath argument is constant and all methods inside are safe then
jsonb_path_query() is immutable otherwise it is stable. I was
thinking about that back working on jsonpath, but that time problem
seemed too limited for this kind of solution. Now, it's possibly time
to shake off the dust from this idea. What do you think?
------
Regards,
Alexander Korotkov
Supabase
In case you're having a deja vu, while researching this
I did come across [0] where disussing this back in 2019.
In this patch I've conveniently left jspIsMutable and jspIsMutableWalker untouched and under the rug,
but for the few seconds I pondered over this,the best answer I came with was
a simple heuristic to what Alexander says above:
if all elements are safe, then the whole jsp is immutable.
If we really want to tackle this and make jsonpath richer though,
I don't think we can avoid being a little more flexible/explicit wrt mutability.
Speaking of extensible: the jsonpath standard does mention function extensions [1] ,
so it looks like we're covered by the standard, and the mutability aspect is an implementation detail. No?
And having said that, the whole jsonb/jsonpath parser/executor infrastructure is extremely powerful
and kinda under-utilized if we use it "only" for jsonpath.
Tbh, I can see it supporting more specific DSLs and even offering hooks for extensions.
And I know for certain I'm not the only one thinking about this.
See [2] for example where they've lifted, shifted and renamed the jsonb/jsonpath infra to build a separate language for graphs
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On Sep 26, 2024, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote: > Speaking of extensible: the jsonpath standard does mention function extensions [1] , > so it looks like we're covered by the standard, and the mutability aspect is an implementation detail. No? That’s not the standard used for Postgres jsonpath. Postgres follows the SQL/JSON standard in the SQL standard, which isnot publicly available, but a few people on the list have copies they’ve purchased and so could provide some context. In a previous post I wondered if the SQL standard had some facility for function extensions, but I suspect not. Maybe inthe next iteration? > And having said that, the whole jsonb/jsonpath parser/executor infrastructure is extremely powerful > and kinda under-utilized if we use it "only" for jsonpath. > Tbh, I can see it supporting more specific DSLs and even offering hooks for extensions. > And I know for certain I'm not the only one thinking about this. > See [2] for example where they've lifted, shifted and renamed the jsonb/jsonpath infra to build a separate language forgraphs I’m all for extensibility, though jsonpath does need to continue to comply with the SQL standard. Do you have some idea ofthe sorts of hooks that would allow extension authors to use some of that underlying capability? Best, David
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Florents Tselai
Date:
> On 27 Sep 2024, at 12:45 PM, David E. Wheeler <david@justatheory.com> wrote: > > On Sep 26, 2024, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote: > >> Speaking of extensible: the jsonpath standard does mention function extensions [1] , >> so it looks like we're covered by the standard, and the mutability aspect is an implementation detail. No? > > That’s not the standard used for Postgres jsonpath. Postgres follows the SQL/JSON standard in the SQL standard, which isnot publicly available, but a few people on the list have copies they’ve purchased and so could provide some context. > > In a previous post I wondered if the SQL standard had some facility for function extensions, but I suspect not. Maybe inthe next iteration? > >> And having said that, the whole jsonb/jsonpath parser/executor infrastructure is extremely powerful >> and kinda under-utilized if we use it "only" for jsonpath. >> Tbh, I can see it supporting more specific DSLs and even offering hooks for extensions. >> And I know for certain I'm not the only one thinking about this. >> See [2] for example where they've lifted, shifted and renamed the jsonb/jsonpath infra to build a separate language forgraphs > > I’m all for extensibility, though jsonpath does need to continue to comply with the SQL standard. Do you have some ideaof the sorts of hooks that would allow extension authors to use some of that underlying capability? Re-tracing what I had to do 1. Define a new JsonPathItemType jpiMyExtType and map it to a JsonPathKeyword 2. Add a new JsonPathKeyword and make the lexer and parser aware of that, 3. Tell the main executor executeItemOptUnwrapTarget what to do when the new type is matched. I think 1, 2 are the trickiest because they require hooks to jsonpath_scan.l and parser jsonpath_gram.y 3. is the meat of a potential hook, which would be something like extern JsonPathExecResult executeOnMyJsonpathItem(JsonPathExecContext *cxt, JsonbValue *jb, JsonValueList *found); This should be called by the main executor executeItemOptUnwrapTarget when it encounters case jpiMyExtType It looks like quite an endeavor, to be honest.
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Florents Tselai
Date:
On Thu, Sep 26, 2024 at 1:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Sep 26, 2024 at 12:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florents Tselai <florents.tselai@gmail.com> writes:
> > This patch is a follow-up and generalization to [0].
> > It adds the following jsonpath methods: lower, upper, initcap, l/r/btrim,
> > replace, split_part.
>
> How are you going to deal with the fact that this makes jsonpath
> operations not guaranteed immutable? (See commit cb599b9dd
> for some context.) Those are all going to have behavior that's
> dependent on the underlying locale.
>
> We have the kluge of having separate "_tz" functions to support
> non-immutable datetime operations, but that way doesn't seem like
> it's going to scale well to multiple sources of mutability.
While inventing "_tz" functions I was thinking about jsonpath methods
and operators defined in standard then. Now I see huge interest on
extending that. I wonder if we can introduce a notion of flexible
mutability? Imagine that jsonb_path_query() function (and others) has
another function which analyzes arguments and reports mutability. If
jsonpath argument is constant and all methods inside are safe then
jsonb_path_query() is immutable otherwise it is stable. I was
thinking about that back working on jsonpath, but that time problem
seemed too limited for this kind of solution. Now, it's possibly time
to shake off the dust from this idea. What do you think?
I was thinking about taking another stab at this.
Would someone more versed in the inner workings of jsonpath like to weigh in on the immutability wrt locale?
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Robert Haas
Date:
On Tue, May 13, 2025 at 11:00 PM David E. Wheeler <david@justatheory.com> wrote: > On May 13, 2025, at 16:24, Florents Tselai <florents.tselai@gmail.com> wrote: > > As Robert said—and I agree—renaming the existing _tz family would be more trouble than it’s worth, given the need fordeprecations, migration paths, etc. If we were designing this today, suffixes like _stable or _volatile might have beenmore appropriate, but at this point, we’re better off staying consistent with the _tz family. > > I get the pragmatism, and don’t want to over-bike-shed, but what a wart to live with. [I just went back and re-read Robert’spost, and didn’t realize he used exactly the same expression!] Would it really be too effortful to create _stableor _volatile functions and leave the _tz functions as a sort of legacy? No, that wouldn't be too much work, but the issue is that people will keep using the _tz versions and when we eventually try to remove them those people will complain no matter how prominent we make the deprecation notice. If we want to go this route, maybe we should do something like: 1. Add the new versions with a _s suffix or whatever. 2. Invent a GUC jsonb_tz_warning = { on | off } that advises you to use the new functions instead, whenever you use the old ones. 3. After N years, flip the default value from off to on. 4. After M additional years, remove the old functions and the GUC. 5. Still get complaints. -- Robert Haas EDB: http://www.enterprisedb.com
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Tom Lane
Date:
"David E. Wheeler" <david@justatheory.com> writes: > On May 21, 2025, at 14:06, Robert Haas <robertmhaas@gmail.com> wrote: >> If we want to go this route, maybe we should do >> something like: >> ... >> 5. Still get complaints. > Complainers gonna complain. Yeah. I do not see the point of that amount of effort. > Any idea how widespread the use of the function is? It was added in 17, and I’ve met few who have really dug into the jonpathstuff yet, let alone needed the time zone conversion functionality. That's a good point. We should also remember that if somebody really really doesn't want to fix their app, they can trivially create a wrapper function with the old name. Having said that, what's wrong with inventing some improved function names and never removing the old ones? regards, tom lane
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Robert Haas
Date:
On Wed, May 21, 2025 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Having said that, what's wrong with inventing some improved function > names and never removing the old ones? I don't particularly like the clutter, but if the consensus is that the clutter doesn't matter, fair enough. -- Robert Haas EDB: http://www.enterprisedb.com
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
Robert Haas
Date:
On Thu, May 22, 2025 at 4:56 PM Peter Eisentraut <peter@eisentraut.org> wrote: > I don't understand how this discussion got to the conclusion that > functions that depend on the locale cannot be immutable. Note that the > top-level functions lower, upper, and initcap themselves are immutable. Oh, well that was what Tom said last September and I just assumed he was right about the policy. If not, well then that's different. -- Robert Haas EDB: http://www.enterprisedb.com
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On May 23, 2025, at 13:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I assume you mean that they’re set at initdb time, so there’s no mutability concern? > > Yeah, I think Peter's right and I'm wrong. Obviously this ties into > our philosophical debate about how immutable is immutable. But as > long as the functions only depend on locale settings that are fixed > at database creation, I think it's okay to consider them immutable. > > If you were, say, depending on LC_NUMERIC, it would clearly be unsafe > to consider that immutable, so I'm not quite sure if this is the end > of the discussion. But for what's mentioned in the thread title, > I think we only care about LC_CTYPE. Oh, so maybe all this is moot, and Florents can go ahead and add support for the functions to the non-_tz functions? Should there be some sort of inventory of what functions can be used in what contexts? D
Attachment
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On May 24, 2025, at 12:51, Florents Tselai <florents.tselai@gmail.com> wrote: > I think the patch is still in reasonably good shape and hasn’t changed much since September 24.So yes, I’d hope there arestill some valid points to consider or improve. Okay, here’s a review. Patch applies cleanly. All tests pass. I'm curious why you added the `arg0` and `arg1` fields to the `method_args` union. Is there some reason that the existing`left` and `right` fields wouldn't work? Admittedly these are not formally binary operators, but I don't see thatit matters much. The existing string() method operates on a "JSON boolean, number, string, or datetime"; should these functions also operateon all those data types? The argument to the trim methods appears to be ignored: ``` postgres=# select jsonb_path_query('"zzzytest"', '$.ltrim("xyz")'); jsonb_path_query ------------------ "zzzytest" ``` I'm wondering if the issue is the use of the opt_datetime_template in the grammar? ``` | '.' STR_LTRIM_P '(' opt_datetime_template ')' { $$ = makeItemUnary(jpiStrLtrimFunc, $4); } | '.' STR_RTRIM_P '(' opt_datetime_template ')' { $$ = makeItemUnary(jpiStrRtrimFunc, $4); } | '.' STR_BTRIM_P '(' opt_datetime_template ')' { $$ = makeItemUnary(jpiStrBtrimFunc, $4); } ``` I realize it resolves to a string, but for some reason it doesn't get picked up. But also, do you want to support variablesfor either of these arguments? If so, maybe rename and use starts_with_initial: ``` starts_with_initial: STRING_P { $$ = makeItemString(&$1); } | VARIABLE_P { $$ = makeItemVariable(&$1); } ; ``` split_part() does not support a negative n value: ``` postgres=# select split_part('abc,def,ghi,jkl', ',', -2) ; split_part ------------ ghi select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part(",", -2)'); ERROR: syntax error at or near "-" of jsonpath input LINE 1: select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part("... ``` Nor does a `+` work. I think you’d be better served using `csv_elem`, something like: ``` | '.' STR_SPLIT_PART_P '(' STRING_P csv_elem ‘)’ ``` I'm not sure how well these functions comply with the SQL spec. Does it have a provision for implementation-specific methods?I *think* all existing methods are defined by the spec, but someone with access to its contents would have to sayfor sure. And maybe we don't care, consider this a natural extension? I’ve attached a new patch with docs. Best, David
Attachment
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On May 24, 2025, at 17:39, David E. Wheeler <david@justatheory.com> wrote: > I’ve attached a new patch with docs. Oh, and I forgot to mention, src/backend/utils/adt/jsonpath_scan.l looks like it has a ton of duplication in it. Shouldn’tit just add the new keywords, something like: ``` @@ -415,6 +415,11 @@ static const JsonPathKeyword keywords[] = { {4, false, WITH_P, "with"}, {5, true, FALSE_P, "false"}, {5, false, FLOOR_P, "floor"}, + {5, false, STR_BTRIM_P, "btrim"}, + {5, false, STR_LOWER_P, "lower"}, + {5, false, STR_LTRIM_P, "ltrim"}, + {5, false, STR_RTRIM_P, "rtrim"}, + {5, false, STR_UPPER_P, "upper"}, {6, false, BIGINT_P, "bigint"}, {6, false, DOUBLE_P, "double"}, {6, false, EXISTS_P, "exists"}, @@ -428,10 +433,13 @@ static const JsonPathKeyword keywords[] = { {7, false, INTEGER_P, "integer"}, {7, false, TIME_TZ_P, "time_tz"}, {7, false, UNKNOWN_P, "unknown"}, + {7, false, STR_INITCAP_P, "initcap"}, + {7, false, STR_REPLACEFUNC_P, "replace"}, {8, false, DATETIME_P, "datetime"}, {8, false, KEYVALUE_P, "keyvalue"}, {9, false, TIMESTAMP_P, "timestamp"}, {10, false, LIKE_REGEX_P, "like_regex"}, + {10, false, STR_SPLIT_PART_P, "split_part"}, {12, false, TIMESTAMP_TZ_P, "timestamp_tz"}, ``` Best, David
Attachment
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On May 24, 2025, at 17:46, David E. Wheeler <david@justatheory.com> wrote: > Oh, and I forgot to mention, src/backend/utils/adt/jsonpath_scan.l looks like it has a ton of duplication in it. Shouldn’tit just add the new keywords, something like: And now I see my patch broke the grammar because I left some of my fiddling in there. Apologies. Here’s an updated patchwith the updated keyword map, too. Best, David
Attachment
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
Hackers, On May 26, 2025, at 18:00, David E. Wheeler <david@justatheory.com> wrote: > Yes, I think it would be best if the grammar was a bit stricter --- and therefore more self-explanatory --- by making theargs closer to what the functions actually expect. I chatted with Florents and went ahead and simplified the grammar and fixed the other issues I identified in my originalreview. Note that there are two commits, now: `v6-0001-Rename-jsonpath-method-arg-tokens.patch` Renames some of the symbols in the jsonpath grammar so that they’re lessgeneric (`csv*`) and more specific to their contents. This is with the expectation that they will be used by other methodsin the next patch and in the future. I thought it best to separate this refactoring from the feature patch. `v6-0002-Add-additional-jsonpath-string-methods.patch` is that feature patch. The grammar now parses the exact number andtypes of each method argument, eliminating the need for explicit error checking. It also uses the existing patterns forhandling methods with two parameters, removing a bunch of duplicate code. Overall I think this is ready for committer review, although now that I’m not just reviewing but hacking on this thing, maybesomeone else should review it first. Patches attached, GitHub PR here: https://github.com/theory/postgres/pull/12 Best, David
Attachment
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On Jun 3, 2025, at 15:10, David E. Wheeler <david@justatheory.com> wrote: >> https://github.com/theory/postgres/pull/12 > > Found a little more unnecessary code to remove. Updated patches attached. And these should fix the CI failure. I also ran pgindent. Best, David
Attachment
Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
From
"David E. Wheeler"
Date:
On Jun 4, 2025, at 11:27, David E. Wheeler <david@justatheory.com> wrote: > And these should fix the CI failure. I also ran pgindent. Here’s a quick rebase. I think it’s ready for committer review, but since I’ve poked at it quite a bit myself, I updatedthe Commitfest item [1] to “Needs Review”. Best, David [1]: https://commitfest.postgresql.org/patch/5270/