Re: jsonpath - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: jsonpath
Date
Msg-id 43e52ed7-9c21-db2c-693a-9245f9e8ce2a@2ndquadrant.com
Whole thread Raw
In response to Re: jsonpath  (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses Re: jsonpath
Re: jsonpath
List pgsql-hackers
Hi,

I've done another round of reviews on v20, assuming the patch is almost
ready to commit, but unfortunately I ran into a bunch of issues that
need to be resolved. None of this is a huge issue, but it's also above
the threshold of what could be tweaked by a committer IMHO.

(Which brings the question who plans to commit this. The patch does not
have a committer in the CF app, but I see both Teodor and Alexander are
listed as it's authors, so I'd expect it to be one of those. Or I might
do that, of course.)


0001
----

1) to_timestamp() does this:

  do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false,
                  &tm, &fsec, &fprec, NULL);

Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the
function did before (well, it called text_to_cstring, but that does
VARDATA_ANY). The same thing applies to to_date(), BTW.

I also find it a bit inconvenient that we expand the fmt like this in
all do_to_timestamp() calls, although it's just to_datetime() that needs
to do it this way. I realize we can't change to_datetime() because it's
external API, but maybe we should make it construct a varlena and pass
it to do_to_timestamp().


2) We define both DCH_FF# and DCH_ff#, but we never ever use the
lower-case version. Heck, it's not mentioned even in DCH_keywords, which
does this:

  ...
  {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */
  ...
  {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */
  ...

Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and
"day". Also, the comment for "ff1" is wrong, it should be "f" I guess.

And of course, there are regression tests for FF# variants, but not the
lower-case ones.


3) Of course, these new formatting patterns should be added to SGML
docs, for example to "Template Patterns for Date/Time Formatting" in
func.sgml (and maybe to other places, I haven't looked for them very
thoroughly).


4) The last block in DCH_from_char() does this

    while (*s == ' ')
        s++;

but I suppose it should use isspace() just like the code immediately
before it.


5) I might be missing something, but why is there the "D" at the end of
the return flags from DCH_from_char?

    /* Return flags for DCH_from_char() */
    #define DCH_DATED    0x01
    #define DCH_TIMED    0x02
    #define DCH_ZONED    0x04


0002
----

1) There are some unnecessary changes to to_datetime signature (date_txt
renamed to vs. datetxt), which is mostly minor but unnecessary churn.


2) There are some extra changes to to_datetime (extra parameters, etc.).
I wonder why those are not included in 0001, as part of the supporting
datetime infrastructure.


3) I'm not sure whether the _safe functions are a good or bad idea, but
at the very least the comments should be updated to explain what it does
(as the API has changed, obviously).


4) the json.c changes are under-documented, e.g. there are no comments
for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and
whether it needs to be specified all the time, half of the functions at
the end don't have comments (some of them are really simple, but then
other simple functions do have comments).

I don't know what the right balance here is (I'm certainly not asking
for bogus comments just to have comments) and I agree that the existing
code is not exactly abundant in comments. But perhaps having at least
some would be nice.

The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There
are pretty much no comments whatsoever (at least at the function level,
explaining what the functions do). It would be good to have a file-level
comment too, explaining how jsonpath works, probably.


5) I see uniqueifyJsonbObject now does this:

    if (!object->val.object.uniquified)
        return;

That seems somewhat strange, considering the function says it'll
uniqueify objects, but then exits when noticing the passed object is not
uniquified?


6) Why do we make make_result inline? (numeric.c) If this needs to be
marked with "inline" then perhaps all the _internal functions should be
marked like that too? I have my doubts about the need for this.


7) The other places add _safe to functions that don't throw errors
directly and instead update edata. Why are set_var_from_str, div_var,
mod_var excluded from this convention?


8) I wonder if the changes in numeric can have negative impact on
performance. Has anyone done any performance tests of this part?


0003
----

1) json_gin.c should probably add comments briefly explaining what
JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and
JsonPathExtractionContext are for.


2) I find it a bit suspicious that there are no asserts in json_gin.c
(well, there are 3 in the existing code, but nothing in the new code,
and the patch essentially doubles the amount of code here).


No comments for 0004 at this point.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" testpending solution of its timing is (fwd)
Next
From: Jeremy Harris
Date:
Subject: RHEL 8.0 build