Re: jsonpath - Mailing list pgsql-hackers
From | Nikita Glukhov |
---|---|
Subject | Re: jsonpath |
Date | |
Msg-id | 38018365-5793-2827-e3c1-5a3b259fb6ae@postgrespro.ru Whole thread Raw |
In response to | Re: jsonpath (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: jsonpath
|
List | pgsql-hackers |
Attached 21-st version of the patches rebased onto the current master. Added new patch 0004 with documentation written by Liudmila Mantrova, and optional patch 0003 that was separated from 0002.
On 24.11.2018 23:03, Tomas Vondra wrote:
Thank your for your review.Hi, I've done another round of reviews on v20, assuming the patch is almost ready to commit, but unfortunately I ran into a bunch of issues that need to be resolved. None of this is a huge issue, but it's also above the threshold of what could be tweaked by a committer IMHO.
(Which brings the question who plans to commit this. The patch does not have a committer in the CF app, but I see both Teodor and Alexander are listed as it's authors, so I'd expect it to be one of those. Or I might do that, of course.)
0001 ---- 1) to_timestamp() does this: do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false, &tm, &fsec, &fprec, NULL); Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the function did before (well, it called text_to_cstring, but that does VARDATA_ANY). The same thing applies to to_date(), BTW. I also find it a bit inconvenient that we expand the fmt like this in all do_to_timestamp() calls, although it's just to_datetime() that needs to do it this way. I realize we can't change to_datetime() because it's external API, but maybe we should make it construct a varlena and pass it to do_to_timestamp().
Of course, there must be VARDATA_ANY() instead of of VARDATA(). But I decided to construct varlena and pass it to do_to_timestamp() and to to_datetime().
2) We define both DCH_FF# and DCH_ff#, but we never ever use the lower-case version. Heck, it's not mentioned even in DCH_keywords, which does this: ... {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and "day".
Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. "Day", "day" are not mapped to DCH_DAY because they determine letter case in the output, but "ff1" and "FF#" output contains only digits.
Fixed.Also, the comment for "ff1" is wrong, it should be "f" I guess.
And of course, there are regression tests for FF# variants, but not the lower-case ones.
Tests were added
3) Of course, these new formatting patterns should be added to SGML docs, for example to "Template Patterns for Date/Time Formatting" in func.sgml (and maybe to other places, I haven't looked for them very thoroughly).
Fixed.
4) The last block in DCH_from_char() does this while (*s == ' ') s++; but I suppose it should use isspace() just like the code immediately before it.
Fixed.
5) I might be missing something, but why is there the "D" at the end of the return flags from DCH_from_char? /* Return flags for DCH_from_char() */ #define DCH_DATED 0x01 #define DCH_TIMED 0x02 #define DCH_ZONED 0x04
These terms "dated", "timed", and "zoned" are taken from the standard: 9.39.10) If <JSON datetime template> JDT is specified, ... : a) If JDT contains <datetime template year>, ... then JDT is dated. b) If JDT contains <datetime template 12-hour>, ... then JDT is timed. c) If JDT contains <datetime template time zone hour>, ... then JDT is zoned.
Extra changes to to_datetime() were moved into patch 0001.0002 ---- 1) There are some unnecessary changes to to_datetime signature (date_txt renamed to vs. datetxt), which is mostly minor but unnecessary churn. 2) There are some extra changes to to_datetime (extra parameters, etc.). I wonder why those are not included in 0001, as part of the supporting datetime infrastructure.
3) I'm not sure whether the _safe functions are a good or bad idea, but at the very least the comments should be updated to explain what it does (as the API has changed, obviously).
This functions were introduced for elimination of PG_TRY/PG_CATCH in jsonpath code. But this error handling approach has a lot of issues (see below), so I decided separate again them into optional patch 0004.
Some new comments were added to json.c.4) the json.c changes are under-documented, e.g. there are no comments for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and whether it needs to be specified all the time, half of the functions at the end don't have comments (some of them are really simple, but then other simple functions do have comments). I don't know what the right balance here is (I'm certainly not asking for bogus comments just to have comments) and I agree that the existing code is not exactly abundant in comments. But perhaps having at least some would be nice.
The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There are pretty much no comments whatsoever (at least at the function level, explaining what the functions do). It would be good to have a file-level comment too, explaining how jsonpath works, probably.
I hope to add detailed comments for jsonpath in the next version of the patches.
5) I see uniqueifyJsonbObject now does this: if (!object->val.object.uniquified) return; That seems somewhat strange, considering the function says it'll uniqueify objects, but then exits when noticing the passed object is not uniquified?
'uniquified' field was rename to 'uniquify'.
6) Why do we make make_result inline? (numeric.c) If this needs to be marked with "inline" then perhaps all the _internal functions should be marked like that too? I have my doubts about the need for this. 7) The other places add _safe to functions that don't throw errors directly and instead update edata. Why are set_var_from_str, div_var, mod_var excluded from this convention?
I added "_safe" postfix only to functions having the both variants: safe (error info is placed into *edata) and unsafe (errors are always thrown). One-line make_result() is called from many unsafe places, so I decided to leave it as it was. It can also be defined as a simple macro. New "_internal" functions are called from jsonpath_exec.c, so they can't be "static inline", but "extern inline" seems to be non-portable.
8) I wonder if the changes in numeric can have negative impact on performance. Has anyone done any performance tests of this part?
I've tried to measure this impact by evaluation of expression with dozens of '+' or even by adding a loop into numeric_add(), but failed to get any measurable difference.
A lot of comments were added. Also major refactoring of jsonb_gin.c was done.0003 ---- 1) jsonb_gin.c should probably add comments briefly explaining what JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and JsonPathExtractionContext are for.
I have managed to find only 4 similar places suitable for asserts.2) I find it a bit suspicious that there are no asserts in json_gin.c (well, there are 3 in the existing code, but nothing in the new code, and the patch essentially doubles the amount of code here).
No comments for 0004 at this point. regards
Attachment
pgsql-hackers by date: