Re: jsonpath - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: jsonpath |
Date | |
Msg-id | 43e52ed7-9c21-db2c-693a-9245f9e8ce2a@2ndquadrant.com Whole thread Raw |
In response to | Re: jsonpath (Nikita Glukhov <n.gluhov@postgrespro.ru>) |
Responses |
Re: jsonpath
Re: jsonpath |
List | pgsql-hackers |
Hi, I've done another round of reviews on v20, assuming the patch is almost ready to commit, but unfortunately I ran into a bunch of issues that need to be resolved. None of this is a huge issue, but it's also above the threshold of what could be tweaked by a committer IMHO. (Which brings the question who plans to commit this. The patch does not have a committer in the CF app, but I see both Teodor and Alexander are listed as it's authors, so I'd expect it to be one of those. Or I might do that, of course.) 0001 ---- 1) to_timestamp() does this: do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false, &tm, &fsec, &fprec, NULL); Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the function did before (well, it called text_to_cstring, but that does VARDATA_ANY). The same thing applies to to_date(), BTW. I also find it a bit inconvenient that we expand the fmt like this in all do_to_timestamp() calls, although it's just to_datetime() that needs to do it this way. I realize we can't change to_datetime() because it's external API, but maybe we should make it construct a varlena and pass it to do_to_timestamp(). 2) We define both DCH_FF# and DCH_ff#, but we never ever use the lower-case version. Heck, it's not mentioned even in DCH_keywords, which does this: ... {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and "day". Also, the comment for "ff1" is wrong, it should be "f" I guess. And of course, there are regression tests for FF# variants, but not the lower-case ones. 3) Of course, these new formatting patterns should be added to SGML docs, for example to "Template Patterns for Date/Time Formatting" in func.sgml (and maybe to other places, I haven't looked for them very thoroughly). 4) The last block in DCH_from_char() does this while (*s == ' ') s++; but I suppose it should use isspace() just like the code immediately before it. 5) I might be missing something, but why is there the "D" at the end of the return flags from DCH_from_char? /* Return flags for DCH_from_char() */ #define DCH_DATED 0x01 #define DCH_TIMED 0x02 #define DCH_ZONED 0x04 0002 ---- 1) There are some unnecessary changes to to_datetime signature (date_txt renamed to vs. datetxt), which is mostly minor but unnecessary churn. 2) There are some extra changes to to_datetime (extra parameters, etc.). I wonder why those are not included in 0001, as part of the supporting datetime infrastructure. 3) I'm not sure whether the _safe functions are a good or bad idea, but at the very least the comments should be updated to explain what it does (as the API has changed, obviously). 4) the json.c changes are under-documented, e.g. there are no comments for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and whether it needs to be specified all the time, half of the functions at the end don't have comments (some of them are really simple, but then other simple functions do have comments). I don't know what the right balance here is (I'm certainly not asking for bogus comments just to have comments) and I agree that the existing code is not exactly abundant in comments. But perhaps having at least some would be nice. The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There are pretty much no comments whatsoever (at least at the function level, explaining what the functions do). It would be good to have a file-level comment too, explaining how jsonpath works, probably. 5) I see uniqueifyJsonbObject now does this: if (!object->val.object.uniquified) return; That seems somewhat strange, considering the function says it'll uniqueify objects, but then exits when noticing the passed object is not uniquified? 6) Why do we make make_result inline? (numeric.c) If this needs to be marked with "inline" then perhaps all the _internal functions should be marked like that too? I have my doubts about the need for this. 7) The other places add _safe to functions that don't throw errors directly and instead update edata. Why are set_var_from_str, div_var, mod_var excluded from this convention? 8) I wonder if the changes in numeric can have negative impact on performance. Has anyone done any performance tests of this part? 0003 ---- 1) json_gin.c should probably add comments briefly explaining what JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and JsonPathExtractionContext are for. 2) I find it a bit suspicious that there are no asserts in json_gin.c (well, there are 3 in the existing code, but nothing in the new code, and the patch essentially doubles the amount of code here). No comments for 0004 at this point. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: