Re: [HACKERS] PATCH: recursive json_populate_record() - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: [HACKERS] PATCH: recursive json_populate_record()
Msg-id: 934.1485111512@sss.pgh.pa.us
In response to: Re: [HACKERS] PATCH: recursive json_populate_record() (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses: Re: [HACKERS] PATCH: recursive json_populate_record()
List: pgsql-hackers
Nikita Glukhov <n.gluhov@postgrespro.ru> writes:
> On 01/08/2017 09:52 PM, Tom Lane wrote:
>> ... attndims is really syntactic sugar only and doesn't affect anything
>> meaningful semantically.

> I have fixed the first patch: when the number of dimensions is unknown
> we determine it simply by the number of successive opening brackets at
> the start of a json array. But I'm still using for verification non-zero
> ndims values that can be get from composite type attribute (attndims) or
> from domain array type (typndims) through specially added function
> get_type_ndims().

Apparently I didn't make my point forcefully enough.  attndims is not semantically significant in Postgres; the fact that we even store it is just a historical artifact, I think.  There might be an argument to start enforcing it, but we'd have to do that across the board, and I think most likely any such proposal would fail because of backwards compatibility concerns.  (There would also be a lot of discussion as to whether, say, "int foo[3]" shouldn't enforce that the array length is 3, not just that it be one-dimensional.)  In the meantime, we are *not* going to have attndims be semantically significant in just this one function.  That would not satisfy the principle of least astonishment.

Therefore, please remove this patch's dependence on attndims.

I looked through the patch briefly and have some other comments:

1. It's pretty bizarre that you need dummy forward struct declarations for ColumnIOData and RecordIOData.  The reason evidently is that somebody ignored project style and put a bunch of typedefs after the local function declarations in jsonfuncs.c.  Project style of course is to put the typedefs first, precisely because they may be needed in the local function declarations.  I will go move those declarations right now, as a single-purpose patch, so that you have something a bit cleaner to modify.

2. The business with isstring, saved_scalar_is_string, etc makes me itch.  I'm having a hard time explaining exactly why, but it just seems like a single-purpose kluge.  Maybe it would seem less so if you saved the original JsonTokenType instead.  But also I'm wondering why the ultimate consumers are concerned with string-ness as such.  It seems like mainly what they're trying to do is forbid converting a string into a non-scalar Postgres type, but (a) why, and (b) if you are going to restrict it, wouldn't it be more sensible to frame it as you can't convert any JSON scalar to a non-scalar type?  As to (a), it seems like you're just introducing unnecessary compatibility breakage if you insist on that.  As to (b), really it seems like an appropriate restriction of that sort would be like "if target type is composite, source must be a JSON object, and if target type is array, source must be a JSON array".  Personally I think what you ought to do here is not change the semantics of cases that are allowed today, but only change the semantics in the cases of JSON object being assigned to PG composite and JSON array being assigned to PG array; both of those fail today, so there's nobody depending on the existing behavior.  But if you reject string-to-composite cases then you will be breaking existing apps, and I doubt people will thank you for it.

3. I think you'd be better off declaring ColumnIOData.type_category as an actual enum type, not "char" with some specific values documented only by a comment.  Also, could you fold the domain case in as a fourth type_category?
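To make that concrete, a minimal sketch of what such an enum might look like (the names here are hypothetical and not taken from the patch):

    /*
     * Illustrative sketch only, not code from the patch: replace the "char"
     * type_category field with a real enum, with the domain case folded in
     * as a fourth category.
     */
    typedef enum TypeCategory
    {
        TYPECAT_SCALAR,     /* ordinary scalar type */
        TYPECAT_ARRAY,      /* array type */
        TYPECAT_COMPOSITE,  /* composite (row) type */
        TYPECAT_DOMAIN      /* domain over one of the above */
    } TypeCategory;

With a TYPECAT_DOMAIN value, the domain handling could sit in the same switch as the other categories rather than on a separate code path.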
Also, why does ColumnIOData have both an ndims field and another one buried in io.array.ndims?

4. populate_array_report_expected_array() violates our error message guidelines, first by using elog rather than ereport for a user-facing error, and second by assembling the message from pieces, which would make it untranslatable even if it were being reported through ereport.  I'm not sure if this needs to be fixed though; will the function even still be needed once you remove the attndims dependency?  Even if there are still callers, you wouldn't necessarily need such a function if you scale back your ambitions a bit as to the amount of detail required.  I'm not sure you really need to report what you think the dimensions are when complaining that an array is nonrectangular.

5. The business with having some arguments that are only for json and others that are only for jsonb, eg in populate_scalar(), also makes me itch.  I wonder whether this wouldn't all get a lot simpler and more readable if we gave up on trying to share code between the two cases.  In populate_scalar(), for instance, the amount of code actually shared between the two cases amounts to a whole two lines, which are dwarfed by the amount of crud needed to deal with trying to serve both cases in the one function.  There are places where there's more sharable code than that, but it seems like a better design might be to refactor the sharable code out into subroutines called by separate functions for json and jsonb.

6. I'm not too excited by your having invented JsonContainerSize, JsonContainerIsObject, etc, and then using them in just this new code.  That is not really improving readability, it's just creating stylistic inconsistency between these functions and the rest of the jsonb code.  If you want such macros I think it would be better to submit a separate cosmetic patch that tries to hide such bit-tests behind macros throughout the jsonb code.  Another problem is that these macros are coding hazards because they look like they yield bool values but they don't really; assigning the results to bool variables, for example, would fail on most platforms.  Safer coding for a general-purpose macro would be like

    #define JsonContainerIsObject(jc) (((jc)->header & JB_FOBJECT) != 0)

7. More generally, the patch is hard to review because it whacks the existing code around so thoroughly that it's tough even to match up old and new code to get a handle on what you changed functionally.  Maybe it would be good if you could separate it into a refactoring patch that just moves existing code around without functional changes, and then a patch on top of that that adds code and makes only localized changes in what's already there.

            regards, tom lane