Re: [HACKERS] PATCH: recursive json_populate_record() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] PATCH: recursive json_populate_record()
Date 2017-01-22 18:58:32
Msg-id 934.1485111512@sss.pgh.pa.us
In response to Re: [HACKERS] PATCH: recursive json_populate_record()  (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses Re: [HACKERS] PATCH: recursive json_populate_record()
List pgsql-hackers
Nikita Glukhov <n.gluhov@postgrespro.ru> writes:
> On 01/08/2017 09:52 PM, Tom Lane wrote:
>> ... attndims is really syntactic sugar only and doesn't affect anything
>> meaningful semantically.

> I have fixed the first patch: when the number of dimensions is unknown,
> we determine it simply by the number of successive opening brackets at
> the start of a json array.  But I'm still using, for verification, non-zero
> ndims values that can be obtained from a composite type attribute (attndims)
> or from a domain array type (typndims) through a specially added function
> get_type_ndims().

Apparently I didn't make my point forcefully enough.  attndims is not
semantically significant in Postgres; the fact that we even store it
is just a historical artifact, I think.  There might be an argument
to start enforcing it, but we'd have to do that across the board, and
I think most likely any such proposal would fail because of backwards
compatibility concerns.  (There would also be a lot of discussion as
to whether, say, "int foo[3]" shouldn't enforce that the array length
is 3, not just that it be one-dimensional.)  In the meantime, we are
*not* going to have attndims be semantically significant in just this one
function.  That would not satisfy the principle of least astonishment.
Therefore, please remove this patch's dependence on attndims.

I looked through the patch briefly and have some other comments:

1. It's pretty bizarre that you need dummy forward struct declarations
for ColumnIOData and RecordIOData.  The reason evidently is that somebody
ignored project style and put a bunch of typedefs after the local function
declarations in jsonfuncs.c.  Project style of course is to put the
typedefs first, precisely because they may be needed in the local function
declarations.  I will go move those declarations right now, as a single-
purpose patch, so that you have something a bit cleaner to modify.
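
To illustrate the ordering problem (member and parameter lists trimmed to
placeholders, names illustrative, not the real declarations):

/* with the typedefs after the prototypes, only a dummy forward
   declaration lets a prototype mention the type: */
struct ColumnIOData;
static void prepare_column_cache(struct ColumnIOData *column);

/* with the typedef first, the prototypes can use the bare name: */
typedef struct ColumnIOData
{
    Oid         typid;          /* placeholder member */
} ColumnIOData;

static void prepare_column_cache(ColumnIOData *column);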

2. The business with isstring, saved_scalar_is_string, etc makes me itch.
I'm having a hard time explaining exactly why, but it just seems like a
single-purpose kluge.  Maybe it would seem less so if you saved the
original JsonTokenType instead.  But also I'm wondering why the ultimate
consumers are concerned with string-ness as such.  It seems like mainly
what they're trying to do is forbid converting a string into a non-scalar
Postgres type, but (a) why, and (b) if you are going to restrict it,
wouldn't it be more sensible to frame it as you can't convert any JSON
scalar to a non-scalar type?  As to (a), it seems like you're just
introducing unnecessary compatibility breakage if you insist on that.
As to (b), really it seems like an appropriate restriction of that sort
would be like "if target type is composite, source must be a JSON object,
and if target type is array, source must be a JSON array".  Personally
I think what you ought to do here is not change the semantics of cases
that are allowed today, but only change the semantics in the cases of
JSON object being assigned to PG composite and JSON array being assigned
to PG array; both of those fail today, so there's nobody depending on the
existing behavior.  But if you reject string-to-composite cases then you
will be breaking existing apps, and I doubt people will thank you for it.
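
For illustration, the first suggestion would amount to something roughly
like this (field names invented here; the real parse state carries more):

#include "postgres.h"
#include "utils/jsonapi.h"      /* JsonTokenType, JSON_TOKEN_STRING, ... */

typedef struct PopulateScalarState
{
    char           *saved_scalar;       /* text of the last scalar seen */
    JsonTokenType   saved_token_type;   /* its original JSON token type */
} PopulateScalarState;

/* consumers can then test the original token kind directly */
static bool
scalar_was_string(PopulateScalarState *state)
{
    return state->saved_token_type == JSON_TOKEN_STRING;
}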

3. I think you'd be better off declaring ColumnIOData.type_category as an
actual enum type, not "char" with some specific values documented only
by a comment.  Also, could you fold the domain case in as a fourth
type_category?  Also, why does ColumnIOData have both an ndims field and
another one buried in io.array.ndims?
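
Something along these lines, for instance (names illustrative only):

/* instead of: char type_category;  with 's'/'a'/'c' explained in a comment */
typedef enum TypeCat
{
    TYPECAT_SCALAR,
    TYPECAT_ARRAY,
    TYPECAT_COMPOSITE,
    TYPECAT_DOMAIN          /* domain case folded in as a fourth category */
} TypeCat;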

4. populate_array_report_expected_array() violates our error message
guidelines, first by using elog rather than ereport for a user-facing
error, and second by assembling the message from pieces, which would
make it untranslatable even if it were being reported through ereport.
I'm not sure if this needs to be fixed though; will the function even
still be needed once you remove the attndims dependency?  Even if there
are still callers, you wouldn't necessarily need such a function if
you scale back your ambitions a bit as to the amount of detail required.
I'm not sure you really need to report what you think the dimensions
are when complaining that an array is nonrectangular.
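
For the record, the guidelines call for a user-facing error to go through
ereport with a single translatable message, roughly:

ereport(ERROR,
        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
         errmsg("expected JSON array")));

rather than an elog whose text is pasted together at run time.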

5. The business with having some arguments that are only for json and
others that are only for jsonb, eg in populate_scalar(), also makes me
itch.  I wonder whether this wouldn't all get a lot simpler and more
readable if we gave up on trying to share code between the two cases.
In populate_scalar(), for instance, the amount of code actually shared
between the two cases amounts to a whole two lines, which are dwarfed
by the amount of crud needed to deal with trying to serve both cases
in the one function.  There are places where there's more sharable
code than that, but it seems like a better design might be to refactor
the sharable code out into subroutines called by separate functions for
json and jsonb.
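
In skeleton form, that might look roughly like this (prototypes only,
names invented for illustration):

/* shared conversion logic in one place ... */
static Datum populate_scalar_common(ColumnIOData *io, char *str, bool isnull);

/* ... with thin format-specific wrappers that just extract the text form */
static Datum populate_scalar_json(ColumnIOData *io, JsonLexContext *lex);
static Datum populate_scalar_jsonb(ColumnIOData *io, JsonbValue *jbv);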

6. I'm not too excited by your having invented JsonContainerSize,
JsonContainerIsObject, etc, and then using them in just this new
code.  That is not really improving readability, it's just creating
stylistic inconsistency between these functions and the rest of the
jsonb code.  If you want such macros I think it would be better to
submit a separate cosmetic patch that tries to hide such bit-tests
behind macros throughout the jsonb code.  Another problem is that
these macros are coding hazards because they look like they yield bool
values but they don't really; assigning the results to bool variables,
for example, would fail on most platforms.  Safer coding for a
general-purpose macro would be like

#define JsonContainerIsObject(jc)    (((jc)->header & JB_FOBJECT) != 0)
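
To spell out the hazard: with a definition that yields the raw masked bit,
an assignment like

bool        is_object = JsonContainerIsObject(jc);

truncates the result on the many platforms where our bool is a one-byte
char, because JB_FOBJECT lives in the high bits of the header word; the
variable then reads as false even for an object.  The "!= 0" form above
avoids that.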

7. More generally, the patch is hard to review because it whacks the
existing code around so thoroughly that it's tough even to match up
old and new code to get a handle on what you changed functionally.
Maybe it would be good if you could separate it into a refactoring
patch that just moves existing code around without functional changes,
and then a patch on top of that that adds code and makes only localized
changes in what's already there.
        regards, tom lane


