Thread: Refactor construct_array() and deconstruct_array() for built-in types

Refactor construct_array() and deconstruct_array() for built-in types

From
Peter Eisentraut
Date:
[for PG16]

There are many calls to construct_array() and deconstruct_array() for 
built-in types, for example, when dealing with system catalog columns. 
These all hardcode the type attributes necessary to pass to these functions.

To simplify this a bit, add construct_array_builtin(), 
deconstruct_array_builtin() as wrappers that centralize this hardcoded 
knowledge.  This simplifies many call sites and reduces the amount of 
hardcoded stuff that is spread around.

I also considered having genbki.pl generate lookup tables for these 
hardcoded values, similar to schemapg.h, but that ultimately seemed 
excessive.

Thoughts?

Attachment
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> There are many calls to construct_array() and deconstruct_array() for 
> built-in types, for example, when dealing with system catalog columns. 
> These all hardcode the type attributes necessary to pass to these functions.

> To simplify this a bit, add construct_array_builtin(), 
> deconstruct_array_builtin() as wrappers that centralize this hardcoded 
> knowledge.  This simplifies many call sites and reduces the amount of 
> hardcoded stuff that is spread around.

> I also considered having genbki.pl generate lookup tables for these 
> hardcoded values, similar to schemapg.h, but that ultimately seemed 
> excessive.

+1 --- the added overhead of the switch statements is probably a
reasonable price to pay for the notational simplification and
bug-proofing.

One minor coding gripe is that compilers that don't know that elog(ERROR)
doesn't return will certainly generate "use of possibly-uninitialized
variable" complaints.  Suggest inserting "return NULL;" or similar into
the default: cases.  I'd also use more specific error wording to help
people find where they need to add code when they make use of a new type;
maybe like "type %u not supported by construct_array_builtin".

            regards, tom lane



Re: Refactor construct_array() and deconstruct_array() for built-in types

From
Peter Eisentraut
Date:
On 02.05.22 16:48, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> There are many calls to construct_array() and deconstruct_array() for
>> built-in types, for example, when dealing with system catalog columns.
>> These all hardcode the type attributes necessary to pass to these functions.
> 
>> To simplify this a bit, add construct_array_builtin(),
>> deconstruct_array_builtin() as wrappers that centralize this hardcoded
>> knowledge.  This simplifies many call sites and reduces the amount of
>> hardcoded stuff that is spread around.
> 
>> I also considered having genbki.pl generate lookup tables for these
>> hardcoded values, similar to schemapg.h, but that ultimately seemed
>> excessive.
> 
> +1 --- the added overhead of the switch statements is probably a
> reasonable price to pay for the notational simplification and
> bug-proofing.
> 
> One minor coding gripe is that compilers that don't know that elog(ERROR)
> doesn't return will certainly generate "use of possibly-uninitialized
> variable" complaints.  Suggest inserting "return NULL;" or similar into
> the default: cases.  I'd also use more specific error wording to help
> people find where they need to add code when they make use of a new type;
> maybe like "type %u not supported by construct_array_builtin".

I have pushed this with the improvements you had suggested.



Re: Refactor construct_array() and deconstruct_array() for built-in types

From
Dagfinn Ilmari Mannsåker
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

> On 02.05.22 16:48, Tom Lane wrote:
>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>>> There are many calls to construct_array() and deconstruct_array() for
>>> built-in types, for example, when dealing with system catalog columns.
>>> These all hardcode the type attributes necessary to pass to these functions.
>> 
>>> To simplify this a bit, add construct_array_builtin(),
>>> deconstruct_array_builtin() as wrappers that centralize this hardcoded
>>> knowledge.  This simplifies many call sites and reduces the amount of
>>> hardcoded stuff that is spread around.
>> 
>>> I also considered having genbki.pl generate lookup tables for these
>>> hardcoded values, similar to schemapg.h, but that ultimately seemed
>>> excessive.
>> +1 --- the added overhead of the switch statements is probably a
>> reasonable price to pay for the notational simplification and
>> bug-proofing.
>> One minor coding gripe is that compilers that don't know that
>> elog(ERROR)
>> doesn't return will certainly generate "use of possibly-uninitialized
>> variable" complaints.  Suggest inserting "return NULL;" or similar into
>> the default: cases.  I'd also use more specific error wording to help
>> people find where they need to add code when they make use of a new type;
>> maybe like "type %u not supported by construct_array_builtin".
>
> I have pushed this with the improvements you had suggested.

I dind't pay much attention to this thread earlier, but I was struck by
the duplication of the switch statement determining the elemlen,
elembyval, and elemalign values between the construct and deconstruct
functions.  How about a common function they can both call?  Something
like:

static void builtin_type_details(Oid elemtype,
                                 int *elemlen,
                                 bool *elembyval,
                                 char *elemalign);

- ilmari



Re: Refactor construct_array() and deconstruct_array() for built-in types

From
Dagfinn Ilmari Mannsåker
Date:
Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:

> I dind't pay much attention to this thread earlier, but I was struck by
> the duplication of the switch statement determining the elemlen,
> elembyval, and elemalign values between the construct and deconstruct
> functions.  How about a common function they can both call?  Something
> like:
>
> static void builtin_type_details(Oid elemtype,
>                                  int *elemlen,
>                                  bool *elembyval,
>                                  char *elemalign);

I just realised that this would require the error message to not include
the function name (which isn't really that critical, since it's a
developer-facing message), but an option would to make it return false
for unknown types, so each of the calling functions can emit their own
error message.

> - ilmari



=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:
> Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
>> I dind't pay much attention to this thread earlier, but I was struck by
>> the duplication of the switch statement determining the elemlen,
>> elembyval, and elemalign values between the construct and deconstruct
>> functions.  How about a common function they can both call?

I was wondering about that too while reading the committed version.
However, adding an additional function call would weaken the argument
that this adds just a tolerable amount of overhead, primarily because
you'd need to return a record or introduce pointers or the like.

> I just realised that this would require the error message to not include
> the function name (which isn't really that critical, since it's a
> developer-facing message), but an option would to make it return false
> for unknown types, so each of the calling functions can emit their own
> error message.

Nah, because the point of that was just to direct people to where
to fix it when they need to.  So the message need only refer to
the common function, if we were to change it.

Perhaps a good compromise could be to turn the duplicated code into
a macro that's instantiated in both places?  But I don't actually
see anything much wrong with the code as Peter has it.

            regards, tom lane



Re: Refactor construct_array() and deconstruct_array() for built-in types

From
Peter Eisentraut
Date:
On 01.07.22 15:37, Tom Lane wrote:
> Perhaps a good compromise could be to turn the duplicated code into
> a macro that's instantiated in both places?  But I don't actually
> see anything much wrong with the code as Peter has it.

There are opportunities to refine this further.  For example, there is 
similar code in TupleDescInitBuiltinEntry(), and bootstrap.c also 
contains hardcoded info on built-in types, and GetCCHashEqFuncs() is 
also loosely related.  As I mentioned earlier in the thread, one could 
have genbki.pl generate support code for this.