Thread: Remove hardcoded hash opclass function signature exceptions

Remove hardcoded hash opclass function signature exceptions

From: Peter Eisentraut
Date:
hashvalidate(), which validates the signatures of support functions for 
the hash AM, contains several hardcoded exceptions.  For example, 
hash/date_ops support function 1 is hashint4(), which would ordinarily 
fail validation because the function argument is int4, not date.  But 
this works internally because int4 and date are of the same size.  There 
are several more exceptions like this that happen to work and were 
allowed historically but would now fail the function signature validation.

AFAICT, these exceptions were just carried over from before the current 
index AM API and validation functions were added.  The code contains 
comments like "For the moment, fix it by having a list of allowed 
cases.", so it probably wasn't meant as the ideal state.

This patch removes those exceptions by providing new support functions 
that have the proper declared signatures.  They internally share most of 
the C code with the "wrong" functions they replace, so the behavior is 
still the same.
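Below is a minimal sketch of that approach. It uses a simplified model and is not the patch's actual code. A date is modeled as a 32-bit day count, as PostgreSQL's DateADT is. `hash_4bytes()` is a hypothetical shared core that uses the MurmurHash3 32-bit finalizer, not the backend's real hash function. `sketch_hashint4()` and `sketch_hashdate()` are illustrative names. Each entry point has a correctly typed signature, but both delegate to the same core, so equal 4-byte values hash identically.

```c
#include <stdint.h>

/* Model of PostgreSQL's date representation: a 32-bit count of days. */
typedef int32_t DateADT;

/*
 * Hypothetical shared core that hashes a 4-byte value.
 * This is the MurmurHash3 32-bit finalizer, used here only for
 * illustration; it is NOT the hash function PostgreSQL uses.
 */
static uint32_t
hash_4bytes(uint32_t k)
{
    k ^= k >> 16;
    k *= 0x85ebca6bU;
    k ^= k >> 13;
    k *= 0xc2b2ae35U;
    k ^= k >> 16;
    return k;
}

/* Support function whose declared argument type is int4. */
uint32_t
sketch_hashint4(int32_t value)
{
    return hash_4bytes((uint32_t) value);
}

/* Support function whose declared argument type is date; same core, same result. */
uint32_t
sketch_hashdate(DateADT value)
{
    return hash_4bytes((uint32_t) value);
}
```

This also shows why the old hashint4() entry for date_ops worked. The hash only ever sees the 4-byte value and never its declared type.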

With the exceptions gone, hashvalidate() is now simplified and relies 
fully on check_amproc_signature(), similar to other index AMs.
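The difference between the old and new validation can be modeled roughly as follows. This is a hypothetical sketch, not hashvalidate() or check_amproc_signature(). Types are plain enum values, and the exception list holds only the date/int4 case mentioned above. The real check_amproc_signature() also examines the return type and argument count, which this sketch ignores. With the exceptions removed, the check reduces to a single signature comparison with no special cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for type OIDs. */
typedef enum { TYPE_INT4, TYPE_DATE, TYPE_BYTEA } SketchType;

/* Hypothetical old-style allow-list of (opclass input type, function argument type). */
static const struct { SketchType opcintype; SketchType argtype; } allowed_exceptions[] = {
    { TYPE_DATE, TYPE_INT4 },   /* e.g. date_ops using hashint4() */
};

/* Old approach: accept an exact match or any hardcoded historical exception. */
bool
validate_with_exceptions(SketchType opcintype, SketchType argtype)
{
    if (opcintype == argtype)
        return true;
    for (size_t i = 0; i < sizeof(allowed_exceptions) / sizeof(allowed_exceptions[0]); i++)
        if (allowed_exceptions[i].opcintype == opcintype &&
            allowed_exceptions[i].argtype == argtype)
            return true;
    return false;
}

/* New approach: the declared argument type must match, with no special cases. */
bool
validate_strict(SketchType opcintype, SketchType argtype)
{
    return opcintype == argtype;
}
```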

I'm also fixing one case where a brin opclass used hashvarlena() for 
bytea, even though no function signature validation is done in that 
case, so it doesn't matter much.

Not done here, but maybe hashvarlena() and hashvarlenaextended() should 
be removed from pg_proc.dat, since their use as opclass support 
functions is now dubious.  They could continue to exist in the C code as 
internal support functions.

Re: Remove hardcoded hash opclass function signature exceptions

From: Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> hashvalidate(), which validates the signatures of support functions for 
> the hash AM, contains several hardcoded exceptions.
> ...
> This patch removes those exceptions by providing new support functions 
> that have the proper declared signatures.  They internally share most of 
> the C code with the "wrong" functions they replace, so the behavior is 
> still the same.

+1 for cleaning this up.  A couple of minor nitpicks:

* I don't really like the new control structure, or rather lack of
structure, in hashvalidate.  In particular the uncommented
s/break/continue/ changes look like typos.  They aren't, but can't
you do this in a less confusing fashion?  Or at least add comments
like "continue not break because the code below the switch doesn't
apply to this case".

* Hand-picking OIDs as you did in pg_proc.dat is kind of passé now.
I guess it's all right as long as nobody else does the same thing in
the near future, but ...
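Tom's first point turns on a C detail. Inside a switch that sits in a loop, `break` leaves only the switch, so the statements after the switch still run. `continue` skips straight to the next loop iteration instead. Below is a hypothetical illustration, not the hashvalidate() code, using the kind of comment he suggests.

```c
#include <stddef.h>

/*
 * Counts how many items reach the code after the switch.
 * Hypothetical: kind 1 uses break, kind 2 uses continue.
 */
int
count_post_switch(const int *kinds, size_t n)
{
    int reached = 0;

    for (size_t i = 0; i < n; i++)
    {
        switch (kinds[i])
        {
            case 1:
                /* break exits only the switch; the code below still runs */
                break;
            case 2:
                /* continue, not break, because the code below the switch doesn't apply */
                continue;
            default:
                break;
        }
        reached++;              /* the "code below the switch" */
    }
    return reached;
}
```

For the input {1, 2, 3, 2}, only the items of kind 1 and 3 reach the post-switch code.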

> Not done here, but maybe hashvarlena() and hashvarlenaextended() should 
> be removed from pg_proc.dat, since their use as opclass support 
> functions is now dubious.

I wish we could get rid of those, but according to
codesearch.debian.net, postgis and a couple of other extensions are
relying on them.  If we remove them we'll break any convenient upgrade
path for those extensions.

            regards, tom lane



Re: Remove hardcoded hash opclass function signature exceptions

From: Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 06.09.24 21:43, Tom Lane wrote:
>> * I don't really like the new control structure, or rather lack of
>> structure, in hashvalidate.  In particular the uncommented
>> s/break/continue/ changes look like typos.  They aren't, but can't
>> you do this in a less confusing fashion?  Or at least add comments
>> like "continue not break because the code below the switch doesn't
>> apply to this case".

> Ok, I cleaned that up a bit.

That looks nicer.  Thanks.

>> I wish we could get rid of those, but according to
>> codesearch.debian.net, postgis and a couple of other extensions are
>> relying on them.  If we remove them we'll break any convenient upgrade
>> path for those extensions.

> Those are using the C function, which is ok.  I was thinking about
> removing the SQL function (from pg_proc.dat), because you can't use that
> for much anymore.  (You can't call it directly, and the hash AM will no
> longer accept it.)  I have done that in this patch version and added
> some code comments around it.

No, it isn't okay.  What postgis (and the others) is doing is
equivalent to

regression=# create function myhash(bytea) returns int as 'hashvarlena' LANGUAGE 'internal' IMMUTABLE STRICT PARALLEL SAFE;
CREATE FUNCTION

After applying the v2 patch, I get

regression=# create function myhash(bytea) returns int as 'hashvarlena' LANGUAGE 'internal' IMMUTABLE STRICT PARALLEL SAFE;
ERROR:  there is no built-in function named "hashvarlena"

The reason is that the fmgr_builtins table is built from
pg_proc.dat, and only names appearing in it can be used as 'internal'
function definitions.  So you really can't remove the pg_proc entry.
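The failure can be modeled with a small lookup table. This is a hypothetical sketch. `builtins[]` plays the role of the generated fmgr_builtins table, and `lookup_builtin()` plays the role of resolving an 'internal' function's symbol name. The real backend code differs. Once a name is missing from the table, as happens when its pg_proc.dat entry is removed, the lookup fails even though the C function is still compiled in.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned (*BuiltinFn)(unsigned);

/* Hypothetical C implementations; both still exist in the binary. */
static unsigned impl_hashint4(unsigned x) { return x * 31u; }
unsigned impl_hashvarlena(unsigned x) { return x * 17u; }

/* Hypothetical generated table: only names listed in pg_proc.dat appear here. */
static const struct { const char *name; BuiltinFn fn; } builtins[] = {
    { "hashint4", impl_hashint4 },
    /* { "hashvarlena", impl_hashvarlena },  gone with its pg_proc.dat entry */
};

/* Resolve an 'internal' symbol name; NULL means "no built-in function". */
BuiltinFn
lookup_builtin(const char *name)
{
    for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn;
    fprintf(stderr, "ERROR:  there is no built-in function named \"%s\"\n", name);
    return NULL;
}
```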

The other thing that's made from pg_proc.dat is the list of extern
function declarations in fmgrprotos.h.  That's why you had to add
those cowboy declarations inside hashfunc.c, which are both ugly
and not helpful for any external module that might wish to call those
functions at the C level.

Other than the business about removing those pg_proc entries,
I think this is good to go.

            regards, tom lane