Thread: Remove hardcoded hash opclass function signature exceptions
hashvalidate(), which validates the signatures of support functions for the hash AM, contains several hardcoded exceptions. For example, hash/date_ops support function 1 is hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. AFAICT, these exceptions were just carried over from before the current index AM API and validation functions were added. The code contains comments like "For the moment, fix it by having a list of allowed cases.", so it probably wasn't meant as the ideal state. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the C code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(), similar to other index AMs. I'm also fixing one case where a brin opclass used hashvarlena() for bytea, even though in that case, there is no function signature validation done, so it doesn't matter that much. Not done here, but maybe hashvarlena() and hashvarlenaextended() should be removed from pg_proc.dat, since their use as opclass support functions is now dubious. They could continue to exist in the C code as internal support functions.
Attachment
Peter Eisentraut <peter@eisentraut.org> writes: > hashvalidate(), which validates the signatures of support functions for > the hash AM, contains several hardcoded exceptions. > ... > This patch removes those exceptions by providing new support functions > that have the proper declared signatures. They internally share most of > the C code with the "wrong" functions they replace, so the behavior is > still the same. +1 for cleaning this up. A couple of minor nitpicks: * I don't really like the new control structure, or rather lack of structure, in hashvalidate. In particular the uncommented s/break/continue/ changes look like typos. They aren't, but can't you do this in a less confusing fashion? Or at least add comments like "continue not break because the code below the switch doesn't apply to this case". * Hand-picking OIDs as you did in pg_proc.dat is kind of passé now. I guess it's all right as long as nobody else does the same thing in the near future, but ... > Not done here, but maybe hashvarlena() and hashvarlenaextended() should > be removed from pg_proc.dat, since their use as opclass support > functions is now dubious. I wish we could get rid of those, but according to codesearch.debian.net, postgis and a couple of other extensions are relying on them. If we remove them we'll break any convenient upgrade path for those extensions. regards, tom lane
Peter Eisentraut <peter@eisentraut.org> writes: > On 06.09.24 21:43, Tom Lane wrote: >> * I don't really like the new control structure, or rather lack of >> structure, in hashvalidate. In particular the uncommented >> s/break/continue/ changes look like typos. They aren't, but can't >> you do this in a less confusing fashion? Or at least add comments >> like "continue not break because the code below the switch doesn't >> apply to this case". > Ok, I cleaned that up a bit. That looks nicer. Thanks. >> I wish we could get rid of those, but according to >> codesearch.debian.net, postgis and a couple of other extensions are >> relying on them. If we remove them we'll break any convenient upgrade >> path for those extensions. > Those are using the C function, which is ok. I was thinking about > removing the SQL function (from pg_proc.dat), because you can't use that > for much anymore. (You can't call it directly, and the hash AM will no > longer accept it.) I have done that in this patch version and added > some code comments around it. No, it isn't okay. What postgis (and the others) is doing is equivalent to regression=# create function myhash(bytea) returns int as 'hashvarlena' LANGUAGE 'internal' IMMUTABLE STRICT PARALLEL SAFE; CREATE FUNCTION After applying the v2 patch, I get regression=# create function myhash(bytea) returns int as 'hashvarlena' LANGUAGE 'internal' IMMUTABLE STRICT PARALLEL SAFE; ERROR: there is no built-in function named "hashvarlena" The reason is that the fmgr_builtins table is built from pg_proc.dat, and only names appearing in it can be used as 'internal' function definitions. So you really can't remove the pg_proc entry. The other thing that's made from pg_proc.dat is the list of extern function declarations in fmgrprotos.h. That's why you had to add those cowboy declarations inside hashfunc.c, which are both ugly and not helpful for any external module that might wish to call those functions at the C level. Other than the business about removing those pg_proc entries, I think this is good to go. regards, tom lane