Adrian Klaver <adrian.klaver@aklaver.com> writes:
> The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those
> and then try again.

Highly unlikely to make any difference. What's evidently going on
here is that the test script attempts to do DROP SCHEMA concurrently
with another session that's creating an object inside that schema.
(Here, that's a function, but the particular type of object doesn't
really matter.) There are three possible outcomes of that:

1. The object creation commits soon enough that DROP SCHEMA sees it,
and drops the object along with the schema.

2. The object creation begins after DROP SCHEMA commits, and fails
because the schema is not to be found.

3. The object creation goes through, leaving a now-dangling schema
OID reference in the object's catalog entry. The object is useless
because it's unnamable, but it won't really cause any trouble except
for applications that scan the system catalogs (like pg_dump).

Exactly none of these outcomes result in a usable object, so
one wonders why your application is doing this sort of thing
often enough to hit the race condition.
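
The three outcomes above can be sketched with a toy model. This is only an
illustration: the Catalog class, its methods, and the OID value 100 are all
hypothetical, and a pair of Python dicts stands in for the system catalogs.
It does not reflect how PostgreSQL actually implements any of this; it only
shows how the ordering of "look up schema", "insert object", and "drop
schema" yields each case.

```python
# Toy model only: dicts stand in for the system catalogs.
# Catalog, its methods, and OID 100 are hypothetical names for illustration.

class Catalog:
    def __init__(self):
        self.schemas = {100: "s"}   # schema OID -> schema name
        self.functions = {}         # function name -> schema OID

    def lookup_schema(self, name):
        """Resolve a schema name to its OID (first step of object creation)."""
        for oid, schema_name in self.schemas.items():
            if schema_name == name:
                return oid
        raise LookupError(f'schema "{name}" does not exist')

    def insert_function(self, fname, schema_oid):
        """Record the function; nothing rechecks that the schema still exists."""
        self.functions[fname] = schema_oid

    def drop_schema(self, name):
        """Drop the schema and whatever functions are visible in it right now."""
        oid = self.lookup_schema(name)
        self.functions = {f: s for f, s in self.functions.items() if s != oid}
        del self.schemas[oid]

    def dangling(self):
        """Functions whose schema OID no longer matches any schema."""
        return [f for f, s in self.functions.items() if s not in self.schemas]


def case1():
    # Creation finishes first, so the drop sees the function and removes it.
    c = Catalog()
    c.insert_function("f", c.lookup_schema("s"))
    c.drop_schema("s")
    return c


def case2():
    # Creation starts after the drop, so the schema lookup fails.
    c = Catalog()
    c.drop_schema("s")
    try:
        c.insert_function("f", c.lookup_schema("s"))
        return c, None
    except LookupError as e:
        return c, str(e)


def case3():
    # Creation resolves the schema, the drop runs, then creation finishes
    # with a stale OID: the function survives but points at nothing.
    c = Catalog()
    oid = c.lookup_schema("s")
    c.drop_schema("s")
    c.insert_function("f", oid)
    return c
```

In this model only case3() leaves an entry in dangling(); the other two
end with no function at all, which matches the point that none of the
outcomes gives a usable object.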
We could prevent case 3 by locking the schema during object creation,
converting it to one of the other cases. We actually do that for
tables, but not for any other object types, reasoning that the greatly
increased cost of locking would outweigh the problems that dangling
objects create. (Note that to eliminate the issue fully, we'd have to
lock every referenced object, not only schemas; for example, also the
data types of the function's arguments and result.) Also, adding such
locking might well lead to deadlocks in concurrent add/drop scenarios,
not just performance costs.
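
As a rough illustration of what "locking the schema during object creation"
buys, here is a toy sketch in which one threading.Lock stands in for a
schema-level lock. Everything in it is an assumption made for the example
(the function names, the single lock, OID 100); it is not a model of
PostgreSQL's lock manager, and it says nothing about the performance or
deadlock costs mentioned above. It only shows that holding the lock across
the lookup and the insert leaves cases 1 and 2 as the only possible
results.

```python
import threading

# Toy model only: one Lock stands in for a lock on the schema.
# All names here are hypothetical; this is not PostgreSQL's lock manager.

def run_once():
    schemas = {100: "s"}     # schema OID -> schema name
    functions = {}           # function name -> schema OID
    errors = []
    schema_lock = threading.Lock()

    def find_schema(name):
        return next((o for o, n in schemas.items() if n == name), None)

    def create_function():
        # Hold the lock across both the lookup and the insert, so the
        # schema cannot disappear in between.
        with schema_lock:
            oid = find_schema("s")
            if oid is None:
                errors.append('schema "s" does not exist')   # case 2
                return
            functions["f"] = oid

    def drop_schema():
        with schema_lock:
            oid = find_schema("s")
            if oid is None:
                return
            for fname in [f for f, s in functions.items() if s == oid]:
                del functions[fname]                         # case 1
            del schemas[oid]

    threads = [threading.Thread(target=create_function),
               threading.Thread(target=drop_schema)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    dangling = [f for f, s in functions.items() if s not in schemas]
    return dangling, errors
```

Whichever thread gets the lock first, run_once() returns an empty dangling
list: either the function is dropped along with the schema, or its creation
fails with the "does not exist" error.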
tl;dr: it's been like this a long time, and I don't really foresee
us accepting the costs of making it not act like that. I seem to
recall someone submitting a patch recently that would add such
locking, but I doubt it'll get accepted.

regards, tom lane