On Tue, Nov 25, 2025 at 10:50 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> I think the only difference between this and 0002 is in step 2. In 0002,
> if allocating a tranche fails, we remove the entry from the registry so
> that another backend can try again later. Otherwise, we retain the entry
> with just the tranche ID set, even if later setup fails. I believe that
> achieves basically the same thing, and is perhaps a tad simpler to code.
> how slightly more expensive retry logic would hurt anything there.

The downside is that then we have to rely on PG_CATCH() to make things
whole. I am not sure that there's any problem with that, but I'm also
not sure that there isn't. The timing of PG_CATCH() block execution
often creates bugs, because it runs before (sub)transaction abort.
That means that there's a real risk that you try to acquire an LWLock
you already hold, for example. It's a lot easier to be confident that
cleanup actions will reliably succeed when they run inside the
transaction abort path that knows the order in which various resources
should be released. Generally, I think PG_CATCH() is fine if
you're releasing a resource that is decoupled from everything else,
like using a third-party library's free function to free memory
allocated by that library. But if you're releasing PostgreSQL
resources that are layered on top of other PostgreSQL resources, like
a DSA that depends on DSM and LWLock, I think it's a lot more
difficult to be certain that you aren't going to end up trying to
release the same stuff multiple times or in the wrong order. I'm not
saying you can't make it work, but I've banged my head on this
particular doorframe enough times that my reflex is to duck.
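
To make the ordering concern concrete, here is a toy model in plain C.
It is not PostgreSQL code: setjmp/longjmp stand in for
PG_TRY()/PG_CATCH(), and ToyLock, registry_lock,
remove_registry_entry() and toy_abort_release_locks() are hypothetical
names invented for the illustration. The only thing it is meant to show
is that cleanup run in the catch block sees locks that the failed code
still holds, while cleanup run after the abort-style release does not.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy stand-in for a non-reentrant lock (hypothetical, not an LWLock). */
typedef struct ToyLock
{
    bool held;
} ToyLock;

static ToyLock registry_lock;
static jmp_buf error_context;

/* Reports failure instead of self-deadlocking when already held. */
static bool
toy_lock_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void
toy_lock_release(ToyLock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Cleanup that itself needs the registry lock. */
static bool
remove_registry_entry(void)
{
    if (!toy_lock_acquire(&registry_lock))
        return false;           /* would be a self-deadlock for real */
    toy_lock_release(&registry_lock);
    return true;
}

/* Takes the lock, then "errors out" while still holding it. */
static void
setup_that_fails(void)
{
    toy_lock_acquire(&registry_lock);
    longjmp(error_context, 1);
}

/* Abort-path stand-in: releases whatever the failed code left held. */
static void
toy_abort_release_locks(void)
{
    if (registry_lock.held)
        toy_lock_release(&registry_lock);
}

/* Cleanup in the catch block, i.e. before locks have been released. */
bool
cleanup_in_catch(void)
{
    bool ok;

    registry_lock.held = false;
    if (setjmp(error_context) == 0)
    {
        setup_that_fails();
        return true;            /* not reached */
    }
    ok = remove_registry_entry();   /* lock is still held here */
    toy_abort_release_locks();
    return ok;
}

/* Cleanup after the abort path has released locks. */
bool
cleanup_in_abort(void)
{
    registry_lock.held = false;
    if (setjmp(error_context) == 0)
    {
        setup_that_fails();
        return true;            /* not reached */
    }
    toy_abort_release_locks();
    return remove_registry_entry();
}
```

In this sketch cleanup_in_catch() reports failure because the lock is
still held when the cleanup runs, and cleanup_in_abort() succeeds. The
real abort path releases far more than one lock, in a particular
order, which is exactly the part that is hard to reproduce by hand in
a PG_CATCH() block.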
--
Robert Haas
EDB: http://www.enterprisedb.com