Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
Date
Msg-id CA+TgmoapJ6erjT21uPO12wTtoOmj6w-dp6T3qySN+NSc1cdEKw@mail.gmail.com
Whole thread Raw
In response to Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, Apr 30, 2025 at 6:56 PM Michael Paquier <michael@paquier.xyz> wrote:
> I'm actually rather scared of the patch, isn't there a risk of
> breaking existing patterns that worked out of the box by forcing the
> resowner to not be set?  My spidey sense tingles when I see such
> patterns, because this is enforcing assumptions directly hidden to the
> callers.

I'm not worried about this particular problem. We use resowner == NULL
in various places to mean "for the lifetime of the backend," which
makes a lot of sense if you think about it: the job of the resowner is
to release resources when the resowner is cleaned up; if you don't
ever want to release the resources, then you don't need a resowner.

What I'm concerned about is that I think that (as I said on the other
thread) is that ProcessGetMemoryContextInterrupt is not really at all
safe to execute at an arbitrary CHECK_FOR_INTERRUPTS(). In other
places where we use resource owners, we don't run into the problem
that happened here because we are careful about the timing of our
resource owner operations. You can see that caution in, for example,
AbortTransaction() and AbortSubTransaction(), where we make sure to
close down higher-level subsystems first, so that when we get to
lower-level cleanup, we know that no more resources are going to be
allocated afterward. But here, we are executing complex operations
that require a "clean" state at pretty much any point in the code,
where we don't actually know that things are sane enough for this code
to execute successfully. This particular error is just a symptom of
that more general problem.

In my mind, the possible fixes here are (1) revert that patch, (2)
redesign things so that the problematic code can only get called when
we know that the backend state is sane, or (3) redesign the code so
that it has fewer requirements for correct operation. Regarding (3),
this particular problem could maybe be solved by not relying on the
resowner machinery at all: we always want the DSA to be pinned, so if
we could create it pre-pinned rather than pinning it as an
after-the-fact step, we wouldn't need the resowner at all. Except that
I don't really think that would be entirely safe, because that seems
to open up the possibility that we'd just leak the DSA area entirely:
say you enter dsa_create with no resowner and then fail someplace
inside that function after you've allocated resources. There's nothing
to clean up those resources, but since you never returned the DSA
handle, it's not stored anywhere, so those resources have just been
leaked. I think this is actually why the interface works the way it
does. Maybe there's some way to fix it by creating a temporary
resource owner that is used by just this code ... but in general I'm
skeptical that we can really set up something that is OK to do in an
aborted transaction, because our ability to handle any further errors
at that point is extremely limited, and this code is definitely
complex enough that it could error out.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Ilia Evdokimov
Date:
Subject: Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Next
From: Robert Haas
Date:
Subject: Re: pgsql: Add function to log the memory contexts of specified backend pro