Re: pgsql: Teach DSM registry to ERROR if attaching to an uninitialized ent - Mailing list pgsql-hackers

From: Nathan Bossart
Subject: Re: pgsql: Teach DSM registry to ERROR if attaching to an uninitialized ent
Msg-id: aSXQOAZuA4rX9ooU@nathan
In response to: Re: pgsql: Teach DSM registry to ERROR if attaching to an uninitialized ent (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Nov 25, 2025 at 10:37:50AM -0500, Robert Haas wrote:
> 1. First, find or create the dshash entry. If that fails, we haven't
> done anything yet, so no problem.
> 
> 2. Next, if the dshash entry doesn't have an assigned tranche ID
> yet, then try to assign one. This probably won't fail. But if it does,
> it will produce a meaningful error, and every future attempt will
> likely produce the same, still-meaningful error. I think this is a
> significant improvement over having every attempt but the first return
> "requested DSA \"%s\" failed initialization," which hides the real
> cause of failure.
> 
> 3. Next, if the segment doesn't yet have a DSA, then try to create
> one. This could fail due to lack of memory; if so, future attempts
> might succeed, or might continue to fail with out-of-memory errors.
> Assuming that creating the DSA is successful, pin the DSA and store
> the handle into the dshash entry. On the other hand, if we don't
> create a DSA here because one already exists, try to attach to it.
> That could also fail, but if it does, we can still retry later.
> 
> 4. Pin the mapping for the DSA that we created or attached to in the
> previous step.
> 
> 5. dshash_release_lock.
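
To make sure we're talking about the same thing, here's a rough C sketch
of that flow.  The table name and entry layout are made up for
illustration; the real registry code differs:

#include "lib/dshash.h"
#include "storage/lwlock.h"
#include "utils/dsa.h"

/* Hypothetical entry layout; the real registry's differs. */
typedef struct RegistryEntrySketch
{
    char        name[64];       /* dshash key */
    int         tranche;        /* 0 = not yet assigned */
    dsa_handle  handle;         /* DSA_HANDLE_INVALID = not yet created */
} RegistryEntrySketch;

static dshash_table *registry_table;    /* assume already attached */

static dsa_area *
get_or_create_named_dsa(const char *name)
{
    RegistryEntrySketch *entry;
    dsa_area   *dsa;
    bool        found;

    /* 1. Find or create the entry; nothing to undo if this fails. */
    entry = dshash_find_or_insert(registry_table, name, &found);
    if (!found)
    {
        entry->tranche = 0;
        entry->handle = DSA_HANDLE_INVALID;
    }

    /*
     * 2. Assign a tranche ID if we don't have one.  If this ERRORs, the
     * entry keeps tranche == 0, so the next attempt repeats this step
     * and reports the same meaningful error.
     */
    if (entry->tranche == 0)
        entry->tranche = LWLockNewTrancheId();

    /* 3. Create the DSA (OOM here is retryable) or attach to it. */
    if (entry->handle == DSA_HANDLE_INVALID)
    {
        dsa = dsa_create(entry->tranche);
        dsa_pin(dsa);
        entry->handle = dsa_get_handle(dsa);
    }
    else
        dsa = dsa_attach(entry->handle);

    /* 4. Keep the mapping for the rest of this backend's lifetime. */
    dsa_pin_mapping(dsa);

    /* 5. Done mutating the entry; release the partition lock. */
    dshash_release_lock(registry_table, entry);

    return dsa;
}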

I think the only difference between this and 0002 is in step 2.  In 0002,
if allocating a tranche fails, we remove the entry from the registry so
that another backend can try again later.  Otherwise, we retain the entry
with just the tranche ID set, even if later setup fails.  I believe that
achieves basically the same thing, and is perhaps a tad simpler to code.
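
Concretely, 0002's version of step 2 would look something like this,
reusing the hypothetical names from the sketch above (one way to code
the cleanup, not necessarily how the patch does it):

if (entry->tranche == 0)
{
    PG_TRY();
    {
        entry->tranche = LWLockNewTrancheId();
    }
    PG_CATCH();
    {
        /* Drop the half-built entry so another backend starts fresh. */
        dshash_delete_entry(registry_table, entry);
        PG_RE_THROW();
    }
    PG_END_TRY();
}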

> In other words, I'd try to create a system that always attempts to
> make forward progress, but when that's not possible, fails without
> permanently leaking any resources or spoiling anything for the next
> backend that wants to make an attempt. One worry that we've mentioned
> previously is that this might lead to every backend retrying. But as
> the code is currently written, that basically happens anyway, except
> that all of them but one produce a generic, uninformative error. Now, if we
> do what I'm proposing here, we'll repeatedly try to create DSM, DSA,
> and DSH objects in a way that actually allocates memory, so the
> retries get a little bit more expensive. At the moment, I'm inclined
> to believe this doesn't really matter. If it's true that failures are
> very rare, and I suspect it is, then it doesn't really matter if the
> cost of retries goes up a bit. If hypothetically they are common
> enough to matter, then we definitely need a mechanism that can't get
> permanently stuck in a half-initialized state. If we were to find that
> retrying too many times too quickly creates other problems, my
> suggested response would be to add a timestamp to the dshash entry and
> limit retries to once per minute or something. However, in the absence
> of evidence that we need such a mechanism, I'd be inclined to guess
> that we don't.
> 
> Thoughts?

Yeah, IMHO this is fine.  I expect most of these ERRORs to be hit during
extension development or in out-of-memory scenarios, and I don't see
how slightly more expensive retry logic would hurt anything there.
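
And if we ever did need the throttle you describe, I'd expect it to be a
couple of lines at the top of the init path, along these lines
("last_attempt" would be a new, hypothetical TimestampTz field in the
dshash entry, stamped whenever initialization fails;
TimestampDifferenceExceeds() and GetCurrentTimestamp() come from
utils/timestamp.h):

if (entry->last_attempt != 0 &&
    !TimestampDifferenceExceeds(entry->last_attempt,
                                GetCurrentTimestamp(),
                                60 * 1000))     /* one minute, in ms */
    ereport(ERROR,
            (errmsg("DSA \"%s\" failed initialization recently, not retrying yet",
                    name)));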

-- 
nathan


