Re: Better shared data structure management and resizable shared data structures - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Better shared data structure management and resizable shared data structures
Date
Msg-id cc3ff976-c303-45cd-9a8d-a90483170e37@iki.fi
Whole thread Raw
In response to Re: Better shared data structure management and resizable shared data structures  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Better shared data structure management and resizable shared data structures
List pgsql-hackers
On 19/03/2026 15:36, Robert Haas wrote:
> Without taking a strong position on this particular design idea, I
> kind of wonder if we should be going the opposite direction: instead
> of bundling more and more things into descriptors, try to let people
> write declarative code that does whatever it is that they want to do.
> I think descriptors are pretty limiting as a concept, because they can
> only do the exact things that they're designed to do. For instance, I
> find the change to use LWLockPadded rather than LWLock * in
> pgssSharedState to be a clear improvement, because I'd rather have
> fewer objects and less pointer-chasing. Now, that LWLockPadded is
> going to need to be initialized. I would rather do that by writing
> LWLockInitialize(&pgss->lock.lock, tranche_id) as you did than by
> adding something to a descriptor, since doing the latter is almost
> certainly going to be a less intuitive syntax for the same thing and
> I'm going to have to spend time verifying that whatever locution I'm
> compelled to write actually does what I want. And if somebody adds
> "really light weight locks" to the system then we'll have to add
> RLWLockInitialize() to the things that the descriptor system knows how
> to do, and if for some reason I want to do
> LWLockInitialize(&mythingy->lock.lock, some_random_condition ?
> this_tranche_id : that_tranche_id), the descriptor system will
> probably become an annoying straightjacket. Or if that example isn't
> compelling enough, imagine that I have an array of structs each of
> which contains an LWLockPadded and I need to go loop over the array
> and do all the initializations. Or maybe space is at a premium and I
> want to use LWLock rather than LWLockPadded. Or maybe something else.
> Code is just more flexible than having to go through descriptors,
> which is why a lot of modern languages go to a great deal of trouble
> to make closures a first-class concept.

Sure, descriptors are restricting. I'm not proposing to move everything 
into descriptors.

> Let me take a step back and say what I think the problems in this area
> are, to see whether we agree on the basics. I suspect the reason
> you've undertaken this project is the fact that, currently, requesting
> shared memory and allocating it are totally decoupled. The number of
> bytes you request and the number of bytes you actually allocated could
> be totally different, and then some completely unrelated subsystem can
> break because it's allocating last, and even though it requested the
> bytes it wanted to allocate, some other subsystem under-requested and
> now the bytes this subsystem wants are not actually available.
> Tracking this kind of thing down can be a giant pain in the rear end,
> bordering on impossible IME. Also, if we want to be able to resize
> stuff in shared memory in some happy future, the need for precise
> tracking of these sorts of things presumably goes way up, although the
> exact details of that are not altogether clear to me. Furthermore, as
> you point out, even if everyone behaves themselves and requests and
> allocates the same number of bytes, that's still annoying if it means
> redoing some computation.

Agreed, yes, that's what I'm trying to address.

> I think the answer to this problem is to make requests into named
> objects. You're not allowed to request a number of BYTES of shared
> memory any more; you have to request "the shared memory bytes for the
> object named XYZ". So instead of
> RequestAddinShmemSpace(pgss_memsize()), you would say something like
> RequestAddinShmemSpace("pg_stat_staements", pgss_memsize()) and then
> later instead of saying pgss = ShmemInitStruct("pg_stat_statements",
> sizeof(pgssSharedState), &found), you say pgss =
> ShmemInitStruct("pg_stat_statements", &size, &found).

Yeah, that's a little better than what we have today.

> The other big problem that I think we have in this area is that it's
> unclear what you're allowed to do in _PG_init() vs. some other
> callback, and sometimes you need IsUnderPostmaster checks or
> IsPostmasterEnvironment checks or
> process_shared_preload_libraries_in_progress checks. From my point of
> view, good goals would include (1) moving as much logic as possible
> into _PG_init() vs. having to put it elsewhere and (2) removing as
> many conditional checks as possible from it and aiming for _PG_init()
> functions that just run from start to finish in all cases.

Agreed.

> What _PG_init() already does pretty well is allow you to do
> per-backend setup. For instance, pg_plan_advice needs no shared
> resources, so _PG_init() was easy to write and, IMHO, easy to read.
> It's requesting diverse types of resources -- GUCs, an EXPLAIN
> extension ID, an EXPLAIN option, and hooks, but it can just do all of
> those things one after another with no conditional logic and, IMHO,
> life is great. We fall down a little bit because of the fact that
> PGC_POSTMASTER GUCs can't be added after startup -- see autoprewarm.c,
> for instance, which calls out that problem implicitly; and ...

Agreed.

> I suspect
> that issue is also why pg_stat_statements has the
> process_shared_preload_libraries_in_progress check at the top, because
> it looks to me like everything else that the function does would be
> completely fine to do later.

I think a bigger problem is loading and saving the statistics file. The 
file needs to be saved on postmaster exit, where do you do that if the 
library was not in shared_preload_libraries?

> Shared resources do require some split-up of initialization: as you
> point out, if _PG_init() is called before we know MaxBackends, then we
> can't even size data structures who size depends on that quantity yet,
> and we certainly can't initialize anything, because shared memory
> might not have been created yet. I don't think we can completely avoid
> the need for callbacks here, but... just spitballing, how about
> something like this:
> 
> extern void DefineShmemRegion(char *name, size_t size, void
> **localptr, void (*init_callback)(void *), int flags);
> extern void DefineShmemRegionDynamic(char *name, size_t
> (*sizing_callback)(void *), void **localptr, void
> (*init_callback)(void *), int flags);
> extern void *GetShmemRegion(char *name);
> 
> #define DSR_FLAGS_NO_SLOP  0x01
> #define DSR_FLAGS_DSA_OK 0x02
> 
> If DefineShmemRegion() or DefineShmemRegionDynamic() is called at
> shared_preload_libraries time, it arranges to increase the size of the
> main shared memory segment by the given amount, or the computed amount
> (for things that depend on MaxBackends). Then, once the main shared
> memory segment is created, it invokes the init_callback and sets
> *localptr. If either of these functions are called after the main
> shared memory segment has been created, they check for an existing
> allocation, and if one is found, they just set *localptr. (They can
> actually probably exit quickly if *localptr is already set.)

If you squint a little, this is pretty much the same as my descriptor 
design. Let's start from your DefineShmemRegion function, but in order 
to have some flexibility to add optional optional in the future, without 
having to create DefineShmemRegionNew(), DefineShmemRegionExt() or 
similar functions, let's pass the arguments in a struct. So instead of:

DefineShmemRegion("pg_stat_statements", sizeof(pgssSharedState), &pgss, 
&pgss,pgss_shmem_init, 0);

you would call it like this:

DefineShmemRegion(&(ShmemStructDesc) {
     .name = "pg_stat_statements",
     .size = sizeof(pgssSharedState),
     .ptr = (void **) &pgss,
     .init_fn = pgss_shmem_init,
     .flags = 0,
});

This flexibility will come handy as soon as we add the ability to resize.

Now if you rename DefineShmemRegion() to ShmemRegisterStruct(), this is 
equivalent to my descriptor design. (One detail is whether you require 
the descriptor struct to stay around after the call, or if 
DefineShmemRegion/ShmemRegisterStruct will copy all the information)

> Otherwise, they try to allocate from DSA if DSR_FLAGS_DSA_OK is given,
> or else from the slop space unless DSR_FLAGS_NO_SLOP is given.

Ok, falling back to DSA is a new idea. You could implement that in 
either design.

> If that
> works, they then call the init_callback under a suitable lock and set
> *localptr. If not, they fail silently. Functions that need to use the
> local pointers do something like this:
> 
> if (unlikely(pgss == NULL))
>      pgss = GetShmemRegion("pg_stat_statements");
> 
> ...which throws a suitable error -- not just a generic one that the
> region doesn't exist, but something that's sensitive to different
> failure conditions: the region was never registered, the region was
> registered after shared_preload_libraries time and not enough slop
> space remains, or whatever. GetShmemRegion() could even retry the
> initialization in certain cases, e.g. if DSA is OK and we previously
> were called too early in startup to try DSA, we can try now, or if DSA
> allocation failed due to OOM, we can try again.

Ok, we could add GetShmemRegion() in either design. Do we have any place 
where we'd use that though, instead the backend-private pointer global 
variable? I can't think of any examples where we currently call 
ShmemInitStruct() to get a pointer "on demand" like that.

In pg_stat_statements, this would replace these tests:

     if (!pgss || !pgss_hash)
         ereport(ERROR,
                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                  errmsg("pg_stat_statements must be loaded via 
\"shared_preload_libraries\"")));

But I don't think pg_stat_statements could still allocate the region 
after postmaster startup. GetShmemRegion() would just be a different way 
of throwing that error.

> I *think* this design gets rid of all the IsUnderPostmaster and
> shared_preload_libraries_in_progress checks in individual subsystems,
> and all the use of shmem startup hooks. You just ask for what you want
> and if there's a way to get it, the system gives it to you, and if
> there's not, it generates an error at the latest possible time, and
> also tries to self-heal if that's reasonable. If you do load your
> module in shared_preload_libraries, then by the time main shared
> memory initialization completes in the postmaster, everyone's localptr
> values (like pgss) will be initialized, but if it happens to be an
> EXEC_BACKEND build, those same calls will also happen in every
> postmaster child, and will automatically re-find the shared memory
> areas and reinitialize all the pointers. If you load your module
> later, your localptr values will hopefully be initialized by the end
> of _PG_init(), but if that doesn't work out, then the
> unlikely-protected calls to GetShmemRegion will produce suitable
> errors at a suitable time. And I think it all works out nicely in a
> standalone backend, too.
> 
> This is all kind of a brain dump and is not fully thought-through and
> might be riddled with cognitive errors, but what I'm sort of trying to
> convey is where I think the complexity in the current system comes
> from (which is that we require every subsystem/extension author to
> know how the sausage is made, and we don't enforce consistency between
> requests and allocations) and what I don't really like about
> descriptors as a solution (which is that they are harder to read than
> imperative code and can interfere with cases where somebody wants to
> do something slightly different than what the descriptor-designer had
> in mind). I hope that some of it is helpful to you...

Thanks, this hasn't changed my opinions, but I really appreciate 
pressure-testing the design. I don't want to rewrite this again in a 
year, because we didn't get it quite right.

- Heikki




pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: POC: Parallel processing of indexes in autovacuum
Next
From: Dagfinn Ilmari Mannsåker
Date:
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions