Re: Better shared data structure management and resizable shared data structures - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Better shared data structure management and resizable shared data structures
Date
Msg-id 8a6799be-bd42-49fb-8914-856c97bb1977@iki.fi
Whole thread Raw
In response to Re: Better shared data structure management and resizable shared data structures  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Better shared data structure management and resizable shared data structures
List pgsql-hackers
On 18/03/2026 21:30, Robert Haas wrote:
> On Fri, Mar 13, 2026 at 7:41 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Yeah. IMHO the existing shmem_request/startup_hook mechanism is pretty
>> awkward too, and in most cases, the new mechanism is more convenient. It
>> might be slightly less convenient for some things, but mostly it's
>> better. Would you agree with that, or do you actually like the old hooks
>> and ShmemInitStruct() better?
> 
> I'd say it's not massively different one way or the other. Looking at
> the pg_stat_statements changes in particular, I feel like in v2, it
> was actually worse, because you didn't get manage to get rid of
> pgss_shmem_request(), but it was no longer the thing requesting shmem.
> v3 is better in that regard: you get rid of some complexity in
> exchange for what you add. It's not amazing, though:
> pgss_shmem_startup() now has nothing to do with the name of the hook,
> which is not this patch's fault but also isn't solved by it. I wonder
> why in the world somebody decided to jam all of this non-shmem-related
> logic into this function, and why they didn't add a comment explaining
> why it was here and not someplace else. One of the worst things about
> this area is that I often end up having to trace through a bunch of
> postmaster startup logic to figure out whether any given bit of code
> is actually in the right place, and that makes me feel like the hooks
> are badly designed. pgss_shmem_startup() is a good example of that,
> and the fact that it needs an IsUnderPostmaster check is another.

Hmm, I assumed it was important that the pg_stat_statements file is 
loaded later in the startup sequence, and that's why it was in 
pgss_shmem_startup(). But now that I look at it, I don't think there was 
any grand plan, shmem_startup_hook was just the only easy way to get 
control during postmaster startup, after shmem initialization.

So I think we can move that code to the init_fn callback with the new 
API, and that gets rid of shmem_startup_hook in pg_stat_statements. See 
attached (v6-0005-move-pgss-shmem_startup-hook-code-into-the-new-in.patch).

> We should somehow try to make it clear what happens with this new
> mechanism if a module is loaded via shared_preload_libraries vs. if
> it's loaded via LOAD or session_preload_libraries or whatever. Writing
> modules that don't require shared_preload_libraries is a boon to
> users, because they can be added to a production system without a
> server restart. I wonder whether this new facility handles that case
> better, worse, or the same as existing facilities.

Pretty much the same I think. Point taken that it could be documented 
better.

The old documentation for ShmemInitStruct() assumed that it would be 
used from shared_preload_libraries, it didn't. With the new API, I tried 
to document how it behaves when used outside shared_preload_libraries, 
but still focused on how using it from shared_preload_libraries. I'm not 
sure it helped. Perhaps the after-startup behavior should be put in a 
separate section.

>> One such wrinkle with ShmemRegisterStruct() in the patch now is that
>> it's harder to do initialization that touches multiple structs or hash
>> tables. Currently the callbacks are called in the same order that the
>> structs are registered, so you can do all the initialization in the last
>> struct's callback. The single pair of shmem_request/startup_hooks per
>> module was more clear in that aspect. Fortunately, that kind of
>> cross-struct dependencies are pretty rare. So I think it's fine. (The
>> order that the callbacks are called needs be documented explicitly though).
>>
>> If we want to improve on that, one idea would be to introduce a
>> ShmemRegisterCallbacks() function to register callbacks that are not
>> tied to any particular struct and are called after all the per-struct
>> callbacks.
> 
> I think the whole idea of ShmemInitHash() and ShmemInitStruct() is
> relatively poorly designed in this regard. Ideally, I want to
> initialize all the shared memory that belongs to me, and that might
> include arbitrary data structures, but the current design kind of
> thinks that I want one struct or one hash table and nothing else. If
> we're redesigning this mechanism, it would sure be nice to improve on
> that.
> 
> Looking just at the pg_stat_statements changes, I think my overall
> view on this right now is that it's not terrible, but I'm also not
> that happy about introducing yet another way to do it for this amount
> of gain. To me, it doesn't yet rise to the level of a clear win.

So here's another idea (not yet implemented in the attached patch): 
instead of thinking in terms of individual shmem structs and hashes, 
let's introduce a concept of a "subsystem" that ties them together:

static pgss_subsystem_desc = {
     .name = "pg_stat_statements",
     .shmem_structs = {
         {
             .ptr = (void **) &pgss,
             .size = sizeof(pgssSharedState),
         },
     },
     .shmem_hashes = {
         {
             .ptr = &pgss_hash,

             .init_size = 0,          /* set from 'pgss_max' */
             .max_size = 0,           /* set from 'pgss_max' */
             .hash_info.keysize = sizeof(pgssHashKey),
             .hash_info.entrysize = sizeof(pgssEntry),
             .hash_flags = HASH_ELEM | HASH_BLOBS,
         },
     },

     /* called after the shmem structs and hashes have been allocated */
     .init_fn = pgss_shmem_init,
}

void
_PG_init(void)
{
     ...
     RegisterSubsystem(&pgss_subsystem_desc);
}

We could add more callbacks that get called at different times. For 
example the callback that would get called before shared memory is 
allocated, which could adjust the size according to MaxBackends. That 
would fully replace shmem_request_hook. Or a callback that would get 
called later in the startup sequence, if we wanted to e.g. load the 
pg_stat_statements file later in startup.

This would be a natural place for other resources in future too. We 
could support declaring "named lwlock tranches" here to replace 
RequestNamedLWLockTranche() for example, although I think it's still 
better to encourage embedding the LWLock in the struct instead.

_PG_init in pg_stat_statements still does a lot more than register that 
struct. It declares the GUCs and installs other hooks for example. We 
could perhaps move those to the subsystem descriptor too, although I'm 
not sure if that's worth the code churn.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Amul Sul
Date:
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Next
From: Daniel Gustafsson
Date:
Subject: Re: Serverside SNI support in libpq