Re: Better shared data structure management and resizable shared data structures - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Better shared data structure management and resizable shared data structures
Date
Msg-id 41137f2a-0cb0-4cbe-9afc-2dc690f60a87@iki.fi
Whole thread Raw
In response to Re: Better shared data structure management and resizable shared data structures  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Better shared data structure management and resizable shared data structures
List pgsql-hackers
On 16/03/2026 12:28, Ashutosh Bapat wrote:
> 0001
> 
> @@ -3236,6 +3239,8 @@ PostmasterStateMachine(void)
> LocalProcessControlFile(true);
> /* re-create shared memory and semaphores */
> + ResetShmemAllocator();
> 
> This name is misleading. The function does not touch ShmemAllocator at
> all. Instead it resets the ShmemStruct registry. I suggest
> ResetShememRegistry() instead.

Hmm. Come to think of it, this isn't quite right for 
shared_preload_libraries. On crash restart, we don't call _PG_init() for 
shared_preload_libraries. So if ShmemRegisterStruct() is called from 
_PG_init(), it won't get called again after restart, and hence we should 
retain the registration. This is different from ShmemInitStruct(), which 
does get called again after crash restart.

I fixed that by retaining the structs registered with 
ShmemRegisterStruct(), and only clearing out entries made with 
ShmemInitStruct(). I think that's the right thing to do, but I feel a 
little uneasy about keeping stuff across restarts. This isn't a 
completely new issue; even without this patch I feel uneasy about 
keeping the shared memory segment and not calling _PG_init() for 
shared_preload_libraries again. I wish crash restart could reset things 
even more thoroughly.

If we introduced a shmem_register_hook like you suggested, we could call 
that again after restart. But it feels silly; _PG_init() would register 
a shmem_register_hook, which would register the shared memory desc. Why 
not just register the shared memory desc directly? I don't think we 
really gain anything by the extra step. (It might be useful to allow the 
size to be calculated later, taking MaxBackends into account. What I 
mean here is that it would be a silly dance just to handle crash restarts)

> + *
> + * There are two kinds of shared memory data structures: fixed-size structures
> + * and hash tables.
> 
> In future we will have resizable "fixed" structures and we may also
> have resizable hash tables i.e. hash tables whose directory would be
> resizable. The later would be help support resizable shared buffers
> lookup table. It will be good to write the above sentence so that we
> can just add more types of data structures without needing to rewrite
> everything. If we could find a good term for "fixed-size structures"
> which are really "structures that require contiguous memory", we will
> be able to write the above sentence as "There are two kinds of shared
> memory structures: contiguous structures and hash tables.". When we
> add resizable structures, we can just add a sentence "A contiguous
> structure may be fixed size or resizable". When we add resizable hash
> tables, we can just replace that with "Both of these kinds can be
> fixed-size or resizable". I am not sure whether "contiguous
> structures" is a good term though (for one, word contiguous can be
> confused with continuous). Whatever term we use should be something
> that we can carry further in the remaining paragraphs.
> + Fixed-size structures contain things like global
> + * variables for a module and should never be allocated after the shared
> + * memory initialization phase.
> 
> I think the existing comment is not accurate. The term "global
> variables" in the sentence can be confused with process global
> variables. We should be using the term "shared variables" or better
> "shared state". If we adopt "contiguous structures" as the term for
> the first kind of data structure, we can write "Contiguous structures
> contain shared state, maintained in a contiguous chunk of memory, for
> a module. It should never be allocated after the shared memory
> initialization phase.".

I went with "variables shared between all backend processes", and did 
another pass of other copy-editing too. No doubt this needs to be 
updated again when the resizing patch lands.

> - infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size);
> - infoP->alloc = ShmemAllocNoError;
> - hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
> + desc->hash_info.dsize = desc->hash_info.max_dsize =
> hash_select_dirsize(desc->max_size);
> + desc->hash_info.alloc = ShmemAllocNoError;
> + desc->hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
> /* look it up in the shmem index */
> 
> The next several lines of code look up shmem index. Should we remove
> this comments or modify it to say "Register and initialize the hash
> table".

Replaced it with "/* Set up the base struct descriptor */"

> +HTAB *
> +ShmemInitHash(const char *name, /* table string name for shmem index */
> + int64 init_size, /* initial table size */
> + int64 max_size, /* max size of the table */
> + HASHCTL *infoP, /* info about key and bucket size */
> + int hash_flags) /* info about infoP */
> +{
> + ShmemHashDesc *desc;
> +
> 
> ... snip ...
> 
> +
> + ShmemRegisterHash(desc);
> + return *desc->ptr;
> +}
> 
> I like the way these functions are written using the new API. I think
> we should keep these legacy interface at the end of section of shmem
> APIs, rather than keeping those at the end of the file where we have
> monitoring and arithmetic functions. If you want to get rid of the
> legacy APIs in this release itself, I think it's ok to keep them at
> the end of the file.

I don't plan to get rid of the legacy API any time soon, I expect 
existing extensions to continue using it for years to come. So I moved 
them per your suggestion.

> ShmemInitStruct() now calls ShmemRegisterStruct(). Earlier it could be
> called from any backend, in any state to fetch a pointer to a shared
> memory structure. It didn't add a new structure. Now it can add a new
> structure. I am wondering whether that can cause registry in different
> backends to get out of sync. Should we limit the window when it can be
> called just like how shmem_request_hook call is limited. In that sense
> ShmemRegisterStruct() looks something to be called from a
> shmem_register_hook which is also called from EXEC_BACKEND. Sorry to
> expand it here rather in my previous reply. In case we replace all the
> current calls to ShmemInitStruct() with ShmemRegisterStruct(), we may
> be able to get rid of the Shmem Index altogether; after all it's used
> only for fetching the pointers to the shared memory areas in
> EXEC_BACKEND mode.

I think it's still useful to allow ShmemRegisterStruct() after 
postmaster startup, so that you can use it with extensions that are not 
listed in shared_preload_libraries, but need a little bit of shared 
memory. Hmm, perhaps we should add an explicit flag for that case, 
though. So that by default ShmemRegisterStruct() fails if you call it 
after postmaster startup, but you could allow it by setting a flag in 
the descriptor.

> I thought that we could save the registry in the
> shared memory. In EXEC_BACKEND mode, we go over the registry calling
> attach_fn for each entry. But since the binary is overwritten in
> EXEC_BACKEND case, attach/init fns are not guaranteed to have the same
> address in all the backends. Maybe we have to resort to
> launch_backend() to transfer the registry to the backend through the
> file (to keep it in sycn in all the backends): a solution you may not
> like.
Yep, can't have function pointers in shared memory unfortunately.

> + void **ptr;
> +} ShmemStructDesc;
> 
> 
> I think the comments for each member should highlight which of these
> fields are required (non-zero) and which can be optional (zero'ed
> out).

Added some comments on that.

> + */
> + ShmemStructDesc base_desc;
> 
> Once we have calculated the base_desc in ShmemRegisterHash() and
> called ShmemRegisterStruct(), we don't need base_desc anymore. Even
> the pointer to the allocated hash table memory is available through
> *ptr. Probably we could just remove this member from here.
> ShmemRegisterHash() can declare a variable of type ShmemStructDesc,
> populate it based on the members in this structure and pass it to
> ShmemRegisterStruct(). I am not comfortable with specification
> structure being modified by the registration function.

Hmm, the ShmemStructDesc is accessed in ShmemInitRegistered() and in the 
shmem_hash_init() callback, so 'base_desc' cannot be a local variable in 
ShmemRegisterHash(). It could be allocated in TopMemoryContext though. 
That would hide it from ShmemHashDesc.

I'm a little ambivalent about that. It would be nice to hide it if it's 
not intended to be accessed outside shmem.c. Then again, it might be 
useful to peek into it in the future, depending on what fields we add. 
and it might be handy to look at in a debugger.

> @@ -102,15 +102,14 @@ CalculateShmemSize(void)
> size = add_size(size, ShmemRegisteredSize());
> size = add_size(size, dsm_estimate_size());
> 
> We have defined dsm_main_space_shmem_desc, but we still use
> dsm_estimate_size() here and initialize the memory in
> dsm_shmem_init(), which is explicitily called from
> CreateOrAttachShmemStructs(). Why is that? Shouldn't we be registering
> the structure in RegisterShmemStructs(), and let ShmemInitRegistered()
> initialize it? Am I missing something here?

Yes you're right. Fixed that and all the the bugs that Zsolt pointed 
out. Thanks!

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Adding REPACK [concurrently]
Next
From: Alvaro Herrera
Date:
Subject: Re: Adding REPACK [concurrently]