Re: Better shared data structure management and resizable shared data structures - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Better shared data structure management and resizable shared data structures
Msg-id b2451225-0a10-4ded-abef-4e9efbde7a8e@iki.fi
In response to Re: Better shared data structure management and resizable shared data structures  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
Thanks, I'll incorporate your comments in the next version. Replying to 
just a few of them here:

On 27/03/2026 09:01, Ashutosh Bapat wrote:
> /* Restore basic shared memory pointers */
> if (UsedShmemSegAddr != NULL)
> + {
> InitShmemAllocator(UsedShmemSegAddr);
> + ShmemCallRequestCallbacks();
> 
> It's not clear how we keep the list of registered callbacks in sync
> across the backends and across restarts. How do we make sure that the
> callbacks registered at this time are the same callbacks registered
> before creating the shared memory? How do we make sure that the
> callbacks registered after the startup are also registered after
> restart?

On Unix systems, the registered callbacks are inherited by fork(), and 
also survive a crash restart. With EXEC_BACKEND, the assumption is that 
calling a library's _PG_init() function will register the same callbacks 
every time. We make the same assumption today with the shmem_startup 
hook.

> +void
> +RegisterShmemCallbacks(const ShmemCallbacks *callbacks)
> ... snip ...
> + foreach(lc, requested_shmem_areas)
> 
> Doesn't this list contain all the areas, not just those registered in
> this instance of the call? Does that mean that we need to make all the
> attach functions idempotent? Why can't we deal with only the newly
> registered areas?

registered_shmem_areas is supposed to be empty when the function is 
entered. There's an assertion for that too before the foreach().

However, it's missing this, after processing the list:

         list_free_deep(requested_shmem_areas);
         requested_shmem_areas = NIL;

Because of that, this will fail if you load multiple extensions that 
call RegisterShmemCallbacks() in the same session. Will fix that.

> + /*
> + * Extra space to reserve in the shared memory segment, but it's not part
> + * of the struct itself. This is used for shared memory hash tables that
> + * can grow beyond the initial size when more buckets are allocated.
> + */
> + size_t extra_size;
> 
> When we introduce resizable structures (where even the hash table
> directly itself could be resizable), we will introduce a new field
> max_size which is easy to get confused with extra_size. Maybe we can
> rename extra_size to something like "auxiliary_size" to mean the size
> of the auxiliary parts of the structure which are not part of the main
> struct itself.
> 
> + /*
> + * max_size is the estimated maximum number of hashtable entries. This is
> + * not a hard limit, but the access efficiency will degrade if it is
> + * exceeded substantially (since it's used to compute directory size and
> + * the hash table buckets will get overfull).
> + */
> + size_t max_size;
> +
> + /*
> + * init_size is the number of hashtable entries to preallocate. For a
> + * table whose maximum size is certain, this should be equal to max_size;
> + * that ensures that no run-time out-of-shared-memory failures can occur.
> + */
> + size_t init_size;
> 
> Every time I look at these two fields, I question whether they are the
> number of entries (i.e. the size of the hash table) or the number of
> bytes (the size of the memory). I know it's the former, but it indicates that
> something needs to be changed here, like changing the names to have
> _entries instead of _size, or changing the type to int64 or some such.
> Renaming to _entries would conflict with dynahash APIs since they use
> _size, so maybe the latter?

Agreed.

- Heikki


