Thread: Re: Changing shared_buffers without restart

Re: Changing shared_buffers without restart

From
Dmitry Dolgov
Date:
> On Tue, Nov 19, 2024 at 01:57:00PM GMT, Peter Eisentraut wrote:
> On 18.10.24 21:21, Dmitry Dolgov wrote:
> > v1-0001-Allow-to-use-multiple-shared-memory-mappings.patch
> >
> > Preparation, introduces the possibility to work with many shmem mappings. To
> > make it less invasive, I've duplicated the shmem API to extend it with the
> > shmem_slot argument, while redirecting the original API to it. There are
> > probably better ways of doing that, I'm open for suggestions.
>
> After studying this a bit, I tend to think you should just change the
> existing APIs in place.  So for example,
>
> void *ShmemAlloc(Size size);
>
> becomes
>
> void *ShmemAlloc(int shmem_slot, Size size);
>
> There aren't that many callers, and all these duplicated interfaces almost
> add more new code than they save.
>
> It might be worth making exceptions for interfaces that are likely to be
> used by extensions.  For example, I see pg_stat_statements using
> ShmemInitStruct() and ShmemInitHash().  But that seems to be it.  Are there
> any other examples out there?  Maybe there are many more that I don't see
> right now.  But at least for the initialization functions, it doesn't seem
> worth it to preserve the existing interfaces exactly.
>
> In any case, I think the slot number should be the first argument.  This
> matches how MemoryContextAlloc() or also talloc() work.

Yeah, agree. I'll reshape this part, thanks.

> (Now here is an idea:  Could these just be memory contexts?  Instead of
> making six shared memory slots, could you make six memory contexts with a
> special shared memory type.  And ShmemAlloc becomes the allocation function,
> etc.?)

Sound interesting. I don't know how good the memory context interface
would fit here, but I'll do some investigation.

> I noticed the existing code made inconsistent use of PGShmemHeader * vs.
> void *, which also bled into your patch.  I made the attached little patch
> to clean that up a bit.

Right, it was bothering me the whole time, but not strong enough to make
me fix this in the PoC just yet.

> I suggest splitting the struct ShmemSegment into one struct for the three
> memory addresses and a separate array just for the slock_t's.  The former
> struct can then stay private in storage/ipc/shmem.c, only the locks need to
> be exported.
>
> Maybe rename ANON_MAPPINGS to something like NUM_ANON_MAPPINGS.
>
> Also, maybe some of this should be declared in storage/shmem.h rather than
> in storage/pg_shmem.h.  We have the existing ShmemLock in there, so it would
> be a bit confusing to have the per-segment locks elsewhere.
>
> [...]
>
> I'm wondering about this change:
>
> -#define PG_MMAP_FLAGS (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)
> +#define PG_MMAP_FLAGS                  (MAP_SHARED|MAP_HASSEMAPHORE)
>
> It looks like this would affect all mmap() calls, not only the one you're
> changing.  But that's the only one that uses this macro!  I don't understand
> why we need this; I don't see anything in the commit log about this ever
> being used for any portability.  I think we should just get rid of it and
> have mmap() use the right flags directly.
>
> I see that FreeBSD has a memfd_create() function.  Might be worth a try.
> Obviously, this whole thing needs a configure test for memfd_create()
> anyway.

Yep, those points make sense to me.

> > v1-0005-Use-anonymous-files-to-back-shared-memory-segment.patch
> >
> > Allows an anonyous file to back a shared mapping. This makes certain things
> > easier, e.g. mappings visual representation, and gives an fd for possible
> > future customizations.
>
> I think this could be a useful patch just by itself, without the rest of the
> series, because of
>
> > * By default, Linux will not add file-backed shared mappings into a
> > core dump, making it more convenient to work with them in PostgreSQL:
> > no more huge dumps to process.
>
> This could be significant operational benefit.
>
> When you say "by default", is this adjustable?  Does someone actually want
> the whole shared memory in their core file?  (If it's adjustable, is it also
> adjustable for anonymous mappings?)

Yes, there is /proc/<pid>/coredump_filter [1], that allows to specify
what to include. One can ask to exclude anon, file-backed and hugetlb
shared memory, with the only caveat that it's per process. I guess
normally no one wants to have a full shared memory in the coredump, but
there could be exceptions.

> I see that memfd_create() has a MFD_HUGETLB flag.  It's not very clear how
> that interacts with the MAP_HUGETLB flag for mmap().  Do you need to specify
> both of them if you want huge pages?

Correct, both (one flag in memfd_create and one for mmap) are needed to
use huge pages.

[1]:
https://www.kernel.org/doc/html/latest/filesystems/proc.html#proc-pid-coredump-filter-core-dump-filtering-settings



Re: Changing shared_buffers without restart

From
Peter Eisentraut
Date:
On 19.11.24 14:29, Dmitry Dolgov wrote:
>> I see that memfd_create() has a MFD_HUGETLB flag.  It's not very clear how
>> that interacts with the MAP_HUGETLB flag for mmap().  Do you need to specify
>> both of them if you want huge pages?
> Correct, both (one flag in memfd_create and one for mmap) are needed to
> use huge pages.

I was worried because the FreeBSD man page says

MFD_HUGETLB      This flag is currently unsupported.

It looks like FreeBSD doesn't have MAP_HUGETLB, so maybe this is irrelevant.

But you should make sure in your patch that the right set of flags for 
huge pages is passed.