> On Tue, Nov 19, 2024 at 01:57:00PM GMT, Peter Eisentraut wrote:
> On 18.10.24 21:21, Dmitry Dolgov wrote:
> > v1-0001-Allow-to-use-multiple-shared-memory-mappings.patch
> >
> > Preparation, introduces the possibility to work with many shmem mappings. To
> > make it less invasive, I've duplicated the shmem API to extend it with the
> > shmem_slot argument, while redirecting the original API to it. There are
> > probably better ways of doing that, I'm open for suggestions.
>
> After studying this a bit, I tend to think you should just change the
> existing APIs in place. So for example,
>
> void *ShmemAlloc(Size size);
>
> becomes
>
> void *ShmemAlloc(int shmem_slot, Size size);
>
> There aren't that many callers, and all these duplicated interfaces almost
> add more new code than they save.
>
> It might be worth making exceptions for interfaces that are likely to be
> used by extensions. For example, I see pg_stat_statements using
> ShmemInitStruct() and ShmemInitHash(). But that seems to be it. Are there
> any other examples out there? Maybe there are many more that I don't see
> right now. But at least for the initialization functions, it doesn't seem
> worth it to preserve the existing interfaces exactly.
>
> In any case, I think the slot number should be the first argument. This
> matches how MemoryContextAlloc() or also talloc() work.
Yeah, agree. I'll reshape this part, thanks.
> (Now here is an idea: Could these just be memory contexts? Instead of
> making six shared memory slots, could you make six memory contexts with a
> special shared memory type. And ShmemAlloc becomes the allocation function,
> etc.?)
Sound interesting. I don't know how good the memory context interface
would fit here, but I'll do some investigation.
> I noticed the existing code made inconsistent use of PGShmemHeader * vs.
> void *, which also bled into your patch. I made the attached little patch
> to clean that up a bit.
Right, it was bothering me the whole time, but not strong enough to make
me fix this in the PoC just yet.
> I suggest splitting the struct ShmemSegment into one struct for the three
> memory addresses and a separate array just for the slock_t's. The former
> struct can then stay private in storage/ipc/shmem.c, only the locks need to
> be exported.
>
> Maybe rename ANON_MAPPINGS to something like NUM_ANON_MAPPINGS.
>
> Also, maybe some of this should be declared in storage/shmem.h rather than
> in storage/pg_shmem.h. We have the existing ShmemLock in there, so it would
> be a bit confusing to have the per-segment locks elsewhere.
>
> [...]
>
> I'm wondering about this change:
>
> -#define PG_MMAP_FLAGS (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)
> +#define PG_MMAP_FLAGS (MAP_SHARED|MAP_HASSEMAPHORE)
>
> It looks like this would affect all mmap() calls, not only the one you're
> changing. But that's the only one that uses this macro! I don't understand
> why we need this; I don't see anything in the commit log about this ever
> being used for any portability. I think we should just get rid of it and
> have mmap() use the right flags directly.
>
> I see that FreeBSD has a memfd_create() function. Might be worth a try.
> Obviously, this whole thing needs a configure test for memfd_create()
> anyway.
Yep, those points make sense to me.
> > v1-0005-Use-anonymous-files-to-back-shared-memory-segment.patch
> >
> > Allows an anonyous file to back a shared mapping. This makes certain things
> > easier, e.g. mappings visual representation, and gives an fd for possible
> > future customizations.
>
> I think this could be a useful patch just by itself, without the rest of the
> series, because of
>
> > * By default, Linux will not add file-backed shared mappings into a
> > core dump, making it more convenient to work with them in PostgreSQL:
> > no more huge dumps to process.
>
> This could be significant operational benefit.
>
> When you say "by default", is this adjustable? Does someone actually want
> the whole shared memory in their core file? (If it's adjustable, is it also
> adjustable for anonymous mappings?)
Yes, there is /proc/<pid>/coredump_filter [1], that allows to specify
what to include. One can ask to exclude anon, file-backed and hugetlb
shared memory, with the only caveat that it's per process. I guess
normally no one wants to have a full shared memory in the coredump, but
there could be exceptions.
> I see that memfd_create() has a MFD_HUGETLB flag. It's not very clear how
> that interacts with the MAP_HUGETLB flag for mmap(). Do you need to specify
> both of them if you want huge pages?
Correct, both (one flag in memfd_create and one for mmap) are needed to
use huge pages.
[1]:
https://www.kernel.org/doc/html/latest/filesystems/proc.html#proc-pid-coredump-filter-core-dump-filtering-settings