Re: Changing shared_buffers without restart - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Changing shared_buffers without restart
Msg-id scor5gscd42d4nwszuwvtwss6e22fg3dnvxmqwrcsdkpyyigny@efjlkj6ccv7u
In response to Re: Changing shared_buffers without restart  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Changing shared_buffers without restart
Re: Changing shared_buffers without restart
List pgsql-hackers
> On Wed, Nov 27, 2024 at 04:05:47PM GMT, Robert Haas wrote:
> On Wed, Nov 27, 2024 at 3:48 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > My understanding is that clashing of mappings (either at creation time
> > or when resizing) could happen only within the process address space,
> > and the assumption is that by the time we prepare the mapping layout all
> > the rest of mappings for the process are already done.
>
> I don't think that's correct at all. First, the user could type LOAD
> 'whatever' at any time. But second, even if they don't or you prohibit
> them from doing so, the process could allocate memory for any of a
> million different things, and that could require mapping a new region
> of memory, and the OS could choose to place that just after an
> existing mapping, or at least close enough that we can't expand the
> object size as much as desired.
>
> If we had an upper bound on the size of shared_buffers and could
> reserve that amount of address space at startup time but only actually
> map a portion of it, then we could later remap and expand into the
> reserved space. Without that, I think there's absolutely no guarantee
> that the amount of address space that we need is available when we
> want to extend a mapping.
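
For illustration, the reserve-at-startup approach described above can be sketched roughly as follows on Linux. This is a minimal sketch, not code from the patch: reserve_region() and commit_region() are hypothetical helpers, and the shared memory is assumed to come from a memfd (as the /memfd: names in the layout below suggest). The whole upper bound is reserved as inaccessible PROT_NONE address space, and a prefix of it is then backed by the shared file with MAP_FIXED, so growing never moves the base address.

```c
#define _GNU_SOURCE /* for memfd_create() in <sys/mman.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve max_size bytes of address space up front.
 * PROT_NONE + MAP_NORESERVE means nothing is accessible or committed yet;
 * the range only keeps other mappings from landing there.
 */
static void *reserve_region(size_t max_size)
{
    void *p = mmap(NULL, max_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: back the first size bytes of the reservation with
 * the shared file fd, growing the file first. MAP_FIXED replaces the
 * reserved PROT_NONE pages in place, so the base address stays the same
 * and existing contents of the file remain visible.
 */
static int commit_region(void *base, int fd, size_t size)
{
    if (ftruncate(fd, (off_t) size) != 0)
        return -1;
    void *p = mmap(base, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

Growing is just another commit_region() call with a larger size against the same base; shrinking would additionally need to turn the tail back into PROT_NONE reservation, which the sketch leaves out.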

I've just done a couple of experiments, and I think this could be addressed by
careful placement of mappings as well, based on two assumptions: for a new
mapping the kernel always picks the lowest address that leaves enough space,
and the maximum amount of allocatable memory for other mappings can be derived
from the total available memory. With that in mind, the shared mapping layout
has to have a large gap at the start, between the lowest address and the
shared mappings used for buffers and the rest: the gap where all the other
mappings (allocations, libraries, madvise, etc.) will land. It's similar to
the address space reservation you mentioned above, should significantly reduce
the possibility of clashing, and looks something like this:

    01339000-0139e000 [heap]
    0139e000-014aa000 [heap]
    7f2dd72f6000-7f2dfbc9c000 /memfd:strategy (deleted)
    7f2e0209c000-7f2e269b0000 /memfd:checkpoint (deleted)
    7f2e2cdb0000-7f2e516b4000 /memfd:iocv (deleted)
    7f2e57ab4000-7f2e7c478000 /memfd:descriptors (deleted)
    7f2ebc478000-7f2ee8d3c000 /memfd:buffers (deleted)
    ^ note the distance between two mappings,
      which is intended for resize
    7f3168d3c000-7f318d600000 /memfd:main (deleted)
    ^ here is where the gap starts
    7f4194c00000-7f4194e7d000
    ^ this one is an anonymous mapping created due to a large
      memory allocation after shared mappings were created
    7f4195000000-7f419527d000
    7f41952dc000-7f4195416000
    7f4195416000-7f4195600000 /dev/shm/PostgreSQL.2529797530
    7f4195600000-7f41a311d000 /usr/lib/locale/locale-archive
    7f41a317f000-7f41a3200000
    7f41a3200000-7f41a3201000 /usr/lib64/libicudata.so.74.2
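
The placement in the listing can be expressed as a small address computation. The sketch below is hypothetical (Segment and plan_layout() are not from the patch): starting below the region already used by other mappings, it first skips a gap reserved for future non-shared mappings, then places each shared segment downward, leaving headroom above each one so it can grow upward without moving, as with /memfd:buffers below /memfd:main.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one shared segment. */
typedef struct {
    const char *name;
    size_t size;      /* current size in bytes */
    size_t headroom;  /* free space kept above the segment for resizing */
    uintptr_t start;  /* filled in by plan_layout() */
} Segment;

/*
 * Place segments downward from (top - gap). Segment i occupies
 * [start, start + size) and the next headroom bytes stay unused.
 * align must be a power of two (e.g. the page size).
 * Returns the lowest address used, or 0 if the layout does not fit.
 */
static uintptr_t plan_layout(uintptr_t top, size_t gap,
                             Segment *segs, size_t n, size_t align)
{
    if (top < gap)
        return 0;
    uintptr_t cursor = top - gap;
    for (size_t i = 0; i < n; i++) {
        size_t span = segs[i].size + segs[i].headroom;
        if (cursor < span)
            return 0;
        /* align the start down; any slack just adds to the headroom */
        uintptr_t start = (cursor - span) & ~(uintptr_t) (align - 1);
        segs[i].start = start;
        cursor = start;
    }
    return cursor;
}
```

Actually mapping the segments at the computed addresses could then use mmap() with MAP_FIXED_NOREPLACE (Linux 4.17+), which fails with EEXIST instead of silently replacing whatever already occupies the range.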

The assumption about picking the lowest address is just how things work right
now on Linux; this fact is already used in the patch. The idea that we could
put an upper bound on the size of other mappings based on total available
memory comes from the fact that anonymous mappings much larger than memory
will fail without overcommit. With overcommit it's a different story, but if
allocations are hitting that limit, I can imagine there are bigger problems
than shared buffer resizing.
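
A rough sketch of deriving the gap size from total memory, under the assumption stated above that other mappings cannot exceed it. Both helpers are hypothetical. The estimate uses physical memory from sysconf(), and the overcommit policy is read from /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict); in mode 1 the bound does not hold at all, which is the case the paragraph above sets aside.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical: use total physical memory as the size of the gap left
 * for non-shared mappings. Returns 0 if the values are unavailable.
 */
static uint64_t estimate_gap_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (uint64_t) pages * (uint64_t) page_size;
}

/*
 * Returns the Linux overcommit mode (0, 1 or 2), or -1 if it cannot
 * be read (e.g. /proc is not mounted or this is not Linux).
 */
static int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

A caller would only trust estimate_gap_bytes() when read_overcommit_mode() returns something other than 1, and would probably add a safety margin on top.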

This approach follows the same ideas already used in the patch, and has the
same trade-offs: no address changes, but questions about portability.
