Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
| From | Jakub Wartak |
|---|---|
| Subject | Re: Changing shared_buffers without restart |
| Date | |
| Msg-id | CAKZiRmx-ycn+TT3_n97K40aNf4Ug0V5ywi3wu9p7fFwkWO+udg@mail.gmail.com |
| In response to | Re: Changing shared_buffers without restart (Andres Freund <andres@anarazel.de>) |
| List | pgsql-hackers |
On Mon, Feb 9, 2026 at 3:29 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2026-02-09 14:41:12 +0100, Jakub Wartak wrote:
> > I've thought that the potential main reason of the hit would be slow fork(),
> > so I had an idea why we fork() with majority of memory being shared_buffers
> > (BufferBlocks) that is not really used inside postmaster itself
> > (I mean it does not use it, only backends do use it). I've thought it could
> > be cool if we could just init the memory, leave just the fd from memfd_create
> > for s_b around (that is unmap() BufferBlocks from the postmaster thus lowering
> > its RSS/smaps footprint) and then on fork() the fork() would NOT have to copy
> > that big kernel VMA for shared_buffers. Instead (in theory - only the fd that
> > is the reference - thereby we could increase the scalability of the postmaster
> > (kernel would need to perform less work during fork()). Later on, the classic
> > backends on their side would mmap() the region back from the fd created earlier
> > (in postmaster) using memfd_create(2), but that would happen as part of many
> > backends (so workload would be spread across many CPUs).
>
> FWIW, when looking at this in the past there were two noteworthy things:
>
> 1) The main driver of slowness was *NOT* shared buffers, but all the libraries
>    we link to. Particularly openssl makes things a *lot* slower, due to all
>    the small mappings it creates. If you compare the fork speed of a postgres
>    with minimal dependencies and one with all the dependencies, you'll see a
>    huge difference.
>
>    The reason that openssl is so bad is that it modifies data in all the
>    copy-on-write mappings during process exit processing. See [1].
>
> 2) A lot of the slowness isn't actually from the fork overhead itself, but
>    from fork competing with the processing during process exit, as both take
>    conflicting locks.

Interesting, thanks for sharing this.
I've studied fork() itself a little bit more (fork() vs various factors, without crazy exit() handlers). See the attached results from 2 machines, or just run the attached fork_bench C program. My conclusions on 6.14.x are as follows (these are mostly notes for myself from studying this, but I think I'll share them; maybe just one variable is missing here: how fork() is affected by NUMA - a future TODO for me):

MAP_SHARED (findings for this $thread)
--------------------------------------
a) In the "mmap-MAP_SHARED" cases, the max number of fork()/s drops, but only very slightly, as the number of (still only MAP_SHARED!) segments increases. This applies both with huge pages and without them. memfd_normal seems to behave in an almost identical way, so at least from that angle the patch seems to be OK (assuming it has just two segments today; yesterday it had 6 for me ;))

b) My wild trick/assumption - not related to $thread - under "memfd_unmap" that I posted earlier, assuming it would double postmaster scalability, has well and truly fizzled: unmapping segments before fork()ing and keeping just the memfd around to restore that MAP_SHARED mapping in the child for some reason degrades performance compared to just letting the mappings persist or using MADV_DONTNEED. Probably it's the page faulting, as you say; I haven't measured that. RIP idea.

MAP_PRIVATE (this can be ignored for the purposes of this $thread)
------------------------------------------------------------------
Nevertheless it is quite interesting to see how those two modes compare, and it touches on the openssl aspect and e.g. io_uring creating many VMAs too [1].

c) MAP_PRIVATE seems to be way slower because fork() must copy PTEs and mark them as CoW. Performance drops as the total memory (number of pages) increases. We should not have big MAP_ANONYMOUS|MAP_PRIVATE segments (or even just many segments [1]) in the postmaster if we want fast fork(). But even then, having a lot of MAP_PRIVATE memory (in some edge case? a large heap?) really benefits from huge pages there.

My takeaway from this - unrelated to this $thread, but still an interesting finding for the future: once we have multithreading, we might not be able to fork() efficiently from there (or there will be a big impact from MAP_PRIVATE/a big heap shared by all threads). It will clearly depend on the architecture: but if the postmaster is removed and one giant PID has multiple TIDs, and somebody wants to run COPY TO/FROM PROGRAM often from there, we are screwed unless those segments are MADV_DONTFORK.

> I seriously doubt it's a good idea to delay the mmapping until after the fork,
> that'll just lead to more different mappings to exist that then all need to be
> tracked separately by the kernel.

Right, the raw numbers are not showing this to be a good idea.

-J.

[1] - https://www.postgresql.org/message-id/7bduf2aqh6ygz7qugmb65ohczozeed36oscviebhjcvussjqt4%405fcoh7427txo