Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
| From | Konstantin Knizhnik |
| --- | --- |
| Subject | Re: Changing shared_buffers without restart |
| Date | |
| Msg-id | 85c07a76-57ca-4be0-9456-170707ced483@garret.ru |
| In response to | Re: Changing shared_buffers without restart (Dmitry Dolgov <9erthalion6@gmail.com>) |
| List | pgsql-hackers |
On 18/04/2025 12:26 am, Dmitry Dolgov wrote:
>> On Thu, Apr 17, 2025 at 02:21:07PM GMT, Konstantin Knizhnik wrote:
>>
>> 1. Performance of the Postgres CLOCK page eviction algorithm depends on the
>> number of shared buffers. My first naive attempt just to mark unused buffers
>> as invalid caused a significant performance degradation.
>
> Thanks for sharing!
>
> Right, but it concerns the case when the number of shared buffers is
> high, independently from whether it was changed online or with a
> restart, correct? In that case it's out of scope for this patch.
>
>> 2. There are several data structures in Postgres whose size depends on the
>> number of buffers.
>> In my patch I used a dynamic shared buffer size in some cases, but if such a
>> structure has to be allocated in shared memory, the maximal size still has
>> to be used. We have the buffers themselves (8 kB per buffer), then the main
>> BufferDescriptors array (64 B), the BufferIOCVArray (16 B), checkpoint's
>> CkptBufferIds (20 B), and the hashmap on the buffer cache (24 B + 8 B/entry).
>> 128 bytes per 8 kB buffer does not seem too large an overhead (~1%), but it
>> may be quite noticeable with size differences larger than 2 orders of
>> magnitude: e.g. to support scaling from 0.5 GB to 128 GB, with 128
>> bytes/buffer we'd have ~2 GiB of static overhead on only 0.5 GiB of actual
>> buffers.
>
> Not sure what do you mean by using a maximal size, can you elaborate.
>
> In the current patch those structures are allocated as before, except
> each goes into a separate segment -- without any extra memory overhead
> as far as I see.

Thank you for the explanation. I am sorry that I did not look at your patch
closely enough before writing: I assumed that you were placing only the
content of shared buffers in a separate segment. Now I see that I was wrong,
and that this is actually the main difference from the memory ballooning
approach I have used. As you allocate the buffer descriptors and the hash
table in the same segment, there is no extra memory overhead. The only
drawback is that we lose the content of shared buffers on resize. That is a
pity, but it does not look like there is a better alternative.

But there are still some dependencies on the shared buffers size which are not
addressed in this patch. I am not sure how critical they are or whether
something can be done about them, but I want at least to enumerate them:

1. Checkpointer: the maximal number of checkpointer requests depends on
NBuffers. So if we start with small shared buffers and then upscale, it may
cause too frequent checkpoints:

    Size
    CheckpointerShmemSize(void)
    ...
        size = add_size(size, mul_size(NBuffers, sizeof(CheckpointerRequest)));

    CheckpointerShmemInit(void)
        CheckpointerShmem->max_requests = NBuffers;

2. XLOG: the number of xlog buffers is calculated from the number of shared
buffers:

    XLOGChooseNumBuffers(void)
    {
    ...
        xbuffers = NBuffers / 32;

This should not cause any errors, but it may not be efficient if, once again,
we start with tiny shared buffers.

3. AIO: the AIO max concurrency is also calculated based on the number of
shared buffers:

    AioChooseMaxConcurrency(void)
    {
    ...
        max_proportional_pins = NBuffers / max_backends;

For small shared buffers (e.g. 1 MB) there will be no concurrency at all.

So none of these issues causes errors, just inefficient behavior. But if we
want to start with very small shared buffers and then increase them on demand,
it can be a problem.
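To make the inefficiency concrete, here is a small standalone sketch (not
Postgres code) that just evaluates the three formulas above for a tiny
shared_buffers of 1 MB; the 8 kB block size and max_backends = 100 are
assumptions for illustration only:

    /*
     * Standalone illustration of the three NBuffers-derived values above,
     * evaluated for shared_buffers = 1MB.  The formulas are copied from the
     * snippets quoted in this mail; the real functions may apply further
     * clamping which is not shown here.
     */
    #include <stdio.h>

    #define BLCKSZ 8192                      /* default Postgres block size */

    int
    main(void)
    {
        long shared_buffers_bytes = 1024 * 1024;           /* 1MB */
        long NBuffers = shared_buffers_bytes / BLCKSZ;      /* 128 buffers */
        long max_backends = 100;          /* assumed, for illustration only */

        /* 1. Checkpointer: CheckpointerShmem->max_requests = NBuffers */
        printf("checkpointer max_requests: %ld\n", NBuffers);

        /* 2. XLOG: xbuffers = NBuffers / 32 */
        printf("xlog buffers:              %ld\n", NBuffers / 32);

        /* 3. AIO: max_proportional_pins = NBuffers / max_backends */
        printf("aio proportional pins:     %ld\n", NBuffers / max_backends);

        return 0;
    }

With such a configuration the checkpointer request queue holds only 128
entries and NBuffers / max_backends comes out to about 1, which matches the
"no concurrency at all" observation above.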
In all three cases NBuffers is used not just to calculate some threshold
value, but also to determine the size of a structure in shared memory. The
straightforward solution is to place these structures in the same segment as
the shared buffers, but I am not sure how difficult that would be to
implement.
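Just to illustrate what that would imply, here is a hypothetical plain-C
sketch (not taken from the patch; malloc() stands in for the resizable shared
memory segment, and all names are invented) of a request queue whose
allocation size and threshold are both derived from NBuffers and rebuilt when
the buffer pool is resized:

    /*
     * Hypothetical sketch, not from the patch: both the allocation and the
     * max_requests threshold follow NBuffers and are rebuilt on resize.
     * Plain malloc() stands in for the resizable segment; error handling
     * is omitted for brevity.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { int dummy; } RequestSketch;

    typedef struct
    {
        int            max_requests;    /* threshold: follows NBuffers */
        int            num_requests;    /* currently queued requests */
        RequestSketch *requests;        /* allocation: also sized by NBuffers */
    } RequestQueueSketch;

    /* Rebuild the queue for a new NBuffers; queued requests are discarded. */
    static void
    request_queue_resize(RequestQueueSketch *q, int new_nbuffers)
    {
        free(q->requests);
        q->requests = malloc(sizeof(RequestSketch) * (size_t) new_nbuffers);
        q->max_requests = new_nbuffers;
        q->num_requests = 0;
    }

    int
    main(void)
    {
        RequestQueueSketch q = {0, 0, NULL};

        request_queue_resize(&q, 16384);    /* shared_buffers = 128MB */
        request_queue_resize(&q, 131072);   /* upscaled to 1GB */
        printf("max_requests = %d\n", q.max_requests);
        free(q.requests);
        return 0;
    }

The obvious cost of this approach, as in the sketch, is that whatever the
structure currently holds is discarded on resize, similar to losing the buffer
contents themselves.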