Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: Changing shared_buffers without restart |
Date | |
Msg-id | 6c3c55a0-001e-40b7-9ee2-e2f0cb12a70d@garret.ru |
In response to | Re: Changing shared_buffers without restart (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses | Re: Changing shared_buffers without restart |
List | pgsql-hackers |
On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
> TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via changing shared memory mapping layout. Any feedback is appreciated.
Hi Dmitry,
I am sorry that I have not participated in this thread from the very beginning, although I am also very interested in dynamic resizing of shared buffers and even proposed my own implementation of it: https://github.com/knizhnik/postgres/pull/2, based on memory ballooning and `madvise`. And it really works (it returns unused memory to the system).
This PoC helped me understand the main drawbacks of this approach:
1. The performance of the Postgres CLOCK page eviction algorithm depends on the number of shared buffers. My first naive attempt, which just marked unused buffers as invalid, caused a significant performance degradation:
`pgbench -c 32 -j 4 -T 100 -P1 -M prepared -S`
(here `shared_buffers` is the maximal shared buffers size and `available_buffers` is the part actually used; `-1` means no ballooning, i.e. all of `shared_buffers` is available):

| shared_buffers | available_buffers | TPS |
|---|---|---|
| 128MB | -1 | 280k |
| 1GB | -1 | 324k |
| 2GB | -1 | 358k |
| 32GB | -1 | 350k |
| 2GB | 128MB | 130k |
| 2GB | 1GB | 311k |
| 32GB | 128MB | 13k |
| 32GB | 1GB | 140k |
| 32GB | 2GB | 348k |

My first thought was to replace CLOCK with an LRU based on a doubly-linked list. As there is no lock-free doubly-linked list implementation, it needs some global lock, and this lock can become a bottleneck. The standard solution is partitioning: use N LRU lists instead of one, just as the buffer manager uses a partitioned hash table to look up buffers. Actually, we could use the same partition locks to protect the LRU lists.
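The partitioned LRU described above (which was ultimately not pursued) might look roughly like the following C sketch with POSIX threads. Everything in it is hypothetical and not taken from bufmgr: the partition count, the array-based list nodes, and the functions `lru_touch()` and `lru_evict()`. The partition is chosen from a caller-supplied hash, standing in for the buffer-tag hash that selects a lookup partition. The sketch ignores pinning (a real victim search must skip pinned buffers) and assumes the caller serializes operations on any one buffer, so the partition lock only has to protect the list links.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_LRU_PARTITIONS 16   /* hypothetical partition count */
#define NBUFFERS 64             /* hypothetical pool size */

typedef struct {
    int prev, next;             /* neighbours in the partition list, -1 if none */
    int part;                   /* partition the buffer is linked into, -1 if none */
} LruNode;

typedef struct {
    pthread_mutex_t lock;       /* protects head/tail and the links of its members */
    int head, tail;             /* most / least recently used buffer, -1 if empty */
} LruPartition;

static LruNode nodes[NBUFFERS];
static LruPartition parts[NUM_LRU_PARTITIONS];

static void
lru_init(void)
{
    for (int i = 0; i < NBUFFERS; i++)
        nodes[i] = (LruNode){ -1, -1, -1 };
    for (int p = 0; p < NUM_LRU_PARTITIONS; p++) {
        pthread_mutex_init(&parts[p].lock, NULL);
        parts[p].head = parts[p].tail = -1;
    }
}

/* Both helpers below require the caller to hold lp->lock. */
static void
unlink_locked(LruPartition *lp, int buf)
{
    LruNode *n = &nodes[buf];

    if (n->prev >= 0) nodes[n->prev].next = n->next; else lp->head = n->next;
    if (n->next >= 0) nodes[n->next].prev = n->prev; else lp->tail = n->prev;
    n->prev = n->next = -1;
}

static void
push_head_locked(LruPartition *lp, int buf)
{
    nodes[buf].prev = -1;
    nodes[buf].next = lp->head;
    if (lp->head >= 0) nodes[lp->head].prev = buf; else lp->tail = buf;
    lp->head = buf;
}

/* Make buf the most recently used entry of the partition chosen by hash. */
static void
lru_touch(int buf, unsigned hash)
{
    int p = (int) (hash % NUM_LRU_PARTITIONS);

    assert(buf >= 0 && buf < NBUFFERS);
    int old = nodes[buf].part;
    if (old >= 0 && old != p) {     /* buffer now holds another page: move it */
        pthread_mutex_lock(&parts[old].lock);
        unlink_locked(&parts[old], buf);
        pthread_mutex_unlock(&parts[old].lock);
        nodes[buf].part = -1;
    }
    pthread_mutex_lock(&parts[p].lock);
    if (nodes[buf].part == p)
        unlink_locked(&parts[p], buf);
    push_head_locked(&parts[p], buf);
    nodes[buf].part = p;
    pthread_mutex_unlock(&parts[p].lock);
}

/* Remove and return the least recently used buffer of a partition, or -1. */
static int
lru_evict(unsigned hash)
{
    int p = (int) (hash % NUM_LRU_PARTITIONS);
    int victim;

    pthread_mutex_lock(&parts[p].lock);
    victim = parts[p].tail;
    if (victim >= 0) {
        unlink_locked(&parts[p], victim);
        nodes[victim].part = -1;
    }
    pthread_mutex_unlock(&parts[p].lock);
    return victim;
}
```

Touching buffers 1, 2, 3 and then 1 again in the same partition makes 2 the first victim and 3 the second. Note that each eviction only sees one partition, so this approximates a global LRU rather than implementing it exactly; that is the usual price of partitioning. Build with `-pthread`.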
But it is not clear what to do with ring buffers (strategies). So I decided not to perform such a revolution in bufmgr, but rather to optimize CLOCK to skip reserved buffers more efficiently.
I just added a `skip_count` field to the buffer descriptor. And it helps! Now the worst case, `shared_buffers`/`available_buffers` = 32GB/128MB, shows the same performance (280k TPS) as `shared_buffers`=128MB without ballooning.
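The message does not spell out what `skip_count` holds, so the following C sketch assumes one plausible reading: for a ballooned (reserved) descriptor, `skip_count` is the length of the reserved run starting there, so the clock hand can jump over the whole run instead of visiting each descriptor. The `BufDesc` struct and the functions `clock_sweep()` and `balloon_tail()` are hypothetical simplifications, not PostgreSQL's `BufferDesc` or its real victim-selection code.

```c
#include <assert.h>

typedef struct {
    int usage_count;    /* decremented by the sweep; 0 means evictable */
    int reserved;       /* 1 if the buffer is ballooned out (not available) */
    int skip_count;     /* assumed meaning: length of the reserved run starting here */
} BufDesc;

/*
 * Run the clock hand until an available buffer with usage_count 0 is found
 * and return its index. Every descriptor the hand lands on is added to
 * *visited. With use_skip, a reserved descriptor lets the hand jump over its
 * whole run; without it, the hand steps over reserved buffers one by one.
 * Assumes at least one buffer is available.
 */
static int
clock_sweep(BufDesc *bufs, int nbuffers, int *hand, int use_skip, long *visited)
{
    assert(nbuffers > 0);
    for (;;) {
        int cur = *hand;
        BufDesc *b = &bufs[cur];

        (*visited)++;
        if (b->reserved) {
            int step = (use_skip && b->skip_count > 0) ? b->skip_count : 1;
            *hand = (cur + step) % nbuffers;
            continue;
        }
        *hand = (cur + 1) % nbuffers;
        if (b->usage_count == 0)
            return cur;
        b->usage_count--;
    }
}

/* Hypothetical setup: buffers [0, navail) available, the rest ballooned. */
static void
balloon_tail(BufDesc *bufs, int nbuffers, int navail)
{
    for (int i = 0; i < nbuffers; i++) {
        bufs[i].usage_count = 0;
        bufs[i].reserved = i >= navail;
        bufs[i].skip_count = i >= navail ? nbuffers - i : 0;
    }
}
```

With 4096 descriptors of which only the first 16 are available (the same 1:256 ratio as 128MB out of 32GB) and the hand just past the available range, the naive sweep visits 4081 descriptors to find the next victim, while the skipping sweep visits 2. This is consistent with the 32GB/128MB configuration being the slowest in the table above, although the real cost also depends on usage counts and contention.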
2. There are several data structures in Postgres whose size depends on the number of buffers.
In my patch I used the dynamic shared buffer size in some cases, but if a structure has to be allocated in shared memory, then the maximal size still has to be used. We have the buffers themselves (8 kB per buffer), then the main BufferDescriptors array (64 B per buffer), the BufferIOCVArray (16 B), checkpoint's CkptBufferIds (20 B), and the hashmap on the buffer cache (24 B + 8 B per entry).
128 bytes per 8 kB buffer (~1.5%) does not seem like a large overhead, but it may become quite noticeable when the size varies by more than two orders of magnitude.
E.g. to support scaling from 0.5 GB to 128 GB, with 128 bytes/buffer we'd have ~2 GiB of static overhead on only 0.5 GiB of actual buffers.
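The overhead estimate can be checked with a few lines of C. The per-structure sizes are the ones quoted above (they add up to 132 bytes, which the message rounds to 128); `BLCKSZ` is PostgreSQL's default 8 kB block size, and the function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/* Per-buffer sizes quoted in the message, in bytes. */
#define DESC_BYTES 64           /* BufferDescriptors entry */
#define IOCV_BYTES 16           /* BufferIOCVArray entry */
#define CKPT_BYTES 20           /* CkptBufferIds entry */
#define HASH_BYTES (24 + 8)     /* buffer cache hashmap entry */

static uint64_t
per_buffer_overhead(void)
{
    return DESC_BYTES + IOCV_BYTES + CKPT_BYTES + HASH_BYTES;
}

/* Metadata that must be allocated up front for max_bytes of shared_buffers. */
static uint64_t
static_overhead(uint64_t max_bytes, uint64_t bytes_per_buffer)
{
    assert(max_bytes % BLCKSZ == 0);
    return (max_bytes / BLCKSZ) * bytes_per_buffer;
}
```

For a 128 GiB maximum, `static_overhead()` with 128 bytes per buffer gives 16,777,216 buffers × 128 B = 2 GiB, while the same metadata sized for 0.5 GiB of buffers would be only 8 MiB. With the exact 132-byte sum the static overhead is 2.0625 GiB.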
3. `madvise` is not portable.
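A minimal C sketch of the ballooning idea, assuming Linux/glibc: buffers taken out of use have their pages handed back to the kernel with `madvise(2)`, and the `#ifdef` shows one way to cope with platforms where the flag is missing. The helpers `pool_create()` and `balloon_release()` are hypothetical. To stay self-contained the demo uses a private anonymous mapping, where `MADV_DONTNEED` frees the pages and later reads return zeros. PostgreSQL's main shared memory segment is a `MAP_SHARED` mapping, and there `MADV_DONTNEED` only drops the calling process's page mappings; actually freeing the memory would need something like the Linux-specific `MADV_REMOVE`.

```c
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS and MADV_* on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Hypothetical helper: reserve an anonymous pool of nbuffers blocks. */
static char *
pool_create(size_t nbuffers)
{
    void *p = mmap(NULL, nbuffers * BLCKSZ, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}

/*
 * Hypothetical helper: return the memory of buffers [first, first + count)
 * to the OS. The start address must be page-aligned. Returns 0 on success,
 * -1 on failure or if the platform lacks MADV_DONTNEED.
 */
static int
balloon_release(char *pool, size_t first, size_t count)
{
    assert(pool != NULL && count > 0);
#ifdef MADV_DONTNEED
    return madvise(pool + first * BLCKSZ, count * BLCKSZ, MADV_DONTNEED);
#else
    (void) first;
    (void) count;
    return -1;                  /* no suitable madvise(2) flag: keep the memory */
#endif
}
```

For example, after filling a 16-buffer pool and calling `balloon_release(pool, 8, 8)`, buffer 0 still holds its data while buffer 8 reads back as zeros.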
Certainly your proposal has gone much further than my PoC (including huge pages support).
But it is still not quite clear to me how you are going to solve the problem of large memory overhead when the shared buffers size varies ~100x.