Re: PGC_SIGHUP shared_buffers? - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: PGC_SIGHUP shared_buffers?
Date:
Msg-id: 99a4f21e-e117-4169-8626-67a7678654f0@garret.ru
In response to: Re: PGC_SIGHUP shared_buffers? (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
> On Fri, Feb 16, 2024 at 5:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> 3. Reserve lots of address space and then only use some of it. I hear rumors that some forks of PG have implemented something like this. The idea is that you convince the OS to give you a whole bunch of address space, but you try to avoid having all of it be backed by physical memory. If you later want to increase shared_buffers, you then get the OS to back more of it by physical memory, and if you later want to decrease shared_buffers, you hopefully have some way of giving the OS the memory back. As compared with the previous two approaches, this seems less likely to be noticeable to most PG code. Problems include (1) you have to somehow figure out how much address space to reserve, and that forms an upper bound on how big shared_buffers can grow at runtime and (2) you have to figure out ways to reserve address space and back more or less of it with physical memory that will work on all of the platforms that we currently support or might want to support in the future.
>
> FTR I'm aware of a working experimental prototype along these lines, that will be presented in Vancouver: https://www.pgevents.ca/events/pgconfdev2024/sessions/session/31-enhancing-postgresql-plasticity-new-frontiers-in-memory-management/
If you are interested - this is my attempt to implement resizable shared buffers based on ballooning:
https://github.com/knizhnik/postgres/pull/2
Unused memory is returned to the OS using `madvise` (so it is not a very portable solution).
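The deflate step can be sketched like this (illustrative Python rather than the actual C code from the PR; the sizes are made up, and it relies on Linux `MADV_DONTNEED` semantics for private anonymous memory):

```python
import mmap

MAX_BYTES = 64 << 20   # illustrative: address space reserved up front
USED_BYTES = 1 << 20   # currently "available" portion

# Reserve the whole region once as a private anonymous mapping.
region = mmap.mmap(-1, MAX_BYTES,
                   flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

# Touch the in-use portion so it gets backed by physical memory.
region[:USED_BYTES] = b"\xab" * USED_BYTES

# "Deflate" the balloon: return the physical pages to the OS.
# The address range itself stays reserved, so we can grow again
# later without remapping anything.
region.madvise(mmap.MADV_DONTNEED, 0, USED_BYTES)

# On Linux, MADV_DONTNEED on private anonymous memory means
# subsequent reads see zero-filled pages (old contents are gone).
print(region[0])
```

The key property is that only physical backing is released; pointers into the region stay valid, which is what lets most buffer-manager code keep working unchanged.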
Unfortunately, there are many data structures in Postgres whose size depends on the number of buffers.
In my PR I use the `GetAvailableBuffers()` function instead of `NBuffers`, but that doesn't always help, because many of these data structures cannot be reallocated.
Other important limitations of this approach are:
1. It is necessary to specify the maximal number of shared buffers.
2. Only the `BufferBlocks` space is shrunk, not the buffer descriptors and buffer hash table. The estimated memory footprint per page is 132 bytes, so if we want to be able to scale shared buffers from 100MB up to 100GB, the fixed metadata alone occupies about 1.6GB. And that is quite large.
3. Our CLOCK algorithm becomes very inefficient for a large number of shared buffers.
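The arithmetic behind that 1.6GB estimate (assuming the default 8KB page size):

```python
BLCKSZ = 8192              # default PostgreSQL page size
PER_BUFFER_OVERHEAD = 132  # bytes per page: descriptor + hash entry (estimate)

max_shared_buffers = 100 * (1 << 30)     # 100GB upper bound
nbuffers = max_shared_buffers // BLCKSZ  # number of buffer slots to provision

# This metadata cost is paid even when only 100MB of buffers are available.
overhead_gib = nbuffers * PER_BUFFER_OVERHEAD / (1 << 30)
print(nbuffers, overhead_gib)  # ~1.6 GiB
```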
Below are the first results I get (pgbench database with scale 100, `pgbench -c 32 -j 4 -T 100 -P1 -M prepared -S`):

| shared_buffers | available_buffers | TPS  |
| -------------- | ----------------- | ---- |
| 128MB          | -1                | 280k |
| 1GB            | -1                | 324k |
| 2GB            | -1                | 358k |
| 32GB           | -1                | 350k |
| 2GB            | 128MB             | 130k |
| 2GB            | 1GB               | 311k |
| 32GB           | 128MB             | 13k  |
| 32GB           | 1GB               | 140k |
| 32GB           | 2GB               | 348k |
`shared_buffers` specifies the maximal shared buffers size, and `available_buffers` the current limit.
So when shared_buffers >> available_buffers and the dataset doesn't fit in them, we get an awful performance degradation (> 20 times), thanks to the CLOCK algorithm.
My first thought was to replace CLOCK with an LRU based on a doubly linked list. As there is no lock-free doubly-linked-list implementation,
it needs some global lock, and this lock can become a bottleneck. The standard solution is partitioning: use N LRU lists instead of one,
just like the partitioned hash table the buffer manager uses to look up buffers. Actually, we could use the same partition locks to protect the LRU lists.
But it is not clear what to do with ring buffers (strategies). So I decided not to perform such a revolution in bufmgr, but instead to optimize CLOCK to skip reserved buffers more efficiently.
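The partitioned-LRU idea could be sketched as follows (illustrative Python, not bufmgr code; `PartitionedLRU` and its methods are hypothetical names, and `NUM_PARTITIONS` stands in for the buffer-mapping partition count):

```python
import threading
from collections import OrderedDict

NUM_PARTITIONS = 16  # stand-in for the buffer-mapping partition count

class PartitionedLRU:
    """One LRU list per partition, each guarded by its own lock,
    analogous to reusing the buffer-mapping partition locks."""

    def __init__(self):
        self.locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]
        self.lists = [OrderedDict() for _ in range(NUM_PARTITIONS)]

    def _part(self, buf_tag):
        # Same idea as hashing the buffer tag to pick a hash partition.
        return hash(buf_tag) % NUM_PARTITIONS

    def touch(self, buf_tag, buf_id):
        """Record an access: move the buffer to the MRU end of its list."""
        p = self._part(buf_tag)
        with self.locks[p]:
            self.lists[p][buf_tag] = buf_id
            self.lists[p].move_to_end(buf_tag)

    def evict_one(self, buf_tag_hint):
        """Evict the least-recently-used buffer from the hinted partition."""
        p = self._part(buf_tag_hint)
        with self.locks[p]:
            if self.lists[p]:
                return self.lists[p].popitem(last=False)
        return None
```

Contention drops because two backends only collide when their buffers hash to the same partition, at the cost of eviction decisions being local to one partition rather than globally LRU.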
I just added a `skip_count` field to the buffer descriptor. And it helps! Now the worst case, shared_buffers/available_buffers = 32GB/128MB,
shows the same performance (280k TPS) as shared_buffers=128MB without ballooning.
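A toy sketch of the idea (not the actual PR code; here `skip_count` caches the length of a run of ballooned-out buffers so the clock hand can hop over the run in one step, and a real implementation would also have to invalidate the cached value when buffers become available again):

```python
class Buf:
    def __init__(self):
        self.usage_count = 0
        self.available = True  # False = ballooned out, not usable
        self.skip_count = 0    # cached length of the unavailable run starting here

def clock_sweep(bufs, hand):
    """Clock sweep that jumps over unavailable runs instead of
    visiting every ballooned-out buffer one by one.
    Assumes at least one buffer is available."""
    n = len(bufs)
    while True:
        b = bufs[hand]
        if not b.available:
            if b.skip_count == 0:
                # Measure the unavailable run once and cache it.
                run = 0
                while not bufs[(hand + run) % n].available:
                    run += 1
                b.skip_count = run
            hand = (hand + b.skip_count) % n  # hop over the whole run
            continue
        if b.usage_count > 0:
            b.usage_count -= 1
            hand = (hand + 1) % n
        else:
            return hand  # victim found
```

Without the cached skip, a sweep over a mostly-ballooned 32GB array touches every descriptor, which matches the 13k TPS worst case above; with it, the sweep cost is proportional to the available buffers only.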