Hi,
The ringbuffers we use for seqscans, vacuum, copy etc. can cause very
drastic slowdowns (see e.g. [1]), and can cause some workloads to
practically never end up utilizing shared buffers. ETL workloads,
for example, regularly fight with that problem.
While I think there's a number of improvements[2] we could make to the
ringbuffer logic, I think we should also just make them
configurable. That'll allow a decent number of systems to perform
better (especially on slightly bigger systems the current
ringbuffers are *way* too small), make the thresholds more discoverable
(e.g. the NBuffers / 4 threshold is very confusing), and will make it
easier to experiment with better default values.
I think it would make sense to have seqscan_ringbuffer_threshold and
{bulkread,bulkwrite,vacuum}_ringbuffer_size. These would often sensibly
be set in proportion to shared_buffers, so I suggest defining them as
floats, where negative values divide shared_buffers, positive
values are absolute sizes, and 0 disables the use of ringbuffers.
I.e. to maintain the current defaults, seqscan_ringbuffer_threshold
would be -4.0, but it could also be set to an absolute 4GB (converted to
pages). We'd probably also want a GUC show function that displays
proportional values in a nice way.
We probably should also just increase all the ringbuffer sizes by an
order of magnitude or two, especially the one for VACUUM.
Greetings,
Andres Freund
[1] https://postgr.es/m/20190507201619.lnyg2nyhmpxcgeau%40alap3.anarazel.de
[2] The two most important things imo:
a) Don't evict buffers when falling off the ringbuffer as long as
there are unused buffers on the freelist. Possibly just set their
usagecount to zero as long as that is the case (a rough sketch of
this follows at the end of this mail).
b) The biggest performance pain comes from ringbuffers where it's
likely that buffers are dirty (vacuum, copy), because evicting a
dirty buffer requires that the corresponding WAL first be
flushed. That often ends up turning many individual buffer
evictions into an fdatasync each, slowing things down to a crawl,
and the contention caused by that is a significant concurrency
issue too. By doing writes, but not flushes, shortly after the
insertion, we can reduce the cost significantly.
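
To make a) a bit more concrete, a loose standalone sketch of the
decision when a buffer falls off the ring (simplified stand-in types;
the real logic would live around freelist.c's StrategyControl, which
this doesn't model):

/* Loose sketch of a), with made-up stand-in types; not actual
 * PostgreSQL code. */
#include <stdbool.h>
#include <stdio.h>

typedef struct BufferDesc
{
    int     usage_count;
} BufferDesc;

/* Stand-in for "the shared freelist still has unused buffers". */
static int  freelist_length = 128;

/*
 * Called when a buffer is about to fall off the strategy ring.
 * Returns true if the ring should evict and reuse it (current
 * behavior), false if it should instead be left in shared buffers.
 */
static bool
ring_should_evict(BufferDesc *buf)
{
    if (freelist_length > 0)
    {
        /*
         * Unused buffers remain: don't recycle this one in the ring.
         * Zero its usagecount so the clock sweep can reclaim it
         * quickly once the freelist eventually runs dry.
         */
        buf->usage_count = 0;
        return false;
    }
    return true;
}

int
main(void)
{
    BufferDesc buf = { .usage_count = 3 };

    printf("evict: %d, usage_count: %d\n",
           ring_should_evict(&buf), buf.usage_count);
    return 0;
}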