Make ringbuffer threshold and ringbuffer sizes configurable? - Mailing list pgsql-hackers

Hi,

The ringbuffers we use for seqscans, vacuum, COPY etc. can cause very
drastic slowdowns (see e.g. [1]), and can cause some workloads to
practically never end up utilizing shared buffers. ETL workloads,
for example, regularly fight with that problem.

While I think there are a number of improvements[2] we could make to the
ringbuffer logic, I think we should also just make them
configurable.  I think that'll allow a decent number of systems to perform
better (especially on slightly bigger systems, where the current
ringbuffers are *way* too small), make the thresholds more discoverable
(e.g. the NBuffers / 4 threshold is very confusing), and make it
easier to experiment with better default values.

I think it would make sense to have seqscan_ringbuffer_threshold and
{bulkread,bulkwrite,vacuum}_ringbuffer_size. They are often sensibly
set in proportion to shared_buffers, so I suggest defining them as
floats, where negative values divide shared_buffers, positive
values are absolute sizes, and 0 disables the use of ringbuffers.

I.e. to maintain the current defaults, seqscan_ringbuffer_threshold
would be -4.0, but it could also be set to an absolute 4GB (converted to
pages). We'd probably want a GUC show function that displays
proportional values in a nice way.

We probably should also just increase all the ringbuffer sizes by an
order of magnitude or two, especially the one for VACUUM.

Greetings,

Andres Freund

[1] https://postgr.es/m/20190507201619.lnyg2nyhmpxcgeau%40alap3.anarazel.de

[2] The two most important things imo:
    a) Don't evict buffers when falling off the ringbuffer as long as
       there are unused buffers on the freelist. Possibly just set their
       usagecount to zero as long as that is the case.
    b) The biggest performance pain comes from ringbuffers where it's
       likely that buffers are dirty (vacuum, copy), because evicting a
       dirty buffer requires that the corresponding WAL be flushed. That
       often ends up turning many individual buffer evictions into an
       fdatasync each, slowing things down to a crawl. And the contention
       caused by that is a significant concurrency issue too. By doing
       writes, but not flushes, shortly after the insertion, we can
       reduce the cost significantly.


