Speed-up shared buffers prewarming - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Speed-up shared buffers prewarming
Date
Msg-id b99d8b95-caed-9e39-4fd6-2cbd47224759@garret.ru
Responses Re: Speed-up shared buffers prewarming  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: Speed-up shared buffers prewarming  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
Hi hackers,

It is a well-known fact that queries using a sequential scan cannot be used to prewarm the cache, because they use a ring buffer
even when shared buffers are almost empty.
I have searched the hackers archive but failed to find any discussion of this.
What are the drawbacks of using free buffers even with the BAS_BULKREAD strategy?
I mean the following trivial patch:

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 6be80476db..243335d0e4 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -208,8 +208,15 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
        /*
         * If given a strategy object, see whether it can select a buffer. We
         * assume strategy objects don't need buffer_strategy_lock.
         */
-       if (strategy != NULL)
+       if (strategy != NULL && StrategyControl->firstFreeBuffer < 0)
        {
                buf = GetBufferFromRing(strategy, buf_state);
                if (buf != NULL)

So while there are free buffers, normal buffer allocation is used instead of GetBufferFromRing.

Right now it is necessary to use the pg_prewarm extension to prewarm shared buffers.
But that is inconvenient (you need to manually locate and prewarm all indexes and the TOAST relation) and not always possible
(a client may simply not notice that the server has restarted).

One potential problem I can imagine is synchronized scans: several seqscans of the same table sharing pages from the ring buffer.
But synchronization of concurrent scans is achieved naturally: the backend that moves first moves more slowly than the backends
catching up, which do not need to read anything from disk. It seems to me that if we allow all shared buffers to be used instead of a small
ring buffer, concurrent seqscans will have more chances to reuse cached pages. I performed multiple tests spawning several parallel seqscans
after a Postgres restart and did not observe any problems or performance degradation compared with master.

Also, the ring buffer is used not only for seqscans. There are several places in the Postgres core and in extensions (for example pgvector)
where the BAS_BULKREAD strategy is also used for index scans.

Certainly the OS file cache should prevent redundant disk reads.
But it still seems better to use free memory inside the Postgres process rather than rely on the OS cache and perform syscalls to copy data from it.

It is certainly possible that a seqscan limited by the ring buffer completes faster than a seqscan filling all of shared buffers, especially if
shared buffers are large: the OS needs extra time to commit the memory and may have to swap out other regions to find enough physical
memory for it. But if the data set fits in memory, subsequent queries will be much faster. And it is quite common on modern servers
for shared buffers to be comparable in size to the database.

I would be pleased if you could point me at drawbacks of this approach.
Otherwise I can propose the patch for the commitfest.
