Thread: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Hi,

The use of the ringbuffer in VACUUM often causes very substantial slowdowns. The primary reason is that often most/all the buffers in the ringbuffer have been dirtied when they were processed, with an associated WAL record. When we then reuse a buffer via the (quite small) ringbuffer, we need to flush the WAL before reclaiming the buffer. A synchronous flush of the WAL every few buffers ends up as a very significant bottleneck, unless you have local low-latency durable storage (i.e. local storage with some sort of power protection).

A secondary issue is that, when we end up doing multiple passes, we'll have to re-read data into shared_buffers that we just wrote out and evicted.

An example, on a local SSD with decently fast fdatasync ([1]): table size is 3322MB, with ~10% updated and ~30% deleted tuples, and a single index. m_w_m is large enough to do this in one pass. I used pg_prewarm() of another relation to ensure the vacuumed table isn't in s_b (otherwise ringbuffers aren't doing anything).

s_b     ringbuffer enabled   time       wal_syncs   wal_sync_time
128MB   1                    77797ms
128MB   0                    13676ms    241         2538ms
8GB     1                    72834ms    23976       51989ms
8GB     0                    9544ms     150         1634ms

See [2] for per-run logs / stats.

All the data here is in the OS page cache, so we don't even pay the real price for reading the data multiple times. On cloud hardware with higher fsync latency I've seen > 15x time differences between using the ringbuffers and avoiding them by using pg_prewarm.

Of course there's a good reason we have the ringbuffer - we don't want maintenance operations to completely disturb the buffer pool and harm the production workload. But if, e.g., the database isn't available due to anti-wraparound measures, there's no other workload to protect, and the ringbuffer substantially reduces availability. Initial data loads could similarly benefit.

Therefore I'd like to add an option to the VACUUM command to disable the use of the ringbuffer. Not sure about the name yet.

I think we should auto-enable that mode once we're using the failsafe mode, similar to [auto]vacuum cost delays getting disabled (c.f. lazy_check_wraparound_failsafe()). If things are bad enough that we're soon going to shut down, we want to be aggressive.
Greetings,

Andres Freund

[1] according to pg_test_fsync:
        fdatasync                          769.189 ops/sec    1300 usecs/op

[2]

For the s_b 128MB case:

ringbuffers enabled:

2023-01-11 10:24:58.726 PST [355353][client backend][2/19:0][psql] INFO:  aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:26:19.488 PST [355353][client backend][2/19:0][psql] INFO:  finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424975 remain, 424975 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2981, which was 0 XIDs old when operation ended
    new relfrozenxid: 2981, which is 102 XIDs ahead of previous value
    frozen: 424975 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424975 pages from table (100.00% of total) had 4325101 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 2284.292 ms, write: 4325.009 ms
    avg read rate: 83.203 MB/s, avg write rate: 83.199 MB/s
    buffer usage: 425044 hits, 860113 misses, 860074 dirtied
    WAL usage: 1709902 records, 434990 full page images, 2273501683 bytes
    system usage: CPU: user: 11.62 s, system: 11.86 s, elapsed: 80.76 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092795 │  443356 │ 2999358740 │             1569 │     28651 │    27081 │       1874.391 │     59895.674 │ 2023-01-11 10:24:58.664859-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

ringbuffers disabled:

2023-01-11 10:23:05.081 PST [355054][client backend][2/19:0][psql] INFO:  aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:23:18.755 PST [355054][client backend][2/19:0][psql] INFO:  finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424979 remain, 424979 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2879, which was 0 XIDs old when operation ended
    new relfrozenxid: 2879, which is 102 XIDs ahead of previous value
    frozen: 424979 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424979 pages from table (100.00% of total) had 4325176 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 1247.366 ms, write: 2888.756 ms
    avg read rate: 491.485 MB/s, avg write rate: 491.395 MB/s
    buffer usage: 424927 hits, 860242 misses, 860083 dirtied
    WAL usage: 1709918 records, 434994 full page images, 2273503049 bytes
    system usage: CPU: user: 5.42 s, system: 6.26 s, elapsed: 13.67 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092963 │  443362 │ 2999373996 │           212190 │    212333 │      241 │       1209.516 │      2538.706 │ 2023-01-11 10:23:05.004783-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

For the s_b 8GB case:

ringbuffers enabled:

2023-01-11 10:04:12.479 PST [352665][client backend][2/19:0][psql] INFO:  aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:05:25.312 PST [352665][client backend][2/19:0][psql] INFO:  finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424977 remain, 424977 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2675, which was 0 XIDs old when operation ended
    new relfrozenxid: 2675, which is 102 XIDs ahead of previous value
    frozen: 424977 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424977 pages from table (100.00% of total) had 4325066 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 2610.875 ms, write: 4177.842 ms
    avg read rate: 81.688 MB/s, avg write rate: 87.515 MB/s
    buffer usage: 523611 hits, 761552 misses, 815868 dirtied
    WAL usage: 1709910 records, 434992 full page images, 2273502729 bytes
    system usage: CPU: user: 11.00 s, system: 11.86 s, elapsed: 72.83 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092707 │  443358 │ 2999354090 │            42259 │     66227 │    23976 │       2050.963 │     51989.099 │ 2023-01-11 10:04:12.404054-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

ringbuffers disabled:

2023-01-11 10:08:48.414 PST [353287][client backend][3/19:0][psql] INFO:  aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:08:57.956 PST [353287][client backend][3/19:0][psql] INFO:  finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424977 remain, 424977 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2777, which was 0 XIDs old when operation ended
    new relfrozenxid: 2777, which is 102 XIDs ahead of previous value
    frozen: 424977 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424976 pages from table (100.00% of total) had 4325153 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 1040.230 ms, write: 0.000 ms
    avg read rate: 312.016 MB/s, avg write rate: 356.242 MB/s
    buffer usage: 904078 hits, 381084 misses, 435101 dirtied
    WAL usage: 1709908 records, 434992 full page images, 2273499663 bytes
    system usage: CPU: user: 5.57 s, system: 2.26 s, elapsed: 9.54 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092933 │  443358 │ 2999364596 │           236354 │    236398 │      150 │       1166.314 │      1634.408 │ 2023-01-11 10:08:48.350328-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘
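For readers who want to reproduce something like the above, here is a minimal sketch of the setup described in the opening mail (the exact script isn't in the thread, so the schema, the payload width, and the filler relation are assumptions):

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- Build a table like the one vacuumed above: ~10% updated, ~30% deleted, one index.
    CREATE TABLE copytest_0 (id int, payload text);
    INSERT INTO copytest_0
        SELECT g, repeat('x', 200) FROM generate_series(1, 10000000) g;
    CREATE INDEX copytest_0_id_idx ON copytest_0 (id);
    UPDATE copytest_0 SET payload = payload WHERE id % 10 = 0;   -- dirty ~10% of tuples
    DELETE FROM copytest_0 WHERE id % 10 IN (1, 2, 3);           -- delete ~30% of tuples

    -- Evict copytest_0 from shared_buffers by prewarming an unrelated relation
    -- (assumed to be larger than s_b), then time an aggressive vacuum.
    SELECT pg_prewarm('filler_table');
    \timing on
    VACUUM (VERBOSE, FREEZE) copytest_0;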
On Wed, Jan 11, 2023 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:
> Therefore I'd like to add an option to the VACUUM command to disable
> the use of the ringbuffer. Not sure about the name yet.

Sounds like a good idea.

> I think we should auto-enable that mode once we're using the failsafe mode,
> similar to [auto]vacuum cost delays getting disabled
> (c.f. lazy_check_wraparound_failsafe()). If things are bad enough that we're
> soon going to shut down, we want to be aggressive.

+1

--
Peter Geoghegan
Hi,

On 2023-01-11 10:27:20 -0800, Andres Freund wrote:
> On cloud hardware with higher fsync latency I've seen > 15x time differences
> between using the ringbuffers and avoiding them by using pg_prewarm.

A slightly edited version of what I've used in the past to defeat the ringbuffers using pg_prewarm, as I think it might be useful for others:

WITH what_rel AS (
    SELECT 'copytest_0'::regclass AS vacuum_me
), what_to_prefetch AS (
    SELECT vacuum_me,
           greatest(heap_blks_total - 1, 0) AS last_block,
           CASE WHEN phase = 'scanning heap'
                THEN heap_blks_scanned
                ELSE heap_blks_vacuumed
           END AS current_pos
    FROM what_rel, pg_stat_progress_vacuum
    WHERE relid = vacuum_me
      AND phase IN ('scanning heap', 'vacuuming heap')
)
SELECT vacuum_me, current_pos,
       pg_prewarm(vacuum_me, 'buffer', 'main',
                  current_pos,
                  least(current_pos + 10000, last_block))
FROM what_to_prefetch
\watch 0.1

Having this running in the background brings the s_b=128MB, ringbuffer enabled case down from 77797ms to 14838ms, close to the version with the ringbuffer disabled.

Unfortunately, afaik, that trick isn't currently possible for the index vacuum phase, as we don't yet expose the current scan position. And not every index might be as readily prefetchable as just prefetching the next 10k blocks from the current position. That's not too bad if your indexes are small, but unfortunately that's not always the case...

Greetings,

Andres Freund
Hi,

On 2023-01-11 10:35:19 -0800, Peter Geoghegan wrote:
> On Wed, Jan 11, 2023 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:
> > Therefore I'd like to add an option to the VACUUM command to disable
> > the use of the ringbuffer. Not sure about the name yet.
>
> Sounds like a good idea.

Any idea about the name? The obvious thing is to reference ring buffers in the option name, but that's more of an implementation detail...

Some ideas:

USE_RING_BUFFERS on|off
SCAN_PROTECTION on|off
REUSE_BUFFERS on|off
LIMIT_BUFFER_USAGE on|off

Regards,

Andres
On Wed, Jan 11, 2023 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:
> Any idea about the name? The obvious thing is to reference ring buffers in the
> option name, but that's more of an implementation detail...

What are the chances that anybody using this feature via a manual VACUUM command will also use INDEX_CLEANUP off? It's not really supposed to be used routinely, at all. Right? It's just for emergencies. Perhaps it can be tied to INDEX_CLEANUP=off? That makes it hard to get just the behavior you want when testing VACUUM, but maybe that doesn't matter.

Realistically, most of the value here comes from changing the failsafe behavior, which doesn't require the user to know anything about VACUUM. I know that AWS has reduced the vacuum_failsafe_age default on RDS to 1.2 billion (a decision made before I joined Amazon), so it is already something AWS leans on quite a bit where available.

--
Peter Geoghegan
Hi,

On 2023-01-11 11:06:26 -0800, Peter Geoghegan wrote:
> On Wed, Jan 11, 2023 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:
> > Any idea about the name? The obvious thing is to reference ring buffers in the
> > option name, but that's more of an implementation detail...
>
> What are the chances that anybody using this feature via a manual
> VACUUM command will also use INDEX_CLEANUP off? It's not really
> supposed to be used routinely, at all. Right? It's just for
> emergencies.

I think it's also quite useful for e.g. vacuuming after initial data loads or if you need to do a first vacuum after a lot of bloat accumulated due to a stuck transaction.

> Perhaps it can be tied to INDEX_CLEANUP=off? That makes it hard to get
> just the behavior you want when testing VACUUM, but maybe that doesn't
> matter.

I don't like that - it's also quite useful to disable use of ringbuffers when you actually need to clean up indexes. Especially when we have a lot of dead tuples we'll rescan indexes over and over...

Greetings,

Andres Freund
On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:
> I don't like that - it's also quite useful to disable use of ringbuffers when
> you actually need to clean up indexes. Especially when we have a lot of dead
> tuples we'll rescan indexes over and over...

That's a fair point. My vote goes to "REUSE_BUFFERS", then.

--
Peter Geoghegan
On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:
> Hi,
>
> On 2023-01-11 10:35:19 -0800, Peter Geoghegan wrote:
> > On Wed, Jan 11, 2023 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:
> > > Therefore I'd like to add an option to the VACUUM command to disable
> > > the use of the ringbuffer. Not sure about the name yet.
> >
> > Sounds like a good idea.
>
> Any idea about the name? The obvious thing is to reference ring buffers in the
> option name, but that's more of an implementation detail...
>
> Some ideas:
>
> USE_RING_BUFFERS on|off
> REUSE_BUFFERS on|off

+1 for either of these.

I don't think it's an issue to expose implementation details here. Anyone who wants to change this will know about the implementation details that they're changing, and it's better to refer to it by the same/similar name and not by some other name that's hard to find.

BTW I can't see that the ring buffer is currently exposed in any user-facing docs for COPY/ALTER/VACUUM/CREATE ?

--
Justin
Hi,

On 2023-01-11 14:38:34 -0600, Justin Pryzby wrote:
> On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:
> > Some ideas:
> >
> > USE_RING_BUFFERS on|off
> > REUSE_BUFFERS on|off
>
> +1 for either of these.

Then I'd go for REUSE_BUFFERS. What made you prefer it over LIMIT_BUFFER_USAGE?

USE_BUFFER_ACCESS_STRATEGY would be a name tied to the implementation that's not awful, I think...

> I don't think it's an issue to expose implementation details here.
> Anyone who wants to change this will know about the implementation
> details that they're changing, and it's better to refer to it by the
> same/similar name and not by some other name that's hard to find.

A ringbuffer could refer to a lot of things other than something limiting buffer usage, that's why I don't like it.

> BTW I can't see that the ring buffer is currently exposed in any
> user-facing docs for COPY/ALTER/VACUUM/CREATE ?

Yea, there's surprisingly little in the docs about it, given how much it influences behaviour. It's mentioned in tablesample-method.sgml, but without explanation - and it's a page documenting the C API...

Greetings,

Andres Freund
Peter Geoghegan <pg@bowt.ie> writes:
> On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:
>> I don't like that - it's also quite useful to disable use of ringbuffers when
>> you actually need to clean up indexes. Especially when we have a lot of dead
>> tuples we'll rescan indexes over and over...

> That's a fair point.

> My vote goes to "REUSE_BUFFERS", then.

I wonder whether it could make sense to allow a larger ringbuffer size, rather than just the limit cases of "on" and "off".

			regards, tom lane
Hi,

On 2023-01-11 16:18:34 -0500, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:
> >> I don't like that - it's also quite useful to disable use of ringbuffers when
> >> you actually need to clean up indexes. Especially when we have a lot of dead
> >> tuples we'll rescan indexes over and over...
>
> > That's a fair point.
>
> > My vote goes to "REUSE_BUFFERS", then.
>
> I wonder whether it could make sense to allow a larger ringbuffer size,
> rather than just the limit cases of "on" and "off".

I can see that making sense, particularly if we were to later extend this to other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of data > 16MB but also << s_b vastly slower, but it can still be very important to use if there's lots of parallel processes loading data.

Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the default value, 0 preventing use of a buffer access strategy, and 1..N indicating the size in blocks?

Would we want to set an upper limit lower than implied by the memory limit for the BufferAccessStrategy allocation?

Greetings,

Andres Freund
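If a BUFFER_USAGE_LIMIT option along those lines were added, usage might look like this (hypothetical syntax at this point in the thread; nothing here is committed):

    VACUUM (BUFFER_USAGE_LIMIT -1) big_table;    -- use the built-in default ring size
    VACUUM (BUFFER_USAGE_LIMIT 0) big_table;     -- no strategy: freely use shared buffers
    VACUUM (BUFFER_USAGE_LIMIT 4096) big_table;  -- ring of 4096 blocks (32MB at 8kB BLCKSZ)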
On Wed, Jan 11, 2023 at 2:39 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2023-01-11 16:18:34 -0500, Tom Lane wrote:
> > Peter Geoghegan <pg@bowt.ie> writes:
> > > On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:
> > >> I don't like that - it's also quite useful to disable use of ringbuffers when
> > >> you actually need to clean up indexes. Especially when we have a lot of dead
> > >> tuples we'll rescan indexes over and over...
> >
> > > That's a fair point.
> >
> > > My vote goes to "REUSE_BUFFERS", then.
> >
> > I wonder whether it could make sense to allow a larger ringbuffer size,
> > rather than just the limit cases of "on" and "off".
>
> I can see that making sense, particularly if we were to later extend this to
> other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
> data > 16MB but also << s_b vastly slower, but it can still be very important
> to use if there's lots of parallel processes loading data.

Should we just add "ring_buffers" to the existing "shared_buffers" and "temp_buffers" settings?
Then give VACUUM a (BUFFER_POOL=ring*|shared) option?
I think making DBAs aware of this dynamic and making the ring buffer usage user-facing is beneficial in its own right (at least, the concept that changes done by vacuum don't impact shared_buffers, regardless of how that non-impact manifests). But I don't see much benefit in trying to come up with a different name.
David J.
Hi,

On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> Should we just add "ring_buffers" to the existing "shared_buffers" and
> "temp_buffers" settings?

The different types of ring buffers have different sizes, for good reasons. So I don't see that working well. I also think it'd be more often useful to control this on a statement basis - if you have a parallel import tool that starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of course each session can change the ring buffer settings, but still.

> Then give VACUUM a (BUFFER_POOL=ring*|shared) option?

That seems likely to mislead, because it'd still use shared buffers when the blocks are already present. The ring buffers aren't a separate buffer pool, they're a subset of the normal buffer pool. Lookup is done normally; only when a page isn't found does the search for a victim buffer first try to use a buffer from the ring.

> I think making DBAs aware of this dynamic and making the ring buffer usage
> user-facing is beneficial in its own right (at least, the concept that
> changes done by vacuum don't impact shared_buffers, regardless of how that
> non-impact manifests).

VACUUM can end up dirtying all of shared buffers, even with the ring buffer in use...

Greetings,

Andres Freund
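To make the "subset of the normal buffer pool" point concrete, here is a simplified sketch of how a strategy ring supplies victim buffers. It is modeled loosely on src/backend/storage/buffer/freelist.c; the two helper functions are hypothetical stand-ins, not the actual bufmgr API:

    typedef int Buffer;
    #define InvalidBuffer 0

    extern int    BufferIsReusable(Buffer buf);    /* hypothetical: unpinned, usage count <= 1 */
    extern Buffer ClockSweepGetBuffer(void);       /* hypothetical: regular victim search */

    typedef struct BufferAccessStrategyData
    {
        int     nbuffers;   /* ring size, in buffers */
        int     current;    /* current slot in the ring */
        Buffer  buffers[1]; /* buffers previously used by this ring; InvalidBuffer = empty */
    } BufferAccessStrategyData;

    static Buffer
    StrategyGetVictim(BufferAccessStrategyData *strategy)
    {
        Buffer  buf;

        /* Advance around the ring. */
        if (++strategy->current >= strategy->nbuffers)
            strategy->current = 0;

        buf = strategy->buffers[strategy->current];

        /*
         * Reuse the remembered buffer only if it is unpinned and nobody else
         * has bumped its usage count; reusing it may force a WAL flush if it
         * is dirty.  Otherwise fall back to the shared clock sweep and
         * remember the new buffer in this ring slot.
         */
        if (buf != InvalidBuffer && BufferIsReusable(buf))
            return buf;

        buf = ClockSweepGetBuffer();
        strategy->buffers[strategy->current] = buf;
        return buf;
    }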
Hi,
So, I attached a rough implementation of both the autovacuum failsafe
revert to shared buffers and the vacuum option (no tests or docs or
anything).
The first three patches in the set are just for enabling use of shared
buffers in failsafe mode for autovacuum. I haven't actually ensured it
works (i.e. triggering failsafe mode and checking the stats for whether
or not shared buffers were used).
I was wondering about the status of the autovacuum wraparound failsafe
test suggested in [1]. I don't see it registered for the March
commitfest. I'll probably review it since it will be useful for this
patchset.
The first patch in the set is to free the BufferAccessStrategy object
that is made in do_autovacuum() -- I don't see when the memory context
it is allocated in is destroyed, so it seems like it might be a leak?
The last patch in the set is a trial implementation of the VACUUM option
suggested -- BUFFER_USAGE_LIMIT. More on that below.
On Wed, Jan 11, 2023 at 4:39 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2023-01-11 16:18:34 -0500, Tom Lane wrote:
> > Peter Geoghegan <pg@bowt.ie> writes:
> > > On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:
> > >> I don't like that - it's also quite useful to disable use of ringbuffers when
> > >> you actually need to clean up indexes. Especially when we have a lot of dead
> > >> tuples we'll rescan indexes over and over...
> >
> > > That's a fair point.
> >
> > > My vote goes to "REUSE_BUFFERS", then.
> >
> > I wonder whether it could make sense to allow a larger ringbuffer size,
> > rather than just the limit cases of "on" and "off".
>
> I can see that making sense, particularly if we were to later extend this to
> other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
> data > 16MB but also << s_b vastly slower, but it can still be very important
> to use if there's lots of parallel processes loading data.
>
> Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
> default value, 0 preventing use of a buffer access strategy, and 1..N
> indicating the size in blocks?
I have found the implementation you suggested very hard to use.
The attached fourth patch in the set implements it the way you suggest.
I had to figure out what number to set BUFFER_USAGE_LIMIT to -- and,
since I don't specify shared buffers in units of nbuffers, it's pretty
annoying to have to figure out a valid number.
I think that it would be better to have it be either a percentage of
shared buffers or a size in units of bytes/kb/mb like that of shared
buffers.
Using a fraction or percentage appeals to me because you don't need to
reference your shared buffers setting and calculate what size you want
to set it to. Also, parsing the size in different units sounds like more work.
Unfortunately, the fraction doesn't really work if we cap the ring size
of a buffer access strategy to NBuffers / 8. Also, there are other
issues like what would 0% and 100% mean.
I have a list of other questions, issues, and TODOs related to the code
I wrote to implement BUFFER_USAGE_LIMIT, but I'm not sure those are
worth discussing until we shape up the interface.
> Would we want to set an upper limit lower than implied by the memory limit
> for the BufferAccessStrategy allocation?
So, I was wondering what you thought about NBuffers / 8 (the current
limit). Does it make sense?
If we clamp the user-specified value to this, I think we definitely need
to inform them through some kind of logging or message. I am sure there
are lots of other gucs doing this -- do you know any off the top of your
head?
- Melanie
Hi,

On 2023-02-22 16:32:53 -0500, Melanie Plageman wrote:
> I was wondering about the status of the autovacuum wraparound failsafe
> test suggested in [1]. I don't see it registered for the March
> commitfest. I'll probably review it since it will be useful for this
> patchset.

It's pretty hard to make it work reliably. I was suggesting somewhere that we ought to add an EMERGENCY parameter to manual VACUUMs to allow testing that path a tad more easily.

> The first patch in the set is to free the BufferAccessStrategy object
> that is made in do_autovacuum() -- I don't see when the memory context
> it is allocated in is destroyed, so it seems like it might be a leak?

The backend shuts down just after, so that's not a real issue. Not that it'd hurt to fix it.

> > I can see that making sense, particularly if we were to later extend this to
> > other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
> > data > 16MB but also << s_b vastly slower, but it can still be very important
> > to use if there's lots of parallel processes loading data.
> >
> > Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
> > default value, 0 preventing use of a buffer access strategy, and 1..N
> > indicating the size in blocks?
>
> I have found the implementation you suggested very hard to use.
> The attached fourth patch in the set implements it the way you suggest.
> I had to figure out what number to set BUFFER_USAGE_LIMIT to -- and,
> since I don't specify shared buffers in units of nbuffers, it's pretty
> annoying to have to figure out a valid number.

I think we should be able to parse it in a similar way to how we parse shared_buffers. You could even implement this as a GUC that is then set by VACUUM (similar to how VACUUM FREEZE is implemented).

> I think that it would be better to have it be either a percentage of
> shared buffers or a size in units of bytes/kb/mb like that of shared
> buffers.

I don't think a percentage of shared buffers works particularly well - you very quickly run into the ringbuffer becoming impractically big.

> > Would we want to set an upper limit lower than implied by the memory limit
> > for the BufferAccessStrategy allocation?
>
> So, I was wondering what you thought about NBuffers / 8 (the current
> limit). Does it make sense?

That seems *way* too big. Imagine how large allocations we'd end up with a shared_buffers size of a few TB. I'd probably make it a hard error at 1GB and a silent cap at NBuffers / 2 or such.

> @@ -547,7 +547,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
>  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
>  											"amcheck context",
>  											ALLOCSET_DEFAULT_SIZES);
> -	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
> +	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD, -1);
>
>  	/* Get true root block from meta-page */
>  	metapage = palloc_btree_page(state, BTREE_METAPAGE);

Changing this everywhere seems pretty annoying, particularly because I suspect a bunch of extensions also use GetAccessStrategy(). How about a GetAccessStrategyExt(), GetAccessStrategyCustomSize() or such?

> BufferAccessStrategy
> -GetAccessStrategy(BufferAccessStrategyType btype)
> +GetAccessStrategy(BufferAccessStrategyType btype, int buffers)
>  {
>  	BufferAccessStrategy strategy;
>  	int			ring_size;
> +	const char *strategy_name = btype_get_name(btype);

Shouldn't be executed when we don't need it.

> +	if (btype != BAS_VACUUM)
> +	{
> +		if (buffers == 0)
> +			elog(ERROR, "Use of shared buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
> +				 strategy_name);
> +
> +		if (buffers > 0)
> +			elog(ERROR, "Specification of ring size in buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
> +				 strategy_name);
> +	}
> +
> +	// TODO: DEBUG logging message for dev?
> +	if (buffers == 0)
> +		btype = BAS_NORMAL;

GetAccessStrategy() often can be executed hundreds of thousands of times a second, so I'm very sceptical that adding log messages to it is useful.

Greetings,

Andres Freund
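The wrapper approach Andres suggests could look roughly like this: the existing GetAccessStrategy() signature stays untouched and a size-taking variant sits beside it. This is a sketch; the names and the internal helper are illustrative, not what any patch in the thread actually contains:

    BufferAccessStrategy
    GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size_kb)
    {
        int     ring_buffers;

        /* -1 means "use the built-in default ring size for this btype" */
        if (ring_size_kb == -1)
            return GetAccessStrategy(btype);

        /* 0 means "no strategy": the caller just uses shared buffers */
        if (ring_size_kb == 0)
            return NULL;

        ring_buffers = ring_size_kb / (BLCKSZ / 1024);
        return MakeAccessStrategyInternal(btype, ring_buffers); /* hypothetical helper */
    }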
On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > Should we just add "ring_buffers" to the existing "shared_buffers" and
> > "temp_buffers" settings?
>
> The different types of ring buffers have different sizes, for good reasons. So
> I don't see that working well. I also think it'd be more often useful to
> control this on a statement basis - if you have a parallel import tool that
> starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
> course each session can change the ring buffer settings, but still.

How about having GUCs for each ring buffer (bulk_read_ring_buffers, bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)? These options can help especially when statement level controls aren't easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If needed users can also set them at the system level. For instance, one can set bulk_write_ring_buffers to other than 16MB or -1 to disable the ring buffer to use shared_buffers and run a bunch of bulk write queries.

Although I'm not quite opposing the idea of statement level controls (like the VACUUM one proposed here), it is better to make these ring buffer sizes configurable across the system to help with the other similar cases e.g., a CTAS or RMV can help subsequent reads from shared buffers if the ring buffer is skipped.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
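Under Bharath's proposal, configuration might look like this. The GUC names come from his message and do not exist at this point; this is only a sketch of the intended usage:

    -- hypothetical per-strategy ring size GUCs
    ALTER SYSTEM SET bulk_read_ring_buffers = '256kB';
    ALTER SYSTEM SET bulk_write_ring_buffers = '-1';  -- -1: built-in default (16MB)
    ALTER SYSTEM SET vacuum_ring_buffers = '0';       -- 0: no ring, use shared_buffers
    SELECT pg_reload_conf();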
On Thu, Feb 23, 2023 at 3:03 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> Hi,
>
> So, I attached a rough implementation of both the autovacuum failsafe
> revert to shared buffers and the vacuum option (no tests or docs or
> anything).

Thanks for the patches. I have some comments.

0001:
1. I don't quite understand the need for this 0001 patch. Firstly, bstrategy is allocated once per autovacuum worker in AutovacMemCxt, which goes away with the process. Secondly, the worker exits after do_autovacuum(), with which the memory context is gone. I think this patch is unnecessary unless I'm missing something.

0002:
1. Don't we need to remove vac_strategy for analyze.c as well? It's pretty meaningless there compared to vacuum.c, as we're passing bstrategy to all required functions.

0004:
1. I think no multiple sentences in a single error message. How about "of %d, changing it to %d"?

+            elog(WARNING, "buffer_usage_limit %d is below the minimum buffer_usage_limit of %d. setting it to %d",

2. Typically, postgres error messages start with lowercase letters, hints and detail messages start with uppercase letters.

+    if (buffers == 0)
+        elog(ERROR, "Use of shared buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+             strategy_name);
+
+    if (buffers > 0)
+        elog(ERROR, "Specification of ring size in buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+             strategy_name);

3. A function for this seems unnecessary, especially when a static array would do the needful, something like forkNames[]:

+static const char *
+btype_get_name(BufferAccessStrategyType btype)
+{
+    switch (btype)
+    {

4. Why are these assumptions needed? Can't we simplify by doing validations on the new buffers parameter only when the btype is BAS_VACUUM?

+    if (buffers == 0)
+        elog(ERROR, "Use of shared buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+             strategy_name);

+    // TODO: DEBUG logging message for dev?
+    if (buffers == 0)
+        btype = BAS_NORMAL;

5. Is this change needed for this patch?

         default:
             elog(ERROR, "unrecognized buffer access strategy: %d",
-                  (int) btype);
-             return NULL;            /* keep compiler quiet */
+                  (int) btype);
+
+             pg_unreachable();

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Thank you to all reviewers! This email is in answer to the three reviews done since my last version. Attached is v2, and inline below are replies to all the review comments.

The main difference between this version and the previous version is that I've added a guc, buffer_usage_limit, and the VACUUM option BUFFER_USAGE_LIMIT is now to be specified in size (like kB, MB, etc). I currently only use the guc value for VACUUM, but it is meant to be used for all buffer access strategies and is configurable at the session level.

I would prefer that we had the option of resizing the buffer access strategy object per table being autovacuumed. Since autovacuum reloads the config file between tables, this would be quite possible. I started implementing this, but stopped because the code is not really in a good state for that. In fact, I'm not very happy with my implementation at all because I think, given the current structure of vacuum() and vacuum_rel(), it will potentially make the code more confusing.

I don't like how autovacuum and vacuum use vacuum_rel() and vacuum() differently (autovacuum always calls vacuum() with a list containing a single relation). And vacuum() takes buffer access strategy as a parameter, supposedly so that autovacuum can change the buffer access strategy object per call, but it doesn't do that. And then vacuum() and vacuum_rel() go and access VacuumParams at various places with no rhyme or reason -- seemingly just based on the random availability of other objects whose state they would like to check on. So, IMO, in adding a "buffers" parameter to VacuumParams, I am asking for confusion in autovacuum code with table-level VacuumParams containing a value for buffers that isn't used.

On Mon, Feb 27, 2023 at 4:21 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-02-22 16:32:53 -0500, Melanie Plageman wrote:
> > The first patch in the set is to free the BufferAccessStrategy object
> > that is made in do_autovacuum() -- I don't see when the memory context
> > it is allocated in is destroyed, so it seems like it might be a leak?
>
> The backend shuts down just after, so that's not a real issue. Not that it'd
> hurt to fix it.

I've dropped that patch from the set.

> > > I can see that making sense, particularly if we were to later extend this to
> > > other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
> > > data > 16MB but also << s_b vastly slower, but it can still be very important
> > > to use if there's lots of parallel processes loading data.
> > >
> > > Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
> > > default value, 0 preventing use of a buffer access strategy, and 1..N
> > > indicating the size in blocks?
>
> > I have found the implementation you suggested very hard to use.
> > The attached fourth patch in the set implements it the way you suggest.
> > I had to figure out what number to set BUFFER_USAGE_LIMIT to -- and,
> > since I don't specify shared buffers in units of nbuffers, it's pretty
> > annoying to have to figure out a valid number.
>
> I think we should be able to parse it in a similar way to how we parse
> shared_buffers. You could even implement this as a GUC that is then set by
> VACUUM (similar to how VACUUM FREEZE is implemented).

In the attached v2, I've used parse_int() to do this.

> > I think that it would be better to have it be either a percentage of
> > shared buffers or a size in units of bytes/kb/mb like that of shared
> > buffers.
>
> I don't think a percentage of shared buffers works particularly well - you
> very quickly run into the ringbuffer becoming impractically big.

It is now a size.

> > > Would we want to set an upper limit lower than implied by the memory limit
> > > for the BufferAccessStrategy allocation?
>
> > So, I was wondering what you thought about NBuffers / 8 (the current
> > limit). Does it make sense?
>
> That seems *way* too big. Imagine how large allocations we'd end up with a
> shared_buffers size of a few TB.
>
> I'd probably make it a hard error at 1GB and a silent cap at NBuffers / 2 or
> such.

Well, as I mentioned, NBuffers / 8 is the current GetAccessStrategy() cap. In the attached patchset, I have introduced a hard cap of 16GB which is enforced for the VACUUM option and for the buffer_usage_limit guc. I kept the "silent cap" at NBuffers / 8 but am open to changing it to NBuffers / 2 if we think it is okay for its silent cap to be different than GetAccessStrategy()'s cap.

> > @@ -547,7 +547,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
> >  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
> >  											"amcheck context",
> >  											ALLOCSET_DEFAULT_SIZES);
> > -	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
> > +	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD, -1);
> >
> >  	/* Get true root block from meta-page */
> >  	metapage = palloc_btree_page(state, BTREE_METAPAGE);
>
> Changing this everywhere seems pretty annoying, particularly because I suspect
> a bunch of extensions also use GetAccessStrategy(). How about a
> GetAccessStrategyExt(), GetAccessStrategyCustomSize() or such?

Yes, I don't know what I was thinking. Changed it to GetAccessStrategyExt() -- though now I am thinking I don't like Ext and want to change it.

> > BufferAccessStrategy
> > -GetAccessStrategy(BufferAccessStrategyType btype)
> > +GetAccessStrategy(BufferAccessStrategyType btype, int buffers)
> >  {
> >  	BufferAccessStrategy strategy;
> >  	int			ring_size;
> > +	const char *strategy_name = btype_get_name(btype);
>
> Shouldn't be executed when we don't need it.

I got rid of it for now.

> > +	if (btype != BAS_VACUUM)
> > +	{
> > +		if (buffers == 0)
> > +			elog(ERROR, "Use of shared buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
> > +				 strategy_name);
> > +
> > +		if (buffers > 0)
> > +			elog(ERROR, "Specification of ring size in buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
> > +				 strategy_name);
> > +	}
> > +
> > +	// TODO: DEBUG logging message for dev?
> > +	if (buffers == 0)
> > +		btype = BAS_NORMAL;
>
> GetAccessStrategy() often can be executed hundreds of thousands of times a
> second, so I'm very sceptical that adding log messages to it is useful.

So, in the case of vacuum and autovacuum, I don't see how GetAccessStrategyExt() could be called hundreds of thousands of times a second. It is not even called for each table being vacuumed -- it is only called before vacuuming a list of tables.

On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > > Should we just add "ring_buffers" to the existing "shared_buffers" and
> > > "temp_buffers" settings?
> >
> > The different types of ring buffers have different sizes, for good reasons. So
> > I don't see that working well. I also think it'd be more often useful to
> > control this on a statement basis - if you have a parallel import tool that
> > starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
> > course each session can change the ring buffer settings, but still.
>
> How about having GUCs for each ring buffer (bulk_read_ring_buffers,
> bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
> These options can help especially when statement level controls aren't
> easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
> needed users can also set them at the system level. For instance, one
> can set bulk_write_ring_buffers to other than 16MB or -1 to disable
> the ring buffer to use shared_buffers and run a bunch of bulk write
> queries.

So, I've rebelled a bit and implemented a single guc, buffer_usage_limit, in the attached patchset. Users can set it at the session or system level, or they can specify BUFFER_USAGE_LIMIT to vacuum. It is the same size for all operations. By default all of this would be the same as it is now. The attached patchset does not use the guc for any operations except VACUUM, though. I will add on another patch if people still feel strongly that we cannot have a single guc. If the other operations use this guc, I think we could get much of the same flexibility as having multiple gucs by just being able to set it at the session level (or having command options).

> Although I'm not quite opposing the idea of statement level controls
> (like the VACUUM one proposed here), it is better to make these ring
> buffer sizes configurable across the system to help with the other
> similar cases e.g., a CTAS or RMV can help subsequent reads from
> shared buffers if ring buffer is skipped.

Yes, I've done both.

On Tue, Feb 28, 2023 at 3:52 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Thu, Feb 23, 2023 at 3:03 AM Melanie Plageman
>
> 0001:
> 1. I don't quite understand the need for this 0001 patch. Firstly,
> bstrategy is allocated once per autovacuum worker in AutovacMemCxt, which
> goes away with the process. Secondly, the worker exits after
> do_autovacuum(), with which the memory context is gone. I think this patch
> is unnecessary unless I'm missing something.

I've dropped this one.

> 0002:
> 1. Don't we need to remove vac_strategy for analyze.c as well? It's
> pretty meaningless there compared to vacuum.c, as we're passing bstrategy to
> all required functions.

So, it is a bit harder to remove it from analyze because acquire_func doesn't take the buffer access strategy as a parameter, and acquire_sample_rows() uses the vac_context global variable to pass it to table_scan_analyze_next_block(). We could change acquire_func, but it looks like FDWs use it, so I'm not sure. It would be more for consistency than as a performance win, as I imagine analyze is less of a problem than vacuum (i.e. it is probably reading fewer blocks and probably not dirtying them [unless it does on-access pruning?]). I haven't done this in the attached set.

> 0004:
> 1. I think no multiple sentences in a single error message. How about
> "of %d, changing it to %d"?
> + elog(WARNING, "buffer_usage_limit %d is below the
> minimum buffer_usage_limit of %d. setting it to %d",

I've removed this message, but if I put back a message about clamping, I will remember this note.

> 2. Typically, postgres error messages start with lowercase letters,
> hints and detail messages start with uppercase letters.
> + if (buffers == 0)
> + elog(ERROR, "Use of shared buffers unsupported for buffer
> access strategy: %s. nbuffers must be -1.",
> + strategy_name);
> +
> + if (buffers > 0)
> + elog(ERROR, "Specification of ring size in buffers
> unsupported for buffer access strategy: %s. nbuffers must be -1.",
> + strategy_name);

Thanks! I've removed some of the error messages for now, but, for the ones I kept, I think they are consistent now with this pattern.

> 3. A function for this seems unnecessary, especially when a static
> array would do the needful, something like forkNames[].
> +static const char *
> +btype_get_name(BufferAccessStrategyType btype)
> +{
> + switch (btype)
> + {

I've removed it for now.

> 4. Why are these assumptions needed? Can't we simplify by doing
> validations on the new buffers parameter only when the btype is
> BAS_VACUUM?
> + if (buffers == 0)
> + elog(ERROR, "Use of shared buffers unsupported for buffer
> access strategy: %s. nbuffers must be -1.",
> + strategy_name);
>
> + // TODO: DEBUG logging message for dev?
> + if (buffers == 0)
> + btype = BAS_NORMAL;

So, I've moved validation to the vacuum option parsing for the vacuum option and am using the guc infrastructure to check min and max for the guc value.

> 5. Is this change needed for this patch?
>          default:
>              elog(ERROR, "unrecognized buffer access strategy: %d",
> -                  (int) btype);
> -             return NULL;            /* keep compiler quiet */
> +                  (int) btype);
> +
> +             pg_unreachable();

The pg_unreachable() is removed, as I've left GetAccessStrategy() untouched.

- Melanie
> On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > > > Should we just add "ring_buffers" to the existing "shared_buffers" and
> > > > "temp_buffers" settings?
> > >
> > > The different types of ring buffers have different sizes, for good reasons. So
> > > I don't see that working well. I also think it'd be more often useful to
> > > control this on a statement basis - if you have a parallel import tool that
> > > starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
> > > course each session can change the ring buffer settings, but still.
> >
> > How about having GUCs for each ring buffer (bulk_read_ring_buffers,
> > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
> > These options can help especially when statement level controls aren't
> > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
> > needed users can also set them at the system level. For instance, one
> > can set bulk_write_ring_buffers to other than 16MB or -1 to disable
> > the ring buffer to use shared_buffers and run a bunch of bulk write
> > queries.

In attached v3, I've changed the name of the guc from buffer_usage_limit to vacuum_buffer_usage_limit, since it is only used for vacuum and autovacuum. I haven't added the other suggested strategy gucs, as those could easily be done in a future patchset.

I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize() -- similar to initArrayResultWithSize(). And I've added tab completion for BUFFER_USAGE_LIMIT so that it is easier to try out my patch.

Most of the TODOs in the code are related to the question of how autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates the buffer access strategy ring in do_autovacuum() before looping through and vacuuming tables. It passes this strategy object on to vacuum(). Since we reuse the same strategy object for all tables in a given invocation of do_autovacuum(), only failsafe autovacuum would change buffer access strategies. This is probably okay, but it does mean that the table-level VacuumParams variable, ring_size, means something different for autovacuum than vacuum. Autovacuum workers will always have set it to -1. We won't ever reach code in vacuum() which relies on VacuumParams->ring_size as long as autovacuum workers pass a non-NULL BufferAccessStrategy object to vacuum(), though.

- Melanie
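As described, the v3 interface would be exercised roughly like this (a sketch based on the patch description; the value is parsed as a size, shared_buffers-style, and nothing here is committed yet):

    -- session-level setting, consulted only by [auto]vacuum in v3
    SET vacuum_buffer_usage_limit = '256MB';

    -- or per command, overriding the GUC
    VACUUM (BUFFER_USAGE_LIMIT '256MB') big_table;

    -- 0 disables the strategy ring entirely
    VACUUM (BUFFER_USAGE_LIMIT 0) big_table;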
On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote:
> Subject: [PATCH v3 2/3] use shared buffers when failsafe active
>
> +	/*
> +	 * Assume the caller who allocated the memory for the
> +	 * BufferAccessStrategy object will free it.
> +	 */
> +	vacrel->bstrategy = NULL;

This comment could use elaboration:

** VACUUM normally restricts itself to a small ring buffer; but in failsafe mode, in order to process tables as quickly as possible, allow leaving behind a large number of dirty buffers.

> Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc
>
> #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var))))
> +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)

Macros are normally capitalized.

It's a good idea to write "(bufsize)", in case someone passes "a+b".

> @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype)
> +BufferAccessStrategy
> +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size)

Maybe it would make sense for GetAccessStrategy() to call GetAccessStrategyWithSize(). Or maybe not.

> + {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM,
> + gettext_noop("Sets the buffer pool size for operations employing a buffer access strategy."),

The description should mention vacuum, if that's the scope of the GUC's behavior.

> +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring.
> + # -1 to use default,
> + # 0 to disable vacuum buffer access strategy and use shared buffers
> + # > 0 to specify size

If I'm not wrong, there's still no documentation about "ring buffers" or postgres' "strategy". Which seems important to do for this patch, along with other documentation.

This patch should add support in vacuumdb.c. And maybe a comment about adding support there, since it's annoying when the release notes one year say "support VACUUM (FOO)" and then one year later say "support vacuumdb --foo".

--
Justin
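For illustration, the macro with both of Justin's suggestions applied would look something like this (the capitalized name is just one possibility):

    /* convert a ring size given in kB into a number of buffers */
    #define BUFSIZE_LIMIT_TO_NBUFFERS(bufsize) (((bufsize) * 1024) / BLCKSZ)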
On Sat, 11 Mar 2023 at 16:55, Melanie Plageman <melanieplageman@gmail.com> wrote:
> In attached v3, I've changed the name of the guc from buffer_usage_limit
> to vacuum_buffer_usage_limit, since it is only used for vacuum and
> autovacuum.

Sorry for arriving late to this thread, but what about sizing the ring dynamically? From what I gather, the primary motivation for a larger ring size is avoiding WAL flushes due to dirty buffer writes. We already catch that event with StrategyRejectBuffer(). So maybe a dynamic sizing algorithm could be applied to the ringbuffer. Make the buffers array in the strategy capable of holding up to the limit of buffers, but set the ring size conservatively. If we have to flush WAL, double the ring size (up to the limit). If we loop around the ring without flushing, decrease the ring size by a small amount to let the clock sweep reclaim those buffers for use by other backends.

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
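A rough sketch of the dynamic sizing Ants describes, assuming the ring array is allocated at its maximum size up front. The callback names are hypothetical, not existing bufmgr hooks; StrategyRejectBuffer() is the real function that would signal the WAL-flush case:

    #include <stdbool.h>

    #define Min(a, b) ((a) < (b) ? (a) : (b))   /* as in PostgreSQL's c.h */
    #define MIN_RING_BUFFERS 16                 /* assumed floor */

    typedef struct AdaptiveRing
    {
        int     size;           /* current ring size, in buffers */
        int     max_size;       /* hard cap; array is allocated at this size */
        bool    flushed_wal;    /* did any reuse this lap require a WAL flush? */
    } AdaptiveRing;

    /* Called when reusing a ring buffer forced a WAL flush. */
    static void
    ring_note_wal_flush(AdaptiveRing *ring)
    {
        ring->size = Min(ring->size * 2, ring->max_size);
        ring->flushed_wal = true;
    }

    /* Called each time the ring wraps around. */
    static void
    ring_lap_complete(AdaptiveRing *ring)
    {
        if (!ring->flushed_wal && ring->size > MIN_RING_BUFFERS)
            ring->size--;       /* shrink slowly; the clock sweep reclaims the slack */
        ring->flushed_wal = false;
    }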
Thanks for your interest in this patch! On Mon, Mar 13, 2023 at 8:38 AM Ants Aasma <ants@cybertec.at> wrote: > > On Sat, 11 Mar 2023 at 16:55, Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > > On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote: > > > > > > Should we just add "ring_buffers" to the existing "shared_buffers" and > > > > > > "temp_buffers" settings? > > > > > > > > > > The different types of ring buffers have different sizes, for good reasons. So > > > > > I don't see that working well. I also think it'd be more often useful to > > > > > control this on a statement basis - if you have a parallel import tool that > > > > > starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of > > > > > course each session can change the ring buffer settings, but still. > > > > > > > > How about having GUCs for each ring buffer (bulk_read_ring_buffers, > > > > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)? > > > > These options can help especially when statement level controls aren't > > > > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If > > > > needed users can also set them at the system level. For instance, one > > > > can set bulk_write_ring_buffers to other than 16MB or -1 to disable > > > > the ring buffer to use shared_buffers and run a bunch of bulk write > > > > queries. > > > > In attached v3, I've changed the name of the guc from buffer_usage_limit > > to vacuum_buffer_usage_limit, since it is only used for vacuum and > > autovacuum. > > Sorry for arriving late to this thread, but what about sizing the ring > dynamically? From what I gather the primary motivation for larger ring > size is avoiding WAL flushes due to dirty buffer writes. We already > catch that event with StrategyRejectBuffer(). So maybe a dynamic > sizing algorithm could be applied to the ringbuffer. Make the buffers > array in strategy capable of holding up to the limit of buffers, but > set ring size conservatively. If we have to flush WAL, double the ring > size (up to the limit). If we loop around the ring without flushing, > decrease the ring size by a small amount to let clock sweep reclaim > them for use by other backends. So, the original motivation of this patch was to allow autovacuum in failsafe mode to abandon use of a buffer access strategy, since, at that point, there is no reason to hold back. The idea was expanded to be an option to explicit vacuum, since users often must initiate an explicit vacuum after a forced shutdown due to transaction ID wraparound. As for routine vacuuming and the other buffer access strategies, I think there is an argument for configurability based on operator knowledge -- perhaps your workload will use the data you are COPYing as soon as the COPY finishes, so you might as well disable a buffer access strategy or use a larger fraction of shared buffers. Also, the ring sizes were selected sixteen years ago and average server memory and data set sizes have changed. StrategyRejectBuffer() will allow bulkreads to, as you say, use more buffers than the original ring size, since it allows them to kick dirty buffers out of the ring and claim new shared buffers. 
Bulkwrites and vacuums, however, will inevitably dirty buffers and require flushing the buffer (and thus flushing the associated WAL) when reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of the ring, since dirtying buffers is their common case. A dynamic resizing like the one you suggest would likely devolve to vacuum and bulkwrite strategies always using the max size. As for decreasing the ring size, buffers are only "added" to the ring lazily and, technically, as it is now, buffers which have been added to the ring can always be reclaimed by the clocksweep (as long as they are not pinned). The buffer access strategy is more of a self-imposed restriction than it is a reservation. Since the ring is small and the buffers are being frequently reused, odds are the usage count will be 1 and we will be the one who set it to 1, but there is no guarantee. If, when attempting to reuse the buffer, its usage count is > 1 (or it is pinned), we also will kick it out of the ring and go look for a replacement buffer. I do think that it is a bit unreasonable to expect users to know how large they would like to make their buffer access strategy ring. What we want is some way of balancing different kinds of workloads and maintenance tasks reasonably. If your database has no activity because it is the middle of the night or it was shut down because of transaction id wraparound, there is no reason why vacuum should limit the number of buffers it uses. I'm sure there are many other such examples. - Melanie
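For readers following along in the source, the reuse check Melanie describes is GetBufferFromRing() in src/backend/storage/buffer/freelist.c; lightly abridged, it reads:

/* Lightly abridged from src/backend/storage/buffer/freelist.c */
static BufferDesc *
GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
{
	BufferDesc *buf;
	Buffer		bufnum;
	uint32		local_buf_state;

	/* Advance to the next ring slot */
	if (++strategy->current >= strategy->ring_size)
		strategy->current = 0;

	/* An empty slot: tell the caller to allocate a buffer with the
	 * normal allocation strategy; it will then be added to the ring
	 * lazily. */
	bufnum = strategy->buffers[strategy->current];
	if (bufnum == InvalidBuffer)
	{
		strategy->current_was_in_ring = false;
		return NULL;
	}

	/* Reuse the buffer only if it is unpinned and its usage count is
	 * at most 1 (which, odds are, we set ourselves on the last lap). */
	buf = GetBufferDescriptor(bufnum - 1);
	local_buf_state = LockBufHdr(buf);
	if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
		&& BUF_STATE_GET_USAGECOUNT(local_buf_state) <= 1)
	{
		strategy->current_was_in_ring = true;
		*buf_state = local_buf_state;
		return buf;
	}

	/* Pinned or recently touched: drop it from the ring and let the
	 * caller pick a replacement buffer via the clock sweep. */
	UnlockBufHdr(buf, local_buf_state);
	strategy->current_was_in_ring = false;
	return NULL;
}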
Thanks for the review! On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote: > > Subject: [PATCH v3 2/3] use shared buffers when failsafe active > > > > + /* > > + * Assume the caller who allocated the memory for the > > + * BufferAccessStrategy object will free it. > > + */ > > + vacrel->bstrategy = NULL; > > This comment could use elaboration: > > ** VACUUM normally restricts itself to a small ring buffer; but in failsafe mode, in order to process tables as quickly as possible, allow leaving behind a large number of dirty buffers. Agreed. It definitely needs a comment like this. I will update in the next version along with addressing your other feedback (after sorting out some of the other points in this mail on which I still have questions). > > Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc > > > #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var)))) > > +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ) > > Macros are normally capitalized Yes, there doesn't seem to be a great amount of consistency around this... See pgstat.c read_chunk_s and bufmgr.c BufHdrGetBlock and friends. Though there are probably more capitalized than not. Since it does a bit of math and returns a value, I wanted to convey that it was more like a function. Also, since the name was long, I thought all-caps would be hard to read. However, if you or others feel strongly, I am attached neither to the capitalization nor to the name at all (what do you think of the name?). > It's a good idea to write "(bufsize)", in case someone passes "a+b". Ah yes, this is a good idea -- I always miss at least one set of parentheses when writing a macro. In this case, I didn't think of it since the current caller couldn't pass an expression. > > @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype) > > +BufferAccessStrategy > > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size) > > Maybe it would make sense for GetAccessStrategy() to call > GetAccessStrategyWithSize(). Or maybe not. You mean instead of having anyone call GetAccessStrategyWithSize()? We would need to change the signature of GetAccessStrategy() then -- and there are quite a few callers. > > > + {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM, > > + gettext_noop("Sets the buffer pool size for operations employing a buffer access strategy."), > > The description should mention vacuum, if that's the scope of the GUC's > behavior. Good catch. Will update in next version. > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring. > > + # -1 to use default, > > + # 0 to disable vacuum buffer access strategy and use shared buffers > > + # > 0 to specify size > > If I'm not wrong, there's still no documentation about "ring buffers" or > postgres' "strategy". Which seems important to do for this patch, along > with other documentation. Yes, it is. I have been thinking about where in the docs to add it (the docs about buffer access strategies). Any ideas? > This patch should add support in vacuumdb.c. Oh, I had totally forgotten about vacuumdb. > And maybe a comment about adding support there, since it's annoying > when the release notes one year say "support VACUUM (FOO)" and then > one year later say "support vacuumdb --foo". I'm not sure I totally follow.
Do you mean to add a comment to ExecVacuum() saying to add support to vacuumdb when adding a new option to vacuum? In the past, did people forget to add support to vacuumdb for new vacuum options, did they forget to document that they did that, or did they forget to include that they did that in the release notes? - Melanie
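To make Justin's "(bufsize)" point above concrete, here is a small, hypothetical illustration of the hazard (BLCKSZ is PostgreSQL's usual block-size symbol):

#define NBUFFERS_BAD(bufsize)	(bufsize * 1024 / BLCKSZ)
#define NBUFFERS_GOOD(bufsize)	(((bufsize) * 1024) / BLCKSZ)

/* NBUFFERS_BAD(a + b) expands to (a + b * 1024 / BLCKSZ): only b gets
 * scaled, because * binds more tightly than +.  NBUFFERS_GOOD(a + b)
 * expands to (((a + b) * 1024) / BLCKSZ), which is what the caller
 * meant. */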
On Sat, Mar 11, 2023 at 11:55 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > > On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote: > > > > > Should we just add "ring_buffers" to the existing "shared_buffers" and > > > > > "temp_buffers" settings? > > > > > > > > The different types of ring buffers have different sizes, for good reasons. So > > > > I don't see that working well. I also think it'd be more often useful to > > > > control this on a statement basis - if you have a parallel import tool that > > > > starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of > > > > course each session can change the ring buffer settings, but still. > > > > > > How about having GUCs for each ring buffer (bulk_read_ring_buffers, > > > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)? > > > These options can help especially when statement level controls aren't > > > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If > > > needed users can also set them at the system level. For instance, one > > > can set bulk_write_ring_buffers to other than 16MB or -1 to disable > > > the ring buffer to use shared_buffers and run a bunch of bulk write > > > queries. > > In attached v3, I've changed the name of the guc from buffer_usage_limit > to vacuum_buffer_usage_limit, since it is only used for vacuum and > autovacuum. > > I haven't added the other suggested strategy gucs, as those could easily > be done in a future patchset. > > I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize() > -- similar to initArrayResultWithSize(). > > And I've added tab completion for BUFFER_USAGE_LIMIT so that it is > easier to try out my patch. > > Most of the TODOs in the code are related to the question of how > autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates > the buffer access strategy ring in do_autovacuum() before looping > through and vacuuming tables. It passes this strategy object on to > vacuum(). Since we reuse the same strategy object for all tables in a > given invocation of do_autovacuum(), only failsafe autovacuum would > change buffer access strategies. This is probably okay, but it does mean > that the table-level VacuumParams variable, ring_size, means something > different for autovacuum than vacuum. Autovacuum workers will always > have set it to -1. We won't ever reach code in vacuum() which relies on > VacuumParams->ring_size as long as autovacuum workers pass a non-NULL > BufferAccessStrategy object to vacuum(), though. I've not reviewed the patchset in depth yet but I got an assertion failure and a SEGV when using the buffer_usage_limit parameter. postgres(1:471180)=# vacuum (buffer_usage_limit 10000000000) ; 2023-03-15 17:10:02.947 JST [471180] ERROR: buffer_usage_limit for a vacuum must be between -1 and 16777216. 10000000000 is invalid.
at character 9 The message shows the max value is 16777216, but when I set it, I got an assertion failure: postgres(1:470992)=# vacuum (buffer_usage_limit 16777216) ; TRAP: failed Assert("ring_size < MAX_BAS_RING_SIZE_KB"), File: "freelist.c", Line: 606, PID: 470992 Then when I used a value 1 lower, 16777215, I got a SEGV: postgres(1:471180)=# vacuum (buffer_usage_limit 16777215) ; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: 2023-03-15 17:10:59.404 JST [471159] LOG: server process (PID 471180) was terminated by signal 11: Segmentation fault Finally, when I used an even lower value, 16777100, I got a memory allocation error: postgres(1:471361)=# vacuum (buffer_usage_limit 16777100) ; 2023-03-15 17:12:17.853 JST [471361] ERROR: invalid memory alloc request size 18446744073709551572 Probably vacuum_buffer_usage_limit also has the same issue. Also, should we support a table option for vacuum_buffer_usage_limit as well? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, 15 Mar 2023 at 02:29, Melanie Plageman <melanieplageman@gmail.com> wrote: > As for routine vacuuming and the other buffer access strategies, I think > there is an argument for configurability based on operator knowledge -- > perhaps your workload will use the data you are COPYing as soon as the > COPY finishes, so you might as well disable a buffer access strategy or > use a larger fraction of shared buffers. Also, the ring sizes were > selected sixteen years ago and average server memory and data set sizes > have changed. To be clear I'm not at all arguing against configurability. I was thinking that dynamic use could make the configuration simpler by self tuning to use no more buffers than is useful. > StrategyRejectBuffer() will allow bulkreads to, as you say, use more > buffers than the original ring size, since it allows them to kick > dirty buffers out of the ring and claim new shared buffers. > > Bulkwrites and vacuums, however, will inevitably dirty buffers and > require flushing the buffer (and thus flushing the associated WAL) when > reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of > the ring, since dirtying buffers is their common case. A dynamic > resizing like the one you suggest would likely devolve to vacuum and > bulkwrite strategies always using the max size. I think it should self stabilize around the point where the WAL is either flushed by other commit activity, WAL writer or WAL buffers filling up. Writing out their own dirtied buffers will still happen, just the associated WAL flushes will be in larger chunks and possibly done by other processes. > As for decreasing the ring size, buffers are only "added" to the ring > lazily and, technically, as it is now, buffers which have been added > added to the ring can always be reclaimed by the clocksweep (as long as > they are not pinned). The buffer access strategy is more of a > self-imposed restriction than it is a reservation. Since the ring is > small and the buffers are being frequently reused, odds are the usage > count will be 1 and we will be the one who set it to 1, but there is no > guarantee. If, when attempting to reuse the buffer, its usage count is > > 1 (or it is pinned), we also will kick it out of the ring and go look > for a replacement buffer. Right, but while the buffer is actively used by the ring it is unlikely that clocksweep will find it at usage 0 as the ring buffer should cycle more often than the clocksweep. Whereas if the ring stops using a buffer, clocksweep will eventually come and reclaim it. And if the ring shrinking decision turns out to be wrong before the clocksweep gets around to reusing it, we can bring the same buffer back into the ring. > I do think that it is a bit unreasonable to expect users to know how > large they would like to make their buffer access strategy ring. What we > want is some way of balancing different kinds of workloads and > maintenance tasks reasonably. If your database has no activity because > it is the middle of the night or it was shutdown because of transaction > id wraparound, there is no reason why vacuum should limit the number of > buffers it uses. I'm sure there are many other such examples. Ideally yes, though I am not hopeful of finding a solution that does this any time soon. 
Just to take your example, if a nightly maintenance job wipes out the shared buffer contents to slightly optimize its non-time-critical work and then causes morning user-visible load to have big latency spikes due to cache misses, that's not a good tradeoff either. -- Ants Aasma Senior Database Engineer www.cybertec-postgresql.com
On Wed, 15 Mar 2023 at 02:57, Melanie Plageman <melanieplageman@gmail.com> wrote: > > > Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc > > > > > #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var)))) > > > +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ) > > > > Macros are normally capitalized > > Yes, there doesn't seem to be a great amount of consistency around > this... See pgstat.c read_chunk_s and bufmgr.c BufHdrGetBlock and > friends. Though there are probably more capitalized than not. Since it > does a bit of math and returns a value, I wanted to convey that it was > more like a function. Also, since the name was long, I thought all-caps > would be hard to read. However, if you or others feel strongly, I am > attached neither to the capitalization nor to the name at all (what do > you think of the name?). A static inline function seems like a less surprising and more type-safe solution for this. -- Ants Aasma Senior Database Engineer www.cybertec-postgresql.com
On Tue, Mar 14, 2023 at 08:56:58PM -0400, Melanie Plageman wrote: > On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype) > > > +BufferAccessStrategy > > > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size) > > > > Maybe it would make sense for GetAccessStrategy() to call > > GetAccessStrategyWithSize(). Or maybe not. > > You mean instead of having anyone call GetAccessStrategyWithSize()? > We would need to change the signature of GetAccessStrategy() then -- and > there are quite a few callers. I mean to avoid code duplication, GetAccessStrategy() could "Select ring size to use" and then call into GetAccessStrategyWithSize(). Maybe it's counter to your intent or otherwise not worth it to save 8 LOC. > > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring. > > > + # -1 to use default, > > > + # 0 to disable vacuum buffer access strategy and use shared buffers > > > + # > 0 to specify size > > > > If I'm not wrong, there's still no documentation about "ring buffers" or > > postgres' "strategy". Which seems important to do for this patch, along > > with other documentation. > > Yes, it is. I have been thinking about where in the docs to add it (the > docs about buffer access strategies). Any ideas? This patch could add something to the vacuum manpage and to the appendix. And maybe references from the shared_buffers and pg_buffercache manpages. > > This patch should add support in vacuumdb.c. > > Oh, I had totally forgotten about vacuumdb. :) > > And maybe a comment about adding support there, since it's annoying > > when the release notes one year say "support VACUUM (FOO)" and then > > one year later say "support vacuumdb --foo". > > I'm not sure I totally follow. Do you mean to add a comment to > ExecVacuum() saying to add support to vacuumdb when adding a new option > to vacuum? Yeah, like: /* Options added here should also be added to vacuumdb.c */ > In the past, did people forget to add support to vacuumdb for new vacuum > options or did they forget to document that they did that or did they > forget to include that they did that in the release notes? The first. Maybe not often, it's not important whether it's in the original patch, but it's odd if the vacuumdb option isn't added until the following release, which then shows up as a separate "feature". -- Justin
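A minimal sketch of the deduplication Justin suggests, with ring sizes matching the long-standing defaults in freelist.c (256 kB for bulk reads and vacuum, 16 MB for bulk writes); treat it as illustrative rather than as the committed code:

BufferAccessStrategy
GetAccessStrategy(BufferAccessStrategyType btype)
{
	int			ring_size_kb;

	/* Select ring size to use, then share one allocation path. */
	switch (btype)
	{
		case BAS_NORMAL:
			return NULL;		/* no ring: use the regular clock sweep */
		case BAS_BULKREAD:
		case BAS_VACUUM:
			ring_size_kb = 256;
			break;
		case BAS_BULKWRITE:
			ring_size_kb = 16 * 1024;
			break;
		default:
			elog(ERROR, "unrecognized buffer access strategy: %d",
				 (int) btype);
			return NULL;		/* keep the compiler quiet */
	}
	return GetAccessStrategyWithSize(btype, ring_size_kb);
}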
Thanks for the reviews and for trying the patch! On Wed, Mar 15, 2023 at 4:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Mar 11, 2023 at 11:55 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > > On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote: > > > > > > Should we just add "ring_buffers" to the existing "shared_buffers" and > > > > > > "temp_buffers" settings? > > > > > > > > > > The different types of ring buffers have different sizes, for good reasons. So > > > > > I don't see that working well. I also think it'd be more often useful to > > > > > control this on a statement basis - if you have a parallel import tool that > > > > > starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of > > > > > course each session can change the ring buffer settings, but still. > > > > > > > > How about having GUCs for each ring buffer (bulk_read_ring_buffers, > > > > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)? > > > > These options can help especially when statement level controls aren't > > > > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If > > > > needed users can also set them at the system level. For instance, one > > > > can set bulk_write_ring_buffers to other than 16MB or -1 to disable > > > > the ring buffer to use shared_buffers and run a bunch of bulk write > > > > queries. > > > > In attached v3, I've changed the name of the guc from buffer_usage_limit > > to vacuum_buffer_usage_limit, since it is only used for vacuum and > > autovacuum. > > > > I haven't added the other suggested strategy gucs, as those could easily > > be done in a future patchset. > > > > I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize() > > -- similar to initArrayResultWithSize(). > > > > And I've added tab completion for BUFFER_USAGE_LIMIT so that it is > > easier to try out my patch. > > > > Most of the TODOs in the code are related to the question of how > > autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates > > the buffer access strategy ring in do_autovacuum() before looping > > through and vacuuming tables. It passes this strategy object on to > > vacuum(). Since we reuse the same strategy object for all tables in a > > given invocation of do_autovacuum(), only failsafe autovacuum would > > change buffer access strategies. This is probably okay, but it does mean > > that the table-level VacuumParams variable, ring_size, means something > > different for autovacuum than vacuum. Autovacuum workers will always > > have set it to -1. We won't ever reach code in vacuum() which relies on > > VacuumParams->ring_size as long as autovacuum workers pass a non-NULL > > BufferAccessStrategy object to vacuum(), though. > > I've not reviewed the patchset in depth yet but I got assertion > failure and SEGV when using the buffer_usage_limit parameter. > > postgres(1:471180)=# vacuum (buffer_usage_limit 10000000000) ; > 2023-03-15 17:10:02.947 JST [471180] ERROR: buffer_usage_limit for a > vacuum must be between -1 and 16777216. 10000000000 is invalid. 
> at character 9 > > The message shows the max value is 16777216, but when I set it, I got > an assertion failure: > > postgres(1:470992)=# vacuum (buffer_usage_limit 16777216) ; > TRAP: failed Assert("ring_size < MAX_BAS_RING_SIZE_KB"), File: > "freelist.c", Line: 606, PID: 470992 > > Then when I used a value 1 lower, 16777215, I got a SEGV: > > postgres(1:471180)=# vacuum (buffer_usage_limit 16777215) ; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: 2023-03-15 > 17:10:59.404 JST [471159] LOG: server process (PID 471180) was > terminated by signal 11: Segmentation fault > > Finally, when I used an even lower value, 16777100, I got a memory > allocation error: > > postgres(1:471361)=# vacuum (buffer_usage_limit 16777100) ; > 2023-03-15 17:12:17.853 JST [471361] ERROR: invalid memory alloc > request size 18446744073709551572 > > Probably vacuum_buffer_usage_limit also has the same issue. Oh dear--it seems I had an integer overflow when calculating the number of buffers using the specified buffer size in the macro: #define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ) In the attached v4, I've changed that to: static inline int bufsize_limit_to_nbuffers(int bufsize_limit_kb) { int blcksz_kb = BLCKSZ / 1024; Assert(blcksz_kb > 0); return bufsize_limit_kb / blcksz_kb; } This should address Justin's suggestions and Ants' concern about the macro as well. Also, I was missing the = in the Assert(ring_size <= MAX_BAS_RING_SIZE_KB). I've fixed that as well, so it should work for you to specify up to 16777216. > Also, should we support a table option for vacuum_buffer_usage_limit as well? Hmm. Since this is meant more for balancing resource usage globally, it doesn't make as much sense as a table option to me. But, I could be convinced. On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote: > > Subject: [PATCH v3 2/3] use shared buffers when failsafe active > > > > + /* > > + * Assume the caller who allocated the memory for the > > + * BufferAccessStrategy object will free it. > > + */ > > + vacrel->bstrategy = NULL; > > This comment could use elaboration: > > ** VACUUM normally restricts itself to a small ring buffer; but in > failsafe mode, in order to process tables as quickly as possible, allow > leaving behind a large number of dirty buffers. I've added a comment in attached v4 which is a bit different than Justin's suggestion but still more verbose than the previous comment. > > Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc > > + {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM, > > + gettext_noop("Sets the buffer pool size for operations employing a buffer access strategy."), > > The description should mention vacuum, if that's the scope of the GUC's > behavior. I've updated this in v4. > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring. > > + # -1 to use default, > > + # 0 to disable vacuum buffer access strategy and use shared buffers > > + # > 0 to specify size > > If I'm not wrong, there's still no documentation about "ring buffers" or > postgres' "strategy". Which seems important to do for this patch, along > with other documentation.
So, on the topic of "other documentation", I have, at least, added docs for the vacuum_buffer_usage_limit guc and the BUFFER_USAGE_LIMIT option to VACUUM and the buffer-usage-limit parameter to vacuumdb. > This patch should add support in vacuumdb.c. And maybe a comment about > adding support there, since it's annoying when the release notes one > year say "support VACUUM (FOO)" and then one year later say "support > vacuumdb --foo". So, v4 adds support for buffer-usage-limit to vacuumdb. There are a few issues. The main one is that no other vacuumdb option takes a size as a parameter. I couldn't actually find any other client with a parameter specified as a size. My VACUUM option code is using the GUC size parsing code from parse_int() -- including the unit flag GUC_UNIT_KB. Now that vacuumdb also needs to parse sizes, I think we'll need to lift the parse_int() code, the unit_conversion struct, and the memory_unit_conversion_table out of guc.c and put them somewhere that they can be accessed for more than guc parsing (e.g. option parsing). For vacuumdb in this version, I just specified buffer-usage-limit is only in kB and thus can only be specified as an int. If we had something like pg_parse_size() in src/common, would this make sense? It would be a little bit of work to figure out what to do about the flags, etc. Another issue is the server-side #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024) I just redefined it in vacuumdb code. I'm not sure what the preferred method for dealing with this is. I know this validation would get done server-side if I just passed the user-specified option through, but all of the other vacuumdb options appear to be doing min/max boundary validation on the client side. On Wed, Mar 15, 2023 at 8:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Tue, Mar 14, 2023 at 08:56:58PM -0400, Melanie Plageman wrote: > > On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype) > > > > +BufferAccessStrategy > > > > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size) > > > > > > Maybe it would make sense for GetAccessStrategy() to call > > > GetAccessStrategyWithSize(). Or maybe not. > > > > You mean instead of having anyone call GetAccessStrategyWithSize()? > > We would need to change the signature of GetAccessStrategy() then -- and > > there are quite a few callers. > > I mean to avoid code duplication, GetAccessStrategy() could "Select ring > size to use" and then call into GetAccessStrategyWithSize(). Maybe it's > counter to your intent or otherwise not worth it to save 8 LOC. Oh, that's a cool idea. I will think on it. > > > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring. > > > > + # -1 to use default, > > > > + # 0 to disable vacuum buffer access strategy and use shared buffers > > > > + # > 0 to specify size > > > > > > If I'm not wrong, there's still no documentation about "ring buffers" or > > > postgres' "strategy". Which seems important to do for this patch, along > > > with other documentation. > > > > Yes, it is. I have been thinking about where in the docs to add it (the > > docs about buffer access strategies). Any ideas? > > This patch could add something to the vacuum manpage and to the appendix. > And maybe references from the shared_buffers and pg_buffercache > manpages. So, I was thinking it would be good to have some documentation in general about Buffer Access Strategies (i.e.
not just for vacuum). It would have been nice to have something to reference from the pg_stat_io docs that describe what buffer access strategies are. > > > And maybe a comment about adding support there, since it's annoying > > > when it the release notes one year say "support VACUUM (FOO)" and then > > > one year later say "support vacuumdb --foo". > > > > I'm not sure I totally follow. Do you mean to add a comment to > > ExecVacuum() saying to add support to vacuumdb when adding a new option > > to vacuum? > > Yeah, like: > /* Options added here should also be added to vacuumdb.c */ I've added a little something to the comment above the VacuumParams struct. > > In the past, did people forget to add support to vacuumdb for new vacuum > > options or did they forget to document that they did that or did they > > forgot to include that they did that in the release notes? > > The first. Maybe not often, it's not important whether it's in the > original patch, but it's odd if the vacuumdb option isn't added until > the following release, which then shows up as a separate "feature". I've squished in the code for adding the parameter to vacuumdb in a single commit with the guc and vacuum option, but I will separate it out after some of the basics get sorted. - Melanie
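Returning to the crashes Sawada-san reported: this standalone program demonstrates the 32-bit wraparound in the old macro, assuming BLCKSZ = 8192 and the usual two's-complement wrapping (signed overflow is formally undefined behavior in C):

#include <stdio.h>

#define BLCKSZ 8192
#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)

int
main(void)
{
	int			bufsize = 16777100;	/* kB, just under the 16 GB maximum */

	/* 16777100 * 1024 = 17179750400 does not fit in a 32-bit int; on
	 * typical platforms it wraps to -118784, and -118784 / 8192 = -14.
	 * A negative buffer count then makes the strategy's palloc request
	 * size negative, which, reinterpreted as an unsigned 64-bit Size,
	 * becomes a huge number like the 18446744073709551572 (2^64 - 44)
	 * in the error message quoted above. */
	printf("nbuffers = %d\n", bufsize_limit_to_nbuffers(bufsize));
	return 0;
}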
On Wed, Mar 15, 2023 at 9:03 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote: > > This patch should add support in vacuumdb.c. And maybe a comment about > > adding support there, since it's annoying when the release notes one > > year say "support VACUUM (FOO)" and then one year later say "support > > vacuumdb --foo". > > So, v4 adds support for buffer-usage-limit to vacuumdb. There are a few > issues. The main one is that no other vacuumdb option takes a size as a > parameter. I couldn't actually find any other client with a parameter > specified as a size. > > My VACUUM option code is using the GUC size parsing code from > parse_int() -- including the unit flag GUC_UNIT_KB. Now that vacuumdb > also needs to parse sizes, I think we'll need to lift the parse_int() > code, the unit_conversion struct, and > the memory_unit_conversion_table out of guc.c and put them > somewhere that they can be accessed for more than guc parsing (e.g. option > parsing). > > For vacuumdb in this version, I just specified buffer-usage-limit is > only in kB and thus can only be specified as an int. > > If we had something like pg_parse_size() in src/common, would this make > sense? It would be a little bit of work to figure out what to do about > the flags, etc. > > Another issue is the server-side > #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024) > I just redefined it in vacuumdb code. I'm not sure what the preferred > method for dealing with this is. > > I know this validation would get done server-side if I just passed the > user-specified option through, but all of the other vacuumdb options > appear to be doing min/max boundary validation on the client side. So, after discussing vacuumdb client-side validation off-list with Jelte, I realized that I was trying to do too much there. Attached v5 passes the contents of vacuumdb's buffer-usage-limit option, unvalidated, into the VACUUM command string that vacuumdb builds. This solves most of the problems. I also improved the error messages coming from VACUUM (buffer_usage_limit) handling. - Melanie
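For reference, "passing it through unvalidated" can look roughly like the sketch below in vacuumdb.c's command assembly, following the sep/comma pattern the existing options use in vacuum_one_database(); the buffer_usage_limit field name is an assumption, not necessarily what the patch uses:

/* Sketch: append the user's string verbatim; the server validates it. */
if (vacopts->buffer_usage_limit)
{
	Assert(serverVersion >= 160000);
	appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
					  vacopts->buffer_usage_limit);
	sep = comma;
}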
> Subject: [PATCH v4 3/3] add vacuum[db] option to specify ring_size and guc > + Specifies the ring buffer size to be used for a given invocation of > + <command>VACUUM</command> or instance of autovacuum. This size is > + converted to a number of shared buffers which will be reused as part of I'd say "specifies the size of shared_buffers to be reused as .." > + a <literal>Buffer Access Strategy</literal>. <literal>0</literal> will > + disable use of a <literal>Buffer Access Strategy</literal>. > + <literal>-1</literal> will set the size to a default of <literal>256 > + kB</literal>. The maximum ring buffer size is <literal>16 GB</literal>. > + Though you may set <varname>vacuum_buffer_usage_limit</varname> below > + <literal>128 kB</literal>, it will be clamped to <literal>128 > + kB</literal> at runtime. The default value is <literal>-1</literal>. > + This parameter can be set at any time. GUC docs usually also say something like "If this value is specified without units, it is taken as .." > + is used to calculate a number of shared buffers which will be reused as *the* number? > + <command>VACUUM</command>. The analyze stage and parallel vacuum workers > + do not use this size. I think what you mean is that vacuum's heap scan stage uses the strategy, but the index scan/cleanup phases don't? > + The size in kB of the ring buffer used for vacuuming. This size is used > + to calculate a number of shared buffers which will be reused as part of *the* number > +++ b/doc/src/sgml/ref/vacuumdb.sgml The docs here duplicate the sql-vacuum docs. It seems better to refer to the vacuum page for details, like --parallel does. Unrelated: it would be nice if the client-side options were documented separately from the server-side options. Especially due to --jobs and --parallel. > + if (!parse_int(vac_buffer_size, &result, GUC_UNIT_KB, NULL)) > + { > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("buffer_usage_limit for a vacuum must be between -1 and %d. %s is invalid.", > + MAX_BAS_RING_SIZE_KB, vac_buffer_size), > + parser_errposition(pstate, opt->location))); > + } > + > + /* check for out-of-bounds */ > + if (result < -1 || result > MAX_BAS_RING_SIZE_KB) > + { > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("buffer_usage_limit for a vacuum must be between -1 and %d", > + MAX_BAS_RING_SIZE_KB), > + parser_errposition(pstate, opt->location))); > + } I think these checks could be collapsed into a single ereport(). if !parse_int() || (result < -1 || result > MAX_BAS_RING_SIZE_KB): ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("buffer_usage_limit for a vacuum must be an integer between -1 and %d", MAX_BAS_RING_SIZE_KB), There was a recent, similar, and unrelated suggestion here: https://www.postgresql.org/message-id/20230314.135859.260879647537075548.horikyota.ntt%40gmail.com > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring. > + # -1 to use default, > + # 0 to disable vacuum buffer access strategy and use shared buffers I think it's confusing to say "and use shared buffers", since "strategies" also use shared_buffers. It seems better to remove those 4 words. > @@ -550,6 +563,13 @@ vacuum_one_database(ConnParams *cparams, > pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s", > "--parallel", "13"); > > + // TODO: this is a problem: if the user specifies this option with -1 in a > + // version before 16, it will not produce an error message.
it also won't > + // do anything, but that still doesn't seem right. Actually, that seems fine to me. If someone installs v16 vacuumdb, they can run it against old servers and specify the option as -1 without it failing with an error. I don't know if anyone will find that useful, but it doesn't seem unreasonable. I still think adding something to the glossary would be good. Buffer Access Strategy: A circular/ring buffer used for reading or writing data pages from/to the operating system. Ring buffers are used for sequential scans of large tables, VACUUM, COPY FROM, CREATE TABLE AS SELECT, ALTER TABLE, and CLUSTER. By using only a limited portion of >shared_buffers<, the ring buffer avoids evicting large amounts of data whenever a backend performs bulk I/O operations. Use of a ring buffer also forces the backend to write out its own dirty pages, rather than leaving them behind to be cleaned up by other backends. If there's a larger section added than a glossary entry, the text could be promoted from src/backend/storage/buffer/README to doc/. -- Justin
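Rendered as compilable C, the collapsed check Justin sketches would look something like this (message wording illustrative):

int			result;

if (!parse_int(vac_buffer_size, &result, GUC_UNIT_KB, NULL) ||
	result < -1 || result > MAX_BAS_RING_SIZE_KB)
	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("buffer_usage_limit for a vacuum must be an integer between -1 and %d",
					MAX_BAS_RING_SIZE_KB),
			 parser_errposition(pstate, opt->location)));

As Melanie notes in her reply below, folding the two checks together does give up the more specific hint message that parse_int() can return when parsing, rather than range checking, fails.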
Thanks for the review! Attached is v6, which has some updates and corrections. It has two remaining TODOs in the code: one is around what value to initialize the ring_size to in VacuumParams, the other is around whether or not parallel vacuum index workers should in fact stick with the default buffer access strategy sizes. I've also separated the vacuumdb support into its own commit and addressed Justin's review feedback. On Sat, Mar 18, 2023 at 2:30 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > Subject: [PATCH v4 3/3] add vacuum[db] option to specify ring_size and guc > > > + Specifies the ring buffer size to be used for a given invocation of > > + <command>VACUUM</command> or instance of autovacuum. This size is > > + converted to a number of shared buffers which will be reused as part of > > I'd say "specifies the size of shared_buffers to be reused as .." I've included "shared_buffers" in the description. > > + a <literal>Buffer Access Strategy</literal>. <literal>0</literal> will > > + disable use of a <literal>Buffer Access Strategy</literal>. > > + <literal>-1</literal> will set the size to a default of <literal>256 > > + kB</literal>. The maximum ring buffer size is <literal>16 GB</literal>. > > + Though you may set <varname>vacuum_buffer_usage_limit</varname> below > > + <literal>128 kB</literal>, it will be clamped to <literal>128 > > + kB</literal> at runtime. The default value is <literal>-1</literal>. > > + This parameter can be set at any time. > > GUC docs usually also say something like > "If this value is specified without units, it is taken as .." I had updated this in v5 with slightly different wording, but now I am using the wording you suggested (which does appear standard in the rest of the docs). > > > + is used to calculate a number of shared buffers which will be reused as > > *the* number? updated. > > > + <command>VACUUM</command>. The analyze stage and parallel vacuum workers > > + do not use this size. > > I think what you mean is that vacuum's heap scan stage uses the > strategy, but the index scan/cleanup phases don't? Yes, non-parallel index vacuum and cleanup will use whatever value you specify but parallel workers make their own buffer access strategy object. I've updated the docs to indicate that they will use the default size for this. > > > + The size in kB of the ring buffer used for vacuuming. This size is used > > + to calculate a number of shared buffers which will be reused as part of > > *the* number fixed. > > +++ b/doc/src/sgml/ref/vacuumdb.sgml > > The docs here duplicate the sql-vacuum docs. It seems better to refer > to the vacuum page for details, like --parallel does. Good idea. > > Unrelated: it would be nice if the client-side options were documented > separately from the server-side options. Especially due to --jobs and > --parallel. Yes, that would be helpful. > > + if (!parse_int(vac_buffer_size, &result, GUC_UNIT_KB, NULL)) > > + { > > + ereport(ERROR, > > + (errcode(ERRCODE_SYNTAX_ERROR), > > + errmsg("buffer_usage_limit for a vacuum must be between -1 and %d. 
%s is invalid.", > > + MAX_BAS_RING_SIZE_KB, vac_buffer_size), > > + parser_errposition(pstate, opt->location))); > > + } > > + > > + /* check for out-of-bounds */ > > + if (result < -1 || result > MAX_BAS_RING_SIZE_KB) > > + { > > + ereport(ERROR, > > + (errcode(ERRCODE_SYNTAX_ERROR), > > + errmsg("buffer_usage_limit for a vacuum must be between -1 and %d", > > + MAX_BAS_RING_SIZE_KB), > > + parser_errposition(pstate, opt->location))); > > + } > > I think these checks could be collapsed into a single ereport(). > > if !parse_int() || (result < -1 || result > MAX_BAS_RING_SIZE_KB): > ereport(ERROR, > errcode(ERRCODE_SYNTAX_ERROR), > errmsg("buffer_usage_limit for a vacuum must be an integer between -1 and %d", > MAX_BAS_RING_SIZE_KB), > > There was a recent, similar, and unrelated suggestion here: > https://www.postgresql.org/message-id/20230314.135859.260879647537075548.horikyota.ntt%40gmail.com So, these have been updated/improved in v5. I still didn't combine them. I see what you are saying about combining them (and I checked the link you shared), but in this case, having them separate allows me to report, via the hintmsg returned by parse_int(), why parsing failed -- which could be something unrelated to the range. So, I think it makes sense to keep them separate. > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring. > > + # -1 to use default, > > + # 0 to disable vacuum buffer access strategy and use shared buffers > > I think it's confusing to say "and use shared buffers", since > "strategies" also use shared_buffers. It seems better to remove those 4 > words. Got it. I've gone ahead and done that. > > @@ -550,6 +563,13 @@ vacuum_one_database(ConnParams *cparams, > > pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s", > > "--parallel", "13"); > > > > + // TODO: this is a problem: if the user specifies this option with -1 in a > > + // version before 16, it will not produce an error message. it also won't > > + // do anything, but that still doesn't seem right. > > Actually, that seems fine to me. If someone installs v16 vacuumdb, they > can run it against old servers and specify the option as -1 without it > failing with an error. I don't know if anyone will find that useful, > but it doesn't seem unreasonable. I sort of skirted around this by removing any validation from vacuumdb (present in v5 and still the case in v6). Now, the parameter is a string and I check if it is non-NULL when the version is < 16. However, this will no longer have the property that someone can use v16 vacuumdb and pass buffer-usage-limit and have it not fail. I think that is okay, though, since they might be confused thinking it was doing something. > I still think adding something to the glossary would be good. > > Buffer Access Strategy: > A circular/ring buffer used for reading or writing data pages from/to > the operating system. Ring buffers are used for sequential scans of > large tables, VACUUM, COPY FROM, CREATE TABLE AS SELECT, ALTER TABLE, > and CLUSTER. By using only a limited portion of >shared_buffers<, the > ring buffer avoids evicting large amounts of data whenever a > backend performs bulk I/O operations. Use of a ring buffer also forces > the backend to write out its own dirty pages, rather than leaving them > behind to be cleaned up by other backends. Yes, I have taken some ideas from here and added a separate commit before all the others adding Buffer Access Strategy to the documentation.
> If there's a larger section added than a glossary entry, the text could > be promoted from src/backend/storage/buffer/README to doc/. This is a good idea. I think we provided enough information in the glossary (as far as users would care) if it weren't for the new buffer_usage_limit guc, which probably merits more explanation about how it interacts with buffer access strategies. Since it is only used for vacuum now, do you think such a thing would belong in VACUUM-related documentation? Like somewhere in [1]? On Wed, Mar 15, 2023 at 9:03 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > On Wed, Mar 15, 2023 at 8:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Tue, Mar 14, 2023 at 08:56:58PM -0400, Melanie Plageman wrote: > > > On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype) > > > > > +BufferAccessStrategy > > > > > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size) > > > > > > > > Maybe it would make sense for GetAccessStrategy() to call > > > > GetAccessStrategyWithSize(). Or maybe not. > > > > > > You mean instead of having anyone call GetAccessStrategyWithSize()? > > > We would need to change the signature of GetAccessStrategy() then -- and > > > there are quite a few callers. > > > > I mean to avoid code duplication, GetAccessStrategy() could "Select ring > > size to use" and then call into GetAccessStrategyWithSize(). Maybe it's > > counter to your intent or otherwise not worth it to save 8 LOC. > > Oh, that's a cool idea. I will think on it. So, I thought about doing a version of this by adding a helper which did the allocation of the BufferAccessStrategy object given a number of buffers that could be called by both GetAccessStrategy() and GetAccessStrategyWithSize(). I decided not to because I wanted to emit a debug message if the size of the ring was clamped lower or higher than the user would expect -- but only do this in GetAccessStrategyWithSize() since there is no user expectation in the use of GetAccessStrategy(). - Melanie [1] https://www.postgresql.org/docs/devel/routine-vacuuming.html
Attachment
- v6-0001-Add-Buffer-Access-Strategy-to-glossary.patch
- v6-0002-remove-global-variable-vac_strategy.patch
- v6-0003-use-shared-buffers-when-failsafe-active.patch
- v6-0004-Rename-Buffer-Access-Strategy-ring_size-nbuffers.patch
- v6-0005-Add-vacuum-db-buffer-usage-limit-option-and-guc.patch
- v6-0006-Add-buffer-usage-limit-option-to-vacuumdb.patch
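A minimal sketch of the clamp-and-log behavior Melanie describes, building on the inline helper introduced in v4; MIN_RING_NBUFFERS, the nbuffers field name (from the v6 rename patch), and the message wording are assumptions:

BufferAccessStrategy
GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size_kb)
{
	BufferAccessStrategy strategy;
	int			nbuffers = bufsize_limit_to_nbuffers(ring_size_kb);

	/* Clamp to sane bounds, telling the user when we adjust the request;
	 * GetAccessStrategy() stays silent because its callers asked for a
	 * built-in default rather than a specific size. */
	if (nbuffers < MIN_RING_NBUFFERS)
	{
		elog(DEBUG1, "clamping ring size up from %d to %d buffers",
			 nbuffers, MIN_RING_NBUFFERS);
		nbuffers = MIN_RING_NBUFFERS;
	}
	else if (nbuffers > NBuffers)
	{
		elog(DEBUG1, "clamping ring size down to shared_buffers (%d buffers)",
			 NBuffers);
		nbuffers = NBuffers;
	}

	strategy = (BufferAccessStrategy)
		palloc0(offsetof(BufferAccessStrategyData, buffers) +
				nbuffers * sizeof(Buffer));
	strategy->btype = btype;
	strategy->nbuffers = nbuffers;
	return strategy;
}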
On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma <ants@cybertec.at> wrote: > > On Wed, 15 Mar 2023 at 02:29, Melanie Plageman > <melanieplageman@gmail.com> wrote: > > As for routine vacuuming and the other buffer access strategies, I think > > there is an argument for configurability based on operator knowledge -- > > perhaps your workload will use the data you are COPYing as soon as the > > COPY finishes, so you might as well disable a buffer access strategy or > > use a larger fraction of shared buffers. Also, the ring sizes were > > selected sixteen years ago and average server memory and data set sizes > > have changed. > > To be clear I'm not at all arguing against configurability. I was > thinking that dynamic use could make the configuration simpler by self > tuning to use no more buffers than is useful. Yes, but I am struggling with how we would define "useful". > > StrategyRejectBuffer() will allow bulkreads to, as you say, use more > > buffers than the original ring size, since it allows them to kick > > dirty buffers out of the ring and claim new shared buffers. > > > > Bulkwrites and vacuums, however, will inevitably dirty buffers and > > require flushing the buffer (and thus flushing the associated WAL) when > > reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of > > the ring, since dirtying buffers is their common case. A dynamic > > resizing like the one you suggest would likely devolve to vacuum and > > bulkwrite strategies always using the max size. > > I think it should self stabilize around the point where the WAL is > either flushed by other commit activity, WAL writer or WAL buffers > filling up. Writing out their own dirtied buffers will still happen, > just the associated WAL flushes will be in larger chunks and possibly > done by other processes. They will have to write out any WAL associated with modifications to the dirty buffer before flushing it, so I'm not sure I understand how this would work. > > As for decreasing the ring size, buffers are only "added" to the ring > > lazily and, technically, as it is now, buffers which have been added > > added to the ring can always be reclaimed by the clocksweep (as long as > > they are not pinned). The buffer access strategy is more of a > > self-imposed restriction than it is a reservation. Since the ring is > > small and the buffers are being frequently reused, odds are the usage > > count will be 1 and we will be the one who set it to 1, but there is no > > guarantee. If, when attempting to reuse the buffer, its usage count is > > > 1 (or it is pinned), we also will kick it out of the ring and go look > > for a replacement buffer. > > Right, but while the buffer is actively used by the ring it is > unlikely that clocksweep will find it at usage 0 as the ring buffer > should cycle more often than the clocksweep. Whereas if the ring stops > using a buffer, clocksweep will eventually come and reclaim it. And if > the ring shrinking decision turns out to be wrong before the > clocksweep gets around to reusing it, we can bring the same buffer > back into the ring. I can see what you mean about excluding a buffer from the ring being a more effective way of allowing it to be reclaimed. However, I'm not sure I understand the use case. If the operation, say vacuum, is actively using the buffer and keeping its usage count at one, then what would be the criteria for it to decide to stop using it? 
Also, if vacuum used the buffer once and then didn't reuse it but, for some reason, the vacuum isn't over, that buffer isn't any different at that point from some other buffer with a usage count of one. It isn't any harder for it to be reclaimed by the clocksweep. The argument I could see for decreasing the size even when the buffers are being used by the operation employing the strategy is if there is pressure from other workloads to use those buffers. But, designing a system that would reclaim buffers when needed by other workloads is more complicated than what is being proposed here. > > I do think that it is a bit unreasonable to expect users to know how > > large they would like to make their buffer access strategy ring. What we > > want is some way of balancing different kinds of workloads and > > maintenance tasks reasonably. If your database has no activity because > > it is the middle of the night or it was shut down because of transaction > > id wraparound, there is no reason why vacuum should limit the number of > > buffers it uses. I'm sure there are many other such examples. > > Ideally yes, though I am not hopeful of finding a solution that does > this any time soon. Just to take your example, if a nightly > maintenance job wipes out the shared buffer contents to slightly > optimize its non-time-critical work and then causes morning user-visible > load to have big latency spikes due to cache misses, that's > not a good tradeoff either. Yes, that is a valid concern. - Melanie
On Mon, 20 Mar 2023 at 00:59, Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma <ants@cybertec.at> wrote: > > > > On Wed, 15 Mar 2023 at 02:29, Melanie Plageman > > <melanieplageman@gmail.com> wrote: > > > As for routine vacuuming and the other buffer access strategies, I think > > > there is an argument for configurability based on operator knowledge -- > > > perhaps your workload will use the data you are COPYing as soon as the > > > COPY finishes, so you might as well disable a buffer access strategy or > > > use a larger fraction of shared buffers. Also, the ring sizes were > > > selected sixteen years ago and average server memory and data set sizes > > > have changed. > > > > To be clear I'm not at all arguing against configurability. I was > > thinking that dynamic use could make the configuration simpler by self > > tuning to use no more buffers than is useful. > > Yes, but I am struggling with how we would define "useful". For copy and vacuum, the only reason I can see for keeping visited buffers around is to avoid flushing WAL or at least doing it in larger batches. Once the ring is big enough that WAL doesn't need to be flushed on eviction, making it bigger only wastes space that could be used by something that is not going to be evicted soon. > > > StrategyRejectBuffer() will allow bulkreads to, as you say, use more > > > buffers than the original ring size, since it allows them to kick > > > dirty buffers out of the ring and claim new shared buffers. > > > > > > Bulkwrites and vacuums, however, will inevitably dirty buffers and > > > require flushing the buffer (and thus flushing the associated WAL) when > > > reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of > > > the ring, since dirtying buffers is their common case. A dynamic > > > resizing like the one you suggest would likely devolve to vacuum and > > > bulkwrite strategies always using the max size. > > > > I think it should self stabilize around the point where the WAL is > > either flushed by other commit activity, WAL writer or WAL buffers > > filling up. Writing out their own dirtied buffers will still happen, > > just the associated WAL flushes will be in larger chunks and possibly > > done by other processes. > > They will have to write out any WAL associated with modifications to the > dirty buffer before flushing it, so I'm not sure I understand how this > would work. By the time the dirty buffer needs eviction the WAL associated with it can already be written out by concurrent commits, WAL writer or by WAL buffers filling up. The bigger the ring is, the higher the chance that one of these will happen before we loop around. > > > As for decreasing the ring size, buffers are only "added" to the ring > > > lazily and, technically, as it is now, buffers which have been added > > > added to the ring can always be reclaimed by the clocksweep (as long as > > > they are not pinned). The buffer access strategy is more of a > > > self-imposed restriction than it is a reservation. Since the ring is > > > small and the buffers are being frequently reused, odds are the usage > > > count will be 1 and we will be the one who set it to 1, but there is no > > > guarantee. If, when attempting to reuse the buffer, its usage count is > > > > 1 (or it is pinned), we also will kick it out of the ring and go look > > > for a replacement buffer. 
> > Right, but while the buffer is actively used by the ring it is > > unlikely that clocksweep will find it at usage 0 as the ring buffer > > should cycle more often than the clocksweep. Whereas if the ring stops > > using a buffer, clocksweep will eventually come and reclaim it. And if > > the ring shrinking decision turns out to be wrong before the > > clocksweep gets around to reusing it, we can bring the same buffer > > back into the ring. > > I can see what you mean about excluding a buffer from the ring being a > more effective way of allowing it to be reclaimed. However, I'm not sure > I understand the use case. If the operation, say vacuum, is actively > using the buffer and keeping its usage count at one, then what would be > the criteria for it to decide to stop using it? The criterion for reducing the ring size could be that we have cycled the ring buffer n times without having to do any WAL flushes. > Also, if vacuum used the buffer once and then didn't reuse it but, for > some reason, the vacuum isn't over, that buffer isn't any different at > that point from some other buffer with a usage count of one. It isn't > any harder for it to be reclaimed by the clocksweep. > > The argument I could see for decreasing the size even when the buffers > are being used by the operation employing the strategy is if there is > pressure from other workloads to use those buffers. But, designing a > system that would reclaim buffers when needed by other workloads is more > complicated than what is being proposed here. I don't think any specific reclaim is needed: if the ring stops using a buffer *and* there is pressure from other workloads, the buffer will get used for other stuff by the normal clocksweep. If the ring keeps using it then the normal clocksweep is highly unlikely to find it with usage count 0. If there is no concurrent allocation pressure, the ring can start using it again if that turns out to be necessary (probably should still check that it hasn't been reused by someone else). -- Ants Aasma Senior Database Engineer www.cybertec-postgresql.com
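Sketched in code, the shrink criterion Ants describes could look like this, continuing the hypothetical AdaptiveRing from the growth sketch earlier in the thread (now extended with two counters):

/* Call each time the ring wraps around to its first slot. */
static void
ring_maybe_shrink(AdaptiveRing *ring)
{
	/* Count consecutive clean laps, i.e. laps with no forced WAL flush. */
	if (ring->wal_flushes_this_lap == 0)
		ring->clean_laps++;
	else
		ring->clean_laps = 0;
	ring->wal_flushes_this_lap = 0;

	/* n clean laps in a row: hand one buffer back to the clock sweep.
	 * The thresholds are placeholders to be tuned. */
	if (ring->clean_laps >= RING_SHRINK_AFTER_CLEAN_LAPS &&
		ring->nbuffers > MIN_RING_NBUFFERS)
	{
		ring->nbuffers--;
		ring->clean_laps = 0;
	}
}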
On Tue, Mar 21, 2023 at 6:03 AM Ants Aasma <ants@cybertec.at> wrote: > > On Mon, 20 Mar 2023 at 00:59, Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma <ants@cybertec.at> wrote: > > > > > > On Wed, 15 Mar 2023 at 02:29, Melanie Plageman > > > <melanieplageman@gmail.com> wrote: > > > > As for routine vacuuming and the other buffer access strategies, I think > > > > there is an argument for configurability based on operator knowledge -- > > > > perhaps your workload will use the data you are COPYing as soon as the > > > > COPY finishes, so you might as well disable a buffer access strategy or > > > > use a larger fraction of shared buffers. Also, the ring sizes were > > > > selected sixteen years ago and average server memory and data set sizes > > > > have changed. > > > > > > To be clear I'm not at all arguing against configurability. I was > > > thinking that dynamic use could make the configuration simpler by self > > > tuning to use no more buffers than is useful. > > > > Yes, but I am struggling with how we would define "useful". > > For copy and vacuum, the only reason I can see for keeping visited > buffers around is to avoid flushing WAL or at least doing it in larger > batches. Once the ring is big enough that WAL doesn't need to be > flushed on eviction, making it bigger only wastes space that could be > used by something that is not going to be evicted soon. Well, I think if you know you will use the data you are COPYing right away in your normal workload, it could be useful to have the ring be large or to disable use of the ring. And, for vacuum, if you need to get it done as quickly as possible, again, it could be useful to have the ring be large or disable use of the ring. > > > > StrategyRejectBuffer() will allow bulkreads to, as you say, use more > > > > buffers than the original ring size, since it allows them to kick > > > > dirty buffers out of the ring and claim new shared buffers. > > > > > > > > Bulkwrites and vacuums, however, will inevitably dirty buffers and > > > > require flushing the buffer (and thus flushing the associated WAL) when > > > > reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of > > > > the ring, since dirtying buffers is their common case. A dynamic > > > > resizing like the one you suggest would likely devolve to vacuum and > > > > bulkwrite strategies always using the max size. > > > > > > I think it should self stabilize around the point where the WAL is > > > either flushed by other commit activity, WAL writer or WAL buffers > > > filling up. Writing out their own dirtied buffers will still happen, > > > just the associated WAL flushes will be in larger chunks and possibly > > > done by other processes. > > > > They will have to write out any WAL associated with modifications to the > > dirty buffer before flushing it, so I'm not sure I understand how this > > would work. > > By the time the dirty buffer needs eviction the WAL associated with it > can already be written out by concurrent commits, WAL writer or by WAL > buffers filling up. The bigger the ring is, the higher the chance that > one of these will happen before we loop around. Ah, I think I understand the idea now. So, I think it is an interesting idea to try and find the goldilocks size for the ring buffer. It is especially interesting to me in the case in which we are enlarging the ring. 
However, given concurrent workload variability, machine I/O latency fluctuations, etc., we will definitely have to set a max value of some kind anyway for the ring size. So, this seems more like a complementary feature to vacuum_buffer_usage_limit. If we added some kind of adaptive sizing in a later version, we could emphasize in the guidance for setting vacuum_buffer_usage_limit that it is the *maximum* size you would like to allow vacuum to use. And, of course, there are the other operations which use buffer access strategies. > > > > As for decreasing the ring size, buffers are only "added" to the ring > > > > lazily and, technically, as it is now, buffers which have been added added to the ring can always be reclaimed by the clocksweep (as long as > > > > they are not pinned). The buffer access strategy is more of a > > > > self-imposed restriction than it is a reservation. Since the ring is > > > > small and the buffers are being frequently reused, odds are the usage > > > > count will be 1 and we will be the one who set it to 1, but there is no > > > > guarantee. If, when attempting to reuse the buffer, its usage count is > > > > > 1 (or it is pinned), we also will kick it out of the ring and go look > > > > for a replacement buffer. > > > > > > Right, but while the buffer is actively used by the ring it is > > > unlikely that clocksweep will find it at usage 0 as the ring buffer > > > should cycle more often than the clocksweep. Whereas if the ring stops > > > using a buffer, clocksweep will eventually come and reclaim it. And if > > > the ring shrinking decision turns out to be wrong before the > > > clocksweep gets around to reusing it, we can bring the same buffer > > > back into the ring. > > > > I can see what you mean about excluding a buffer from the ring being a > > more effective way of allowing it to be reclaimed. However, I'm not sure > > I understand the use case. If the operation, say vacuum, is actively > > using the buffer and keeping its usage count at one, then what would be > > the criteria for it to decide to stop using it? > > The criteria for reducing ring size could be that we have cycled the > ring buffer n times without having to do any WAL flushes. > > > Also, if vacuum used the buffer once and then didn't reuse it but, for > > some reason, the vacuum isn't over, it isn't any different at that point > > than some other buffer with a usage count of one. It isn't any harder > > for it to be reclaimed by the clocksweep. > > > > The argument I could see for decreasing the size even when the buffers > > are being used by the operation employing the strategy is if there is > > pressure from other workloads to use those buffers. But, designing a > > system that would reclaim buffers when needed by other workloads is more > > complicated than what is being proposed here. > > I don't think any specific reclaim is needed, if the ring stops using > a buffer *and* there is pressure from other workloads the buffer will > get used for other stuff by the normal clocksweep. If the ring keeps > using it then the normal clocksweep is highly unlikely to find it with > usage count 0. If there is no concurrent allocation pressure, the ring > can start using it again if that turns out to be necessary (probably > should still check that it hasn't been reused by someone else). Yes, you don't need a specific reclaim mechanism.
But you would want to be quite conservative about decreasing the ring size (given workload variation and machine variations such as bursting in the cloud), and you probably wouldn't do so simply because the operation using the strategy doesn't absolutely need the buffer, but only when other concurrent workloads really need it. And it seems complicated to determine whether other workloads do need the buffer. - Melanie
On Mon, 20 Mar 2023 at 11:50, Melanie Plageman <melanieplageman@gmail.com> wrote: > Attached is an updated v6. I had a look over the v6-0001 patch. There are a few things I think could be done better: "Some operations will access a large number of pages at a time", does this really need "at a time"? I think it's more relevant that the operation uses a large number of pages. Missing <firstterm> around Buffer Access Strategy. Various things could be linked to other sections of the glossary, e.g. pages could link to glossary-data-page, shared buffers could link to glossary-shared-memory and WAL could link to glossary-wal. The final paragraph should have <command> tags around the various commands that you list. I have adjusted those and slightly reworded a few other things. See the attached .diff which can be applied atop v6-0001. David
Attachment
On Thu, Mar 30, 2023 at 11:54 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Mon, 20 Mar 2023 at 11:50, Melanie Plageman > <melanieplageman@gmail.com> wrote: > > Attached is an updated v6. > > I had a look over the v6-0001 patch. There are a few things I think > could be done better: > > "Some operations will access a large number of pages at a time", does > this really need "at a time"? I think it's more relevant that the > operation uses a large number of pages. > > Missing <firstterm> around Buffer Access Strategy. > > Various things could be linked to other sections of the glossary, e.g. > pages could link to glossary-data-page, shared buffers could link to > glossary-shared-memory and WAL could link to glossary-wal. > > The final paragraph should have <command> tags around the various > commands that you list. > > I have adjusted those and slightly reworded a few other things. See > the attached .diff which can be applied atop of v6-0001. There was one small typo keeping this from compiling. Also a repeated word. I've fixed these. I also edited a bit of indentation and tweaked some wording. Diff attached (to be applied on top of your diff). - Melanie
Attachment
On Sat, 1 Apr 2023 at 02:52, Melanie Plageman <melanieplageman@gmail.com> wrote: > There was one small typo keeping this from compiling. Also a repeated > word. I've fixed these. I also edited a bit of indentation and tweaked > some wording. Diff attached (to be applied on top of your diff). Thanks for fixing that mistake. For reference, I had changed things to end lines early so that the glossterm tags could be on a line of their own without breaking to a new line. The rest of the file seems to be done that way, so I thought we'd better stick to it. I swapped out "associated WAL" for "unflushed WAL". I didn't agree that the WAL that would be flushed would have any particular association with the to-be-written page. I dropped CTAS since I didn't see any other mention in the docs about that. I could maybe see the sense in making reference to the abbreviated form if we were going to mention it again and didn't want to spell the whole thing out each time, but that's not the case here. I pushed the result. David
On Fri, Mar 31, 2023 at 5:47 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 1 Apr 2023 at 02:52, Melanie Plageman <melanieplageman@gmail.com> wrote: > > There was one small typo keeping this from compiling. Also a repeated > > word. I've fixed these. I also edited a bit of indentation and tweaked > > some wording. Diff attached (to be applied on top of your diff). > > Thanks for fixing that mistake. > > For reference, I had changed things to end lines early so that the > glossterm tags could be on a line of their own without breaking to a > new line. The rest of the file seems to be done that way, so I thought > we'd better stick to it. > > I swapped out "associated WAL" for "unflushed WAL". I didn't agree > that the WAL that would be flushed would have any particular > association with the to-be-written page. > > I dropped CTAS since I didn't see any other mention in the docs about > that. I could maybe see the sense in making reference to the > abbreviated form if we were going to mention it again and didn't want > to spell the whole thing out each time, but that's not the case here. > > I pushed the result. Cool! I've attached v7 with that commit dropped and with support for parallel vacuum workers to use the same number of buffers in their own Buffer Access Strategy ring as the main vacuum phase did. I also updated the docs to indicate that vacuum_buffer_usage_limit is per backend (not per instance of VACUUM). - Melanie
Attachment
On Sat, 1 Apr 2023 at 12:57, Melanie Plageman <melanieplageman@gmail.com> wrote: > I've attached v7 with that commit dropped and with support for parallel > vacuum workers to use the same number of buffers in their own Buffer > Access Strategy ring as the main vacuum phase did. I also updated the > docs to indicate that vacuum_buffer_usage_limit is per backend (not per > instance of VACUUM). (was just replying about v6-0002 when this came in. Replying here instead) For v7-0001, can we just get rid of both of those static globals? I'm gobsmacked by the existing "A few variables that don't seem worth passing around as parameters" comment. Not wanting to pass parameters around is a horrible excuse for adding global variables, even static ones. Attached is what I propose in .diff form so that the CFbot can run on your v7 patches without picking this up. I considered if we could switch memory contexts before calling expand_vacuum_rel() and get_all_vacuum_rels(), but I see, at least in the case of expand_vacuum_rel() that we'd probably want to list_free() the output of find_all_inheritors() to save that from leaking into the vac_context. It seems safe just to switch into the vac_context only when we really want to keep that memory around. (I do think switching in each iteration of the foreach(part_lc, part_oids) loop is excessive, however. Just not enough for me to want to change it) David
Attachment
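To illustrate the shape of that change, here is a schematic before/after sketch (parameter lists abbreviated; this is not the actual diff):

/* Before: file-level state that callees reach for implicitly */
static BufferAccessStrategy vac_strategy;

static void vacuum_rel(Oid relid, VacuumParams *params);
							/* ...reads vac_strategy internally... */

/* After: the caller threads the strategy through explicitly */
static void vacuum_rel(Oid relid, VacuumParams *params,
					   BufferAccessStrategy bstrategy);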
On Sat, Apr 01, 2023 at 01:05:19PM +1300, David Rowley wrote: > Attached is what I propose in .diff form so that the CFbot can run on > your v7 patches without picking this up. But it processes .diff, too https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F
On Fri, Mar 31, 2023 at 8:05 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 1 Apr 2023 at 12:57, Melanie Plageman <melanieplageman@gmail.com> wrote: > > I've attached v7 with that commit dropped and with support for parallel > > vacuum workers to use the same number of buffers in their own Buffer > > Access Strategy ring as the main vacuum phase did. I also updated the > > docs to indicate that vacuum_buffer_usage_limit is per backend (not per > > instance of VACUUM). > > (was just replying about v6-0002 when this came in. Replying here instead) > > For v7-0001, can we just get rid of both of those static globals? I'm > gobsmacked by the existing "A few variables that don't seem worth > passing around as parameters" comment. Not wanting to pass parameters > around is a horrible excuse for adding global variables, even static > ones. Makes sense to me. > Attached is what I propose in .diff form so that the CFbot can run on > your v7 patches without picking this up. Your diff LGTM. Earlier upthread in [1], Bharath had mentioned in a review comment about removing the global variables that he would have expected the analogous global in analyze.c to also be removed (vac_strategy [and analyze.c also has anl_context]). I looked into doing this, and this is what I found out (see full rationale in [2]): > it is a bit harder to remove it from analyze because acquire_func > doesn't take the buffer access strategy as a parameter and > acquire_sample_rows uses the vac_context global variable to pass to > table_scan_analyze_next_block(). I don't know if this is worth mentioning in the commit removing the other globals? Maybe it will just make it more confusing... > I considered if we could switch memory contexts before calling > expand_vacuum_rel() and get_all_vacuum_rels(), but I see, at least in > the case of expand_vacuum_rel() that we'd probably want to list_free() > the output of find_all_inheritors() to save that from leaking into the > vac_context. It seems safe just to switch into the vac_context only > when we really want to keep that memory around. (I do think switching > in each iteration of the foreach(part_lc, part_oids) loop is > excessive, however. Just not enough for me to want to change it) Yes, I see what you mean. Your decision makes sense to me. - Melanie [1] https://www.postgresql.org/message-id/CALj2ACXKgAQpKsCPi6ox%2BK5JLDB9TAxeObyVOfrmgTjqmc0aAA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAAKRu_brtqmd4e7kwEeKjySP22y4ywF32M7pvpi%2Bx5txgF0%2Big%40mail.gmail.com
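For anyone following along, the callback type in question has roughly this shape, and the point is that there is no slot in it through which a BufferAccessStrategy could be passed:

typedef int (*AcquireSampleRowsFunc) (Relation onerel, int elevel,
									  HeapTuple *rows, int targrows,
									  double *totalrows,
									  double *totaldeadrows);

Since foreign data wrappers implement this callback too, widening it to carry a strategy would ripple into the FDW API, which is why keeping the analyze.c global looks like the pragmatic trade-off here.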
Hi, I was just doing some cleanup on the main patch in this set and realized that it was missing a few things. One of which is forbidding the BUFFER_USAGE_LIMIT with VACUUM FULL since VACUUM FULL does not use a BAS_VACUUM strategy. VACUUM FULL technically uses a bulkread buffer access strategy for reading the original relation if its number of blocks is > number of shared buffers / 4 (see initscan()). The new rel writing is done using smgrextend/write directly and doesn't go through shared buffers. I think it is a stretch to try and use the size passed in to VACUUM by BUFFER_USAGE_LIMIT for the bulkread strategy ring. As for forbidding the combination, I noticed that when VACUUM FULL is specified with INDEX_CLEANUP OFF, there is no syntax error but the INDEX_CLEANUP option is simply ignored. This is documented behavior. I somehow feel like VACUUM (FULL, BUFFER_USAGE_LIMIT 'x') should error out instead of silently not using the buffer usage limit, though. I am looking for others' opinions. - Melanie
On Sat, Apr 01, 2023 at 01:29:13PM -0400, Melanie Plageman wrote: > Hi, > > I was just doing some cleanup on the main patch in this set and realized > that it was missing a few things. One of which is forbidding the > BUFFER_USAGE_LIMIT with VACUUM FULL since VACUUM FULL does not use a > BAS_VACUUM strategy. > > VACUUM FULL technically uses a bulkread buffer access strategy for > reading the original relation if its number of blocks is > number of > shared buffers / 4 (see initscan()). The new rel writing is done using > smgrextend/write directly and doesn't go through shared buffers. I > think it is a stretch to try and use the size passed in to VACUUM by > BUFFER_USAGE_LIMIT for the bulkread strategy ring. When you say that it's a stretch, do you mean that it'd be a pain to add arguments to a handful of functions to pass down the setting ? Or that it's unclear if doing so would be the desirable/needed/intended/expected behavior ? I think if VACUUM FULL were going to allow a configurable strategy size, then so should CLUSTER. But it seems fine if they don't. I wonder if maybe strategy should be configurable in some more generic way, like a GUC. At one point I had a patch to allow INSERT to use strategy buffers (not just INSERT SELECT). And that's still pretty desirable. Also COPY. I've seen load spikes caused by pg_dumping tables which are just below 25% of shared_buffers. Which is exacerbated because pg_dump deliberately orders tables by size, so those tables are dumped one after another, each causing eviction of ~20% of shared buffers. And exacerbated some more because TOAST doesn't seem to use a ring buffer in that case. > I somehow feel like VACUUM (FULL, BUFFER_USAGE_LIMIT 'x') should error > out instead of silently not using the buffer usage limit, though. > > I am looking for others' opinions. Sorry, no opinion here :) One thing is that it's fine to take something that previously threw an error and change it to not throw an error anymore. But it's undesirable to do the opposite. For that reason, there may be a tendency to add errors for cases like this. -- Justin
On Sat, Apr 1, 2023 at 1:57 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sat, Apr 01, 2023 at 01:29:13PM -0400, Melanie Plageman wrote: > > Hi, > > > > I was just doing some cleanup on the main patch in this set and realized > > that it was missing a few things. One of which is forbidding the > > BUFFER_USAGE_LIMIT with VACUUM FULL since VACUUM FULL does not use a > > BAS_VACUUM strategy. > > > > VACUUM FULL technically uses a bulkread buffer access strategy for > > reading the original relation if its number of blocks is > number of > > shared buffers / 4 (see initscan()). The new rel writing is done using > > smgrextend/write directly and doesn't go through shared buffers. I > > think it is a stretch to try and use the size passed in to VACUUM by > > BUFFER_USAGE_LIMIT for the bulkread strategy ring. > > When you say that it's a stretch, do you mean that it'd be a pain to add > arguments to handful of functions to pass down the setting ? Or that > it's unclear if doing so would be the desirable/needed/intended/expected > behavior ? More that I don't think it makes sense. VACUUM FULL only uses a buffer access strategy (BAS_BULKREAD) for reading the original relation in and not for writing the new one. It has different concerns because its behavior is totally different from regular vacuum. It is not modifying the original buffers (AFAIK) and the amount of WAL it is generating is different. Also, no matter what, the new relation won't be in shared buffers because of VACUUM FULL using the smgr functions directly. So, I think that allowing the two options together is confusing for the user because it seems to imply we can give them some benefit that we cannot. > I wonder if maybe strategy should be configurable in some more generic > way, like a GUC. At one point I had a patch to allow INSERT to use > strategy buffers (not just INSERT SELECT). And that's still pretty > desirable. Also COPY. I've seen load spikes caused by pg_dumping > tables which are just below 25% of shared_buffers. Which is exacerbated > because pg_dump deliberately orders tables by size, so those tables are > dumped one after another, each causing eviction of ~20% of shared > buffers. And exacerbated some more because TOAST don't seem to use a > ring buffer in that case. Yes, it is probably worth exploring how configurable or dynamic Buffer Access Strategies should be for other users (e.g. not just VACUUM). However, since the ring sizes wouldn't be the same for all the different operations, it is probably easier to start with a single kind of operation and go from there. > > I somehow feel like VACUUM (FULL, BUFFER_USAGE_LIMIT 'x') should error > > out instead of silently not using the buffer usage limit, though. > > > > I am looking for others' opinions. > > Sorry, no opinion here :) > > One thing is that it's fine to take something that previously throw an > error and change it to not throw an error anymore. But it's undesirable > to do the opposite. For that reason, there's may be a tendency to add > errors for cases like this. So, I have made it error out when you specify BUFFER_USAGE_LIMIT with VACUUM FULL or VACUUM ONLY_DATABASE_STATS. However, if you specify buffer_usage_limit -1 with either of these options, it will not error out. I don't love this, but I noticed that VACUUM (FULL, PARALLEL 0) does not error out, while VACUUM (FULL, PARALLEL X) where X > 0 does. 
If I want to error out when BUFFER_USAGE_LIMIT specified at all but still do so at the bottom of ExecVacuum() with the rest of the vacuum option sanity checking, I will probably need to add a flag bit for VacuumParams->options. I was wondering why some "sanity checking" of vacuum options is done in ExecVacuum() and some in vacuum() (it isn't just split by what is applicable to autovacuum and what isn't). I noticed that even in cases where we don't use the strategy object we still made it, which I thought seemed like a bit of a waste and easy to fix. I've added a commit which does not make the BufferAccessStrategy object when VACUUM FULL or VACUUM ONLY_DATABASE_STATS are specified. I noticed that we also don't use the strategy for VACUUM (PROCESS_MAIN false, PROCESS_TOAST false), but it didn't seem worth handling this very specific case, so I didn't. v8 attached has the prohibitions specified above (including for vacuumdb, as relevant) as well as some cleanup, added test cases, and updated documentation. 0001 is essentially unmodified (i.e. I didn't do anything with the other global variable David mentioned). I still have a few open questions: - what the initial value of ring_size for autovacuum should be (see the one remaining TODO in the code) - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc value when that is set? - should INDEX_CLEANUP off cause VACUUM to use shared buffers and disable use of a strategy (like failsafe vacuum) - should we add anything to VACUUM VERBOSE output about the number of reuses of strategy buffers? - Should we make BufferAccessStrategyData non-opaque so that we don't have to add a getter for nbuffers. I could have implemented this in another way, but I don't really see why BufferAccessStrategyData should be opaque - Melanie
Attachment
- v8-0001-remove-global-variable-vac_strategy.patch
- v8-0003-use-shared-buffers-when-failsafe-active.patch
- v8-0005-Add-VACUUM-BUFFER_USAGE_LIMIT-option-and-GUC.patch
- v8-0002-Don-t-make-vacuum-strategy-ring-when-unused.patch
- v8-0004-Rename-Buffer-Access-Strategy-ring_size-nbuffers.patch
- v8-0006-Add-buffer-usage-limit-option-to-vacuumdb.patch
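The prohibition described above comes down to a check of roughly this shape in ExecVacuum() (a sketch; the variable names and message wording are illustrative). Note that an explicit -1 deliberately passes, mirroring how VACUUM (FULL, PARALLEL 0) is accepted:

/* sketch: reject an explicit ring size when no BAS_VACUUM ring will exist */
if (ring_size > -1 &&
	(params.options & (VACOPT_FULL | VACOPT_ONLY_DATABASE_STATS)) != 0)
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL or ONLY_DATABASE_STATS")));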
On Sat, 1 Apr 2023 at 13:24, Melanie Plageman <melanieplageman@gmail.com> wrote: > Your diff LGTM. > > Earlier upthread in [1], Bharath had mentioned in a review comment about > removing the global variables that he would have expected the analogous > global in analyze.c to also be removed (vac_strategy [and analyze.c also > has anl_context]). > > I looked into doing this, and this is what I found out (see full > rationale in [2]): > > > it is a bit harder to remove it from analyze because acquire_func > > doesn't take the buffer access strategy as a parameter and > > acquire_sample_rows uses the vac_context global variable to pass to > > table_scan_analyze_next_block(). > > I don't know if this is worth mentioning in the commit removing the > other globals? Maybe it will just make it more confusing... I did look at that, but it seems a little tricky to make work unless the AcquireSampleRowsFunc signature was changed. To me, it just does not seem worth doing that to get rid of the two globals in analyze.c. I pushed the patch with just the vacuum.c changes. David
I've now pushed up v8-0004. Can you rebase the remaining 2 patches on top of master again and resend? On Mon, 3 Apr 2023 at 08:11, Melanie Plageman <melanieplageman@gmail.com> wrote: > I still have a few open questions: > - what the initial value of ring_size for autovacuum should be (see the > one remaining TODO in the code) I assume you're talking about the 256KB BAS_VACUUM one set in GetAccessStrategy()? I don't think this patch should be doing anything to change those defaults. Anything that does that should likely have a new thread and come with analysis or reasoning about why the newly proposed defaults are better than the old ones. > - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc > value when that is set? That's a good question... > - should INDEX_CLEANUP off cause VACUUM to use shared buffers and > disable use of a strategy (like failsafe vacuum) I don't see why it should. It seems strange to have one option magically make changes to some other option. > - should we add anything to VACUUM VERBOSE output about the number of > reuses of strategy buffers? Sounds like this would require an extra array of counter variables in BufferAccessStrategyData? I think it might be a bit late to start experimenting with this. > - Should we make BufferAccessStrategyData non-opaque so that we don't > have to add a getter for nbuffers. I could have implemented this in > another way, but I don't really see why BufferAccessStrategyData > should be opaque If nothing outside of the .c file requires access then there's little need to make the members known outside of the file. Same as you'd want to make classes private rather than public when possible in OOP. If you do come up with a reason to be able to determine the size of the BufferAccessStrategy from outside freelist.c, I'd say an accessor method is the best way. David
On Mon, Apr 3, 2023 at 1:09 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 1 Apr 2023 at 13:24, Melanie Plageman <melanieplageman@gmail.com> wrote: > > Your diff LGTM. > > > > Earlier upthread in [1], Bharath had mentioned in a review comment about > > removing the global variables that he would have expected the analogous > > global in analyze.c to also be removed (vac_strategy [and analyze.c also > > has anl_context]). > > > > I looked into doing this, and this is what I found out (see full > > rationale in [2]): > > > > > it is a bit harder to remove it from analyze because acquire_func > > > doesn't take the buffer access strategy as a parameter and > > > acquire_sample_rows uses the vac_context global variable to pass to > > > table_scan_analyze_next_block(). > > > > I don't know if this is worth mentioning in the commit removing the > > other globals? Maybe it will just make it more confusing... > > I did look at that, but it seems a little tricky to make work unless > the AcquireSampleRowsFunc signature was changed. To me, it just does > not seem worth doing that to get rid of the two globals in analyze.c. Yes, I came to basically the same conclusion. On Mon, Apr 3, 2023 at 7:57 AM David Rowley <dgrowleyml@gmail.com> wrote: > > I've now pushed up v8-0004. Can rebase the remaining 2 patches on top > of master again and resend? v9 attached. > On Mon, 3 Apr 2023 at 08:11, Melanie Plageman <melanieplageman@gmail.com> wrote: > > I still have a few open questions: > > - what the initial value of ring_size for autovacuum should be (see the > > one remaining TODO in the code) > > I assume you're talking about the 256KB BAS_VACUUM one set in > GetAccessStrategy()? I don't think this patch should be doing anything > to change those defaults. Anything that does that should likely have > a new thread and come with analysis or reasoning about why the newly > proposed defaults are better than the old ones. I actually was talking about something much more trivial but a little more confusing. In table_recheck_autovac(), I initialize the autovac_table->at_params.ring_size to the value of the vacuum_buffer_usage_limit guc. However, autovacuum makes its own BufferAccessStrategy object (instead of relying on vacuum() to do it) and passes that in to vacuum(). So, if we wanted autovacuum to disable use of a strategy (and use as many shared buffers as it likes), it would pass in NULL to vacuum(). If vacuum_buffer_usage_limit is not 0, then we would end up making and using a BufferAccessStrategy in vacuum(). If we instead initialized autovac_table->at_params.ring_size to 0, even if the passed in BufferAccessStrategy is NULL, we wouldn't make a ring for autovacuum. Right now, we don't disable the strategy for autovacuum except in failsafe mode. And it is unclear when or why we would want to. I also thought it might be weird to have the value of the ring_size be initialized to something other than the value of vacuum_buffer_usage_limit for autovacuum, since it is supposed to use that guc value. In fact, right now, we don't use the autovac_table->at_params.ring_size set in table_recheck_autovac() when making the ring in do_autovacuum() but instead use the guc directly. I actually don't really like how vacuum() relies on the BufferAccessStrategy parameter being NULL for autovacuum and feel like there is a more intuitive way to handle all this. But, I didn't want to make major changes at this point. Anyway, the above is quite a bit more analysis than the issue is really worth.
We should pick something and then document it in a comment. > > - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc > > value when that is set? > > That's a good question... I kinda think we should just skip it. It adds to the surface area of the feature. > > - should INDEX_CLEANUP off cause VACUUM to use shared buffers and > > disable use of a strategy (like failsafe vacuum) > > I don't see why it should. It seems strange to have one option > magically make changes to some other option. Sure, sounds good. > > - should we add anything to VACUUM VERBOSE output about the number of > > reuses of strategy buffers? > > Sounds like this would require an extra array of counter variables in > BufferAccessStrategyData? I think it might be a bit late to start > experimenting with this. Makes sense. I hadn't thought through the implementation. We count reuses in pg_stat_io data structures but that is global and not per BufferAccessStrategyData instance, so I agree to scrapping this idea. > > - Should we make BufferAccessStrategyData non-opaque so that we don't > > have to add a getter for nbuffers. I could have implemented this in > > another way, but I don't really see why BufferAccessStrategyData > > should be opaque > > If nothing outside of the .c file requires access then there's little > need to make the members known outside of the file. Same as you'd want > to make classes private rather than public when possible in OOP. > > If you do come up with a reason to be able to determine the size of > the BufferAccessStrategy from outside freelist.c, I'd say an accessor > method is the best way. In the main patch, I wanted access to the number of buffers so that parallel vacuum workers could make their own rings the same size. I added an accessor, but it looked a bit silly so I thought I would ask if we needed to keep the data structure opaque. It isn't called frequently enough to worry about the function call overhead. Though the accessor could use a better name than the one I chose. - Melanie
Attachment
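The accessor being referred to is about as small as a function can be; something along these lines (a sketch, using the nbuffers member name from the ring_size -> nbuffers rename elsewhere in this series):

/* in freelist.c: report the number of buffers in a strategy ring */
int
StrategyGetBufferCount(BufferAccessStrategy strategy)
{
	return strategy->nbuffers;
}

The parallel vacuum leader can then stash this count in the shared parallel state so that each worker builds an equally sized ring via the sized GetAccessStrategy variant.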
On Tue, 4 Apr 2023 at 02:49, Melanie Plageman <melanieplageman@gmail.com> wrote: > v9 attached. I've made a pass on the v9-0001 patch only. Here's what I noted down: v9-0001: 1. In the documentation and comments, generally we always double-space after a period. I see quite often you're not following this. 2. Doc: We could generally seem to break tags within paragraphs into multiple lines. You're doing that quite a bit, e.g: linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>. <literal>0</literal> will disable use of a 2. This is not a command <command>BUFFER_USAGE_LIMIT</command> parameter. <option> is probably what you want. 3. I'm not sure I agree that it's a good idea to refer to the strategy with multiple different names. Here you've called it a "ring buffer", but in the next sentence, you're calling it a Buffer Access Strategy. Specifies the ring buffer size for <command>VACUUM</command>. This size is used to calculate the number of shared buffers which will be reused as part of a <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>. <literal>0</literal> disables use of a 4. Can you explain your choice in not just making < 128 a hard error rather than clamping? I guess it means checks like this are made more simple, but that does not seem like a good enough reason: /* check for out-of-bounds */ if (result < -1 || result > MAX_BAS_RING_SIZE_KB) postgres=# vacuum (parallel -1) pg_class; ERROR: parallel workers for vacuum must be between 0 and 1024 Maybe the above is a good guide to follow. To allow you to get rid of the clamping code, you'd likely need an assign hook function for vacuum_buffer_usage_limit. 5. I see vacuum.sgml is full of inconsistencies around the use of <literal> vs <option>. I was going to complain about your: <literal>ONLY_DATABASE_STATS</literal> option. If <literal>ANALYZE</literal> is also specified, the <literal>BUFFER_USAGE_LIMIT</literal> value is used for both the vacuum but I see you've likely just copied what's nearby. There are also plenty of usages of <option> in that file. I'd rather see you use <option>. Maybe there can be some other patch that sweeps the entire docs to look for <literal>OPTION_NAME</literal> and replaces them to use <option>. 6. I was surprised to see you've added both GetAccessStrategyWithSize() and GetAccessStrategyWithNBuffers(). I think the former is suitable for both. GetAccessStrategyWithNBuffers() seems to be just used once outside of freelist.c 7. I don't think bas_nbuffers() is a good name for an external function. StrategyGetBufferCount() seems better. 8. I don't quite follow this comment: /* * TODO: should this be 0 so that we are sure that vacuum() never * allocates a new bstrategy for us, even if we pass in NULL for that * parameter? maybe could change how failsafe NULLs out bstrategy if * so? */ Can you explain under what circumstances would vacuum() allocate a bstrategy when do_autovacuum() would not? Is this a case of a config reload where someone changes vacuum_buffer_usage_limit from 0 to something non-zero? If so, perhaps do_autovacuum() needs to detect this and allocate a strategy rather than having vacuum() do it once per table (wastefully). 9. buffer/README. I think it might be overkill to document details about how the new vacuum option works in a section talking about Buffer Ring Replacement Strategy. 
Perhaps it's just worth something like: "In v16, the 256KB ring was made configurable by way of the vacuum_buffer_usage_limit GUC and the BUFFER_USAGE_LIMIT VACUUM option." 10. I think if you do #4 then you can get rid of all the range checks and DEBUG1 elogs in GetAccessStrategyWithSize(). 11. This seems a bit badly done: int vacuum_buffer_usage_limit = -1; int VacuumCostPageHit = 1; /* GUC parameters for vacuum */ int VacuumCostPageMiss = 2; int VacuumCostPageDirty = 20; I'd class vacuum_buffer_usage_limit as a "GUC parameters for vacuum" too. Probably the CamelCase naming should be followed too. 12. ANALYZE too? {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM, gettext_noop("Sets the buffer pool size for VACUUM and autovacuum."), 13. VacuumParams.ring_size has no comments explaining what it is. 14. vacuum_buffer_usage_limit seems to be lumped in with unrelated GUCs extern PGDLLIMPORT int maintenance_work_mem; extern PGDLLIMPORT int max_parallel_maintenance_workers; +extern PGDLLIMPORT int vacuum_buffer_usage_limit; extern PGDLLIMPORT int VacuumCostPageHit; extern PGDLLIMPORT int VacuumCostPageMiss; 15. No comment explaining what these are: #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024) #define MIN_BAS_RING_SIZE_KB 128 16. Parameter names in function declaration and definition don't match in: extern BufferAccessStrategy GetAccessStrategyWithNBuffers(BufferAccessStrategyType btype, int nbuffers); extern BufferAccessStrategy GetAccessStrategyWithSize(BufferAccessStrategyType btype, int nbuffers); Also, line wraps at 79 chars. (80 including line feed) 17. If you want to test the 16GB upper limit, maybe going 1KB (or 8KB?) rather than 1GB over 16GB is better? 16777217kB, I think. VACUUM (BUFFER_USAGE_LIMIT '17 GB') vac_option_tab; David
On Mon, Apr 3, 2023 at 8:37 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 4 Apr 2023 at 02:49, Melanie Plageman <melanieplageman@gmail.com> wrote: > > v9 attached. > > I've made a pass on the v9-0001 patch only. Here's what I noted down: Thanks for the review! Attached v10 addresses the review feedback below. Remaining TODOs: - tests - do something about config reload changing GUC > v9-0001: > > 1. In the documentation and comments, generally we always double-space > after a period. I see quite often you're not following this. I've gone through and done this. I noticed after building the docs that it doesn't seem to affect how many spaces are after a period in the rendered docs, but I suppose it affects readability when editing the sgml files. > 2. Doc: We could generally seem to break tags within paragraphs into > multiple lines. You're doing that quite a bit, e.g: > > linkend="glossary-buffer-access-strategy">Buffer Access > Strategy</glossterm>. <literal>0</literal> will disable use of a I've updated all of the ones I could find that I did this with. > 2. This is not a command > > <command>BUFFER_USAGE_LIMIT</command> parameter. > > <option> is probably what you want. I have gone through and attempted to correct all option/command/application tag usages. > 3. I'm not sure I agree that it's a good idea to refer to the strategy > with multiple different names. Here you've called it a "ring buffer", > but in the next sentence, you're calling it a Buffer Access Strategy. > > Specifies the ring buffer size for <command>VACUUM</command>. This size > is used to calculate the number of shared buffers which will be reused as > part of a <glossterm linkend="glossary-buffer-access-strategy">Buffer > Access Strategy</glossterm>. <literal>0</literal> disables use of a I've updated this to always prefix any use of ring with "Buffer Access Strategy". I don't know how you'll feel about it. It felt awkward in some places to use Buffer Access Strategy as a complete stand-in for ring buffer. > 4. Can you explain your choice in not just making < 128 a hard error > rather than clamping? > > I guess it means checks like this are made more simple, but that does > not seem like a good enough reason: > > /* check for out-of-bounds */ > if (result < -1 || result > MAX_BAS_RING_SIZE_KB) > > postgres=# vacuum (parallel -1) pg_class; > ERROR: parallel workers for vacuum must be between 0 and 1024 > > Maybe the above is a good guide to follow. > > To allow you to get rid of the clamping code, you'd likely need an > assign hook function for vacuum_buffer_usage_limit. I've added a check hook and replicated the same restrictions in ExecVacuum() where it parses the limit. I have included enforcement of the conditional limit that the ring cannot occupy more than 1/8 of shared buffers. The immediate consequence of this was that my tests were no longer stable (except for the integer overflow one). I have removed them for now until I can come up with a better testing strategy. On the topic of testing, I also thought we should remove the VACUUM(BUFFER_USAGE_LIMIT X, PARALLEL X) test. Though the parallel workers do make their own strategy rings and such a test would be covering some code, I am hesitant to write a test that would never really fail. The observable behavior of not using a strategy will be either 1) basically nothing or 2) the same for parallel and non-parallel. What do you think? > 5. I see vacuum.sgml is full of inconsistencies around the use of > <literal> vs <option>. 
I was going to complain about your: > > <literal>ONLY_DATABASE_STATS</literal> option. If > <literal>ANALYZE</literal> is also specified, the > <literal>BUFFER_USAGE_LIMIT</literal> value is used for both the vacuum > > but I see you've likely just copied what's nearby. > > There are also plenty of usages of <option> in that file. I'd rather > see you use <option>. Maybe there can be some other patch that sweeps > the entire docs to look for <literal>OPTION_NAME</literal> and > replaces them to use <option>. I haven't done the separate sweep patch, but I have updated my own usages in this set. > 6. I was surprised to see you've added both > GetAccessStrategyWithSize() and GetAccessStrategyWithNBuffers(). I > think the former is suitable for both. GetAccessStrategyWithNBuffers() > seems to be just used once outside of freelist.c This has been updated and reorganized. > 7. I don't think bas_nbuffers() is a good name for an external > function. StrategyGetBufferCount() seems better. I've used this name. > 8. I don't quite follow this comment: > > /* > * TODO: should this be 0 so that we are sure that vacuum() never > * allocates a new bstrategy for us, even if we pass in NULL for that > * parameter? maybe could change how failsafe NULLs out bstrategy if > * so? > */ > > Can you explain under what circumstances would vacuum() allocate a > bstrategy when do_autovacuum() would not? Is this a case of a config > reload where someone changes vacuum_buffer_usage_limit from 0 to > something non-zero? If so, perhaps do_autovacuum() needs to detect > this and allocate a strategy rather than having vacuum() do it once > per table (wastefully). Hmm. Yes, I started hacking on this, but I think it might be a bit tricky to get right. I think it would make sense to check if vacuum_buffer_usage_limit goes from 0 to not 0 or from not 0 to 0 and allow disabling and enabling the buffer access strategy, however, I'm not sure we want to allow changing the size during an autovacuum worker's run. I started writing code to just allow enabling and disabling, but I'm a little concerned that the distinction will be difficult to understand for the user with no obvious indication of what is happening. That is, you change the size and it silently does nothing, but you set it to/from 0 and it silently does something? One alternative for now is to save the ring size before looping through the relations in do_autovacuum() and always restore that value in tab->at_params.ring_size in table_recheck_autovac(). I'm not sure what to do. > 9. buffer/README. I think it might be overkill to document details > about how the new vacuum option works in a section talking about > Buffer Ring Replacement Strategy. Perhaps it just worth something > like: > > "In v16, the 256KB ring was made configurable by way of the > vacuum_buffer_usage_limit GUC and the BUFFER_USAGE_LIMIT VACUUM > option." I've made the change you suggested. > 10. I think if you do #4 then you can get rid of all the range checks > and DEBUG1 elogs in GetAccessStrategyWithSize(). Done. > 11. This seems a bit badly done: > > int vacuum_buffer_usage_limit = -1; > > int VacuumCostPageHit = 1; /* GUC parameters for vacuum */ > int VacuumCostPageMiss = 2; > int VacuumCostPageDirty = 20; > > I'd class vacuum_buffer_usage_limit as a "GUC parameters for vacuum" > too. Probably the CamelCase naming should be followed too. I've made this change. > 12. ANALYZE too? 
> > {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM, > gettext_noop("Sets the buffer pool size for VACUUM and autovacuum."), I've mentioned this here and also added the option for ANALYZE. > 13. VacuumParams.ring_size has no comments explaining what it is. I've added one. > 14. vacuum_buffer_usage_limit seems to be lumped in with unrelated GUCs > > extern PGDLLIMPORT int maintenance_work_mem; > extern PGDLLIMPORT int max_parallel_maintenance_workers; > +extern PGDLLIMPORT int vacuum_buffer_usage_limit; > > extern PGDLLIMPORT int VacuumCostPageHit; > extern PGDLLIMPORT int VacuumCostPageMiss; I've moved it down a line. > 15. No comment explaining what these are: > > #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024) > #define MIN_BAS_RING_SIZE_KB 128 I've added one. > 16. Parameter names in function declaration and definition don't match in: > > extern BufferAccessStrategy > GetAccessStrategyWithNBuffers(BufferAccessStrategyType btype, int > nbuffers); > extern BufferAccessStrategy > GetAccessStrategyWithSize(BufferAccessStrategyType btype, int > nbuffers); I've fixed this. > Also, line wraps at 79 chars. (80 including line feed) I've fixed that function prototype instance of it. In general line wrap limit + pgindent can be quite challenging. I often break something onto multiple lines to appease the line limit and then pgindent will add an absurd number of tabs to align the second line in a way that looks truly awful. I try to make local variables when this is a problem, but it is often quite annoying to do that. I wish there was some way to make pgindent do something different in these cases. > 17. If you want to test the 16GB upper limit, maybe going 1KB (or > 8KB?) rather than 1GB over 16GB is better? 2097153kB, I think. > > VACUUM (BUFFER_USAGE_LIMIT '17 GB') vac_option_tab; I've removed this test for now until I figure out a way to actually hit this reliably with different-sized shared buffers. - Melanie
Attachment
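For reference, an integer GUC check hook has the following shape; this is a minimal sketch of the validation described above (the exact message wording is illustrative):

bool
check_vacuum_buffer_usage_limit(int *newval, void **extra,
								GucSource source)
{
	/* -1 (use the default) and 0 (disable the ring) are always valid */
	if (*newval == -1 || *newval == 0)
		return true;

	/* otherwise the value must fall within the hard limits */
	if (*newval >= MIN_BAS_RING_SIZE_KB && *newval <= MAX_BAS_RING_SIZE_KB)
		return true;

	GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be 0 or between %d kB and %d kB",
						MIN_BAS_RING_SIZE_KB, MAX_BAS_RING_SIZE_KB);
	return false;
}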
On Wed, 5 Apr 2023 at 05:53, Melanie Plageman <melanieplageman@gmail.com> wrote: > Attached v10 addresses the review feedback below. Thanks. Here's another round on v10-0001: 1. The following documentation fragment does not seem to be aligned with the code: <literal>16 GB</literal>. The minimum size is the lesser of 1/8 the size of shared buffers and <literal>128 KB</literal>. The default value is <literal>-1</literal>. If this value is specified The relevant code is: static inline int StrategyGetClampedBufsize(int bufsize_kb) { int sb_limit_kb; int blcksz_kb = BLCKSZ / 1024; Assert(blcksz_kb > 0); sb_limit_kb = NBuffers / 8 * blcksz_kb; return Min(sb_limit_kb, bufsize_kb); } The code seems to mean that the *maximum* is the lesser of 16GB and shared_buffers / 8. You're saying it's the minimum. 2. I think you could get rid of the double "Buffer Access Strategy" in: <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>. <literal>0</literal> will disable use of a <literal>Buffer Access Strategy</literal>. <literal>-1</literal> will set the size to a default of <literal>256 KB</literal>. The maximum size is how about: <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>. A setting of <literal>0</literal> will allow the operation to use any number of <varname>shared_buffers</varname>, whereas <literal>-1</literal> will set the size to a default of <literal>256 KB</literal>. The maximum size is 3. In the following snippet you can use <xref linkend="sql-vacuum"/> or just <command>VACUUM</command>. There are examples of both in that file. I don't have a preference as to which, but I think what you've got isn't great. <link linkend="sql-vacuum"><command>VACUUM</command></link> and <link linkend="sql-analyze"><command>ANALYZE</command></link> 4. I wonder if there's a reason this needs to be written in the overview of ANALYZE. <command>ANALYZE</command> uses a <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm> when reading in the sample data. The number of buffers consumed for this can be controlled by <xref linkend="guc-vacuum-buffer-usage-limit"/> or by using the <option>BUFFER_USAGE_LIMIT</option> option. I think it's fine just to mention it under BUFFER_USAGE_LIMIT. It just does not seem fundamental enough to be worth being upfront about it. The other things mentioned in that section don't seem related to parameters, so there might be no better place for those to go. That's not the case for what you're adding here. 5. I think I'd rather see the details spelt out here rather than telling the readers to look at what VACUUM does: Specifies the <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm> ring buffer size for <command>ANALYZE</command>. See the <link linkend="sql-vacuum"><command>VACUUM</command></link> option with the same name. 6. When I asked about restricting the valid values of vacuum_buffer_usage_limit to -1 / 0 or 128 KB to 16GB, I didn't expect you to code in the NBuffers / 8 check. We shouldn't chain dependencies between GUCs like that. Imagine someone editing their postgresql.conf after realising shared_buffers is too large for their RAM, they reduce it and restart. The database fails to start because vacuum_buffer_usage_limit is too large! Angry DBA? Take what's already written about vacuum_failsafe_age as your guidance on this: "The default is 1.6 billion transactions. 
Although users can set this value anywhere from zero to 2.1 billion, VACUUM will silently adjust the effective value to no less than 105% of autovacuum_freeze_max_age." Here we just document the silent capping. You can still claim the valid range is 128KB to 16GB in the docs. You can mention the 1/8th of shared buffers cap same as what's mentioned about "105%" above. When I mentioned #4 and #10 in my review of the v9-0001 patch, I just wanted to not surprise users who do vacuum_buffer_usage_limit = 64 and magically get 128. 7. In ExecVacuum(), similar to #6 from above, it's also not great that you're raising an ERROR based on if StrategyGetClampedBufsize() clamps or not. If someone has a script that does: VACUUM (BUFFER_USAGE_LIMIT '1 GB'); it might be annoying that it stops working when someone adjusts shared buffers from 10GB to 6GB. I really think the NBuffers / 8 clamping just should be done inside GetAccessStrategyWithSize(). 8. I think this ERROR in vacuum.c should mention that 0 is a valid value. ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("buffer_usage_limit for a vacuum must be between %d KB and %d KB", MIN_BAS_RING_SIZE_KB, MAX_BAS_RING_SIZE_KB))); I doubt there's a need to mention -1 as that's the same as not specifying BUFFER_USAGE_LIMIT. 9. The following might be worthy of a comment explaining the order of precedence of how we choose the size: if (params->ring_size == -1) { if (VacuumBufferUsageLimit == -1) bstrategy = GetAccessStrategy(BAS_VACUUM); else bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); } else bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, params->ring_size); 10. I wonder if you need to keep bufsize_limit_to_nbuffers(). It's just used once and seems trivial enough just to write that code inside GetAccessStrategyWithSize(). 11. It's probably worth putting the valid range in the sample config: #vacuum_buffer_usage_limit = -1 # size of vacuum and analyze buffer access strategy ring. # -1 to use default, # 0 to disable vacuum buffer access strategy # > 0 to specify size <-- here 12. Is bufmgr.h the right place for these? /* * Upper and lower hard limits for the Buffer Access Strategy ring size * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT option * to VACUUM and ANALYZE. */ #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024) #define MIN_BAS_RING_SIZE_KB 128 Your comment implies they're VACUUM / ANALYZE limits. If we want to impose these limits on all access strategies then these seem like good names and location, otherwise, I imagine miscadmin.h is the correct place. If so, they'll likely want to be renamed to something more VACUUM specific. I don't particularly have a preference. 128 - 16777216 seem like reasonable limits for any buffer access strategy. 13. I think check_vacuum_buffer_usage_limit() does not belong in freelist.c. Maybe vacuum.c? 14. Not related to this patch, but why do we have half the vacuum related GUCs in vacuum.c and the other half in globals.c? I see vacuum_freeze_table_age is defined in vacuum.c but is also needed in autovacuum.c, so that rules out the globals.c ones being for vacuum.c and autovacuum.c. It seems a bit messy. I'm not really sure where VacuumBufferUsageLimit should go now.
I'm not sure what the implications are with that and the other patch you're working on to allow vacuum config changes mid-vacuum. We'll need to be careful not to immediately break that patch if it gets committed before this one does, or vice-versa. David
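Concretely, #7 amounts to a silent clamp of roughly this shape living in freelist.c (a sketch; the helper name is made up). An explicit VACUUM (BUFFER_USAGE_LIMIT '1 GB') then keeps working after shared_buffers is lowered; it just quietly gets a smaller ring:

/* sketch: convert a KB request into a buffer count, silently capped */
static int
clamp_ring_nbuffers(int ring_size_kb)
{
	int		ring_buffers = ring_size_kb / (BLCKSZ / 1024);

	ring_buffers = Min(ring_buffers, NBuffers / 8);	/* cap, don't ERROR */
	return Max(ring_buffers, 1);					/* never an empty ring */
}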
On Tue, Apr 4, 2023 at 8:14 PM David Rowley <dgrowleyml@gmail.com> wrote: > 14. Not related to this patch, but why do we have half the vacuum > related GUCs in vacuum.c and the other half in globals.c? I see > vacuum_freeze_table_age is defined in vacuum.c but is also needed in > autovacuum.c, so that rules out the globals.c ones being for vacuum.c > and autovacuum.c. It seems a bit messy. I'm not really sure where > VacuumBufferUsageLimit should go now. vacuum_freeze_table_age is an abomination, even compared to the rest of these GUCs. It was added around the time the visibility map first went in, and so is quite a bit more recent than autovacuum_freeze_max_age. Before the introduction of the visibility map, we only had autovacuum_freeze_max_age, and it was used to schedule antiwraparound autovacuums -- there was no such thing as aggressive VACUUMs (just antiwraparound autovacuums), and no need for autovacuum_freeze_max_age in vacuum.c at all. So autovacuum_freeze_max_age was just for autovacuum.c code. There was only one type of VACUUM, and they always advanced relfrozenxid to the same degree. With the introduction of the visibility map, we needed to have autovacuum_freeze_max_age in vacuum.c for the first time, to deal with interpreting the then-new vacuum_freeze_table_age GUC correctly. We silently truncate the vacuum_freeze_table_age setting so that it never exceeds 95% of autovacuum_freeze_max_age (the 105%-of-autovacuum_freeze_max_age vacuum_failsafe_age thing that you're discussing is symmetric). So after 2009 autovacuum_freeze_max_age actually plays an important role in VACUUM, the command, and not just autovacuum. This is related to the problem of the autovacuum_freeze_max_age reloption being completely broken [1]. If autovacuum_freeze_max_age was still purely just an autovacuum.c scheduling thing, then there would be no problem with having a separate reloption of the same name. There are big problems precisely because vacuum.c doesn't do anything with the autovacuum_freeze_max_age reloption. Though it does okay with the autovacuum_freeze_table_age reloption, which gets passed in. (Yes, it's called autovacuum_freeze_table_age in reloption form -- not vacuum_freeze_table_age, like the GUC). Note that the decision to ignore the reloption version of autovacuum_freeze_max_age in the failsafe's 105%-of-autovacuum_freeze_max_age thing was a deliberate one. The autovacuum_freeze_max_age GUC is authoritative in the sense that it cannot be overridden locally, except in the direction of making aggressive VACUUMs happen more frequently for a given table (so they can't be less frequent via reloption configuration). I suppose you'd have to untangle that mess if you wanted to fix the autovacuum_freeze_max_age reloption issue I go into in [1]. [1] https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com -- Peter Geoghegan
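The silent 95% truncation mentioned above is, roughly, a one-liner at the point where vacuum.c computes its cutoffs:

/* clamp vacuum_freeze_table_age to at most 95% of autovacuum_freeze_max_age */
freeze_table_age = Min(freeze_table_age, autovacuum_freeze_max_age * 0.95);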
Hi, On 2023-04-04 13:53:15 -0400, Melanie Plageman wrote: > > 8. I don't quite follow this comment: > > > > /* > > * TODO: should this be 0 so that we are sure that vacuum() never > > * allocates a new bstrategy for us, even if we pass in NULL for that > > * parameter? maybe could change how failsafe NULLs out bstrategy if > > * so? > > */ > > > > Can you explain under what circumstances would vacuum() allocate a > > bstrategy when do_autovacuum() would not? Is this a case of a config > > reload where someone changes vacuum_buffer_usage_limit from 0 to > > something non-zero? If so, perhaps do_autovacuum() needs to detect > > this and allocate a strategy rather than having vacuum() do it once > > per table (wastefully). Hm. I don't much like that we use a single strategy for multiple tables today. That way even tiny tables never end up in shared_buffers. But that's really a discussion for a different thread. However, if we were to use a per-table bstrategy, it'd be a lot easier to react to changes of the config. I doubt it's worth adding complications to the code for changing the size of the ringbuffer during an ongoing vacuum scan, at least for 16. Reacting to enabling/disabling the ringbuffer altogether seems a bit more important, but still not crucial compared to making it configurable at all. I think it'd be OK to add a comment saying something like "XXX: In the future we might want to react to configuration changes of the ring buffer size during a vacuum" or such. WRT the TODO specifically: Yes, passing in 0 seems to make sense. I don't see a reason not to do so? But perhaps there's a better solution: Perhaps the best solution for the autovac vs interactive vacuum issue would be to move the allocation of the bstrategy to ExecVacuum()? Random note while looking at the code: ISTM that adding handling of -1 in GetAccessStrategyWithSize() would make the code more readable. Instead of if (params->ring_size == -1) { if (VacuumBufferUsageLimit == -1) bstrategy = GetAccessStrategy(BAS_VACUUM); else bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); } else bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, params->ring_size); you could just have something like: bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, params->ring_size != -1 ? params->ring_size : VacuumBufferUsageLimit); by falling back to the default values from GetAccessStrategy(). Or even more "extremely", you could entirely remove references to VacuumBufferUsageLimit and handle that in freelist.c Greetings, Andres Freund
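Handling -1 (and 0) in freelist.c itself, as suggested, could look something like the following sketch, reusing the hypothetical clamp helper from further up; GetAccessStrategyWithNBuffers stands in for whichever sized constructor survives:

BufferAccessStrategy
GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size_kb)
{
	/* -1 means: fall back to the built-in default for this strategy type */
	if (ring_size_kb == -1)
		return GetAccessStrategy(btype);

	/* 0 means: no ring at all; the caller just uses shared buffers */
	if (ring_size_kb == 0)
		return NULL;

	return GetAccessStrategyWithNBuffers(btype,
										 clamp_ring_nbuffers(ring_size_kb));
}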
v11 attached with updates detailed below. On Tue, Apr 4, 2023 at 11:14 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 5 Apr 2023 at 05:53, Melanie Plageman <melanieplageman@gmail.com> wrote: > > Attached v10 addresses the review feedback below. > > Thanks. Here's another round on v10-0001: > > 1. The following documentation fragment does not seem to be aligned > with the code: > > <literal>16 GB</literal>. The minimum size is the lesser > of 1/8 the size of shared buffers and <literal>128 KB</literal>. The > default value is <literal>-1</literal>. If this value is specified > > The relevant code is: > > static inline int > StrategyGetClampedBufsize(int bufsize_kb) > { > int sb_limit_kb; > int blcksz_kb = BLCKSZ / 1024; > > Assert(blcksz_kb > 0); > > sb_limit_kb = NBuffers / 8 * blcksz_kb; > > return Min(sb_limit_kb, bufsize_kb); > } > > The code seems to mean that the *maximum* is the lesser of 16GB and > shared_buffers / 8. You're saying it's the minimum. Good catch. Fixed. > 2. I think you could get rid of the double "Buffer Access Stategy" in: > > <glossterm linkend="glossary-buffer-access-strategy">Buffer > Access Strategy</glossterm>. > <literal>0</literal> will disable use of a <literal>Buffer > Access Strategy</literal>. > <literal>-1</literal> will set the size to a default of > <literal>256 KB</literal>. The maximum size is > > how about: > > <glossterm linkend="glossary-buffer-access-strategy">Buffer > Access Strategy</glossterm>. > A setting of <literal>0</literal> will allow the operation to use any > number of <varname>shared_buffers</varname>, whereas > <literal>-1</literal> will set the size to a default of > <literal>256 KB</literal>. The maximum size is I've made these changes. > 3. In the following snippet you can use <xref linkend="sql-vacuum"/> > or just <command>VACUUM</command>. There are examples of both in that > file. I don't have a preference as it which, but I think what you've > got isn't great. > > <link linkend="sql-vacuum"><command>VACUUM</command></link> and > <link linkend="sql-analyze"><command>ANALYZE</command></link> I've updated it to use the link. I thought it would be nice to have the link in case the reader wants to look at the BUFFER_USAGE_LIMIT option docs there. > 4. I wonder if there's a reason this needs to be written in the > overview of ANALYZE. > > <command>ANALYZE</command> uses a > <glossterm linkend="glossary-buffer-access-strategy">Buffer Access > Strategy</glossterm> > when reading in the sample data. The number of buffers consumed for this can > be controlled by <xref linkend="guc-vacuum-buffer-usage-limit"/> or by using > the <option>BUFFER_USAGE_LIMIT</option> option. > > I think it's fine just to mention it under BUFFER_USAGE_LIMIT. It just > does not seem fundamental enough to be worth being upfront about it. > The other things mentioned in that section don't seem related to > parameters, so there might be no better place for those to go. That's > not the case for what you're adding here. I updated this. > 5. I think I'd rather see the details spelt out here rather than > telling the readers to look at what VACUUM does: > > Specifies the > <glossterm linkend="glossary-buffer-access-strategy">Buffer > Access Strategy</glossterm> > ring buffer size for <command>ANALYZE</command>. See the > <link linkend="sql-vacuum"><command>VACUUM</command></link> option with > the same name. I've updated it to contain the same text, as relevant, as the VACUUM option contains. 
Note that both rely on the vacuum_buffer_usage_limit GUC documentation for a description of upper and lower bounds. > 6. When I asked about restricting the valid values of > vacuum_buffer_usage_limit to -1 / 0 or 128 KB to 16GB, I didn't expect > you to code in the NBuffers / 8 check. We shouldn't chain > dependencies between GUCs like that. Imagine someone editing their > postgresql.conf after realising shared_buffers is too large for their > RAM, they reduce it and restart. The database fails to start because > vacuum_buffer_usage_limit is too large! Angry DBA? > > Take what's already written about vacuum_failsafe_age as your guidance on this: > > "The default is 1.6 billion transactions. Although users can set this > value anywhere from zero to 2.1 billion, VACUUM will silently adjust > the effective value to no less than 105% of > autovacuum_freeze_max_age." > > Here we just document the silent capping. You can still claim the > valid range is 128KB to 16GB in the docs. You can mention the 1/8th of > shared buffers cap same as what's mentioned about "105%" above. > > When I mentioned #4 and #10 in my review of the v9-0001 patch, I just > wanted to not surprise users who do vacuum_buffer_usage_limit = 64 and > magically get 128. > 7. In ExecVacuum(), similar to #6 from above, it's also not great that > you're raising an ERROR based on if StrategyGetClampedBufsize() clamps > or not. If someone has a script that does: > > VACUUM (BUFFER_USAGE_LIMIT '1 GB'); it might be annoying that it stops > working when someone adjusts shared buffers from 10GB to 6GB. > > I really think the NBuffers / 8 clamping just should be done inside > GetAccessStrategyWithSize(). Got it. I've done what you suggested. I had some logic issues as well that I fixed and reordered the code. > 8. I think this ERROR in vacuum.c should mention that 0 is a valid value. > > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("buffer_usage_limit for a vacuum must be between %d KB and %d KB", > MIN_BAS_RING_SIZE_KB, MAX_BAS_RING_SIZE_KB))); > > I doubt there's a need to mention -1 as that's the same as not > specifying BUFFER_USAGE_LIMIT. I've done this. I didn't say that 0 meant disabling the strategy. Do you think that would be useful? > 9. The following might be worthy of a comment explaining the order of > precedence of how we choose the size: > > if (params->ring_size == -1) > { > if (VacuumBufferUsageLimit == -1) > bstrategy = GetAccessStrategy(BAS_VACUUM); > else > bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); > } > else > bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, params->ring_size); I've updated this. Also, after doing so, I realized the if/else logic here and in ExecVacuum() could be better and updated the ordering to more closely mirror the human readable logic. > 10. I wonder if you need to keep bufsize_limit_to_nbuffers(). It's > just used once and seems trivial enough just to write that code inside > GetAccessStrategyWithSize(). I've gotten rid of it. > 11. It's probably worth putting the valid range in the sample config: > > #vacuum_buffer_usage_limit = -1 # size of vacuum and analyze buffer > access strategy ring. > # -1 to use default, > # 0 to disable vacuum buffer access strategy > # > 0 to specify size <-- here Done. > 12. Is bufmgr.h the right place for these? > > /* > * Upper and lower hard limits for the Buffer Access Strategy ring size > * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT option > * to VACUUM and ANALYZE.
> */ > #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024) > #define MIN_BAS_RING_SIZE_KB 128 > > Your comment implies they're VACUUM / ANALYZE limits. If we want to > impose these limits to all access strategies then these seem like good > names and location, otherwise, I imagine miscadmin.h is the correct > place. If so, they'll likely want to be renamed to something more > VACUUM specific. I don't particularly have a preference. 128 - > 16777216 seem like reasonable limits for any buffer access strategy. I don't assert on these limits in GetAccessStrategyWithSize(), and, since the rest of the code is mainly only concerned with vacuum, I think it is better to make these limits vacuum-specific. If we decide to make other access strategies configurable, we can generalize these macros then. As such, I have moved them into miscadmin.h. > 13. I think check_vacuum_buffer_usage_limit() does not belong in > freelist.c. Maybe vacuum.c? I've moved it to vacuum.c. I put it above ExecVacuum() since that would be correct alphabetically, but I'm not sure if it would be better to move it down since ExecVacuum() is the "main entry point". > 14. Not related to this patch, but why do we have half the vacuum > related GUCs in vacuum.c and the other half in globals.c? I see > vacuum_freeze_table_age is defined in vacuum.c but is also needed in > autovacuum.c, so that rules out the globals.c ones being for vacuum.c > and autovacuum.c. It seems a bit messy. I'm not really sure where > VacuumBufferUsageLimit should go now. I've left it where it is and added a (helpful?) comment. > > Remaining TODOs: > > - tests > > - do something about config reload changing GUC > > Shouldn't table_recheck_autovac() pfree/palloc a new strategy if the > size changes? See thoughts about that below in response to Andres' mail. > I'm not sure what the implications are with that and the other patch > you're working on to allow vacuum config changes mid-vacuum. We'll > need to be careful and not immediately break that if that gets > committed then this does or vice-versa. We can think hard about this. If we go with adding a TODO for the size, and keeping the same ring, it won't be a problem. On Wed, Apr 5, 2023 at 1:05 PM Andres Freund <andres@anarazel.de> wrote: > On 2023-04-04 13:53:15 -0400, Melanie Plageman wrote: > > > 8. I don't quite follow this comment: > > > > > > /* > > > * TODO: should this be 0 so that we are sure that vacuum() never > > > * allocates a new bstrategy for us, even if we pass in NULL for that > > > * parameter? maybe could change how failsafe NULLs out bstrategy if > > > * so? > > > */ > > > > > > Can you explain under what circumstances would vacuum() allocate a > > > bstrategy when do_autovacuum() would not? Is this a case of a config > > > reload where someone changes vacuum_buffer_usage_limit from 0 to > > > something non-zero? If so, perhaps do_autovacuum() needs to detect > > > this and allocate a strategy rather than having vacuum() do it once > > > per table (wastefully). > > Hm. I don't much like that we use a single strategy for multiple tables > today. That way even tiny tables never end up in shared_buffers. But that's > really a discussion for a different thread. However, if were to use a > per-table bstrategy, it'd be a lot easier to react to changes of the config. Agreed. I was wondering if it is okay to do the palloc()/pfree() for every table given that some may be small.
> I doubt it's worth adding complications to the code for changing the size of > the ringbuffer during an ongoing vacuum scan, at least for 16. Reacting to > enabling/disbling the ringbuffer alltogether seems a bit more important, but > still not crucial compared to making it configurable at all. > > I think it'd be OK to add a comment saying something like "XXX: In the future > we might want to react to configuration changes of the ring buffer size during > a vacuum" or such. I've added the XXX to the autovacuum code. I think you mean it also could be considered for VACUUM, but I've refrained from mentioning that for now. > WRT to the TODO specifically: Yes, passing in 0 seems to make sense. I don't > see a reason not to do so? But perhaps there's a better solution: I've done that (passed in a 0), I was concerned that future code may reference this vacuum param and expect it to be aligned with the Buffer Access Strategy in use. Really only vacuum() should be referencing the params, so, perhaps it is not an issue... Okay, now I've convinced myself that it is better to allocate the strategy in ExecVacuum(). Then we can get rid of the VacuumParams->ring_size altogether. I haven't done that in this version because of the below concern (re: it being appropriate to allocate the strategy in ExecVacuum() given its current concern/focus). > Perhaps the best solution for autovac vs interactive vacuum issue would be to > move the allocation of the bstrategy to ExecVacuum()? Thought about this -- I did think it might be a bit weird since ExecVacuum() mainly does option handling and sanity checking. Doing Buffer Access Strategy allocation seemed a bit out of place. I've left it as is, but would be happy to change it if the consensus is that this is better. > Random note while looking at the code: > ISTM that adding handling of -1 in GetAccessStrategyWithSize() would make the > code more readable. Instead of > > if (params->ring_size == -1) > { > if (VacuumBufferUsageLimit == -1) > bstrategy = GetAccessStrategy(BAS_VACUUM); > else > bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); > } > else > bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, params->ring_size); > > you could just have something like: > bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, > params->ring_size != -1 ? params->ring_size : VacuumBufferUsageLimit); > > by falling back to the default values from GetAccessStrategy(). > > Or even more "extremely", you could entirely remove references to > VacuumBufferUsageLimit and handle that in freelist.c Hmm. I see what you mean. I've updated it to this, which is a bit better. if (params->ring_size > -1) bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, params->ring_size); else if (VacuumBufferUsageLimit > -1) bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); else bstrategy = GetAccessStrategy(BAS_VACUUM); Not referencing VacuumBufferUsageLimit except in freelist.c is more challenging because I think it would be weird to have GetAccessStrategyWithSize() call GetAccessStrategy() which then calls GetAccessStrategyWithSize(). - Melanie
Attachment
On Wed, Apr 5, 2023 at 1:05 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-04-04 13:53:15 -0400, Melanie Plageman wrote: > > > 8. I don't quite follow this comment: > > > > > > /* > > > * TODO: should this be 0 so that we are sure that vacuum() never > > > * allocates a new bstrategy for us, even if we pass in NULL for that > > > * parameter? maybe could change how failsafe NULLs out bstrategy if > > > * so? > > > */ > > > > > > Can you explain under what circumstances would vacuum() allocate a > > > bstrategy when do_autovacuum() would not? Is this a case of a config > > > reload where someone changes vacuum_buffer_usage_limit from 0 to > > > something non-zero? If so, perhaps do_autovacuum() needs to detect > > > this and allocate a strategy rather than having vacuum() do it once > > > per table (wastefully). > > Hm. I don't much like that we use a single strategy for multiple tables > today. That way even tiny tables never end up in shared_buffers. But that's > really a discussion for a different thread. However, if were to use a > per-table bstrategy, it'd be a lot easier to react to changes of the config. > > > I doubt it's worth adding complications to the code for changing the size of > the ringbuffer during an ongoing vacuum scan, at least for 16. Reacting to > enabling/disbling the ringbuffer alltogether seems a bit more important, but > still not crucial compared to making it configurable at all. > > I think it'd be OK to add a comment saying something like "XXX: In the future > we might want to react to configuration changes of the ring buffer size during > a vacuum" or such. > > WRT to the TODO specifically: Yes, passing in 0 seems to make sense. I don't > see a reason not to do so? But perhaps there's a better solution: > > Perhaps the best solution for autovac vs interactive vacuum issue would be to > move the allocation of the bstrategy to ExecVacuum()? So, I started looking into allocating the bstrategy in ExecVacuum(). While doing so, I was trying to understand if the "sanity checking" in vacuum() could possibly apply to autovacuum, and I don't really see how. AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS. We could move those sanity checks up into ExecVacuum(). I also noticed that we make the vac_context in vacuum() which says it is for "cross-transaction storage". We use it for the buffer access strategy and for the newrels relation list created in vacuum(). Then we delete it at the end of vacuum(). Autovacuum workers already make a similar kind of memory context called AutovacMemCxt in do_autovacuum() which the comment says is for the list of relations to vacuum/analyze across transactions. What if we made ExecVacuum() make its own memory context and both it and do_autovacuum() pass that memory context (along with the buffer access strategy they make) to vacuum(), which then uses the memory context in the same way it does now? It simplifies the buffer usage limit patchset and also seems a bit more clear than what is there now? - Melanie
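[Editor's note: a minimal sketch of what that proposal might look like inside ExecVacuum(). The extra vacuum() parameter and the exact placement are the proposal's, not existing code; everything else uses the memory-context API as the thread describes it.]

	/* inside ExecVacuum(), after option parsing and sanity checks */
	MemoryContext vac_context;
	MemoryContext old_context;
	BufferAccessStrategy bstrategy;

	/* cross-transaction storage, now owned by ExecVacuum() instead of vacuum() */
	vac_context = AllocSetContextCreate(PortalContext,
										"Vacuum",
										ALLOCSET_DEFAULT_SIZES);

	/* the strategy must live there too, since it outlives transactions */
	old_context = MemoryContextSwitchTo(vac_context);
	bstrategy = GetAccessStrategy(BAS_VACUUM);
	MemoryContextSwitchTo(old_context);

	/* hypothetical extra parameter: vacuum() works in the caller's context */
	vacuum(vacstmt->rels, &params, bstrategy, vac_context, isTopLevel);

	MemoryContextDelete(vac_context);

do_autovacuum() would do the equivalent with its own context and worker-wide strategy.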
Hi, On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote: > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund <andres@anarazel.de> wrote: > > Perhaps the best solution for autovac vs interactive vacuum issue would be to > > move the allocation of the bstrategy to ExecVacuum()? > > So, I started looking into allocating the bstrategy in ExecVacuum(). > > While doing so, I was trying to understand if the "sanity checking" in > vacuum() could possibly apply to autovacuum, and I don't really see how. > > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS. > > We could move those sanity checks up into ExecVacuum(). Would make sense. ISTM that eventually most of what currently happens in vacuum() should be in ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it just seems to make more sense to move those parts to ExecVacuum(). > I also noticed that we make the vac_context in vacuum() which says it is > for "cross-transaction storage". We use it for the buffer access > strategy and for the newrels relation list created in vacuum(). Then we > delete it at the end of vacuum(). > Autovacuum workers already make a similar kind of memory context called > AutovacMemCxt in do_autovacuum() which the comment says is for the list > of relations to vacuum/analyze across transactions. AutovacMemCxt seems to be a bit longer lived / cover more than the context created in vacuum(). It's where all the hash tables etc live that do_autovacuum() uses to determine what to vacuum. Note that do_autovacuum() also creates: /* * create a memory context to act as fake PortalContext, so that the * contexts created in the vacuum code are cleaned up for each table. */ PortalContext = AllocSetContextCreate(AutovacMemCxt, "Autovacuum Portal", ALLOCSET_DEFAULT_SIZES); which is then what vacuum() creates the "Vacuum" context in. > What if we made ExecVacuum() make its own memory context and both it and > do_autovacuum() pass that memory context (along with the buffer access > strategy they make) to vacuum(), which then uses the memory context in > the same way it does now? Maybe? It's not clear to me why it'd be a win. > It simplifies the buffer usage limit patchset and also seems a bit more > clear than what is there now? I don't really see what it'd make simpler? The context in vacuum() is used for just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much longer (for all the tables a autovac worker processes). Greetings, Andres Freund
On Wed, 5 Apr 2023 at 16:37, Peter Geoghegan <pg@bowt.ie> wrote: > > On Tue, Apr 4, 2023 at 8:14 PM David Rowley <dgrowleyml@gmail.com> wrote: > > 14. Not related to this patch, but why do we have half the vacuum > > related GUCs in vacuum.c and the other half in globals.c? I see > > vacuum_freeze_table_age is defined in vacuum.c but is also needed in > > autovacuum.c, so that rules out the globals.c ones being for vacuum.c > > and autovacuum.c. It seems a bit messy. I'm not really sure where > > VacuumBufferUsageLimit should go now. > > vacuum_freeze_table_age is an abomination, even compared to the rest > of these GUCs. It was added around the time the visibility map first > went in, and so is quite a bit more recent than > autovacuum_freeze_max_age. > > Before the introduction of the visibility map, we only had > autovacuum_freeze_max_age, and it was used to schedule antiwraparound > autovacuums -- there was no such thing as aggressive VACUUMs (just > antiwraparound autovacuums), and no need for autovacuum_freeze_max_age > in vacuum.c at all. So autovacuum_freeze_max_age was just for autovacuum.c code. > There was only one type of VACUUM, and they always advanced > relfrozenxid to the same degree. > > With the introduction of the visibility map, we needed to have > autovacuum_freeze_max_age in vacuum.c for the first time, to deal with > interpreting the then-new vacuum_freeze_table_age GUC correctly. We > silently truncate the vacuum_freeze_table_age setting so that it never > exceeds 95% of autovacuum_freeze_max_age (the > 105%-of-autovacuum_freeze_max_age vacuum_failsafe_age thing that > you're discussing is symmetric). So after 2009 > autovacuum_freeze_max_age actually plays an important role in VACUUM, > the command, and not just autovacuum. > > This is related to the problem of the autovacuum_freeze_max_age > reloption being completely broken [1]. If autovacuum_freeze_max_age > was still purely just an autovacuum.c scheduling thing, then there > would be no problem with having a separate reloption of the same name. > There are big problems precisely because vacuum.c doesn't do anything > with the autovacuum_freeze_max_age reloption. Though it does okay with > the autovacuum_freeze_table_age reloption, which gets passed in. (Yes, > it's called autovacuum_freeze_table_age in reloption form -- not > vacuum_freeze_table_age, like the GUC). > > Note that the decision to ignore the reloption version of > autovacuum_freeze_max_age in the failsafe's > 105%-of-autovacuum_freeze_max_age thing was a deliberate one. The > autovacuum_freeze_max_age GUC is authoritative in the sense that it > cannot be overridden locally, except in the direction of making > aggressive VACUUMs happen more frequently for a given table (so they > can't be less frequent via reloption configuration). I suppose you'd > have to untangle that mess if you wanted to fix the > autovacuum_freeze_max_age reloption issue I go into in [1]. > > [1] https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com I read this twice yesterday and again this morning. It looks like you're taking an opportunity to complain/vent about vacuum_freeze_table_age and didn't really answer my query about why all the vacuum GUCs aren't defined in the one file. I'd just picked vacuum_freeze_table_age as a random one from vacuum.c to raise the point about the inconsistency of the GUC locations.
I don't think this is the place to raise concerns with existing GUCs, but if you did have a point about the GUC locations, then you might need to phrase it differently as I didn't catch it. David
On Wed, Apr 5, 2023 at 2:33 PM David Rowley <dgrowleyml@gmail.com> wrote: > I read this twice yesterday and again this morning. It looks like > you're taking an opportunity to complain/vent about > vacuum_freeze_table_age and didn't really answer my query about why > all the vacuum GUCs aren't defined in the one file. I'd just picked > vacuum_freeze_table_age as a random one from vacuum.c to raise the > point about the inconsistency about the GUC locations. I thought that the point was obvious. Which is: the current situation with the locations of these GUCs came about because the division between autovacuum and VACUUM used to be a lot clearer, but that changed. Without the locations of the GUCs also changing. More generally, the current structure has lots of problems. And so it seems to me that you're probably not wrong to suspect that it just doesn't make much sense to keep them in different files now. -- Peter Geoghegan
On Wed, Apr 5, 2023 at 5:15 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote: > > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund <andres@anarazel.de> wrote: > > > Perhaps the best solution for autovac vs interactive vacuum issue would be to > > > move the allocation of the bstrategy to ExecVacuum()? > > > > So, I started looking into allocating the bstrategy in ExecVacuum(). > > > > While doing so, I was trying to understand if the "sanity checking" in > > vacuum() could possibly apply to autovacuum, and I don't really see how. > > > > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or > > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS. > > > > We could move those sanity checks up into ExecVacuum(). > > Would make sense. > > ISTM that eventually most of what currently happens in vacuum() should be in > ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it > just seems to make more sense to move those parts to ExecVacuum(). I've done that in the attached wip patch. It is perhaps too much of a change, I dunno. > > I also noticed that we make the vac_context in vacuum() which says it is > > for "cross-transaction storage". We use it for the buffer access > > strategy and for the newrels relation list created in vacuum(). Then we > > delete it at the end of vacuum(). > > > Autovacuum workers already make a similar kind of memory context called > > AutovacMemCxt in do_autovacuum() which the comment says is for the list > > of relations to vacuum/analyze across transactions. > > AutovacMemCxt seems to be a bit longer lived / cover more than the context > created in vacuum(). It's where all the hash tables etc live that > do_autovacuum() uses to determine what to vacuum. > > Note that do_autovacuum() also creates: > > /* > * create a memory context to act as fake PortalContext, so that the > * contexts created in the vacuum code are cleaned up for each table. > */ > PortalContext = AllocSetContextCreate(AutovacMemCxt, > "Autovacuum Portal", > ALLOCSET_DEFAULT_SIZES); > > which is then what vacuum() creates the "Vacuum" context in. Yea, I realized that when writing the patch. > > What if we made ExecVacuum() make its own memory context and both it and > > do_autovacuum() pass that memory context (along with the buffer access > > strategy they make) to vacuum(), which then uses the memory context in > > the same way it does now? > > Maybe? It's not clear to me why it'd be a win. Less that it is a win and more that we need access to that memory context when allocating the buffer access strategy, so we would have had to make it in ExecVacuum(). And if we have already made it, we would need to pass it in to vacuum() for it to use. > > It simplifies the buffer usage limit patchset and also seems a bit more > > clear than what is there now? > > I don't really see what it'd make simpler? The context in vacuum() is used for > just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much > longer (for all the tables a autovac worker processes). Autovacuum already made the BufferAccessStrategy in the AutovacMemCxt, so this is the same behavior. I simply made autovacuum_do_vac_analyze() make the per table vacuum memory context and pass that to vacuum(). So we have the same amount of memory context granularity as before. Attached patchset has some kind of isolation test failure due to a hard deadlock that I haven't figured out yet. 
I thought it was something with the "in_vacuum" static variable and having VACUUM or ANALYZE called when already in VACUUM or ANALYZE, but that variable is the same as in master. I've mostly shared it because I want to know if this approach is worth pursuing or not. Also, while working on it, I noticed that I made a mistake in the code that was committed in 4830f102 and didn't remember that we should still make a Buffer Access Strategy in the case of VACUUM (FULL, ANALYZE). Changing this: if ((params->options & (VACOPT_ONLY_DATABASE_STATS | VACOPT_FULL)) == 0) to this: if ((params.options & VACOPT_ONLY_DATABASE_STATS) == 0 || (params.options & VACOPT_FULL && (params.options & VACOPT_ANALYZE) == 0)) should fix it. - Melanie
Attachment
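[Editor's note: a rough sketch of the autovacuum side described above, i.e. autovacuum_do_vac_analyze() creating the per-table context and handing it, plus the worker-wide strategy, to vacuum(). The extra vacuum() arguments follow the wip patch's description; treat the details as illustrative, not as the committed form.]

	static void
	autovacuum_do_vac_analyze(autovac_table *tab, BufferAccessStrategy bstrategy)
	{
		RangeVar   *rangevar;
		VacuumRelation *rel;
		List	   *rel_list;
		MemoryContext vac_context;

		/* Let pgstat know what we're doing */
		autovac_report_activity(tab);

		/* Set up one VacuumRelation target, identified by OID, for vacuum() */
		rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
		rel = makeVacuumRelation(rangevar, tab->at_relid, NIL);
		rel_list = list_make1(rel);

		/* Per-table context, thrown away as soon as this table is done */
		vac_context = AllocSetContextCreate(CurrentMemoryContext,
											"Vacuum",
											ALLOCSET_DEFAULT_SIZES);

		vacuum(rel_list, &tab->at_params, bstrategy, vac_context, true);

		MemoryContextDelete(vac_context);
	}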
On Wed, Apr 5, 2023 at 6:55 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Wed, Apr 5, 2023 at 5:15 PM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote: > > > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund <andres@anarazel.de> wrote: > > > > Perhaps the best solution for autovac vs interactive vacuum issue would be to > > > > move the allocation of the bstrategy to ExecVacuum()? > > > > > > So, I started looking into allocating the bstrategy in ExecVacuum(). > > > > > > While doing so, I was trying to understand if the "sanity checking" in > > > vacuum() could possibly apply to autovacuum, and I don't really see how. > > > > > > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or > > > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS. > > > > > > We could move those sanity checks up into ExecVacuum(). > > > > Would make sense. > > > > ISTM that eventually most of what currently happens in vacuum() should be in > > ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it > > just seems to make more sense to move those parts to ExecVacuum(). > > I've done that in the attached wip patch. It is perhaps too much of a > change, I dunno. > > > > I also noticed that we make the vac_context in vacuum() which says it is > > > for "cross-transaction storage". We use it for the buffer access > > > strategy and for the newrels relation list created in vacuum(). Then we > > > delete it at the end of vacuum(). > > > > > Autovacuum workers already make a similar kind of memory context called > > > AutovacMemCxt in do_autovacuum() which the comment says is for the list > > > of relations to vacuum/analyze across transactions. > > > > AutovacMemCxt seems to be a bit longer lived / cover more than the context > > created in vacuum(). It's where all the hash tables etc live that > > do_autovacuum() uses to determine what to vacuum. > > > > Note that do_autovacuum() also creates: > > > > /* > > * create a memory context to act as fake PortalContext, so that the > > * contexts created in the vacuum code are cleaned up for each table. > > */ > > PortalContext = AllocSetContextCreate(AutovacMemCxt, > > "Autovacuum Portal", > > ALLOCSET_DEFAULT_SIZES); > > > > which is then what vacuum() creates the "Vacuum" context in. > > Yea, I realized that when writing the patch. > > > > What if we made ExecVacuum() make its own memory context and both it and > > > do_autovacuum() pass that memory context (along with the buffer access > > > strategy they make) to vacuum(), which then uses the memory context in > > > the same way it does now? > > > > Maybe? It's not clear to me why it'd be a win. > > Less that it is a win and more that we need access to that memory > context when allocating the buffer access strategy, so we would have had > to make it in ExecVacuum(). And if we have already made it, we would > need to pass it in to vacuum() for it to use. > > > > It simplifies the buffer usage limit patchset and also seems a bit more > > > clear than what is there now? > > > > I don't really see what it'd make simpler? The context in vacuum() is used for > > just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much > > longer (for all the tables a autovac worker processes). > > Autovacuum already made the BufferAccessStrategy in the AutovacMemCxt, > so this is the same behavior. I simply made autovacuum_do_vac_analyze() > make the per table vacuum memory context and pass that to vacuum(). 
So > we have the same amount of memory context granularity as before. > > Attached patchset has some kind of isolation test failure due to a hard > deadlock that I haven't figured out yet. I thought it was something with > the "in_vacuum" static variable and having VACUUM or ANALYZE called when > already in VACUUM or ANALYZE, but that variable is the same as in > master. > > I've mostly shared it because I want to know if this approach is worth > pursuing or not. Figured out how to fix the issue, though I'm not sure I understand how the issue can occur. use_own_xacts seems like it will always be true for autovacuum when it calls vacuum() and ExecVacuum() only calls vacuum() once, so I thought that I could make use_own_xacts a parameter to vacuum() and push up its calculation for VACUUM and ANALYZE into ExecVacuum(). This caused a deadlock, so there must be a way that in_vacuum is false but vacuum() is called in a nested context. Anyway, recalculating it every time in vacuum() fixes it. Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which includes a commit to fix the bug in master and a commit to move relevant code from vacuum() up into ExecVacuum(). The logic I suggested earlier for fixing the bug was...not right. Attached fix should be right? - Melanie
Attachment
On Thu, 6 Apr 2023 at 12:42, Melanie Plageman <melanieplageman@gmail.com> wrote: > Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which > includes a commit to fix the bug in master and a commit to move relevant > code from vacuum() up into ExecVacuum(). I'm still playing catch up to the moving of the pre-checks from vacuum() to ExecVacuum(). I'm now wondering... Is it intended that VACUUM t1,t2; now share the same strategy? Currently, in master, we'll allocate a new strategy for t2 after vacuuming t1. Does this not mean we'll now leave fewer t1 pages in shared_buffers because the reuse of the strategy will force them out with t2 pages? I understand there's nothing particularly invalid about that, but it is a change in behaviour that the patch seems to be making with very little consideration as to if it's better or worse. David
On Wed, Apr 5, 2023 at 9:15 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 6 Apr 2023 at 12:42, Melanie Plageman <melanieplageman@gmail.com> wrote: > > Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which > > includes a commit to fix the bug in master and a commit to move relevant > > code from vacuum() up into ExecVacuum(). > > I'm still playing catch up to the moving of the pre-checks from > vacuum() to ExecVacuum(). I'm now wondering... > > Is it intended that VACUUM t1,t2; now share the same strategy? > Currently, in master, we'll allocate a new strategy for t2 after > vacuuming t1. Does this not mean we'll now leave fewer t1 pages in > shared_buffers because the reuse of the strategy will force them out > with t2 pages? I understand there's nothing particularly invalid > about that, but it is a change in behaviour that the patch seems to be > making with very little consideration as to if it's better or worse. I'm pretty sure that in master we also reuse the strategy since we make it above this loop in vacuum() (and pass it in) /* * Loop to process each selected relation. */ foreach(cur, relations) { VacuumRelation *vrel = lfirst_node(VacuumRelation, cur); if (params->options & VACOPT_VACUUM) { if (!vacuum_rel(vrel->oid, vrel->relation, params, false, bstrategy)) continue; } On Wed, Apr 5, 2023 at 8:41 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Wed, Apr 5, 2023 at 6:55 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > On Wed, Apr 5, 2023 at 5:15 PM Andres Freund <andres@anarazel.de> wrote: > > > > > > Hi, > > > > > > On 2023-04-05 16:17:20 -0400, Melanie Plageman wrote: > > > > On Wed, Apr 5, 2023 at 1:05 PM Andres Freund <andres@anarazel.de> wrote: > > > > > Perhaps the best solution for autovac vs interactive vacuum issue would be to > > > > > move the allocation of the bstrategy to ExecVacuum()? > > > > > > > > So, I started looking into allocating the bstrategy in ExecVacuum(). > > > > > > > > While doing so, I was trying to understand if the "sanity checking" in > > > > vacuum() could possibly apply to autovacuum, and I don't really see how. > > > > > > > > AFAICT, autovacuum does not ever set VACOPT_DISABLE_PAGE_SKIPPING or > > > > VACOPT_FULL or VACOPT_ONLY_DATABASE_STATS. > > > > > > > > We could move those sanity checks up into ExecVacuum(). > > > > > > Would make sense. > > > > > > ISTM that eventually most of what currently happens in vacuum() should be in > > > ExecVacuum(). There's a lot of stuff that can't happen for autovacuum. So it > > > just seems to make more sense to move those parts to ExecVacuum(). > > > > I've done that in the attached wip patch. It is perhaps too much of a > > change, I dunno. > > > > > > I also noticed that we make the vac_context in vacuum() which says it is > > > > for "cross-transaction storage". We use it for the buffer access > > > > strategy and for the newrels relation list created in vacuum(). Then we > > > > delete it at the end of vacuum(). > > > > > > > Autovacuum workers already make a similar kind of memory context called > > > > AutovacMemCxt in do_autovacuum() which the comment says is for the list > > > > of relations to vacuum/analyze across transactions. > > > > > > AutovacMemCxt seems to be a bit longer lived / cover more than the context > > > created in vacuum(). It's where all the hash tables etc live that > > > do_autovacuum() uses to determine what to vacuum. 
> > > > > > Note that do_autovacuum() also creates: > > > > > > /* > > > * create a memory context to act as fake PortalContext, so that the > > > * contexts created in the vacuum code are cleaned up for each table. > > > */ > > > PortalContext = AllocSetContextCreate(AutovacMemCxt, > > > "Autovacuum Portal", > > > ALLOCSET_DEFAULT_SIZES); > > > > > > which is then what vacuum() creates the "Vacuum" context in. > > > > Yea, I realized that when writing the patch. > > > > > > What if we made ExecVacuum() make its own memory context and both it and > > > > do_autovacuum() pass that memory context (along with the buffer access > > > > strategy they make) to vacuum(), which then uses the memory context in > > > > the same way it does now? > > > > > > Maybe? It's not clear to me why it'd be a win. > > > > Less that it is a win and more that we need access to that memory > > context when allocating the buffer access strategy, so we would have had > > to make it in ExecVacuum(). And if we have already made it, we would > > need to pass it in to vacuum() for it to use. > > > > > > It simplifies the buffer usage limit patchset and also seems a bit more > > > > clear than what is there now? > > > > > > I don't really see what it'd make simpler? The context in vacuum() is used for > > > just that vacuum - we couldn't just use AutovacMemCxt, as that'd live much > > > longer (for all the tables a autovac worker processes). > > > > Autovacuum already made the BufferAccessStrategy in the AutovacMemCxt, > > so this is the same behavior. I simply made autovacuum_do_vac_analyze() > > make the per table vacuum memory context and pass that to vacuum(). So > > we have the same amount of memory context granularity as before. > > > > Attached patchset has some kind of isolation test failure due to a hard > > deadlock that I haven't figured out yet. I thought it was something with > > the "in_vacuum" static variable and having VACUUM or ANALYZE called when > > already in VACUUM or ANALYZE, but that variable is the same as in > > master. > > > > I've mostly shared it because I want to know if this approach is worth > > pursuing or not. > > Figured out how to fix the issue, though I'm not sure I understand how > the issue can occur. > use_own_xacts seems like it will always be true for autovacuum when it > calls vacuum() and ExecVacuum() only calls vacuum() once, so I thought > that I could make use_own_xacts a parameter to vacuum() and push up its > calculation for VACUUM and ANALYZE into ExecVacuum(). > This caused a deadlock, so there must be a way that in_vacuum is false > but vacuum() is called in a nested context. > Anyway, recalculating it every time in vacuum() fixes it. > > Attached is a v12 of the whole vacuum_buffer_usage_limit patch set which > includes a commit to fix the bug in master and a commit to move relevant > code from vacuum() up into ExecVacuum(). > > The logic I suggested earlier for fixing the bug was...not right. > Attached fix should be right? David had already pushed a fix, so the patchset had merge conflicts. Attached v13 should work. - Melanie
Attachment
On Thu, 6 Apr 2023 at 13:14, David Rowley <dgrowleyml@gmail.com> wrote: > Is it intended that VACUUM t1,t2; now share the same strategy? > Currently, in master, we'll allocate a new strategy for t2 after > vacuuming t1. Does this not mean we'll now leave fewer t1 pages in > shared_buffers because the reuse of the strategy will force them out > with t2 pages? I understand there's nothing particularly invalid > about that, but it is a change in behaviour that the patch seems to be > making with very little consideration as to if it's better or worse. Actually, never mind that. I'm wrong. The same strategy is used for both tables before and after this change. I stumbled on thinking vacuum() was being called in a loop from ExecVacuum() rather than it just passing all of the relations to vacuum() and the loop being done inside vacuum(), which it does. David
Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT option. I haven't included a test for VACUUM (BUFFER_USAGE_LIMIT x, PARALLEL x) for the reason I mentioned upthread -- even if we force it to actually do the parallel vacuuming, we would be exercising the code where parallel vacuum workers make their own buffer access strategy rings, but not really adding a test that will fail usefully. If something is wrong with the configurability of the buffer access strategy object, I don't see how it will break differently in parallel vacuum workers vs regular vacuum. - Melanie
Attachment
On Thu, 6 Apr 2023 at 14:14, Melanie Plageman <melanieplageman@gmail.com> wrote: > > Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT > option. I've spent quite a bit of time looking at this since you sent it. I've also made quite a few changes, mostly cosmetic ones, but there are a few things below which are more fundamental. 1. I don't really think we need to support VACUUM (BUFFER_USAGE_LIMIT -1); It's just the same as VACUUM; Removing that makes the documentation more simple. 2. I don't think we really need to allow vacuum_buffer_usage_limit = -1. I think we can just set this to 256 and leave it. If we allow -1 then we need to document what -1 means. The more I think about it, the more strange it seems to allow -1. I can't quite imagine work_mem = -1 means 4MB. Why 4MB? Changing this means we don't really need to do anything special in: + if (VacuumBufferUsageLimit > -1) + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); + else + bstrategy = GetAccessStrategy(BAS_VACUUM); That simply becomes: bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); The code inside GetAccessStrategyWithSize() handles returning NULL when the GUC is zero. The equivalent in ExecVacuum() becomes: if (ring_size > -1) bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size); else bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); instead of: if (ring_size > -1) bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size); else if (VacuumBufferUsageLimit > -1) bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); else bstrategy = GetAccessStrategy(BAS_VACUUM); 3. There was a bug in GetAccessStrategyBufferCount() (I renamed it to that from StrategyGetBufferCount()) that didn't handle NULL input. The problem was that if you set vacuum_buffer_usage_limit = 0 then did a parallel vacuum, GetAccessStrategyWithSize() would return NULL due to the 0 buffer input, but GetAccessStrategyBufferCount() couldn't handle NULL. I've adjusted GetAccessStrategyBufferCount() just to return 0 on NULL input. Most of the rest is cosmetic. GetAccessStrategyWithSize() ended up looking quite different. I didn't see the sense in converting the shared_buffer size into kilobytes to compare when we could just convert ring_size_kb to buffers slightly sooner and then just do: /* Cap to 1/8th of shared_buffers */ ring_buffers = Min(NBuffers / 8, ring_buffers); I renamed nbuffers to ring_buffers as it was a little too confusing to have nbuffers (for ring size) and NBuffers (for shared_buffers). A few other changes like getting rid of the regression test and code check for VACUUM (ONLY_DATABASE_STATS, BUFFER_USAGE_LIMIT 0); There is already an if check and ERROR that looks for ONLY_DATABASE_STATS with any other option slightly later in the function. I also got rid of the documentation that mentioned that wasn't supported as there's already a mention in the ONLY_DATABASE_STATS which says it's not supported with anything else. No other option seemed to care enough to mention it, so I don't think BUFFER_USAGE_LIMIT is an exception. I've attached v15. I've only glanced at the vacuumdb patch so far. I'm not expecting it to be too controversial. I'm fairly happy with v15 now but would welcome anyone who wants to have a look in the next 8 hours or so, else I plan to push it. David
Attachment
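[Editor's note: the NULL-handling fix in point 3 is small; the accessor presumably ends up looking roughly like this, with nbuffers being the ring size stored in the patch's BufferAccessStrategyData.]

	/*
	 * GetAccessStrategyBufferCount -- how many buffers are in the ring?
	 *
	 * Returns 0 when no strategy is in use, which is what a disabled ring
	 * (vacuum_buffer_usage_limit = 0) produces.
	 */
	int
	GetAccessStrategyBufferCount(BufferAccessStrategy strategy)
	{
		if (strategy == NULL)
			return 0;

		return strategy->nbuffers;
	}

Returning 0 for NULL lets callers such as parallel vacuum treat "no strategy" uniformly instead of branching on NULL themselves.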
On Thu, Apr 06, 2023 at 11:34:44PM +1200, David Rowley wrote: > second On Thu, 6 Apr 2023 at 14:14, Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT > > option. > > I've spent quite a bit of time looking at this since you sent it. I've > also made quite a few changes, mostly cosmetic ones, but there are a > few things below which are more fundamental. > > 1. I don't really think we need to support VACUUM (BUFFER_USAGE_LIMIT > -1); It's just the same as VACUUM; Removing that makes the > documentation more simple. Agreed. > 2. I don't think we really need to allow vacuum_buffer_usage_limit = > -1. I think we can just set this to 256 and leave it. If we allow -1 > then we need to document what -1 means. The more I think about it, the > more strange it seems to allow -1. I can't quite imagine work_mem = -1 > means 4MB. Why 4MB? Agreed. > 3. There was a bug in GetAccessStrategyBufferCount() (I renamed it to > that from StrategyGetBufferCount()) that didn't handle NULL input. The > problem was that if you set vacuum_buffer_usage_limit = 0 then did a > parallel vacuum that GetAccessStrategyWithSize() would return NULL due > to the 0 buffer input, but GetAccessStrategyBufferCount() couldn't > handle NULL. I've adjusted GetAccessStrategyBufferCount() just to > return 0 on NULL input. Good catch. > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml > index bcc49aec45..c421da348d 100644 > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -2001,6 +2001,35 @@ include_dir 'conf.d' > </listitem> > </varlistentry> > > + <varlistentry id="guc-vacuum-buffer-usage-limit" xreflabel="vacuum_buffer_usage_limit"> > + <term> > + <varname>vacuum_buffer_usage_limit</varname> (<type>integer</type>) > + <indexterm> > + <primary><varname>vacuum_buffer_usage_limit</varname> configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Specifies the size of <varname>shared_buffers</varname> to be reused > + for each backend participating in a given invocation of > + <command>VACUUM</command> or <command>ANALYZE</command> or in > + autovacuum. This size is converted to the number of shared buffers > + which will be reused as part of a > + <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>. > + A setting of <literal>0</literal> will allow the operation to use any > + number of <varname>shared_buffers</varname>. The maximum size is > + <literal>16 GB</literal> and the minimum size is > + <literal>128 KB</literal>. If the specified size would exceed 1/8 the > + size of <varname>shared_buffers</varname>, it is silently capped to > + that value. The default value is <literal>-1</literal>. If this This still says that the default value is -1. > diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml > index b6d30b5764..0b02d9faef 100644 > --- a/doc/src/sgml/ref/vacuum.sgml > +++ b/doc/src/sgml/ref/vacuum.sgml > @@ -345,6 +346,24 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet > </listitem> > </varlistentry> > > + <varlistentry> > + <term><literal>BUFFER_USAGE_LIMIT</literal></term> > + <listitem> > + <para> > + Specifies the > + <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm> > + ring buffer size for <command>VACUUM</command>. This size is used to > + calculate the number of shared buffers which will be reused as part of > + this strategy. 
<literal>0</literal> disables use of a > + <literal>Buffer Access Strategy</literal>. If <option>ANALYZE</option> > + is also specified, the <option>BUFFER_USAGE_LIMIT</option> value is used > + for both the vacuum and analyze stages. This option can't be used with > + the <option>FULL</option> option except if <option>ANALYZE</option> is > + also specified. > + </para> > + </listitem> > + </varlistentry> I noticed you seemed to have removed the reference to the GUC vacuum_buffer_usage_limit here. Was that intentional? We may not need to mention "falling back" as I did before, however, the GUC docs mention max/min values and such, which might be useful. > diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c > index ea1d8960f4..e92738c7f0 100644 > --- a/src/backend/commands/vacuum.c > +++ b/src/backend/commands/vacuum.c > @@ -56,6 +56,7 @@ > #include "utils/acl.h" > #include "utils/fmgroids.h" > #include "utils/guc.h" > +#include "utils/guc_hooks.h" > #include "utils/memutils.h" > #include "utils/pg_rusage.h" > #include "utils/snapmgr.h" > @@ -95,6 +96,30 @@ static VacOptValue get_vacoptval_from_boolean(DefElem *def); > static bool vac_tid_reaped(ItemPointer itemptr, void *state); > static int vac_cmp_itemptr(const void *left, const void *right); > > +/* > + * GUC check function to ensure GUC value specified is within the allowable > + * range. > + */ > +bool > +check_vacuum_buffer_usage_limit(int *newval, void **extra, > + GucSource source) > +{ > + /* Allow specifying the default or disabling Buffer Access Strategy */ > + if (*newval == -1 || *newval == 0) > + return true; This should not check for -1 since that isn't the default anymore. It should only need to check for 0 I think? > + /* Value upper and lower hard limits are inclusive */ > + if (*newval >= MIN_BAS_VAC_RING_SIZE_KB && > + *newval <= MAX_BAS_VAC_RING_SIZE_KB) > + return true; > + > + /* Value does not fall within any allowable range */ > + GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be -1, 0 or between %d KB and %d KB", > + MIN_BAS_VAC_RING_SIZE_KB, MAX_BAS_VAC_RING_SIZE_KB); Also remove -1 here. > * Primary entry point for manual VACUUM and ANALYZE commands > * > @@ -114,6 +139,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) > bool disable_page_skipping = false; > bool process_main = true; > bool process_toast = true; > + /* by default use buffer access strategy with default size */ > + int ring_size = -1; We need to update this comment to something like, "use an invalid value for ring_size" so it is clear whether or not the BUFFER_USAGE_LIMIT was specified when making the access strategy later". Also, I think just removing the comment would be okay, because this is the normal behavior for initializing values, I think. > @@ -240,6 +309,17 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("VACUUM FULL cannot be performed in parallel"))); > > + /* > + * BUFFER_USAGE_LIMIT does nothing for VACUUM (FULL) so just raise an > + * ERROR for that case. VACUUM (FULL, ANALYZE) does make use of it, so > + * we'll permit that. > + */ > + if ((params.options & VACOPT_FULL) && !(params.options & VACOPT_ANALYZE) && > + ring_size > -1) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL"))); > + > /* > * Make sure VACOPT_ANALYZE is specified if any column lists are present. 
> */ > @@ -341,7 +421,20 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) > > MemoryContext old_context = MemoryContextSwitchTo(vac_context); > > - bstrategy = GetAccessStrategy(BAS_VACUUM); Is it worth moving this assert up above when we do the "sanity checking" for VACUUM FULL with BUFFER_USAGE_LIMIT? > + Assert(ring_size >= -1); > + /* > + * If BUFFER_USAGE_LIMIT was specified by the VACUUM command, that > + * overrides the value of VacuumBufferUsageLimit. Otherwise, use > + * VacuumBufferUsageLimit to define the size, which might be 0. We > + * expliot that calling GetAccessStrategyWithSize with a zero size s/expliot/exploit I might rephrase the last sentence(s). Since it overrides it, I think it is clear that if it is not specified, then the thing it overrides is used. Then you could phrase the whole thing like: "If BUFFER_USAGE_LIMIT was specified by the VACUUM or ANALYZE command, it overrides the value of VacuumBufferUsageLimit. Either value may be 0, in which case GetAccessStrategyWithSize() will return NULL, which is the expected behavior." > + * returns NULL. > + */ > + if (ring_size > -1) > + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size); > + else > + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit); > + > MemoryContextSwitchTo(old_context); > } > diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c > index c1e911b1b3..1b5f779384 100644 > --- a/src/backend/postmaster/autovacuum.c > +++ b/src/backend/postmaster/autovacuum.c > @@ -2287,11 +2287,21 @@ do_autovacuum(void) > /* > - * Create a buffer access strategy object for VACUUM to use. We want to > - * use the same one across all the vacuum operations we perform, since the > - * point is for VACUUM not to blow out the shared cache. > + * Optionally, create a buffer access strategy object for VACUUM to use. > + * When creating one, we want the same one across all tables being > + * vacuumed this helps prevent autovacuum from blowing out shared buffers. "When creating one" is a bit awkward. I would say something like "Use the same BufferAccessStrategy object for all tables VACUUMed by this worker to prevent autovacuum from blowing out shared buffers." > diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c > index f122709fbe..710b05cbc5 100644 > --- a/src/backend/storage/buffer/freelist.c > +++ b/src/backend/storage/buffer/freelist.c > +/* > + * GetAccessStrategyWithSize -- create a BufferAccessStrategy object with a > + * number of buffers equivalent to the passed in size. > + * > + * If the given ring size is 0, no BufferAccessStrategy will be created and > + * the function will return NULL. The ring size may not be negative. > + */ > +BufferAccessStrategy > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size_kb) > +{ > + int ring_buffers; > + BufferAccessStrategy strategy; > + > + Assert(ring_size_kb >= 0); > + > + /* Figure out how many buffers ring_size_kb is */ > + ring_buffers = ring_size_kb / (BLCKSZ / 1024); Is there any BLCKSZ that could end up rounding down to 0 and resulting in a divide by 0? 
> diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c > index 1b1d814254..011ec18015 100644 > --- a/src/backend/utils/init/globals.c > +++ b/src/backend/utils/init/globals.c > @@ -139,7 +139,10 @@ int max_worker_processes = 8; > int max_parallel_workers = 8; > int MaxBackends = 0; > > -int VacuumCostPageHit = 1; /* GUC parameters for vacuum */ > +/* GUC parameters for vacuum */ > +int VacuumBufferUsageLimit = 256; So, I know we agreed to make it camel cased, but I couldn't help mentioning the discussion over in [1] in which Sawada-san says: > In vacuum.c, we use snake case for GUC parameters and camel case for > other global variables Our variable doesn't have a corresponding global that is not a GUC, and the current pattern is hardly consistent. But, I know we are discussing following this convention, so I thought I would mention it. > diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h > index 06a86f9ac1..d4f9ff8077 100644 > --- a/src/include/miscadmin.h > +++ b/src/include/miscadmin.h > @@ -263,6 +263,18 @@ extern PGDLLIMPORT double hash_mem_multiplier; > extern PGDLLIMPORT int maintenance_work_mem; > extern PGDLLIMPORT int max_parallel_maintenance_workers; > GUC name mentioned in comment is inconsistent with current GUC name. > +/* > + * Upper and lower hard limits for the buffer access strategy ring size > + * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT > + * option to VACUUM and ANALYZE. > + */ > +#define MIN_BAS_VAC_RING_SIZE_KB 128 > +#define MAX_BAS_VAC_RING_SIZE_KB (16 * 1024 * 1024) > diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql > index a1fad43657..d23e1a8ced 100644 > --- a/src/test/regress/sql/vacuum.sql > +++ b/src/test/regress/sql/vacuum.sql > @@ -272,6 +272,18 @@ SELECT t.relfilenode = :toast_filenode AS is_same_toast_filenode > FROM pg_class c, pg_class t > WHERE c.reltoastrelid = t.oid AND c.relname = 'vac_option_tab'; > > +-- BUFFER_USAGE_LIMIT option > +VACUUM (BUFFER_USAGE_LIMIT '512 kB') vac_option_tab; Is it worth adding a VACUUM (BUFFER_USAGE_LIMIT 0) vac_option_tab test? - Melanie [1] https://www.postgresql.org/message-id/CAD21AoC5aDwARiqsL%2BKwHqnN7phub9AaMkbGkJ9aUCeETx8esw%40mail.gmail.com
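[Editor's note: folding in both of the -1 removals requested above, the GUC check function quoted earlier would presumably reduce to the following; the macro names are the ones the thread settled on in miscadmin.h.]

	bool
	check_vacuum_buffer_usage_limit(int *newval, void **extra,
									GucSource source)
	{
		/* Value of 0 disables the use of a Buffer Access Strategy */
		if (*newval == 0)
			return true;

		/* Value upper and lower hard limits are inclusive */
		if (*newval >= MIN_BAS_VAC_RING_SIZE_KB &&
			*newval <= MAX_BAS_VAC_RING_SIZE_KB)
			return true;

		/* Value does not fall within any allowable range */
		GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be 0 or between %d KB and %d KB",
							MIN_BAS_VAC_RING_SIZE_KB, MAX_BAS_VAC_RING_SIZE_KB);

		return false;
	}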
On Fri, 7 Apr 2023 at 05:20, Melanie Plageman <melanieplageman@gmail.com> wrote:
> > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
>
> This still says that the default value is -1.

Oops, I had this staged but didn't commit and formed the patch with
"git diff master.." instead of "git diff master", so missed a few
staged changes.

> > diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
>
> I noticed you seemed to have removed the reference to the GUC
> vacuum_buffer_usage_limit here. Was that intentional?
> We may not need to mention "falling back" as I did before, however, the
> GUC docs mention max/min values and such, which might be useful.

Unintentional. I removed it when removing the -1 stuff. It's useful to
keep something about the fallback, so I put that part back.

> > + /* Allow specifying the default or disabling Buffer Access Strategy */
> > + if (*newval == -1 || *newval == 0)
> > + return true;
>
> This should not check for -1 since that isn't the default anymore.
> It should only need to check for 0, I think?

Thanks. That one was one of the staged fixes.

> > + /* Value upper and lower hard limits are inclusive */
> > + if (*newval >= MIN_BAS_VAC_RING_SIZE_KB &&
> > + *newval <= MAX_BAS_VAC_RING_SIZE_KB)
> > + return true;
> > +
> > + /* Value does not fall within any allowable range */
> > + GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be -1, 0 or between %d KB and %d KB",
> > + MIN_BAS_VAC_RING_SIZE_KB, MAX_BAS_VAC_RING_SIZE_KB);
>
> Also remove -1 here.

And this one.

> > * Primary entry point for manual VACUUM and ANALYZE commands
> > *
> > @@ -114,6 +139,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
> > bool disable_page_skipping = false;
> > bool process_main = true;
> > bool process_toast = true;
> > + /* by default use buffer access strategy with default size */
> > + int ring_size = -1;
>
> We need to update this comment to something like, "use an invalid value
> for ring_size" so it is clear whether or not the BUFFER_USAGE_LIMIT was
> specified when making the access strategy later. Also, I think just
> removing the comment would be okay, because this is the normal behavior
> for initializing values, I think.

Yeah, I've moved the assignment away from the declaration and wrote
something along those lines.

> > @@ -240,6 +309,17 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > errmsg("VACUUM FULL cannot be performed in parallel")));
> >
> > + /*
> > + * BUFFER_USAGE_LIMIT does nothing for VACUUM (FULL) so just raise an
> > + * ERROR for that case. VACUUM (FULL, ANALYZE) does make use of it, so
> > + * we'll permit that.
> > + */
> > + if ((params.options & VACOPT_FULL) && !(params.options & VACOPT_ANALYZE) &&
> > + ring_size > -1)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL")));
> > +
> > /*
> > * Make sure VACOPT_ANALYZE is specified if any column lists are present.
> > */
> > @@ -341,7 +421,20 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
> >
> > MemoryContext old_context = MemoryContextSwitchTo(vac_context);
> >
> > - bstrategy = GetAccessStrategy(BAS_VACUUM);
>
> Is it worth moving this assert up above when we do the "sanity checking"
> for VACUUM FULL with BUFFER_USAGE_LIMIT?

I didn't do this, but I did adjust that check to check ring_size != -1
and put that as the first condition. It's likely more rare to have
ring_size not set to -1, so probably should check that first.

> s/expliot/exploit

oops

> I might rephrase the last sentence(s). Since it overrides it, I think it
> is clear that if it is not specified, then the thing it overrides is
> used. Then you could phrase the whole thing like:
>
> "If BUFFER_USAGE_LIMIT was specified by the VACUUM or ANALYZE command,
> it overrides the value of VacuumBufferUsageLimit. Either value may be
> 0, in which case GetAccessStrategyWithSize() will return NULL, which is
> the expected behavior."

Fixed.

> > diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> > index c1e911b1b3..1b5f779384 100644
> > --- a/src/backend/postmaster/autovacuum.c
> > +++ b/src/backend/postmaster/autovacuum.c
> > @@ -2287,11 +2287,21 @@ do_autovacuum(void)
> > /*
> > - * Create a buffer access strategy object for VACUUM to use. We want to
> > - * use the same one across all the vacuum operations we perform, since the
> > - * point is for VACUUM not to blow out the shared cache.
> > + * Optionally, create a buffer access strategy object for VACUUM to use.
> > + * When creating one, we want the same one across all tables being
> > + * vacuumed this helps prevent autovacuum from blowing out shared buffers.
>
> "When creating one" is a bit awkward. I would say something like "Use
> the same BufferAccessStrategy object for all tables VACUUMed by this
> worker to prevent autovacuum from blowing out shared buffers."

Adjusted.

> > + /* Figure out how many buffers ring_size_kb is */
> > + ring_buffers = ring_size_kb / (BLCKSZ / 1024);
>
> Is there any BLCKSZ that could end up rounding down to 0 and resulting
> in a divide by 0?

I removed that Assert() as I found quite a number of other places in
our code that assume BLCKSZ / 1024 is never 0.

> So, I know we agreed to make it camel cased, but I couldn't help
> mentioning the discussion over in [1] in which Sawada-san says:

I didn't change anything here. I'm happy to follow any rules about this
once they're defined. What we have today is horribly inconsistent.

> The GUC name mentioned in the comment is inconsistent with the current
> GUC name:
>
> > +/*
> > + * Upper and lower hard limits for the buffer access strategy ring size
> > + * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT
> > + * option to VACUUM and ANALYZE.

I did adjust this. I wasn't sure it was incorrect as I mentioned "GUC"
as in, the user-facing setting.

> Is it worth adding a VACUUM (BUFFER_USAGE_LIMIT 0) vac_option_tab test?

I think so.

I've attached v16.

David
Attachment
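To make the kB-to-buffers arithmetic above concrete, here is a minimal
sketch, assuming the standard backend build environment; the helper name
ring_size_kb_to_buffers is illustrative, not a function from the patch:

#include "postgres.h"	/* provides BLCKSZ via pg_config.h */

static int
ring_size_kb_to_buffers(int ring_size_kb)
{
	/*
	 * BLCKSZ is a compile-time constant with a configure-time minimum of
	 * 1 kB, so BLCKSZ / 1024 is never zero and no divide-by-zero Assert
	 * is needed. With the default 8 kB pages: 256 kB / 8 = 32 ring
	 * buffers, and the 16 GB maximum (16777216 kB) / 8 = 2097152.
	 */
	return ring_size_kb / (BLCKSZ / 1024);
}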
+VACUUM uses a ring like sequential scans, however, the size this ring
+controlled by the vacuum_buffer_usage_limit GUC. Dirty pages are not removed

should say: ".. the size OF this ring IS .." ?
On Fri, Apr 07, 2023 at 09:12:32AM +1200, David Rowley wrote:
> On Fri, 7 Apr 2023 at 05:20, Melanie Plageman <melanieplageman@gmail.com> wrote:
> > The GUC name mentioned in the comment is inconsistent with the current
> > GUC name:
> >
> > > +/*
> > > + * Upper and lower hard limits for the buffer access strategy ring size
> > > + * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT
> > > + * option to VACUUM and ANALYZE.
>
> I did adjust this. I wasn't sure it was incorrect as I mentioned "GUC"
> as in, the user-facing setting.

Oh, actually maybe you are right then.

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index bcc49aec45..220f9ee84c 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2001,6 +2001,35 @@ include_dir 'conf.d'
>  </listitem>
>  </varlistentry>
>
> + <varlistentry id="guc-vacuum-buffer-usage-limit" xreflabel="vacuum_buffer_usage_limit">
> + <term>
> + <varname>vacuum_buffer_usage_limit</varname> (<type>integer</type>)
> + <indexterm>
> + <primary><varname>vacuum_buffer_usage_limit</varname> configuration parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + Specifies the size of <varname>shared_buffers</varname> to be reused
> + for each backend participating in a given invocation of
> + <command>VACUUM</command> or <command>ANALYZE</command> or in
> + autovacuum.

Rereading this, I think it is not a good sentence (my fault). Perhaps we
should use the same language as the BUFFER_USAGE_LIMIT options.
Something like:

    Specifies the <glossterm linkend="glossary-buffer-access-strategy">Buffer
    Access Strategy</glossterm> ring buffer size used by each backend
    participating in a given invocation of <command>VACUUM</command> or
    <command>ANALYZE</command> or in autovacuum.

Last part is admittedly a bit awkward...

> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index ea1d8960f4..995b4bd54a 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -56,6 +56,7 @@
>  #include "utils/acl.h"
>  #include "utils/fmgroids.h"
>  #include "utils/guc.h"
> +#include "utils/guc_hooks.h"
>  #include "utils/memutils.h"
>  #include "utils/pg_rusage.h"
>  #include "utils/snapmgr.h"
> @@ -95,6 +96,26 @@ static VacOptValue get_vacoptval_from_boolean(DefElem *def);
>  static bool vac_tid_reaped(ItemPointer itemptr, void *state);
>  static int vac_cmp_itemptr(const void *left, const void *right);
>
> +/*
> + * GUC check function to ensure GUC value specified is within the allowable
> + * range.
> + */
> +bool
> +check_vacuum_buffer_usage_limit(int *newval, void **extra,
> + GucSource source)
> +{

I'm not so hot on this comment. It seems very... generic. Like it could
be the comment on any GUC check function. I'm also okay with leaving it
as is.

> @@ -341,7 +422,19 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
>
>  MemoryContext old_context = MemoryContextSwitchTo(vac_context);
>
> - bstrategy = GetAccessStrategy(BAS_VACUUM);
> + Assert(ring_size >= -1);
> +
> + /*
> + * If BUFFER_USAGE_LIMIT was specified by the VACUUM or ANALYZE
> + * command, it overrides the value of VacuumBufferUsageLimit. Either
> + * value may be 0, in which case GetAccessStrategyWithSize() will
> + * return NULL, effectively allowing full use of shared buffers.

Maybe "unlimited" is better than "full"?

> + */
> + if (ring_size != -1)
> + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size);
> + else
> + bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit);
> +
>  MemoryContextSwitchTo(old_context);
>  }

> diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
> @@ -365,6 +371,9 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
>  maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
>  maintenance_work_mem;
>
> + /* Use the same buffer size for all workers */

I would say ring buffer size -- this sounds like it is the size of a
single buffer.

> + shared->ring_nbuffers = GetAccessStrategyBufferCount(bstrategy);
> +
>  pg_atomic_init_u32(&(shared->cost_balance), 0);
>  pg_atomic_init_u32(&(shared->active_nworkers), 0);
>  pg_atomic_init_u32(&(shared->idx), 0);

> + * Upper and lower hard limits for the buffer access strategy ring size
> + * specified by the VacuumBufferUsageLimit GUC and BUFFER_USAGE_LIMIT option

I agree with your original usage of the actual GUC name, now that I
realize why you were doing it and am rereading it.

> + * to VACUUM and ANALYZE.
> + */
> +#define MIN_BAS_VAC_RING_SIZE_KB 128
> +#define MAX_BAS_VAC_RING_SIZE_KB (16 * 1024 * 1024)

Otherwise, LGTM.
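Piecing the quoted hunks together, the check hook after David's fixes
would come out roughly as below. This is a sketch reconstructed from the
quoted diffs rather than the committed code, so the exact error wording
may differ:

bool
check_vacuum_buffer_usage_limit(int *newval, void **extra,
								GucSource source)
{
	/* Allow disabling the Buffer Access Strategy entirely */
	if (*newval == 0)
		return true;

	/* Value upper and lower hard limits are inclusive */
	if (*newval >= MIN_BAS_VAC_RING_SIZE_KB &&
		*newval <= MAX_BAS_VAC_RING_SIZE_KB)
		return true;

	/* Value does not fall within the allowable range */
	GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be 0 or between %d kB and %d kB",
						MIN_BAS_VAC_RING_SIZE_KB, MAX_BAS_VAC_RING_SIZE_KB);
	return false;
}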
On Fri, 7 Apr 2023 at 09:44, Melanie Plageman <melanieplageman@gmail.com> wrote:
> Otherwise, LGTM.

Thanks for looking. I've also taken Justin's comments about the README
into account and fixed that part.

I've pushed the patch after a little more adjusting. I added some text
to the docs mentioning that larger BUFFER_USAGE_LIMIT values can speed
up vacuum, and also a reason why users might not want to go nuts with
it.

I've also just now pushed the vacuumdb patch too. I ended up adjusting
some of the ERROR messages in the main patch after the following
not-so-nice user experience:

$ vacuumdb --buffer-usage-limit=1TB --analyze postgres
vacuumdb: vacuuming database "postgres"
SQL: VACUUM (SKIP_DATABASE_STATS, ANALYZE, BUFFER_USAGE_LIMIT '1TB')
vacuumdb: error: processing of database "postgres" failed: ERROR:  buffer_usage_limit option must be 0 or between 128 KB and 16777216 KB

$ vacuumdb --buffer-usage-limit=128KB --analyze postgres
vacuumdb: vacuuming database "postgres"
SQL: VACUUM (SKIP_DATABASE_STATS, ANALYZE, BUFFER_USAGE_LIMIT '128KB')
vacuumdb: error: processing of database "postgres" failed: ERROR:  value: "128KB": is invalid for buffer_usage_limit
HINT:  Valid units for this parameter are "B", "kB", "MB", "GB", and "TB".

David
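For anyone trying the committed feature, usage looks roughly like this;
the table name is illustrative and these statements are a sketch, not
output from the thread:

-- A larger ring means fewer synchronous WAL flushes during a big manual VACUUM.
VACUUM (BUFFER_USAGE_LIMIT '2MB') my_table;

-- 0 disables the Buffer Access Strategy, letting VACUUM use shared_buffers freely.
VACUUM (BUFFER_USAGE_LIMIT 0) my_table;

-- The vacuum_buffer_usage_limit GUC supplies the default when the option is omitted.
SET vacuum_buffer_usage_limit = '1MB';
VACUUM my_table;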
On 07.04.23 02:52, David Rowley wrote:
> On Fri, 7 Apr 2023 at 09:44, Melanie Plageman <melanieplageman@gmail.com> wrote:
>> Otherwise, LGTM.
>
> Thanks for looking. I've also taken Justin's comments about the
> README into account and fixed that part.
>
> I've pushed the patch after a little more adjusting. I added some
> text to the docs mentioning that larger BUFFER_USAGE_LIMIT values can
> speed up vacuum, and also a reason why users might not want to go
> nuts with it.

I came across these new options and had a little bit of trouble
figuring them out from the documentation. Maybe this could be polished
a bit.

vacuumdb --help says

    --buffer-usage-limit=BUFSIZE

I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
"SIZE"? Maybe simplify here.

On the vacuumdb man page, the placeholder is

    <replaceable class="parameter">buffer_usage_limit</replaceable>

which is yet another way of phrasing it. Maybe also use "size" here?

The VACUUM man page says

    BUFFER_USAGE_LIMIT [ <replaceable ...>string</replaceable> ]

which had me really confused. The detailed description later doesn't
give any further explanation of possible values, except that
<literal>0</literal> is apparently a possible value, which in my mind
is not a string. Then there is a link to guc-vacuum-buffer-usage-limit,
which lifts the mystery that this is really just an integer setting
with possible memory-size units, but it was really hard to figure that
out from the start!

Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
explains the different kinds of accepted values, and "string" wasn't
added there. Maybe also change this to "size" here and add an
explanation there of what kinds of sizes are possible.

Finally, the locations of the new options in the various documentation
places seem a bit random. The vacuumdb --help output and the man page
appear to be mostly alphabetical, so --buffer-usage-limit should be
after -a/--all. (Also note that right now the option isn't even in the
same place in the --help output versus the man page.)

The order of the options on the VACUUM man page doesn't make any sense
anymore. This isn't really the fault of this patch, but maybe it's time
to do a fresh reordering there.
On Fri, 14 Apr 2023 at 19:20, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I came across these new options and had a little bit of trouble
> figuring them out from the documentation. Maybe this could be polished
> a bit.
>
> vacuumdb --help says
>
>     --buffer-usage-limit=BUFSIZE
>
> I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
> "SIZE"? Maybe simplify here.
>
> On the vacuumdb man page, the placeholder is
>
>     <replaceable class="parameter">buffer_usage_limit</replaceable>
>
> which is yet another way of phrasing it. Maybe also use "size" here?
>
> The VACUUM man page says
>
>     BUFFER_USAGE_LIMIT [ <replaceable ...>string</replaceable> ]
>
> which had me really confused. The detailed description later doesn't
> give any further explanation of possible values, except that
> <literal>0</literal> is apparently a possible value, which in my mind
> is not a string. Then there is a link to guc-vacuum-buffer-usage-limit,
> which lifts the mystery that this is really just an integer setting
> with possible memory-size units, but it was really hard to figure that
> out from the start!
>
> Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
> explains the different kinds of accepted values, and "string" wasn't
> added there. Maybe also change this to "size" here and add an
> explanation there of what kinds of sizes are possible.
>
> Finally, the locations of the new options in the various documentation
> places seem a bit random. The vacuumdb --help output and the man page
> appear to be mostly alphabetical, so --buffer-usage-limit should be
> after -a/--all. (Also note that right now the option isn't even in the
> same place in the --help output versus the man page.)

These are all valid points. I've attached a patch aiming to address
each of them.

> The order of the options on the VACUUM man page doesn't make any sense
> anymore. This isn't really the fault of this patch, but maybe it's time
> to do a fresh reordering there.

Agreed, that likely wasn't a big problem say about 5 years ago when we
had far fewer options, but the number has grown quite a bit since then.

Right after I fix the points you've mentioned seems like a good time to
address that.

David
Attachment
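With that patch applied, the relevant part of the --help output would
read roughly as below; this is a sketch of the intended result
(alphabetical placement after -a/--all and a plain SIZE placeholder),
not a verbatim copy of the committed text:

$ vacuumdb --help
...
  -a, --all                       vacuum all databases
      --buffer-usage-limit=SIZE   size of ring buffer used for vacuum
  -d, --dbname=DBNAME             database to vacuum
...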
On Sat, 15 Apr 2023 at 12:59, David Rowley <dgrowleyml@gmail.com> wrote:
> These are all valid points. I've attached a patch aiming to address
> each of them.

I tweaked this a little further and pushed it.

David
On Sat, Apr 15, 2023 at 12:59:52PM +1200, David Rowley wrote:
> On Fri, 14 Apr 2023 at 19:20, Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> >
> > I came across these new options and had a little bit of trouble
> > figuring them out from the documentation. Maybe this could be
> > polished a bit.
> >
> > vacuumdb --help says
> >
> >     --buffer-usage-limit=BUFSIZE
> >
> > I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
> > "SIZE"? Maybe simplify here.
> >
> > On the vacuumdb man page, the placeholder is
> >
> >     <replaceable class="parameter">buffer_usage_limit</replaceable>
> >
> > which is yet another way of phrasing it. Maybe also use "size" here?
> >
> > The VACUUM man page says
> >
> >     BUFFER_USAGE_LIMIT [ <replaceable ...>string</replaceable> ]
> >
> > which had me really confused. The detailed description later doesn't
> > give any further explanation of possible values, except that
> > <literal>0</literal> is apparently a possible value, which in my mind
> > is not a string. Then there is a link to guc-vacuum-buffer-usage-limit,
> > which lifts the mystery that this is really just an integer setting
> > with possible memory-size units, but it was really hard to figure
> > that out from the start!
> >
> > Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
> > explains the different kinds of accepted values, and "string" wasn't
> > added there. Maybe also change this to "size" here and add an
> > explanation there of what kinds of sizes are possible.
> >
> > Finally, the locations of the new options in the various
> > documentation places seem a bit random. The vacuumdb --help output
> > and the man page appear to be mostly alphabetical, so
> > --buffer-usage-limit should be after -a/--all. (Also note that right
> > now the option isn't even in the same place in the --help output
> > versus the man page.)
>
> These are all valid points. I've attached a patch aiming to address
> each of them.

I like that we are now using "size" consistently instead of bufsize etc.

> > The order of the options on the VACUUM man page doesn't make any
> > sense anymore. This isn't really the fault of this patch, but maybe
> > it's time to do a fresh reordering there.
>
> Agreed, that likely wasn't a big problem say about 5 years ago when we
> had far fewer options, but the number has grown quite a bit since
> then.
>
> Right after I fix the points you've mentioned seems like a good time
> to address that.

Are we still thinking that reordering the VACUUM (and ANALYZE) options
makes sense? And, if so, should it be alphabetical within parameter
category? That is, all actual parameters (e.g. FULL and FREEZE) are
alphabetically organized first, followed by all parameter types (e.g.
boolean and size) alphabetically listed?

- Melanie
On Tue, 18 Apr 2023 at 09:21, Melanie Plageman <melanieplageman@gmail.com> wrote:
> Are we still thinking that reordering the VACUUM (and ANALYZE) options
> makes sense? And, if so, should it be alphabetical within parameter
> category?

I've opened a thread for that [1].

David

[1] https://postgr.es/m/CAApHDvo1eWbt5PVpk0G=yCbBNgLU7KaRP6dCBHpNbFaBjyGyQA@mail.gmail.com
Hi,

On Sun, Apr 16, 2023 at 9:09 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 15 Apr 2023 at 12:59, David Rowley <dgrowleyml@gmail.com> wrote:
> > These are all valid points. I've attached a patch aiming to address
> > each of them.
>
> I tweaked this a little further and pushed it.

I realized that the value of the vacuum_buffer_usage_limit parameter in
postgresql.conf.sample doesn't have a unit:

#vacuum_buffer_usage_limit = 256	# size of vacuum and analyze buffer access strategy ring.
					# 0 to disable vacuum buffer access strategy
					# range 128kB to 16GB

It works but I think we might want to add the unit kB for
understandability and consistency with other GUC_UNIT_KB parameters.
I've attached a small patch that adds the unit and aligns the indent of
the comments to the surrounding parameters.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
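With the attached patch, the sample entry would become something like
the following; the kB suffix is the substance of the change, and the tab
alignment shown here is approximate rather than a verbatim copy of the
patched file:

#vacuum_buffer_usage_limit = 256kB	# size of vacuum and analyze buffer access strategy ring.
					# 0 to disable vacuum buffer access strategy
					# range 128kB to 16GB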
On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada, <sawada.mshk@gmail.com> wrote:
> I realized that the value of the vacuum_buffer_usage_limit parameter in
> postgresql.conf.sample doesn't have a unit:
>
> #vacuum_buffer_usage_limit = 256	# size of vacuum and analyze buffer access strategy ring.
> 					# 0 to disable vacuum buffer access strategy
> 					# range 128kB to 16GB
>
> It works but I think we might want to add the unit kB for
> understandability and consistency with other GUC_UNIT_KB parameters.
> I've attached a small patch that adds the unit and aligns the indent
> of the comments to the surrounding parameters.
I'm not currently able to check, but if work_mem has a unit in the sample conf then I agree that vacuum_buffer_usage_limit should too.
I'm fine with you going ahead and adjusting this; otherwise, it'll be Monday before I can.
David
> On 26 Apr 2023, at 13:26, David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada, <sawada.mshk@gmail.com> wrote:
> It works but I think we might want to add the unit kB for
> understandability and consistency with other GUC_UNIT_KB parameters.
> I've attached a small patch that adds the unit and aligns the indent
> of the comments to the surrounding parameters.
>
> I'm not currently able to check, but if work_mem has a unit in the sample
> conf then I agree that vacuum_buffer_usage_limit should too.

+1, work_mem and all other related options in this section have a unit in the
sample conf, so adding this makes sense.

--
Daniel Gustafsson
On Wed, Apr 26, 2023 at 8:31 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 26 Apr 2023, at 13:26, David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada, <sawada.mshk@gmail.com <mailto:sawada.mshk@gmail.com>> wrote:
> It works but I think we might want to add the unit kB for
> understandability and consistency with other GUC_UNIT_KB parameters.
> I've attached a small patch that adds the unit and aligns the indent
> of the comments to the surrounding parameters.
>
> I'm not currently able to check, but if work_mem has a unit in the sample conf then I agree that vacuum_buffer_usage_limit should too.
> +1, work_mem and all other related options in this section have a unit in the
> sample conf, so adding this makes sense.
Agreed.
For the patch, the other GUCs have a tab instead of a space between the unit
and the "#" of the first comment.
(Not the fault of this patch, but it probably makes sense to fix now.)
Otherwise, LGTM.
- Melanie
On Wed, Apr 26, 2023 at 9:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Wed, Apr 26, 2023 at 8:31 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> > On 26 Apr 2023, at 13:26, David Rowley <dgrowleyml@gmail.com> wrote:
>> > On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada, <sawada.mshk@gmail.com> wrote:
>>
>> > It works but I think we might want to add the unit kB for
>> > understandability and consistency with other GUC_UNIT_KB parameters.
>> > I've attached a small patch that adds the unit and aligns the indent
>> > of the comments to the surrounding parameters.
>> >
>> > I'm not currently able to check, but if work_mem has a unit in the sample
>> > conf then I agree that vacuum_buffer_usage_limit should too.
>>
>> +1, work_mem and all other related options in this section have a unit in the
>> sample conf, so adding this makes sense.
>
> Agreed.
> For the patch, the other GUCs have a tab instead of a space between the unit
> and the "#" of the first comment.
> (Not the fault of this patch, but it probably makes sense to fix now.)
> Otherwise, LGTM.

Thanks for the review! Pushed after incorporating a comment from Melanie.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com