Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New strategies for freezing, advancing relfrozenxid early
Date
Msg-id CAH2-WzkbuGaW61LzAfj=Ge7YcEBYmgyZ6dwTdVDYXGXPt3c2pQ@mail.gmail.com
In response to Re: New strategies for freezing, advancing relfrozenxid early  (Andres Freund <andres@anarazel.de>)
Responses Re: New strategies for freezing, advancing relfrozenxid early  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: New strategies for freezing, advancing relfrozenxid early  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jan 26, 2023 at 9:53 AM Andres Freund <andres@anarazel.de> wrote:
> I assume the case you're thinking of is that pruning did *not* do any changes,
> but in the process of figuring out that nothing needed to be pruned, we did a
> MarkBufferDirtyHint(), and as part of that emitted an FPI?

Yes.

> > That's going to be very significantly more aggressive. For example
> > it'll impact small tables very differently.
>
> Maybe it would be too aggressive, not sure. The cost of a freeze WAL record is
> relatively small, with one important exception below, if we are 99.99% sure
> that it's not going to require an FPI and isn't going to dirty the page.
>
> The exception is that a newer LSN on the page can cause the ringbuffer
> replacement to trigger more aggressive WAL flushing. No meaningful
> difference if we modified the page during pruning, or if the page was already
> in s_b (since it likely won't be written out via the ringbuffer in that case),
> but if checksums are off and we just hint-dirtied the page, it could be a
> significant issue.

Most of the overhead of FREEZE WAL records (with freeze plan
deduplication and page-level freezing in) is generic WAL record header
overhead. Your recent adversarial test case is going to choke on that,
too. At least if you set checkpoint_timeout to 1 minute again.

> Thus a modification of the above logic could be to opportunistically freeze if
> a) it won't cause an FPI and either
> b1) the page was already dirty before pruning, as we'll not do a ringbuffer
>     replacement in that case
> or
> b2) We wrote a WAL record during pruning, as the difference in flush position
>     is marginal
>
> An even more aggressive version would be to replace b1) with logic that'd
> allow newly dirtying the page if it wasn't read through the ringbuffer. But
> newly dirtying the page feels like it'd be more dangerous.
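
For readers following along, the quoted a) / b1) / b2) rule can be written as a single predicate. This is only a minimal sketch of the proposal as worded above, not code from any patch: the struct, its field names, and the function name are all hypothetical and do not exist in PostgreSQL.

```c
#include <stdbool.h>

/*
 * Hypothetical per-page inputs. None of these names exist in PostgreSQL;
 * they stand in for facts VACUUM would have to know about the page.
 */
typedef struct FreezeInputs
{
    bool would_emit_fpi;     /* a)  freezing now would need a full-page image */
    bool dirty_before_prune; /* b1) page was already dirty before pruning */
    bool prune_wrote_wal;    /* b2) pruning itself emitted a WAL record */
} FreezeInputs;

/*
 * Opportunistically freeze only if a) holds (no FPI needed) and at least
 * one of b1) or b2) holds.
 */
static bool
should_opportunistically_freeze(const FreezeInputs *in)
{
    if (in->would_emit_fpi)
        return false;

    return in->dirty_before_prune || in->prune_wrote_wal;
}
```

Note that a page that was only hint-dirtied, with no WAL written during pruning, fails both b1) and b2) and is left unfrozen under this rule.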

In many cases we'll have to dirty the page anyway, just to set
PD_ALL_VISIBLE. The whole way the logic works is conditioned (whether
triggered by an FPI or triggered by my now-reverted GUC) on being able
to set the whole page all-frozen in the VM.

> A less aggressive version would be to check if any WAL records were emitted
> during heap_page_prune() (instead of FPIs) and whether we'd emit an FPI if we
> modified the page again. Similar to what we do now, except not requiring an
> FPI to have been emitted.

Also way more aggressive. Not nearly enough on its own.
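
To make the comparison concrete, here is a rough sketch of the two triggers side by side, assuming the counters are snapshotted before pruning and compared afterwards. WalCounters is a hypothetical stand-in for the wal_records and wal_fpi counters discussed in this thread, and both function names are invented for illustration. The second function is only one reading of the quoted "less aggressive" variant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the WAL usage counters discussed in the thread. */
typedef struct WalCounters
{
    int64_t wal_records; /* WAL records emitted so far */
    int64_t wal_fpi;     /* full-page images emitted so far */
} WalCounters;

/*
 * FPI-triggered freezing: the page must be one we can set all-frozen in
 * the VM, and pruning must have emitted at least one FPI.
 */
static bool
freeze_on_fpi(WalCounters before, WalCounters after, bool can_set_all_frozen)
{
    return can_set_all_frozen && after.wal_fpi != before.wal_fpi;
}

/*
 * Record-triggered variant (one reading of the quoted proposal): any WAL
 * record emitted during pruning is enough, provided that modifying the
 * page again would not itself require an FPI.
 */
static bool
freeze_on_record(WalCounters before, WalCounters after,
                 bool can_set_all_frozen, bool next_change_needs_fpi)
{
    return can_set_all_frozen &&
        after.wal_records != before.wal_records &&
        !next_change_needs_fpi;
}
```

Every FPI is also counted as part of some WAL record, so any page that passes the first test also shows a changed record count. The second test additionally fires for pruning that wrote a record without an FPI.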

> But to me it seems a bit odd that VACUUM now is more aggressive if checksums /
> wal_log_hints are on than without them. Which I think is how using either
> of pgWalUsage.wal_fpi, pgWalUsage.wal_records ends up working?

Which part is the odd part? Is it odd that page-level freezing works
that way, or is it odd that page-level checksums work that way?

In any case this seems like an odd thing for you to say, having
eviscerated a patch that really just made the same behavior trigger
independently of FPIs in some tables, controlled via a GUC.

-- 
Peter Geoghegan


