Home > mailing lists

Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From	Nikita Malakhov
Subject	Re: New strategies for freezing, advancing relfrozenxid early
Date	December 16, 2022 10:59:39
Msg-id	CAN-LCVNP5P=0VhnkLwty+xntPB2AOtN3jgzZKgdagJmfY29ScA@mail.gmail.com Whole thread Raw
In response to	Re: New strategies for freezing, advancing relfrozenxid early (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: New strategies for freezing, advancing relfrozenxid early
List	pgsql-hackers

Tree view

Hi!

I've found this discussion very interesting, in view of vacuuming

TOAST tables is always a problem because these tables tend to

bloat very quickly with dead data - just to remind, all TOAST-able

columns of the relation use the same TOAST table which is one

for the relation, and TOASTed data are not updated - there are

only insert and delete operations.

Have you tested it with large and constantly used TOAST tables?

How would it work with the current TOAST implementation?

We propose a different approach to the TOAST mechanics [1],

and a new vacuum would be very promising.

Thank you!

[1] https://commitfest.postgresql.org/41/3490/

On Fri, Dec 16, 2022 at 10:48 AM John Naylor <john.naylor@enterprisedb.com> wrote:

On Wed, Dec 14, 2022 at 6:07 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> At the suggestion of Jeff, I wrote a Wiki page that shows motivating
> examples for the patch series:
>
> https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples
>
> These are all cases where VACUUM currently doesn't do the right thing
> around freezing, in a way that is greatly ameliorated by the patch.
> Perhaps this will help other hackers to understand the motivation
> behind some of these mechanisms. There are plenty of details that only
> make sense in the context of a certain kind of table, with certain
> performance characteristics that the design is sensitive to, and seeks
> to take advantage of in one way or another.

Thanks for this. This is the kind of concrete, data-based evidence that I find much more convincing, or at least easy to reason about. I'd actually recommend in the future to open discussion with this kind of analysis -- even before coding, it's possible to indicate what a design is *intended* to achieve. And reviewers can likewise bring up cases of their own in a concrete fashion.

On Wed, Dec 14, 2022 at 12:16 AM Peter Geoghegan <pg@bowt.ie> wrote:

> At the very least, a given VACUUM operation has to choose its freezing
> strategy based on how it expects the table will look when it's done
> vacuuming the table, and how that will impact the next VACUUM against
> the same table. Without that, then vacuuming an append-only table will
> fall into a pattern of setting pages all-visible in one vacuum, and
> then freezing those same pages all-frozen in the very next vacuum
> because there are too many. Which makes little sense; we're far better
> off freezing the pages at the earliest opportunity instead.

That makes sense, but I wonder if we can actually be more specific: One motivating example mentioned is the append-only table. If we detected that case, which I assume we can because autovacuum_vacuum_insert_* GUCs exist, we could use that information as one way to drive eager freezing independently of size. At least in theory -- it's very possible size will be a necessary part of the decision, but it's less clear that it's as useful as a user-tunable knob.

If we then ignored the append-only case when evaluating a freezing policy, maybe other ideas will fall out. I don't have a well-thought out idea about policy or knobs, but it's worth thinking about.

Aside from that, I've only given the patches a brief reading. Having seen the VM snapshot in practice (under "Scanned pages, visibility map snapshot" in the wiki page), it's neat to see fewer pages being scanned. Prefetching not only seems superior to SKIP_PAGES_THRESHOLD, but anticipates asynchronous IO. Keeping only one VM snapshot page in memory makes perfect sense.

I do have a cosmetic, but broad-reaching, nitpick about terms regarding "skipping strategy". That's phrased as a kind of negative -- what we're *not* doing. Many times I had to pause and compute in my head what we're *doing*, i.e. the "scanning strategy". For example, I wonder if the VM strategies would be easier to read as:

VMSNAP_SKIP_ALL_VISIBLE -> VMSNAP_SCAN_LAZY
VMSNAP_SKIP_ALL_FROZEN -> VMSNAP_SCAN_EAGER
VMSNAP_SKIP_NONE -> VMSNAP_SCAN_ALL

Notice here they're listed in order of increasing eagerness.

--
John Naylor
EDB: http://www.enterprisedb.com

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

pgsql-hackers by date:

From: John Naylor
Date: 16 December 2022, 10:48:17
Subject: Re: New strategies for freezing, advancing relfrozenxid early

From: Pavel Stehule
Date: 16 December 2022, 11:05:24
Subject: Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

Previous

Next