Re: Eagerly scan all-visible pages to amortize aggressive vacuum - Mailing list pgsql-hackers

From: Robert Treat
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Msg-id: CABV9wwMaCaxAG3Ji9pCXW-LPS0mtXubfW-rLSrAtvqak6Wu=XQ@mail.gmail.com
In response to: Re: Eagerly scan all-visible pages to amortize aggressive vacuum (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Fri, Dec 13, 2024 at 5:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
> > Hi,
>
> Thanks for the review!
> Attached v2 should address your feedback and also fixes a few bugs with v1.
>
> I've still yet to run very long-running benchmarks. I did start running more
> varied benchmark scenarios -- but all still under two hours. So far, the
> behavior is as expected.
>
> > On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
> > > Because we want to amortize our eager scanning across a few vacuums,
> > > we cap the maximum number of successful eager scans to a percentage of
> > > the total number of all-visible but not all-frozen pages in the table
> > > (currently 20%).
> >
> > One thing worth mentioning around here seems that we currently can't
> > partially-aggressively freeze tuples that are "too young" and how that
> > interacts with everything else.
>
> I'm not sure I know what you mean. Are you talking about how we don't freeze
> tuples that are visible to everyone but younger than the freeze limit?
>

FWIW that was my interpretation of his statement, though I had a
clarifying question of my own on this topic: from a user's
perspective, when would we expect to see these eager vacuums? ISTM we
would be doing 'normal vacuums' prior to vacuum_freeze_min_age and
'aggressive vacuums' after (autovacuum_freeze_max_age -
vacuum_freeze_min_age), so the expectation is that 'eager vacuums'
would fall into the ~50-million-transaction window between those two
points (assuming defaults, which admittedly I don't use). Does that
sound right?
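
For concreteness, here's the picture in my head with stock defaults,
where "age" is the age of the table's relfrozenxid (the first three
numbers are the actual GUC defaults; how the eager mode slots in
between them is just my assumption):

    vacuum_freeze_min_age     = 50 million
    vacuum_freeze_table_age   = 150 million
    autovacuum_freeze_max_age = 200 million

    age < vacuum_freeze_min_age      -> normal vacuums, nothing to freeze
    vacuum_freeze_min_age <= age     -> eager vacuums become possible
    age >= vacuum_freeze_table_age   -> aggressive vacuums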

> > > In the attached chart.png, you can see the vm_page_freezes climbing
> > > steadily with the patch, whereas on master, there are sudden spikes
> > > aligned with the aggressive vacuums. You can also see that the number
> > > of pages that are all-visible but not all-frozen grows steadily on
> > > master until the aggressive vacuum. This is vacuum's "backlog" of
> > > freezing work.
> >
> > What's the reason for all-visible-but-not-all-frozen to increase to a higher
> > value initially than where it later settles?
>
> My guess is that it has to do with shorter, more frequent vacuums at the
> beginning of the benchmark when the relation is smaller (and we haven't
> exceeded shared buffers or memory yet). They are setting pages all-visible, but
> we haven't used up enough xids yet to qualify for an eager vacuum.
>
> The peak of AVnAF pages aligns with the start of the first eager vacuum. We
> don't do any eager scanning until we are sure there is some data requiring
> freeze (see this criteria):
>
>     if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
>         TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
>                                       vacrel->cutoffs.FreezeLimit))
>
> Once we have used up enough xids to qualify for the first eager vacuum, the
> number of AVnAF pages starts to go down.
>
> It would follow from this theory that we would see a build-up like this after
> each relfrozenxid advancement (so after the next aggressive vacuum).
>
> But I think we don't see this because the vacuums are longer by the time
> aggressive vacuums have started, so we end up using up enough XIDs between
> vacuums to qualify for eager vacuums on vacuums after the aggressive vacuum.
>
> That is just my theory though.
>

I like your theory, but it's a little too counterintuitive for me :-)

I would expect to see a change in the vacuum time & rate after the
first aggressive scan, which incidentally your graph does show for
master, but it looks a bit too smooth with your original patchset. I
guess there could be a sweet spot where the rate of change fits
perfectly with respect to the line between lazy / eager vacuums, but
it's hard to imagine you were that lucky.

> > > Below is the comparative WAL volume, checkpointer and background
> > > writer writes, reads and writes done by all other backend types, time
> > > spent vacuuming in milliseconds, and p99 latency. Notice that overall
> > > vacuum IO time is substantially lower with the patch.
> > >
> > >    version     wal  cptr_bgwriter_w   other_rw  vac_io_time  p99_lat
> > >     patch   770 GB          5903264  235073744   513722         1
> > >     master  767 GB          5908523  216887764  1003654        16
> >
> > Hm. It's not clear to me why other_rw is higher with the patch? After all,
> > given the workload, there's no chance of unnecessarily freezing tuples? Is
> > that just because at the end of the benchmark there's leftover work?
>
> So other_rw is mostly client backend and autovacuum reads and writes. It is
> higher with the patch because there are actually more vacuum reads and writes
> with the patch than on master. However the autovacuum worker read and write
> time is much lower. Those blocks are more often in shared buffers, I would
> guess.
>
<snip>
> From e36b4fac345be44954410c4f0e61467dc0f49a72 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Thu, 12 Dec 2024 16:44:37 -0500
> Subject: [PATCH v2 10/10] Eagerly scan all-visible pages to amortize
> aggressive vacuum
>
> @@ -27,11 +27,37 @@
>  * to the end, skipping pages as permitted by their visibility status, vacuum
>  * options, and the eagerness level of the vacuum.
>  *
> + * There are three vacuum eagerness levels: normal vacuum, eager vacuum, and
> + * aggressive vacuum.
> + *
>  * When page skipping is enabled, non-aggressive vacuums may skip scanning
> - * pages that are marked all-visible in the visibility map. We may choose not
> + * pages that are marked all-visible in the visibility map. It may choose not
>  * to skip pages if the range of skippable pages is below
>  * SKIP_PAGES_THRESHOLD.
>  *

I find the above confusing, since page skipping is the regular
activity but is referred to in the negative, and since you use the
term "non-aggressive vacuums", which in prior releases mapped only to
"normal" vacuums but now would map to both "normal" and "eager"
vacuums; it isn't clear (in my head, anyway) that that is desired.
Does the following still convey what you meant (and hopefully work
better with the paragraphs that follow)?

When page skipping is not disabled, a normal vacuum may skip scanning
pages that are marked all-visible in the visibility map if the range
of skippable pages is below SKIP_PAGES_THRESHOLD.

> + * Eager vacuums will scan skippable pages in an effort to freeze them and
> + * decrease the backlog of all-visible but not all-frozen pages that have to
> + * be processed to advance relfrozenxid and avoid transaction ID wraparound.
> + *

> @@ -170,6 +197,51 @@ typedef enum
>       VACUUM_ERRCB_PHASE_TRUNCATE,
> } VacErrPhase;
>
> +/*
> + * Eager vacuums scan some all-visible but not all-frozen pages. Since our
> + * goal is to freeze these pages, an eager scan that fails to set the page
> + * all-frozen in the VM is considered to have "failed".
> + *
> + * On the assumption that different regions of the table tend to have
> + * similarly aged data, once we fail to freeze EAGER_SCAN_MAX_FAILS_PER_REGION
> + * blocks in a region of size EAGER_SCAN_REGION_SIZE, we suspend eager
> + * scanning until vacuum has progressed to another region of the table with
> + * potentially older data.
> + */
> +#define EAGER_SCAN_REGION_SIZE 4096
> +#define EAGER_SCAN_MAX_FAILS_PER_REGION 128
> +
> +/*
> + * An eager scan of a page that is set all-frozen in the VM is considered
> + * "successful". To spread out eager scanning across multiple eager vacuums,
> + * we limit the number of successful eager page scans. The maximum number of
> + * successful eager page scans is calculated as a ratio of the all-visible but
> + * not all-frozen pages at the beginning of the vacuum.
> + */
> +#define EAGER_SCAN_SUCCESS_RATE 0.2
> +
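
(Spelling out the success-cap arithmetic for myself: at an
EAGER_SCAN_SUCCESS_RATE of 0.2, a table entering vacuum with 100,000
all-visible but not all-frozen pages would get a quota of 20,000
successful eager page scans, so the backlog would be amortized over
roughly five eager vacuums. My numbers, purely illustrative.)
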
> +/*
> + * The eagerness level of a vacuum determines how many all-visible but
> + * not all-frozen pages it eagerly scans.
> + *
> + * A normal vacuum (eagerness VAC_NORMAL) scans no all-visible pages (with the
> + * exception of those scanned due to SKIP_PAGES_THRESHOLD).
> + *
> + * An eager vacuum (eagerness VAC_EAGER) scans a number of pages up to a limit
> + * based on whether or not it is succeeding or failing. An eager vacuum is
> + * downgraded to a normal vacuum when it hits its success quota. An aggressive
> + * vacuum cannot be downgraded. No eagerness level is ever upgraded.
> + *

At the risk of being overly nit-picky... eager vacuums scan their
subset of all-visible pages "up to a limit" based solely on the
success ratio. In the case of (excessive) failures, there is no limit
on the number of pages scanned, only a pause in eager scanning until
the next region.
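
In pseudo-C, this is how I read the region accounting (a sketch with
made-up names, not the patch's actual code):

    typedef struct EagerScanState
    {
        BlockNumber current_region;     /* blkno / EAGER_SCAN_REGION_SIZE */
        int         fails_this_region;  /* failed eager freezes so far */
    } EagerScanState;

    static bool
    eager_scan_allowed(BlockNumber blkno, EagerScanState *state)
    {
        BlockNumber region = blkno / EAGER_SCAN_REGION_SIZE;

        /* crossing into a new region resets the failure counter */
        if (region != state->current_region)
        {
            state->current_region = region;
            state->fails_this_region = 0;
        }

        /*
         * Hitting EAGER_SCAN_MAX_FAILS_PER_REGION only suspends eager
         * scanning for the remainder of this region; there is no cap on
         * total failures across the table.
         */
        return state->fails_this_region < EAGER_SCAN_MAX_FAILS_PER_REGION;
    }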

> + * An aggressive vacuum (eagerness EAGER_FULL) must scan all all-visible but
> + * not all-frozen pages.
> + */

I think the above should be VAC_AGGRESSIVE vs EAGER_FULL, no?

> +typedef enum VacEagerness
> +{
> +     VAC_NORMAL,
> +     VAC_EAGER,
> +     VAC_AGGRESSIVE,
> +} VacEagerness;
> +

> @@ -516,25 +772,20 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
>                * Force aggressive mode, and disable skipping blocks using the
>                * visibility map (even those set all-frozen)
>                */
> -             vacrel->aggressive = true;
> +             aggressive = true;
>               skipwithvm = false;
>       }
>
>       vacrel->skipwithvm = skipwithvm;
>
> +     heap_vacuum_set_up_eagerness(vacrel, aggressive);
> +
>       if (verbose)
> -     {
> -             if (vacrel->aggressive)
> -                     ereport(INFO,
> -                                     (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> -                                                     vacrel->dbname, vacrel->relnamespace,
> -                                                     vacrel->relname)));
> -             else
> -                     ereport(INFO,
> -                                     (errmsg("vacuuming \"%s.%s.%s\"",
> -                                                     vacrel->dbname, vacrel->relnamespace,
> -                                                     vacrel->relname)));
> -     }
> +             ereport(INFO,
> +                             (errmsg("%s of \"%s.%s.%s\"",
> +                                             vac_eagerness_description(vacrel->eagerness),
> +                                             vacrel->dbname, vacrel->relnamespace,
> +                                             vacrel->relname)));
>
>       /*
>        * Allocate dead_items memory using dead_items_alloc.  This handles

One thing I am wondering about is that, since we actually modify
vacrel->eagerness during the "success downgrade" cycle, a single
vacuum run could potentially produce messages with both eager vacuum
and normal vacuum language. I don't think that'd be a problem in the
above spot, but I wonder if it might be elsewhere (maybe in
pg_stat_activity?).
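
To make that concrete, something along these lines (entirely my
sketch; the quota fields are made-up names, not the patch's):

    /* in the per-page loop, after an eager scan freezes a page */
    if (vacrel->eagerness == VAC_EAGER &&
        vacrel->eager_pages_frozen >= vacrel->eager_success_quota)
    {
        /* quota hit: the rest of this vacuum proceeds as a normal one */
        vacrel->eagerness = VAC_NORMAL;
    }

Anything that reads vacrel->eagerness after that point -- an error
context callback, progress reporting, whatever ends up in
pg_stat_activity -- could describe the same vacuum run differently
than the INFO message emitted at startup.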


Robert Treat
https://xzilla.net


