Re: Resume vacuum and autovacuum from interruption and cancellation - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Resume vacuum and autovacuum from interruption and cancellation
Date
Msg-id CA+fd4k52W1P+CkVFu12L3RoMqW9q1mkWYoshPUXJ-=ybbyNP9A@mail.gmail.com
In response to Re: Resume vacuum and autovacuum from interruption and cancellation  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: Resume vacuum and autovacuum from interruption and cancellation
Re: Resume vacuum and autovacuum from interruption and cancellation
List pgsql-hackers
On Tue, 5 Nov 2019 at 15:57, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 2 Nov 2019 at 02:10, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> > > Sounds like an interesting idea, but does it really help? Because if
> > > vacuum was interrupted previously, wouldn't it already know the dead
> > > tuples, etc in the next run quite quickly, as the VM, FSM is already
> > > updated for the page in the previous run.
> >
> > +1. I don't deny that a patch like this could sometimes save
> > something, but it doesn't seem like it would save all that much all
> > that often. If your autovacuum runs are being frequently cancelled,
> > that's going to be a big problem, I think.
>
> I've observed cases where a user wants to cancel a very long-running
> autovacuum (sometimes an anti-wraparound one) in order to do DDL or
> other maintenance work. If the table is very large, autovacuum can
> take a long time and might not reclaim enough garbage.
>
> > And as Rafia says, even
> > though you might do a little extra work reclaiming garbage from
> > subsequently-modified pages toward the beginning of the table, it
> > would be unusual if they'd *all* been modified. Plus, if they've
> > recently been modified, they're more likely to be in cache.
> >
> > I think this patch really needs a test scenario or demonstration of
> > some kind to prove that it produces a measurable benefit.
>
> Okay. A simple test could be that we cancel a long running vacuum on a
> large table that is being updated and rerun vacuum. And then we see
> the garbage on that table. I'll test it.
>

Attached is the updated version of the patch.

I've measured the effect of this patch. The test simulates the case
where an autovacuum running on a table that is concurrently being
updated is canceled in the middle of the vacuum, and autovacuum is then
rerun (or, with the patch, resumed) on the table. Since the vacuum
resume block is saved after each heap vacuum, I set maintenance_work_mem
so that vacuuming the table requires two or more heap vacuums; in other
words, maintenance_work_mem fills up at least once during the
autovacuum. The detailed steps are:

1. Make the table dirty for 15 min
2. Run vacuum with vacuum delays
3. After the first heap vacuum, cancel it
4. Rerun vacuum (or, with the patch, resume vacuum)

From step #2 through step #4 the table is continuously updated in the
background. I used pgbench with random key selection, so the table is
updated uniformly.
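As a side note on the maintenance_work_mem sizing mentioned above: lazy
vacuum currently stores one 6-byte ItemPointer per dead tuple in
maintenance_work_mem, so the number of heap-vacuum cycles a workload
forces can be estimated roughly. A small sketch (the function names are
mine, not from the patch):

```python
# Sketch of maintenance_work_mem sizing, assuming lazy vacuum stores
# one 6-byte ItemPointer (TID) per dead tuple, as it does currently.
TID_BYTES = 6

def dead_tuples_per_pass(maintenance_work_mem_mb):
    """Dead tuples that fit in memory before a heap/index vacuum
    cycle is forced."""
    return (maintenance_work_mem_mb * 1024 * 1024) // TID_BYTES

def heap_vacuum_cycles(total_dead_tuples, maintenance_work_mem_mb):
    """Minimum number of heap-vacuum cycles for the workload
    (ceiling division)."""
    per_pass = dead_tuples_per_pass(maintenance_work_mem_mb)
    return -(-total_dead_tuples // per_pass)

# With 1MB of maintenance_work_mem, ~174k TIDs fit per pass, so a
# table with 1 million dead tuples needs at least 6 heap vacuums.
```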

I measured the dead tuple percentage of the table. In these tests, what
matters is how long step #4 took and how much garbage it collected.
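For reference, the "vacuum efficiency" figures quoted in the results are
simply the dead-tuple percentage points reclaimed per minute. A trivial
helper showing the metric (the function name is mine):

```python
def vacuum_efficiency(pct_before, pct_after, minutes):
    """Dead-tuple percentage points reclaimed per minute of vacuuming."""
    return (pct_before - pct_after) / minutes

# e.g. HEAD in test 1: 6.13% -> 4.01% in 12m49s gives ~0.165 %/m,
# reported below rounded to 0.16%/m.
head_eff = vacuum_efficiency(6.13, 4.01, 12 + 49 / 60)
```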

1. Canceled vacuum after processing about 20% of table at step #2.
1-1. HEAD
After making the table dirty (after step #1): 6.96%
After cancellation (after step #3): 6.13%

At step #4, vacuum reduced it to 4.01% and took 12m 49s. The vacuum
efficiency is 0.16%/m (2.12% down in 12.8min).

1-2. Patched (resume vacuum)
After making the table dirty (after step #1): 6.92%
After cancellation (after step #3): 5.84%

At step #4, vacuum reduced it to 4.32% and took 10m 26s. The vacuum
efficiency is 0.14%/m.

------
2. Canceled vacuum after processing about 40% of table at step #2.
2-1. HEAD
After making the table dirty (after step #1): 6.97%
After cancellation (after step #3): 4.56%

At step #4, vacuum reduced it to 1.91% and took 8m 15s. The vacuum
efficiency is 0.32%/m.

2-2. Patched (resume vacuum)
After making the table dirty (after step #1): 6.97%
After cancellation (after step #3): 4.46%

At step #4, vacuum reduced it to 1.94% and took 6m 30s. The vacuum
efficiency is 0.38%/m.

-----
3. Canceled vacuum after processing about 70% of table at step #2.
3-1. HEAD
After making the table dirty (after step #1): 6.97%
After cancellation (after step #3): 4.73%

At step #4, vacuum reduced it to 2.32% and took 8m 11s. The vacuum
efficiency is 0.29%/m.

3-2. Patched (resume vacuum)
After making the table dirty (after step #1): 6.96%
After cancellation (after step #3): 4.73%

At step #4, vacuum reduced it to 3.25% and took 4m 12s. The vacuum
efficiency is 0.35%/m.

These results suggest that the later in the table the resume point is,
the better the efficiency. Since the table is updated uniformly even
while autovacuum runs, it was more efficient to restart autovacuum from
the last position rather than from the beginning of the table. I think
these results show some benefit from the patch, but I'm concerned that
it might be difficult for users to judge when to use this option. In
practice the efficiency depends entirely on the distribution of updated
pages, and this test dirtied pages uniformly, which is not a common
situation. So if we want this feature, I think we should enable
resuming automatically only when we can be reasonably sure that
resuming is better. For example, we could remember both the last
vacuumed block and how many vacuumable pages appear to remain beyond
it, and decide to resume only if we expect to process many more pages
that way.
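Such a decision rule could be sketched as follows. This is purely a
hypothetical illustration, not code from the patch: all the names and
the density comparison itself are my assumptions about what "expect to
process many more pages" might mean concretely.

```python
def should_resume(last_block, total_blocks,
                  est_vacuumable_beyond, est_vacuumable_total):
    """Resume from last_block only if the density of vacuumable pages
    beyond it is at least the table-wide density, i.e. resuming is
    expected to reclaim at least as much per scanned block as starting
    over from block 0 would. Illustrative heuristic only."""
    remaining = total_blocks - last_block
    if remaining <= 0:
        return False  # nothing left beyond the saved position
    density_beyond = est_vacuumable_beyond / remaining
    density_overall = est_vacuumable_total / total_blocks
    return density_beyond >= density_overall

# A vacuum canceled at 70% with most remaining garbage past that point
# would resume; one canceled early, with most garbage already behind
# the saved block, would start over instead.
```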

Regards

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

