Re: Resume vacuum and autovacuum from interruption and cancellation - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Resume vacuum and autovacuum from interruption and cancellation
Msg-id: CA+fd4k52W1P+CkVFu12L3RoMqW9q1mkWYoshPUXJ-=ybbyNP9A@mail.gmail.com
In response to: Re: Resume vacuum and autovacuum from interruption and cancellation (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses:
  Re: Resume vacuum and autovacuum from interruption and cancellation
  Re: Resume vacuum and autovacuum from interruption and cancellation
List: pgsql-hackers
On Tue, 5 Nov 2019 at 15:57, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 2 Nov 2019 at 02:10, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> > > Sounds like an interesting idea, but does it really help? Because if
> > > vacuum was interrupted previously, wouldn't it already know the dead
> > > tuples, etc in the next run quite quickly, as the VM, FSM is already
> > > updated for the page in the previous run.
> >
> > +1. I don't deny that a patch like this could sometimes save
> > something, but it doesn't seem like it would save all that much all
> > that often. If your autovacuum runs are being frequently cancelled,
> > that's going to be a big problem, I think.
>
> I've observed cases where a user wants to cancel a very long running
> autovacuum (sometimes an anti-wraparound one) in order to run DDL or
> other maintenance work. If the table is very large, autovacuum can take
> a long time and might not reclaim enough garbage.
>
> > And as Rafia says, even
> > though you might do a little extra work reclaiming garbage from
> > subsequently-modified pages toward the beginning of the table, it
> > would be unusual if they'd *all* been modified. Plus, if they've
> > recently been modified, they're more likely to be in cache.
> >
> > I think this patch really needs a test scenario or demonstration of
> > some kind to prove that it produces a measurable benefit.
>
> Okay. A simple test could be that we cancel a long-running vacuum on a
> large table that is being updated and then rerun vacuum, and see how
> much garbage remains on the table. I'll test it.

Attached is the updated version of the patch.

I've measured the effect of this patch. In the test, I simulate the case where an autovacuum running on a table that is being updated is canceled in the middle of the vacuum, and then autovacuum is rerun (or resumed) on the table.
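As background for the maintenance_work_mem sizing used below: vacuum collects dead tuple pointers into a maintenance_work_mem-sized array (6 bytes per ItemPointerData), and performs one heap-vacuum pass each time that array fills. A back-of-the-envelope sketch of how many passes a run needs (illustrative numbers, not taken from the patch or the test):

```python
# Each dead tuple costs one ItemPointerData (6 bytes) in vacuum's
# dead-TID array, which is sized from maintenance_work_mem.
TID_BYTES = 6

def heap_vacuum_passes(dead_tuples, maintenance_work_mem_kb):
    """How many times the dead-TID array fills up (i.e. heap-vacuum passes)."""
    capacity = maintenance_work_mem_kb * 1024 // TID_BYTES
    return -(-dead_tuples // capacity)  # ceiling division

# e.g. 50M dead tuples with maintenance_work_mem = 64MB:
# capacity ~= 11.18M TIDs, so vacuum needs 5 heap-vacuum passes
print(heap_vacuum_passes(50_000_000, 64 * 1024))
```

So, to cancel a vacuum "after the first heap vacuum" as in the test below, maintenance_work_mem just needs to be small enough relative to the dead-tuple count that at least two passes are required.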
Since the vacuum resume block is saved after a heap vacuum, I set maintenance_work_mem so that vacuum on the table needs two or more heap vacuums. In other words, maintenance_work_mem is used up at least once during the autovacuum. The detailed steps are:

1. Make the table dirty for 15 min.
2. Run vacuum with vacuum delays.
3. After the first heap vacuum, cancel it.
4. Rerun vacuum (or, with the patch, resume vacuum).

From step #2 through step #4 the table is being updated in the background. I used pgbench and the \random command, so the table is updated uniformly. I measured the dead tuple percentage of the table. In these tests, how long step #4 took and how much garbage it collected are what matter.

1. Canceled vacuum after processing about 20% of the table at step #2.

1-1. HEAD
After making dirty (after step #1): 6.96%
After cancellation (after step #3): 6.13%
At step #4, vacuum reduced it to 4.01% and took 12m 49s.
The vacuum efficiency is 0.16%/m (2.12% down in 12.8 min).

1-2. Patched (resume vacuum)
After making dirty (after step #1): 6.92%
After cancellation (after step #3): 5.84%
At step #4, vacuum reduced it to 4.32% and took 10m 26s.
The vacuum efficiency is 0.14%/m.

------

2. Canceled vacuum after processing about 40% of the table at step #2.

2-1. HEAD
After making dirty (after step #1): 6.97%
After cancellation (after step #3): 4.56%
At step #4, vacuum reduced it to 1.91% and took 8m 15s.
The vacuum efficiency is 0.32%/m.

2-2. Patched (resume vacuum)
After making dirty (after step #1): 6.97%
After cancellation (after step #3): 4.46%
At step #4, vacuum reduced it to 1.94% and took 6m 30s.
The vacuum efficiency is 0.38%/m.

------

3. Canceled vacuum after processing about 70% of the table at step #2.

3-1. HEAD
After making dirty (after step #1): 6.97%
After cancellation (after step #3): 4.73%
At step #4, vacuum reduced it to 2.32% and took 8m 11s.
The vacuum efficiency is 0.29%/m.

3-2. Patched (resume vacuum)
After making dirty (after step #1): 6.96%
After cancellation (after step #3): 4.73%
At step #4, vacuum reduced it to 3.25% and took 4m 12s.
The vacuum efficiency is 0.35%/m.

These results suggest that the later in the table the vacuum is canceled, the more resuming pays off. Since the table was updated uniformly even during autovacuum, it was more efficient to restart autovacuum from the last position than from the beginning of the table.

I think these results show some benefit from this patch, but I'm concerned that it may be difficult for users to judge when to use this option. In practice the efficiency depends entirely on the distribution of updated pages, and this test dirtied pages uniformly, which is not a common situation. So if we want this feature, I think we should enable resuming automatically, only when we can be reasonably sure that resuming is better. For example, we could remember both the last vacuumed block and how many vacuum-able pages appear to remain from there, and decide to resume only if we expect to process more pages that way.

Regards

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
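For what it's worth, the "%/m" efficiency figures quoted above are simply the percentage points of dead tuples removed at step #4 divided by the step #4 runtime in minutes; a quick check against the reported numbers (the quoted values appear truncated, not rounded, to two decimals):

```python
# Recompute the "vacuum efficiency" figures from the measurements above:
# (dead-tuple % before step #4 - % after step #4) / runtime in minutes.
def efficiency(pct_before, pct_after, mins, secs=0):
    return (pct_before - pct_after) / (mins + secs / 60)

cases = {
    "1-1 HEAD":    efficiency(6.13, 4.01, 12, 49),  # quoted as 0.16
    "1-2 Patched": efficiency(5.84, 4.32, 10, 26),  # quoted as 0.14
    "2-2 Patched": efficiency(4.46, 1.94, 6, 30),   # quoted as 0.38
    "3-2 Patched": efficiency(4.73, 3.25, 4, 12),   # quoted as 0.35
}
for name, e in cases.items():
    print(f"{name}: {e:.3f} %/m")
```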