Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Msg-id CAGTBQpaM7cib3M-yZDi=8HHTmBmF41zSYMKh72bzTfNoYLyDXw@mail.gmail.com
In response to Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> Attached patches, rebased and modified as discussed:
>> 1 no longer does tree pruning, it just vacuums a range of the FSM
>> 2 reintroduces tree pruning for the initial FSM vacuum
>> 3 and 4 remain as they were, but rebased
>
> I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
> underspecified, and the use of it seemed to have off-by-one errors.  Also,
> you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
> I thought the point was to get rid of that.  We then need to make sure
> we clean up after a truncation, but we can do that by introducing a
> FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
> attached 0001 is committable, but feel free to take another look.

+
+             /*
+              * Periodically do incremental FSM vacuuming to make newly-freed
+              * space visible on upper FSM pages.  Note: although we've cleaned
+              * the current block, we haven't yet updated its FSM entry (that
+              * happens further down), so passing end == blkno is correct.
+              */
+             if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+             {
+                 FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
+                                         blkno);
+                 next_fsm_block_to_vacuum = blkno;
+             }
          }

Using next_fsm_block_to_vacuum there seems strange.

v10 counted the number of blocks whose free space was updated, so it
vacuumed the FSM only after a lot of changes had actually been made to
it. This version vacuums the FSM after *scanning* a lot of pages, even
if few modifications were made to them. Not sure which is better, but
the logic in v10 sounded right to me.
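
For what it's worth, a minimal sketch of the v10-style trigger inside
the heap scan loop; the freespace_updated_blocks counter is
hypothetical, the rest reuses the names from the hunk above and the
FreeSpaceMapVacuumRange call from 0001:

    /*
     * Sketch only: trigger on the number of blocks whose free space
     * actually changed, not on the number of blocks merely scanned.
     */
    if (freespace_updated_blocks >= VACUUM_FSM_EVERY_PAGES)
    {
        FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
                                blkno);
        next_fsm_block_to_vacuum = blkno;
        freespace_updated_blocks = 0;
    }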

!         if (fsm_end.logpageno == addr.logpageno)
!             end_slot = fsm_end_slot;
!         else if (fsm_end.logpageno > addr.logpageno)
!             end_slot = SlotsPerFSMPage;
!         else
!             end_slot = -1;
!
!         for (slot = start_slot; slot < end_slot; slot++)

This doesn't seem correct.

The +1 in v10 was intentional. Suppose a leaf node of the FSM has a
parent in slot 3, and that one has a parent at slot 10.

With the exclusive bound above, this would vacuum slots 0-9 on that
upper level, but not slot 10. That would leave a whole branch (the one
the end block belongs to) unvisited.

That's why end_slot has to include the end block's slot: there is
still work to do when recursing into that last branch.
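
A minimal sketch of what I mean, reusing the variables from the quoted
hunk (untested, just to show the inclusive bound; it's equivalent to
the +1 that v10 used):

    if (fsm_end.logpageno == addr.logpageno)
        end_slot = fsm_end_slot;            /* include the end block's slot */
    else if (fsm_end.logpageno > addr.logpageno)
        end_slot = SlotsPerFSMPage - 1;     /* whole page is in range */
    else
        end_slot = -1;                      /* nothing to do on this page */

    for (slot = start_slot; slot <= end_slot; slot++)
        ...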


> I still don't like 0002.  It's adding a lot of complexity, and
> not-negligible overhead, to solve yesterday's problem.

Are you sure it's non-negligible?

Most of my benchmarks couldn't measure any change whatsoever from this
patch in terms of run/CPU time.

The FSM is so much smaller than the table that the cost of vacuuming
it is drowned out by all the other work being done and buried in the
noise.
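
To put rough numbers on that, a back-of-envelope calculation (assuming
the default 8 KB block size and roughly 4000 leaf slots per FSM page,
i.e. about one FSM byte per heap block):

    #include <stdio.h>

    int main(void)
    {
        const double heap_gb = 100.0;             /* hypothetical table size */
        const double slots_per_fsm_page = 4000.0; /* approx. leaf slots per 8 KB FSM page */

        /* One FSM byte per heap block => the FSM is ~1/4000th of the heap. */
        printf("FSM for a %.0f GB heap: ~%.1f MB\n",
               heap_gb, heap_gb * 1024.0 / slots_per_fsm_page);
        return 0;
    }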

Maybe in very special cases where vacuum does almost nothing, skipping
all pages, it might be measurable. A heavily bloated-then-cleaned table
with few live rows per page, but all frozen, would be the total worst
case. But reading the visibility map to skip pages is comparable work
to vacuuming the FSM, so there's no reason to think it would get worse
by *that* much. I might try to benchmark that a bit after the long
weekend.

Anyway, it's a separate patch to be independently committed/vetted.

> After 0001,
> there's no reason to assume that vacuum is particularly likely to get
> cancelled between having made cleanups and having updated the upper FSM
> levels.  (Maybe the odds are a bit more for the no-indexes case, but
> that doesn't seem like it's common enough to justify a special mechanism
> either.)

Why not?

Any kind of DDL (even those that don't rewrite the heap) would cancel
autovacuum.

You might think DDL isn't common enough to worry about, but I've seen
cases where regular REINDEXes were required (and cron'd) to keep index
bloat in check, and those cancel autovacuum all the time.

> Not sure about 0004 either.  The fact that we can't localize what part of
> the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
> calls are a roughly quadratic cost.

Well, the index bulk delete itself already is a huge elephant-sized
quadratic cost.

The FSM is only the cherry on top.

The updated part can't be localized because it isn't local: all blocks
could potentially be changed. Even in correlated indexes, the upper
levels need not be physically correlated, which would defeat a "vacuum
block range" optimization.

I could try to optimize the case where it's possible, by recording the
first and last blocks modified, but that would be a hugely invasive
patch (it would have to touch a lot of places in btree code).
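
Just to illustrate the idea (hypothetical helper and variable names,
not part of the posted patches; it assumes the range-vacuum API from
0001, with an exclusive end block):

    /* Track the range of index blocks whose free space was updated. */
    typedef struct FSMDirtyRange
    {
        BlockNumber min_blkno;
        BlockNumber max_blkno;
        bool        dirty;
    } FSMDirtyRange;

    static void
    fsm_range_note_block(FSMDirtyRange *r, BlockNumber blkno)
    {
        if (!r->dirty)
        {
            r->min_blkno = r->max_blkno = blkno;
            r->dirty = true;
        }
        else
        {
            r->min_blkno = Min(r->min_blkno, blkno);
            r->max_blkno = Max(r->max_blkno, blkno);
        }
    }

    /* ...and after the bulk delete pass, instead of a full FSM vacuum: */
    if (range.dirty)
        FreeSpaceMapVacuumRange(rel, range.min_blkno, range.max_blkno + 1);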

And index bloat is a very real issue, as bad as heap bloat is.

>  Maybe in proportion to the other work
> we have to do, they're not much, but on the other hand how much benefit is
> there?

A week-long multi-pass vacuum I was forced to do about a month ago
accumulated 400GB of *index bloat* because of this. To put that into
context, the index was ~800GB, so that's close to 50% bloat.

Granted, with this patch it would still accumulate bloat, just not
400GB; more like 400GB / # of passes. That's still a decent
improvement, I would say. Perhaps there was something else wrong in my
installation, but the patch would indeed have helped a lot.

>  Should we make the call conditional on how many index pages got
> updated?

It could be attempted.

>  Also, I wonder why you only touched nbtree and spgist.

They're the only ones I could find that use the FSM during bulk
deletes, at least from a bit of grepping.

I was a bit surprised to find that hash doesn't use the FSM at all,
but perhaps there's a reason for that.

GIN does all FSM updates at the end during ginvacuumcleanup, so
there's nothing to do between bulkdelete calls.

>  (For
> that matter, I wonder why BRIN doesn't go through IndexFreeSpaceMapVacuum
> like the rest of the index AMs.  Or perhaps it has the right idea and we
> should get rid of IndexFreeSpaceMapVacuum as a useless layer.)

No idea.

