Re: Setting visibility map in VACUUM's second phase - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Setting visibility map in VACUUM's second phase
Date
Msg-id 6230.1354816893@sss.pgh.pa.us
Whole thread Raw
In response to Setting visibility map in VACUUM's second phase  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Setting visibility map in VACUUM's second phase
Re: Setting visibility map in VACUUM's second phase
List pgsql-hackers
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> So the idea that the patch implements is this. When we scan pages in
> the first phase of vacuum, if we find a page that has all-visible
> tuples but also has one or more dead tuples that we know the second
> phase of vacuum will remove, we mark such page with a special flag
> called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel
> free to suggest a better name). What this flag tells the second phase
> of vacuum is to mark the page all-visible and set the VM bit unless
> some intermediate action on the page again inserts a not-all-visible
> tuple. If such an action happens, the PD_ALL_VISIBLE_OR_DEAD flag will
> be cleared by that backend and the second phase of vacuum will not set
> all-visible flag and VM bit.

This seems overly complicated, as well as a poor use of a precious page
header flag bit.  Why not just recheck all-visibility status after
performing the deletions?  Keep in mind that even if there were some
not-all-visible tuples when we were on the page the first time, they
might have become all-visible by the time we return.  So this is going
out of its way to produce a less-than-optimal result.

> The patch itself is quite small and works as intended. One thing that
> demanded special handling is the fact that visibilitymap_set()
> requires a cutoff xid to tell the Hot Standby to resolve conflicts
> appropriately. We could have scanned the page again during the second
> phase to compute the cutoff xid, but I thought we can overload the
> pd_prune_xid field in the page header to store this information which
> is already computed in the first phase.

And this is *far* too cute.  The use of that field is complicated enough
already without having its meaning vary depending on randomly-designed
flag bits.  And it's by no means obvious that using a by-now-old value
when we return to the page is a good idea anyway (or even correct).

I think taking a second whack at setting the visibility bit is a fine
idea, but let's drop all the rest of this premature optimization.
        regards, tom lane



pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Functional dependency in GROUP BY through JOINs
Next
From: Alvaro Herrera
Date:
Subject: Re: How to check whether the row was modified by this transaction before?