Setting visibility map in VACUUM's second phase - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Setting visibility map in VACUUM's second phase
Date
Msg-id CABOikdP0meGuXPPWuYrP=vDvoqUdshF2xJAzZHWSKg03Rz_+9Q@mail.gmail.com
Whole thread Raw
Responses Re: Setting visibility map in VACUUM's second phase
List pgsql-hackers
Hi All,

I briefly mentioned this idea in one of the other thread, but starting
a new thread to highlight the point. Today, we set the visibility map
*only* in the first phase of vacuum. This works when the page has no
dead tuples. But the vacuum itself is removing one or more dead tuples
from the page, the visibility map update must wait for the next vacuum
to come in. If the page receives no other action between this and the
next vacuum, the VM bit will get set next time but not before the next
vacuum reads and rescans the page. To explain this further, please see
the test below on current master:

I ran pgbench for a few seconds and then vacuumed the pgbench_accounts
table (stripping out non-interesting output)
INFO:  "pgbench_accounts": found 4709 removable, 1000000 nonremovable
row versions in 16475 out of 16475 pages

So VACUUM removed all the dead tuples from 16475 pages. One would
expect the next VACUUM to not do nothing because the table just
vacuumed is not being accessed at all. But if I run vacuum again, we
get this:
INFO:  "pgbench_accounts": found 0 removable, 1000000 nonremovable row
versions in 16475 out of 16475 pages

Its only third vacuum on the table that skips all the heap pages
because they all are now marked as all-visible in the VM.
INFO:  "pgbench_accounts": found 0 removable, 0 nonremovable row
versions in 0 out of 16475 pages

If I repeat the same test with the attached patch, the second vacuum
itself will skip all the heap pages because the VM is up-to-date at
the end of the first vacuum. Please see the output from the first and
second vacuum with the patch applied.
INFO:  "pgbench_accounts": found 4163 removable, 1000000 nonremovable
row versions in 16465 out of 16465 pages
INFO:  "pgbench_accounts": found 0 removable, 0 nonremovable row
versions in 0 out of 16465 pages

So the idea that the patch implements is this. When we scan pages in
the first phase of vacuum, if we find a page that has all-visible
tuples but also has one or more dead tuples that we know the second
phase of vacuum will remove, we mark such page with a special flag
called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel
free to suggest a better name). What this flag tells the second phase
of vacuum is to mark the page all-visible and set the VM bit unless
some intermediate action on the page again inserts a not-all-visible
tuple. If such an action happens, the PD_ALL_VISIBLE_OR_DEAD flag will
be cleared by that backend and the second phase of vacuum will not set
all-visible flag and VM bit.

The patch itself is quite small and works as intended. One thing that
demanded special handling is the fact that visibilitymap_set()
requires a cutoff xid to tell the Hot Standby to resolve conflicts
appropriately. We could have scanned the page again during the second
phase to compute the cutoff xid, but I thought we can overload the
pd_prune_xid field in the page header to store this information which
is already computed in the first phase. We don't need this field when
a page is in PD_ALL_VISIBLE_OR_DEAD state because there is nothing to
prune on the page as such in that state. I added a new page header
flag to tell if the XID stored in pd_prune_xid is a prune XID or a
cutoff XID. But may be its not required. If the page has
PD_ALL_VISIBLE_OR_DEAD flag set, then we can assume that the XID is a
cutoff XID, otherwise its a prune XID. So PageIsPrunable() will need
consult the pd_prune_xid only if PD_ALL_VISIBLE_OR_DEAD is clear.

AFAICS pgbench itself may not be the most appropriate benchmark to
test this feature because between successive vacuums of the large
pgbench_accounts table, I think almost every page in that table will
have dead tuples and need vacuum's attention. But I think this will be
a very powerful enhancement for the cases where a user does a lot of
UPDATE/DELETE operations on a large table and then calls VACUUM to
remove all the dead tuples. In the current master, the visibility map
will remain unset (as shown by the vacuum output above). Hence
index-only scans won't work until the table is vacuumed one more time.
Another downside on the current master is that the next vacuum will
once again read all those heap blocks in shared buffers and scan them
over again. This unnecessary IO activity can be avoided with this
idea.I don't have any reasonable hardware to test this though, so any
help is highly appreciated.

Comments ?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Attachment

pgsql-hackers by date:

Previous
From: Vlad Arkhipov
Date:
Subject: How to check whether the row was modified by this transaction before?
Next
From: Dimitri Fontaine
Date:
Subject: in-catalog Extension Scripts and Control parameters (templates?)