Setting visibility map in VACUUM's second phase - Mailing list pgsql-hackers
From: Pavan Deolasee
Subject: Setting visibility map in VACUUM's second phase
Msg-id: CABOikdP0meGuXPPWuYrP=vDvoqUdshF2xJAzZHWSKg03Rz_+9Q@mail.gmail.com
List: pgsql-hackers
Hi All,

I briefly mentioned this idea in one of the other threads, but I'm starting a new thread to highlight the point. Today, we set the visibility map *only* in the first phase of vacuum. This works when the page has no dead tuples. But when vacuum itself removes one or more dead tuples from a page, the visibility map update must wait for the next vacuum to come along. If the page receives no other action between this vacuum and the next one, the VM bit will get set next time, but not before the next vacuum reads and rescans the page.

To explain this further, please see the test below on current master. I ran pgbench for a few seconds and then vacuumed the pgbench_accounts table (stripping out non-interesting output):

INFO: "pgbench_accounts": found 4709 removable, 1000000 nonremovable row versions in 16475 out of 16475 pages

So VACUUM removed all the dead tuples from 16475 pages. One would expect the next VACUUM to have nothing to do, because the just-vacuumed table is not being accessed at all. But if I run vacuum again, we get this:

INFO: "pgbench_accounts": found 0 removable, 1000000 nonremovable row versions in 16475 out of 16475 pages

It's only the third vacuum on the table that skips all the heap pages, because by then they are all marked as all-visible in the VM:

INFO: "pgbench_accounts": found 0 removable, 0 nonremovable row versions in 0 out of 16475 pages

If I repeat the same test with the attached patch, the second vacuum itself skips all the heap pages because the VM is up-to-date at the end of the first vacuum. Please see the output from the first and second vacuum with the patch applied:

INFO: "pgbench_accounts": found 4163 removable, 1000000 nonremovable row versions in 16465 out of 16465 pages

INFO: "pgbench_accounts": found 0 removable, 0 nonremovable row versions in 0 out of 16465 pages

So the idea that the patch implements is this. When we scan pages in the first phase of vacuum, if we find a page whose tuples are all visible except for one or more dead tuples that we know the second phase of vacuum will remove, we mark such a page with a special flag called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel free to suggest a better one). This flag tells the second phase of vacuum to mark the page all-visible and set the VM bit, unless some intermediate action on the page inserts a not-all-visible tuple again. If such an action happens, that backend clears the PD_ALL_VISIBLE_OR_DEAD flag and the second phase of vacuum does not set the all-visible flag and the VM bit.

The patch itself is quite small and works as intended. One thing that demanded special handling is the fact that visibilitymap_set() requires a cutoff xid to tell Hot Standby to resolve conflicts appropriately. We could have scanned the page again during the second phase to compute the cutoff xid, but I thought we could overload the pd_prune_xid field in the page header to store this information, since it is already computed in the first phase. We don't need this field while a page is in the PD_ALL_VISIBLE_OR_DEAD state, because there is nothing to prune on the page in that state. I added a new page header flag to tell whether the XID stored in pd_prune_xid is a prune XID or a cutoff XID, but maybe that's not required: if the page has the PD_ALL_VISIBLE_OR_DEAD flag set, we can assume the XID is a cutoff XID; otherwise it's a prune XID. So PageIsPrunable() will need to consult pd_prune_xid only if PD_ALL_VISIBLE_OR_DEAD is clear.
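To make the lifecycle concrete, here is a rough sketch of how the flag could flow through the two phases. This is illustrative only: PD_ALL_VISIBLE_OR_DEAD is the proposed bit, but the Page*AllVisibleOrDead() and Page*CutoffXid() macros are made-up names for this sketch, and the visibilitymap_set() call is written against the 9.2-era signature, so treat the details as pseudocode rather than the patch itself:

    /*
     * Sketch only.  PageSetAllVisibleOrDead(), PageIsAllVisibleOrDead(),
     * PageClearAllVisibleOrDead() and PageGetCutoffXid() are hypothetical
     * names for the new flag handling.
     */

    /* First phase (lazy_scan_heap): all tuples visible except dead ones */
    if (all_visible_except_dead)
    {
        PageSetAllVisibleOrDead(page);                  /* new pd_flags bit */
        PageSetCutoffXid(page, visibility_cutoff_xid);  /* into pd_prune_xid */
        MarkBufferDirty(buf);
    }

    /* Second phase (lazy_vacuum_page), after removing the dead tuples */
    if (PageIsAllVisibleOrDead(page))   /* no intermediate action cleared it */
    {
        TransactionId cutoff_xid = PageGetCutoffXid(page);

        PageClearAllVisibleOrDead(page);
        PageSetAllVisible(page);
        MarkBufferDirty(buffer);
        visibilitymap_set(onerel, blkno, InvalidXLogRecPtr,
                          *vmbuffer, cutoff_xid);
    }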
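And a sketch of how the pd_prune_xid overloading could look at the macro level, assuming we skip the extra header flag and let PD_ALL_VISIBLE_OR_DEAD itself disambiguate. The flag value 0x0008 is an assumption (the existing pd_flags bits end at PD_ALL_VISIBLE, 0x0004), and everything except PageIsPrunable() and pd_prune_xid is invented for illustration:

    /* Hypothetical pd_flags bit, next free value after PD_ALL_VISIBLE */
    #define PD_ALL_VISIBLE_OR_DEAD  0x0008

    #define PageIsAllVisibleOrDead(page) \
        (((PageHeader) (page))->pd_flags & PD_ALL_VISIBLE_OR_DEAD)

    /* With the flag set, pd_prune_xid carries the all-visible cutoff XID */
    #define PageGetCutoffXid(page) \
        (AssertMacro(PageIsAllVisibleOrDead(page)), \
         ((PageHeader) (page))->pd_prune_xid)

    /* Consult pd_prune_xid as a prune XID only when the flag is clear */
    #define PageIsPrunable(page, oldestxmin) \
        (AssertMacro(TransactionIdIsNormal(oldestxmin)), \
         !PageIsAllVisibleOrDead(page) && \
         TransactionIdIsValid(((PageHeader) (page))->pd_prune_xid) && \
         TransactionIdPrecedes(((PageHeader) (page))->pd_prune_xid, oldestxmin))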
AFAICS pgbench itself may not be the most appropriate benchmark to test this feature, because between successive vacuums of the large pgbench_accounts table almost every page in that table will have dead tuples and need vacuum's attention. But I think this will be a very powerful enhancement for the cases where a user does a lot of UPDATE/DELETE operations on a large table and then runs VACUUM to remove all the dead tuples. On current master, the visibility map will remain unset (as shown by the vacuum output above), so index-only scans won't work until the table is vacuumed one more time. Another downside on current master is that the next vacuum will once again read all those heap blocks into shared buffers and scan them over again. This unnecessary IO activity can be avoided with this idea.

I don't have any reasonable hardware to test this though, so any help is highly appreciated.

Comments?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee