lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples - Mailing list pgsql-hackers

From: Noah Misch
Subject: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
Msg-id: 20130108024957.GA4751@tornado.leadboat.com
List: pgsql-hackers
Per this comment in lazy_scan_heap(), almost all tuple removal these days happens in heap_page_prune():

    case HEAPTUPLE_DEAD:
        /*
         * Ordinarily, DEAD tuples would have been removed by
         * heap_page_prune(), but it's possible that the tuple
         * state changed since heap_page_prune() looked.  In
         * particular an INSERT_IN_PROGRESS tuple could have
         * changed to DEAD if the inserter aborted.  So this
         * cannot be considered an error condition.
         */

vacuumlazy.c remains responsible for noticing the LP_DEAD line pointers left by heap_page_prune(), removing corresponding index entries, and marking those line pointers LP_UNUSED. Nonetheless, lazy_vacuum_heap() retains the ability to remove actual HEAPTUPLE_DEAD tuples and reclaim their LP_NORMAL line pointers. This support gets exercised only in the scenario described in the above comment.

For hot standby, this capability requires its own WAL record, XLOG_HEAP2_CLEANUP_INFO, to generate the necessary conflicts[1]. There is a bug in lazy_scan_heap()'s bookkeeping for the xid to place in that WAL record. Each call to heap_page_prune() simply overwrites vacrelstats->latestRemovedXid, but lazy_scan_heap() expects the value to only ever increase. I have attached a minimal fix to be backpatched. It has lazy_scan_heap() ignore heap_page_prune()'s actions for the purpose of this conflict xid, because heap_page_prune() emitted an XLOG_HEAP2_CLEAN record covering them.

At that point in the investigation, I realized that the cost of being able to remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit. Again, the benefit is being able to remove tuples whose inserting transaction aborted between the HeapTupleSatisfiesVacuum() call in heap_page_prune() and the one in lazy_scan_heap(). To make that possible, lazy_vacuum_heap() grabs a cleanup lock, calls PageRepairFragmentation(), and emits a WAL record for every page containing LP_DEAD line pointers or HEAPTUPLE_DEAD tuples.
If we take it out of the business of removing tuples, lazy_vacuum_heap() can skip WAL and update lp_flags under a mere shared lock. The second attached patch, for HEAD, implements that. Besides optimizing things somewhat, it simplifies the code and removes rarely-tested branches. (This patch supersedes the backpatch-oriented patch rather than stacking with it.)

The bookkeeping behind the "page containing dead tuples is marked as all-visible in relation" warning is also faulty; it only fires when lazy_scan_heap() saw the HEAPTUPLE_DEAD tuple, and again, heap_page_prune() will be the one to see it in almost every case. I changed the warning to fire whenever the page cannot be marked all-visible for a reason other than the presence of too-recent live tuples.

I considered renaming lazy_vacuum_heap() to lazy_heap_clear_dead_items(), reflecting its narrower role. Ultimately, I left function names unchanged. This patch conflicts textually with Pavan's "Setting visibility map in VACUUM's second phase" patch, but I don't see any conceptual incompatibility.

I can't give a simple statement of the performance improvement here. The XLOG_HEAP2_CLEAN record is fairly compact, so the primary benefit of avoiding it is the possibility of avoiding a full-page image. For example, if a checkpoint lands just before the VACUUM and again during the index-cleaning phase (assume just one such phase in this example), this patch reduces heap-related WAL volume by almost 50%. Improvements as extreme as 2% and 97% are possible given other timings of checkpoints relative to the VACUUM. In general, expect this to help VACUUMs spanning several checkpoint cycles more than it helps shorter VACUUMs. I have attached a script I used as a reference workload for testing different checkpoint timings. There should also be some improvement from keeping off WALInsertLock, not requiring WAL flushes to evict from the ring buffer during the lazy_vacuum_heap() phase, and not taking a second buffer cleanup lock.
I did not attempt to quantify those.

Thanks,
nm

[1] Normally, heap_page_prune() removes the tuple first (leaving an LP_DEAD line pointer), and vacuumlazy.c removes index entries afterward. When the removal happens in this order, the XLOG_HEAP2_CLEAN record takes care of conflicts. However, in the rarely-used code path, we remove the index entries before removing the tuple. XLOG_HEAP2_CLEANUP_INFO conflicts with standby snapshots that might need the vanishing index entries.