Vacuum ERRORs out considering freezing dead tuples from before OldestXmin - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Vacuum ERRORs out considering freezing dead tuples from before OldestXmin |
Date | |
Msg-id | CAAKRu_bDD7oq9ZwB2OJqub5BovMG6UjEYsoK2LVttadjEqyRGg@mail.gmail.com Whole thread Raw |
Responses |
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin |
List | pgsql-hackers |
Hi, If vacuum fails to remove a tuple with xmax older than VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed, it will ERROR out when determining whether or not to freeze the tuple with "cannot freeze committed xmax". In back branches starting with 14, failing to remove tuples older than OldestXmin during pruning caused vacuum to infinitely loop in lazy_scan_prune(), as investigated on this [1] thread. On master, after 1ccc1e05ae removed the retry loop in lazy_scan_prune() and stopped comparing tuples to OldestXmin, the hang could no longer happen, but we can still attempt to freeze dead tuples visibly killed before OldestXmin -- resulting in an ERROR. Pruning may fail to remove dead tuples with xmax before OldestXmin if the tuple is not considered removable by GlobalVisState. For vacuum, the GlobalVisState is initially calculated at the beginning of vacuuming the relation -- at the same time and with the same value as VacuumCutoffs->OldestXmin. A backend's GlobalVisState may be updated again when it is accessed if a new snapshot is taken or if something caused ComputeXidHorizons() to be called. This can happen, for example, at the end of a round of index vacuuming when GetOldestNonRemovableTransactionId() is called. Normally this may result in GlobalVisState's horizon moving forward -- potentially allowing more dead tuples to be removed. However, if a disconnected standby with a running transaction older than VacuumCutoffs->OldestXmin reconnects to the primary after vacuum initially calculates GlobalVisState and OldestXmin but before GlobalVisState is updated, the value of GlobalVisState->maybe_needed could go backwards. If this happens in the middle of vacuum's first pruning and freezing pass, it is possible that pruning/freezing could then encounter a tuple whose xmax is younger than GlobalVisState->maybe_needed and older than VacuumCutoffs->OldestXmin. heap_prune_satisfies_vacuum() would deem the tuple HEAPTUPLE_RECENTLY_DEAD and would not remove it. But the heap_pre_freeze_checks() would ERROR out with "cannot freeze committed xmax". This check is to avoid freezing dead tuples. We can fix this by always removing tuples considered dead before VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby has a transaction that sees that tuple as alive, because it will simply wait to replay the removal until it would be correct to do so or recovery conflict handling will cancel the transaction that sees the tuple as alive and allow replay to continue. Attached is the suggested fix for master plus a repro. I wrote it as a recovery suite TAP test, but I am _not_ proposing we add it to the ongoing test suite. It is, amongst other things, definitely prone to flaking. I also had to use loads of data to force two index vacuuming passes now that we have TIDStore, so it is a slow test. If you want to run the repro with meson, you'll have to add 't/099_vacuum_hang.pl' to src/test/recovery/meson.build and then run it with: meson test postgresql:recovery / recovery/099_vacuum_hang If you use autotools, you can run it with: make check PROVE_TESTS="t/099_vacuum_hang.pl" The repro forces a round of index vacuuming after the standby reconnects and before pruning a dead tuple whose xmax is older than OldestXmin. At the end of the round of index vacuuming, _bt_pendingfsm_finalize() calls GetOldestNonRemovableTransactionId(), thereby updating the backend's GlobalVisState and moving maybe_needed backwards. Then vacuum's first pass will continue with pruning and find our later inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin. I make sure that the standby reconnects between vacuum_get_cutoffs() and pruning because I have a cursor on the page keeping VACUUM FREEZE from getting a cleanup lock. See the repro for step-by-step explanations of how it works. I have a modified version of this that repros the infinite loop on 14-16 with substantially less data. See it here [2]. Also, the repro attached to this mail won't work on 14 and 15 because of changes to background_psql. - Melanie [1] https://postgr.es/m/20240415173913.4zyyrwaftujxthf2%40awork3.anarazel.de#1b216b7768b5bd577a3d3d51bd5aadee [2] https://www.postgresql.org/message-id/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
Attachment
pgsql-hackers by date: