Rework the way multixact truncations work - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Rework the way multixact truncations work |
Date | |
Msg-id | 20150621192409.GA4797@alap3.anarazel.de |
Responses |
Re: Rework the way multixact truncations work (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Rework the way multixact truncations work (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: Rework the way multixact truncations work (Andres Freund <andres@anarazel.de>)
Re: Rework the way multixact truncations work (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hi,

As discussed on list, over IM and in person at pgcon, I want to make multixact truncations WAL logged to address various bugs. Since that's a comparatively large and invasive change, I thought it'd be a good idea to start a new thread instead of burying it in an already long thread.

Here's the commit message, which hopefully explains what's being changed and why:

Rework the way multixact truncations work.

The fact that multixact truncations are not WAL logged has caused a fair share of problems. Amongst others, it requires doing computations during recovery while the database is not in a consistent state, delaying truncations till checkpoints, and handling members being truncated while offsets are not. We tried to put bandaids on lots of these issues over the last years, but it seems time to change course. Thus this patch introduces WAL logging for truncation, even in the back branches.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery; we can just use the master's values.
3) to simplify a fair amount of the logic that keeps the in-memory limits straight; this has gotten much easier.

During the course of fixing this a bunch of bugs had to be fixed:
1) Data was not purged from the in-memory buffers of the members SLRU before deleting segments. This happened to be hard or impossible to hit due to the interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist(), but that doesn't work for offsets that haven't yet been flushed to disk. Flush out before running to fix. Not pretty, but it feels slightly safer to only make decisions based on on-disk state.

To handle the case of an updated standby replaying WAL from a not-yet upgraded primary we have to recognize that situation and use "old style" truncation (i.e. looking at the SLRUs) during WAL replay.
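To make the standby-side decision above a bit more concrete, here is a minimal, self-contained sketch in C. It is not code from the patch: the record kinds, the ReplayState struct, replay_record() and the "have we seen a truncation record yet" heuristic are all assumptions made up for illustration. Only the wraparound-aware comparison follows the usual signed-difference idiom for 32-bit counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Hypothetical record kinds for this sketch; not PostgreSQL's WAL format. */
typedef enum
{
    REC_CHECKPOINT,             /* carries the primary's oldest multixact */
    REC_MULTIXACT_TRUNCATE      /* explicit, WAL-logged truncation */
} RecKind;

typedef struct
{
    RecKind     kind;
    MultiXactId oldest_multi;   /* cutoff carried by the record */
} WalRecord;

typedef struct
{
    /* Assumed heuristic: set once any truncation record has been replayed. */
    bool        primary_logs_truncation;
    MultiXactId truncated_upto;
    int         logged_truncations;
    int         legacy_truncations;
} ReplayState;

/* Wraparound-aware "a is older than b" for 32-bit counters. */
static bool
multi_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/* Replay one record in the (simulated) startup process. */
void
replay_record(ReplayState *st, const WalRecord *rec)
{
    assert(st != NULL && rec != NULL);

    if (rec->kind == REC_MULTIXACT_TRUNCATE)
    {
        /* New style: trust the primary's values, never look at the SLRUs. */
        st->primary_logs_truncation = true;
        if (multi_precedes(st->truncated_upto, rec->oldest_multi))
        {
            st->truncated_upto = rec->oldest_multi;
            st->logged_truncations++;
        }
    }
    else if (!st->primary_logs_truncation &&
             multi_precedes(st->truncated_upto, rec->oldest_multi))
    {
        /*
         * Old style: the primary apparently does not log truncations, so
         * truncate while replaying the checkpoint record. A real
         * implementation would have to inspect the SLRUs here.
         */
        st->truncated_upto = rec->oldest_multi;
        st->legacy_truncations++;
    }
}
```

Feeding a checkpoint record to a fresh ReplayState takes the old-style path; once a truncation record has been replayed, later checkpoint records no longer trigger any truncation in this sketch.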
In contrast to before, this old-style truncation now happens in the startup process, when replaying a checkpoint record, instead of in the checkpointer. Doing this in the restartpoint was incorrect: restartpoints can happen much later than the original checkpoint, thereby leading to wraparound. It's also more in line with how the WAL logging now works.

To avoid "multixact_redo: unknown op code 48" errors, standbys should be upgraded before primaries. This needs to be expressed clearly in the release notes.

Backpatch to 9.3, where the use of multixacts was expanded. Arguably this could be backpatched further, but there doesn't seem to be sufficient benefit to outweigh the risk of applying a significantly different patch there.

I've tested this a bunch, including using a newer standby against an older master and such. What I have yet to test is that the concurrency protections against multiple backends truncating at the same time are correct.

It'd be very welcome to see some wider testing and review on this.

I've attached three commits:
0001: Add functions to burn through multixacts - that should get its own file.
0002: Lower the lower bound limits for *_freeze_max_age - I think we should just do that. There really is no reason for the current limits and they make testing hard and force space wastage.
0003: The actual truncation patch.

Greetings,

Andres Freund