Re: Fix calculations on WAL recycling. - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Fix calculations on WAL recycling. |
Date | |
Msg-id | 20180723065916.GI2854@paquier.xyz Whole thread Raw |
In response to | Fix calculations on WAL recycling. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: Fix calculations on WAL recycling.
|
List | pgsql-hackers |
On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote: > I'll register this to the next commitfest. > > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyotaro@lab.ntt.co.jp This is an open item for v11. >> While considering this, I found a bug in 4b0d28de06, which >> removed prior checkpoint from control file. It actually trims the >> segments before the last checkpoint's redo segment but recycling >> is still considered based on the *prevous* checkpoint. As the >> result min_wal_size doesn't work as told. Specifically, setting >> min/max_wal_size to 48MB and advance four or more segments then >> two checkpoints leaves just one segment, which is less than >> min_wal_size. >> >> The attached patch fixes that. One arguable point on this would >> be the removal of the behavior when RemoveXLogFile(name, >> InvalidXLogRecPtr, ..). >> >> The only place calling the function with the parameter is >> timeline switching. Previously unconditionally 10 segments are >> recycled after switchpoint but the reason for the behavior is we >> didn't have the information on previous checkpoint at hand at the >> time. But now we can use the timeline switch point as the >> approximate of the last checkpoint's redo point and this allows >> us to use min/max_wal_size properly at the time. I have been looking at that, and tested with this simple scenario: create table aa (a int); Then just repeat the following SQLs: insert into aa values (1); select pg_switch_wal(); checkpoint; select pg_walfile_name(pg_current_wal_lsn()); By doing so, you would notice that the oldest WAL segment does not get recycled after the checkpoint, while it should as the redo pointer is always checkpoint generated. What happens is that this oldest segment gets recycled every two checkpoints. Then I had a look at the proposed patch, with a couple of comments. - if (PriorRedoPtr == InvalidXLogRecPtr) - recycleSegNo = endlogSegNo + 10; - else - recycleSegNo = XLOGfileslop(PriorRedoPtr); + recycleSegNo = XLOGfileslop(RedoRecPtr); I think that this is a new behavior, and should not be changed, see point 3 below. In CreateCheckPoint(), the removal of past WAL segments is always going to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling should happen also when PriorRedoPtr is InvalidXLogRecPtr, no? /* Trim from the last checkpoint, not the last - 1 */ This comment could be adjusted, let's remove "not the last - 1" . The calculation of _logSegNo in CreateRestartPoint is visibly incorrect, this still uses PriorRedoPtr so the bug is not fixed for standbys. The tweaks for ThisTimeLineID also need to be out of the loop where PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation should be kept. Finally, in summary, this patch is doing actually three things: 1) Rename a couple of variables which refer incorrectly to the prior checkpoint so as they refer to the last checkpoint generated. 2) Make sure that WAL recycling/removal happens based on the last checkpoint generated, which is the one just generated when past WAL segments are cleaned up as post-operation actions. 3) Enforce the recycling point to be the switch point instead of arbitrarily recycle 10 segments, which is what b2a5545b has introduced. Recycling at exactly the switch point is wrong, as the redo LSN of the previous checkpoint may not be at the same segment when a timeline has switched, so you would finish with recycling segments which are still needed if an instance needs be crash-recovered post-promotion without a completed post-recovery checkpoint. In short this is dangerous. I'll let Heikki comment on that, but I think that you should fetch the last redo LSN instead, still you need to be worried about checkpoints running in parallel of the startup process, so I would stick with the current logic. 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is not. -- Michael
Attachment
pgsql-hackers by date: