Re: Enabling Checksums - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Enabling Checksums |
Date | |
Msg-id | 1362692725.26602.113.camel@jdavis Whole thread Raw |
In response to | Re: Enabling Checksums (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: Enabling Checksums
(Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: Enabling Checksums (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote: > If you enable checksums, the free space map never gets updated in a > standby. It will slowly drift to be completely out of sync with reality, > which could lead to significant slowdown and bloat after failover. One of the design points of this patch is that those operations that use MarkBufferDirtyHint(), including tuple hint bits, the FSM, index dead markers, etc., do not directly go to the standby. That's because the standby can't write WAL, so it can't protect itself against a torn page breaking the checksum. However, these do make it through by riding along with a full-page image in the WAL. The fact that checksums are enabled means that these full page images will be written once per modified page per checkpoint, and then replayed on the standby. FSM should get the updates the same way, even though no other WAL is written for the FSM. If full_page_writes are disabled, then the updates will never arrive. But in that case, I think we can just go ahead and dirty the page during recovery, because there isn't a real problem. I was hesitant to make this change in my patch because: 1. I wanted to see if someone saw a flaw in this reasoning; and 2. I noticed that full_page_images can be changed with a SIGHUP, which could add complexity (I don't see any reason we allow this... shouldn't we just force a restart for that change?). I added a README file, moved some of the explanatory material there, and tried to clarify this situation. Let me know if you see a problem that I'm missing. I verified that at least some FSM changes do make it through with checksums on, but I didn't dig much deeper than that. > Since the checksums are an all-or-nothing cluster-wide setting, the > three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and > PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the > code simpler, and leaves the bits free for future use. If we want to > enable such per-page setting in the future, we can add it later. For a > per-relation scheme, they're not needed. Removed header bits. > XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN > without a lock. That's not atomic, so it could incorrectly determine > that a page doesn't need to be backed up. We used to always hold an > exclusive lock on the buffer when it's called, which prevents > modifications to the LSN, but that's no longer the case. Fixed. I added a new exported function, BufferGetLSNAtomic(). There was another similar omission in gistget.c. By the way, I can not find any trace of XLogCheckBufferNeedsBackup(), was that a typo? > Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I > think it will generate WAL records for unlogged tables as it is. Fixed. I also rebased and added a GUC to control whether the checksum failure causes an error or not. I need to do another self-review after these changes and some more extensive testing, so I might have missed a couple things. Regards, Jeff Davis
Attachment
pgsql-hackers by date: