Re: 16-bit page checksums for 9.2 - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: 16-bit page checksums for 9.2 |
Date | |
Msg-id | CA+U5nM+CRsUepiGHnz4K9VBDnQnTspAzKSMEJ-Miy+M6C0x+Yg@mail.gmail.com Whole thread Raw |
In response to | Re: 16-bit page checksums for 9.2 (Noah Misch <noah@leadboat.com>) |
Responses |
Re: 16-bit page checksums for 9.2
|
List | pgsql-hackers |
On Wed, Feb 8, 2012 at 2:05 PM, Noah Misch <noah@leadboat.com> wrote: > On Wed, Feb 08, 2012 at 11:40:34AM +0000, Simon Riggs wrote: >> On Wed, Feb 8, 2012 at 3:24 AM, Noah Misch <noah@leadboat.com> wrote: >> > On Tue, Feb 07, 2012 at 08:58:59PM +0000, Simon Riggs wrote: >> >> On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch <noah@leadboat.com> wrote: >> >> > This patch uses FPIs to guard against torn hint writes, even when the >> >> > checksums are disabled. ?One could not simply condition them on the >> >> > page_checksums setting, though. ?Suppose page_checksums=off and we're hinting >> >> > a page previously written with page_checksums=on. ?If that write tears, >> >> > leaving the checksum intact, that block will now fail verification. ?A couple >> >> > of ideas come to mind. ?(a) Read the on-disk page and emit an FPI only if the >> >> > old page had a checksum. ?(b) Add a checksumEnableLSN field to pg_control. >> >> > Whenever the cluster starts with checksums disabled, set the field to >> >> > InvalidXLogRecPtr. ?Whenever the cluster starts with checksums enabled and the >> >> > field is InvalidXLogRecPtr, set the field to the next LSN. ?When a checksum >> >> > failure occurs in a page with LSN older than the stored one, emit either a >> >> > softer warning or no message at all. >> >> >> >> We can only change page_checksums at restart (now) so the above seems moot. >> >> >> >> If we crash with FPWs enabled we repair any torn pages. >> > >> > There's no live bug, but that comes at a high cost: the patch has us emit >> > full-page images for hint bit writes regardless of the page_checksums setting. >> >> Sorry, I don't understand what you mean. I don't see any failure cases >> that require that. >> >> page_checksums can only change at a shutdown checkpoint, >> >> The statement "If that write tears, >> >> > leaving the checksum intact, that block will now fail verification." >> cannot happen, ISTM. >> >> If we write out a block we update the checksum if page_checksums is >> set, or we clear it. If we experience a torn page at crash, the FPI >> corrects that, so the checksum never does fail verification. We only >> need to write a FPI when we write with checksums. >> >> If that's wrong, please explain a failure case in detail. > > In the following description, I will treat a heap page as having two facts. > The first is either CHKSUM or !CHKSUM, and the second is either HINT or !HINT. > A torn page write lacking the protection of a full-page image can update one > or both facts despite the transaction having logically updated both. (This is > just the normal torn-page scenario.) Given that, consider the following > sequence of events: > > 1. startup with page_checksums = on > 2. update page P with facts CHKSUM, !HINT > 3. clean write of P > 4. clean shutdown > > 5. startup with page_checksums = off > 6. update P with facts !CHKSUM, HINT. no WAL since page_checksums=off > 7. begin write of P > 8. crash. torn write of P only updates HINT. disk state now CHKSUM, HINT > > 9. startup with page_checksums = off > 10. crash recovery. does not touch P, because step 6 generated no WAL > 11. clean shutdown > > 12. startup with page_checksums = on > 13. read P. checksum failure > > Again, this cannot happen with checksum16_with_wallogged_hint_bits.v7.patch, > because step 6 _does_ write WAL. That ought to change for the sake of > page_checksums=off performance, at which point we must protect the above > scenario by other means. Thanks for explaining. Logic fixed. >> >> > PageSetLSN() is not atomic, so the shared buffer content lock we'll be holding >> >> > is insufficient. >> >> >> >> Am serialising this by only writing PageLSN while holding buf hdr lock. >> > >> > That means also taking the buffer header spinlock in every PageGetLSN() caller >> > holding only a shared buffer content lock. ?Do you think that will pay off, >> > versus the settled pattern of trading here your shared buffer content lock for >> > an exclusive one? >> >> Yes, I think it will pay off. This is the only code location where we >> do that, and we are already taking the buffer header lock, so there is >> effectively no overhead. > > The sites I had in the mind are the calls to PageGetLSN() in FlushBuffer() > (via BufferGetLSN()) and XLogCheckBuffer(). We don't take the buffer header > spinlock at either of those locations. If they remain safe under the new > rules, we'll need comments explaining why. I think they may indeed be safe, > but it's far from obvious. OK, some slight rework required to do that, no problems. I checked all other call sites and have updated README to explain. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: