Re: 16-bit page checksums for 9.2 - Mailing list pgsql-hackers
From | Aidan Van Dyk
Subject | Re: 16-bit page checksums for 9.2
Date |
Msg-id | CAC_2qU-OnB4Zpcs77q7Xo4L+vBOhFc-RKS6WJNWFv+7m8jzoNw@mail.gmail.com
In response to | Re: 16-bit page checksums for 9.2 ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses | Re: 16-bit page checksums for 9.2
List | pgsql-hackers
On Thu, Dec 29, 2011 at 11:44 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> You wind up with a database free of torn pages before you apply WAL.
> full_page_writes to the WAL are not needed as long as double-write is
> used for any pages which would have been written to the WAL.  If
> checksums were written to the double-buffer metadata instead of
> adding them to the page itself, this could be implemented alone.  It
> would probably allow a modest speed improvement over using
> full_page_writes and would eliminate those full-page images from the
> WAL files, making them smaller.

Correct.

So now lots of people seem to be jumping on the double-write bandwagon,
looking at some of the things it promises: all writes are durable.

This solves two big issues:
- Remove the torn-page problem
- Remove FPWs from the WAL

That up front looks pretty attractive.  But we need to look at the
tradeoffs, and then decide (benchmark, anyone?).

Remember, PostgreSQL is a double-write system right now.  The 1st,
checksummed write is the FPW in WAL.  It's fsynced.  And the 2nd synced
write happens when the data file is synced during checkpoint.

So PostgreSQL currently has an optimization: not every write has
*requirements* for atomic, instant durability.  And so PostgreSQL gets
to do lots of writes to the OS cache and *not* request them to be
instantly synced.  Then, at some point, when it's ready to clear the
1st checksummed write, it makes sure every write is synced.  And lots
of work went into PG recently to get even better at the collection of
writes/syncs that happen at checkpoint time, to take even bigger
advantage of the fact that it's better to write everything in a file
first, then call a single sync.

So moving to this new double-write-area bandwagon, we move from a
"WAL FPW synced at commit, collect as many other writes as possible,
then final sync" type of system to a system where *EVERY* write
requires syncs of two separate 8K writes at buffer write-out time.
So we avoid the FPW at commit (yes, that's nice for latency), and we
guarantee every buffer written is consistent (that fixes our
hint-bit-only dirty writes from being torn).  And we do that at a cost
of every buffer write requiring two fsyncs, in a serial fashion (a
rough sketch of that sequence is below, after the signature).  Come
checkpoint, I'm wondering....

Again, all that to avoid a single "optimization" that PostgreSQL
currently has:

1) Writes for hint-bit-only dirty buffers don't need to be durable

And the problem that optimization introduces:

1) Since they aren't guaranteed durable, we can't believe a checksum

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
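For concreteness, here is a minimal syscall-level sketch of the per-buffer
sequence described above: two 8K writes, each followed by its own fsync,
done serially before the buffer write-out is complete.  The file names,
the DW_PAGE_SIZE constant, and the double_write_page() helper are
illustrative assumptions, not PostgreSQL's actual double-write code.

    /*
     * Sketch only: what "every buffer write needs two synced 8K writes"
     * looks like at the syscall level.  File names, DW_PAGE_SIZE and the
     * helper below are assumptions for illustration, not PostgreSQL code.
     */
    #define _XOPEN_SOURCE 700
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define DW_PAGE_SIZE 8192

    /*
     * One buffer write-out: copy the page into the double-write area and
     * sync it, then write the page to its real location and sync that too.
     * Two writes, two fsyncs, in series.
     */
    static int
    double_write_page(int dw_fd, int data_fd, off_t offset, const char *page)
    {
        if (pwrite(dw_fd, page, DW_PAGE_SIZE, 0) != DW_PAGE_SIZE)
            return -1;
        if (fsync(dw_fd) != 0)              /* 1st sync: durable copy */
            return -1;

        if (pwrite(data_fd, page, DW_PAGE_SIZE, offset) != DW_PAGE_SIZE)
            return -1;
        if (fsync(data_fd) != 0)            /* 2nd sync: durable real page */
            return -1;

        /* Only now may the double-write slot be reused. */
        return 0;
    }

    int
    main(void)
    {
        char page[DW_PAGE_SIZE];
        int  dw_fd   = open("doublewrite.tmp", O_RDWR | O_CREAT, 0600);
        int  data_fd = open("relation.tmp",    O_RDWR | O_CREAT, 0600);

        if (dw_fd < 0 || data_fd < 0)
            return 1;

        memset(page, 0, sizeof(page));      /* stand-in for a dirty buffer */
        if (double_write_page(dw_fd, data_fd, 0, page) != 0)
            perror("double_write_page");

        close(dw_fd);
        close(data_fd);
        return 0;
    }

Contrast with the current path described in the message: the only synced
write at buffer-dirty time is the FPW appended to WAL, while the data-file
writes go to the OS cache and are synced in bulk at checkpoint.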