Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have an idea! Currently we write the backup pages (copies of pages
> modified since the last checkpoint) when we write the WAL changes as
> part of the commit. See the XLogCheckBuffer() call in XLogInsert().
Can someone explain exactly what problem is being defeated by writing whole
pages to the WAL log? That is, if a page is half flushed to disk and contains
half of the old data and half of the new data, what exactly would go wrong
with recovery? When postgres replays the WAL, why would it have trouble
recovering the correct contents of the page from the half-written page plus
the regular WAL entry?
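(For anyone else puzzling over this: the usual answer is that replay decides
whether a record has already been applied by looking at the page's LSN, and on
a torn page the header sector carrying the LSN may have made it to disk while
the rest of the page did not. A toy sketch, with made-up page layout and
names, not actual PostgreSQL code:)

```python
# Hypothetical illustration of the torn-page failure mode.
# Page layout, sizes, and names are invented for this sketch.

PAGE_SIZE = 8192
HALF = PAGE_SIZE // 2  # pretend the disk writes the page in two chunks

def make_page(lsn, head, tail):
    """A 'page': 8-byte LSN header, then head and tail filler regions."""
    return (lsn.to_bytes(8, "big")
            + head.ljust(HALF - 8, b"\0")
            + tail.ljust(HALF, b"\0"))

old = make_page(100, b"old-head", b"old-tail")  # before the logged change
new = make_page(200, b"new-head", b"new-tail")  # after the logged change

# Crash mid-write: the first half (including the LSN header) reaches disk,
# the second half does not.
torn = new[:HALF] + old[HALF:]

def replay_delta(page, record_lsn, apply):
    """Ordinary (non-full-page) WAL replay: skip records the page's LSN
    says were already applied."""
    page_lsn = int.from_bytes(page[:8], "big")
    if page_lsn >= record_lsn:
        return page  # looks already applied -- record is skipped
    return apply(page)

# The torn page's header already carries LSN 200, so replay skips the
# record and never repairs the stale second half:
recovered = replay_delta(torn, record_lsn=200, apply=lambda p: new)
print(recovered[HALF:HALF + 8])  # still b'old-tail' -- silent corruption

# A full-page image in WAL sidesteps this: recovery overwrites the whole
# page without trusting its (possibly inconsistent) on-disk contents.
recovered_fpi = new
```

The point is that a delta record's replay depends on the page being in a
known state, wholly old or wholly new; a torn page is neither, and the LSN
check can't tell.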
> A further optimization would be to write the backup pages to the same 8k
> file over and over again rather than adding I/O to pg_xlog WAL files.
This is an interesting idea. But it's not necessarily universally a win. For
users with a battery backed cache it would be absolutely ideal and a huge win.
It would avoid pushing other pages out of cache and avoid doing extra i/o
syncing the no longer relevant data to the actual disk.
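The scheme amounts to a double-write buffer: sync the page image to a fixed
scratch file first, then write the data file in place. A minimal sketch of
the ordering, with invented file names and helpers (not anything PostgreSQL
actually does today):

```python
# Hypothetical sketch of the "reused 8k backup file" idea.
# All names here are made up for illustration.
import os

PAGE_SIZE = 8192
SCRATCH = "backup_page.tmp"  # the single 8k file, overwritten every time

def write_page_safely(datafile, offset, page):
    assert len(page) == PAGE_SIZE
    # 1. Write the full page image to the fixed scratch file and fsync it.
    #    (A real scheme would also record *which* page this is, so recovery
    #    knows where to restore it.)
    with open(SCRATCH, "wb") as f:
        f.write(page)
        f.flush()
        os.fsync(f.fileno())
    # 2. Only then overwrite the page in place in the data file.
    with open(datafile, "r+b") as f:
        f.seek(offset)
        f.write(page)
        f.flush()
        os.fsync(f.fileno())
    # A crash during step 2 leaves an intact copy in SCRATCH, so a torn
    # in-place write is recoverable without putting page images in pg_xlog.

# Usage: seed a one-page data file, then write through the scratch file.
with open("data.tmp", "wb") as f:
    f.write(b"\0" * PAGE_SIZE)
write_page_safely("data.tmp", 0, b"X" * PAGE_SIZE)
```

The extra fsync per backup page is exactly why the battery-backed-cache case
wins big and the bare-spindle case may not.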
But for users without a battery backed cache I don't think it would be a win.
It would mean another data file that ideally would be on a segregated set of
spindles to avoid any unnecessary seeks. Currently the ideal postgres setup
has WAL on one set of spindles, data on another set of spindles, and OS
writable partitions like /var on an entirely separate set of spindles. That
takes at least 6 drives if you want mirrored drives. This would raise the
minimum to 8 drives.
--
greg