On Wed, 2012-11-14 at 21:22 -0500, Robert Haas wrote:
> > But there are some practical issues, as Tom points out. Another one is
> > that it's harder for external utilities (like pg_basebackup) to verify
> > checksums.
> Well, I think the invariant we'd need to maintain is as follows: every
> page for which the checksum fork might be wrong must have an FPI
> following the redo pointer. So, at the time we advance the redo
> pointer, we need the checksum fork to be up-to-date for all pages for
> which a WAL record was written after the old redo pointer except for
> those for which a WAL record has again been written after the new redo
> pointer. In other words, the checksum pages we write out don't need
> to be completely accurate; the checksums for any blocks we know will
> get clobbered anyway during replay don't really matter.
The issue about external utilities is a bigger problem than I realized
at first. Originally, I thought that it was just a matter of code to
associate the checksum with the data.
However, an external utility will never see a torn page while the system
is online (after recovery); but it *will* see an inconsistent view of
the checksum and the data if they are issued in separate write() calls.
So, the hazard of storing the checksum in a different place is not
equivalent to the existing hazard of a torn page.
Regards,Jeff Davis