> The more I think about this, the more disturbed I get. It seems clear
> that this sequence is capable of writing out the checkpoint record
> before all dirty data pages have reached disk. If we suffer a crash
> before the data pages do reach disk, then on restart we will
> not realize we need to redo the changes to those pages.
> This seems an awfully large hole for what is claimed to be
> a bulletproof xlog technology.
>
> I feel that checkpoint should not use sync(2) at all, but
> should instead depend on fsync'ing the data files --- since
> fsync doesn't return until the write is done, this is considerably
> more secure.
I was never happy about sync(), of course. This is just another reason
to rewrite smgr. I don't know how useful the second sync() call is, but
on Solaris (and, I believe, on many other *NIXes) rc0 calls it
three times -:) Why?
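
To make the fsync-based ordering from the quoted text concrete, here is a
minimal sketch in C. It is not the actual checkpoint code:
checkpoint_with_fsync, the data_fds array and the "CHECKPOINT" record
string are hypothetical names used only for illustration, and the sketch
assumes the caller already holds open descriptors for every data file
that has dirty pages.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical sketch: flush every data file with fsync() first, and only
 * then append and flush the checkpoint record. If any fsync() fails, no
 * record is written, so a later redo would still start before these changes.
 * Returns 0 on success, -1 on any failure.
 */
static int checkpoint_with_fsync(const int *data_fds, size_t n, int log_fd)
{
    static const char record[] = "CHECKPOINT\n";   /* placeholder record */
    const size_t len = sizeof record - 1;

    assert(data_fds != NULL || n == 0);

    /* Step 1: fsync() does not return until the file's data is written. */
    for (size_t i = 0; i < n; i++)
        if (fsync(data_fds[i]) != 0)
            return -1;

    /* Step 2: only now is it safe to record the checkpoint in the log. */
    if (write(log_fd, record, len) != (ssize_t) len)
        return -1;

    /* Step 3: make the checkpoint record itself durable. */
    return fsync(log_fd) == 0 ? 0 : -1;
}
```

With sync() the kernel may only schedule the writes, so the record in
step 2 could reach disk before the data pages do; with the order above
that cannot happen as long as fsync() behaves as documented.
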
Maybe now, with two checkpoints in the log, we should start redo from
the oldest one? This will increase recovery time, of course -:(
Vadim