Kevin Brown <kevin@sysexperts.com> writes:
> One question I have is: in the event of a crash, why not simply replay
> all the transactions found in the WAL? Is the startup time of the
> database that badly affected if pg_control is ignored?
Interesting thought, indeed. Since we truncate the WAL after each
checkpoint, it seems like this approach would at most double the time
for restart. The win is that it'd eliminate pg_control as a single point of
failure. It's always bothered me that we have to update pg_control on
every checkpoint --- it should be a write-pretty-darn-seldom file,
considering how critical it is.
I think we'd have to make some changes in the code for deleting old
WAL segments --- right now it's not careful to delete them in order.
But surely that can be coped with.
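For illustration only, here is a minimal sketch of what deleting "in order" would have to mean. It assumes segment files carry fixed-width hexadecimal names, so string order equals log order. `remove_old_segments` and its `delete` callback are hypothetical names, not anything in the backend:

```python
from typing import Callable, Iterable, List


def remove_old_segments(names: Iterable[str], cutoff: str,
                        delete: Callable[[str], None]) -> List[str]:
    """Delete every segment whose name sorts before `cutoff`, oldest first.

    Hypothetical sketch: assumes fixed-width hex names, so sorting the
    names sorts the segments by their position in the log.
    """
    removed = []
    for name in sorted(names):
        if name >= cutoff:
            break          # everything from here on is still needed
        delete(name)       # oldest first: a crash here leaves no holes
        removed.append(name)
    return removed
```

Deleting strictly oldest-first means a crash partway through cleanup leaves a contiguous run of segments, which is what a "replay everything visible" restart would depend on.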
OTOH, this might just move the locus for fatal failures out of
pg_control and into the OS's algorithms for writing directory updates.
We would have no cross-check that the set of WAL file names visible in
pg_xlog is sensible or aligned with the true state of the datafile area.
We'd have to take it on faith that we should replay the visible files
in their name order. This might mean we'd have to abandon the current
hack of recycling xlog segments by renaming them --- which would be a
nontrivial performance hit.
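A rough sketch of "replay the visible files in their name order", under the same assumption of fixed-width hexadecimal segment names. `plan_replay` is a hypothetical helper, and the gap check is the only cross-check the file names themselves can offer:

```python
from typing import List


def plan_replay(names: List[str]) -> List[str]:
    """Return segment names in replay order, refusing to proceed past a hole.

    Hypothetical sketch: assumes each name is a fixed-width hex segment
    number, so sorting by name sorts by position in the log.
    """
    ordered = sorted(names)
    numbers = [int(n, 16) for n in ordered]
    for prev, cur in zip(numbers, numbers[1:]):
        if cur != prev + 1:
            raise ValueError(f"gap in WAL between {prev:016X} and {cur:016X}")
    return ordered
```

Note what this check cannot catch. A segment recycled by renaming it to a future name still fits the sequence, so in a name-only scheme like this one nothing distinguishes its stale contents from live WAL.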
Comments anyone?
> If there exists somewhere a reasonably succinct description of the
> reasoning behind the current transaction management scheme (including
> an analysis of the pros and cons), I'd love to read it and quit
> bugging you. :-)
Not that I know of. Would you care to prepare such a writeup? There
is a lot of material in the source-code comments, but no coherent
presentation.
regards, tom lane