On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > I think we can get away with writing the LSN value to disk, as you
> > suggested, but only every so often. No need to do it after every WAL
> > record, just consistently every so often, so it gives us a point at
> > which we know we are safe.
>
> Huh? How does that make you safe? What you need to know is the max
> LSN that could possibly be on disk.
>
> Hmm, actually we could get away with tying this to fetching WAL files
> from the archive. When switching to a new WAL file, write out the
> *ending* WAL address of that file to pg_control. Then process the WAL
> records in it. Whether or not any of the affected pages get to disk,
> we know that there is no LSN on disk exceeding what we already put in
> pg_control. If we crash and restart, we'll have to get to the end
> of this file before we start letting backends in; which might be further
> than we actually got before the crash, but not too much further because
> we already know the whole WAL file is available.
>
> Or is that the same thing you were saying? The detail about using
> the end address seems fairly critical, and you didn't mention it...
Same! I just said the safe point was "LSN + 1", and since end = next start,
that's the same thing. Looks like we've got a solution, no matter how it's
described. (I actually have a more detailed proof of safety using
snapshots/MVCC considerations, so I wasn't overly worried, but what we've
discussed is much easier to understand and agree on. A proof of safety is
all we need, and this simpler proof is more secure.)
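The per-file scheme Tom describes, together with the "end = next start" observation, can be modelled in a few lines. This is a minimal Python sketch, not server code. It assumes 16 MB segments and treats WAL positions as flat byte offsets rather than the server's own address format. `ControlSketch`, `segment_end`, `enter_segment` and `consistent` are hypothetical names. The ordering that matters is that a segment's end address is recorded before any record from that segment is replayed.

```python
from dataclasses import dataclass

# Assumption: 16 MB WAL segments, WAL positions modelled as flat byte offsets.
WAL_SEG_SIZE = 16 * 1024 * 1024


@dataclass
class ControlSketch:
    """Hypothetical stand-in for the relevant pg_control field."""
    min_recovery_point: int = 0
    writes: int = 0  # how many times pg_control would be written and fsynced


def segment_end(pos: int) -> int:
    """End address of the segment containing pos, i.e. the next segment's start."""
    return (pos // WAL_SEG_SIZE + 1) * WAL_SEG_SIZE


def enter_segment(ctl: ControlSketch, seg_start: int) -> None:
    """Called on switching to a new segment, BEFORE replaying any of its records.

    Whatever pages reach disk while this segment is replayed, none can carry
    an LSN beyond the value recorded here.
    """
    ctl.min_recovery_point = segment_end(seg_start)
    ctl.writes += 1  # real server: write and fsync pg_control at this point


def consistent(ctl: ControlSketch, replayed_upto: int) -> bool:
    """After a crash, let backends in only once replay reaches the recorded point."""
    return replayed_upto >= ctl.min_recovery_point


if __name__ == "__main__":
    ctl = ControlSketch()
    enter_segment(ctl, 3 * WAL_SEG_SIZE)
    # Crash partway through segment 3, then restart recovery.
    print(consistent(ctl, 3 * WAL_SEG_SIZE + 1000))  # False
    print(consistent(ctl, 4 * WAL_SEG_SIZE))         # True
```

Restarting after a crash in the middle of segment 3 means replaying to the end of that segment before admitting backends, which may be further than recovery got before the crash but never beyond a file already known to be available.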
I don't want to make it per file, though. Big systems can whizz through WAL
files very quickly, so either we make the interval a big number, e.g. 255
files per xlogid, or we make it settable (and record it in pg_control).
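A variant that updates pg_control only once per group of N files, with N itself stored alongside the safe point, might look like the following. This is a hedged Python sketch under stated assumptions: 16 MB segments, WAL positions as flat byte offsets, and `ControlSketch`, `segs_per_update` and `enter_segment` are hypothetical names. The default of 255 mirrors the "255 files per xlogid" example. When replay first enters a new group, the end of the whole group is recorded.

```python
from dataclasses import dataclass

# Assumption: 16 MB WAL segments, WAL positions modelled as flat byte offsets.
WAL_SEG_SIZE = 16 * 1024 * 1024


@dataclass
class ControlSketch:
    """Hypothetical pg_control fields: the safe point plus the chosen interval."""
    segs_per_update: int = 255  # e.g. 255 files per xlogid, or a setting
    min_recovery_point: int = 0
    writes: int = 0  # how many times pg_control would be written and fsynced


def enter_segment(ctl: ControlSketch, seg_start: int) -> None:
    """Advance the safe point only when replay first enters a new group of
    segs_per_update segments, recording the END of that whole group."""
    segno = seg_start // WAL_SEG_SIZE
    n = ctl.segs_per_update
    group_end = (segno // n + 1) * n * WAL_SEG_SIZE
    if group_end > ctl.min_recovery_point:
        ctl.min_recovery_point = group_end
        ctl.writes += 1  # real server: write and fsync pg_control here


if __name__ == "__main__":
    ctl = ControlSketch(segs_per_update=4)
    for segno in range(10):
        enter_segment(ctl, segno * WAL_SEG_SIZE)
    # 3 writes instead of 10; safe point is the end of segment 11 (start of 12).
    print(ctl.writes, ctl.min_recovery_point // WAL_SEG_SIZE)
```

The apparent trade-off is fewer pg_control writes against a restart possibly having to replay (and fetch from the archive) up to N files further than recovery had reached before backends are let in. Setting `segs_per_update` to 1 gives back the per-file behaviour.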
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support