On Thu, 2011-06-23 at 22:02 -0400, Robert Haas wrote:
> > 1. INSERT to a new page, marking it with LSN X
> > 2. WAL flushed to LSN Y (Y > X)
> > 2. Some persistent snapshot (that doesn't see the INSERT) is released,
> > and generates WAL recording that fact with LSN Z (Z > Y)
> > 3. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
> > 4. page is written out because LSN is still X
> > 5. crash
> I don't really think that's a separate set of assumptions - if we had
> some system whereby snapshots could survive a crash, then they'd have
> to be WAL-logged (because that's how we make things survive crashes).
In the above example, it is WAL-logged (with LSN Z).
> And anything that is WAL-logged must obey the WAL-before-data rule.
> We have a system already that ensures that when
> synchronous_commit=off, CLOG pages can't be flushed before the
> corresponding WAL record makes it to disk.
In this case, how do you prevent the PD_ALL_VISIBLE from making it to
disk if you never bumped the LSN when it was set? It seems like you just
don't have the information to do so, and it seems like the information
required would be variable in size.
> I guess the point you are driving at here is that a page can only go
> from being all-visible to not-all-visible by virtue of being modified.
> There's no other piece of state (like a persistent snapshot) that can
> be lost as part of a crash that would make us need change our mind and
> decide that an all-visible XID is really not all-visible after all.
> (The reverse is not true: since snapshots are ephemeral, a crash will
> render every row either all-visible or dead.) I guess I never thought
> about documenting that particular aspect of it because (to me) it
> seems fairly self-evident. Maybe I'm wrong...
I didn't mean to make this conversation quite so hypothetical. My
primary points are:
1. Sometimes it makes sense to break the typical WAL conventions for
performance reasons. But when we do so, we have to be quite careful,
because things get complicated quickly.
2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
because the conditions under which it may be set are more complex
(having to do with both snapshots and cleanup actions). Other hint bits
are based only on transaction status: either the WAL for that
transaction completion got flushed (and is therefore permanent), and we
set the hint bit; or it didn't get flushed and we don't.
Just having this discussion has been good enough for me to get a better
idea what's going on, so if you think the comments are sufficient that's
OK with me.
Regards,Jeff Davis