Re: crash-safe visibility map, take five - Mailing list pgsql-hackers

From Robert Haas
Subject Re: crash-safe visibility map, take five
Date
Msg-id BANLkTimUw0-yGwmxc5NWd1rB3t8RZiZqUg@mail.gmail.com
In response to Re: crash-safe visibility map, take five  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: crash-safe visibility map, take five
List pgsql-hackers
On Thu, Jun 23, 2011 at 6:40 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2011-06-23 at 18:18 -0400, Robert Haas wrote:
>> Lazy VACUUM is the only thing that makes a page all visible.  I don't
>> understand the part about snapshots.
>
> Lazy VACUUM is the only thing that _marks_ a page with PD_ALL_VISIBLE.
>
> After an INSERT to a new page, and after all snapshots are released, the
> page becomes all-visible; and thus subject to being marked with
> PD_ALL_VISIBLE by lazy vacuum without bumping the LSN. Note that there
> is no cleanup action that takes place here, so nothing else will bump
> the LSN either.
>
> So, let's say that we hypothetically had persistent snapshots, then
> you'd have the following problem:
>
> 1. INSERT to a new page, marking it with LSN X
> 2. WAL flushed to LSN Y (Y > X)
> 2. Some persistent snapshot (that doesn't see the INSERT) is released,
> and generates WAL recording that fact with LSN Z (Z > Y)
> 3. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
> 4. page is written out because LSN is still X
> 5. crash
>
> Now, the persistent snapshot is still present because LSN Z never made
> it to disk; but the page is marked with PD_ALL_VISIBLE.
>
> Sure, if these hypothetical persistent snapshots were transactional, and
> if synchronous_commit is on, then LSN Z would be flushed before step 3;
> but that's another set of assumptions. That's why I left it simple and
> said that the assumption was "snapshots are released if there's a
> crash".

I don't really think that's a separate set of assumptions - if we had
some system whereby snapshots could survive a crash, then they'd have
to be WAL-logged (because that's how we make things survive crashes).
And anything that is WAL-logged must obey the WAL-before-data rule.
We have a system already that ensures that when
synchronous_commit=off, CLOG pages can't be flushed before the
corresponding WAL record makes it to disk.  For a system like what
you're describing, you'd need something similar - these
crash-surviving snapshots would have to make sure that no action which
depended on their state hit the disk before the WAL record marking the
state change hit the disk.
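
To make that concrete, here is a toy sketch of your scenario in Python. It is
not PostgreSQL code: ToyWal, Page, write_page and scenario are hypothetical
names, LSNs are plain integers, and "bump the page LSN to Z" is just one
assumed way such snapshots could tie the page to the WAL record that released
them.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0               # LSN of the last WAL record that touched the page
    all_visible: bool = False  # stands in for PD_ALL_VISIBLE


class ToyWal:
    """Hypothetical stand-in for the WAL: tracks insert and flush positions."""

    def __init__(self) -> None:
        self.insert_lsn = 0
        self.flushed_lsn = 0

    def insert(self) -> int:
        # Append a record and return its LSN.
        self.insert_lsn += 1
        return self.insert_lsn

    def flush(self, upto: int) -> None:
        # Make WAL durable up to 'upto' (never past what has been inserted).
        self.flushed_lsn = max(self.flushed_lsn, min(upto, self.insert_lsn))


def write_page(wal: ToyWal, page: Page) -> int:
    """WAL-before-data: flush WAL up to the page LSN before writing the page.

    Returns the WAL flush position at the moment the page goes to disk.
    """
    if wal.flushed_lsn < page.lsn:
        wal.flush(page.lsn)
    return wal.flushed_lsn


def scenario(bump_lsn: bool) -> bool:
    """Return True if the snapshot-release record is durable when the page is written."""
    wal, page = ToyWal(), Page()
    x = wal.insert()          # INSERT to a new page, marking it with LSN X
    page.lsn = x
    y = wal.insert()          # unrelated WAL activity
    wal.flush(y)              # WAL flushed to LSN Y (Y > X)
    z = wal.insert()          # persistent snapshot released at LSN Z (Z > Y)
    page.all_visible = True   # lazy VACUUM sets the bit
    if bump_lsn:
        page.lsn = z          # assumption: the bit now depends on record Z
    return write_page(wal, page) >= z
```

With bump_lsn=False the page reaches disk while Z is still unflushed, which is
the failure you describe; with bump_lsn=True the write forces the WAL out to Z
first.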

I guess the point you are driving at here is that a page can only go
from being all-visible to not-all-visible by virtue of being modified.
There's no other piece of state (like a persistent snapshot) that can
be lost as part of a crash that would make us need to change our mind
and decide that an all-visible XID is really not all-visible after all.
(The reverse is not true: since snapshots are ephemeral, a crash will
render every row either all-visible or dead.)  I guess I never thought
about documenting that particular aspect of it because (to me) it
seems fairly self-evident.  Maybe I'm wrong...
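
A rough way to picture it, as a toy model only: row_state and
row_state_after_crash are hypothetical names, a snapshot is modeled as the
set of XIDs it still treats as in progress, deletes are ignored, and the
committed set is assumed to have survived the crash.

```python
def row_state(inserter_xid, committed, snapshots):
    """Classify an inserted row while the system is running."""
    if inserter_xid not in committed:
        return "not-committed"
    # Some live snapshot still treats the inserter as in progress.
    if any(inserter_xid in snap for snap in snapshots):
        return "visible-to-some"
    return "all-visible"


def row_state_after_crash(inserter_xid, committed):
    """Classify the same row after a crash.

    No snapshot survives, and anything that had not committed is aborted.
    """
    state = row_state(inserter_xid, committed, snapshots=[])
    return state if state == "all-visible" else "dead"


committed = {100}
snapshots = [{100}]  # taken while XID 100 was still running

before = row_state(100, committed, snapshots)
after = row_state_after_crash(100, committed)
uncommitted_after = row_state_after_crash(101, committed)
```

Losing the snapshots can move a row towards all-visible but never away from
it, which is why nothing extra has to be remembered across the crash.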

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

