Re: crash-safe visibility map, take three - Mailing list pgsql-hackers

From Robert Haas
Subject Re: crash-safe visibility map, take three
Msg-id AANLkTinjMjut9Xv2RV8W=8tUAsk+hCEj0QXNA1s=EdmE@mail.gmail.com
In response to Re: crash-safe visibility map, take three  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Dec 1, 2010 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think we can improve this a bit further by also introducing a
>> HEAP_XMIN_FROZEN bit that we set in lieu of overwriting XMIN with
>> FrozenXID.  This allows us to freeze tuples aggressively - if we want
>> - without losing any forensic information.
>
> So far so good ...
>
>> We can then modify the
>> above algorithm slightly, so that when we observe that a page is all
>> visible, we not only set PD_ALL_VISIBLE on the page but also
>> HEAP_XMIN_FROZEN on each tuple.  The WAL record marking the page as
>> all-visible then doubles as a WAL record marking it frozen,
>> eliminating the need to dirty the page yet again at anti-wraparound
>> vacuum time.
>
> but this seems a lot more dubious/fragile.  The basic problem is that
> it's not clear whether HEAP_XMIN_FROZEN is a hint bit or essential
> data.  If you want to set it without the overhead of an LSN bump or a
> possible FPI in WAL, then it's a hint bit.  But if you're using it to
> protect clog truncation then it's essential data.  Perhaps you can make
> this work but there are some nonobvious requirements:
>
> 1. Seeing PD_ALL_VISIBLE set does not excuse vacuum from having to
> iterate through all the tuples on the page checking for
> HEAP_XMIN_FROZEN.  This is because the non-logged update of the page
> might have been torn on the way to disk, such that PD_ALL_VISIBLE got
> set but not all of the FROZEN bits did.

Good point.  If we see the bit set in the visibility map, it
should be safe to infer that the PD_ALL_VISIBLE bit and all
HEAP_XMIN_FROZEN bits are set.  But if the visibility map bit is NOT
set, we must check PD_ALL_VISIBLE and, whether it's set or not, each
individual HEAP_XMIN_FROZEN bit.
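
To make that rule concrete, here is a rough sketch of the check as I
understand it.  Everything in it is hypothetical: the Sketch* types, the
sketch_inspect_page() name, and the SKETCH_* flag values are invented for
illustration and are not PostgreSQL's real page layout or bit assignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values only; not PostgreSQL's real bit assignments. */
#define SKETCH_PD_ALL_VISIBLE   0x0001
#define SKETCH_HEAP_XMIN_FROZEN 0x0001

/* Hypothetical stand-in for a heap tuple header. */
typedef struct SketchTuple
{
    uint16_t infomask;
} SketchTuple;

/* Hypothetical stand-in for a heap page. */
typedef struct SketchPage
{
    uint16_t pd_flags;
    size_t ntuples;
    const SketchTuple *tuples;
} SketchPage;

typedef struct SketchPageState
{
    bool all_visible;
    bool all_frozen;
} SketchPageState;

/*
 * Decide what may be inferred about a page.
 *
 * If the visibility map bit is set, trust it for both properties.
 * Otherwise read PD_ALL_VISIBLE, and scan every tuple for the frozen
 * bit whether or not PD_ALL_VISIBLE is set, since a torn write could
 * have persisted the page flag without all of the tuple bits.
 */
SketchPageState
sketch_inspect_page(bool vm_bit_set, const SketchPage *page)
{
    SketchPageState state = { true, true };

    assert(page != NULL);

    if (vm_bit_set)
        return state;

    state.all_visible = (page->pd_flags & SKETCH_PD_ALL_VISIBLE) != 0;

    for (size_t i = 0; i < page->ntuples; i++)
    {
        if ((page->tuples[i].infomask & SKETCH_HEAP_XMIN_FROZEN) == 0)
        {
            state.all_frozen = false;
            break;
        }
    }

    return state;
}
```

The point of returning both flags is that PD_ALL_VISIBLE on its own never
short-circuits the per-tuple scan; only the visibility map bit does.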

> 2. During an anti-wraparound vacuum, you *need to* emit a WAL record
> when setting HEAP_XMIN_FROZEN.  It's not a hint, any more than writing
> FrozenXID is now.
>
> Actually, #2 isn't even good enough.  What if vacuum passes over a page
> and finds all the FROZEN bits set, but the reason they're set is that
> somebody else updated them in hint fashion microseconds before?  It
> seems possible that those bits might not make it to disk before a
> subsequent crash.  The only way to be really sure those bits are set is
> to emit a WAL record that says to set them, whether or not they seem to
> be set already.  While the WAL record could be small, you'd need one for
> every page, making the argument that this saves I/O somewhat dubious.

I think that we would only ever allow HEAP_XMIN_FROZEN to be set as
part of a WAL-logged operation.  Either we are marking the page
all-visible - in which case we're emitting the new WAL record type
XLOG_HEAP_ALLVISIBLE - or we're freezing individual tuples on a page
where very old and very new tuples are intermixed - in which case we
emit the existing XLOG_HEAP2_FREEZE.
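
A rough sketch of the two paths, under loud assumptions: the in-memory
"WAL" array, the Sketch* types, the sketch_freeze_page() name, and the
SKETCH_* constants are all made up for illustration.  Real code would go
through the normal WAL insertion machinery, which this does not model; the
sketch only shows that no frozen bit gets set without one of the two
record types being emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit value only; not PostgreSQL's real assignment. */
#define SKETCH_HEAP_XMIN_FROZEN 0x0001
#define SKETCH_WAL_CAPACITY     16

/* Stand-ins for the proposed and the existing record types. */
typedef enum SketchWalRecordType
{
    SKETCH_XLOG_HEAP_ALLVISIBLE,    /* proposed new record type */
    SKETCH_XLOG_HEAP2_FREEZE        /* existing record type */
} SketchWalRecordType;

/* Toy in-memory log; a real WAL looks nothing like this. */
typedef struct SketchWalLog
{
    SketchWalRecordType records[SKETCH_WAL_CAPACITY];
    size_t nrecords;
} SketchWalLog;

/*
 * Set frozen bits on a page, always paired with a WAL record.
 *
 * If the page is being marked all-visible, one ALLVISIBLE record covers
 * every tuple.  Otherwise only tuples flagged in old_enough[] are frozen,
 * covered by one FREEZE record; if none qualify, nothing is logged and
 * nothing is changed.  Returns the number of bits newly set.
 */
size_t
sketch_freeze_page(SketchWalLog *wal, uint16_t *infomasks,
                   const bool *old_enough, size_t ntuples,
                   bool page_all_visible)
{
    size_t newly_set = 0;
    bool any_to_freeze = page_all_visible;

    assert(wal != NULL && infomasks != NULL && old_enough != NULL);

    for (size_t i = 0; i < ntuples && !any_to_freeze; i++)
        any_to_freeze = old_enough[i];

    if (!any_to_freeze)
        return 0;

    /* Log the operation; the bit changes below are covered by it. */
    assert(wal->nrecords < SKETCH_WAL_CAPACITY);
    wal->records[wal->nrecords++] = page_all_visible
        ? SKETCH_XLOG_HEAP_ALLVISIBLE
        : SKETCH_XLOG_HEAP2_FREEZE;

    for (size_t i = 0; i < ntuples; i++)
    {
        if (!page_all_visible && !old_enough[i])
            continue;
        if ((infomasks[i] & SKETCH_HEAP_XMIN_FROZEN) == 0)
        {
            infomasks[i] |= SKETCH_HEAP_XMIN_FROZEN;
            newly_set++;
        }
    }

    return newly_set;
}
```

Note that there is deliberately no third, unlogged path: the hint-style
setting that Tom is worried about simply isn't offered.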

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

