collateral benefits of a crash-safe visibility map - Mailing list pgsql-hackers

From Robert Haas
Subject collateral benefits of a crash-safe visibility map
Msg-id BANLkTinX1=kC6c9F78ceFHTNsOPZO_92Hg@mail.gmail.com
Responses Re: collateral benefits of a crash-safe visibility map  (Simon Riggs <simon@2ndQuadrant.com>)
Re: collateral benefits of a crash-safe visibility map  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: collateral benefits of a crash-safe visibility map  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> no, that wasn't my intent at all, except in the sense of wondering if
> a crash-safe visibility map provides a route of displacing a lot of
> hint bit i/o and by extension, making alternative approaches of doing
> that, including mine, a lot less useful.  that's a good thing.

Sadly, I don't think it's going to have that effect.  The
page-is-all-visible bits seem to offer a significant performance
benefit over the xmin-committed hint bits; but the benefit of
xmin-committed all by itself is too much to ignore.  The advantages of
the xmin-committed hint bit (as opposed to the all-visible page-level
bit) are:

(1) Setting the xmin-committed hint bit is a much more light-weight
operation than setting the all-visible page-level bit.  It can be done
on-the-fly by any backend, rather than only by VACUUM, and need not be
XLOG'd.
(2) If there are long-running transactions on the system,
xmin-committed can be set much sooner than all-visible - the
transaction need only commit.  All-visible can't be set until
overlapping transactions have ended.
(3) xmin-committed is useful on standby servers, whereas all-visible
is ignored there.  (Note that neither this patch nor index-only scans
changes anything about that: it's existing behavior, necessitated by
different xmin horizons.)
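To make the contrast concrete, here is a deliberately simplified C model of the two kinds of hints (the names and layouts are illustrative only; PostgreSQL's real infomask flags and page headers differ): the tuple-level hint can be set opportunistically by anyone, while the page-level bit requires that *every* tuple on the page be visible to all still-running transactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the two visibility hints discussed
 * above; real PostgreSQL flag names, widths, and layouts differ. */
#define HINT_XMIN_COMMITTED 0x01   /* per-tuple: inserting xact committed */
#define PAGE_ALL_VISIBLE    0x01   /* per-page: every tuple visible to all */

typedef struct { uint32_t xmin; uint8_t infomask; } Tuple;
typedef struct { uint8_t flags; Tuple tuples[4]; int ntuples; } Page;

/* Any backend may set the tuple-level hint on the fly, without XLOG. */
static void set_xmin_committed(Tuple *tup)
{
    tup->infomask |= HINT_XMIN_COMMITTED;
}

/* Only VACUUM sets the page-level bit, and only once every tuple on the
 * page is committed and older than all running transactions.  (Plain
 * integer comparison here; wraparound is ignored in this toy model.) */
static bool try_set_all_visible(Page *page, uint32_t oldest_running_xmin)
{
    for (int i = 0; i < page->ntuples; i++)
    {
        Tuple *tup = &page->tuples[i];
        if (!(tup->infomask & HINT_XMIN_COMMITTED) ||
            tup->xmin >= oldest_running_xmin)
            return false;       /* some overlapping xact may not see it */
    }
    page->flags |= PAGE_ALL_VISIBLE;
    return true;
}
```

Note how the model reflects points (1) and (2) above: setting the tuple hint is a single unconditional store, while the page bit demands a scan of the whole page and fails outright while overlapping transactions remain.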

So I think that attempts to minimize the overhead of setting the
xmin-committed bit are not likely to be mooted by anything I'm doing.
Keep up the good work.  :-)

Where I do think that we can possibly squeeze some additional benefit
out of a crash-safe visibility map is in regards to anti-wraparound
vacuuming.  The existing visibility map is used to skip vacuuming of
all-visible pages, but it's not used when XID wraparound is at issue.
The reason is fairly obvious: a regular vacuum only needs to worry
about getting rid of dead tuples (and a visibility map bit being set
is good evidence that there are none), but an anti-wraparound vacuum
also needs to worry about live tuples with xmins that are about to
wrap around from past to future (such tuples must be frozen).  There's
a second reason, too: the visibility map bit, not being crash-safe,
has a small chance of being wrong, and we'd like to eventually get rid
of any dead tuples that slip through the cracks.  Making the
visibility map crash-safe doesn't directly address the first problem,
but it does (if or when we're convinced that it's fairly bug-free)
address the second one.
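The reason an old live xmin "wraps around from past to future" falls out of how XIDs are compared.  A sketch of PostgreSQL's circular comparison (the real code is TransactionIdPrecedes in src/backend/access/transam/transam.c): an XID counts as "in the past" only while it lies within 2^31 of the reference XID, so an unfrozen tuple left alone for about two billion transactions flips to looking like the future.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Circular (modulo-2^32) XID comparison: id1 precedes id2 iff the
 * unsigned difference, reinterpreted as signed, is negative -- i.e.
 * id1 is within 2^31 "behind" id2. */
static bool xid_precedes(uint32_t id1, uint32_t id2)
{
    return (int32_t)(id1 - id2) < 0;
}
```

This is why an anti-wraparound vacuum can't simply skip all-visible pages: the tuples there may be dead-certain visible, yet still carry xmins that must be frozen before the comparison horizon catches up with them.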

To address the first problem, what we've talked about doing is
something along the line of freezing the tuples at the time we mark
the page all-visible, so we don't have to go back and do it again
later.  Unfortunately, it's not quite that simple, because freezing
tuples that early would cause all sorts of headaches for hot standby,
not to mention making Tom and Alvaro grumpy when they're trying to
figure out a corruption problem and all the xmins are FrozenXID rather
than whatever they were originally.  We floated the idea of a
tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat
the tuple as frozen, but wouldn't actually overwrite the xmin field.
That would solve the forensic problem with earlier freezing, but it
doesn't do anything to resolve the Hot Standby problem.  There is a
performance issue to worry about, too: freezing operations must be
xlog'd, as we update relfrozenxid based on the results, and therefore
can't risk losing a freezing operation later on.  So freezing sooner
means more xlog activity for pages that might very well never benefit
from it (if the tuples therein don't stick around long enough for it
to matter).
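The HEAP_XMIN_FROZEN idea might be sketched like this (flag value and struct layout are hypothetical, not PostgreSQL's actual tuple header): visibility checks would honor the bit while the stored xmin survives for forensics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuple-level "treat as frozen" flag, as floated above:
 * set an infomask bit instead of overwriting xmin with FrozenXID, so
 * the original xmin remains available when debugging corruption. */
#define HEAP_XMIN_FROZEN 0x02

typedef struct { uint32_t xmin; uint16_t infomask; } TupleHeader;

static void freeze_tuple(TupleHeader *tup)
{
    tup->infomask |= HEAP_XMIN_FROZEN;   /* xmin itself is left intact */
}

/* A frozen tuple behaves, for visibility purposes, as if inserted
 * infinitely long ago, regardless of what xmin still says. */
static bool xmin_is_frozen(const TupleHeader *tup)
{
    return (tup->infomask & HEAP_XMIN_FROZEN) != 0;
}
```

As noted, this addresses only the forensic objection; the hot-standby and extra-XLOG costs of freezing early remain.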

Nonetheless, I haven't completely given up hope.  The current
situation is that a big table into which new records are slowly being
inserted must be repeatedly scanned in its entirety for unfrozen
tuples, even though only a small and readily identifiable part of it
can actually contain any such tuples.  That is clearly less than
ideal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

