collateral benefits of a crash-safe visibility map

From
Robert Haas
Date:
On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> no, that wasn't my intent at all, except in the sense of wondering if
> a crash-safe visibility map provides a route of displacing a lot of
> hint bit i/o and by extension, making alternative approaches of doing
> that, including mine, a lot less useful.  that's a good thing.

Sadly, I don't think it's going to have that effect.  The
page-is-all-visible bits seem to offer a significant performance
benefit over the xmin-committed hint bits; but the benefit of
xmin-committed all by itself is too much to ignore.  The advantages of
the xmin-committed hint bit (as opposed to the all-visible page-level
bit) are:

(1) Setting the xmin-committed hint bit is a much more light-weight
operation than setting the all-visible page-level bit.  It can be done
on-the-fly by any backend, rather than only by VACUUM, and need not be
XLOG'd.
(2) If there are long-running transactions on the system,
xmin-committed can be set much sooner than all-visible - the
transaction need only commit.  All-visible can't be set until
overlapping transactions have ended.
(3) xmin-committed is useful on standby servers, whereas all-visible
is ignored there.  (Note that neither this patch nor index-only scans
changes anything about that: it's existing behavior, necessitated by
different xmin horizons.)
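
To make (1) concrete, the fast path looks roughly like this - a
simplified sketch in C, not the actual code, with clog_xid_committed()
standing in for the real clog machinery:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define HEAP_XMIN_COMMITTED  0x0100     /* per-tuple infomask bit */

typedef struct
{
    TransactionId xmin;
    uint16_t      infomask;
} TupleHeaderSketch;

/* Stand-in for the real clog probe, which may have to read a
 * pg_clog page in from disk. */
static bool
clog_xid_committed(TransactionId xid)
{
    (void) xid;
    return true;                        /* stub for the sketch */
}

static bool
xmin_committed(TupleHeaderSketch *tup)
{
    /* Fast path: hint already set, no clog access at all. */
    if (tup->infomask & HEAP_XMIN_COMMITTED)
        return true;

    if (clog_xid_committed(tup->xmin))
    {
        /* Any backend can set the hint in passing; the page is
         * dirtied but the change is not WAL-logged. */
        tup->infomask |= HEAP_XMIN_COMMITTED;
        return true;
    }
    return false;
}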

So I think that attempts to minimize the overhead of setting the
xmin-committed bit are not likely to be mooted by anything I'm doing.
Keep up the good work.  :-)

Where I do think that we can possibly squeeze some additional benefit
out of a crash-safe visibility map is in regards to anti-wraparound
vacuuming.  The existing visibility map is used to skip vacuuming of
all-visible pages, but it's not used when XID wraparound is at issue.
The reason is fairly obvious: a regular vacuum only needs to worry
about getting rid of dead tuples (and a visibility map bit being set
is good evidence that there are none), but an anti-wraparound vacuum
also needs to worry about live tuples with xmins that are about to
wrap around from past to future (such tuples must be frozen).  There's
a second reason, too: the visibility map bit, not being crash-safe,
has a small chance of being wrong, and we'd like to eventually get rid
of any dead tuples that slip through the cracks.  Making the
visibility map crash-safe doesn't directly address the first problem,
but it does (if or when we're convinced that it's fairly bug-free)
address the second one.
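
In code terms, the skipping decision is roughly this (a sketch with
made-up helper names, not the real vacuum code):

#include <stdbool.h>

typedef unsigned int BlockNumber;

/* Stubs for the sketch: a visibility-map probe and the per-page
 * vacuum work. */
static bool
vm_page_all_visible(BlockNumber blk)
{
    (void) blk;
    return false;
}

static void
scan_page(BlockNumber blk, bool freeze)
{
    (void) blk;
    (void) freeze;
}

static void
vacuum_heap(BlockNumber nblocks, bool scan_all)
{
    BlockNumber blk;

    for (blk = 0; blk < nblocks; blk++)
    {
        /* A regular vacuum trusts the all-visible bit: no dead
         * tuples there.  An anti-wraparound vacuum (scan_all) can't
         * skip, because all-visible pages may still contain live
         * tuples with unfrozen xmins. */
        if (!scan_all && vm_page_all_visible(blk))
            continue;
        scan_page(blk, scan_all);
    }
}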

To address the first problem, what we've talked about doing is
something along the line of freezing the tuples at the time we mark
the page all-visible, so we don't have to go back and do it again
later.  Unfortunately, it's not quite that simple, because freezing
tuples that early would cause all sorts of headaches for hot standby,
not to mention making Tom and Alvaro grumpy when they're trying to
figure out a corruption problem and all the xmins are FrozenXID rather
than whatever they were originally.  We floated the idea of a
tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat
the tuple as frozen, but wouldn't actually overwrite the xmin field.
That would solve the forensic problem with earlier freezing, but it
doesn't do anything to resolve the Hot Standby problem.  There is a
performance issue to worry about, too: freezing operations must be
xlog'd, as we update relfrozenxid based on the results, and therefore
can't risk losing a freezing operation later on.  So freezing sooner
means more xlog activity for pages that might very well never benefit
from it (if the tuples therein don't stick around long enough for it
to matter).
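
As a sketch, the proposed bit would behave something like this (the
bit value and helper name are hypothetical - HEAP_XMIN_FROZEN doesn't
exist in the tree):

#include <stdint.h>

typedef uint32_t TransactionId;

#define FrozenTransactionId   ((TransactionId) 2)
#define HEAP_XMIN_FROZEN      0x0800    /* hypothetical infomask bit */

typedef struct
{
    TransactionId xmin;
    uint16_t      infomask;
} TupleHeaderSketch;

static TransactionId
effective_xmin(const TupleHeaderSketch *tup)
{
    /* Treat the tuple as frozen - visible to everyone, no clog
     * lookup - while leaving the original xmin in place for
     * forensic use. */
    if (tup->infomask & HEAP_XMIN_FROZEN)
        return FrozenTransactionId;
    return tup->xmin;
}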

Nonetheless, I haven't completely given up hope.  The current
situation is that a big table into which new records are slowly being
inserted has to be repeatedly scanned in its entirety for unfrozen
tuples, even though only a small and readily identifiable part of it
can actually contain any such tuples.  That's clearly less than
ideal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: collateral benefits of a crash-safe visibility map

From
Simon Riggs
Date:
On Tue, May 10, 2011 at 3:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> To address the first problem, what we've talked about doing is
> something along the line of freezing the tuples at the time we mark
> the page all-visible, so we don't have to go back and do it again
> later.  Unfortunately, it's not quite that simple, because freezing
> tuples that early would cause all sorts of headaches for hot standby,
> not to mention making Tom and Alvaro grumpy when they're trying to
> figure out a corruption problem and all the xmins are FrozenXID rather
> than whatever they were originally.  We floated the idea of a
> tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat
> the tuple as frozen, but wouldn't actually overwrite the xmin field.
> That would solve the forensic problem with earlier freezing, but it
> doesn't do anything to resolve the Hot Standby problem.  There is a
> performance issue to worry about, too: freezing operations must be
> xlog'd, as we update relfrozenxid based on the results, and therefore
> can't risk losing a freezing operation later on.  So freezing sooner
> means more xlog activity for pages that might very well never benefit
> from it (if the tuples therein don't stick around long enough for it
> to matter).

Hmmm, do we really need to WAL log freezing?

Can we break freezing down into a 2-stage process, so that we can
have a first stage that is a lossy operation and a second stage that
is WAL-logged?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: collateral benefits of a crash-safe visibility map

From
Robert Haas
Date:
On Tue, May 10, 2011 at 12:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Hmmm, do we really need to WAL log freezing?
>
> Can we break freezing down into a 2-stage process, so that we can
> have a first stage that is a lossy operation and a second stage that
> is WAL-logged?

That might solve the relfrozenxid problem - set the bits in the heap,
sync the heap, then update relfrozenxid once the heap is guaranteed
safely on disk - but it again seems problematic for Hot Standby.
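
Roughly, with hypothetical helper names, the ordering I mean is:

#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical helpers, just to pin down the ordering. */
static void set_frozen_hints_unlogged(void) { }   /* stage 1: lossy */
static void fsync_heap(void) { }
static void xlog_update_relfrozenxid(TransactionId xid) { (void) xid; }

static void
two_stage_freeze(TransactionId new_frozen_xid)
{
    set_frozen_hints_unlogged();   /* may be lost in a crash ... */
    fsync_heap();                  /* ... but not once this returns */
    xlog_update_relfrozenxid(new_frozen_xid);  /* stage 2: WAL-logged */

    /* The catch: stage 1 generates no WAL, so none of the frozen
     * hints ever reach a standby. */
}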

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: collateral benefits of a crash-safe visibility map

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 10, 2011 at 12:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Hmmm, do we really need to WAL log freezing?

> That might solve the relfrozenxid problem - set the bits in the heap,
> sync the heap, then update relfrozenxid once the heap is guaranteed
> safely on disk - but it again seems problematic for Hot Standby.

... or even warm standby.  You basically *have to* WAL-log freezing
before you can truncate pg_clog.  The only freedom you have here is
freedom to mess with the policy about how soon you try to truncate
pg_clog.

(Doing an unlogged freeze operation first is right out, too, if it
causes the system to fail to perform/log the operation later.)
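
To spell out the failure mode, a sketch (hypothetical helper,
ignoring xid wraparound arithmetic):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Oldest xid whose status pg_clog still holds. */
static TransactionId
clog_truncation_point(void)
{
    return 1000;                       /* stub for the sketch */
}

static bool
xmin_resolvable(TransactionId xmin, bool frozen)
{
    if (frozen)
        return true;                   /* no clog lookup needed */
    /* If the freeze was never WAL-logged, crash replay can leave
     * frozen = false here while pg_clog has already been truncated
     * past xmin - and then the tuple's visibility is undecidable. */
    return xmin >= clog_truncation_point();
}
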
        regards, tom lane


Re: collateral benefits of a crash-safe visibility map

From
Heikki Linnakangas
Date:
On 10.05.2011 17:47, Robert Haas wrote:
> On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> no, that wasn't my intent at all, except in the sense of wondering if
>> a crash-safe visibility map provides a route of displacing a lot of
>> hint bit i/o and by extension, making alternative approaches of doing
>> that, including mine, a lot less useful.  that's a good thing.
>
> Sadly, I don't think it's going to have that effect.  The
> page-is-all-visible bits seem to offer a significant performance
> benefit over the xmin-committed hint bits; but the benefit of
> xmin-committed all by itself is too much to ignore.  The advantages of
> the xmin-committed hint bit (as opposed to the all-visible page-level
> bit) are:
>
> (1) Setting the xmin-committed hint bit is a much more light-weight
> operation than setting the all-visible page-level bit.  It can be done
> on-the-fly by any backend, rather than only by VACUUM, and need not be
> XLOG'd.
> (2) If there are long-running transactions on the system,
> xmin-committed can be set much sooner than all-visible - the
> transaction need only commit.  All-visible can't be set until
> overlapping transactions have ended.
> (3) xmin-committed is useful on standby servers, whereas all-visible
> is ignored there.  (Note that neither this patch nor index-only scans
> changes anything about that: it's existing behavior, necessitated by
> different xmin horizons.)

(4) The xmin-committed flag attached directly to the tuple provides
some robustness in case of corruption due to bad hardware. Without
the flag, a single bit flip in the clog could in the worst case
render all of your bulk-loaded data invisible and vacuumable. Of
course, corruption will always eat your data to some extent, but the
hint bits provide some robustness. Hint bits are close to the data
itself, not in another file like the clog, which can come in handy
during disaster recovery.

A flag in the heap page header isn't too different from a per-tuple
hint bit from that point of view: it's still in the same page as the
data itself. A bit in the clog or visibility map is not.

Not sure how much performance we're willing to sacrifice for that, but 
it's something to keep in mind.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: collateral benefits of a crash-safe visibility map

From
Merlin Moncure
Date:
On Tue, May 10, 2011 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> no, that wasn't my intent at all, except in the sense of wondering if
>> a crash-safe visibility map provides a route of displacing a lot of
>> hint bit i/o and by extension, making alternative approaches of doing
>> that, including mine, a lot less useful.  that's a good thing.
>
> Sadly, I don't think it's going to have that effect.  The
> page-is-all-visible bits seem to offer a significant performance
> benefit over the xmin-committed hint bits; but the benefit of
> xmin-committed all by itself is too much to ignore.  The advantages of
> the xmin-committed hint bit (as opposed to the all-visible page-level
> bit) are:
>
> (1) Setting the xmin-committed hint bit is a much more light-weight
> operation than setting the all-visible page-level bit.  It can be done
> on-the-fly by any backend, rather than only by VACUUM, and need not be
> XLOG'd.
> (2) If there are long-running transactions on the system,
> xmin-committed can be set much sooner than all-visible - the
> transaction need only commit.  All-visible can't be set until
> overlapping transactions have ended.
> (3) xmin-committed is useful on standby servers, whereas all-visible
> is ignored there.  (Note that neither this patch nor index-only scans
> changes anything about that: it's existing behavior, necessitated by
> different xmin horizons.)

right. #1 could maybe be worked around somehow, and #2 is perhaps
arguable, at least in some workloads, but #3 is admittedly a killer,
especially since the bit is on the page.

I noted your earlier skepticism regarding moving the page visibility
check completely to the VM:
"In some ways, that would make things much simpler.  But to make that
work, every insert/update/delete to a page would have to pin the
visibility map page and clear PD_ALL_VISIBLE if appropriate, so it
might not be good from a performance standpoint, especially in
high-concurrency workloads.  Right now, if PD_ALL_VISIBLE isn't set,
we don't bother touching the visibility map page, which seems like a
possibly important optimization."

That's debatable, but probably moot.  Thanks for thinking that through though.
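
For reference, the gating behavior in that quote amounts to something
like this (a sketch; the type and VM helper are stand-ins):

#include <stdbool.h>

typedef struct
{
    bool all_visible;                  /* stand-in for PD_ALL_VISIBLE */
} PageSketch;

/* Stand-in for the real call, which has to pin the VM page. */
static void
vm_clear_bit(PageSketch *page)
{
    (void) page;
}

static void
on_heap_modify(PageSketch *page)
{
    /* Today the page-level bit gates VM access: in the common case
     * (bit already clear) we never touch the VM page at all. */
    if (page->all_visible)
    {
        page->all_visible = false;
        vm_clear_bit(page);
    }
    /* Without PD_ALL_VISIBLE, vm_clear_bit() - and the VM page pin -
     * would be needed on every insert/update/delete. */
}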

merlin


Re: collateral benefits of a crash-safe visibility map

From
Simon Riggs
Date:
On Tue, May 10, 2011 at 6:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 10, 2011 at 12:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Hmmm, do we really need to WAL log freezing?
>>
>> Can we break freezing down into a 2-stage process, so that we can
>> have a first stage that is a lossy operation and a second stage that
>> is WAL-logged?
>
> That might solve the relfrozenxid problem - set the bits in the heap,
> sync the heap, then update relfrozenxid once the heap is guaranteed
> safely on disk - but it again seems problematic for Hot Standby.

How about we truncate the clog differently on each server? We could
have a special kind of VACUUM that runs during Hot Standby, setting
frozen hint bits only.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: collateral benefits of a crash-safe visibility map

From
Robert Haas
Date:
On Tue, May 10, 2011 at 1:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, May 10, 2011 at 6:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, May 10, 2011 at 12:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Hmmm, do we really need to WAL log freezing?
>>>
>>> Can we break freezing down into a 2-stage process, so that we can
>>> have a first stage that is a lossy operation and a second stage that
>>> is WAL-logged?
>>
>> That might solve the relfrozenxid problem - set the bits in the heap,
>> sync the heap, then update relfrozenxid once the heap is guaranteed
>> safely on disk - but it again seems problematic for Hot Standby.
>
> How about we truncate the clog differently on each server? We could
> have a special kind of VACUUM that runs during Hot Standby, setting
> frozen hint bits only.

Interesting idea.  It does seem complicated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: collateral benefits of a crash-safe visibility map

From
Simon Riggs
Date:
On Tue, May 10, 2011 at 6:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, May 10, 2011 at 12:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Hmmm, do we really need to WAL log freezing?
>
>> That might solve the relfrozenxid problem - set the bits in the heap,
>> sync the heap, then update relfrozenxid once the heap is guaranteed
>> safely on disk - but it again seems problematic for Hot Standby.
>
> ... or even warm standby.  You basically *have to* WAL-log freezing
> before you can truncate pg_clog.  The only freedom you have here is
> freedom to mess with the policy about how soon you try to truncate
> pg_clog.
>
> (Doing an unlogged freeze operation first is right out, too, if it
> causes the system to fail to perform/log the operation later.)

Trying to think outside the box, given all these things we can't do.

Can we keep track of the relfrozenxid, note when we fsync the
relevant file, and then issue a single WAL record to indicate that?
Still WAL logging, but 1 record per table, not 1 record per tuple.
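
Something like this, with hypothetical names - set the hints unlogged
while scanning, fsync the file, then emit one record for the whole
relation:

#include <stdint.h>

typedef uint32_t TransactionId;

static void fsync_relation_file(void) { }
static void xlog_relation_frozen_upto(TransactionId xid) { (void) xid; }

static void
finish_table_freeze(TransactionId frozen_upto)
{
    /* frozen hints were already set per tuple, unlogged, as the
     * scan went along */
    fsync_relation_file();               /* make the hints durable */
    xlog_relation_frozen_upto(frozen_upto);  /* one WAL record per
                                              * table, not per tuple */
}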

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services