Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant
Date
Msg-id 1121876803.16393.62.camel@localhost.localdomain
Whole thread Raw
In response to Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)
List pgsql-patches
On Wed, 2005-07-20 at 09:24 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2005-07-19 at 22:24 -0400, Tom Lane wrote:
> >> Simon Riggs <simon@2ndquadrant.com> writes:
> >>> Short patch enclosed to turn off writing of commit-status hint bits.
> >>
> >> Doesn't this entirely destroy the ability to truncate clog, and
> >> therefore the ability to survive XID wraparound?
>
> > I hope not for all our sakes, since the hint bits are not WAL logged and
> > anything that relies upon them would be fragile.
>
> We don't rely on any one write of them to work, but that doesn't mean
> that we can indefinitely postpone writing them.

OK, I think I understand where you're coming from now.

A table can't be migrated to read-only media until all of its tuples
have an xmin of FrozenTransactionId; I said the following, which was
wrong:
"This should allow migration of older child tables to hierarchical
storage when using a large historical table design."
The patch doesn't directly contribute to that goal, though is of value
in a large historical table design with many read only child tables (and
other situations).

But that comment wasn't the only inspiration for the patch.

When VACUUM freezes the xid, it *does* make sense at that point to
update the hint bits as a performance optimization. That isn't required
though, and *can* be indefinitely postponed, AFAICS. All of the tqual
routines will still work just as well without the hint bits set. If you
know different, I'll need a deeper explanation before I understand.

IMHO, the direction the patch is going in is still worthwhile because of
these issues:

1. Any block read may attempt to set hint bits, which dirties the block
and must be written out. So *reads* can result in heavier write activity
at checkpoint time. That effects both OLTP and DW systems: Random read
transactions against a large table will be worst effected, since we may
end up writing the block once for each read.

2. A lazy vacuum may also dirty a block, even when it has done nothing
else useful to that block. Worse, if we vacuum a table that is bigger
than shared_buffers (or close), then we will end up having to evict
dirty buffers that the vacuum itself has written in order to continue
the vacuum. Since Vacuum is two-pass, we may end up writing a block
*twice*, once where we set the hint bits and then again later where we
remove the tuples and re-write. So this patch will allow a normal VACUUM
to perform better on larger tables.

(1) is a pain, but there's no point solving it without also solving (2).

The patch would fail an Assert test during a VACUUM, since the info bits
are not actually set if cache_txn_status_with_data = false during
VACUUM.

As a result of (2), perhaps we should remove all of the
SetBufferCommitInfoNeedsSave calls in HeapTupleSatisfiesVacuum, and add
a call to SetBufferCommitInfoNeedsSave that overrides
cache_txn_status_with_data when we actually freeze a row. (Or perhaps
that should be a VACUUM FAST command?) That way we would set the hint
bits *only* when we freeze a row and not at any other time.

If we further reduced the number of times we dirty the block *at all* on
the first pass of a VACUUM, we would reduce the chance of writing twice.
We could save the setting of frozen transactions until the second phase,
i.e. only dirty the block if (pgchanged && vacrelstats->num_dead_tuples
> prev_dead_count)

The patch sets cache_txn_status_with_data as a USERSET, with the
intention that particular read-only users would not wish to have their
read-only transactions turn into write transactions. There was no
intention to prevent VACUUM, not to avoid the optimisation of writing
hint bits on a VACUUM FREEZE nor to set full read only status - which we
discussed previously but is a much longer project.

Sorry for any confusion caused in my initial patch submission. Does my
longer explanation make sense of what the patch is trying to achieve.

Best Regards, Simon Riggs


pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)
Next
From: Tom Lane
Date:
Subject: Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)