Thread: Why do we have a WAL record for CLOG page extension?

From: Simon Riggs

Why do we have a WAL record for CLOG page extension? It seems to be
unnecessary, since other code already covers that failure situation.

We ignore heap extensions, on the assumption that the insertion will
automatically extend the relation when we recover. We also ignore
previously truncated clog pages, so the absence of a clog page at
recovery time is clearly not an issue. So why not avoid writing WAL when
we move to the next clog page and fix this at recovery time if it is an
issue?

Well, slru.c *already* includes code that specifically ignores errors
when setting or getting TransactionId status during recovery if pages are
absent, and this is supposed to work even when the missing page is not the
first page of a clog segment. So the WAL record does not seem to be required.
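
A minimal sketch of the recovery-time behaviour described above, as a toy model rather than the actual slru.c code. It assumes that an absent page is read as all zeroes during recovery and treated as an error otherwise. The names `read_page`, `MissingPageError` and the dict-based page store are hypothetical.

```python
# Toy model of an SLRU-style page store; not PostgreSQL code.
PAGE_SIZE = 8192  # assumed to match the default block size


class MissingPageError(Exception):
    """Raised when a page is absent outside of recovery."""


def read_page(pages, pageno, in_recovery):
    """Return the bytes of page `pageno` from the dict `pages`.

    During recovery an absent page is read as all zeroes;
    in normal running its absence is an error.
    """
    if pageno in pages:
        return pages[pageno]
    if in_recovery:
        return bytes(PAGE_SIZE)
    raise MissingPageError(f"page {pageno} does not exist")
```

Under this assumption, replaying a status update against a page that never reached disk would still have a zeroed page to work on.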

The patch removes the xlog record written when we start a new clog page.
This means we never have to wait for an I/O while holding the XidGenLock,
which can be a problem with hundreds of backends requesting that lock,
especially since we may need to queue for one of the WAL locks first.

The patch leaves the RMID for clog in place, in case it is required later.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: Why do we have a WAL record for CLOG page extension?

From: Tom Lane

Simon Riggs <simon@2ndquadrant.com> writes:
> Why do we have a WAL record for CLOG page extension?

Think of it as being a substitute for a full-page image, or perhaps the
init-bit in heap-insert records is the closest analogy.  Your patch is
unworkable because it provides no means for recovering from
corrupt-on-disk clog pages.
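
The objection can be illustrated with a toy replay model (a sketch, not PostgreSQL code). It assumes two status bits per transaction packed four to a byte, and a hypothetical record format of `("zero",)` and `("commit", slot)` tuples. The point it shows: a zero-page record resets whatever is on disk before later records are applied, much as a full-page image would, whereas without it the status updates land on top of the corrupt contents.

```python
# Toy model, not PostgreSQL code: two status bits per transaction,
# packed four to a byte.
PAGE_SIZE = 8192
IN_PROGRESS = 0b00
COMMITTED = 0b01


def set_status(page, slot, status):
    """Store a 2-bit status for transaction `slot` in a bytearray page."""
    byte, shift = slot // 4, (slot % 4) * 2
    page[byte] = (page[byte] & ~(0b11 << shift) & 0xFF) | (status << shift)


def get_status(page, slot):
    """Read back the 2-bit status for transaction `slot`."""
    byte, shift = slot // 4, (slot % 4) * 2
    return (page[byte] >> shift) & 0b11


def replay(page, records):
    """Apply hypothetical WAL records: ("zero",) or ("commit", slot)."""
    for rec in records:
        if rec[0] == "zero":
            page[:] = bytes(PAGE_SIZE)  # reinitialise the whole page
        elif rec[0] == "commit":
            set_status(page, rec[1], COMMITTED)
    return page


# On-disk page corrupted before the crash: every slot holds garbage.
corrupt = bytearray(b"\xff" * PAGE_SIZE)

with_zero = replay(bytearray(corrupt), [("zero",), ("commit", 0), ("commit", 2)])
without_zero = replay(bytearray(corrupt), [("commit", 0), ("commit", 2)])
```

In this model slot 1 never committed. After replay with the zero-page record it reads as in progress; without the record it still holds the garbage bits from the corrupt page.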

            regards, tom lane

Re: Why do we have a WAL record for CLOG page extension?

From: Simon Riggs

On Tue, 2006-06-06 at 10:59 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Why do we have a WAL record for CLOG page extension?
>
> Think of it as being a substitute for a full-page image, or perhaps the
> init-bit in heap-insert records is the closest analogy.  Your patch is
> unworkable because it provides no means for recovering from
> corrupt-on-disk clog pages.

The current system only helps recover from corrupt-on-disk pages that
have been newly written since the last checkpoint (or maybe two checkpoints
ago). If the pages were initialised before that, then we've simply lost
that data altogether anyhow. It seems we're writing WAL to cover a fairly
rare situation, yet we still have some big issues.
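
The window in question can be sketched with a toy model (not PostgreSQL code). It assumes replay starts at a single redo point, that LSNs are plain integers, and that WAL records are hypothetical `(lsn, kind, pageno)` tuples. Only pages whose zero-page record lies at or after the redo point get reinitialised during recovery.

```python
# Toy model, not PostgreSQL code. LSNs are plain integers and the
# record format is hypothetical.
def pages_reinitialised(wal, redo_lsn):
    """Return the clog pages whose zero-page record would be replayed.

    `wal` is a list of (lsn, kind, pageno) tuples; replay starts at
    `redo_lsn`, so earlier records are never seen.
    """
    return {pageno for lsn, kind, pageno in wal
            if kind == "zero" and lsn >= redo_lsn}


wal = [
    (100, "zero", 7),  # page 7 initialised before the last checkpoint
    (250, "zero", 8),  # page 8 initialised after it
]
recoverable = pages_reinitialised(wal, redo_lsn=200)
```

Here only page 8 is covered. If page 7 is corrupt on disk, replay has nothing with which to rebuild it.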

Should we hold two copies of the clog for robustness? That way we can
just switch to the second copy if we have problems.

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com