Re: PITR COPY Failure (was Point in Time Recovery) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: PITR COPY Failure (was Point in Time Recovery)
Date
Msg-id 1090333179.28049.2905.camel@stromboli
Whole thread Raw
In response to Re: PITR COPY Failure (was Point in Time Recovery)  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: PITR COPY Failure (was Point in Time Recovery)
Re: PITR COPY Failure (was Point in Time Recovery)
List pgsql-hackers
On Tue, 2004-07-20 at 14:11, Simon Riggs wrote:
> On Tue, 2004-07-20 at 13:51, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > >> The quick and dirty solution would be to dike out the safety check at
> > >> 4268ff.
> >
> > > If you take out that check, we still fail because the wasted space at
> > > the end is causing a "record with zero length" error.
> >
> > Ugh.  I'm beginning to think we ought to revert the patch that added the
> > don't-split-across-files logic to XLogInsert; that seems to have broken
> > more assumptions than I realized.  That was added here:
> >
> > 2004-02-11 17:55  tgl
> >
> >     * src/: backend/access/transam/xact.c,
> >     backend/access/transam/xlog.c, backend/access/transam/xlogutils.c,
> >     backend/storage/smgr/md.c, backend/storage/smgr/smgr.c,
> >     bin/pg_controldata/pg_controldata.c,
> >     bin/pg_resetxlog/pg_resetxlog.c, include/access/xact.h,
> >     include/access/xlog.h, include/access/xlogutils.h,
> >     include/pg_config_manual.h, include/catalog/pg_control.h,
> >     include/storage/smgr.h: Commit the reasonably uncontroversial parts
> >     of J.R. Nield's PITR patch, to wit: Add a header record to each WAL
> >     segment file so that it can be reliably identified.  Avoid
> >     splitting WAL records across segment files (this is not strictly
> >     necessary, but makes it simpler to incorporate the header records).
> >      Make WAL entries for file creation, deletion, and truncation (as
> >     foreseen but never implemented by Vadim).  Also, add support for
> >     making XLOG_SEG_SIZE configurable at compile time, similarly to
> >     BLCKSZ.  Fix a couple bugs I introduced in WAL replay during recent
> >     smgr API changes.  initdb is forced due to changes in pg_control
> >     contents.
> >
> > There are other ways to do this, for example we could treat the WAL page
> > headers as variable-size, and stick the file labeling info into the
> > first page's header instead of making it be a separate record.  The
> > separate-record way makes it easier to incorporate future additions to
> > the file labeling info, but I don't really think it's critical to allow
> > for that.
> >
>
> I think I've fixed it now...but wait 20
>
> The problem was that a zero length XLOG_WASTED_SPACE record just fell
> out of ReadRecord when it shouldn't have. By giving it a helping hand it
> makes it through with pointers correctly set, and everything else was
> already thought of in the earlier patch, so xlog_redo etc happens.
>
> I'll update again in a few minutes....no point us both looking at this.
>

This was a very confusing test...Here's what I think happened:

Mark discovered a numerological coincidence that meant that the
XLOG_WASTED_SPACE record was zero length at the end of EACH file he was
writing to, as long as there was just that one writer. So no matter how
many records were inserted, each xlog file had a zero length
XLOG_WASTED_SPACE record at its end.

ReadRecord failed on seeing a zero length record, i.e. when it got to
the FIRST of the XLOG_WASTED_SPACE records. Thats why the test fails no
matter how many records you give it, as long as it was more than enough
to write into a second xlog segment file.

By telling ReadRecord that XLOG_WASTED_SPACE records of zero length are
in fact *OK*, it continues happily. (Thats just a partial fix, see
later)

The test works, but gives what looks like strange results: the test
blows away the data directory completely, so the then-current xlog dies
too. That contained the commit for the large COPY, so even though the
recovery now works, the table has zero rows in it. (When things die
you're still likely to lose *some* data).

Anyway, so then we put the "concurrent transaction" test back in and the
test passes because we have now set the pointers correctly.

After all that, I think the wasted space idea is still sensible. You
musn't have a continuation record across files, otherwise we'll end up
with half a commit one-day, which would break ACID.

I'm happy that we have the explicit test in XLogInsert for zero-length
records. Somebody will one-day write a resource manager with zero length
records when they didn't mean to and we need to catch that at write
time, not at recovery time like Mark has done. The WasteXLInsertBuffer
was the only part of the code that *can* write a zero-length record, so
we will *not* see another recurrence of this situation --at recovery
time--.

Though further concerns along this theme are:
- what happens when the space at the end of a file is so small we can't
even write a zero-length XLOG_WASTED_SPACE record? Hopefully, you're
gonna say "damn your eyes...couldnt you see that, its already there".
- if the space at the end of a file was just zeros, then the "concurrent
transaction test" would still fail....we probably need to enhance this
to treat a few zeros at end of file AS IF it was an XLOG_WASTED_SPACE
record an continue. (That scenario would happen if we were doing a
recovery that included a local un-archived xlog that was very close to
being full - probably more likely to occur in crash recovery than
archive recovery)

The included patch doesn't attempt to address those issues, yet.

Best regards, Simon Riggs


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PITR COPY Failure (was Point in Time Recovery)
Next
From: Simon Riggs
Date:
Subject: Re: PITR COPY Failure (was Point in Time Recovery)