Re: PITR COPY Failure (was Point in Time Recovery) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: PITR COPY Failure (was Point in Time Recovery) |
Date | |
Msg-id | 1090333179.28049.2905.camel@stromboli Whole thread Raw |
In response to | Re: PITR COPY Failure (was Point in Time Recovery) (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: PITR COPY Failure (was Point in Time Recovery)
Re: PITR COPY Failure (was Point in Time Recovery) |
List | pgsql-hackers |
On Tue, 2004-07-20 at 14:11, Simon Riggs wrote: > On Tue, 2004-07-20 at 13:51, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > >> The quick and dirty solution would be to dike out the safety check at > > >> 4268ff. > > > > > If you take out that check, we still fail because the wasted space at > > > the end is causing a "record with zero length" error. > > > > Ugh. I'm beginning to think we ought to revert the patch that added the > > don't-split-across-files logic to XLogInsert; that seems to have broken > > more assumptions than I realized. That was added here: > > > > 2004-02-11 17:55 tgl > > > > * src/: backend/access/transam/xact.c, > > backend/access/transam/xlog.c, backend/access/transam/xlogutils.c, > > backend/storage/smgr/md.c, backend/storage/smgr/smgr.c, > > bin/pg_controldata/pg_controldata.c, > > bin/pg_resetxlog/pg_resetxlog.c, include/access/xact.h, > > include/access/xlog.h, include/access/xlogutils.h, > > include/pg_config_manual.h, include/catalog/pg_control.h, > > include/storage/smgr.h: Commit the reasonably uncontroversial parts > > of J.R. Nield's PITR patch, to wit: Add a header record to each WAL > > segment file so that it can be reliably identified. Avoid > > splitting WAL records across segment files (this is not strictly > > necessary, but makes it simpler to incorporate the header records). > > Make WAL entries for file creation, deletion, and truncation (as > > foreseen but never implemented by Vadim). Also, add support for > > making XLOG_SEG_SIZE configurable at compile time, similarly to > > BLCKSZ. Fix a couple bugs I introduced in WAL replay during recent > > smgr API changes. initdb is forced due to changes in pg_control > > contents. > > > > There are other ways to do this, for example we could treat the WAL page > > headers as variable-size, and stick the file labeling info into the > > first page's header instead of making it be a separate record. The > > separate-record way makes it easier to incorporate future additions to > > the file labeling info, but I don't really think it's critical to allow > > for that. > > > > I think I've fixed it now...but wait 20 > > The problem was that a zero length XLOG_WASTED_SPACE record just fell > out of ReadRecord when it shouldn't have. By giving it a helping hand it > makes it through with pointers correctly set, and everything else was > already thought of in the earlier patch, so xlog_redo etc happens. > > I'll update again in a few minutes....no point us both looking at this. > This was a very confusing test...Here's what I think happened: Mark discovered a numerological coincidence that meant that the XLOG_WASTED_SPACE record was zero length at the end of EACH file he was writing to, as long as there was just that one writer. So no matter how many records were inserted, each xlog file had a zero length XLOG_WASTED_SPACE record at its end. ReadRecord failed on seeing a zero length record, i.e. when it got to the FIRST of the XLOG_WASTED_SPACE records. Thats why the test fails no matter how many records you give it, as long as it was more than enough to write into a second xlog segment file. By telling ReadRecord that XLOG_WASTED_SPACE records of zero length are in fact *OK*, it continues happily. (Thats just a partial fix, see later) The test works, but gives what looks like strange results: the test blows away the data directory completely, so the then-current xlog dies too. That contained the commit for the large COPY, so even though the recovery now works, the table has zero rows in it. (When things die you're still likely to lose *some* data). Anyway, so then we put the "concurrent transaction" test back in and the test passes because we have now set the pointers correctly. After all that, I think the wasted space idea is still sensible. You musn't have a continuation record across files, otherwise we'll end up with half a commit one-day, which would break ACID. I'm happy that we have the explicit test in XLogInsert for zero-length records. Somebody will one-day write a resource manager with zero length records when they didn't mean to and we need to catch that at write time, not at recovery time like Mark has done. The WasteXLInsertBuffer was the only part of the code that *can* write a zero-length record, so we will *not* see another recurrence of this situation --at recovery time--. Though further concerns along this theme are: - what happens when the space at the end of a file is so small we can't even write a zero-length XLOG_WASTED_SPACE record? Hopefully, you're gonna say "damn your eyes...couldnt you see that, its already there". - if the space at the end of a file was just zeros, then the "concurrent transaction test" would still fail....we probably need to enhance this to treat a few zeros at end of file AS IF it was an XLOG_WASTED_SPACE record an continue. (That scenario would happen if we were doing a recovery that included a local un-archived xlog that was very close to being full - probably more likely to occur in crash recovery than archive recovery) The included patch doesn't attempt to address those issues, yet. Best regards, Simon Riggs
Attachment
pgsql-hackers by date: