Re: Recovery bug - Mailing list pgsql-bugs

From Jeff Davis
Subject Re: Recovery bug
Date
Msg-id 1289434809.22581.99.camel@jdavis-ux.asterdata.local
Whole thread Raw
In response to Re: Recovery bug  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Recovery bug
List pgsql-bugs
On Tue, 2010-10-26 at 10:48 +0300, Heikki Linnakangas wrote:
> > The reason I didn't use ReadRecord is because it sets a global variable
> > to point to the next location in the log, so that subsequent calls can
> > just pass NULL for the location.
>
> True. XLogPageRead is new in 9.0, however. We'll have to use ReadRecord
> or invent something new for back-branches anyway.
>
> > It looks like the patch leaves the global variable pointing just after
> > the redo location rather than the checkpoint. I haven't tested your
> > patch yet, but it looks like some of the following code depends on
> > ReadRecord(NULL,...) fetching the record right after the checkpoint
> > record; so I think something else is required if you want to use
> > ReadRecord.
>
> Hmm, the next call to ReadRecord is this:
>
> >         /*
> >          * Find the first record that logically follows the checkpoint --- it
> >          * might physically precede it, though.
> >          */
> >         if (XLByteLT(checkPoint.redo, RecPtr))
> >         {
> >             /* back up to find the record */
> >             record = ReadRecord(&(checkPoint.redo), PANIC, false);
> >         }
> >         else
> >         {
> >             /* just have to read next record after CheckPoint */
> >             record = ReadRecord(NULL, LOG, false);
> >         }
>
> In the first case, the location is given explicitly. In the second case,
> the redo pointer equals the checkpoint record, so the current position
> is correct even with the patch. It makes me slightly nervous, though.
> It's correct today, but if someone adds code between the backup_label
> check and this that assumes that the current position is the checkpoint
> record, it'll fail. Then again, any new ReadRecord call in such added
> code would also break the assumption in the above block that the current
> position is the checkpoint record.
>
> In the case that the redo pointer is the same as the checkpoint record,
> we don't need to re-fetch the checkpoint record. I've added a test for
> that in the attached patch.

There is a problem with this patch. ReadRecord() not only modifies
global variables, it also modifies the location pointed to by "record",
which is later used to set "wasShutdown". How about if we only set
"wasShutdown" if there is no backup_label (because the checkpoint for
pg_start_backup() will never be a shutdown checkpoint anyway)?

Trivial patch attached.

It's not easy to reproduce a real problem, but the easiest way that I
found is by creating a commit record at the REDO location. Put a sleep()
in CheckPointGuts() to give you time before the checkpoint completes.

Session1:
  CREATE TABLE foo(i int);
  BEGIN;
  INSERT INTO foo VALUES(1);

Session2:
  SELECT pg_start_backup('foo');

Session1 (before checkpoint for pg_start_backup() completes):
  COMMIT;

Then, perform backup, stop backup, shutdown the server. Then try to
restore the backup, and you'll get a PANIC.

This seems like a pretty serious issue to me (backups could appear
unrecoverable), so please consider this before the next patch-level
release so that the bad fix doesn't go out to the world. Also, you might
want to double-check that there aren't other side effects that we're
still missing.

Regards,
    Jeff Davis


Attachment

pgsql-bugs by date:

Previous
From: Chris Browne
Date:
Subject: Re: BUG #5747: TimeStamps too far into the future are invalid
Next
From: Devrim GÜNDÜZ
Date:
Subject: Re: BUG #5746: /etc/init.d/postgresql-9.0 status returns the wrong value