Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Recovery inconsistencies, standby much larger than primary
Date
Msg-id CAM-w4HOE7ZwMTWttt5rYeQxhURdR5sSm1JLQOv0S4qC8P6-7MQ@mail.gmail.com
Whole thread Raw
In response to Re: Recovery inconsistencies, standby much larger than primary  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Recovery inconsistencies, standby much larger than primary  (Andres Freund <andres@2ndquadrant.com>)
Re: Recovery inconsistencies, standby much larger than primary  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
>> We're also seeing log entries about "wal contains reference to invalid
>> pages" but these errors seem only vaguely correlated. Sometimes we get
>> the errors but the tables don't grow noticeably and sometimes we don't
>> get the errors and the tables are much larger.
>
> Uhm. I am a bit confused. You see those in the standby's log? At !debug
> log levels? That'd imply that the standby is dead and needed to be
> recloned, no? How do you continue after that?


So in chatting with Heikki last night we came up with a scenario where
this check is insufficient.

If you have multiple checkpoints during the base backup then there
will be restartpoints during recovery. If the reference to the invalid
page is before the restartpont then after crashing recovery and coming
back up the recovery will go forward fine.

Fixing this check doesn't look trivial. I'm inclined to say to
suppress any restartpoints while there are references to invalid pages
in the hash. The problem with this is that it will prevent trimming
the xlog during recovery. It seems frightening that most days recovery
will take little extra space but if you happen to have a drop table or
truncate during the base backup then your recovery might require a lot
of extra space.

The alternative of spilling the hash table to disk at every
restartpoint seems kind of hokey. Then we need to worry about fsyncing
this file, cleaning it up, dealing with the file after crashes, etc.

-- 
greg



pgsql-hackers by date:

Previous
From: Asif Naeem
Date:
Subject: Re: [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Next
From: Andres Freund
Date:
Subject: Re: Recovery inconsistencies, standby much larger than primary