Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Date
Msg-id CA+U5nM+pTpT6eWrHD57y-X_MqpicPMguJPXJVvcq1nG1Rid80Q@mail.gmail.com
In response to Re: Hot Backup with rsync fails at pg_clog if under load  (Daniel Farina <daniel@heroku.com>)
List pgsql-hackers
On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <chris@replicon.com> wrote:
>> Well, on the other hand maybe there is something wrong with the data.
>>  Here's the test/steps I just did -
>> 1. I do the pg_basebackup when the master is under load, hot slave now will
>> not start up but warm slave will.
>> 2. I start a warm slave and let it catch up to current
>> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
>> restart'
>> 4. The postgres fails to restart with the same error.
>> 5. I turn hot_standby back off and postgres starts back up fine as a warm
>> slave
>> 6. I then turn off the load, the slave is all caught up, master and slave
>> are both sitting idle
>> 7. I, again, change 'hot_standby=on' and do a service restart
>> 8. Again it fails, with the same error, even though there is no longer any
>> load.
>> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
>> surprise, instead of failing, it successfully starts up as a hot standby
>> (this is after maybe 5 minutes or so of sitting idle)
>> So...given that it continued to fail even after the load had been turned off,
>> that makes me believe that the data which was copied over was invalid in
>> some way.  And when a checkpoint/logrotation/somethingelse occurred when not
>> under load it cleared itself up....I'm shooting in the dark here
>> Anyone have any suggestions/ideas/things to try?
>
> Having dug into this a little -- but not too much -- the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has played enough WAL to be
> 'consistent' (the WAL between pg_start and pg_stop backup).  I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.
>
> The message is like: "consistent recovery state reached at %X/%X".
> (this is the errmsg)
>
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but it is anyway right now.  I'm not sure if this is simply
> an oversight, or indicative of a misplaced assumption
> somewhere.  Basically, my thoughts for a fix are to suppress
> hot_standby = on (in spirit) before the consistent recovery state is
> reached.

Not sure about that, but I'll look at where this comes from.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
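
For anyone reproducing the report above: Daniel's observation is that hot standby only works once the standby has logged "consistent recovery state reached at %X/%X". Below is a minimal sketch, not anything from the PostgreSQL source, of how a test script might check a standby's log for that message before flipping hot_standby on. The function name and the idea of passing in log lines are assumptions for illustration; only the message text comes from the thread.

```python
import re
from typing import Iterable, Optional

# Message text as quoted in the thread; each %X is assumed to render
# as a hexadecimal number in the server log.
CONSISTENT_RE = re.compile(
    r"consistent recovery state reached at ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def consistent_location(log_lines: Iterable[str]) -> Optional[str]:
    """Return the 'X/X' location from the first matching line, else None.

    Hypothetical helper: the caller decides where the log lines come from.
    """
    for line in log_lines:
        match = CONSISTENT_RE.search(line)
        if match:
            return f"{match.group(1)}/{match.group(2)}"
    return None
```

A test harness could call consistent_location() on the standby's log after each restart and only attempt the hot_standby=on restart once it returns a location, which would help confirm whether the failure is confined to the pre-consistency window.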

