Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers
From | Chris Redekop |
---|---|
Subject | Re: Hot Backup with rsync fails at pg_clog if under load |
Date | |
Msg-id | CAC2SuRJPNqPe7Ga8LT6Q-vOgn05BmiH=y6F_dfiVu5u3NfOs=w@mail.gmail.com |
In response to | Re: Hot Backup with rsync fails at pg_clog if under load (Chris Redekop <chris@replicon.com>) |
Responses | Re: Hot Backup with rsync fails at pg_clog if under load |
List | pgsql-hackers |
Well, on the other hand maybe there is something wrong with the data. Here are the steps I just ran:
1. I run pg_basebackup while the master is under load. The resulting slave will not start as a hot standby, but it will start as a warm standby.
2. I start a warm slave and let it catch up to current
3. On the slave I change 'hot_standby=on' and do a 'service postgresql restart'
4. Postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm slave
6. I then turn off the load, the slave is all caught up, master and slave are both sitting idle
7. I, again, change 'hot_standby=on' and do a service restart
8. Again it fails, with the same error, even though there is no longer any load.
9. I repeat this warmstart/hotstart cycle a couple more times until to my surprise, instead of failing, it successfully starts up as a hot standby (this is after maybe 5 minutes or so of sitting idle)
So...given that it continued to fail even after the load had been turned off, I believe the data that was copied over was invalid in some way, and that a checkpoint, log rotation, or something else that occurred while not under load cleared it up. I'm shooting in the dark here.
Anyone have any suggestions/ideas/things to try?
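For anyone wanting to script the warm/hot restart cycle from the steps above, here is a minimal sketch in Python. It only rewrites the hot_standby line in the text of postgresql.conf; the helper name set_hot_standby is hypothetical, and it assumes the setting is written in "name = value" form. Reading and writing the real file and running 'service postgresql restart' are left to the caller.

```python
import re

# Matches an active (uncommented) "hot_standby = ..." line. It does not match
# other settings such as "wal_level = hot_standby" or "hot_standby_feedback".
_ACTIVE_HOT_STANDBY = re.compile(r"^[ \t]*hot_standby[ \t]*=.*$", re.MULTILINE)


def set_hot_standby(conf_text: str, enabled: bool) -> str:
    """Return conf_text with hot_standby set to on or off.

    Hypothetical helper: every active hot_standby line is rewritten, and a
    new line is appended when no active line exists.
    """
    new_line = "hot_standby = " + ("on" if enabled else "off")
    updated, count = _ACTIVE_HOT_STANDBY.subn(new_line, conf_text)
    if count:
        return updated
    # No active setting found: append one, keeping the file newline-terminated.
    separator = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
    return conf_text + separator + new_line + "\n"


if __name__ == "__main__":
    sample = "wal_level = hot_standby\nhot_standby = off\n"
    print(set_hot_standby(sample, True))
```

After writing the result back to postgresql.conf, the slave still has to be restarted for the change to take effect, as in steps 3 and 7.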
On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop <chris@replicon.com> wrote:

I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync. The data itself seems to be fine because using the exact same data I can start up a warm standby no problem, it is just the hot standby that will not start up.

On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop <chris@replicon.com> wrote:

> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
>
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 Today :)
>
> > I also checked what rsync does when a file vanishes after rsync computed the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably. For me, however, it happens even when doing the "fast backup" like so: pg_start_backup('whatever', true)...my traffic is more write-heavy than linas's tho, so that might have something to do with it. Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)....the only thing that has changed is that I've changed the log level to debug1....I wouldn't think that could be related though. I've linked the requested pg_controldata and debug1 logs for both errors.

Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt.

pg_clog: http://pastebin.com/mTfdcjwH
pg_subtrans: http://pastebin.com/qAXEHAQt

Any workarounds would be very appreciated.....would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference?

Thanks!
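The driver-script idea quoted from the manual above (accepting rsync's exit code 24 for vanished source files) could look roughly like the following. This is only a sketch, not the script used in this thread: the names rsync_ok and run_rsync are hypothetical, and the arguments to pass to rsync are left to the caller.

```python
import subprocess
from typing import Sequence

# rsync exit code 24: "Partial transfer due to vanished source files".
RSYNC_VANISHED_FILES = 24


def rsync_ok(returncode: int) -> bool:
    """Treat a clean exit and the vanished-files exit as success."""
    return returncode in (0, RSYNC_VANISHED_FILES)


def run_rsync(args: Sequence[str]) -> int:
    """Run rsync with the given arguments (hypothetical wrapper).

    Returns the exit code when it is acceptable and raises RuntimeError
    for any other non-zero exit code.
    """
    result = subprocess.run(["rsync", *args], check=False)
    if not rsync_ok(result.returncode):
        raise RuntimeError(f"rsync failed with exit code {result.returncode}")
    return result.returncode
```

It would be called between pg_start_backup and pg_stop_backup with whatever rsync options and paths the backup already uses. Note that accepting exit code 24 only silences the rsync complaint; it does not by itself address the pg_clog and pg_subtrans startup errors discussed here.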