Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Msg-id E75A2E54-8E6E-4ADE-B6D8-E1FEEDFEF3A6@phlo.org
In response to Re: Hot Backup with rsync fails at pg_clog if under load  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Hot Backup with rsync fails at pg_clog if under load
List pgsql-hackers
On Sep23, 2011, at 21:10 , Robert Haas wrote:
> So the actual error message in the last test was:
>
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
>
> ...but we can't tell if that was before or after nextXid, which seems
> like it would be useful to know.
>
> If Linas can rerun his experiment, but also capture the output of
> pg_controldata before firing up the standby for the first time, then
> we'd able to see that information.

Hm, wouldn't pg_controldata almost certainly show a nextXid beyond the end of
the clog if the control file was copied after pg_clog/*?
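
To make "beyond the clog" concrete, here is a minimal sketch of how an xid
maps to a pg_clog segment file. It assumes the default build constants
(BLCKSZ of 8192, two status bits per transaction, 32 pages per SLRU segment);
the helper names are made up for illustration and are not PostgreSQL functions.

```python
# Sketch only: assumes default build-time constants.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4                      # two status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32


def clog_segment_for_xid(xid: int) -> str:
    """Return the pg_clog file name (4 hex digits) holding this xid's status."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    return f"{segment:04X}"


def clog_covers(xid: int, segment_names) -> bool:
    """True if the segment file needed for xid is among the copied files."""
    return clog_segment_for_xid(xid) in set(segment_names)


if __name__ == "__main__":
    # The xid from the FATAL message quoted above.
    print(clog_segment_for_xid(1188673))
```

If the copied pg_clog directory lacks the segment a given xid maps to, any
status lookup for that xid will fail, whichever side of nextXid it is on.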

Linas, could you capture the output of pg_controldata *and* increase the
log level to DEBUG1 on the standby? We should then see the nextXid value of
the checkpoint that recovery is starting from.
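
For comparing the two values afterwards, a rough sketch of pulling the
checkpoint's next xid out of saved pg_controldata output. It assumes the
"Latest checkpoint's NextXID" line with an epoch/xid pair, and accepts either
"/" or ":" as the separator; the sample text in the demo is invented, not
output from Linas' system.

```python
import re

# Assumed line shape: "Latest checkpoint's NextXID:   <epoch>/<xid>"
_NEXT_XID_RE = re.compile(
    r"^Latest checkpoint's NextXID:\s*(\d+)[/:](\d+)\s*$", re.MULTILINE
)


def parse_next_xid(controldata_text: str):
    """Return (epoch, xid) from pg_controldata output, or None if absent."""
    match = _NEXT_XID_RE.search(controldata_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def xid_is_before_next_xid(xid: int, controldata_text: str):
    """Compare a failing xid to the control file's nextXid (ignores wraparound)."""
    parsed = parse_next_xid(controldata_text)
    if parsed is None:
        return None
    return xid < parsed[1]


if __name__ == "__main__":
    # Hypothetical sample, for illustration only.
    sample = "Latest checkpoint's NextXID:          0/1200000\n"
    print(parse_next_xid(sample), xid_is_before_next_xid(1188673, sample))
```

The comparison deliberately ignores xid wraparound, which is fine for a
one-off check on a young cluster but not in general.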

FWIW, I've had a few more theories about what's going on, but none survived
a look at the code. My first guess was that there may be circumstances
under which the nextXid from the control file, instead of the one from the
pre-backup checkpoint, ends up becoming the standby's nextXid. But there doesn't
seem to be a way for that to happen.

My next theory was that something increments nextXid before StartupCLOG().
The only possible candidate seems to be PrescanPreparedTransactions(), which
does increment nextXid if it's smaller than some sub-xid of a prepared xact.
But we only call that before StartupCLOG() if we're starting from a
shutdown checkpoint, which shouldn't be the case for the OP.

I also checked what rsync does when a file vanishes after rsync computed the
file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
exit code for precisely that failure case.
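
A backup script could check for that case explicitly. The sketch below assumes
the exit code in question is 24, which rsync(1) documents as "Partial transfer
due to vanished source files"; the function names and the "-a" invocation are
placeholders, not the OP's actual script.

```python
import subprocess

# Per rsync(1): 24 = "Partial transfer due to vanished source files".
RSYNC_VANISHED = 24


def describe_rsync_exit(returncode: int) -> str:
    """Classify an rsync exit status for a base-backup script."""
    if returncode == 0:
        return "ok"
    if returncode == RSYNC_VANISHED:
        return "vanished-source-files"
    return "error"


def run_rsync(src: str, dst: str) -> str:
    """Run rsync (hypothetical arguments) and classify its exit status."""
    proc = subprocess.run(["rsync", "-a", src, dst])
    return describe_rsync_exit(proc.returncode)
```

Whether a vanished file is harmless depends on which file it was, so the
script should at least log this status rather than treat it as success.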

best regards,
Florian Pflug


