Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Msg-id 12E895D5-C6D0-4E71-B9C3-85E1BC14E6B8@phlo.org
In response to Re: Hot Backup with rsync fails at pg_clog if under load  (Florian Pflug <fgp@phlo.org>)
Responses Re: Hot Backup with rsync fails at pg_clog if under load  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Oct26, 2011, at 15:57 , Florian Pflug wrote:
> As you said, the CLOG page corresponding to nextXid
> *should* always be accessible at the start of recovery (unless the whole file
> has been removed by VACUUM, that is). So we shouldn't need to extend the CLOG.
> Yet the error suggests that the CLOG is, in fact, too short. What I said
> is that we shouldn't apply any fix (for the CLOG problem) before we understand
> the reason for that apparent contradiction.

Ha! I think I've got a working theory.

In CreateCheckPoint(), we determine the nextXid that'll go into the checkpoint
record, and then call CheckPointGuts() which does the actual writing and fsyncing.
So far, that's fine. If a transaction ID is assigned before we compute the
checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will
make sure the new CLOG page goes to disk.
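
For readers unfamiliar with the CLOG layout, here is a minimal sketch of how a
transaction ID maps to a CLOG page and segment file. It assumes the default
8 kB block size, 2 status bits per transaction and 32 pages per segment file;
the function names are illustrative and are not the names used in the server
source.

```python
# Assumed defaults: 8 kB pages, 2 status bits per transaction,
# 32 pages per segment file. Function names are illustrative only.
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT        # 4 transactions per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768 transactions per page
SLRU_PAGES_PER_SEGMENT = 32


def clog_page(xid: int) -> int:
    """Page number that holds the commit status of xid."""
    return xid // CLOG_XACTS_PER_PAGE


def clog_segment_name(xid: int) -> str:
    """Name of the pg_clog segment file containing that page (4 hex digits)."""
    return "%04X" % (clog_page(xid) // SLRU_PAGES_PER_SEGMENT)


def starts_new_page(xid: int) -> bool:
    """True if xid is the first one on its page, i.e. assigning it
    requires the CLOG to grow by one page."""
    return xid % CLOG_XACTS_PER_PAGE == 0


if __name__ == "__main__":
    for xid in (32767, 32768, 1048576):
        print(xid, clog_page(xid), clog_segment_name(xid), starts_new_page(xid))
```

Under these assumptions a new page is needed once every 32768 transaction IDs,
so a busy server can easily cross a page boundary during a long checkpoint.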

But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
fine too, except that LogStandbySnapshot() re-assigns the *current* value of
ShmemVariableCache->nextXid to the checkpoint's nextXid field.

Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
whose CLOG page hasn't necessarily made it to disk yet. The longer CheckPointGuts()
takes to finish its work, the more likely this becomes (assuming that CLOG writing
and syncing don't happen at the very end). This fits the OP's observation of the
problem vanishing when pg_start_backup() does an immediate checkpoint.
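
The ordering problem described above can be illustrated with a toy model. This
is only a sketch of the theory, not server code: ToyServer, create_checkpoint()
and simulate() are hypothetical names, the CLOG is reduced to a set of page
numbers, and all concurrent XID assignments are assumed to happen after the
CLOG pages were flushed but before the standby snapshot is logged.

```python
XIDS_PER_PAGE = 32768  # assumes 8 kB pages and 2 status bits per transaction


class ToyServer:
    """Hypothetical stand-in for the shared state involved in the race."""

    def __init__(self, next_xid):
        self.next_xid = next_xid
        page = next_xid // XIDS_PER_PAGE
        self.pages_in_memory = {page}  # CLOG pages that exist in shared memory
        self.pages_on_disk = {page}    # CLOG pages that have been written out

    def assign_xid(self):
        # Assigning an XID extends the CLOG in memory if a new page is needed.
        xid = self.next_xid
        self.pages_in_memory.add(xid // XIDS_PER_PAGE)
        self.next_xid += 1
        return xid

    def checkpoint_guts(self):
        # Stand-in for CheckPointGuts(): flush every existing page to disk.
        self.pages_on_disk |= self.pages_in_memory


def create_checkpoint(server, xids_after_flush, hot_standby):
    checkpoint_next_xid = server.next_xid      # value computed up front
    server.checkpoint_guts()
    for _ in range(xids_after_flush):          # concurrent load (simplified)
        server.assign_xid()
    if hot_standby:
        # The step the theory blames: nextXid is overwritten with the
        # current value after the flush has already happened.
        checkpoint_next_xid = server.next_xid
    return checkpoint_next_xid


def simulate(xids_after_flush, hot_standby):
    server = ToyServer(next_xid=XIDS_PER_PAGE - 10)
    next_xid = create_checkpoint(server, xids_after_flush, hot_standby)
    page_on_disk = (next_xid // XIDS_PER_PAGE) in server.pages_on_disk
    return next_xid, page_on_disk


if __name__ == "__main__":
    print(simulate(20, hot_standby=False))  # slow checkpoint, no overwrite
    print(simulate(20, hot_standby=True))   # slow checkpoint, overwrite
    print(simulate(0, hot_standby=True))    # near-instant checkpoint
```

In this model only the second case ends up with a checkpoint nextXid whose
CLOG page is missing on disk. The third case, where no XIDs are assigned
during the window, behaves like the immediate checkpoint mentioned by the OP.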

I dunno how to fix this, though, since I don't really understand why
LogStandbySnapshot() needs to modify the checkpoint's nextXid. Simon, is there some
documentation on what assumptions the hot standby code makes about the various XID
fields included in a checkpoint?

best regards,
Florian Pflug


