Thread: Backup Question

Backup Question

From
Shaun Thomas
Date:
Hey everyone,

This should be pretty straightforward, but figured I'd pass it by anyway.

I have a revised backup process that's coming out inconsistent, and I'm not entirely sure why. I call
pg_start_backup(), tar.gz the contents elsewhere, then pg_stop_backup(). Nothing crazy. Upon restore, two of my tables
report duplicate IDs upon executing my redaction scripts. The "duplicate" records ended up having different ctids,
suggesting the log replay was incomplete. However, nothing in the restore logs suggests this is the case, and either way,
the database wouldn't have come up if it were. (Right?)

Now, the main difference is that I'm doing the backup process on our streaming replication node. The backup process
calls the pg_start_backup() function on the upstream provider, backs up the local content, then calls pg_stop_backup()
on the upstream provider. In both cases, it captures the start/stop transaction log positions to grab all involved
archived WAL files. I already know the start xlog position is insufficient, because those transaction logs may not have
replayed on the standby yet, so I also grab 3 x checkpoint_timeout extra older files (before backup start), just in case.
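In script form, the whole dance is roughly the following. This is only a dry-run sketch, not the real script: the hostnames, paths, and label are placeholders, and the echo-only `run` stub keeps anything from actually executing.

```shell
#!/bin/bash
# Dry-run sketch of the backup flow described above. Hostnames, paths,
# and the backup label are hypothetical.
set -e

PRIMARY="primary-db"            # upstream provider
PGDATA="/db/pgdata"             # standby's local data directory
BACKUP_DIR="/backups/nightly"

run() { echo "+ $*"; }          # echo-only stub; change body to "$@" to execute

# 1. Mark the backup start on the upstream primary, not the standby.
run psql -h "$PRIMARY" -Atc "SELECT pg_start_backup('nightly', true)"

# 2. Archive the standby's data directory with parallel compression.
run "tar -C $PGDATA -cf - . | pigz > $BACKUP_DIR/base.tar.gz"

# 3. Mark the backup end on the primary and note the stop xlog position.
run psql -h "$PRIMARY" -Atc "SELECT pg_stop_backup()"
```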

So, I get no complaints of missing or damaged archive log files. Yet the restore is invalid. I checked the upstream,
and those duplicate rows are not present; it's clearly the backup that's at fault. I remember having this problem a
couple years ago, but I "fixed" it by working filesystem snapshots into the backup script. I can do that again, but it
seems like overkill, honestly.

Why am I using my own backup system instead of pg_basebackup, or Barman or something? Because I use pigz for parallel
compression and hard links to save space. I can back up an 800GB database in less than 20 minutes a night, or 45 minutes
for a non-incremental backup. Without disturbing the primary node. Like I said, I can enable filesystem snapshots to fix
this, but it feels like something more obvious is going on.

Any ideas?

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com




______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Backup Question

From
Albe Laurenz
Date:
Shaun Thomas wrote:

> I have a revised backup process that's coming out inconsistent, and I'm not entirely sure why. I call
> pg_start_backup(), tar.gz the contents elsewhere, then pg_stop_backup(). Nothing crazy. Upon restore,
> two of my tables report duplicate IDs upon executing my redaction scripts. The "duplicate" records
> ended up having different ctid's, suggesting the log replay was incomplete. However, nothing in the
> restore logs suggest this is the case, and either way, the database wouldn't have come up if it were.
> (right?)

Wrong.  The database cannot check all data for consistency
upon backup.  For one, that would take way too long.

> Now, the main difference, is that I'm doing the backup process on our streaming replication node. The
> backup process calls the pg_start_backup() function on the upstream provider, backs up the local
> content, then calls pg_stop_backup() on the upstream provider. In both cases, it captures the
> start/stop transaction log positions to grab all involved archived WAL files. I already know the start
> xlog position is insufficient, because those transaction logs may not have replayed on the standby
> yet, so I also grab 3xcheckpoint_timeout extra older files (before backup start), just in case.
> 
> So, I get no complaints of missing or damaged archive log files. Yet the restore is invalid. I checked
> the upstream, and those duplicate rows are not present; it's clearly the backup that's at fault. I
> remember having this problem a couple years ago, but I "fixed" it by working filesystem snapshots into
> the backup script. I can do that again, but it seems like overkill, honestly.

If you backup the standby, then you won't have a backup_label file.
You cannot restore a backup without that.

Moreover, recovery needs a checkpoint/restartpoint to start.
Restartpoints on the standby won't be the same as checkpoints
on the primary, so I believe that even with the backup_label
file you would not be able to restore the data.

I'm not sure about the second point, maybe somebody can confirm or refute that.

Yours,
Laurenz Albe

Re: Backup Question

From
Shaun Thomas
Date:
> Wrong.  The database cannot check all data for consistency
> upon backup.  For one, that would take way too long.

Well, what I meant was that it would stop the database if it couldn't
apply one of the transaction logs for whatever reason. It wasn't
"inconsistent enough" for that. :)

> If you backup the standby, then you won't have a backup_label file.
> You cannot restore a backup without that.

Well, the backup_label gets copied to the archive log path when
pg_stop_backup gets called. So, I do have it. But beyond that, I have
the start/stop WAL locations, so I can get all the required files to apply,
which are all that is really necessary.

> Moreover, recovery needs a checkpoint/restartpoint to start.
> Restartpoints on the standby won't be the same as checkpoints
> on the primary, so I believe that even with the backup_label
> file you would not be able to restore the data.

I suppose I could build in a function to pause the backup until the
restartpoint replays on the replica. Then at least, the backup "starts"
on both systems with the same assumptions.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com


Re: Backup Question

From
Jeff Janes
Date:
On Tue, Oct 22, 2013 at 6:47 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> Hey everyone,
>
> This should be pretty straightforward, but figured I'd pass it by anyway.
>
> I have a revised backup process that's coming out inconsistent, and I'm not entirely sure why. I call pg_start_backup(), tar.gz the contents elsewhere, then pg_stop_backup(). Nothing crazy. Upon restore, two of my tables report duplicate IDs upon executing my redaction scripts. The "duplicate" records ended up having different ctids, suggesting the log replay was incomplete. However, nothing in the restore logs suggests this is the case, and either way, the database wouldn't have come up if it were. (Right?)
>
> Now, the main difference is that I'm doing the backup process on our streaming replication node. The backup process calls the pg_start_backup() function on the upstream provider, backs up the local content, then calls pg_stop_backup() on the upstream provider. In both cases, it captures the start/stop transaction log positions to grab all involved archived WAL files. I already know the start xlog position is insufficient, because those transaction logs may not have replayed on the standby yet, so I also grab 3 x checkpoint_timeout extra older files (before backup start), just in case.

That won't work.  The backup_label file is promising that WAL before a certain point has already been applied.  Recovery rolls forward from this point, and has no ability to jump backwards and roll forward from there (or to detect that that would be needed).  So you can grab the extra files, but you can't make it apply them, as you are telling it that it doesn't need to.

You have to wait until the start-of-backup checkpoint has replayed (as a restartpoint) on the slave before you start copying the slave's contents.
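In shell, that wait might look something like the sketch below. The standby hostname is a placeholder, and pg_last_xlog_replay_location() is the 9.x function for the last replayed position; the LSN-to-integer helper is what makes the positions comparable.

```shell
# Sketch of waiting for the standby to replay past the backup start
# position. "standby-db" is a hypothetical host; wait_for_replay would be
# called with the position returned by pg_start_backup().

# Convert an xlog position like '16/B374D848' into a comparable integer.
lsn_to_dec() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

# Poll pg_last_xlog_replay_location() until it reaches the target LSN.
wait_for_replay() {
  local target replayed
  target=$(lsn_to_dec "$1")
  while :; do
    replayed=$(psql -h standby-db -Atc \
      "SELECT pg_last_xlog_replay_location()")
    [ "$(lsn_to_dec "$replayed")" -ge "$target" ] && return 0
    sleep 5
  done
}
```

The backup script would call wait_for_replay with the start position before tar begins reading the standby's data directory.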


Cheers,

Jeff

Re: Backup Question

From
Shaun Thomas
Date:
> So you can grab the extra files, but you can't make it apply them,
> as you are telling it that it doesn't need to.

Do I have to, though? Replaying transaction logs is baked into the crash recovery system. If I interrupt it in the
middle of a checkpoint, it should be able to revert to the previous checkpoint that did succeed. By including the extra
WAL files, it would re-apply them, just like in a crash recovery.

Of course, that only works if I interrupt it by shutting the replica down. By backing up across a checkpoint, I run the
risk of a race condition where some files were backed up before the checkpoint, and others afterwards. Which raises the
question: isn't that risk the same with a regular backup? The database doesn't just stop checkpointing because a backup
is in progress. There must be some internal detail I'm missing.

Either way, I'll add a routine to stall the standby backup until the restartpoint corresponding to the pg_start_backup
has been replayed. I'll see if that helps.

Thanks!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: Backup Question

From
Albe Laurenz
Date:
Shaun Thomas wrote:
>> Wrong.  The database cannot check all data for consistency
>> upon backup.  For one, that would take way too long.
> 
> Well, what I meant, was that it would stop the database if it couldn't
> apply one of the transaction logs for whatever reason. It wasn't
> "inconsistent enough" for that. :)

It would not stop the database; it would stop applying the WAL
and run into an endless loop (contact the primary, get told that it doesn't
have the WAL, try to read archived logs, fail, reiterate).

> > If you backup the standby, then you won't have a backup_label file.
> > You cannot restore a backup without that.
> 
> Well, the backup_label gets copied to the archive log path when
> pg_stop_backup gets called. So, I do have it. But beyond that, I have
> the start/stop WAL locations, so I can get all the required files to apply,
> which are all that is really necessary.
>
>> Moreover, recovery needs a checkpoint/restartpoint to start.
>> Restartpoints on the standby won't be the same as checkpoints
>> on the primary, so I believe that even with the backup_label
>> file you would not be able to restore the data.
> 
> I suppose I could build in a function to pause the backup until the
> restartpoint replays on the replica. Then at least, the backup "starts"
> on both systems with the same assumptions.

I'm not sure if checkpoints on primary and standby are synchronized.

Why don't you just shutdown the standby and take an offline
backup?  That could certainly be used for PITR.
After the backup, restart the standby and let it catch up.
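That offline variant is simple enough to sketch. Again a dry run with placeholder paths, not a real script; the echo-only stub just shows the order of operations.

```shell
#!/bin/bash
# Dry-run sketch of the offline alternative: stop the standby, copy, restart.
set -e

PGDATA="/db/pgdata"             # standby data directory (placeholder)
BACKUP_DIR="/backups/nightly"

run() { echo "+ $*"; }          # change body to "$@" on a real system

run pg_ctl -D "$PGDATA" stop -m fast    # clean shutdown leaves a consistent data dir
run "tar -C $PGDATA -cf - . | pigz > $BACKUP_DIR/offline.tar.gz"
run pg_ctl -D "$PGDATA" start           # restart; streaming replication catches it up
```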

Yours,
Laurenz Albe

Re: Backup Question

From
Jeff Janes
Date:
On Tue, Oct 22, 2013 at 1:10 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> So you can grab the extra files, but you can't make it apply them,
> as you are telling it that it doesn't need to.

> Do I have to, though? Replaying transaction logs is baked into the crash recovery system. If I interrupt it in the middle of a checkpoint, it should be able to revert to the previous checkpoint that did succeed.

True, but it needs to know that it needs to do that.  
 
> By including the extra WAL files, it would re-apply them, just like in a crash recovery.
>
> Of course, that only works if I interrupt it by shutting the replica down. By backing up across a checkpoint, I run the risk of a race condition where some files were backed up before the checkpoint, and others afterwards. Which raises the question: isn't that risk the same with a regular backup? The database doesn't just stop checkpointing because a backup is in progress.

The backup_label file records the checkpoint that occurred inside the pg_start_backup() call and is not updated with subsequent checkpoints.  It acts as an alternative control file, forcing recovery to start out at that checkpoint rather than some later one which was completed and recorded into the real control file while the backup was underway.
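For illustration, a backup_label from a 9.x server looks roughly like this (the positions, file name, and timestamp are made up). The CHECKPOINT LOCATION line is what pins recovery's starting point, overriding the control file:

```
START WAL LOCATION: 16/B3000028 (file 0000000100000016000000B3)
CHECKPOINT LOCATION: 16/B3000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2013-10-22 02:00:01 CDT
LABEL: nightly
```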

This is one of the advantages of pg_basebackup: it injects backup_label directly into the backup (where it is needed) without creating it on the master (where it is not needed, other than as a way to make sure it ends up in the backup). That means that if the master crashes during a backup taken with pg_basebackup, it will start recovery from the last eligible checkpoint, rather than starting from the pg_start_backup() checkpoint. Not only does using the earlier checkpoint cause extra work, it also runs the risk that some of the WAL needed to start from that checkpoint has already been recycled, in which case the server refuses to start until someone manually intervenes by deleting the backup_label file.

> There must be some internal detail I'm missing.
>
> Either way, I'll add a routine to stall the standby backup until the restartpoint corresponding to the pg_start_backup has been replayed. I'll see if that helps.

A possible alternative would be to fake a backup_label file which contains the pointer to the restartpoint that was known-good at the time the master was put into backup mode.  If you have full_page_writes off, that would be a problem.  There may be other problems with it that I'm unaware of, and it seems like running with scissors.

Cheers,

Jeff