Backup

We are working on several backup problems: block-level incremental backup, backup validation, and partial backup and restore.

Block-level incremental backup

Block-level incremental backup appears to be a good alternative to full backup and continuous WAL archiving when the following two conditions are satisfied.

  1. The number of changed blocks is low in comparison to the total number of blocks.
  2. The volume of the WAL archive is much larger than the volume of changed blocks. In particular, this means that the same blocks were changed repeatedly.
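
As a rough illustration of the two conditions, the sketch below compares the volume that would have to be stored under each strategy. The function name and the example figures are assumptions made up for illustration, not measurements; 8192 bytes is PostgreSQL's default block size.

```python
BLOCK_SIZE = 8192  # PostgreSQL's default block size, in bytes


def backup_volumes(total_blocks, changed_blocks, wal_bytes):
    """Return the bytes to store for a full backup, an incremental backup, and WAL archiving."""
    return {
        "full": total_blocks * BLOCK_SIZE,
        "incremental": changed_blocks * BLOCK_SIZE,
        "wal": wal_bytes,
    }


# Hypothetical workload: 1,000,000 blocks in the cluster, 10,000 of them changed,
# and 2 GiB of WAL generated since the last backup.
volumes = backup_volumes(1_000_000, 10_000, 2 * 1024**3)
```

With these figures the incremental backup is far smaller than both the full backup (condition 1) and the WAL archive (condition 2).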

The main problem is how to get the map of blocks changed since the last backup. There are several options for that:

  1. Do a full scan of all blocks of the database cluster to be backed up.
  2. Extract changed blocks from WAL.
  3. Make PostgreSQL maintain a map of changed blocks (either a bit or an LSN per page).
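
To make option 1 concrete, here is a minimal sketch of a full scan over one relation file. It assumes that a block counts as changed when the LSN in its page header is newer than the LSN of the previous backup; this is one possible criterion, not necessarily what any existing tool does. The function names are hypothetical, and the default 8192-byte block size is assumed.

```python
import struct

BLOCK_SIZE = 8192  # PostgreSQL's default block size, in bytes


def page_lsn(page):
    """Read pd_lsn, stored as two 32-bit halves at the start of the page header."""
    # Native byte order: the file is assumed to be read on the architecture that wrote it.
    hi, lo = struct.unpack_from("=II", page, 0)
    return (hi << 32) | lo


def changed_blocks(path, since_lsn):
    """Yield (block_number, page) for every page whose LSN is newer than since_lsn."""
    with open(path, "rb") as f:
        block_no = 0
        while True:
            page = f.read(BLOCK_SIZE)
            if len(page) < BLOCK_SIZE:
                break  # end of file; a trailing partial page is ignored in this sketch
            if page_lsn(page) > since_lsn:
                yield block_no, page
            block_no += 1
```

Every block still has to be read, which is where the I/O load of this option comes from. A real tool would also need to handle new, truncated, and removed files, which this sketch leaves out.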

Barman implements #1, but the problem is the high I/O load on the database cluster during backup. We also implemented #2 for pg_arman, but this solution requires WAL archiving, which is unwanted. We have now given up on non-invasive approaches and are working on a patch for maintaining a bitmap of changed blocks inside PostgreSQL.
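
The idea of a bitmap of changed blocks can be sketched as follows. This is only an illustration of the data structure, not the patch itself: the class and method names are hypothetical, and a server-side implementation would also have to be persistent and crash-safe, which is not modeled here.

```python
class ChangedBlockMap:
    """One bit per block: set when the block is modified, cleared after a backup."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def mark(self, block_no):
        """Record that a block was modified."""
        self.bits[block_no // 8] |= 1 << (block_no % 8)

    def changed(self):
        """Return the block numbers modified since the last reset, in ascending order."""
        return [b for b in range(self.n_blocks)
                if self.bits[b // 8] & (1 << (b % 8))]

    def reset(self):
        """Clear the map once a backup has been taken."""
        self.bits = bytearray(len(self.bits))


block_map = ChangedBlockMap(1000)
for block_no in (3, 3, 512, 999):  # repeated writes to block 3 set the same bit
    block_map.mark(block_no)
to_back_up = block_map.changed()
```

At one bit per block the map stays small (125 bytes for 1000 blocks here), and the backup only needs to read the blocks listed in it instead of scanning the whole cluster.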

Backup validation

Many users complain that once a file-level backup has been made, it is hard to check whether it is valid. There are a lot of things we can check in a backed-up database cluster. However, these checks should be fast enough to run after each backup while still protecting against typical errors.
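
One example of a check that is cheap enough to run over every backed-up page is a sanity test of the page header. The sketch below is an assumption about what such a check could look like, not a description of an existing validator. The function name is hypothetical, the field offsets follow PostgreSQL's 24-byte page header layout, and the default 8192-byte block size is assumed.

```python
import struct

BLOCK_SIZE = 8192      # PostgreSQL's default block size, in bytes
PAGE_HEADER_SIZE = 24  # size of the fixed part of the page header


def page_looks_valid(page):
    """Cheap structural check of a single backed-up page."""
    if len(page) != BLOCK_SIZE:
        return False
    if page == bytes(BLOCK_SIZE):
        return True  # an all-zero page is a legitimate, not yet initialized page
    # pd_lower, pd_upper, pd_special and pd_pagesize_version follow
    # pd_lsn (8 bytes), pd_checksum (2 bytes) and pd_flags (2 bytes).
    lower, upper, special, size_version = struct.unpack_from("=HHHH", page, 12)
    page_size = size_version & 0xFF00  # the low byte holds the layout version
    return (page_size == BLOCK_SIZE
            and PAGE_HEADER_SIZE <= lower <= upper <= special <= BLOCK_SIZE)
```

A check like this catches truncated files and pages overwritten with garbage, but not damage inside tuple data; detecting that would need heavier checks such as checksum verification.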

Partial backup and partial restore

A current restriction of file-level backup is that only a full database cluster can be backed up. However, users want to back up only some of the cluster's databases, as well as restore only some databases into an existing cluster. The second feature would require something invasive, like rewriting all xids in the heap of the database to be restored. But it still seems to be much cheaper than pg_dump/pg_restore.