Proposal: Incremental Backup - Mailing list pgsql-hackers

From Marco Nenciarini
Subject Proposal: Incremental Backup
Msg-id 53D2582E.3080105@2ndquadrant.it
List pgsql-hackers
0. Introduction
=================================
This is a proposal for adding incremental backup support to the streaming
replication protocol and hence to the pg_basebackup command.

1. Proposal
=================================
Our proposal is to introduce the concept of a backup profile. The backup
profile is a file with one line per file, detailing its tablespace,
path, modification time, size and checksum.
Using that file, the BASE_BACKUP command can decide which files need to
be sent again and which have not changed. The algorithm should be very
similar to rsync's, but since our files are never bigger than 1 GB,
whole-file granularity is probably fine enough: there is no need to
copy parts of files, only whole files.
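
As a rough illustration of how a profile lets the server decide what to
send, here is a minimal Python sketch. It assumes a hypothetical profile
layout (one tab-separated line per file: tablespace, path, mtime, size,
checksum); the names ProfileEntry, parse_profile, needs_resend and
plan_incremental are made up for this example and are not the format or
code described on the wiki page. Note that only the previous backup's
profile is consulted, never its files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    tablespace: str  # hypothetical encoding, e.g. "pg_default" or a tablespace OID
    path: str        # path relative to the tablespace root
    mtime: int       # modification time, seconds since the epoch
    size: int        # size in bytes
    checksum: str    # hex digest of the file contents


def parse_profile(text):
    """Parse a hypothetical profile with one tab-separated line per file."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        tablespace, path, mtime, size, checksum = line.split("\t")
        entries[(tablespace, path)] = ProfileEntry(
            tablespace, path, int(mtime), int(size), checksum)
    return entries


def needs_resend(old, new):
    """Decide whether one file must be sent again as a whole file."""
    if old is None or old.size != new.size:
        return True                       # new file, or its size changed
    if old.mtime == new.mtime:
        return False                      # rsync-style quick check: assume unchanged
    return old.checksum != new.checksum   # touched: compare with the stored checksum


def plan_incremental(previous, current):
    """Return (files to send, files deleted since the previous backup)."""
    to_send = sorted(k for k, e in current.items()
                     if needs_resend(previous.get(k), e))
    deleted = sorted(k for k in previous if k not in current)
    return to_send, deleted
```

Files that disappeared since the previous backup are reported separately,
since a restore must know not to resurrect them.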

This approach also has some advantages over using rsync to take a
physical backup: it does not require the files from the previous
backup to be checksummed again, and those files could even reside on
some form of long-term, not directly accessible storage, like a tape
cartridge or somewhere in the cloud (e.g. Amazon S3 or Amazon Glacier).

The profile could also be used in a 'refresh' mode, allowing the
pg_basebackup command to 'refresh' an old backup directory with a new
backup.
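
A minimal sketch of what a refresh could do on the client side, in
Python. The function refresh_directory, the staging directory holding
the newly received files, and the two input lists (relative paths to
overwrite and to delete, assumed to come from comparing the old and new
profiles) are all hypothetical; tablespaces are ignored for brevity.

```python
import shutil
from pathlib import Path


def refresh_directory(old_dir, incoming_dir, to_send, deleted):
    """Bring old_dir up to date in place (hypothetical refresh sketch).

    to_send:  relative paths of files received into incoming_dir
    deleted:  relative paths that no longer exist on the server
    """
    old_dir, incoming_dir = Path(old_dir), Path(incoming_dir)
    for rel in deleted:
        target = old_dir / rel
        if target.exists():
            target.unlink()  # drop files removed since the old backup
    for rel in to_send:
        src, dst = incoming_dir / rel, old_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrite whole file; copy2 also keeps the mtime
```

Unchanged files are never touched, so a refresh only costs the transfer
of the files that changed.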

The final piece of this architecture is a new program called
pg_restorebackup, which is able to operate on a "chain of incremental
backups", allowing the user to build a usable PGDATA from them or to
perform maintenance operations such as verifying the checksums or
estimating the final size of the recovered PGDATA.
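
To make the "chain" idea concrete, here is a minimal Python sketch of how
a restore tool might locate each file of the final PGDATA. It assumes a
hypothetical in-memory model (the Backup class, resolve_chain and
estimated_size are illustrative names, not pg_restorebackup's actual
design): every backup carries a full profile plus the set of files it
actually stores, and the newest backup storing a file wins. Checksum
verification is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """One backup in a chain (hypothetical model).

    profile maps (tablespace, path) -> (size, checksum) for every file that
    existed when the backup was taken; stored holds the keys whose contents
    were actually copied into this backup (all of them for a full backup).
    """
    profile: dict
    stored: set = field(default_factory=set)


def resolve_chain(chain):
    """Map each file of the final PGDATA to the index of the backup holding it.

    chain[0] is the full backup and chain[-1] the newest incremental one.
    Only files listed in the newest profile are restored, so files deleted
    along the way are skipped automatically.
    """
    latest = chain[-1].profile
    source = {}
    for key in latest:
        for i in range(len(chain) - 1, -1, -1):  # newest backup first
            if key in chain[i].stored:
                source[key] = i
                break
        else:
            raise ValueError(f"no backup in the chain stores {key}")
    return source


def estimated_size(chain):
    """Estimate the restored PGDATA size from the newest profile alone."""
    return sum(size for size, _checksum in chain[-1].profile.values())
```

Because the newest profile lists every file, the size estimate needs no
access to the stored data at all.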

We have created a wiki page with all the implementation details at
https://wiki.postgresql.org/wiki/Incremental_backup

2. Goals
=================================
The main goal of incremental backup is to reduce the size of the backup.
A secondary goal is to reduce backup time as well.

3. Development plan
=================================
Our proposed development plan is divided into four phases:

Phase 1: Add ‘PROFILE’ option to ‘BASE_BACKUP’
Phase 2: Add ‘INCREMENTAL’ option to ‘BASE_BACKUP’
Phase 3: Support for PROFILE and INCREMENTAL in pg_basebackup
Phase 4: pg_restorebackup

We would like to reach consensus on our design here before starting
to implement it.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
