Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud - Mailing list pgsql-performance

From Craig Jackson
Subject Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Date
Msg-id CA+R1LV7uhXHSYgBM0avnW8y2sYqXzMgnyBHB55ww4GD_d4Yg6w@mail.gmail.com
In response to Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud  (Craig James <cjames@emolecules.com>)
List pgsql-performance
Thanks, I'll check it out. 

On Thu, Dec 5, 2019 at 12:51 PM Craig James <cjames@emolecules.com> wrote:
On Thu, Dec 5, 2019 at 9:48 AM Craig Jackson <craig.jackson@broadcom.com> wrote:
Hi,

We are in the process of migrating an Oracle database to Postgres in Google Cloud and are investigating backup/recovery tools. The database size is > 20 TB. We have an SLA that requires us to be able to complete a full restore of the database within 24 hours. We have been testing pgBackRest, Barman, and GCP snapshots, but wanted to see if there are any other recommendations we should consider.

Desirable features
- Parallel backup/recovery
- Incremental backups
- Backup directly to a GCP bucket
- Deduplication/Compression
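
For reference, pgBackRest covers all four of these. A minimal pgbackrest.conf sketch, assuming a recent pgBackRest release (native GCS repository support is a newer addition) and a hypothetical stanza named "main"; the bucket name, key path, and data directory are placeholders:

    [global]
    # Write the backup repository directly to a GCS bucket
    repo1-type=gcs
    repo1-gcs-bucket=my-pg-backups
    repo1-gcs-key=/etc/pgbackrest/gcs-key.json
    repo1-path=/pgbackrest
    repo1-retention-full=2

    # Parallel backup/restore workers and compression
    process-max=8
    compress-type=lz4

    [main]
    pg1-path=/var/lib/postgresql/12/main

With that in place, a full backup followed by incrementals would look like:

    pgbackrest --stanza=main stanza-create
    pgbackrest --stanza=main --type=full backup
    pgbackrest --stanza=main --type=incr backup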

For your 24-hour-restore requirement, there's an additional feature you might consider: incremental restore, or what you might call "recovery in place"; that is, the ability to keep a more-or-less up-to-date copy, and then in an emergency restore only the diffs on the file system. pgBackRest uses a built-in rsync-like mechanism, plus a client-server architecture, that lets it quickly determine which parts of the data directory need to be updated. Checksums are computed on each side, and data are only transferred if the checksums differ. It's very efficient. I assume a 20 TB database is mostly static, with only a small fraction of the data updated in any month. I believe the checksums are precomputed and stored in the pgBackRest repository, so you can even do this from a low-cost object store such as Amazon S3 (or Google Cloud Storage, its GCP equivalent) with only modest bandwidth usage.
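
That behavior corresponds to pgBackRest's delta restore. A sketch, using the same hypothetical "main" stanza: with --delta, restore checksums the files already present in the data directory and rewrites only those that differ from the backup, rather than copying everything.

    # Data directory already holds an older copy of the cluster;
    # only files whose checksums differ are pulled from the repository.
    pgbackrest --stanza=main --delta restore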

In a cloud environment, you can do this on modestly priced hardware (a few CPUs, modest memory). In the event of a failover, unmount your backup disk, spin up a big server, mount that disk on the new server, do the incremental restore, and you're in business.
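
As a sketch of that failover sequence (device name, mount point, and service name are hypothetical, and it assumes the same "main" stanza as above):

    # On the newly provisioned, larger server:
    mount /dev/sdb1 /var/lib/postgresql/12/main   # attach the near-current copy
    pgbackrest --stanza=main --delta restore      # rewrite only changed files
    systemctl start postgresql                    # start the cluster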

Craig (James)


Any suggestions would be appreciated.

Craig Jackson




--
Craig 
