Re: Incremental backup - Mailing list pgsql-general
From: Mladen Gogala
Subject: Re: Incremental backup
Date:
Msg-id: 3d5f2552-c819-c423-cb37-e190f74ea37f@gmail.com
In response to: Re: Incremental backup (Andreas Joseph Krogh <andreas@visena.com>)
Responses: Re: Incremental backup
List: pgsql-general
I think everybody agrees that incremental backup per database, and not cluster-wide, is nice, and it would be nice if PG supported it. But, given the way PG is architected, having cluster-wide WALs, that's not an easy task to implement. Repeating "other databases have it" doesn't change that.
--
Andreas Joseph Krogh
I am not advocating for database-level incremental backups, but all databases that have them also have cluster-wide/instance-wide WAL logs. Cluster-wide WAL logs do not make database-level incremental backups hard. Both Oracle and DB2 have database-level incremental backups, and both have cluster-wide WAL equivalents (redo logs and transaction logs). So does SQL Server, which is a bit specific because its log is used both for recovery, as WAL, and for undo, so it's very different from Postgres. Also, SQL Server is, to my knowledge, the only database which doesn't use MVCC.
The technology that all of those databases employ is a bitmap with one bit per database block. A full backup sets all bits to 0, and whenever a block is modified, the corresponding bit is set to 1. The backup tool, in incremental mode, then copies only the blocks whose bit is 1. I am not too thrilled by that implementation. In particular, with Oracle there were bugs with database restore, the "duplicate database" operation, global cache locks and instance latches. I've had quite a few headaches with RMAN cumulative incremental backups. My preferred backup method is a storage snapshot. Snapshots can then be replicated to other arrays (NetApp SnapVault, Hitachi HUR, EMC SRDF) or backed up to deduplicated offline storage like AWS Glacier or EMC Data Domain using simple file-level utilities. Once a snapshot is taken, it is a read-only file system and its files are no longer held open by the database processes.
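To make that concrete, here is a minimal Python sketch of the bitmap idea, not Oracle's or DB2's actual on-disk format: the block size, class name and backup file layout are my own illustrative assumptions.

    # Sketch of bitmap-based block change tracking (illustrative only;
    # 8 KB blocks and the file layout are assumptions, not vendor internals).
    import os

    BLOCK_SIZE = 8192  # assumed 8 KB blocks, as in PostgreSQL

    class ChangeTracker:
        def __init__(self, datafile):
            self.datafile = datafile
            nblocks = (os.path.getsize(datafile) + BLOCK_SIZE - 1) // BLOCK_SIZE
            self.bits = bytearray((nblocks + 7) // 8)  # one bit per block, all 0

        def mark_dirty(self, block_no):
            # Called whenever a block is written; sets its bit to 1.
            self.bits[block_no // 8] |= 1 << (block_no % 8)

        def full_backup(self, dest):
            # A full backup copies everything and resets all bits to 0.
            with open(self.datafile, "rb") as src, open(dest, "wb") as out:
                out.write(src.read())
            self.bits = bytearray(len(self.bits))

        def incremental_backup(self, dest):
            # An incremental backup copies only blocks whose bit is 1.
            with open(self.datafile, "rb") as src, open(dest, "wb") as out:
                for block_no in range(len(self.bits) * 8):
                    if self.bits[block_no // 8] & (1 << (block_no % 8)):
                        src.seek(block_no * BLOCK_SIZE)
                        out.write(block_no.to_bytes(8, "big") + src.read(BLOCK_SIZE))
            self.bits = bytearray(len(self.bits))  # next incremental starts clean

Each backup resets the bitmap, so an incremental only pays for blocks touched since the previous backup; the price is that the tracker itself becomes a critical structure, which is where the bugs I mentioned tend to live.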
The classic file-level backup tools like pg_basebackup or RMAN are pretty much unusable once your database hits 50 TB or so. With a 50 TB database, your RTO (Recovery Time Objective) will be measured in days, because you can only push around 3 TB/hour down a 10 Gbit Ethernet pipe. On the other hand, companies like Facebook, Amazon, Walmart or Target lose hundreds of thousands of USD per hour of downtime. Downtime of 15 hours or longer is completely unacceptable. Backup is only the last line of defense; it should only be used if everything else fails. And if you have to use a file-level tool like pg_basebackup and tar to restore your VLDB, you should also get your resume ready.
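For a rough sense of the arithmetic (my numbers, assuming the ~3 TB/hour effective rate above):

    10 Gbit/s = 1.25 GB/s        ~ 4.5 TB/hour theoretical
    at roughly 2/3 efficiency    ~ 3 TB/hour effective
    50 TB / 3 TB/hour            ~ 17 hours just to copy the data back

and that is before WAL replay and verification, which is how you end up with an RTO measured in days.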
Incremental backups are awkward and ungainly. You have to restore the full backup and all incrementals taken since the last full, and after that you must apply the remaining WAL files. So restoring from an incremental backup is, by definition, slower than restoring a full backup. If you have to take backups, get yourself a deduplicating backup utility. Deduplication only stores the backup blocks that differ from the previous backup; in other words, you can do a daily full backup with the space expenditure of an incremental backup. Also, if you need backups for regulatory reasons (HIPAA, SOX), incremental backups will not do.
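To illustrate what deduplication buys you, here is a minimal Python sketch of content-addressed chunk storage; the fixed 1 MB chunk size and directory layout are my own simplifications, and real appliances like Data Domain use variable-length chunking and compression on top of this.

    # Sketch of content-addressed deduplication (illustrative assumptions:
    # fixed 1 MB chunks, SHA-256 addressing; real dedup appliances are smarter).
    import hashlib, os

    CHUNK = 1024 * 1024

    def dedup_backup(path, store_dir):
        """Store a file as a list of chunk hashes; identical chunks are kept once."""
        os.makedirs(store_dir, exist_ok=True)
        manifest = []
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                digest = hashlib.sha256(chunk).hexdigest()
                chunk_path = os.path.join(store_dir, digest)
                if not os.path.exists(chunk_path):  # only new/changed chunks cost space
                    with open(chunk_path, "wb") as out:
                        out.write(chunk)
                manifest.append(digest)
        return manifest  # a "full backup" is just this list of hashes

Run daily against a mostly unchanged data directory, the second "full" backup writes almost no new chunks, which is the point: you get the restore simplicity of a full backup at roughly the storage cost of an incremental.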
Long story short, there are much more important things to do in Postgres than incremental database-level backups. BTW, Chuck Norris can take an incremental database-level backup of a Postgres database and recover the database without restoring the preceding full backup. I am not sure whether Chuck Norris is a member of the Postgres steering committee.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com