Thread: binary vs. txt dumps with pg
Hi all This is my first post to the pgsal-admin list. I am doing IT support for a small company, and we have been running PG for a few years now. I have recently upgraded our PG-8.0.3 server to 8.1.0. Everything went OK, and now I am working on the backup schema. I seems that since 8.1 pg_dump does not include a switch to dump blobs anymore. As a consequence, it is now also possible to dump db's with blobs in txt format. This is great, only the dumps become rather large because the blobs have to be escaped etc etc. Since I use rysnc to a remote host for backups, I have created a script hat dumps each db to a file, and then checks if the dump from yesterday is different. If not, the new dump is deleted and the old dump stays in place. This means a significant smaller backup, since most of our db's are pretty big but do not change that much. I have been fiddling with all kinds of techniques to accomplish this. The only way that seems to work is to make txt based dumps, and strip out all the comments and empty lines: pg_dump -Fp db_name | sed 's/^--//g' > db_name.txt.dump If no data has changed, doing this one day later will yield a dump that is 100% the same (as per md5sum). Since the txt dump is really big (in my case way over 2 Gb) I decided to compress it as well: pg_dump -Fp db_name | sed 's/^--//g' | gzip -n > db_name.txt.dump.gz This yields a file of 850 Mb, and the md5sums of unchanged db's are stil the same. When I try to dump to custom format, the dumps are *not* the same anymore, even though the data has *not* changed. Example: dumping a db that is not in use twice will yielf different files: $ pg_dump -Fc test > test1.dump $ pg_dump -Fc test > test2.dump $ md5sum test*.dump ec158ad86aabc7322c4b4dca58d6e4de test1.dump 9156ce40ae5f7c54c64382e078a2bb67 test2.dump I suspect it is because the -Fc format also contains comments etc. Is there a way to make -Fc dumps that do not contain any comments, timestamps or other stuff that makes consequent dumps of the same db different? This would be very helpfull. The txt dumps I use now are OK but are really big and slow to dump/restore. Best regards, -- * *** Dick Visser TIENHUIS Networking ** * * Touwbaan 68 P: +31206843731 * * *** 1018 HS Amsterdam F: +31208641420 * * * * The Netherlands M: +31622698108 * ** * IP-phone (SIP)/email: dick@tienhuis.nl * * * PGP-key: http://www.tienhuis.nl/gpg.txt * * * Webcam: http://www.tienhuis.nl/cam2.asx *** ***
Dick Visser <dick.visser@tienhuis.nl> writes: > Since I use rysnc to a remote host for backups, I have created a > script hat dumps each db to a file, and then checks if the dump from > yesterday is different. > I have been fiddling with all kinds of techniques to accomplish this. > The only way that seems to work is to make txt based dumps, and strip > out all the comments and empty lines: Have you retested this assumption recently? We got rid of the OIDs and stuff in the default comments, so I think that you should be able to just diff the text dump files without any weird processing. > When I try to dump to custom format, the dumps are *not* the same > anymore, even though the data has *not* changed. There's a timestamp in the custom format file header. regards, tom lane
Hi Tom > Have you retested this assumption recently? We got rid of the OIDs and > stuff in the default comments, so I think that you should be able to > just diff the text dump files without any weird processing. Hmmm you're right, they are the same now. I don't know what I did this week, but I was pretty sure the files were different. Nevermind then, as this saves time and effort. > There's a timestamp in the custom format file header. Ah OK that explains the different sums. Would it be an idea to make it a (default) configuration option for pg_dump to include a timestamp? You could work around it by disgarding the first X bytes when checksumming. Any idea what X would be? Best regards, -- * *** Dick Visser TIENHUIS Networking ** * * Touwbaan 68 P: +31206843731 * * *** 1018 HS Amsterdam F: +31208641420 * * * * The Netherlands M: +31622698108 * ** * IP-phone (SIP)/email: dick@tienhuis.nl * * * PGP-key: http://www.tienhuis.nl/gpg.txt * * * Webcam: http://www.tienhuis.nl/cam2.asx *** ***
On Sat, Dec 17, 2005 at 07:32:24PM +0100, Dick Visser wrote: > You could work around it by disgarding the first X bytes when > checksumming. Any idea what X would be? I believe that by default the custom format is gzip'd, which also means you probably couldn't exclude just the timestamp. Though if you're using rsync, I think there's a good chance it would only tranfer the first few hundred KB... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > I believe that by default the custom format is gzip'd, which also means > you probably couldn't exclude just the timestamp. Though if you're using > rsync, I think there's a good chance it would only tranfer the first few > hundred KB... Yes that is true. What I forgot to mention is that our backups are made with rsnapshot. This means that if files have changed, a complete copy of the file is backupped.... -- * *** Dick Visser TIENHUIS Networking ** * * Touwbaan 68 P: +31206843731 * * *** 1018 HS Amsterdam F: +31208641420 * * * * The Netherlands M: +31622698108 * ** * IP-phone (SIP)/email: dick@tienhuis.nl * * * PGP-key: http://www.tienhuis.nl/gpg.txt * * * Webcam: http://www.tienhuis.nl/cam2.asx *** ***