Thread: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
To start at the end of this story: "DETAIL: Could not read from file "pg_clog/007D" at offset 65536: Success."

This is a message we received on a standby that we were bringing online as part of a test. The clog file was present, but apparently too small for Postgres (or at least I think this is what the message meant), so one could stub in another clog file and then continue recovery successfully (modulo the voodoo of stubbing in clog files in general). I am unsure if this is due to an interesting race condition in Postgres or a result of my somewhat-interesting hot-backup protocol, which is slightly more involved than the norm. I will describe what it does here:

1) Call pg_start_backup
2) Crawl the entire Postgres cluster directory structure, except pg_xlog, taking note of the size of every file present
3) Begin writing TAR files, but *only up to the size noted during the original crawl of the cluster directory,* so if a file grows between the original snapshot and subsequently actually calling read() on the file, those extra bytes will not be added to the TAR.
3a) If a file has been partially truncated, I add "\0" bytes to pad the tarfile member up to the size sampled in step 2, as I am streaming the tar file and cannot go back in the stream to adjust the tarfile member size
4) Call pg_stop_backup

The reason I go to this trouble is that I use many completely disjoint tar files to do parallel compression, decompression, uploading, and downloading of the base backup of the database, and I want to be able to control the size of these files up-front. The requirement of stubbing in \0 comes from a limitation of the tar format when dealing with streaming archives, and the requirement to truncate the files to the size snapshotted in step 2 is to enable splitting up the files between volumes even in the presence of possible concurrent growth while I'm performing the hot backup.
(Ex: a handful of nearly-empty heap files can rapidly grow due to a concurrent bulk load if I get unlucky, which I do not intend to allow myself to be.)

Any ideas? Or does it sound like I'm making some bookkeeping errors and should review my code again? It does work most of the time. I have not gotten a sense of how often this reproduces just yet.

-- fdr
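Steps 2, 3, and 3a above can be sketched in a few lines of Python. This is a minimal illustration using the standard-library tarfile module, not the actual WAL-E code; the helper names are made up for this sketch:

```python
import io
import os
import tarfile

def snapshot_sizes(paths):
    """Step 2: record every file's size before any data is read."""
    return {p: os.path.getsize(p) for p in paths}

def write_member(tar, path, snapped_size):
    """Steps 3/3a: emit exactly snapped_size bytes for path.

    Concurrent growth past the snapshot is ignored; concurrent
    truncation is papered over with zero padding, because a streamed
    tar cannot seek back to fix the member header's size field.
    """
    info = tarfile.TarInfo(name=os.path.basename(path))
    info.size = snapped_size
    with open(path, 'rb') as f:
        data = f.read(snapped_size)             # drop bytes grown since step 2
    data += b'\0' * (snapped_size - len(data))  # pad if the file shrank
    tar.addfile(info, io.BytesIO(data))         # header size matches snapshot
```

A real implementation would stream each member in chunks rather than slurping it into memory, but the invariant is the same: the byte count promised in the tar header is fixed at snapshot time, regardless of what the filesystem does afterwards.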
Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
From
Merlin Moncure
Date:
On Thu, Apr 21, 2011 at 6:15 AM, Daniel Farina <daniel@heroku.com> wrote:
> To start at the end of this story: "DETAIL: Could not read from file
> "pg_clog/007D" at offset 65536: Success."
>
> This is a message we received on a standby that we were bringing
> online as part of a test. The clog file was present, but apparently
> too small for Postgres (or at least I think this is what the message
> meant), so one could stub in another clog file and then continue
> recovery successfully (modulo the voodoo of stubbing in clog files in
> general). I am unsure if this is due to an interesting race condition
> in Postgres or a result of my somewhat-interesting hot-backup
> protocol, which is slightly more involved than the norm. I will
> describe what it does here:
>
> 1) Call pg_start_backup
> 2) Crawl the entire Postgres cluster directory structure, except
> pg_xlog, taking note of the size of every file present
> 3) Begin writing TAR files, but *only up to the size noted during the
> original crawl of the cluster directory,* so if a file grows between
> the original snapshot and subsequently actually calling read() on the
> file, those extra bytes will not be added to the TAR.
> 3a) If a file has been partially truncated, I add "\0" bytes to pad
> the tarfile member up to the size sampled in step 2, as I am
> streaming the tar file and cannot go back in the stream to adjust the
> tarfile member size
> 4) Call pg_stop_backup
>
> The reason I go to this trouble is that I use many completely
> disjoint tar files to do parallel compression, decompression,
> uploading, and downloading of the base backup of the database, and I
> want to be able to control the size of these files up-front. The
> requirement of stubbing in \0 comes from a limitation of the tar
> format when dealing with streaming archives, and the requirement to
> truncate the files to the size snapshotted in step 2 is to enable
> splitting up the files between volumes even in the presence of
> possible concurrent growth while I'm performing the hot backup.
> (Ex: a handful of nearly-empty heap files can rapidly grow due to a
> concurrent bulk load if I get unlucky, which I do not intend to allow
> myself to be.)
>
> Any ideas? Or does it sound like I'm making some bookkeeping errors
> and should review my code again? It does work most of the time. I
> have not gotten a sense of how often this reproduces just yet.

Everyone here is going to assume the problem is in your (too?) fancy tar/diff delta archiving approach, because we can't see that code and it just sounds suspicious. A busted clog file is of course very noteworthy, but to eliminate your stuff you should try reproducing with a more standard method of grabbing the base backup. Have you considered using rsync instead?

merlin
Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
From
Andres Freund
Date:
Hi,

On Thursday, April 21, 2011 01:15:48 PM Daniel Farina wrote:
> Any ideas? Or does it sound like I'm making some bookkeeping errors
> and should review my code again? It does work most of the time. I
> have not gotten a sense of how often this reproduces just yet.

I would suggest taking both your backup and a simpler version for now. When the error occurs again, you can compare...

Andres
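The comparison Andres suggests — take both kinds of backup and diff them when the error recurs — can be mechanized with a simple per-file digest walk. A minimal sketch (the function names are illustrative, not from any tool in this thread):

```python
import hashlib
import os

def tree_digests(root):
    """Map relative path -> SHA-1 of contents for every file under root."""
    digests = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha1()
            with open(full, 'rb') as f:
                for chunk in iter(lambda: f.read(65536), b''):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests

def diff_backups(a, b):
    """Report files present in only one tree, and files whose bytes differ."""
    da, db = tree_digests(a), tree_digests(b)
    only_a = sorted(set(da) - set(db))
    only_b = sorted(set(db) - set(da))
    changed = sorted(p for p in set(da) & set(db) if da[p] != db[p])
    return only_a, only_b, changed
```

Note that for base backups taken while the cluster is live, heap files will legitimately differ between the two copies; the interesting comparison is against the clog segments and anything else that should be byte-identical after recovery.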
On Thu, Apr 21, 2011 at 7:15 AM, Daniel Farina <daniel@heroku.com> wrote:
> To start at the end of this story: "DETAIL: Could not read from file
> "pg_clog/007D" at offset 65536: Success."
>
> This is a message we received on a standby that we were bringing
> online as part of a test. The clog file was present, but apparently
> too small for Postgres (or at least I think this is what the message
> meant), so one could stub in another clog file and then continue
> recovery successfully (modulo the voodoo of stubbing in clog files in
> general). I am unsure if this is due to an interesting race condition
> in Postgres or a result of my somewhat-interesting hot-backup
> protocol, which is slightly more involved than the norm. I will
> describe what it does here:
>
> 1) Call pg_start_backup
> 2) Crawl the entire Postgres cluster directory structure, except
> pg_xlog, taking note of the size of every file present
> 3) Begin writing TAR files, but *only up to the size noted during the
> original crawl of the cluster directory,* so if a file grows between
> the original snapshot and subsequently actually calling read() on the
> file, those extra bytes will not be added to the TAR.
> 3a) If a file has been partially truncated, I add "\0" bytes to pad
> the tarfile member up to the size sampled in step 2, as I am
> streaming the tar file and cannot go back in the stream to adjust the
> tarfile member size
> 4) Call pg_stop_backup

In theory I would expect any defects introduced by the, ahem, exciting procedure described in steps 3 and 3a to be corrected automatically by recovery when you start the new cluster. It shouldn't matter exactly when you read the file, recovery for unrelated blocks ought to proceed totally independently, and an all-zeros block should be treated the same way as one that isn't allocated yet, so it seems like it ought to work.

But you may be stressing some paths in the recovery code that don't get regular exercise, since the manner in which you are taking the backup can produce backups that differ from any backup the normal method could take, and those paths might have bugs. It's also possible, as others have said, that you've botched it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
From
Daniel Farina
Date:
On Thu, Apr 21, 2011 at 8:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 7:15 AM, Daniel Farina <daniel@heroku.com> wrote:
>> To start at the end of this story: "DETAIL: Could not read from file
>> "pg_clog/007D" at offset 65536: Success."
>>
>> This is a message we received on a standby that we were bringing
>> online as part of a test. The clog file was present, but apparently
>> too small for Postgres (or at least I think this is what the message
>> meant), so one could stub in another clog file and then continue
>> recovery successfully (modulo the voodoo of stubbing in clog files in
>> general). I am unsure if this is due to an interesting race condition
>> in Postgres or a result of my somewhat-interesting hot-backup
>> protocol, which is slightly more involved than the norm. I will
>> describe what it does here:
>>
>> 1) Call pg_start_backup
>> 2) Crawl the entire Postgres cluster directory structure, except
>> pg_xlog, taking note of the size of every file present
>> 3) Begin writing TAR files, but *only up to the size noted during the
>> original crawl of the cluster directory,* so if a file grows between
>> the original snapshot and subsequently actually calling read() on the
>> file, those extra bytes will not be added to the TAR.
>> 3a) If a file has been partially truncated, I add "\0" bytes to pad
>> the tarfile member up to the size sampled in step 2, as I am
>> streaming the tar file and cannot go back in the stream to adjust the
>> tarfile member size
>> 4) Call pg_stop_backup
>
> In theory I would expect any defects introduced by the, ahem,
> exciting procedure described in steps 3 and 3a to be corrected
> automatically by recovery when you start the new cluster.

Neat. This is mostly what I was looking to get out of this thread; I will start looking for places where I have botched things.

Although some of the frontend interface and some of the mechanism is embarrassingly rough for several reasons, the other thread posters can have access to the code if they wish: the code responsible for these shenanigans can be found at https://github.com/heroku/wal-e (and https://github.com/fdr/wal-e), in the tar_partition.py file (https://github.com/heroku/WAL-E/blob/master/wal_e/tar_partition.py). But I realize that's really too much detail for most people to be interested in, which is why I didn't post it in the first place.

I think, given your assessment, I have enough to try to reproduce this case synthetically (I think taking a very old pg_clog snapshot, committing a few million xacts while not vacuuming, and then trying to merge the old clog into an otherwise newer base backup may prove out the mechanism I have in mind), or to add some more robust logging so I can catch my (or any, really) problem.

-- fdr
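For reference, the "stubbing in" of a clog segment mentioned at the top of the thread amounts to padding the segment file out to full length with zero bytes. A hedged sketch, assuming the default build constants of this era (8 kB BLCKSZ, 32 pages per SLRU segment, so 256 kB per full clog segment — check your build's constants before relying on these; the helper name is made up):

```python
import os

# Assumed defaults: 8 kB pages, 32 pages per SLRU segment -> 256 kB
# per full clog segment. These are build-time constants in Postgres.
BLCKSZ = 8192
SLRU_PAGES_PER_SEGMENT = 32
SEGMENT_BYTES = BLCKSZ * SLRU_PAGES_PER_SEGMENT

def extend_clog_segment(clog_dir, segname):
    """Hypothetical helper: zero-pad a clog segment (e.g. '007D') to
    full length so reads past its current end stop failing.

    clog stores two status bits per transaction, and all-zero bits
    read as 'transaction in progress' -- which is exactly why this is
    'voodoo': it silently rewrites transaction outcomes, and is only
    defensible when those outcomes genuinely do not matter.
    """
    path = os.path.join(clog_dir, segname)
    # Append mode preserves whatever valid status bytes already exist;
    # tell() on a freshly opened append-mode file gives its size.
    with open(path, 'ab') as f:
        f.write(b'\0' * (SEGMENT_BYTES - f.tell()))
```

The "offset 65536" in the original error message is consistent with the segment on the standby being a quarter of the assumed full length, which fits Daniel's reading that the file was present but too short.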