Thread: [PATCH] `pg_dump -Fd` doesn't check write return status...
The attached patch fixes the case when `pg_dump -Fd …` is called on a partition where write(2) fails for some reason or another.In this case, backup jobs were returning with a successful exit code even though most of the files in the dump directorywere all zero length. I haven’t tested this patch’s failure conditions but the fix seems simple enough: cfwrite() needs to have its return statuschecked everywhere and exit_horribly() upon any failure. In this case, callers of _WriteData() were not checking thereturn status and were discarding the negative return status (e.g. ENOSPC). I made a cursory pass over the code and found one other instance where write status wasn’t being checked and also includedthat. -sc -- Sean Chittenden sean@chittenden.org
Attachment
On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote: > The attached patch fixes the case when `pg_dump -Fd …` is called > on a partition where write(2) fails for some reason or another. In > this case, backup jobs were returning with a successful exit code even > though most of the files in the dump directory were all zero length. > > I haven’t tested this patch’s failure conditions but the fix seems > simple enough: cfwrite() needs to have its return status checked > everywhere and exit_horribly() upon any failure. In this case, > callers of _WriteData() were not checking the return status and were > discarding the negative return status (e.g. ENOSPC). > > I made a cursory pass over the code and found one other instance where > write status wasn’t being checked and also included that. As is often the case with pg_dump, the problems you saw are a small part of a larger set of problems in that code --- there is general ignoring of read and write errors. I have developed a comprehensive patch that addresses all the issues I could find. The use of function pointers and the calling of functions directly and through function pointers makes the fix quite complex. I ended up placing checks at the lowest level and removing checks at higher layers where they were sporadically placed. Patch attached. I have tested dump/restore of all four dump output formats with the 9.3 regression database and all tests passed. I believe this patch is too complex to backpatch, and I don't know how it could be fixed more simply. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Attachment
On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote: > On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote: > > The attached patch fixes the case when `pg_dump -Fd …` is called > > on a partition where write(2) fails for some reason or another. In > > this case, backup jobs were returning with a successful exit code even > > though most of the files in the dump directory were all zero length. > > > > I haven’t tested this patch’s failure conditions but the fix seems > > simple enough: cfwrite() needs to have its return status checked > > everywhere and exit_horribly() upon any failure. In this case, > > callers of _WriteData() were not checking the return status and were > > discarding the negative return status (e.g. ENOSPC). > > > > I made a cursory pass over the code and found one other instance where > > write status wasn’t being checked and also included that. > > As is often the case with pg_dump, the problems you saw are a small part > of a larger set of problems in that code --- there is general ignoring of > read and write errors. > > I have developed a comprehensive patch that addresses all the issues I > could find. The use of function pointers and the calling of functions > directly and through function pointers makes the fix quite complex. I > ended up placing checks at the lowest level and removing checks at > higher layers where they were sporadically placed. > > Patch attached. I have tested dump/restore of all four dump output > formats with the 9.3 regression database and all tests passed. I > believe this patch is too complex to backpatch, and I don't know how it > could be fixed more simply. Patch applied. Thanks for the report. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, May 05, 2014 at 08:29:56PM -0400, Bruce Momjian wrote: > On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote: > > As is often the case with pg_dump, the problems you saw are a small part > > of a larger set of problems in that code --- there is general ignoring of > > read and write errors. > > > > I have developed a comprehensive patch that addresses all the issues I > > could find. The use of function pointers and the calling of functions > > directly and through function pointers makes the fix quite complex. I > > ended up placing checks at the lowest level and removing checks at > > higher layers where they were sporadically placed. > > > > Patch attached. I have tested dump/restore of all four dump output > > formats with the 9.3 regression database and all tests passed. I > > believe this patch is too complex to backpatch, and I don't know how it > > could be fixed more simply. > > Patch applied. Thanks for the report. This broke automatic detection of tar-format dumps: $ pg_dump -Ft -f tardump $ pg_restore tardump pg_restore: [archiver] could not read from input file: Success $ pg_restore -Ft tardump | wc -c 736 Stack trace: #0 vwrite_msg (modulename=0x4166c9 "archiver", fmt=0x41aa08 "could not read from input file: %s\n", ap=0x7fffffffde38) atpg_backup_utils.c:85 #1 0x000000000040f820 in exit_horribly (modulename=0x4166c9 "archiver", fmt=0x41aa08 "could not read from input file: %s\n")at parallel.c:190 #2 0x000000000040557d in _discoverArchiveFormat (AH=0x425340) at pg_backup_archiver.c:2040 #3 0x0000000000405756 in _allocAH (FileSpec=0x7fffffffecbb "tardump", fmt=archUnknown, compression=0, mode=archModeRead,setupWorkerPtr=0x4044f0 <setupRestoreWorker>) at pg_backup_archiver.c:2160 #4 0x0000000000405819 in OpenArchive (FileSpec=<optimized out>, fmt=<optimized out>) at pg_backup_archiver.c:148 #5 0x0000000000404331 in main (argc=2, argv=0x7fffffffea28) at pg_restore.c:375 -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On Tue, May 6, 2014 at 09:22:20AM -0400, Noah Misch wrote: > On Mon, May 05, 2014 at 08:29:56PM -0400, Bruce Momjian wrote: > > On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote: > > > As is often the case with pg_dump, the problems you saw are a small part > > > of a larger set of problems in that code --- there is general ignoring of > > > read and write errors. > > > > > > I have developed a comprehensive patch that addresses all the issues I > > > could find. The use of function pointers and the calling of functions > > > directly and through function pointers makes the fix quite complex. I > > > ended up placing checks at the lowest level and removing checks at > > > higher layers where they were sporadically placed. > > > > > > Patch attached. I have tested dump/restore of all four dump output > > > formats with the 9.3 regression database and all tests passed. I > > > believe this patch is too complex to backpatch, and I don't know how it > > > could be fixed more simply. > > > > Patch applied. Thanks for the report. > > This broke automatic detection of tar-format dumps: > > $ pg_dump -Ft -f tardump > $ pg_restore tardump > pg_restore: [archiver] could not read from input file: Success > $ pg_restore -Ft tardump | wc -c > 736 OK, thanks for the report. Fixed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +