Thread: pg_restore and large files
Hello, I am currently using PostgreSQL v7.3.4 on a RedHat 8.0 system (2.4.23 kernel) using the ext3 filesystem. I am experiencing problems when performing a pg_restore using a file which is 2.3G in size. The dump, which seemed to run smoothly, was created using the -Fc option. When I perform the restore, the following error occurs before the pg_restore fails: pg_restore: [custom archiver] error during file seek: Invalid argument pg_restore: *** aborted because of error Why is this happening? The error comes from pg_backup_custom.c, it seems that an fseeko() is failing (even though this is the way to support large files). It is my understanding that ext3 supports file sizes up to 1T. The restore worked fine when the database was smaller. Any ideas? Thanks, Mike Charnoky
Mike Charnoky <noky@nextbus.com> writes: > I am currently using PostgreSQL v7.3.4 on a RedHat 8.0 system (2.4.23 kernel) > using the ext3 filesystem. I am experiencing problems when performing a > pg_restore using a file which is 2.3G in size. The dump, which seemed to run > smoothly, was created using the -Fc option. When I perform the restore, the > following error occurs before the pg_restore fails: > pg_restore: [custom archiver] error during file seek: Invalid argument > pg_restore: *** aborted because of error > Why is this happening? The error comes from pg_backup_custom.c, it seems that > an fseeko() is failing (even though this is the way to support large > files). Hm, can you insert some debugging printout to show the offset value being passed to fseeko? That would let us eliminate one of pg_restore and the kernel as being at fault. Another thing that'd be useful is to run pg_restore under gdb with a breakpoint set at die_horribly, so that you could get a stack trace from the point of the failure. I am suspicious that it's a pg_restore bug and the problem has to do with manipulating file offsets as plain integers someplace. Not enough info yet to go searching, though. regards, tom lane
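The debugging printout suggested above could look roughly like the following. This is a minimal sketch, not pg_restore code: the helper name seek_with_trace is hypothetical, and it assumes a POSIX system where defining _FILE_OFFSET_BITS as 64 selects a 64-bit off_t. Casting the offset to long long before printing avoids a format mismatch regardless of how wide off_t is.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64    /* request a 64-bit off_t on 32-bit Linux */

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of pg_restore): print the offset that is
 * about to be passed to fseeko(), then report errno if the seek fails.
 * Returns 0 on success, -1 on failure.
 */
static int
seek_with_trace(FILE *fh, off_t pos)
{
    assert(fh != NULL);

    /* off_t has no printf conversion of its own, so widen it explicitly */
    fprintf(stderr, "fseeko: offset=%lld, sizeof(off_t)=%zu\n",
            (long long) pos, sizeof(off_t));

    if (fseeko(fh, pos, SEEK_SET) != 0)
    {
        int saved_errno = errno;    /* keep errno before further I/O */

        fprintf(stderr, "fseeko failed: %s\n", strerror(saved_errno));
        return -1;
    }
    return 0;
}
```

A negative offset printed here would point at the caller rather than the kernel, since fseeko() with SEEK_SET rejects negative positions.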
OK, I ran with gdb and here's what I got: Breakpoint 1, _PrintTocData (AH=0x8058fc0, te=0x8086548, ropt=0x8058ee8) at pg_backup_custom.c:471 471 if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0) (gdb) backtrace #0 _PrintTocData (AH=0x8058fc0, te=0x8086548, ropt=0x8058ee8) at pg_backup_custom.c:471 #1 0x0804a98b in RestoreArchive (AHX=0x8058fc0, ropt=0x8058ee8) at pg_backup_archiver.c:336 #2 0x0804a03e in main (argc=10, argv=0xbffff924) at pg_restore.c:366 #3 0x42015967 in __libc_start_main () from /lib/i686/libc.so.6 (gdb) display tctx->dataPos 2: tctx->dataPos = 1785996817 BTW, the file size is: 2361910772 bytes Mike Charnoky
Whoops, forget that last post. Here's the real data from gdb at the point prior to failure of pg_restore: Breakpoint 1, _PrintTocData (AH=0x8058fc0, te=0x80867e0, ropt=0x8058ee8) at pg_backup_custom.c:471 471 if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0) 1: tctx->dataPos = -1927033749 (gdb) backtrace #0 _PrintTocData (AH=0x8058fc0, te=0x80867e0, ropt=0x8058ee8) at pg_backup_custom.c:471 #1 0x0804a98b in RestoreArchive (AHX=0x8058fc0, ropt=0x8058ee8) at pg_backup_archiver.c:336 #2 0x0804a03e in main (argc=10, argv=0xbffff924) at pg_restore.c:366 #3 0x42015967 in __libc_start_main () from /lib/i686/libc.so.6 Hope this is helpful. BTW, the dump file size is 2361910772 bytes. The last valid dataPos before the crash was at 1785996918 (and the table being restored was pretty big). So, it does look like a pg_restore bug and that dataPos is being treated as an integer somewhere. Mike Charnoky
Mike Charnoky <noky@nextbus.com> writes: > So, it does look like a pg_restore bug and that dataPos is > being treated as an integer somewhere. After digging in the CVS log I bet this is the same bug just noted a month ago: 2004-01-03 23:02 tgl * src/bin/pg_dump/: pg_backup_archiver.c (REL7_4_STABLE), pg_backup_archiver.c: Fix ReadOffset() to work correctly when off_t is wider than int. It looks like the same patch applies to 7.3, modulo slightly different line number. Please try it and let us know if it fixes the problem. regards, tom lane
===================================================================
RCS file: /cvsroot//pgsql-server/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.79
retrieving revision 1.79.2.1
diff -c -r1.79 -r1.79.2.1
*** pgsql-server/src/bin/pg_dump/pg_backup_archiver.c 2003/10/20 21:05:11 1.79
--- pgsql-server/src/bin/pg_dump/pg_backup_archiver.c 2004/01/04 04:02:22 1.79.2.1
***************
*** 1425,1431 ****
      for (off = 0; off < AH->offSize; off++)
      {
          if (off < sizeof(off_t))
!             *o |= ((*AH->ReadBytePtr) (AH)) << (off * 8);
          else
          {
              if ((*AH->ReadBytePtr) (AH) != 0)
--- 1425,1431 ----
      for (off = 0; off < AH->offSize; off++)
      {
          if (off < sizeof(off_t))
!             *o |= ((off_t) ((*AH->ReadBytePtr) (AH))) << (off * 8);
          else
          {
              if ((*AH->ReadBytePtr) (AH) != 0)
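To see why the cast in the patch matters, here is a small standalone sketch of the same byte-by-byte assembly. It is an illustration under stated assumptions, not the pg_dump source: ByteSource, read_byte, encode_le and both read_offset_* functions are hypothetical names, the offset is assumed to be stored least significant byte first (as the shift by off * 8 implies), and int64_t stands in for a 64-bit off_t. The "narrow" variant only models the sign extension that occurs when a byte of 0x80 or more is shifted into bit 31 of a 32-bit int; it skips shifts of 32 bits or more, which are undefined on a 32-bit int in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the archive reader behind ReadBytePtr. */
typedef struct
{
    const unsigned char *buf;
    size_t      pos;
} ByteSource;

/* Returns the next byte as an int, as a byte-reading callback would. */
static int
read_byte(ByteSource *src)
{
    return src->buf[src->pos++];
}

/* Store v in 8 bytes, least significant byte first. */
static void
encode_le(uint64_t v, unsigned char out[8])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char) ((v >> (8 * i)) & 0xFF);
}

/* Models the unpatched loop: each byte is shifted as a 32-bit value. */
static int64_t
read_offset_narrow(ByteSource *src, size_t off_size)
{
    int64_t     o = 0;

    for (size_t off = 0; off < off_size; off++)
    {
        int         b = read_byte(src);

        if (off < 4)
        {
            /* 32-bit result; goes negative once bit 31 is set */
            int32_t     part = (int32_t) ((uint32_t) b << (off * 8));

            o |= part;          /* sign-extends into the upper 32 bits */
        }
        /* shifts by 32 or more are undefined for int, so not modeled */
    }
    return o;
}

/* Models the patched loop: widen each byte before shifting. */
static int64_t
read_offset_fixed(ByteSource *src, size_t off_size)
{
    int64_t     o = 0;

    for (size_t off = 0; off < off_size; off++)
    {
        int         b = read_byte(src);

        if (off < sizeof(int64_t))
            o |= (int64_t) ((uint64_t) b << (off * 8));
    }
    return o;
}
```

With these definitions, an offset below 2^31 such as 1785996918 reads back identically from both variants, while an offset just above 2^31 reads back as a negative number from the narrow variant. That matches the pattern reported in the thread: seeks succeed until the data position passes 2 GB, then fseeko() receives a negative value and fails with "Invalid argument".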
Excellent! This indeed solves the problem. I'll recompile pg_dump as well, just to be on the safe side. Thank you so much Tom for your quick response. Mike Charnoky
Mike Charnoky <noky@nextbus.com> writes: > Excellent! This indeed solves the problem. I'll recompile pg_dump as well, > just to be on the safe side. Okay. I've installed the patch into the REL7_3_STABLE branch as well, just in case we end up making a 7.3.6 release ... regards, tom lane