Thread: pg_restore and large files

pg_restore and large files

From: Mike Charnoky
Hello,

I am currently using PostgreSQL v7.3.4 on a RedHat 8.0 system (2.4.23 kernel)
using the ext3 filesystem.  I am experiencing problems when performing a
pg_restore using a file which is 2.3G in size.  The dump, which seemed to run
smoothly, was created using the -Fc option.  When I perform the restore, the
following error occurs before the pg_restore fails:

pg_restore: [custom archiver] error during file seek: Invalid argument
pg_restore: *** aborted because of error

Why is this happening?  The error comes from pg_backup_custom.c, where it seems that
an fseeko() is failing (even though this is the way to support large files).  It
is my understanding that ext3 supports file sizes up to 1T.  The restore worked
fine when the database was smaller.  Any ideas?

Thanks,

Mike Charnoky


Re: pg_restore and large files

From: Tom Lane
Mike Charnoky <noky@nextbus.com> writes:
> I am currently using PostgreSQL v7.3.4 on a RedHat 8.0 system (2.4.23 kernel)
> using the ext3 filesystem.  I am experiencing problems when performing a
> pg_restore using a file which is 2.3G in size.  The dump, which seemed to run
> smoothly, was created using the -Fc option.  When I perform the restore, the
> following error occurs before the pg_restore fails:

> pg_restore: [custom archiver] error during file seek: Invalid argument
> pg_restore: *** aborted because of error

> Why is this happening?  The error comes from pg_backup_custom.c, where it seems that
> an fseeko() is failing (even though this is the way to support large
> files).

Hm, can you insert some debugging printout to show the offset value
being passed to fseeko?  That would tell us whether pg_restore or the
kernel is at fault.  It would also be useful to run pg_restore under gdb
with a breakpoint set at die_horribly, so that you could get a stack
trace from the point of the failure.

I am suspicious that it's a pg_restore bug and the problem has to do
with manipulating file offsets as plain integers someplace.  Not enough
info yet to go searching, though.

            regards, tom lane
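
The debugging printout suggested above could look roughly like the sketch below. The wrapper name seek_with_trace is hypothetical and is not part of pg_restore; it only assumes a POSIX system where fseeko() takes an off_t.

```c
#define _POSIX_C_SOURCE 200809L     /* makes fseeko()/ftello() visible */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not taken from pg_restore: report the offset on
 * stderr, then perform the same kind of seek the archiver does. */
static int seek_with_trace(FILE *fh, off_t pos)
{
    assert(fh != NULL);
    /* off_t has no portable printf format, so widen it to long long. */
    fprintf(stderr, "fseeko target: %lld\n", (long long) pos);
    return fseeko(fh, pos, SEEK_SET);
}
```

A negative value in the printout would point at the caller rather than the kernel, since fseeko() rejects a negative absolute position.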

Re: pg_restore and large files

From: Mike Charnoky
OK, I ran with gdb and here's what I got:

Breakpoint 1, _PrintTocData (AH=0x8058fc0, te=0x8086548, ropt=0x8058ee8)
     at pg_backup_custom.c:471
471                     if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)
(gdb) backtrace
#0  _PrintTocData (AH=0x8058fc0, te=0x8086548, ropt=0x8058ee8)
     at pg_backup_custom.c:471
#1  0x0804a98b in RestoreArchive (AHX=0x8058fc0, ropt=0x8058ee8)
     at pg_backup_archiver.c:336
#2  0x0804a03e in main (argc=10, argv=0xbffff924) at pg_restore.c:366
#3  0x42015967 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) display tctx->dataPos
2: tctx->dataPos = 1785996817

BTW, the file size is: 2361910772 bytes


Mike Charnoky



Re: pg_restore and large files

From: Mike Charnoky
Whoops, forget that last post.  Here's the real data from gdb at the point prior
to failure of pg_restore:

Breakpoint 1, _PrintTocData (AH=0x8058fc0, te=0x80867e0, ropt=0x8058ee8)
     at pg_backup_custom.c:471
471                     if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)
1: tctx->dataPos = -1927033749
(gdb) backtrace
#0  _PrintTocData (AH=0x8058fc0, te=0x80867e0, ropt=0x8058ee8)
     at pg_backup_custom.c:471
#1  0x0804a98b in RestoreArchive (AHX=0x8058fc0, ropt=0x8058ee8)
     at pg_backup_archiver.c:336
#2  0x0804a03e in main (argc=10, argv=0xbffff924) at pg_restore.c:366
#3  0x42015967 in __libc_start_main () from /lib/i686/libc.so.6

Hope this is helpful.  BTW, the dump file size is 2361910772 bytes.  The last
valid dataPos before the crash was at 1785996918 (and the table being restored
was pretty big).  So, it does look like a pg_restore bug and that dataPos is
being treated as an integer somewhere.


Mike Charnoky
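
The two numbers reported above sit on opposite sides of the largest value a 32-bit signed integer can hold (2147483647), which fits the suspicion that an offset passes through a plain int somewhere. A minimal sketch of that check follows; the function name is hypothetical and a 32-bit int is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when a byte position cannot be stored
 * in a 32-bit signed integer, 0 otherwise. */
static int exceeds_int32(long long pos)
{
    assert(pos >= 0);
    return pos > INT32_MAX;
}
```

Here exceeds_int32(1785996918LL) is 0 (the last offset that worked), while exceeds_int32(2361910772LL) is 1 (the dump file size), so any offset into the tail of the file needs more than 32 bits.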

>


Re: pg_restore and large files

From: Tom Lane
Mike Charnoky <noky@nextbus.com> writes:
> So, it does look like a pg_restore bug and that dataPos is
> being treated as an integer somewhere.

After digging in the CVS log I bet this is the same bug just noted a
month ago:

2004-01-03 23:02  tgl

    * src/bin/pg_dump/: pg_backup_archiver.c (REL7_4_STABLE),
    pg_backup_archiver.c: Fix ReadOffset() to work correctly when off_t
    is wider than int.

It looks like the same patch applies to 7.3, modulo slightly different
line numbers.  Please try it and let us know if it fixes the problem.

            regards, tom lane

===================================================================
RCS file: /cvsroot//pgsql-server/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.79
retrieving revision 1.79.2.1
diff -c -r1.79 -r1.79.2.1
*** pgsql-server/src/bin/pg_dump/pg_backup_archiver.c    2003/10/20 21:05:11    1.79
--- pgsql-server/src/bin/pg_dump/pg_backup_archiver.c    2004/01/04 04:02:22    1.79.2.1
***************
*** 1425,1431 ****
      for (off = 0; off < AH->offSize; off++)
      {
          if (off < sizeof(off_t))
!             *o |= ((*AH->ReadBytePtr) (AH)) << (off * 8);
          else
          {
              if ((*AH->ReadBytePtr) (AH) != 0)
--- 1425,1431 ----
      for (off = 0; off < AH->offSize; off++)
      {
          if (off < sizeof(off_t))
!             *o |= ((off_t) ((*AH->ReadBytePtr) (AH))) << (off * 8);
          else
          {
              if ((*AH->ReadBytePtr) (AH) != 0)
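
To see why the added (off_t) cast matters, here is a small standalone sketch of the two expressions. It is not the pg_dump code: next_byte is a hypothetical stand-in for the ReadBytePtr call, int64_t stands in for a 64-bit off_t, and the narrow variant emulates the 32-bit shift with unsigned arithmetic so the demo itself avoids undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the archive's byte reader: like the
 * ReadBytePtr call in the patch, it hands back one byte widened to int. */
static int next_byte(const unsigned char **p)
{
    return *(*p)++;
}

/* Mimics the pre-patch expression: the shift is done in 32 bits and the
 * result is sign-extended when OR-ed into the 64-bit accumulator. The
 * masked shift count is an assumption that matches common x86 behaviour. */
static int64_t read_offset_int_shift(const unsigned char *buf, int n)
{
    int64_t o = 0;
    assert(n >= 0 && n <= 8);
    for (int i = 0; i < n; i++) {
        uint32_t b = (uint32_t) next_byte(&buf);
        int32_t shifted = (int32_t) (b << ((i * 8) & 31));
        o |= shifted;               /* a negative value sets all upper bits */
    }
    return o;
}

/* Mimics the patched expression: widen each byte before shifting. */
static int64_t read_offset_wide_shift(const unsigned char *buf, int n)
{
    uint64_t o = 0;
    assert(n >= 0 && n <= 8);
    for (int i = 0; i < n; i++)
        o |= (uint64_t) next_byte(&buf) << (i * 8);
    return (int64_t) o;
}
```

For the illustrative little-endian bytes 00 37 17 89 00 00 00 00 (the offset 2300000000), the widened version returns 2300000000, while the 32-bit version returns -1994967296, the same kind of negative offset seen in the gdb output. Offsets below 2^31 come out identical in both, which is consistent with the restore working while the database was smaller.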

Re: pg_restore and large files

From: Mike Charnoky
Excellent!  This indeed solves the problem.  I'll recompile pg_dump as well,
just to be on the safe side.

Thank you so much Tom for your quick response.


Mike Charnoky



Re: pg_restore and large files

From: Tom Lane
Mike Charnoky <noky@nextbus.com> writes:
> Excellent!  This indeed solves the problem.  I'll recompile pg_dump as well,
> just to be on the safe side.

Okay.  I've installed the patch into the REL7_3_STABLE branch as well,
just in case we end up making a 7.3.6 release ...

            regards, tom lane