Thread: pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)

Hi chaps,

I've just upgraded a server from 8.3 to 8.4, and when trying to use the parallel restore options I get the following error:

"pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)"

The dump I'm trying to restore is purely a data dump, and the schema is separate (due to the way our setup works).

These are the options I'm using for the dump and the restore:

pg_dump -Fc <dbname> -U postgres -h localhost -a --disable-triggers

pg_restore -U postgres --disable-triggers -j 4 -c -d <dbname>

Can anyone tell me what I'm doing wrong, or why my files are not supported by parallel restore?

Thanks
Glyn





Glyn Astill <glynastill@yahoo.co.uk> writes:
> I've just upgraded a server from 8.3 to 8.4, and when trying to use the parallel restore options I get the following error:

> "pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)"

This is the second or third report we've gotten of that, but nobody's
been able to offer a reproducible test case.  Can you?

            regards, tom lane


--- On Fri, 30/4/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Glyn Astill <glynastill@yahoo.co.uk> writes:
> > I've just upgraded a server from 8.3 to 8.4, and when trying to use the parallel restore options I get the following error:
>
> > "pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)"
>
> This is the second or third report we've gotten of that, but nobody's been able to offer a reproducible test case.  Can you?
>

Hi Tom,

The schema is fairly large, but I will try.

One thing I forgot to mention is that in the restore script I drop the indexes off my tables between restoring the schema and the data. I've always done this to speed up the restore, but is there any chance this could be causing the issue?

I guess what would help is some insight into what the error message means.

It appears to originate in _PrintTocData in pg_backup_custom.c, but I don't really understand what's happening there at all. A wild guess is that it's either trying to seek to a particular TOC entry in the file, or to process the file sequentially?

http://doxygen.postgresql.org/pg__backup__custom_8c.html#6024b8108422e69062072df29f48506f
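
For reference, the control flow around that error looks roughly like this (my paraphrase of the 8.4 pg_backup_custom.c source, trimmed and not verbatim):

/*
 * Sketch of _PrintTocData() from pg_backup_custom.c (8.4), paraphrased.
 * If there's no recorded offset for the entry (or we can't seek at all),
 * it scans forward block by block looking for te->dumpId.
 */
if (!ctx->hasSeek || tctx->dataState == K_OFFSET_POS_NOT_SET)
{
    _readBlockHeader(AH, &blkType, &id);
    while (id != te->dumpId)
    {
        /*
         * About to skip past a block that some other restore step still
         * needs?  Then the block we want must be behind us, and without
         * offsets we can't seek back -- this is where the error fires.
         */
        if ((TocIDRequired(AH, id, ropt) & REQ_DATA) != 0)
            die_horribly(AH, modulename,
                         "dumping a specific TOC data block out of order is not supported"
                         " without ID on this input stream (fseek required)\n");
        _skipData(AH);          /* or _skipBlobs() for blob blocks */
        _readBlockHeader(AH, &blkType, &id);
    }
}
else
{
    /* The dump recorded a file offset for this entry: seek straight to it */
    if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)
        die_horribly(AH, modulename, "error during file seek: %s\n",
                     strerror(errno));
    _readBlockHeader(AH, &blkType, &id);
}

So, if I read it right, it only seeks directly when the dump recorded a per-entry offset; otherwise it scans forward, and the error means the block it wants can only be behind the current position.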

Glyn




Glyn Astill wrote:

> One thing I forgot to mention is that in the restore script I drop the indexes off my tables between restoring the schema and the data. I've always done this to speed up the restore, but is there any chance this could be causing the issue?

Uh.  Why are you doing that?  pg_restore is supposed to restore the
schema, then data, finally indexes and other stuff.  Are you using
separate schema/data dumps?  If so, don't do that -- it's known to be
slower.
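
That is, assuming you can take the whole dump from a single source, something like this (the file name is just a placeholder) lets pg_restore do the ordering itself:

pg_dump -Fc -f mydb.dump <dbname>
pg_restore -j 4 -d <dbname> mydb.dump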

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--- On Fri, 30/4/10, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> Uh.  Why are you doing that?  pg_restore is supposed to restore the schema, then data, finally indexes and other stuff.  Are you using separate schema/data dumps?  If so, don't do that -- it's known to be slower.

Yes, I'm restoring the schema first, then the data.

The reason being that the data can come from different Slony 1.2 slaves, but the schema always comes from the origin server, due to modifications Slony makes to schemas on the slaves.




Glyn Astill <glynastill@yahoo.co.uk> writes:
> The schema is fairly large, but I will try.

My guess is that you can reproduce it with not a lot of data, if you can
isolate the trigger condition.

> One thing I forgot to mention is that in the restore script I drop the indexes off my tables between restoring the schema and the data. I've always done this to speed up the restore, but is there any chance this could be causing the issue?

Possibly.  I think there must be *something* unusual triggering the
problem, and maybe that is it or part of it.

> I guess what would help is some insight into what the error message means.

It's hard to tell.  The likely theories are (1) we're doing things in an order that requires seeking backwards in the file, and for some reason pg_restore thinks it can't do that; (2) there's a bug causing the code to search for an item number that isn't actually in the file.

One of the previous reports actually turned out to be pilot error: the
initial dump had failed after emitting a partially complete file, and
so the error from pg_restore was essentially an instance of (2).  But
with three or so reports I'm thinking there's something else going on.
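
(If you want to rule out a truncated file, one cheap check is to force a full serial read of the archive by generating a script from it and throwing the output away, e.g.

pg_restore -f /dev/null your-dump-file

where your-dump-file is whatever the dump was written to.  A partially written archive should fail partway through instead of completing.)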

            regards, tom lane

Well I've only just gotten round to taking another look at this; responses inline below:

--- On Fri, 30/4/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Glyn Astill <glynastill@yahoo.co.uk> writes:
> > The schema is fairly large, but I will try.
>
> My guess is that you can reproduce it with not a lot of data, if you can isolate the trigger condition.
>

Hmm, tried reducing the amount of data and the issue goes away. Could this indicate some issue with the file, like an issue with its size (~5 GB)? Or could it be an issue with the data itself?

> > One thing I forgot to mention is that in the restore script I drop the indexes off my tables between restoring the schema and the data. I've always done this to speed up the restore, but is there any chance this could be causing the issue?
>
> Possibly.  I think there must be *something* unusual triggering the problem, and maybe that is it or part of it.

I've removed this faffing with indexes in between, but the problem still persists.

>
> > I guess what would help is some insight into what the error message means.
>
> It's hard to tell.  The likely theories are (1) we're doing things in an order that requires seeking backwards in the file, and for some reason pg_restore thinks it can't do that; (2) there's a bug causing the code to search for an item number that isn't actually in the file.
>
> One of the previous reports actually turned out to be pilot error: the initial dump had failed after emitting a partially complete file, and so the error from pg_restore was essentially an instance of (2).  But with three or so reports I'm thinking there's something else going on.
>

So I'm still at a loss as to why it's happening.

I've tried to dig a little deeper (and I may just be punching thin air here) by adding the value of id into the error message at die_horribly(), and it gives me id 7550, which is the first table in the TOC entry list when I do a pg_restore -l; everything above it is a sequence.
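
The tweak itself was trivial; roughly this, just before the die_horribly() call in _PrintTocData (my own hack, so approximately):

/* debugging hack: print the archiver state alongside the error */
fprintf(stderr, "hasSeek = %d dataState = %d id = %d\n",
        ctx->hasSeek, tctx->dataState, id);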

Here's a snip of pg_restore -l:

7775; 0 0 SEQUENCE SET website ui_content_id_seq pgcontrol
7550; 0 22272 TABLE DATA _main_replication sl_archive_counter slony

And the output if I run it under gdb:

GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) set args -U postgres --disable-triggers -j 4 -c -d SEE Way5a-pgsql-SEE-data.gz
(gdb) break die_horribly
Breakpoint 1 at 0x4044b0: file pg_backup_archiver.c, line 1384.
(gdb) run
Starting program: /usr/local/pgsql/bin/pg_restore -U postgres --disable-triggers -j 4 -c -d SEE Way5a-pgsql-SEE-data.gz
[Thread debugging using libthread_db enabled]
[New Thread 0x7f72480eb700 (LWP 4335)]
pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)
hasSeek = 1 dataState = 1 id = 7550
[Switching to Thread 0x7f72480eb700 (LWP 4335)]

Breakpoint 1, die_horribly (AH=0x61c210, modulename=0x4171f6 "archiver", fmt=0x4167d8 "worker process failed: exit code %d\n") at pg_backup_archiver.c:1384
1384    {
(gdb) pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)
hasSeek = 1 dataState = 1 id = 7550
pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)
hasSeek = 1 dataState = 1 id = 7550
pg_restore: [custom archiver] dumping a specific TOC data block out of order is not supported without ID on this input stream (fseek required)
hasSeek = 1 dataState = 1 id = 7550

(gdb) bt
#0  die_horribly (AH=0x61c210, modulename=0x4171f6 "archiver", fmt=0x4167d8 "worker process failed: exit code %d\n") at pg_backup_archiver.c:1384
#1  0x0000000000408f14 in RestoreArchive (AHX=0x61c210, ropt=0x61c0d0) at pg_backup_archiver.c:3586
#2  0x0000000000403737 in main (argc=10, argv=0x7fffffffd5b8) at pg_restore.c:380
(gdb) step
pg_restore: [archiver] worker process failed: exit code 1

Program exited with code 01.


Any further ideas of where I should dig would be appreciated.

Thanks
Glyn




On 21 May 2010, at 11:58, Glyn Astill wrote:

> Well I've only just gotten round to taking another look at this; responses inline below:
>
> --- On Fri, 30/4/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Glyn Astill <glynastill@yahoo.co.uk> writes:
>>> The schema is fairly large, but I will try.
>>
>> My guess is that you can reproduce it with not a lot of data, if you can isolate the trigger condition.
>>
>
> Hmm, tried reducing the amount of data and the issue goes away. Could this indicate some issue with the file, like an issue with its size (~5 GB)? Or could it be an issue with the data itself?

The file-size in combination with an "out of order" error smells of a 32-bit integer wrap-around problem.

And indeed, from the documentation (http://www.postgresql.org/docs/8.4/interactive/lo-intro.html):
"One remaining advantage of the large object facility is that it allows values up to 2 GB in size"

So I guess your large object is too large.
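
Just to illustrate the wrap-around idea with your file size (a standalone sketch, nothing to do with the actual pg_restore code):

#include <stdio.h>

int main(void)
{
    long long dump_size = 5LL * 1024 * 1024 * 1024; /* ~5 GB, as reported */
    int wrapped = (int) dump_size;                  /* offset squeezed into 32 bits */

    /* On typical two's-complement platforms this prints 1073741824:
     * a plausible-looking but completely wrong file position. */
    printf("%lld bytes stored in 32 bits becomes %d\n", dump_size, wrapped);
    return 0;
}

Any code path that stores a file offset in a 32-bit integer would go quietly wrong past 4 GB.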

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





--- On Fri, 21/5/10, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:

> On 21 May 2010, at 11:58, Glyn Astill wrote:
>
> > Well I've only just gotten round to taking another look at this; responses inline below:
> >
> > --- On Fri, 30/4/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >> Glyn Astill <glynastill@yahoo.co.uk> writes:
> >>> The schema is fairly large, but I will try.
> >>
> >> My guess is that you can reproduce it with not a lot of data, if you can isolate the trigger condition.
> >>
> >
> > Hmm, tried reducing the amount of data and the issue goes away. Could this indicate some issue with the file, like an issue with its size (~5 GB)? Or could it be an issue with the data itself?
>
> The file-size in combination with an "out of order" error smells of a 32-bit integer wrap-around problem.
>
> And indeed, from the documentation (http://www.postgresql.org/docs/8.4/interactive/lo-intro.html):
> "One remaining advantage of the large object facility is that it allows values up to 2 GB in size"
>
> So I guess your large object is too large.

Hmm, we don't use any large objects though; all our data is pretty much just date, text, and numeric fields etc.

Glyn.





On 21 May 2010, at 12:44, Glyn Astill wrote:

>> So I guess your large object is too large.
>
> Hmm, we don't use any large objects though; all our data is pretty much just date, text, and numeric fields etc.


Doh! Seems I mixed up a few threads here. It was probably the mention of a 5 GB file that threw me off; I hadn't realised you were referring to a dump file there.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

