Thread: unable to dump database, toast errors

unable to dump database, toast errors

From
Lonni Friedman
Date:
Greetings,
I'm running postgresql-7.2.1 on Redhat-7.2, and when attempting to
perform a dump with the command:
pg_dump -Ft -b -o DB_NAME > /tmp/db.tar

it chugs away for a few minutes, and then dies with the following error:

pg_dump: dumping out the contents of table artifact_file
pg_dump: ERROR: unexpected chunk number 0 (expected 1) for toast value
7685119
pg_dump: lost synchronization with server, resetting connection
pg_dump: SQL command to dump the contents of table "artifact_file"
failed: PQendcopy() failed.
pg_dump: Error message from server: pg_dump: The command was: COPY
"artifact_file" WITH OIDS TO stdout;
pg_dump: *** aborted because of error

I've tried running vacuumdb as follows:
vacuumdb -z -v -a -f

and its come up clean every time, so i'm really at a loss on how to
resolve this.  If it matters, the artifact_file table has 782 rows, and
is comprised of text and integer datatype columns.

Any suggestions would be greatly appreciated.  thanks!


Re: unable to dump database, toast errors

From
Lonni Friedman
Date:
On 04/04/03 14:32, Tom Lane wrote:
>> i'm positive that there is a much more recent release.  this server is
>> sadly, very poorly maintained by the customer.  do you feel that the
>> segfaults generated in psql are blocking any possible solution to the
>> pg_dump failures, or can you/I work on a solution for that indpendently of
>> resolving the segfaults?
>
> Assuming that I've correctly diagnosed the segfault, it will only occur
> if you try to print an extremely wide value in psql.  It's pretty easy
> to avoid doing so, though.  For example, you could query for
> LENGTH(bin_data) or a SUBSTRING() of it, instead of printing the whole
> field value.  AFAIR, 7.2 does not have any optimizations that would
> skip fetching the whole toasted value when it is the argument of one of
> these functions, so "select length(bin_data) ..." should be just as
> useful for checking for brokenness as "select bin_data ...".

Hi Tom,
Following your suggestion, i pinned down the following:

sfee=# select LENGTH(bin_data) from artifact_file LIMIT 1 OFFSET 693;
ERROR:  missing chunk number 1 for toast value 7685119

However, what is very odd, is that I can view the entire contents of
that row's particular bin_data column if i do a normal select:
select bin_data from artifact_file LIMIT 1 OFFSET 693;

and it completes with no errors.  After more testing, it appears that
the 'select LENGTH(bin_data) ...' for any offset greater than or equal
to 693 dies with the same 'missing chunk...' error.

any suggestions on where to go from here?

thanks

-Lonni


Re: unable to dump database, toast errors

From
Jan Wieck
Date:
Lonni Friedman wrote:
>
> On 04/04/03 14:32, Tom Lane wrote:
> >> i'm positive that there is a much more recent release.  this server is
> >> sadly, very poorly maintained by the customer.  do you feel that the
> >> segfaults generated in psql are blocking any possible solution to the
> >> pg_dump failures, or can you/I work on a solution for that indpendently of
> >> resolving the segfaults?
> >
> > Assuming that I've correctly diagnosed the segfault, it will only occur
> > if you try to print an extremely wide value in psql.  It's pretty easy
> > to avoid doing so, though.  For example, you could query for
> > LENGTH(bin_data) or a SUBSTRING() of it, instead of printing the whole
> > field value.  AFAIR, 7.2 does not have any optimizations that would
> > skip fetching the whole toasted value when it is the argument of one of
> > these functions, so "select length(bin_data) ..." should be just as
> > useful for checking for brokenness as "select bin_data ...".
>
> Hi Tom,
> Following your suggestion, i pinned down the following:
>
> sfee=# select LENGTH(bin_data) from artifact_file LIMIT 1 OFFSET 693;
> ERROR:  missing chunk number 1 for toast value 7685119
>
> However, what is very odd, is that I can view the entire contents of
> that row's particular bin_data column if i do a normal select:
> select bin_data from artifact_file LIMIT 1 OFFSET 693;
>
> and it completes with no errors.  After more testing, it appears that
> the 'select LENGTH(bin_data) ...' for any offset greater than or equal
> to 693 dies with the same 'missing chunk...' error.
>
> any suggestions on where to go from here?

From that I would assume that the row at offset 693 is totally intact
but the one at 694 is the one damaged. LIMIT reads one more row
internally than you asked for.

Get the key of the row at 694 (and others you might find) and continue
the work using those keys.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: unable to dump database, toast errors

From
Lonni Friedman
Date:
On 04/06/03 14:50, Jan Wieck wrote:
> Lonni Friedman wrote:
>> On 04/04/03 14:32, Tom Lane wrote:
>> >> i'm positive that there is a much more recent release.  this server is
>> >> sadly, very poorly maintained by the customer.  do you feel that the
>> >> segfaults generated in psql are blocking any possible solution to the
>> >> pg_dump failures, or can you/I work on a solution for that indpendently of
>> >> resolving the segfaults?
>> >
>> > Assuming that I've correctly diagnosed the segfault, it will only occur
>> > if you try to print an extremely wide value in psql.  It's pretty easy
>> > to avoid doing so, though.  For example, you could query for
>> > LENGTH(bin_data) or a SUBSTRING() of it, instead of printing the whole
>> > field value.  AFAIR, 7.2 does not have any optimizations that would
>> > skip fetching the whole toasted value when it is the argument of one of
>> > these functions, so "select length(bin_data) ..." should be just as
>> > useful for checking for brokenness as "select bin_data ...".
>>
>> Hi Tom,
>> Following your suggestion, i pinned down the following:
>>
>> sfee=# select LENGTH(bin_data) from artifact_file LIMIT 1 OFFSET 693;
>> ERROR:  missing chunk number 1 for toast value 7685119
>>
>> However, what is very odd, is that I can view the entire contents of
>> that row's particular bin_data column if i do a normal select:
>> select bin_data from artifact_file LIMIT 1 OFFSET 693;
>>
>> and it completes with no errors.  After more testing, it appears that
>> the 'select LENGTH(bin_data) ...' for any offset greater than or equal
>> to 693 dies with the same 'missing chunk...' error.
>>
>> any suggestions on where to go from here?
>
>From that I would assume that the row at offset 693 is totally intact
> but the one at 694 is the one damaged. LIMIT reads one more row
> internally than you asked for.
>
> Get the key of the row at 694 (and others you might find) and continue
> the work using those keys.

I'm not sure that I understand what it is that I need to do now:

sfee=# select bin_data from artifact_file LIMIT 1 OFFSET 694;
ERROR:  missing chunk number 1 for toast value 7685119
sfee=# select LENGTH(bin_data) from artifact_file LIMIT 1 OFFSET 694;
ERROR:  missing chunk number 1 for toast value 7685119

How would i get the key of row 694?


Re: unable to dump database, toast errors

From
Jan Wieck
Date:
Lonni Friedman wrote:
>
> On 04/06/03 14:50, Jan Wieck wrote:
> >>From that I would assume that the row at offset 693 is totally intact
> > but the one at 694 is the one damaged. LIMIT reads one more row
> > internally than you asked for.
> >
> > Get the key of the row at 694 (and others you might find) and continue
> > the work using those keys.
>
> I'm not sure that I understand what it is that I need to do now:
>
> sfee=# select bin_data from artifact_file LIMIT 1 OFFSET 694;
> ERROR:  missing chunk number 1 for toast value 7685119
> sfee=# select LENGTH(bin_data) from artifact_file LIMIT 1 OFFSET 694;
> ERROR:  missing chunk number 1 for toast value 7685119
>
> How would i get the key of row 694?

does that table have any OTHER fields than the one that is knowingly
corrupted?


Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: unable to dump database, toast errors

From
Lonni Friedman
Date:
On 04/06/03 17:28, Jan Wieck wrote:
> Lonni Friedman wrote:
>>
>> On 04/06/03 14:50, Jan Wieck wrote:
>> >>From that I would assume that the row at offset 693 is totally intact
>> > but the one at 694 is the one damaged. LIMIT reads one more row
>> > internally than you asked for.
>> >
>> > Get the key of the row at 694 (and others you might find) and continue
>> > the work using those keys.
>>
>> I'm not sure that I understand what it is that I need to do now:
>>
>> sfee=# select bin_data from artifact_file LIMIT 1 OFFSET 694;
>> ERROR:  missing chunk number 1 for toast value 7685119
>> sfee=# select LENGTH(bin_data) from artifact_file LIMIT 1 OFFSET 694;
>> ERROR:  missing chunk number 1 for toast value 7685119
>>
>> How would i get the key of row 694?
>
> does that table have any OTHER fields than the one that is knowingly
> corrupted?

yes, its got several other integer & text datatype fields.


Re: unable to dump database, toast errors

From
Jan Wieck
Date:
Lonni Friedman wrote:
> yes, its got several other integer & text datatype fields.

Lonni,

just to let you know, we have all the time in the world and after all it
is your problem. So if you slow down the process by giving the
informaiton one drop at a time, we can certainly play that game. Just
don't blame us that it takes that long then.

Is there any of those other columns that could identify a single row
uniquely? If so, that could work as a key for this purpose.

If you have that, let's call it <keycolumn>. Then do a

    select <keycolumn> from artifact_file limit 1 offset 694.

The key you've got now will be most likely the row containing the
corrupted toast value. To check that it is the only one, let's say you
got <keyvalue> from the query above. Now do a

    select sum(length(bin_data)) from artifact_file
        where <keycolumn> != <keyvalue>

If that completes without an error, you found a way to identify the
damaged row without using LIMIT. I think Tom already explained what can
be done from here.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: unable to dump database, toast errors

From
Manfred Koizar
Date:
On Mon, 07 Apr 2003 07:34:47 -0400, Jan Wieck <JanWieck@Yahoo.com>
wrote:
>Is there any of those other columns that could identify a single row
>uniquely? If so, that could work as a key for this purpose.
>
>If you have that, let's call it <keycolumn>.

Lonni,

let me add that if there is no such *user* column, you can always use
the system column ctid:

    SELECT ctid FROM artifact_file LIMIT 1 OFFSET 694;

 ctid
--------
 (42,7)

Now check whether this is really your broken tuple:

    SELECT length(bin_data) FROM artifact_file
     WHERE ctid = '(42,7)';  -- should give an ERROR

... and that all other tuples are ok as suggested by Jan:

    SELECT sum(length(bin_data)) FROM artifact_file
     WHERE NOT ctid = '(42,7)';  -- should work

Note that you have to use 'NOT ctid =', because operator '!=' is not
defined for datatype tid.

Servus
 Manfred


Re: unable to dump database, toast errors

From
Lonni J Friedman
Date:
On Mon, 7 Apr 2003, Manfred Koizar wrote:
> On Mon, 07 Apr 2003 07:34:47 -0400, Jan Wieck <JanWieck@Yahoo.com>
> wrote:
> >Is there any of those other columns that could identify a single row
> >uniquely? If so, that could work as a key for this purpose.
> >
> >If you have that, let's call it <keycolumn>.
>
> Lonni,
>
> let me add that if there is no such *user* column, you can always use
> the system column ctid:
>
>     SELECT ctid FROM artifact_file LIMIT 1 OFFSET 694;
>
>  ctid
> --------
>  (42,7)
>
> Now check whether this is really your broken tuple:
>
>     SELECT length(bin_data) FROM artifact_file
>      WHERE ctid = '(42,7)';  -- should give an ERROR
>
> ... and that all other tuples are ok as suggested by Jan:
>
>     SELECT sum(length(bin_data)) FROM artifact_file
>      WHERE NOT ctid = '(42,7)';  -- should work
>
> Note that you have to use 'NOT ctid =', because operator '!=' is not
> defined for datatype tid.

thanks for your reply.

# select ctid from artifact_file LIMIT 1 OFFSET 694;
  ctid
---------
 (16,33)

# select LENGTH(bin_data) from artifact_file where ctid='(16,33)';
ERROR:  missing chunk number 1 for toast value 7685119

# select sum(length(bin_data)) from artifact_file where NOT
ctid='(16,33)';
    sum
-----------
 293963428


So, where do i go from here?  Is it now just a matter of nullifying
bin_data for that particular row, or is it more involved?  thanks.

-Lonni


Re: unable to dump database, toast errors

From
Jan Wieck
Date:
Lonni J Friedman wrote:
>
> On Mon, 7 Apr 2003, Manfred Koizar wrote:
> > On Mon, 07 Apr 2003 07:34:47 -0400, Jan Wieck <JanWieck@Yahoo.com>
> > wrote:
>
> [...]
>
> # select ctid from artifact_file LIMIT 1 OFFSET 694;
>   ctid
> ---------
>  (16,33)
>
> # select LENGTH(bin_data) from artifact_file where ctid='(16,33)';
> ERROR:  missing chunk number 1 for toast value 7685119
>
> # select sum(length(bin_data)) from artifact_file where NOT
> ctid='(16,33)';
>     sum
> -----------
>  293963428
>
> So, where do i go from here?  Is it now just a matter of nullifying
> bin_data for that particular row, or is it more involved?  thanks.

I think an

    UPDATE artifact_file SET bin_data = '' WHERE ctid = '(16,33)';

should make a

    SELECT sum(length(bin_data)) FROM artifact_file;

work again. If that is the case, dump the whole database system with
pg_dumpall. If and only if that is okay too, save the current
pg_hba.conf and postgres.conf files, remove the whole PGDATA directory,
run initdb, restore the dump and finally restore the pg_hba.conf and
postgres.conf files.

Another question is how did it come to the corrupted TOAST table. Have
you run test programs that check for bad memory modules? Have you
checked your harddisks for badblocks? Are you running an overclocked
system? So basically, can you trust your hardware as far as you can
throw it?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #