Thread: unable to repair table: missing chunk number

unable to repair table: missing chunk number

From: Alex Krohn

Hi,

I'm having serious trouble trying to fix a corrupted table, and am
hoping the data is not gone.

When I try to do any sort of SELECT, or COPY that accesses a part of the
table, I get:

ERROR:  missing chunk number 0 for toast value 12851102

I get this using pg_dump, the COPY command, or even a plain SELECT
whenever I access records with IDs > 115,000 (out of around 140,000 records).

Is there any type of repair tool I can use to fix the table? Or am I
looking at restoring from disk backups? (The problem is I don't know when the
table went bad.)

A search on 'missing chunk number' produced no help.

Any help on how I can get the data dumped to a file would be greatly
appreciated. I'm using Postgres 7.2.1 built from source on a
Linux 2.4.3-6smp #1 SMP Wed May 16 04:29:16 EDT 2001 i686 unknown
machine.

Cheers,

Alex


Re: unable to repair table: missing chunk number

From: Tom Lane

Alex Krohn <alex@gossamer-threads.com> writes:
> When I try to do any sort of SELECT, or COPY that accesses a part of the
> table, I get:
> ERROR:  missing chunk number 0 for toast value 12851102

This is a long shot --- but does a VACUUM fix the problem?  There was
a bug in TOAST up through 7.2 that could cause it to mess up after an
aborted VACUUM FULL.  I think a fresh VACUUM would fix it though.

If not, you'll have to identify exactly which tuple references the
trashed toast value and get rid of it.  If you're only seeing this
with one specific toast ID mentioned then there's no reason to think
you've lost more than one value, but table-wide SELECT or COPY won't
work unless you exclude the bad value.

BTW, how did you get into this state?

            regards, tom lane

Re: unable to repair table: missing chunk number

From: Alex Krohn

Hi Tom,

> > When I try to do any sort of SELECT, or COPY that accesses a part of the
> > table, I get:
> > ERROR:  missing chunk number 0 for toast value 12851102
>
> This is a long shot --- but does a VACUUM fix the problem?  There was
> a bug in TOAST up through 7.2 that could cause it to mess up after an
> aborted VACUUM FULL.  I think a fresh VACUUM would fix it though.

No good, I did a 'VACUUM mytable' and it ran ok, but a pg_dump still
reports:

pg_dump: ERROR:  missing chunk number 0 for toast value 12851102
pg_dump: lost synchronization with server, resetting connection

> If not, you'll have to identify exactly which tuple references the
> trashed toast value and get rid of it.

How do you correlate a tuple to a toast value? What is a toast value?
(Sorry, new to Postgres.) =)

When I dump the table to disk I get records in sequence by primary key
from 1 to 115847 before it crashes. So I then tried to SELECT * INTO
newtable WHERE primary_key > 115847. I tried increasing the value to see
where it broke, but even with primary_key > 130000 (out of 135,000),
it's always bad.

This may be totally broken logic, as it still may need to access the
data because of the > condition. =)

> BTW, how did you get into this state?

I'm not sure. I wasn't running VACUUM regularly which may have been the
cause. The program was running smoothly for about two weeks, and then
the application started failing for one or two users with that error in
the SQL log.

Thanks again for the help,

Alex


Re: unable to repair table: missing chunk number

From: Tom Lane

Alex Krohn <alex@gossamer-threads.com> writes:
>> If not, you'll have to identify exactly which tuple references the
>> trashed toast value and get rid of it.

> How do you correlate a tuple to a toast value? What is a toast value?
> (Sorry new, to postgres). =)

A TOAST value is a value of a particular field in a particular row
that's been pushed out-of-line because it's too big.  (Bigger than
a kilobyte or so, probably, though I'm guessing since I don't know
how many columns are in your table.)  What you have to do is identify
which row and which field that is.  Even if the toasted value is,
um, toast, you should be able to extract everything else.
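
One quick way to see which field is involved, once you have a suspect row,
is to select its columns one at a time: the small columns will come back
fine, and forcing the toasted column to be read is what raises the error.
A rough sketch (bigcol is a made-up name; substitute whatever large column
your table has, and the key value is just the first row past your last good
dump):

    -- small, untoasted columns read fine even in a damaged row
    SELECT primary_key FROM mytable WHERE primary_key = 115848;
    -- detoasting the big column is what trips the error, if this is
    -- the field pointing at the missing chunk (bigcol is hypothetical)
    SELECT length(bigcol) FROM mytable WHERE primary_key = 115848;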

> When I dump the table to disk I get records in sequence by primary key
> from 1 to 115847 before it crashes. So I then tried to SELECT * INTO
> newtable WHERE primary_key > 115847.

There could be some buffering going on (records read from database but
not yet emitted to file), but you seem to have the right idea.

> I tried increasing the value to see
> where it broke, but even with primary_key > 130000 (out of 135,000),
> it's always bad.

Do you always get the same toast value number mentioned?  There could be
more than one trashed value.

A brute-force way to narrow things down would be to write a little
program that tries to retrieve each row individually by primary key,
starting at 115848 since you know the rows before that are okay.
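
Something along these lines would do it --- just a sketch, untested, assuming
Perl with DBI/DBD::Pg, a database named mydb, the placeholder table and key
column names from your mail, and an empty copy of the table created ahead of
time as mytable_good:

    #!/usr/bin/perl -w
    use strict;
    use DBI;

    # all names are placeholders: database mydb, damaged table mytable,
    # key column primary_key, pre-created destination table mytable_good
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '',
                           { RaiseError => 0, PrintError => 0, AutoCommit => 1 });

    for my $id (115848 .. 140000) {
        my $ok = $dbh->do(
            "INSERT INTO mytable_good SELECT * FROM mytable WHERE primary_key = $id");
        print "row $id is bad: ", $dbh->errstr, "\n" unless defined $ok;
    }
    $dbh->disconnect;

With AutoCommit on, each INSERT runs in its own transaction, so one failure
doesn't stop the rest of the copy; the ids that get printed are the rows whose
toasted values are gone.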

> I'm not sure. I wasn't running VACUUM regularly which may have been the
> cause. The program was running smoothly for about two weeks, and then
> the application started failing for one or two users with that error in
> the SQL log.

That's disturbing; short of a serious failure (disk crash, for instance)
I don't know of anything that would cause this.

One thing that would be interesting to try is to investigate the TOAST
table directly.  To do this, get your problem table's OID from pg_class:
  select oid from pg_class where relname = 'your-table-name';
The TOAST table's name is then "pg_toast_OID", for instance
pg_toast_139777 if you find the table OID is 139777.  If you inspect
this table you'll find it's got a structure like

regression=# \d pg_toast_139777
TOAST table "pg_toast_139777"
   Column   |  Type
------------+---------
 chunk_id   | oid
 chunk_seq  | integer
 chunk_data | bytea

The chunk_id is that magic number 12851102 that your error message is
complaining about.  What would be interesting to know is what is
actually in the toast table for that ID.  You might try something like
    select chunk_seq, length(chunk_data) from pg_toast_139777
    where chunk_id = 12851102 order by chunk_seq;

(You probably need to be database superuser to look at a toast table
directly like this.)

            regards, tom lane

ROWTYPE as parameter to function

From: "Nigel J. Andrews"

I see that RECORD can't be used as the type of an argument to a function;
however, I see nothing that says tablename%ROWTYPE can't. Indeed, from the
reason given for RECORD not being allowed, I would have thought %ROWTYPE
would be allowed.

Maybe I'm just being silly in what I'm trying to do. What I have is a view
called, say, myview, on which I am trying to trap inserts so that the real tables
can be written to. The view is the 'user interface', so to speak. So what I have is:

CREATE VIEW myview AS SELECT ...;

and what I'd like to do is:

CREATE FUNCTION myfunction (RECORD) RETURNS ... ;
CREATE RULE insert_myview AS ON INSERT TO myview DO INSTEAD
    SELECT myfunction(NEW);

Obviously I can't use the RECORD type but when I create the function with:

CREATE FUNCTION myfunction (myview%ROWTYPE) RETURNS ... ;

the '%' gets reported in an error message.

I don't think triggers are the thing to use, partly because I don't know
anything about the SPI functions needed to access other tables from within
them, but also because I think the rule system is the correct route for
trapping such things on a view. So, am I being sensible in trying this design?
Or am I doing something that isn't permitted, like with the RECORD type, by
trying to get an entire row into a function in one variable?


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


Re: unable to repair table: missing chunk number

From: William Meloney

When faced with a similar-sounding problem I tried the following with
some success.  It is an extreme alternative and not suited to all
situations... that having been said, here goes.

pg_dump -t mytable mydatabase > mytable.txt

You said
 > When I dump the table to disk I get records in sequence by primary key
 > from 1 to 115847 before it crashes.

I looked at the mytable.txt file and found the last entry that pg_dump
was able to copy and noted the primary key number (115847).

I then executed a query that deleted the record with the very next primary
key (115848).  (*BAD!  Lost that data record.  BAD!*)
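
(With the placeholder names used above that is essentially

    DELETE FROM mytable WHERE primary_key = 115848;   -- placeholder names

i.e. deleting the first key after the last one pg_dump managed to write out.)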

I was then able to do a pg_dump of the entire table followed by a
vacuum/analyze.  (*Good.  PG is running correctly. Good*)

In my particular case the loss of one data record is acceptable.  It may
not be for you.  Try this remedy only at your own risk.

Peace

- Bill


Re: ROWTYPE as parameter to function

From: Tom Lane

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> Obviously I can't use the RECORD type but when I create the function with:
> CREATE FUNCTION myfunction (myview%ROWTYPE) RETURNS ... ;
> the '%' gets reported in an error message.

Leave off the %ROWTYPE.  Also read the info about composite-type
arguments at
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/xfunc-sql.html
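
The composite-argument form would look something like this --- just a sketch,
with an invented body, return type, target table, and column names; inside a
SQL-language function you get at the row's fields as $1.title and so on, per
the page above:

    -- sketch only: my_real_table and the columns are hypothetical
    CREATE FUNCTION myfunction(myview) RETURNS integer AS '
        INSERT INTO my_real_table (title, price)
            SELECT $1.title, $1.price;
        SELECT 1;
    ' LANGUAGE SQL;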

            regards, tom lane

Re: unable to repair table: missing chunk number

From: Alex Krohn

Hi Tom,

> A brute-force way to narrow things down would be to write a little
> program that tries to retrieve each row individually by primary key,
> starting at 115848 since you know the rows before that are okay.

Thanks, this worked. I ran a Perl script that looped from 1 to
max(primary_id), selected each record, and inserted it into a new
table.

There were a total of two bad records, so not too bad. =)

> That's disturbing; short of a serious failure (disk crash, for instance)
> I don't know of anything that would cause this.
>
> One thing that would be interesting to try is to investigate the TOAST
> table directly.

# select oid from pg_class where relname = 'users';
   oid
---------
 9361620
(1 row)

# select chunk_seq, length(chunk_data) from pg_toast_9361620 where
  chunk_id = 12851102 order by chunk_seq;

 chunk_seq | length
-----------+--------
(0 rows)

Very strange.

Now that we can back up the data, we've moved the database to a brand-new
disk drive and re-imported and vacuumed everything. The application
is running smoothly again.

I doubt this is relevant, but we were symlinking /usr/local/pgsql/data
-> /mnt/disk2/pgsql. Also, one column in the problem table was a text
field averaging around 20 KB.

I still have the old database if it helps.

Thanks for all your help,

Alex

Re: ROWTYPE as parameter to function

From: "Nigel J. Andrews"

On Fri, 19 Apr 2002, Tom Lane wrote:

> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > Obviously I can't use the RECORD type but when I create the function with:
> > CREATE FUNCTION myfunction (myview%ROWTYPE) RETURNS ... ;
> > the '%' gets reported in an error message.
>
> Leave off the %ROWTYPE.  Also read the info about composite-type
> arguments at
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/xfunc-sql.html


Thanks Tom. I hadn't read that particular page before; I had been concentrating
on the PL/pgSQL pages.

As soon as I'd read that page I had the function being created. Unfortunately,
I now can't get the rule to be created. Again, I can't find the relevant
information in the docs, even though it's probably obvious, so any help
on why this:

CREATE RULE insert_listing AS ON insert TO listing_view
     DO INSTEAD SELECT insert_listing_fn(NEW);

gives the error: 'parser: parse error at or near ")"' would be appreciated.

Using:
    1) SELECT 1,2,3
    2) SELECT NEW.title
    3) SELECT insert_listing_fn()
    4) SELECT insert_listing_fn(NEW.title)

as the rule's action does not give this error, although 3 and 4 do of course give
a different error, since there aren't versions of the function defined with those
parameters.

I know I could just expand the NEW record in the call to the function, giving
each attribute as a parameter. However, I'd be more interested in knowing how
to do this properly.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants