Re: unable to repair table: missing chunk number - Mailing list pgsql-general

From Tom Lane
Subject Re: unable to repair table: missing chunk number
Date
Msg-id 3766.1019248342@sss.pgh.pa.us
Whole thread Raw
In response to Re: unable to repair table: missing chunk number  (Alex Krohn <alex@gossamer-threads.com>)
Responses Re: unable to repair table: missing chunk number
List pgsql-general
Alex Krohn <alex@gossamer-threads.com> writes:
>> If not, you'll have to identify exactly which tuple references the
>> trashed toast value and get rid of it.

> How do you correlate a tuple to a toast value? What is a toast value?
> (Sorry new, to postgres). =)

A TOAST value is a value of a particular field in a particular row
that's been pushed out-of-line because it's too big.  (Bigger than
a kilobyte or so, probably, though I'm guessing since I don't know
how many columns in your table.)  What you have to do is identify
which row and which field that is.  Even if the toasted value is,
um, toast, you should be able to extract everything else.

> When I dump the table to disk I get records in sequence by primary key
> from 1 to 115847 before it crashes. So I then tried to SELECT * INTO
> newtable WHERE primary_key > 115847.

There could be some buffering going on (records read from database but
not yet emitted to file), but you seem to have the right idea.

> I tried increasing the value to see
> where it broke, but even with primary_key > 130000 (out of 135,000),
> it's always bad.

Do you always get the same toast value number mentioned?  There could be
more than one trashed value.

A brute-force way to narrow things down would be to write a little
program that tries to retrieve each row individually by primary key,
starting at 115848 since you know the rows before that are okay.

> I'm not sure. I wasn't running VACUUM regularly which may have been the
> cause. The program was running smoothly for about two weeks, and then
> the application started failing for one or two users with that error in
> the SQL log.

That's disturbing; short of a serious failure (disk crash, for instance)
I don't know of anything that would cause this.

One thing that would be interesting to try is to investigate the TOAST
table directly.  To do this, get your problem table's OID from pg_class:
  select oid from pg_class where relname = 'your-table-name';
The TOAST table's name is then "pg_toast_OID", for instance
pg_toast_139777 if you find the table OID is 139777.  If you inspect
this table you'll find it's got a structure like

regression=# \d pg_toast_139777
TOAST table "pg_toast_139777"
   Column   |  Type
------------+---------
 chunk_id   | oid
 chunk_seq  | integer
 chunk_data | bytea

The chunk_id is that magic number 12851102 that your error message is
complaining about.  What would be interesting to know is what is
actually in the toast table for that ID.  You might try something like
    select chunk_seq, length(chunk_data) from pg_toast_139777
    where chunk_id = 12851102 order by chunk_seq;

(You probably need to be database superuser to look at a toast table
directly like this.)

            regards, tom lane

pgsql-general by date:

Previous
From: Alex Krohn
Date:
Subject: Re: unable to repair table: missing chunk number
Next
From: "Dave"
Date:
Subject: postgres startup script, long delay when auth = password