Thread: unable to repair table: missing chunk number
Hi,

I'm having serious trouble trying to fix a corrupted table, and am hoping the data is not gone. When I try to do any sort of SELECT or COPY that accesses part of the table, I get:

    ERROR: missing chunk number 0 for toast value 12851102

I get this using pg_dump, the COPY command, or even a plain SELECT when I access records with ids > 115,000 (out of around 140,000 records).

Is there any type of repair tool I can use to fix the table? Or am I looking at off-disk backups? (The problem is I don't know when the table went bad.) A search on 'missing chunk number' produced no help.

Any help on how I can get the data dumped to a file would be greatly appreciated. I'm using Postgres 7.2.1 built from source on a Linux 2.4.3-6smp #1 SMP Wed May 16 04:29:16 EDT 2001 i686 unknown machine.

Cheers,

Alex
Alex Krohn <alex@gossamer-threads.com> writes:
> When I try to do any sort of SELECT, or COPY that accesses a part of the
> table, I get:
> ERROR: missing chunk number 0 for toast value 12851102

This is a long shot --- but does a VACUUM fix the problem? There was a bug in TOAST up through 7.2 that could cause it to mess up after an aborted VACUUM FULL. I think a fresh VACUUM would fix it though.

If not, you'll have to identify exactly which tuple references the trashed toast value and get rid of it. If you're only seeing this with one specific toast ID mentioned then there's no reason to think you've lost more than one value, but table-wide SELECT or COPY won't work unless you exclude the bad value.

BTW, how did you get into this state?

			regards, tom lane
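One way to act on Tom's "exclude the bad value" advice, sketched here as a hedged example (the table name, key column, and key value 115848 are all placeholders -- the damaged row must first be located):

```sql
-- Copy every intact row into a rescue table, skipping the row whose
-- TOAST pointer is trashed.  The excluded row's out-of-line data is
-- unrecoverable, but everything else survives.
SELECT * INTO mytable_rescue
FROM mytable
WHERE primary_key <> 115848;
```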
Hi Tom,

> > When I try to do any sort of SELECT, or COPY that accesses a part of the
> > table, I get:
> > ERROR: missing chunk number 0 for toast value 12851102
>
> This is a long shot --- but does a VACUUM fix the problem? There was
> a bug in TOAST up through 7.2 that could cause it to mess up after an
> aborted VACUUM FULL. I think a fresh VACUUM would fix it though.

No good. I did a 'VACUUM mytable' and it ran ok, but a pg_dump still reports:

    pg_dump: ERROR: missing chunk number 0 for toast value 12851102
    pg_dump: lost synchronization with server, resetting connection

> If not, you'll have to identify exactly which tuple references the
> trashed toast value and get rid of it.

How do you correlate a tuple to a toast value? What is a toast value? (Sorry, new to Postgres.) =)

When I dump the table to disk I get records in sequence by primary key from 1 to 115847 before it crashes. So I then tried SELECT * INTO newtable WHERE primary_key > 115847. I tried increasing the value to see where it broke, but even with primary_key > 130000 (out of 135,000), it's always bad. This may be totally broken logic, as it still may need to access the data because of the > condition. =)

> BTW, how did you get into this state?

I'm not sure. I wasn't running VACUUM regularly, which may have been the cause. The program ran smoothly for about two weeks, and then the application started failing for one or two users with that error in the SQL log.

Thanks again for the help,

Alex
Alex Krohn <alex@gossamer-threads.com> writes:
>> If not, you'll have to identify exactly which tuple references the
>> trashed toast value and get rid of it.

> How do you correlate a tuple to a toast value? What is a toast value?
> (Sorry, new to Postgres.) =)

A TOAST value is a value of a particular field in a particular row that's been pushed out-of-line because it's too big. (Bigger than a kilobyte or so, probably, though I'm guessing since I don't know how many columns are in your table.) What you have to do is identify which row and which field that is. Even if the toasted value is, um, toast, you should be able to extract everything else.

> When I dump the table to disk I get records in sequence by primary key
> from 1 to 115847 before it crashes. So I then tried to SELECT * INTO
> newtable WHERE primary_key > 115847.

There could be some buffering going on (records read from the database but not yet emitted to the file), but you seem to have the right idea.

> I tried increasing the value to see
> where it broke, but even with primary_key > 130000 (out of 135,000),
> it's always bad.

Do you always get the same toast value number mentioned? There could be more than one trashed value. A brute-force way to narrow things down would be to write a little program that tries to retrieve each row individually by primary key, starting at 115848 since you know the rows before that are okay.

> I'm not sure. I wasn't running VACUUM regularly which may have been the
> cause. The program was running smoothly for about two weeks, and then
> the application started failing for one or two users with that error in
> the SQL log.

That's disturbing; short of a serious failure (disk crash, for instance) I don't know of anything that would cause this.

One thing that would be interesting to try is to investigate the TOAST table directly.
To do this, get your problem table's OID from pg_class:

    select oid from pg_class where relname = 'your-table-name';

The TOAST table's name is then "pg_toast_OID", for instance pg_toast_139777 if you find the table OID is 139777. If you inspect this table you'll find it's got a structure like:

    regression=# \d pg_toast_139777
    TOAST table "pg_toast_139777"
       Column   |  Type
    ------------+---------
     chunk_id   | oid
     chunk_seq  | integer
     chunk_data | bytea

The chunk_id is that magic number 12851102 that your error message is complaining about. What would be interesting to know is what is actually in the toast table for that ID. You might try something like:

    select chunk_seq, length(chunk_data)
    from pg_toast_139777
    where chunk_id = 12851102
    order by chunk_seq;

(You probably need to be database superuser to look at a toast table directly like this.)

			regards, tom lane
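Tom's brute-force suggestion could be sketched as a series of single-row probes, driven from a small client-side script (table and column names here are placeholders):

```sql
-- Probe one row at a time: a SELECT that succeeds means that row's
-- TOAST data is intact, while the "missing chunk number" error
-- pinpoints a damaged row.  A client script would issue one of these
-- per key, from 115848 up to max(primary_key).
SELECT * FROM mytable WHERE primary_key = 115848;
SELECT * FROM mytable WHERE primary_key = 115849;
```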
I see that RECORD can't be used as the type of an argument in a function; however, I see nothing that says tablename%ROWTYPE can't. Indeed, from the reason given for RECORD not being allowed, I would have thought %ROWTYPE would be too. Maybe I'm just being silly in what I'm trying to do.

What I have is a view called, say, myview, which I am trying to trap inserts to so the real tables can be written to. The view is the 'user interface', so to speak. So what I have is:

    CREATE VIEW myview AS SELECT ...;

and what I'd like to do is:

    CREATE FUNCTION myfunction (RECORD) RETURNS ... ;

    CREATE RULE insert_myview AS ON INSERT TO myview
        DO INSTEAD SELECT myfunction(NEW);

Obviously I can't use the RECORD type, but when I create the function with:

    CREATE FUNCTION myfunction (myview%ROWTYPE) RETURNS ... ;

the '%' gets reported in an error message.

I don't think triggers are the things to use, partly because I don't know anything about the SPI functions available for accessing other tables within them, but also because I think the rule system is the correct route for trapping such things on a view.

So, am I being sensible in trying this design? Am I doing something that isn't permitted, like with the RECORD type, by trying to get an entire row into a function in one variable?

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
When faced with a similar-sounding problem I tried the following, with some success. It is an extreme alternative and not suited to all situations... that having been said, here goes:

    pg_dump mytable > mytable.txt

You said:

> When I dump the table to disk I get records in sequence by primary key
> from 1 to 115847 before it crashes.

I looked at the mytable.txt file, found the last entry that pg_dump was able to copy, and noted the primary key number (115847). I then executed a query that deleted the record with the very next primary key number (115848). (*BAD! Lost that data record. BAD!*)

I was then able to do a pg_dump of the entire table, followed by a vacuum/analyze. (*Good. PG is running correctly. Good.*)

In my singular case the loss of one data record is acceptable. It may not be for you. Try this resolve only at your own risk and peril.

Peace - Bill
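Bill's procedure boils down to something like the following sketch (the table name, key column, and key value are placeholders for your own schema; note the DELETE destroys that record for good):

```sql
-- Remove the first record past the last one pg_dump could copy...
DELETE FROM mytable WHERE primary_key = 115848;

-- ...then confirm the table is dumpable again and clean up.
VACUUM ANALYZE mytable;
```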
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> Obviously I can't use the RECORD type but when I create the function with:
> CREATE FUNCTION myfunction (myview%ROWTYPE) RETURNS ... ;
> the '%' gets reported in an error message.

Leave off the %ROWTYPE. Also read the info about composite-type arguments at
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/xfunc-sql.html

			regards, tom lane
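A minimal sketch of the composite-argument form Tom is pointing at, using Nigel's names (the function body and return type are illustrative only; 7.2-era quoting style):

```sql
-- The argument type is just the view's name itself -- no %ROWTYPE.
-- Fields of the row argument are reached as $1.column inside the body.
CREATE FUNCTION myfunction (myview) RETURNS text AS '
    SELECT $1.title;
' LANGUAGE 'sql';
```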
Hi Tom,

> A brute-force way to narrow things down would be to write a little
> program that tries to retrieve each row individually by primary key,
> starting at 115848 since you know the rows before that are okay.

Thanks, this worked. I ran a Perl script that went from 1 to max(primary_id), selected each record, and inserted it into a new table. There were a total of two bad records, so not too bad. =)

> That's disturbing; short of a serious failure (disk crash, for instance)
> I don't know of anything that would cause this.
>
> One thing that would be interesting to try is to investigate the TOAST
> table directly.

    # select oid from pg_class where relname = 'users';
       oid
    ---------
     9361620
    (1 row)

    # select chunk_seq, length(chunk_data) from pg_toast_9361620
      where chunk_id = 12851102 order by chunk_seq;
     chunk_seq | length
    -----------+--------
    (0 rows)

Very strange. Now that we can back up the data, we've switched the database to a brand new disk drive, and re-imported and vacuumed everything. The application is running smoothly again.

I doubt this is relevant, but we were symlinking /usr/local/pgsql/data -> /mnt/disk2/pgsql. Also, one column in the problem table was a text field averaging 20k. I still have the old database if it helps.

Thanks for all your help,

Alex
On Fri, 19 Apr 2002, Tom Lane wrote:

> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > Obviously I can't use the RECORD type but when I create the function with:
> > CREATE FUNCTION myfunction (myview%ROWTYPE) RETURNS ... ;
> > the '%' gets reported in an error message.
>
> Leave off the %ROWTYPE. Also read the info about composite-type
> arguments at
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/xfunc-sql.html

Thanks Tom. I hadn't read that particular page before; I had been concentrating on the PL/pgSQL pages. As soon as I'd read it, I had the function being created.

Unfortunately, I can't now get the rule to be created. Again, I can't find the relevant information in the docs, even though it's probably obvious again, so any help on why this:

    CREATE RULE insert_listing AS ON insert TO listing_view
        DO INSTEAD SELECT insert_listing_fn(NEW);

gives the error:

    parser: parse error at or near ")"

would be appreciated. Using:

    1) SELECT 1,2,3
    2) SELECT NEW.title
    3) SELECT insert_listing_fn()
    4) SELECT insert_listing_fn(NEW.title)

as the rule's action does not give this error, although 3 and 4 do of course give a different error, since there aren't versions of the function defined with those parameters.

I know I could just expand the NEW record in the call to the function, giving each attribute as a parameter. However, I'd be more interested in knowing how to do this properly.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants