Thread: db recovery after hd crash (could not open relation 1663/16385/16400: No such file or directory)
db recovery after hd crash (could not open relation 1663/16385/16400: No such file or directory)
From
Roland Wells
Date:
Hello all, Background: DB resides on a mirrored array, HD1 goes bad and before it can be replaced, HD2 starts throwing hardware errors. We were able to get a DD from HD2 and have restored the box quite successfully with the exception of the pg db. There WAS corruption to the filesystem. Since the last backup, a significant number of records have been added to the db. If there is a way to recover some data, it would likely be easier to reconcile and fix manually than reenter. Box is FreeBSD 8, postgresql version 8.1. Progress so far: On new hardware, installed pg8.1 and restored data directory and attempted to start pg. After fixing some config differences, it complained of a missing pg_clog/xxx file. After searching the archives for hints, added a file of zero's and pg starts successfully. The db in question exists with what appears to be intact structure. When attempting to dump the db ran into the error: "could not open relation 1663/16385/16400: No such file or directory" and run into similar errors (with different relation numbers) trying to select * from certain tables. I'm stuck here. Assuming the damage to filesystem, what is the best a way forward? Does the relation error indicate damage to the data? or to an index??? Is there a process I can do to try and salvage partial data? Thanks for any help/tips/suggestions. Roland Wells
Re: db recovery after hd crash (could not open relation 1663/16385/16400: No such file or directory)
From
Tom Lane
Date:
Roland Wells <roland.wells@gmail.com> writes: > On new hardware, installed pg8.1 and restored data directory and > attempted to start pg. After fixing some config differences, it > complained of a missing pg_clog/xxx file. After searching the archives > for hints, added a file of zero's and pg starts successfully. The db > in question exists with what appears to be intact structure. When > attempting to dump the db ran into the error: "could not open relation > 1663/16385/16400: No such file or directory" and run into similar > errors (with different relation numbers) trying to select * from > certain tables. > I'm stuck here. Assuming the damage to filesystem, what is the best a > way forward? Does the relation error indicate damage to the data? or > to an index??? Is there a process I can do to try and salvage partial > data? You can look for the third number in pg_class.relfilenode to see exactly which table or index it's complaining about. If it's happening in pg_dump then it's probably a table not an index, unfortunately. Not sure there's much you can do except go back to your last backup --- PG can't do anything to recover data that the filesystem has lost track of, which is what this is sounding like. But having said that, did you try fsck'ing the DD image to see if fsck could recover any files? regards, tom lane
Re: db recovery after hd crash (could not open relation 1663/16385/16400: No such file or directory)
From
Roland Wells
Date:
On Fri, Feb 26, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Roland Wells <roland.wells@gmail.com> writes: >> On new hardware, installed pg8.1 and restored data directory and >> attempted to start pg. After fixing some config differences, it >> complained of a missing pg_clog/xxx file. After searching the archives >> for hints, added a file of zero's and pg starts successfully. The db >> in question exists with what appears to be intact structure. When >> attempting to dump the db ran into the error: "could not open relation >> 1663/16385/16400: No such file or directory" and run into similar >> errors (with different relation numbers) trying to select * from >> certain tables. > >> I'm stuck here. Assuming the damage to filesystem, what is the best a >> way forward? Does the relation error indicate damage to the data? or >> to an index??? Is there a process I can do to try and salvage partial >> data? > > You can look for the third number in pg_class.relfilenode to see exactly > which table or index it's complaining about. If it's happening in > pg_dump then it's probably a table not an index, unfortunately. > Not sure there's much you can do except go back to your last backup > --- PG can't do anything to recover data that the filesystem has lost > track of, which is what this is sounding like. > > But having said that, did you try fsck'ing the DD image to see if fsck > could recover any files? > Thanks for the replies: A look at pg_class.relfilenode seems to confirm that tables are missing (7 out of 52 tables in the db are missing). I did do an fsck on the restored disk image and there is quite a few files in the lost+found. I am still wading through them at the moment, but not hopeful of recovery this way as many I've looked at so far appear to only be partial files. Additionally, several of the tables in the db show far less rows than should exist, so there may be damage to those files as well... Probably will head to the backup. Thanks again. Roland