On Fri, Oct 21, 2016 at 1:37 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 8:03 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>>> Most or all the damage seemed to be to the system catalogs with
>>> at least two critical tables dropped or inaccessible in some
>>> fashion. A lot of the OIDs seemed to be pointing at the wrong
>>> thing.
>>
>> While the oid in pg_class often matches the filename, that is not
>> true after some operations (like CLUSTER or VACUUM FULL). It is
>> the relfilenode column that is the definitive link to the file.
>
> no such operations happened. In the first instance at least one table
> dropped from the system catalogs. I have a hunch that the heap is
> fine (supported by the size of the database on disk). At this
> precise moment I'm restoring the database to another fileserver in
> order to do some forensic analysis, also in the hopes of getting the
> second database online in order to expedite recovery.
>
> ah -- done. :-) deleting the init file didn't help, but starting up
> single user allowed the start up to gracefully fail with a FATAL cache
> lookup.
OK, I have some good (very- in the specific case of yours truly) news
to report. Doing a filesystem level copy to a test server I was able
to relfilenode swap one of the critical tables over the place of the
refilenode of the stored backup. Not being able know the file to copy
from, I figured out the source node by judging the size and using
'strings' utility. Data recovery for that table at least appears to
be 100%.
For those following along, this simple process is only likely to work
easily if the table contains only system types; no user types, enums,
composites, etc, since those have a unique ID for each data restore.
merlin