Trying to handle db corruption 9.6 - Mailing list pgsql-performance

From Mariel Cherkassky
Subject Trying to handle db corruption 9.6
Date
Msg-id CA+t6e1=AHht9vqazjB8s15KH2Y7Vs1br3AVZpU5v3fJZ-Eg9Pw@mail.gmail.com
Whole thread Raw
Responses Re: Trying to handle db corruption 9.6
Re: Trying to handle db corruption 9.6
List pgsql-performance
Hey,
I'm trying to handle a corruption that one of our customers is facing.
His disk space was full and as a result of that he decided to run pg_resetxlog a few times(bad idea..) .
When I connected to the machine I saw that the db was down. 
When I started the db (service postgresql start) I saw the next error in the logs :

DETAIL:  Could not open file "pg_multixact/offsets/0000": No such file or directory.

The pg_multixact/offset dir contained one file (0025).
The pg_multixact/members dir contains 2 files : 0000 and 0001.

I tried to follow the documentation of pg_resetxlog, and run pg_resetxlog with -m 0xF0A604,0xEA50CE which are 0025*65536  and 0026*65536  in hexa.
However, it didnt help and the same error appeared.
So I tried to rename the file to 0000 and then the db searched for a file in members that wasnt exist.
I followed the documentation and changed the multitransaction offset (-O) and the transactions id (-c ) based on the doc and then the db was started succesfully.
However after it started I saw the next msg in the logs : 
Multixact member wraparound protections are disabled because oldest checkpointed Multixact 65536 doesnt exist. In addition, no one is able to connect to the db (we keep getting errors database doesnt exist or user doesnt exist , even for postgresql user).

current relevant rows from the control data : 

pg_control version number:            960

Catalog version number:               201608131

Database system identifier:           6692952810876880414

Database cluster state:               shut down

pg_control last modified:             Mon 20 May 2019 07:07:30 AM PDT

Latest checkpoint location:           1837/E3000028

Prior checkpoint location:            1837/E2000028

Latest checkpoint's REDO location:    1837/E3000028

Latest checkpoint's REDO WAL file:    0000000100001837000000E3

Latest checkpoint's TimeLineID:       1

Latest checkpoint's PrevTimeLineID:   1

Latest checkpoint's full_page_writes: on

Latest checkpoint's NextXID:          0:3

Latest checkpoint's NextOID:          10000

Latest checkpoint's NextMultiXactId:  131072

Latest checkpoint's NextMultiOffset:  52352

Latest checkpoint's oldestXID:        3

Latest checkpoint's oldestXID's DB:   0

Latest checkpoint's oldestActiveXID:  0

Latest checkpoint's oldestMultiXid:   65536

Latest checkpoint's oldestMulti's DB: 0

Latest checkpoint's oldestCommitTsXid:4604

Latest checkpoint's newestCommitTsXid:5041




I also checked and I saw that the customer has all the wals (backed up) but without any basebackup..
Any recommendations how to handle the case ?

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Analyze results in more expensive query plan
Next
From: Flo Rance
Date:
Subject: Re: Trying to handle db corruption 9.6