Jira database won't start after disk filled up - Mailing list pgsql-general

From Paul Costello
Subject Jira database won't start after disk filled up
Msg-id CADX_Xgbnx_s3Tzk=mBZwwcYSHkf3DSFaQJaVkgkgRaZAhEpSxg@mail.gmail.com
Responses Re: Jira database won't start after disk filled up  (Vick Khera <vivek@khera.org>)
List pgsql-general
I have a database that wouldn't start due to the disk filling up back on 1/10, unbeknownst to us until 2/27.  This is jira, so it's critical data.  It appears jira was running in memory that entire time.

I needed to run pg_resetxlog -f in order to start the database.  It started, but upon logging in I found the system catalog and some data to be corrupt. 

I was able to run a pg_dumpall on the database and restore it to a re-initialized cluster.  However, there were 3 primary key errors during the restore, because duplicate data had gotten into the tables.
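
To illustrate what a primary key error during restore means, here is a minimal Python sketch that reports key values occurring more than once in a set of rows. The function name find_duplicate_keys, the column name "id", and the sample rows are all hypothetical and are not taken from the actual jira tables; in practice the rows would have to come from the dump or from a query against the affected table.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return {key_tuple: count} for key values that appear more than once."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data, only for illustration.
rows = [
    {"id": 10001, "summary": "first"},
    {"id": 10002, "summary": "second"},
    {"id": 10001, "summary": "first, duplicated"},
]

duplicates = find_duplicate_keys(rows, ["id"])
for key, count in duplicates.items():
    print(f"key {key} appears {count} times")
```

Each key reported this way corresponds to rows that would have to be reconciled by hand before the primary key can be recreated.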

My hypothesis is that, because of the system catalog corruption, primary key uniqueness was not being enforced.  I am not sure when this occurred, though: 1) right after the disk filled up, 2) when I ran pg_resetxlog -f, or 3) after I ran pg_resetxlog and before I did the backup.  jira was still running after I got the database started, and I waited a few hours before doing the backup.  My guess is that the duplicate data got in right after the disk filled up on 1/10.

We had a snapshot from 1/5, which has been restored to production, such as it is.  They have also created another test VM for me so I can attempt to bring the data back up to 2/27.

Is there anything I can do short of pg_resetxlog -f to bring this database back up more safely, and possibly avoid the duplicate data/primary key errors?  It wouldn't start without the force option.  Should I simply shut down jira, try pg_resetxlog -f again, and do the pg_dumpall immediately?

These are the errors I am currently seeing while trying to start the database. 

2018-03-02 11:01:06 CST LOG:  database system was interrupted; last known up at 2018-01-10 12:19:01 CST
2018-03-02 11:01:06 CST LOG:  database system was not properly shut down; automatic recovery in progress
2018-03-02 11:01:06 CST LOG:  redo starts at 36/B8556D58
2018-03-02 11:01:06 CST LOG:  incomplete startup packet
2018-03-02 11:01:07 CST FATAL:  the database system is starting up
...
2018-03-02 11:01:12 CST LOG:  incomplete startup packet
2018-03-02 11:01:29 CST FATAL:  the database system is starting up
...
2018-03-02 11:01:30 CST LOG:  record with zero length at 36/F754CBD8
2018-03-02 11:01:30 CST LOG:  redo done at 36/F754CBA8
2018-03-02 11:01:30 CST LOG:  last completed transaction was at log time 2018-02-26 17:55:43.238541-06
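
For a rough sense of how much WAL was replayed between "redo starts at" and "redo done at" in the log above, here is a small Python sketch. It relies on the fact that a PostgreSQL log position is printed as two hexadecimal numbers, the high and low 32 bits of a 64-bit byte position. The helper name parse_lsn is just for illustration.

```python
def parse_lsn(text):
    """Convert an LSN such as '36/B8556D58' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Values copied from the log lines above.
redo_start = parse_lsn("36/B8556D58")
redo_done = parse_lsn("36/F754CBA8")

# Distance in bytes between where redo started and the last record replayed.
replayed_bytes = redo_done - redo_start
print(f"WAL replayed: {replayed_bytes} bytes ({replayed_bytes / 2**20:.1f} MiB)")
```

This only measures the distance between the two positions; it says nothing about whether the replayed records were intact.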

Any ideas or thoughts are appreciated.

Paul
