Re: [GENERAL] Recovery Assistance - Mailing list pgsql-general

From Adrian Klaver
Subject Re: [GENERAL] Recovery Assistance
Date
Msg-id cb6a91c8-10cf-071a-6d07-ce78908a6d60@aklaver.com
Whole thread Raw
In response to [GENERAL] Recovery Assistance  (Brian Mills <brian@trybooking.com>)
Responses Re: [GENERAL] Recovery Assistance  (Brian Mills <brian@trybooking.com>)
List pgsql-general
On 01/27/2017 01:31 PM, Brian Mills wrote:
> Hi,
>
> I have a Atlassian Confluence Wiki that depends on postgres, but I
> haven't much experience with postgres other than for this purpose.
>
> A few days ago, the hard disk filled, so all services stopped working.
> When the admin realised this he increased the disk size (its in a cloud,
> so that was easy to do) however I think the way it shutdown left
> Postgres in an inconsistent state for some reason.
> Now when I start it up I get this message in the log over and over again:
> "FATAL:  the database system is starting up"
>
> I have a backup, which I have successfully recovered, but it is 24 hours
> old, the next backup was the cause of the disk filling. So I'm using
> this as exercise in learning a bit more about postgres.
>
> I did some research and found a number of options. I took a file level
> backup with the service not running then tried 2 things. I haven't found
> much else on what to do though.
>
> *Attempt 1 - Reset Log *
>
> It sounded like this shouldn't be my first option (it wasn't) but it did
> sound like what I needed to do.
> I ran this command
> ./pg_resetxlog /var/lib/postgresql/9.3/main -f
> It worked a treat, the database did startup ok.
> However when I tried to dump the DB:
> root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
> $now-confluencedb.sql
> pg_dump: Dumping the contents of table "bodycontent" failed:
> PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  unexpected chunk size 104
> (expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
> pg_dump: The command was: COPY public.bodycontent (bodycontentid, body,
> contentid, bodytypeid) TO stdout;
> The dump failed, so I assume this did leave my database in an
> inconsistent state.
>
>
> *Attempt 2 -  startup manually and let it try recovery*
>
> I restored my file level backup and started again.
> This time I tried to startup manually on the command line to see the
> output (I'd done it as a service startup a number of times to nearly the
> same effect too)
>
> postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
> /etc/postgresql/9.3/main start
> server starting
> postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT
> LOG:  database system was interrupted while in recovery at 2017-01-27
> 20:13:26 AEDT
> 2017-01-27 20:36:08 AEDT HINT:  This probably means that some data is
> corrupted and you will have to use the last backup for recovery.
> 2017-01-27 20:36:08 AEDT LOG:  database system was not properly shut
> down; automatic recovery in progress
> 2017-01-27 20:36:08 AEDT LOG:  redo starts at 5/528B4558
> 2017-01-27 20:36:18 AEDT LOG:  redo done at 5/A3FFF9E8
> 2017-01-27 20:36:18 AEDT LOG:  last completed transaction was at log
> time 2017-01-24 02:08:00.023064+11
> 2017-01-27 23:00:01 AEDT FATAL:  the database system is starting up
> 2017-01-27 23:00:01 AEDT FATAL:  the database system is starting up

What does ps ax | grep post show?

Is the cluster set up to log to a file, if so what does it show?

Does the system log show anything relevant?

>
> I've left it that way for 12 hours, and its still not allowing connections.
> I assume its doing some kind of consistency check?

What does it say when you attempt a connection?

>
> Does anyone have any suggestions on what I should be doing to try and
> restore this database?
>
> - The amount of change is minimal in the DB after 6pm it should be
> basically no change overnight.
> - The log above seems to suggest it has completed a redo ok, is that right?
> - The last completed transaction time 2017-01-24 02:08:00.023064+11 the
> log mentions would be fine to use, so loosing even a few hours before
> that would be more than adequate.
>
> I'm just not clear what the database is doing now, or how I should be
> trying to recover it.
>
> Any help anyone can suggest would be great! I've given myself this
> weekend to attempt to recover more than the last backup, after that I
> need to restore the service for my team to use and suck up the lost last
> day of updates.
>
> Thanks,
> Brian


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Brian Mills
Date:
Subject: [GENERAL] Recovery Assistance
Next
From: Brian Mills
Date:
Subject: Re: [GENERAL] Recovery Assistance