Thread: Postgresql-9.0.1 Recovery

Postgresql-9.0.1 Recovery

From: Venkat Balaji
Hello Everyone,

I have a situation here -

I am trying to restore a production online backup and recover from it.

 - I initially rsynced the data directory (excluding pg_log), then tarred and zipped it (rough commands below)
 - SCP'd the tarball to a different server, then untarred and unzipped it there
 - I got an error "unable to read <filename> from pg_clog location" (the file size is around 160K)
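
The copy and transfer steps looked roughly like this (the paths and host name are placeholders, not our real ones):

  # copy the data directory, leaving out the server log directory
  rsync -a --exclude='pg_log' /var/lib/pgsql/data/ /backups/pgdata_copy/

  # pack it up and ship it to the recovery host
  tar -czf /backups/pgdata_copy.tar.gz -C /backups pgdata_copy
  scp /backups/pgdata_copy.tar.gz recovery-host:/restore/

  # on the recovery host: unpack into the restore location
  tar -xzf /restore/pgdata_copy.tar.gz -C /restore/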

I have a backup taken on Aug 24th, and the pg_clog file which PG is unable to read is dated Aug 5th.

I am aware that the size of the files in pg_clog is supposed to be 256K. 
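
(If I have this right, that 256K comes from 2 bits of commit status per transaction, with the default 8 kB block size: 8192 bytes/page * 4 transactions/byte * 32 pages/segment = 1,048,576 transactions per segment. A quick way to see which pg_clog segment a given transaction ID falls in, using an example xid, would be something like:)

  xid=123456789                     # example transaction ID, not a real one
  seg=$(( xid / 1048576 ))          # 1,048,576 xids of status per 256 kB segment
  printf 'pg_clog/%04X\n' "$seg"    # clog segment files are named with 4 hex digits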

What I understood is that rsync somehow missed syncing some of the files in "pg_clog", so I manually copied the missing pg_clog file from production and tried the recovery again.

To my surprise, PG recovery then asked for the corresponding pg_xlog (WAL archive) file from Aug 5th.

Is there a way to recover from this?
Do I need to get that particular WAL archive file from before the online backup time?

I do have all the other files up to Aug 24th.

From this experience, what I understand is that PostgreSQL stores both committed and uncommitted transactions in pg_xlog / WAL archive files, and that information about transaction commit status (not the transaction data itself) is stored in pg_clog. Am I correct?

I am in the process of designing a disaster recovery planner for our production systems.

Version  - PG-9.0.1
OS        - RHEL 5

Please advise!

Regards,
Venkat

Re: Postgresql-9.0.1 Recovery

From: Craig Ringer
On 30/08/2011 6:59 PM, Venkat Balaji wrote:
> Hello Everyone,
>
> I have a situation here -
>
> I am trying to restore the production online backup and recover the same.
>
>   - I initially rsynced the data directory (excluding pg_log), then
> tarred and zipped it

Did you do that after pg_start_backup() or on a stopped database server?

If you did it on a running database server without first running
pg_start_backup(), your backup is invalid.

Personally I like to take my base backups from an LVM snapshot of the
datadir just to be extra safe. That isn't necessary, though, and a
regular rsync or tar or whatever of a datadir after pg_start_backup() is
fine.

Remember to run pg_stop_backup() afterwards.
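
As a rough sketch (the label, user and paths are made up), the whole bracketed sequence looks like:

  psql -U postgres -c "SELECT pg_start_backup('base_backup_label');"

  # pg_xlog can be left out because the WAL needed for recovery should come
  # from the archive; pg_log is only the server's text logs
  rsync -a --exclude='pg_xlog' --exclude='pg_log' /var/lib/pgsql/data/ /backups/base/

  psql -U postgres -c "SELECT pg_stop_backup();"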

>   - I got an error "unable to read <filename> from pg_clog location"
> (file size is around 160K)

... from PostgreSQL, when you tried to start it?

What emitted that error message?

> What I understood is that rsync somehow missed syncing some of the files
> in "pg_clog", so I manually copied the missing pg_clog file from
> production and tried the recovery again.

That won't work. You need a consistent snapshot of all the files in the
data dir. You cannot just mix and match copies taken at different times.

For efficiency reasons PostgreSQL will recycle used clog files. You
can't just copy a file over and hope that because it has the same name,
it still contains the data you want.

Your backup *failed* at the point where you got an incomplete copy of
the data directory.

> Do I need to get that particular WAL archive file from before the online
> backup time?

No, you need to get the missing clog files. If you cannot do that, try
using pg_resetxlog, but be aware that that may lose transactions and can
potentially cause corruption of tables and indexes.
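
If you do end up there, it's roughly the following, run with no postmaster on that datadir and against the restored copy, never production (the path here is a placeholder):

  # may discard recent transactions; last resort only
  pg_resetxlog /restore/pgdata_copy

  # if pg_control looks damaged it will refuse; -f forces it, which is riskier still
  pg_resetxlog -f /restore/pgdata_copy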

> From this experience, what I understand is that PostgreSQL stores both
> committed and uncommitted transactions in pg_xlog / WAL archive files,
> and that information about transaction commit status (not the transaction
> data itself) is stored in pg_clog. Am I correct?

That sounds right to me, but I don't know as much about how Pg stores
things as I should.

> I am in the process of designing a disaster recovery planner for our
> production systems.

Congratulations!

Be extremely glad this didn't happen in a real recovery scenario. This
is a marvellous example of why you should always test your backups - you
actually did, and found a problem that would've been a critical issue if
the backup were actually needed.

--
Craig Ringer

Re: Postgresql-9.0.1 Recovery

From: Venkat Balaji
Thanks, Craig!

Below is what I did:

1. pg_start_backup()
2. rsync the data dir
3. pg_stop_backup()

I believe the backup is valid because I was able to bring up the cluster without any issues (of course, with some data loss).
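
One extra sanity check I can add next time (assuming the copy lands under /backups/base, which is just a placeholder path) is that a copy taken between pg_start_backup() and pg_stop_backup() should still contain the backup_label file that pg_start_backup() writes:

  # backup_label records the start WAL location and checkpoint needed for recovery
  test -f /backups/base/backup_label && head -n 4 /backups/base/backup_label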

+ve signs-

I am able to bring up the cluster with the online backup, but only with some loss of data.

-ve signs and things to watch out for while testing backups -
  • pg_clog files were not synced. I suspect they were being written to at the time of the backup; I might have synced the data dir while the pg_clog files were only half filled.
  • Though the WAL archives are there, Postgres will not recover beyond the timestamp at which the pg_clog file was missing.
  • Even if I replace the missing pg_clog files (which I did), Postgres asks for the corresponding WAL archive files.
Yes. What I learnt is that we need to ensure all the pg_clog files are fully copied as of the backup time. We cannot afford to miss any of them.
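
Next time I also plan to verify the copy before relying on it, along these lines (the source and backup paths are placeholders):

  # every pg_clog segment except possibly the newest, still-growing one
  # should be exactly 256 kB; list any that are not
  find /backups/base/pg_clog -type f ! -size 262144c -ls

  # checksum-compare the copied pg_clog against the source (dry run, no data copied)
  rsync -rnc --itemize-changes /var/lib/pgsql/data/pg_clog/ /backups/base/pg_clog/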

Thanks
Venkat