Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Msg-id F363F283-B85C-47E0-AB8E-F69572C1738B@phlo.org
In response to Hot Backup with rsync fails at pg_clog if under load  (Linas Virbalas <linas.virbalas@continuent.com>)
List pgsql-hackers

On Sep 21, 2011, at 16:44, Linas Virbalas wrote:
> After searching the archives, the only more discussed and similar issue I
> found hit was by Daniel Farina in a thread "hot backups: am I doing it
> wrong, or do we have a problem with pg_clog?" [2], but, it seems, the issue
> was discarded because of a non-standard backup procedure Daniel used.

That's not the way I read that thread. In fact, Robert Haas confirmed that
Daniel's backup procedure was sound in theory. The open question was whether the
error occurred because of a bug in Daniel's backup code or PostgreSQL's restore
code. The thread then petered out without that question being answered.

> Procedure:
>
> 1. Start load generator on the master (WAL archiving enabled).
> 2. Prepare a Streaming Replication standby (accepting WAL files too):
> 2.1. pg_switch_xlog() on the master;
> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
> while as master is loaded up);
> 2.3. rsync data/global/pg_control to the standby;
> 2.4. rsync all other data/ (without pg_xlog) to the standby;
> 2.5. pg_stop_backup() on the master;
> 2.6. Wait to receive all WAL files, generated during the backup, on the
> standby;
> 2.7. Start the standby PG instance.

Looks good. (2.1) and (2.3) seem redundant (as Euler already noticed),
but shouldn't cause any errors.

Could you provide us with the exact rsync version and parameters you use?
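
To make the discussion concrete, here is a rough sketch of the quoted
procedure as a dry-run command plan, leaving out the redundant steps (2.1)
and (2.3). It only builds the command strings and runs nothing. The function
name, the paths, the standby host name, and above all the rsync options
(-a, --exclude=pg_xlog) are assumptions made for illustration; they are not
the parameters actually used in the failing setup.

```python
import shlex


def backup_plan(master_data, standby, label="backup_under_load"):
    """Return shell commands for steps 2.2, 2.4 and 2.5, in order.

    Nothing is executed. All arguments are hypothetical placeholders,
    and the rsync options are assumed, not taken from the original report.
    """
    def psql(sql):
        # -A -t: unaligned, tuples-only output; -c: run a single command
        return "psql -At -c " + shlex.quote(sql)

    source = master_data + "/"
    dest = f"{standby}:{master_data}/"
    return [
        # 2.2: mark the start of the base backup on the master
        psql(f"SELECT pg_start_backup('{label}')"),
        # 2.4: copy the data directory, skipping pg_xlog
        "rsync -a --exclude=pg_xlog "
        + shlex.quote(source) + " " + shlex.quote(dest),
        # 2.5: mark the end of the base backup on the master
        psql("SELECT pg_stop_backup()"),
    ]


if __name__ == "__main__":
    for command in backup_plan("/opt/PostgreSQL/9.1/data", "standby-host"):
        print(command)
```

Printing the plan next to the real invocation would make it easy to spot
where the two differ.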

> The last step will, usually, fail with a similar error:
>
> 2011-09-21 13:41:05 CEST LOG:  database system was interrupted; last known
> up at 2011-09-21 13:40:50 CEST
> Restoring 00000014.history
> mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
> or directory
> Restoring 00000013.history
> 2011-09-21 13:41:05 CEST LOG:  restored log file "00000013.history" from
> archive
> 2011-09-21 13:41:05 CEST LOG:  entering standby mode
> Restoring 0000001300000006000000DC
> 2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DC"
> from archive
> Restoring 0000001300000006000000DB
> 2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DB"
> from archive
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction
> 1188673
> 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at
> offset 32768: Success.

What's the size of the file (pg_clog/0001) on both the master and the slave?
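
The reason for asking: the file name and offset in the error follow directly
from the transaction id, so the file size tells us whether the page exists at
all. Below is a minimal sketch of that arithmetic, assuming the default 8 kB
block size and the usual clog layout (2 status bits per transaction, 32 pages
per segment file). The function names are made up for illustration. A read
error reported with "Success" would be consistent with a short read, i.e. a
file that ends before the requested page.

```python
BLCKSZ = 8192                 # default PostgreSQL block size (assumed)
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32   # pages per pg_clog segment file


def clog_location(xid):
    """Map a transaction id to (segment file name, byte offset of its page)."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    offset = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, offset


def min_segment_size(xid):
    """Smallest file size that still contains the whole page holding xid."""
    _, offset = clog_location(xid)
    return offset + BLCKSZ


if __name__ == "__main__":
    print(clog_location(1188673))     # ('0001', 32768)
    print(min_segment_size(1188673))  # 40960
```

For transaction 1188673 this gives pg_clog/0001 at offset 32768, matching the
log above, so the file has to be at least 40960 bytes long for the read to
succeed.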

best regards,
Florian Pflug


