Re: PITR problem - Mailing list pgsql-general

From	Erik Jones
Subject	Re: PITR problem
Date	April 28, 2008 14:03:58
Msg-id	5309E09C-113D-4CEB-9776-D6F01D40C4DE@myemma.com
In response to	PITR problem (wstrzalka <wstrzalka@gmail.com>)
List	pgsql-general

On Apr 26, 2008, at 5:11 PM, wstrzalka wrote:

> I have a problem with setting up PITR recovery on the database.
>
> I have archive_command set properly and logs are shipping OK. Archive
> timeout is also set (5 min).
>
> When performing pg_start_backup the WAL is, let's say, at position
> 0000000100000001000000D9. Then I start copying the database to the
> second machine, which takes me 30 minutes. In that time the archive
> timeout fires a few times and those files are shipped properly to the
> second host. After the DB is successfully copied I call pg_stop_backup.
> The WAL is at that moment at position 0000000100000001000000DE.
>
> At that moment I see on the second machine WAL files from
> 0000000100000001000000D9 to 0000000100000001000000DE as well as
> 0000000100000001000000D9.00000020.backup
>
> The problem occurs now when I'm trying to start my standby server in
> recovery mode (with pg_standby).
>
> The output from pg_standby:
> ------------------------------------
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 00000001.history
> WAL file path            : /var/lib/pgsql/incoming_wal/
> 00000001.history
> Restoring to...          : pg_xlog/RECOVERYHISTORY
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/
> 00000001.history" "pg_xlog/RECOVERYHISTORY"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
>
>
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 0000000100000001000000D9.00000020.backup
> WAL file path            : /var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9.00000020.backup
> Restoring to...          : pg_xlog/RECOVERYHISTORY
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9.00000020.backup" "pg_xlog/RECOVERYHISTORY"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
>
>
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 0000000100000001000000D9
> WAL file path            : /var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9
> Restoring to...          : pg_xlog/RECOVERYXLOG
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9" "pg_xlog/RECOVERYXLOG"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000D9"
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000DA"
>
> --------------------------------------------------------------------------------------------------------
>
>
> The first time I start the standby, the Postgres log says the
> following and the postgres process goes down:
> --------------------------------------------------------------------------------------------------------
> restored log file "0000000100000001000000D9.00000020.backup" from
> archive
> could not open file "pg_xlog/0000000100000001000000D9" (log file 1,
> segment 217): No such file or directory
> invalid checkpoint record
> could not locate required checkpoint record
> If you are not restoring from a backup, try removing the file
> "/var/lib/pgsql/data/backup_label".
> startup process (PID 19201) was terminated by signal 6: Aborted
> aborting startup due to startup process failure
> --------------------------------------------------------------------------------------------------------
>
> When I try to start PG for the second time it just gets stuck waiting
> for ...000D9
>
> In my opinion the problem is that when the standby starts, PostgreSQL
> wants to recover WAL 0000000100000001000000D9, but it is deleted
> first, because the keep archive history (%r) parameter is set to
> 0000000100000001000000DB.
>
> Is it a bug, or am I missing something? I can repeat the scenario with
> this big DB. However, it does not happen in exactly the same
> environment when playing with a smaller cluster (copying the cluster
> takes less time than archive_timeout).

What is the full pg_standby command string (restore_command=....) in
your recovery.conf?  It sounds like you have pg_standby set to delete
archived WALs, possibly a little too aggressively.  Do you have the -k
flag set in the pg_standby call in your restore_command?
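
To make the suspected cleanup behaviour concrete, here is a rough
sketch of the name comparison that the quoted output suggests: plain
WAL segments whose names sort before the "Keep archive history" value
get removed. This is only an illustration under that assumption, not
pg_standby's actual source. The function name segments_before and the
file list are made up for the example, using the names quoted above.

```python
import re

# A plain WAL segment name is 24 hex digits; names such as
# "...D9.00000020.backup" do not match and are left alone here.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def segments_before(archive_files, restart_file):
    """Return plain WAL segments that sort strictly before restart_file.

    Hypothetical helper: WAL segment names are fixed-width hex strings,
    so plain string comparison orders them the same way as the log.
    """
    return sorted(
        name
        for name in archive_files
        if WAL_SEGMENT.match(name) and name < restart_file
    )


# Files the original poster reports seeing on the second machine.
archive = [
    "0000000100000001000000D9",
    "0000000100000001000000D9.00000020.backup",
    "0000000100000001000000DA",
    "0000000100000001000000DB",
    "0000000100000001000000DC",
    "0000000100000001000000DD",
    "0000000100000001000000DE",
]

# Value shown as "Keep archive history" in the quoted pg_standby output.
removed = segments_before(archive, "0000000100000001000000DB")
for name in removed:
    print("would remove", name)
```

Under this assumption the two segments selected are ...D9 and ...DA,
the same two files the quoted output reports removing, and ...D9 is the
segment the startup process then fails to open.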

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
