Re: Some problem with warm standby server - Mailing list pgsql-general

From Simon Riggs
Subject Re: Some problem with warm standby server
Date
Msg-id 1177755645.3622.93.camel@silverbirch.site
Whole thread Raw
In response to Some problem with warm standby server  (Nico Sabbi <nsabbi@officinedigitali.it>)
Responses Re: Some problem with warm standby server  (Nico Sabbi <nsabbi@officinedigitali.it>)
List pgsql-general
On Fri, 2007-04-27 at 12:31 +0200, Nico Sabbi wrote:

> I have some doubts regarding the settings and the access procedure of
> warm standby servers:
> - can autovacuum be safely enabled on the replicator?

Not a relevant question. The standby isn't "up", so no SQL can be
executed on the standby, so it never needs vacuuming. The autovacuum
process doesn't start until recovery finishes. That is why it is Warm
rather than Hot standby. Florian Pflug is working on allowing read-only
queries to execute on the standby, so we're hopeful of further
enhancements in the next release.

> - I'm using pg_standby (from cvs) that is generally working well as
> expected (logs are copied with
>   scp); today I wanted to  temporarily stop the replication to verify
> some data to restart it later on, so
>   I touched the trigger file, waited for the log to report "database
> ready", verified that the
>   databases were actually up-to-date. All was fine, then I ran
>
>   rm -f pg_xlog/* pg_xlog/archive_status/*
>   mv recovery.done recovery.conf (the permissions were right)
>   /etc/init.d/postgresql stop ; /etc/init.d/postgresql start
>
>   the replication seemed to start:
>  ----
> ---------------------------------------------------
> LOG:  database system was shut down at 2007-04-27 12:16:13 CEST
> LOG:  starting archive recovery
> LOG:  restore_command = "/usr/local/bin/pg_standby -s 5 -w 0 -t
> /usr/local/postgres_replica/trigger  /usr/local/postgres_replica/log/ %f %p"
> cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No
> such file or directory
> cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No
> such file or directory
> cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No
> such file or directory

Looks like there's an issue with double slashes on the archive filename,
which I will fix. This probably isnt the problem though.

> then I updated the master with a batch of inserts, but after a while the
> slave stopped with
> these messages:
>
> LOG:  restored log file "000000010000000000000021" from archive
> LOG:  record with zero length at 0/21000048
> LOG:  invalid primary checkpoint record
> LOG:  restored log file "000000010000000000000020" from archive
> LOG:  restored log file "000000010000000000000021" from archive
> LOG:  invalid resource manager ID in secondary checkpoint record
> PANIC:  could not locate a valid checkpoint record
> LOG:  startup process (PID 19619) was terminated by signal 6
> LOG:  aborting startup due to startup process failure

Please run pg_controldata to print out the control file.
Backup all the files in case we need to inspect them.
What was the ending log sequence number (e.g. x/xxxx) from the previous
recovery? I'll see if I can re-create this.

> What did I do wrong? Is there any other procedure to follow to restart a
> stopped replication?

You're right, using the trigger is not the right way to stop/start the
standby. Just stop/start the standby server normally.

The trigger means that you'd like to perform a failover.

There is a patch not yet applied which will make a new version of
pg_standby. pg_standby's official status right now is beta, so please
expect, look for and report any issues you find. Thanks.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



pgsql-general by date:

Previous
From: Lexington Luthor
Date:
Subject: Re: Processing a work queue
Next
From: "Marcelo de Moraes Serpa"
Date:
Subject: "Protocol error. Session setup failed" (PostgreSQL 8.3devel/postgresql-8.3dev-600.jdbc3)