Problem with hot standby - Mailing list pgsql-general

From Michael Blake
Subject Problem with hot standby
Date
Msg-id AANLkTi=Svt92zSdhKTC1H2dFXiqzTfj4E-1NMe0bDD06@mail.gmail.com
List pgsql-general
I'm trying to set up a master/slave server, which initially worked
fine, but recently started failing with the following error:

==============
LOG:  database system was interrupted; last known up at [time]
LOG:  could not open file "pg_xlog/00000001000000000000002B" (log file 0, segment 43): No such file or directory
LOG:  invalid checkpoint record
PANIC:  could not locate required checkpoint record
HINT:  If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.0/main/backup_label".
LOG:  startup process (PID 31489) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
==============
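
As a cross-check on the log line above: a WAL segment file name is 24 hexadecimal digits, 8 each for the timeline, the log file and the segment, so 00000001000000000000002B is timeline 1, log file 0, segment 0x2B = 43, which matches the message. A minimal shell sketch of that decoding (the function name wal_decode is made up for illustration, it is not a PostgreSQL tool):

```shell
#!/bin/sh
# Hypothetical helper: split a WAL segment file name into its three fields.
wal_decode() {
    name=$1
    # A segment name must be exactly 24 hex digits.
    case $name in
        ''|*[!0-9A-Fa-f]*) echo "not a WAL segment name: $name" >&2; return 1 ;;
    esac
    if [ ${#name} -ne 24 ]; then
        echo "not a WAL segment name: $name" >&2
        return 1
    fi
    tli=$(printf '%s' "$name" | cut -c1-8)    # timeline ID
    log=$(printf '%s' "$name" | cut -c9-16)   # log file number
    seg=$(printf '%s' "$name" | cut -c17-24)  # segment within the log file
    # Shell arithmetic converts the hex fields to decimal.
    echo "timeline $((0x$tli)), log file $((0x$log)), segment $((0x$seg))"
}

wal_decode 00000001000000000000002B
# prints: timeline 1, log file 0, segment 43
```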

This is an Ubuntu 10.04 machine with the default Debian configuration,
apart from the following changes:

[Primary: postgresql.conf]
wal_level = hot_standby
max_wal_senders = 1
archive_mode = on
archive_command = 'cp -i %p /var/lib/postgresql/export/9.0/main/%f </dev/null'  # Unix
log_statement = 'all'

[Secondary: postgresql.conf]
hot_standby = on

[Secondary: recovery.conf]
standby_mode = 'on'
primary_conninfo = 'host=10.168.60.41 port=5432 user=replication_sys password=XXXXXXXXX'
restore_command = 'cp /var/lib/postgresql/archive/9.0/main/%f "%p"'
#restore_command = '/usr/lib/postgresql/9.0/bin/pg_standby -c -d -s 2 -t /var/log/pgpool/trigger/trigger_file1 /var/lib/postgresql/archive/9.0/main %p >> /var/log/postgresql/postgresql-9.0-standby.log.1 1>&2'
#restore_command = '/usr/lib/postgresql/9.0/bin/pg_standby /var/lib/postgresql/archive/9.0/main %f %p %r'
#archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/archive/9.0/main %r'
#archive_command = 'cp %p /var/lib/postgresql/archive/9.0/main/%f'



The 'archive directory' mentioned above is an NFS mount of the primary
server's /var/lib/postgresql/export/9.0/main directory.
The mount is working fine, and in the archive directory on the recovery
server I can see the WAL segment named in the error above
(00000001000000000000002B).
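
Seeing the file in a directory listing does not by itself show that restore_command could copy it. A rough sketch of a stricter check, meant to be run on the standby as the same user the server runs as: the function name segment_restorable is hypothetical, and the default size of 16777216 bytes assumes a stock build with 16 MB WAL segments.

```shell
#!/bin/sh
# Hypothetical helper: is the archived segment present, readable and complete?
# usage: segment_restorable ARCHIVE_DIR SEGMENT [EXPECTED_BYTES]
segment_restorable() {
    archive_dir=$1
    segment=$2
    expected=${3:-16777216}   # assumption: default 16 MB segment size
    file="$archive_dir/$segment"

    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
    [ -r "$file" ] || { echo "not readable by $(id -un): $file" >&2; return 1; }

    # A partially copied segment would be shorter than a full one.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -ne "$expected" ]; then
        echo "unexpected size $size (wanted $expected): $file" >&2
        return 1
    fi
    echo "ok: $file"
}

# Example call for the segment from the error message:
# segment_restorable /var/lib/postgresql/archive/9.0/main 00000001000000000000002B
```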


The script I use to bring a server up to date after failure is as
follows, run as the postgresql user:

================
#!/bin/sh
SERVER=10.168.60.41
VERSION="9.0"
CLUSTER="main"
DEST_CLUSTER="/var/lib/postgresql/$VERSION/$CLUSTER"
ARCHIVE_CLUSTER="/var/lib/postgresql/archive/$VERSION/$CLUSTER"
/etc/init.d/postgresql stop

echo "SELECT pg_start_backup('backup');" | psql --host $SERVER --user replication_sys template1
rm -rf $DEST_CLUSTER/pg_xlog
# Don't need to ignore postgresql.conf etc as they are in /etc/postgresql
# as per debian standard install
rsync -C -a -c --delete -e ssh --exclude pg_log --exclude pg_xlog \
    --exclude postmaster.pid --exclude postmaster.opts \
    $SERVER:$DEST_CLUSTER/* $DEST_CLUSTER/
mkdir -p $DEST_CLUSTER/pg_xlog/archive_status
chmod -R 700 $DEST_CLUSTER/pg_xlog
# stop the backup on the master
echo "SELECT pg_stop_backup();" | psql --host $SERVER --user replication_sys template1
/etc/init.d/postgresql start
================
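
One state that may be worth ruling out before the final start: as far as I understand 9.0, restore_command is only consulted when recovery.conf is present in the data directory, and a data directory that holds backup_label but no recovery.conf will look for the checkpoint's WAL segment in pg_xlog alone. The sketch below only tests for that combination; the function name standby_preflight is hypothetical, and nothing here establishes that this is the cause of the error above.

```shell
#!/bin/sh
# Hypothetical pre-start check for a freshly rsynced standby data directory.
# usage: standby_preflight DATA_DIR
standby_preflight() {
    datadir=$1
    # backup_label is created by pg_start_backup() and copied over by rsync.
    if [ -f "$datadir/backup_label" ] && [ ! -f "$datadir/recovery.conf" ]; then
        echo "backup_label present but recovery.conf missing in $datadir" >&2
        # An earlier completed recovery renames recovery.conf to recovery.done.
        if [ -f "$datadir/recovery.done" ]; then
            echo "found recovery.done instead" >&2
        fi
        return 1
    fi
    return 0
}

# Example, just before "/etc/init.d/postgresql start":
# standby_preflight /var/lib/postgresql/9.0/main || exit 1
```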



I believe I'm doing it right, but I can't work out why the pg_xlog
error is happening.
