Problem with hot standby - Mailing list pgsql-general
From | Michael Blake |
---|---|
Subject | Problem with hot standby |
Date | |
Msg-id | AANLkTi=Svt92zSdhKTC1H2dFXiqzTfj4E-1NMe0bDD06@mail.gmail.com |
List | pgsql-general |
I'm trying to set up a master/slave server, which initially worked fine, but recently started failing with the following error:

==============
LOG: database system was interrupted; last known up at [time]
LOG: could not open file "pg_xlog/00000001000000000000002B" (log file 0, segment 43): No such file or directory
LOG: invalid checkpoint record
PANIC: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.0/main/backup_label".
LOG: startup process (PID 31489) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
==============

This is an Ubuntu 10.04 machine, with all debian default configurations barring the following changes:

[Primary: postgresql.conf]
wal_level = hot_standby
max_wal_senders = 1
archive_mode = on
archive_command = 'cp -i %p /var/lib/postgresql/export/9.0/main/%f </dev/null' # Unix
log_statement = 'all'

[Secondary: postgresql.conf]
hot_standby = on

[Secondary: recovery.conf]
standby_mode = 'on'
primary_conninfo = 'host=10.168.60.41 port=5432 user=replication_sys password=XXXXXXXXX'
restore_command = 'cp /var/lib/postgresql/archive/9.0/main/%f "%p"'
#restore_command = '/usr/lib/postgresql/9.0/bin/pg_standby -c -d -s 2 -t /var/log/pgpool/trigger/trigger_file1 /var/lib/postgresql/archive/9.0/main %p >> /var/log/postgresql/postgresql-9.0-standby.log.1 1>&2'
#restore_command = '/usr/lib/postgresql/9.0/bin/pg_standby /var/lib/postgresql/archive/9.0/main %f %p %r'
#archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/archive/9.0/main %r'
#archive_command = 'cp %p /var/lib/postgresql/archive/9.0/main/%f'

The 'archive directory' mentioned above is an NFS mount of the primary server's /var/lib/postgresql/export/9.0/main directory. This is working fine, and I can see (in the archive directory on the recovery server) the pg_xlog file mentioned in the error above.
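Seeing the segment in a directory listing does not by itself show that restore_command can copy it, so a quick check of the file may help narrow things down. The following is only a sketch under assumptions: check_wal_segment is a hypothetical helper (not part of PostgreSQL), and the expected size of 16777216 bytes assumes the cluster uses the default 16 MB WAL segment size. It reports whether the segment exists, is readable by the current user, and has the expected length.

```shell
#!/bin/sh
# Hypothetical helper, not a PostgreSQL tool.
# Usage: check_wal_segment ARCHIVE_DIR SEGMENT_NAME [EXPECTED_BYTES]
check_wal_segment() {
    archive_dir=$1
    segment=$2
    # Assumption: default WAL segment size of 16 MB unless told otherwise.
    expected=${3:-16777216}
    file="$archive_dir/$segment"

    # The segment must exist as a regular file in the archive directory.
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # It must be readable by whoever runs this check.
    if [ ! -r "$file" ]; then
        echo "unreadable: $file"
        return 2
    fi
    # A short file suggests a partial copy (for example over NFS).
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -ne "$expected" ]; then
        echo "wrong size: $file is $size bytes, expected $expected"
        return 3
    fi
    echo "ok: $file"
    return 0
}
```

Run on the recovery server as the same user the database server runs as, it could be called as: check_wal_segment /var/lib/postgresql/archive/9.0/main 00000001000000000000002B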
The script I use to bring a server up to date after failure is as follows, run as the postgresql user:

================
#!/bin/sh
SERVER=10.168.60.41
VERSION="9.0"
CLUSTER="main"
DEST_CLUSTER="/var/lib/postgresql/$VERSION/$CLUSTER"
ARCHIVE_CLUSTER="/var/lib/postgresql/archive/$VERSION/$CLUSTER"

/etc/init.d/postgresql stop

echo "SELECT pg_start_backup('backup');" | psql --host $SERVER --user replication_sys template1

rm -rf $DEST_CLUSTER/pg_xlog

# Don't need to ignore postgresql.conf etc as they are in /etc/postgresql as per debian standard install
rsync -C -a -c --delete -e ssh --exclude pg_log --exclude pg_xlog --exclude postmaster.pid --exclude postmaster.opts $SERVER:$DEST_CLUSTER/* $DEST_CLUSTER/

mkdir -p $DEST_CLUSTER/pg_xlog/archive_status
chmod -R 700 $DEST_CLUSTER/pg_xlog

# stop the backup on the master
echo "SELECT pg_stop_backup();" | psql --host $SERVER --user replication_sys template1

/etc/init.d/postgresql start
================

So I believe I'm doing it right, just can't seem to crack why the pg_xlog error is happening.