Replication fell out of sync - Mailing list pgsql-general

From David Kerr
Subject Replication fell out of sync
Msg-id 20150302232510.GA21880@mr-paradox.net
Responses Re: Replication fell out of sync  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: Replication fell out of sync  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Howdy,

I had an instance where a replica fell out of sync with the master.

Now it's in a state where it can't catch up, because the master has already removed the WAL segment it needs.

(logs)
Mar  2 23:10:13 db13 postgres[11099]: [3-1] user=,db=,host= LOG:  streaming replication successfully connected to primary
Mar  2 23:10:13 db13 postgres[11099]: [4-1] user=,db=,host= FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000060000047C0000001F has already been removed
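For reference, a WAL segment name is 24 hex digits: 8 for the timeline ID, 8 for the high half of the log position, and 8 for the segment number within it. A minimal Python sketch (the helper names are hypothetical, and it assumes the default 16 MB segment size, which matches the 16777216-byte files in the pg_xlog listing) decodes the segment named in the error:

```python
# Hypothetical helpers; assumes the default 16 MB WAL segment size.
WAL_SEG_SIZE = 16 * 1024 * 1024  # 16777216 bytes


def parse_wal_segment_name(name):
    """Split a WAL file name into (timeline, log_id, segment)."""
    if len(name) != 24:
        raise ValueError("WAL segment names are 24 hex digits")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


def segment_start_lsn(log_id, segment, seg_size=WAL_SEG_SIZE):
    """Return the first log position in the segment, in X/Y notation."""
    return "%X/%X" % (log_id, segment * seg_size)


tli, log_id, seg = parse_wal_segment_name("000000060000047C0000001F")
print(tli, hex(log_id), hex(seg))        # 6 0x47c 0x1f
print(segment_start_lsn(log_id, seg))    # 47C/1F000000
```

In other words, the standby was asking for timeline 6, starting at log position 47C/1F000000.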


I was under the impression that when you set up streaming replication and also specify a restore_command such as:
restore_command= 'cp /arch/%f %p' 

Then even if the slave falls out of sync and the master removes the WAL segment, the slave can bring itself back into sync, as long as it can still retrieve the WAL files from the archive.
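That expectation matches the retry order the PostgreSQL documentation describes for a standby in standby_mode: restore from the archive via restore_command, then try the local pg_xlog directory, then stream from the primary, and go back to the archive if streaming fails. The following is a simplified Python model of that order, not PostgreSQL's code; the source callables are hypothetical stand-ins that return a segment's bytes or None.

```python
from typing import Callable, Optional

# Hypothetical source: takes a WAL segment name, returns its bytes or None.
Source = Callable[[str], Optional[bytes]]


def fetch_segment(name: str, archive: Source, pg_xlog: Source,
                  stream: Source, max_rounds: int = 3):
    """Simplified model of the documented standby retry order.

    Tries the archive (via restore_command), then the local pg_xlog
    directory, then streaming from the primary, and starts over from the
    archive if all three fail. The real server keeps retrying until it is
    stopped or failover is triggered; max_rounds only bounds this model.
    """
    for _ in range(max_rounds):
        for label, source in (("archive", archive),
                              ("pg_xlog", pg_xlog),
                              ("stream", stream)):
            data = source(name)
            if data is not None:
                return label, data
    return None


seg = "000000060000047C0000001F"
archived = {seg: b"<16 MB of WAL>"}
result = fetch_segment(seg,
                       archive=archived.get,
                       pg_xlog=lambda n: None,
                       stream=lambda n: None)  # primary already removed it
print(result[0])  # archive
```

This only models the order in which sources are tried; it does not reproduce how the real standby decides where to resume streaming.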


But that doesn't seem to be happening.

The restore_command is working:
# Slave's $PGDATA/pg_xlog/
-rw------- 1 postgres postgres 16777216 Mar  2 21:29 000000060000047C0000001F
-rwx------ 1 postgres postgres 16777216 Mar  2 23:13 RECOVERYXLOG

I'm on PG 9.2.7, which I know is old, but I'm upgrading shortly.

recovery.conf:
standby_mode      = 'on'
primary_conninfo  = 'host=pgmaster port=5432'
restore_command   = 'cp /arch/%f %p'
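In restore_command, PostgreSQL replaces %f with the name of the WAL file to fetch, %p with the destination path (relative to the data directory), %r with the name of the file holding the last valid restart point, and %% with a literal %. A minimal Python sketch of that substitution (the function name is hypothetical; the destination pg_xlog/RECOVERYXLOG is an assumption suggested by the RECOVERYXLOG file in the listing above) shows what the command expands to for the missing segment:

```python
def expand_restore_command(template, wal_file, dest_path, restart_file=""):
    """Expand restore_command placeholders; a sketch, not PostgreSQL's code."""
    replacements = {"f": wal_file, "p": dest_path, "r": restart_file, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "%" and nxt in replacements and nxt:
            out.append(replacements[nxt])  # known placeholder: substitute it
            i += 2
        else:
            out.append(ch)                 # anything else is copied as-is
            i += 1
    return "".join(out)


print(expand_restore_command("cp /arch/%f %p",
                             "000000060000047C0000001F",
                             "pg_xlog/RECOVERYXLOG"))
# cp /arch/000000060000047C0000001F pg_xlog/RECOVERYXLOG
```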

relevant info from postgresql.conf:
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
hot_standby = on
hot_standby_feedback = true


I know that to avoid this entirely I need to set wal_keep_segments higher, although in this particular case it wouldn't have mattered, because a rogue program slammed the database and 32, 64, or even 128 WAL segments went by in a short span of time.
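With the default 16 MB segment size (matching the 16777216-byte files in the listing above), wal_keep_segments translates directly into how much WAL the primary holds back for standbys. A quick Python check of the three values mentioned (the helper name is hypothetical):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # bytes; assumed default segment size


def wal_keep_bytes(wal_keep_segments, seg_size=WAL_SEG_SIZE):
    """WAL kept in pg_xlog for standbys by a given wal_keep_segments."""
    return wal_keep_segments * seg_size


for n in (32, 64, 128):
    print(n, wal_keep_bytes(n) // (1024 * 1024), "MB")
# 32 512 MB
# 64 1024 MB
# 128 2048 MB
```

So even 128 segments covers only about 2 GB of WAL; a burst that writes more than that before the standby catches up can still outrun it.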

However, I really thought that as long as PG could get the archived logs, I'd be able to recover.

Was I wrong in that assumption, or did I just run into a bug?

Thanks

