On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
> Hello,
> We have already started a discussion on pgsql-hackers about the problem of
> taking a fresh backup during the failback operation. Here is the link:
>
> http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q@mail.gmail.com
> Let me again summarize the problem we are trying to address.
> When the master fails, the last few WAL files may not reach the standby. But
> the master may have gone ahead and made changes to its local file system
> after flushing WAL to local storage. So the master contains some file
> system level changes that the standby does not have. At this point, the data
> directory of the master is ahead of the standby's data directory.
> Subsequently, the standby will be promoted as the new master. Later, when the
> old master wants to become a standby of the new master, it can't just join the
> setup, since there is an inconsistency between the two servers. We need
> to take a fresh backup from the new master. This can happen with both
> synchronous and asynchronous replication.
> A fresh backup is also needed in the case of a clean switch-over, because in
> the current HEAD the master does not wait for the standby to receive all the
> WAL up to the shutdown checkpoint record before shutting down the connection.
> Fujii Masao has already submitted a patch to handle the clean switch-over case,
> but the problem remains for the failback case.
> Taking a fresh backup is very time consuming when databases are very large,
> say several TB, and when the servers are connected over a relatively slow
> link. This would break the service level agreement of a disaster recovery
> system. So there is a need to improve the disaster recovery process in
> PostgreSQL. One way to achieve this is to maintain consistency between the
> master and the standby, which avoids the need for a fresh backup.
> So our proposal is that the master should not make any file system level
> changes without confirming that the corresponding WAL record has been
> replicated to the standby.
How will you take care of the extra WAL on the old master during recovery? If it
replays WAL that has not reached the new master, that can be a problem.
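To make the concern concrete, here is a rough sketch in Python of the check I have in mind. It is only an illustration, not anything from the patch: the function names and the sample LSN values are made up, and it assumes we can somehow learn the last WAL position the standby had received before promotion. The only PostgreSQL-specific fact it relies on is the textual LSN format, two hexadecimal numbers separated by a slash.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'hi/lo' hex text form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def extra_wal_bytes(old_master_flush_lsn: str, standby_received_lsn: str) -> int:
    """Bytes of WAL flushed locally on the old master that the standby never received.

    Both arguments are hypothetical inputs for illustration: the old master's
    last flushed WAL position, and the last position the standby had received
    before it was promoted.
    """
    diff = parse_lsn(old_master_flush_lsn) - parse_lsn(standby_received_lsn)
    return max(diff, 0)


def has_unreplicated_wal(old_master_flush_lsn: str, standby_received_lsn: str) -> bool:
    """True if recovery on the old master could replay WAL unknown to the new master."""
    return extra_wal_bytes(old_master_flush_lsn, standby_received_lsn) > 0


if __name__ == "__main__":
    # Made-up positions: the old master flushed a little past what the standby got.
    old_master = "0/3000060"
    standby = "0/3000000"
    print(extra_wal_bytes(old_master, standby))
    print(has_unreplicated_wal(old_master, standby))
```

If the second function returns true, then even when the data files are kept consistent as proposed, crash recovery on the old master would still apply those extra records locally, so they have to be dealt with in some way before it rejoins.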
With Regards,
Amit Kapila.