On Fri, Jun 2, 2017, at 11:51 AM, Alexander Kukushkin wrote:
There is one strange and awful thing I don't understand about restore_command: it is called for every single WAL segment postgres wants to apply (even if that segment already exists in pg_xlog) until the replica starts streaming from the master.
The real problem behind this question is being unable to bring a former master, demoted after a crash, back online, since the WAL segments required to get it to a consistent state were not archived while it was still a master, and local segments in pg_xlog are ignored when a restore_command is defined. The other replicas wouldn't be good candidates for promotion either, as they were way behind the master (because the last N WAL segments were not archived and streaming replication had a few seconds of delay).
Is this the correct list for such questions, or would it be more appropriate to ask elsewhere (e.g. pgsql-bugs)?
If there is no restore_command in recovery.conf, it works perfectly, i.e. postgres replays the existing WAL segments and at some point connects to the master and starts streaming from it.
When restore_command is defined, starting a replica can become a real problem, especially if restore_command is slow.
Is it possible to change this behavior somehow? That is, first look into pg_xlog and call restore_command only if the file is missing or "corrupted".
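To illustrate the order of checks I have in mind, here is a rough sketch written as a wrapper that could itself be used as restore_command. It is only an approximation of the idea, not how postgres works today. The "corrupted" check is reduced to a size comparison against the default 16 MB segment size, and the archive fetch command (/path/to/fetch_wal) is a hypothetical placeholder.

```python
import os
import shutil
import subprocess
import sys

# Default WAL segment size; an assumption, it can differ per build.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def local_segment_usable(path, expected_size=WAL_SEGMENT_SIZE):
    """Very rough "not corrupted" check: file exists and has the full size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size


def restore_segment(wal_name, dest_path, pg_xlog_dir, archive_cmd,
                    expected_size=WAL_SEGMENT_SIZE):
    """Return 0 on success, non-zero on failure, as restore_command expects.

    wal_name    -- segment file name (what %f expands to)
    dest_path   -- where postgres wants the file (what %p expands to)
    archive_cmd -- hypothetical fetch command, called as: cmd... %f %p
    """
    local = os.path.join(pg_xlog_dir, wal_name)
    if local_segment_usable(local, expected_size):
        # The segment is already in pg_xlog: copy it, skip the archive.
        shutil.copyfile(local, dest_path)
        return 0
    # Missing or suspicious locally: fall back to the archive.
    return subprocess.call(list(archive_cmd) + [wal_name, dest_path])


if __name__ == "__main__" and len(sys.argv) == 3:
    # restore_command runs with the data directory as working directory.
    sys.exit(restore_segment(sys.argv[1], sys.argv[2],
                             "pg_xlog", ["/path/to/fetch_wal"]))
```

Assuming the script is saved under a hypothetical path such as /path/to/restore_wrapper.py, the recovery.conf entry would look like restore_command = 'python /path/to/restore_wrapper.py %f %p'. A size check is of course much weaker than real validation of the segment contents, so this is a workaround at best.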
Regards,
---
Alexander Kukushkin