On Tue, Feb 19, 2019 at 09:25:57AM -0800, Andre Piwoni wrote:
> I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master
> when failover happens. archive_mode is set to "on" and not "always".
> I repoint slave to the master by stopping it, updating recovery.conf and
> restarting it. Let me know if I'm doing it wrong.
As long as you stop the primary cleanly (fast or smart mode), so that
the primary can send its shutdown checkpoint record to the standby,
and you make sure that the standby has flushed that record, that's
safe.
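
A minimal sketch of that sequence, in case it helps. The data directory
is the one from your message; the run() helper and the DRY_RUN switch
are hypothetical names made up for this example, and by default the
script only prints the commands. Each step is meant for the host named
in its comment, so treat it as a checklist rather than a ready-made
failover script.

```shell
#!/bin/sh
# Sketch of a clean switchover. PGDATA comes from the thread; DRY_RUN
# and run() are hypothetical helpers for this example only.
PGDATA="${PGDATA:-/var/lib/pgsql/10/data}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

switchover_steps() {
    # 1. Old primary: clean stop, which writes a shutdown checkpoint.
    run pg_ctl -D "$PGDATA" stop -m fast
    # 2. Old primary: look for "Database cluster state: shut down" and
    #    note the "Latest checkpoint location" value.
    run pg_controldata "$PGDATA"
    # 3. Standby: compare the received LSN with the checkpoint location
    #    noted in step 2 (function name as of PostgreSQL 10).
    run psql -Atc "SELECT pg_last_wal_receive_lsn()"
    # 4. Standby: promote once the standby has caught up.
    run pg_ctl -D "$PGDATA" promote
}

switchover_steps
```

The comparison in step 3 is left manual here on purpose; automating it
safely depends on how your failover tooling is set up.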
> I think this problem is created before promotion when new slave is created
> using pg_basebackup with --wal-method=stream and manifests when actual
> promotion happens.
> What I'm trying to say it does not seem that .partial extension is the
> issue here but lack of .done extension.
Well, sure. If you begin by reusing an old backup, you risk archiving
the same segment multiple times when all your servers share the same
archive location. Since 9.5 this has become even more complicated, as
archive_mode has gained an "always" mode which makes standbys archive
segments as well while in recovery. That gives users a switch for more
archiving redundancy, which is useful when working with async standbys
across multiple sites. My point is that this stuff has always worked
this way, and people do not actually complain about the difference in
archive_status/ between the stream and fetch methods of pg_basebackup.
From what I can see, your archive_command is also unsafe on many
points, so my take is that you should design it more carefully, or
rely on an existing backup solution developed by experts in PostgreSQL
backups. And no, it is not safe to change a behavior that other people
may have relied heavily on for their solutions ever since
pg_basebackup got smarter with its stream mode.
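
To illustrate the kind of care I mean, here is a rough sketch of an
archive helper that never overwrites a segment already present in a
shared archive. The function name archive_wal and its three arguments
are hypothetical, and this is not a complete solution: it does not
flush the copied file to disk, and the existence check and the rename
are not atomic with respect to each other, both of which a production
setup has to handle.

```shell
#!/bin/sh
# Hypothetical helper: archive_wal <segment path (%p)> <file name (%f)> <archive dir>
# Returns 0 only if the segment is safely present in the archive.
archive_wal() {
    src="$1"
    name="$2"
    dir="$3"
    dest="$dir/$name"

    if [ -f "$dest" ]; then
        # Already archived, for example by another server: succeed only
        # if the content is identical, and never overwrite it.
        cmp -s "$src" "$dest"
        return $?
    fi

    # Copy under a temporary name, then rename, so that a partially
    # written file never shows up under the final segment name.
    tmp="$dest.tmp.$$"
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # NOTE: no fsync here; durability is left out of this sketch.
    mv "$tmp" "$dest"
}
```

archive_command would then point at a small wrapper script that calls
archive_wal with %p, %f and the archive directory. A non-zero return
makes the server retry the segment later instead of silently replacing
a file that differs.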
--
Michael