David Steele <david@pgmasters.net> writes:
> On 9/18/19 9:40 PM, Ron wrote:
>
>>
>> I'm concerned with one pgbackrest process stepping over another one and
>> the restore (or the "pg_ctl start" recovery phase) accidentally
>> corrupting the production database by writing WAL files to the original
>> cluster.
>
> This is not an issue unless you seriously game the system. When a

And/or your recovery system is running archive_mode=always :-)

I don't know how popular that setting is, but combine it with an
archive_command identical to the origin's and you get duplicate
archival, with whatever consequences follow.

Disclaimer: I don't know whether pgbackrest guards against such a
configuration.

> cluster is promoted it selects a new timeline and all WAL will be
> archived to the repo on that new timeline. It's possible to promote a
> cluster without a timeline switch by tricking it but this is obviously a
> bad idea.
>
> So, if you promote the new cluster and forget to disable archive_command
> there will be no conflict because the clusters will be generating WAL on
> separate timelines.
>
> In the case of a future failover a higher timeline will be selected so
> there still won't be a conflict.
>
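
To make the timeline argument above concrete, here is a minimal sketch of
how WAL segment file names embed the timeline ID. The helper names
(wal_segment_name, timeline_of) and the sample log/segment numbers are
made up for illustration; only the 24-hex-digit layout (8 digits each for
timeline, log, and segment) comes from PostgreSQL itself.

```python
# Toy illustration of PostgreSQL WAL segment naming (24 hex digits):
# 8 for the timeline ID, 8 for the log number, 8 for the segment number.
# Helper names and sample values are hypothetical.


def wal_segment_name(timeline: int, log: int, seg: int) -> str:
    """Build a WAL segment file name from its three components."""
    return f"{timeline:08X}{log:08X}{seg:08X}"


def timeline_of(name: str) -> int:
    """Extract the timeline ID from the first 8 hex digits of the name."""
    return int(name[:8], 16)


# Same log/segment position, but the promoted cluster is on timeline 2.
original = wal_segment_name(1, 0x2A, 0x10)
promoted = wal_segment_name(2, 0x2A, 0x10)

print(original)              # 000000010000002A00000010
print(promoted)              # 000000020000002A00000010
print(original != promoted)  # True: the file names cannot collide
```

Because the timeline is part of the file name, two clusters at the same
WAL position on different timelines archive under different names.
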
> Unfortunately, that dead WAL from the rogue cluster will persist in the
> repo until a PostgreSQL upgrade because expire doesn't know when it can
> be removed since it has no context. We're not quite sure how to handle
> this but it seems a relatively minor issue, at least as far as
> consistency is concerned.
>
> If you do have a split-brain situation where two primaries are archiving
> on the same timeline then first-in wins. WAL from the losing primary
> will be rejected.
>
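
The "first-in wins" rule quoted above can be sketched with a toy model.
This is not pgbackrest's implementation: ToyArchive and push() are
hypothetical names, and accepting a re-push of byte-identical content is
an assumption made here for illustration.

```python
import hashlib


class ToyArchive:
    """Toy model of a WAL repository where the first push of a name wins."""

    def __init__(self) -> None:
        # Maps WAL segment name -> SHA-1 hex digest of the stored content.
        self._segments: dict[str, str] = {}

    def push(self, name: str, data: bytes) -> bool:
        """Return True if the segment is accepted, False if rejected."""
        digest = hashlib.sha1(data).hexdigest()
        existing = self._segments.get(name)
        if existing is None:
            self._segments[name] = digest  # first writer wins
            return True
        # Assumption: an identical re-push is harmless and accepted;
        # different content under the same name is rejected.
        return existing == digest


repo = ToyArchive()
name = "000000010000002A00000010"
print(repo.push(name, b"WAL from primary A"))  # True
print(repo.push(name, b"WAL from primary B"))  # False
```

In this model the losing primary's archive attempt keeps failing, and the
segment already stored for that name is never overwritten.
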
> Regards,
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net