Hackers,
The .partial mechanism was added in de768844 to help avoid conflicts
between a newly promoted primary and an old primary that might produce
the same WAL segment. This works for a single promotion but can become
problematic in HA configurations where there may be several promotions
before a stable primary emerges.
Consider the following scenario:
1) A is the primary
2) B follows A as a standby
3) A is shut down in immediate mode
4) B is promoted and selects timeline 2
5) B archives 000000010000000100000001.partial
6) B archives 00000002.history
7) B goes away before archiving 000000020000000100000001
8) A is put into recovery
9) A is promoted and selects timeline 3
10) A can't archive 000000010000000100000001.partial because it already
exists
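To make the conflict concrete, here is a toy Python model of the scenario. It is only a sketch: the Archive class and wal_file_name() helper are hypothetical stand-ins for a real archive_command and for the server's segment naming (three 8-digit hex fields), and the file contents are placeholders.

```python
def wal_file_name(timeline: int, log: int, seg: int) -> str:
    """Build a 24-character WAL segment name from three 8-digit hex fields."""
    return f"{timeline:08X}{log:08X}{seg:08X}"


class Archive:
    """Toy archive that follows the 'never overwrite' recommendation."""

    def __init__(self) -> None:
        self.files = {}

    def push(self, name: str, data: bytes) -> bool:
        # Refuse to overwrite an existing file, as a careful archive_command would.
        if name in self.files:
            return False
        self.files[name] = data
        return True


archive = Archive()
partial = wal_file_name(1, 1, 1) + ".partial"

# Steps 5-6: B archives the partial segment and its history file.
b_ok = archive.push(partial, b"WAL as seen by B (placeholder)")
archive.push("00000002.history", b"history file (placeholder)")

# Step 10: A is promoted to timeline 3 and tries to archive the same name.
a_ok = archive.push(partial, b"WAL as seen by A (placeholder)")

print(partial, b_ok, a_ok)
```

The second push of the .partial name is rejected, which is the state A gets stuck in.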
We recommend that archive commands not overwrite an existing segment.
Some backup tools will compare the contents and succeed if they are
equal, but in this case that will still often fail because recycled WAL
segments will have different bytes at the end on the primary and
standby. The files may not even be logically the same because B may not
have received all WAL from A.
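A rough sketch of why the compare-and-succeed approach does not help here, in Python. The push_idempotent() function, the 32-byte segment size, and the filler bytes are all made up for illustration; real segments are much larger and the leftover bytes come from whatever the recycled file previously held.

```python
SEGMENT_SIZE = 32  # toy size, chosen only to keep the example small


def toy_segment(valid_wal: bytes, filler: bytes) -> bytes:
    """Valid WAL followed by leftover bytes from a recycled segment."""
    return valid_wal + filler * (SEGMENT_SIZE - len(valid_wal))


def push_idempotent(archive: dict, name: str, data: bytes) -> bool:
    """Succeed if the file is new or byte-for-byte identical to the stored copy."""
    existing = archive.get(name)
    if existing is None:
        archive[name] = data
        return True
    return existing == data


archive = {}
name = "000000010000000100000001.partial"

# Same valid WAL on both nodes, but different leftovers after it.
seg_b = toy_segment(b"same WAL", b"\x11")
seg_a = toy_segment(b"same WAL", b"\x22")

first = push_idempotent(archive, name, seg_b)   # new file, accepted
retry = push_idempotent(archive, name, seg_b)   # identical retry, accepted
other = push_idempotent(archive, name, seg_a)   # differs only in the tail, rejected

print(first, retry, other)
```

Even with identical valid WAL, the byte comparison fails on the tail of the segment.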
After some discussion with the Patroni folks, Stephen and I came up with
the idea of adding the timeline that the cluster is *promoting to* into
the .partial name to avoid these sorts of conflicts.
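As an illustration of the naming idea, here is a small Python sketch. The exact spelling is whatever the attached patch uses; the segment.timeline.partial layout and the partial_name() helper below are assumptions made up for this example.

```python
def partial_name(segment: str, new_timeline: int) -> str:
    """Hypothetical .partial name that embeds the timeline being promoted to."""
    return f"{segment}.{new_timeline:08X}.partial"


segment = "000000010000000100000001"

# B promotes to timeline 2, A later promotes to timeline 3.
name_from_b = partial_name(segment, 2)
name_from_a = partial_name(segment, 3)

print(name_from_b)
print(name_from_a)
```

Because the two promotions target different timelines, the two names differ and neither archive attempt collides with the other.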
However, there is still a race condition here. Since
000000010000000100000001.partial is archived first, the 00000002.history
file might not make it to the archive before B crashes. In that case A
will pick timeline 2 and still be stuck. That said, I'm thinking it
would be easy to teach pgarch_readyXlog() to return any .history files
it finds first (in order, of course).
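The ordering idea can be sketched in Python as follows. This is a toy model, not the C code in pgarch_readyXlog(): next_ready() is a hypothetical helper, and it assumes the archiver is choosing among a list of file names that are ready to be archived.

```python
from typing import List, Optional


def next_ready(ready: List[str]) -> Optional[str]:
    """Pick the next file to archive: .history files first, then by name."""
    if not ready:
        return None
    # False sorts before True, so .history files win; ties are broken by name.
    return min(ready, key=lambda name: (not name.endswith(".history"), name))


ready = [
    "000000010000000100000001.partial",
    "00000003.history",
    "00000002.history",
]

by_name_only = min(ready)          # plain name ordering picks the .partial
history_first = next_ready(ready)  # prioritized ordering picks the history file

print(by_name_only)
print(history_first)
```

With plain name ordering the .partial sorts ahead of 00000002.history, which is the window for the race described above; putting .history files first closes that window in this model.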
Another option would be to immediately archive the first WAL segment on
timeline 2 and forgo the .partial file entirely. In this case the
archiver will archive the 00000002.history file before
000000020000000100000001 and we avoid the race condition above. That
also means we could recover A and promote without a conflict on the
.partial. Or we could recover A along timeline 2.
Or we could do some combination of the above.
I have attached a patch that adds the timeline to the .partial file.
This passes check-world.
I think we should consider back-patching some set of these changes since
this causes real pain in current production HA configurations.
Thoughts?
--
-David
david@pgmasters.net