On Tue, Jul 31, 2018 at 02:55:58PM +0200, Emre Hasegeli wrote:
> == The Workarounds ==
>
> We can possibly work around this inside the "restore_command" or
> by delaying the archiving. Working around inside the "restore_command"
> would involve checking whether the file exists under pg_wal/. This
> should not be easy because the WAL file may be written partially. It
> should be easier for Postgres to do this as it knows where to stop
> processing the local WAL.
It is also not that complicated to check whether a WAL segment is
well-formed by running pg_waldump or similar on it, so perhaps that
would be good enough for your cases on back-branches?
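
As a rough sketch of that kind of check (nothing from the tree, just an
illustration): a Python helper that a restore_command wrapper could call
before trusting the copy in pg_wal/. The function names, the size check
against the default 16MB segment size, and treating a zero exit status
from pg_waldump as "properly shaped" are all assumptions here. How
pg_waldump reports a partially written segment may differ between
versions, so this would need testing on each back-branch.

```python
import os
import subprocess

# Default WAL segment size. Assumption: the cluster uses the default;
# it can be changed at build time (and at initdb time in newer releases).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def waldump_ok(segment_path):
    """Return True if pg_waldump exits with status 0 on the segment.

    Assumption: exit status 0 is taken to mean the segment parsed cleanly.
    """
    try:
        result = subprocess.run(
            ["pg_waldump", segment_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # pg_waldump is not installed or not executable: do not trust the file.
        return False
    return result.returncode == 0


def local_segment_usable(segment_path, validate=waldump_ok,
                         segment_size=WAL_SEGMENT_SIZE):
    """Hypothetical helper: decide whether a segment in pg_wal/ looks complete."""
    if not os.path.isfile(segment_path):
        return False
    # A segment of the wrong size cannot be a complete one.
    if os.path.getsize(segment_path) != segment_size:
        return False
    return validate(segment_path)
```

A wrapper would call local_segment_usable() on the pg_wal/ path built
from %f and go to the archive whenever it returns False.
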
> == The Change ==
>
> This "restore_command" behavior is coming from the initial archiving
> and point-in-time-recovery implementation [2]. The code says
> "the reason is that the file in XLOGDIR could be an old, un-filled or
> partly-filled version that was copied and restored as part of
> backing up $PGDATA." This was probably a good reason in 2004, but
> I don't think it still is. AFAIK "pg_basebackup" eliminates this
> problem.
pg_basebackup is not the only backup solution. Though I'd like folks
to use it more, it can be a bottleneck and still comes with its own
limitations, for example when streaming tar data with multiple
tablespaces...
> Also, with this reasoning, we should also try streaming from the
> master before trying the local WAL, but AFAIU we don't.
... You have a point here, things are rather inconsistent by this
argument. I have not worked on that in detail, but at least
WaitForWALToBecomeAvailable(), which enforces XLOG_FROM_ARCHIVE when the
current source is XLOG_FROM_PG_WAL, would need to be changed.
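
To make the ordering concrete, here is a toy Python model of how the
next WAL source gets picked. It is a simplification written from memory
of the behavior described above, not a transcription of the C code: the
Source enum, next_source() and its arguments are made up for the
illustration, and the handling of failures assumes a standby that can
stream.

```python
from enum import Enum


class Source(Enum):
    ARCHIVE = "XLOG_FROM_ARCHIVE"
    PG_WAL = "XLOG_FROM_PG_WAL"
    STREAM = "XLOG_FROM_STREAM"


def next_source(current, last_failed, in_archive_recovery=True):
    """Toy model of the source selection; hypothetical, not the real code."""
    if last_failed:
        # Assumption: archive and pg_wal are given up on together, then
        # streaming is tried; a failed stream goes back to the archive.
        if current in (Source.ARCHIVE, Source.PG_WAL):
            return Source.STREAM
        return Source.ARCHIVE
    if current is Source.PG_WAL and in_archive_recovery:
        # Even after a successful read from pg_wal, the next segment is
        # looked up in the archive first.
        return Source.ARCHIVE
    return current
```

In this model, the PG_WAL branch is the part that would have to change
for local WAL to be preferred over the archive.
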
--
Michael