On Fri, Jan 11, 2019 at 1:50 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-01-11 16:52:42 +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15591
> Logged by: Jeff Janes
> Email address: jeff.janes@gmail.com
> PostgreSQL version: 11.1
> Operating system: all
> Description:
>
> When you invoke pg_receivewal using --slot to give it the name of an
> existing slot which has WAL reserved, and -D pointing to an empty directory,
> it fast-forwards the slot's LSN reservation to the beginning of the most
> recent WAL file on the server, and starts streaming from there. Rather than
> streaming from the LSN reservation point.
...
> Does this not utterly destroy the main point of using slots? If I didn't
> want to ensure a gapless WAL stream, why use slots in the first place?
So the upstream server doesn't drop WAL that a standby (or something like that) still needs? It's pretty rare to randomly start to stream to a different place.
I don't want to start it randomly. I want to start it where the pg_basebackup (or some other backup method) using the same slot name left off, which is, not by coincidence, the same place as or later than where the slot itself left off. I thought that that was the point of slots--or at least the user-facing documentation implies it is, and I don't see that it disclaims it for this particular case. It seems like pg_receivewal is a second-class citizen: it doesn't count as either a standby or as "something like that". At least not when you are first transitioning from the base backup to it. If you are resuming an interrupted or lagging pg_receivewal, then the slot does do its job. So the slots appear to be global on the surface, but functionally they are local to pg_receivewal.
The barrier to fixing it is that the replication protocol offers neither a way to interrogate where a slot left off, nor a way to tell it to pick up where a slot left off (regressed to the start of the WAL file). Other users of slots have a way to figure that out for themselves, but pg_receivewal (are there others?) does not.
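To make "regressed to the start of the WAL file" concrete, here is a rough sketch of the arithmetic, not what pg_receivewal actually runs. It assumes the default 16MB WAL segment size and that the slot's position is already in hand as text (for example from the restart_lsn column of pg_replication_slots over an ordinary SQL connection, which is outside the replication protocol). The function names are made up for illustration.

```python
# Sketch only: assumes the default 16MB segment size (initdb --wal-segsize
# can change it) and hypothetical helper names.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Turn an LSN in 'X/Y' hex notation into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn):
    """Turn a 64-bit integer back into 'X/Y' hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def segment_start(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Round an LSN down to the first byte of the segment containing it."""
    return lsn - (lsn % seg_size)


def wal_file_name(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains the given LSN."""
    segments_per_id = 0x100000000 // seg_size
    segno = lsn // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_id,
        segno % segments_per_id,
    )


if __name__ == "__main__":
    slot_lsn = parse_lsn("0/3000060")
    print(format_lsn(segment_start(slot_lsn)))
    print(wal_file_name(1, slot_lsn))
```

With a slot position of 0/3000060 on timeline 1, streaming would have to begin at 0/3000000, in file 000000010000000000000003, for the stream to be gapless.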
A workaround is to "seed" the directory about to be used by pg_receivewal by copying the last WAL file from the backup's pg_wal into it. (And adding .partial to the end? That probably isn't needed, as the end of backup does a log switch.)
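A minimal sketch of that seeding step, in case it helps anyone trying it. The function name, its arguments, and the optional .partial switch are all hypothetical, and it assumes the "last" WAL file is simply the segment whose 24-character hex name sorts highest in the backup's pg_wal.

```python
import re
import shutil
from pathlib import Path

# WAL segment files have 24 upper-case hex characters as their name.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def seed_receive_dir(backup_pg_wal, receive_dir, partial=False):
    """Copy the newest WAL segment from a backup's pg_wal into the
    (empty) directory that pg_receivewal is about to use.

    Hypothetical helper: 'partial' appends a .partial suffix, which is
    probably unnecessary because the end of backup does a log switch.
    """
    segments = sorted(
        (p for p in Path(backup_pg_wal).iterdir()
         if p.is_file() and WAL_NAME.match(p.name)),
        key=lambda p: p.name,
    )
    if not segments:
        raise FileNotFoundError("no WAL segments in %s" % backup_pg_wal)

    last = segments[-1]
    target_dir = Path(receive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (last.name + (".partial" if partial else ""))
    shutil.copy2(last, target)
    return target
```

After something like seed_receive_dir("/path/to/backup/pg_wal", "/path/to/receive_dir") (both paths are placeholders), pg_receivewal would be started with -D pointing at the seeded directory and the same --slot.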
If this isn't a bug, then is there a way to document it so the end user knows what is going on? Or is there existing documentation I am overlooking? I guess the doc change would need to be in pg_receivewal, if the problem is unique to it.