Re: BUG #15591: pg_receivewal does not honor replication slots - Mailing list pgsql-bugs

From Jeff Janes
Subject Re: BUG #15591: pg_receivewal does not honor replication slots
Date
Msg-id CAMkU=1zTe8toCD+df9isTs_JOhexd-3f2o8PS=oFEHmcmde=tQ@mail.gmail.com
In response to Re: BUG #15591: pg_receivewal does not honor replication slots  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #15591: pg_receivewal does not honor replication slots
List pgsql-bugs
On Fri, Jan 11, 2019 at 1:50 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-01-11 16:52:42 +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      15591
> Logged by:          Jeff Janes
> Email address:      jeff.janes@gmail.com
> PostgreSQL version: 11.1
> Operating system:   all
> Description:       
>
> When you invoke pg_receivewal using --slot to give it the name of an
> existing slot which has WAL reserved, and -D pointing to an empty directory,
> it fast-forwards the slot's LSN reservation to the beginning of the most
> recent WAL file on the server, and starts streaming from there.  Rather than
> streaming from the LSN reservation point.

...
 
> Does this not utterly destroy the main point of using slots?  If I didn't
> want to ensure a gapless WAL stream, why use slots in the first place?

So the upstream server doesn't drop WAL that a standby (or something
like that) still needs?  It's pretty rare to randomly start to stream to
a different place.

I don't want to start it randomly.  I want to start it where the pg_basebackup (or some other backup method) using the same slot name left off, which is, not by coincidence, the same place as or later than where the slot itself left off.  I thought that was the point of slots, or at least the user-facing documentation implies it is, and I don't see that it disclaims it for this particular case.  It seems like pg_receivewal is a second-class citizen: it doesn't count as either a standby or as "something like that".  At least not when you are first transitioning from the base backup to it.  If you are resuming an interrupted or lagging pg_receivewal, then the slot does do its job.  So the slots appear to be global on the surface, but functionally they are local to pg_receivewal.

The barrier to fixing it is that the replication protocol offers neither a way to interrogate where a slot left off, nor a way to tell it to pick up where a slot left off (regressed to the start of the WAL file).  Other users of slots have a way to figure that out for themselves, but pg_receivewal (are there others?) does not.
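
To make "regressed to the start of the WAL file" concrete, here is a minimal sketch of the arithmetic, assuming the default 16MB WAL segment size and timeline 1. The function names are hypothetical and are not part of pg_receivewal or the replication protocol. It only shows how an LSN maps to the start of its segment and to the segment's file name.

```python
# Assumption: default wal_segment_size of 16MB; clusters initialized
# with a different --wal-segsize need a different value here.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex numbers) to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit integer back to the 'X/Y' LSN notation."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_start(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Round an LSN down to the first byte of the WAL segment containing it."""
    value = parse_lsn(lsn)
    return format_lsn(value - value % seg_size)


def wal_file_name(lsn: str, timeline: int = 1,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")


if __name__ == "__main__":
    print(segment_start("0/3000060"))   # 0/3000000
    print(wal_file_name("0/3000060"))   # 000000010000000000000003
```

A slot whose reservation point is 0/3000060 would therefore need streaming to begin at 0/3000000, the first byte of segment 000000010000000000000003.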

A workaround is to "seed" the directory about to be used by pg_receivewal by copying the last WAL file from the backup's pg_wal into it.  (And adding .partial to the end?  That probably isn't needed, as the end of backup does a log switch.)
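
The seeding step could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure: the function name and paths are hypothetical, the backup's pg_wal is assumed to hold plain 24-hex-digit segment files, and "last" is taken to mean the highest file name (which is only meaningful within a single timeline). It copies the file unchanged and does not add a .partial suffix, since whether that is needed is left open above.

```python
import re
import shutil
from pathlib import Path

# WAL segment files are named with exactly 24 uppercase hex digits;
# anything else (.backup, .history, archive_status/) is skipped.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def seed_receivewal_dir(backup_pg_wal, target_dir):
    """Copy the highest-named WAL segment from a backup's pg_wal into
    the (empty) directory that pg_receivewal will be pointed at.

    Hypothetical helper: returns the path of the copied file.
    """
    src = Path(backup_pg_wal)
    dst = Path(target_dir)
    segments = sorted(
        (p for p in src.iterdir()
         if p.is_file() and WAL_SEGMENT_NAME.match(p.name)),
        key=lambda p: p.name,
    )
    if not segments:
        raise FileNotFoundError(f"no WAL segment files found in {src}")
    last_segment = segments[-1]
    dst.mkdir(parents=True, exist_ok=True)
    copied = dst / last_segment.name
    shutil.copy2(last_segment, copied)
    return copied
```

For example, `seed_receivewal_dir("/backups/base/pg_wal", "/archive/wal")` (both paths illustrative) would be run once, before the first pg_receivewal invocation with -D pointing at the target directory.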

If this isn't a bug, then is there a way to document it so the end user knows what is going on?  Or is there existing documentation I am overlooking?  I guess the doc change would need to be in the pg_receivewal documentation, if the problem is unique to it.

Cheers,

Jeff
