Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: pg_receivewal starting position
Date
Msg-id 15651763.8WXi7s64xO@aivenronan
Whole thread Raw
In response to Re: pg_receivewal starting position  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: pg_receivewal starting position  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
> At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
> wrote in
> > Hello,
> >
> > I've notived that pg_receivewal logic for deciding which LSN to start
> >
> > streaming at consists of:
> >   - looking up the latest WAL file in our destination folder, and resume
> >   from
> >
> > here
> >
> >   - if there isn't, use the current flush location instead.
> >
> > This behaviour surprised me when using it with a replication slot: I was
> > expecting it to start streaming at the last flushed location from the
> > replication slot instead. If you consider a backup tool which will take
> > pg_receivewal's output and transfer it somewhere else, using the
> > replication slot position would be the easiest way to ensure we don't
> > miss WAL files.
> >
> > Does that make sense ?
> >
> > I don't know if it should be the default, toggled by a command line flag,
> > or if we even should let the user provide a LSN.
>
> *I* think it is completely reasonable (or at least convenient or less
> astonishing) that pg_receivewal starts from the restart_lsn of the
> replication slot to use.  The tool already decides the clean-start LSN
> a bit unusual way. And it seems to me that proposed behavior can be
> the default when -S is specified.
>

As of now we can't get the replication_slot restart_lsn with a replication
connection AFAIK.

This implies that the patch could require the user to specify a maintenance-db
parameter, and we would use that if provided to fetch the replication slot
info, or fallback to the previous behaviour. I don't really like this approach
as the behaviour changing wether we supply a maintenance-db parameter or not
is error-prone for the user.

Another option would be to add a new replication command (for example
ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
current one, and return some info about it (restart_lsn at least for a
physical slot).

I don't see any reason not to make it work for logical replication connections
/ slots, but it wouldn't be that useful since we can query the database in
that case.

Acquiring the replication slot instead of just reading it would make sure that
no other process could start the replication between the time we read the
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
check if we already have a replication slot, and ensure it is the same one as
the one we're trying to use.

From pg_receivewal point of view, this would amount to:

 - check if we currently have wal in the target directory.
   - if we do, proceed as currently done, by computing the start lsn and
timeline from the last archived wal
  - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
restart_lsn as the start lsn if there is one, and don't provide a timeline
  - if we still don't have a start_lsn, fallback to using the current server
wal position as is done.

What do you think ? Which information should we provide about the slot ?


--
Ronan Dunklau





pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Added schema level support for publication.
Next
From: Amul Sul
Date:
Subject: Re: [Patch] ALTER SYSTEM READ ONLY