Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_receivewal starting position
Date
Msg-id CALj2ACV5daANc=Uu8jwkQ+x9CcSXB77f4win2CtnSwQTJKAfFg@mail.gmail.com
Whole thread Raw
In response to Re: pg_receivewal starting position  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: pg_receivewal starting position
List pgsql-hackers
On Mon, Sep 6, 2021 at 12:20 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
>
> Le lundi 6 septembre 2021, 06:22:30 CEST Michael Paquier a écrit :
> > On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> > > 0002 for pg_receivewal tried to simplify the logic of information to
> > > return, by using a dedicated struct for this. This accounts for Bahrath's
> > > demands to return every possible field.
> > > In particular, an is_logical field simplifies the detection of the type of
> > > slot. In my opinion it makes sense to simplify it like this on the client
> > > side while being more open-minded on the server side if we ever need to
> > > provide a new type of slot. Also, GetSlotInformation now returns an enum
> > > to be able to handle the different modes of failures, which differ
> > > between pg_receivewal and pg_basebackup.
> >
> > +   if (PQserverVersion(conn) < 150000)
> > +       return READ_REPLICATION_SLOT_UNSUPPORTED;
> > [...]
> > +typedef enum {
> > +   READ_REPLICATION_SLOT_OK,
> > +   READ_REPLICATION_SLOT_UNSUPPORTED,
> > +   READ_REPLICATION_SLOT_ERROR,
> > +   READ_REPLICATION_SLOT_NONEXISTENT
> > +} ReadReplicationSlotStatus;
> >
> > Do you really need this much complication?  We could treat the
> > unsupported case and the non-existing case the same way: we don't know
> > so just assign InvalidXlogRecPtr or NULL to the fields of the
> > structure, and make GetSlotInformation() return false just on error,
> > with some pg_log_error() where adapted in its internals.
>
> I actually started with the implementation you propose, but changed my mind
> while writing it because I realised it's easier to reason about like this,
> instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
> when actually trying to start the replication slot because it doesn't exists.
> Either the user expects the replication slot to exists, and in this case we
> should retry the whole loop in the hope of getting an interesting LSN, or the
> user doesn't and shouldn't even pass a replication_slot name to begin with.

I don't think so we need to retry the whole loop if the
READ_REPLICATION_SLOT command fails as pg_receivewal might enter an
infinite loop there. IMO, we should just exit(1) if
READ_REPLICATION_SLOT fails.

> > > There is still the concern raised by Bharath about being able to select
> > > the
> > > way to fetch the replication slot information for the user, and what
> > > should or should not be included in the documentation. I am in favor of
> > > documenting the process of selecting the wal start, and maybe include a
> > > --start-lsn option for the user to override it, but that's maybe for
> > > another patch.
> >
> > The behavior of pg_receivewal that you are implementing should be
> > documented.  We don't say either how the start location is selected
> > when working on an existing directory, so I would recommend to add a
> > paragraph in the description section to detail all that, as of:
> > - First a scan of the existing archives is done.
> > - If nothing is found, and if there is a slot, request the slot
> > information.
> > - If still nothing (aka slot undefined, or command not supported), use
> > the last flush location.
>
> Sounds good, I will add another patch for the documentation of this.

+1.

> > While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
> > grabbing the data of a logical slot.  We are not going to use that
> > with pg_recvlogical as by default START_STREAMING 0/0 would just use
> > the confirmed LSN.  Do we have use cases where this information would
> > help?  There is the argument of consistency with physical slots and
> > that this can be helpful to do sanity checks for clients, but that's
> > rather thin.
>
> If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
> and error out on the server if the slot is not actually a physical one to
> spare the client from performing those checks. I still think it's better to
> support both cases as opposed to having two completely different APIs
> (READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
> connection, pg_replication_slots view for logical ones) as it would enable
> more for third-party clients at a relatively low maintenance cost for us.

-1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
some other? IMO, READ_REPLICATION_SLOT should just return info of all
slots. The clients know better how to deal with the slot type.
Although, we don't have a use case for logical slots with the
READ_REPLICATION_SLOT command, let's not change it.

If others are still concerned about the unnecessary slot being
returned, you might consider passing in a parameter to
READ_REPLICATION_SLOT command, something like below. But this too
looks complex to me. I would vote for what the existing patch does
with READ_REPLICATION_SLOT.
READ_REPLICATION_SLOT 'slot_name' 'physical'; only returns physical
slot info with the given name.
READ_REPLICATION_SLOT 'slot_name' 'logical'; only returns logical slot
with the given name.

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Ronan Dunklau
Date:
Subject: Re: pg_receivewal starting position
Next
From: Laurenz Albe
Date:
Subject: Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead