Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: pg_receivewal starting position
Date
Msg-id 5742739.ga3mSNWIix@aivenronan
Whole thread Raw
In response to pg_receivewal starting position  (Ronan Dunklau <ronan.dunklau@aiven.io>)
List pgsql-hackers
Le vendredi 27 août 2021, 05:44:32 CEST Michael Paquier a écrit :
> On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
> > Following the discussion at [1], I refactored the implementation into
> > streamutil and added a third patch making use of it in pg_basebackup
> > itself in order to fail early if the replication slot doesn't exist, so
> > please find attached v2 for that.
>
> Thanks for the split.  That helps a lot.
>

Thank you very much for the review, please find attached an updated patchset.
I've also taken into account some remarks made by Bharath Rupireddy.

> +
> +
>  /*
>   * Run IDENTIFY_SYSTEM through a given connection and give back to caller
>
> The patch series has some noise diffs here and there, you may want to
> clean up that for clarity.

Ok, sorry about that.

>
> +   if (slot == NULL || !slot->in_use)
> +   {
> +       LWLockRelease(ReplicationSlotControlLock);
> +
> +       ereport(ERROR,
> +               (errcode(ERRCODE_UNDEFINED_OBJECT),
> LWLocks are released on ERROR, so no need for LWLockRelease() here.
>

Following your suggestion of not erroring out on an unexisting slot this point
is no longer be relevant, but thanks for pointing this out anyway.

> +    <listitem>
> +      <para>
> +      Read information about the named replication slot.  This is
> useful to determine which WAL location we should be asking the server
> to start streaming at.
>
> A nit.  You may want to be more careful with the indentation of the
> documentation.  Things are usually limited in width for readability.
> More <literal> markups would be nice for the field names used in the
> descriptions.

Ok.

>
> +   if (slot == NULL || !slot->in_use)
>
>                                                                     [...] +
>       ereport(ERROR,
> +               (errcode(ERRCODE_UNDEFINED_OBJECT),
> +                errmsg("replication slot \"%s\" does not exist",
> +                       cmd->slotname)));
> [...]
> +       if (PQntuples(res) == 0)
> +       {
> +               pg_log_error("replication slot %s does not exist",
> slot_name); +               PQclear(0);
> +               return false;
> So, the backend and ReadReplicationSlot() report an ERROR if a slot
> does not exist but pg_basebackup's GetSlotInformation() does the same
> if there are no tuples returned.  That's inconsistent.  Wouldn't it be
> more instinctive to return a NULL tuple instead if the slot does not
> exist to be able to check after real ERRORs in frontends using this
> interface?

The attached patch returns no tuple at all when the replication slot doesn't
exist. I'm not sure if that's what you meant by returning a NULL tuple ?

> A slot in use exists, so the error is a bit confusing here
> anyway, no?

From my understanding, a slot  *not* in use doesn't exist anymore, as such I
don't really understand this point. Could you clarify ?


>
> +    * XXX: should we allow the caller to specify which target timeline it
> wants +    * ?
> +    */
> What are you thinking about here?

I was thinking that maybe instead of walking back the timeline history from
where we currently are on the server, we could allow an additional argument
for the client to specify which timeline it wants. But I guess a replication
slot can not be present for a past, divergent timeline ? I have removed that
suggestion.

>
> -# restarts of pg_receivewal will see this segment as full..
> +# restarts of pg_receivewal will see this segment as full../
> Typo.

Ok.

>
> +   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4,
> "restart_lsn_timeline", +                             INT4OID, -1, 0);
> +   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5,
> "confirmed_flush_lsn_timeline", +                             INT4OID, -1,
> 0);
> I would call these restart_tli and confirmed_flush_tli., without the
> "lsn" part.

Ok.
>
> The patch for READ_REPLICATION_SLOT could have some tests using a
> connection that has replication=1 in some TAP tests.  We do that in
> 001_stream_rep.pl with SHOW, as one example.

Ok. I added the physical part to 001_stream_rep.pl, using the protocol
interface directly for creating / dropping the slot, and some tests for
logical replication slots to 006_logical_decoding.pl.

>
> -               'slot0'
> +               'slot0',                     '-p',
> +               "$port"
> Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like. I've added
this to PostgresNode.pm.

Best regards,


--
Ronan Dunklau
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Failure of subscription tests with topminnow
Next
From: Ronan Dunklau
Date:
Subject: Re: pg_receivewal starting position