Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_receivewal starting position
Date
Msg-id CALj2ACV=U687py0cKWHomzAYsdVXK5K0M7gg2EWjLHq=gb6kTA@mail.gmail.com
Whole thread Raw
In response to Re: pg_receivewal starting position  (Michael Paquier <michael@paquier.xyz>)
Responses Re: pg_receivewal starting position  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> > Done. I haven't touched the timeline switch test patch for now, but I still
> > include it here for completeness.
>
> Thanks.  I have applied and back-patched 0001, then looked again at
> 0002 that adds READ_REPLICATION_SLOT:
> - Change the TLI to use int8 rather than int4, so as we will always be
> right with TimelineID which is unsigned (this was discussed upthread
> but I got back on it after more thoughts, to avoid any future
> issues).
> - Added an extra initialization for the set of Datum values, just as
> an extra safety net.
> - There was a bug with the timeline returned when executing the
> command while in recovery as ThisTimeLineID is 0 in the context of a
> standby, but we need to support the case of physical slots even when
> streaming archives from a standby.  The fix is similar to what we do
> for IDENTIFY_SYSTEM, where we need to use the timeline currently
> replayed from GetXLogReplayRecPtr(), before looking at the past
> timeline history using restart_lsn and the replayed TLI.
>
> With that in place, I think that we are good now for this part.

Thanks for the updated patch. I have following comments on v10:

1) It's better to initialize nulls with false, we can avoid setting
them to true. The instances where the columns are not nulls is going
to be more than the columns with null values, so we could avoid some
of nulls[i] = false; instructions.
+ MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
I suggest we do the following. The number of instances of hitting the
"else" parts will be less.
MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool));

    if (slot == NULL || !slot->in_use)
    {
        MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
        LWLockRelease(ReplicationSlotControlLock);
    }

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

The code will look like following:
        Oid            database;
        XLogRecPtr restart_lsn;

        /* Take required information from slot contents while holding
spinlock */
        SpinLockAcquire(&slot->mutex);
        database= slot->data.database;
        restart_lsn= slot->data.restart_lsn;
        SpinLockRelease(&slot->mutex);
        LWLockRelease(ReplicationSlotControlLock);

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: XTS cipher mode for cluster file encryption
Next
From: Bharath Rupireddy
Date:
Subject: Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir