Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_receivewal starting position
Date
Msg-id CALj2ACUqSdwrrYFae-_R=S_7CyczAcp9HwE=vRXNG_rf_xC+yA@mail.gmail.com
Whole thread Raw
In response to Re: pg_receivewal starting position  (Michael Paquier <michael@paquier.xyz>)
Responses Re: pg_receivewal starting position
List pgsql-hackers
On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> > 2) As I previously mentioned, we are not copying the slot contents
> > while holding the spinlock, it's just we are taking the memory address
> > and releasing the lock, so there is a chance that the memory we are
> > looking at can become unavailable or stale while we access
> > slot_contents. So, I suggest we do the memcpy of the *slot to
> > slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> > contents will be costlier, so let's just take the info that we need
> > (data.database, data.restart_lsn) into local variables while we hold
> > the spin lock
>
> The style of the patch is consistent with what we do in other areas
> (see pg_get_replication_slots as one example).
>
> > + /* Copy slot contents while holding spinlock */
> > + SpinLockAcquire(&slot->mutex);
> > + slot_contents = *slot;
>
> And what this does is to copy the contents of the slot into a local
> area (note that we use a NameData pointing to an area with
> NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
> reason (this cannot change as of the LWLock acquired), what we have
> saved is unchanged as of this command's context.

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

Having said that, the ReplicationSlotCreate,
SearchNamedReplicationSlot (when need_lock is true) etc. release the
lock immediately. I'm not sure if I should ignore this staleness
problem in ReadReplicationSlot.

> > 3) The data that the new command returns to the client can actually
> > become stale while it is captured and in transit to the client as we
> > release the spinlock and other backends can drop or alter the info.
> > So, it's better we talk about this in the documentation of the new
> > command and also in the comments saying "clients will have to deal
> > with it."
>
> The same can be said with IDENTIFY_SYSTEM when the flushed location
> becomes irrelevant.  I am not sure that this has any need to apply
> here.  We could add that this is useful to get a streaming start
> position though.

I think that's okay, let's not make any changes or add any comments in
regards to the above. The client is basically bound to get the
snapshot of the data at the time it requests the database.

> > 4) How about we be more descriptive about the error added? This will
> > help identify for which replication slot the command has failed from
> > tons of server logs which really help in debugging and analysis.
> > I suggest we have this:
> > errmsg("cannot use \"%s\" command with logical replication slot
> > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> > instead of just a plain, non-informative, generic message:
> > errmsg("cannot use \"%s\" with logical replication slots",
>
> Yeah.  I don't mind adding the slot name in this string.

Thanks.

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: MDAM techniques and Index Skip Scan patch
Next
From: "Dian M Fay"
Date:
Subject: Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)