Re: pg_receivewal starting position - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: pg_receivewal starting position |
Date | |
Msg-id | CALj2ACUqSdwrrYFae-_R=S_7CyczAcp9HwE=vRXNG_rf_xC+yA@mail.gmail.com Whole thread Raw |
In response to | Re: pg_receivewal starting position (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: pg_receivewal starting position
|
List | pgsql-hackers |
On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier <michael@paquier.xyz> wrote: > On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote: > > 2) As I previously mentioned, we are not copying the slot contents > > while holding the spinlock, it's just we are taking the memory address > > and releasing the lock, so there is a chance that the memory we are > > looking at can become unavailable or stale while we access > > slot_contents. So, I suggest we do the memcpy of the *slot to > > slot_contents. I'm sure the memcpy ing the entire ReplicationSlot > > contents will be costlier, so let's just take the info that we need > > (data.database, data.restart_lsn) into local variables while we hold > > the spin lock > > The style of the patch is consistent with what we do in other areas > (see pg_get_replication_slots as one example). > > > + /* Copy slot contents while holding spinlock */ > > + SpinLockAcquire(&slot->mutex); > > + slot_contents = *slot; > > And what this does is to copy the contents of the slot into a local > area (note that we use a NameData pointing to an area with > NAMEDATALEN). Aka if the contents of *slot are changed by whatever > reason (this cannot change as of the LWLock acquired), what we have > saved is unchanged as of this command's context. pg_get_replication_slots holds the ReplicationSlotControlLock until the end of the function so it can be assured that *slot contents will not change. In ReadReplicationSlot, the ReplicationSlotControlLock is released immediately after taking *slot pointer into slot_contents. Isn't it better if we hold the lock until the end of the function so that we can avoid the slot contents becoming stale problems? Having said that, the ReplicationSlotCreate, SearchNamedReplicationSlot (when need_lock is true) etc. release the lock immediately. I'm not sure if I should ignore this staleness problem in ReadReplicationSlot. > > 3) The data that the new command returns to the client can actually > > become stale while it is captured and in transit to the client as we > > release the spinlock and other backends can drop or alter the info. > > So, it's better we talk about this in the documentation of the new > > command and also in the comments saying "clients will have to deal > > with it." > > The same can be said with IDENTIFY_SYSTEM when the flushed location > becomes irrelevant. I am not sure that this has any need to apply > here. We could add that this is useful to get a streaming start > position though. I think that's okay, let's not make any changes or add any comments in regards to the above. The client is basically bound to get the snapshot of the data at the time it requests the database. > > 4) How about we be more descriptive about the error added? This will > > help identify for which replication slot the command has failed from > > tons of server logs which really help in debugging and analysis. > > I suggest we have this: > > errmsg("cannot use \"%s\" command with logical replication slot > > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname); > > instead of just a plain, non-informative, generic message: > > errmsg("cannot use \"%s\" with logical replication slots", > > Yeah. I don't mind adding the slot name in this string. Thanks. Regards, Bharath Rupireddy.
pgsql-hackers by date: