Re: persist logical slots to disk during shutdown checkpoint - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date:
Msg-id: CAA4eK1JZ7GL+xOf-OpNBpR9WuZeeS_wWrw-t7uUFTzSopUTdGg@mail.gmail.com
In response to: Re: persist logical slots to disk during shutdown checkpoint (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: persist logical slots to disk during shutdown checkpoint
List: pgsql-hackers
On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Sep 5, 2023 at 5:04 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > > Can't we just have code like this? I mean we will have to make
> > > ReplicationSlotMarkDirty take slot as an argument or have another version
> > > which takes slot as an argument and that would be called by us as well as by
> > > ReplicationSlotMarkDirty(). I mean why do we need these checks
> > > (s->data.invalidated == RS_INVAL_NONE &&
> > > s->data.confirmed_flush != s->last_saved_confirmed_flush) under the
> > > mutex? The walsender is shut down, so the confirmed flush LSN cannot move
> > > concurrently, and the slot cannot be invalidated either, because that is
> > > done by the checkpointer and we are in the checkpointer?
> > ...
> > Additionally, if we don't take the lock, we rely on the assumption that the
> > walsender will exit before the shutdown checkpoint. Currently, that's true
> > for the logical walsender, but a physical walsender can exit later than the
> > checkpointer. So, I am slightly worried that if we change the logical
> > walsender's exit timing in the future, the assumption may not hold.
> >
> > Besides, for non-built-in logical replication, if someone creates their own
> > walsender or other processes to send the changes and the process doesn't
> > exit before the shutdown checkpoint, it may also be a problem. Although I
> > don't have existing examples of such extensions, I feel taking the lock
> > would make it more robust.
>
> I think all our logic is based on the assumption that the walsender has
> already exited. If not, then even if you check under the mutex that the
> confirmed flush LSN has not changed, it can change right after you release
> the lock, and then we will not be flushing the latest update of the
> confirmed flush LSN to disk, and our logic of comparing checkpoint.redo
> with the confirmed flush LSN might not work.
Right, it can change, and in that case, the check related to confirmed_flush LSN will fail during the upgrade. However, the point is that if we don't take the spinlock, we need to properly write comments on why, unlike in other places, it is safe here to check these values without the spinlock. We can do that, but I feel we have to be careful about all future usages of these variables, so having the spinlock makes them follow the normal coding pattern, which I feel makes it more robust.

Yes, marking dirty via a common function also has merits, but personally, I find it better to follow the normal coding practice of checking the required fields under the spinlock. The other possibility is to first check under the spinlock whether we need to mark the slot dirty, then release the spinlock, and then call the common MarkDirty function, but again that will be under the assumption that these flags won't change.

--
With Regards,
Amit Kapila.