Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date:
Msg-id: CAA4eK1LvssAcp61qg957xRF0=K4xF0hfJQjQ6c9V7EGHSqraPg@mail.gmail.com
In response to: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly ("Vitaly Davydov" <v.davydov@postgrespro.ru>)
List: pgsql-hackers
On Thu, Jun 5, 2025 at 8:51 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
>
> Dear Alexander, Amit
>
> Alexander Korotkov wrote:
> > Also, I've changed ReplicationSlotsComputeRequiredLSN() call to
> > CheckPointReplicationSlots() to update required LSN after
> > SaveSlotToPath() updated last_saved_restart_lsn. This helps to pass
> > checks in 001_stream_rep.pl without additional hacks.
>
> Thank you for the improvement and patch preparation. I confirm the test is
> passed without additional hacks now.
>
> I sill do not understand why this solution is favored. It is, in my opinion,
> a non backward-compatible solution. In any case, I'm ok to go with this patch.
> If needed, I may prepare a backward-compatible solution where
> last_saved_restart_lsn values will be in an another place of the shmem, rather
> than in ReplicationSlot struct.
>

I think we can use this approach for HEAD and probably keep the previous idea
for backbranches. Keeping some value in shared memory per slot sounds risky to
me in terms of introducing new bugs.

> I still would like to add my 5 cents to the discussion.
>
> The purpose of the xmin value is to prevent tuples from vacuuming. Slots'
> restart_lsn values are used to calculate the oldest lsn to keep WAL segments
> from removal in checkpoint. These processes are pretty independent.
>
> The logical slots are advanced in 2 steps. At the first step, the logical
> decoding stuff periodically sets consistent candidate values for catalog_xmin
> and restart_lsn. At the second step, when LogicalConfirmReceivedLocation is
> called, the candidate values are assigned on catalog_xmin and restart_lsn
> values based on the confirmed lsn value. The slot is saved with these
> consistent values. It is important, that the candidate values are consistent,
> decoding guarantees it. In case of a crash, we should guarantee that the
> loaded from the disk catalog_xmin and restart_lsn values are consistent and
> valid for logical slots. LogicalConfirmReceivedLocation function keeps this
> consistency by updating them from consistent candidate values in a single
> operation.
>
> We have to guarantee that we use saved to disk values to calculate xmin
> horizon and slots' oldest lsn. For this purpose, effective_catalog_xmin is
> used. We update effective_catalog_xmin in LogicalConfirmReceivedLocation just
> after saving slot to disk. Another place where we update effective_catalog_xmin
> is when walsender receives hot standby feedback message.
>
> Once, we have two independent processes (vacuuming, checkpoint), we can
> calculate xmin horizon and oldest WAL lsn values independently (at different
> times) from the saved to disk values. Note, these values are updated in a non
> atomic way.
>
> The xmin value is set when the node receives hot standby feedback and it is
> used to keep tuples from vacuuming as well as catalog_xmin for decoding stuff.
>

Yeah, but with physical slots, it is possible that the slot's xmin value is
pointing to some value, say 700 (after restart), but vacuum would have removed
tuples from transaction IDs greater than 700 as explained in email [1].
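To make that hazard concrete, here is a minimal standalone sketch (the
MiniSlot struct and variable names are made up for illustration; this is not
the actual slot code): the slot file is written with xmin = 700, the in-memory
value then advances and vacuum acts on the advanced value, and after a hard
restart the value reloaded from disk again points behind what vacuum has
already used.

/* Hypothetical names; a simplified model, not the actual PostgreSQL code. */
#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct MiniSlot
{
    TransactionId xmin;             /* in-memory value, advanced by feedback */
} MiniSlot;

static TransactionId on_disk_xmin;  /* what the slot file currently holds */

static void
save_slot(const MiniSlot *slot)
{
    on_disk_xmin = slot->xmin;      /* pretend the slot file was rewritten */
}

int
main(void)
{
    MiniSlot      slot = {.xmin = 700};
    TransactionId vacuum_horizon;

    save_slot(&slot);               /* slot persisted with xmin = 700 */

    slot.xmin = 750;                /* feedback advances the in-memory xmin */
    vacuum_horizon = slot.xmin;     /* vacuum may now remove dead tuples with
                                     * xids between 700 and 750 */

    /* ... hard restart before the slot is saved again ... */
    slot.xmin = on_disk_xmin;       /* reloaded from disk: back to 700 */

    printf("xmin after restart: %u, but vacuum already used %u\n",
           (unsigned) slot.xmin, (unsigned) vacuum_horizon);
    return 0;
}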
> Not sure, xmin is applicable for logical replication.
>
> The confirmed flush lsn is used as a startpoint when a peer node doesn't
> provide the start lsn and to check that the start lsn is not older than the
> latest confirmed flush lsn. The saving of the slot on disk at each call of
> LogicalConfirmReceivedLocation doesn't help to avoid conflicts completely,
> but it helps to decrease the probability of conflicts.

We don't save slots at each call of LogicalConfirmReceivedLocation(), and even
when we do save, it is not to avoid conflicts but to avoid removing required
WAL segments and tuples.

> So, i'm still not sure, we need to save logical slots on each advance to
> avoid conflicts, because it doesn't help in general. The conflicts should be
> resolved by other means.
>
> Once, we truncate old wal segments in checkpoint only. I believe, it is ok
> if we calculate the oldest lsn only at the beginning of the checkpoint, as it
> was in the alternative solution. I think, we can update xmin horizon in
> checkpoint only but the horizon advancing will be more lazy in this case.
>
> Taking into account these thoughts, I can't see any problems with the
> alternative patch where oldest wal lsn is calculated only in checkpoint.

The alternative will needlessly prevent removing WAL segments in some cases
when logical slots are in use.
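To illustrate the difference, a minimal standalone sketch (hypothetical
MiniSlot and compute_required_lsn names, not the server code): with the
committed approach, CheckPointReplicationSlots() saves the slots and then
ReplicationSlotsComputeRequiredLSN() recomputes the cutoff from the just-saved
last_saved_restart_lsn, so the WAL removal done later in that same checkpoint
can already reflect the newer value; computing the cutoff only at the
beginning of the checkpoint keeps that WAL until the next checkpoint.

/* Hypothetical names; a simplified model, not the server's checkpoint code. */
#include <stdio.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct MiniSlot
{
    XLogRecPtr  restart_lsn;            /* in-memory value */
    XLogRecPtr  last_saved_restart_lsn; /* value in the slot file */
} MiniSlot;

/* cutoff below which old WAL segments may be removed */
static XLogRecPtr
compute_required_lsn(const MiniSlot *slot)
{
    return slot->last_saved_restart_lsn;
}

int
main(void)
{
    MiniSlot    slot = {.restart_lsn = 0x2000000,
                        .last_saved_restart_lsn = 0x1000000};

    /* alternative: cutoff taken once, at checkpoint start, from the old value */
    XLogRecPtr  cutoff_at_start = compute_required_lsn(&slot);

    /* the checkpoint itself saves the slot, making the newer value durable */
    slot.last_saved_restart_lsn = slot.restart_lsn;

    /* committed approach: recompute after the save, within the same checkpoint */
    XLogRecPtr  cutoff_after_save = compute_required_lsn(&slot);

    printf("cutoff at checkpoint start: %llX (keeps the older WAL around)\n",
           (unsigned long long) cutoff_at_start);
    printf("cutoff after saving slots:  %llX (lets this checkpoint remove it)\n",
           (unsigned long long) cutoff_after_save);
    return 0;
}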
[1] - https://www.postgresql.org/message-id/CAA4eK1KMaPA5jir_SFu%2Bqr3qu55OOdFWVZpuUkqTSGZ9fyPpHA%40mail.gmail.com

--
With Regards,
Amit Kapila.