Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALj2ACXQNGXokgx8APwdxrG4MHMF=cOz6XQtUL7EHua9oUfkgA@mail.gmail.com |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
Re: Introduce XID age and inactive timeout based replication slot invalidation
Re: Introduce XID age and inactive timeout based replication slot invalidation |
List | pgsql-hackers |
Hi,

On Mon, Jun 17, 2024 at 5:55 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Here are my thoughts on when to do the XID age invalidation. In all
> the patches sent so far, the XID age invalidation happens in two
> places: once during slot acquisition, and again during the
> checkpoint. The suggestion is to do it during vacuum (manual
> and auto), so that even if checkpoints aren't happening in the
> database for whatever reason, a vacuum command or autovacuum can
> invalidate the slots whose XID is aged.
>
> An idea is to check for XID age based invalidation for all the slots
> in ComputeXidHorizons() before it reads replication_slot_xmin and
> replication_slot_catalog_xmin, and obviously before the proc array
> lock is acquired. A potential problem with this approach is that the
> invalidation check can become too aggressive, as XID horizons are
> computed from many places.
>
> Another idea is to check for XID age based invalidation for all the
> slots at a higher level than ComputeXidHorizons(), for example in
> vacuum(), which is an entry point for both the vacuum command and
> autovacuum. This approach is similar to the vacuum_failsafe_age GUC,
> which checks each relation against the failsafe age as vacuum
> processes it.

I am attaching the patches implementing the idea of invalidating replication slots during vacuum when the current slot xmin limits (procArray->replication_slot_xmin and procArray->replication_slot_catalog_xmin) are aged as per the new XID age GUC. When either of these limits is aged, there must be at least one replication slot that is aged, because the xmin limits are, after all, the minimum of the xmin or catalog_xmin of all replication slots. In this approach, the new XID age GUC helps vacuum when needed, because the current slot xmin limits are recalculated after invalidating the replication slots whose xmins are older than the configured age.
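To illustrate why an aged limit implies at least one aged slot, here is a small standalone C sketch. It is a simplification under stated assumptions: TransactionId is modeled as a plain uint32_t, the comparison mimics TransactionIdPrecedes() but ignores the special XIDs below FirstNormalTransactionId, and SlotXmins, min_slot_xid(), find_aged_slot() and max_age are hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, simplified view of one slot's xmin fields. */
typedef struct SlotXmins
{
    TransactionId xmin;         /* InvalidTransactionId if unset */
    TransactionId catalog_xmin; /* InvalidTransactionId if unset */
} SlotXmins;

/* Wraparound-aware "a is older than b", in the style of TransactionIdPrecedes()
 * (special XIDs are ignored for brevity). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Age of xid as seen from next_xid; unsigned subtraction handles wraparound. */
static uint32_t
xid_age(TransactionId next_xid, TransactionId xid)
{
    return next_xid - xid;
}

/* Oldest valid xmin (or catalog_xmin) across all slots: this is what the
 * replication_slot_xmin / replication_slot_catalog_xmin limits represent. */
static TransactionId
min_slot_xid(const SlotXmins *slots, size_t n, bool catalog)
{
    TransactionId result = InvalidTransactionId;

    for (size_t i = 0; i < n; i++)
    {
        TransactionId x = catalog ? slots[i].catalog_xmin : slots[i].xmin;

        if (x == InvalidTransactionId)
            continue;
        if (result == InvalidTransactionId || xid_precedes(x, result))
            result = x;
    }
    return result;
}

/* If the limit is older than max_age, return the index of a slot holding it
 * back; one must exist because the limit is one of the slots' own values.
 * Returns -1 when the limit is not aged (or no slot holds an xmin). */
static long
find_aged_slot(const SlotXmins *slots, size_t n, bool catalog,
               TransactionId next_xid, uint32_t max_age)
{
    TransactionId limit = min_slot_xid(slots, n, catalog);

    if (limit == InvalidTransactionId || xid_age(next_xid, limit) <= max_age)
        return -1;

    for (size_t i = 0; i < n; i++)
    {
        TransactionId x = catalog ? slots[i].catalog_xmin : slots[i].xmin;

        if (x == limit)
            return (long) i;
    }
    assert(false);              /* unreachable: the limit came from a slot */
    return -1;
}
```

So checking only the two limits is enough to decide whether any slot needs to be examined at all; the per-slot scan happens only when a limit is aged.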
The code is placed in vacuum(), which is common to both the vacuum command and autovacuum, and is executed only once per vacuum cycle so as not to be too aggressive in invalidating. However, there might be some concerns with this approach, such as the following:

1) Adding more code to vacuum might not be acceptable.
2) What if invalidation of replication slots emits an error; will it block vacuum forever? Currently, InvalidateObsoleteReplicationSlots() is also called as part of the checkpoint, and emitting ERRORs from within it is already avoided. Therefore, there is no concern here for now.
3) What if there are many replication slots to be invalidated; will it delay the vacuum? If yes, by how much? <<TODO>>
4) Will invalidation based on just the current replication slot xmin limits suffice irrespective of vacuum cutoffs? IOW, what if the replication slots are invalidated but vacuum isn't going to do any work because the vacuum cutoffs are not yet met? Is the invalidation work wasteful here?
5) Is it okay to take the proc array lock one more time, once every vacuum cycle, to get the current replication slot xmin limits via ProcArrayGetReplicationSlotXmin()? <<TODO>>
6) The vacuum command can't be run on a standby in recovery. So, to help invalidate replication slots on the standby, I have for now let the checkpointer also do the XID age based invalidation. I know invalidating in both the checkpointer and vacuum may not be a great idea, but I'm open to thoughts.

Following are some alternative approaches which IMHO don't help vacuum when needed:

a) Let the checkpointer do the XID age based invalidation, and call out in the documentation that if checkpoints don't happen, the new GUC doesn't help even if vacuum is run. This has been the approach up to the v40 patch.
b) The checkpointer and/or other backends add an autovacuum work item via AutoVacuumRequestWork(), and autovacuum, when it gets to it, invalidates the replication slots. But what to do for the vacuum command here?
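The once-per-cycle check that the message places in vacuum() can be pictured roughly as follows. This is a minimal standalone sketch, not the v41 patch: Slot, SlotLimits, compute_limits() and maybe_invalidate_aged_slots() are hypothetical stand-ins for the real slot array, ProcArrayGetReplicationSlotXmin() and InvalidateObsoleteReplicationSlots(); max_age stands in for the new XID age GUC (0 meaning disabled, an assumption); and invalidating a slot is assumed to simply release its xmin and catalog_xmin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, simplified slot; not PostgreSQL's ReplicationSlot. */
typedef struct Slot
{
    TransactionId xmin;
    TransactionId catalog_xmin;
    bool invalidated;
} Slot;

/* Hypothetical stand-in for the two procArray slot xmin limits. */
typedef struct SlotLimits
{
    TransactionId xmin;
    TransactionId catalog_xmin;
} SlotLimits;

static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* True if xid is set and older than max_age relative to next_xid. */
static bool
xid_is_aged(TransactionId next_xid, TransactionId xid, uint32_t max_age)
{
    return xid != InvalidTransactionId && (uint32_t) (next_xid - xid) > max_age;
}

/* Keep the older of cur and x, treating InvalidTransactionId as "unset". */
static TransactionId
oldest(TransactionId cur, TransactionId x)
{
    if (x == InvalidTransactionId)
        return cur;
    if (cur == InvalidTransactionId || xid_precedes(x, cur))
        return x;
    return cur;
}

/* Recompute both limits from the slots that are still valid. */
static SlotLimits
compute_limits(const Slot *slots, size_t n)
{
    SlotLimits l = {InvalidTransactionId, InvalidTransactionId};

    for (size_t i = 0; i < n; i++)
    {
        if (slots[i].invalidated)
            continue;
        l.xmin = oldest(l.xmin, slots[i].xmin);
        l.catalog_xmin = oldest(l.catalog_xmin, slots[i].catalog_xmin);
    }
    return l;
}

/* Called once at the start of a (hypothetical) vacuum cycle.
 * Returns the number of slots invalidated. */
static int
maybe_invalidate_aged_slots(Slot *slots, size_t n, SlotLimits *limits,
                            TransactionId next_xid, uint32_t max_age)
{
    int ninvalidated = 0;

    /* Cheap path: only the two current limits are compared. */
    if (max_age == 0 ||
        (!xid_is_aged(next_xid, limits->xmin, max_age) &&
         !xid_is_aged(next_xid, limits->catalog_xmin, max_age)))
        return 0;

    for (size_t i = 0; i < n; i++)
    {
        if (slots[i].invalidated)
            continue;
        if (xid_is_aged(next_xid, slots[i].xmin, max_age) ||
            xid_is_aged(next_xid, slots[i].catalog_xmin, max_age))
        {
            slots[i].invalidated = true;
            ninvalidated++;
        }
    }

    /* Recompute so vacuum's cutoffs can move past the released XIDs. */
    *limits = compute_limits(slots, n);
    assert(!xid_is_aged(next_xid, limits->xmin, max_age));
    assert(!xid_is_aged(next_xid, limits->catalog_xmin, max_age));
    return ninvalidated;
}
```

In this sketch the common case (no aged limit) costs two comparisons, which relates to concern 5; the per-slot loop only runs when a limit is actually aged, and its cost grows with the number of slots, which is the question raised in concern 3.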
Please find attached the v41 patches implementing the idea of vacuum doing the invalidation.

Thoughts?

Thanks to Sawada-san for a detailed off-list discussion.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com