Hi,
On Wed, Mar 27, 2024 at 12:28:06PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 27, 2024 at 10:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 26, 2024 at 9:10 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > On 2024-Mar-26, Nathan Bossart wrote:
> > >
> > > > FWIW I'd really prefer to have something like max_slot_xid_age for this. A
> > > > time-based parameter would likely help with most cases, but transaction ID
> > > > usage will vary widely from server to server, so it'd be nice to have
> > > > something to protect against wraparound more directly.
> > >
> > > Yeah, I tend to agree that an XID-based limit makes more sense than a
> > > time-based one.
> > >
> > So, do we want the time-based parameter or just max_slot_xid_age
> > considering both will be GUC's? Yes, the xid_age based parameter
> > sounds to be directly helpful for transaction ID wraparound or dead
> > row cleanup, OTOH having a lot of inactive slots (which is possible in
> > use cases where a tool creates a lot of slots and forgets to remove
> > them, or the tool exits without cleaning up slots (say due to server
> > shutdown)) also prohibit removing dead space which is not nice either?
>
> I've personally seen the leftover slots problem on production systems
> where a timeout based invalidation mechanism could have been of more
> help to save time and reduce manual intervention. Usually, most if not
> all, migration/upgrade/other tools create slots, and if at all any
> errors occur or the operation gets cancelled midway, there's a chance
> that the slots can be leftover if such tools forgets to clean them up
> either because there was a bug or for whatever reasons. These are
> unintended/ghost slots for the database user unnecessarily holding up
> resources such as XIDs, dead rows and WAL; which might lead to XID
> wraparound or server crash if unnoticed. Although XID based GUC helps
> a lot, why do these unintended and unnoticed slots need to hold up the
> resources even before the XID age of say 1.5 or 2 billion is reached.
>
> With both GUCs max_slot_xid_age and replication_slot_inactive_timeout
> in place, I can set max_slot_xid_age = 1.5 billion or so and also set
> replication_slot_inactive_timeout = 2 days or so to make the database
> foolproof.
Yeah, I think that both makes senses. The reason is that one depends of the
database activity and slot activity (the xid age one) while the other (the
timeout one) depends only of the slot activity.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com