Thread: Do we need to do better for pg_ctl timeouts?
Hi,

Right now using -w for shutting down clusters with a bit bigger shared buffers will very frequently fail, because the shutdown checkpoint takes much longer than 60s. Obviously that can be addressed by manually setting PGCTLTIMEOUT to something higher, but forcing many users to do that doesn't seem right. And while many users probably don't want to aggressively time-out on the shutdown checkpoint, I'd assume most do want to time out aggressively if the server doesn't actually start the checkpoint.

I wonder if we need to split the timeout into two: One value for postmaster to acknowledge the action, one for that action to complete. It seems to me that that'd be useful for all of starting, restarting and stopping.

I think we have all the necessary information in the pid file, we would just need to check for PM_STATUS_STARTING for start, PM_STATUS_STOPPING for restart/stop.

Comments?

Greetings,

Andres Freund
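To make the proposed split concrete, here is a minimal sketch of a two-phase wait loop. It is not pg_ctl code: the function name, its parameters, and the status strings used in the usage note are hypothetical, and the status is read through a caller-supplied callable rather than from a real pid file. The first timeout covers the acknowledgement (the status changing to the expected "in progress" value), the second covers completion once the action has been acknowledged.

```python
import time


def wait_two_phase(read_status, ack_status, is_done,
                   ack_timeout, completion_timeout,
                   poll_interval=0.1,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll a status source with separate acknowledge/complete timeouts.

    read_status: callable returning the current status (hypothetical
                 stand-in for reading the status line of the pid file).
    ack_status:  value meaning "the server has started the action".
    is_done:     callable deciding whether a status means "finished".
    Returns "done", "no_ack", or "incomplete".
    """
    start = clock()
    ack_time = None  # set once the action has been acknowledged

    while True:
        status = read_status()
        if is_done(status):
            return "done"
        if status == ack_status and ack_time is None:
            ack_time = clock()

        now = clock()
        if ack_time is None:
            # Phase 1: be aggressive if nothing has happened yet.
            if now - start >= ack_timeout:
                return "no_ack"
        else:
            # Phase 2: be patient while the action is in progress.
            if now - ack_time >= completion_timeout:
                return "incomplete"
        sleep(poll_interval)
```

For a stop request, one might pass `ack_status="stopping"` and `is_done=lambda s: s is None` (pid file gone), with a short `ack_timeout` and a much longer `completion_timeout`. The string values here are placeholders for whatever PM_STATUS_STOPPING and PM_STATUS_STARTING actually map to.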
On 2019-06-20 18:33, Andres Freund wrote:
> I wonder if we need to split the timeout into two: One value for
> postmaster to acknowledge the action, one for that action to
> complete. It seems to me that that'd be useful for all of starting,
> restarting and stopping.
>
> I think we have all the necessary information in the pid file, we would
> just need to check for PM_STATUS_STARTING for start, PM_STATUS_STOPPING
> for restart/stop.

A related thing I came across the other day: systemd has a new sd_notify() functionality EXTEND_TIMEOUT_USEC where the service can notify systemd to extend the timeout. I think that's the same idea: You want to timeout if you're stuck, but you want to keep going as long as you're doing useful work. So yes, improving that would be welcome.

-- 
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
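For reference, the systemd mechanism mentioned above works by sending a datagram to the socket named in the NOTIFY_SOCKET environment variable. The following is a rough sketch of that notification protocol, not code from PostgreSQL or systemd; the helper names `extend_timeout_message` and `sd_notify_raw` are made up for illustration, and the sketch assumes a platform with Unix-domain datagram sockets.

```python
import os
import socket


def extend_timeout_message(extra_usec: int) -> bytes:
    """Build the notification asking systemd for more time (microseconds)."""
    if extra_usec <= 0:
        raise ValueError("extra_usec must be positive")
    return f"EXTEND_TIMEOUT_USEC={extra_usec}".encode("ascii")


def sd_notify_raw(message: bytes, environ=os.environ) -> bool:
    """Send a notification datagram; return False if not run under systemd.

    Hypothetical helper: a service doing long but useful work (such as a
    shutdown checkpoint) would call this periodically while making progress.
    """
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, path)
    return True
```

A service would call something like `sd_notify_raw(extend_timeout_message(30_000_000))` each time it confirms it is still making progress, so the supervisor only gives up when the notifications stop.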