Thread: Reducing power consumption on idle servers
Some years ago we did a pass through the various worker processes to add
hibernation as a mechanism to reduce power consumption on an idle server.
Replication never got the memo, so power saving on an idle server is not
very effective on standbys or logical subscribers. The code and timing
for hibernation is also different for each worker, which is confusing.

The proposal is to improve the situation and reduce power consumption on
an idle server with a series of fairly light changes, for a positive
green/environmental benefit.

CURRENT STATE

These processes naturally sleep for long periods when inactive:
* postmaster - 60s
* checkpointer - checkpoint_timeout
* syslogger - log_rotation_age
* pgarch - 60s
* autovac launcher - autovacuum_naptime
* walsender - wal_sender_timeout / 2
* contrib/prewarm - checkpoint_timeout, or at shutdown
* pgstat - waits indefinitely, except on Windows (see later)

These processes have some kind of hibernation mode:
* bgwriter - 50 * bgwriter_delay
* walwriter - 25 * wal_writer_delay

These processes don't try to hibernate at all:
* logical worker - 1s
* logical launcher - wal_retrieve_retry_interval (undocumented)
* startup - hardcoded 5s when streaming, wal_retrieve_retry_interval
  when reading WAL files
* walreceiver - 100ms, currently gets woken when WAL arrives
* pgstat - 2s (on Windows only)

PROPOSED CHANGES

1. Standardize the hibernation time at 60s, using a
#define HIBERNATE_DELAY_SEC 60

2. Refactor postmaster and pgarch to use that setting, rather than a
hardcoded 60.

3. Standardize the hibernation design pattern through a set of macros in
latch.h, based on the coding of walwriter.c, since that was the best
example of hibernation code. Hibernation is independent for each
process. This is explained in the header of latch.h, with an example in
src/test/modules/worker_spi/worker_spi.c. The intention is to provide
simple facilities so that authors of background workers can add
hibernation code too. In summary: after 50 idle loops, hibernate, waking
every 60s; in all cases, switch immediately back into action when
needed. (See the sketch after this message.)

4. Change these processes to hibernate using the standard design pattern:
* bgwriter - currently gets woken when a user allocates a buffer; no
  change to that wakeup proposed
* walwriter - currently gets woken by XLogFlush; no change to that
  wakeup proposed

5. The startup process has a hardcoded 5s loop because it checks for a
trigger file in order to promote. Hibernating would mean that it
promotes more slowly, and/or restarts a failing walreceiver more slowly,
so this requires user approval; hence, add new GUCs to approve that
choice. This is a valid choice because a long-term idle server is
obviously not in current use, so waiting 60s for failover or restart is
very unlikely to cause a significant issue. The startup process is woken
by the walreceiver when WAL arrives, so if all is well, startup will
swing back into action quickly when needed.

If standby_can_hibernate = true, then these processes can hibernate:
* startup
* walreceiver - hibernates, but only for wal_receiver_timeout/2, not
  60s, so other existing behavior is preserved

If subscription_can_hibernate = true, then these processes can
hibernate:
* logical launcher
* logical worker - hibernates, but only for wal_receiver_timeout/2, not
  60s, so other existing behavior is preserved

Patch to do all of the above attached. Comments please.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
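To make the proposed design pattern concrete, here is a minimal sketch
of how a background worker's main loop could use it, reconstructed from
the description above. SET_DELAY_OR_HIBERNATE, HIBERNATE_DELAY_SEC and
worker_spi_naptime are names taken from the patch; the macro body, the
loop-counter variable, the threshold constant's name and do_some_work()
are assumptions made for illustration.

/* Sketch only: in the patch, the real macro lives in latch.h. */
#define HIBERNATE_DELAY_SEC 60
#define HIBERNATE_AFTER_LOOPS 50	/* assumed name */

#define SET_DELAY_OR_HIBERNATE(work_done, naptime_ms) \
	do { \
		if (work_done) \
			idle_loops = 0;	/* activity: back to normal cadence */ \
		else if (idle_loops < HIBERNATE_AFTER_LOOPS) \
			idle_loops++; \
		cur_timeout_ms = (idle_loops >= HIBERNATE_AFTER_LOOPS) ? \
			HIBERNATE_DELAY_SEC * 1000L : (naptime_ms); \
	} while (0)

/* ... in the worker's main loop ... */
int			idle_loops = 0;
long		cur_timeout_ms = worker_spi_naptime * 1000L;

for (;;)
{
	bool		work_done = do_some_work();	/* assumed helper */

	SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);

	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 cur_timeout_ms,
					 PG_WAIT_EXTENSION);
	ResetLatch(MyLatch);
}

A latch set by another process (new WAL, a buffer allocation, etc.)
ends the wait immediately, so hibernation only lengthens the polling
interval, not the response to explicit wakeups.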
Hi,

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> Some years ago we did a pass through the various worker processes to
> add hibernation as a mechanism to reduce power consumption on an idle
> server. Replication never got the memo, so power saving on an idle
> server is not very effective on standbys or logical subscribers. The
> code and timing for hibernation is also different for each worker,
> which is confusing.

Yea, I think it's high time that we fix this. It's not even just power
consumption:
- the short timeouts hide all kinds of bugs around missed wakeups /
  racy wakeup sequences
- debugging problems gets harder if there's lots of frequent activity

> CURRENT STATE
>
> These processes naturally sleep for long periods when inactive:
> * postmaster - 60s
> * checkpointer - checkpoint_timeout
> * syslogger - log_rotation_age
> * pgarch - 60s

Why do we need *any* timeout here? It seems like one of those bug/race
hiding things. Imo we should only use a timeout when a prior archive
failed, and rely on wakeups otherwise.

> * autovac launcher - autovacuum_naptime

On production systems autovacuum_naptime often can't be a large value,
otherwise it's easy to not keep up on small busy tables. That's fine
for actually busy servers, but with the increase in hosted PG
offerings, the defaults in those offerings need to cater to a broader
audience.

> These processes don't try to hibernate at all:
> * logical worker - 1s

Not great.

> * logical launcher - wal_retrieve_retry_interval (undocumented)

I think it's actually 180s in the happy path. wal_retrieve_retry_interval
is about how often workers get restarted; if there's no need to
restart, it sleeps longer.

> * startup - hardcoded 5s when streaming, wal_retrieve_retry_interval
>   when reading WAL files
> * walreceiver - 100ms, currently gets woken when WAL arrives

This is a fairly insane one. We should compute a precise timeout based
on wal_receiver_timeout. And it's not just one syscall every 100ms,
it's:

recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create1(EPOLL_CLOEXEC) = 6
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
epoll_wait(6, [], 1, 100) = 0
close(6) = 0

> PROPOSED CHANGES
>
> 1. Standardize the hibernation time at 60s, using a
> #define HIBERNATE_DELAY_SEC 60

I don't think the hibernation stuff is a great pattern. When
hibernation was introduced we had neither latches nor condition
variables, so we couldn't easily do better. Today we have both, so we
should do better.

We should either not use timeouts at all (e.g. pgarch without a
preceding failure) and rely on being woken up when new work arrives, or
use precisely calculated timeouts (e.g. last_recv_timestamp +
wal_receiver_timeout - now()) when there's a good reason to wake up
(like needing to send a status message).

IMO we should actually consider moving *away* from hibernation for the
cases where we currently use it, and rely much more heavily on being
notified when there's work to do, with no timeout when there isn't.

> 5. The startup process has a hardcoded 5s loop because it checks for a
> trigger file in order to promote. Hibernating would mean that it
> promotes more slowly, and/or restarts a failing walreceiver more
> slowly, so this requires user approval; hence, add new GUCs to approve
> that choice. This is a valid choice because a long-term idle server is
> obviously not in current use, so waiting 60s for failover or restart
> is very unlikely to cause a significant issue.

There's plenty of databases that are close to read-only but business
critical, so I don't buy that argument.

IMO we should instead consider either deprecating file-based promotion,
or adding an optional dependency on filesystem monitoring APIs
(i.e. inotify etc) that avoid the need to poll for file creation.

Greetings,

Andres Freund
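For illustration, the precise-timeout idea for walreceiver could look
roughly like this. GetCurrentTimestamp(), TimestampTzPlusMilliseconds(),
TimestampDifferenceMilliseconds() and WaitLatchOrSocket() are existing
APIs; the variable names and surrounding structure are assumptions.

/* Sketch: sleep until wal_receiver_timeout would actually expire,
 * instead of waking every 100ms.  wait_fd is the socket returned by
 * walrcv_receive(); last_recv_timestamp is the last time we heard
 * from the primary. */
TimestampTz now = GetCurrentTimestamp();
TimestampTz deadline = TimestampTzPlusMilliseconds(last_recv_timestamp,
                                                   wal_receiver_timeout);
long        timeout_ms = TimestampDifferenceMilliseconds(now, deadline);

(void) WaitLatchOrSocket(MyLatch,
                         WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT |
                         WL_EXIT_ON_PM_DEATH,
                         wait_fd,
                         timeout_ms,
                         WAIT_EVENT_WAL_RECEIVER_MAIN);

TimestampDifferenceMilliseconds() clamps at zero, so a deadline already
in the past makes the wait return immediately.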
On Sun, Feb 20, 2022 at 6:03 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > * walreceiver - 100ms, currently gets woken when WAL arrives
>
> This is a fairly insane one. We should compute a precise timeout based
> on wal_receiver_timeout.

I proposed a patch to do that here:

https://commitfest.postgresql.org/37/3520/

It needs a couple more tweaks (I realised it needs to round microsecond
sleep times up to the nearest whole millisecond, explaining a few
spurious wakeups, and Horiguchi-san had some more feedback for me that
I haven't got to yet), but it seems close.

> And it's not just one syscall every 100ms, it's:
>
> recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_create1(EPOLL_CLOEXEC) = 6
> epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
> epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
> epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
> epoll_wait(6, [], 1, 100) = 0
> close(6) = 0

Yeah, I have a patch for that (no CF entry yet, will create shortly),
and together these two patches get walreceiver's wait loop down to a
single epoll_wait() call (or local equivalent) that waits the exact
amount of time required to perform the next periodic action.

> > 5. The startup process has a hardcoded 5s loop because it checks for
> > a trigger file in order to promote. Hibernating would mean that it
> > promotes more slowly, and/or restarts a failing walreceiver more
> > slowly, so this requires user approval; hence, add new GUCs to
> > approve that choice. This is a valid choice because a long-term idle
> > server is obviously not in current use, so waiting 60s for failover
> > or restart is very unlikely to cause a significant issue.
>
> There's plenty of databases that are close to read-only but business
> critical, so I don't buy that argument.
>
> IMO we should instead consider either deprecating file-based
> promotion, or adding an optional dependency on filesystem monitoring
> APIs (i.e. inotify etc) that avoid the need to poll for file creation.

Yeah, I pondered inotify/KQ_FILTER_VNODE/FindFirstChangeNotification
for this while working on 600f2f50. I realised I could probably teach a
WaitEventSet to wake up when a file is added on most OSes, but I didn't
try to prototype it... I agree that the whole file-system-watching
concept feels pretty clunky, so if just getting rid of it is an
option...

It would be nice to tame the walwriter's wakeups...
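The rounding tweak Thomas mentions matters because latch waits take a
millisecond timeout: truncating a sub-millisecond remainder yields a
wait that ends fractionally early, which looks like a spurious wakeup
and costs an extra loop. A sketch of the fix, as a hypothetical helper
(not the actual patch):

/* Round a remaining interval in microseconds up to whole milliseconds,
 * so the wait can never finish before the deadline it was computed
 * from. */
static long
remaining_us_to_timeout_ms(int64 remaining_us)
{
    return (long) ((remaining_us + 999) / 1000);
}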
On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > Some years ago we did a pass through the various worker processes to
> > add hibernation as a mechanism to reduce power consumption on an
> > idle server. Replication never got the memo, so power saving on an
> > idle server is not very effective on standbys or logical
> > subscribers. The code and timing for hibernation is also different
> > for each worker, which is confusing.
>
> Yea, I think it's high time that we fix this.

Good to have your support. There are millions of servers running idle
for some part of their duty cycle. This patch seeks to change the
situation for the better in PG15, i.e. soon, so the changes proposed
are deliberately light. It also seeks to provide a framework that
writers of background worker processes can follow, since we can't just
fix core, we need to fix all the various bgworkers in use as well.

> IMO we should instead consider either deprecating file-based
> promotion, or adding an optional dependency on filesystem monitoring
> APIs (i.e. inotify etc) that avoid the need to poll for file creation.

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch. Thanks for the
suggestion.

I've re-analyzed all of the code around startup/walreceiver
interactions and there isn't any need for a 5s delay in the startup
process, IMHO, so it can sleep longer.

> IMO we should actually consider moving *away* from hibernation for the
> cases where we currently use it, and rely much more heavily on being
> notified when there's work to do, with no timeout when there isn't.

I don't think that is a practical approach this close to the end of
PG15. It would require changing the behavior of bgwriter, walwriter and
pgarch when they are not broken; the likelihood that we would break
something is too high. What is an issue is that the sleep times of
various procs are on completely different cycles, which is why I am
proposing to normalize them so that Postgres can actually sleep
effectively.

> > * autovac launcher - autovacuum_naptime
>
> On production systems autovacuum_naptime often can't be a large value,
> otherwise it's easy to not keep up on small busy tables. That's fine
> for actually busy servers, but with the increase in hosted PG
> offerings, the defaults in those offerings need to cater to a broader
> audience.

Autovac varies its wakeup cycle according to how much work is done. It
is OK to set autovacuum_naptime without affecting power consumption
when idle. Idle for autovac is defined slightly differently, since if
all user work completes then there may still be a lot of vacuuming to
do before it goes fully idle. But my observation is that there are many
servers that go idle for more than 50% of each week, when operating
8-12 hours per day, 5 days per week, so we can still save a lot of
power. This patch doesn't change how autovac works, it just uses a
common setting for the hibernation that eventually occurs.

> > These processes don't try to hibernate at all:
> > * logical worker - 1s
>
> Not great.

Agreed; the patch improves this, roughly the same as walreceiver.

> > * logical launcher - wal_retrieve_retry_interval (undocumented)
>
> I think it's actually 180s in the happy path.
> wal_retrieve_retry_interval is about how often workers get restarted;
> if there's no need to restart, it sleeps longer.

I propose normalizing all of the various hibernation times to the same
value.

> > * walreceiver - 100ms, currently gets woken when WAL arrives
>
> This is a fairly insane one. We should compute a precise timeout based
> on wal_receiver_timeout.

That is exactly what the patch does, when it hibernates. I wasn't aware
of Thomas' work, but now that I am, we can choose which of those
approaches to use for walreceiver. I hope that we can fix logical
worker and walreceiver to use the same algorithm. The rest of this
patch would still be valid whatever we do for those two procs.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
Hi,

On 02/21/22 11:11, Simon Riggs wrote:
> This patch seeks to change the situation for the better in PG15, i.e.
> soon, so the changes proposed are deliberately light. It also seeks to
> provide a framework that writers of background worker processes can
> follow, since we can't just fix core, we need to fix all the various
> bgworkers in use as well.

I think there might be a typo in the worker_spi.c example:

+ /*
+  * Use the standard design pattern for wait time/hibernation.
+  * After 50 consecutive loops with work_done=true the wait time
+  * will be set to the standard hibernation timeout of 60s.
+  */
+ SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);

Shouldn't the comment be "with work_done=false"?

Regards,
-Chap
On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:

> Shouldn't the comment be "with work_done=false"?

Good catch, thanks.

I've also added docs to say that "promote_trigger_file" is now
deprecated. There were no tests for that functionality, so it's just as
well it is being removed.

v3 attached.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Mon, Feb 21, 2022 at 5:11 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > IMO we should instead consider either deprecating file-based
> > promotion, or adding an optional dependency on filesystem monitoring
> > APIs (i.e. inotify etc) that avoid the need to poll for file
> > creation.

Came here to suggest exactly that :)

> Deprecating explicit file-based promotion is possible and simple, so
> that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards
compatibility? The alternative has certainly been around for some time
now, so if we don't know a specific use-case for the file-based one,
it's definitely time to deprecate it properly.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
>> Deprecating explicit file-based promotion is possible and simple, so
>> that is the approach in the latest version of the patch.

> Is there any actual use-case for this other than backwards
> compatibility?

The fundamental problem with signal-based promotion is that it's an
edge-triggered rather than level-triggered condition, ie if you miss
the single notification event you're screwed. I'm not sure why we'd
want to go there, considering all the problems we're having with
edge-triggered APIs in nearby threads.

We could combine the two approaches, perhaps: have "pg_ctl promote"
create a trigger file and then send a signal. The trigger file would
record the existence of the promotion request, so that if the
postmaster isn't running right now or misses the signal, it'd still
promote when restarted or otherwise at the next opportunity. But with
an approach like this, we could expect quick response to the signal in
normal conditions, without need for constant wakeups to check for the
file. Also, this route would allow overloading of the signal, since it
would just mean "wake up, I asked you to do something" rather than
needing to carry the full semantics of what is to be done.

			regards, tom lane
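A sketch of how the combined approach might look in the startup
process. PROMOTE_SIGNAL_FILE exists today (pg_ctl promote already
creates it and signals the postmaster); the function shape and flag
name here are assumptions.

/* The signal handler only sets a flag and the process latch ("wake up,
 * I asked you to do something"); the file is the durable,
 * level-triggered record of the request, also checked at startup. */
static volatile sig_atomic_t promote_signaled = false;

static bool
PromotionRequested(void)
{
    struct stat stat_buf;

    promote_signaled = false;   /* the file check subsumes the signal */
    if (stat(PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
    {
        unlink(PROMOTE_SIGNAL_FILE);
        return true;
    }
    return false;
}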
On 2/21/22 10:11 AM, Simon Riggs wrote:
>>> * autovac launcher - autovacuum_naptime
>>
>> On production systems autovacuum_naptime often can't be a large
>> value, otherwise it's easy to not keep up on small busy tables.
>> That's fine for actually busy servers, but with the increase in
>> hosted PG offerings, the defaults in those offerings need to cater
>> to a broader audience.
>
> Autovac varies its wakeup cycle according to how much work is done. It
> is OK to set autovacuum_naptime without affecting power consumption
> when idle.

I'm wondering how many people understand that. I've seen a number of
servers running very low values of autovacuum_naptime in order to make
things more responsive.

> Idle for autovac is defined slightly differently, since if all user
> work completes then there may still be a lot of vacuuming to do before
> it goes fully idle. But my observation is that there are many servers
> that go idle for more than 50% of each week, when operating 8-12 hours
> per day, 5 days per week, so we can still save a lot of power. This
> patch doesn't change how autovac works, it just uses a common setting
> for the hibernation that eventually occurs.

I'm wondering if it'd be worth linking autovac wakeup from a truly idle
state to the stats collector. If there's no stats messages coming in,
clearly there's nothing new for autovac.
Jim Nasby <nasbyj@amazon.com> writes:
> I'm wondering if it'd be worth linking autovac wakeup from a truly
> idle state to the stats collector. If there's no stats messages coming
> in, clearly there's nothing new for autovac.

That seems pretty scary in the current system design, where the stats
collector is intentionally not 100% reliable (and sometimes, less
intentionally, it fails completely). If we get to a place where we're
willing to bank on stats being collected 100% of the time, it might
make sense.

			regards, tom lane
Hi,

> Replication never got the memo, so power saving on an idle server is
> not very effective on standbys or logical subscribers. The code and
> timing for hibernation is also different for each worker, which is
> confusing.

Agree, this patch makes it easier to understand the hibernation
behavior of the various workers.

> 1. Standardize the hibernation time at 60s, using a
> #define HIBERNATE_DELAY_SEC 60

I notice in patch 3 that HIBERNATE_DELAY_SEC has been increased to 300
seconds; what's the reasoning behind it? Is a longer hibernation delay
better? If so, can we set it to INT_MAX (the max timeout allowed by
WaitLatch()), in which case a worker in hibernation relies only on
wakeups? I think it would be nice to run experiments to verify that the
patch reduces power consumption while varying the value of
HIBERNATE_DELAY_SEC.

Regards,
Zheng Li
Amazon RDS/Aurora for PostgreSQL

On Thu, Mar 3, 2022 at 5:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jim Nasby <nasbyj@amazon.com> writes:
> > I'm wondering if it'd be worth linking autovac wakeup from a truly
> > idle state to the stats collector. If there's no stats messages
> > coming in, clearly there's nothing new for autovac.
>
> That seems pretty scary in the current system design, where the stats
> collector is intentionally not 100% reliable (and sometimes, less
> intentionally, it fails completely). If we get to a place where we're
> willing to bank on stats being collected 100% of the time, it might
> make sense.
>
>                         regards, tom lane
On Sat, 26 Feb 2022 at 17:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Magnus Hagander <magnus@hagander.net> writes:
> >> Deprecating explicit file-based promotion is possible and simple,
> >> so that is the approach in the latest version of the patch.
>
> > Is there any actual use-case for this other than backwards
> > compatibility?
>
> The fundamental problem with signal-based promotion is that it's an
> edge-triggered rather than level-triggered condition, ie if you miss
> the single notification event you're screwed. I'm not sure why we'd
> want to go there, considering all the problems we're having with
> edge-triggered APIs in nearby threads.
>
> We could combine the two approaches, perhaps: have "pg_ctl promote"
> create a trigger file and then send a signal. The trigger file would
> record the existence of the promotion request, so that if the
> postmaster isn't running right now or misses the signal, it'd still
> promote when restarted or otherwise at the next opportunity. But with
> an approach like this, we could expect quick response to the signal in
> normal conditions, without need for constant wakeups to check for the
> file. Also, this route would allow overloading of the signal, since it
> would just mean "wake up, I asked you to do something" rather than
> needing to carry the full semantics of what is to be done.

The above is how this works now, luckily, but few comments explain why.
This current state evolved because I first added file-based promotion,
for the reasons you give, but it was slow, so the signal approach was
added later, with a new API.

What we are discussing deprecating is the ability to create the trigger
file directly (the original API). Once that is gone, the mechanism
would be to request promotion via pg_ctl promote (the new API), which
then creates the trigger file and sends a signal. So in both APIs the
trigger file is still used.

In this patch, if you create the trigger file
directly/explicitly/yourself (the old API) then it will still trigger
when hibernating, but it will be slower. The new API is unaffected by
this change.

--
Simon Riggs
http://www.EnterpriseDB.com/
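Concretely, the one-line change for a failover script is from creating
the file named by promote_trigger_file (e.g. with touch) to running
pg_ctl promote, or calling the SQL function pg_promote(); both
supported interfaces create the promote file in the data directory and
then signal the postmaster, so the request remains level-triggered.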
On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:

> > 1. Standardize the hibernation time at 60s, using a
> > #define HIBERNATE_DELAY_SEC 60
>
> I notice in patch 3 that HIBERNATE_DELAY_SEC has been increased to 300
> seconds; what's the reasoning behind it? Is a longer hibernation delay
> better? If so, can we set it to INT_MAX (the max timeout allowed by
> WaitLatch()), in which case a worker in hibernation relies only on
> wakeups? I think it would be nice to run experiments to verify that
> the patch reduces power consumption while varying the value of
> HIBERNATE_DELAY_SEC.

Setting it to INT_MAX would be the same as not allowing a timeout,
which changes a lot of current behavior and makes it less robust.

Waking once per minute is what we do in various cases, so 60s is a good
choice. In the case of the logical rep launcher we currently use 300s,
so using 60s would decrease this. I don't see much difference between
power consumption with timeouts of 60s and 300s. In the latest patch, I
chose 300s.

Does anyone have an opinion on the value here?

--
Simon Riggs
http://www.EnterpriseDB.com/
Hi,

On 2022-03-10 17:50:47 +0000, Simon Riggs wrote:
> On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:
> >
> > > 1. Standardize the hibernation time at 60s, using a
> > > #define HIBERNATE_DELAY_SEC 60
> >
> > I notice in patch 3 that HIBERNATE_DELAY_SEC has been increased to
> > 300 seconds; what's the reasoning behind it? Is a longer hibernation
> > delay better? If so, can we set it to INT_MAX (the max timeout
> > allowed by WaitLatch()), in which case a worker in hibernation
> > relies only on wakeups? I think it would be nice to run experiments
> > to verify that the patch reduces power consumption while varying the
> > value of HIBERNATE_DELAY_SEC.
>
> Setting it to INT_MAX would be the same as not allowing a timeout,
> which changes a lot of current behavior and makes it less robust.

Most of these timeouts are a bad idea and should not exist. We have
repeatedly had bugs where we were missing wakeups etc., but those bugs
were harder to notice because of the timeouts. I'm against entrenching
this stuff further.

Greetings,

Andres Freund
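In latch terms, the difference reads like this (the wait-event name is
a placeholder):

/* Current style: a 60s timeout, i.e. a guaranteed wakeup every minute
 * even when the server is completely idle. */
(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 60 * 1000L,
                 WAIT_EVENT_SOME_PROCESS_MAIN);

/* What Andres advocates where possible: no timeout at all, so the
 * process sleeps until another backend explicitly sets its latch. */
(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                 -1L,
                 WAIT_EVENT_SOME_PROCESS_MAIN);

The second form is only safe once every code path that produces work
for the process reliably sets its latch, which is exactly why timeouts
can hide missed-wakeup bugs.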
On 2022-02-21 21:04:19 +0000, Simon Riggs wrote:
> On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:
>
> > Shouldn't the comment be "with work_done=false"?
>
> Good catch, thanks.
>
> I've also added docs to say that "promote_trigger_file" is now
> deprecated. There were no tests for that functionality, so it's just
> as well it is being removed.

Doesn't pass tests, and hasn't for weeks from what I can see:
https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153

Marked as waiting-on-author.

- Andres
On Tue, 22 Mar 2022 at 00:54, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-02-21 21:04:19 +0000, Simon Riggs wrote:
> > On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:
> >
> > > Shouldn't the comment be "with work_done=false"?
> >
> > Good catch, thanks.
> >
> > I've also added docs to say that "promote_trigger_file" is now
> > deprecated. There were no tests for that functionality, so it's just
> > as well it is being removed.
>
> Doesn't pass tests, and hasn't for weeks from what I can see:
> https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153
>
> Marked as waiting-on-author.

Hmm, just the not-in-sample test again. Fixed by marking the GUC
GUC_NOT_IN_SAMPLE.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
At Thu, 10 Mar 2022 11:45:10 -0800, Andres Freund <andres@anarazel.de> wrote in
> Hi,
>
> On 2022-03-10 17:50:47 +0000, Simon Riggs wrote:
> > On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:
> > >
> > > > 1. Standardize the hibernation time at 60s, using a
> > > > #define HIBERNATE_DELAY_SEC 60
> > >
> > > I notice in patch 3 that HIBERNATE_DELAY_SEC has been increased to
> > > 300 seconds; what's the reasoning behind it? Is a longer
> > > hibernation delay better? If so, can we set it to INT_MAX (the max
> > > timeout allowed by WaitLatch()), in which case a worker in
> > > hibernation relies only on wakeups? I think it would be nice to
> > > run experiments to verify that the patch reduces power consumption
> > > while varying the value of HIBERNATE_DELAY_SEC.
> >
> > Setting it to INT_MAX would be the same as not allowing a timeout,
> > which changes a lot of current behavior and makes it less robust.
>
> Most of these timeouts are a bad idea and should not exist. We have
> repeatedly had bugs where we were missing wakeups etc., but those bugs
> were harder to notice because of the timeouts.

I basically agree with this.

> I'm against entrenching this stuff further.

For example, pgarch.c theoretically doesn't need to hibernate, since it
has an apparent trigger event and a terminal condition. What we need to
do for the archiver is to set a timeout only when an iteration didn't
reach the latest finished segment (this might need additional shared
memory use, though...).

I'm not sure about bgwriter, walwriter and the logical replication
stuff...

About walreceiver:

-               if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+               if ((now - startTime) > wal_receiver_timeout)

This is simply a misuse of the timeout. WALRCV_STARTUP_TIMEOUT is not
used for a sleep and is irrelevant to receiving data from the peer
walsender. wal_receiver_timeout's default is 60s, but we don't assume
that walreceiver takes that long to start up. (I think Thomas is
working on the walreceiver timeout stuff.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
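A sketch of that archiver idea, with illustrative names
(last_attempt_failed and retry_interval_ms are not from pgarch.c): wait
with no timeout while caught up, and only arm a retry timeout after a
failed archive attempt.

int         events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
long        timeout = -1L;      /* caught up: rely on wakeups alone */

if (last_attempt_failed)
{
    events |= WL_TIMEOUT;
    timeout = retry_interval_ms;        /* e.g. 60s, as today */
}

(void) WaitLatch(MyLatch, events, timeout, WAIT_EVENT_ARCHIVER_MAIN);
ResetLatch(MyLatch);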
On Thu, 24 Mar 2022 at 07:16, Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

> > Most of these timeouts are a bad idea and should not exist. We have
> > repeatedly had bugs where we were missing wakeups etc., but those
> > bugs were harder to notice because of the timeouts.
>
> I basically agree with this.

As a general point, maybe. But we have a lot of procs doing a lot of
polling, and we are unlikely to change that anytime soon.

> > I'm against entrenching this stuff further.
>
> For example, pgarch.c theoretically doesn't need to hibernate, since
> it has an apparent trigger event and a terminal condition. What we
> need to do for the archiver is to set a timeout only when an iteration
> didn't reach the latest finished segment (this might need additional
> shared memory use, though...).

The archiver is not the important thing here. If there is an objection
to that section of code, we can remove it.

> I'm not sure about bgwriter, walwriter and the logical replication
> stuff...

I am sure. bgwriter, walwriter and logical worker need action from us
to allow them to hibernate in a sensible way that allows power saving.

> I think Thomas is working on the walreceiver timeout stuff.

Yes, he is. I have no problem going with what he advocates, if others
agree. However, that fix does not solve the whole problem, since many
other procs also need fixing.

The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
   (directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
   sleep for the startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
   make sure we don't forget one later
4. provide a design pattern for background worker extensions to
   follow, so as to encourage powersaving

Please don't reject the whole patch because you disagree with part (3);
it is not that important.

Thanks

--
Simon Riggs
http://www.EnterpriseDB.com/
On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> The proposals of this patch are the following, each of which can be
> independently accepted/rejected:
> 1. fix the sleep pattern of bgwriter, walwriter and logical worker
>    (directly affects powersave)
> 2. deprecate promote_trigger_file, which then allows us to fix the
>    sleep for the startup process (directly affects powersave)
> 3. treat hibernation in all procs the same, for simplicity, and to
>    make sure we don't forget one later
> 4. provide a design pattern for background worker extensions to
>    follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes. I do agree with the basic goal of trying to
reduce unnecessary wakeups, but I think the rest of the patch is taking
a bit of a one-size-fits-all approach where I think that we might want
to be more nuanced. I think there are a couple of different kinds of
cases here.

In some places, like DetermineSleepTime(), we're already willing to
sleep for pretty long periods of time, like a minute. I think in those
cases we could consider doing nothing, on the theory that one wakeup
per minute is already not very much. If we do want to do something, I
think it should be in the direction of convincing ourselves that we
don't need a timeout at all. Having a timeout is a bit like insurance:
it guarantees that if for some reason the event by which we expect to
be awoken doesn't actually wake us up even though something meaningful
has happened, we won't hang forever. But if we think a wakeup per
minute is meaningful and we don't think there's any possibility of a
missed wakeup, let's just wait forever. I don't think we'll avoid many
user complaints by recovering from a missed wakeup in just under an
hour.

In other places, like WalWriterMain, we're basically polling. One
question in these kinds of cases is whether we can avoid polling in
favor of having some other process wake us up if the event that we care
about happens. This is unlikely to be practical in all cases - e.g. we
can't wait for a promotion trigger file to show up unless we want to
use inotify or something like that. However, we may be able to avoid
polling in some instances. When we can't, then I think it makes sense
to increase the wait time when the system appears to be idle. In that
subset of cases - we're polling and we can't avoid polling - this kind
of approach seems fairly reasonable.

I do have some concerns about the idea that the right strategy in
general, or even in particular cases, is to multiply by 50. This patch
hasn't invented that problem; it's already there. My concern is that
multiplying a small number by 50 seems very different than multiplying
a large number by 50. If we normally wait for 100ms before checking for
something to happen, and we wait for 5s instead, it's probably not
going to be a huge issue, because 5s is still a short amount of time
for human beings. If we normally wait for a minute before checking for
something to happen and we wait for 50 minutes instead, that's much
more likely to make a human being unhappy. Therefore, it's very unclear
to me that those cases should be treated the same way.

There's a more technical issue with this strategy, too: if we multiply
both short and long timeouts by 50, I think we are going to get pretty
much the same result as if we only multiply the short timeouts by 50.
Why even bother reducing the frequency of wakeups from minutes to hours
if we're elsewhere reducing the frequency from seconds to minutes? If
process A is still waking up every minute, putting process B in the
deep freeze isn't going to do a whole lot in terms of letting the
system go into any kind of deeper sleep.

All in all I feel that encouraging developers to adopt a multiply-by-50
rule in all cases seems too simplistic to me. It may be better than
what we're doing right now, but it doesn't really seem like the right
policy.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > The proposals of this patch are the following, each of which can be
> > independently accepted/rejected:
> > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> >    (directly affects powersave)
> > 2. deprecate promote_trigger_file, which then allows us to fix the
> >    sleep for the startup process (directly affects powersave)
> > 3. treat hibernation in all procs the same, for simplicity, and to
> >    make sure we don't forget one later
> > 4. provide a design pattern for background worker extensions to
> >    follow, so as to encourage powersaving
>
> Unfortunately, the patch isn't split in a way that corresponds to this
> division. I think I'm +1 on change #2 -- deprecating
> promote_trigger_file seems like a good idea to me independently of any
> power-saving considerations. But I'm not sure that I am on board with
> any of the other changes.

OK, so change (2) is good. Separate patch attached.

> I do agree with the basic goal of trying to reduce unnecessary
> wakeups, but I think the rest of the patch is taking a bit of a
> one-size-fits-all approach where I think that we might want to be more
> nuanced. I think there are a couple of different kinds of cases here.

OK, so you disagree with (3) and probably (4). No problem.

What about (1)? That directly affects the powersave capability. I
didn't read anything specific to that. If we don't fix (1) as well, the
changes for startup and walreceiver will be ineffective for power
saving.

What changes would be acceptable for bgwriter, walwriter and logical
worker?

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> What about (1)? That directly affects the powersave capability. I
> didn't read anything specific to that.
>
> If we don't fix (1) as well, the changes for startup and walreceiver
> will be ineffective for power saving.
>
> What changes would be acceptable for bgwriter, walwriter and logical
> worker?

Hmm, I think it would be fine to introduce some kind of hibernation
mechanism for logical workers. bgwriter and walwriter already have a
hibernation mechanism, so I'm not sure what your concern is there
exactly. In your initial email you said you weren't proposing changes
there, but maybe that changed somewhere in the course of the subsequent
discussion. If you could catch me up on your present thinking that
would be helpful.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, 25 Mar 2022 at 01:03, Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > The proposals of this patch are the following, each of which can
> > > be independently accepted/rejected:
> > > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > >    (directly affects powersave)
> > > 2. deprecate promote_trigger_file, which then allows us to fix the
> > >    sleep for the startup process (directly affects powersave)
> > > 3. treat hibernation in all procs the same, for simplicity, and to
> > >    make sure we don't forget one later
> > > 4. provide a design pattern for background worker extensions to
> > >    follow, so as to encourage powersaving
> >
> > Unfortunately, the patch isn't split in a way that corresponds to
> > this division. I think I'm +1 on change #2 -- deprecating
> > promote_trigger_file seems like a good idea to me independently of
> > any power-saving considerations. But I'm not sure that I am on board
> > with any of the other changes.
>
> OK, so change (2) is good. Separate patch attached.

Hi

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the
patch.

[1] http://cfbot.cputube.org/patch_40_3566.log

Thanks

Ian Barwick
On Thu, 24 Mar 2022 at 16:02, Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > The proposals of this patch are the following, each of which can
> > > be independently accepted/rejected:
> > > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > >    (directly affects powersave)
> > > 2. deprecate promote_trigger_file, which then allows us to fix the
> > >    sleep for the startup process (directly affects powersave)
> > > 3. treat hibernation in all procs the same, for simplicity, and to
> > >    make sure we don't forget one later
> > > 4. provide a design pattern for background worker extensions to
> > >    follow, so as to encourage powersaving
> >
> > Unfortunately, the patch isn't split in a way that corresponds to
> > this division. I think I'm +1 on change #2 -- deprecating
> > promote_trigger_file seems like a good idea to me independently of
> > any power-saving considerations. But I'm not sure that I am on board
> > with any of the other changes.
>
> OK, so change (2) is good. Separate patch attached.

Thanks to Ian for prompting me to pick up this thread again; apologies
for getting distracted.

The attached patch is a reduced version of the original. It covers only:
* deprecation of promote_trigger_file - there are no tests that use it,
  hence there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

Other points will be discussed in another branch of this thread.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> The attached patch is a reduced version of the original. It covers only:
> * deprecation of promote_trigger_file - there are no tests that use
>   it, hence there is no test coverage for the patch
> * changing the sleep time of the startup process to 60s
> * docs and comments

LPGTM. If we also fix the bogus SIGALRM wakeups[1], then finally a
completely idle recovery process looks like:

kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)

Presumably it would have no timeout at all in the next release.

[1] https://www.postgresql.org/message-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com
On Sun, 13 Nov 2022 at 21:28, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > The attached patch is a reduced version of the original. It covers only:
> > * deprecation of promote_trigger_file - there are no tests that use
> >   it, hence there is no test coverage for the patch
> > * changing the sleep time of the startup process to 60s
> > * docs and comments
>
> LPGTM. If we also fix the bogus SIGALRM wakeups[1], then finally a
> completely idle recovery process looks like:
>
> kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
>
> Presumably it would have no timeout at all in the next release.
>
> [1] https://www.postgresql.org/message-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com

Clearly, I haven't been watching Hackers! Thanks for the nudge.

See if this does the trick?

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Sun, 13 Nov 2022 at 23:07, Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Sun, 13 Nov 2022 at 21:28, Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > The attached patch is a reduced version of the original. It covers only:
> > > * deprecation of promote_trigger_file - there are no tests that
> > >   use it, hence there is no test coverage for the patch
> > > * changing the sleep time of the startup process to 60s
> > > * docs and comments
> >
> > LPGTM. If we also fix the bogus SIGALRM wakeups[1], then finally a
> > completely idle recovery process looks like:
> >
> > kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
> > kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
> > kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
> >
> > Presumably it would have no timeout at all in the next release.
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com
>
> Clearly, I haven't been watching Hackers! Thanks for the nudge.
>
> See if this does the trick?

Looks like the SIGALRM wakeups will be silenced on the other thread, so
we can just revert back to using v6 of this patch.

Reposting v6 now so that the patch tester doesn't think this has failed
when the patch on the other thread gets applied.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> Reposting v6 now so that the patch tester doesn't think this has
> failed when the patch on the other thread gets applied.

The intention of the patch, that is, to get rid of the
promote_trigger_file GUC sometime in the future, looks good to me.
However, the timeout change from 5 sec to 60 sec seems far-reaching.
While it discourages the use of the GUC, it can impact many existing
production servers that still rely on promote_trigger_file, as it can
increase failover times, impacting SLAs around failover.

How about retaining 5 sec as-is and adding a WARNING in
promote_trigger_file's check/assign/show hooks and in
CheckForStandbyTrigger() whenever PromoteTriggerFile is detected, and
noting the deprecation in the GUC's description?

+ * to react to a trigger file. Direct use of trigger file
+ * is now deprecated and the promote_trigger_file will be
+ * removed in a later release.

I think adding 'Direct use of trigger file ...' on a new line that
starts with XXX (typically, this represents a TODO item) would be
good, no?

Also, do we need to add a TODO in the postgresql wiki
(https://wiki.postgresql.org/wiki/Todo), possibly under a new section
'Deprecated Features' or 'Features To-Be-Removed In Near Future' or
some other (hm, it seems too vague, but it starts to track such
deprecated items), so as to not miss removing promote_trigger_file in
future releases?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, 16 Nov 2022 at 12:47, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> >
> > Reposting v6 now so that the patch tester doesn't think this has
> > failed when the patch on the other thread gets applied.
>
> The intention of the patch, that is, to get rid of the
> promote_trigger_file GUC sometime in the future, looks good to me.
> However, the timeout change from 5 sec to 60 sec seems far-reaching.
> While it discourages the use of the GUC, it can impact many existing
> production servers that still rely on promote_trigger_file, as it can
> increase failover times, impacting SLAs around failover.

The purpose of 60s is to allow for power reduction, so 5s won't do.

promote_trigger_file is not tested and there are better ways, so
deprecating it in this release is fine. Anyone that relies on it can
update their mechanism to a supported one with a one-line change.

Realistically, anyone using it won't be on the latest release anyway,
at least for a long time, since if they use manual methods then they
are well behind the times.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Wed, Nov 16, 2022 at 8:35 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> The purpose of 60s is to allow for power reduction, so 5s won't do.

I understand. I know it's a bit hard to measure the power savings. I'm
wondering if there's any info, maybe not necessarily related to
postgres, but in general, on how much power gets saved if a certain
number of waits/polls/system calls are reduced.

> promote_trigger_file is not tested and there are better ways, so
> deprecating it in this release is fine.

Hm, but...

> Anyone that relies on it can update their mechanism to a supported one
> with a one-line change. Realistically, anyone using it won't be on the
> latest release anyway, at least for a long time, since if they use
> manual methods then they are well behind the times.

I may be overly pessimistic here - the change from 5 sec to 60 sec for
detecting promote_trigger_file will have a direct impact on failovers,
I believe. After upgrading to the version with the 60 sec wait,
there'll be an increase in failover times if developers fail to see the
deprecation notice about promote_trigger_file. I'm not sure what's the
best way to discourage using promote_trigger_file. Maybe the change
from 5 sec to 60 sec is the way; I'm not really sure.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, 17 Nov 2022 at 07:36, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> > promote_trigger_file is not tested and there are better ways, so
> > deprecating it in this release is fine.
>
> Hm, but...
>
> > Anyone that relies on it can update their mechanism to a supported
> > one with a one-line change. Realistically, anyone using it won't be
> > on the latest release anyway, at least for a long time, since if
> > they use manual methods then they are well behind the times.
>
> I may be overly pessimistic here - the change from 5 sec to 60 sec for
> detecting promote_trigger_file will have a direct impact on failovers,
> I believe.

No, it will have a direct effect only on people using
promote_trigger_file who do not read and act upon the deprecation
notice before upgrading by making a one-line change to their failover
scripts.

Since pretty much everyone doing HA uses external HA software (cloud or
otherwise), this shouldn't affect anyone.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> No, it will have a direct effect only on people using
> promote_trigger_file who do not read and act upon the deprecation
> notice before upgrading by making a one-line change to their failover
> scripts.

TBH, I wonder if we shouldn't just nuke promote_trigger_file.
pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
even if we haven't said promote_trigger_file was deprecated, it hasn't
been the state-of-the-art way of doing this in a really long time. If
we think that there are still a lot of people using it, or if popular
tools are relying on it, then perhaps a deprecation period is warranted
anyway. But I think we should at least consider the possibility that
it's OK to just get rid of it right away.

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,

On 2022-11-17 13:06:23 +0530, Bharath Rupireddy wrote:
> I understand. I know it's a bit hard to measure the power savings. I'm
> wondering if there's any info, maybe not necessarily related to
> postgres, but in general, on how much power gets saved if a certain
> number of waits/polls/system calls are reduced.

It's heavily dependent on hardware and hardware settings.

On some systems you can get an *approximate* idea of the power usage
even while plugged in. On others you can only see it while on battery
power.

On systems with RAPL support (most Intel and I think newer AMD CPUs)
you can query power usage with:
  powerstat -R 1 (adding -D provides a bit more detail)

But note that RAPL typically severely undercounts power usage because
it doesn't cover a lot of sources of power usage (display, sometimes
memory, all other peripherals).

Sometimes powerstat -R -D can split power usage up more granularly,
e.g. between the different CPU sockets and memory.

On laptops you can often measure the discharge rate when not plugged
in, with powerstat -d 0 1. But IME the latency till the values update
makes it harder to interpret.

On some workstations / servers you can read the power usage via ACPI.
E.g. on my workstation 'sensors' shows a power_meter-acpi-0.

As an example of the difference it can make, here's the output of
powerstat -D 1 on my laptop:

Time     User Nice  Sys Idle   IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
16:41:55  0.1  0.0  0.1 99.8  0.0   1    403   246    1    0    1  2.62
16:41:56  0.1  0.0  0.1 99.8  0.0   1    357   196    1    0    1  2.72
16:41:57  0.1  0.0  0.1 99.6  0.2   1    510   231    4    0    4  2.64
16:41:58  0.1  0.0  0.1 99.9  0.0   2   1350   758   64   62   63  4.06
16:41:59  0.3  0.0  1.0 98.7  0.0   2   4166  2406  244  243  244  7.20
16:42:00  0.2  0.0  0.7 99.1  0.0   2   4203  2353  247  246  247  7.21
16:42:01  0.5  0.0  1.6 98.0  0.0   2   4079  2395  240  239  240  7.08
16:42:02  0.5  0.0  0.9 98.7  0.0   2   4097  2405  245  243  245  7.20
16:42:03  0.4  0.0  1.3 98.3  0.0   2   4117  2311  243  242  243  7.14
16:42:04  0.1  0.0  0.4 99.4  0.1   1   1721  1152   70   70   71  4.48
16:42:05  0.1  0.0  0.2 99.8  0.0   1    433   250    1    0    1  2.92
16:42:06  0.0  0.0  0.3 99.7  0.0   1    400   231    1    0    1  2.66

In the period of higher power usage I ran

  while true;do sleep 0.001;done

and then interrupted that after a bit with ctrl-c.

Same thing on my workstation:

Time     User Nice  Sys Idle   IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
16:43:48  1.0  0.0  0.2 98.7  0.1   1   8218  2354    0    0    0 46.43
16:43:49  1.1  0.0  0.3 98.7  0.0   1   7866  2477    0    0    0 45.99
16:43:50  1.1  0.0  0.4 98.5  0.0   2   7753  2996    0    0    0 48.93
16:43:51  0.8  0.0  1.7 97.5  0.0   1   9395  5285    0    0    0 75.48
16:43:52  0.5  0.0  1.7 97.8  0.0   1   9141  4806    0    0    0 75.30
16:43:53  1.1  0.0  1.8 97.1  0.0   2  10065  5504    0    0    0 76.27
16:43:54  1.3  0.0  1.5 97.2  0.0   1  10962  5165    0    0    0 76.33
16:43:55  0.9  0.0  0.8 98.3  0.0   1   8452  3939    0    0    0 61.99
16:43:56  0.6  0.0  0.1 99.3  0.0   2   6541  1999    0    0    0 40.92
16:43:57  0.9  0.0  0.2 98.9  0.0   2   8199  2477    0    0    0 42.91

And if I query the power supply via ACPI instead:

  while true;do sensors power_meter-acpi-0|grep power1|awk '{print $2, $3}';sleep 1;done
...
163.00 W
173.00 W
173.00 W
172.00 W
203.00 W
206.00 W
206.00 W
206.00 W
209.00 W
205.00 W
211.00 W
213.00 W
203.00 W
175.00 W
166.00 W
...

As you can see the difference is quite substantial. This is solely due
to a 1ms sleep loop (albeit one where we fork a process after each
sleep, which is likely a good chunk of the CPU usage).

However, the difference in power usage will be far smaller if the
system is already busy, because the CPU will already be running at a
high frequency.

Greetings,

Andres Freund
On Fri, Nov 18, 2022 at 6:29 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-11-17 13:06:23 +0530, Bharath Rupireddy wrote:
> > I understand. I know it's a bit hard to measure the power savings.
> > I'm wondering if there's any info, maybe not necessarily related to
> > postgres, but in general, on how much power gets saved if a certain
> > number of waits/polls/system calls are reduced.
>
> It's heavily dependent on hardware and hardware settings.
>
> On some systems you can get an *approximate* idea of the power usage
> even while plugged in. On others you can only see it while on battery
> power.
>
> On systems with RAPL support (most Intel and I think newer AMD CPUs)
> you can query power usage with:
>   powerstat -R 1 (adding -D provides a bit more detail)
>
> But note that RAPL typically severely undercounts power usage because
> it doesn't cover a lot of sources of power usage (display, sometimes
> memory, all other peripherals).
>
> Sometimes powerstat -R -D can split power usage up more granularly,
> e.g. between the different CPU sockets and memory.
>
> On laptops you can often measure the discharge rate when not plugged
> in, with powerstat -d 0 1. But IME the latency till the values update
> makes it harder to interpret.
>
> On some workstations / servers you can read the power usage via ACPI.
> E.g. on my workstation 'sensors' shows a power_meter-acpi-0.
>
> As an example of the difference it can make, here's the output of
> powerstat -D 1 on my laptop:
>
> Time     User Nice  Sys Idle   IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
> 16:41:55  0.1  0.0  0.1 99.8  0.0   1    403   246    1    0    1  2.62
> 16:41:56  0.1  0.0  0.1 99.8  0.0   1    357   196    1    0    1  2.72
> 16:41:57  0.1  0.0  0.1 99.6  0.2   1    510   231    4    0    4  2.64
> 16:41:58  0.1  0.0  0.1 99.9  0.0   2   1350   758   64   62   63  4.06
> 16:41:59  0.3  0.0  1.0 98.7  0.0   2   4166  2406  244  243  244  7.20
> 16:42:00  0.2  0.0  0.7 99.1  0.0   2   4203  2353  247  246  247  7.21
> 16:42:01  0.5  0.0  1.6 98.0  0.0   2   4079  2395  240  239  240  7.08
> 16:42:02  0.5  0.0  0.9 98.7  0.0   2   4097  2405  245  243  245  7.20
> 16:42:03  0.4  0.0  1.3 98.3  0.0   2   4117  2311  243  242  243  7.14
> 16:42:04  0.1  0.0  0.4 99.4  0.1   1   1721  1152   70   70   71  4.48
> 16:42:05  0.1  0.0  0.2 99.8  0.0   1    433   250    1    0    1  2.92
> 16:42:06  0.0  0.0  0.3 99.7  0.0   1    400   231    1    0    1  2.66
>
> In the period of higher power usage I ran
>
>   while true;do sleep 0.001;done
>
> and then interrupted that after a bit with ctrl-c.
>
> Same thing on my workstation:
>
> Time     User Nice  Sys Idle   IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
> 16:43:48  1.0  0.0  0.2 98.7  0.1   1   8218  2354    0    0    0 46.43
> 16:43:49  1.1  0.0  0.3 98.7  0.0   1   7866  2477    0    0    0 45.99
> 16:43:50  1.1  0.0  0.4 98.5  0.0   2   7753  2996    0    0    0 48.93
> 16:43:51  0.8  0.0  1.7 97.5  0.0   1   9395  5285    0    0    0 75.48
> 16:43:52  0.5  0.0  1.7 97.8  0.0   1   9141  4806    0    0    0 75.30
> 16:43:53  1.1  0.0  1.8 97.1  0.0   2  10065  5504    0    0    0 76.27
> 16:43:54  1.3  0.0  1.5 97.2  0.0   1  10962  5165    0    0    0 76.33
> 16:43:55  0.9  0.0  0.8 98.3  0.0   1   8452  3939    0    0    0 61.99
> 16:43:56  0.6  0.0  0.1 99.3  0.0   2   6541  1999    0    0    0 40.92
> 16:43:57  0.9  0.0  0.2 98.9  0.0   2   8199  2477    0    0    0 42.91
>
> And if I query the power supply via ACPI instead:
>
>   while true;do sensors power_meter-acpi-0|grep power1|awk '{print $2, $3}';sleep 1;done
> ...
> 163.00 W
> 173.00 W
> 173.00 W
> 172.00 W
> 203.00 W
> 206.00 W
> 206.00 W
> 206.00 W
> 209.00 W
> 205.00 W
> 211.00 W
> 213.00 W
> 203.00 W
> 175.00 W
> 166.00 W
> ...
>
> As you can see the difference is quite substantial. This is solely due
> to a 1ms sleep loop (albeit one where we fork a process after each
> sleep, which is likely a good chunk of the CPU usage).
>
> However, the difference in power usage will be far smaller if the
> system is already busy, because the CPU will already be running at a
> high frequency.

Thanks a lot for sharing the details.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, Nov 18, 2022 at 2:08 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > No, it will have a direct effect only on people using promote_trigger_file > > who do not read and act upon the deprecation notice before upgrading > > by making a one line change to their failover scripts. > > TBH, I wonder if we shouldn't just nuke promote_trigger_file. > pg_promote was added in 2018, and pg_ctl promote was added in 2011. So > even if we haven't said promote_trigger_file was deprecated, it hasn't > been the state-of-the-art way of doing this in a really long time. If > we think that there are still a lot of people using it, or if popular > tools are relying on it, then perhaps a deprecation period is > warranted anyway. But I think we should at least consider the > possibility that it's OK to just get rid of it right away. I agree with Robert here. Doing away with promote_trigger_file now is better than deprecating it first and removing it in some later release. However, I'm a bit worried about how it'll affect the tools/providers/extensions that depend on it, though that worry applies to every feature one considers removing. If we go the route of doing away with promote_trigger_file, I think we need to discuss it in a separate thread, as the subject of this thread doesn't match what we're discussing here. That way, we'll get thoughts from other hackers. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, 18 Nov 2022 at 08:55, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > However, I'm a bit > worried about how it'll affect the tools/providers/extensions that > depend on it. Who is that? Which ones depend upon it? -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, 17 Nov 2022 at 20:38, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > No, it will have a direct effect only on people using promote_trigger_file > > who do not read and act upon the deprecation notice before upgrading > > by making a one line change to their failover scripts. > > TBH, I wonder if we shouldn't just nuke promote_trigger_file. > pg_promote was added in 2018, and pg_ctl promote was added in 2011. So > even if we haven't said promote_trigger_file was deprecated, it hasn't > been the state-of-the-art way of doing this in a really long time. If > we think that there are still a lot of people using it, or if popular > tools are relying on it, then perhaps a deprecation period is > warranted anyway. But I think we should at least consider the > possibility that it's OK to just get rid of it right away. I agree. I can't see a reason to keep it anymore. I'm nervous about not having any wakeup at all, but since we are removing the parameter there is no other reason not to do as Andres suggests. New version attached, which assumes that the SIGALRMs are silenced on the other thread. -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > I agree. I can't see a reason to keep it anymore. + Use of <varname>promote_trigger_file</varname> is deprecated. If you're I think 'deprecated' usually implies that it still works but you should avoid it. I think you need something stronger. > I'm nervous about not having any wakeup at all, but since we are > removing the parameter there is no other reason not to do as Andres > suggests. Why? If we're accidentally relying on this timeout for recovery to not hang in some situation, that's a bug waiting to be discovered and fixed and it won't be this patch's fault. > New version attached, which assumes that the SIGALRMs are silenced on > the other thread. I tested this + Bharath's v5 from the other thread. meson test passes, and tracing the recovery process shows that it is indeed, finally, completely idle. Huzzah!
On Fri, 18 Nov 2022 at 20:26, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > I agree. I can't see a reason to keep it anymore. > > + Use of <varname>promote_trigger_file</varname> is deprecated. If you're > > I think 'deprecated' usually implies that it still works but you > should avoid it. I think you need something stronger. Whisky? Or maybe just reword the sentence... New version attached. > > I'm nervous about not having any wakeup at all, but since we are > > removing the parameter there is no other reason not to do as Andres > > suggests. > > Why? If we're accidentally relying on this timeout for recovery to > not hang in some situation, that's a bug waiting to be discovered and > fixed and it won't be this patch's fault. > > > New version attached, which assumes that the SIGALRMs are silenced on > > the other thread. > > I tested this + Bharath's v5 from the other thread. meson test > passes, and tracing the recovery process shows that it is indeed, > finally, completely idle. Huzzah! Thanks for testing! Finally completely idle? Good to achieve that. -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > New version attached. Fix for doc xref -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Thu, 24 Mar 2022 at 16:21, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs > > What changes will be acceptable for bgwriter, walwriter and logical worker? > > Hmm, I think it would be fine to introduce some kind of hibernation > mechanism for logical workers. bgwriter and wal writer already have a > hibernation mechanism, so I'm not sure what your concern is there > exactly. In your initial email you said you weren't proposing changes > there, but maybe that changed somewhere in the course of the > subsequent discussion. If you could catch me up on your present > thinking that would be helpful. Now that we seem to have solved the problem for the Startup process, let's circle back to the others.... Bgwriter does hibernate currently, but only at 50x the bgwriter_delay. At default values that is 10s, but it could be much less. So we need to move that up to 60s, same as the others. WALwriter also hibernates currently, but only at 25x the wal_writer_delay. At default values that is 5s, but it could be much less. So we need to move that up to 60s, same as the others. At the same time, make sure that when we hibernate we use a new WaitEvent, similarly to the way bgwriter reports its hibernation state (which also helps test the patch). As a 3rd patch, I will work on making logical workers hibernate. -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
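To make the pattern in the preceding message concrete, here is a minimal sketch of a worker main loop with hibernation, loosely modeled on the existing walwriter.c coding; LOOPS_UNTIL_HIBERNATE, HIBERNATE_DELAY_SEC and DidSomeWork() are illustrative names, not code from the attached patch:

    #define LOOPS_UNTIL_HIBERNATE 50
    #define HIBERNATE_DELAY_SEC   60

    static void
    WorkerMainLoop(void)
    {
        int     left_till_hibernate = LOOPS_UNTIL_HIBERNATE;

        for (;;)
        {
            long    cur_timeout;

            ResetLatch(MyLatch);

            if (DidSomeWork())  /* illustrative: true if any work was found */
                left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
            else if (left_till_hibernate > 0)
                left_till_hibernate--;

            /* Nap briefly while active; stretch the sleep once proven idle. */
            cur_timeout = (left_till_hibernate > 0)
                ? WalWriterDelay
                : HIBERNATE_DELAY_SEC * 1000L;

            (void) WaitLatch(MyLatch,
                             WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                             cur_timeout,
                             WAIT_EVENT_WAL_WRITER_MAIN);
        }
    }

The patch additionally reports a distinct wait event while hibernating, as bgwriter already does, which makes the state visible in pg_stat_activity and easier to test.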
On Mon, Nov 21, 2022 at 3:35 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > New version attached. > > Fix for doc xref I removed a stray variable declaration from xlogrecovery.h, and wrote a draft commit message. I'll wait 24 hours before committing, to provide a last chance for anyone who wants to complain about dropping promote_trigger_file. (We could of course change it so that the timeout based wakeup only happens if you have actually set promote_trigger_file, but I think we've established that we just don't want the filesystem polling feature so I'm whispering this in parentheses.)
Attachment
On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > As a 3rd patch, I will work on making logical workers hibernate. Duelling patch warning: Nathan mentioned[1] that he's hacking on a patch for that, along the lines of the recent walreceiver change IIUC. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJGhX4r2LPUE3Oy9BX71Eum6PBcS8L3sJpScR9oKaTVaA%40mail.gmail.com
On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote: > On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: >> As a 3rd patch, I will work on making logical workers hibernate. > > Duelling patch warning: Nathan mentioned[1] that he's hacking on a > patch for that, along the lines of the recent walreceiver change IIUC. I coded something up last week, but, like the walreceiver patch, it caused check-world to take much longer [0], and I haven't looked into whether it could be easily fixed. I'm hoping to make some time for this again in the near future. [0] https://postgr.es/m/20221113222644.GA1269110%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote: > I'll wait 24 hours before committing, to > provide a last chance for anyone who wants to complain about dropping > promote_trigger_file. Remove "promote_trigger_file"? Now I have never seen anybody use that parameter, but I don't think that it is a good idea to deviate from our usual standard of deprecating a feature for about five years before actually removing it. Yours, Laurenz Albe
On Mon, Nov 21, 2022 at 10:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote: > > I'll wait 24 hours before committing, to > > provide a last chance for anyone who wants to complain about dropping > > promote_trigger_file. > > Remove "promote_trigger_file"? Now I have never seen anybody use that > parameter, but I don't think that it is a good idea to deviate from our > usual standard of deprecating a feature for about five years before > actually removing it. I'm not sure what the guidelines are here; however, years have gone by since pg_ctl promote [1] in 2011 and pg_promote() [2] in 2018 were added. With two such alternatives in place for many years, it was sort of an undeclared deprecation of the promote_trigger_file GUC. And the changes required to move from the GUC to the newer ways aren't that hard for those who're still relying on it. Therefore, I think it's now time for us to do away with the GUC. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4695da5ae97bbb58d274887fd68edbe88d03ebcb [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10074651e3355e2405015f6253602be8344bc829 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Nov 21, 2022 at 2:43 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Mon, Nov 21, 2022 at 3:35 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > New version attached. > > > > Fix for doc xref > > I removed a stray variable declaration from xlogrecovery.h, and wrote > a draft commit message. I'll wait 24 hours before committing, to > provide a last chance for anyone who wants to complain about dropping > promote_trigger_file. > > (We could of course change it so that the timeout based wakeup only > happens if you have actually set promote_trigger_file, but I think > we've established that we just don't want the filesystem polling > feature so I'm whispering this in parentheses.) Thanks. The v11 patch mostly looks good to me and it passes cirrus-ci tests - https://github.com/BRupireddy/postgres/tree/do_away_with_promote_trigger_file. I have one comment:

    -                * Wait for more WAL to arrive. Time out after 5 seconds
    -                * to react to a trigger file promptly and to check if the
    -                * WAL receiver is still active.
    +                * Wait for more WAL to arrive, when we will be woken
    +                * immediately by the WAL receiver. Use of trigger file
    +                * via promote_trigger_file is now fully removed.
                     */

Do we need to introduce a reference to the removal of promote_trigger_file in the code? If left as-is, it'll lie there for many years. Isn't it enough to note the removal in appendix-obsolete-recovery-config.sgml? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote: > > I'll wait 24 hours before committing, to > > provide a last chance for anyone who wants to complain about dropping > > promote_trigger_file. > > Remove "promote_trigger_file"? Now I have never seen anybody use that > parameter, but I don't think that it is a good idea to deviate from our > usual standard of deprecating a feature for about five years before > actually removing it. We aren't removing the ability to promote, just enforcing a change to a better mechanism, hence I don't see a reason for a long(er) deprecation period than we have already had. -- Simon Riggs http://www.EnterpriseDB.com/
On Sun, 20 Nov 2022 at 22:55, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote: > > On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs > > <simon.riggs@enterprisedb.com> wrote: > >> As a 3rd patch, I will work on making logical workers hibernate. > > > > Duelling patch warning: Nathan mentioned[1] that he's hacking on a > > patch for that, along the lines of the recent walreceiver change IIUC. > > I coded something up last week, but, like the walreceiver patch, it caused > check-world to take much longer [0], and I haven't looked into whether it > could be easily fixed. I'm hoping to make some time for this again in the > near future. > > [0] https://postgr.es/m/20221113222644.GA1269110%40nathanxps13 OK, Nathan, will leave this one to you - remembering that we need to fix ALL processes to get a useful power reduction when idle. -- Simon Riggs http://www.EnterpriseDB.com/
On Mon, 2022-11-21 at 11:42 +0530, Bharath Rupireddy wrote: > On Mon, Nov 21, 2022 at 10:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote: > > > I'll wait 24 hours before committing, to > > > provide a last chance for anyone who wants to complain about dropping > > > promote_trigger_file. > > > > Remove "promote_trigger_file"? Now I have never seen anybody use that > > parameter, but I don't think that it is a good idea to deviate from our > > usual standard of deprecating a feature for about five years before > > actually removing it. > > I'm not sure what the guidelines are here, however years have gone by > since pg_ctl promote [1] in 2011 and pg_promote() [2] in 2018 were > added. With two such alternatives in place for many years, it was sort > of an undeclared deprecation of promote_trigger_file GUC. And the > changes required to move to newer ways from the GUC aren't that hard > for those who're still relying on the GUC. Therefore, I think it's now > time for us to do away with the GUC. I disagree. With the same argument, you could rip out "serial", since we have supported identity columns since v11. I understand the desire to avoid needless wakeups, but isn't it possible to only perform the regular poll if "promote_trigger_file" is set? Yours, Laurenz Albe
On Mon, 2022-11-21 at 07:36 +0000, Simon Riggs wrote: > On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote: > > > I'll wait 24 hours before committing, to > > > provide a last chance for anyone who wants to complain about dropping > > > promote_trigger_file. > > > > Remove "promote_trigger_file"? Now I have never seen anybody use that > > parameter, but I don't think that it is a good idea to deviate from our > > usual standard of deprecating a feature for about five years before > > actually removing it. > > We aren't removing the ability to promote, just enforcing a change to > a better mechanism, hence I don't see a reason for a long(er) > deprecation period than we have already had. We have had a deprecation period? I looked at the documentation, but found no mention of a deprecation. How hard can it be to leave the GUC and only poll for the existence of the file if it is set? I personally don't need the GUC, and I know nobody who does, but I think we should not be cavalier about introducing unnecessary compatibility breaks. Yours, Laurenz Albe
On Mon, 21 Nov 2022 at 08:40, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2022-11-21 at 07:36 +0000, Simon Riggs wrote: > > On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > > > > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote: > > > > I'll wait 24 hours before committing, to > > > > provide a last chance for anyone who wants to complain about dropping > > > > promote_trigger_file. > > > > > > Remove "promote_trigger_file"? Now I have never seen anybody use that > > > parameter, but I don't think that it is a good idea to deviate from our > > > usual standard of deprecating a feature for about five years before > > > actually removing it. > > > > We aren't removing the ability to promote, just enforcing a change to > > a better mechanism, hence I don't see a reason for a long(er) > > deprecation period than we have already had. > > We have had a deprecation period? I looked at the documentation, but found > no mention of a deprecation. How hard can it be to leave the GUC and only > poll for the existence of the file if it is set? > > I personally don't need the GUC, and I know nobody who does, Nobody else does either. > I disagree. With the same argument, you could rip out "serial", since we > have supported identity columns since v11. ...and this is not a user-facing change; only HA systems interface with this. > but I think > we should not be cavalier about introducing unnecessary compatibility breaks. I agree we should not be cavalier, nor has anyone been so. The first version of the patch was cautious on this very point, but since many people think we should remove it, it is not a user-facing feature, and nobody on this thread knows anybody or anything that uses it, I have changed my mind and now think we should remove it. We have two versions to choose from now: * v6 deprecates this option * v10 removes this option so it is no effort at all to choose either option. The main issue is about reducing power usage, so please let's move to commit something soon, one way or another. -- Simon Riggs http://www.EnterpriseDB.com/
On Mon, Nov 21, 2022 at 3:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > We have had a deprecation period? I looked at the documentation, but found > no mention of a deprecation. How hard can it be to leave the GUC and only > poll for the existence of the file if it is set? > > I personally don't need the GUC, and I know nobody who does, but I think > we should not be cavalier about introducing unnecessary compatibility breaks. As a project, we have a problem with deprecation warnings, simply because community consensus changes over time as people change their minds and as participants come and go. Slapping a deprecation notice on things saying that they will be removed in five years does not mean they will actually get removed in five years, nor does the failure to slap a deprecation notice on them mean that they won't be removed in five years. We have some instances of following through on such promises, but plenty where what we said we were going to do and what we actually did were quite different. Furthermore, even if we were 100% on follow-through on such things, I have no faith in the idea that a mandatory five-year deprecation period in all cases would be good for the project. I think it's totally reasonable to phase out a little-used feature that no one really cares about faster than a feature that is heavily relied upon and for which there are no reasonable alternatives. Something like standard_conforming_strings or the switch from md5 to SCRAM authentication affects not only database users but also drivers and applications; time needs to be allowed for the effect to seep through the whole ecosystem. Backward compatibility is really important for the transition period. In this case, though, the alternatives have already existed for a long time and we have reasons to think most people have already been using them for a long time. Moreover, if they haven't been, the transition shouldn't be too hard. For there to be a problem here, we have to imagine a user who can't run pg_promote() or pg_ctl promote, who can only create a file, and who furthermore needs that file to be located someplace other than the data directory. I'm having a hard time envisioning that. The reason that I pushed back -- not as successfully as I would have liked -- on the changes to pg_stop_backup / pg_start_backup is that I know there are people using the old method successfully, and it's not just a 1:1 substitution. Here I don't, and it is. I'm totally open to the feedback that such people exist and to hearing why adopting one of the newer methods would be a problem for them, if that's the case. But if there's no evidence that such people exist or that changing is a problem for them, I don't think waiting 5 years on principle is good for the project. P.S. One notable example of how bad we are at these things is that contrib/xml2 has been deprecated since PostgreSQL 8.3. The deprecation notice says that it isn't needed any more because all of the functionality is otherwise available, but some people think that's not true so we haven't removed the module. An obvious alternative would be to remove the deprecation notice but FWICT other people think that getting rid of it is the right idea so we haven't done that either. Whee. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > The reason that I pushed back -- not as successfully as I would have > liked -- on the changes to pg_stop_backup / pg_start_backup is that I > know there are people using the old method successfully, and it's not > just a 1:1 substitution. Here I don't, and it is. I'm totally open to > the feedback that such people exist and to hearing why adopting one of > the newer methods would be a problem for them, if that's the case. But > if there's no evidence that such people exist or that changing is a > problem for them, I don't think waiting 5 years on principle is good > for the project. We make incompatible changes in every release; see the release notes. Unless somebody can give a plausible use-case where this'd be a difficult change to deal with, I concur that we don't need to deprecate it ahead of time. regards, tom lane
On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > The reason that I pushed back -- not as successfully as I would have > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I > > know there are people using the old method successfully, and it's not > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to > > the feedback that such people exist and to hearing why adopting one of > > the newer methods would be a problem for them, if that's the case. But > > if there's no evidence that such people exist or that changing is a > > problem for them, I don't think waiting 5 years on principle is good > > for the project. > > We make incompatible changes in every release; see the release notes. > Unless somebody can give a plausible use-case where this'd be a > difficult change to deal with, I concur that we don't need to > deprecate it ahead of time. Since I am the only one that seems to worry, I'll shut up. You are probably right that the feature won't be missed by many users. Yours, Laurenz Albe
On Tue, 22 Nov 2022 at 05:50, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > > The reason that I pushed back -- not as successfully as I would have > > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I > > > know there are people using the old method successfully, and it's not > > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to > > > the feedback that such people exist and to hearing why adopting one of > > > the newer methods would be a problem for them, if that's the case. But > > > if there's no evidence that such people exist or that changing is a > > > problem for them, I don't think waiting 5 years on principle is good > > > for the project. > > > > We make incompatible changes in every release; see the release notes. > > Unless somebody can give a plausible use-case where this'd be a > > difficult change to deal with, I concur that we don't need to > > deprecate it ahead of time. > > Since I am the only one that seems to worry, I'll shut up. You are probably > right that the feature won't be missed by many users. FWIW, though I prefer to err on the side of caution when removing features from anything, I am struggling to remember ever having used "promote_trigger_file" (or "trigger_file" as it was in Pg11 and earlier); grepping my private notes brings up a single example from ca. 2012 when I appear to have been experimenting with replication. On a quick web search, a large part of the results are related to its change to a GUC in Pg 12 and/or commented out entries in sample postgresql.conf files; most of the rest seem to be blog articles/tutorials on setting up replication. Regards Ian Barwick
On Sun, Nov 27, 2022 at 6:15 PM Ian Lawrence Barwick <barwick@gmail.com> wrote: > 2022年11月22日(火) 5:50 Laurenz Albe <laurenz.albe@cybertec.at>: > > On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote: > > > Robert Haas <robertmhaas@gmail.com> writes: > > > > The reason that I pushed back -- not as successfully as I would have > > > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I > > > > know there are people using the old method successfully, and it's not > > > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to > > > > the feedback that such people exist and to hearing why adopting one of > > > > the newer methods would be a problem for them, if that's the case. But > > > > if there's no evidence that such people exist or that changing is a > > > > problem for them, I don't think waiting 5 years on principle is good > > > > for the project. > > > > > > We make incompatible changes in every release; see the release notes. > > > Unless somebody can give a plausible use-case where this'd be a > > > difficult change to deal with, I concur that we don't need to > > > deprecate it ahead of time. > > > > Since I am the only one that seems to worry, I'll shut up. You are probably > > right that it the feature won't be missed by many users. > > FWIW, though I prefer to err on the side of caution when removing features > from anything, I am struggling to remember ever having used > "promote_trigger_file" > (or "trigger_file" as it was in Pg11 and earlier); grepping my private notes > brings up a single example from ca. 2012 when I appear to have been > experimenting with replication. > > On a quick web search, a large part of the results are related to its change > to a GUC in Pg 12 and/or commented out entries in sample postgresql.conf files; > most of the rest seem to be blog articles/tutorials on setting up replication. Thanks Ian, Laurenz, Robert, Tom for the discussion. Seems like we're all set then. Sorry for delaying over trivialities, but I have a couple more comments about the documentation before committing: "The trigger_file and promote_trigger_file have been removed." was missing some words. I've also added a sentence to say which releases were involved, to make it like nearby paragraphs about other obsolete stuff. The funny thing about that "obsolete" appendix is that it's intended as a URL-preserving way to document the recovery.conf file's demise in r12. It doesn't look like the right place to document ongoing changes to recovery-related GUCs in general. In this particular case it's sort of OK because the old trigger_file GUC was indeed in recovery.conf, so it's not completely irrelevant. So maybe it's OK? Perhaps in future, in a separate commit, we could have a new appendix "Obsolete settings" that has an alphabetical list of the fallen? The main documentation of pg_promote() etc now has "The parameter <varname>promote_trigger_file</varname> has been removed" in the places where the GUC was previously mentioned. Perhaps we should just remove the mentions completely (it's somehow either too much and not enough information), but I'm OK with leaving those in for now until some better place exists for historical notes. So this is the version I will push unless someone sees anything more to tweak about the documentation.
Attachment
On Mon, Nov 28, 2022 at 5:27 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > "The trigger_file and promote_trigger_file have been removed." was > missing some words. I've also added a sentence to say which releases > were involved, to make it like nearby paragraphs about other obsolete > stuff. LGTM. > The funny thing about that "obsolete" appendix is that it's intended > as a URL-preserving way to document the recovery.conf file's demise in > r12. It doesn't look like the right place to document ongoing changes > to recovery-related GUCs in general. In this particular case it's > sort of OK because the old trigger_file GUC was indeed in > recovery.conf, so it's not completely irrelevant. So maybe it's OK? > Perhaps in future, in a separate commit, we could have a new appendix > "Obsolete settings" that has an alphabetical list of the fallen? Tracking down the history to figure out and add all the removed/renamed settings under a new appendix seems a tedious task. The best approach IMO is to look at the corresponding release notes to see which settings have been removed/renamed/altered. FWIW, it's been a while (3 major releases have happened) since recovery.conf was removed; I think we need to do away with appendix-obsolete-recovery-config.sgml someday. > The main documentation of pg_promote() etc now has "The parameter > <varname>promote_trigger_file</varname> has been removed" in the > places where the GUC was previously mentioned. Perhaps we should just > remove the mentions completely (it's somehow either too much and not > enough information), but I'm OK with leaving those in for now until > some better place exists for historical notes. +1 for the way it is in the v12 patch, i.e. specifying the removal in the docs. However, I don't see any reason to keep the traces of it in the code; can we remove the "Use of trigger file via promote_trigger_file is now fully removed." sentence in the code below?

    +                * Wait for more WAL to arrive, when we will be woken
    +                * immediately by the WAL receiver. Use of trigger file
    +                * via promote_trigger_file is now fully removed.
                     */

> So this is the version I will push unless someone sees anything more > to tweak about the documentation. Thanks. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Sun, Nov 27, 2022 at 6:57 PM Thomas Munro <thomas.munro@gmail.com> wrote: > The main documentation of pg_promote() etc now has "The parameter > <varname>promote_trigger_file</varname> has been removed" in the > places where the GUC was previously mentioned. Perhaps we should just > remove the mentions completely (it's somehow either too much and not > enough information), but I'm OK with leaving those in for now until > some better place exists for historical notes. I think we should remove those mentions. Otherwise the documentation just collects mentions of an increasing number of things that are no longer relevant. And, as you say, we can't presume that future readers would know what promote_trigger_file even is. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I think we should remove those mentions. Otherwise the documentation > just collects mentions of an increasing number of things that are no > longer relevant. Yeah, I think the same. There will be a release-note entry, and I don't object to having something about it in appendix-obsolete.sgml; but we shouldn't clutter the main docs with it. regards, tom lane
I found some more comments and some documentation to word-smith very lightly, and pushed. The comments were stray references to the trigger file. It's a little confusing because the remaining mechanism also uses a file, but it uses a signal first, so it seems better to refer to promotion requests rather than files. /me wonders how many idle servers there are in the world
On Mon, 28 Nov 2022 at 23:16, Thomas Munro <thomas.munro@gmail.com> wrote: > > I found some more comments and some documentation to word-smith very > lightly, and pushed. Thanks > The comments were stray references to the > trigger file. It's > a little confusing because the remaining mechanism also uses a file, > but it uses a signal first so seems better to refer to promotion > requests rather than files. ..and again > /me wonders how many idle servers there are in the world My estimate is there are 100 million servers in use worldwide, with only about 1% of them on a continuously busy duty cycle and most of them not in the cloud. If we guess that we save 10W when idle, then that saves some proportion of a GW. It's not a huge contribution to the effort, but if by doing this we inspire others to do the same, we may yet make a difference. -- Simon Riggs http://www.EnterpriseDB.com/
On Sun, 20 Nov 2022 at 20:00, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Thu, 24 Mar 2022 at 16:21, Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs > > > > What changes will be acceptable for bgwriter, walwriter and logical worker? > > > > Hmm, I think it would be fine to introduce some kind of hibernation > > mechanism for logical workers. bgwriter and wal writer already have a > > hibernation mechanism, so I'm not sure what your concern is there > > exactly. In your initial email you said you weren't proposing changes > > there, but maybe that changed somewhere in the course of the > > subsequent discussion. If you could catch me up on your present > > thinking that would be helpful. > > Now that we seem to have solved the problem for the Startup process, let's > circle back to the others.... Thanks for committing changes to the startup process. > Bgwriter does hibernate currently, but only at 50x the bgwriter_delay. > At default values that is 10s, but it could be much less. So we need to > move that up to 60s, same as the others. > > WALwriter also hibernates currently, but only at 25x the > wal_writer_delay. At default values that is 5s, but it could be much > less. So we need to move that up to 60s, same as the others. At the same > time, make sure that when we hibernate we use a new WaitEvent, > similarly to the way bgwriter reports its hibernation state (which > also helps test the patch). Re-attaching the patch for bgwriter and walwriter, so it is clear this is not yet committed. I've added this to the next CF, since the entry was closed when the startup patch was committed. -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > Re-attaching patch for bgwriter and walwriter, so it is clear this is > not yet committed. I'm just curious, and not suggesting that 60s wakeups are a problem for the polar ice caps, but why even time out at all? Are the latch protocols involved not reliable enough? At a guess from a quick glance, the walwriter's is but maybe the bgwriter could miss a wakeup as it races against StrategyGetBuffer(), which means you might stay asleep until the *next* buffer allocation, but that's already true I think, and a 60s timeout is not much of a defence.
On Wed, 30 Nov 2022 at 03:50, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > Re-attaching the patch for bgwriter and walwriter, so it is clear this is > > not yet committed. > > I'm just curious, and not suggesting that 60s wakeups are a problem > for the polar ice caps, but why even time out at all? Are the latch > protocols involved not reliable enough? At a guess from a quick > glance, the walwriter's is but maybe the bgwriter could miss a wakeup > as it races against StrategyGetBuffer(), which means you might stay > asleep until the *next* buffer allocation, but that's already true I > think, and a 60s timeout is not much of a defence. That sounds reasonable. It does sound like we agree that the existing behavior of waking up every 10s or 5s is not good. I hope you will act to improve that. The approach taken in this patch, and others of mine, has been to offer a minimal change that achieves the objective of lengthy hibernation to save power. Removing the timeout entirely may not work in other circumstances I have not explored. Doing that requires someone to check it actually works, and for others to believe that check has occurred. For me, that is too time-consuming to actually happen in this dev cycle, and time is part of the objective, since a perfect design whose code remains unreleased has no utility. <Simon enters lengthy hibernation> -- Simon Riggs http://www.EnterpriseDB.com/
On Wed, Nov 30, 2022 at 7:40 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > On Wed, 30 Nov 2022 at 03:50, Thomas Munro <thomas.munro@gmail.com> wrote: > > I'm just curious, and not suggesting that 60s wakeups are a problem > > for the polar ice caps, but why even time out at all? Are the latch > > protocols involved not reliable enough? At a guess from a quick > > glance, the walwriter's is but maybe the bgwriter could miss a wakeup > > as it races against StrategyGetBuffer(), which means you might stay > > asleep until the *next* buffer allocation, but that's already true I > > think, and a 60s timeout is not much of a defence. > > That sounds reasonable. > > It does sound like we agree that the existing behavior of waking up > every 5s or 2.5s is not good. I hope you will act to improve that. > > The approach taken in this patch, and others of mine, has been to > offer a minimal change that achieves the objective of lengthy > hibernation to save power. > > Removing the timeout entirely may not work in other circumstances I > have not explored. Doing that requires someone to check it actually > works, and for others to believe that check has occurred. For me, that > is too time consuming to actually happen in this dev cycle, and time > is part of the objective since perfect designs yet with unreleased > code have no utility. > > <Simon enters lengthy hibernation> <Thomas returns from aestivation> Yeah, I definitely want to fix it. I just worry that 60s is so long that it also needs that analysis work to be done to explain that it's OK that we're a bit sloppy on noticing when to wake up, at which point you might as well go to infinity.
Thomas Munro <thomas.munro@gmail.com> writes: > Yeah, I definitely want to fix it. I just worry that 60s is so long > that it also needs that analysis work to be done to explain that it's > OK that we're a bit sloppy on noticing when to wake up, at which point > you might as well go to infinity. Yeah. The perfectionist in me wants to say that there should be explicit wakeups for every event of interest, in which case there's no need for a timeout. The engineer in me says "but what about bugs?". Better a slow reaction than never reacting at all. OTOH, then you have to have a discussion about whether 60s (or any other ice-cap-friendly value) is an acceptable response time even in the presence of bugs. It's kind of moot until we've reached the point where we can credibly claim to have explicit wakeups for every event of interest. I don't think we're very close to that today, and I do think we should try to get closer. There may come a point of diminishing returns though. regards, tom lane
On Wed, Jan 25, 2023 at 2:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > Yeah, I definitely want to fix it. I just worry that 60s is so long > > that it also needs that analysis work to be done to explain that it's > > OK that we're a bit sloppy on noticing when to wake up, at which point > > you might as well go to infinity. > > Yeah. The perfectionist in me wants to say that there should be > explicit wakeups for every event of interest, in which case there's > no need for a timeout. The engineer in me says "but what about bugs?". > Better a slow reaction than never reacting at all. OTOH, then you > have to have a discussion about whether 60s (or any other > ice-cap-friendly value) is an acceptable response time even in the > presence of bugs. > > It's kind of moot until we've reached the point where we can > credibly claim to have explicit wakeups for every event of > interest. I don't think we're very close to that today, and > I do think we should try to get closer. There may come a point > of diminishing returns though. IIUC, we're discussing here whether or not to get rid of hibernate loops, IOW, sleep-wakeup-do-work-if-there-is-any-sleep loops, and rely on other processes' wakeup signals to reduce the overall power consumption, am I right? I'm trying to understand this a bit - can the signals (especially SIGURG, which we use to set latches to wake up processes) ever get lost on the way before reaching the target process? If yes, how? How frequently can it happen? Is there any history of reported issues in postgres because a signal got lost? I'm reading about pending signals and the queuing of signals with sigqueue() (on Linux); can any of these guarantee that signals sent never get lost? FWIW, a recent commit cd4329d that removed promote_trigger_file also removed hibernation and now waits forever, relying on the latch being set. If we're worried that signals can get lost on the way, then this also needs to be fixed. And I also see a lot of WaitLatch() calls that wait forever, relying on others to wake them up. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
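For context, the canonical latch loop (documented at the top of latch.h) is written so that a wakeup cannot fall into the gap between the check for work and the sleep; a minimal sketch, with HaveWorkToDo() and DoWork() as illustrative placeholders:

    for (;;)
    {
        /* Clear the latch *before* checking for work ... */
        ResetLatch(MyLatch);

        /*
         * ... so a waker that queues work and then calls SetLatch() cannot
         * slip past us: either we see the work here, or the latch is
         * already set again and WaitLatch() returns immediately.
         */
        if (HaveWorkToDo())     /* illustrative */
            DoWork();           /* illustrative */

        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                         -1L,   /* no timeout: rely entirely on wakeups */
                         WAIT_EVENT_WAL_WRITER_MAIN);
    }

With that ordering, the signal is only a kick to get the process out of its kernel sleep; the wakeup itself is carried by the latch's flag, so waiting forever is safe provided every work-producing path calls SetLatch().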
On Fri, Jan 27, 2023 at 7:37 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Wed, Jan 25, 2023 at 2:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > It's kind of moot until we've reached the point where we can > > credibly claim to have explicit wakeups for every event of > > interest. I don't think we're very close to that today, and > > I do think we should try to get closer. There may come a point > > of diminishing returns though. > > IIUC, we're discussing here whether or not to get rid of hibernate > loops, IOW, sleep-wakeup-do-work-if-there-is-any-sleep loops, and rely on > other processes' wakeup signals to reduce the overall power > consumption, am I right? > > I'm trying to understand this a bit - can the signals (especially > SIGURG, which we use to set latches to wake up processes) ever get lost > on the way before reaching the target process? If yes, how? How > frequently can it happen? Is there any history of reported issues in > postgres because a signal got lost? > > I'm reading about pending signals and the queuing of signals with > sigqueue() (on Linux); can any of these guarantee that signals sent > never get lost? Signals don't get lost. At least with the current definition of so-called "reliable" signals, in the POSIX standard. (There was a previous system of signals in ancient UNIX that was harder to program with; we still see echoes of that today, for example in the C standard there are basic signals, and Windows implements those, but they're borderline useless; before a signal handler runs, the handler is also de-registered, so many interesting handlers would have to re-register it, but signals can't be atomically blocked at the same time and there is a window of time where the default behaviour applies, which (from our modern perspective) is clearly insane as there is literally no way to close that race and avoid some fatal default behaviour). BTW sigqueue (signals that are queued up without being consolidated, and carry a user-controlled value with them) are "realtime signals" AKA "signal queuing", which I guess we'll never be able to use because at least Apple didn't implement them; the "reliable signals" we use are the plain kind where signals don't carry any kind of payload and are collapsed into a single "pending" bit, so that if two people do kill(SIGFOO) around the same time your handler might only run once (if you registered one and it's not blocked), but it will definitely run > 0 times (or be delivered via various other means such as sigwait(), etc...). Then there is the question of races as you go into the wait primitive, I mean in between our reading latch->is_set and entering the kernel, which we defend against using various techniques that you can read about at the top of latch.c; the short version is that if that happens, those system calls will immediately return. The uncertainty discussed here comes from the comment beginning "There is a race condition here, ..." in bgwriter.c, referring to the protocol implemented by StrategyNotifyBgWriter(). I haven't really looked into it. I *guess* it's probably approximately OK if the bgwriter misses a wakeup from time to time, because there's another chance to wake it up as soon as someone needs another buffer, and if you don't need any more buffers soon your system is probably idle; but really "needing another buffer" isn't a great proxy for "more buffers are being made dirty" at all!
Someone would need to think about it and construct a complete argument for why it's OK to go to sleep for 60 seconds, or infinity, with that sloppiness in place. There's also the walwriter to look into; from memory it was a little less fuzzy but I haven't looked recently.
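For reference, the protocol being described, abbreviated from bgwriter.c and freelist.c (a sketch of the two sides, not the verbatim source):

    /* bgwriter, just before sleeping: advertise itself for wakeup */
    StrategyNotifyBgWriter(MyProc->pgprocno);
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     BgWriterDelay * HIBERNATE_FACTOR,
                     WAIT_EVENT_BGWRITER_HIBERNATE);

    /* a backend, inside StrategyGetBuffer(): wake any advertised bgwriter */
    int bgwprocno = INT_ACCESS_ONCE(StrategyControl->bgwprocno);

    if (bgwprocno != -1)
    {
        StrategyControl->bgwprocno = -1;
        SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch);
    }

    /*
     * The race: an allocation that happens just before the bgwriter
     * advertises itself is not seen, so the next wakeup only arrives
     * with the *following* allocation -- tolerable with a timeout as a
     * backstop, but something to argue through before sleeping forever.
     */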
On Fri, Jan 27, 2023 at 12:41 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > There's also the walwriter to look into; from memory it was a little > less fuzzy but I haven't looked recently. Thanks. As an experiment, I did away with the walwriter hibernation and made it wait indefinitely until an event occurs. Once walwriter finds no work for some time, it goes into wait-forever mode and is woken up when someone inserts WAL. Please see the attached patch for more details. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
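Roughly, the shape of that change (an illustrative fragment, not the attached patch):

    long    cur_timeout;

    if (left_till_hibernate > 0)
        cur_timeout = WalWriterDelay;   /* active: keep the short naps */
    else
        cur_timeout = -1L;              /* idle: sleep until explicitly woken */

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_EXIT_ON_PM_DEATH |
                     (cur_timeout >= 0 ? WL_TIMEOUT : 0),
                     cur_timeout,
                     WAIT_EVENT_WAL_WRITER_MAIN);

Whether this is safe hinges on the question discussed above: every path that creates work for the walwriter must reliably set its latch.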