Thread: Reducing power consumption on idle servers

Reducing power consumption on idle servers

From
Simon Riggs
Date:
Some years ago we did a pass through the various worker processes to
add hibernation as a mechanism to reduce power consumption on an idle
server. Replication never got the memo, so power consumption on an
idle server is not very effective on standby or logical subscribers.
The code and timing for hibernation is also different for each worker,
which is confusing.

The proposal is to improve the situation and reduce power consumption on
an idle server, with a series of fairly light changes that give a positive
green/environmental benefit.

CURRENT STATE

These servers naturally sleep for long periods when inactive:
* postmaster - 60s
* checkpointer - checkpoint_timeout
* syslogger - log_rotation_age
* pgarch - 60s
* autovac launcher - autovacuum_naptime
* walsender - wal_sender_timeout / 2
* contrib/prewarm - checkpoint_timeout or shutdown
* pgstat - except on Windows, see later

The following servers all have some kind of hibernation modes:
* bgwriter - 50 * bgwriter_delay
* walwriter - 25 * wal_writer_delay

These servers don't try to hibernate at all:
* logical worker - 1s
* logical launcher - wal_retrieve_retry_interval (undocumented)
* startup - hardcoded 5s when streaming, wal_retrieve_retry_interval
for WAL files
* wal_receiver - 100ms, currently gets woken when WAL arrives
* pgstat - 2s (on Windows only)

PROPOSED CHANGES

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

2. Refactor postmaster and pgarch to use that setting, rather than hardcoded 60

3. Standardize the hibernation design pattern through a set of macros
in latch.h, based on the coding of walwriter.c, since that was the best
example of hibernation code.
   Hibernation is independent for each process.
   This is explained in the header for latch.h, with an example in
   src/test/modules/worker_spi/worker_spi.c
   The intention here is to provide simple facilities so that authors
   of bgworkers can add hibernation code as well.
   Summary: after 50 idle loops, hibernate for 60s each time through the
   loop (a rough sketch of the resulting loop follows after this list).
   In all cases, switch immediately back into action when needed.

4. Change these processes to hibernate using the standard design pattern
* bgwriter - currently gets woken when a user allocates a buffer, no
change proposed
* walwriter - currently gets woken by XLogFlush, no change proposed

5. Startup process has a hardcoded 5s loop because it checks for
trigger file to promote it. So hibernating would mean that it would
promote more slowly, and/or restart failing walreceiver more slowly,
so this requires user approval, and hence add new GUCs to approve that
choice. This is a valid choice because a long-term idle server is
obviously not in current use, so waiting 60s for failover or restart
is very unlikely to cause significant issue. Startup process is woken
by WALReceiver when WAL arrives, so if all is well, Startup will swing
back into action quickly when needed.

If standby_can_hibernate = true, then these processes can hibernate:
* startup
* walreceiver - hibernate, but only for wal_receiver_timeout/2, not
60s, so other existing behavior is preserved.

If subscription_can_hibernate = true, then these processes can hibernate:
* logical launcher
* logical worker - hibernate, but only for wal_receiver_timeout/2, not
60s, so other existing behavior is preserved.
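
To make the pattern in (3) concrete, the resulting worker loop looks
roughly like this. This is only a sketch of the behaviour described
above, not text from the patch; do_one_unit_of_work() and naptime_ms
stand in for whatever the worker actually does:

    long    delay_ms;
    int     idle_loops = 0;

    for (;;)
    {
        bool    work_done = do_one_unit_of_work();  /* hypothetical */

        if (work_done)
            idle_loops = 0;             /* any activity resets the count */
        else if (idle_loops < 50)
            idle_loops++;

        /* after 50 consecutive idle loops, stretch the sleep to 60s */
        delay_ms = (idle_loops >= 50) ? HIBERNATE_DELAY_SEC * 1000L
                                      : naptime_ms;

        /* a latch set by another process still wakes us immediately */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         delay_ms, PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
    }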

Patch to do all of the above attached.

Comments please.

--
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Andres Freund
Date:
Hi,

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> Some years ago we did a pass through the various worker processes to
> add hibernation as a mechanism to reduce power consumption on an idle
> server. Replication never got the memo, so power consumption on an
> idle server is not very effective on standby or logical subscribers.
> The code and timing for hibernation is also different for each worker,
> which is confusing.

Yea, I think it's high time that we fix this.

It's not even just power consumption:
- the short timeouts hide all kinds of bugs around missed wakeups / racy
  wakeup sequences
- debugging problems gets harder if there's lots of frequent activity


> CURRENT STATE
>
> These servers naturally sleep for long periods when inactive:
> * postmaster - 60s
> * checkpointer - checkpoint_timeout
> * syslogger - log_rotation_age
> * pgarch - 60s

Why do we need *any* timeout here? It seems one of those bug/race hiding
things? Imo we should only use a timeout when a prior archive failed and rely
on wakeups otherwise.


> * autovac launcher - autovacuum_naptime

On production systems autovacuum_naptime often can't be a large value,
otherwise it's easy to not keep up on small busy tables. That's fine for
actually busy servers, but with the increase in hosted PG offerings, the
defaults in those offerings need to cater to a broader audience.


> These servers don't try to hibernate at all:
> * logical worker - 1s

Not great.


> * logical launcher - wal_retrieve_retry_interval (undocumented)

I think it's actually 180s in the happy path. The wal_retrieve_retry_interval
is about how often workers get restarted. But if there's no need to restart,
it sleeps longer.


> * startup - hardcoded 5s when streaming, wal_retrieve_retry_interval
> for WAL files

> * wal_receiver - 100ms, currently gets woken when WAL arrives

This is a fairly insane one. We should compute a precise timeout based on
wal_receiver_timeout.

And it's not just one syscall every 100ms, it's

recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create1(EPOLL_CLOEXEC)            = 6
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
epoll_wait(6, [], 1, 100)               = 0
close(6)                                = 0



> PROPOSED CHANGES
>
> 1. Standardize the hibernation time at 60s, using a #define
> HIBERNATE_DELAY_SEC 60

I don't think the hibernation stuff is a great pattern. When hibernation was
introduced we neither had latches nor condition variables, so we couldn't
easily do better. Today we have, so we should do better.

We should either not use timeouts at all (e.g. pgarch without a preceding
failure) and rely on being woken up when new work arrives.

Or use precisely calculated timeouts (e.g. NOW() - last_recv_timestamp +
wal_receiver_timeout) when there's a good reason to wake up (like needing to
send a status message).
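
Something along these lines, i.e. sleep only for the remaining time
before wal_receiver_timeout would force the next periodic action
(sketch only, not from any patch here; last_recv_timestamp and
walrcv_socket are stand-in names):

    TimestampTz now = GetCurrentTimestamp();
    long        timeout_ms;

    /* remaining time until last_recv_timestamp + wal_receiver_timeout */
    timeout_ms = TimestampDifferenceMilliseconds(now,
                     TimestampTzPlusMilliseconds(last_recv_timestamp,
                                                 wal_receiver_timeout));

    (void) WaitLatchOrSocket(MyLatch,
                             WL_LATCH_SET | WL_SOCKET_READABLE |
                             WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                             walrcv_socket,
                             Max(timeout_ms, 0),
                             WAIT_EVENT_WAL_RECEIVER_MAIN);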


IMO we should actually consider moving *away* from hibernation for the cases
where we currently use it and rely much more heavily on being notified when
there's work to do, without a timeout when not.


> 5. Startup process has a hardcoded 5s loop because it checks for
> trigger file to promote it. So hibernating would mean that it would
> promote more slowly, and/or restart failing walreceiver more slowly,
> so this requires user approval, and hence add new GUCs to approve that
> choice. This is a valid choice because a long-term idle server is
> obviously not in current use, so waiting 60s for failover or restart
> is very unlikely to cause significant issue.

There's plenty of databases that are close to read-only but business critical,
so I don't buy that argument.

IMO we should instead consider either deprecating file based promotion, or
adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
that avoid the need to poll for file creation.


Greetings,

Andres Freund



Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Sun, Feb 20, 2022 at 6:03 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > * wal_receiver - 100ms, currently gets woken when WAL arrives
>
> This is a fairly insane one. We should compute a precise timeout based on
> wal_receiver_timeout.

I proposed a patch to do that here: https://commitfest.postgresql.org/37/3520/

It needs a couple more tweaks (I realised it needs to round
microsecond sleep times up to the nearest whole millisecond,
explaining a few spurious wakeups, and Horiguchi-san had some more
feedback for me that I haven't got to yet), but it seems close.

> And it's not just one syscall every 100ms, it's
>
> recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_create1(EPOLL_CLOEXEC)            = 6
> epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
> epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
> epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
> epoll_wait(6, [], 1, 100)               = 0
> close(6)                                = 0

Yeah, I have a patch for that (no CF entry yet, will create shortly),
and together these two patches get walreceiver's wait loop down to a
single epoll_wait() call (or local equivalent) that waits the exact
amount of time required to perform the next periodic action.
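
The shape of that change is roughly this (a sketch of the idea, not the
actual patch): build the WaitEventSet once and reuse it, instead of
letting WaitLatchOrSocket build and tear down an epoll instance on
every iteration (walrcv_socket and timeout_ms are stand-in names):

    /* once, at walreceiver startup */
    WaitEventSet *wes = CreateWaitEventSet(TopMemoryContext, 3);

    AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
    AddWaitEventToSet(wes, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET, NULL, NULL);
    AddWaitEventToSet(wes, WL_SOCKET_READABLE, walrcv_socket, NULL, NULL);

    /* then, each time around the loop: one epoll_wait()/kevent() only */
    for (;;)
    {
        WaitEvent   event;

        (void) WaitEventSetWait(wes, timeout_ms, &event, 1,
                                WAIT_EVENT_WAL_RECEIVER_MAIN);
        ResetLatch(MyLatch);
        /* ... receive data, send status, check timeouts ... */
    }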

> > 5. Startup process has a hardcoded 5s loop because it checks for
> > trigger file to promote it. So hibernating would mean that it would
> > promote more slowly, and/or restart failing walreceiver more slowly,
> > so this requires user approval, and hence add new GUCs to approve that
> > choice. This is a valid choice because a long-term idle server is
> > obviously not in current use, so waiting 60s for failover or restart
> > is very unlikely to cause significant issue.
>
> There's plenty of databases that are close to read-only but business critical,
> so I don't buy that argument.
>
> IMO we should instead consider either deprecating file based promotion, or
> adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
> that avoid the need to poll for file creation.

Yeah, I pondered inotify/KQ_FILTER_VNODE/FindFirstChangeNotification
for this while working on 600f2f50.  I realised I could probably teach
a WaitEventSet to wake up when a file is added on most OSes, but I
didn't try to prototype it... I agree that the whole file
system-watching concept feels pretty clunky, so if just getting rid of
it is an option...

It would be nice to tame the walwriter's wakeups...



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > Some years ago we did a pass through the various worker processes to
> > add hibernation as a mechanism to reduce power consumption on an idle
> > server. Replication never got the memo, so power consumption on an
> > idle server is not very effective on standby or logical subscribers.
> > The code and timing for hibernation is also different for each worker,
> > which is confusing.
>
> Yea, I think it's high time that we fix this.

Good to have your support. There are millions of servers running idle
for some part of their duty cycle.

This patch seeks to change the situation for the better in PG15, i.e.
soon, so the changes proposed are deliberately light. It also seeks to
provide a framework that writers of background worker processes can
follow, since we can't just fix core, we need to fix all the various
bgworkers in use as well.

> IMO we should instead consider either deprecating file based promotion, or
> adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
> that avoid the need to poll for file creation.

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch.

Thanks for the suggestion.

I've re-analyzed all of the code around startup/walreceiver
interactions and there isn't any need for the 5s delay on the startup
process, IMHO, so it can sleep longer.

> IMO we should actually consider moving *away* from hibernation for the cases
> where we currently use it and rely much more heavily on being notified when
> there's work to do, without a timeout when not.

I don't think that is a practical approach this close to end of PG15.
This would require changing behavior of bgwriter, walwriter, pgarch
when they are not broken. The likelihood that we would break something
is too high.

What is an issue is that the sleep times of various procs are on
completely different cycles, which is why I am proposing normalizing
them so that Postgres can actually sleep effectively.

> > * autovac launcher - autovacuum_naptime
>
> On production systems autovacuum_naptime often can't be a large value,
> otherwise it's easy to not keep up on small busy tables. That's fine for
> actually busy servers, but with the increase in hosted PG offerings, the
> defaults in those offerings need to cater to a broader audience.

Autovac varies its wakeup cycle according to how much work is done. It
is OK to set autovacuum_naptime without affecting power consumption
when idle.

Idle for autovac is defined slightly differently, since if all user
work completes then there may still be a lot of vacuuming to do before
it goes fully idle. But my observation is that there are many servers
that go idle for more than 50% of each week, when operating 8-12 hours
per day, 5 days per week, so we can still save a lot of power.

This patch doesn't change how autovac works, it just uses a common
setting for the hibernation that eventually occurs.

> > These servers don't try to hibernate at all:
> > * logical worker - 1s
>
> Not great.

Agreed, the patch improves this, roughly same as walreceiver.

> > * logical launcher - wal_retrieve_retry_interval (undocumented)
>
> I think it's actually 180s in the happy path. The wal_retrieve_retry_interval
> is about how often workers get restarted. But if there's no need to restart,
> it sleeps longer.

I propose normalizing all of the various hibernation times to the same value.


> > * wal_receiver - 100ms, currently gets woken when WAL arrives
>
> This is a fairly insane one. We should compute a precise timeout based on
> wal_receiver_timeout.

That is exactly what the patch does, when it hibernates.


I wasn't aware of Thomas' work, but now that I am we can choose which
of those approaches to use for WAL receiver. I hope that we can fix
logical worker and wal receiver to use the same algorithm. The rest of
this patch would still be valid, whatever we do for those two procs.

--
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Chapman Flack
Date:
Hi,

On 02/21/22 11:11, Simon Riggs wrote:
> This patch seeks to change the situation for the better in PG15, i.e.
> soon, so the changes proposed are deliberately light. It also seeks to
> provide a framework that writers of background worker processes can
> follow, since we can't just fix core, we need to fix all the various
> bgworkers in use as well.

I think there might be a typo in the worker_spi.c example:

+    /*
+     * Use the standard design pattern for wait time/hibernation.
+     * After 50 consecutive loops with work_done=true the wait time
+     * will be set to the standard hibernation timeout of 60s.
+     */
+    SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);


Shouldn't the comment be "with work_done=false" ?

Regards,
-Chap



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:

> Shouldn't the comment be "with work_done=false" ?

Good catch, thanks.

I've also added docs to say that "promote_trigger_file" is now
deprecated. There were no tests for that functionality, so just as
well it is being removed.

v3 attached.


--
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Magnus Hagander
Date:
On Mon, Feb 21, 2022 at 5:11 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > IMO we should instead consider either deprecating file based promotion, or
> > adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
> > that avoid the need to poll for file creation.

Came here to suggest exactly that :)


> Deprecating explicit file-based promotion is possible and simple, so
> that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards
compatibility? The alternative has certainly been around for some time
now, so if we don't know a specific use-case for the file-based one
it's definitely time to deprecate it properly.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Reducing power consumption on idle servers

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
>> Deprecating explicit file-based promotion is possible and simple, so
>> that is the approach in the latest version of the patch.

> Is there any actual use-case for this other than backwards
> compatibility?

The fundamental problem with signal-based promotion is that it's
an edge-triggered rather than level-triggered condition, ie if you
miss the single notification event you're screwed.  I'm not sure
why we'd want to go there, considering all the problems we're having
with edge-triggered APIs in nearby threads.

We could combine the two approaches, perhaps: have "pg_ctl promote"
create a trigger file and then send a signal.  The trigger file
would record the existence of the promotion request, so that if the
postmaster isn't running right now or misses the signal, it'd still
promote when restarted or otherwise at the next opportunity.
But with an approach like this, we could expect quick response to
the signal in normal conditions, without need for constant wakeups
to check for the file.  Also, this route would allow overloading
of the signal, since it would just mean "wake up, I asked you to
do something" rather than needing to carry the full semantics of
what is to be done.
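
In code terms the standby side would look something like this (a sketch
of the idea only, not a patch): the signal handler just records the
request and sets the latch, and the file is only stat()'d when we are
actually woken or at startup.

    static volatile sig_atomic_t promote_requested = false;

    static void
    handle_promote_signal(SIGNAL_ARGS)
    {
        promote_requested = true;
        SetLatch(MyLatch);      /* fast path: wake the recovery loop now */
    }

    static bool
    PromotionRequested(void)
    {
        struct stat statbuf;

        /* the file records the request even if the signal was missed */
        return promote_requested ||
               stat(PROMOTE_SIGNAL_FILE, &statbuf) == 0;
    }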

            regards, tom lane



Re: Reducing power consumption on idle servers

From
Jim Nasby
Date:

On 2/21/22 10:11 AM, Simon Riggs wrote:

>>> * autovac launcher - autovacuum_naptime
>>
>> On production systems autovacuum_naptime often can't be a large value,
>> otherwise it's easy to not keep up on small busy tables. That's fine for
>> actually busy servers, but with the increase in hosted PG offerings, the
>> defaults in those offerings need to cater to a broader audience.
>
> Autovac varies its wakeup cycle according to how much work is done. It
> is OK to set autovacuum_naptime without affecting power consumption
> when idle.

I'm wondering how many people understand that. I've seen a number of
servers running very low values of autovacuum_naptime in order to make
things more responsive.

> Idle for autovac is defined slightly differently, since if all user
> work completes then there may still be a lot of vacuuming to do before
> it goes fully idle. But my observation is that there are many servers
> that go idle for more than 50% of each week, when operating 8-12 hours
> per day, 5 days per week, so we can still save a lot of power.
>
> This patch doesn't change how autovac works, it just uses a common
> setting for the hibernation that eventually occurs.

I'm wondering if it'd be worth linking autovac wakeup from a truly idle
state to the stats collector. If there's no stats messages coming in
clearly there's nothing new for autovac.

Re: Reducing power consumption on idle servers

From
Tom Lane
Date:
Jim Nasby <nasbyj@amazon.com> writes:
> I'm wondering if it'd be worth linking autovac wakeup from a truly idle 
> state to the stats collector. If there's no stats messages coming in 
> clearly there's nothing new for autovac.

That seems pretty scary in the current system design, where the
stats collector is intentionally not 100% reliable (and sometimes,
less intentionally, it fails completely).  If we get to a place
where we're willing to bank on stats being collected 100% of the
time, it might make sense.

            regards, tom lane



Re: Reducing power consumption on idle servers

From
Zheng Li
Date:
Hi,

> Replication never got the memo, so power consumption on an
> idle server is not very effective on standby or logical subscribers.
> The code and timing for hibernation is also different for each worker,
> which is confusing.

Agree, this patch makes it easier to understand the hibernation
behavior of various workers.

> 1. Standardize the hibernation time at 60s, using a #define
> HIBERNATE_DELAY_SEC 60

I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
seconds, what’s the reasoning behind it? Is longer hibernation delay
better? If so can we set it to INT_MAX (the max timeout allowed by
WaitLatch()) in which case a worker in hibernation only relies on
wakeup? I think it would be nice to run experiments to verify that the
patch reduces power consumption while varying the value of
HIBERNATE_DELAY_SEC.

Regards,
Zheng Li
Amazon RDS/Aurora for PostgreSQL



On Thu, Mar 3, 2022 at 5:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jim Nasby <nasbyj@amazon.com> writes:
> > I'm wondering if it'd be worth linking autovac wakeup from a truly idle
> > state to the stats collector. If there's no stats messages coming in
> > clearly there's nothing new for autovac.
>
> That seems pretty scary in the current system design, where the
> stats collector is intentionally not 100% reliable (and sometimes,
> less intentionally, it fails completely).  If we get to a place
> where we're willing to bank on stats being collected 100% of the
> time, it might make sense.
>
>                         regards, tom lane
>
>



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sat, 26 Feb 2022 at 17:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Magnus Hagander <magnus@hagander.net> writes:
> >> Deprecating explicit file-based promotion is possible and simple, so
> >> that is the approach in the latest version of the patch.
>
> > Is there any actual use-case for this other than backwards
> > compatibility?
>
> The fundamental problem with signal-based promotion is that it's
> an edge-triggered rather than level-triggered condition, ie if you
> miss the single notification event you're screwed.  I'm not sure
> why we'd want to go there, considering all the problems we're having
> with edge-triggered APIs in nearby threads.
>
> We could combine the two approaches, perhaps: have "pg_ctl promote"
> create a trigger file and then send a signal.  The trigger file
> would record the existence of the promotion request, so that if the
> postmaster isn't running right now or misses the signal, it'd still
> promote when restarted or otherwise at the next opportunity.
> But with an approach like this, we could expect quick response to
> the signal in normal conditions, without need for constant wakeups
> to check for the file.  Also, this route would allow overloading
> of the signal, since it would just mean "wake up, I asked you to
> do something" rather than needing to carry the full semantics of
> what is to be done.

The above is how this works now, luckily, but few comments explain why.

This current state evolved because I first added file-based promotion,
for the reasons you give, but it was slow, so the signal approach was
added later, with new API.

What we are discussing deprecating is the ability to create the
trigger file directly (the original API). Once that is gone the
mechanism would be to request promotion via pg_ctl promote (the new
API), which then creates the trigger file and sends a signal. So in
both APIs the trigger file is still used.

In this patch, if you create the trigger file
directly/explicitly/yourself (the old API) then it will still trigger
when hibernating, but it will be slower. The new API is unaffected by
this change.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:

> > 1. Standardize the hibernation time at 60s, using a #define
> > HIBERNATE_DELAY_SEC 60
>
> I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
> seconds, what’s the reasoning behind it? Is longer hibernation delay
> better? If so can we set it to INT_MAX (the max timeout allowed by
> WaitLatch()) in which case a worker in hibernation only relies on
> wakeup? I think it would be nice to run experiments to verify that the
> patch reduces power consumption while varying the value of
> HIBERNATE_DELAY_SEC.

Setting it to INT_MAX would be the same as not allowing a timeout,
which changes a lot of current behavior and makes it less robust.

Waking once per minute is what we do in various cases, so 60sec is a
good choice.

In the case of logical rep launcher we currently use 300sec, so using
60s would decrease this.

I don't see much difference between power consumption with timeouts of
60s and 300s.

In the latest patch, I chose 300s. Does anyone have an opinion on the
value here?

--
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Andres Freund
Date:
Hi,

On 2022-03-10 17:50:47 +0000, Simon Riggs wrote:
> On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:
> 
> > > 1. Standardize the hibernation time at 60s, using a #define
> > > HIBERNATE_DELAY_SEC 60
> >
> > I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
> > seconds, what’s the reasoning behind it? Is longer hibernation delay
> > better? If so can we set it to INT_MAX (the max timeout allowed by
> > WaitLatch()) in which case a worker in hibernation only relies on
> > wakeup? I think it would be nice to run experiments to verify that the
> > patch reduces power consumption while varying the value of
> > HIBERNATE_DELAY_SEC.
> 
> Setting it to INT_MAX would be the same as not allowing a timeout,
> which changes a lot of current behavior and makes it less robust.

Most of these timeouts are a bad idea and should not exist. We repeatedly have
had bugs where we were missing wakeups etc but those bugs were harder to
notice because of timeouts.  I'm against entrenching this stuff further.

Greetings,

Andres Freund



Re: Reducing power consumption on idle servers

From
Andres Freund
Date:
On 2022-02-21 21:04:19 +0000, Simon Riggs wrote:
> On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:
> 
> > Shouldn't the comment be "with work_done=false" ?
> 
> Good catch, thanks.
> 
> I've also added docs to say that "promote_trigger_file" is now
> deprecated. There were no tests for that functionality, so just as
> well it is being removed.

Doesn't pass tests, and hasn't for weeks from what I can see:
https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153

Marked as waiting-on-author.

- Andres



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Tue, 22 Mar 2022 at 00:54, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-02-21 21:04:19 +0000, Simon Riggs wrote:
> > On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:
> >
> > > Shouldn't the comment be "with work_done=false" ?
> >
> > Good catch, thanks.
> >
> > I've also added docs to say that "promote_trigger_file" is now
> > deprecated. There were no tests for that functionality, so just as
> > well it is being removed.
>
> Doesn't pass tests, and hasn't for weeks from what I can see:
https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153
>
> Marked as waiting-on-author.

Hmm, just the not-in-sample test again. Fixed by marking GUC_NOT_IN_SAMPLE.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Kyotaro Horiguchi
Date:
At Thu, 10 Mar 2022 11:45:10 -0800, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2022-03-10 17:50:47 +0000, Simon Riggs wrote:
> > On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:
> > 
> > > > 1. Standardize the hibernation time at 60s, using a #define
> > > > HIBERNATE_DELAY_SEC 60
> > >
> > > I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
> > > seconds, what’s the reasoning behind it? Is longer hibernation delay
> > > better? If so can we set it to INT_MAX (the max timeout allowed by
> > > WaitLatch()) in which case a worker in hibernation only relies on
> > > wakeup? I think it would be nice to run experiments to verify that the
> > > patch reduces power consumption while varying the value of
> > > HIBERNATE_DELAY_SEC.
> > 
> > Setting it to INT_MAX would be the same as not allowing a timeout,
> > which changes a lot of current behavior and makes it less robust.
> 
> Most of these timeouts are a bad idea and should not exist. We repeatedly have
> had bugs where we were missing wakeups etc but those bugs were harder to

I basically agree to this.

> notice because of timeouts.  I'm against entrenching this stuff further.

For example, pgarch.c theoretically doesn't need to hibernate, since it
has an apparent trigger event and a terminal condition. What we need
to do about the archiver is to set a timeout only when it didn't reach
the latest finished segment in an iteration. (This might need additional
shared memory use, though.)
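
Roughly like this (just a sketch of the idea, not a patch;
pgarch_do_archiving() is an imaginary helper that reports whether we
caught up to the latest finished segment):

    for (;;)
    {
        bool    caught_up = pgarch_do_archiving();

        /* only keep a timeout as a retry mechanism after a failure */
        long    timeout = caught_up ? -1L : 60 * 1000L;
        int     events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;

        if (timeout >= 0)
            events |= WL_TIMEOUT;

        (void) WaitLatch(MyLatch, events, timeout,
                         WAIT_EVENT_ARCHIVER_MAIN);
        ResetLatch(MyLatch);
    }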

I'm not sure about bgwriter, walwriter and logical replication stuff...

About walreciver, 

-        if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+        if ((now - startTime) > wal_receiver_timeout)

This is simply a misuse of the timeout. WALRCV_STARTUP_TIMEOUT is not
used for a sleep and is irrelevant to receiving data from the peer
walsender.  wal_receiver_timeout's default is 60s but we don't assume
that walreceiver takes that long to start up. (I think Thomas is
working on the walreceiver timeout stuff.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Thu, 24 Mar 2022 at 07:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> > Most of these timeouts are a bad idea and should not exist. We repeatedly have
> > had bugs where we were missing wakeups etc but those bugs were harder to
>
> I basically agree to this.

As a general point, maybe. But we have a lot of procs doing a lot of
polling and we are unlikely to change that anytime soon.

> > notice because of timeouts.  I'm against entrenching this stuff further.
>
> For example, pgarch.c theoretically doesn't need to hibernate, since it
> has an apparent trigger event and a terminal condition. What we need
> to do about the archiver is to set a timeout only when it didn't reach
> the latest finished segment in an iteration. (This might need additional
> shared memory use, though.)

archiver is not the important thing here. If there is objection to
that section of code we can remove that.

> I'm not sure about bgwriter, walwriter and logical replication stuff...

I am sure. bgwriter, walwriter and logical worker need action from us
to allow them to hibernate in a sensible way that allows powersaving.

> I think Thomas is working on the walreceiver timeout stuff.

Yes, he is. I have no problem going with what he advocates, if others agree.

However, that fix does not solve the whole problem, since many other
procs also need fixing.


The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving

Please don't reject the whole patch because you disagree with part
(3), it is not that important.

Thanks

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Robert Haas
Date:
On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> The proposals of this patch are the following, each of which can be
> independently accepted/rejected:
> 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> (directly affects powersave)
> 2. deprecate promote_trigger_file, which then allows us to fix the
> sleep for startup process (directly affects powersave)
> 3. treat hibernation in all procs the same, for simplicity, and to
> make sure we don't forget one later
> 4. provide a design pattern for background worker extensions to
> follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes. I do agree with the basic goal of trying to
reduce unnecessary wakeups, but I think the rest of the patch is
taking a bit of a one-size-fits-all approach where I think that we
might want to be more nuanced. I think there are a couple of different
kinds of cases here.

In some places, like DetermineSleepTime(), we're already willing to
sleep for pretty long periods of time, like a minute. I think in those
cases we could consider doing nothing, on the theory that one wakeup
per minute is already not very much. If we do want to do something, I
think it should be in the direction of convincing ourselves that we
don't need a timeout at all. Having a timeout is a bit like insurance:
it guarantees that if for some reason the event by which we expect to
be awoken doesn't actually wake us up even though something meaningful
has happened, we won't hang forever. But if we think a wakeup per
minute is meaningful and we don't think there's any possibility of a
missed wakeup, let's just wait forever. I don't think we'll avoid many
user complaints by recovering from a missed wakeup in just under an
hour.
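
Put differently, in those places the choice boils down to one of these
two calls (the wait event name here is just for illustration):

    /* insurance: wake up at least once a minute even if nothing happens */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     60 * 1000L, WAIT_EVENT_BGWRITER_MAIN);

    /* trust the wakeups: sleep until somebody sets the latch */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                     -1L, WAIT_EVENT_BGWRITER_MAIN);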

In other places, like WalWriterMain, we're basically polling. One
question in these kinds of cases is whether we can avoid polling in
favor of having some other process wake us up if the event that we
care about happens. This is unlikely to be practical in all cases -
e.g. we can't wait for a promotion trigger file to show up unless we
want to use inotify or something like that. However, we may be able to
avoid polling in some instances. When we can't, then I think it makes
sense to increase the wait time when the system appears to be idle. In
that subset of cases - we're polling and we can't avoid polling - this
kind of approach seems fairly reasonable.

I do have some concerns about the idea that the right strategy in
general, or even in particular cases, is to multiply by 50. This patch
hasn't invented that problem; it's already there. My concern is that
multiplying a small number by 50 seems very different than multiplying
a large number by 50. If we normally wait for 100ms before checking
for something to happen, and we wait for 5s, it's probably not going
to be a huge issue, because 5s is still a short amount of time for
human beings. If we normally wait for a minute before checking for
something to happen and we wait for 50 minutes, that's much more
likely to make a human being unhappy. Therefore, it's very unclear to
me that those cases should be treated the same way.

There's a more technical issue with this strategy, too: if we multiply
both short and long timeouts by 50, I think we are going to get pretty
much the same result as if we only multiply the short timeouts by 50.
Why even bother reducing the frequency of wakeups from minutes to
hours if we're elsewhere reducing the frequency from seconds to
minutes? If process A is still waking up every minute, putting process
B in the deep freeze isn't going to do a whole lot in terms of letting
the system go into any kind of deeper sleep.

All in all I feel that encouraging developers to adopt a
multiply-by-50 rule in all cases seems too simplistic to me. It may be
better than what we're doing right now, but it doesn't really seem
like the right policy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > The proposals of this patch are the following, each of which can be
> > independently accepted/rejected:
> > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > (directly affects powersave)
> > 2. deprecate promote_trigger_file, which then allows us to fix the
> > sleep for startup process (directly affects powersave)
> > 3. treat hibernation in all procs the same, for simplicity, and to
> > make sure we don't forget one later
> > 4. provide a design pattern for background worker extensions to
> > follow, so as to encourage powersaving
>
> Unfortunately, the patch isn't split in a way that corresponds to this
> division. I think I'm +1 on change #2 -- deprecating
> promote_trigger_file seems like a good idea to me independently of any
> power-saving considerations. But I'm not sure that I am on board with
> any of the other changes.

OK, so change (2) is good. Separate patch attached.

> I do agree with the basic goal of trying to
> reduce unnecessary wakeups, but I think the rest of the patch is
> taking a bit of a one-size-fits-all approach where I think that we
> might want to be more nuanced. I think there are a couple of different
> kinds of cases here.

OK, so you disagree with (3) and probably (4). No problem.

What about (1)? That directly affects the powersave capability. I
didn't read anything specific to that.

If we don't fix (1) as well, the changes for startup and walreceiver
will be ineffective for powersaving.

What changes will be acceptable for bgwriter, walwriter and logical worker?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Robert Haas
Date:
On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> What about (1)? That directly affects the powersave capability. I
> didn't read anything specific to that.
>
> If we don't fix (1) as well, the changes for startup and walreceiver
> will be ineffective for powersaving.
>
> What changes will be acceptable for bgwriter, walwriter and logical worker?

Hmm, I think it would be fine to introduce some kind of hibernation
mechanism for logical workers. bgwriter and wal writer already have a
hibernation mechanism, so I'm not sure what your concern is there
exactly. In your initial email you said you weren't proposing changes
there, but maybe that changed somewhere in the course of the
subsequent discussion. If you could catch me up on your present
thinking that would be helpful.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Reducing power consumption on idle servers

From
Ian Lawrence Barwick
Date:
On Fri, 25 Mar 2022 at 01:03, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > The proposals of this patch are the following, each of which can be
> > > independently accepted/rejected:
> > > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > > (directly affects powersave)
> > > 2. deprecate promote_trigger_file, which then allows us to fix the
> > > sleep for startup process (directly affects powersave)
> > > 3. treat hibernation in all procs the same, for simplicity, and to
> > > make sure we don't forget one later
> > > 4. provide a design pattern for background worker extensions to
> > > follow, so as to encourage powersaving
> >
> > Unfortunately, the patch isn't split in a way that corresponds to this
> > division. I think I'm +1 on change #2 -- deprecating
> > promote_trigger_file seems like a good idea to me independently of any
> > power-saving considerations. But I'm not sure that I am on board with
> > any of the other changes.
>
> OK, so change (2) is good. Separate patch attached.

Hi

cfbot reports the patch no longer applies [1].  As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1] http://cfbot.cputube.org/patch_40_3566.log

Thanks

Ian Barwick



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Thu, 24 Mar 2022 at 16:02, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > The proposals of this patch are the following, each of which can be
> > > independently accepted/rejected:
> > > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > > (directly affects powersave)
> > > 2. deprecate promote_trigger_file, which then allows us to fix the
> > > sleep for startup process (directly affects powersave)
> > > 3. treat hibernation in all procs the same, for simplicity, and to
> > > make sure we don't forget one later
> > > 4. provide a design pattern for background worker extensions to
> > > follow, so as to encourage powersaving
> >
> > Unfortunately, the patch isn't split in a way that corresponds to this
> > division. I think I'm +1 on change #2 -- deprecating
> > promote_trigger_file seems like a good idea to me independently of any
> > power-saving considerations. But I'm not sure that I am on board with
> > any of the other changes.
>
> OK, so change (2) is good. Separate patch attached.

Thanks to Ian for prompting me to pick up this thread again; apologies
for getting distracted.

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use that, hence why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

Other points will be discussed in another branch of this thread.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> The attached patch is a reduced version of the original. It covers only:
> * deprecation of the promote_trigger_file - there are no tests that
> use that, hence why there is no test coverage for the patch
> * changing the sleep time of the startup process to 60s
> * docs and comments

LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
completely idle recovery process looks like:

kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)

Presumably it would have no timeout at all in the next release.

[1] https://www.postgresql.org/message-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sun, 13 Nov 2022 at 21:28, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > The attached patch is a reduced version of the original. It covers only:
> > * deprecation of the promote_trigger_file - there are no tests that
> > use that, hence why there is no test coverage for the patch
> > * changing the sleep time of the startup process to 60s
> > * docs and comments
>
> LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
> completely idle recovery process looks like:
>
> kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
>
> Presumably it would have no timeout at all in the next release.
>
> [1] https://www.postgresql.org/message-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com

Clearly, I haven't been watching Hackers! Thanks for the nudge.

See if this does the trick?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sun, 13 Nov 2022 at 23:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Sun, 13 Nov 2022 at 21:28, Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > The attached patch is a reduced version of the original. It covers only:
> > > * deprecation of the promote_trigger_file - there are no tests that
> > > use that, hence why there is no test coverage for the patch
> > > * changing the sleep time of the startup process to 60s
> > > * docs and comments
> >
> > LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
> > completely idle recovery process looks like:
> >
> > kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
> > kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
> > kevent(8,0x0,0,{ },1,{ 60.000000000 })           = 0 (0x0)
> >
> > Presumably it would have no timeout at all in the next release.
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com
>
> Clearly, I haven't been watching Hackers! Thanks for the nudge.
>
> See if this does the trick?

Looks like the SIGALRM wakeups will be silenced on the other thread,
so we can just revert back to using v6 of this patch.

Reposting v6 now so that patch tester doesn't think this has failed
when the patch on other thread gets applied.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> Reposting v6 now so that patch tester doesn't think this has failed
> when the patch on other thread gets applied.

Intention of the patch, that is, to get rid of promote_trigger_file
GUC sometime in future, looks good to me. However, the timeout change
to 60 sec from 5 sec seems far-reaching. While it discourages the use
of the GUC, it can impact many existing production servers that still
rely on promote_trigger_file as it can increase the failover times
impacting SLAs around failover.

How about retaining 5 sec as-is and adding a WARNING in the
promote_trigger_file check/assign hooks and in show GUC, and in
CheckForStandbyTrigger() whenever PromoteTriggerFile is detected, and
mentioning the deprecation in the GUC's description?
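
Something like this in CheckForStandbyTrigger(), for example (just a
sketch of what I mean, not a patch):

    if (PromoteTriggerFile != NULL && PromoteTriggerFile[0] != '\0')
    {
        static bool deprecation_warned = false;

        if (!deprecation_warned)
        {
            ereport(WARNING,
                    (errmsg("promote_trigger_file is deprecated"),
                     errhint("Use \"pg_ctl promote\" or pg_promote() instead.")));
            deprecation_warned = true;
        }

        /* ... existing trigger file check continues as today ... */
    }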

+                     * to react to a trigger file.  Direct use of trigger file
+                     * is now deprecated and the promote_trigger_file will be
+                     * removed in a later release.
I think adding 'Direct use of trigger file .....' on a new line that
starts with XXX (which typically marks a TODO item) would be good, no?

Also, do we need to add a TODO on the postgresql wiki
(https://wiki.postgresql.org/wiki/Todo), possibly under a new section
'Deprecated Features' or 'Features To-Be-Removed In Near Future' or
some other name (hm, it seems too vague, but it would start tracking
such deprecated items), so that we don't miss removing
promote_trigger_file in future releases?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Wed, 16 Nov 2022 at 12:47, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> >
> > Reposting v6 now so that patch tester doesn't think this has failed
> > when the patch on other thread gets applied.
>
> Intention of the patch, that is, to get rid of promote_trigger_file
> GUC sometime in future, looks good to me. However, the timeout change
> to 60 sec from 5 sec seems far-reaching. While it discourages the use
> of the GUC, it can impact many existing production servers that still
> rely on promote_trigger_file as it can increase the failover times
> impacting SLAs around failover.

The purpose of 60s is to allow for power reduction, so 5s won't do.

promote_trigger_file is not tested and there are better ways, so
deprecating it in this release is fine.

Anyone that relies on it can update their mechanisms to a supported
one with a one-line change. Realistically, anyone using it won't be on
the latest release anyway, at least for a long time, since if they use
manual methods then they are well behind the times.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Wed, Nov 16, 2022 at 8:35 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Wed, 16 Nov 2022 at 12:47, Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > >
> > > Reposting v6 now so that patch tester doesn't think this has failed
> > > when the patch on other thread gets applied.
> >
> > Intention of the patch, that is, to get rid of promote_trigger_file
> > GUC sometime in future, looks good to me. However, the timeout change
> > to 60 sec from 5 sec seems far-reaching. While it discourages the use
> > of the GUC, it can impact many existing production servers that still
> > rely on promote_trigger_file as it can increase the failover times
> > impacting SLAs around failover.
>
> The purpose of 60s is to allow for power reduction, so 5s won't do.

I understand. I know it's a bit hard to measure the power savings, I'm
wondering if there's any info, maybe not necessarily related to
postgres, but in general how much power gets saved if a certain number
of waits/polls/system calls are reduced.

> promote_trigger_file is not tested and there are better ways, so
> deprecating it in this release is fine.

Hm, but..

> Anyone that relies on it can update their mechanisms to a supported
> one with a one-line change. Realistically, anyone using it won't be on
> the latest release anyway, at least for a long time, since if they use
> manual methods then they are well behind the times.

I may be overly pessimistic here - the change from 5 sec to 60 sec for
detecting promote_trigger_file will have a direct impact on failovers
I believe. After upgrading to the version with 60 sec waiting,
there'll be an increase in failover times if developers fail to notice
the deprecation info about promote_trigger_file. I'm not sure
what's the best way to discourage using promote_trigger_file. Maybe
the change from 5 sec to 60 sec is the way. I'm not really sure.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Thu, 17 Nov 2022 at 07:36, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> > promote_trigger_file is not tested and there are better ways, so
> > deprecating it in this release is fine.
>
> Hm, but..
>
> > Anyone that relies on it can update their mechanisms to a supported
> > one with a one-line change. Realistically, anyone using it won't be on
> > the latest release anyway, at least for a long time, since if they use
> > manual methods then they are well behind the times.
>
> I may be overly pessimistic here - the change from 5 sec to 60 sec for
> detecting promote_trigger_file will have a direct impact on failovers
> I believe.

No, it will have a direct effect only on people using promote_trigger_file
who do not read and act upon the deprecation notice before upgrading
by making a one line change to their failover scripts.

Since pretty much everyone doing HA uses external HA software (cloud
or otherwise) this shouldn't affect anyone.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Robert Haas
Date:
On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> No, it will have a direct effect only on people using promote_trigger_file
> who do not read and act upon the deprecation notice before upgrading
> by making a one line change to their failover scripts.

TBH, I wonder if we shouldn't just nuke promote_trigger_file.
pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
even if we haven't said promote_trigger_file was deprecated, it hasn't
been the state-of-the-art way of doing this in a really long time. If
we think that there are still a lot of people using it, or if popular
tools are relying on it, then perhaps a deprecation period is
warranted anyway. But I think we should at least consider the
possibility that it's OK to just get rid of it right away.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Reducing power consumption on idle servers

From
Andres Freund
Date:
Hi,

On 2022-11-17 13:06:23 +0530, Bharath Rupireddy wrote:
> I understand. I know it's a bit hard to measure the power savings, I'm
> wondering if there's any info, maybe not necessarily related to
> postgres, but in general how much power gets saved if a certain number
> of waits/polls/system calls are reduced.

It's heavily hardware and hardware settings dependent.

On some systems you can get an *approximate* idea of the power usage even
while plugged in. On others you can only see it while on battery power.

On systems with RAPL support (most Intel and I think newer AMD CPUs) you can
query power usage with:
  powerstat -R 1 (adding -D provides a bit more detail)

But note that RAPL typically severely undercounts power usage because it
doesn't cover a lot of sources of power usage (display, sometimes memory,
all other peripherals).

Sometimes powerstat -R -D can split power usage up more granularly,
e.g. between the different CPU sockets and memory.


On laptops you can often measure the discharge rate when not plugged in, with
powerstat -d 0 1. But IME the latency till the values update makes it harder
to interpret.

On some workstations / servers you can read the power usage via ACPI. E.g. on
my workstation 'sensors' shows a power_meter-acpi-0


As an example of the difference it can make, here's the output of
  powerstat -D 1
on my laptop.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
16:41:55   0.1   0.0   0.1  99.8   0.0    1    403    246    1    0    1   2.62
16:41:56   0.1   0.0   0.1  99.8   0.0    1    357    196    1    0    1   2.72
16:41:57   0.1   0.0   0.1  99.6   0.2    1    510    231    4    0    4   2.64
16:41:58   0.1   0.0   0.1  99.9   0.0    2   1350    758   64   62   63   4.06
16:41:59   0.3   0.0   1.0  98.7   0.0    2   4166   2406  244  243  244   7.20
16:42:00   0.2   0.0   0.7  99.1   0.0    2   4203   2353  247  246  247   7.21
16:42:01   0.5   0.0   1.6  98.0   0.0    2   4079   2395  240  239  240   7.08
16:42:02   0.5   0.0   0.9  98.7   0.0    2   4097   2405  245  243  245   7.20
16:42:03   0.4   0.0   1.3  98.3   0.0    2   4117   2311  243  242  243   7.14
16:42:04   0.1   0.0   0.4  99.4   0.1    1   1721   1152   70   70   71   4.48
16:42:05   0.1   0.0   0.2  99.8   0.0    1    433    250    1    0    1   2.92
16:42:06   0.0   0.0   0.3  99.7   0.0    1    400    231    1    0    1   2.66

In the period of higher power etc usage I ran
while true;do sleep 0.001;done
and then interrupted that after a bit with ctrl-c.

Same thing on my workstation (:

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
16:43:48   1.0   0.0   0.2  98.7   0.1    1   8218   2354    0    0    0  46.43
16:43:49   1.1   0.0   0.3  98.7   0.0    1   7866   2477    0    0    0  45.99
16:43:50   1.1   0.0   0.4  98.5   0.0    2   7753   2996    0    0    0  48.93
16:43:51   0.8   0.0   1.7  97.5   0.0    1   9395   5285    0    0    0  75.48
16:43:52   0.5   0.0   1.7  97.8   0.0    1   9141   4806    0    0    0  75.30
16:43:53   1.1   0.0   1.8  97.1   0.0    2  10065   5504    0    0    0  76.27
16:43:54   1.3   0.0   1.5  97.2   0.0    1  10962   5165    0    0    0  76.33
16:43:55   0.9   0.0   0.8  98.3   0.0    1   8452   3939    0    0    0  61.99
16:43:56   0.6   0.0   0.1  99.3   0.0    2   6541   1999    0    0    0  40.92
16:43:57   0.9   0.0   0.2  98.9   0.0    2   8199   2477    0    0    0  42.91


And if I query the power supply via ACPI instead:

while true;do sensors power_meter-acpi-0|grep power1|awk '{print $2, $3}';sleep 1;done
...
163.00 W
173.00 W
173.00 W
172.00 W
203.00 W
206.00 W
206.00 W
206.00 W
209.00 W
205.00 W
211.00 W
213.00 W
203.00 W
175.00 W
166.00 W
...

As you can see the difference is quite substantial. This is solely due to a
1ms sleep loop (albeit one where we fork a process after each sleep, which
likely is a good chunk of the CPU usage).

However, the difference in power usage will be far smaller if the system is
already busy, because the CPU will already run at a high frequency.

Greetings,

Andres Freund



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Fri, Nov 18, 2022 at 6:29 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-11-17 13:06:23 +0530, Bharath Rupireddy wrote:
> > I understand. I know it's a bit hard to measure the power savings, I'm
> > wondering if there's any info, maybe not necessarily related to
> > postgres, but in general how much power gets saved if a certain number
> > of waits/polls/system calls are reduced.
>
> It's heavily hardware and hardware settings dependent.
>
> On some systems you can get an *approximate* idea of the power usage even
> while plugged in. On others you can only see it while on battery power.
>
> On systems with RAPL support (most Intel and I think newer AMD CPUs) you can
> query power usage with:
>   powerstat -R 1 (adding -D provides a bit more detail)
>
> But note that RAPL typically severely undercounts power usage because it
> doesn't cover a lot of sources of power usage (display, sometimes memory,
> all other peripherals).
>
> Sometimes powerstat -R -D can split power usage up more granularly,
> e.g. between the different CPU sockets and memory.
>
>
> On laptops you can often measure the discharge rate when not plugged in, with
> powerstat -d 0 1. But IME the latency till the values update makes it harder
> to interpret.
>
> On some workstations / servers you can read the power usage via ACPI. E.g. on
> my workstation 'sensors' shows a power_meter-acpi-0
>
>
> As an example of the difference it can make, here's the output of
>   powerstat -D 1
> on my laptop.
>
>   Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
> 16:41:55   0.1   0.0   0.1  99.8   0.0    1    403    246    1    0    1   2.62
> 16:41:56   0.1   0.0   0.1  99.8   0.0    1    357    196    1    0    1   2.72
> 16:41:57   0.1   0.0   0.1  99.6   0.2    1    510    231    4    0    4   2.64
> 16:41:58   0.1   0.0   0.1  99.9   0.0    2   1350    758   64   62   63   4.06
> 16:41:59   0.3   0.0   1.0  98.7   0.0    2   4166   2406  244  243  244   7.20
> 16:42:00   0.2   0.0   0.7  99.1   0.0    2   4203   2353  247  246  247   7.21
> 16:42:01   0.5   0.0   1.6  98.0   0.0    2   4079   2395  240  239  240   7.08
> 16:42:02   0.5   0.0   0.9  98.7   0.0    2   4097   2405  245  243  245   7.20
> 16:42:03   0.4   0.0   1.3  98.3   0.0    2   4117   2311  243  242  243   7.14
> 16:42:04   0.1   0.0   0.4  99.4   0.1    1   1721   1152   70   70   71   4.48
> 16:42:05   0.1   0.0   0.2  99.8   0.0    1    433    250    1    0    1   2.92
> 16:42:06   0.0   0.0   0.3  99.7   0.0    1    400    231    1    0    1   2.66
>
> In the period of higher power etc usage I ran
> while true;do sleep 0.001;done
> and then interrupted that after a bit with ctrl-c.
>
> Same thing on my workstation (:
>
>   Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
> 16:43:48   1.0   0.0   0.2  98.7   0.1    1   8218   2354    0    0    0  46.43
> 16:43:49   1.1   0.0   0.3  98.7   0.0    1   7866   2477    0    0    0  45.99
> 16:43:50   1.1   0.0   0.4  98.5   0.0    2   7753   2996    0    0    0  48.93
> 16:43:51   0.8   0.0   1.7  97.5   0.0    1   9395   5285    0    0    0  75.48
> 16:43:52   0.5   0.0   1.7  97.8   0.0    1   9141   4806    0    0    0  75.30
> 16:43:53   1.1   0.0   1.8  97.1   0.0    2  10065   5504    0    0    0  76.27
> 16:43:54   1.3   0.0   1.5  97.2   0.0    1  10962   5165    0    0    0  76.33
> 16:43:55   0.9   0.0   0.8  98.3   0.0    1   8452   3939    0    0    0  61.99
> 16:43:56   0.6   0.0   0.1  99.3   0.0    2   6541   1999    0    0    0  40.92
> 16:43:57   0.9   0.0   0.2  98.9   0.0    2   8199   2477    0    0    0  42.91
>
>
> And if I query the power supply via ACPI instead:
>
> while true;do sensors power_meter-acpi-0|grep power1|awk '{print $2, $3}';sleep 1;done
> ...
> 163.00 W
> 173.00 W
> 173.00 W
> 172.00 W
> 203.00 W
> 206.00 W
> 206.00 W
> 206.00 W
> 209.00 W
> 205.00 W
> 211.00 W
> 213.00 W
> 203.00 W
> 175.00 W
> 166.00 W
> ...
>
> As you can see the difference is quite substantial. This is solely due to a
> 1ms sleep loop (albeit one where we fork a process after each sleep, which
> likely is a good chunk of the CPU usage).
>
> However, the difference in power usage will be far smaller if the system is
> already busy, because the CPU will already run at a high frequency.

Thanks a lot for sharing the details.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Fri, Nov 18, 2022 at 2:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > No, it will have a direct effect only on people using promote_trigger_file
> > who do not read and act upon the deprecation notice before upgrading
> > by making a one line change to their failover scripts.
>
> TBH, I wonder if we shouldn't just nuke promote_trigger_file.
> pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
> even if we haven't said promote_trigger_file was deprecated, it hasn't
> been the state-of-the-art way of doing this in a really long time. If
> we think that there are still a lot of people using it, or if popular
> tools are relying on it, then perhaps a deprecation period is
> warranted anyway. But I think we should at least consider the
> possibility that it's OK to just get rid of it right away.

I agree with Robert here. It's better to do away with
promote_trigger_file now than to deprecate it and remove it in some
later release. However, I'm a bit worried about how it'll affect the
tools/providers/extensions that depend on it. Then again, that worry
exists for every feature one thinks about getting rid of.

If we go the route of doing away with promote_trigger_file, I think we
need to discuss it in a separate thread, as the subject of this thread
doesn't match what we're discussing here. That way, we'll get thoughts
from other hackers.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Fri, 18 Nov 2022 at 08:55, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> However, I'm a bit
> worried about how it'll affect the tools/providers/extensions that
> depend on it.

Who is that? Which ones depend upon it?

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Thu, 17 Nov 2022 at 20:38, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > No, it will have a direct effect only on people using promote_trigger_file
> > who do not read and act upon the deprecation notice before upgrading
> > by making a one line change to their failover scripts.
>
> TBH, I wonder if we shouldn't just nuke promote_trigger_file.
> pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
> even if we haven't said promote_trigger_file was deprecated, it hasn't
> been the state-of-the-art way of doing this in a really long time. If
> we think that there are still a lot of people using it, or if popular
> tools are relying on it, then perhaps a deprecation period is
> warranted anyway. But I think we should at least consider the
> possibility that it's OK to just get rid of it right away.

I agree. I can't see a reason to keep it anymore.

I'm nervous about not having any wakeup at all, but since we are
removing the parameter there is no other reason not to do as Andres
suggests.

New version attached, which assumes that the SIGALRMs are silenced on
the other thread.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> I agree. I can't see a reason to keep it anymore.

+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're

I think 'deprecated' usually implies that it still works but you
should avoid it.  I think you need something stronger.

> I'm nervous about not having any wakeup at all, but since we are
> removing the parameter there is no other reason not to do as Andres
> suggests.

Why?  If we're accidentally relying on this timeout for recovery to
not hang in some situation, that's a bug waiting to be discovered and
fixed and it won't be this patch's fault.

> New version attached, which assumes that the SIGALRMs are silenced on
> the other thread.

I tested this + Bharath's v5 from the other thread.  meson test
passes, and tracing the recovery process shows that it is indeed,
finally, completely idle.  Huzzah!



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Fri, 18 Nov 2022 at 20:26, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > I agree. I can't see a reason to keep it anymore.
>
> +    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
>
> I think 'deprecated' usually implies that it still works but you
> should avoid it.  I think you need something stronger.

Whisky? Or maybe just reword the sentence...

New version attached.

> > I'm nervous about not having any wakeup at all, but since we are
> > removing the parameter there is no other reason not to do as Andres
> > suggests.
>
> Why?  If we're accidentally relying on this timeout for recovery to
> not hang in some situation, that's a bug waiting to be discovered and
> fixed and it won't be this patch's fault.
>
> > New version attached, which assumes that the SIGALRMs are silenced on
> > the other thread.
>
> I tested this + Bharath's v5 from the other thread.  meson test
> passes, and tracing the recovery process shows that it is indeed,
> finally, completely idle.  Huzzah!

Thanks for testing!

Finally completely idle? Good to achieve that.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> New version attached.

Fix for doc xref

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Thu, 24 Mar 2022 at 16:21, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs

> > What changes will be acceptable for bgwriter, walwriter and logical worker?
>
> Hmm, I think it would be fine to introduce some kind of hibernation
> mechanism for logical workers. bgwriter and wal writer already have a
> hibernation mechanism, so I'm not sure what your concern is there
> exactly. In your initial email you said you weren't proposing changes
> there, but maybe that changed somewhere in the course of the
> subsequent discussion. If you could catch me up on your present
> thinking that would be helpful.

Now that we seem to have solved the problem for Startup process, let's
circle back to the others....

Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
At default values that is 5s, but could be much less. So we need to
move that up to 60s, same as others.

WALwriter also hibernates currently, but only at 25x the
wal_writer_delay. At default values that is 2.5s, but could be much
less. So we need to move that up to 60s, same as others. At the same
time, make sure that when we hibernate we use a new WaitEvent,
similarly to the way bgwriter reports its hibernation state (which
also helps test the patch).
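
To make the shape of that change concrete, here is a minimal sketch of
what the walwriter's main-loop wait might look like (illustrative only,
not the attached patch; idle_loops, did_no_work, HIBERNATE_AFTER_LOOPS,
HIBERNATE_DELAY_MS and WAIT_EVENT_WAL_WRITER_HIBERNATE are assumed
names, the last being the new WaitEvent such a patch would add):

#define HIBERNATE_AFTER_LOOPS	50				/* illustrative threshold */
#define HIBERNATE_DELAY_MS		(60 * 1000L)	/* the proposed 60s */

	long		timeout = WalWriterDelay;		/* normal cadence, in ms */
	uint32		wait_event = WAIT_EVENT_WAL_WRITER_MAIN;

	if (did_no_work)
		idle_loops++;
	else
		idle_loops = 0;

	if (idle_loops >= HIBERNATE_AFTER_LOOPS)
	{
		timeout = HIBERNATE_DELAY_MS;			/* hibernate... */
		wait_event = WAIT_EVENT_WAL_WRITER_HIBERNATE;	/* ...and say so */
	}

	/* a backend that needs the walwriter sets our latch, so we wake early */
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 timeout,
					 wait_event);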


As a 3rd patch, I will work on making logical workers hibernate.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Mon, Nov 21, 2022 at 3:35 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > New version attached.
>
> Fix for doc xref

I removed a stray variable declaration from xlogrecovery.h, and wrote
a draft commit message.  I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

(We could of course change it so that the timeout based wakeup only
happens if you have actually set promote_trigger_file, but I think
we've established that we just don't want the filesystem polling
feature so I'm whispering this in parentheses.)

Attachment

Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> As a 3rd patch, I will work on making logical workers hibernate.

Duelling patch warning: Nathan mentioned[1] that he's hacking on a
patch for that, along the lines of the recent walreceiver change IIUC.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJGhX4r2LPUE3Oy9BX71Eum6PBcS8L3sJpScR9oKaTVaA%40mail.gmail.com



Re: Reducing power consumption on idle servers

From
Nathan Bossart
Date:
On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote:
> On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
>> As a 3rd patch, I will work on making logical workers hibernate.
> 
> Duelling patch warning: Nathan mentioned[1] that he's hacking on a
> patch for that, along the lines of the recent walreceiver change IIUC.

I coded something up last week, but, like the walreceiver patch, it caused
check-world to take much longer [0], and I haven't looked into whether it
could be easily fixed.  I'm hoping to make some time for this again in the
near future.

[0] https://postgr.es/m/20221113222644.GA1269110%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Laurenz Albe
Date:
On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> I'll wait 24 hours before committing, to
> provide a last chance for anyone who wants to complain about dropping
> promote_trigger_file.

Remove "promote_trigger_file"?  Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

Yours,
Laurenz Albe



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Mon, Nov 21, 2022 at 10:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > I'll wait 24 hours before committing, to
> > provide a last chance for anyone who wants to complain about dropping
> > promote_trigger_file.
>
> Remove "promote_trigger_file"?  Now I have never seen anybody use that
> parameter, but I don't think that it is a good idea to deviate from our
> usual standard of deprecating a feature for about five years before
> actually removing it.

I'm not sure what the guidelines are here; however, years have gone by
since pg_ctl promote [1] in 2011 and pg_promote() [2] in 2018 were
added. With two such alternatives in place for many years, it was sort
of an undeclared deprecation of the promote_trigger_file GUC. And the
changes required to move from the GUC to the newer ways aren't that
hard for those who're still relying on it. Therefore, I think it's now
time for us to do away with the GUC.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4695da5ae97bbb58d274887fd68edbe88d03ebcb
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10074651e3355e2405015f6253602be8344bc829

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Mon, Nov 21, 2022 at 2:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, Nov 21, 2022 at 3:35 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > > New version attached.
> >
> > Fix for doc xref
>
> I removed a stray variable declaration from xlogrecovery.h, and wrote
> a draft commit message.  I'll wait 24 hours before committing, to
> provide a last chance for anyone who wants to complain about dropping
> promote_trigger_file.
>
> (We could of course change it so that the timeout based wakeup only
> happens if you have actually set promote_trigger_file, but I think
> we've established that we just don't want the filesystem polling
> feature so I'm whispering this in parentheses.)

Thanks. The v11 patch mostly looks good to me and it passes cirrus-ci
tests - https://github.com/BRupireddy/postgres/tree/do_away_with_promote_trigger_file.

I have a comment:

-                     * Wait for more WAL to arrive. Time out after 5 seconds
-                     * to react to a trigger file promptly and to check if the
-                     * WAL receiver is still active.
+                     * Wait for more WAL to arrive, when we will be woken
+                     * immediately by the WAL receiver. Use of trigger file
+                     * via promote_trigger_file is now fully removed.
                      */
Do we need to introduce a reference to the removal of
promote_trigger_file in the code? If left as-is, it'll lie there for
many years. Isn't it enough to mention this in
appendix-obsolete-recovery-config.sgml?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > I'll wait 24 hours before committing, to
> > provide a last chance for anyone who wants to complain about dropping
> > promote_trigger_file.
>
> Remove "promote_trigger_file"?  Now I have never seen anybody use that
> parameter, but I don't think that it is a good idea to deviate from our
> usual standard of deprecating a feature for about five years before
> actually removing it.

We aren't removing the ability to promote, just enforcing a change to
a better mechanism, hence I don't see a reason for a long(er)
deprecation period than we have already had.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sun, 20 Nov 2022 at 22:55, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote:
> > On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> >> As a 3rd patch, I will work on making logical workers hibernate.
> >
> > Duelling patch warning: Nathan mentioned[1] that he's hacking on a
> > patch for that, along the lines of the recent walreceiver change IIUC.
>
> I coded something up last week, but, like the walreceiver patch, it caused
> check-world to take much longer [0], and I haven't looked into whether it
> could be easily fixed.  I'm hoping to make some time for this again in the
> near future.
>
> [0] https://postgr.es/m/20221113222644.GA1269110%40nathanxps13

OK, Nathan, will leave this one to you - remembering that we need to
fix ALL processes to get a useful power reduction when idle.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Laurenz Albe
Date:
On Mon, 2022-11-21 at 11:42 +0530, Bharath Rupireddy wrote:
> On Mon, Nov 21, 2022 at 10:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > 
> > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > > I'll wait 24 hours before committing, to
> > > provide a last chance for anyone who wants to complain about dropping
> > > promote_trigger_file.
> > 
> > Remove "promote_trigger_file"?  Now I have never seen anybody use that
> > parameter, but I don't think that it is a good idea to deviate from our
> > usual standard of deprecating a feature for about five years before
> > actually removing it.
> 
> I'm not sure what the guidelines are here, however years have gone by
> since pg_ctl promote [1] in 2011 and pg_promote() [2] in 2018 were
> added. With two such alternatives in place for many years, it was sort
> of an undeclared deprecation of promote_trigger_file GUC. And the
> changes required to move to newer ways from the GUC aren't that hard
> for those who're still relying on the GUC. Therefore, I think it's now
> time for us to do away with the GUC.

I disagree.  With the same argument, you could rip out "serial", since we
have supported identity columns since v11.

I understand the desire to avoid needless wakeups, but isn't it possible
to only perform the regular poll if "promote_trigger_file" is set?

Yours,
Laurenz Albe



Re: Reducing power consumption on idle servers

From
Laurenz Albe
Date:
On Mon, 2022-11-21 at 07:36 +0000, Simon Riggs wrote:
> On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > 
> > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > > I'll wait 24 hours before committing, to
> > > provide a last chance for anyone who wants to complain about dropping
> > > promote_trigger_file.
> > 
> > Remove "promote_trigger_file"?  Now I have never seen anybody use that
> > parameter, but I don't think that it is a good idea to deviate from our
> > usual standard of deprecating a feature for about five years before
> > actually removing it.
> 
> We aren't removing the ability to promote, just enforcing a change to
> a better mechanism, hence I don't see a reason for a long(er)
> deprecation period than we have already had.

We have had a deprecation period?  I looked at the documentation, but found
no mention of a deprecation.  How hard can it be to leave the GUC and only
poll for the existence of the file if it is set?

I personally don't need the GUC, and I know nobody who does, but I think
we should not be cavalier about introducing unnecessary compatibility breaks.

Yours,
Laurenz Albe



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Mon, 21 Nov 2022 at 08:40, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2022-11-21 at 07:36 +0000, Simon Riggs wrote:
> > On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > >
> > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > > > I'll wait 24 hours before committing, to
> > > > provide a last chance for anyone who wants to complain about dropping
> > > > promote_trigger_file.
> > >
> > > Remove "promote_trigger_file"?  Now I have never seen anybody use that
> > > parameter, but I don't think that it is a good idea to deviate from our
> > > usual standard of deprecating a feature for about five years before
> > > actually removing it.
> >
> > We aren't removing the ability to promote, just enforcing a change to
> > a better mechanism, hence I don't see a reason for a long(er)
> > deprecation period than we have already had.
>
> We have had a deprecation period?  I looked at the documentation, but found
> no mention of a deprecation.  How hard can it be to leave the GUC and only
> poll for the existence of the file if it is set?
>
> I personally don't need the GUC, and I know nobody who does,

Nobody else does either.

> I disagree.  With the same argument, you could rip out "serial", since we
> have supported identity columns since v11.

...and this is not a user-facing change; only HA systems interface with this.

> but I think
> we should not be cavalier about introducing unnecessary compatibility breaks.

I agree we should not be cavalier, nor has anyone been so. The first
version of the patch was cautious on this very point, but since many
people think we should remove it, it is not a user-facing feature, and
nobody on this thread knows of anybody or anything that uses it, I have
changed my mind and now think we should remove it.

We have two versions to choose from now
* v6 deprecates this option
* v10 removes this option
so it is no effort at all to choose either option.

The main issue is about reducing power usage, so please let's move to
commit something soon, one way or another.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Robert Haas
Date:
On Mon, Nov 21, 2022 at 3:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> We have had a deprecation period?  I looked at the documentation, but found
> no mention of a deprecation.  How hard can it be to leave the GUC and only
> poll for the existence of the file if it is set?
>
> I personally don't need the GUC, and I know nobody who does, but I think
> we should not be cavalier about introducing unnecessary compatibility breaks.

As a project, we have a problem with deprecation warnings, simply
because community consensus changes over time as people change their
minds and as participants come and go. Slapping a deprecation notice
on things saying that they will be removed in five years does not mean
they will actually get removed in five years, nor does the failure to
slap a deprecation notice on them mean that they won't be removed in
five years.  We have some instances of following through on such
promises, but plenty where what we said we were going to do and what
we actually did were quite different. Furthermore, even if we were
100% on follow-through on such things, I have no faith in the idea
that a mandatory five-year deprecation period in all cases would be
good for the project.

I think it's totally reasonable to phase out a little-used feature
that no one really cares about faster than a feature that is
heavily-relied upon and for which there are no reasonable
alternatives. Something like standard_conforming_strings or the switch
from md5 to SCRAM authentication affects not only database users but
also drivers and applications; time needs to be allowed for the effect
to seep through the whole ecosystem. Backward compatibility is really
important for the transition period. In this case, though, the
alternatives have already existed for a long time and we have reasons
to think most people have already been using them for a long time.
Moreover, if they haven't been, the transition shouldn't be too hard.
For there to be a problem here, we have to imagine a user who can't
run pg_promote() or pg_ctl promote, who can only create a file,
and who furthermore needs that file to be located someplace other than
the data directory. I'm having a hard time envisioning that.

The reason that I pushed back -- not as successfully as I would have
liked -- on the changes to pg_stop_backup / pg_start_backup is that I
know there are people using the old method successfully, and it's not
just a 1:1 substitution. Here I don't, and it is. I'm totally open to
the feedback that such people exist and to hearing why adopting one of
the newer methods would be a problem for them, if that's the case. But
if there's no evidence that such people exist or that changing is a
problem for them, I don't think waiting 5 years on principle is good
for the project.

P.S. One notable example of how bad we are at these things is that
contrib/xml2 has been deprecated since PostgreSQL 8.3. The deprecation
notice says that it isn't needed any more because all of the
functionality is otherwise available, but some people think that's not
true so we haven't removed the module. An obvious alternative would be
to remove the deprecation notice but FWICT other people think that
getting rid of it is the right idea so we haven't done that either.
Whee.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Reducing power consumption on idle servers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> The reason that I pushed back -- not as successfully as I would have
> liked -- on the changes to pg_stop_backup / pg_start_backup is that I
> know there are people using the old method successfully, and it's not
> just a 1:1 substitution. Here I don't, and it is. I'm totally open to
> the feedback that such people exist and to hearing why adopting one of
> the newer methods would be a problem for them, if that's the case. But
> if there's no evidence that such people exist or that changing is a
> problem for them, I don't think waiting 5 years on principle is good
> for the project.

We make incompatible changes in every release; see the release notes.
Unless somebody can give a plausible use-case where this'd be a
difficult change to deal with, I concur that we don't need to
deprecate it ahead of time.

            regards, tom lane



Re: Reducing power consumption on idle servers

From
Laurenz Albe
Date:
On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > The reason that I pushed back -- not as successfully as I would have
> > liked -- on the changes to pg_stop_backup / pg_start_backup is that I
> > know there are people using the old method successfully, and it's not
> > just a 1:1 substitution. Here I don't, and it is. I'm totally open to
> > the feedback that such people exist and to hearing why adopting one of
> > the newer methods would be a problem for them, if that's the case. But
> > if there's no evidence that such people exist or that changing is a
> > problem for them, I don't think waiting 5 years on principle is good
> > for the project.
> 
> We make incompatible changes in every release; see the release notes.
> Unless somebody can give a plausible use-case where this'd be a
> difficult change to deal with, I concur that we don't need to
> deprecate it ahead of time.

Since I am the only one that seems to worry, I'll shut up.  You are probably
right that the feature won't be missed by many users.

Yours,
Laurenz Albe



Re: Reducing power consumption on idle servers

From
Ian Lawrence Barwick
Date:
2022年11月22日(火) 5:50 Laurenz Albe <laurenz.albe@cybertec.at>:
>
> On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > The reason that I pushed back -- not as successfully as I would have
> > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I
> > > know there are people using the old method successfully, and it's not
> > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to
> > > the feedback that such people exist and to hearing why adopting one of
> > > the newer methods would be a problem for them, if that's the case. But
> > > if there's no evidence that such people exist or that changing is a
> > > problem for them, I don't think waiting 5 years on principle is good
> > > for the project.
> >
> > We make incompatible changes in every release; see the release notes.
> > Unless somebody can give a plausible use-case where this'd be a
> > difficult change to deal with, I concur that we don't need to
> > deprecate it ahead of time.
>
> Since I am the only one that seems to worry, I'll shut up.  You are probably
> right that the feature won't be missed by many users.

FWIW, though I prefer to err on the side of caution when removing features
from anything, I am struggling to remember ever having used
"promote_trigger_file"
 (or "trigger_file" as it was in Pg11 and earlier); grepping my private notes
brings up a single example from ca. 2012 when I appear to have been
experimenting with replication.

On a quick web search, a large part of the results are related to its change
to a GUC in Pg 12 and/or commented out entries in sample postgresql.conf files;
most of the rest seem to be blog articles/tutorials on setting up replication.

Regards

Ian Barwick



Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Sun, Nov 27, 2022 at 6:15 PM Ian Lawrence Barwick <barwick@gmail.com> wrote:
> 2022年11月22日(火) 5:50 Laurenz Albe <laurenz.albe@cybertec.at>:
> > On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:
> > > Robert Haas <robertmhaas@gmail.com> writes:
> > > > The reason that I pushed back -- not as successfully as I would have
> > > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I
> > > > know there are people using the old method successfully, and it's not
> > > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to
> > > > the feedback that such people exist and to hearing why adopting one of
> > > > the newer methods would be a problem for them, if that's the case. But
> > > > if there's no evidence that such people exist or that changing is a
> > > > problem for them, I don't think waiting 5 years on principle is good
> > > > for the project.
> > >
> > > We make incompatible changes in every release; see the release notes.
> > > Unless somebody can give a plausible use-case where this'd be a
> > > difficult change to deal with, I concur that we don't need to
> > > deprecate it ahead of time.
> >
> > Since I am the only one that seems to worry, I'll shut up.  You are probably
> > right that the feature won't be missed by many users.
>
> FWIW, though I prefer to err on the side of caution when removing features
> from anything, I am struggling to remember ever having used
> "promote_trigger_file"
>  (or "trigger_file" as it was in Pg11 and earlier); grepping my private notes
> brings up a single example from ca. 2012 when I appear to have been
> experimenting with replication.
>
> On a quick web search, a large part of the results are related to its change
> to a GUC in Pg 12 and/or commented out entries in sample postgresql.conf files;
> most of the rest seem to be blog articles/tutorials on setting up replication.

Thanks Ian, Laurenz, Robert, Tom for the discussion.  Seems like we're
all set then.  Sorry for delaying over trivialities, but I have a
couple more comments about the documentation before committing:

"The trigger_file and promote_trigger_file have been removed." was
missing some words.  I've also added a sentence to say which releases
were involved, to make it like nearby paragraphs about other obsolete
stuff.

The funny thing about that "obsolete" appendix is that it's intended
as a URL-preserving way to document the recovery.conf file's demise in
r12.  It doesn't look like the right place to document ongoing changes
to recovery-related GUCs in general.  In this particular case it's
sort of OK because the old trigger_file GUC was indeed in
recovery.conf, so it's not completely irrelevant.  So maybe it's OK?
Perhaps in future, in a separate commit, we could have a new appendix
"Obsolete settings" that has an alphabetical list of the fallen?

The main documentation of pg_promote() etc now has "The parameter
<varname>promote_trigger_file</varname> has been removed" in the
places where the GUC was previously mentioned.  Perhaps we should just
remove the mentions completely (it's somehow either too much and not
enough information), but I'm OK with leaving those in for now until
some better place exists for historical notes.

So this is the version I will push unless someone sees anything more
to tweak about the documentation.

Attachment

Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Mon, Nov 28, 2022 at 5:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> "The trigger_file and promote_trigger_file have been removed." was
> missing some words.  I've also added a sentence to say which releases
> were involved, to make it like nearby paragraphs about other obsolete
> stuff.

LGTM.

> The funny thing about that "obsolete" appendix is that it's intended
> as a URL-preserving way to document the recovery.conf file's demise in
> r12.  It doesn't look like the right place to document ongoing changes
> to recovery-related GUCs in general.  In this particular case it's
> sort of OK because the old trigger_file GUC was indeed in
> recovery.conf, so it's not completely irrelevant.  So maybe it's OK?
> Perhaps in future, in a separate commit, we could have a new appendix
> "Obsolete settings" that has an alphabetical list of the fallen?

Tracking down the history to figure out and add all the removed/renamed
settings under a new appendix seems like a tedious task. The best
approach, IMO, is to look at the corresponding release notes to figure
out which ones have been removed/renamed/altered.

FWIW, it's been a while (3 major releases have happened) since
recovery.conf was removed; I think we need to do away with
appendix-obsolete-recovery-config.sgml someday.

> The main documentation of pg_promote() etc now has "The parameter
> <varname>promote_trigger_file</varname> has been removed" in the
> places where the GUC was previously mentioned.  Perhaps we should just
> remove the mentions completely (it's somehow either too much and not
> enough information), but I'm OK with leaving those in for now until
> some better place exists for historical notes.

+1 for the way it is in the v12 patch, i.e. specifying the removal in
the docs. However, I don't see any reason to keep traces of it in the
code; can we remove the "Use of trigger file via promote_trigger_file
is now fully removed." sentence in the code below?

+                     * Wait for more WAL to arrive, when we will be woken
+                     * immediately by the WAL receiver. Use of trigger file
+                     * via promote_trigger_file is now fully removed.
                      */

> So this is the version I will push unless someone sees anything more
> to tweak about the documentation.

Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Robert Haas
Date:
On Sun, Nov 27, 2022 at 6:57 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> The main documentation of pg_promote() etc now has "The parameter
> <varname>promote_trigger_file</varname> has been removed" in the
> places where the GUC was previously mentioned.  Perhaps we should just
> remove the mentions completely (it's somehow either too much and not
> enough information), but I'm OK with leaving those in for now until
> some better place exists for historical notes.

I think we should remove those mentions. Otherwise the documentation
just collects mentions of an increasing number of things that are no
longer relevant.

And, as you say, we can't presume that future readers would know what
promote_trigger_file even is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Reducing power consumption on idle servers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think we should remove those mentions. Otherwise the documentation
> just collects mentions of an increasing number of things that are no
> longer relevant.

Yeah, I think the same.  There will be a release-note entry, and
I don't object to having something about it in appendix-obsolete.sgml;
but we shouldn't clutter the main docs with it.

            regards, tom lane



Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
I found some more comments and some documentation to word-smith very
lightly, and pushed.  The comments were stray references to the
trigger file.  It's a little confusing because the remaining mechanism
also uses a file, but it uses a signal first, so it seems better to
refer to promotion requests rather than files.

/me wonders how many idle servers there are in the world



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Mon, 28 Nov 2022 at 23:16, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> I found some more comments and some documentation to word-smith very
> lightly, and pushed.

Thanks

> The comments were stray references to the
> trigger file.  It's
> a little confusing because the remaining mechanism also uses a file,
> but it uses a signal first so seems better to refer to promotion
> requests rather than files.

..and again

> /me wonders how many idle servers there are in the world

My estimate is there are 100 million servers in use worldwide, with
only about 1% of them on a continuously busy duty cycle and most of
them not in the cloud.

If we guess that we save 10W when idle, then that saves some proportion of a GW.
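(Rough arithmetic, on those assumptions: 100 million servers x ~99%
mostly idle x 10W is about 1GW as an upper bound; the realized saving
is whatever fraction of those machines run PostgreSQL.)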

It's not a huge contribution to the effort, but if by doing this we
inspire others to do the same, we may yet make a difference.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Sun, 20 Nov 2022 at 20:00, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 24 Mar 2022 at 16:21, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
>
> > > What changes will be acceptable for bgwriter, walwriter and logical worker?
> >
> > Hmm, I think it would be fine to introduce some kind of hibernation
> > mechanism for logical workers. bgwriter and wal writer already have a
> > hibernation mechanism, so I'm not sure what your concern is there
> > exactly. In your initial email you said you weren't proposing changes
> > there, but maybe that changed somewhere in the course of the
> > subsequent discussion. If you could catch me up on your present
> > thinking that would be helpful.
>
> Now that we seem to have solved the problem for Startup process, let's
> circle back to the others....

Thanks for committing changes to the startup process.

> Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
> At default values that is 5s, but could be much less. So we need to
> move that up to 60s, same as others.
>
> WALwriter also hibernates currently, but only at 25x the
> wal_writer_delay. At default values that is 2.5s, but could be much
> less. So we need to move that up to 60s, same as others. At the same
> time, make sure that when we hibernate we use a new WaitEvent,
> similarly to the way bgwriter reports its hibernation state (which
> also helps test the patch).

Re-attaching patch for bgwriter and walwriter, so it is clear this is
not yet committed.

I've added this to the next CF, since the entry was closed when the
startup patch was committed.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> Re-attaching patch for bgwriter and walwriter, so it is clear this is
> not yet committed.

I'm just curious, and not suggesting that 60s wakeups are a problem
for the polar ice caps, but why even time out at all?  Are the latch
protocols involved not reliable enough?  At a guess from a quick
glance, the walwriter's is but maybe the bgwriter could miss a wakeup
as it races against StrategyGetBuffer(), which means you might stay
asleep until the *next* buffer allocation, but that's already true I
think, and a 60s timeout is not much of a defence.



Re: Reducing power consumption on idle servers

From
Simon Riggs
Date:
On Wed, 30 Nov 2022 at 03:50, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > Re-attaching patch for bgwriter and walwriter, so it is clear this is
> > not yet committed.
>
> I'm just curious, and not suggesting that 60s wakeups are a problem
> for the polar ice caps, but why even time out at all?  Are the latch
> protocols involved not reliable enough?  At a guess from a quick
> glance, the walwriter's is but maybe the bgwriter could miss a wakeup
> as it races against StrategyGetBuffer(), which means you might stay
> asleep until the *next* buffer allocation, but that's already true I
> think, and a 60s timeout is not much of a defence.

That sounds reasonable.

It does sound like we agree that the existing behavior of waking up
every 5s or 2.5s is not good. I hope you will act to improve that.

The approach taken in this patch, and others of mine, has been to
offer a minimal change that achieves the objective of lengthy
hibernation to save power.

Removing the timeout entirely may not work in other circumstances I
have not explored. Doing that requires someone to check it actually
works, and for others to believe that check has occurred. For me, that
is too time consuming to actually happen in this dev cycle, and time
is part of the objective, since a perfect design with unreleased
code has no utility.

<Simon enters lengthy hibernation>

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Wed, Nov 30, 2022 at 7:40 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> On Wed, 30 Nov 2022 at 03:50, Thomas Munro <thomas.munro@gmail.com> wrote:
> > I'm just curious, and not suggesting that 60s wakeups are a problem
> > for the polar ice caps, but why even time out at all?  Are the latch
> > protocols involved not reliable enough?  At a guess from a quick
> > glance, the walwriter's is but maybe the bgwriter could miss a wakeup
> > as it races against StrategyGetBuffer(), which means you might stay
> > asleep until the *next* buffer allocation, but that's already true I
> > think, and a 60s timeout is not much of a defence.
>
> That sounds reasonable.
>
> It does sound like we agree that the existing behavior of waking up
> every 5s or 2.5s is not good. I hope you will act to improve that.
>
> The approach taken in this patch, and others of mine, has been to
> offer a minimal change that achieves the objective of lengthy
> hibernation to save power.
>
> Removing the timeout entirely may not work in other circumstances I
> have not explored. Doing that requires someone to check it actually
> works, and for others to believe that check has occurred. For me, that
> is too time consuming to actually happen in this dev cycle, and time
> is part of the objective since perfect designs yet with unreleased
> code have no utility.
>
> <Simon enters lengthy hibernation>

<Thomas returns from aestivation>

Yeah, I definitely want to fix it.  I just worry that 60s is so long
that it also needs that analysis work to be done to explain that it's
OK that we're a bit sloppy on noticing when to wake up, at which point
you might as well go to infinity.



Re: Reducing power consumption on idle servers

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Yeah, I definitely want to fix it.  I just worry that 60s is so long
> that it also needs that analysis work to be done to explain that it's
> OK that we're a bit sloppy on noticing when to wake up, at which point
> you might as well go to infinity.

Yeah.  The perfectionist in me wants to say that there should be
explicit wakeups for every event of interest, in which case there's
no need for a timeout.  The engineer in me says "but what about bugs?".
Better a slow reaction than never reacting at all.  OTOH, then you
have to have a discussion about whether 60s (or any other
ice-cap-friendly value) is an acceptable response time even in the
presence of bugs.

It's kind of moot until we've reached the point where we can
credibly claim to have explicit wakeups for every event of
interest.  I don't think we're very close to that today, and
I do think we should try to get closer.  There may come a point
of diminishing returns though.

            regards, tom lane



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Wed, Jan 25, 2023 at 2:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Yeah, I definitely want to fix it.  I just worry that 60s is so long
> > that it also needs that analysis work to be done to explain that it's
> > OK that we're a bit sloppy on noticing when to wake up, at which point
> > you might as well go to infinity.
>
> Yeah.  The perfectionist in me wants to say that there should be
> explicit wakeups for every event of interest, in which case there's
> no need for a timeout.  The engineer in me says "but what about bugs?".
> Better a slow reaction than never reacting at all.  OTOH, then you
> have to have a discussion about whether 60s (or any other
> ice-cap-friendly value) is an acceptable response time even in the
> presence of bugs.
>
> It's kind of moot until we've reached the point where we can
> credibly claim to have explicit wakeups for every event of
> interest.  I don't think we're very close to that today, and
> I do think we should try to get closer.  There may come a point
> of diminishing returns though.

IIUC, we're discussing here whether or not to get rid of hibernation
loops, IOW sleep-wakeup-doworkifthereisany-sleep loops, and instead
rely on other processes' wakeup signals to reduce the overall power
consumption, am I right?

I'm trying to understand this a bit - can the signals (especially,
SIGURG that we use to set latches to wake up processes) ever get lost
on the way before reaching the target process? If yes, how? How
frequently can it happen? Is there any history of reported issues in
postgres because a signal got lost?

I'm reading about pending signals and the queuing of signals with
sigqueue() (on Linux); can any of these guarantee that signals sent
never get lost?

FWIW, a recent commit cd4329d that removed promote_trigger_file also
removed hibernation and added an indefinite wait relying on the latch
being set. If we're worried that signals can get lost on the way, then
this also needs to be fixed. And I also see a lot of WaitLatch() calls
that wait forever, relying on others to wake them up.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Reducing power consumption on idle servers

From
Thomas Munro
Date:
On Fri, Jan 27, 2023 at 7:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Wed, Jan 25, 2023 at 2:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It's kind of moot until we've reached the point where we can
> > credibly claim to have explicit wakeups for every event of
> > interest.  I don't think we're very close to that today, and
> > I do think we should try to get closer.  There may come a point
> > of diminishing returns though.
>
> IIUC, we're discussing here whether or not to get rid of hibernate
> loops, IOW, sleep-wakeup-doworkifthereisany-sleep loops and rely on
> other processes' wakeup signals to reduce the overall power
> consumption, am I right?
>
> I'm trying to understand this a bit - can the signals (especially,
> SIGURG that we use to set latches to wake up processes) ever get lost
> on the way before reaching the target process? If yes, how? How
> frequently can it happen? Is there any history of reported issues in
> postgres because a signal got lost?
>
> I'm reading about Pending Signals and queuing of signals with
> sigqueue() (in linux), can any of these guarantee that signals sent
> never get lost?

Signals don't get lost.  At least with the current definition of
so-called "reliable" signals, in the POSIX standard.  (There was a
previous system of signals in ancient UNIX that was harder to program
with; we still see echoes of that today, for example in the C standard
there are basic signals, and Windows implements those, but they're
borderline useless; before a signal handler runs, the handler is also
de-registered, so many interesting handlers would have to re-register
it, but signals can't be atomically blocked at the same time and there
is a window of time where the default behaviour applies, which (from
our modern perspective) is clearly insane as there is literally no way
to close that race and avoid some fatal default behaviour).  BTW
sigqueue signals (which are queued up without being consolidated, and
carry a user-controlled value with them) are "realtime signals" AKA
"signal queuing", which I guess we'll never be able to use because at
least Apple didn't implement them; the "reliable signals" we use are
the plain kind where signals don't carry any kind of payload and are
collapsed into a single "pending" bit, so that if 2 people do
kill(SIGFOO) around the same time your handler might only run once (if
you registered one and it's not blocked), but it will definitely run >
0 times (or be delivered via various other means such as sigwait(),
etc...).

Then there is the question of races as you go into the wait primitive.
I mean in between our reading latch->is_set and entering the kernel,
which we defend against using various techniques which you can read
about at the top of latch.c; the short version is that if that
happens, those system calls will immediately return.
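
For anyone following along, that pattern (described at the top of
latch.c) looks roughly like this; WorkIsPending(), DoPendingWork() and
WAIT_EVENT_EXAMPLE_MAIN are placeholders, not real identifiers:

	for (;;)
	{
		/*
		 * Reset the latch *before* checking for work.  If SetLatch() fires
		 * after the check but before WaitLatch(), the latch stays set and
		 * the wait below returns immediately instead of blocking.
		 */
		ResetLatch(MyLatch);

		if (WorkIsPending())
			DoPendingWork();

		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
						 -1L,	/* no timeout: rely entirely on wakeups */
						 WAIT_EVENT_EXAMPLE_MAIN);
	}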

The uncertainty discussed here comes from the comment beginning "There
is a race condition here, ..." in bgwriter.c, referring to the
protocol implemented by StrategyNotifyBgWriter().  I haven't really
looked into it.  I *guess* it's probably approximately OK if the
bgwriter misses a wakeup from time to time, because there's another
chance to wake it up as soon as someone needs another buffer, and if
you don't need any more buffers soon your system is probably idle; but
really "needing another buffer" isn't a great proxy for "more buffers
are being made dirty" at all!  Someone would need to think about it
and construct a complete argument for why it's OK to go to sleep for
60 seconds, or infinity, with that sloppiness in place.

There's also the walwriter to look into; from memory it was a little
less fuzzy but I haven't looked recently.



Re: Reducing power consumption on idle servers

From
Bharath Rupireddy
Date:
On Fri, Jan 27, 2023 at 12:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> There's also the walwriter to look into; from memory it was a little
> less fuzzy but I haven't looked recently.

Thanks. I tried doing away with the walwriter's time-limited
hibernation and made it wait indefinitely until an event occurs: once
the walwriter finds no work for some time, it goes into wait-forever
mode and is woken up when someone inserts WAL. Please see the attached
patch for more details. Thoughts?
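
Roughly the shape of the change, as a sketch rather than the attached
patch (the "hibernating" flag stands in for whatever idle detection the
patch actually uses):

	int			flags = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
	long		timeout = -1L;			/* once idle: sleep until woken */

	if (!hibernating)
	{
		flags |= WL_TIMEOUT;
		timeout = WalWriterDelay;		/* normal cadence while busy */
	}

	/* whoever generates WAL is expected to SetLatch() the walwriter */
	(void) WaitLatch(MyLatch, flags, timeout, WAIT_EVENT_WAL_WRITER_MAIN);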

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment