Thread: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
There is a general need to have Postgres consume fewer CPU cycles and
less power when idle. Until something is done about this, shared
hosting providers, particularly those who want to deploy many VM
instances with databases, will continue to choose MySQL out of hand.

I have quantified the difference in the number of wake-ups when idle
between Postgres and MySQL using Intel's powertop utility on my
laptop, which runs Fedora 14. These figures are for a freshly initdb'd
database from git master, and mysql-server 5.1.56 from my system's
package manager.

*snip*
   2.7% ( 11.5)   [      ] postgres
   1.1% (  4.6)   [  1663] Xorg
   0.9% (  3.7)   [  1463] wpa_supplicant
   0.6% (  2.7)   [     ] [ahci] <interrupt>
   0.5% (  2.2)   [     ] mysqld
*snip*

Postgres consistently has 11.5 wakeups per second, while MySQL
consistently has 2.2 wakeups (averaged over the 5 second period that
each cycle of instrumentation lasts).

If I turn on archiving, the figure for Postgres naturally increases:

*snip*
   1.7% ( 12.5)   [      ] postgres
   1.6% ( 12.0)   [   808] phy0
   0.7% (  5.4)   [  1463] wpa_supplicant
   0.6% (  4.3)   [     ] [ahci] <interrupt>
   0.3% (  2.2)   [     ] mysqld
*snip*

It increases by exactly the amount that you'd expect after looking at
pgarch.c - one wakeup per second. This is because there is a loop
within the main event loop for the process that is a prime example of
what unix_latch.c describes as "the common pattern of using
pg_usleep() or select() to wait until a signal arrives, where the
signal handler sets a global variable". The loop naps for one second
per iteration.
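
For readers unfamiliar with the pattern, here is a minimal, self-contained
sketch of it. This is not the code from pgarch.c: the names wakened,
install_wake_handler() and nap_until_wakened() are made up for illustration,
and nanosleep() stands in for pg_usleep(). The point is only that the loop
must wake at every nap interval to look at the flag, whether or not anything
happened.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set by the signal handler, polled by the main loop (hypothetical name). */
static volatile sig_atomic_t wakened = 0;

static void
wake_handler(int signo)
{
    (void) signo;
    wakened = 1;
}

/* Install the handler for SIGUSR1; returns 0 on success. */
int
install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Nap in nap_ms steps until the flag is set or max_naps naps have elapsed.
 * Returns the number of naps taken, i.e. the number of wakeups spent just
 * to check the flag.
 */
int
nap_until_wakened(int max_naps, long nap_ms)
{
    int naps = 0;

    assert(max_naps >= 0 && nap_ms >= 0);
    while (!wakened && naps < max_naps)
    {
        struct timespec ts;

        ts.tv_sec = nap_ms / 1000;
        ts.tv_nsec = (nap_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);   /* may return early if a signal arrives */
        naps++;
    }
    wakened = 0;
    return naps;
}
```

With a one-second nap and a 60-second outer interval, an idle process built
this way wakes 60 times where a single wakeup would do.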

Attached is the first in what I hope will become a series of patches
for reducing power consumption when idle. It makes the archiver
process wake far less frequently, using a latch primitive,
specifically a non-shared latch. I'm not sure if I should have used a
shared latch, and have SetLatch() calls replace
SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) calls. Would that have
broken some implied notion of encapsulation? In any case, if I apply
the patch and rebuild, the difference is quite apparent:

***snip***
   3.9% ( 21.8)   [  1663] Xorg
   3.2% ( 17.9)   [      ] [ath9k] <interrupt>
   2.1% ( 11.9)   [   808] phy0
   2.1% ( 11.5)   [      ] postgres
   1.0% (  5.4)   [  1463] wpa_supplicant
   0.4% (  2.2)   [     ] mysqld
***snip***

The difference from not running the archiver at all appears to have
been completely eliminated (in fact, we still wake up every
PGARCH_AUTOWAKE_INTERVAL seconds, which is 60 seconds, but that
usually isn't apparent to powertop, which measures wakeups over 5
second periods).

If we could gain similar decreases in idle power consumption across
all Postgres ancillary processes, perhaps we'd see Postgres available
as an option for shared hosting plans more frequently. When these
differences are multiplied by thousands of VM instances, they really
matter. Unfortunately, there doesn't seem to be a way to get powertop
to display its instrumentation per-process to quickly get a detailed
overview of where those wake-ups occur across all pg processes.

I hope to work on reducing wakeups for PG ancillary processes in this
order (order of perceived difficulty), using shared latches to
eliminate "the waiting pattern" in each case:

* WALWriter
* BgWriter
* WALReceiver
* Startup process

I'll need to take a look at statistics, autovacuum and Logger
processes too, to see if they present more subtle opportunities for
reduced idle power consumption.

Do constants like PGARCH_AUTOWAKE_INTERVAL need to always be set at
their current, conservative levels? Perhaps these sorts of values
could be collectively controlled with a single GUC that represents a
trade-off between CPU cycles used when idle against
safety/reliability. On the other hand, there are GUCs that control
that per process in some cases already, such as wal_writer_delay, and
that suggestion could well be a bit woolly. It might be an enum value
that represented various levels of concern that would default to
something like 'conservative' (i.e. the current values).

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Re: Process wakeups when idle and power consumption

From
Tom Lane
Date:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Attached is the first in what I hope will become a series of patches
> for reducing power consumption when idle.

Cool.  This has been on my personal to-do list for awhile, but it keeps
on failing to get to the top, so I'm glad to see somebody else putting
time into it.

The major problem I'm aware of for getting rid of periodic wakeups is
the need for child processes to notice when the postmaster has died
unexpectedly.  Your patch appears to degrade the archiver's response
time for that really significantly, like from O(1 sec) to O(1 min),
which I don't think is acceptable.  We've occasionally kicked around
ideas for mechanisms that would solve this problem, but nothing's gotten
done.  It doesn't seem to be an easy problem to solve portably...

> +         * The caveat about signals invalidating the timeout of 
> +         * WaitLatch() on some platforms can be safely disregarded, 

Really?
        regards, tom lane


Re: Process wakeups when idle and power consumption

From
Alvaro Herrera
Date:
Excerpts from Peter Geoghegan's message of jue may 05 16:49:25 -0300 2011:

> I'll need to take a look at statistics, autovacuum and Logger
> processes too, to see if they present more subtle opportunities for
> reduced idle power consumption.

More subtle?  Autovacuum wakes up once per second and it could sleep a
lot longer if it weren't for the loop that checks for signals.  I think
that could be improved a lot.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Process wakeups when idle and power consumption

From
"A.M."
Date:
On May 5, 2011, at 4:08 PM, Alvaro Herrera wrote:

> Excerpts from Peter Geoghegan's message of jue may 05 16:49:25 -0300 2011:
>
>> I'll need to take a look at statistics, autovacuum and Logger
>> processes too, to see if they present more subtle opportunities for
>> reduced idle power consumption.
>
> More subtle?  Autovacuum wakes up once per second and it could sleep a
> lot longer if it weren't for the loop that checks for signals.  I think
> that could be improved a lot.

Could kqueue be of use here? Non-kqueue-supporting platforms could always fall back to the existing select().

Cheers,
M

Re: Process wakeups when idle and power consumption

From
Robert Haas
Date:
On Thu, May 5, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +              * The caveat about signals invalidating the timeout of
>> +              * WaitLatch() on some platforms can be safely disregarded,
>
> Really?

I'm a bit confused by the phrasing of this comment as well, but it
does seem to me that if all the relevant signal handlers set the
latch, then it ought not to be necessary to break the sleep down into
one-second intervals.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Process wakeups when idle and power consumption

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 5, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> +              * The caveat about signals invalidating the timeout of
>>> +              * WaitLatch() on some platforms can be safely disregarded,

>> Really?

> I'm a bit confused by the phrasing of this comment as well, but it
> does seem to me that if all the relevant signal handlers set the
> latch, then it ought not to be necessary to break the sleep down into
> one-second intervals.

[ reads code some more ... ]  Yeah, I think you are probably right,
which makes it just a badly phrased comment.  The important point here
is that the self-pipe trick in unix_latch.c fixes the problem, so long
as we are relying on latch release and NOT timeout-driven wakeup.
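
As a rough illustration of the self-pipe trick mentioned here (a sketch
only, not the code in unix_latch.c; selfpipe, init_selfpipe() and
wait_for_wakeup() are hypothetical names): the signal handler writes a byte
to a pipe that the waiter also passes to select(), so a signal that arrives
just before select() is entered still leaves the pipe readable and the wait
returns at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int selfpipe[2] = {-1, -1};  /* [0] = read end, [1] = write end */

static void
wake_handler(int signo)
{
    int saved_errno = errno;

    (void) signo;
    if (write(selfpipe[1], "", 1) < 0)
    {
        /* Pipe full: a wakeup is already pending, nothing more to do. */
    }
    errno = saved_errno;
}

/* Create the non-blocking pipe and install the handler; 0 on success. */
int
init_selfpipe(void)
{
    struct sigaction sa;
    int i;

    if (pipe(selfpipe) < 0)
        return -1;
    for (i = 0; i < 2; i++)
    {
        int flags = fcntl(selfpipe[i], F_GETFL);

        if (flags < 0 || fcntl(selfpipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Wait until the handler has run or timeout_ms elapses (negative = forever).
 * Returns 1 if woken by the handler, 0 on timeout (or on an error, which
 * this sketch does not distinguish).
 */
int
wait_for_wakeup(long timeout_ms)
{
    assert(selfpipe[0] >= 0);
    for (;;)
    {
        fd_set rfds;
        struct timeval tv, *tvp = NULL;
        char buf[16];
        int rc;

        FD_ZERO(&rfds);
        FD_SET(selfpipe[0], &rfds);
        if (timeout_ms >= 0)
        {
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            tvp = &tv;
        }
        rc = select(selfpipe[0] + 1, &rfds, NULL, NULL, tvp);
        if (rc < 0 && errno == EINTR)
            continue;           /* retry; note this restarts the timeout */
        if (rc <= 0)
            return 0;
        while (read(selfpipe[0], buf, sizeof(buf)) > 0)
            ;                   /* drain all pending wakeup bytes */
        return 1;
    }
}
```

The EINTR retry restarts the full timeout, which is the caveat under
discussion; it is harmless only because every handled signal also makes the
pipe readable.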

What that really means is that any WaitOnLatch call with a finite
timeout ought to be viewed with a jaundiced eye.  Ideally, we want them
all to be waiting for latch release and nothing else.  I'm concerned
that we're going to be moving towards some intermediate state where we
have WaitOnLatch calls with very long timeouts, because the longer the
timeout, the worse the problem gets on platforms that have the problem.
If you have say a 1-minute timeout, it's not difficult to believe that
you'll basically never wake up because of random signals resetting the
timeout.
        regards, tom lane


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 5 May 2011 22:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, May 5, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> +              * The caveat about signals invalidating the timeout of
>>>> +              * WaitLatch() on some platforms can be safely disregarded,
>
>>> Really?
>
>> I'm a bit confused by the phrasing of this comment as well, but it
>> does seem to me that if all the relevant signal handlers set the
>> latch, then it ought not to be necessary to break the sleep down into
>> one-second intervals.
>
> [ reads code some more ... ]  Yeah, I think you are probably right,
> which makes it just a badly phrased comment.  The important point here
> is that the self-pipe trick in unix_latch.c fixes the problem, so long
> as we are relying on latch release and NOT timeout-driven wakeup.

Why do you think that my comment is badly phrased?

> What that really means is that any WaitOnLatch call with a finite
> timeout ought to be viewed with a jaundiced eye.  Ideally, we want them
> all to be waiting for latch release and nothing else.  I'm concerned
> that we're going to be moving towards some intermediate state where we
> have WaitOnLatch calls with very long timeouts, because the longer the
> timeout, the worse the problem gets on platforms that have the problem.
> If you have say a 1-minute timeout, it's not difficult to believe that
> you'll basically never wake up because of random signals resetting the
> timeout.

Unless all signal handlers for signals that we expect call SetLatch()
anyway, as in this case.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Tom Lane
Date:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 5 May 2011 22:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What that really means is that any WaitOnLatch call with a finite
>> timeout ought to be viewed with a jaundiced eye.  Ideally, we want them
>> all to be waiting for latch release and nothing else.  I'm concerned
>> that we're going to be moving towards some intermediate state where we
>> have WaitOnLatch calls with very long timeouts, because the longer the
>> timeout, the worse the problem gets on platforms that have the problem.
>> If you have say a 1-minute timeout, it's not difficult to believe that
>> you'll basically never wake up because of random signals resetting the
>> timeout.

> Unless all signal handlers for signals that we expect call SetLatch()
> anyway, as in this case.

It's signals that we don't expect that I'm a bit worried about here.

In any case, the bottom line is that having a timeout on WaitOnLatch
is a kludge, and we should try to avoid it.
        regards, tom lane


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The major problem I'm aware of for getting rid of periodic wakeups is
> the need for child processes to notice when the postmaster has died
> unexpectedly.  Your patch appears to degrade the archiver's response
> time for that really significantly, like from O(1 sec) to O(1 min),
> which I don't think is acceptable.  We've occasionally kicked around
> ideas for mechanisms that would solve this problem, but nothing's gotten
> done.  It doesn't seem to be an easy problem to solve portably...

Could you please expand upon this? Why is it of any consequence if the
archiver notices that the postmaster is dead after 60 seconds rather
than after 1? So control in the archiver is going to stay in its event
loop for longer than it would have before, until pgarch_MainLoop()
finally returns. The DBA might be required to kill the archiver where
before they wouldn't have been (they wouldn't have had time to), but
they are also required to kill other backends anyway before deleting
postmaster.pid, or there will be dire consequences. Nothing important
happens after waiting on the latch but before checking
PostmasterIsAlive(), and nothing important happens after the
postmaster is found to be dead. ISTM that it wouldn't be particularly
bad if the archiver was SIGKILL'd while waiting on a latch.

The only salient thread I found concerning the problem of making
children know when the postmaster died is this one:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php

Fujii Masao suggests removing wal_sender_delay in that thread, and
replacing it with a generic default. That does work well with my
suggestion to unify these sorts of timeouts under a single GUC.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Tom Lane
Date:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The major problem I'm aware of for getting rid of periodic wakeups is
>> the need for child processes to notice when the postmaster has died
>> unexpectedly.

> Could you please expand upon this? Why is it of any consequence if the
> archiver notices that the postmaster is dead after 60 seconds rather
> than after 1?

Because somebody might try to start a new postmaster before that, and
it's not really a good idea to have a rogue archiver running in addition
to the new one.  You might be able to construct an argument about how
that was safe, but it would be a fragile one.  What's more, it would not
apply to any other child process, and we need a solution that scales to
all the children or we're going nowhere in terms of saving power.

In the case of the children that are connected to shared memory, such as
bgwriter, a long delay until child exit means a long delay until a new
postmaster can start at all --- which means you're effectively creating
a denial of service, with the length directly proportional to how
aggressively you're trying to avoid "unnecessary" wakeups.

So that's not a tradeoff I want to be making.  I'd rather have a
solution in which children somehow get notified of postmaster death
without having to wake up just to poll for it.  Then, once we fix the
other issues, there are no timeouts needed at all, which is obviously
the ideal situation for power consumption as well as response time.

> The only salient thread I found concerning the problem of making
> children know when the postmaster died is this one:
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php

You didn't look terribly hard then.  Here are two recent threads:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg01011.php
http://archives.postgresql.org/pgsql-hackers/2011-02/msg02142.php

The pipe solution mentioned in the first one would work on all Unixen,
and we could possibly optimize things a bit on Linux using the second
method.  (There was also a bit of speculation about relying on SEM_UNDO,
but I don't think we followed that idea far.)  I don't know however what
we'd need on Windows.
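
To make the pipe idea concrete, here is a minimal sketch under these
assumptions: the parent creates a pipe before forking and keeps the write
end open for its whole life, each child closes its inherited copy of the
write end and keeps only the read end, and nobody ever writes to the pipe.
The function name wait_for_parent_death() is hypothetical. When the parent
exits for any reason, the kernel closes the last write end and the read end
reports end-of-file, so the child can block on it rather than poll.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block for up to timeout_ms (negative = forever) on the read end of the
 * "death pipe".  Returns 1 if every copy of the write end has been closed
 * (the parent is gone), 0 if the timeout expired first.
 */
int
wait_for_parent_death(int death_fd, int timeout_ms)
{
    struct pollfd pfd;
    char c;
    int rc;

    assert(death_fd >= 0);
    pfd.fd = death_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return 0;               /* timeout, or an error we do not report */

    /*
     * Nothing is ever written to this pipe, so once poll() reports an
     * event, read() returning 0 (EOF) means the write end is closed.
     */
    return read(death_fd, &c, 1) == 0;
}
```

The scheme only works if no child accidentally keeps the write end open;
one surviving copy anywhere would hide the parent's death from everyone.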
        regards, tom lane


Re: Process wakeups when idle and power consumption

From
Robert Haas
Date:
On Fri, May 6, 2011 at 8:16 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Could you please expand upon this? Why is it of any consequence if the
> archiver notices that the postmaster is dead after 60 seconds rather
> than after 1? So control in the archiver is going to stay in its event
> loop for longer than it would have before, until pgarch_MainLoop()
> finally returns. The DBA might be required to kill the archiver where
> before they wouldn't have been (they wouldn't have had time to), but
> they are also required to kill other backends anyway before deleting
> postmaster.pid, or there will be dire consequences. Nothing important
> happens after waiting on the latch but before checking
> PostmasterIsAlive(), and nothing important happens after the
> postmaster is found to be dead. ISTM that it wouldn't be particularly
> bad if the archiver was SIGKILL'd while waiting on a latch.

Well, IMHO, the desirable state of affairs is for all child processes,
including regular backends, to exit near-instantaneously once the
postmaster dies.  Among many other problems, once the postmaster is
gone, there's no guard against shared memory corruption.  And as long
as there is at least one backend kicking around attached to shared
memory, you won't be able to restart postmaster, which is something
you typically want to do as quickly as humanly possible.

http://www.postgresql.org/support/submitbug

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Process wakeups when idle and power consumption

From
Robert Haas
Date:
On Fri, May 6, 2011 at 10:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 6, 2011 at 8:16 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Could you please expand upon this? Why is it of any consequence if the
>> archiver notices that the postmaster is dead after 60 seconds rather
>> than after 1? So control in the archiver is going to stay in its event
>> loop for longer than it would have before, until pgarch_MainLoop()
>> finally returns. The DBA might be required to kill the archiver where
>> before they wouldn't have been (they wouldn't have had time to), but
>> they are also required to kill other backends anyway before deleting
>> postmaster.pid, or there will be dire consequences. Nothing important
>> happens after waiting on the latch but before checking
>> PostmasterIsAlive(), and nothing important happens after the
>> postmaster is found to be dead. ISTM that it wouldn't be particularly
>> bad if the archiver was SIGKILL'd while waiting on a latch.
>
> Well, IMHO, the desirable state of affairs is for all child processes,
> including regular backends, to exit near-instantaneously once the
> postmaster dies.  Among many other problems, once the postmaster is
> gone, there's no guard against shared memory corruption.  And as long
> as there is at least one backend kicking around attached to shared
> memory, you won't be able to restart postmaster, which is something
> you typically want to do as quickly as humanly possible.
>
> http://www.postgresql.org/support/submitbug

The apparently irrelevant link at the bottom of this email is the
result of a cut-and-paste into the wrong email window.  Sorry....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 6 May 2011 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The major problem I'm aware of for getting rid of periodic wakeups is
>>> the need for child processes to notice when the postmaster has died
>>> unexpectedly.
>
>> Could you please expand upon this? Why is it of any consequence if the
>> archiver notices that the postmaster is dead after 60 seconds rather
>> than after 1?
>
> Because somebody might try to start a new postmaster before that, and
> it's not really a good idea to have a rogue archiver running in addition
> to the new one.  You might be able to construct an argument about how
> that was safe, but it would be a fragile one.  What's more, it would not
> apply to any other child process, and we need a solution that scales to
> all the children or we're going nowhere in terms of saving power.
>
> In the case of the children that are connected to shared memory, such as
> bgwriter, a long delay until child exit means a long delay until a new
> postmaster can start at all --- which means you're effectively creating
> a denial of service, with the length directly proportional to how
> aggressively you're trying to avoid "unnecessary" wakeups.

Perhaps I'm missing the point here, but I don't think that I have to
make an argument for why it might be acceptable to have two archivers
running at once, or two of any other auxiliary process. Let's assume
that it's completely unacceptable. It may still be worthwhile
applying this patch essentially as-is.

It's also clearly completely unacceptable to have orphaned regular
backends running at the same time as another, freshly started set of
backends with their own shared buffers that aren't in contact with the
orphans, but have the same data directory. That's still possible today
though. This is the main reason that we caution people against kill
-9'ing the postmaster - if they do so, but then delete postmaster.pid
before starting a new postmaster, that causes data corruption.

This happens under the same circumstances as any conceivable problem
(or at least any problem that I can immediately think of) with
auxiliary processes co-existing as children of different postmasters
(or ex-postmasters) would. I don't think that we've lost anything by
allowing two completely unacceptable things to happen under those
circumstances rather than just one. The precedent for having
completely unacceptable things happen, like data loss, under those
circumstances exists already. You could argue that that is a bad state
of affairs that we should fix, and I'd be inclined to agree, but it
seems like a separate issue.

> So that's not a tradeoff I want to be making.  I'd rather have a
> solution in which children somehow get notified of postmaster death
> without having to wake up just to poll for it.  Then, once we fix the
> other issues, there are no timeouts needed at all, which is obviously
> the ideal situation for power consumption as well as response time.
>
>> The only salient thread I found concerning the problem of making
>> children know when the postmaster died is this one:
>> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php
>
> You didn't look terribly hard then.  Here are two recent threads:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01011.php
> http://archives.postgresql.org/pgsql-hackers/2011-02/msg02142.php
>
> The pipe solution mentioned in the first one would work on all Unixen,
> and we could possibly optimize things a bit on Linux using the second
> method.  (There was also a bit of speculation about relying on SEM_UNDO,
> but I don't think we followed that idea far.)  I don't know however what
> we'd need on Windows.

I've taken a look at Florian Pflug's work in the first thread. The
most promising lead I have on a method for monitoring if the
Postmaster has died on windows is PsSetCreateProcessNotifyRoutine(),
which necessitates registering a kernel mode driver and dynamically
loading it. That sounds very kludgey indeed. Here is a sample program
that demonstrates that sort of usage:

http://www.codeproject.com/KB/threads/procmon.aspx

Alternatively, we could do something with PSAPI. It apparently doesn't
allow you to define hooks of any kind for when a process ends. We
could, I suppose, have a heartbeat process that monitors running
backends on windows using much the same "nap and check" pattern, that
wakes up child processes to finish their little bit of remaining work
and exit() on finding the Postmaster dead. That has the same
"fundamental race condition" that Tom described in the first of the
above threads though.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Tom Lane
Date:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Perhaps I'm missing the point here, but I don't think that I have to
> make an argument for why it might be acceptable to have two archivers
> running at once, or two of any other auxiliary process. Let's assume
> that it's completely unacceptable. It may still be worthwhile
> applying this patch essentially as-is.

> It's also clearly completely unacceptable to have orphaned regular
> backends running at the same time as another, freshly started set of
> backends with their own shared buffers that aren't in contact with the
> orphans, but have the same data directory. That's still possible today
> though. This is the main reason that we caution people against kill
> -9'ing the postmaster - if they do so, but then delete postmaster.pid
> before starting a new postmaster, that causes data corruption.

Indeed, which is why we have the postmaster.pid interlock against doing
that.  What you describe is a DBA with a death wish who's going out of
his way to defeat the safety interlock.  We can't do much about that
level of idiocy.  However, it's quite irrelevant to the current
discussion.

The aspect of this that *is* relevant is that if you haven't
deliberately defeated the interlock (and thereby put your data at risk),
you won't be able to start a new postmaster until all the old
shmem-attached children are gone.  And that's why having a child with a
very long reaction time for parent death represents a denial of service.
        regards, tom lane


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 7 May 2011 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The aspect of this that *is* relevant is that if you haven't
> deliberately defeated the interlock (and thereby put your data at risk),
> you won't be able to start a new postmaster until all the old
> shmem-attached children are gone.  And that's why having a child with a
> very long reaction time for parent death represents a denial of service.

Alright. I don't suppose it would be acceptable to have the startup
process signal any auxiliary process that it might find with init as a
parent through ps, and within the handler for that signal in each
auxiliary (I suppose it's a SIGUSR2), take appropriate action,
typically just waking up through a SetLatch() call once we
independently verify that we are in fact orphaned?

If we find orphans, we could perform a "nap and check" loop within the
startup process (probably tighter than 1 second per iteration), until
the shmem-attached children that are liable to block us from starting
a new postmaster exit().

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
I've taken a look into it, and I'm not optimistic about the likelihood
of the way I've suggested that we can register a callback on process
termination on windows being acceptable. It seems to be a kludge too
far. It does work on Vista, just not very well. There is a
considerable delay on closing the above console application that uses
this technique, for example, and there seems to be an unpredictable
delay in the callback occurring.

A simpler solution on Windows might be to make the timeout on
auxiliary processes much smaller, but have it increase on each
subsequent timeout (starting from scratch if we wake up for any reason
other than timeout) until eventually it maxes out at something like
the current value for PGARCH_AUTOWAKE_INTERVAL. If backends are
sleeping for increasing periods of time, the chance of the postmaster
crashing goes down, so denial of service is much less of a concern.
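
The growing timeout described above could be computed along these lines.
This is only a sketch: next_timeout_ms() and the one-second starting value
are assumptions made for illustration, and the 60-second cap simply mirrors
the PGARCH_AUTOWAKE_INTERVAL value quoted earlier in the thread.

```c
#include <assert.h>

#define MIN_TIMEOUT_MS 1000L    /* hypothetical starting timeout */
#define MAX_TIMEOUT_MS 60000L   /* 60 s, mirroring PGARCH_AUTOWAKE_INTERVAL */

/*
 * Given the timeout just used and whether the wait ended because that
 * timeout expired, return the timeout to use for the next wait.
 */
long
next_timeout_ms(long current_ms, int woke_on_timeout)
{
    assert(current_ms >= MIN_TIMEOUT_MS && current_ms <= MAX_TIMEOUT_MS);

    if (!woke_on_timeout)
        return MIN_TIMEOUT_MS;  /* real activity: start from scratch */
    if (current_ms > MAX_TIMEOUT_MS / 2)
        return MAX_TIMEOUT_MS;  /* doubling would overshoot the cap */
    return current_ms * 2;      /* idle again: back off */
}
```

Starting from 1 second, an idle process would wait 1, 2, 4, 8, 16 and 32
seconds and then settle at 60, so the worst-case delay in noticing a dead
postmaster is still bounded by the cap.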

An alternative might be to just not do this on Windows. Certainly,
idle wakeups are likely to be less important on that platform, which
is not a very popular choice for virtual machines deployed on cloudy
infrastructure, the use case that will benefit from these enhancements
the most, by some margin.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Heikki Linnakangas
Date:
On 09.05.2011 12:20, Peter Geoghegan wrote:
> I've taken a look into it, and I'm not optimistic about the likelihood
> of the way I've suggested that we can register a callback on process
> termination on windows being acceptable. It seems to be a kludge too
> far. It does work on Vista, just not very well. There is a
> considerable delay on closing the above console application that uses
> this technique, for example, and there seems to be an unpredictable
> delay in the callback occurring.

Can't we use the pipe trick on Windows? The API is different, but we use 
pipes on Windows for other things already. When a process is launched, 
open a pipe between postmaster and the child process. In the child, 
spawn a thread that just calls ReadFile() on the pipe, which blocks. If 
postmaster dies, the ReadFile() call will return with an error.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 9 May 2011 11:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> Can't we use the pipe trick on Windows? The API is different, but we use
> pipes on Windows for other things already. When a process is launched, open
> a pipe between postmaster and the child process. In the child, spawn a
> thread that just calls ReadFile() on the pipe, which blocks. If postmaster
> dies, the ReadFile() call will return with an error.

Alright. I'm currently working on a proof-of-concept implementation of
that. In the meantime, any thoughts on how this should meld with the
existing latch implementation?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Fujii Masao
Date:
On Mon, May 9, 2011 at 8:27 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 9 May 2011 11:19, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> Can't we use the pipe trick on Windows? The API is different, but we use
>> pipes on Windows for other things already. When a process is launched, open
>> a pipe between postmaster and the child process. In the child, spawn a
>> thread that just calls ReadFile() on the pipe, which blocks. If postmaster
>> dies, the ReadFile() call will return with an error.
>
> Alright. I'm currently working on a proof-of-concept implementation of
> that. In the meantime, any thoughts on how this should meld with the
> existing latch implementation?

How about making WaitLatch monitor the file descriptor for the pipe
by using select()?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 10 May 2011 02:58, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Alright. I'm currently working on a proof-of-concept implementation of
>> that. In the meantime, any thoughts on how this should meld with the
>> existing latch implementation?
>
> How about making WaitLatch monitor the file descriptor for the pipe
> by using select()?

Alright, so it's reasonable to assume that all clients of the latch
code are happy to be invariably woken up on Postmaster death?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Heikki Linnakangas
Date:
On 10.05.2011 11:22, Peter Geoghegan wrote:
> On 10 May 2011 02:58, Fujii Masao<masao.fujii@gmail.com>  wrote:
>>> Alright. I'm currently working on a proof-of-concept implementation of
>>> that. In the meantime, any thoughts on how this should meld with the
>>> existing latch implementation?
>>
>> How about making WaitLatch monitor the file descriptor for the pipe
>> by using select()?
>
> Alright, so it's reasonable to assume that all clients of the latch
> code are happy to be invariably woken up on Postmaster death?

That doesn't sound like a safe assumption. All the helper processes 
should die quickly on postmaster death, but I'm not sure if that holds 
for all inter-process communication. I think the caller needs to specify 
if he wants that or not.


Once you add that to the WaitLatchOrSocket function, it's quite clear 
that the API is getting out of hand. There are five different events that 
can wake it up:

* latch is set
* a socket becomes readable
* a socket becomes writeable
* timeout
* postmaster dies

I think we need to refactor the function into something like:

#define WL_LATCH_SET    1
#define WL_SOCKET_READABLE 2
#define WL_SOCKET_WRITEABLE 4
#define WL_TIMEOUT    8
#define WL_POSTMASTER_DEATH 16

int WaitLatch(Latch *latch, int events, long timeout);

Where 'events' is a bitmask of events that should cause a wakeup, and 
the return value is a bitmask identifying which event(s) caused the call 
to return.
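
To make the bitmask-in, bitmask-out idea concrete, here is a minimal sketch built on poll(). The function names wait_for_events() and wait_for_events_demo() are hypothetical and are not part of the latch code; only the socket and timeout bits are implemented, and the latch and postmaster-death bits are left out entirely.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Event bits as proposed in the message above. */
#define WL_LATCH_SET         1
#define WL_SOCKET_READABLE   2
#define WL_SOCKET_WRITEABLE  4
#define WL_TIMEOUT           8
#define WL_POSTMASTER_DEATH 16

/*
 * Hypothetical helper, not PostgreSQL's WaitLatch().  Waits on a single
 * descriptor for the events requested in wake_events and returns a bitmask
 * of the events that fired, or -1 on error (including EINTR).
 * Without WL_TIMEOUT the call blocks until a socket event occurs.
 */
int
wait_for_events(int sock, int wake_events, long timeout_ms)
{
    struct pollfd pfd;
    int         result = 0;
    int         hup;
    int         rc;

    /* This sketch only understands the socket and timeout bits. */
    assert((wake_events &
            ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE | WL_TIMEOUT)) == 0);

    pfd.fd = sock;
    pfd.events = 0;
    pfd.revents = 0;
    if (wake_events & WL_SOCKET_READABLE)
        pfd.events |= POLLIN;
    if (wake_events & WL_SOCKET_WRITEABLE)
        pfd.events |= POLLOUT;
    if (pfd.events == 0)
        pfd.fd = -1;            /* poll() ignores negative descriptors */

    rc = poll(&pfd, 1, (wake_events & WL_TIMEOUT) ? (int) timeout_ms : -1);
    if (rc < 0 || (rc > 0 && (pfd.revents & POLLNVAL)))
        return -1;
    if (rc == 0)
        return WL_TIMEOUT;

    /* A hangup or error wakes whichever directions the caller asked for. */
    hup = pfd.revents & (POLLHUP | POLLERR);
    if ((wake_events & WL_SOCKET_READABLE) && ((pfd.revents & POLLIN) || hup))
        result |= WL_SOCKET_READABLE;
    if ((wake_events & WL_SOCKET_WRITEABLE) && ((pfd.revents & POLLOUT) || hup))
        result |= WL_SOCKET_WRITEABLE;
    return result;
}

/* Usage: a pipe stands in for a real connection.  Returns 0 on success. */
int
wait_for_events_demo(void)
{
    int         fds[2];
    int         ok;

    if (pipe(fds) != 0)
        return -1;

    /* Nothing has been written yet, so only the timeout can fire. */
    ok = (wait_for_events(fds[0], WL_SOCKET_READABLE | WL_TIMEOUT, 20)
          == WL_TIMEOUT);
    /* An empty pipe can accept data, so its write end is writeable. */
    ok = ok && (wait_for_events(fds[1], WL_SOCKET_WRITEABLE | WL_TIMEOUT, 1000)
                == WL_SOCKET_WRITEABLE);
    /* After one byte is written, the read end becomes readable. */
    ok = ok && (write(fds[1], "x", 1) == 1);
    ok = ok && (wait_for_events(fds[0], WL_SOCKET_READABLE | WL_TIMEOUT, 1000)
                == WL_SOCKET_READABLE);

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

The caller tests the returned mask bit by bit, so more than one event can be reported by a single call.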

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 10 May 2011 09:45, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> I think we need to refactor the function into something like:
>
> #define WL_LATCH_SET    1
> #define WL_SOCKET_READABLE 2
> #define WL_SOCKET_WRITEABLE 4
> #define WL_TIMEOUT      8
> #define WL_POSTMASTER_DEATH 16

While I agree with the need to not box ourselves into a corner on the
latch interface by making sweeping assumptions, isn't the fact that a
socket became readable or writable strictly an implementation detail?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
Attached is a win32 implementation of the "named pipe trick".

It consists of a Visual Studio 2008 solution that contains two
projects, named_pipe_trick (which represents the postmaster) and
auxiliary_backend (which represents each auxiliary process). I split
the solution into two projects/programs because Windows lacks fork()
to make it all happen with a single program.

Thoughts? Once I have some buy-in, I'd like to write a patch for the
latch code that incorporates monitoring the postmaster using the named
pipe trick (for both unix_latch.c and win32_latch.c), plus Heikki's
suggestions.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

Re: Process wakeups when idle and power consumption

From
Robert Haas
Date:
On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 10 May 2011 09:45, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> I think we need to refactor the function into something like:
>>
>> #define WL_LATCH_SET    1
>> #define WL_SOCKET_READABLE 2
>> #define WL_SOCKET_WRITEABLE 4
>> #define WL_TIMEOUT      8
>> #define WL_POSTMASTER_DEATH 16
>
> While I agree with the need to not box ourselves into a corner on the
> latch interface by making sweeping assumptions, isn't the fact that a
> socket became readable or writable strictly an implementation detail?

The thing about the socket being readable/writeable is needed for
walsender.  It needs to notice when its connection to walreceiver is
writeable (so it can send more WAL) or readable (so it can receive a
reply message).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Process wakeups when idle and power consumption

From
Simon Riggs
Date:
On Tue, May 10, 2011 at 12:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> On 10 May 2011 09:45, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>>> I think we need to refactor the function into something like:
>>>
>>> #define WL_LATCH_SET    1
>>> #define WL_SOCKET_READABLE 2
>>> #define WL_SOCKET_WRITEABLE 4
>>> #define WL_TIMEOUT      8
>>> #define WL_POSTMASTER_DEATH 16
>>
>> While I agree with the need to not box ourselves into a corner on the
>> latch interface by making sweeping assumptions, isn't the fact that a
>> socket became readable or writable strictly an implementation detail?
>
> The thing about the socket being readable/writeable is needed for
> walsender.  It needs to notice when its connection to walreceiver is
> writeable (so it can send more WAL) or readable (so it can receive a
> reply message).

I've got a feeling that things will go easier if we have a separate
connection for the feedback channel.

Yes, two connections, one in either direction.

That would make everything simple, nice one-way connections. It would
also mean we could stream at higher data rates.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Process wakeups when idle and power consumption

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I've got a feeling that things will go easier if we have a separate
> connection for the feedback channel.

> Yes, two connections, one in either direction.

> That would make everything simple, nice one way connections. It would
> also mean we could stream at higher data rates.

The above sounds like complete nonsense.  TCP connections are already
full-duplex.
        regards, tom lane


Re: Process wakeups when idle and power consumption

From
Heikki Linnakangas
Date:
On 10.05.2011 14:39, Peter Geoghegan wrote:
> Attached is a win32 implementation of the "named pipe trick".
>
> It consists of a Visual Studio 2008 solution that contains two
> projects, named_pipe_trick (which represents the postmaster) and
> auxiliary_backend (which represents each auxiliary process). I split
> the solution into two projects/programs because Windows lacks fork()
> to make it all happen with a single program.
>
> Thoughts? Once I have some buy-in, I'd like to write a patch for the
> latch code that incorporates monitoring the postmaster using the named
> pipe trick (for both unix_latch.c and win32_latch.c), plus Heikki's
> suggestions.

It should be an anonymous pipe that's inherited by the child process 
rather than a named pipe. Otherwise seems fine to me, as far as this 
proof of concept program goes.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 10 May 2011 17:43, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> It should be an anonymous pipe that's inherited by the child process
> rather than a named pipe. Otherwise seems fine to me, as far as this proof
> of concept program goes.

Alright, thanks. I'll use an anonymous pipe in the patch itself.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 9 May 2011 11:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> In the child, spawn a thread

How exactly should I go about this? The one place in the code that I
know of that uses multiple threads, pgbench, falls back on "emulation
with fork()" on some platforms.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Magnus Hagander
Date:
On Wed, May 11, 2011 at 10:52, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 9 May 2011 11:19, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> In the child, spawn a thread
>
> How exactly should I go about this? The one place in the code that I
> knew to use multiple threads, pgbench, falls back on "emulation with
> fork()" on some platforms.

If you're doing this Win32-specific, take a look at
src/backend/port/win32/signal.c for an example.

If you're not doing this win32-specific, I doubt we really want
threads to be involved...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Process wakeups when idle and power consumption

From
Peter Geoghegan
Date:
On 11 May 2011 09:54, Magnus Hagander <magnus@hagander.net> wrote:

> If you're doing this Win32 specific, take a look at
> src/backend/port/win32/signal.c for an example.
>
> If you're not doing this win32-specific, I doubt we really want
> threads to be involved...

Well, that seems to be the traditional wisdom. It seems sensible to me
that each process should look out for postmaster death itself though.
Tom described potential race conditions in looking at ps output...do
we really want to double the number of auxiliary processes in a single
release of Postgres?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Process wakeups when idle and power consumption

From
Heikki Linnakangas
Date:
On 11.05.2011 13:34, Peter Geoghegan wrote:
> On 11 May 2011 09:54, Magnus Hagander<magnus@hagander.net>  wrote:
>
>> If you're doing this Win32 specific, take a look at
>> src/backend/port/win32/signal.c for an example.
>>
>> If you're not doing this win32-specific, I doubt we really want
>> threads to be involved...
>
> Well, that seems to be the traditional wisdom. It seems sensible to me
> that each process should look out for postmaster death itself though.
> Tom described potential race conditions in looking at ps output...do
> we really want to double the number of auxiliary processes in a single
> release of Postgres?

Uh, no, you don't want any new processes on Unix. You want each process 
to check for postmaster death every once in a while, like they do today. 
The pipe-trick is to make sure the processes wake up promptly to notice 
the death when the postmaster dies. You just need to add the 
postmaster-pipe to the select() calls we already do.
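
As a rough illustration of the Unix side, the sketch below waits on the read end of a pipe with select() and treats end-of-file as postmaster death. The helper name wait_for_postmaster_death() is hypothetical and does not come from any patch in this thread. It assumes the postmaster created the pipe before forking and is the only process still holding the write end open.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper.  death_fd is the read end of a pipe whose write end
 * is held open only by the postmaster.  Returns 1 if the write end has been
 * closed (the postmaster is gone), 0 if the timeout expired first, and -1
 * on error.
 */
int
wait_for_postmaster_death(int death_fd, long timeout_ms)
{
    fd_set      rfds;
    struct timeval tv;
    char        c;
    ssize_t     n;
    int         rc;

    assert(death_fd >= 0 && death_fd < FD_SETSIZE);

    /*
     * Retry if a signal interrupts the wait.  For simplicity this restarts
     * the full timeout; select() may modify both the set and the timeout,
     * so they are rebuilt on every attempt.
     */
    for (;;)
    {
        FD_ZERO(&rfds);
        FD_SET(death_fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        rc = select(death_fd + 1, &rfds, NULL, NULL, &tv);
        if (rc >= 0 || errno != EINTR)
            break;
    }

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* timed out, postmaster still alive */

    /* Readable with nothing to read means EOF: every writer has exited. */
    n = read(death_fd, &c, 1);
    if (n < 0)
        return -1;
    return (n == 0) ? 1 : 0;
}
```

Each child would close its inherited copy of the write end right after fork(), since any surviving copy keeps the pipe open and hides the EOF. In the server itself the descriptor would presumably be added to the fd sets of the select() calls that already exist rather than waited on in a separate call.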

I'm not sure if on Windows you can similarly just add the 
postmaster-pipe to the WaitForMultipleObjects() calls we already do. 
If you can, you won't need new threads on Windows either.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com