Re: Process wakeups when idle and power consumption - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Process wakeups when idle and power consumption
Msg-id BANLkTi=5ru82EAo7O0N+52aBOe0Sjw-QJg@mail.gmail.com
In response to Re: Process wakeups when idle and power consumption  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Process wakeups when idle and power consumption  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 6 May 2011 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The major problem I'm aware of for getting rid of periodic wakeups is
>>> the need for child processes to notice when the postmaster has died
>>> unexpectedly.
>
>> Could you please expand upon this? Why is it of any consequence if the
>> archiver notices that the postmaster is dead after 60 seconds rather
>> than after 1?
>
> Because somebody might try to start a new postmaster before that, and
> it's not really a good idea to have a rogue archiver running in addition
> to the new one.  You might be able to construct an argument about how
> that was safe, but it would be a fragile one.  What's more, it would not
> apply to any other child process, and we need a solution that scales to
> all the children or we're going nowhere in terms of saving power.
>
> In the case of the children that are connected to shared memory, such as
> bgwriter, a long delay until child exit means a long delay until a new
> postmaster can start at all --- which means you're effectively creating
> a denial of service, with the length directly proportional to how
> aggressively you're trying to avoid "unnecessary" wakeups.

Perhaps I'm missing the point here, but I don't think that I have to
make an argument for why it might be acceptable to have two archivers
running at once, or two of any other auxiliary process. Let's assume
that it's completely unacceptable. It may still be worthwhile
applying this patch essentially as-is.

It's also clearly completely unacceptable to have orphaned regular
backends running at the same time as another, freshly started set of
backends with their own shared buffers that aren't in contact with the
orphans, but have the same data directory. That's still possible today
though. This is the main reason that we caution people against kill
-9'ing the postmaster - if they do so, but then delete postmaster.pid
before starting a new postmaster, that causes data corruption.

This happens under the same circumstances as any conceivable problem
(or at least any problem that I can immediately think of) with
auxiliary processes co-existing as children of different postmasters
(or ex-postmasters). I don't think that we've lost anything by
allowing two completely unacceptable things to happen under those
circumstances rather than just one. The precedent for having
completely unacceptable things happen, like data loss, under those
circumstances exists already. You could argue that that is a bad state
of affairs that we should fix, and I'd be inclined to agree, but it
seems like a separate issue.

> So that's not a tradeoff I want to be making.  I'd rather have a
> solution in which children somehow get notified of postmaster death
> without having to wake up just to poll for it.  Then, once we fix the
> other issues, there are no timeouts needed at all, which is obviously
> the ideal situation for power consumption as well as response time.
>
>> The only salient thread I found concerning the problem of making
>> children know when the postmaster died is this one:
>> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php
>
> You didn't look terribly hard then.  Here are two recent threads:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01011.php
> http://archives.postgresql.org/pgsql-hackers/2011-02/msg02142.php
>
> The pipe solution mentioned in the first one would work on all Unixen,
> and we could possibly optimize things a bit on Linux using the second
> method.  (There was also a bit of speculation about relying on SEM_UNDO,
> but I don't think we followed that idea far.)  I don't know however what
> we'd need on Windows.

I've taken a look at Florian Pflug's work in the first thread. The
most promising lead I have on a method for monitoring whether the
postmaster has died on Windows is PsSetCreateProcessNotifyRoutine(),
which necessitates registering a kernel-mode driver and dynamically
loading it. That sounds very kludgey indeed. Here is a sample program
that demonstrates that sort of usage:

http://www.codeproject.com/KB/threads/procmon.aspx

Alternatively, we could do something with PSAPI, though it apparently
doesn't allow you to define hooks of any kind for when a process ends.
We could, I suppose, have a heartbeat process that monitors running
backends on Windows using much the same "nap and check" pattern, and
that wakes up child processes to finish their little bit of remaining
work and exit() on finding the postmaster dead. That has the same
"fundamental race condition" that Tom described in the first of the
above threads though.
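
For comparison, here is a rough sketch of the pipe idea on Unix, as I
understand it from that thread: the postmaster keeps the write end of a
pipe open and never writes to it, and each child sleeps in select() on
the read end, which only becomes readable (at EOF) once every writer is
gone. This is Python rather than C purely for brevity, and it is not
taken from any patch; wait_for_parent_death(), demo() and the reporting
pipe are names and scaffolding I made up for illustration.

```python
import os
import select
import signal


def wait_for_parent_death(read_fd, timeout=None):
    """Sleep until every write end of the pipe is closed; no periodic wakeups.

    Returns True on EOF (all writers gone), False if the timeout expired.
    """
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return False
    return os.read(read_fd, 1) == b""


def demo():
    """Fork a stand-in postmaster plus one child, SIGKILL the former, and
    return what the child reports (hypothetical harness, not PostgreSQL code)."""
    report_r, report_w = os.pipe()          # child -> caller reporting channel
    pm = os.fork()
    if pm == 0:                             # stand-in postmaster
        os.close(report_r)
        life_r, life_w = os.pipe()          # the "life sign" pipe
        if os.fork() == 0:                  # stand-in auxiliary process
            os.close(life_w)                # child must not keep a write end
            os.write(report_w, b"R")        # tell the caller we are set up
            # The timeout is only a safety net for this demo.
            died = wait_for_parent_death(life_r, timeout=10)
            os.write(report_w, b"dead" if died else b"timeout")
            os._exit(0)
        os.close(report_w)
        os.close(life_r)
        while True:                         # holds life_w open until killed
            signal.pause()
    os.close(report_w)
    assert os.read(report_r, 1) == b"R"     # wait until the child is ready
    os.kill(pm, signal.SIGKILL)             # simulate an unexpected death
    os.waitpid(pm, 0)
    result = os.read(report_r, 16)
    os.close(report_r)
    return result
```

The point of the sketch is that the child needs no timeout at all to
notice the death; the kernel closes the dead process's descriptors and
wakes the sleeper. It says nothing about Windows, where as far as I can
tell this trick has no direct equivalent.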

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

