Re: Process wakeups when idle and power consumption - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Process wakeups when idle and power consumption |
Date | |
Msg-id | BANLkTi=5ru82EAo7O0N+52aBOe0Sjw-QJg@mail.gmail.com |
In response to | Re: Process wakeups when idle and power consumption (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Process wakeups when idle and power consumption (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 6 May 2011 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The major problem I'm aware of for getting rid of periodic wakeups is
>>> the need for child processes to notice when the postmaster has died
>>> unexpectedly.
>
>> Could you please expand upon this? Why is it of any consequence if the
>> archiver notices that the postmaster is dead after 60 seconds rather
>> than after 1?
>
> Because somebody might try to start a new postmaster before that, and
> it's not really a good idea to have a rogue archiver running in addition
> to the new one. You might be able to construct an argument about how
> that was safe, but it would be a fragile one. What's more, it would not
> apply to any other child process, and we need a solution that scales to
> all the children or we're going nowhere in terms of saving power.
>
> In the case of the children that are connected to shared memory, such as
> bgwriter, a long delay until child exit means a long delay until a new
> postmaster can start at all --- which means you're effectively creating
> a denial of service, with the length directly proportional to how
> aggressively you're trying to avoid "unnecessary" wakeups.

Perhaps I'm missing the point here, but I don't think that I have to
make an argument for why it might be acceptable to have two archivers
running at once, or two of any other auxiliary process. Let's assume
that it's completely unacceptable. It may still be worthwhile applying
this patch essentially as-is. It's also clearly completely unacceptable
to have orphaned regular backends running at the same time as another,
freshly started set of backends with their own shared buffers that
aren't in contact with the orphans, but have the same data directory.
That's still possible today though.
This is the main reason that we caution people against kill -9'ing the
postmaster - if they do so, but then delete postmaster.pid before
starting a new postmaster, that causes data corruption. These are the
same circumstances under which any conceivable problem (or at least any
problem that I can immediately think of) with auxiliary processes
co-existing as children of different postmasters (or Ex-Postmasters)
would arise. I don't think that we've lost anything by allowing two
completely unacceptable things to happen under those circumstances
rather than just one. The precedent for having completely unacceptable
things happen, like data loss, under those circumstances exists already.
You could argue that that is a bad state of affairs that we should fix,
and I'd be inclined to agree, but it seems like a separate issue.

> So that's not a tradeoff I want to be making. I'd rather have a
> solution in which children somehow get notified of postmaster death
> without having to wake up just to poll for it. Then, once we fix the
> other issues, there are no timeouts needed at all, which is obviously
> the ideal situation for power consumption as well as response time.
>
>> The only salient thread I found concerning the problem of making
>> children know when the postmaster died is this one:
>> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php
>
> You didn't look terribly hard then. Here are two recent threads:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01011.php
> http://archives.postgresql.org/pgsql-hackers/2011-02/msg02142.php
>
> The pipe solution mentioned in the first one would work on all Unixen,
> and we could possibly optimize things a bit on Linux using the second
> method. (There was also a bit of speculation about relying on SEM_UNDO,
> but I don't think we followed that idea far.) I don't know however what
> we'd need on Windows.

I've taken a look at Florian Pflug's work in the first thread.
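
For readers unfamiliar with the pipe solution quoted above, the general
Unix technique can be sketched roughly as follows. This is only an
illustration, not Florian Pflug's patch and not PostgreSQL code: it is
written in Python for brevity (PostgreSQL itself is C), it is Unix-only,
and every name in it (wait_for_postmaster_death, demo_postmaster_crash,
the "life sign" pipe) is hypothetical. The idea is that the parent keeps
the write end of a pipe open and never writes to it, so a child sleeping
in select() on the read end is woken by EOF only when the parent is gone.

```python
import os
import select


def wait_for_postmaster_death(read_fd, timeout=None):
    """Sleep until every copy of the pipe's write end has been closed.

    Returns True on EOF (the writer is gone), False if the timeout
    expired first. With timeout=None there are no periodic wakeups.
    """
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return False
    # Nothing is ever written to this pipe, so readable means EOF.
    return os.read(read_fd, 1) == b""


def demo_postmaster_crash():
    """Fork a stand-in postmaster and check that its child sees it die."""
    report_r, report_w = os.pipe()  # child -> caller, used only by the demo
    pid = os.fork()
    if pid == 0:
        try:
            os.close(report_r)
            # Stand-in postmaster: owns the write end of the life-sign pipe.
            life_r, life_w = os.pipe()
            if os.fork() == 0:
                # Stand-in auxiliary process: it must close its inherited
                # copy of the write end, or EOF would never be delivered.
                os.close(life_w)
                died = wait_for_postmaster_death(life_r)
                os.write(report_w, b"1" if died else b"0")
        finally:
            # The postmaster exits at once without any cleanup (a "crash");
            # the child exits here after it has reported.
            os._exit(0)
    os.close(report_w)
    os.waitpid(pid, 0)
    noticed = os.read(report_r, 1) == b"1"
    os.close(report_r)
    return noticed
```

Calling demo_postmaster_crash() should return True on a Unix system. The
kernel closes the dead parent's descriptors for it, which is why this
works even after kill -9, and the child needs no timeout at all.
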
The most promising lead I have on a method for monitoring whether the
Postmaster has died on Windows is PsSetCreateProcessNotifyRoutine(),
which necessitates registering a kernel mode driver and dynamically
loading it. That sounds very kludgey indeed. Here is a sample program
that demonstrates that sort of usage:

http://www.codeproject.com/KB/threads/procmon.aspx

Alternatively, we could do something with PSAPI, although it apparently
doesn't allow you to define hooks of any kind for when a process ends.
We could, I suppose, have a heartbeat process that monitors running
backends on Windows using much the same "nap and check" pattern, and
that wakes up child processes to finish their little bit of remaining
work and exit() on finding the Postmaster dead. That has the same
"fundamental race condition" that Tom described in the first of the
above threads though.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
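
As a rough illustration of the "nap and check" pattern and the tradeoff
discussed in this message, here is a minimal Python sketch. It is not
PostgreSQL code; nap_and_check, simulate_detection and their parameters
are hypothetical names, and the liveness check is a stand-in for
whatever test a real heartbeat process would perform. The simulation
uses a fake clock so that it runs instantly.

```python
import time


def nap_and_check(is_postmaster_alive, nap_seconds, sleep=time.sleep):
    """Sleep, wake, check; repeat until the postmaster is found dead.

    Returns the number of wakeups that were needed.
    """
    wakeups = 0
    while True:
        sleep(nap_seconds)
        wakeups += 1
        if not is_postmaster_alive():
            return wakeups


def simulate_detection(death_at, nap_seconds):
    """Simulate a postmaster dying at time death_at (in seconds).

    Returns (wakeups, delay), where delay is how long after the death
    the polling process noticed it.
    """
    now = [0.0]  # fake clock, advanced only by fake_sleep

    def fake_sleep(seconds):
        now[0] += seconds

    def is_alive():
        return now[0] < death_at

    wakeups = nap_and_check(is_alive, nap_seconds, sleep=fake_sleep)
    return wakeups, now[0] - death_at
```

For a postmaster that dies 2.5 seconds in, simulate_detection(2.5, 1)
gives 3 wakeups and a 0.5 second delay, while simulate_detection(2.5, 60)
gives 1 wakeup and a 57.5 second delay. Fewer wakeups buy a longer worst
case before the death is noticed, bounded by the nap length.
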