Thread: Process wakeups when idle and power consumption
There is a general need to have Postgres consume fewer CPU cycles and less power when idle. Until something is done about this, shared hosting providers, particularly those who want to deploy many VM instances with databases, will continue to choose MySQL out of hand. I have quantified the difference in the number of wake-ups when idle between Postgres and MySQL using Intel's powertop utility on my laptop, which runs Fedora 14. These figures are for a freshly initdb'd database from git master, and mysql-server 5.1.56 from my system's package manager.

*snip*
 2.7% ( 11.5) [      ] postgres
 1.1% (  4.6) [ 1663] Xorg
 0.9% (  3.7) [ 1463] wpa_supplicant
 0.6% (  2.7) [      ] [ahci] <interrupt>
 0.5% (  2.2) [      ] mysqld
*snip*

Postgres consistently has 11.5 wakeups per second, while MySQL consistently has 2.2 (averaged over the 5-second period that each cycle of instrumentation lasts). If I turn on archiving, the figure for Postgres naturally increases:

*snip*
 1.7% ( 12.5) [      ] postgres
 1.6% ( 12.0) [  808] phy0
 0.7% (  5.4) [ 1463] wpa_supplicant
 0.6% (  4.3) [      ] [ahci] <interrupt>
 0.3% (  2.2) [      ] mysqld
*snip*

It increases by exactly the amount you'd expect after looking at pgarch.c - one wakeup per second. This is because there is a loop within the process's main event loop that is a prime example of what unix_latch.c describes as "the common pattern of using pg_usleep() or select() to wait until a signal arrives, where the signal handler sets a global variable". The loop naps for one second per iteration.

Attached is the first in what I hope will become a series of patches for reducing power consumption when idle. It makes the archiver process wake far less frequently, using a latch primitive - specifically, a non-shared latch. I'm not sure if I should have used a shared latch instead, and had SetLatch() calls replace the SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) calls. Would that have broken some implied notion of encapsulation?
In any case, if I apply the patch and rebuild, the difference is quite apparent:

***snip***
 3.9% ( 21.8) [ 1663] Xorg
 3.2% ( 17.9) [      ] [ath9k] <interrupt>
 2.1% ( 11.9) [  808] phy0
 2.1% ( 11.5) [      ] postgres
 1.0% (  5.4) [ 1463] wpa_supplicant
 0.4% (  2.2) [      ] mysqld
***snip***

The difference from not running the archiver at all appears to have been completely eliminated (in fact, we still wake up every PGARCH_AUTOWAKE_INTERVAL seconds, which is 60 seconds, but that usually isn't apparent to powertop, which measures wakeups over 5-second periods).

If we could gain similar decreases in idle power consumption across all Postgres ancillary processes, perhaps we'd see Postgres available as an option for shared hosting plans more frequently. When these differences are multiplied by thousands of VM instances, they really matter. Unfortunately, there doesn't seem to be a way to get powertop to display its instrumentation per-process, to quickly get a detailed overview of where those wake-ups occur across all pg processes.

I hope to work on reducing wakeups for PG ancillary processes in this order (order of perceived difficulty), using shared latches to eliminate "the waiting pattern" in each case:

* WALWriter
* BgWriter
* WALReceiver
* Startup process

I'll need to take a look at the statistics, autovacuum and logger processes too, to see if they present more subtle opportunities for reduced idle power consumption.

Do constants like PGARCH_AUTOWAKE_INTERVAL need to always be set at their current, conservative levels? Perhaps these sorts of values could be collectively controlled with a single GUC that represents a trade-off between CPU cycles used when idle and safety/reliability. On the other hand, there are already GUCs that control that per process in some cases, such as wal_writer_delay, so that suggestion could well be a bit woolly. It might be an enum value representing various levels of concern, defaulting to something like 'conservative' (i.e. the current values).

Thoughts?

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Attachment
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Attached is the first in what I hope will become a series of patches
> for reducing power consumption when idle.

Cool. This has been on my personal to-do list for a while, but it keeps failing to get to the top, so I'm glad to see somebody else putting time into it.

The major problem I'm aware of for getting rid of periodic wakeups is the need for child processes to notice when the postmaster has died unexpectedly. Your patch appears to degrade the archiver's response time for that really significantly, like from O(1 sec) to O(1 min), which I don't think is acceptable. We've occasionally kicked around ideas for mechanisms that would solve this problem, but nothing's gotten done. It doesn't seem to be an easy problem to solve portably...

> + * The caveat about signals invalidating the timeout of
> + * WaitLatch() on some platforms can be safely disregarded,

Really?

regards, tom lane
Excerpts from Peter Geoghegan's message of jue may 05 16:49:25 -0300 2011:

> I'll need to take a look at statistics, autovacuum and Logger
> processes too, to see if they present more subtle opportunities for
> reduced idle power consumption.

More subtle? Autovacuum wakes up once per second, and it could sleep a lot longer if it weren't for the loop that checks for signals. I think that could be improved a lot.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On May 5, 2011, at 4:08 PM, Alvaro Herrera wrote:
> Excerpts from Peter Geoghegan's message of jue may 05 16:49:25 -0300 2011:
>
>> I'll need to take a look at statistics, autovacuum and Logger
>> processes too, to see if they present more subtle opportunities for
>> reduced idle power consumption.
>
> More subtle? Autovacuum wakes up once per second and it could sleep a
> lot longer if it weren't for the loop that checks for signals. I think
> that could be improved a lot.

Could kqueue be of use here? Non-kqueue-supporting platforms could always fall back to the existing select().

Cheers,
M
On Thu, May 5, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> + * The caveat about signals invalidating the timeout of
>> + * WaitLatch() on some platforms can be safely disregarded,
>
> Really?

I'm a bit confused by the phrasing of this comment as well, but it does seem to me that if all the relevant signal handlers set the latch, then it ought not to be necessary to break the sleep down into one-second intervals.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 5, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> + * The caveat about signals invalidating the timeout of
>>> + * WaitLatch() on some platforms can be safely disregarded,

>> Really?

> I'm a bit confused by the phrasing of this comment as well, but it
> does seem to me that if all the relevant signal handlers set the
> latch, then it ought not to be necessary to break the sleep down into
> one-second intervals.

[ reads code some more ... ] Yeah, I think you are probably right, which makes it just a badly phrased comment. The important point here is that the self-pipe trick in unix_latch.c fixes the problem, so long as we are relying on latch release and NOT timeout-driven wakeup.

What that really means is that any WaitOnLatch call with a finite timeout ought to be viewed with a jaundiced eye. Ideally, we want them all to be waiting for latch release and nothing else. I'm concerned that we're going to be moving towards some intermediate state where we have WaitOnLatch calls with very long timeouts, because the longer the timeout, the worse the problem gets on platforms that have the problem. If you have, say, a 1-minute timeout, it's not difficult to believe that you'll basically never wake up, because of random signals resetting the timeout.

regards, tom lane
On 5 May 2011 22:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, May 5, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> + * The caveat about signals invalidating the timeout of
>>>> + * WaitLatch() on some platforms can be safely disregarded,
>
>>> Really?
>
>> I'm a bit confused by the phrasing of this comment as well, but it
>> does seem to me that if all the relevant signal handlers set the
>> latch, then it ought not to be necessary to break the sleep down into
>> one-second intervals.
>
> [ reads code some more ... ] Yeah, I think you are probably right,
> which makes it just a badly phrased comment. The important point here
> is that the self-pipe trick in unix_latch.c fixes the problem, so long
> as we are relying on latch release and NOT timeout-driven wakeup.

Why do you think that my comment is badly phrased?

> What that really means is that any WaitOnLatch call with a finite
> timeout ought to be viewed with a jaundiced eye. Ideally, we want them
> all to be waiting for latch release and nothing else. I'm concerned
> that we're going to be moving towards some intermediate state where we
> have WaitOnLatch calls with very long timeouts, because the longer the
> timeout, the worse the problem gets on platforms that have the problem.
> If you have say a 1-minute timeout, it's not difficult to believe that
> you'll basically never wake up because of random signals resetting the
> timeout.

Unless all signal handlers for signals that we expect call SetLatch() anyway, as in this case.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 5 May 2011 22:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What that really means is that any WaitOnLatch call with a finite
>> timeout ought to be viewed with a jaundiced eye. Ideally, we want them
>> all to be waiting for latch release and nothing else. I'm concerned
>> that we're going to be moving towards some intermediate state where we
>> have WaitOnLatch calls with very long timeouts, because the longer the
>> timeout, the worse the problem gets on platforms that have the problem.
>> If you have say a 1-minute timeout, it's not difficult to believe that
>> you'll basically never wake up because of random signals resetting the
>> timeout.

> Unless all signal handlers for signals that we expect call SetLatch()
> anyway, as in this case.

It's signals that we don't expect that I'm a bit worried about here. In any case, the bottom line is that having a timeout on WaitOnLatch is a kludge, and we should try to avoid it.

regards, tom lane
On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The major problem I'm aware of for getting rid of periodic wakeups is
> the need for child processes to notice when the postmaster has died
> unexpectedly. Your patch appears to degrade the archiver's response
> time for that really significantly, like from O(1 sec) to O(1 min),
> which I don't think is acceptable. We've occasionally kicked around
> ideas for mechanisms that would solve this problem, but nothing's gotten
> done. It doesn't seem to be an easy problem to solve portably...

Could you please expand upon this? Why is it of any consequence if the archiver notices that the postmaster is dead after 60 seconds rather than after 1? So control in the archiver is going to stay in its event loop for longer than it would have before, until pgarch_MainLoop() finally returns. The DBA might be required to kill the archiver where before they wouldn't have been (they wouldn't have had time to), but they are also required to kill other backends anyway before deleting postmaster.pid, or there will be dire consequences. Nothing important happens after waiting on the latch but before checking PostmasterIsAlive(), and nothing important happens after the postmaster is found to be dead. ISTM that it wouldn't be particularly bad if the archiver was SIGKILL'd while waiting on a latch.

The only salient thread I found concerning the problem of making children know when the postmaster died is this one:
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php

Fujii Masao suggests removing wal_sender_delay in that thread, and replacing it with a generic default. That does work well with my suggestion to unify these sorts of timeouts under a single GUC.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 5 May 2011 21:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The major problem I'm aware of for getting rid of periodic wakeups is
>> the need for child processes to notice when the postmaster has died
>> unexpectedly.

> Could you please expand upon this? Why is it of any consequence if the
> archiver notices that the postmaster is dead after 60 seconds rather
> than after 1?

Because somebody might try to start a new postmaster before that, and it's not really a good idea to have a rogue archiver running in addition to the new one. You might be able to construct an argument about how that was safe, but it would be a fragile one. What's more, it would not apply to any other child process, and we need a solution that scales to all the children or we're going nowhere in terms of saving power.

In the case of the children that are connected to shared memory, such as bgwriter, a long delay until child exit means a long delay until a new postmaster can start at all --- which means you're effectively creating a denial of service, with the length directly proportional to how aggressively you're trying to avoid "unnecessary" wakeups. So that's not a tradeoff I want to be making. I'd rather have a solution in which children somehow get notified of postmaster death without having to wake up just to poll for it. Then, once we fix the other issues, there are no timeouts needed at all, which is obviously the ideal situation for power consumption as well as response time.

> The only salient thread I found concerning the problem of making
> children know when the postmaster died is this one:
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php

You didn't look terribly hard then. Here are two recent threads:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg01011.php
http://archives.postgresql.org/pgsql-hackers/2011-02/msg02142.php

The pipe solution mentioned in the first one would work on all Unixen, and we could possibly optimize things a bit on Linux using the second method. (There was also a bit of speculation about relying on SEM_UNDO, but I don't think we followed that idea far.) I don't know however what we'd need on Windows.

regards, tom lane
On Fri, May 6, 2011 at 8:16 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Could you please expand upon this? Why is it of any consequence if the
> archiver notices that the postmaster is dead after 60 seconds rather
> than after 1? So control in the archiver is going to stay in its event
> loop for longer than it would have before, until pgarch_MainLoop()
> finally returns. The DBA might be required to kill the archiver where
> before they wouldn't have been (they wouldn't have had time to), but
> they are also required to kill other backends anyway before deleting
> postmaster.pid, or there will be dire consequences. Nothing important
> happens after waiting on the latch but before checking
> PostmasterIsAlive(), and nothing important happens after the
> postmaster is found to be dead. ISTM that it wouldn't be particularly
> bad if the archiver was SIGKILL'd while waiting on a latch.

Well, IMHO, the desirable state of affairs is for all child processes, including regular backends, to exit near-instantaneously once the postmaster dies. Among many other problems, once the postmaster is gone, there's no guard against shared memory corruption. And as long as there is at least one backend kicking around attached to shared memory, you won't be able to restart the postmaster, which is something you typically want to do as quickly as humanly possible.

http://www.postgresql.org/support/submitbug

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 6, 2011 at 10:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, IMHO, the desirable state of affairs is for all child processes,
> including regular backends, to exit near-instantaneously once the
> postmaster dies. Among many other problems, once the postmaster is
> gone, there's no guard against shared memory corruption. And as long
> as there is at least one backend kicking around attached to shared
> memory, you won't be able to restart postmaster, which is something
> you typically want to do as quickly as humanly possible.
>
> http://www.postgresql.org/support/submitbug

The apparently irrelevant link at the bottom of this email is the result of a cut-and-paste into the wrong email window. Sorry....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6 May 2011 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Because somebody might try to start a new postmaster before that, and
> it's not really a good idea to have a rogue archiver running in addition
> to the new one. You might be able to construct an argument about how
> that was safe, but it would be a fragile one. What's more, it would not
> apply to any other child process, and we need a solution that scales to
> all the children or we're going nowhere in terms of saving power.
>
> In the case of the children that are connected to shared memory, such as
> bgwriter, a long delay until child exit means a long delay until a new
> postmaster can start at all --- which means you're effectively creating
> a denial of service, with the length directly proportional to how
> aggressively you're trying to avoid "unnecessary" wakeups.

Perhaps I'm missing the point here, but I don't think that I have to make an argument for why it might be acceptable to have two archivers running at once, or two of any other auxiliary process. Let's assume that it's completely unacceptable. It may still be worthwhile applying this patch essentially as-is.

It's also clearly completely unacceptable to have orphaned regular backends running at the same time as another, freshly started set of backends with their own shared buffers that aren't in contact with the orphans, but have the same data directory. That's still possible today though. This is the main reason that we caution people against kill -9'ing the postmaster: if they do so, but then delete postmaster.pid before starting a new postmaster, that causes data corruption. Those are the same circumstances under which any conceivable problem (or at least any problem that I can immediately think of) with auxiliary processes co-existing as children of different postmasters (or ex-postmasters) would arise. I don't think that we've lost anything by allowing two completely unacceptable things to happen under those circumstances rather than just one. The precedent for having completely unacceptable things happen, like data loss, under those circumstances exists already. You could argue that that is a bad state of affairs that we should fix, and I'd be inclined to agree, but it seems like a separate issue.

> So that's not a tradeoff I want to be making. I'd rather have a
> solution in which children somehow get notified of postmaster death
> without having to wake up just to poll for it. Then, once we fix the
> other issues, there are no timeouts needed at all, which is obviously
> the ideal situation for power consumption as well as response time.
>
>> The only salient thread I found concerning the problem of making
>> children know when the postmaster died is this one:
>> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php
>
> You didn't look terribly hard then. Here are two recent threads:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01011.php
> http://archives.postgresql.org/pgsql-hackers/2011-02/msg02142.php
>
> The pipe solution mentioned in the first one would work on all Unixen,
> and we could possibly optimize things a bit on Linux using the second
> method. (There was also a bit of speculation about relying on SEM_UNDO,
> but I don't think we followed that idea far.) I don't know however what
> we'd need on Windows.

I've taken a look at Florian Pflug's work in the first thread.

The most promising lead I have on a method for monitoring whether the postmaster has died on Windows is PsSetCreateProcessNotifyRoutine(), which necessitates registering a kernel-mode driver and dynamically loading it. That sounds very kludgey indeed. Here is a sample program that demonstrates that sort of usage:
http://www.codeproject.com/KB/threads/procmon.aspx

Alternatively, we could do something with PSAPI, though it apparently doesn't allow you to define hooks of any kind for when a process ends. We could, I suppose, have a heartbeat process that monitors running backends on Windows using much the same "nap and check" pattern, and that wakes up child processes to finish their little bit of remaining work and exit() on finding the postmaster dead. That has the same "fundamental race condition" that Tom described in the first of the above threads, though.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Perhaps I'm missing the point here, but I don't think that I have to
> make an argument for why it might be acceptable to have two archivers
> running at once, or two of any other auxiliary process. Let's assume
> that it's completely unacceptable. It may still be worth while
> applying this patch essentially as-is.

> It's also clearly completely unacceptable to have orphaned regular
> backends running at the same time as another, freshly started sets of
> backends with their own shared buffers that aren't in contact with the
> orphans, but have the same data directory. That's still possible today
> though. This is the main reason that we caution people against kill
> -9'ing the postmaster - if they do so, but then delete postmaster.pid
> before starting a new postmaster, that causes data corruption.

Indeed, which is why we have the postmaster.pid interlock against doing that. What you describe is a DBA with a death wish who's going out of his way to defeat the safety interlock. We can't do much about that level of idiocy. However, it's quite irrelevant to the current discussion.

The aspect of this that *is* relevant is that if you haven't deliberately defeated the interlock (and thereby put your data at risk), you won't be able to start a new postmaster until all the old shmem-attached children are gone. And that's why having a child with a very long reaction time for parent death represents a denial of service.

regards, tom lane
On 7 May 2011 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The aspect of this that *is* relevant is that if you haven't
> deliberately defeated the interlock (and thereby put your data at risk),
> you won't be able to start a new postmaster until all the old
> shmem-attached children are gone. And that's why having a child with a
> very long reaction time for parent death represents a denial of service.

Alright. I don't suppose it would be acceptable to have the startup process use ps to find any auxiliary process with init as a parent, and signal it? Within the handler for that signal in each auxiliary process (I suppose it would be SIGUSR2), we would take appropriate action - typically just waking up through a SetLatch() call - once we independently verify that we are in fact orphaned. If we find orphans, we could perform a "nap and check" loop within the startup process (probably tighter than 1 second per iteration), until the shmem-attached children that are liable to block us from starting a new postmaster exit().

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
I've taken a look into it, and I'm not optimistic about the likelihood of my suggested way of registering a callback on process termination on Windows being acceptable. It seems to be a kludge too far. It does work on Vista, just not very well: there is a considerable delay on closing the sample console application that uses this technique, for example, and there seems to be an unpredictable delay before the callback occurs.

A simpler solution on Windows might be to make the timeout on auxiliary processes much smaller, but have it increase on each subsequent timeout (starting from scratch if we wake up for any reason other than timeout) until it eventually maxes out at something like the current value of PGARCH_AUTOWAKE_INTERVAL. If backends are sleeping for increasing periods of time, the chance of the postmaster crashing goes down, so denial of service is much less of a concern.

An alternative might be to just not do this on Windows. Certainly, idle wakeups are likely to be less important on that platform, which is not a very popular choice for virtual machines deployed on cloudy infrastructure - the use case that will benefit from these enhancements the most, by some margin.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On 09.05.2011 12:20, Peter Geoghegan wrote:
> I've taken a look into it, and I'm not optimistic about the likelihood
> of the way I've suggested that we can register a callback on process
> termination on windows being acceptable. It seems to be a kludge too
> far. It does work on Vista, just not very well. There is a
> considerable delay on closing the above console application that uses
> this technique, for example, and there seems to be an unpredictable
> delay in the callback occurring.

Can't we use the pipe trick on Windows? The API is different, but we use pipes on Windows for other things already. When a process is launched, open a pipe between the postmaster and the child process. In the child, spawn a thread that just calls ReadFile() on the pipe, which blocks. If the postmaster dies, the ReadFile() call will return with an error.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 9 May 2011 11:19, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Can't we use the pipe trick on Windows? The API is different, but we use
> pipes on Windows for other things already. When a process is launched, open
> a pipe between postmaster and the child process. In the child, spawn a
> thread that just calls ReadFile() on the pipe, which blocks. If postmaster
> dies, the ReadFile() call will return with an error.

Alright. I'm currently working on a proof-of-concept implementation of that. In the meantime, any thoughts on how this should meld with the existing latch implementation?

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Mon, May 9, 2011 at 8:27 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 9 May 2011 11:19, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> Can't we use the pipe trick on Windows? The API is different, but we use
>> pipes on Windows for other things already. When a process is launched, open
>> a pipe between postmaster and the child process. In the child, spawn a
>> thread that just calls ReadFile() on the pipe, which blocks. If postmaster
>> dies, the ReadFile() call will return with an error.
>
> Alright. I'm currently working on a proof-of-concept implementation of
> that. In the meantime, any thoughts on how this should meld with the
> existing latch implementation?

How about making WaitLatch monitor the file descriptor for the pipe by using select()?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 10 May 2011 02:58, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Alright. I'm currently working on a proof-of-concept implementation of
>> that. In the meantime, any thoughts on how this should meld with the
>> existing latch implementation?
>
> How about making WaitLatch monitor the file descriptor for the pipe
> by using select()?

Alright, so it's reasonable to assume that all clients of the latch code are happy to be invariably woken up on postmaster death?

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On 10.05.2011 11:22, Peter Geoghegan wrote:
> On 10 May 2011 02:58, Fujii Masao<masao.fujii@gmail.com> wrote:
>>> Alright. I'm currently working on a proof-of-concept implementation of
>>> that. In the meantime, any thoughts on how this should meld with the
>>> existing latch implementation?
>>
>> How about making WaitLatch monitor the file descriptor for the pipe
>> by using select()?
>
> Alright, so it's reasonable to assume that all clients of the latch
> code are happy to be invariably woken up on Postmaster death?

That doesn't sound like a safe assumption. All the helper processes should die quickly on postmaster death, but I'm not sure that holds for all inter-process communication. I think the caller needs to specify whether he wants that or not.

Once you add that to the WaitLatchOrSocket function, it's quite clear that the API is getting out of hand. There are five different events that can wake it up:

* latch is set
* a socket becomes readable
* a socket becomes writeable
* timeout
* postmaster dies

I think we need to refactor the function into something like:

#define WL_LATCH_SET 1
#define WL_SOCKET_READABLE 2
#define WL_SOCKET_WRITEABLE 4
#define WL_TIMEOUT 8
#define WL_POSTMASTER_DEATH 16

int WaitLatch(Latch latch, int events, long timeout)

where 'events' is a bitmask of events that should cause a wakeup, and the return value is a bitmask identifying which event(s) caused the call to return.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 10 May 2011 09:45, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> I think we need to refactor the function into something like:
>
> #define WL_LATCH_SET 1
> #define WL_SOCKET_READABLE 2
> #define WL_SOCKET_WRITEABLE 4
> #define WL_TIMEOUT 8
> #define WL_POSTMASTER_DEATH 16

While I agree with the need to not box ourselves into a corner on the latch interface by making sweeping assumptions, isn't the fact that a socket became readable or writable strictly an implementation detail?

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Attached is a win32 implementation of the "named pipe trick".

It consists of a Visual Studio 2008 solution that contains two projects, named_pipe_trick (which represents the postmaster) and auxiliary_backend (which represents each auxiliary process). I split the solution into two projects/programs because Windows lacks fork() to make it all happen with a single program.

Thoughts? Once I have some buy-in, I'd like to write a patch for the latch code that incorporates monitoring the postmaster using the named pipe trick (for both unix_latch.c and win32_latch.c), plus Heikki's suggestions.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 10 May 2011 09:45, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> I think we need to refactor the function into something like:
>>
>> #define WL_LATCH_SET         1
>> #define WL_SOCKET_READABLE   2
>> #define WL_SOCKET_WRITEABLE  4
>> #define WL_TIMEOUT           8
>> #define WL_POSTMASTER_DEATH 16
>
> While I agree with the need to not box ourselves into a corner on the
> latch interface by making sweeping assumptions, isn't the fact that a
> socket became readable or writable strictly an implementation detail?

The thing about the socket being readable/writeable is needed for walsender. It needs to notice when its connection to walreceiver is writeable (so it can send more WAL) or readable (so it can receive a reply message).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 10, 2011 at 12:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> On 10 May 2011 09:45, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>>> I think we need to refactor the function into something like:
>>>
>>> #define WL_LATCH_SET         1
>>> #define WL_SOCKET_READABLE   2
>>> #define WL_SOCKET_WRITEABLE  4
>>> #define WL_TIMEOUT           8
>>> #define WL_POSTMASTER_DEATH 16
>>
>> While I agree with the need to not box ourselves into a corner on the
>> latch interface by making sweeping assumptions, isn't the fact that a
>> socket became readable or writable strictly an implementation detail?
>
> The thing about the socket being readable/writeable is needed for
> walsender. It needs to notice when its connection to walreceiver is
> writeable (so it can send more WAL) or readable (so it can receive a
> reply message).

I've got a feeling that things will go easier if we have a separate connection for the feedback channel.

Yes, two connections, one in either direction. That would make everything simple: nice one-way connections. It would also mean we could stream at higher data rates.

-- 
Simon Riggs           http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> I've got a feeling that things will go easier if we have a separate
> connection for the feedback channel.
> Yes, two connections, one in either direction.
> That would make everything simple, nice one way connections. It would
> also mean we could stream at higher data rates.

The above sounds like complete nonsense. TCP connections are already full-duplex.

			regards, tom lane
On 10.05.2011 14:39, Peter Geoghegan wrote:
> Attached is a win32 implementation of the "named pipe trick".
>
> It consists of a Visual Studio 2008 solution that contains two
> projects, named_pipe_trick (which represents the postmaster) and
> auxiliary_backend (which represents each auxiliary process). I split
> the solution into two projects/programs because Windows lacks fork()
> to make it all happen with a single program.
>
> Thoughts? Once I have some buy-in, I'd like to write a patch for the
> latch code that incorporates monitoring the postmaster using the named
> pipe trick (for both unix_latch.c and win32_latch.c), plus Heikki's
> suggestions.

It should be an anonymous pipe that's inherited by the child process, rather than a named pipe. Otherwise seems fine to me, as far as this proof-of-concept program goes.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On 10 May 2011 17:43, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> It should be an anonymous pipe that's inherited by the child process,
> rather than a named pipe. Otherwise seems fine to me, as far as this
> proof-of-concept program goes.

Alright, thanks. I'll use an anonymous pipe in the patch itself.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On 9 May 2011 11:19, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> In the child, spawn a thread

How exactly should I go about this? The one place in the code that I knew to use multiple threads, pgbench, falls back on "emulation with fork()" on some platforms.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, May 11, 2011 at 10:52, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 9 May 2011 11:19, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> In the child, spawn a thread
>
> How exactly should I go about this? The one place in the code that I
> knew to use multiple threads, pgbench, falls back on "emulation with
> fork()" on some platforms.

If you're doing this Win32-specific, take a look at src/backend/port/win32/signal.c for an example.

If you're not doing this win32-specific, I doubt we really want threads to be involved...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
On 11 May 2011 09:54, Magnus Hagander <magnus@hagander.net> wrote:
> If you're doing this Win32-specific, take a look at
> src/backend/port/win32/signal.c for an example.
>
> If you're not doing this win32-specific, I doubt we really want
> threads to be involved...

Well, that seems to be the traditional wisdom. It seems sensible to me that each process should look out for postmaster death itself, though. Tom described potential race conditions in looking at ps output... do we really want to double the number of auxiliary processes in a single release of Postgres?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On 11.05.2011 13:34, Peter Geoghegan wrote:
> On 11 May 2011 09:54, Magnus Hagander <magnus@hagander.net> wrote:
>
>> If you're doing this Win32-specific, take a look at
>> src/backend/port/win32/signal.c for an example.
>>
>> If you're not doing this win32-specific, I doubt we really want
>> threads to be involved...
>
> Well, that seems to be the traditional wisdom. It seems sensible to me
> that each process should look out for postmaster death itself, though.
> Tom described potential race conditions in looking at ps output... do
> we really want to double the number of auxiliary processes in a single
> release of Postgres?

Uh, no, you don't want any new processes on Unix. You want each process to check for postmaster death every once in a while, like they do today. The pipe trick is to make sure the processes wake up promptly to notice the death when the postmaster dies. You just need to add the postmaster pipe to the select() calls we already do.

I'm not sure if on Windows you can similarly just add the postmaster pipe to the WaitForMultipleObjects() calls we already do. Then you won't need new threads on Windows either.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com