Thread: bug in SignalSomeChildren
Hi, I found a bug which always prevents SignalSomeChildren with BACKEND_TYPE_WALSND from sending a signal to walsender. Though currently SignalSomeChildren with BACKEND_TYPE_WALSND has not been called anywhere, it's not hard to believe that will be called in the future. So we should apply the following change. ---------------------- diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 6f934ee..2d86fb6 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -3162,7 +3162,8 @@ SignalSomeChildren(int signal, int target) if (bp->dead_end) continue; - if (!(target & BACKEND_TYPE_NORMAL) && !bp->is_autovacuum) + if (!(target & BACKEND_TYPE_NORMAL) && !bp->is_autovacuum && + !IsPostmasterChildWalSender(bp->child_slot)) continue; if (!(target& BACKEND_TYPE_AUTOVAC) && bp->is_autovacuum) continue; ---------------------- Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Dec 14, 2010 at 10:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > I found a bug which always prevents SignalSomeChildren with > BACKEND_TYPE_WALSND from sending a signal to walsender. > > Though currently SignalSomeChildren with BACKEND_TYPE_WALSND > has not been called anywhere, it's not hard to believe that will > be called in the future. So we should apply the following change. I think the attached might be a little tidier. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Excerpts from Robert Haas's message of vie dic 17 10:08:04 -0300 2010: > On Tue, Dec 14, 2010 at 10:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > I found a bug which always prevents SignalSomeChildren with > > BACKEND_TYPE_WALSND from sending a signal to walsender. > > > > Though currently SignalSomeChildren with BACKEND_TYPE_WALSND > > has not been called anywhere, it's not hard to believe that will > > be called in the future. So we should apply the following change. > > I think the attached might be a little tidier. Thoughts? Yeah, that looks more readable to me. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes: > I think the attached might be a little tidier. Thoughts? I'm not really thrilled at the idea of calling IsPostmasterChildWalSender for every child whether or not it will have any impact on the decision. That involves touching shared memory which can be rather expensive (see previous discussions about shared cache lines and so forth). The existing coding is pretty ugly, I agree. regards, tom lane
On Fri, Dec 17, 2010 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I think the attached might be a little tidier. Thoughts? > > I'm not really thrilled at the idea of calling > IsPostmasterChildWalSender for every child whether or not it will have > any impact on the decision. That involves touching shared memory which > can be rather expensive (see previous discussions about shared cache > lines and so forth). The existing code already does that, unless I'm missing something. We could improve on my proposed patch a bit by doing the is_autovacuum test first and the walsender test second. I'm not sure how to improve on it beyond that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Tom Lane's message of vie dic 17 12:27:30 -0300 2010: > Robert Haas <robertmhaas@gmail.com> writes: > > I think the attached might be a little tidier. Thoughts? > > I'm not really thrilled at the idea of calling > IsPostmasterChildWalSender for every child whether or not it will have > any impact on the decision. That involves touching shared memory which > can be rather expensive (see previous discussions about shared cache > lines and so forth). Is it possible to save the "is walsender" flag in the Backend struct? That would make it possible to solve the problem very easily. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Dec 17, 2010 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not really thrilled at the idea of calling >> IsPostmasterChildWalSender for every child whether or not it will have >> any impact on the decision. �That involves touching shared memory which >> can be rather expensive (see previous discussions about shared cache >> lines and so forth). > The existing code already does that, unless I'm missing something. No, it only makes that call when it's actually relevant to the decision and it's exhausted all other possible tests. In particular, the call is never made for target = ALL which I think is the common case. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Is it possible to save the "is walsender" flag in the Backend struct? > That would make it possible to solve the problem very easily. Yeah, I was wondering about that too, but the problem is that the postmaster doesn't know that at the time it forks the child. The flag in shared memory will get set later, but it's hard to tell how much later. Of course, that observation also means that anyplace the postmaster tries to distinguish walsenders from other children is fundamentally broken anyhow: a walsender that hasn't set the flag yet will get treated like a regular backend. I think what we ought to be looking to do is get rid of the distinction, so that the postmaster treats walsenders the same as other children. regards, tom lane
On Fri, Dec 17, 2010 at 11:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think what we ought to be looking to do is get rid of the distinction, > so that the postmaster treats walsenders the same as other children. It's not apparent to me that the existing places where postmaster.c makes that distinction are in fact correct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Fujii Masao's message of mié dic 15 00:54:39 -0300 2010: > Hi, > > I found a bug which always prevents SignalSomeChildren with > BACKEND_TYPE_WALSND from sending a signal to walsender. > > Though currently SignalSomeChildren with BACKEND_TYPE_WALSND > has not been called anywhere, it's not hard to believe that will > be called in the future. So we should apply the following change. BTW doesn't CountChildren have the same problem? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Tom Lane's message of vie dic 17 13:18:35 -0300 2010: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Is it possible to save the "is walsender" flag in the Backend struct? > > That would make it possible to solve the problem very easily. > > Yeah, I was wondering about that too, but the problem is that the > postmaster doesn't know that at the time it forks the child. The > flag in shared memory will get set later, but it's hard to tell > how much later. Yeah, I arrived at the same conclusion. I was wondering if we could cache the result of the shared mem lookup the first time it was done, but as you say there would still be a race condition. > I think what we ought to be looking to do is get rid of the distinction, > so that the postmaster treats walsenders the same as other children. I think the problem with this is that walsenders are treated in a very special way during shutdown -- they need to stay up until after bgwriter is gone. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of vie dic 17 13:18:35 -0300 2010: >> I think what we ought to be looking to do is get rid of the distinction, >> so that the postmaster treats walsenders the same as other children. > I think the problem with this is that walsenders are treated in a very > special way during shutdown -- they need to stay up until after bgwriter > is gone. Why do they need to survive the bgwriter? And more to the point, why does that logic need to be implemented on the postmaster side? Since only the child process really knows reliably whether it's a walsender, it'd be far safer if the behavioral difference could be handled on the child side. I haven't looked at the details, but I'm wondering if we couldn't make this go by having walsender children react differently to the same signals the postmaster sends other children. regards, tom lane
On Fri, Dec 17, 2010 at 11:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Excerpts from Tom Lane's message of vie dic 17 13:18:35 -0300 2010: >>> I think what we ought to be looking to do is get rid of the distinction, >>> so that the postmaster treats walsenders the same as other children. > >> I think the problem with this is that walsenders are treated in a very >> special way during shutdown -- they need to stay up until after bgwriter >> is gone. > > Why do they need to survive the bgwriter? And more to the point, why > does that logic need to be implemented on the postmaster side? Since > only the child process really knows reliably whether it's a walsender, > it'd be far safer if the behavioral difference could be handled on the > child side. I haven't looked at the details, but I'm wondering if we > couldn't make this go by having walsender children react differently > to the same signals the postmaster sends other children. I'm not too sure we're shutting down the WAL senders right now. I think they may just be exiting on their own when the postmaster goes away. /* * Emergency bailout if postmaster has died. This is to avoid the * necessity for manual cleanup of all postmaster children. */ if(!PostmasterIsAlive(true)) exit(1); I'm having a bit of trouble confirming this on MacOS X, though. Attaching gdb to either the startup process or a WAL sender causes PostmasterIsAlive to return false, resulting in a near-immediate exit.Seems pretty stupid for attaching gdb to change thereturn value of getppid() but it seems like that must be what's happening. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 17.12.2010 19:08, Robert Haas wrote: > On Fri, Dec 17, 2010 at 11:43 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera<alvherre@commandprompt.com> writes: >>> Excerpts from Tom Lane's message of vie dic 17 13:18:35 -0300 2010: >>>> I think what we ought to be looking to do is get rid of the distinction, >>>> so that the postmaster treats walsenders the same as other children. >> >>> I think the problem with this is that walsenders are treated in a very >>> special way during shutdown -- they need to stay up until after bgwriter >>> is gone. >> >> Why do they need to survive the bgwriter? Because we want the shutdown checkpoint record that bgwriter writes just before it dies to be replicated to the standbys. >> And more to the point, why >> does that logic need to be implemented on the postmaster side? Since >> only the child process really knows reliably whether it's a walsender, >> it'd be far safer if the behavioral difference could be handled on the >> child side. I haven't looked at the details, but I'm wondering if we >> couldn't make this go by having walsender children react differently >> to the same signals the postmaster sends other children. > > I'm not too sure we're shutting down the WAL senders right now. Sure we do. postmaster sends walsenders SIGUSR2 when bgwriter dies. When a walsender receives SIGUSR2, it tries to send all pending WAL, and terminates after that. The postmaster goes into PM_SHUTDOWN_2 state, waiting for all the walsenders and the archiver process to die. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, Dec 18, 2010 at 1:24 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Fujii Masao's message of mié dic 15 00:54:39 -0300 2010: >> Hi, >> >> I found a bug which always prevents SignalSomeChildren with >> BACKEND_TYPE_WALSND from sending a signal to walsender. >> >> Though currently SignalSomeChildren with BACKEND_TYPE_WALSND >> has not been called anywhere, it's not hard to believe that will >> be called in the future. So we should apply the following change. > > BTW doesn't CountChildren have the same problem? Oh, yes. We should fix that, too. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Excerpts from Robert Haas's message of vie dic 17 14:08:04 -0300 2010: > I'm having a bit of trouble confirming this on MacOS X, though. > Attaching gdb to either the startup process or a WAL sender causes > PostmasterIsAlive to return false, resulting in a near-immediate exit. > Seems pretty stupid for attaching gdb to change the return value of > getppid() but it seems like that must be what's happening. Yeah, this problem has been known for some time and causes quite some pain. We have an open problem report on autovacuum failing to run after some time, and we haven't been able to get a backtrace or strace because of this issue -- trying to attach to it causes a full system restart (PostmasterIsAlive returns false, autovac launcher dies, this death causes postmaster to shut everything down and restart). This is on FreeBSD. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > ... We have an open problem report on autovacuum failing to run after > some time, and we haven't been able to get a backtrace or strace because > of this issue ... I wonder whether that's the already-fixed problem with autovacuum cost limit going to zero in long-lived workers. regards, tom lane
On Mon, Dec 20, 2010 at 8:20 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Robert Haas's message of vie dic 17 14:08:04 -0300 2010: > >> I'm having a bit of trouble confirming this on MacOS X, though. >> Attaching gdb to either the startup process or a WAL sender causes >> PostmasterIsAlive to return false, resulting in a near-immediate exit. >> Seems pretty stupid for attaching gdb to change the return value of >> getppid() but it seems like that must be what's happening. > > Yeah, this problem has been known for some time and causes quite some > pain. We have an open problem report on autovacuum failing to run after > some time, and we haven't been able to get a backtrace or strace because > of this issue -- trying to attach to it causes a full system restart > (PostmasterIsAlive returns false, autovac launcher dies, this death > causes postmaster to shut everything down and restart). This is on > FreeBSD. Can we add a develop option to force use of the kill(0) method? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >>> Attaching gdb to either the startup process or a WAL sender causes >>> PostmasterIsAlive to return false, resulting in a near-immediate exit. >>> �Seems pretty stupid for attaching gdb to change the return value of >>> getppid() but it seems like that must be what's happening. > Can we add a develop option to force use of the kill(0) method? How will that avoid needing to have an honest answer from getppid()? Without that you can't know what to issue kill() against. Seems like the correct path here is to complain to gdb and/or BSD upstreams about this misbehavior. regards, tom lane
On Mon, Dec 20, 2010 at 1:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>>> Attaching gdb to either the startup process or a WAL sender causes >>>> PostmasterIsAlive to return false, resulting in a near-immediate exit. >>>> Seems pretty stupid for attaching gdb to change the return value of >>>> getppid() but it seems like that must be what's happening. > >> Can we add a develop option to force use of the kill(0) method? > > How will that avoid needing to have an honest answer from getppid()? > Without that you can't know what to issue kill() against. The answer to this question will probably be entirely self-evident if you stare at PostmasterIsAlive() for, well, it took me about 10 seconds. So probably less than five for you. > Seems like the correct path here is to complain to gdb and/or BSD > upstreams about this misbehavior. That might be a good thing to do too, but even if they agree to fix it and do in fact fix it right away, it'll take months or years before all of the major PostgreSQL contributors can benefit from those fixes, as opposed to, say, this afternoon. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Dec 20, 2010 at 1:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Can we add a develop option to force use of the kill(0) method? >> How will that avoid needing to have an honest answer from getppid()? >> Without that you can't know what to issue kill() against. > The answer to this question will probably be entirely self-evident if > you stare at PostmasterIsAlive() for, well, it took me about 10 > seconds. So probably less than five for you. Hmm, I was thinking that PostmasterPid was set originally from getppid, but it looks like we rely on inheriting it through fork instead. So maybe this will work. It's still slower and less reliable than the getppid case for normal use, though. regards, tom lane
On Mon, Dec 20, 2010 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Dec 20, 2010 at 1:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> Can we add a develop option to force use of the kill(0) method? > >>> How will that avoid needing to have an honest answer from getppid()? >>> Without that you can't know what to issue kill() against. > >> The answer to this question will probably be entirely self-evident if >> you stare at PostmasterIsAlive() for, well, it took me about 10 >> seconds. So probably less than five for you. > > Hmm, I was thinking that PostmasterPid was set originally from getppid, > but it looks like we rely on inheriting it through fork instead. > So maybe this will work. It's still slower and less reliable than the > getppid case for normal use, though. Well, that's why it'd be a developer option, rather than the default behavior. If we can agree on a name I'll work up a patch. Bikeshedding in 3... 2... 1... check_postmaster_via_kill avoid_backend_getppid ...? Another option that might be workable (but I have reservations, and haven't tested it either) is to check whether the return value of getppid() is equal to 1. If it's neither 1 nor PostmasterPid then try kill(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Another option that might be workable (but I have reservations, and > haven't tested it either) is to check whether the return value of > getppid() is equal to 1. If it's neither 1 nor PostmasterPid then try > kill(). I like that better actually ... one less thing for developers to get wrong. regards, tom lane
On Mon, Dec 20, 2010 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Another option that might be workable (but I have reservations, and >> haven't tested it either) is to check whether the return value of >> getppid() is equal to 1. If it's neither 1 nor PostmasterPid then try >> kill(). > > I like that better actually ... one less thing for developers to get wrong. The attached patch appears to work correctly on MacOS X. I did check, BTW: getppid() in the attached process returns gdb's pid. Poor! For my own purposes, I would be just as happy to apply this only to master. But I wonder if anyone wants to argue for back-patching, to help debug existing installations? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Dec 20, 2010 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I like that better actually ... one less thing for developers to get wrong. > The attached patch appears to work correctly on MacOS X. I did check, > BTW: getppid() in the attached process returns gdb's pid. Poor! Looks good to me. > For my own purposes, I would be just as happy to apply this only to > master. But I wonder if anyone wants to argue for back-patching, to > help debug existing installations? Given the lack of non-developer complaints, I see no need to backpatch. regards, tom lane
On Mon, Dec 20, 2010 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Dec 20, 2010 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I like that better actually ... one less thing for developers to get wrong. > >> The attached patch appears to work correctly on MacOS X. I did check, >> BTW: getppid() in the attached process returns gdb's pid. Poor! > > Looks good to me. > >> For my own purposes, I would be just as happy to apply this only to >> master. But I wonder if anyone wants to argue for back-patching, to >> help debug existing installations? > > Given the lack of non-developer complaints, I see no need to backpatch. Well, non-developers don't tend to attach gdb very often. Alvaro mentioned a problem installation upthread, thus the question. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote: > The attached patch appears to work correctly on MacOS X. I did check, > BTW: getppid() in the attached process returns gdb's pid. Poor! This appears to be a BSDism at least. On Linux and BSD derivatives the man pages specifically mention the reparenting (needed for catching signals) but on Linux getppid() is specifically documented to return the correct value anyway. Frankly it's a wart, for example strace/truss/whatever could (since it's tracing anyway) just fudge the correct value in the getppid() call so the userspace process doesn't notice. This has been a bug since forever though, so I wouldn't hold my breath. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > Frankly it's a wart, for example strace/truss/whatever could (since > it's tracing anyway) just fudge the correct value in the getppid() call > so the userspace process doesn't notice. This has been a bug since > forever though, so I wouldn't hold my breath. Yeah, I have a vague recollection of noticing this in some other context ~10 years ago. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 20, 2010 at 3:14 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 20, 2010 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Mon, Dec 20, 2010 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> I like that better actually ... one less thing for developers to get wrong. >> >>> The attached patch appears to work correctly on MacOS X. I did check, >>> BTW: getppid() in the attached process returns gdb's pid. Poor! >> >> Looks good to me. >> >>> For my own purposes, I would be just as happy to apply this only to >>> master. But I wonder if anyone wants to argue for back-patching, to >>> help debug existing installations? >> >> Given the lack of non-developer complaints, I see no need to backpatch. > > Well, non-developers don't tend to attach gdb very often. Alvaro > mentioned a problem installation upthread, thus the question. Hearing no cries of "please-oh-please-backpatch-this", I've committed it just to master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote: >> The attached patch appears to work correctly on MacOS X. I did check, >> BTW: getppid() in the attached process returns gdb's pid. Poor! > > This appears to be a BSDism at least. On Linux and BSD derivatives the > man pages specifically mention the reparenting (needed for catching > signals) but on Linux getppid() is specifically documented to return > the correct value anyway. I'm just a random lurker here, and happened to catch the last bit of this thread. Could one of you that understand this issue straighten something out for me? Every now and again we've been known to attach gdb to a production Postgres backend to troubleshoot problems. Ya know, just trying to get an idea of what Postgres is actually doing via a backtrace. This is always on Linux, BTW. Does this thread mean that the above no longer works with v9? Or is this only on non-Linux systems, or did the patch Robert Haas commit "fix" fix? We're still using 8.1 (slowly moving to 8.4) in production, but have plans of picking up 9.x later in '11. Just wondering if we need to actually be a bit more careful in the future? Thanks! eric
On Tue, Dec 21, 2010 at 1:45 PM, Eric Ridge <eebbrr@gmail.com> wrote: > On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout > <kleptog@svana.org> wrote: >> On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote: >>> The attached patch appears to work correctly on MacOS X. I did check, >>> BTW: getppid() in the attached process returns gdb's pid. Poor! >> >> This appears to be a BSDism at least. On Linux and BSD derivatives the >> man pages specifically mention the reparenting (needed for catching >> signals) but on Linux getppid() is specifically documented to return >> the correct value anyway. > > I'm just a random lurker here, and happened to catch the last bit of > this thread. Could one of you that understand this issue straighten > something out for me? > > Every now and again we've been known to attach gdb to a production > Postgres backend to troubleshoot problems. Ya know, just trying to > get an idea of what Postgres is actually doing via a backtrace. This > is always on Linux, BTW. > > Does this thread mean that the above no longer works with v9? Or is > this only on non-Linux systems, or did the patch Robert Haas commit > "fix" fix? We're still using 8.1 (slowly moving to 8.4) in > production, but have plans of picking up 9.x later in '11. Just > wondering if we need to actually be a bit more careful in the future? The point of the patch was to improve cases where attaching gdb *didn't* work well. Any cases where it was already working for you aren't going to be made worse by this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 21, 2010 at 2:33 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The point of the patch was to improve cases where attaching gdb > *didn't* work well. Any cases where it was already working for you > aren't going to be made worse by this. Okay, great. Thanks for the clarification. eric
Excerpts from Robert Haas's message of mar dic 21 08:40:49 -0300 2010: > > Well, non-developers don't tend to attach gdb very often. Alvaro > > mentioned a problem installation upthread, thus the question. > > Hearing no cries of "please-oh-please-backpatch-this", I've committed > it just to master. Please-oh-please backpatch this ... at least to 8.4. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Sat, Dec 18, 2010 at 1:00 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 17, 2010 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I think the attached might be a little tidier. Thoughts? >> >> I'm not really thrilled at the idea of calling >> IsPostmasterChildWalSender for every child whether or not it will have >> any impact on the decision. That involves touching shared memory which >> can be rather expensive (see previous discussions about shared cache >> lines and so forth). > > The existing code already does that, unless I'm missing something. We > could improve on my proposed patch a bit by doing the is_autovacuum > test first and the walsender test second. I'm not sure how to improve > on it beyond that. How about doing target != ALL test at the head for the most common case (target == ALL)? I added that test into your patch and changed it so that the is_autovacuum test is done first. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Fujii Masao <masao.fujii@gmail.com> writes: > How about doing target != ALL test at the head for the most common case > (target == ALL)? That's an idea, but the test you propose implements it incorrectly. regards, tom lane
On Wed, Dec 22, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> How about doing target != ALL test at the head for the most common case >> (target == ALL)? > > That's an idea, but the test you propose implements it incorrectly. Thanks! I revised the patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
--On 22. Dezember 2010 15:51:09 +0900 Fujii Masao <masao.fujii@gmail.com> wrote: >>> How about doing target != ALL test at the head for the most common case >>> (target == ALL)? >> >> That's an idea, but the test you propose implements it incorrectly. > > Thanks! I revised the patch. I had a look at this for the current CF and the patch looks reasonable to me. Some testing shows that the changes are working as intended (at least, the wal sender actually receives now signals from SignalSomeChildren() as far as the DEBUG4 output shows). Maybe we should put in a small comment, why we special case BACKEND_TYPE_ALL (following Tom's comment about expensive shared memory access and IsPostmasterChildWalSender()). I marked it as "Ready for Committer". Question for my understanding: While reading the small patch, i realized that there's no BACKEND_TYPE_WALRECV or similar. If i understand correctly there's no need to handle it this way, since there's only one wal receiver process per instance? -- Thanks Bernd
On Thu, Jan 20, 2011 at 9:53 PM, Bernd Helmle <mailings@oopsware.de> wrote: > I had a look at this for the current CF and the patch looks reasonable to > me. Some testing shows that the changes are working as intended (at least, > the wal sender actually receives now signals from SignalSomeChildren() as > far as the DEBUG4 output shows). Thanks for the review and test! > Maybe we should put in a small comment, why > we special case BACKEND_TYPE_ALL (following Tom's comment about expensive > shared memory access and IsPostmasterChildWalSender()). I added the following small comment. Patch attached. + /* + * Since target == BACKEND_TYPE_ALL is the most common case, + * we test it first and avoid touching shared memory for + * every child. + */ > Question for my understanding: > > While reading the small patch, i realized that there's no > BACKEND_TYPE_WALRECV or similar. If i understand correctly there's no need > to handle it this way, since there's only one wal receiver process per > instance? Yes. But also that's because walreceiver is an auxiliary process (like bgwriter and walwriter ..etc) but not a backend. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center