Thread: fork/exec patch: pre-CreateProcess finalization
For application to HEAD, pending community review.

Final rearrangement of the main PostgreSQL child-process startup (i.e. BackendFork/SSDataBase/pgstat), to allow the fork/exec calls to closely mimic the (soon to be provided) Win32 CreateProcess equivalent calls. Whereas the existing code forks and allows the child process to marshal up its "argument list", the EXEC_BACKEND case has been changed to allow the parent process to do this marshalling, with the child process exec'ing immediately after the fork.

This has required some reworking of the existing code base, particularly to BackendFork (which now actually does the fork()). As such, I've been anticipating that this will be the most controversial of the fork/exec patches, so critique away :-)

This is the last patch required before the actual Win32 calls can be put in place.

Cheers,
Claudio
Attachment
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> This has required some reworking of the existing code base, particularly
> to BackendFork (which now actually does the fork()). As such, I've been
> anticipating that this will be the most controversial of the fork/exec
> patches, so critique away :-)

You haven't explained why that's necessary. Given the fact that this patch seems to hugely complicate the postmaster logic --- not so much either path individually, but the messy #ifdef interleaving of two radically different programs --- I am inclined to reject it on style grounds alone.

We should either find a way to make the fork/exec path more like the existing code, or bite the bullet and change them both. Doing it the way you have here will create an unmaintainable mess. I'm not prepared to work on a postmaster in which a step as basic as fork() happens in two different places depending on an #ifdef.

If you want to change them both, let's first see the reason why it's necessary.

			regards, tom lane
I said:
> We should either find a way to make the fork/exec path more like the
> existing code, or bite the bullet and change them both.

It occurs to me that you probably need some concrete suggestions about what to do. Here's one.

There are really three separate sections in this code: the postmaster (which no longer does much except fork off a subprocess on receipt of a new connection), the "sub-postmaster" which does client authentication, and the backend. Historical note: formerly the sub-postmaster actions were done in the parent postmaster process, with a sort of poor man's threading logic to allow the postmaster to interleave multiple client authentication operations. We had to change that because various libraries like SSL, PAM, and Kerberos weren't amenable to pseudo-threading.

It is worth maintaining a clean distinction between sub-postmaster and backend because when we launch a standalone backend, we don't want the sub-postmaster part of the code. The command line syntax accepted by PostgresMain() is largely designed for the convenience of the standalone backend case, with a couple of simple hacks for the under-postmaster case.

Now it seems to me that a lot of the mess in your latest patch comes from trying to set up the backend command string before you've forked, and therefore before you've done any of the sub-postmaster actions. That's not gonna work. What I'd suggest is adding a new entry point from main.c, say SubPostmasterMain(), with its own command line syntax. Perhaps

	postmaster -fork ... additional args ...

(where -fork is what cues main.c where to dispatch to, and the additional args are just those that need to be passed from the postmaster to the sub-postmaster, without any of the additional information that will be learned by the sub-postmaster during client authentication.)

In the Windows case the postmaster would fork/exec to this routine, which would re-load as much of the postmaster context as was needed and then call BackendFork (which we really oughta rename to something else). BackendFork would execute the client auth process and then simply call PostgresMain with a correctly constructed argument list. In the Unix case we don't bother compiling SubPostmasterMain, we just fork and go straight to BackendFork as the code does now.

In essence, this design makes SubPostmasterMain responsible for emulating Unix-style fork(), ie, fork without loss of static variable contents. It would need to reload as many variables as we needed. I think this is a cleaner approach since it creates a layer of code which is simply there in one case and not there in the other, rather than making arbitrary changes in the layers above and below depending on #ifdefs.

			regards, tom lane
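A minimal, self-contained sketch (not actual PostgreSQL source) of the dispatch described above: main() routes an exec'd child to a SubPostmasterMain() entry point when it sees "-fork" as the first argument, and that routine re-establishes the context a Unix fork() would have inherited before dropping into the same BackendFork() path the Unix case uses. The stub bodies and the exact argument layout are assumptions for illustration only.

#include <stdio.h>
#include <string.h>

/* Stand-in for the real BackendFork(): client auth + PostgresMain() */
static int
BackendFork(int argc, char *argv[])
{
	printf("sub-postmaster running with %d inherited argument(s)\n", argc - 2);
	return 0;
}

/*
 * Emulates Unix fork() for the exec'd child: reload whatever postmaster
 * context the child needs (GUCs, data directory, shmem attachment, ...)
 * from the arguments the postmaster marshalled, then continue exactly as
 * the Unix path does after fork().
 */
static int
SubPostmasterMain(int argc, char *argv[])
{
	printf("reloading postmaster context from the command line\n");
	return BackendFork(argc, argv);
}

int
main(int argc, char *argv[])
{
	/* "-fork" is what cues main() to dispatch to the sub-postmaster */
	if (argc > 1 && strcmp(argv[1], "-fork") == 0)
		return SubPostmasterMain(argc, argv);

	printf("normal postmaster / standalone-backend startup path\n");
	return 0;
}

In the Unix build this extra layer would simply not be compiled, per the message above.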
Thanks for your comments Tom.

Tom Lane writes:
> Claudio Natoli <claudio.natoli@memetrics.com> writes:
> > This has required some reworking of the existing code base, particularly
> > to BackendFork (which now actually does the fork()). As such, I've been
> > anticipating that this will be the most controversial of the fork/exec
> > patches, so critique away :-)
>
> You haven't explained why that's necessary. Given the fact that this
> patch seems to hugely complicate the postmaster logic --- not so much
> either path individually, but the messy #ifdef interleaving of two
> radically different programs --- I am inclined to reject it on style
> grounds alone.

I have to agree, as I wasn't particularly happy with the BackendFork section of this patch. You didn't really seem to comment on the pgstat changes, which were much cleaner, so perhaps (like me) you were happier with these changes.

So, could you give me an indication as to whether or not making changes to the BackendFork logic as done for the pgstat logic (specifically, replacing the fork() call with a new argument-marshalling routine to do the fork/exec) would be more acceptable? The reason I avoided this in the BackendFork case was, predominantly, code duplication. Creating a new "BackendForkExec" would, I think, result in a lot of similarity between the two routines.

[yes, BackendFork is poorly named, although I'd still think of moving the fork() call into BackendFork anyway, thereby justifying its name. But, much more importantly, for symmetry with the code format that will inevitably arise in the fork/exec case; see below]

> We should either find a way to make the fork/exec path more like the
> existing code, or bite the bullet and change them both. Doing it the
> way you have here will create an unmaintainable mess. I'm not prepared
> to work on a postmaster in which a step as basic as fork() happens in
> two different places depending on an #ifdef.

["unmaintainable mess": yeah, I had the exact same thought when I'd finished. "No one's ever going to be able to rework this properly".]

I'm afraid that, short of removing all SubPostmaster actions from BackendFork, the need to do fork in two different places (at least, physically, if not logically different) will remain. After fork/exec'ing, the child process can't recover the context needed to create the argument list which it'll need to build up the PostgresMain arg list.

So, in summary, I see two solutions:

a) do something similar to BackendFork as done for pgstat, specifically:
   - move the fork call into BackendFork
   - create a fork/exec equivalent to BackendFork
   - #ifdef the call to the appropriate function in BackendStartup
   PRO: child is forked in the same logical place
   CON: possible duplication in code between BackendFork + BackendForkExec

b) delay the fork() call to the end of BackendFork in both fork/exec and normal cases, turning it into solely an argument formatter for PostgresMain
   - move BackendInit, port->cmdline_options parsing + MemoryContext calls into PostgresMain
   PRO: forking in same (physical) location in code; no duplication
   CON: additional complexity at the start of PostgresMain to avoid these calls in the stand-alone backend case

Which is the lesser of two evils? FWIW, I prefer the latter, as it makes the normal + fork/exec cases effectively identical, for a little added complexity to PostgresMain.
There's also the alternative you outlined, but I think it is less workable than the above options:

> Now it seems to me that a lot of the mess in your latest patch comes
> from trying to set up the backend command string before you've forked,
> and therefore before you've done any of the sub-postmaster actions.
> That's not gonna work.

I agree that this is where the mess comes from, but I think this (setting up the backend command string before the fork) is *exactly* what needs to occur in the fork/exec case (leading into the Win32 CreateProcess calls). If we want this code to work in the Win32 case, we basically can't do any processing between the fork + exec calls, and I think recovering all the context to do the command string marshalling post fork/exec is untenable.

> What I'd suggest is adding a new entry point from main.c, say
> SubPostmasterMain(), with its own command line syntax. Perhaps
>	postmaster -fork ... additional args ...
> (where -fork is what cues main.c where to dispatch to, and the
> additional args are just those that need to be passed from the
> postmaster to the sub-postmaster, without any of the additional
> information that will be learned by the sub-postmaster during
> client authentication.)
>
> In the Windows case the postmaster would fork/exec to this routine,
> which would re-load as much of the postmaster context as was needed
> and then call BackendFork (which we really oughta rename to something
> else). BackendFork would execute the client auth process and then
> simply call PostgresMain with a correctly constructed argument list.

Here's the critical difference in our thinking: ISTM that we'll have to go to a lot of work to get the fork/exec case to re-load the context required to format up the argument list to PostgresMain. Certainly, it is a lot more work than allowing the postmaster itself (not the forked child) to create the argument list, albeit moving auth to PostgresMain.

Ok. So, in conclusion, we are in agreement on at least one thing: the current patch sucks. But I suspect we may have an impasse on a sole crucial point: setting up the backend command string before forking. I'm all for it, and you seem pretty much against it.

So, I guess I could try to bring you around with another, cleaner patch using one of the alternatives listed above. But, before doing that, I guess I should give you a chance to:
a) try to convince me otherwise
b) indicate which of the two alternatives you (even grudgingly :-) prefer (or another alternative in light of the above)

Again, thanks for your comments, and looking forward to your reply (so that, hopefully, I can get cracking on a new patch).

Cheers,
Claudio
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> Ok. So, in conclusion we are in agreement on at least one thing: the
> current patch sucks.

Good, I'm glad we see eye to eye on that ;-)

> Here's the critical difference in our thinking:
> ISTM that we'll have to go to a lot of work to get the fork/exec
> case to re-load the context required to format up the argument list to
> PostgresMain. Certainly, it is a lot more work than allowing the postmaster
> itself (not the forked child) to create the argument list, albeit moving
> auth to PostgresMain.

I don't follow your thinking here. The information that has to be reloaded *must* be passed across the fork/exec boundary in either case. For example, there is no alternative to telling the backend process where PGDATA is: it's got to be passed down some way or another. The Unix implementation assumes it can just inherit the value of a static variable, while the Windows version will have to do something else. I take no position here on whether we pass it as a command line item, or through a temporary file, or whatever: it has to be passed down.

The reason your patch is messy is that the PostgresMain command line string involves both information passed down from the postmaster and information acquired from the client's connection-request packet. We could consider redesigning that command line behavior, but we don't really want to since it would impact the standalone-backend case. So I'm proposing substituting two levels of command-line processing within the exec'd backend process. The first level is responsible for transferring information inherited from the postmaster (which we are going to have to do *anyway*, somewhere, and this provides a nice localized place to do it) and then the second level is fully compatible with the standalone-backend case and the Unix-fork case.

> I'm afraid that, short of removing all SubPostmaster actions from
> BackendFork, the need to do fork in two different places (at least,
> physically, if not logically different) will remain.

I am not by any means saying that the fork() call has to stay exactly where it is in the existing code. We clearly want to refactor things so that fork() is close to the invocation of FooMain() (or equivalently exec() in the Windows case). I think it's a historical artifact that these steps got separated in the existing code base.

> After fork/exec'ing,
> the child process can't recover the context needed to create the argument
> list which it'll need to build up the PostgresMain arg list.

If that's literally true, then we are screwed because things cannot work. Pardon me for doubting this statement. All that information *must* be available in the child, once it has analyzed the information it must be able to collect from the postmaster and performed the client authentication handshake.

> So, in summary, I see two solutions:

I still think there's a third answer: create an intermediate layer that's responsible for recovering the information that has to be inherited from the parent process. I don't say this requires zero rearrangement of existing code, I just say it's a more maintainable design than either of the alternatives you suggest.

In particular, this alternative is quite unworkable:

> b) delay the fork() call to the end of BackendFork in both fork/exec and
> normal cases, turning it into solely an argument formatter for PostgresMain

We can *not* postpone the fork() until after client authentication.
That loses all of the work that's been done since circa 7.1 to eliminate cases where the postmaster gets blocked waiting for a single client to respond. SSL, PAM, and I think Kerberos all create denial-of-service risks if we do that. I believe most of what you are presently struggling with revolves around the fact that the fork() got pushed up to happen before client authentication, while the interface to PostgresMain didn't change. I do not want to alter either of those decisions, however. What we need is to provide an alternative exec-able interface that encapsulates the processing that now occurs between these steps. regards, tom lane
> > Here's the critical difference in our thinking:
> > ISTM that we'll have to go to a lot of work to get the fork/exec
> > case to re-load the context required to format up the argument list to
> > PostgresMain. Certainly, it is a lot more work than allowing the postmaster
> > itself (not the forked child) to create the argument list, albeit moving
> > auth to PostgresMain.
>
> I don't follow your thinking here. The information that has to be
> reloaded *must* be passed across the fork/exec boundary in either case.
> For example, there is no alternative to telling the backend process
> where PGDATA is: it's got to be passed down some way or another.

Yes. But not all of it.

Sure, PGDATA is a trivial example of something that's gotta get across one way or another. I take no argument with that. How about things like, for instance, canAcceptConnections(). To make this call successfully in a fork/exec'd child would take a bit of work. Of course, there is a work-around... pass the value into BackendFork (or, in the fork/exec case, on the command line). But, ISTM that, do this a few times, and you might as well have just had the postmaster format up the argument list.

> The reason your patch is messy is that the PostgresMain command line
> string involves both information passed down from the postmaster and
> information acquired from the client's connection-request packet.

Hmm.. that's not the way I see it.

> > After fork/exec'ing, the child process can't recover the context needed to
> > create the argument list which it'll need to build up the PostgresMain arg list.
>
> If that's literally true, then we are screwed because things cannot
> work. Pardon me for doubting this statement. All that information
> *must* be available in the child, once it has analyzed the information
> it must be able to collect from the postmaster and performed
> the client authentication handshake.

Ok. That, in the context of where it was written, instead of "can't" should have read "can no longer ... [in the absence of passing a great deal of additional context]". Mea culpa. Anyway, moving along...

> In particular, this alternative is quite unworkable:
>
> > b) delay the fork() call to the end of BackendFork in both fork/exec and
> > normal cases, turning it into solely an argument formatter for PostgresMain
>
> We can *not* postpone the fork() until after client authentication.
> That loses all of the work that's been done since circa 7.1
> to eliminate cases where the postmaster gets blocked waiting for a
> single client to respond. SSL, PAM, and I think Kerberos all create
> denial-of-service risks if we do that.

Ah. OK! Now I see where we are talking at cross-purposes. Forking after client auth was NOT what I was trying to suggest at all.

Let's have a look at BackendFork. Here's what it needs to set up the arg list for PostgresMain, or otherwise uses:

 * PreAuthDelay
 * canAcceptConnections()
 * ExtraOptions
 * debugbuf
 * DataDir (in EXEC_BACKEND case)
   BackendInit (ie. client authentication)
   port->cmdline_options
   port->proto
   port->database_name
   port->user_name

What I was suggesting with b) was to format up the command line for the items prefixed by * in the postmaster, do the fork (or fork/exec), and then run the authentication in, say PostgresMain. (Note: this is essentially what the fork/exec case currently does).
Now, what I think you are suggesting (correct me if I'm wrong), is that we should pass along whatever we need to in order to be able to setup the fork/exec'd process to run BackendFork more or less unchanged.

I prefer the former option, as there is less duplication. Should anything be changed/added to the items required for BackendFork'ing, no changes would be required to the fork/exec code. This will not necessarily be true if we need to pass a bunch of (additional) context, as in the latter case.

ISTM this would let us make BackendFork *very* simple, and pretty much identical in both the normal and fork/exec cases, for the cost of pushing the authentication step across to PostgresMain (non stand-alone case), which is what the fork/exec case does now anyway!

What do you think?

Claudio
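A rough, self-contained sketch of the kind of postmaster-side argument marshalling being debated here: values the exec'd child can no longer compute for itself (for example, the result of canAcceptConnections()) get formatted onto the command line before the fork/exec. The flag names, buffer sizes, and sample values below are invented for illustration and are not from the patch.

#include <stdio.h>

#define MAXARGS 16

static char cac_buf[32];
static char delay_buf[32];
static char opt_buf[64];
static char dir_buf[256];

/* Build the child's argv in the *postmaster*, before fork()/exec(). */
static int
build_backend_argv(char *argv[], int cac, int pre_auth_delay,
                   const char *extra_options, const char *datadir)
{
	int		ac = 0;

	snprintf(cac_buf, sizeof(cac_buf), "--fork-cac=%d", cac);
	snprintf(delay_buf, sizeof(delay_buf), "--fork-preauthdelay=%d", pre_auth_delay);
	snprintf(opt_buf, sizeof(opt_buf), "-o%s", extra_options);
	snprintf(dir_buf, sizeof(dir_buf), "-D%s", datadir);

	argv[ac++] = "postgres";		/* child re-execs the same binary */
	argv[ac++] = cac_buf;			/* canAcceptConnections() result */
	argv[ac++] = delay_buf;			/* PreAuthDelay */
	argv[ac++] = opt_buf;			/* ExtraOptions */
	argv[ac++] = dir_buf;			/* DataDir */
	argv[ac] = NULL;				/* exec() expects a NULL terminator */
	return ac;
}

int
main(void)
{
	char	   *argv[MAXARGS];
	int			ac = build_backend_argv(argv, 0, 0, "-B 64", "/usr/local/pgsql/data");

	for (int i = 0; i < ac; i++)
		printf("argv[%d] = %s\n", i, argv[i]);
	return 0;
}

The mechanical shape of the marshalling step is the same either way; what actually gets passed down, and how much re-loading the child does instead, is the point of contention in the following messages.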
[patch edited + resubmitted for review; not for committing]

Hi Tom,

figuring that, on balance, you are in fact going to prefer to take a potential future hit in duplicated work (in passing context in the fork/exec case) over moving the client auth code to PostgresMain, I've just gone ahead and made a patch that implements your BackendFork ideas...

Please let me know:

* if the changes to the BackendFork / SubPostmasterMain logic are more or less what you envisaged, and if you are content with them [Note: we can also roll BackendInit back into BackendFork, making BackendFork (now BackendRun?) pretty much exactly as it was before the fork/exec changes began]

* if you are content with the above, whether or not you think I ought to do the same for the SSDataBase logic. I'm hoping for a negative, as the #ifdef logic is not as convoluted as that originally presented in BackendFork (ie. to me, it looks like overkill in this case), but anticipating otherwise :-)

Also:

* are you ok with the pgstat changes (I'm guessing yes, as you haven't mentioned them, and since these changes are pretty similar to what you suggested for BackendFork)

Cheers,
Claudio
Attachment
Claudio Natoli <claudio.natoli@memetrics.com> writes:
>> I don't follow your thinking here. The information that has to be
>> reloaded *must* be passed across the fork/exec boundary in
>> either case.

> Yes. But not all of it.

I'll throw that same argument back at you: some of the information needed for PostgresMain's command line is available to the postmaster. But not all of it. Therefore we have to change things around. I'm proposing solving this by introducing a new layer of code that's essentially separate from the non-fork-exec case. Your solution seems much messier, and it doesn't have any redeeming social value (that I can see) to justify the mess.

> How about things like, for instance, canAcceptConnections(). To make this
> call successfully in a fork/exec'd child would take a bit of work.

And your point is what? As long as you grant that the fork must occur before we begin talking to the client, this information will have to be passed down somehow (offhand, a command-line argument for the result of canAcceptConnections() seems fairly reasonable). Obscuring the division between "sub-postmaster" code and PostgresMain isn't going to save you from that.

> What I was suggesting with b) was to format up the command line for the
> items prefixed by * in the postmaster,
> do the fork (or fork/exec), and then run the authentication in, say
> PostgresMain. (Note: this is essentially what the fork/exec case currently
> does).

Yeah, I noticed. If I'd been paying more attention, the already-applied patch wouldn't have gotten in either, because it already has created what I consider an unacceptable amount of coupling between PostgresMain and the sub-postmaster code. If it's still like that when the dust settles then I will go in and change it myself. PostgresMain has no business doing *any* of the stuff that is currently #ifdef EXEC_BACKEND in postgres.c. It should certainly not be running authentication, and I see no reason for it to be messing with restoring state that should have been inherited from the postmaster. That just creates fragility and order-of-operations dependency.

As an example, it doesn't seem like a good idea to me that reloading postmaster GUC variables happens after scanning of PostgresMain command line options, because the option processing both uses and sets GUC values. It is not hard to point out three or four bugs associated with that alone, including security violations (since we don't know at this point whether the user is a superuser and thus don't know what he's allowed to set). In general, PostgresMain assumes it is started with the inherited environment already set up, and I don't see a good argument for changing that.

> Now, what I think you are suggesting (correct me if I'm wrong), is that we
> should pass along whatever we need to in order to be able to setup the
> fork/exec'd process to run BackendFork more or less unchanged.

I don't mind making localized changes like passing in the result of canAcceptConnections, but by and large, yes: I want to run BackendFork pretty much as-is. I see no strong reason to do differently, and I am convinced that we will spend the next six months ferreting out startup bugs if we make wholesale revisions in the order in which things are done.

			regards, tom lane
> > What I was suggesting with b) was to format up the command line for the
> > items prefixed by * in the postmaster,
> > do the fork (or fork/exec), and then run the authentication in, say
> > PostgresMain. (Note: this is essentially what the fork/exec case currently
> > does).
>
> Yeah, I noticed. If I'd been paying more attention, the already-applied
> patch wouldn't have gotten in either, because it already has created
> what I consider an unacceptable amount of coupling between PostgresMain
> and the sub-postmaster code. If it's still like that when the dust
> settles then I will go in and change it myself. PostgresMain has no
> business doing *any* of the stuff that is currently #ifdef
> EXEC_BACKEND in postgres.c.
> [snip]
>
> > Now, what I think you are suggesting (correct me if I'm wrong), is that we
> > should pass along whatever we need to in order to be able to setup the
> > fork/exec'd process to run BackendFork more or less unchanged.
>
> I don't mind making localized changes like passing in the result of
> canAcceptConnections, but by and large, yes: I want to run BackendFork
> pretty much as-is.

Check out the last patch I sent. It implements your ideas for BackendFork'ing in the fork/exec case, and pretty much lets us revert the previous changes to postgres.c and BackendFork (which I agree is a good thing).

Cheers,
Claudio
[minor corrections (some duplication in arg passing); still under discussion/review]

Cheers,
Claudio

> -----Original Message-----
> From: Claudio Natoli [mailto:claudio.natoli@memetrics.com]
> Sent: Saturday, 27 December 2003 9:27 PM
> To: 'Tom Lane'
> Cc: 'pgsql-patches@postgresql.org'
> Subject: Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization
>
> [patch edited + resubmitted for review; not for committing]
> [...]
Attachment
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------

Claudio Natoli wrote:
> [patch edited + resubmitted for review; not for committing]
>
> Hi Tom,
>
> figuring that, on balance, you are in fact going to prefer to take a
> potential future hit in duplicated work (in passing context in the fork/exec
> case) over moving the client auth code to PostgresMain, I've just gone ahead
> and made a patch that implements your BackendFork ideas...
> [...]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Claudio, where are we on this patch? I see an even newer version in the archives:

	http://archives.postgresql.org/pgsql-patches/2003-12/msg00361.php

However, I have never seen that version, and Google doesn't show it either:

	http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&group=comp.databases.postgresql.patches

Anyway, Tom has looked at your newest patch and it looks good to him. These are all marked not for application so we will just wait for you to finalize it.

---------------------------------------------------------------------------

Claudio Natoli wrote:
> [patch edited + resubmitted for review; not for committing]
>
> Hi Tom,
>
> figuring that, on balance, you are in fact going to prefer to take a
> potential future hit in duplicated work (in passing context in the fork/exec
> case) over moving the client auth code to PostgresMain, I've just gone ahead
> and made a patch that implements your BackendFork ideas...
> [...]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Hi Bruce,

> Claudio, where are we on this patch? I see an even newer version in the
> archives:
>
>	http://archives.postgresql.org/pgsql-patches/2003-12/msg00361.php

[Weird you and Google groups "missed" it!]

> Anyway, Tom has looked at your newest patch and it looks good to him.

I'm happy with the patch from the link above being committed if Tom has no more comments. Was only waiting for a final nod from him.

Once it is in the source tree, give me a couple days and I'll fire off a patch with the actual CreateProcess calls... and then we are off into Win32 signal land [shudder].

Cheers,
Claudio
Claudio Natoli wrote:
> Hi Bruce,
>
> > Claudio, where are we on this patch? I see an even newer version in the
> > archives:
> >
> >	http://archives.postgresql.org/pgsql-patches/2003-12/msg00361.php
>
> [Weird you and Google groups "missed" it!]

No, seems only _I_ missed it. The google display for groups:

	http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&group=comp.databases.postgresql.patches

doesn't show individual messages but lists threads, _and_ it shows the date of the last message in the thread, rather than the first. That's how I missed it. How I missed it in my mailbox, I have no idea. Sorry.

> > Anyway, Tom has looked at your newest patch and it looks good to him.
>
> I'm happy with the patch from the link above being committed if Tom has no
> more comments. Was only waiting for a final nod from him.

Thanks. Applied.

> Once it is in the source tree, give me a couple days and I'll fire off a
> patch with the actual CreateProcess calls... and then we are off into Win32
> signal land [shudder].

Great.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Claudio Natoli wrote:

Something after 2003/11/20 enhanced the query cancel handling. Namely, CVS tip now responds to a query cancel with a postmaster restart canceling all queries. Could the fork/exec stuff be responsible for this?

Jan

> Hi Bruce,
>
>> Claudio, where are we on this patch? I see an even newer version in the
>> archives:
>>
>>	http://archives.postgresql.org/pgsql-patches/2003-12/msg00361.php
>
> [Weird you and Google groups "missed" it!]
>
>> Anyway, Tom has looked at your newest patch and it looks good to him.
>
> I'm happy with the patch from the link above being committed if Tom has no
> more comments. Was only waiting for a final nod from him.
>
> Once it is in the source tree, give me a couple days and I'll fire off a
> patch with the actual CreateProcess calls... and then we are off into Win32
> signal land [shudder].
>
> Cheers,
> Claudio

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes:
> Something after 2003/11/20 enhanced the query cancel handling. Namely,
> CVS tip now responds to a query cancel with a postmaster restart
> canceling all queries. Could the fork/exec stuff be responsible for this?

Whoever changed this:

	status = ProcessStartupPacket(port, false);

	if (status != STATUS_OK)
		return 0;				/* cancel request processed, or error */

to this:

	status = ProcessStartupPacket(port, false);

	if (status != STATUS_OK)
	{
		ereport(LOG,
				(errmsg("connection startup failed")));
		proc_exit(status);
	}

is responsible. May we have an explanation of the thought process, if any?

			regards, tom lane
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > Something after 2003/11/20 enhanced the query cancel handling. Namely,
> > CVS tip now responds to a query cancel with a postmaster restart
> > canceling all queries. Could the fork/exec stuff be responsible for this?
>
> Whoever changed this:
>
>	status = ProcessStartupPacket(port, false);
>
>	if (status != STATUS_OK)
>		return 0;				/* cancel request processed, or error */
>
> to this:
>
>	status = ProcessStartupPacket(port, false);
>
>	if (status != STATUS_OK)
>	{
>		ereport(LOG,
>				(errmsg("connection startup failed")));
>		proc_exit(status);
>	}
>
> is responsible. May we have an explanation of the thought process,
> if any?

My guess is that this used to proc_exit the postgres backend, but now proc_exits the postmaster, but that is only a guess.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > My guess is that this used to proc_exit the postgres backend, but now > proc_exits the postmaster, but that is only a guess. No. This is post-fork (and had better remain so). The change causes the sub-postmaster that has just finished handling a cancel request to exit with nonzero status, which the postmaster then interprets (correctly) as a child process crash. BTW, how are we going to do cancels in Windows-land? The sub-postmaster isn't gonna have access to the postmaster's list of child PIDs and cancel keys ... regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > My guess is that this used to proc_exit the postgres backend, but now
> > proc_exits the postmaster, but that is only a guess.
>
> No. This is post-fork (and had better remain so). The change causes
> the sub-postmaster that has just finished handling a cancel request
> to exit with nonzero status, which the postmaster then interprets
> (correctly) as a child process crash.
>
> BTW, how are we going to do cancels in Windows-land? The sub-postmaster
> isn't gonna have access to the postmaster's list of child PIDs and
> cancel keys ...

I think the postmaster will have access to the cancel key and pid. It should work similarly to Unix. However, I do have a win32 TODO item on "test cancel key", so we will not forget about it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > My guess is that this used to proc_exit the postgres backend, but now > > proc_exits the postmaster, but that is only a guess. > > No. This is post-fork (and had better remain so). The change causes > the sub-postmaster that has just finished handling a cancel request > to exit with nonzero status, which the postmaster then interprets > (correctly) as a child process crash. > > BTW, how are we going to do cancels in Windows-land? The sub-postmaster > isn't gonna have access to the postmaster's list of child PIDs and > cancel keys ... When you say sub-postmaster, you mean we fork, then process the cancel request? Seems we might need special handling in there, yea. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> Whoever changed this:

My fault. Thanks for the catch Jan.

> 		proc_exit(status);

Should be proc_exit(0). My apologies,
Claudio

PS. I take it there is no regression test for this. Would this be a difficult one to knock up?
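In context, the corrected exit path reads as follows: this is the fragment quoted earlier in the thread with the one-line fix applied, shown in isolation rather than as compilable code.

	status = ProcessStartupPacket(port, false);

	if (status != STATUS_OK)
	{
		ereport(LOG,
				(errmsg("connection startup failed")));
		proc_exit(0);		/* was proc_exit(status); exit status 0 keeps
							 * the postmaster from treating a processed
							 * cancel request as a child crash */
	}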
Bruce Momjian <pgman@candle.pha.pa.us> writes: > When you say sub-postmaster, you mean we fork, then process the cancel > request? Seems we might need special handling in there, yea. We fork before we even read the request. Otherwise an attacker can trivially lock up the postmaster by sending a partial startup packet. I've just noticed another serious bit of breakage in CVS tip (though this might be fixed by Claudio's pending patch that reverts a lot of the code rearrangements). There is a reason why the 7.4 code does on_exit_reset *immediately* after fork(), and it's not acceptable to rearrange the code so that that happens at some random later time. Otherwise, any problem in between leads to the child process executing the postmaster's on_exit routines, with such pleasant side effects as destroying the shmem segment. For similar reasons, you don't get to postpone setting IsUnderPostmaster true --- elog pays attention to the value of that flag, and will do the wrong thing if called in a child process that doesn't yet have it set. regards, tom lane
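Schematically, the ordering Tom describes looks like this inside the forking path. This is an illustrative fragment, not the actual postmaster code; on_exit_reset() and IsUnderPostmaster are the real names cited above, everything else is sketch.

	pid = fork();

	if (pid == 0)					/* child */
	{
		IsUnderPostmaster = true;	/* elog/ereport consults this flag, so set
									 * it before anything can report an error */
		on_exit_reset();			/* never run the postmaster's on_exit
									 * routines in the child --- they would,
									 * for example, destroy the shmem segment */

		/* ... only now is it safe to do work that might ereport and exit ... */
	}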
> BTW, how are we going to do cancels in Windows-land? The sub-postmaster
> isn't gonna have access to the postmaster's list of child PIDs and
> cancel keys ...

Good question (the Win32/EXEC_BACKEND case is #def'd out to issue an altogether unhelpful abort(), so I know it is there).

The only things I've thought of so far are:
a) sticking the PID/cancel key list in shared mem [yeech]
b) rearranging the entire cancel handling to occur in the postmaster [double yeech]

Any better ideas?

Claudio
Tom Lane wrote:
> I've just noticed another serious bit of breakage in CVS tip (though
> this might be fixed by Claudio's pending patch that reverts a lot of
> the code rearrangements).

It's already been applied. Your push in the SubPostmaster direction was really useful; a lot cleaner, and fixed things that I wasn't even familiar enough with the code to know were broken.

Cheers,
Claudio
Claudio Natoli wrote: > > > BTW, how are we going to do cancels in Windows-land? The sub-postmaster > > isn't gonna have access to the postmaster's list of child PIDs and > > cancel keys ... > > Good question (the Win32/EXEC_BACKEND case is #def'd out to issue an > altogether unhelpful abort(), so I know it is there). > > The only things I've thought of so far are: > a) sticking the PID/cancel key list in shared mem [yeech] > b) rearranging the entire cancel handling to occur in the postmaster [double > yeech] As I remember, the only per-backend value to be passed is the cancel key, and seeing that this is going to be a problem for postmaster too, I think we need to move in the direction of a separate fork/exec-only shared memory segment that holds the pids and cancel keys for all the backends. I will update the Win32 TODO list to mention this issue in more detail. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> Claudio Natoli wrote: >> The only things I've thought of so far are: >> a) sticking the PID/cancel key list in shared mem [yeech] >> b) rearranging the entire cancel handling to occur in the postmaster [double >> yeech] (a) seems like the lesser of the available evils (unless someone has a bright idea for a plan C). Bruce Momjian <pgman@candle.pha.pa.us> writes: > I think we need to move in the direction of a separate fork/exec-only > shared memory segment that holds the pids and cancel keys for all the > backends. That doesn't seem worth the trouble. I'd be inclined to just stick the cancel keys in the PGPROC structures (I believe the PIDs are already in there). The original motivation for keeping them only in postmaster local memory was to keep backend A's cancel key away from the prying eyes of backend B, but is there really any security added? Anyone who can inspect PGPROC hardly needs to know the cancel key --- he can just issue a SIGINT (or worse) directly to the target backend. regards, tom lane
Tom Lane wrote: > > Claudio Natoli wrote: > >> The only things I've thought of so far are: > >> a) sticking the PID/cancel key list in shared mem [yeech] > >> b) rearranging the entire cancel handling to occur in the postmaster [double > >> yeech] > > (a) seems like the lesser of the available evils (unless someone has a > bright idea for a plan C). > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I think we need to move in the direction of a separate fork/exec-only > > shared memory segment that holds the pids and cancel keys for all the > > backends. > > That doesn't seem worth the trouble. I'd be inclined to just stick the > cancel keys in the PGPROC structures (I believe the PIDs are already in > there). The original motivation for keeping them only in postmaster > local memory was to keep backend A's cancel key away from the prying > eyes of backend B, but is there really any security added? Anyone who > can inspect PGPROC hardly needs to know the cancel key --- he can just > issue a SIGINT (or worse) directly to the target backend. Agreed. I was going for a separate one just to be paranoid. This will only be done for exec(), so I don't see a problem for normal Unix use anyway. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > Tom Lane wrote: >> > Claudio Natoli wrote: >> >> The only things I've thought of so far are: >> >> a) sticking the PID/cancel key list in shared mem [yeech] >> >> b) rearranging the entire cancel handling to occur in the postmaster [double >> >> yeech] >> >> (a) seems like the lesser of the available evils (unless someone has a >> bright idea for a plan C). >> >> Bruce Momjian <pgman@candle.pha.pa.us> writes: >> > I think we need to move in the direction of a separate fork/exec-only >> > shared memory segment that holds the pids and cancel keys for all the >> > backends. >> >> That doesn't seem worth the trouble. I'd be inclined to just stick the >> cancel keys in the PGPROC structures (I believe the PIDs are already in >> there). The original motivation for keeping them only in postmaster >> local memory was to keep backend A's cancel key away from the prying >> eyes of backend B, but is there really any security added? Anyone who >> can inspect PGPROC hardly needs to know the cancel key --- he can just >> issue a SIGINT (or worse) directly to the target backend. > > Agreed. I was going for a separate one just to be paranoid. This will > only be done for exec(), so I don't see a problem for normal Unix use > anyway. > It doesn't hurt to keep the locations and code as much in sync as possible. I think Tom's idea to move the information into the PGPROC entry is the winner and does not need any OS specific handling. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes: > It doesn't hurt to keep the locations and code as much in sync as > possible. I think Tom's idea to move the information into the PGPROC > entry is the winner and does not need any OS specific handling. Actually, on further reflection a separate array to store PIDs and cancel keys is probably a better idea. If we put this stuff in PGPROC then the postmaster will need to be able to obtain the ProcStructLock (or whatever it's called this week) in order to examine/modify that data structure. That gets us into the usual concerns about backend bugs locking up the postmaster, etc. But if it's a separate array then we can just have the rule that no one but the postmaster gets to write in it. I still think it's unnecessary to make a separate shmem segment for it, though. regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: >> It doesn't hurt to keep the locations and code as much in sync as >> possible. I think Tom's idea to move the information into the PGPROC >> entry is the winner and does not need any OS specific handling. > > Actually, on further reflection a separate array to store PIDs and > cancel keys is probably a better idea. If we put this stuff in PGPROC > then the postmaster will need to be able to obtain the ProcStructLock > (or whatever it's called this week) in order to examine/modify that > data structure. That gets us into the usual concerns about backend bugs > locking up the postmaster, etc. But if it's a separate array then we > can just have the rule that no one but the postmaster gets to write in it. > > I still think it's unnecessary to make a separate shmem segment for it, > though. I'd like to avoid the additional shmem segment if possible. The postmaster can keep the stuff he needs in local memory. I did not mean to rip everything out of postmaster local memory, and that little bit of redundancy does not hurt. The pid's of processes aren't likely to change very often. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Tom Lane writes:
> Actually, on further reflection a separate array to store PIDs and
> cancel keys is probably a better idea.
[snip]
> I still think it's unnecessary to make a separate shmem segment for it,
> though.

Why is that? Don't we need the backends to have access to it to do a cancel request? I think I've missed something here...

Claudio
Claudio Natoli wrote:
> Tom Lane writes:
> > Actually, on further reflection a separate array to store PIDs and
> > cancel keys is probably a better idea.
> [snip]
> > I still think it's unnecessary to make a separate shmem segment for it,
> > though.
>
> Why is that? Don't we need the backends to have access to it to do a cancel
> request? I think I've missed something here...

I think they are saying put the cancel key inside the existing shared memory segment. I don't know when we actually attach to the main shared memory segment in the child, but it would have to be sooner than when we need the cancel key.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> I think they are saying put the cancel key inside the existing shared
> memory segment.

Ok. Thanks.

> I don't know when we actually attach to the main shared memory segment
> in the child, but it would have to be sooner than when we need the
> cancel key.

Yes. Currently, it happens too late.

I'll put my hand up to have a go at fixing this (and getting processCancelRequest to work under Win32/EXEC_BACKEND at the same time), if no one else particularly cares to.

Just to be clear, this would involve turning the BackendList dllist into an array in shared memory, right? If so, a couple of questions:

- what is a suitably large size for this array (2 * MaxBackends, ala canAcceptConnections?)
- the postmaster makes all calls referencing this list, with the exception of processCancelRequest, correct? So, as Tom was alluding to, no locking is required (or desired!), and we'll just need to be careful not to introduce a race condition into processCancelRequest.

Cheers,
Claudio
Bruce Momjian wrote:
> Claudio Natoli wrote:
>> Tom Lane writes:
>> > Actually, on further reflection a separate array to store PIDs and
>> > cancel keys is probably a better idea.
>> [snip]
>> > I still think it's unnecessary to make a separate shmem segment for it,
>> > though.
>>
>> Why is that? Don't we need the backends to have access to it to do a cancel
>> request? I think I've missed something here...
>
> I think they are saying put the cancel key inside the existing shared
> memory segment. I don't know when we actually attach to the main shared
> memory segment in the child, but it would have to be sooner than when we
> need the cancel key.

I said move it into the PGPROC structure. And keep the pid in both the PGPROC structure and postmaster local memory.

The backend attaches to the shared memory during AttachSharedMemoryAndSemaphores() ... where else?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Claudio Natoli wrote:
> > I think they are saying put the cancel key inside the existing shared
> > memory segment.
>
> Ok. Thanks.
>
> > I don't know when we actually attach to the main shared memory segment
> > in the child, but it would have to be sooner than when we need the
> > cancel key.
>
> Yes. Currently, it happens too late.
>
> I'll put my hand up to have a go at fixing this (and getting
> processCancelRequest to work under Win32/EXEC_BACKEND at the same time), if
> no one else particularly cares to.
>
> Just to be clear, this would involve turning the BackendList dllist into an
> array in shared memory, right? If so, a couple of questions:
> - what is a suitably large size for this array (2 * MaxBackends, ala
> canAcceptConnections?)
> - the postmaster makes all calls referencing this list, with the exception
> of processCancelRequest, correct? So, as Tom was alluding to, no locking is
> required (or desired!), and we'll just need to be careful not to introduce a
> race condition into processCancelRequest.

I assumed a much simpler solution. I thought we would just have:

	struct {
		pid_t	pid;
		int		cancel_key;
	} PidCancel[maxbackend];

in shared memory and we would just sequentially scan looking for a pid match? Is that wrong?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
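A small, self-contained sketch of the array Bruce describes, together with the sequential PID scan a cancel request would use. The array size, names, and the locking questions raised elsewhere in the thread are glossed over here; in the real design the array would live in shared memory.

#include <stdbool.h>
#include <sys/types.h>

typedef struct
{
	pid_t	pid;				/* 0 means "slot unused" */
	long	cancel_key;
} PidCancelSlot;

#define N_PIDCANCEL_SLOTS 64	/* stands in for a MaxBackends-derived size */

static PidCancelSlot PidCancel[N_PIDCANCEL_SLOTS];

/* Sequentially scan for a pid match and verify the presented cancel key. */
static bool
cancel_key_matches(pid_t backend_pid, long presented_key)
{
	for (int i = 0; i < N_PIDCANCEL_SLOTS; i++)
	{
		if (PidCancel[i].pid == backend_pid)
			return PidCancel[i].cancel_key == presented_key;
	}
	return false;				/* no such backend */
}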
I wrote:
> Just to be clear, this would involve turning the BackendList dllist into
> an array in shared memory, right? If so, a couple of questions:

Bruce Momjian wrote:
> I assumed a much simpler solution. I thought we would just have:
>
>	struct {
>		pid_t	pid;
>		int		cancel_key;
>	} PidCancel[maxbackend];
>
> in shared memory and we would just sequentially scan looking for a pid
> match? Is that wrong?

Isn't that basically "turning the BackendList dllist into an array in shared memory"? And I don't think that an array length of maxbackend is enough.

Cheers,
Claudio
Claudio Natoli wrote:
> I wrote:
> > Just to be clear, this would involve turning the BackendList dllist into
> > an array in shared memory, right? If so, a couple of questions:
>
> Bruce Momjian wrote:
> > I assumed a much simpler solution. I thought we would just have:
> >
> >	struct {
> >		pid_t	pid;
> >		int		cancel_key;
> >	} PidCancel[maxbackend];
> >
> > in shared memory and we would just sequentially scan looking for a pid
> > match? Is that wrong?
>
> Isn't that basically "turning the BackendList dllist into an array in
> shared memory"? And I don't think that an array length of maxbackend is
> enough.

Uh, why would you need more than maxbackends? Can't it be indexed by slot number --- each backend has a slot? Maybe I am missing something.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> Just to be clear, this would involve turning the BackendList Dllist into
> an array in shared memory, right? If so, a couple of questions:

Per Jan's comment, there is no need to mess with the existing
datastructure.  I'd think more of *copying* the dllist into some array
in shared memory.  This is code that would only need to exist in the
Windows port.

> - the postmaster makes all calls referencing this list, with the
>   exception of processCancelRequest, correct?

The postmaster writes the array (and really would have no need to read
it).  Sub-postmasters would need to read the array, either to check an
incoming cancel request or to get the current-session key value to pass
back to the client.  I don't think that PostgresMain or any subsidiary
routine would ever need to touch it.

			regards, tom lane
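As a sketch of the sub-postmaster read side described here, and building on the hypothetical LookupCancelKey helper above: only a matching PID/key pair results in a signal to the target backend, and a bogus request is silently ignored, which is how the postmaster treats bad cancel requests anyway. SIGINT is the signal PostgreSQL uses for query cancel; everything else here (names, parameters) is illustrative.

	#include <sys/types.h>
	#include <signal.h>

	/*
	 * Sub-postmaster side: act on an incoming cancel request by checking
	 * the shared array and signalling the backend only on an exact match.
	 */
	static void
	CheckCancelRequest(pid_t backendPID, long requestKey)
	{
		long		storedKey;

		if (LookupCancelKey(backendPID, &storedKey) &&
			storedKey == requestKey)
			kill(backendPID, SIGINT);	/* SIGINT asks the backend to cancel */
	}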
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Uh, why would you need more than maxbackends?  Can't it be indexed by
> slot number --- each backend has a slot?  Maybe I am missing something.

The postmaster has no way to know what slot number each backend will
get.  For that matter, a sub-postmaster doesn't know yet either.

I think the simplest way to make this work is to use an array that's
2*MaxBackend items long (corresponding to the max number of children the
postmaster will fork).  Establish the convention that unused entries are
zero.  Then:

1. On forking a child, the postmaster scans the array for a free (zero)
slot, and stashes the cancel key and PID there (in that order).

2. On receiving a child-termination report, the postmaster scans the
array for the corresponding entry, and zeroes it out (PID first).

(Obviously these algorithms could be improved if they turn out to be
bottlenecks, but for the first cut KISS is applicable.)

3. To find or check a cancel key, a sub-postmaster scans the array
looking for the desired PID (either its own, or the one it got from an
incoming cancel request message).

There is a potential race condition if a sub-postmaster scans the array
before the postmaster has been able to store its PID there.  I think it
is sufficient for the sub-postmaster to sleep a few milliseconds and try
again if it can't find its own PID in the array.

There is no race condition possible for the ProcessCancelRequest case ---
the sub-postmaster that spawned an active backend must have found its
entry before it could have sent the cancel key to the client, so any
valid cancel request from a client must reference an already-existing
entry in the array.

			regards, tom lane
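A rough sketch of the three steps just described, reusing the hypothetical PidCancel array and LookupCancelKey from the earlier sketches. The function names are invented, and the retry delay uses pg_usleep; whatever sleep primitive the tree of that era offered could stand in for it. The write ordering (key before PID on store, PID first on clear) follows the ordering stated above so a concurrent reader never sees a PID paired with a stale key.

	#include <unistd.h>

	/* Postmaster side, step 1: claim a free (zero) slot on fork. */
	static void
	StoreCancelInfo(pid_t pid, long cancel_key)
	{
		int			i;

		for (i = 0; i < NumPidCancelSlots; i++)
		{
			if (PidCancel[i].pid == 0)
			{
				PidCancel[i].cancel_key = cancel_key;	/* key first */
				PidCancel[i].pid = pid;					/* then PID */
				return;
			}
		}
		/* unreachable if the array is sized for the maximum child count */
	}

	/* Postmaster side, step 2: zero the entry on child termination. */
	static void
	ClearCancelInfo(pid_t pid)
	{
		int			i;

		for (i = 0; i < NumPidCancelSlots; i++)
		{
			if (PidCancel[i].pid == pid)
			{
				PidCancel[i].pid = 0;			/* PID first */
				PidCancel[i].cancel_key = 0;	/* then key */
				return;
			}
		}
	}

	/*
	 * Sub-postmaster side, step 3 (own-key case): retry briefly in case
	 * the postmaster has not stored our PID yet.
	 */
	static long
	GetMyCancelKey(void)
	{
		long		key;

		while (!LookupCancelKey(getpid(), &key))
			pg_usleep(10000L);		/* wait ~10ms, then rescan */
		return key;
	}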
Tom Lane:
> Per Jan's comment, there is no need to mess with the existing
> datastructure.  I'd think more of *copying* the dllist into some array
> in shared memory.  This is code that would only need to exist in the
> Windows port.

(I thought Jan was referring to the PGPROC struct)

This just seems a little odd to me. I mean, they are going to be
basically identical (they'll even use the same struct!).

Also, let's get back to why we want this: to handle processCancelRequest
in the Win32 case. If this array is in Windows only, then we'll obviously
need two implementations of the processCancelRequest logic.

Or I'm missing something...

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
http://www.memetrics.com/emailpolicy.html
Tom Lane wrote:
> I think the simplest way to make this work is to use an array that's
> 2*MaxBackend items long (corresponding to the max number of children
> the postmaster will fork).

Actually, now that I think about it, is even that big enough?  There is
a reason BackendList is a list.  In pathological situations, the
postmaster could be made to fork a much larger number than 2*MaxBackend
simultaneous children, although only this many will be allowed to become
backends.

(I guess we could check the port->canAcceptConnections value, and not
add the backend to the array when == CAC_TOOMANY.)

> There is no race condition possible for the ProcessCancelRequest case ---
> the sub-postmaster that spawned an active backend must have found its
> entry before it could have sent the cancel key to the client, so any
> valid cancel request from a client must reference an already-existing
> entry in the array.

Makes sense. Great.

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
http://www.memetrics.com/emailpolicy.html
Claudio Natoli wrote:
> Tom Lane:
> > Per Jan's comment, there is no need to mess with the existing
> > datastructure.  I'd think more of *copying* the dllist into some array
> > in shared memory.  This is code that would only need to exist in the
> > Windows port.
>
> (I thought Jan was referring to the PGPROC struct)
>
> This just seems a little odd to me. I mean, they are going to be
> basically identical (they'll even use the same struct!).

Seems you should use an array of "struct bkend" or "Backend" as storage
in shared memory.  I think the problem with Dllist is that it is
dynamically allocated and I don't see how that could be done cleanly in
shared memory.  I think that's why the array idea seems best.

> Also, let's get back to why we want this: to handle
> processCancelRequest in the Win32 case. If this array is in Windows
> only, then we'll obviously need two implementations of the
> processCancelRequest logic.
>
> Or I'm missing something...

Looking up the key would have to be different for fork and fork/exec.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian:
> > Tom Lane:
> > > Per Jan's comment, there is no need to mess with the existing
> > > datastructure.  I'd think more of *copying* the dllist into some
> > > array in shared memory.  This is code that would only need to exist
> > > in the Windows port.
> >
> > (I thought Jan was referring to the PGPROC struct)
> >
> > This just seems a little odd to me. I mean, they are going to be
> > basically identical (they'll even use the same struct!).
>
> Seems you should use an array of "struct bkend" or "Backend" as storage
> in shared memory.  I think the problem with Dllist is that it is
> dynamically allocated and I don't see how that could be done cleanly in
> shared memory.  I think that's why the array idea seems best.

I wasn't suggesting that we somehow put the Dllist in shared memory. I
was advocating completely replacing the BackendList Dllist with this
array, under both the Win32 and *nix cases.

(Does what I've been saying sound different in this light?)

AFAICS, dumping the BackendList Dllist, and just completely replacing it
with "PidCancel[]" in shared memory, would be a pretty small hit, and
give us the same code under *nix + Win32.

> Looking up the key would have to be different for fork and fork/exec.

Only if we choose to have this new PidCancel array, in addition to the
existing BackendList, which looks to me to be entirely redundant.

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
http://www.memetrics.com/emailpolicy.html
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> Tom Lane:
>> Per Jan's comment, there is no need to mess with the existing
>> datastructure.  I'd think more of *copying* the dllist into some array
>> in shared memory.  This is code that would only need to exist in the
>> Windows port.

> Also, let's get back to why we want this: to handle
> processCancelRequest in the Win32 case. If this array is in Windows
> only, then we'll obviously need two implementations of the
> processCancelRequest logic.

[ shrug... ]  If that's the only redundant code we end up with in this
port, we'll have done pretty durn well.

What it comes down to is I don't want the postmaster to be keeping its
own state in shared memory --- that is, the array must be write-only
memory as far as the postmaster is concerned.  If we eliminate the
postmaster's private Dllist of backends, then the postmaster becomes
that much more vulnerable to corruption from a backend bug that leads to
trashing shared memory.  To take just one example: backend A goes nuts,
zeroes the whole shmem segment, and then dumps core.  How is the
postmaster going to kill the other backends so that it can begin the
recovery process?  If there's no record of their PIDs left anywhere,
it's got a problem.  The postmaster *needs* its own copy of that Dllist.

You might object that backend bugs could clobber the array and thus
interfere with cancel request processing, but that's not nearly as
critical a function.

			regards, tom lane
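One way to read this constraint in code terms, as a sketch only: the postmaster keeps its private Dllist as its source of truth and merely mirrors each child's PID and cancel key into the write-only shared array. The Backend struct below is a simplified stand-in, RememberBackend is invented, and the Dllist helpers are assumed to be the ones from lib/dllist.h; StoreCancelInfo is the hypothetical routine from the earlier sketch.

	#include "lib/dllist.h"

	/* Simplified stand-in for the postmaster's per-child bookkeeping. */
	typedef struct bkend
	{
		pid_t	pid;
		long	cancel_key;
	} Backend;

	extern Dllist *BackendList;		/* postmaster-local, exactly as today */

	/*
	 * Record a newly forked child.  The private list survives any
	 * shared-memory trashing; the shared array is never read back.
	 */
	static void
	RememberBackend(Backend *bn)
	{
		DLAddHead(BackendList, DLNewElem(bn));		/* private copy */
		StoreCancelInfo(bn->pid, bn->cancel_key);	/* write-only mirror */
	}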
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> Tom Lane wrote:
>> I think the simplest way to make this work is to use an array that's
>> 2*MaxBackend items long (corresponding to the max number of children
>> the postmaster will fork).

> Actually, now that I think about it, is even that big enough?  There is
> a reason BackendList is a list.  In pathological situations, the
> postmaster could be made to fork a much larger number than 2*MaxBackend
> simultaneous children, although only this many will be allowed to
> become backends.
> (I guess we could check the port->canAcceptConnections value, and not
> add the backend to the array when == CAC_TOOMANY.)

Probably we could rearrange the logic to test for too-many-children
before we even do the fork.  It'd be a little uglier that way, but not
out of the question.

			regards, tom lane
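A hedged sketch of what testing for too-many-children before the fork could look like, using the canAcceptConnections()/CAC_TOOMANY names already mentioned in the thread. The OkToForkChild helper is invented, the surrounding control flow is assumed, and the "ugly" part Tom mentions is that the postmaster itself would then have to send the "sorry, too many clients" error before closing the socket.

	/*
	 * Decide before forking whether this connection could ever become a
	 * backend.  Refusing CAC_TOOMANY here keeps the number of live
	 * children bounded, so a fixed-size shared array cannot overflow.
	 */
	static bool
	OkToForkChild(Port *port)
	{
		port->canAcceptConnections = canAcceptConnections();

		if (port->canAcceptConnections == CAC_TOOMANY)
		{
			/*
			 * The postmaster would have to report the refusal to the
			 * client itself here, since no child will be forked to do it.
			 */
			return false;
		}
		return true;
	}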
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> I wasn't suggesting that we somehow put the Dllist in shared memory. I
> was advocating completely replacing the BackendList Dllist with this
> array, under both the Win32 and *nix cases.

Right, that's what I understood you to be proposing, and I dislike it on
robustness grounds.  One of the most important functions of the
postmaster is to survive backend crashes --- as soon as you put any of
its state where backends could clobber it, you compromise that function.

			regards, tom lane
Tom Lane wrote:
> What it comes down to is I don't want the postmaster to be keeping its
> own state in shared memory --- that is, the array must be write-only
> memory as far as the postmaster is concerned.  If we eliminate the

Ok. Well, I can appreciate things from that point of view.

> postmaster's private Dllist of backends, then the postmaster becomes
> that much more vulnerable to corruption from a backend bug that leads to
> trashing shared memory.  To take just one example: backend A goes nuts,
> zeroes the whole shmem segment, and then dumps core.  How is the
> postmaster going to kill the other backends so that it can begin the
> recovery process?  If there's no record of their PIDs left anywhere,
> it's got a problem.  The postmaster *needs* its own copy of that Dllist.
>
> You might object that backend bugs could clobber the array and thus
> interfere with cancel request processing, but that's not nearly as
> critical a function.

Actually, if I was going to argue anything, I'd say that if a backend
goes nuts and zeroes the whole shmem segment you've probably got some
bigger things to worry about (Aside: Would postgres actually recover from
such an occurrence? BTW, I'd be pretty impressed if it did, but not all
that surprised ;-).

But I'll happily concede the point, and prove it by knocking up a patch
for it over the weekend (unless anyone else particularly wants to).

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
http://www.memetrics.com/emailpolicy.html
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> Actually, if I was going to argue anything, I'd say that if a backend
> goes nuts and zeroes the whole shmem segment you've probably got some
> bigger things to worry about (Aside: Would postgres actually recover
> from such an occurrence? BTW, I'd be pretty impressed if it did, but not
> all that surprised ;-).

It should, although there are limits (for instance, if someone is
actively writing out a page of WAL at the same time the bogus backend
comes by and zeroes that buffer, you might lose WAL entries for
already-committed transactions, which would be unhappy-making).

As a developer, though, I crash backends all the time, and I can say
that this mechanism is indeed pretty robust.  The postmaster never goes
down (what, never?  well, hardly ever) and it's *extremely* seldom that
a crash results in on-disk corruption, because the postmaster generally
manages to kill the other backends before any corruption in shared
memory gets propagated to disk.

			regards, tom lane
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > Something after 2003/11/20 enhanced the query cancel handling.
> > Namely, CVS tip now responds to a query cancel with a postmaster
> > restart canceling all queries.  Could the fork/exec stuff be
> > responsible for this?
>
> Whoever changed this:
>
> 		status = ProcessStartupPacket(port, false);
>
> 		if (status != STATUS_OK)
> 			return 0;			/* cancel request processed, or error */
>
> to this:
>
> 		status = ProcessStartupPacket(port, false);
>
> 		if (status != STATUS_OK)
> 		{
> 			ereport(LOG,
> 					(errmsg("connection startup failed")));
> 			proc_exit(status);
> 		}
>
> is responsible.  May we have an explanation of the thought process,
> if any?

Claudio specified the attached fix, which I have applied.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > Something after 2003/11/20 enhanced the query cancel handling.
> > Namely, CVS tip now responds to a query cancel with a postmaster
> > restart canceling all queries.  Could the fork/exec stuff be
> > responsible for this?
>
> Whoever changed this:
>
> 		status = ProcessStartupPacket(port, false);
>
> 		if (status != STATUS_OK)
> 			return 0;			/* cancel request processed, or error */
>
> to this:
>
> 		status = ProcessStartupPacket(port, false);
>
> 		if (status != STATUS_OK)
> 		{
> 			ereport(LOG,
> 					(errmsg("connection startup failed")));
> 			proc_exit(status);
> 		}
>
> is responsible.  May we have an explanation of the thought process,
> if any?

Claudio specified the attached fix, which I have applied (this time).

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/postmaster/postmaster.c,v
retrieving revision 1.355
diff -c -c -r1.355 postmaster.c
*** src/backend/postmaster/postmaster.c	7 Jan 2004 18:56:27 -0000	1.355
--- src/backend/postmaster/postmaster.c	9 Jan 2004 23:08:01 -0000
***************
*** 2450,2456 ****
  	{
  		ereport(LOG,
  				(errmsg("connection startup failed")));
! 		proc_exit(status);
  	}
  
  	/*
--- 2450,2456 ----
  	{
  		ereport(LOG,
  				(errmsg("connection startup failed")));
! 		proc_exit(0);
  	}
  
  	/*
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Claudio specified the attached fix, which I have applied (this time).

The ereport must vanish back into its black hole, also.
ProcessStartupPacket has already issued any appropriate log message.

> *** 2450,2456 ****
>   	{
>   		ereport(LOG,
>   				(errmsg("connection startup failed")));
> ! 		proc_exit(status);
>   	}
>
>   	/*
> --- 2450,2456 ----
>   	{
>   		ereport(LOG,
>   				(errmsg("connection startup failed")));
> ! 		proc_exit(0);
>   	}
>
>   	/*

			regards, tom lane
Done and attached.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Claudio specified the attached fix, which I have applied (this time).
>
> The ereport must vanish back into its black hole, also.
> ProcessStartupPacket has already issued any appropriate log message.
>
> > *** 2450,2456 ****
> >   	{
> >   		ereport(LOG,
> >   				(errmsg("connection startup failed")));
> > ! 		proc_exit(status);
> >   	}
> >
> >   	/*
> > --- 2450,2456 ----
> >   	{
> >   		ereport(LOG,
> >   				(errmsg("connection startup failed")));
> > ! 		proc_exit(0);
> >   	}
> >
> >   	/*
>
> 			regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/postmaster/postmaster.c,v
retrieving revision 1.356
diff -c -c -r1.356 postmaster.c
*** src/backend/postmaster/postmaster.c	9 Jan 2004 23:11:39 -0000	1.356
--- src/backend/postmaster/postmaster.c	9 Jan 2004 23:25:36 -0000
***************
*** 2447,2457 ****
  	status = ProcessStartupPacket(port, false);
  
  	if (status != STATUS_OK)
- 	{
- 		ereport(LOG,
- 				(errmsg("connection startup failed")));
  		proc_exit(0);
- 	}
  
  	/*
  	 * Now that we have the user and database name, we can set the process
--- 2447,2453 ----