Thread: PostmasterContext survives into parallel workers!?
I noticed $subject while fooling around with the tqueue.c memory leak issues. This does not seem like a good idea to me. At the very least, it's a waste of space that could be used for something else, and at the worst, it might be a security issue because it leaves security-sensitive pg_hba and pg_ident information laying about in places where it might be recoverable (if only through memory-disclosure bugs, which we've had before and no doubt will have again). The reason is that the parallel worker launch path contains no equivalent of PostgresMain's stanza if (PostmasterContext){ MemoryContextDelete(PostmasterContext); PostmasterContext = NULL;} Now, I'm undecided whether to flush that context only in parallel workers, or to try to make it go away for all bgworkers of any stripe. The latter seems a little better from a security standpoint, but I wonder if anyone has a use-case where that'd be a bad idea? regards, tom lane
On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I noticed $subject while fooling around with the tqueue.c memory leak > issues. This does not seem like a good idea to me. At the very least, > it's a waste of space that could be used for something else, and at the > worst, it might be a security issue because it leaves security-sensitive > pg_hba and pg_ident information laying about in places where it might be > recoverable (if only through memory-disclosure bugs, which we've had > before and no doubt will have again). > > The reason is that the parallel worker launch path contains no equivalent > of PostgresMain's stanza > > if (PostmasterContext) > { > MemoryContextDelete(PostmasterContext); > PostmasterContext = NULL; > } > > Now, I'm undecided whether to flush that context only in parallel workers, > or to try to make it go away for all bgworkers of any stripe. The latter > seems a little better from a security standpoint, but I wonder if anyone > has a use-case where that'd be a bad idea? I think it would be better to get rid of it in all bgworkers. (Also vaguely on the list of things to clean up: can't we make it so that bgworkers aren't launched from inside a signal handler? Blech.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > (Also vaguely on the list of things to clean up: can't we make it so > that bgworkers aren't launched from inside a signal handler? Blech.) So are other postmaster children, I believe. We could probably try to rewrite the postmaster to not do useful work in signal handlers, but rely on a lot of volatile flags set by the handlers. Not convinced this would be anything but a cosmetic improvement, though. And it could create new portability problems to replace any that it removed; we'd have to be *absolutely* certain that the main select() call would return with EINTR rather than resuming after any interrupt. regards, tom lane
On 2016-08-01 18:09:03 -0400, Robert Haas wrote: > (Also vaguely on the list of things to clean up: can't we make it so > that bgworkers aren't launched from inside a signal handler? Blech.) Isn't pretty much everything on-demand below postmaster started from a signal handler?
Andres Freund <andres@anarazel.de> writes: > On 2016-08-01 18:09:03 -0400, Robert Haas wrote: >> (Also vaguely on the list of things to clean up: can't we make it so >> that bgworkers aren't launched from inside a signal handler? Blech.) > Isn't pretty much everything on-demand below postmaster started from a > signal handler? I think it depends. As an example, maybe_start_bgworker is called from PostmasterMain, *and* from ServerLoop, *and* from reaper, *and* from sigusr1_handler. That's likely excessive, but it's what we've got at the moment. I'm pretty sure regular backends are only launched from ServerLoop, but for anything else it's uncertain. regards, tom lane
Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-08-01 18:09:03 -0400, Robert Haas wrote: > >> (Also vaguely on the list of things to clean up: can't we make it so > >> that bgworkers aren't launched from inside a signal handler? Blech.) > > > Isn't pretty much everything on-demand below postmaster started from a > > signal handler? > > I think it depends. As an example, maybe_start_bgworker is called > from PostmasterMain, *and* from ServerLoop, *and* from reaper, > *and* from sigusr1_handler. That's likely excessive, but it's what > we've got at the moment. As I recall, each of those calls correspond to some particular need; keep in mind that bgworkers can request to be started at a few different points: either right at postmaster start, or when consistent state is reached on a standby, or when recovery is finished. maybe_start_bgworker only starts one bgworker, and it also sets a flag so that ServerLoop will call it another time if there are pending workers for the same timing event. That explains the four calls to maybe_start_bgworker, and as I recall removing any of these would break some possible usage. And yes, I did put two of these in signal handlers precisely because that is postmaster's longstanding practice. (It's perhaps possible to remove the call from ServerLoop if you make maybe_start_bgworker process all of them at once instead, but as I recall we decided to do only one at a time so that ServerLoop had a chance to run other control logic in between, just in case.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-01 18:28:44 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-08-01 18:09:03 -0400, Robert Haas wrote: > >> (Also vaguely on the list of things to clean up: can't we make it so > >> that bgworkers aren't launched from inside a signal handler? Blech.) > > > Isn't pretty much everything on-demand below postmaster started from a > > signal handler? > > I think it depends. As an example, maybe_start_bgworker is called > from PostmasterMain, *and* from ServerLoop, *and* from reaper, > *and* from sigusr1_handler. That's likely excessive, but it's what > we've got at the moment. Personally I think the whole logic should be reworked so we do most of that that only from one place. Especially the signal handler stuff should imo just be replaced by setting latch, which then does the work inside the normal main loop. Andres
Andres Freund <andres@anarazel.de> writes: > Personally I think the whole logic should be reworked so we do most of > that that only from one place. Alvaro already mentioned a couple of reasons why that might not be so easy. > Especially the signal handler stuff > should imo just be replaced by setting latch, which then does the work > inside the normal main loop. Of course, then we're utterly dependent on the latch logic to be zero-failure, as any bug in it takes out the postmaster. That idea would've been rejected out of hand earlier in the development of the latch code. Maybe it's safe enough today, but I still wonder what it is that we're buying after we do such a massive rewrite of the postmaster. regards, tom lane
On Tue, Aug 2, 2016 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2016-08-01 18:09:03 -0400, Robert Haas wrote: >>> (Also vaguely on the list of things to clean up: can't we make it so >>> that bgworkers aren't launched from inside a signal handler? Blech.) > >> Isn't pretty much everything on-demand below postmaster started from a >> signal handler? > > I think it depends. As an example, maybe_start_bgworker is called > from PostmasterMain, *and* from ServerLoop, *and* from reaper, > *and* from sigusr1_handler. That's likely excessive, but it's what > we've got at the moment. I found this apparently unresolved bug report about glibc fork() inside a signal handler deadlocking: https://sourceware.org/bugzilla/show_bug.cgi?id=4737 I wonder if that could bite postmaster. It's interesting because comments 16 and 19 and 22 suggest that it may not be fixed. -- Thomas Munro http://www.enterprisedb.com
On 2016-08-02 11:27:25 +1200, Thomas Munro wrote: > On Tue, Aug 2, 2016 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > >> On 2016-08-01 18:09:03 -0400, Robert Haas wrote: > >>> (Also vaguely on the list of things to clean up: can't we make it so > >>> that bgworkers aren't launched from inside a signal handler? Blech.) > > > >> Isn't pretty much everything on-demand below postmaster started from a > >> signal handler? > > > > I think it depends. As an example, maybe_start_bgworker is called > > from PostmasterMain, *and* from ServerLoop, *and* from reaper, > > *and* from sigusr1_handler. That's likely excessive, but it's what > > we've got at the moment. > > I found this apparently unresolved bug report about glibc fork() > inside a signal handler deadlocking: > > https://sourceware.org/bugzilla/show_bug.cgi?id=4737 > > I wonder if that could bite postmaster. It's interesting because > comments 16 and 19 and 22 suggest that it may not be fixed. Moreover the solution appears to be to define the problem away: http://www.opengroup.org/austin/docs/austin_445.txt https://www.opengroup.org/austin/docs/austin_446.txt
Thomas Munro <thomas.munro@enterprisedb.com> writes: > I found this apparently unresolved bug report about glibc fork() > inside a signal handler deadlocking: > https://sourceware.org/bugzilla/show_bug.cgi?id=4737 > I wonder if that could bite postmaster. I seriously doubt it. The key thing about the postmaster is that it runs with signals blocked practically everywhere. So we're not taking risks of a signal handler interrupting, say, malloc() (which seemed to be the core of at least the first example given in that report). This is what makes me dubious that getting rid of doing work in the postmaster's signal handlers is really going to add any noticeable increment of safety. It might make the code look cleaner, but I'm afraid it's going to be a lot of churn for not much gain. regards, tom lane
On Mon, Aug 1, 2016 at 7:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> I found this apparently unresolved bug report about glibc fork() >> inside a signal handler deadlocking: >> https://sourceware.org/bugzilla/show_bug.cgi?id=4737 > >> I wonder if that could bite postmaster. > > I seriously doubt it. The key thing about the postmaster is that > it runs with signals blocked practically everywhere. So we're not > taking risks of a signal handler interrupting, say, malloc() > (which seemed to be the core of at least the first example given > in that report). This is what makes me dubious that getting rid > of doing work in the postmaster's signal handlers is really going > to add any noticeable increment of safety. It might make the > code look cleaner, but I'm afraid it's going to be a lot of churn > for not much gain. It's not just a cosmetic issue. See https://www.postgresql.org/message-id/CA+Tgmobr+Q2WgWeasdbDNefVwJkAGALxA=-VtEGNtQgL1V2Ryw@mail.gmail.com and d0410d66037c2f3f9bee45e0a2db9e47eeba2bb4. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 1, 2016 at 7:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... This is what makes me dubious that getting rid >> of doing work in the postmaster's signal handlers is really going >> to add any noticeable increment of safety. It might make the >> code look cleaner, but I'm afraid it's going to be a lot of churn >> for not much gain. > It's not just a cosmetic issue. > See https://www.postgresql.org/message-id/CA+Tgmobr+Q2WgWeasdbDNefVwJkAGALxA=-VtEGNtQgL1V2Ryw@mail.gmail.com > and d0410d66037c2f3f9bee45e0a2db9e47eeba2bb4. I do not think that patch particularly bears on this question. We have had logic bugs in the postmaster in the current coding structure, and we undoubtedly would still have logic bugs in the postmaster if it were rewritten to avoid using nontrivial signal handlers. They'd just be in different places. Also, the message you mention does a fine job of summarizing why getting out of the signal-handler-based design will be fraught with a bunch of new portability hazards. But what I was responding to here is the claim that we have portability hazards built into the current code as a consequence of doing work in the signal handlers. AFAICT, that's just FUD, and is sufficiently disproven by the many years of reliable service we've gotten out of the current design. Anyway, if someone is really motivated to rewrite the postmaster, I won't stand in the way. I'm just opining that that it will be a lot of work that would be better expended elsewhere. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Now, I'm undecided whether to flush that context only in parallel workers, >> or to try to make it go away for all bgworkers of any stripe. The latter >> seems a little better from a security standpoint, but I wonder if anyone >> has a use-case where that'd be a bad idea? > I think it would be better to get rid of it in all bgworkers. I looked into this, and immediately found this in the spot in postmaster.c that would be the obvious place to kill the PostmasterContext: /* Do NOT release postmaster's working memory context */ MyBgworkerEntry = &rw->rw_worker; StartBackgroundWorker(); This comment was in Alvaro's original commit adding bgworkers (da07a1e8). It looks to me like the reason for it is simply not having bothered to copy the rw->rw_worker data to somewhere that would survive deletion of the PostmasterContext. I wonder though if anyone remembers a more fundamental reason? Surely the bgworker is not supposed to touch any of the rest of the BackgroundWorkerList? regards, tom lane
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Now, I'm undecided whether to flush that context only in parallel workers, > >> or to try to make it go away for all bgworkers of any stripe. The latter > >> seems a little better from a security standpoint, but I wonder if anyone > >> has a use-case where that'd be a bad idea? > > > I think it would be better to get rid of it in all bgworkers. > > I looked into this, and immediately found this in the spot in postmaster.c > that would be the obvious place to kill the PostmasterContext: > > /* Do NOT release postmaster's working memory context */ > > MyBgworkerEntry = &rw->rw_worker; > StartBackgroundWorker(); > > This comment was in Alvaro's original commit adding bgworkers (da07a1e8). Hm, I don't have the development branch in this laptop. I might find some evidence in the old one, but I won't be able to reach it till tonight. > It looks to me like the reason for it is simply not having bothered to > copy the rw->rw_worker data to somewhere that would survive deletion > of the PostmasterContext. I wonder though if anyone remembers a more > fundamental reason? Surely the bgworker is not supposed to touch any > of the rest of the BackgroundWorkerList? I just checked BDR, which is the more complex code using workers I know of, and I don't see any reason why this cannot be changed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> It looks to me like the reason for it is simply not having bothered to >> copy the rw->rw_worker data to somewhere that would survive deletion >> of the PostmasterContext. I wonder though if anyone remembers a more >> fundamental reason? Surely the bgworker is not supposed to touch any >> of the rest of the BackgroundWorkerList? > I just checked BDR, which is the more complex code using workers I know > of, and I don't see any reason why this cannot be changed. The attached patch passes "make check-world" for me. Can you check it against BDR? (I'd be hesitant to back-patch it in any case, but I think it's okay for HEAD unless we can easily find something it breaks.) regards, tom lane diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 19d11e0..e48703a 100644 *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** do_start_bgworker(RegisteredBgWorker *rw *** 5529,5537 **** /* Close the postmaster's sockets */ ClosePostmasterPorts(false); ! /* Do NOT release postmaster's working memory context */ - MyBgworkerEntry = &rw->rw_worker; StartBackgroundWorker(); break; #endif --- 5529,5547 ---- /* Close the postmaster's sockets */ ClosePostmasterPorts(false); ! /* ! * Before blowing away PostmasterContext, save this bgworker's ! * data where it can find it. ! */ ! MyBgworkerEntry = (BackgroundWorker *) ! MemoryContextAlloc(TopMemoryContext, sizeof(BackgroundWorker)); ! memcpy(MyBgworkerEntry, &rw->rw_worker, sizeof(BackgroundWorker)); ! ! /* Release postmaster's working memory context */ ! MemoryContextSwitchTo(TopMemoryContext); ! MemoryContextDelete(PostmasterContext); ! PostmasterContext = NULL; StartBackgroundWorker(); break; #endif
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Tom Lane wrote: > >> It looks to me like the reason for it is simply not having bothered to > >> copy the rw->rw_worker data to somewhere that would survive deletion > >> of the PostmasterContext. I wonder though if anyone remembers a more > >> fundamental reason? Surely the bgworker is not supposed to touch any > >> of the rest of the BackgroundWorkerList? > > > I just checked BDR, which is the more complex code using workers I know > > of, and I don't see any reason why this cannot be changed. > > The attached patch passes "make check-world" for me. Can you check it > against BDR? Just checked. It works fine. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> The attached patch passes "make check-world" for me. Can you check it >> against BDR? > Just checked. It works fine. Thanks! I'll push it shortly. regards, tom lane