Thread: PostmasterContext survives into parallel workers!?

PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

01 August 2016, 20:18:42

I noticed $subject while fooling around with the tqueue.c memory leak
issues.  This does not seem like a good idea to me.  At the very least,
it's a waste of space that could be used for something else, and at the
worst, it might be a security issue because it leaves security-sensitive
pg_hba and pg_ident information laying about in places where it might be
recoverable (if only through memory-disclosure bugs, which we've had
before and no doubt will have again).

The reason is that the parallel worker launch path contains no equivalent
of PostgresMain's stanza
if (PostmasterContext){    MemoryContextDelete(PostmasterContext);    PostmasterContext = NULL;}

Now, I'm undecided whether to flush that context only in parallel workers,
or to try to make it go away for all bgworkers of any stripe.  The latter
seems a little better from a security standpoint, but I wonder if anyone
has a use-case where that'd be a bad idea?
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Robert Haas

Date:

01 August 2016, 22:09:08

On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I noticed $subject while fooling around with the tqueue.c memory leak
> issues.  This does not seem like a good idea to me.  At the very least,
> it's a waste of space that could be used for something else, and at the
> worst, it might be a security issue because it leaves security-sensitive
> pg_hba and pg_ident information laying about in places where it might be
> recoverable (if only through memory-disclosure bugs, which we've had
> before and no doubt will have again).
>
> The reason is that the parallel worker launch path contains no equivalent
> of PostgresMain's stanza
>
>         if (PostmasterContext)
>         {
>                 MemoryContextDelete(PostmasterContext);
>                 PostmasterContext = NULL;
>         }
>
> Now, I'm undecided whether to flush that context only in parallel workers,
> or to try to make it go away for all bgworkers of any stripe.  The latter
> seems a little better from a security standpoint, but I wonder if anyone
> has a use-case where that'd be a bad idea?

I think it would be better to get rid of it in all bgworkers.

(Also vaguely on the list of things to clean up: can't we make it so
that bgworkers aren't launched from inside a signal handler?  Blech.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

01 August 2016, 22:18:32

Robert Haas <robertmhaas@gmail.com> writes:
> (Also vaguely on the list of things to clean up: can't we make it so
> that bgworkers aren't launched from inside a signal handler?  Blech.)

So are other postmaster children, I believe.  We could probably try
to rewrite the postmaster to not do useful work in signal handlers,
but rely on a lot of volatile flags set by the handlers.  Not convinced
this would be anything but a cosmetic improvement, though.  And it
could create new portability problems to replace any that it removed;
we'd have to be *absolutely* certain that the main select() call would
return with EINTR rather than resuming after any interrupt.
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Andres Freund

Date:

01 August 2016, 22:18:40

On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> (Also vaguely on the list of things to clean up: can't we make it so
> that bgworkers aren't launched from inside a signal handler?  Blech.)

Isn't pretty much everything on-demand below postmaster started from a
signal handler?

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

01 August 2016, 22:28:51

Andres Freund <andres@anarazel.de> writes:
> On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
>> (Also vaguely on the list of things to clean up: can't we make it so
>> that bgworkers aren't launched from inside a signal handler?  Blech.)

> Isn't pretty much everything on-demand below postmaster started from a
> signal handler?

I think it depends.  As an example, maybe_start_bgworker is called
from PostmasterMain, *and* from ServerLoop, *and* from reaper,
*and* from sigusr1_handler.  That's likely excessive, but it's what
we've got at the moment.

I'm pretty sure regular backends are only launched from ServerLoop,
but for anything else it's uncertain.
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Alvaro Herrera

Date:

01 August 2016, 22:50:42

Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> >> (Also vaguely on the list of things to clean up: can't we make it so
> >> that bgworkers aren't launched from inside a signal handler?  Blech.)
> 
> > Isn't pretty much everything on-demand below postmaster started from a
> > signal handler?
> 
> I think it depends.  As an example, maybe_start_bgworker is called
> from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> *and* from sigusr1_handler.  That's likely excessive, but it's what
> we've got at the moment.

As I recall, each of those calls correspond to some particular need;
keep in mind that bgworkers can request to be started at a few different
points: either right at postmaster start, or when consistent state is
reached on a standby, or when recovery is finished.
maybe_start_bgworker only starts one bgworker, and it also sets a flag
so that ServerLoop will call it another time if there are pending
workers for the same timing event.  That explains the four calls to
maybe_start_bgworker, and as I recall removing any of these would break
some possible usage.  And yes, I did put two of these in signal handlers
precisely because that is postmaster's longstanding practice.

(It's perhaps possible to remove the call from ServerLoop if you make
maybe_start_bgworker process all of them at once instead, but as I
recall we decided to do only one at a time so that ServerLoop had a
chance to run other control logic in between, just in case.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PostmasterContext survives into parallel workers!?

From

Andres Freund

Date:

01 August 2016, 22:54:40

On 2016-08-01 18:28:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> >> (Also vaguely on the list of things to clean up: can't we make it so
> >> that bgworkers aren't launched from inside a signal handler?  Blech.)
> 
> > Isn't pretty much everything on-demand below postmaster started from a
> > signal handler?
> 
> I think it depends.  As an example, maybe_start_bgworker is called
> from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> *and* from sigusr1_handler.  That's likely excessive, but it's what
> we've got at the moment.

Personally I think the whole logic should be reworked so we do most of
that that only from one place. Especially the signal handler stuff
should imo just be replaced by setting latch, which then does the work
inside the normal main loop.

Andres

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

01 August 2016, 23:21:29

Andres Freund <andres@anarazel.de> writes:
> Personally I think the whole logic should be reworked so we do most of
> that that only from one place.

Alvaro already mentioned a couple of reasons why that might not be so
easy.

> Especially the signal handler stuff
> should imo just be replaced by setting latch, which then does the work
> inside the normal main loop.

Of course, then we're utterly dependent on the latch logic to be
zero-failure, as any bug in it takes out the postmaster.  That idea
would've been rejected out of hand earlier in the development of
the latch code.  Maybe it's safe enough today, but I still wonder what
it is that we're buying after we do such a massive rewrite of the
postmaster.
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Thomas Munro

Date:

01 August 2016, 23:27:33

On Tue, Aug 2, 2016 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
>>> (Also vaguely on the list of things to clean up: can't we make it so
>>> that bgworkers aren't launched from inside a signal handler?  Blech.)
>
>> Isn't pretty much everything on-demand below postmaster started from a
>> signal handler?
>
> I think it depends.  As an example, maybe_start_bgworker is called
> from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> *and* from sigusr1_handler.  That's likely excessive, but it's what
> we've got at the moment.

I found this apparently unresolved bug report about glibc fork()
inside a signal handler deadlocking:

https://sourceware.org/bugzilla/show_bug.cgi?id=4737

I wonder if that could bite postmaster.  It's interesting because
comments 16 and 19 and 22 suggest that it may not be fixed.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: PostmasterContext survives into parallel workers!?

From

Andres Freund

Date:

01 August 2016, 23:36:58

On 2016-08-02 11:27:25 +1200, Thomas Munro wrote:
> On Tue, Aug 2, 2016 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> >>> (Also vaguely on the list of things to clean up: can't we make it so
> >>> that bgworkers aren't launched from inside a signal handler?  Blech.)
> >
> >> Isn't pretty much everything on-demand below postmaster started from a
> >> signal handler?
> >
> > I think it depends.  As an example, maybe_start_bgworker is called
> > from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> > *and* from sigusr1_handler.  That's likely excessive, but it's what
> > we've got at the moment.
> 
> I found this apparently unresolved bug report about glibc fork()
> inside a signal handler deadlocking:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
> 
> I wonder if that could bite postmaster.  It's interesting because
> comments 16 and 19 and 22 suggest that it may not be fixed.

Moreover the solution appears to be to define the problem away:
http://www.opengroup.org/austin/docs/austin_445.txt
https://www.opengroup.org/austin/docs/austin_446.txt

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

01 August 2016, 23:42:35

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> I found this apparently unresolved bug report about glibc fork()
> inside a signal handler deadlocking:
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737

> I wonder if that could bite postmaster.

I seriously doubt it.  The key thing about the postmaster is that
it runs with signals blocked practically everywhere.  So we're not
taking risks of a signal handler interrupting, say, malloc()
(which seemed to be the core of at least the first example given
in that report).  This is what makes me dubious that getting rid
of doing work in the postmaster's signal handlers is really going
to add any noticeable increment of safety.  It might make the
code look cleaner, but I'm afraid it's going to be a lot of churn
for not much gain.
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Robert Haas

Date:

02 August 2016, 03:33:04

On Mon, Aug 1, 2016 at 7:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> I found this apparently unresolved bug report about glibc fork()
>> inside a signal handler deadlocking:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
>
>> I wonder if that could bite postmaster.
>
> I seriously doubt it.  The key thing about the postmaster is that
> it runs with signals blocked practically everywhere.  So we're not
> taking risks of a signal handler interrupting, say, malloc()
> (which seemed to be the core of at least the first example given
> in that report).  This is what makes me dubious that getting rid
> of doing work in the postmaster's signal handlers is really going
> to add any noticeable increment of safety.  It might make the
> code look cleaner, but I'm afraid it's going to be a lot of churn
> for not much gain.

It's not just a cosmetic issue.

See https://www.postgresql.org/message-id/CA+Tgmobr+Q2WgWeasdbDNefVwJkAGALxA=-VtEGNtQgL1V2Ryw@mail.gmail.com
and d0410d66037c2f3f9bee45e0a2db9e47eeba2bb4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

02 August 2016, 14:57:22

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 1, 2016 at 7:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ...  This is what makes me dubious that getting rid
>> of doing work in the postmaster's signal handlers is really going
>> to add any noticeable increment of safety.  It might make the
>> code look cleaner, but I'm afraid it's going to be a lot of churn
>> for not much gain.

> It's not just a cosmetic issue.

> See https://www.postgresql.org/message-id/CA+Tgmobr+Q2WgWeasdbDNefVwJkAGALxA=-VtEGNtQgL1V2Ryw@mail.gmail.com
> and d0410d66037c2f3f9bee45e0a2db9e47eeba2bb4.

I do not think that patch particularly bears on this question.  We have
had logic bugs in the postmaster in the current coding structure, and
we undoubtedly would still have logic bugs in the postmaster if it were
rewritten to avoid using nontrivial signal handlers.  They'd just be
in different places.

Also, the message you mention does a fine job of summarizing why getting
out of the signal-handler-based design will be fraught with a bunch of
new portability hazards.

But what I was responding to here is the claim that we have portability
hazards built into the current code as a consequence of doing work in the
signal handlers.  AFAICT, that's just FUD, and is sufficiently disproven
by the many years of reliable service we've gotten out of the current
design.

Anyway, if someone is really motivated to rewrite the postmaster, I
won't stand in the way.  I'm just opining that that it will be a lot
of work that would be better expended elsewhere.
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

02 August 2016, 18:06:40

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now, I'm undecided whether to flush that context only in parallel workers,
>> or to try to make it go away for all bgworkers of any stripe.  The latter
>> seems a little better from a security standpoint, but I wonder if anyone
>> has a use-case where that'd be a bad idea?

> I think it would be better to get rid of it in all bgworkers.

I looked into this, and immediately found this in the spot in postmaster.c
that would be the obvious place to kill the PostmasterContext:
           /* Do NOT release postmaster's working memory context */
           MyBgworkerEntry = &rw->rw_worker;           StartBackgroundWorker();

This comment was in Alvaro's original commit adding bgworkers (da07a1e8).
It looks to me like the reason for it is simply not having bothered to
copy the rw->rw_worker data to somewhere that would survive deletion
of the PostmasterContext.  I wonder though if anyone remembers a more
fundamental reason?  Surely the bgworker is not supposed to touch any
of the rest of the BackgroundWorkerList?
        regards, tom lane

Re: PostmasterContext survives into parallel workers!?

From

Alvaro Herrera

Date:

02 August 2016, 18:24:11

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Now, I'm undecided whether to flush that context only in parallel workers,
> >> or to try to make it go away for all bgworkers of any stripe.  The latter
> >> seems a little better from a security standpoint, but I wonder if anyone
> >> has a use-case where that'd be a bad idea?
> 
> > I think it would be better to get rid of it in all bgworkers.
> 
> I looked into this, and immediately found this in the spot in postmaster.c
> that would be the obvious place to kill the PostmasterContext:
> 
>             /* Do NOT release postmaster's working memory context */
> 
>             MyBgworkerEntry = &rw->rw_worker;
>             StartBackgroundWorker();
> 
> This comment was in Alvaro's original commit adding bgworkers (da07a1e8).

Hm, I don't have the development branch in this laptop.  I might find
some evidence in the old one, but I won't be able to reach it till
tonight.

> It looks to me like the reason for it is simply not having bothered to
> copy the rw->rw_worker data to somewhere that would survive deletion
> of the PostmasterContext.  I wonder though if anyone remembers a more
> fundamental reason?  Surely the bgworker is not supposed to touch any
> of the rest of the BackgroundWorkerList?

I just checked BDR, which is the more complex code using workers I know
of, and I don't see any reason why this cannot be changed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

02 August 2016, 18:38:24

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> It looks to me like the reason for it is simply not having bothered to
>> copy the rw->rw_worker data to somewhere that would survive deletion
>> of the PostmasterContext.  I wonder though if anyone remembers a more
>> fundamental reason?  Surely the bgworker is not supposed to touch any
>> of the rest of the BackgroundWorkerList?

> I just checked BDR, which is the more complex code using workers I know
> of, and I don't see any reason why this cannot be changed.

The attached patch passes "make check-world" for me.  Can you check it
against BDR?

(I'd be hesitant to back-patch it in any case, but I think it's okay for
HEAD unless we can easily find something it breaks.)

            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 19d11e0..e48703a 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** do_start_bgworker(RegisteredBgWorker *rw
*** 5529,5537 ****
              /* Close the postmaster's sockets */
              ClosePostmasterPorts(false);

!             /* Do NOT release postmaster's working memory context */

-             MyBgworkerEntry = &rw->rw_worker;
              StartBackgroundWorker();
              break;
  #endif
--- 5529,5547 ----
              /* Close the postmaster's sockets */
              ClosePostmasterPorts(false);

!             /*
!              * Before blowing away PostmasterContext, save this bgworker's
!              * data where it can find it.
!              */
!             MyBgworkerEntry = (BackgroundWorker *)
!                 MemoryContextAlloc(TopMemoryContext, sizeof(BackgroundWorker));
!             memcpy(MyBgworkerEntry, &rw->rw_worker, sizeof(BackgroundWorker));
!
!             /* Release postmaster's working memory context */
!             MemoryContextSwitchTo(TopMemoryContext);
!             MemoryContextDelete(PostmasterContext);
!             PostmasterContext = NULL;

              StartBackgroundWorker();
              break;
  #endif

Re: PostmasterContext survives into parallel workers!?

From

Alvaro Herrera

Date:

03 August 2016, 17:55:18

Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> It looks to me like the reason for it is simply not having bothered to
> >> copy the rw->rw_worker data to somewhere that would survive deletion
> >> of the PostmasterContext.  I wonder though if anyone remembers a more
> >> fundamental reason?  Surely the bgworker is not supposed to touch any
> >> of the rest of the BackgroundWorkerList?
> 
> > I just checked BDR, which is the more complex code using workers I know
> > of, and I don't see any reason why this cannot be changed.
> 
> The attached patch passes "make check-world" for me.  Can you check it
> against BDR?

Just checked.  It works fine.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PostmasterContext survives into parallel workers!?

From

Tom Lane

Date:

03 August 2016, 18:09:28

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> The attached patch passes "make check-world" for me.  Can you check it
>> against BDR?

> Just checked.  It works fine.

Thanks!  I'll push it shortly.
        regards, tom lane