Thread: on_exit_reset fails to clear DSM-related exit actions

on_exit_reset fails to clear DSM-related exit actions

From
Tom Lane
Date:
I just noticed that the DSM patch has introduced a whole new class of
failures related to the bug #9464 issue: to wit, any on_detach
actions registered in a parent process will also be performed when a
child process exits, because nothing has been added to on_exit_reset
to prevent that.  It seems likely that this is undesirable.
        regards, tom lane



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I just noticed that the DSM patch has introduced a whole new class of
> failures related to the bug #9464 issue: to wit, any on_detach
> actions registered in a parent process will also be performed when a
> child process exits, because nothing has been added to on_exit_reset
> to prevent that.  It seems likely that this is undesirable.

I don't think this can actually happen.  There are quite a number of
things that would go belly-up if you tried to use dynamic shared
memory from the postmaster, which is why dsm_create() and dsm_attach()
both Assert(IsUnderPostmaster).  Without calling those function,
there's no way for code running in the postmaster to call
on_dsm_detach(), because it's got nothing to pass for the first
argument.

In case you're wondering, the major reason I disallowed this is that
the control segment tracks the number of backends attached to each
segment, and there's no provision for adjusting that value each time
the postmaster forks.  We could add such provision, but it seems like
there would be race conditions, and the postmaster might have to
participate in locking, so it might be pretty ugly, and a performance
suck for anyone who doesn't need this functionality.  And it doesn't
seem very useful anyway.  The postmaster really shouldn't be touching
any shared memory segment more than the absolutely minimal amount, so
the main possible benefit I can see is that you could have the mapping
already in place for each new backend rather than having to set it up
in every backend.  But I'm prepared to hope that isn't actually
important enough to be worth worrying about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: on_exit_reset fails to clear DSM-related exit actions

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I just noticed that the DSM patch has introduced a whole new class of
>> failures related to the bug #9464 issue: to wit, any on_detach
>> actions registered in a parent process will also be performed when a
>> child process exits, because nothing has been added to on_exit_reset
>> to prevent that.  It seems likely that this is undesirable.

> I don't think this can actually happen.  There are quite a number of
> things that would go belly-up if you tried to use dynamic shared
> memory from the postmaster, which is why dsm_create() and dsm_attach()
> both Assert(IsUnderPostmaster).

Nonetheless it seems like a good idea to make on_exit_reset drop any
such queued actions.

The big picture here is that in the scenario being debated in the other
thread, exit() in a child process forked from a backend will execute that
backend's on_detach actions *even if the code had done on_exit_reset after
the fork*.  So whether or not you buy Andres' argument that it's not
necessary for atexit_callback to defend against this scenario, there's
actually no other defense possible given the way things work in HEAD.
        regards, tom lane



Re: on_exit_reset fails to clear DSM-related exit actions

From
Andres Freund
Date:
Hi,

On 2014-03-07 13:54:42 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I don't think this can actually happen.  There are quite a number of
> > things that would go belly-up if you tried to use dynamic shared
> > memory from the postmaster, which is why dsm_create() and dsm_attach()
> > both Assert(IsUnderPostmaster).
>
> Nonetheless it seems like a good idea to make on_exit_reset drop any
> such queued actions.
>
> The big picture here is that in the scenario being debated in the other
> thread, exit() in a child process forked from a backend will execute that
> backend's on_detach actions *even if the code had done on_exit_reset after
> the fork*.

Hm, aren't those actions called via shmem_exit() calling
dsm_backend_shutdown() et al? I think that should be cleared by
on_shmem_exit()?

> So whether or not you buy Andres' argument that it's not
> necessary for atexit_callback to defend against this scenario, there's
> actually no other defense possible given the way things work in HEAD.

I think you're misunderstanding me. I am saying we *should* defend
against it. Our opinions just seem to differ on what to do when the
scenario is detected. I am saying we should scream bloody murder (which
admittedly is a hard in a child), you want to essentially declare it
supported.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: on_exit_reset fails to clear DSM-related exit actions

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-07 13:54:42 -0500, Tom Lane wrote:
>> The big picture here is that in the scenario being debated in the other
>> thread, exit() in a child process forked from a backend will execute that
>> backend's on_detach actions *even if the code had done on_exit_reset after
>> the fork*.

> Hm, aren't those actions called via shmem_exit() calling
> dsm_backend_shutdown() et al? I think that should be cleared by
> on_shmem_exit()?

But dsm_backend_shutdown gets called whether or not any shmem_exit
actions are registered.

> I think you're misunderstanding me. I am saying we *should* defend
> against it. Our opinions just seem to differ on what to do when the
> scenario is detected. I am saying we should scream bloody murder (which
> admittedly is a hard in a child), you want to essentially declare it
> supported.

And if we scream bloody murder, what will happen?  Absolutely nothing
except we'll annoy our users.  They won't have control over the
third-party libraries that are doing what you want to complain about.
        regards, tom lane



Re: on_exit_reset fails to clear DSM-related exit actions

From
Andres Freund
Date:
On 2014-03-07 14:14:17 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-03-07 13:54:42 -0500, Tom Lane wrote:
> >> The big picture here is that in the scenario being debated in the other
> >> thread, exit() in a child process forked from a backend will execute that
> >> backend's on_detach actions *even if the code had done on_exit_reset after
> >> the fork*.
> 
> > Hm, aren't those actions called via shmem_exit() calling
> > dsm_backend_shutdown() et al? I think that should be cleared by
> > on_shmem_exit()?
> 
> But dsm_backend_shutdown gets called whether or not any shmem_exit
> actions are registered.

Yikes. I thought on_exit_reset() disarmed the atexit callback, but
indeed, it does not...

> > I think you're misunderstanding me. I am saying we *should* defend
> > against it. Our opinions just seem to differ on what to do when the
> > scenario is detected. I am saying we should scream bloody murder (which
> > admittedly is a hard in a child), you want to essentially declare it
> > supported.
> 
> And if we scream bloody murder, what will happen?  Absolutely nothing
> except we'll annoy our users.  They won't have control over the
> third-party libraries that are doing what you want to complain about.

If the third party library is suitably careful it will only use fork()
and then exec() or _exit(), otherwise there are other things that'll be
broken (e.g. stdio). In that case everything was and is fine. If not,
the library's user can then fix or file a bug against the library.

Both perl and glibc seem to do just that in system() btw...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Fri, Mar 7, 2014 at 1:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I just noticed that the DSM patch has introduced a whole new class of
>>> failures related to the bug #9464 issue: to wit, any on_detach
>>> actions registered in a parent process will also be performed when a
>>> child process exits, because nothing has been added to on_exit_reset
>>> to prevent that.  It seems likely that this is undesirable.
>
>> I don't think this can actually happen.  There are quite a number of
>> things that would go belly-up if you tried to use dynamic shared
>> memory from the postmaster, which is why dsm_create() and dsm_attach()
>> both Assert(IsUnderPostmaster).
>
> Nonetheless it seems like a good idea to make on_exit_reset drop any
> such queued actions.
>
> The big picture here is that in the scenario being debated in the other
> thread, exit() in a child process forked from a backend will execute that
> backend's on_detach actions *even if the code had done on_exit_reset after
> the fork*.

Hmm.  So the problematic sequence of events is where a postmaster
child forks, and then exits without exec-ing, perhaps because e.g.
exec fails?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: on_exit_reset fails to clear DSM-related exit actions

From
Peter LaDow
Date:
On Friday, March 7, 2014, Andres Freund <andres@2ndquadrant.com> wrote:
If the third party library is suitably careful it will only use fork()
and then exec() or _exit(), otherwise there are other things that'll be

But that is not possible* in our case of trying to spawn an asynchronous backgound process. The goal here is for the extension to spawn the process and return immediately. If we exec in the first level child, and immediately return, who reaps the child when done?

This is why we fork twice and exit in the first level child so that the extension can reap the first.

Unless Postgres happily reaps all children perhaps through a SIGCHLD handler?

(* I suppose we could exec a program that itself forks/execs a background process. But that is essentially no different than what we are doing now, other than trying to meet this fork/exec/_exit requirement. And that's fine if that is to be the case. We just need to know what it is.)
  
Both perl and glibc seem to do just that in system() btw...

I don't know about Perl, but system is blocking. What if you don't want to wait for the child to exit?

Pete 

Re: on_exit_reset fails to clear DSM-related exit actions

From
Andres Freund
Date:
On 2014-03-07 12:09:45 -0800, Peter LaDow wrote:
> On Friday, March 7, 2014, Andres Freund <andres@2ndquadrant.com> wrote:
> >
> > If the third party library is suitably careful it will only use fork()
> > and then exec() or _exit(), otherwise there are other things that'll be

> But that is not possible* in our case of trying to spawn an asynchronous
> backgound process.

Forking twice is ok as well, as long as you just use _exit() after the
fork. The thing is that you shouldn't run any nontrivial code in the
fork, as long as you're connected to the original environment (fd's,
memory mappings and so forth).

> > Both perl and glibc seem to do just that in system() btw...

> I don't know about Perl, but system is blocking. What if you don't want to
> wait for the child to exit?

Man. The point is that that the library code is carefully written to not
use exit() but _exit() after a fork failure, that's it.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: on_exit_reset fails to clear DSM-related exit actions

From
Peter LaDow
Date:
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Man. The point is that that the library code is carefully written to not
> use exit() but _exit() after a fork failure, that's it.

I understand your point.  I understand that in the case of the
postmaster we don't want to invoke behavior that can cause problems by
calling exit(). But how does this jive with Tom's point (on the bug
list) about other 3rd party libraries perhaps registering atexit
handlers?

If the convention is that only fork/exec/_exit is permissible after a
fork (what about on_exit_reset?), then third party libraries also need
to be aware of that convention and not depend on their atexit handlers
being called.

And I'm not advocating a certain solution.  I'm only trying to clarify
what the solution is so that we "comply" with the convention.  We
don't want to break posgres or any other "well behaved" third party
libraries.  I don't know the internals of postgres (hence original bug
report and this thread), so I of course defer to you and the other
experts here.  So, in our case we call on_exit_reset() after the fork
in the child, and then from there on only use fork, exec, or _exit, as
you mention above.

Pete



Re: on_exit_reset fails to clear DSM-related exit actions

From
Peter LaDow
Date:
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Forking twice is ok as well, as long as you just use _exit() after the
> fork. The thing is that you shouldn't run any nontrivial code in the
> fork, as long as you're connected to the original environment (fd's,
> memory mappings and so forth).

Just to be clear, what do you mean by "nontrivial code"?  Do you mean
C library calls, other than fork/exec/_exit?

Pete



Re: on_exit_reset fails to clear DSM-related exit actions

From
Peter LaDow
Date:
On Fri, Mar 7, 2014 at 12:55 PM, Peter LaDow <petela@gocougs.wsu.edu> wrote:
> Just to be clear, what do you mean by "nontrivial code"?  Do you mean
> C library calls, other than fork/exec/_exit?

I think I've answered my own question:

http://man7.org/linux/man-pages/man7/signal.7.html

The 'Async-signal-safe functions' are the nontrivial C calls that may be called.

Pete



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Fri, Mar 7, 2014 at 2:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The big picture here is that in the scenario being debated in the other
>> thread, exit() in a child process forked from a backend will execute that
>> backend's on_detach actions *even if the code had done on_exit_reset after
>> the fork*.
>
> Hmm.  So the problematic sequence of events is where a postmaster
> child forks, and then exits without exec-ing, perhaps because e.g.
> exec fails?

I've attempted a fix for this case.  The attached patch makes
test_shm_mq fork() a child process that calls on_exit_reset() and then
exits.  Without the fix I just pushed, this causes the tests to fail;
with this fix, this does not cause the tests to fail.

I'm not entirely sure that this is exactly right, but I think it's an
improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: on_exit_reset fails to clear DSM-related exit actions

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 7, 2014 at 2:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Hmm.  So the problematic sequence of events is where a postmaster
>> child forks, and then exits without exec-ing, perhaps because e.g.
>> exec fails?

> I've attempted a fix for this case.  The attached patch makes
> test_shm_mq fork() a child process that calls on_exit_reset() and then
> exits.  Without the fix I just pushed, this causes the tests to fail;
> with this fix, this does not cause the tests to fail.

> I'm not entirely sure that this is exactly right, but I think it's an
> improvement.

This looks generally sane to me, but I wonder whether you should also
teach PGSharedMemoryDetach() to physically detach from DSM segments
not just the main shmem seg.  Or maybe better, adjust most of the
places that call on_exit_reset and then PGSharedMemoryDetach to also
make a detach-from-DSM call.  It looks like both sysv_shmem.c and
win32_shmem.c have internal callers of PGSharedMemoryDetach, which
probably shouldn't affect DSM.

You could argue that that's unnecessary on the grounds that the postmaster
will never have any DSM segments attached, but I think it would be
good coding practice anyway.  People who copy-and-paste those bits of
code into other places are not likely to think of adding DSM calls.
        regards, tom lane



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Mon, Mar 10, 2014 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Mar 7, 2014 at 2:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Hmm.  So the problematic sequence of events is where a postmaster
>>> child forks, and then exits without exec-ing, perhaps because e.g.
>>> exec fails?
>
>> I've attempted a fix for this case.  The attached patch makes
>> test_shm_mq fork() a child process that calls on_exit_reset() and then
>> exits.  Without the fix I just pushed, this causes the tests to fail;
>> with this fix, this does not cause the tests to fail.
>
>> I'm not entirely sure that this is exactly right, but I think it's an
>> improvement.
>
> This looks generally sane to me, but I wonder whether you should also
> teach PGSharedMemoryDetach() to physically detach from DSM segments
> not just the main shmem seg.  Or maybe better, adjust most of the
> places that call on_exit_reset and then PGSharedMemoryDetach to also
> make a detach-from-DSM call.

Hmm, good catch.  Maybe we should invent a new function that is
defined to mean "detach ALL shared memory segments".

A related point that's been bugging me for a while, and has just
struck me again, is that background workers for which
BGWORKER_SHMEM_ACCESS is not passed probably ought to be detaching
shared memory (and DSMs).  Currently, and since Alvaro originally
added the facility, the death of a  non-BGWORKER_SHMEM_ACCESS backend
is used in postmaster.c to decide whether to call HandleChildCrash().
But such workers could still clobber shared memory arbitrarily; they
haven't unmapped it.  Oddly, the code in postmaster.c is only cares
about the flag when checking EXIT_STATUS_0()/EXIT_STATUS_1(), not when
checking ReleasePostmasterChildSlot()...

> It looks like both sysv_shmem.c and
> win32_shmem.c have internal callers of PGSharedMemoryDetach, which
> probably shouldn't affect DSM.
>
> You could argue that that's unnecessary on the grounds that the postmaster
> will never have any DSM segments attached, but I think it would be
> good coding practice anyway.  People who copy-and-paste those bits of
> code into other places are not likely to think of adding DSM calls.

Well, actually the postmaster WILL have the control segment attached
(not nothing else).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: on_exit_reset fails to clear DSM-related exit actions

From
Alvaro Herrera
Date:
Robert Haas escribió:

> A related point that's been bugging me for a while, and has just
> struck me again, is that background workers for which
> BGWORKER_SHMEM_ACCESS is not passed probably ought to be detaching
> shared memory (and DSMs).  Currently, and since Alvaro originally
> added the facility, the death of a  non-BGWORKER_SHMEM_ACCESS backend
> is used in postmaster.c to decide whether to call HandleChildCrash().
> But such workers could still clobber shared memory arbitrarily; they
> haven't unmapped it.  Oddly, the code in postmaster.c is only cares
> about the flag when checking EXIT_STATUS_0()/EXIT_STATUS_1(), not when
> checking ReleasePostmasterChildSlot()...

Clearly there's not a lot of consistency on that.  I don't think I had
made up my mind completely about such details.  I do remember that
unmapping/detaching the shared memory segment didn't cross my mind; the
flag, as I recall, only controls (controlled) whether to attach to it
explicitely.  

IOW feel free to whack around.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Mon, Mar 10, 2014 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Mar 7, 2014 at 2:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Hmm.  So the problematic sequence of events is where a postmaster
>>> child forks, and then exits without exec-ing, perhaps because e.g.
>>> exec fails?
>
>> I've attempted a fix for this case.  The attached patch makes
>> test_shm_mq fork() a child process that calls on_exit_reset() and then
>> exits.  Without the fix I just pushed, this causes the tests to fail;
>> with this fix, this does not cause the tests to fail.
>
>> I'm not entirely sure that this is exactly right, but I think it's an
>> improvement.
>
> This looks generally sane to me, but I wonder whether you should also
> teach PGSharedMemoryDetach() to physically detach from DSM segments
> not just the main shmem seg.

I looked at this a little more.  It seems dsm_backend_shutdown()
already does almost the right thing; it fires all the on-detach
callbacks and unmaps the corresponding segments.  It does NOT unmap
the control segment, though, and processes that are unmapping the main
shared memory segment for safety should really be getting rid of the
control segment too (even though the consequences of corrupting the
control segment are much less bad).

One option is to just change that function to also unmap the control
segment, and maybe rename it to dsm_detach_all(), and then use that
everywhere.  The problem is that I'm not sure we really want to incur
the overhead of an extra munmap() during every backend exit, just to
get rid of a control segment which was going to be unmapped anyway by
process termination.  For that matter, I'm not sure why we bother
arranging that for the main shared memory segment, either: surely
whatever function the shmdt() and munmap() calls in IpcMemoryDetach
may have will be equally well-served by the forthcoming exit()?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: on_exit_reset fails to clear DSM-related exit actions

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> One option is to just change that function to also unmap the control
> segment, and maybe rename it to dsm_detach_all(), and then use that
> everywhere.  The problem is that I'm not sure we really want to incur
> the overhead of an extra munmap() during every backend exit, just to
> get rid of a control segment which was going to be unmapped anyway by
> process termination.  For that matter, I'm not sure why we bother
> arranging that for the main shared memory segment, either: surely
> whatever function the shmdt() and munmap() calls in IpcMemoryDetach
> may have will be equally well-served by the forthcoming exit()?

Before you lobotomize that code too much, consider the postmaster
crash-recovery case.  That path does need to detach from the old
shmem segment.

Also, I might be wrong, but I think IpcMemoryDetach is a *postmaster*
on_shmem_exit routine; it isn't called during backend exit.
        regards, tom lane



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Mon, Mar 17, 2014 at 11:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> One option is to just change that function to also unmap the control
>> segment, and maybe rename it to dsm_detach_all(), and then use that
>> everywhere.  The problem is that I'm not sure we really want to incur
>> the overhead of an extra munmap() during every backend exit, just to
>> get rid of a control segment which was going to be unmapped anyway by
>> process termination.  For that matter, I'm not sure why we bother
>> arranging that for the main shared memory segment, either: surely
>> whatever function the shmdt() and munmap() calls in IpcMemoryDetach
>> may have will be equally well-served by the forthcoming exit()?
>
> Before you lobotomize that code too much, consider the postmaster
> crash-recovery case.  That path does need to detach from the old
> shmem segment.
>
> Also, I might be wrong, but I think IpcMemoryDetach is a *postmaster*
> on_shmem_exit routine; it isn't called during backend exit.

Ah, right.  I verified using strace that, at least on Linux
2.6.32-279.el6.x86_64, the only system call made on disconnecting a
psql session is exit_group(0).  I also tried setting breakpoints on
PGSharedMemoryDetach and IpcMemoryDetach and they do not fire.

After mulling over a few possible approaches, I came up with the
attached, which seems short and to the point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: on_exit_reset fails to clear DSM-related exit actions

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> After mulling over a few possible approaches, I came up with the
> attached, which seems short and to the point.

Looks reasonable in principle.  I didn't run through all the existing
PGSharedMemoryDetach calls to see if there are any other places to
call dsm_detach_all, but it's an easy fix if there are any.
        regards, tom lane



Re: on_exit_reset fails to clear DSM-related exit actions

From
Robert Haas
Date:
On Mon, Mar 17, 2014 at 1:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> After mulling over a few possible approaches, I came up with the
>> attached, which seems short and to the point.
>
> Looks reasonable in principle.  I didn't run through all the existing
> PGSharedMemoryDetach calls to see if there are any other places to
> call dsm_detach_all, but it's an easy fix if there are any.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company