Thread: Dependency between bgw_notify_pid and bgw_flags

Dependency between bgw_notify_pid and bgw_flags

From
Ashutosh Bapat
Date:
Hi,
The documentation at http://www.postgresql.org/docs/devel/static/bgworker.html does not indicate any relation between the fields bgw_notify_pid and bgw_flags of the BackgroundWorker structure. But in fact one has to set BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature.
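To make the scenario concrete, here is a rough, hypothetical sketch of the pattern that runs into this: a worker that was itself registered without BGWORKER_BACKEND_DATABASE_CONNECTION registers a dynamic worker and asks to be notified via bgw_notify_pid (the "demo"/"demo_main" names are placeholders):

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

/* Hypothetical: this runs inside a background worker that was itself
 * registered without BGWORKER_BACKEND_DATABASE_CONNECTION. */
static void
start_child_worker(void)
{
    BackgroundWorker        worker;
    BackgroundWorkerHandle *handle;
    pid_t                   pid;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "demo child worker");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;   /* no DATABASE_CONNECTION */
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "demo");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "demo_main");
    worker.bgw_notify_pid = MyProcPid;          /* ask to be notified */

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        elog(ERROR, "could not register background worker");

    /* Because the postmaster silently clears bgw_notify_pid when the
     * requesting PID is not in BackendList, the notification never
     * arrives and this wait can block indefinitely. */
    WaitForBackgroundWorkerStartup(handle, &pid);
}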

In BackgroundWorkerStateChange
 318         /*
 319          * Copy the PID to be notified about state changes, but only if the
 320          * postmaster knows about a backend with that PID.  It isn't an error
 321          * if the postmaster doesn't know about the PID, because the backend
 322          * that requested the worker could have died (or been killed) just
 323          * after doing so.  Nonetheless, at least until we get some experience
 324          * with how this plays out in the wild, log a message at a relatively
 325          * high debug level.
 326          */
 327         rw->rw_worker.bgw_notify_pid = slot->worker.bgw_notify_pid;
 328         if (!PostmasterMarkPIDForWorkerNotify(rw->rw_worker.bgw_notify_pid))
 329         {
 330             elog(DEBUG1, "worker notification PID %lu is not valid",
 331                  (long) rw->rw_worker.bgw_notify_pid);
 332             rw->rw_worker.bgw_notify_pid = 0;
 333         }

bgw_notify_pid gets wiped out (and silently at that) if PostmasterMarkPIDForWorkerNotify() returns false. PostmasterMarkPIDForWorkerNotify() only looks at the BackendList, which does not contain all the background workers created by Register*BackgroundWorker() calls. Only a background worker which has set BGWORKER_BACKEND_DATABASE_CONNECTION gets added to BackendList, in maybe_start_bgworker():

5629             if (rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
5630             {
5631                 if (!assign_backendlist_entry(rw))
5632                     return;
5633             }
5634             else
5635                 rw->rw_child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
5636
5637             do_start_bgworker(rw);      /* sets rw->rw_pid */
5638
5639             if (rw->rw_backend)
5640             {
5641                 dlist_push_head(&BackendList, &rw->rw_backend->elem);

Should we change the documentation to say that one needs to set BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature? Or should we fix the code above not to wipe out bgw_notify_pid (maybe by changing PostmasterMarkPIDForWorkerNotify() to scan the list of other background workers as well)?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Dependency between bgw_notify_pid and bgw_flags

From
Robert Haas
Date:
On Thu, Jun 4, 2015 at 8:52 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Documentation here http://www.postgresql.org/docs/devel/static/bgworker.html
> does not indicate any relation between the fields bgw_notify_pid and
> bgw_flags of BackgroundWorker structure. But in fact one has to set
> BGWORKER_BACKEND_DATABASE_CONNECTION in order to use bgw_notify_pid feature.

The comments in maybe_start_bgworker make it fairly clear that this
was done intentionally, but they don't explain why:
 * If not connected, we don't need a Backend element, but we still
 * need a PMChildSlot.

But there's another comment elsewhere that says that the criterion is
shared-memory access, not a database connection:
 * Background workers that request shared memory access during
 * registration are in this list, too.

So at a minimum, the current situation is not self-consistent.

After studying this, I think it's a bug.  See this comment:
 * Normal child backends can only be launched when we are in PM_RUN or
 * PM_HOT_STANDBY state.  (We also allow launch of normal
 * child backends in PM_WAIT_BACKUP state, but only for superusers.)
 * In other states we handle connection requests by launching "dead_end"
 * child processes, which will simply send the client an error message and
 * quit.  (We track these in the BackendList so that we can know when they
 * are all gone; this is important because they're still connected to shared
 * memory, and would interfere with an attempt to destroy the shmem segment,
 * possibly leading to SHMALL failure when we try to make a new one.)
 * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
 * to drain out of the system, and therefore stop accepting connection
 * requests at all until the last existing child has quit (which hopefully
 * will not be very long).

That comment seems to imply that, at the very least, all backends with
shared-memory access need to be part of BackendList.  But really, I'm
unclear why we'd ever want to exit without all background workers
having shut down, so maybe they should all be in there.  Isn't that
sorta the point of this facility?

Alvaro, any thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dependency between bgw_notify_pid and bgw_flags

From
Alvaro Herrera
Date:
Robert Haas wrote:

> After studying this, I think it's a bug.  See this comment:
> 
>  * Normal child backends can only be launched when we are in PM_RUN or
>  * PM_HOT_STANDBY state.  (We also allow launch of normal
>  * child backends in PM_WAIT_BACKUP state, but only for superusers.)
>  * In other states we handle connection requests by launching "dead_end"
>  * child processes, which will simply send the client an error message and
>  * quit.  (We track these in the BackendList so that we can know when they
>  * are all gone; this is important because they're still connected to shared
>  * memory, and would interfere with an attempt to destroy the shmem segment,
>  * possibly leading to SHMALL failure when we try to make a new one.)
>  * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
>  * to drain out of the system, and therefore stop accepting connection
>  * requests at all until the last existing child has quit (which hopefully
>  * will not be very long).
> 
> That comment seems to imply that, at the very least, all backends with
> shared-memory access need to be part of BackendList.  But really, I'm
> unclear why we'd ever want to exit without all background workers
> having shut down, so maybe they should all be in there.  Isn't that
> sorta the point of this facility?
> 
> Alvaro, any thoughts?

As I recall, my thinking was:

* anything connected to shmem that crashes could leave things in bad
state (for example locks improperly held), whereas things not connected
to shmem could crash without it being a problem for the rest of the
system and thus do not require a full restart cycle.  This stuff is
detected with the PMChildSlot thingy; therefore all bgworkers with shmem
access should have one of these.

* I don't recall offhand what the criteria is for stuff to be in
postmaster's BackendList.  According to the comment on top of struct
Backend, bgworkers connected to shmem should be on that list, even if
they did not have the BGWORKER_BACKEND_DATABASE_CONNECTION flag on
registration.  So that would be a bug.

Does this help you any?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Dependency between bgw_notify_pid and bgw_flags

From
Robert Haas
Date:
On Mon, Jun 8, 2015 at 1:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>
>> After studying this, I think it's a bug.  See this comment:
>>
>>  * Normal child backends can only be launched when we are in PM_RUN or
>>  * PM_HOT_STANDBY state.  (We also allow launch of normal
>>  * child backends in PM_WAIT_BACKUP state, but only for superusers.)
>>  * In other states we handle connection requests by launching "dead_end"
>>  * child processes, which will simply send the client an error message and
>>  * quit.  (We track these in the BackendList so that we can know when they
>>  * are all gone; this is important because they're still connected to shared
>>  * memory, and would interfere with an attempt to destroy the shmem segment,
>>  * possibly leading to SHMALL failure when we try to make a new one.)
>>  * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
>>  * to drain out of the system, and therefore stop accepting connection
>>  * requests at all until the last existing child has quit (which hopefully
>>  * will not be very long).
>>
>> That comment seems to imply that, at the very least, all backends with
>> shared-memory access need to be part of BackendList.  But really, I'm
>> unclear why we'd ever want to exit without all background workers
>> having shut down, so maybe they should all be in there.  Isn't that
>> sorta the point of this facility?
>>
>> Alvaro, any thoughts?
>
> As I recall, my thinking was:
>
> * anything connected to shmem that crashes could leave things in bad
> state (for example locks improperly held), whereas things not connected
> to shmem could crash without it being a problem for the rest of the
> system and thus do not require a full restart cycle.  This stuff is
> detected with the PMChildSlot thingy; therefore all bgworkers with shmem
> access should have one of these.
>
> * I don't recall offhand what the criteria is for stuff to be in
> postmaster's BackendList.  According to the comment on top of struct
> Backend, bgworkers connected to shmem should be on that list, even if
> they did not have the BGWORKER_BACKEND_DATABASE_CONNECTION flag on
> registration.  So that would be a bug.
>
> Does this help you any?

Well, it sounds like we at least need to think about making it
consistently depend on shmem-access rather than database-connection.
I'm less sure what to do with workers that have not even got shmem
access.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dependency between bgw_notify_pid and bgw_flags

From
Ashutosh Bapat
Date:
In CleanupBackgroundWorker(), we seem to differentiate between a background worker with shared memory access and a backend.

2914         /*
2915          * Additionally, for shared-memory-connected workers, just like a
2916          * backend, any exit status other than 0 or 1 is considered a crash
2917          * and causes a system-wide restart.
2918          */

There is an assertion on line 2943 which implies that a backend should have a database connection.

2939
2940         /* Get it out of the BackendList and clear out remaining data */
2941         if (rw->rw_backend)
2942         {
2943             Assert(rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION);

A backend is a process created to handle a client connection [1], and is always connected to a database. After custom background workers were introduced, we seem to have continued that notion; hence only the background workers which request a database connection are in BackendList. So maybe we should stick with that notion and correct the comment to read "Background workers that request a database connection during registration are in this list, too.", or delete that comment altogether.

With that notion of backend, to fix the original problem I reported, PostmasterMarkPIDForWorkerNotify() should also look at the BackgroundWorkerList. As per the comment in the prologue of this function, it assumes that only backends need to be notified about worker state changes, which looks too restrictive. Any backend or background worker which wants to be notified about a background worker it requested to be spawned should be allowed to do so.

5655 /*
5656  * When a backend asks to be notified about worker state changes, we
5657  * set a flag in its backend entry.  The background worker machinery needs
5658  * to know when such backends exit.
5659  */
5660 bool   
5661 PostmasterMarkPIDForWorkerNotify(int pid)

PFA the patch which changes PostmasterMarkPIDForWorkerNotify to look at BackgroundWorkerList as well.

The backends that request to be notified are marked using bgworker_notify, so that when such a backend dies the notifications to it can be turned off using BackgroundWorkerStopNotifications(). Now that we allow a background worker to be notified, we have to build a similar flag into RegisteredBgWorker for the same purpose. The patch does that.
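In outline, the change amounts to something like the sketch below (not the exact patch text; the rw_notify field name is only illustrative, standing in for whatever the new RegisteredBgWorker flag ends up being called):

bool
PostmasterMarkPIDForWorkerNotify(int pid)
{
    dlist_iter  iter;
    slist_iter  siter;

    /* existing behaviour: look for a regular backend with this PID */
    dlist_foreach(iter, &BackendList)
    {
        Backend    *bp = dlist_container(Backend, elem, iter.cur);

        if (bp->pid == pid)
        {
            bp->bgworker_notify = true;
            return true;
        }
    }

    /* proposed addition: also look through the registered bgworkers */
    slist_foreach(siter, &BackgroundWorkerList)
    {
        RegisteredBgWorker *rw = slist_container(RegisteredBgWorker,
                                                 rw_lnode, siter.cur);

        if (rw->rw_pid == pid)
        {
            rw->rw_notify = true;   /* illustrative field name */
            return true;
        }
    }

    return false;
}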
[1]. http://www.postgresql.org/developer/backend/

On Mon, Jun 8, 2015 at 11:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jun 8, 2015 at 1:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>
>> After studying this, I think it's a bug.  See this comment:
>>
>>  * Normal child backends can only be launched when we are in PM_RUN or
>>  * PM_HOT_STANDBY state.  (We also allow launch of normal
>>  * child backends in PM_WAIT_BACKUP state, but only for superusers.)
>>  * In other states we handle connection requests by launching "dead_end"
>>  * child processes, which will simply send the client an error message and
>>  * quit.  (We track these in the BackendList so that we can know when they
>>  * are all gone; this is important because they're still connected to shared
>>  * memory, and would interfere with an attempt to destroy the shmem segment,
>>  * possibly leading to SHMALL failure when we try to make a new one.)
>>  * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
>>  * to drain out of the system, and therefore stop accepting connection
>>  * requests at all until the last existing child has quit (which hopefully
>>  * will not be very long).
>>
>> That comment seems to imply that, at the very least, all backends with
>> shared-memory access need to be part of BackendList.  But really, I'm
>> unclear why we'd ever want to exit without all background workers
>> having shut down, so maybe they should all be in there.  Isn't that
>> sorta the point of this facility?
>>
>> Alvaro, any thoughts?
>
> As I recall, my thinking was:
>
> * anything connected to shmem that crashes could leave things in bad
> state (for example locks improperly held), whereas things not connected
> to shmem could crash without it being a problem for the rest of the
> system and thus do not require a full restart cycle.  This stuff is
> detected with the PMChildSlot thingy; therefore all bgworkers with shmem
> access should have one of these.
>
> * I don't recall offhand what the criteria is for stuff to be in
> postmaster's BackendList.  According to the comment on top of struct
> Backend, bgworkers connected to shmem should be on that list, even if
> they did not have the BGWORKER_BACKEND_DATABASE_CONNECTION flag on
> registration.  So that would be a bug.
>
> Does this help you any?

Well, it sounds like we at least need to think about making it
consistently depend on shmem-access rather than database-connection.
I'm less sure what to do with workers that have not even got shmem
access.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Dependency between bgw_notify_pid and bgw_flags

From
Robert Haas
Date:
On Tue, Jul 7, 2015 at 4:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> With that notion of backend, to fix the original problem I reported,
> PostmasterMarkPIDForWorkerNotify() should also look at the
> BackgroundWorkerList. As per the comments in the prologue of this function,
> it assumes that only backends need to be notified about worker state change,
> which looks too restrictive. Any backend or background worker, which wants
> to be notified about a background worker it requested to be spawned, should
> be allowed to do so.

Yeah.  I'm wondering if we should fix this problem by just insisting
that all workers have an entry in BackendList.  At first look, this
seems like it would make the code a lot simpler and have virtually no
downside.  As it is, we're repeatedly reinventing new and different
ways for unconnected background workers to do things that work fine in
all other cases, and I don't see the point of that.
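
Roughly, the direction I have in mind for maybe_start_bgworker looks
something like this (just a sketch, not the attached patch itself;
assign_backendlist_entry is the existing helper quoted upthread):

            /* Sketch: stop special-casing workers without
             * BGWORKER_BACKEND_DATABASE_CONNECTION; give every worker a
             * Backend entry and always put it on BackendList. */
            if (!assign_backendlist_entry(rw))
                return;

            do_start_bgworker(rw);      /* sets rw->rw_pid */

            dlist_push_head(&BackendList, &rw->rw_backend->elem);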

I haven't really tested the attached patch - sadly, we have no testing
of background workers without shared memory access in the tree - but
it shows what I have in mind.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Dependency between bgw_notify_pid and bgw_flags

From
Ashutosh Bapat
Date:


On Wed, Aug 5, 2015 at 2:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jul 7, 2015 at 4:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> With that notion of backend, to fix the original problem I reported,
> PostmasterMarkPIDForWorkerNotify() should also look at the
> BackgroundWorkerList. As per the comments in the prologue of this function,
> it assumes that only backends need to be notified about worker state change,
> which looks too restrictive. Any backend or background worker, which wants
> to be notified about a background worker it requested to be spawned, should
> be allowed to do so.

Yeah.  I'm wondering if we should fix this problem by just insisting
that all workers have an entry in BackendList.  At first look, this
seems like it would make the code a lot simpler and have virtually no
downside.  As it is, we're repeatedly reinventing new and different
ways for unconnected background workers to do things that work fine in
all other cases, and I don't see the point of that.

I haven't really tested the attached patch - sadly, we have no testing
of background workers without shared memory access in the tree - but
it shows what I have in mind.

Thoughts?


This idea looks good.

Looking at the larger picture, we should also enable this feature to be used by auxiliary processes. It's very hard to add a new auxiliary process in the current code. One has to add code in many places to make sure that the auxiliary processes die and are restarted correctly. It's even tougher to add a parent auxiliary process which spawns multiple worker processes. That would be a whole lot simpler if we could allow the auxiliary processes to use the background worker infrastructure (which is what they are, ultimately).

About the flags BGWORKER_BACKEND_DATABASE_CONNECTION and BGWORKER_SHMEM_ACCESS:
BGWORKER_BACKEND_DATABASE_CONNECTION is used at seven places in the code: one is an assertion; the second and third check for the flag when the backend actually connects to a database; the fourth checks whether BGWORKER_SHMEM_ACCESS is also set; the fifth creates parallel workers with this flag; the sixth uses the flag to add the backend to the backend list (which you have removed). The seventh usage is the only one which installs signal handlers based on this flag, and that too, I guess, can be overridden (or set correctly) by the actual background worker code.

BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on-exit callbacks and detaches the shared memory segment from the background worker. That avoids a full cluster restart when one of those workers, which cannot corrupt shared memory, dies. But I do not see any check to prevent such a backend from calling PGSharedMemoryReattach().

So it looks like it suffices to assume that a background worker either needs to access shared memory or doesn't. Any background worker having shared memory access can also access a database and thus becomes part of the backend list. Or maybe we should just drop these flags and treat every background worker as if it had passed both of them. That would simplify a lot of code.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Dependency between bgw_notify_pid and bgw_flags

From
Alvaro Herrera
Date:
Ashutosh Bapat wrote:

> Looking at larger picture, we should also enable this feature to be
> used by auxilliary processes. It's very hard to add a new auxilliary
> process in current code.  One has to go add code at many places to
> make sure that the auxilliary processes die and are re-started
> correctly.

No kidding.

> Even tougher to add a parent auxilliary process, which spawns multiple
> worker processes.

I'm not sure about this point.  The autovacuum launcher, which is an
obvious candidate to have "child" processes, does not start them itself
but instead asks the postmaster to do it.  We did it that way to avoid
the danger of child processes connected to shared memory that would not
be direct children of the postmaster; that could cause reliability
problems if the intermediate process fails to signal its children for
some reason.  Along the same lines I would suggest that other bgworker
processes should also be controllers only, that is, they can ask the
postmaster to start other processes but not start them directly.

> That
> would be whole lot simpler if we could allow the auxilliary processes to
> use background worker infrastructure (which is what they are utlimately).
> 
> About the flags BGWORKER_BACKEND_DATABASE_CONNECTION and
> BGWORKER_SHMEM_ACCESS
>  BGWORKER_BACKEND_DATABASE_CONNECTION is used at seven places in the code:
> one is assertion, two check existence of this flag, when backend actually
> connects to a database, fourth checks whether BGWORKER_SHMEM_ACCESS is also
> set, fifth creates parallel workers with this flag, sixth uses the flag to
> add backend to the backed list (which you have removed). Seventh usage is
> only usage which installs signal handlers based on this flag, which too I
> guess can be overridden (or set correctly) by the actual background worker
> code.
> 
> BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on exit
> callbacks and detaches the shared memory segment from the background
> worker. That avoids a full cluster restart when one of those worker which
> can not corrupt shared memory dies. But I do not see any check to prevent
> such backend from calling PGSharedMemoryReattach()
> 
> So it looks like, it suffices to assume that background worker either needs
> to access shared memory or doesn't. Any background worker having shared
> memory access can also access database and thus becomes part of the backend
> list. Or may be we just avoid these flags and treat every background worker
> as if it passed both these flags. That will simplify a lot of code.

If you want to make aux processes use the bgworker infrastructure (a
worthy goal I think), it is possible that more uses would be found for
those flags.  For instance, avlauncher is equivalent to a worker that
has SHMEM_ACCESS but no DATABASE_CONNECTION; maybe some of the code
would differ depending on whether or not DATABASE_CONNECTION is
specified.  The parallel to the avlauncher was the main reason I decided
to separate those two flags.  Therefore I wouldn't try to simplify just
yet.  If you succeed in making the avlauncher (and more generally all of
autovacuum) use bgworker code, perhaps that would be the time to search
for simplifications to make, because we would know more about how it
would be used.

From your list above it doesn't sound like DATABASE_CONNECTION actually
does anything useful other than sanity checks.  I wonder whether it
would be sane for the avlauncher to become part of the backend list
(which is what would happen if you simply removed the DATABASE_CONNECTION
flag and then turned the avlauncher into a regular bgworker).

On further thought, given that almost every aux process has particular
timing needs for start/stop under different conditions, I am doubtful
that you could turn any of them into a bgworker.  Maybe pgstat would be
the only exception, perhaps bgwriter, not sure.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Dependency between bgw_notify_pid and bgw_flags

From
Robert Haas
Date:
On Wed, Aug 5, 2015 at 3:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> This idea looks good.

Thanks.  It needs testing though to see if it really works as
intended.  Can you look into that?

> Looking at larger picture, we should also enable this feature to be used by
> auxilliary processes. It's very hard to add a new auxilliary process in
> current code. One has to go add code at many places to make sure that the
> auxilliary processes die and are re-started correctly. Even tougher to add a
> parent auxilliary process, which spawns multiple worker processes.That would
> be whole lot simpler if we could allow the auxilliary processes to use
> background worker infrastructure (which is what they are utlimately).

That's a separate patch, but, sure, we could do that.  I agree with
Alvaro's comments: the postmaster should start all children.  Other
processes should just request that it do so.  We have two mechanisms
for that right now: the one used by bgworkers, and the one used by the
AV launcher.

> BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on exit
> callbacks and detaches the shared memory segment from the background worker.
> That avoids a full cluster restart when one of those worker which can not
> corrupt shared memory dies. But I do not see any check to prevent such
> backend from calling PGSharedMemoryReattach()

There isn't, but you shouldn't do that.  :-)

This is C code; you can't protect against actively malicious code.

> So it looks like, it suffices to assume that background worker either needs
> to access shared memory or doesn't. Any background worker having shared
> memory access can also access database and thus becomes part of the backend
> list. Or may be we just avoid these flags and treat every background worker
> as if it passed both these flags. That will simplify a lot of code.

I think it's useful to support workers that don't have shared memory
access at all, because those can crash without causing a system-wide
reset.  But I don't see the point in distinguishing between workers
with shared-memory access and those with a database connection.  I
mean, obviously the worker needs to be able to initialize itself
either way, but there seems to be no reason to force that to be
signalled in bgw_flags.  It can just depend on whether
BackgroundWorkerInitializeConnection gets called.
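
For illustration, a bare-bones worker body under that model might look
like the sketch below.  The 9.5-era signature of
BackgroundWorkerInitializeConnection() takes a database name and a user
name; "postgres"/NULL and the demo_main name are just placeholders:

#include "postgres.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"

/* Sketch: the worker decides at runtime whether to connect to a
 * database, rather than declaring it up front in bgw_flags. */
void
demo_main(Datum main_arg)
{
    BackgroundWorkerUnblockSignals();

    /* Connecting (or not) is simply the worker's own choice here. */
    BackgroundWorkerInitializeConnection("postgres", NULL);

    /* ... do some work, e.g. run queries via SPI ... */

    proc_exit(0);
}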

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dependency between bgw_notify_pid and bgw_flags

From
Ashutosh Bapat
Date:


On Sat, Aug 8, 2015 at 7:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 5, 2015 at 3:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> This idea looks good.

Thanks.  It needs testing though to see if it really works as
intended.  Can you look into that?

PFA the patch containing your code changes + test module. See if that meets your expectations.
 

> Looking at larger picture, we should also enable this feature to be used by
> auxilliary processes. It's very hard to add a new auxilliary process in
> current code. One has to go add code at many places to make sure that the
> auxilliary processes die and are re-started correctly. Even tougher to add a
> parent auxilliary process, which spawns multiple worker processes.That would
> be whole lot simpler if we could allow the auxilliary processes to use
> background worker infrastructure (which is what they are utlimately).

That's a separate patch, but, sure, we could do that.  I agree with
Alvaro's comments: the postmaster should start all children.  Other
processes should just request that it do so.  We have two mechanisms
for that right now: the one used by bgworkers, and the one used by the
AV launcher.

By children I really meant workers that it requests the postmaster to start, not children in the OS sense.
 

> BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on exit
> callbacks and detaches the shared memory segment from the background worker.
> That avoids a full cluster restart when one of those worker which can not
> corrupt shared memory dies. But I do not see any check to prevent such
> backend from calling PGSharedMemoryReattach()

There isn't, but you shouldn't do that.  :-)

This is C code; you can't protect against actively malicious code.

We have taken pains to check whether the worker was started with the BGWORKER_BACKEND_DATABASE_CONNECTION flag when it requests to connect to a database. I think it makes sense to do the same for the BGWORKER_SHMEM_ACCESS flag. Otherwise, some buggy extension could connect to shared memory and exit without the postmaster restarting all the backends. Obviously one can argue that memory corruption is possible even without this flag, but we should try to protect the exposed interfaces.
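Something like the following, by analogy with the check that BackgroundWorkerInitializeConnection() already performs for BGWORKER_BACKEND_DATABASE_CONNECTION -- purely hypothetical code, not anything in the tree today:

    /* Hypothetical guard before reattaching to shared memory. */
    if (!(MyBgworkerEntry->bgw_flags & BGWORKER_SHMEM_ACCESS))
        ereport(FATAL,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("shared memory access not indicated during registration")));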

> So it looks like, it suffices to assume that background worker either needs
> to access shared memory or doesn't. Any background worker having shared
> memory access can also access database and thus becomes part of the backend
> list. Or may be we just avoid these flags and treat every background worker
> as if it passed both these flags. That will simplify a lot of code.

I think it's useful to support workers that don't have shared memory
access at all, because those can crash without causing a system-wide
reset.  But I don't see the point in distinguishing between workers
with shared-memory access and those with a database connection.  I
mean, obviously the worker needs to be able to initialize itself
either way, but there seems to be no reason to force that to be
signalled in bgw_flags.  It can just depend on whether
BackgroundWorkerInitializeConnection gets called.

+1.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Dependency between bgw_notify_pid and bgw_flags

From
Robert Haas
Date:
On Mon, Aug 31, 2015 at 8:01 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> Thanks.  It needs testing though to see if it really works as
>> intended.  Can you look into that?
>
> PFA the patch containing your code changes + test module. See if that meets
> your expectations.

Thanks.  I don't think this test module is suitable for commit for a
number of reasons, including the somewhat hackish use of exit(0)
instead of proper error reporting, the fact that you didn't integrate
it into the Makefile structure properly, and the fact that without the
postmaster.c changes it hangs forever instead of causing a test
failure.  But it's sufficient to show that the code changes have the
intended effect.

I've committed this and back-patched it to 9.5, but not further.  It's
a bug fix, but it's also a refactoring exercise, so I'd rather not
push it into released branches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dependency between bgw_notify_pid and bgw_flags

From
Ashutosh Bapat
Date:
On Wed, Sep 2, 2015 at 2:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 31, 2015 at 8:01 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> Thanks.  It needs testing though to see if it really works as
>> intended.  Can you look into that?
>
> PFA the patch containing your code changes + test module. See if that meets
> your expectations.


PFA patch with improved test module and fix for a bug.

bgworker_sigusr1_handler() should set the latch when set_latch_on_sigusr1 is true, similarly to procsignal_sigusr1_handler(). Without this fix, if a background worker without the DATABASE_CONNECTION flag calls the WaitForBackgroundWorker*() functions, they wait indefinitely because the latch is never set upon receipt of SIGUSR1.
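Roughly, the handler would end up looking something like this (a sketch of the direction, not the exact patch text):

static void
bgworker_sigusr1_handler(SIGNAL_ARGS)
{
    int         save_errno = errno;

    latch_sigusr1_handler();

    /* mirror procsignal_sigusr1_handler(): wake WaitForBackgroundWorker*() */
    if (set_latch_on_sigusr1)
        SetLatch(MyLatch);

    errno = save_errno;
}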

 
Thanks.  I don't think this test module is suitable for commit for a
number of reasons, including the somewhat hackish use of exit(0)
instead of proper error reporting

I have changed this part of the code.
 
, the fact that you didn't integrate
it into the Makefile structure properly

What was missing? I am able to run make {check,clean,install} from the directory. I can also run make -C <dirpath> check from the repository's root.
 
and the fact that without the
postmaster.c changes it hangs forever instead of causing a test
failure.

Changed this too. The SQL-level function test_bgwnotify() now errors out if it doesn't receive a notification within a specific time.
 
But it's sufficient to show that the code changes have the
intended effect.

Looking at the kind of bugs I am finding, we should commit this test module. Let me know your comments; I will fix them.
 
I've committed this and back-patched it to 9.5, but not further.  It's
a bug fix, but it's also a refactoring exercise, so I'd rather not
push it into released branches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

Re: Dependency between bgw_notify_pid and bgw_flags

From
Robert Haas
Date:
On Wed, Sep 9, 2015 at 8:14 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> PFA patch with improved test module and fix for a bug.
>
> bgworker_sigusr1_handler() should set the latch when set_latch_on_sigusr1 is
> true, similar to procsignal_sigusr1_handler(). Without this fix, if a
> background worker without DATABASE_CONNECTION flag calls
> WaitForBackgroundWorker*() functions, those functions wait indefinitely as
> the latch is never set upon receipt of SIGUSR1.

This is hardly an insurmountable problem given that they can replace
bgworker_sigusr1_handler() with another handler whenever they like.
It might be a good idea anyway, but not without saving and restoring
errno.

>> Thanks.  I don't think this test module is suitable for commit for a
>> number of reasons, including the somewhat hackish use of exit(0)
>> instead of proper error reporting
> I have changed this part of code.

Maybe I should have been more clear: I don't really want a test module
for this.  Yeah, there was a bug, but we fixed it, and I don't see
that it's going to come back.  We might have a different one, but this
module is so special-purpose that it won't catch that.  If there are
some +1s to the idea of this test module from other people then I am
not unwilling to reconsider, but my opinion is that this is the wrong
thing to spend time on.  If we had some plan to build out a test
framework here that would really thoroughly exercise these code paths,
that might be valuable -- I'm not opposed to the idea of trying to
have better test coverage of the bgworker code.  But I just don't see
that this particular module does enough to make it worthwhile.

Considering that, I'm not really excited about spending a lot of time
on review right now, but FWIW I still think the error handling in here
is weak.  Why elog() and not ereport()?  Yeah, this is not user-facing
stuff exactly because it's just a test module, but where else do we
use elog() like this?  Why elog(WARNING) and proc_exit(1) instead of
elog(FATAL) or elog(PANIC)? The messages don't follow the style
guidelines either, like "found launcher in unexpected state (expected
status %d, got %d), exiting." -- that doesn't look anything like our
usual style for reporting errors.

>> , the fact that you didn't integrate
>> it into the Makefile structure properly
>
> What was missing? I am able to make {check,clean,install) from the
> directory. I can also make -C <dirpath> check from repository's root.

make check-world won't run it, right?  So there would be no BF coverage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company