Thread: ParallelContexts can get confused about which worker is which
While testing last night, I discovered a serious case of brain fade in parallel.c; the same conceptual mistake has also spread to nodeGather.c. parallel.c creates an array of ParallelWorkerInfo structures, which are defined like this: typedef struct ParallelWorkerInfo { BackgroundWorkerHandle *bgwhandle; shm_mq_handle *error_mqh; int32 pid; } ParallelWorkerInfo; These structures are typically accessed as pcxt->worker[i]. The trouble is that pcxt->worker[i].bgwhandle is the bgwhandler for the i'th worker to be registered, while pcxt->worker[i].error_mqh is the error_mqh for the i'th worker to attach to the dynamic shared memory segment. But those might be totally different processes. This happens to mostly work, because the postmaster will probably start the processes in the same order that they are registered, and the operating system will probably schedule them in the same order the postmaster starts them. And if you only have one worker, then you're fine! It also seems to work out that if all workers exit cleanly, any shuffling of the background worker handles relative to the error queue handles is harmless. But it's still pretty broken. There seem to be two ways to fix this. One is to keep the bgwhandle objects in a separate array from the error_mqh objects and reconstruct after the fact what the mapping between those two sets of indexes is. This solution looks complicated to code and generally pretty annoying to get right. The other way to fix this is to pass down the index that the leader assigns to any given worker, and have the worker use that index instead of allocating its own separate index after connecting to the DSM segment. Unfortunately, there's not really a good way to pass that additional information down to the worker right now, but we could fix that pretty easily by adding an additional field to the BackgroundWorker structure, which the worker would then be able to access via MyBgworkerEntry. I suggest something like this: --- a/src/include/postmaster/bgworker.h +++ b/src/include/postmaster/bgworker.h @@ -74,6 +74,7 @@ typedef enum#define BGW_DEFAULT_RESTART_INTERVAL 60#define BGW_NEVER_RESTART -1#define BGW_MAXLEN 64 +#define BGW_EXTRALEN 128 typedef struct BackgroundWorker{ @@ -85,6 +86,7 @@ typedef struct BackgroundWorker char bgw_library_name[BGW_MAXLEN]; /* only if bgw_main is NULL */ char bgw_function_name[BGW_MAXLEN]; /* only if bgw_main is NULL */ Datum bgw_main_arg; + char bgw_extra[BGW_EXTRALEN]; pid_t bgw_notify_pid; /* SIGUSR1 this backend on start/stop*/} BackgroundWorker; The ability to pass down a little more information from the registering process to the background worker seems like it would be useful for more than just parallelism, so I imagine this change might be generally welcomed. Parallel workers would use the first four bytes of bgw_extra to store the worker index, and other users of the background worker facility could use it for whatever they like. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Oct 30, 2015 at 11:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: > The other way to fix this is to pass down the index > that the leader assigns to any given worker, and have the worker use > that index instead of allocating its own separate index after > connecting to the DSM segment. Unfortunately, there's not really a > good way to pass that additional information down to the worker right > now, but we could fix that pretty easily by adding an additional field > to the BackgroundWorker structure, which the worker would then be able > to access via MyBgworkerEntry. Here's a patch implementing that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Sat, Oct 31, 2015 at 11:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Oct 30, 2015 at 11:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > The other way to fix this is to pass down the index
> > that the leader assigns to any given worker, and have the worker use
> > that index instead of allocating its own separate index after
> > connecting to the DSM segment. Unfortunately, there's not really a
> > good way to pass that additional information down to the worker right
> > now, but we could fix that pretty easily by adding an additional field
> > to the BackgroundWorker structure, which the worker would then be able
> > to access via MyBgworkerEntry.
>
> Here's a patch implementing that.
>
If we are going to add a new parameter to BackgroundWorker structure,
>
> On Fri, Oct 30, 2015 at 11:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > The other way to fix this is to pass down the index
> > that the leader assigns to any given worker, and have the worker use
> > that index instead of allocating its own separate index after
> > connecting to the DSM segment. Unfortunately, there's not really a
> > good way to pass that additional information down to the worker right
> > now, but we could fix that pretty easily by adding an additional field
> > to the BackgroundWorker structure, which the worker would then be able
> > to access via MyBgworkerEntry.
>
> Here's a patch implementing that.
>
If we are going to add a new parameter to BackgroundWorker structure,
then the same needs to be updated in docs [1] as well. I think adding
a new parameter to this structure might require some updations in
client applications. It seems worth to add a note for the same in commit
message, so that same could be reflected in Release Notes.
Also, I don't know why BGW_EXTRALEN needs to be 128 and not 64?
I guess you kept it so that in future if we need to pass more information,
then the same could be used which seems reasonable considering that
this is an exposed structure and we don't want to change it again.
slock_t mutex;
/* Maximum XactLastRecEnd of any worker. */
XLogRecPtr last_xlog_end;
} FixedParallelState;
Now as above mutex will be used to just protect last_xlog_end, do you
think it is better to modify the comment above mutex declaration?
[1] - http://www.postgresql.org/docs/devel/static/bgworker.html
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Nov 1, 2015 at 1:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > If we are going to add a new parameter to BackgroundWorker structure, > then the same needs to be updated in docs [1] as well. Right, good point. > I think adding > a new parameter to this structure might require some updations in > client applications. It seems worth to add a note for the same in commit > message, so that same could be reflected in Release Notes. Client background worker modules need to be recompiled so that they know about the new structure size, but they shouldn't need any code changes. > Also, I don't know why BGW_EXTRALEN needs to be 128 and not 64? > I guess you kept it so that in future if we need to pass more information, > then the same could be used which seems reasonable considering that > this is an exposed structure and we don't want to change it again. Well, I only need 4 bytes for this purpose, but I figured other users of the background worker facility might like to have a little more. I don't think 64 vs. 128 matters very much; I'm happy to hear other opinions on this topic. What I think mostly matters is that we don't make it so much that the overhead of copying it would add substantially to background worker startup times. I assume that either 64 or 128 is safe in that regard. > /* Mutex protects remaining fields. */ > slock_t mutex; > > > /* Maximum XactLastRecEnd of any worker. */ > XLogRecPtr last_xlog_end; > > } FixedParallelState; > > Now as above mutex will be used to just protect last_xlog_end, do you > think it is better to modify the comment above mutex declaration? Meh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 2, 2015 at 4:13 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Nov 1, 2015 at 1:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> If we are going to add a new parameter to BackgroundWorker structure, >> then the same needs to be updated in docs [1] as well. > > Right, good point. Here's an updated patch with documentation added and two bugs fixed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, Nov 3, 2015 at 8:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 2, 2015 at 4:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sun, Nov 1, 2015 at 1:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> If we are going to add a new parameter to BackgroundWorker structure,
> >> then the same needs to be updated in docs [1] as well.
> >
> > Right, good point.
>
> Here's an updated patch with documentation added and two bugs fixed.
>
>
> On Mon, Nov 2, 2015 at 4:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sun, Nov 1, 2015 at 1:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> If we are going to add a new parameter to BackgroundWorker structure,
> >> then the same needs to be updated in docs [1] as well.
> >
> > Right, good point.
>
> Here's an updated patch with documentation added and two bugs fixed.
>
Patch looks good to me.
On Tue, Nov 3, 2015 at 11:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Nov 3, 2015 at 8:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Nov 2, 2015 at 4:13 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> > On Sun, Nov 1, 2015 at 1:11 AM, Amit Kapila <amit.kapila16@gmail.com> >> > wrote: >> >> If we are going to add a new parameter to BackgroundWorker structure, >> >> then the same needs to be updated in docs [1] as well. >> > >> > Right, good point. >> >> Here's an updated patch with documentation added and two bugs fixed. >> > > Patch looks good to me. OK, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company