Thread: pgsql: Fix contrib/auto_explain to not cause problems in parallelworke
Fix contrib/auto_explain to not cause problems in parallel workers. A parallel worker process should not be making any decisions of its own about whether to auto-explain. If the parent session process passed down flags asking for instrumentation data, do that, otherwise not. Trying to enable instrumentation anyway leads to bugs like the "could not find key N in shm TOC" failure reported in bug #15821 from Christian Hofstaedtler. We can implement this cheaply by piggybacking on the existing logic for not doing anything when we've chosen not to sample a statement. While at it, clean up some tin-eared coding related to the sampling feature, including an off-by-one error that meant that asking for 1.0 sampling rate didn't actually result in sampling every statement. Although the specific case reported here only manifested in >= v11, I believe that related misbehaviors can be demonstrated in any version that has parallel query; and the off-by-one error is certainly there back to 9.6 where that feature was added. So back-patch to 9.6. Discussion: https://postgr.es/m/15821-5eb422e980594075@postgresql.org Branch ------ REL_10_STABLE Details ------- https://git.postgresql.org/pg/commitdiff/ba38967d7567342bb547d8aafa8e9bab24396df4 Modified Files -------------- contrib/auto_explain/auto_explain.c | 43 ++++++++++++++++++++++++------------- 1 file changed, 28 insertions(+), 15 deletions(-)
Re: pgsql: Fix contrib/auto_explain to not cause problems inparallel worke
From
Michael Paquier
Date:
Hi Tom, On Mon, Jun 03, 2019 at 10:06:21PM +0000, Tom Lane wrote: > Fix contrib/auto_explain to not cause problems in parallel workers. > > A parallel worker process should not be making any decisions of its > own about whether to auto-explain. If the parent session process > passed down flags asking for instrumentation data, do that, otherwise > not. Trying to enable instrumentation anyway leads to bugs like the > "could not find key N in shm TOC" failure reported in bug #15821 > from Christian Hofstaedtler. This has broken the builds on Windows for 9.6 and 10: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=woodlouse&dt=2019-06-04%2000%3A42%3A07&stg=make Here is the failure: auto_explain.obj : error LNK2001: unresolved external symbol ParallelWorkerNumber [C:\buildfarm\buildenv\REL_10_STABLE\pgsql.build\auto_explain.vcxproj] This gets included as a dependency because of the call to IsParallelWorker() you have added in this commit, and I think that this can be fixed by adding PGDLLIMPORT to the variable in parallel.h. Would you fix it or should I? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Jun 03, 2019 at 10:06:21PM +0000, Tom Lane wrote: >> Fix contrib/auto_explain to not cause problems in parallel workers. > This has broken the builds on Windows for 9.6 and 10: > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=woodlouse&dt=2019-06-04%2000%3A42%3A07&stg=make Ooops. > This gets included as a dependency because of the call to > IsParallelWorker() you have added in this commit, and I think that > this can be fixed by adding PGDLLIMPORT to the variable in > parallel.h. Would you fix it or should I? Yeah, I see we already did that in later branches. I'll go fix it, thanks for the heads-up! regards, tom lane
Re: pgsql: Fix contrib/auto_explain to not cause problems inparallel worke
From
Michael Paquier
Date:
On Mon, Jun 03, 2019 at 09:16:56PM -0400, Tom Lane wrote: > Yeah, I see we already did that in later branches. I'll go fix it, > thanks for the heads-up! Thanks. -- Michael