Thread: pgsql: Fix contrib/auto_explain to not cause problems in parallelworke

Fix contrib/auto_explain to not cause problems in parallel workers.

A parallel worker process should not be making any decisions of its
own about whether to auto-explain.  If the parent session process
passed down flags asking for instrumentation data, do that, otherwise
not.  Trying to enable instrumentation anyway leads to bugs like the
"could not find key N in shm TOC" failure reported in bug #15821
from Christian Hofstaedtler.

We can implement this cheaply by piggybacking on the existing logic
for not doing anything when we've chosen not to sample a statement.

While at it, clean up some tin-eared coding related to the sampling
feature, including an off-by-one error that meant that asking for 1.0
sampling rate didn't actually result in sampling every statement.

Although the specific case reported here only manifested in >= v11,
I believe that related misbehaviors can be demonstrated in any version
that has parallel query; and the off-by-one error is certainly there
back to 9.6 where that feature was added.  So back-patch to 9.6.

Discussion: https://postgr.es/m/15821-5eb422e980594075@postgresql.org

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/ba38967d7567342bb547d8aafa8e9bab24396df4

Modified Files
--------------
contrib/auto_explain/auto_explain.c | 43 ++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 15 deletions(-)


Re: pgsql: Fix contrib/auto_explain to not cause problems inparallel worke

From
Michael Paquier
Date:
Hi Tom,

On Mon, Jun 03, 2019 at 10:06:21PM +0000, Tom Lane wrote:
> Fix contrib/auto_explain to not cause problems in parallel workers.
>
> A parallel worker process should not be making any decisions of its
> own about whether to auto-explain.  If the parent session process
> passed down flags asking for instrumentation data, do that, otherwise
> not.  Trying to enable instrumentation anyway leads to bugs like the
> "could not find key N in shm TOC" failure reported in bug #15821
> from Christian Hofstaedtler.

This has broken the builds on Windows for 9.6 and 10:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=woodlouse&dt=2019-06-04%2000%3A42%3A07&stg=make

Here is the failure:
auto_explain.obj : error LNK2001: unresolved external symbol
ParallelWorkerNumber [C:\buildfarm\buildenv\REL_10_STABLE\pgsql.build\auto_explain.vcxproj]

This gets included as a dependency because of the call to
IsParallelWorker() you have added in this commit, and I think that
this can be fixed by adding PGDLLIMPORT to the variable in
parallel.h.  Would you fix it or should I?
--
Michael

Attachment
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Jun 03, 2019 at 10:06:21PM +0000, Tom Lane wrote:
>> Fix contrib/auto_explain to not cause problems in parallel workers.

> This has broken the builds on Windows for 9.6 and 10:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=woodlouse&dt=2019-06-04%2000%3A42%3A07&stg=make

Ooops.

> This gets included as a dependency because of the call to
> IsParallelWorker() you have added in this commit, and I think that
> this can be fixed by adding PGDLLIMPORT to the variable in
> parallel.h.  Would you fix it or should I?

Yeah, I see we already did that in later branches.  I'll go fix it,
thanks for the heads-up!

            regards, tom lane



Re: pgsql: Fix contrib/auto_explain to not cause problems inparallel worke

From
Michael Paquier
Date:
On Mon, Jun 03, 2019 at 09:16:56PM -0400, Tom Lane wrote:
> Yeah, I see we already did that in later branches.  I'll go fix it,
> thanks for the heads-up!

Thanks.
--
Michael

Attachment