Thread: Unstable select_parallel regression output in 12rc1

Unstable select_parallel regression output in 12rc1

From: Christoph Berg
Building the 12rc1 package on Ubuntu eoan/amd64, I got this
regression diff:

12:06:27 diff -U3 /<<PKGBUILDDIR>>/build/../src/test/regress/expected/select_parallel.out /<<PKGBUILDDIR>>/build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
12:06:27 --- /<<PKGBUILDDIR>>/build/../src/test/regress/expected/select_parallel.out    2019-09-23 20:24:42.000000000 +0000
12:06:27 +++ /<<PKGBUILDDIR>>/build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out    2019-09-26 10:06:21.171683801+0000
12:06:27 @@ -21,8 +21,8 @@
12:06:27           Workers Planned: 3
12:06:27           ->  Partial Aggregate
12:06:27                 ->  Parallel Append
12:06:27 -                     ->  Parallel Seq Scan on d_star
12:06:27                       ->  Parallel Seq Scan on f_star
12:06:27 +                     ->  Parallel Seq Scan on d_star
12:06:27                       ->  Parallel Seq Scan on e_star
12:06:27                       ->  Parallel Seq Scan on b_star
12:06:27                       ->  Parallel Seq Scan on c_star
12:06:27 @@ -75,8 +75,8 @@
12:06:27           Workers Planned: 3
12:06:27           ->  Partial Aggregate
12:06:27                 ->  Parallel Append
12:06:27 -                     ->  Seq Scan on d_star
12:06:27                       ->  Seq Scan on f_star
12:06:27 +                     ->  Seq Scan on d_star
12:06:27                       ->  Seq Scan on e_star
12:06:27                       ->  Seq Scan on b_star
12:06:27                       ->  Seq Scan on c_star
12:06:27 @@ -103,7 +103,7 @@
12:06:27  -----------------------------------------------------
12:06:27   Finalize Aggregate
12:06:27     ->  Gather
12:06:27 -         Workers Planned: 1
12:06:27 +         Workers Planned: 3
12:06:27           ->  Partial Aggregate
12:06:27                 ->  Append
12:06:27                       ->  Parallel Seq Scan on a_star

Retriggering the build worked, though.

Christoph



Re: Unstable select_parallel regression output in 12rc1

From: Tom Lane
Christoph Berg <myon@debian.org> writes:
> Building the 12rc1 package on Ubuntu eoan/amd64, I got this
> regression diff:

The append-order differences have been seen before, per this thread:

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B0CxrKRWRMf5ymN3gm%2BBECHna2B-q1w8onKBep4HasUw%40mail.gmail.com

We haven't seen it in quite some time in HEAD, though I fear that's
just due to bad luck or change of timing of unrelated tests.  I've
been hoping to catch it in HEAD to validate the theory I posited in
<22315.1563378828@sss.pgh.pa.us>, but your report doesn't help because
the additional checking queries aren't there in the v12 branch :-(

> 12:06:27 @@ -103,7 +103,7 @@
> 12:06:27  -----------------------------------------------------
> 12:06:27   Finalize Aggregate
> 12:06:27     ->  Gather
> 12:06:27 -         Workers Planned: 1
> 12:06:27 +         Workers Planned: 3
> 12:06:27           ->  Partial Aggregate
> 12:06:27                 ->  Append
> 12:06:27                       ->  Parallel Seq Scan on a_star

We've also seen this on a semi-regular basis, and I've been intending
to bitch about it, though it didn't seem very useful to do so as long
as there were other instabilities in the regression tests.  What we
could do, perhaps, is feed the plan output through a filter that
suppresses the exact number-of-workers value.  There's precedent
for such plan-filtering elsewhere in the tests already.
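
A minimal sketch of that filtering idea, in Python purely for
illustration: the regression tests themselves are SQL scripts, and the
helper name mask_worker_counts is hypothetical, not something in the
PostgreSQL tree.  It replaces the number after "Workers Planned:" (and
"Workers Launched:", which EXPLAIN ANALYZE prints) with a placeholder,
so that two plans differing only in worker count compare equal.

```python
import re

# Hypothetical helper for illustration only; not part of PostgreSQL's tests.
# Matches the label and captures it, so only the digits get replaced.
_WORKERS_RE = re.compile(r"(Workers (?:Planned|Launched): )\d+")


def mask_worker_counts(plan_text: str) -> str:
    """Replace each worker count in EXPLAIN output with the placeholder 'N'."""
    return _WORKERS_RE.sub(r"\g<1>N", plan_text)


# The expected plan from the report, and the variant the build produced.
expected = """\
 Finalize Aggregate
   ->  Gather
         Workers Planned: 1
         ->  Partial Aggregate
               ->  Append
                     ->  Parallel Seq Scan on a_star
"""
actual = expected.replace("Workers Planned: 1", "Workers Planned: 3")

# The raw plans differ, but the filtered plans are identical.
assert expected != actual
assert mask_worker_counts(expected) == mask_worker_counts(actual)
```

A filter like this only hides the worker-count variation; it would do
nothing for the append-order differences discussed above.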

            regards, tom lane



Re: Unstable select_parallel regression output in 12rc1

From: Christoph Berg
Re: Tom Lane 2019-09-26 <12685.1569510771@sss.pgh.pa.us>
> We haven't seen it in quite some time in HEAD, though I fear that's
> just due to bad luck or change of timing of unrelated tests.

The v13 package builds that are running every 6h here haven't seen a
problem yet either, so the probability of triggering it seems very
low. So it's not a pressing problem. (There's some extension modules
where the testsuite fails at a much higher rate, getting all targets
to pass at the same time is next to impossible there :(. )

Christoph



Re: Unstable select_parallel regression output in 12rc1

From: Tom Lane
Christoph Berg <myon@debian.org> writes:
> Re: Tom Lane 2019-09-26 <12685.1569510771@sss.pgh.pa.us>
>> We haven't seen it in quite some time in HEAD, though I fear that's
>> just due to bad luck or change of timing of unrelated tests.

> The v13 package builds that are running every 6h here haven't seen a
> problem yet either, so the probability of triggering it seems very
> low. So it's not a pressing problem.

I've pushed some changes to try to ameliorate the issue.

> (There's some extension modules
> where the testsuite fails at a much higher rate, getting all targets
> to pass at the same time is next to impossible there :(. )

I feel your pain, believe me.  Used to fight the same kind of problems
when I was at Red Hat.  Are any of those extension modules part of
Postgres?

            regards, tom lane



Re: Unstable select_parallel regression output in 12rc1

From: Christoph Berg
Re: Tom Lane 2019-09-28 <24917.1569692191@sss.pgh.pa.us>
> > (There's some extension modules
> > where the testsuite fails at a much higher rate, getting all targets
> > to pass at the same time is next to impossible there :(. )
> 
> I feel your pain, believe me.  Used to fight the same kind of problems
> when I was at Red Hat.  Are any of those extension modules part of
> Postgres?

No, external ones. The main offenders at the moment are pglogical and
patroni (admittedly not an extension in the strict sense). Both have
extensive test suites that exercise replication scenarios that are
prone to race conditions. (Maybe we should just run fewer tests for the
packaging.)

Christoph