Re: Parallel Append subplan order instability on aye-aye - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Parallel Append subplan order instability on aye-aye
Msg-id: 4174.1563239552@sss.pgh.pa.us
In response to: Re: Parallel Append subplan order instability on aye-aye (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Parallel Append subplan order instability on aye-aye (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
I wrote:
> So that data-collection patch has been in place for nearly 2 months
> (since 2019-05-21), and in that time we've seen a grand total of
> no repeats of the original problem, as far as I've seen.

Oh ... wait a minute.  I decided to go scrape the buildfarm logs to
confirm my impression that there were no matching failures, and darn
if I didn't find one:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-06-04%2021%3A00%3A22

For the archives' sake, that's a pg_upgradeCheck failure, and here
are the regression diffs:

=========================== regression.diffs ================
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out	2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out	2019-06-04 17:42:27 -0400
@@ -21,12 +21,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
                ->  Parallel Append
+                     ->  Parallel Seq Scan on a_star
                      ->  Parallel Seq Scan on d_star
                      ->  Parallel Seq Scan on f_star
                      ->  Parallel Seq Scan on e_star
                      ->  Parallel Seq Scan on b_star
                      ->  Parallel Seq Scan on c_star
-                     ->  Parallel Seq Scan on a_star
 (11 rows)

 select round(avg(aa)), sum(aa) from a_star a1;
@@ -49,10 +49,10 @@
                ->  Parallel Append
                      ->  Seq Scan on d_star
                      ->  Seq Scan on c_star
+                     ->  Parallel Seq Scan on a_star
                      ->  Parallel Seq Scan on f_star
                      ->  Parallel Seq Scan on e_star
                      ->  Parallel Seq Scan on b_star
-                     ->  Parallel Seq Scan on a_star
 (11 rows)

 select round(avg(aa)), sum(aa) from a_star a2;
@@ -75,12 +75,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
                ->  Parallel Append
+                     ->  Seq Scan on a_star
                      ->  Seq Scan on d_star
                      ->  Seq Scan on f_star
                      ->  Seq Scan on e_star
                      ->  Seq Scan on b_star
                      ->  Seq Scan on c_star
-                     ->  Seq Scan on a_star
 (11 rows)

 select round(avg(aa)), sum(aa) from a_star a3;
@@ -95,7 +95,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out	2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out	2019-06-04 17:43:06 -0400
@@ -205,7 +205,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16


This plan shape change matches some, though by no means all, of the
previous failures.  And we can now see why the planner did that: a_star
has a smaller recorded size than the other tables in the query.
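
For anyone wanting to poke at this locally: the size estimates in
question are just what's stored in pg_class, so the quick check is
the same query the test itself runs:

    select relname, relpages, reltuples from pg_class
    where relname like '__star' order by relname;

A healthy run records 1 page / 3 tuples for a_star; this run recorded
0/0, and that difference is what reordered the Parallel Append's
children.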

So what happened there?  There's no diff in the pg_stat_all_tables
query, which proves that a vacuum on a_star did happen, since it
transmitted a vacuum_count increment to the stats collector.
It seems like there are two possible theories:

(1) The vacuum for some reason saw the table's size as zero
    (whereupon it'd read no blocks and count no tuples).
(2) The vacuum's update of the pg_class row failed to "take".

Theory (2) seems a bit more plausible, but still very unsettling.
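
To make the distinction concrete, here's the sort of by-hand check
I have in mind (just a sketch, against a scratch database with the
regression tables loaded):

    vacuum a_star;

    -- the stats collector's view: vacuum_count ticks up if the
    -- vacuum ran at all, which is what the unchanged part of the
    -- pg_stat_all_tables output already tells us
    select relname, vacuum_count from pg_stat_all_tables
    where relname = 'a_star';

    -- the catalog's view: under theory (2) this would still show
    -- stale or zeroed values even though the vacuum ran
    select relpages, reltuples from pg_class
    where relname = 'a_star';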

In the light of this data, the similar failures that this result
doesn't exactly match all look like cases where one of the "X_star"
tables unexpectedly moved to the top of the parallel plan, which we
can now hypothesize means that table had zero relpages/reltuples
after supposedly being vacuumed.  So it's not only a_star that's got
the issue, which rules out my half-formed theory that being the
topmost parent of the inheritance hierarchy has something to do with
it.  But I bet that these tables forming an inheritance hierarchy
(with multiple inheritance, even) does have something to do with it
somehow, because if this were a generic VACUUM bug surely we'd be
seeing it elsewhere.
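
(For reference, since the hierarchy is the suspect: as best I recall
src/test/regress/sql/create_table.sql, the X_star tables are laid out
about like this -- details from memory, not quoted:

    create table a_star (class char, a int4);
    create table b_star (b text) inherits (a_star);
    create table c_star (c name) inherits (a_star);
    create table d_star (d float8) inherits (b_star, c_star);
    create table e_star (e int2) inherits (c_star);
    create table f_star (f polygon) inherits (e_star);

so d_star is the multiply-inheriting one.)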

            regards, tom lane


