Thread: Parallel Append subplan order instability on aye-aye
Hi,

Here's a one-off regression test failure of a sort that commit 624e440a
intended to fix.  a_star unexpectedly sorted higher.  I checked the space
weather forecast for this morning but no sign of solar flares.  More
seriously, it did the same in all 3 Parallel Append queries.  Recent
commits look irrelevant.  Could a UDP stats packet dropped on the floor
cause that?  Otherwise maybe you'd need a weird result from FileSize()
to explain it.  Based on log output no other tests ran around the same
time.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

--
Thomas Munro
https://enterprisedb.com

Thomas Munro <thomas.munro@gmail.com> writes:
> Here's a one-off regression test failure of a sort that commit
> 624e440a intended to fix.

Note that in the discussion that led up to 624e440a, we never did
think that we'd completely explained the original irreproducible
failure.

I think I've seen a couple of other cases of this same failure
in the buildfarm recently, but too tired to go looking right now.

			regards, tom lane

On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Here's a one-off regression test failure of a sort that commit
> > 624e440a intended to fix.
>
> Note that in the discussion that led up to 624e440a, we never did
> think that we'd completely explained the original irreproducible
> failure.
>
> I think I've seen a couple of other cases of this same failure
> in the buildfarm recently, but too tired to go looking right now.

I think it might be dependent on incidental vacuum/analyze activity
having updated reltuples.  With the attached script, I get the two
plan variants depending on whether I comment out "analyze a_star".  I
guess we should explicitly analyze these X_star tables somewhere?

--
Thomas Munro
https://enterprisedb.com

Attachment
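
A rough sketch of the sort of script being described (this is not the actual
attachment; it assumes a regression database left over from "make installcheck",
reuses the round(avg(aa))/sum(aa) query quoted later in this thread, and borrows
a few planner GUCs to coax out a parallel plan; the exact settings are an
assumption, not copied from the attachment):

-- Sketch only: run against the regression test database.
-- Toggling the ANALYZE below is what flips the subplan order.
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 4;

begin;
analyze a_star;   -- comment this out to get the other plan variant
explain (costs off) select round(avg(aa)), sum(aa) from a_star;
rollback;         -- ANALYZE's pg_class update is transactional, so this undoes it
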
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that in the discussion that led up to 624e440a, we never did
>> think that we'd completely explained the original irreproducible
>> failure.

> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.

The problem is to explain where said activity came from.  a_star and
its children are too small to attract autovacuum's attention.  They
get created/filled in create_table.sql/create_misc.sql, and then they
get explicitly vacuum'd by sanity_check.sql, and then after that
things are 100% stable.  Or should be.  There are some incidental
ALTER TABLEs on them in misc.sql and select_parallel.sql, but those
shouldn't have any interesting effects on the rowcount estimates ...
and even if they do, why would such effects not be reproducible?

So I'm not excited about sticking in an extra vacuum or analyze
without actually understanding why the irreproducible behavior
happens.  It's not exactly implausible that that'd make it worse
not better.

			regards, tom lane

On Tue, 21 May 2019 at 11:32, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > Here's a one-off regression test failure of a sort that commit
> > > 624e440a intended to fix.
> >
> > Note that in the discussion that led up to 624e440a, we never did
> > think that we'd completely explained the original irreproducible
> > failure.
> >
> > I think I've seen a couple of other cases of this same failure
> > in the buildfarm recently, but too tired to go looking right now.
>
> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.  With the attached script, I get the two
> plan variants depending on whether I comment out "analyze a_star".  I
> guess we should explicitly analyze these X_star tables somewhere?

That's the only theory I came up with yesterday when thinking about
this.  We can't really go adding an ANALYZE in a test in a parallel
group though since there'd be race conditions around other parallel
tests which could cause plan changes.

At the moment, these tables are only vacuumed in sanity_check.sql,
which as you can see is run by itself.

# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
# results. So it should not run parallel to other tests.
# ----------
test: sanity_check

I did add the following query just before the failing one and included
the expected output from below.  The tests pass for me in make check
and the post-upgrade test passes in make check-world too.  I guess we
could commit that and see if it fails along with the other mentioned
failure.

Alternatively, we could just invent some local tables instead of using
the ?_star tables and analyze them just before the test, although,
that does not guarantee a fix as there may be something else to blame
that we've not thought of.

select relname,last_vacuum is null,last_analyze is null,last_autovacuum
is null,last_autoanalyze is null from pg_stat_all_tables where relname
like '__star' order by relname;

 relname | ?column? | ?column? | ?column? | ?column?
---------+----------+----------+----------+----------
 a_star  | f        | t        | t        | t
 b_star  | f        | t        | t        | t
 c_star  | f        | t        | t        | t
 d_star  | f        | t        | t        | t
 e_star  | f        | t        | t        | t
 f_star  | f        | t        | t        | t
(6 rows)

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that in the discussion that led up to 624e440a, we never did
>> think that we'd completely explained the original irreproducible
>> failure.
>>
>> I think I've seen a couple of other cases of this same failure
>> in the buildfarm recently, but too tired to go looking right now.

> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.

I got around to excavating in the buildfarm archives, and found a round
dozen of more-or-less-similar incidents.  I went back 18 months, which
by coincidence (i.e., I didn't realize it till just now) is just about
the time since 624e440a:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2018-01-14%2006%3A30%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-03-02%2011%3A30%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-11%2023%3A25%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-15%2000%3A02%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-08%2019%3A55%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-04-19%2001%3A50%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-23%2021%3A23%3A12
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

There are two really interesting things about this list:

* All the failures are on HEAD.  This implies that the issue was not
there when we forked off v11, else we'd surely have seen an instance
on that branch by now.  The dates above are consistent with the idea
that we eliminated the problem in roughly May 2018, and then it came
back about a month ago.  (Of course, maybe this just traces to
unrelated changes in test timing.)

* All the failures are in the pg_upgrade test (and some are before,
some after, we switched that from serial to parallel schedule).
This makes very little sense; how is that meaningfully different
from the buildfarm's straight-up invocations of "make check" and
"make installcheck"?

Note that I excluded a bunch of cases where we managed to run
select_parallel despite having suffered failures earlier in the test
run, typically failures that caused the sanity_check test to not run.
These led to diffs in the X_star queries that look roughly similar to
these, but not the same.

			regards, tom lane

David Rowley <david.rowley@2ndquadrant.com> writes:
> I did add the following query just before the failing one and included
> the expected output from below.  The tests pass for me in make check
> and the post-upgrade test passes in make check-world too.  I guess we
> could commit that and see if it fails along with the other mentioned
> failure.

I'm thinking this is a good idea, although I think we could be more
aggressive about the data collected, as attached.  Since all of these
ought to be single-page tables, the relpages and reltuples counts
should be machine-independent.  In theory anyway.

			regards, tom lane

diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0eca76c..9775cc8 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -89,6 +89,33 @@ select round(avg(aa)), sum(aa) from a_star a3;
     14 | 355
 (1 row)
 
+-- Temporary hack to investigate whether extra vacuum/analyze is happening
+select relname, relpages, reltuples
+from pg_class
+where relname like '__star' order by relname;
+ relname | relpages | reltuples 
+---------+----------+-----------
+ a_star  |        1 |         3
+ b_star  |        1 |         4
+ c_star  |        1 |         4
+ d_star  |        1 |        16
+ e_star  |        1 |         7
+ f_star  |        1 |        16
+(6 rows)
+
+select relname, vacuum_count, analyze_count, autovacuum_count, autoanalyze_count
+from pg_stat_all_tables
+where relname like '__star' order by relname;
+ relname | vacuum_count | analyze_count | autovacuum_count | autoanalyze_count 
+---------+--------------+---------------+------------------+-------------------
+ a_star  |            1 |             0 |                0 |                 0
+ b_star  |            1 |             0 |                0 |                 0
+ c_star  |            1 |             0 |                0 |                 0
+ d_star  |            1 |             0 |                0 |                 0
+ e_star  |            1 |             0 |                0 |                 0
+ f_star  |            1 |             0 |                0 |                 0
+(6 rows)
+
 -- Disable Parallel Append
 alter table a_star reset (parallel_workers);
 alter table b_star reset (parallel_workers);
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 03c056b..f96812b 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -36,6 +36,14 @@ explain (costs off)
   select round(avg(aa)), sum(aa) from a_star;
 select round(avg(aa)), sum(aa) from a_star a3;
 
+-- Temporary hack to investigate whether extra vacuum/analyze is happening
+select relname, relpages, reltuples
+from pg_class
+where relname like '__star' order by relname;
+select relname, vacuum_count, analyze_count, autovacuum_count, autoanalyze_count
+from pg_stat_all_tables
+where relname like '__star' order by relname;
+
 -- Disable Parallel Append
 alter table a_star reset (parallel_workers);
 alter table b_star reset (parallel_workers);

On Wed, May 22, 2019 at 2:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > I did add the following query just before the failing one and included
> > the expected output from below.  The tests pass for me in make check
> > and the post-upgrade test passes in make check-world too.  I guess we
> > could commit that and see if it fails along with the other mentioned
> > failure.
>
> I'm thinking this is a good idea, although I think we could be more
> aggressive about the data collected, as attached.  Since all of these
> ought to be single-page tables, the relpages and reltuples counts
> should be machine-independent.  In theory anyway.

Huh, idiacanthus failed showing vacuum_count 0, in select_parallel.
So ... the VACUUM command somehow skipped those tables?

--
Thomas Munro
https://enterprisedb.com

Thomas Munro <thomas.munro@gmail.com> writes:
> Huh, idiacanthus failed showing vacuum_count 0, in select_parallel.
> So ... the VACUUM command somehow skipped those tables?

No, because the reltuples counts are correct.  I think what we're
looking at there is the stats collector dropping a packet that
told it about vacuum activity.

I'm surprised that we saw such a failure so quickly.  I'd always
figured that the collector mechanism, while it's designed to be
unreliable, is only a little bit unreliable.  Maybe it's more
than a little bit.

			regards, tom lane

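The cross-check being made here can be written as one query; a sketch, joining
the same pg_class and pg_stat_all_tables columns the data-collection patch
already selects:

-- relpages/reltuples come from VACUUM's transactional pg_class update,
-- while vacuum_count arrives via the lossy UDP-based stats collector.
-- Correct reltuples alongside vacuum_count = 0 therefore points at a
-- dropped stats message rather than a vacuum that never ran.
select c.relname, c.relpages, c.reltuples, s.vacuum_count
from pg_class c
join pg_stat_all_tables s on s.relid = c.oid
where c.relname like '__star'
order by c.relname;
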
On Mon, May 20, 2019 at 11:15:47PM -0400, Tom Lane wrote:
> I got around to excavating in the buildfarm archives, and found a round
> dozen of more-or-less-similar incidents.  I went back 18 months, which
> by coincidence (i.e., I didn't realize it till just now) is just about
> the time since 624e440a:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2018-01-14%2006%3A30%3A02
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-03-02%2011%3A30%3A19
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-11%2023%3A25%3A46
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-15%2000%3A02%3A04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-08%2019%3A55%3A06
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-04-19%2001%3A50%3A08
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-23%2021%3A23%3A12
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

> * All the failures are in the pg_upgrade test (and some are before,
> some after, we switched that from serial to parallel schedule).
> This makes very little sense; how is that meaningfully different
> from the buildfarm's straight-up invocations of "make check" and
> "make installcheck"?

Two behaviors are unique to pg_upgrade's check and have been in place
since 2018-01-14.  It uses "initdb --wal-segsize 1".  It creates three
additional databases, having long names.  Neither of those is clearly
meaningful in this context, but it would be a simple matter of
programming to make pg_regress.c do those things and see if the
buildfarm starts witnessing this failure mode outside the pg_upgrade
check.

[ reviving a thread that's been idle for awhile ]

I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Huh, idiacanthus failed showing vacuum_count 0, in select_parallel.
>> So ... the VACUUM command somehow skipped those tables?

> No, because the reltuples counts are correct.  I think what we're
> looking at there is the stats collector dropping a packet that
> told it about vacuum activity.

> I'm surprised that we saw such a failure so quickly.  I'd always
> figured that the collector mechanism, while it's designed to be
> unreliable, is only a little bit unreliable.  Maybe it's more
> than a little bit.

So that data-collection patch has been in place for nearly 2 months
(since 2019-05-21), and in that time we've seen a grand total of
no repeats of the original problem, as far as I've seen.  That's
fairly annoying considering we'd had four repeats in the month prior
to putting the patch in, but such is life.

In the meantime, we've had *lots* of buildfarm failures in the
added pg_stat_all_tables query, which indicate that indeed the
stats collector mechanism isn't terribly reliable.  But that
doesn't directly prove anything about the original problem,
since the planner doesn't look at stats collector data.

Anyway, I'm now starting to feel that these failures are more of a
pain than they're worth, especially since there's not much reason to
hope that the original problem will recur soon.  What I propose to do
is remove the pg_stat_all_tables query but keep the relpages/reltuples
query.  That should fix the buildfarm instability, but we can still
hope to get at least some insight if the original problem ever does
recur.

			regards, tom lane

I wrote:
> So that data-collection patch has been in place for nearly 2 months
> (since 2019-05-21), and in that time we've seen a grand total of
> no repeats of the original problem, as far as I've seen.

Oh ... wait a minute.  I decided to go scrape the buildfarm logs to
confirm my impression that there were no matching failures, and darn
if I didn't find one:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-06-04%2021%3A00%3A22

For the archives' sake, that's a pg_upgradeCheck failure, and here are
the regression diffs:

=========================== regression.diffs ================
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out	2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out	2019-06-04 17:42:27 -0400
@@ -21,12 +21,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
                ->  Parallel Append
+                     ->  Parallel Seq Scan on a_star
                      ->  Parallel Seq Scan on d_star
                      ->  Parallel Seq Scan on f_star
                      ->  Parallel Seq Scan on e_star
                      ->  Parallel Seq Scan on b_star
                      ->  Parallel Seq Scan on c_star
-                     ->  Parallel Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a1;
@@ -49,10 +49,10 @@
                ->  Parallel Append
                      ->  Seq Scan on d_star
                      ->  Seq Scan on c_star
+                     ->  Parallel Seq Scan on a_star
                      ->  Parallel Seq Scan on f_star
                      ->  Parallel Seq Scan on e_star
                      ->  Parallel Seq Scan on b_star
-                     ->  Parallel Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a2;
@@ -75,12 +75,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
                ->  Parallel Append
+                     ->  Seq Scan on a_star
                      ->  Seq Scan on d_star
                      ->  Seq Scan on f_star
                      ->  Seq Scan on e_star
                      ->  Seq Scan on b_star
                      ->  Seq Scan on c_star
-                     ->  Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a3;
@@ -95,7 +95,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples 
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out	2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out	2019-06-04 17:43:06 -0400
@@ -205,7 +205,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples 
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16

This plan shape change matches some, though by no means all, of the
previous failures.  And we can now see why the planner did that:
a_star has a smaller recorded size than the other tables in the query.

So what happened there?  There's no diff in the pg_stat_all_tables
query, which proves that a vacuum on a_star did happen, since it
transmitted a vacuum_count increment to the stats collector.
It seems like there are two possible theories:

(1) The vacuum for some reason saw the table's size as zero
(whereupon it'd read no blocks and count no tuples).

(2) The vacuum's update of the pg_class row failed to "take".

Theory (2) seems a bit more plausible, but still very unsettling.

The similar failures that this result doesn't exactly match all look,
in the light of this data, like some one of the "X_star" tables
unexpectedly moved to the top of the parallel plan, which we can now
hypothesize means that that table had zero relpages/reltuples after
supposedly being vacuumed.  So it's not only a_star that's got the
issue, which lets out my half-formed theory that being the topmost
parent of the inheritance hierarchy has something to do with it.
But I bet that these tables forming an inheritance hierarchy (with
multiple inheritance even) does have something to do with it somehow,
because if this were a generic VACUUM bug surely we'd be seeing it
elsewhere.

			regards, tom lane

Hi,

On 2019-07-15 21:12:32 -0400, Tom Lane wrote:
> But I bet that these tables forming
> an inheritance hierarchy (with multiple inheritance even) does
> have something to do with it somehow, because if this were a
> generic VACUUM bug surely we'd be seeing it elsewhere.

It's possible that it's hidden in other cases, because of

void
table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
                                   BlockNumber *pages, double *tuples,
                                   double *allvisfrac,
                                   Size overhead_bytes_per_tuple,
                                   Size usable_bytes_per_page)
...
     * If the table has inheritance children, we don't apply this heuristic.
     * Totally empty parent tables are quite common, so we should be willing
     * to believe that they are empty.
     */
    if (curpages < 10 &&
        relpages == 0 &&
        !rel->rd_rel->relhassubclass)
        curpages = 10;

which'd not make us actually take a relpages=0 into account for tables
without inheritance.  A lot of these tables never get 10+ pages long, so
the heuristic would always apply...

Greetings,

Andres Freund

On Wed, 17 Jul 2019 at 07:23, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-07-15 21:12:32 -0400, Tom Lane wrote:
> > But I bet that these tables forming
> > an inheritance hierarchy (with multiple inheritance even) does
> > have something to do with it somehow, because if this were a
> > generic VACUUM bug surely we'd be seeing it elsewhere.
>
> It's possible that it's hidden in other cases, because of
>
> void
> table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
>                                    BlockNumber *pages, double *tuples,
>                                    double *allvisfrac,
>                                    Size overhead_bytes_per_tuple,
>                                    Size usable_bytes_per_page)
> ...
>      * If the table has inheritance children, we don't apply this heuristic.
>      * Totally empty parent tables are quite common, so we should be willing
>      * to believe that they are empty.
>      */
>     if (curpages < 10 &&
>         relpages == 0 &&
>         !rel->rd_rel->relhassubclass)
>         curpages = 10;
>
> which'd not make us actually take a relpages=0 into account for tables
> without inheritance.  A lot of these tables never get 10+ pages long, so
> the heuristic would always apply...

Surely it can't be that since that just sets what *pages gets set to.
Tom mentioned that following was returning 0 pages and tuples:

-- Temporary hack to investigate whether extra vacuum/analyze is happening
select relname, relpages, reltuples
from pg_class
where relname like '__star' order by relname;

 relname | relpages | reltuples
---------+----------+-----------
 a_star  |        1 |         3

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

David Rowley <david.rowley@2ndquadrant.com> writes:
> Surely it can't be that since that just sets what *pages gets set to.
> Tom mentioned that following was returning 0 pages and tuples:

> -- Temporary hack to investigate whether extra vacuum/analyze is happening
> select relname, relpages, reltuples
> from pg_class
> where relname like '__star' order by relname;

>  relname | relpages | reltuples
> ---------+----------+-----------
>  a_star  |        1 |         3

I poked around a little and came up with a much simpler theory:
VACUUM will not change relpages/reltuples if it does not scan any pages
(cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
in HEAD's vacuumlazy.c).  And, because sanity_check.sql's VACUUM is a
plain unaggressive vacuum, all that it takes to make it skip over a_star's
one page is for somebody else to have a pin on that page.  So a chance
collision with the bgwriter or checkpointer could cause the observed
symptom, not just for a_star but for the other single-page relations that
are at stake here.  Those pages are certainly dirty after create_misc.sql,
so it's hardly implausible for one of these processes to be holding pin
while trying to write out the buffer at the time sanity_check.sql runs.

A brute-force way to fix this (or at least reduce the odds quite a bit)
would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
thereby guaranteeing that none of these pages are still in need of being
written.  Not sure how much that'd penalize the regression tests' runtime,
or whether we'd have a loss of test coverage of VACUUM behaviors.

			regards, tom lane

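In sanity_check.sql terms, the brute-force idea would amount to something like
this sketch (illustrating only the ordering, not a proposed patch):

-- Write out all dirty buffers first, so the bgwriter/checkpointer can't
-- still be holding a pin for write-out when the database-wide VACUUM
-- reaches these one-page tables.
CHECKPOINT;
VACUUM;
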
On 2019-07-17 11:53:48 -0400, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > Surely it can't be that since that just sets what *pages gets set to.
> > Tom mentioned that following was returning 0 pages and tuples:
>
> > -- Temporary hack to investigate whether extra vacuum/analyze is happening
> > select relname, relpages, reltuples
> > from pg_class
> > where relname like '__star' order by relname;
>
> >  relname | relpages | reltuples
> > ---------+----------+-----------
> >  a_star  |        1 |         3
>
> I poked around a little and came up with a much simpler theory:
> VACUUM will not change relpages/reltuples if it does not scan any pages
> (cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
> in HEAD's vacuumlazy.c).  And, because sanity_check.sql's VACUUM is a
> plain unaggressive vacuum, all that it takes to make it skip over a_star's
> one page is for somebody else to have a pin on that page.

I wonder if we could set log_min_messages to DEBUG2 on occasionally
failing machines to test that theory.  That ought to hit

		appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
										"Skipped %u pages due to buffer pins, ",
										vacrelstats->pinskipped_pages),
...
	ereport(elevel,
			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
					RelationGetRelationName(onerel),
					tups_vacuumed, num_tuples,
					vacrelstats->scanned_pages, nblocks),
			 errdetail_internal("%s", buf.data)));

> So a chance
> collision with the bgwriter or checkpointer could cause the observed
> symptom, not just for a_star but for the other single-page relations that
> are at stake here.  Those pages are certainly dirty after create_misc.sql,
> so it's hardly implausible for one of these processes to be holding pin
> while trying to write out the buffer at the time sanity_check.sql runs.
>
> A brute-force way to fix this (or at least reduce the odds quite a bit)
> would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
> thereby guaranteeing that none of these pages are still in need of being
> written.  Not sure how much that'd penalize the regression tests' runtime,
> or whether we'd have a loss of test coverage of VACUUM behaviors.

Alternatively we could VACUUM FREEZE the relevant tables?  That then
ought to hit the blocking codepath in lazy_scan_heap()?

Greetings,

Andres Freund

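For testing that theory by hand, VERBOSE should be enough to surface the skip
message without touching log_min_messages, since it raises the per-table report
to INFO; the FREEZE form is the alternative suggested above, which should block
on the conflicting pin instead of skipping the page.  A sketch (the exact
message wording depends on the branch):

VACUUM (VERBOSE) a_star;
-- look for "Skipped 1 page due to buffer pins" in the INFO detail

VACUUM (FREEZE) a_star, b_star, c_star, d_star, e_star, f_star;
-- aggressive vacuum should wait for the pin rather than skip the page
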
Andres Freund <andres@anarazel.de> writes:
> On 2019-07-17 11:53:48 -0400, Tom Lane wrote:
>> A brute-force way to fix this (or at least reduce the odds quite a bit)
>> would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
>> thereby guaranteeing that none of these pages are still in need of being
>> written.  Not sure how much that'd penalize the regression tests' runtime,
>> or whether we'd have a loss of test coverage of VACUUM behaviors.

> Alternatively we could VACUUM FREEZE the relevant tables?  That then
> ought to hit the blocking codepath in lazy_scan_heap()?

If we want to target just the X_star tables, I'd be inclined to do an
ANALYZE instead.  (Although that would create inheritance-tree
statistics, which might change some plan choices?  Haven't tried.)

			regards, tom lane

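The inheritance-statistics side effect is easy enough to check by hand; a
sketch using pg_stats, whose "inherited" column distinguishes single-table
statistics from whole-inheritance-tree statistics:

analyze a_star;
select tablename, attname, inherited
from pg_stats
where tablename = 'a_star'
order by attname, inherited;
-- rows with inherited = true are the inheritance-tree statistics that a
-- plain VACUUM (without ANALYZE) would not have created
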
On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In the meantime, we've had *lots* of buildfarm failures in the
> added pg_stat_all_tables query, which indicate that indeed the
> stats collector mechanism isn't terribly reliable.  But that
> doesn't directly prove anything about the original problem,
> since the planner doesn't look at stats collector data.

I noticed that if you look at the list of failures of this type, there
are often pairs of animals belonging to Andres that failed at the same
time.  I wonder if he might be running a bunch of animals on one
kernel, and need to increase net.core.rmem_max and net.core.rmem_default
(or maybe the write side variants, or both, or something like that).

--
Thomas Munro
https://enterprisedb.com

On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In the meantime, we've had *lots* of buildfarm failures in the
> > added pg_stat_all_tables query, which indicate that indeed the
> > stats collector mechanism isn't terribly reliable.  But that
> > doesn't directly prove anything about the original problem,
> > since the planner doesn't look at stats collector data.
>
> I noticed that if you look at the list of failures of this type, there
> are often pairs of animals belonging to Andres that failed at the same
> time.  I wonder if he might be running a bunch of animals on one
> kernel, and need to increase net.core.rmem_max and
> net.core.rmem_default (or maybe the write side variants, or both, or
> something like that).

In further support of that theory, here are the counts of 'stats'
failures (excluding bogus reports due to crashes) for the past 90 days:

          owner          |    animal    | count
-------------------------+--------------+-------
 andres-AT-anarazel.de   | desmoxytes   |     5
 andres-AT-anarazel.de   | dragonet     |     9
 andres-AT-anarazel.de   | flaviventris |     1
 andres-AT-anarazel.de   | idiacanthus  |     5
 andres-AT-anarazel.de   | komodoensis  |    11
 andres-AT-anarazel.de   | pogona       |     1
 andres-AT-anarazel.de   | serinus      |     3
 andrew-AT-dunslane.net  | lorikeet     |     1
 buildfarm-AT-coelho.net | moonjelly    |     1
 buildfarm-AT-coelho.net | seawasp      |    17
 clarenceho-AT-gmail.com | mayfly       |     2

Andres's animals report the same hostname and run at the same time, so
it'd be interesting to know what net.core.rmem_max is set to and
whether these problems go away if it's cranked up 10x higher or
something.  In a quick test I can see that make installcheck is
capable of sending a *lot* of 936 byte messages in the same
millisecond.

--
Thomas Munro
https://enterprisedb.com

Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In the meantime, we've had *lots* of buildfarm failures in the
>>> added pg_stat_all_tables query, which indicate that indeed the
>>> stats collector mechanism isn't terribly reliable.  But that
>>> doesn't directly prove anything about the original problem,
>>> since the planner doesn't look at stats collector data.

>> I noticed that if you look at the list of failures of this type, there
>> are often pairs of animals belonging to Andres that failed at the same
>> time.  I wonder if he might be running a bunch of animals on one
>> kernel, and need to increase net.core.rmem_max and
>> net.core.rmem_default (or maybe the write side variants, or both, or
>> something like that).

> Andres's animals report the same hostname and run at the same time, so
> it'd be interesting to know what net.core.rmem_max is set to and
> whether these problems go away if it's cranked up 10x higher or
> something.  In a quick test I can see that make installcheck is
> capable of sending a *lot* of 936 byte messages in the same
> millisecond.

Yeah.  I think we've had quite enough of the stats-transmission-related
failures, and they're no longer proving anything about the original
problem.  So I will go do what I proposed in mid-July and revert the
stats queries, while keeping the reltuples/relpages check.  (I'd kind
of like to get more confirmation that the plan shape change is associated
with those fields reading as zeroes, before we decide what to do about
the underlying instability.)

			regards, tom lane

I wrote:
> Yeah.  I think we've had quite enough of the stats-transmission-related
> failures, and they're no longer proving anything about the original
> problem.  So I will go do what I proposed in mid-July and revert the
> stats queries, while keeping the reltuples/relpages check.  (I'd kind
> of like to get more confirmation that the plan shape change is associated
> with those fields reading as zeroes, before we decide what to do about the
> underlying instability.)

Christoph Berg's recent complaint reminded me to scan the buildfarm
database again for info related to this issue, and I found this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2019-07-02%2017%3A17%3A02

in which the failure diffs are

diff -U3 /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/select_parallel.out /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
--- /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/select_parallel.out	2019-05-21 19:17:03.472207619 +0200
+++ /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out	2019-07-02 19:21:53.643095637 +0200
@@ -98,7 +98,7 @@
  a_star  |        1 |         3
  b_star  |        1 |         4
  c_star  |        1 |         4
- d_star  |        1 |        16
+ d_star  |        0 |         0
  e_star  |        1 |         7
  f_star  |        1 |        16
 (6 rows)
@@ -130,7 +130,7 @@
 -----------------------------------------------------
  Finalize Aggregate
    ->  Gather
-         Workers Planned: 1
+         Workers Planned: 3
          ->  Partial Aggregate
                ->  Append
                      ->  Parallel Seq Scan on a_star
diff -U3 /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out
--- /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out	2019-05-21 19:17:03.472207619 +0200
+++ /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out	2019-07-02 19:21:57.891105601 +0200
@@ -208,7 +208,7 @@
  a_star  |        1 |         3
  b_star  |        1 |         4
  c_star  |        1 |         4
- d_star  |        1 |        16
+ d_star  |        0 |         0
  e_star  |        1 |         7
  f_star  |        1 |        16
 (6 rows)

While this fails to show the plan ordering difference we were looking
for, it does show that relpages/reltuples can sometimes read as zeroes
for one of these tables.  (It also indicates that at least some of the
worker-count instability we've seen might trace to this same issue.)

That's the only related failure I could find in the last three months,
which makes me think that we've changed the regression tests enough
that the chance timing needed to cause this is (once again) very
improbable.  So I'm prepared to give up waiting for more buildfarm
evidence.

I propose to finish reverting f03a9ca43 in HEAD, and instead install
the attached in HEAD and v12.  This follows the upthread suggestions
from Thomas and myself to use ANALYZE to ensure that these tables have
the expected relpages/reltuples entries.  In principle, we might need
this further back than v12, but without having seen a test failure in
the wild I'm not tempted to back-patch further.
			regards, tom lane

diff --git a/src/test/regress/expected/create_misc.out b/src/test/regress/expected/create_misc.out
index 8366841..cee35ed 100644
--- a/src/test/regress/expected/create_misc.out
+++ b/src/test/regress/expected/create_misc.out
@@ -130,6 +130,13 @@ INSERT INTO f_star (class, e) VALUES ('f', '-12'::int2);
 INSERT INTO f_star (class, f)
    VALUES ('f', '(11111111,33333333),(22222222,44444444)'::polygon);
 INSERT INTO f_star (class) VALUES ('f');
+-- Analyze the X_star tables for better plan stability in later tests
+ANALYZE a_star;
+ANALYZE b_star;
+ANALYZE c_star;
+ANALYZE d_star;
+ANALYZE e_star;
+ANALYZE f_star;
 --
 -- for internal portal (cursor) tests
 --
diff --git a/src/test/regress/sql/create_misc.sql b/src/test/regress/sql/create_misc.sql
index d4a63b7..d0b04a8 100644
--- a/src/test/regress/sql/create_misc.sql
+++ b/src/test/regress/sql/create_misc.sql
@@ -192,6 +192,14 @@ INSERT INTO f_star (class, f)
    VALUES ('f', '(11111111,33333333),(22222222,44444444)'::polygon);
 
 INSERT INTO f_star (class) VALUES ('f');
 
+-- Analyze the X_star tables for better plan stability in later tests
+ANALYZE a_star;
+ANALYZE b_star;
+ANALYZE c_star;
+ANALYZE d_star;
+ANALYZE e_star;
+ANALYZE f_star;
+
 --
 -- for internal portal (cursor) tests