Thread: Parallel Append subplan order instability on aye-aye
Hi,

Here's a one-off regression test failure of a sort that commit 624e440a
intended to fix.  a_star unexpectedly sorted higher.  I checked the space
weather forecast for this morning but no sign of solar flares.  More
seriously, it did the same in all 3 Parallel Append queries.  Recent
commits look irrelevant.  Could a UDP stats packet dropped on the floor
cause that?  Otherwise maybe you'd need a weird result from FileSize()
to explain it.  Based on log output no other tests ran around the same
time.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

--
Thomas Munro
https://enterprisedb.com

Thomas Munro <thomas.munro@gmail.com> writes:
> Here's a one-off regression test failure of a sort that commit
> 624e440a intended to fix.

Note that in the discussion that led up to 624e440a, we never did
think that we'd completely explained the original irreproducible
failure.

I think I've seen a couple of other cases of this same failure
in the buildfarm recently, but too tired to go looking right now.

			regards, tom lane

On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Here's a one-off regression test failure of a sort that commit
> > 624e440a intended to fix.
>
> Note that in the discussion that led up to 624e440a, we never did
> think that we'd completely explained the original irreproducible
> failure.
>
> I think I've seen a couple of other cases of this same failure
> in the buildfarm recently, but too tired to go looking right now.

I think it might be dependent on incidental vacuum/analyze activity
having updated reltuples.  With the attached script, I get the two
plan variants depending on whether I comment out "analyze a_star".  I
guess we should explicitly analyze these X_star tables somewhere?

--
Thomas Munro
https://enterprisedb.com

Attachment
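
A rough sketch of the sort of script being described (this is not the actual
attachment; it assumes a regression database left over from "make installcheck",
reuses the round(avg(aa))/sum(aa) query quoted later in this thread, and borrows
a few planner GUCs to coax out a parallel plan; the exact settings are an
assumption, not copied from the attachment):

-- Sketch only: run against the regression test database.
-- Toggling the ANALYZE below is what flips the subplan order.
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 4;

begin;
analyze a_star;   -- comment this out to get the other plan variant
explain (costs off) select round(avg(aa)), sum(aa) from a_star;
rollback;         -- ANALYZE's pg_class update is transactional, so this undoes it
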
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that in the discussion that led up to 624e440a, we never did
>> think that we'd completely explained the original irreproducible
>> failure.

> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.

The problem is to explain where said activity came from.  a_star and
its children are too small to attract autovacuum's attention.  They
get created/filled in create_table.sql/create_misc.sql, and then they
get explicitly vacuum'd by sanity_check.sql, and then after that
things are 100% stable.  Or should be.  There are some incidental
ALTER TABLEs on them in misc.sql and select_parallel.sql, but those
shouldn't have any interesting effects on the rowcount estimates ...
and even if they do, why would such effects not be reproducible?

So I'm not excited about sticking in an extra vacuum or analyze
without actually understanding why the irreproducible behavior
happens.  It's not exactly implausible that that'd make it worse
not better.

			regards, tom lane

On Tue, 21 May 2019 at 11:32, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > Here's a one-off regression test failure of a sort that commit
> > > 624e440a intended to fix.
> >
> > Note that in the discussion that led up to 624e440a, we never did
> > think that we'd completely explained the original irreproducible
> > failure.
> >
> > I think I've seen a couple of other cases of this same failure
> > in the buildfarm recently, but too tired to go looking right now.
>
> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.  With the attached script, I get the two
> plan variants depending on whether I comment out "analyze a_star".  I
> guess we should explicitly analyze these X_star tables somewhere?

That's the only theory I came up with yesterday when thinking about
this.  We can't really go adding an ANALYZE in a test in a parallel
group though since there'd be race conditions around other parallel
tests which could cause plan changes.

At the moment, these tables are only vacuumed in sanity_check.sql,
which as you can see is run by itself.

# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
# results. So it should not run parallel to other tests.
# ----------
test: sanity_check

I did add the following query just before the failing one and included
the expected output from below.  The tests pass for me in make check
and the post-upgrade test passes in make check-world too.  I guess we
could commit that and see if it fails along with the other mentioned
failure.

Alternatively, we could just invent some local tables instead of using
the ?_star tables and analyze them just before the test, although,
that does not guarantee a fix as there may be something else to blame
that we've not thought of.

select relname,last_vacuum is null,last_analyze is null,last_autovacuum
is null,last_autoanalyze is null from pg_stat_all_tables where relname
like '__star' order by relname;

 relname | ?column? | ?column? | ?column? | ?column?
---------+----------+----------+----------+----------
 a_star  | f        | t        | t        | t
 b_star  | f        | t        | t        | t
 c_star  | f        | t        | t        | t
 d_star  | f        | t        | t        | t
 e_star  | f        | t        | t        | t
 f_star  | f        | t        | t        | t
(6 rows)

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that in the discussion that led up to 624e440a, we never did
>> think that we'd completely explained the original irreproducible
>> failure.
>>
>> I think I've seen a couple of other cases of this same failure
>> in the buildfarm recently, but too tired to go looking right now.

> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.

I got around to excavating in the buildfarm archives, and found a round
dozen of more-or-less-similar incidents.  I went back 18 months, which
by coincidence (i.e., I didn't realize it till just now) is just about
the time since 624e440a:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2018-01-14%2006%3A30%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-03-02%2011%3A30%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-11%2023%3A25%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-15%2000%3A02%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-08%2019%3A55%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-04-19%2001%3A50%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-23%2021%3A23%3A12
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

There are two really interesting things about this list:

* All the failures are on HEAD.  This implies that the issue was not
there when we forked off v11, else we'd surely have seen an instance
on that branch by now.  The dates above are consistent with the idea
that we eliminated the problem in roughly May 2018, and then it came
back about a month ago.  (Of course, maybe this just traces to
unrelated changes in test timing.)

* All the failures are in the pg_upgrade test (and some are before,
some after, we switched that from serial to parallel schedule).
This makes very little sense; how is that meaningfully different
from the buildfarm's straight-up invocations of "make check" and
"make installcheck"?

Note that I excluded a bunch of cases where we managed to run
select_parallel despite having suffered failures earlier in the test
run, typically failures that caused the sanity_check test to not run.
These led to diffs in the X_star queries that look roughly similar to
these, but not the same.

			regards, tom lane

David Rowley <david.rowley@2ndquadrant.com> writes:
> I did add the following query just before the failing one and included
> the expected output from below.  The tests pass for me in make check
> and the post-upgrade test passes in make check-world too.  I guess we
> could commit that and see if it fails along with the other mentioned
> failure.

I'm thinking this is a good idea, although I think we could be more
aggressive about the data collected, as attached.  Since all of these
ought to be single-page tables, the relpages and reltuples counts
should be machine-independent.  In theory anyway.

			regards, tom lane

diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0eca76c..9775cc8 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -89,6 +89,33 @@ select round(avg(aa)), sum(aa) from a_star a3;
     14 | 355
 (1 row)
 
+-- Temporary hack to investigate whether extra vacuum/analyze is happening
+select relname, relpages, reltuples
+from pg_class
+where relname like '__star' order by relname;
+ relname | relpages | reltuples 
+---------+----------+-----------
+ a_star  |        1 |         3
+ b_star  |        1 |         4
+ c_star  |        1 |         4
+ d_star  |        1 |        16
+ e_star  |        1 |         7
+ f_star  |        1 |        16
+(6 rows)
+
+select relname, vacuum_count, analyze_count, autovacuum_count, autoanalyze_count
+from pg_stat_all_tables
+where relname like '__star' order by relname;
+ relname | vacuum_count | analyze_count | autovacuum_count | autoanalyze_count 
+---------+--------------+---------------+------------------+-------------------
+ a_star  |            1 |             0 |                0 |                 0
+ b_star  |            1 |             0 |                0 |                 0
+ c_star  |            1 |             0 |                0 |                 0
+ d_star  |            1 |             0 |                0 |                 0
+ e_star  |            1 |             0 |                0 |                 0
+ f_star  |            1 |             0 |                0 |                 0
+(6 rows)
+
 -- Disable Parallel Append
 alter table a_star reset (parallel_workers);
 alter table b_star reset (parallel_workers);
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 03c056b..f96812b 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -36,6 +36,14 @@ explain (costs off)
   select round(avg(aa)), sum(aa) from a_star;
 select round(avg(aa)), sum(aa) from a_star a3;
 
+-- Temporary hack to investigate whether extra vacuum/analyze is happening
+select relname, relpages, reltuples
+from pg_class
+where relname like '__star' order by relname;
+select relname, vacuum_count, analyze_count, autovacuum_count, autoanalyze_count
+from pg_stat_all_tables
+where relname like '__star' order by relname;
+
 -- Disable Parallel Append
 alter table a_star reset (parallel_workers);
 alter table b_star reset (parallel_workers);

On Wed, May 22, 2019 at 2:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > I did add the following query just before the failing one and included
> > the expected output from below.  The tests pass for me in make check
> > and the post-upgrade test passes in make check-world too.  I guess we
> > could commit that and see if it fails along with the other mentioned
> > failure.
>
> I'm thinking this is a good idea, although I think we could be more
> aggressive about the data collected, as attached.  Since all of these
> ought to be single-page tables, the relpages and reltuples counts
> should be machine-independent.  In theory anyway.

Huh, idiacanthus failed showing vacuum_count 0, in select_parallel.
So ... the VACUUM command somehow skipped those tables?

--
Thomas Munro
https://enterprisedb.com

Thomas Munro <thomas.munro@gmail.com> writes:
> Huh, idiacanthus failed showing vacuum_count 0, in select_parallel.
> So ... the VACUUM command somehow skipped those tables?

No, because the reltuples counts are correct.  I think what we're
looking at there is the stats collector dropping a packet that
told it about vacuum activity.

I'm surprised that we saw such a failure so quickly.  I'd always
figured that the collector mechanism, while it's designed to be
unreliable, is only a little bit unreliable.  Maybe it's more
than a little bit.

			regards, tom lane

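The cross-check being made here can be written as one query; a sketch, joining
the same pg_class and pg_stat_all_tables columns the data-collection patch
already selects:

-- relpages/reltuples come from VACUUM's transactional pg_class update,
-- while vacuum_count arrives via the lossy UDP-based stats collector.
-- Correct reltuples alongside vacuum_count = 0 therefore points at a
-- dropped stats message rather than a vacuum that never ran.
select c.relname, c.relpages, c.reltuples, s.vacuum_count
from pg_class c
join pg_stat_all_tables s on s.relid = c.oid
where c.relname like '__star'
order by c.relname;
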
On Mon, May 20, 2019 at 11:15:47PM -0400, Tom Lane wrote:
> I got around to excavating in the buildfarm archives, and found a round
> dozen of more-or-less-similar incidents.  I went back 18 months, which
> by coincidence (i.e., I didn't realize it till just now) is just about
> the time since 624e440a:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2018-01-14%2006%3A30%3A02
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-03-02%2011%3A30%3A19
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-11%2023%3A25%3A46
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-15%2000%3A02%3A04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-08%2019%3A55%3A06
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-04-19%2001%3A50%3A08
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-23%2021%3A23%3A12
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

> * All the failures are in the pg_upgrade test (and some are before,
> some after, we switched that from serial to parallel schedule).
> This makes very little sense; how is that meaningfully different
> from the buildfarm's straight-up invocations of "make check" and
> "make installcheck"?

Two behaviors are unique to pg_upgrade's check and have been in place
since 2018-01-14.  It uses "initdb --wal-segsize 1".  It creates three
additional databases, having long names.  Neither of those is clearly
meaningful in this context, but it would be a simple matter of
programming to make pg_regress.c do those things and see if the
buildfarm starts witnessing this failure mode outside the pg_upgrade
check.

[ reviving a thread that's been idle for awhile ]

I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Huh, idiacanthus failed showing vacuum_count 0, in select_parallel.
>> So ... the VACUUM command somehow skipped those tables?

> No, because the reltuples counts are correct.  I think what we're
> looking at there is the stats collector dropping a packet that
> told it about vacuum activity.

> I'm surprised that we saw such a failure so quickly.  I'd always
> figured that the collector mechanism, while it's designed to be
> unreliable, is only a little bit unreliable.  Maybe it's more
> than a little bit.

So that data-collection patch has been in place for nearly 2 months
(since 2019-05-21), and in that time we've seen a grand total of
no repeats of the original problem, as far as I've seen.  That's
fairly annoying considering we'd had four repeats in the month prior
to putting the patch in, but such is life.

In the meantime, we've had *lots* of buildfarm failures in the
added pg_stat_all_tables query, which indicate that indeed the
stats collector mechanism isn't terribly reliable.  But that
doesn't directly prove anything about the original problem,
since the planner doesn't look at stats collector data.

Anyway, I'm now starting to feel that these failures are more of a
pain than they're worth, especially since there's not much reason to
hope that the original problem will recur soon.  What I propose to do
is remove the pg_stat_all_tables query but keep the relpages/reltuples
query.  That should fix the buildfarm instability, but we can still
hope to get at least some insight if the original problem ever does
recur.

			regards, tom lane

I wrote:
> So that data-collection patch has been in place for nearly 2 months
> (since 2019-05-21), and in that time we've seen a grand total of
> no repeats of the original problem, as far as I've seen.

Oh ... wait a minute.  I decided to go scrape the buildfarm logs to
confirm my impression that there were no matching failures, and darn
if I didn't find one:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-06-04%2021%3A00%3A22

For the archives' sake, that's a pg_upgradeCheck failure, and here are
the regression diffs:

=========================== regression.diffs ================
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out	2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out	2019-06-04 17:42:27 -0400
@@ -21,12 +21,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
                ->  Parallel Append
+                     ->  Parallel Seq Scan on a_star
                      ->  Parallel Seq Scan on d_star
                      ->  Parallel Seq Scan on f_star
                      ->  Parallel Seq Scan on e_star
                      ->  Parallel Seq Scan on b_star
                      ->  Parallel Seq Scan on c_star
-                     ->  Parallel Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a1;
@@ -49,10 +49,10 @@
                ->  Parallel Append
                      ->  Seq Scan on d_star
                      ->  Seq Scan on c_star
+                     ->  Parallel Seq Scan on a_star
                      ->  Parallel Seq Scan on f_star
                      ->  Parallel Seq Scan on e_star
                      ->  Parallel Seq Scan on b_star
-                     ->  Parallel Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a2;
@@ -75,12 +75,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
                ->  Parallel Append
+                     ->  Seq Scan on a_star
                      ->  Seq Scan on d_star
                      ->  Seq Scan on f_star
                      ->  Seq Scan on e_star
                      ->  Seq Scan on b_star
                      ->  Seq Scan on c_star
-                     ->  Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a3;
@@ -95,7 +95,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples 
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out	2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out	2019-06-04 17:43:06 -0400
@@ -205,7 +205,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples 
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16

This plan shape change matches some, though by no means all, of the
previous failures.  And we can now see why the planner did that:
a_star has a smaller recorded size than the other tables in the query.

So what happened there?  There's no diff in the pg_stat_all_tables
query, which proves that a vacuum on a_star did happen, since it
transmitted a vacuum_count increment to the stats collector.
It seems like there are two possible theories:

(1) The vacuum for some reason saw the table's size as zero
(whereupon it'd read no blocks and count no tuples).

(2) The vacuum's update of the pg_class row failed to "take".

Theory (2) seems a bit more plausible, but still very unsettling.

The similar failures that this result doesn't exactly match all look,
in the light of this data, like some one of the "X_star" tables
unexpectedly moved to the top of the parallel plan, which we can now
hypothesize means that that table had zero relpages/reltuples after
supposedly being vacuumed.  So it's not only a_star that's got the
issue, which lets out my half-formed theory that being the topmost
parent of the inheritance hierarchy has something to do with it.
But I bet that these tables forming an inheritance hierarchy (with
multiple inheritance even) does have something to do with it somehow,
because if this were a generic VACUUM bug surely we'd be seeing it
elsewhere.

			regards, tom lane

Hi,

On 2019-07-15 21:12:32 -0400, Tom Lane wrote:
> But I bet that these tables forming
> an inheritance hierarchy (with multiple inheritance even) does
> have something to do with it somehow, because if this were a
> generic VACUUM bug surely we'd be seeing it elsewhere.

It's possible that it's hidden in other cases, because of

void
table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
                                   BlockNumber *pages, double *tuples,
                                   double *allvisfrac,
                                   Size overhead_bytes_per_tuple,
                                   Size usable_bytes_per_page)
...
     * If the table has inheritance children, we don't apply this heuristic.
     * Totally empty parent tables are quite common, so we should be willing
     * to believe that they are empty.
     */
    if (curpages < 10 &&
        relpages == 0 &&
        !rel->rd_rel->relhassubclass)
        curpages = 10;

which'd not make us actually take a relpages=0 into account for tables
without inheritance.  A lot of these tables never get 10+ pages long, so
the heuristic would always apply...

Greetings,

Andres Freund

On Wed, 17 Jul 2019 at 07:23, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-07-15 21:12:32 -0400, Tom Lane wrote:
> > But I bet that these tables forming
> > an inheritance hierarchy (with multiple inheritance even) does
> > have something to do with it somehow, because if this were a
> > generic VACUUM bug surely we'd be seeing it elsewhere.
>
> It's possible that it's hidden in other cases, because of
>
> void
> table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
>                                    BlockNumber *pages, double *tuples,
>                                    double *allvisfrac,
>                                    Size overhead_bytes_per_tuple,
>                                    Size usable_bytes_per_page)
> ...
>      * If the table has inheritance children, we don't apply this heuristic.
>      * Totally empty parent tables are quite common, so we should be willing
>      * to believe that they are empty.
>      */
>     if (curpages < 10 &&
>         relpages == 0 &&
>         !rel->rd_rel->relhassubclass)
>         curpages = 10;
>
> which'd not make us actually take a relpages=0 into account for tables
> without inheritance.  A lot of these tables never get 10+ pages long, so
> the heuristic would always apply...

Surely it can't be that since that just sets what *pages gets set to.
Tom mentioned that following was returning 0 pages and tuples:

-- Temporary hack to investigate whether extra vacuum/analyze is happening
select relname, relpages, reltuples
from pg_class
where relname like '__star' order by relname;

 relname | relpages | reltuples
---------+----------+-----------
 a_star  |        1 |         3

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

David Rowley <david.rowley@2ndquadrant.com> writes:
> Surely it can't be that since that just sets what *pages gets set to.
> Tom mentioned that following was returning 0 pages and tuples:

> -- Temporary hack to investigate whether extra vacuum/analyze is happening
> select relname, relpages, reltuples
> from pg_class
> where relname like '__star' order by relname;

>  relname | relpages | reltuples
> ---------+----------+-----------
>  a_star  |        1 |         3

I poked around a little and came up with a much simpler theory:
VACUUM will not change relpages/reltuples if it does not scan any pages
(cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
in HEAD's vacuumlazy.c).  And, because sanity_check.sql's VACUUM is a
plain unaggressive vacuum, all that it takes to make it skip over a_star's
one page is for somebody else to have a pin on that page.  So a chance
collision with the bgwriter or checkpointer could cause the observed
symptom, not just for a_star but for the other single-page relations that
are at stake here.  Those pages are certainly dirty after create_misc.sql,
so it's hardly implausible for one of these processes to be holding pin
while trying to write out the buffer at the time sanity_check.sql runs.

A brute-force way to fix this (or at least reduce the odds quite a bit)
would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
thereby guaranteeing that none of these pages are still in need of being
written.  Not sure how much that'd penalize the regression tests' runtime,
or whether we'd have a loss of test coverage of VACUUM behaviors.

			regards, tom lane

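In sanity_check.sql terms, the brute-force idea would amount to something like
this sketch (illustrating only the ordering, not a proposed patch):

-- Write out all dirty buffers first, so the bgwriter/checkpointer can't
-- still be holding a pin for write-out when the database-wide VACUUM
-- reaches these one-page tables.
CHECKPOINT;
VACUUM;
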
On 2019-07-17 11:53:48 -0400, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > Surely it can't be that since that just sets what *pages gets set to.
> > Tom mentioned that following was returning 0 pages and tuples:
>
> > -- Temporary hack to investigate whether extra vacuum/analyze is happening
> > select relname, relpages, reltuples
> > from pg_class
> > where relname like '__star' order by relname;
>
> >  relname | relpages | reltuples
> > ---------+----------+-----------
> >  a_star  |        1 |         3
>
> I poked around a little and came up with a much simpler theory:
> VACUUM will not change relpages/reltuples if it does not scan any pages
> (cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
> in HEAD's vacuumlazy.c).  And, because sanity_check.sql's VACUUM is a
> plain unaggressive vacuum, all that it takes to make it skip over a_star's
> one page is for somebody else to have a pin on that page.

I wonder if we could set log_min_messages to DEBUG2 on occasionally
failing machines to test that theory.  That ought to hit

		appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
										"Skipped %u pages due to buffer pins, ",
										vacrelstats->pinskipped_pages),
...
	ereport(elevel,
			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
					RelationGetRelationName(onerel),
					tups_vacuumed, num_tuples,
					vacrelstats->scanned_pages, nblocks),
			 errdetail_internal("%s", buf.data)));

> So a chance
> collision with the bgwriter or checkpointer could cause the observed
> symptom, not just for a_star but for the other single-page relations that
> are at stake here.  Those pages are certainly dirty after create_misc.sql,
> so it's hardly implausible for one of these processes to be holding pin
> while trying to write out the buffer at the time sanity_check.sql runs.
>
> A brute-force way to fix this (or at least reduce the odds quite a bit)
> would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
> thereby guaranteeing that none of these pages are still in need of being
> written.  Not sure how much that'd penalize the regression tests' runtime,
> or whether we'd have a loss of test coverage of VACUUM behaviors.

Alternatively we could VACUUM FREEZE the relevant tables?  That then
ought to hit the blocking codepath in lazy_scan_heap()?

Greetings,

Andres Freund

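For testing that theory by hand, VERBOSE should be enough to surface the skip
message without touching log_min_messages, since it raises the per-table report
to INFO; the FREEZE form is the alternative suggested above, which should block
on the conflicting pin instead of skipping the page.  A sketch (the exact
message wording depends on the branch):

VACUUM (VERBOSE) a_star;
-- look for "Skipped 1 page due to buffer pins" in the INFO detail

VACUUM (FREEZE) a_star, b_star, c_star, d_star, e_star, f_star;
-- aggressive vacuum should wait for the pin rather than skip the page
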
Andres Freund <andres@anarazel.de> writes:
> On 2019-07-17 11:53:48 -0400, Tom Lane wrote:
>> A brute-force way to fix this (or at least reduce the odds quite a bit)
>> would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
>> thereby guaranteeing that none of these pages are still in need of being
>> written.  Not sure how much that'd penalize the regression tests' runtime,
>> or whether we'd have a loss of test coverage of VACUUM behaviors.

> Alternatively we could VACUUM FREEZE the relevant tables?  That then
> ought to hit the blocking codepath in lazy_scan_heap()?

If we want to target just the X_star tables, I'd be inclined to do an
ANALYZE instead.  (Although that would create inheritance-tree
statistics, which might change some plan choices?  Haven't tried.)

			regards, tom lane

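The inheritance-statistics side effect is easy enough to check by hand; a
sketch using pg_stats, whose "inherited" column distinguishes single-table
statistics from whole-inheritance-tree statistics:

analyze a_star;
select tablename, attname, inherited
from pg_stats
where tablename = 'a_star'
order by attname, inherited;
-- rows with inherited = true are the inheritance-tree statistics that a
-- plain VACUUM (without ANALYZE) would not have created
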
On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In the meantime, we've had *lots* of buildfarm failures in the
> added pg_stat_all_tables query, which indicate that indeed the
> stats collector mechanism isn't terribly reliable.  But that
> doesn't directly prove anything about the original problem,
> since the planner doesn't look at stats collector data.

I noticed that if you look at the list of failures of this type, there
are often pairs of animals belonging to Andres that failed at the same
time.  I wonder if he might be running a bunch of animals on one
kernel, and need to increase net.core.rmem_max and net.core.rmem_default
(or maybe the write side variants, or both, or something like that).

--
Thomas Munro
https://enterprisedb.com

On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In the meantime, we've had *lots* of buildfarm failures in the
> > added pg_stat_all_tables query, which indicate that indeed the
> > stats collector mechanism isn't terribly reliable.  But that
> > doesn't directly prove anything about the original problem,
> > since the planner doesn't look at stats collector data.
>
> I noticed that if you look at the list of failures of this type, there
> are often pairs of animals belonging to Andres that failed at the same
> time.  I wonder if he might be running a bunch of animals on one
> kernel, and need to increase net.core.rmem_max and
> net.core.rmem_default (or maybe the write side variants, or both, or
> something like that).

In further support of that theory, here are the counts of 'stats'
failures (excluding bogus reports due to crashes) for the past 90 days:

          owner          |    animal    | count
-------------------------+--------------+-------
 andres-AT-anarazel.de   | desmoxytes   |     5
 andres-AT-anarazel.de   | dragonet     |     9
 andres-AT-anarazel.de   | flaviventris |     1
 andres-AT-anarazel.de   | idiacanthus  |     5
 andres-AT-anarazel.de   | komodoensis  |    11
 andres-AT-anarazel.de   | pogona       |     1
 andres-AT-anarazel.de   | serinus      |     3
 andrew-AT-dunslane.net  | lorikeet     |     1
 buildfarm-AT-coelho.net | moonjelly    |     1
 buildfarm-AT-coelho.net | seawasp      |    17
 clarenceho-AT-gmail.com | mayfly       |     2

Andres's animals report the same hostname and run at the same time, so
it'd be interesting to know what net.core.rmem_max is set to and
whether these problems go away if it's cranked up 10x higher or
something.  In a quick test I can see that make installcheck is
capable of sending a *lot* of 936 byte messages in the same
millisecond.

--
Thomas Munro
https://enterprisedb.com

Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In the meantime, we've had *lots* of buildfarm failures in the
>>> added pg_stat_all_tables query, which indicate that indeed the
>>> stats collector mechanism isn't terribly reliable.  But that
>>> doesn't directly prove anything about the original problem,
>>> since the planner doesn't look at stats collector data.

>> I noticed that if you look at the list of failures of this type, there
>> are often pairs of animals belonging to Andres that failed at the same
>> time.  I wonder if he might be running a bunch of animals on one
>> kernel, and need to increase net.core.rmem_max and
>> net.core.rmem_default (or maybe the write side variants, or both, or
>> something like that).

> Andres's animals report the same hostname and run at the same time, so
> it'd be interesting to know what net.core.rmem_max is set to and
> whether these problems go away if it's cranked up 10x higher or
> something.  In a quick test I can see that make installcheck is
> capable of sending a *lot* of 936 byte messages in the same
> millisecond.

Yeah.  I think we've had quite enough of the stats-transmission-related
failures, and they're no longer proving anything about the original
problem.  So I will go do what I proposed in mid-July and revert the
stats queries, while keeping the reltuples/relpages check.  (I'd kind
of like to get more confirmation that the plan shape change is associated
with those fields reading as zeroes, before we decide what to do about
the underlying instability.)

			regards, tom lane

I wrote:
> Yeah.  I think we've had quite enough of the stats-transmission-related
> failures, and they're no longer proving anything about the original
> problem.  So I will go do what I proposed in mid-July and revert the
> stats queries, while keeping the reltuples/relpages check.  (I'd kind
> of like to get more confirmation that the plan shape change is associated
> with those fields reading as zeroes, before we decide what to do about the
> underlying instability.)

Christoph Berg's recent complaint reminded me to scan the buildfarm
database again for info related to this issue, and I found this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2019-07-02%2017%3A17%3A02

in which the failure diffs are

diff -U3 /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/select_parallel.out /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
--- /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/select_parallel.out	2019-05-21 19:17:03.472207619 +0200
+++ /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out	2019-07-02 19:21:53.643095637 +0200
@@ -98,7 +98,7 @@
  a_star  |        1 |         3
  b_star  |        1 |         4
  c_star  |        1 |         4
- d_star  |        1 |        16
+ d_star  |        0 |         0
  e_star  |        1 |         7
  f_star  |        1 |        16
 (6 rows)
@@ -130,7 +130,7 @@
 -----------------------------------------------------
  Finalize Aggregate
    ->  Gather
-         Workers Planned: 1
+         Workers Planned: 3
          ->  Partial Aggregate
                ->  Append
                      ->  Parallel Seq Scan on a_star
diff -U3 /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out
--- /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out	2019-05-21 19:17:03.472207619 +0200
+++ /home/fabien/pg/build-farm-10/buildroot/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out	2019-07-02 19:21:57.891105601 +0200
@@ -208,7 +208,7 @@
  a_star  |        1 |         3
  b_star  |        1 |         4
  c_star  |        1 |         4
- d_star  |        1 |        16
+ d_star  |        0 |         0
  e_star  |        1 |         7
  f_star  |        1 |        16
 (6 rows)

While this fails to show the plan ordering difference we were looking
for, it does show that relpages/reltuples can sometimes read as zeroes
for one of these tables.  (It also indicates that at least some of the
worker-count instability we've seen might trace to this same issue.)

That's the only related failure I could find in the last three months,
which makes me think that we've changed the regression tests enough
that the chance timing needed to cause this is (once again) very
improbable.  So I'm prepared to give up waiting for more buildfarm
evidence.

I propose to finish reverting f03a9ca43 in HEAD, and instead install
the attached in HEAD and v12.  This follows the upthread suggestions
from Thomas and myself to use ANALYZE to ensure that these tables have
the expected relpages/reltuples entries.  In principle, we might need
this further back than v12, but without having seen a test failure in
the wild I'm not tempted to back-patch further.
			regards, tom lane

diff --git a/src/test/regress/expected/create_misc.out b/src/test/regress/expected/create_misc.out
index 8366841..cee35ed 100644
--- a/src/test/regress/expected/create_misc.out
+++ b/src/test/regress/expected/create_misc.out
@@ -130,6 +130,13 @@ INSERT INTO f_star (class, e) VALUES ('f', '-12'::int2);
 INSERT INTO f_star (class, f)
    VALUES ('f', '(11111111,33333333),(22222222,44444444)'::polygon);
 INSERT INTO f_star (class) VALUES ('f');
+-- Analyze the X_star tables for better plan stability in later tests
+ANALYZE a_star;
+ANALYZE b_star;
+ANALYZE c_star;
+ANALYZE d_star;
+ANALYZE e_star;
+ANALYZE f_star;
 --
 -- for internal portal (cursor) tests
 --
diff --git a/src/test/regress/sql/create_misc.sql b/src/test/regress/sql/create_misc.sql
index d4a63b7..d0b04a8 100644
--- a/src/test/regress/sql/create_misc.sql
+++ b/src/test/regress/sql/create_misc.sql
@@ -192,6 +192,14 @@ INSERT INTO f_star (class, f)
    VALUES ('f', '(11111111,33333333),(22222222,44444444)'::polygon);
 
 INSERT INTO f_star (class) VALUES ('f');
 
+-- Analyze the X_star tables for better plan stability in later tests
+ANALYZE a_star;
+ANALYZE b_star;
+ANALYZE c_star;
+ANALYZE d_star;
+ANALYZE e_star;
+ANALYZE f_star;
+
 --
 -- for internal portal (cursor) tests