Re: Parallel Append subplan order instability on aye-aye - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: Parallel Append subplan order instability on aye-aye |
| Date | |
| Msg-id | 4174.1563239552@sss.pgh.pa.us |
| In response to | Re: Parallel Append subplan order instability on aye-aye (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Parallel Append subplan order instability on aye-aye |
| List | pgsql-hackers |
I wrote:
> So that data-collection patch has been in place for nearly 2 months
> (since 2019-05-21), and in that time we've seen a grand total of
> no repeats of the original problem, as far as I've seen.

Oh ... wait a minute.  I decided to go scrape the buildfarm logs to
confirm my impression that there were no matching failures, and darn
if I didn't find one:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-06-04%2021%3A00%3A22

For the archives' sake, that's a pg_upgradeCheck failure, and here are
the regression diffs:

=========================== regression.diffs ================
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out  2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out  2019-06-04 17:42:27 -0400
@@ -21,12 +21,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
              ->  Parallel Append
+                 ->  Parallel Seq Scan on a_star
                  ->  Parallel Seq Scan on d_star
                  ->  Parallel Seq Scan on f_star
                  ->  Parallel Seq Scan on e_star
                  ->  Parallel Seq Scan on b_star
                  ->  Parallel Seq Scan on c_star
-                 ->  Parallel Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a1;
@@ -49,10 +49,10 @@
              ->  Parallel Append
                  ->  Seq Scan on d_star
                  ->  Seq Scan on c_star
+                 ->  Parallel Seq Scan on a_star
                  ->  Parallel Seq Scan on f_star
                  ->  Parallel Seq Scan on e_star
                  ->  Parallel Seq Scan on b_star
-                 ->  Parallel Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a2;
@@ -75,12 +75,12 @@
          Workers Planned: 3
          ->  Partial Aggregate
              ->  Parallel Append
+                 ->  Seq Scan on a_star
                  ->  Seq Scan on d_star
                  ->  Seq Scan on f_star
                  ->  Seq Scan on e_star
                  ->  Seq Scan on b_star
                  ->  Seq Scan on c_star
-                 ->  Seq Scan on a_star
 (11 rows)
 
 select round(avg(aa)), sum(aa) from a_star a3;
@@ -95,7 +95,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples 
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16
diff -w -U3 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/stats.out  2019-05-21 14:00:23 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/stats.out  2019-06-04 17:43:06 -0400
@@ -205,7 +205,7 @@
 where relname like '__star' order by relname;
  relname | relpages | reltuples 
 ---------+----------+-----------
- a_star  |        1 |         3
+ a_star  |        0 |         0
  b_star  |        1 |         4
  c_star  |        1 |         4
  d_star  |        1 |        16

This plan shape change matches some, though by no means all, of the
previous failures.  And we can now see why the planner did that:
a_star has a smaller recorded size than the other tables in the query.

So what happened there?  There's no diff in the pg_stat_all_tables
query, which proves that a vacuum on a_star did happen, since it
transmitted a vacuum_count increment to the stats collector.
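For readers following along, the two checks at issue amount to queries
of roughly this shape (a sketch only; the actual regression-test
queries differ in detail):

    -- recorded size in pg_class: this is what the test output above diffs
    select relname, relpages, reltuples
    from pg_class
    where relname like '__star' order by relname;

    -- vacuum activity as reported to the stats collector; an unchanged
    -- vacuum_count here would have shown up as a diff, so the vacuum on
    -- a_star evidently did run
    select relname, vacuum_count
    from pg_stat_all_tables
    where relname like '__star' order by relname;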
It seems like there are two possible theories:

(1) The vacuum for some reason saw the table's size as zero
(whereupon it'd read no blocks and count no tuples).

(2) The vacuum's update of the pg_class row failed to "take".

Theory (2) seems a bit more plausible, but still very unsettling.

The similar failures that this result doesn't exactly match all look,
in the light of this data, like some one of the "X_star" tables
unexpectedly moved to the top of the parallel plan, which we can now
hypothesize means that that table had zero relpages/reltuples after
supposedly being vacuumed.  So it's not only a_star that's got the
issue, which lets out my half-formed theory that being the topmost
parent of the inheritance hierarchy has something to do with it.
But I bet that these tables forming an inheritance hierarchy (with
multiple inheritance even) does have something to do with it somehow,
because if this were a generic VACUUM bug surely we'd be seeing it
elsewhere.

            regards, tom lane
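For context, the __star tables form a multiple-inheritance hierarchy.
A hypothetical sketch of that shape (illustrative column names and
types only, not the real definitions from the regression suite) would
be:

    create table a_star (class char, aa int4);
    create table b_star (bb text) inherits (a_star);
    create table c_star (cc text) inherits (a_star);
    -- d_star illustrates the multiple inheritance mentioned above
    create table d_star (dd text) inherits (b_star, c_star);
    create table e_star (ee int2) inherits (c_star);
    create table f_star (ff text) inherits (e_star);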