Re: tuplesort test coverage - Mailing list pgsql-hackers

From Tom Lane
Subject Re: tuplesort test coverage
Date
Msg-id 3244.1576160824@sss.pgh.pa.us
Whole thread Raw
In response to Re: tuplesort test coverage  (Andres Freund <andres@anarazel.de>)
Responses Re: tuplesort test coverage  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> I pushed this now. We'll see what the slower buildfarm animals say. I'll
> try to see how long they took in a few days.

friarbird (a CLOBBER_CACHE_ALWAYS animal) just showed a failure in this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2019-12-12%2006%3A20%3A02

================== pgsql.build/src/test/regress/regression.diffs ===================
diff -U3 /pgbuild/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/tuplesort.out
/pgbuild/root/HEAD/pgsql.build/src/test/regress/results/tuplesort.out
--- /pgbuild/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/tuplesort.out    2019-11-13 19:54:11.000000000
-0500
+++ /pgbuild/root/HEAD/pgsql.build/src/test/regress/results/tuplesort.out    2019-12-12 08:25:23.000000000 -0500
@@ -625,13 +625,13 @@
                Group Key: a.col12
                Filter: (count(*) > 1)
                ->  Merge Join
-                     Merge Cond: (a.col12 = b.col12)
-                     ->  Sort
-                           Sort Key: a.col12 DESC
-                           ->  Seq Scan on test_mark_restore a
+                     Merge Cond: (b.col12 = a.col12)
                      ->  Sort
                            Sort Key: b.col12 DESC
                            ->  Seq Scan on test_mark_restore b
+                     ->  Sort
+                           Sort Key: a.col12 DESC
+                           ->  Seq Scan on test_mark_restore a
 (14 rows)

 :qry;

Since a and b are exactly the same table, in principle it's a matter of
chance which one the planner will put on the outside of the join.
I think what happened here is that the test ran long enough for
autovacuum/autoanalyze to come along and scan the table, changing its
stats in between where the planner picked up the stats for a and those
for b, and we ended up making the opposite join order choice.

I considered fixing this by adding some restriction clause on b so
that the join order choice isn't such a coin flip.  But it's not
clear that the problem couldn't recur anyway --- the table stats
would change significantly on auto-analyze, since the test script
isn't bothering to create any stats itself.

What seems like a simpler and more reliable fix is to make
test_mark_restore a temp table, thus keeping autovac away from it.
Is there a reason in terms of the test's goals not to do that?

Also ... why in the world does the script drop its tables at the end
with IF EXISTS?  They'd better exist at that point.  I object
to the DROP IF EXISTS up at the top, too.  The regression tests
do not need to be designed to deal with an unpredictable start state,
and coding them to do so can have no effect other than possibly
masking problems.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Jeremy Finzel
Date:
Subject: Re: Index corruption / planner issue with one table in my pg 11.6 instance
Next
From: Li Japin
Date:
Subject: Duplicate function call on timestamp2tm