Thread: [MASSMAIL]To what extent should tests rely on VACUUM ANALYZE?

[MASSMAIL]To what extent should tests rely on VACUUM ANALYZE?

From
Alexander Lakhin
Date:
Hello hackers,

When running multiple 027_stream_regress.pl test instances in parallel
(and with aggressive autovacuum) on a rather slow machine, I encountered
test failures due to the subselect test instability just as the following
failures on buildfarm:
1) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-03-27%2010%3A16%3A12

--- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/subselect.out 2024-03-19 22:20:34.435867114 +0000
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/subselect.out 
2024-03-27 10:28:38.185776605 +0000
@@ -2067,16 +2067,16 @@
                     QUERY PLAN
  -------------------------------------------------
   Hash Join
-   Hash Cond: (c.odd = b.odd)
+   Hash Cond: (c.hundred = a.hundred)
     ->  Hash Join
-         Hash Cond: (a.hundred = c.hundred)
-         ->  Seq Scan on tenk1 a
+         Hash Cond: (b.odd = c.odd)
+         ->  Seq Scan on tenk2 b
           ->  Hash
                 ->  HashAggregate
                       Group Key: c.odd, c.hundred
                       ->  Seq Scan on tenk2 c
     ->  Hash
-         ->  Seq Scan on tenk2 b
+         ->  Seq Scan on tenk1 a
  (11 rows)

2) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-03-27%2009%3A49%3A38

(That query was added recently (by 9f1337639 from 2023-02-15) and the
failure evidentially depends on timing, so the number of the failures I
could find on buildfarm is moderate for now.)

With the subselect test modified as in attached, I could see what makes
the plan change:
-                     ->  Seq Scan on public.tenk2 c (cost=0.00..445.00 rows=10000 width=8)
+                     ->  Seq Scan on public.tenk2 c (cost=0.00..444.95 rows=9995 width=8)

   relname | relpages | reltuples | autovacuum_count | autoanalyze_count
  ---------+----------+-----------+------------------+-------------------
- tenk2   |      345 |     10000 |                0 |                 0
+ tenk2   |      345 |      9995 |                0 |                 0

Using the trick Thomas proposed in [1] (see my modification attached), I
could reproduce the failure easily on my workstation with no specific
conditions:
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup LOG:  !!!ConditionalLockBufferForCleanup() 
returning false
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup CONTEXT:  while scanning block 29 of relation

"public.tenk2"
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup STATEMENT:  VACUUM ANALYZE tenk2;
...
   relname | relpages | reltuples | autovacuum_count | autoanalyze_count
  ---------+----------+-----------+------------------+-------------------
- tenk2   |      345 |     10000 |                0 |                 0
+ tenk2   |      345 |      9996 |                0 |                 0
  (1 row)

So it looks to me like a possible cause of the failure, and I wonder
whether checks for query plans should be immune to such changes or results
of VACUUM ANALYZE should be 100% stable?

[1] https://www.postgresql.org/message-id/CA%2BhUKGKYNHmL_DhmVRiidHv6YLAL8jViifwwn2ABY__Y3BCphg%40mail.gmail.com

Best regards,
Alexander
Attachment

Re: To what extent should tests rely on VACUUM ANALYZE?

From
Tomas Vondra
Date:
On 3/28/24 16:00, Alexander Lakhin wrote:
> ...
>
> Using the trick Thomas proposed in [1] (see my modification attached), I
> could reproduce the failure easily on my workstation with no specific
> conditions:
> 2024-03-28 14:05:13.792 UTC client backend[2358012]
> pg_regress/test_setup LOG:  !!!ConditionalLockBufferForCleanup()
> returning false
> 2024-03-28 14:05:13.792 UTC client backend[2358012]
> pg_regress/test_setup CONTEXT:  while scanning block 29 of relation
> "public.tenk2"
> 2024-03-28 14:05:13.792 UTC client backend[2358012]
> pg_regress/test_setup STATEMENT:  VACUUM ANALYZE tenk2;
> ...
>   relname | relpages | reltuples | autovacuum_count | autoanalyze_count
>  ---------+----------+-----------+------------------+-------------------
> - tenk2   |      345 |     10000 |                0 |                 0
> + tenk2   |      345 |      9996 |                0 |                 0
>  (1 row)
> 
> So it looks to me like a possible cause of the failure, and I wonder
> whether checks for query plans should be immune to such changes or results
> of VACUUM ANALYZE should be 100% stable?
> 

Yeah. I think it's good to design the data/queries in such a way that
the behavior does not flip due to minor noise like in this case.

But I'm a bit confused - how come the estimates do change at all? The
analyze simply fetches 30k rows, and tenk only has 10k of them. So we
should have *exact* numbers, and it should be exactly the same for all
the analyze runs. So how come it changes like this?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> Yeah. I think it's good to design the data/queries in such a way that
> the behavior does not flip due to minor noise like in this case.

+1

> But I'm a bit confused - how come the estimates do change at all? The
> analyze simply fetches 30k rows, and tenk only has 10k of them. So we
> should have *exact* numbers, and it should be exactly the same for all
> the analyze runs. So how come it changes like this?

It's plausible that the VACUUM ANALYZE done by test_setup fails
ConditionalLockBufferForCleanup() sometimes because of concurrent
activity like checkpointer writes.  I'm not quite sure how we
get from that to the observed symptom though.  Maybe the
VACUUM needs DISABLE_PAGE_SKIPPING?

            regards, tom lane



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Richard Guo
Date:

On Fri, Mar 29, 2024 at 1:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> Yeah. I think it's good to design the data/queries in such a way that
> the behavior does not flip due to minor noise like in this case.

+1

Agreed.  The query in problem is:

-- we can pull up the sublink into the inner JoinExpr.
explain (costs off)
SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);

For this query, the RHS of the semijoin can be unique-ified, allowing it
to be joined to anything else by unique-ifying the RHS.  Hence, both
join orders 'A/C/B' (as in the answer file) and 'B/C/A' (as in the
reported plan diff) are legal.

So I'm wondering if we can make this test case more stable by using
'c.odd > b.odd' instead of 'c.odd = b.odd' in the subquery, as attached.
As a result, the RHS of the semijoin cannot be unique-ified any more, so
that the only legal join order is 'A/B/C'.  We would not have different
join orders due to noises in the estimates, while still testing what we
intend to test.

Thanks
Richard
Attachment

Re: To what extent should tests rely on VACUUM ANALYZE?

From
Richard Guo
Date:

On Thu, Mar 28, 2024 at 11:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
When running multiple 027_stream_regress.pl test instances in parallel
(and with aggressive autovacuum) on a rather slow machine, I encountered
test failures due to the subselect test instability just as the following
failures on buildfarm:
1) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-03-27%2010%3A16%3A12

--- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/subselect.out 2024-03-19 22:20:34.435867114 +0000
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/subselect.out
2024-03-27 10:28:38.185776605 +0000
@@ -2067,16 +2067,16 @@
                     QUERY PLAN
  -------------------------------------------------
   Hash Join
-   Hash Cond: (c.odd = b.odd)
+   Hash Cond: (c.hundred = a.hundred)
     ->  Hash Join
-         Hash Cond: (a.hundred = c.hundred)
-         ->  Seq Scan on tenk1 a
+         Hash Cond: (b.odd = c.odd)
+         ->  Seq Scan on tenk2 b
           ->  Hash
                 ->  HashAggregate
                       Group Key: c.odd, c.hundred
                       ->  Seq Scan on tenk2 c
     ->  Hash
-         ->  Seq Scan on tenk2 b
+         ->  Seq Scan on tenk1 a
  (11 rows)

FWIW, this issue is also being reproduced in Cirrus CI, as Matthias
reported in another thread [1] days ago.

[1] https://www.postgresql.org/message-id/CAEze2WiiE-iTKxgWQzcjyiiiA4q-zsdkkAdCaD_E83xA2g2BLA%40mail.gmail.com

Thanks
Richard

Re: To what extent should tests rely on VACUUM ANALYZE?

From
Alexander Lakhin
Date:
28.03.2024 20:33, Tom Lane wrote:
>
>> But I'm a bit confused - how come the estimates do change at all? The
>> analyze simply fetches 30k rows, and tenk only has 10k of them. So we
>> should have *exact* numbers, and it should be exactly the same for all
>> the analyze runs. So how come it changes like this?
> It's plausible that the VACUUM ANALYZE done by test_setup fails
> ConditionalLockBufferForCleanup() sometimes because of concurrent
> activity like checkpointer writes.  I'm not quite sure how we
> get from that to the observed symptom though.  Maybe the
> VACUUM needs DISABLE_PAGE_SKIPPING?

Yeah, the way from ConditionalLockBufferForCleanup() returning false to
reltuples < 10000 is not one-step, as I thought initially. There is also
sanity_check doing VACUUM in between. So, effectively the troublesome
scenario is:
VACUUM ANALYZE tenk2; -- with cleanup lock not granted for some blocks
VACUUM tenk2;

In this scenario, lazy_scan_heap() -> vac_estimate_reltuples() called two
times.
First, with rel_pages: 384, vacrel->scanned_pages: 384,
vacrel->live_tuples: 10000 and it results in
vacrel->new_live_tuples = 10000,

And second, with rel_pages: 345, vacrel->scanned_pages: 80,
vacrel->live_tuples: 2315 (for instance), and we get
vacrel->new_live_tuples = 9996,

With unmodified ConditionalLockBufferForCleanup() the second call is
performed with rel_pages: 345, vacrel->scanned_pages: 1,
vacrel->live_tuples: 24 and it returns 10000.

This simple change fixes the issue for me:
-VACUUM ANALYZE tenk2;
+VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;

But it looks like subselect is not the only test that can fail due to
vacuum instability. I see that create_index also suffers from cranky
ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)
return false; ), although it placed in parallel_schedule before
sanity_check, so this failure needs another explanation:
-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

Best regards,
Alexander



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Alexander Lakhin
Date:
29.03.2024 11:59, Alexander Lakhin wrote:
>
> This simple change fixes the issue for me:
> -VACUUM ANALYZE tenk2;
> +VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;
>

I'm sorry, I wasn't persevering enough when testing that...
After more test runs, I see that in fact it doesn't help.

Best regards,
Alexander



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Alexander Lakhin
Date:
29.03.2024 11:59, Alexander Lakhin wrote:
>
> But it looks like subselect is not the only test that can fail due to
> vacuum instability. I see that create_index also suffers from cranky
> ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)
> return false; ), although it placed in parallel_schedule before
> sanity_check, so this failure needs another explanation:
> -                      QUERY PLAN
> --------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> -   Index Cond: (thousand < 2)
> -   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> +                                      QUERY PLAN
> +--------------------------------------------------------------------------------------
> + Sort
> +   Sort Key: thousand
> +   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
> +         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
> +(4 rows)

I think that deviation can be explained by the fact that cost_index() takes
baserel->allvisfrac (derived from pg_class.relallvisible) into account for
the index-only-scan case, and I see the following difference when a test
run fails:
         relname        | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count
  ----------------------+----------+-----------+---------------+------------+------------------+-------------------
- tenk1                |      345 |     10000 |           345 |            |                0 |                 0
+ tenk1                |      345 |     10000 |           305 |            |                0 |                 0

Best regards,
Alexander



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Mar 29, 2024 at 1:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>>> Yeah. I think it's good to design the data/queries in such a way that
>>> the behavior does not flip due to minor noise like in this case.

>> +1

> Agreed.  The query in problem is:
> -- we can pull up the sublink into the inner JoinExpr.
> explain (costs off)
> SELECT * FROM tenk1 A INNER JOIN tenk2 B
> ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);

> So I'm wondering if we can make this test case more stable by using
> 'c.odd > b.odd' instead of 'c.odd = b.odd' in the subquery, as attached.

I'm not sure that that is testing the same thing (since it's no longer
an equijoin), or that it would fix the issue.  The problem really is
that all three baserels have identical statistics so there's no
difference in cost between different join orders, and then it's mostly
a matter of unspecified implementation details which order we will
choose, and even the smallest change in one rel's statistics can
flip it.  The way we have fixed similar issues elsewhere is to add a
scan-level WHERE restriction that makes one of the baserels smaller,
breaking the symmetry.  So I'd try something like

 explain (costs off)
 SELECT * FROM tenk1 A INNER JOIN tenk2 B
-ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd)
+WHERE a.thousand < 750;

(I first tried reducing the size of B, but that caused the join
order to change; restricting A makes it keep the existing plan.)

Might or might not need to mess with the size of C, but since that
one needs uniquification it's different from the others already.

            regards, tom lane



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Tom Lane
Date:
Alexander Lakhin <exclusion@gmail.com> writes:
> I think that deviation can be explained by the fact that cost_index() takes
> baserel->allvisfrac (derived from pg_class.relallvisible) into account for
> the index-only-scan case, and I see the following difference when a test
> run fails:
>          relname        | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count
>   ----------------------+----------+-----------+---------------+------------+------------------+-------------------
> - tenk1                |      345 |     10000 |           345 |            |                0 |                 0
> + tenk1                |      345 |     10000 |           305 |            |                0 |                 0

Ouch.  So what's triggering that?  The intention of test_setup
surely is to provide a uniform starting point.

            regards, tom lane



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Alexander Lakhin
Date:
Hello Tom,

29.03.2024 16:51, Tom Lane wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> I think that deviation can be explained by the fact that cost_index() takes
>> baserel->allvisfrac (derived from pg_class.relallvisible) into account for
>> the index-only-scan case, and I see the following difference when a test
>> run fails:
>>           relname        | relpages | reltuples | relallvisible | indisvalid | autovacuum_count | autoanalyze_count
>>
 ----------------------+----------+-----------+---------------+------------+------------------+-------------------
>> - tenk1                |      345 |     10000 |           345 |            |                0 |                 0
>> + tenk1                |      345 |     10000 |           305 |            |                0 |                 0
> Ouch.  So what's triggering that?  The intention of test_setup
> surely is to provide a uniform starting point.

Thanks for your attention to the issue!
Please try the attached...

Best regards,
Alexander
Attachment

Re: To what extent should tests rely on VACUUM ANALYZE?

From
Tom Lane
Date:
Alexander Lakhin <exclusion@gmail.com> writes:
> 29.03.2024 16:51, Tom Lane wrote:
>> Ouch.  So what's triggering that?  The intention of test_setup
>> surely is to provide a uniform starting point.

> Thanks for your attention to the issue!
> Please try the attached...

I experimented with the attached modified version of the patch,
which probes just after the relevant VACUUMs and reduces the
crankiness of ConditionalLockBufferForCleanup a bit to more nearly
approximate what we're likely to see in the buildfarm.  There are
two clearly-visible effects:

1. The initial VACUUM fails to count some pages as all-visible,
presumably exactly the same ones that ConditionalLockBufferForCleanup
fails on.  This feels like a bug.  We still scan the pages (else
reltuples would be wrong); why would we not recognize that they are
all-visible?

2. The re-VACUUM in sanity_check.sql corrects most of the
relallvisible discrepancy, presumably because it's preferentially
going after the pages that didn't get marked the first time.
However, it's distorting reltuples.  Interestingly, the distortion
is worse with less-cranky ConditionalLockBufferForCleanup.

I believe the cause of the reltuples distortion is that the
re-VACUUM will nearly always scan the last page of the relation,
which would usually contain fewer tuples than the rest, but
then it counts that page equally with the rest to compute the
new tuple density.  The fewer other pages are included in that
computation, the worse the new density estimate is, accounting
for the effect that when ConditionalLockBufferForCleanup is more
prone to failure the error gets smaller.

The comments in vac_estimate_reltuples already point out that
vacuum tends to always hit the last page and claim that we
"handle that here", but it's doing nothing about the likelihood
that the last page has fewer than the normal number of tuples.
I wonder if we could adjust the density calculation to account
for that.  I don't think it'd be unreasonable to just assume
that the last page is only half full.  Or we could try to get
the vacuum logic to report the last-page count separately ...

I tried the patch in v16 too and got similar results, so these
are not new problems.

            regards, tom lane



Re: To what extent should tests rely on VACUUM ANALYZE?

From
Tom Lane
Date:
I wrote:
> I experimented with the attached modified version of the patch,
> which probes just after the relevant VACUUMs and reduces the
> crankiness of ConditionalLockBufferForCleanup a bit to more nearly
> approximate what we're likely to see in the buildfarm.

Sigh, forgot to attach the patch, not that you couldn't have
guessed what's in it.

            regards, tom lane

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e63c86cae4..6accaec26d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1351,6 +1351,11 @@ vac_estimate_reltuples(Relation relation,
     old_density = old_rel_tuples / old_rel_pages;
     unscanned_pages = (double) total_pages - (double) scanned_pages;
     total_tuples = old_density * unscanned_pages + scanned_tuples;
+    ereport(LOG,
+            (errmsg("vac_estimate_reltuples(%s): od %g, sp %u tp %u, st %g orl %g tt %g",
+                    RelationGetRelationName(relation),
+                    old_density, scanned_pages, total_pages,
+                    scanned_tuples, old_rel_tuples, total_tuples)));
     return floor(total_tuples + 0.5);
 }

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c..53cc74d88f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5041,6 +5041,8 @@ ConditionalLockBufferForCleanup(Buffer buffer)

     Assert(BufferIsValid(buffer));

+    if (rand() % 100 == 0) return false;
+
     if (BufferIsLocal(buffer))
     {
         refcount = LocalRefCount[-buffer - 1];
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 8370c1561c..8841581d0a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -1,4 +1,24 @@
+select c.relname,c.relpages,c.reltuples,c.relallvisible,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname like 'tenk_' order by c.relname;
+ relname | relpages | reltuples | relallvisible | autovacuum_count | autoanalyze_count
+---------+----------+-----------+---------------+------------------+-------------------
+ tenk1   |      345 |     10000 |           345 |                0 |                 0
+ tenk2   |      345 |     10000 |           345 |                0 |                 0
+(2 rows)
+
 VACUUM;
+select c.relname,c.relpages,c.reltuples,c.relallvisible,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname like 'tenk_' order by c.relname;
+ relname | relpages | reltuples | relallvisible | autovacuum_count | autoanalyze_count
+---------+----------+-----------+---------------+------------------+-------------------
+ tenk1   |      345 |     10000 |           345 |                0 |                 0
+ tenk2   |      345 |     10000 |           345 |                0 |                 0
+(2 rows)
+
 --
 -- Sanity check: every system catalog that has OIDs should have
 -- a unique index on OID.  This ensures that the OIDs will be unique,
diff --git a/src/test/regress/expected/test_setup.out b/src/test/regress/expected/test_setup.out
index 3d0eeec996..84943ecbae 100644
--- a/src/test/regress/expected/test_setup.out
+++ b/src/test/regress/expected/test_setup.out
@@ -138,6 +138,16 @@ COPY tenk1 FROM :'filename';
 VACUUM ANALYZE tenk1;
 CREATE TABLE tenk2 AS SELECT * FROM tenk1;
 VACUUM ANALYZE tenk2;
+select c.relname,c.relpages,c.reltuples,c.relallvisible,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname like 'tenk_' order by c.relname;
+ relname | relpages | reltuples | relallvisible | autovacuum_count | autoanalyze_count
+---------+----------+-----------+---------------+------------------+-------------------
+ tenk1   |      345 |     10000 |           345 |                0 |                 0
+ tenk2   |      345 |     10000 |           345 |                0 |                 0
+(2 rows)
+
 CREATE TABLE person (
     name         text,
     age            int4,
diff --git a/src/test/regress/sql/sanity_check.sql b/src/test/regress/sql/sanity_check.sql
index 162e5324b5..28635a1287 100644
--- a/src/test/regress/sql/sanity_check.sql
+++ b/src/test/regress/sql/sanity_check.sql
@@ -1,5 +1,15 @@
+select c.relname,c.relpages,c.reltuples,c.relallvisible,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname like 'tenk_' order by c.relname;
+
 VACUUM;

+select c.relname,c.relpages,c.reltuples,c.relallvisible,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname like 'tenk_' order by c.relname;
+
 --
 -- Sanity check: every system catalog that has OIDs should have
 -- a unique index on OID.  This ensures that the OIDs will be unique,
diff --git a/src/test/regress/sql/test_setup.sql b/src/test/regress/sql/test_setup.sql
index 06b0e2121f..a96c4b36f5 100644
--- a/src/test/regress/sql/test_setup.sql
+++ b/src/test/regress/sql/test_setup.sql
@@ -167,6 +167,11 @@ VACUUM ANALYZE tenk1;
 CREATE TABLE tenk2 AS SELECT * FROM tenk1;
 VACUUM ANALYZE tenk2;

+select c.relname,c.relpages,c.reltuples,c.relallvisible,s.autovacuum_count,s.autoanalyze_count
+from pg_class c
+left join pg_stat_all_tables s on c.oid = s.relid
+where c.relname like 'tenk_' order by c.relname;
+
 CREATE TABLE person (
     name         text,
     age            int4,

Re: To what extent should tests rely on VACUUM ANALYZE?

From
Alexander Lakhin
Date:
29.03.2024 11:59, Alexander Lakhin wrote:
> But it looks like subselect is not the only test that can fail due to
> vacuum instability. I see that create_index also suffers from cranky
> ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)  ...

Just for the record, I think I've reproduced the same failure as:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2024-03-17%2003%3A03%3A57
not ok 66    + create_index                            27509 ms
...

and the similar occurrences:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-01-02%2007%3A09%3A09
not ok 66    + create_index                            25830 ms

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2023-11-15%2006%3A16%3A15
not ok 66    + create_index                            38508 ms

by running 8 027_stream_regress instances in parallel on a slow ARMv7
device like this:
for i in `seq 10`; do echo "I $i"; parallel --halt now,fail=1  -j8 \
  --linebuffer --tag NO_TEMP_INSTALL=1 make -s check -C \
  src/test/recovery_{}/ PROVE_TESTS="t/027*" ::: `seq 8` || break; done
5
5       #   Failed test 'regression tests pass'
5       #   at t/027_stream_regress.pl line 92.
5       #          got: '256'
5       #     expected: '0'
5       t/027_stream_regress.pl ..
5       Dubious, test returned 1 (wstat 256, 0x100)
5       Failed 1/6 subtests

not ok 66    + create_index                           152995 ms
...
=== dumping .../src/test/recovery_5/tmp_check/regression.diffs ===
diff -U3 .../src/test/regress/expected/create_index.out .../src/test/recovery_5/tmp_check/results/create_index.out
--- .../src/test/regress/expected/create_index.out  2024-05-30 15:30:34.523048633 +0000
+++ .../src/test/recovery_5/tmp_check/results/create_index.out 2024-05-31 13:07:56.359001362 +0000
@@ -1916,11 +1916,15 @@
  SELECT unique1 FROM tenk1
  WHERE unique1 IN (1,42,7)
  ORDER BY unique1;
-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

  SELECT unique1 FROM tenk1
  WHERE unique1 IN (1,42,7)
@@ -1936,12 +1940,13 @@
  SELECT thousand, tenthous FROM tenk1
  WHERE thousand < 2 AND tenthous IN (1001,3000)
  ORDER BY thousand;
-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

  SELECT thousand, tenthous FROM tenk1
  WHERE thousand < 2 AND tenthous IN (1001,3000)
=== EOF ===

I got failures on iteration 2, 3, 7, 1.

But with the repeated VACUUM ANALYZE:
--- a/src/test/regress/sql/test_setup.sql
+++ b/src/test/regress/sql/test_setup.sql
@@ -163,6 +163,8 @@ CREATE TABLE tenk1 (
  \set filename :abs_srcdir '/data/tenk.data'
  COPY tenk1 FROM :'filename';
  VACUUM ANALYZE tenk1;
+VACUUM ANALYZE tenk1;
+VACUUM ANALYZE tenk1;

20 iterations succeeded in the same environment.

So I think that that IOS plan change can be explained by the issue
discussed here.

Best regards,
Alexander