Re: create_index test fails when synchronous_commit = off @ master - Mailing list pgsql-hackers

From Andres Freund
Subject Re: create_index test fails when synchronous_commit = off @ master
Date
Msg-id 20220224153339.pqn64kseb5gpgl74@alap3.anarazel.de
Whole thread Raw
In response to create_index test fails when synchronous_commit = off @ master  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: create_index test fails when synchronous_commit = off @ master  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2022-02-24 16:47:25 +0300, Aleksander Alekseev wrote:
> -                      QUERY PLAN
> --------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> -   Index Cond: (thousand < 2)
> -   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> +                                      QUERY PLAN
> +--------------------------------------------------------------------------------------
> + Sort
> +   Sort Key: thousand
> +   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
> +         Index Cond: ((thousand < 2) AND (tenthous = ANY
> ('{1001,3000}'::integer[])))
> +(4 rows)

Heh. We've been having a lot of fights with exactly this plan change in the
AIO branch, before cc50080a82, and without synchronous_commit =
off. Interestingly near-exclusively with the regression run within
pg_upgrade's tests.

For aio we (David did a lot of that IIRC) finally hunted it down to be due
vacuum skipping pages due to inability to get a cleanup lock. If that happens
enough, pg_class.relallvisible changes enough to lead to the different plan.


I first was going to suggest that we should just use VACUUM FREEZE to prevent
the issue.

But in this instance the cause isn't cleanup locks, probably that we can't yet
set hint bits due to synchronous_commit=off? But I don't *fully* understand
how it leads to this.

I added the SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |           345
+(1 row)

synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |             0
+(1 row)

So it clearly is the explanation for the issue.


Obviously we can locally work around it by adding a
SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.


> I didn't investigate further. Do we assume that `make installcheck` suppose
> to pass with a different postgresql.conf options?

Depends on the option, I think... There's some where it's interesting to run
tests with different options and where the effort required is reasonable. And
some cases where it's not... synchronous_commit=off worked until recently, and
I think we should keep it working.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: convert libpq uri-regress tests to tap test
Next
From: Alvaro Herrera
Date:
Subject: Re: Add id's to various elements in protocol.sgml