Thread: Regression test postgres_fdw might fail due to autovacuum

Regression test postgres_fdw might fail due to autovacuum

From
Alexander Lakhin
Date:
Hello hackers,

A recent buildfarm failure [1] with the following diagnostics:
72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress               ERROR            19.04s exit status
1

postgres_fdw-running/regress/results/postgres_fdw.out
--- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11 15:21:27.681846597

+0000
+++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 
2025-03-14 04:02:32.573999799 +0000
@@ -6392,6 +6392,7 @@
  UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
    c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8
  ------+----+-----+----+----+----+------------+----
+ 2010 |  0 | bar |    |    |    | ft2        |
   2001 |  1 | bar |    |    |    | ft2        |
   2002 |  2 | bar |    |    |    | ft2        |
   2003 |  3 | bar |    |    |    | ft2        |
@@ -6401,7 +6402,6 @@
   2007 |  7 | bar |    |    |    | ft2        |
   2008 |  8 | bar |    |    |    | ft2        |
   2009 |  9 | bar |    |    |    | ft2        |
- 2010 |  0 | bar |    |    |    | ft2        |
  (10 rows)

  EXPLAIN (verbose, costs off)

shows that the UPDATE ... RETURNING query might be unstable due to lack of
ORDER BY.

I managed to reproduce this failure locally with the attached patch
applied to make the test repeatable and with:
sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i
contrib/postgres_fdw/Makefile
(Running 10 instances of the test in parallel eases reproducing as well.)

I also added SELECT ctid, * FROM ft2; just after the query in question and
found out that the results of the SELECT more unstable, but for the UPDATE
to produce the difference, the tuple with c1 == 2010 must reside on a new
page. For example:
--- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +0000
+++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +0000
@@ -6399,6 +6399,7 @@
UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
   c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8
------+----+-----+----+----+----+------------+----
+ 2010 |  0 | bar |    |    |    | ft2        |
  2001 |  1 | bar |    |    |    | ft2        |
  2002 |  2 | bar |    |    |    | ft2        |
  2003 |  3 | bar |    |    |    | ft2        |
@@ -6408,7 +6409,6 @@
  2007 |  7 | bar |    |    |    | ft2        |
  2008 |  8 | bar |    |    |    | ft2        |
  2009 |  9 | bar |    |    |    | ft2        |
- 2010 |  0 | bar |    |    |    | ft2        |

SELECT ctid, * FROM ft2;
@@ -6470,15 +6470,6 @@
  (0,103)  |   57 | 407 | 00057_update7      | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7  |
7         
 
| foo
  (0,104)  |   67 | 407 | 00067_update7      | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7  |
7         
 
| foo
  (0,105)  |   77 | 407 | 00077_update7      | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7  |
7         
 
| foo
- (0,106)  |    9 | 509 | 00009_update9      | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,107)  |   19 | 509 | 00019_update9      | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,108)  |   29 | 509 | 00029_update9      | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,109)  |   39 | 509 | 00039_update9      | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,110)  |   49 | 509 | 00049_update9      | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,111)  |   59 | 509 | 00059_update9      | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,112)  |   69 | 509 | 00069_update9      | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,113)  |   79 | 509 | 00079_update9      | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9  |
ft2       
 
| foo
- (0,114)  |   89 | 509 | 00089_update9      | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9  |
ft2       
 
| foo
  (1,1)    |   98 |   8 | 00098              | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8  |
8         
 
| foo
  (1,3)    |  100 |   0 | 00100              | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0  |
0         
 
| foo
  (1,4)    |  101 |   1 | 00101              | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  |
1         
 
| foo
...

- (12,137) | 2007 |   7 | bar |                              |                          |    | ft2        |
- (12,138) | 2008 |   8 | bar |                              |                          |    | ft2        |
- (12,139) | 2009 |   9 | bar |                              |                          |    | ft2        |
- (12,140) | 2010 |   0 | bar |                              |                          |    | ft2        |
+ (11,80)  |    9 | 509 | 00009_update9      | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9  |
ft2       
 
| foo
+ (11,81)  |   19 | 509 | 00019_update9      | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9  |
ft2       
 
| foo
+ (11,82)  |   29 | 509 | 00029_update9      | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9  |
ft2       
 
| foo
+ (11,83)  |   39 | 509 | 00039_update9      | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9  |
ft2       
 
| foo
...
+ (13,126) | 1013 | 403 | 0001300013_update3 |                              |                          |    |
ft2       |
 
+ (13,127) | 1019 | 609 | 0001900019_update9 |                              |                          |    |
ft2       |
 
+ (13,128) | 1103 | 503 | ccc_update3 |                              |                          |    | ft2        |
+ (14,1)   | 2010 |   0 | bar |                              |                          |    | ft2        |
(829 rows)


That is, for all the UPDATE failures I could see, the tuple with c1 == 2010
has ctid == (14, 1).

Interestingly enough, with "log_autovacuum_min_duration = 0" added to
TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
indirectly.

With autovacuum = off, all of these fluctuations go away.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2025-03-14%2003%3A47%3A50

Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Attachment

Re: Regression test postgres_fdw might fail due to autovacuum

From
Fujii Masao
Date:

On 2025/03/23 23:00, Alexander Lakhin wrote:
> Hello hackers,
> 
> A recent buildfarm failure [1] with the following diagnostics:
> 72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress               ERROR            19.04s exit
status1
 
> 
> postgres_fdw-running/regress/results/postgres_fdw.out
> --- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11
15:21:27.681846597+0000
 
> +++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
2025-03-1404:02:32.573999799 +0000
 
> @@ -6392,6 +6392,7 @@
>   UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
>     c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8
>   ------+----+-----+----+----+----+------------+----
> + 2010 |  0 | bar |    |    |    | ft2        |
>    2001 |  1 | bar |    |    |    | ft2        |
>    2002 |  2 | bar |    |    |    | ft2        |
>    2003 |  3 | bar |    |    |    | ft2        |
> @@ -6401,7 +6402,6 @@
>    2007 |  7 | bar |    |    |    | ft2        |
>    2008 |  8 | bar |    |    |    | ft2        |
>    2009 |  9 | bar |    |    |    | ft2        |
> - 2010 |  0 | bar |    |    |    | ft2        |
>   (10 rows)
> 
>   EXPLAIN (verbose, costs off)
> 
> shows that the UPDATE ... RETURNING query might be unstable due to lack of
> ORDER BY.
> 
> I managed to reproduce this failure locally with the attached patch
> applied to make the test repeatable and with:
> sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i
contrib/postgres_fdw/Makefile
> (Running 10 instances of the test in parallel eases reproducing as well.)
> 
> I also added SELECT ctid, * FROM ft2; just after the query in question and
> found out that the results of the SELECT more unstable, but for the UPDATE
> to produce the difference, the tuple with c1 == 2010 must reside on a new
> page. For example:
> --- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +0000
> +++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +0000
> @@ -6399,6 +6399,7 @@
> UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
>    c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8
> ------+----+-----+----+----+----+------------+----
> + 2010 |  0 | bar |    |    |    | ft2        |
>   2001 |  1 | bar |    |    |    | ft2        |
>   2002 |  2 | bar |    |    |    | ft2        |
>   2003 |  3 | bar |    |    |    | ft2        |
> @@ -6408,7 +6409,6 @@
>   2007 |  7 | bar |    |    |    | ft2        |
>   2008 |  8 | bar |    |    |    | ft2        |
>   2009 |  9 | bar |    |    |    | ft2        |
> - 2010 |  0 | bar |    |    |    | ft2        |
> 
> SELECT ctid, * FROM ft2;
> @@ -6470,15 +6470,6 @@
>   (0,103)  |   57 | 407 | 00057_update7      | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7  | 7 |
foo
>   (0,104)  |   67 | 407 | 00067_update7      | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7  | 7 |
foo
>   (0,105)  |   77 | 407 | 00077_update7      | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7  | 7 |
foo
> - (0,106)  |    9 | 509 | 00009_update9      | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9  | ft2 |
foo
> - (0,107)  |   19 | 509 | 00019_update9      | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9  | ft2 |
foo
> - (0,108)  |   29 | 509 | 00029_update9      | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9  | ft2 |
foo
> - (0,109)  |   39 | 509 | 00039_update9      | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9  | ft2 |
foo
> - (0,110)  |   49 | 509 | 00049_update9      | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9  | ft2 |
foo
> - (0,111)  |   59 | 509 | 00059_update9      | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9  | ft2 |
foo
> - (0,112)  |   69 | 509 | 00069_update9      | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9  | ft2 |
foo
> - (0,113)  |   79 | 509 | 00079_update9      | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9  | ft2 |
foo
> - (0,114)  |   89 | 509 | 00089_update9      | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9  | ft2 |
foo
>   (1,1)    |   98 |   8 | 00098              | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8  | 8 |
foo
>   (1,3)    |  100 |   0 | 00100              | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0  | 0 |
foo
>   (1,4)    |  101 |   1 | 00101              | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1 |
foo
> ...
> 
> - (12,137) | 2007 |   7 | bar |                              |                          |    | ft2        |
> - (12,138) | 2008 |   8 | bar |                              |                          |    | ft2        |
> - (12,139) | 2009 |   9 | bar |                              |                          |    | ft2        |
> - (12,140) | 2010 |   0 | bar |                              |                          |    | ft2        |
> + (11,80)  |    9 | 509 | 00009_update9      | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9  | ft2 |
foo
> + (11,81)  |   19 | 509 | 00019_update9      | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9  | ft2 |
foo
> + (11,82)  |   29 | 509 | 00029_update9      | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9  | ft2 |
foo
> + (11,83)  |   39 | 509 | 00039_update9      | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9  | ft2 |
foo
> ...
> + (13,126) | 1013 | 403 | 0001300013_update3 |                              |                          |    |
ft2       |
 
> + (13,127) | 1019 | 609 | 0001900019_update9 |                              |                          |    |
ft2       |
 
> + (13,128) | 1103 | 503 | ccc_update3 |                              |                          |    | ft2        |
> + (14,1)   | 2010 |   0 | bar |                              |                          |    | ft2        |
> (829 rows)
> 
> 
> That is, for all the UPDATE failures I could see, the tuple with c1 == 2010
> has ctid == (14, 1).
> 
> Interestingly enough, with "log_autovacuum_min_duration = 0" added to
> TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
> to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
> indirectly.

Thanks for reporting and investigating this issue!


> With autovacuum = off, all of these fluctuations go away.

So you are suggesting disabling autovacuum during the postgres_fdw regression test?


Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this:

     WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...

Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Regression test postgres_fdw might fail due to autovacuum

From
Melanie Plageman
Date:
On Sun, Mar 23, 2025 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Interestingly enough, with "log_autovacuum_min_duration = 0" added to
> TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
> to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
> indirectly.
>
> With autovacuum = off, all of these fluctuations go away.

If autovacuum is somehow corrupting the table, then isn't there a bug
that needs to be fixed? Also, are you only able to reproduce this on
master?

- Melanie



Re: Regression test postgres_fdw might fail due to autovacuum

From
Alexander Lakhin
Date:
Hello Masao-san and Melanie,

Thank you for your attention to this!

23.03.2025 19:26, Fujii Masao wrote:
>
>> With autovacuum = off, all of these fluctuations go away.
>
> So you are suggesting disabling autovacuum during the postgres_fdw regression test?

No, I'd prefer to have the test stable with working autovacuum. I just
wanted to point out that this failure is definitely caused by autovacuum
despite the lack of corresponding messages in the postmaster log.

>
> Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this:
>
>     WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...
>
> Thought?

Yeah, maybe such a trick will do.

23.03.2025 19:30, Melanie Plageman wrote:
> On Sun, Mar 23, 2025 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> With autovacuum = off, all of these fluctuations go away.
> If autovacuum is somehow corrupting the table, then isn't there a bug
> that needs to be fixed? Also, are you only able to reproduce this on
> master?

Yes, I reproduced this on master, but I don't think the table is
corrupted in this case, it just contains tuples in other order.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)



Re: Regression test postgres_fdw might fail due to autovacuum

From
Andres Freund
Date:
Hi,

On 2025-03-24 02:26:35 +0900, Fujii Masao wrote:
> > With autovacuum = off, all of these fluctuations go away.
> 
> So you are suggesting disabling autovacuum during the postgres_fdw regression test?

I don't think that's a good fix in this case.


> Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this:
> 
>     WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...

+1


On 2025-03-23 16:00:00 +0200, Alexander Lakhin wrote:
> Interestingly enough, with "log_autovacuum_min_duration = 0" added to
> TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
> to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
> indirectly.
> 
> With autovacuum = off, all of these fluctuations go away.

Hm. Is it possible that autovacuum started but didn't finish?


FWIW, I observed some oddities around autovacuum=0 recently, but didn't have
time to track the source down yet. E.g. running pg_trgm's tests against a
server with autovacuum=0 results in test failures due to plan changes, despite
no AVs getting launched.

Greetings,

Andres