Thread: Regression test postgres_fdw might fail due to autovacuum
Hello hackers, A recent buildfarm failure [1] with the following diagnostics: 72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress ERROR 19.04s exit status 1 postgres_fdw-running/regress/results/postgres_fdw.out --- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11 15:21:27.681846597 +0000 +++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 2025-03-14 04:02:32.573999799 +0000 @@ -6392,6 +6392,7 @@ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ------+----+-----+----+----+----+------------+---- + 2010 | 0 | bar | | | | ft2 | 2001 | 1 | bar | | | | ft2 | 2002 | 2 | bar | | | | ft2 | 2003 | 3 | bar | | | | ft2 | @@ -6401,7 +6402,6 @@ 2007 | 7 | bar | | | | ft2 | 2008 | 8 | bar | | | | ft2 | 2009 | 9 | bar | | | | ft2 | - 2010 | 0 | bar | | | | ft2 | (10 rows) EXPLAIN (verbose, costs off) shows that the UPDATE ... RETURNING query might be unstable due to lack of ORDER BY. I managed to reproduce this failure locally with the attached patch applied to make the test repeatable and with: sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i contrib/postgres_fdw/Makefile (Running 10 instances of the test in parallel eases reproducing as well.) I also added SELECT ctid, * FROM ft2; just after the query in question and found out that the results of the SELECT more unstable, but for the UPDATE to produce the difference, the tuple with c1 == 2010 must reside on a new page. For example: --- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +0000 +++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +0000 @@ -6399,6 +6399,7 @@ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ------+----+-----+----+----+----+------------+---- + 2010 | 0 | bar | | | | ft2 | 2001 | 1 | bar | | | | ft2 | 2002 | 2 | bar | | | | ft2 | 2003 | 3 | bar | | | | ft2 | @@ -6408,7 +6409,6 @@ 2007 | 7 | bar | | | | ft2 | 2008 | 8 | bar | | | | ft2 | 2009 | 9 | bar | | | | ft2 | - 2010 | 0 | bar | | | | ft2 | SELECT ctid, * FROM ft2; @@ -6470,15 +6470,6 @@ (0,103) | 57 | 407 | 00057_update7 | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7 | 7 | foo (0,104) | 67 | 407 | 00067_update7 | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7 | 7 | foo (0,105) | 77 | 407 | 00077_update7 | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7 | 7 | foo - (0,106) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo - (0,107) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo - (0,108) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo - (0,109) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo - (0,110) | 49 | 509 | 00049_update9 | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9 | ft2 | foo - (0,111) | 59 | 509 | 00059_update9 | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9 | ft2 | foo - (0,112) | 69 | 509 | 00069_update9 | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9 | ft2 | foo - (0,113) | 79 | 509 | 00079_update9 | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9 | ft2 | foo - (0,114) | 89 | 509 | 00089_update9 | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9 | ft2 | foo (1,1) | 98 | 8 | 00098 | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8 | 8 | foo (1,3) | 100 | 0 | 00100 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo (1,4) | 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo ... - (12,137) | 2007 | 7 | bar | | | | ft2 | - (12,138) | 2008 | 8 | bar | | | | ft2 | - (12,139) | 2009 | 9 | bar | | | | ft2 | - (12,140) | 2010 | 0 | bar | | | | ft2 | + (11,80) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo + (11,81) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo + (11,82) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo + (11,83) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo ... + (13,126) | 1013 | 403 | 0001300013_update3 | | | | ft2 | + (13,127) | 1019 | 609 | 0001900019_update9 | | | | ft2 | + (13,128) | 1103 | 503 | ccc_update3 | | | | ft2 | + (14,1) | 2010 | 0 | bar | | | | ft2 | (829 rows) That is, for all the UPDATE failures I could see, the tuple with c1 == 2010 has ctid == (14, 1). Interestingly enough, with "log_autovacuum_min_duration = 0" added to TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table indirectly. With autovacuum = off, all of these fluctuations go away. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2025-03-14%2003%3A47%3A50 Best regards, Alexander Lakhin Neon (https://neon.tech)
Attachment
On 2025/03/23 23:00, Alexander Lakhin wrote: > Hello hackers, > > A recent buildfarm failure [1] with the following diagnostics: > 72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress ERROR 19.04s exit status1 > > postgres_fdw-running/regress/results/postgres_fdw.out > --- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11 15:21:27.681846597+0000 > +++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 2025-03-1404:02:32.573999799 +0000 > @@ -6392,6 +6392,7 @@ > UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; > c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 > ------+----+-----+----+----+----+------------+---- > + 2010 | 0 | bar | | | | ft2 | > 2001 | 1 | bar | | | | ft2 | > 2002 | 2 | bar | | | | ft2 | > 2003 | 3 | bar | | | | ft2 | > @@ -6401,7 +6402,6 @@ > 2007 | 7 | bar | | | | ft2 | > 2008 | 8 | bar | | | | ft2 | > 2009 | 9 | bar | | | | ft2 | > - 2010 | 0 | bar | | | | ft2 | > (10 rows) > > EXPLAIN (verbose, costs off) > > shows that the UPDATE ... RETURNING query might be unstable due to lack of > ORDER BY. > > I managed to reproduce this failure locally with the attached patch > applied to make the test repeatable and with: > sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i contrib/postgres_fdw/Makefile > (Running 10 instances of the test in parallel eases reproducing as well.) > > I also added SELECT ctid, * FROM ft2; just after the query in question and > found out that the results of the SELECT more unstable, but for the UPDATE > to produce the difference, the tuple with c1 == 2010 must reside on a new > page. For example: > --- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +0000 > +++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +0000 > @@ -6399,6 +6399,7 @@ > UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; > c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 > ------+----+-----+----+----+----+------------+---- > + 2010 | 0 | bar | | | | ft2 | > 2001 | 1 | bar | | | | ft2 | > 2002 | 2 | bar | | | | ft2 | > 2003 | 3 | bar | | | | ft2 | > @@ -6408,7 +6409,6 @@ > 2007 | 7 | bar | | | | ft2 | > 2008 | 8 | bar | | | | ft2 | > 2009 | 9 | bar | | | | ft2 | > - 2010 | 0 | bar | | | | ft2 | > > SELECT ctid, * FROM ft2; > @@ -6470,15 +6470,6 @@ > (0,103) | 57 | 407 | 00057_update7 | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7 | 7 | foo > (0,104) | 67 | 407 | 00067_update7 | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7 | 7 | foo > (0,105) | 77 | 407 | 00077_update7 | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7 | 7 | foo > - (0,106) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo > - (0,107) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo > - (0,108) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo > - (0,109) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo > - (0,110) | 49 | 509 | 00049_update9 | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9 | ft2 | foo > - (0,111) | 59 | 509 | 00059_update9 | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9 | ft2 | foo > - (0,112) | 69 | 509 | 00069_update9 | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9 | ft2 | foo > - (0,113) | 79 | 509 | 00079_update9 | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9 | ft2 | foo > - (0,114) | 89 | 509 | 00089_update9 | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9 | ft2 | foo > (1,1) | 98 | 8 | 00098 | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8 | 8 | foo > (1,3) | 100 | 0 | 00100 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo > (1,4) | 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo > ... > > - (12,137) | 2007 | 7 | bar | | | | ft2 | > - (12,138) | 2008 | 8 | bar | | | | ft2 | > - (12,139) | 2009 | 9 | bar | | | | ft2 | > - (12,140) | 2010 | 0 | bar | | | | ft2 | > + (11,80) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo > + (11,81) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo > + (11,82) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo > + (11,83) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo > ... > + (13,126) | 1013 | 403 | 0001300013_update3 | | | | ft2 | > + (13,127) | 1019 | 609 | 0001900019_update9 | | | | ft2 | > + (13,128) | 1103 | 503 | ccc_update3 | | | | ft2 | > + (14,1) | 2010 | 0 | bar | | | | ft2 | > (829 rows) > > > That is, for all the UPDATE failures I could see, the tuple with c1 == 2010 > has ctid == (14, 1). > > Interestingly enough, with "log_autovacuum_min_duration = 0" added to > TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related > to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table > indirectly. Thanks for reporting and investigating this issue! > With autovacuum = off, all of these fluctuations go away. So you are suggesting disabling autovacuum during the postgres_fdw regression test? Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this: WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ... Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Sun, Mar 23, 2025 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > > Interestingly enough, with "log_autovacuum_min_duration = 0" added to > TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related > to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table > indirectly. > > With autovacuum = off, all of these fluctuations go away. If autovacuum is somehow corrupting the table, then isn't there a bug that needs to be fixed? Also, are you only able to reproduce this on master? - Melanie
Hello Masao-san and Melanie, Thank you for your attention to this! 23.03.2025 19:26, Fujii Masao wrote: > >> With autovacuum = off, all of these fluctuations go away. > > So you are suggesting disabling autovacuum during the postgres_fdw regression test? No, I'd prefer to have the test stable with working autovacuum. I just wanted to point out that this failure is definitely caused by autovacuum despite the lack of corresponding messages in the postmaster log. > > Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this: > > WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ... > > Thought? Yeah, maybe such a trick will do. 23.03.2025 19:30, Melanie Plageman wrote: > On Sun, Mar 23, 2025 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: >> With autovacuum = off, all of these fluctuations go away. > If autovacuum is somehow corrupting the table, then isn't there a bug > that needs to be fixed? Also, are you only able to reproduce this on > master? Yes, I reproduced this on master, but I don't think the table is corrupted in this case, it just contains tuples in other order. Best regards, Alexander Lakhin Neon (https://neon.tech)
Hi, On 2025-03-24 02:26:35 +0900, Fujii Masao wrote: > > With autovacuum = off, all of these fluctuations go away. > > So you are suggesting disabling autovacuum during the postgres_fdw regression test? I don't think that's a good fix in this case. > Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this: > > WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ... +1 On 2025-03-23 16:00:00 +0200, Alexander Lakhin wrote: > Interestingly enough, with "log_autovacuum_min_duration = 0" added to > TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related > to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table > indirectly. > > With autovacuum = off, all of these fluctuations go away. Hm. Is it possible that autovacuum started but didn't finish? FWIW, I observed some oddities around autovacuum=0 recently, but didn't have time to track the source down yet. E.g. running pg_trgm's tests against a server with autovacuum=0 results in test failures due to plan changes, despite no AVs getting launched. Greetings, Andres