Thread: SKIP_LOCKED test causes random buildfarm failures
Every once in awhile we get failures like this one: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2019-11-05%2008%3A27%3A27 diff -U3 /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/../pgsql/src/test/regress/expected/vacuum.out /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/src/test/regress/results/vacuum.out --- /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/../pgsql/src/test/regress/expected/vacuum.out 2019-08-11 03:02:18.921535948-0700 +++ /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/src/test/regress/results/vacuum.out 2019-11-05 00:50:42.381244885-0800 @@ -204,6 +204,7 @@ -- SKIP_LOCKED option VACUUM (SKIP_LOCKED) vactst; VACUUM (SKIP_LOCKED, FULL) vactst; +WARNING: skipping vacuum of "vactst" --- lock not available ANALYZE (SKIP_LOCKED) vactst; -- ensure VACUUM and ANALYZE don't have a problem with serializable SET default_transaction_isolation = serializable; No doubt this is a conflict with autovacuum. There are two reasonable ways to remove the test instability: * Crank up client_min_messages to more than WARNING for this test stanza. * Downgrade the "skipping" messages to DEBUG1 or less. I kind of wonder why we are issuing a "WARNING" when the statement does exactly what you asked it to, anyway. At most I'd expect that to be a NOTICE condition. regards, tom lane
Hi, On 2019-11-06 16:54:38 -0500, Tom Lane wrote: > Every once in awhile we get failures like this one: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2019-11-05%2008%3A27%3A27 > > diff -U3 /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/../pgsql/src/test/regress/expected/vacuum.out /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/src/test/regress/results/vacuum.out > --- /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/../pgsql/src/test/regress/expected/vacuum.out 2019-08-1103:02:18.921535948 -0700 > +++ /home/pgsql/build-farm/buildroot-clang/HEAD/pgsql.build/src/test/regress/results/vacuum.out 2019-11-05 00:50:42.381244885-0800 > @@ -204,6 +204,7 @@ > -- SKIP_LOCKED option > VACUUM (SKIP_LOCKED) vactst; > VACUUM (SKIP_LOCKED, FULL) vactst; > +WARNING: skipping vacuum of "vactst" --- lock not available > ANALYZE (SKIP_LOCKED) vactst; > -- ensure VACUUM and ANALYZE don't have a problem with serializable > SET default_transaction_isolation = serializable; > > > No doubt this is a conflict with autovacuum. There are two reasonable > ways to remove the test instability: I assume you consider disabling autovacuum for that table not a reasonable approach? Due to the danger that it could end up still running, e.g. due to anti-wraparound or such? > * Crank up client_min_messages to more than WARNING for this test > stanza. Hm. > * Downgrade the "skipping" messages to DEBUG1 or less. > > I kind of wonder why we are issuing a "WARNING" when the statement > does exactly what you asked it to, anyway. At most I'd expect > that to be a NOTICE condition. I don't know what lead us to doing so, but it doesn't seem reasonable to allow the user to see whether the table has actually been vacuumed. I would assume that one uses SKIP_LOCKED partially to avoid unnecessary impacts in production due to other tasks starting to block on e.g. a VACUUM FULL, even though without the "ordered queueing" everything could just go on working fine. I'm not sure that indicates whether WARNING or NOTICE is the best choice. So I'd be inclined to go with the client_min_messages approach? Greetings, Andres Freund
On Wed, Nov 06, 2019 at 03:01:11PM -0800, Andres Freund wrote: > I don't know what lead us to doing so, but it doesn't seem reasonable to > allow the user to see whether the table has actually been vacuumed. I > would assume that one uses SKIP_LOCKED partially to avoid unnecessary > impacts in production due to other tasks starting to block on e.g. a > VACUUM FULL, even though without the "ordered queueing" everything could > just go on working fine. I'm not sure that indicates whether WARNING or > NOTICE is the best choice. Good question. That's a historical choice, still I have seen cases where those warnings are helpful while not making the logs too verbose to see some congestion in the jobs. > So I'd be inclined to go with the client_min_messages approach? The main purpose of the tests in regress/ is to check after the grammar, so using client_min_messages sounds like a plan. We have a second set of tests in isolation/ where I would actually like to disable autovacuum by default on a subset of tables. Thoughts about the attached? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Wed, Nov 06, 2019 at 03:01:11PM -0800, Andres Freund wrote: >> I don't know what lead us to doing so, but it doesn't seem reasonable to >> allow the user to see whether the table has actually been vacuumed. I >> would assume that one uses SKIP_LOCKED partially to avoid unnecessary >> impacts in production due to other tasks starting to block on e.g. a >> VACUUM FULL, even though without the "ordered queueing" everything could >> just go on working fine. I'm not sure that indicates whether WARNING or >> NOTICE is the best choice. > Good question. That's a historical choice, still I have seen cases > where those warnings are helpful while not making the logs too > verbose to see some congestion in the jobs. I kind of feel that NOTICE is more semantically appropriate, but perhaps there's an argument for keeping it at WARNING. >> So I'd be inclined to go with the client_min_messages approach? > The main purpose of the tests in regress/ is to check after the > grammar, so using client_min_messages sounds like a plan. We have > a second set of tests in isolation/ where I would actually like to > disable autovacuum by default on a subset of tables. Thoughts about > the attached? I do not want to fix this in the main tests by disabling autovacuum, because that'd actually reduce the tests' cross-section. The fact that this happens occasionally is a Good Thing for verifying that the code paths actually work. So it seems that there's a consensus on adjusting client_min_messages to fix the test output instability --- but we need to agree on whether to do s/WARNING/NOTICE/ first, so we can know what to set client_min_messages to. As for the case in the isolation test, shouldn't we also use client_min_messages there, rather than prevent the conflict from arising? Or would that test fail in some larger way if autovacuum gets in the way? regards, tom lane
On Thu, Nov 07, 2019 at 11:50:25AM -0500, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: >> Good question. That's a historical choice, still I have seen cases >> where those warnings are helpful while not making the logs too >> verbose to see some congestion in the jobs. > > I kind of feel that NOTICE is more semantically appropriate, but > perhaps there's an argument for keeping it at WARNING. Perhaps. Well, that's the same level as the one used after the permission checks on the relation vacuumed. > I do not want to fix this in the main tests by disabling autovacuum, > because that'd actually reduce the tests' cross-section. The fact > that this happens occasionally is a Good Thing for verifying that the > code paths actually work. So it seems that there's a consensus on > adjusting client_min_messages to fix the test output instability --- > but we need to agree on whether to do s/WARNING/NOTICE/ first, so we > can know what to set client_min_messages to. Makes sense. > As for the case in the isolation test, shouldn't we also use > client_min_messages there, rather than prevent the conflict > from arising? Or would that test fail in some larger way if > autovacuum gets in the way? I think that there is no risk regarding the stability of the output because we use LOCK before from a first session on the relation to vacuum in a second session. So if autovacuum runs in parallel, the consequence would be a small slow down while waiting on the lock to be taken. And per the way the test is ordered, it seems to me that it makes the most sense to disable autovacuum as it would just get in the way. In this case I think that it is actually better to show the messages as that makes the tests more verbose and we make sure to test their format, even if we could just rely on the fact that VACUUM should just be blocking or non-blocking. -- Michael
Attachment
I wrote: > Michael Paquier <michael@paquier.xyz> writes: >> Good question. That's a historical choice, still I have seen cases >> where those warnings are helpful while not making the logs too >> verbose to see some congestion in the jobs. > I kind of feel that NOTICE is more semantically appropriate, but > perhaps there's an argument for keeping it at WARNING. Well, I'm not hearing any groundswell of support for changing the message level, so let's leave that as-is and just see about stabilizing the tests. >> The main purpose of the tests in regress/ is to check after the >> grammar, so using client_min_messages sounds like a plan. We have >> a second set of tests in isolation/ where I would actually like to >> disable autovacuum by default on a subset of tables. Thoughts about >> the attached? After looking more closely at the isolation test, I agree that adding the "ALTER TABLE ... SET (autovacuum_enabled = false)" bits to it is a good idea. The LOCK operations should make that irrelevant for part1, but there's at least a theoretical hazard for part2. (Actually, is "autovacuum_enabled = false" really sufficient to keep autovacuum away? It'd probably lock the table for long enough to examine its reloptions, so it seems like all we're doing here is reducing the window for trouble a little bit. Still, maybe that's worthwhile.) As for the SKIP_LOCKED tests in vacuum.sql, what I now propose is that we just remove them. I do not see that they're offering any coverage that's not completely redundant with this isolation test. We don't need to spend cycles every day on that. regards, tom lane
Hi, On 2019-11-14 13:37:44 -0500, Tom Lane wrote: > I wrote: > > Michael Paquier <michael@paquier.xyz> writes: > >> Good question. That's a historical choice, still I have seen cases > >> where those warnings are helpful while not making the logs too > >> verbose to see some congestion in the jobs. > > > I kind of feel that NOTICE is more semantically appropriate, but > > perhaps there's an argument for keeping it at WARNING. > > Well, I'm not hearing any groundswell of support for changing the > message level, so let's leave that as-is and just see about > stabilizing the tests. Ok. > >> The main purpose of the tests in regress/ is to check after the > >> grammar, so using client_min_messages sounds like a plan. We have > >> a second set of tests in isolation/ where I would actually like to > >> disable autovacuum by default on a subset of tables. Thoughts about > >> the attached? > > After looking more closely at the isolation test, I agree that adding > the "ALTER TABLE ... SET (autovacuum_enabled = false)" bits to it is > a good idea. The LOCK operations should make that irrelevant for > part1, but there's at least a theoretical hazard for part2. > (Actually, is "autovacuum_enabled = false" really sufficient to > keep autovacuum away? It'd probably lock the table for long enough > to examine its reloptions, so it seems like all we're doing here is > reducing the window for trouble a little bit. Still, maybe that's > worthwhile.) +1 > As for the SKIP_LOCKED tests in vacuum.sql, what I now propose is that > we just remove them. I do not see that they're offering any coverage > that's not completely redundant with this isolation test. We don't > need to spend cycles every day on that. -0 on that, I'd rather just put a autovacuum_enabled = false for them. They're quick enough, and it's nice to have decent coverage of various options within the plain regression tests when possible. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-11-14 13:37:44 -0500, Tom Lane wrote: >> As for the SKIP_LOCKED tests in vacuum.sql, what I now propose is that >> we just remove them. I do not see that they're offering any coverage >> that's not completely redundant with this isolation test. We don't >> need to spend cycles every day on that. > -0 on that, I'd rather just put a autovacuum_enabled = false for > them. They're quick enough, and it's nice to have decent coverage of > various options within the plain regression tests when possible. If we're going to keep them in vacuum.sql, we should use the client_min_messages fix there, as that's a full solution not just reducing the window. But I don't agree that these tests are worth the cycles, given the coverage elsewhere. The probability of breaking this option is just not high enough to justify core-regression-test coverage. regards, tom lane
On Thu, Nov 14, 2019 at 03:20:09PM -0500, Tom Lane wrote: > If we're going to keep them in vacuum.sql, we should use the > client_min_messages fix there, as that's a full solution not just > reducing the window. But I don't agree that these tests are worth > the cycles, given the coverage elsewhere. The probability of breaking > this option is just not high enough to justify core-regression-test > coverage. I would rather keep the solution with client_min_messages, and the tests in vacuum.sql to keep those checks for the grammar parsing. So this basically brings us back to use the patch I proposed here: https://www.postgresql.org/message-id/20191107013942.GA1768@paquier.xyz Any objections? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > I would rather keep the solution with client_min_messages, and the > tests in vacuum.sql to keep those checks for the grammar parsing. So > this basically brings us back to use the patch I proposed here: > https://www.postgresql.org/message-id/20191107013942.GA1768@paquier.xyz OK. regards, tom lane
On Fri, Nov 15, 2019 at 11:22:20AM -0500, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: >> I would rather keep the solution with client_min_messages, and the >> tests in vacuum.sql to keep those checks for the grammar parsing. So >> this basically brings us back to use the patch I proposed here: >> https://www.postgresql.org/message-id/20191107013942.GA1768@paquier.xyz > > OK. Thanks, applied and back-patched. -- Michael