Thread: Reducing logs produced by TAP tests running pg_regress on crash
Hi all, (Thomas and Andres in CC.)

Andres complained a couple of days ago about the quantity of logs that the TAP tests running pg_regress can produce:
https://www.postgresql.org/message-id/20220603195318.qk4voicqfdhlsnoh@alap3.anarazel.de

This concerns the TAP tests of pg_upgrade, as well as 027_stream_regress.pl, where a crash while running the tests would produce a couple of megs worth of regression.diffs. Most of the output generated makes little sense to keep in this case, and the parallel schedule makes it harder to spot the exact query involved in the crash (if a query is involved at all), while the backend logs, with the PIDs they track, should be enough to identify the problem.

One idea I had to limit the useless output is to check the status of the cluster after running the regression test suite, as restart_after_crash is disabled by default in Cluster.pm, and to skip any follow-up logic in these tests if the cluster is no longer running, as in the attached.

Thoughts?
--
Michael
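[For illustration only, a minimal sketch of what such a post-pg_regress check could look like, assuming a node handle named $node_primary as in 027_stream_regress.pl; the pg_ctl probe and the test messages below are assumptions for the sake of the example, not the contents of the attached patch.]

    use PostgreSQL::Test::Utils;
    use Test::More;

    # Probe the node right after the pg_regress run.  restart_after_crash
    # is off by default in Cluster.pm, so a crashed backend leaves the
    # postmaster stopped and "pg_ctl status" returns non-zero.
    my $rc = system_log('pg_ctl', '-D', $node_primary->data_dir, 'status');
    if ($rc != 0)
    {
        # Skip the follow-up checks: dumping regression.diffs at this
        # point would only add noise on top of the backend logs.
        fail('postmaster aborted while running the regression tests');
        done_testing();
        exit;
    }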
On Thu, Jun 23, 2022 at 02:30:13PM +0900, Michael Paquier wrote:
> One idea I had to limit the useless output is to check the status of
> the cluster after running the regression test suite, as
> restart_after_crash is disabled by default in Cluster.pm, and to skip
> any follow-up logic in these tests if the cluster is no longer
> running, as in the attached.

So, this is still an open item.

Thomas, any objections to this one? Checking the status of the node after completing pg_regress still sounds like a good idea to me, because with restart_after_crash disabled we would generate a ton of logs coming from regression.diffs for nothing. On top of that, the parallel connections make it harder to find which query failed, and the logs of the backend already provide enough context on a hard crash.
--
Michael
On Fri, Jul 22, 2022 at 1:09 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Jun 23, 2022 at 02:30:13PM +0900, Michael Paquier wrote:
> > One idea I had to limit the useless output is to check the status of
> > the cluster after running the regression test suite, as
> > restart_after_crash is disabled by default in Cluster.pm, and to skip
> > any follow-up logic in these tests if the cluster is no longer
> > running, as in the attached.
>
> So, this is still an open item.
>
> Thomas, any objections to this one? Checking the status of the node
> after completing pg_regress still sounds like a good idea to me,
> because with restart_after_crash disabled we would generate a ton of
> logs coming from regression.diffs for nothing. On top of that, the
> parallel connections make it harder to find which query failed, and
> the logs of the backend already provide enough context on a hard crash.

What if the clue we need to understand why it crashed was in the regression diffs that we didn't dump?

I wonder if we should move the noise suppression check closer to pg_regress, so that it also works for the "main" pg_regress run, not only the one in this new TAP test. As discussed, inconclusively, in this thread:
https://www.postgresql.org/message-id/flat/CA%2BhUKGL7hxqbadkto7e1FCOLQhuHg%3DwVn_PDZd6fDMbQrrZisA%40mail.gmail.com
On Fri, Jul 22, 2022 at 01:18:34PM +1200, Thomas Munro wrote:
> I wonder if we should move the noise suppression check closer to
> pg_regress, so that it also works for the "main" pg_regress run, not
> only the one in this new TAP test. As discussed, inconclusively, in
> this thread:

Yes, perhaps. We could reduce the amount of junk generated in regression.diffs in a more centralized way with this approach, and I agree that it should use restart_after_crash=off as well as something to prevent more tests from running if we are not able to connect, as you mentioned there. At least this would reduce the spam down to the tests running in parallel with the session that crashed.

> https://www.postgresql.org/message-id/flat/CA%2BhUKGL7hxqbadkto7e1FCOLQhuHg%3DwVn_PDZd6fDMbQrrZisA%40mail.gmail.com

Ah, I forgot about this recent thread. Let's just move the discussion there. Thanks!

I am planning to remove this open item from the list and mark it as "won't fix", as this is a much older issue.
--
Michael