Thread: Reducing logs produced by TAP tests running pg_regress on crash

Reducing logs produced by TAP tests running pg_regress on crash

From
Michael Paquier
Date:
Hi all,
(Thomas and Andres in CC.)

Andres has complained a couple of days ago about the quantity of logs
that can be fed into the TAP tests running pg_regress:
https://www.postgresql.org/message-id/20220603195318.qk4voicqfdhlsnoh@alap3.anarazel.de

This concerns the TAP tests of pg_upgrade, as well as
027_stream_regress.pl, where a crash while running the tests would
show a couple of megs worth of regression.diffs.  Most of the output
generated does not make much sense to have in this case, and the
parallel schedule gives a harder time to spot the exact query involved
in the crash (if that's a query) while the logs of the backend should
be enough to spot what's the problem with the PIDs tracked.

One idea I got to limit the useless output generated is to check the
status of the cluster after running the regression test suite as
restart_on_crash is disabled by default in Cluster.pm, and avoid any
follow-up logic in these tests if the cluster is not running anymore,
as of the attached.

Thoughts?
--
Michael

Attachment

Re: Reducing logs produced by TAP tests running pg_regress on crash

From
Michael Paquier
Date:
On Thu, Jun 23, 2022 at 02:30:13PM +0900, Michael Paquier wrote:
> One idea I got to limit the useless output generated is to check the
> status of the cluster after running the regression test suite as
> restart_on_crash is disabled by default in Cluster.pm, and avoid any
> follow-up logic in these tests if the cluster is not running anymore,
> as of the attached.

So, this is still an open item..

Thomas, any objections about this one?  Checking for the status of the
node after completing pg_regress still sounds like a good idea to me,
because as restart_after_crash is disabled we would generate a ton of
logs coming from regression.diffs for nothing.  On top of that the
parallel connections make harder finding which query failed, and the
logs of the backend provide enough context already on a hard crash.
--
Michael

Attachment

Re: Reducing logs produced by TAP tests running pg_regress on crash

From
Thomas Munro
Date:
On Fri, Jul 22, 2022 at 1:09 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Jun 23, 2022 at 02:30:13PM +0900, Michael Paquier wrote:
> > One idea I got to limit the useless output generated is to check the
> > status of the cluster after running the regression test suite as
> > restart_on_crash is disabled by default in Cluster.pm, and avoid any
> > follow-up logic in these tests if the cluster is not running anymore,
> > as of the attached.
>
> So, this is still an open item..
>
> Thomas, any objections about this one?  Checking for the status of the
> node after completing pg_regress still sounds like a good idea to me,
> because as restart_after_crash is disabled we would generate a ton of
> logs coming from regression.diffs for nothing.  On top of that the
> parallel connections make harder finding which query failed, and the
> logs of the backend provide enough context already on a hard crash.

What if the clue we need to understand why it crashed was in the
regression diffs that we didn't dump?

I wonder if we should move the noise suppression check closer to
pg_regress, so that it works also for the "main" pg_regress run, not
only the one in this new TAP test.  As discussed in this thread,
inconclusively:

https://www.postgresql.org/message-id/flat/CA%2BhUKGL7hxqbadkto7e1FCOLQhuHg%3DwVn_PDZd6fDMbQrrZisA%40mail.gmail.com



Re: Reducing logs produced by TAP tests running pg_regress on crash

From
Michael Paquier
Date:
On Fri, Jul 22, 2022 at 01:18:34PM +1200, Thomas Munro wrote:
> I wonder if we should move the noise suppression check closer to
> pg_regress, so that it works also for the "main" pg_regress run, not
> only the one in this new TAP test.  As discussed in this thread,
> inconclusively:

Yes, perhaps.  We could reduce the amount of junk generated in
regression.diffs in a more centralized way with this approach, and
I agree that this should use restart_after_crash=off as well as
something to prevent more tests to run if we are not able to connect,
as you mentioned there.  At least this would reduce the spam down to
tests running in parallel to the session that crashed.

> https://www.postgresql.org/message-id/flat/CA%2BhUKGL7hxqbadkto7e1FCOLQhuHg%3DwVn_PDZd6fDMbQrrZisA%40mail.gmail.com

Ah, I forgot about this recent thread.  Let's just move the discussion
there.  Thanks!

I am planning to remove this open item from the list and mark it as
"won't fix", as this is a much older issue.
--
Michael

Attachment