Thread: Minor improvements to test log navigability

Minor improvements to test log navigability

From
Thomas Munro
Date:
Hi,

Speaking as someone who regularly trawls through megabytes of build farm output:

1.  It seems a bit useless to have a load of "FATAL:  the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress.  Any reason not to just turn that off, as we do for
the TAP tests?

2.  The TAP test logs are strangely named.  Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?

Attachment

Re: Minor improvements to test log navigability

From
Alvaro Herrera
Date:
On 2022-May-16, Thomas Munro wrote:

> 1.  It seems a bit useless to have a load of "FATAL:  the database
> system is in recovery mode" spam whenever the server crashes under
> src/test/regress.  Any reason not to just turn that off, as we do for
> the TAP tests?

I don't know of any.  Let's.

> 2.  The TAP test logs are strangely named.  Any reason not to call
> them 001_testname.log, instead of regress_log_001_testname, so they
> appear next to the corresponding
> 001_testname_{primary,standby,xxx}.log in directory listings (CI) and
> dumps (build farm, presumably), and have a traditional .log suffix?

+1.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)



Re: Minor improvements to test log navigability

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2022-May-16, Thomas Munro wrote:
>> 1.  It seems a bit useless to have a load of "FATAL:  the database
>> system is in recovery mode" spam whenever the server crashes under
>> src/test/regress.  Any reason not to just turn that off, as we do for
>> the TAP tests?

> I don't know of any.  Let's.

Have you actually tested what happens?  I fear this would just
result in different spam.

>> 2.  The TAP test logs are strangely named.  Any reason not to call
>> them 001_testname.log, instead of regress_log_001_testname, so they
>> appear next to the corresponding
>> 001_testname_{primary,standby,xxx}.log in directory listings (CI) and
>> dumps (build farm, presumably), and have a traditional .log suffix?

> +1.

Andrew would have to weigh in on whether this'd break the buildfarm,
but if it doesn't, +1.

            regards, tom lane



Re: Minor improvements to test log navigability

From
Michael Paquier
Date:
On Mon, May 16, 2022 at 12:18:24PM -0400, Tom Lane wrote:
> Andrew would have to weigh in on whether this'd break the buildfarm,
> but if it doesn't, +1.

FWIW, the buildfarm client feeds from the contents of the directory
tmp_check/log/, so a simple renaming of the main log file would have
no impact.  Here are the parts of the code doing that:
PGBuild/Modules/TestMyTap.pm:   my @logs = glob("$self->{where}/tmp_check/log/*");
run_build.pl:   my @logs = glob("$dir/tmp_check/log/*");
--
Michael

Attachment

Re: Minor improvements to test log navigability

From
Thomas Munro
Date:
On Tue, May 17, 2022 at 4:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > On 2022-May-16, Thomas Munro wrote:
> >> 1.  It seems a bit useless to have a load of "FATAL:  the database
> >> system is in recovery mode" spam whenever the server crashes under
> >> src/test/regress.  Any reason not to just turn that off, as we do for
> >> the TAP tests?
>
> > I don't know of any.  Let's.
>
> Have you actually tested what happens?  I fear this would just
> result in different spam.

I'd forgotten that we already do this on CI, via
src/tools/ci/pg_ci_base.conf, so we can compare.  A CI postmaster.log
that ends with "shutting down because restart_after_crash is off":

https://api.cirrus-ci.com/v1/artifact/task/6537277877780480/log/src/test/regress/log/postmaster.log

The build farm version has ~350 lines of "FATAL:  the database system
is in recovery mode" instead:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=seawasp&dt=2022-05-16%2023%3A17%3A49

Admittedly that is nothing compared to the huge amount of extra log
spam caused by regression.diffs filling up with these:

- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-ZqXocK/.s.PGSQL.5678" failed: FATAL:  the database
system is in recovery mode

In the CI version, that looks like:

- ...the complete expected output of each test spanning many lines...
- ...
- ...
+psql: error: connection to server on socket
"/tmp/pg_regress-T35Yzi/.s.PGSQL.51696" failed: No such file or
directory
+    Is the server running locally and accepting connections on that socket?

I wonder if there would be a good way to filter those "never managed
to connect" cases out...  Exit code 2 (EXIT_BADCONN) is not the
answer, because you get that also for servers that go away due to a
crash where you do want to be able to see the diff, for information
about where it crashed.



Re: Minor improvements to test log navigability

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> ... Admittedly that is nothing compared to the huge amount of extra log
> spam caused by regression.diffs filling up with these:

Yeah, that's really the main problem.

> I wonder if there would be a good way to filter those "never managed
> to connect" cases out...  Exit code 2 (EXIT_BADCONN) is not the
> answer, because you get that also for servers that go away due to a
> crash where you do want to be able to see the diff, for information
> about where it crashed.

Maybe pg_regress could check that postmaster.pid is still there
before launching each new test script?  (Obviously this all applies
only to "make check" not "make installcheck".)

            regards, tom lane



Re: Minor improvements to test log navigability

From
Peter Eisentraut
Date:
On 16.05.22 01:01, Thomas Munro wrote:
> 2.  The TAP test logs are strangely named.  Any reason not to call
> them 001_testname.log, instead of regress_log_001_testname, so they
> appear next to the corresponding
> 001_testname_{primary,standby,xxx}.log in directory listings (CI) and
> dumps (build farm, presumably), and have a traditional .log suffix?

I'm in favor of a saner name, but wouldn't something.log be confusable 
with a server log?  Maybe something.out would be clearer.  Or 
something_output.log, if the .log suffix is somehow desirable.