Re: Improve error reporting in 027_stream_regress test - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Re: Improve error reporting in 027_stream_regress test
Date
Msg-id CAN55FZ31NqFqqE7yYOVSjvvG57Gio4VFUNh3mj3G3MiTLuYqYA@mail.gmail.com
Whole thread Raw
In response to Re: Improve error reporting in 027_stream_regress test  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Thu, 17 Jul 2025 at 03:08, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 16, 2025 at 02:32:53PM +0300, Nazir Bilal Yavuz wrote:
> > On Wed, 16 Jul 2025 at 04:39, Michael Paquier <michael@paquier.xyz> wrote:
> >> Hmm.  Is that actually useful as we know that the standby has been
> >> stalen down when running the test?  Even if we report something, we
> >> could always trim the output, like the standby case.
>
> My fingers and brain have flipped here.  I meant likely
> s/stalen/taken/.
>
> > I guess you are right. Also, I tried killing standby while the
> > 027_stream_regress test was running and the test did not fail; instead
> > it continued until timeout. If that is always the case, then it makes
> > sense to report if and only if both primary and standby are alive.
>
> Okay.  Understood.  I'm pretty sure that somebody around you has also
> run the test and crashed the standby on replay, to note that the
> pattern on HEAD was bad.

Updated, now error reporting happens only if regression tests are
failed and both primary and standby are alive.

> > system_log() logs which command is run. I actually do not want to log
> > the command because it needs to be run after the regression tests and
> > before the 'regression tests pass'. If I log the command it looks like
> > this:
> >
> > # All 228 tests passed.
> > # Running: pg_isready -h /tmp/EXCcSldGZE -p 10493 >/dev/null 2>&1
> > # Running: pg_isready -h /tmp/EXCcSldGZE -p 10494 >/dev/null 2>&1
> > (21.939s) ok 2 - regression tests pass
> > (0.000s) ok 3 - primary is alive after the regression tests
> > (0.000s) ok 4 - standby is alive after the regression tests
> >
> > I think it looks like these pg_isready commands run randomly.
>
> FWIW, I'm usually in favor of a bit more logging, to understand what
> happens in the test script under-the-hood.  That's less guessing when
> calling these routines.  If I'm outvoted, that's fine.

I don't have a strong opinion on this. My point was 'pg_isready' being
on top of 'regression test  pass' and not being on top of
'primary|standby is alive after the regression tests' gives a false
impression to me. However, if you say this is okay; then I can update
it.

> >> We could use that as a routine of its own in Utils.pm?  It's not
> >> really something only for diff files, more something that we can use
> >> to trim the contents of a file, which could be the tail of a server
> >> log file as well..  It's always difficult to measure what a "good"
> >> number of lines is, but perhaps we could use an environment variable
> >> to make that configurable rather than enforce a policy of 50 lines
> >> because we think it's good enough?
> >
> > I think this is a good idea. How about a function called file_trim()
> > with three arguments: the file name, a mode ('head' or 'tail'), and
> > the number of lines to trim from that end. This approach may require
> > reading the file more than once, but it is easy to use.
>
> Sounds fine to me.  Thanks!

I added that as 0001. I used a shifting method for the 'tail'
direction to not use too much memory. I found that there is
'File::ReadBackwards' in Perl but you need to install it, so I didn't
use it.

Now patch is divided into three:

0001: 'Add trim_file() helper to Utils.pm' -> Which effectively does
nothing, just adds a function to be used for a subsequent patch. This
function accepts 'line_count' as an argument but
'PG_TEST_FILE_TRIM_LINES' environment variable overrides it. Should I
document 'PG_TEST_FILE_TRIM_LINES' somewhere?

0002: 'Improve error reporting in 027_stream_regress test' -> Uses
trim_file() function to improve error reporting by including the head
and tail of regression.diffs directly in the failure message.

0003: 'Check primary and standby are alive after regression tests in
027_stream_regress' -> Add test for checking status of primary and
standby after the regression tests in 027_stream_regress. Also, it
makes error reporting happen only if regression tests are failed and
both primary and standby are alive.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Logical Replication of sequences
Next
From: shveta malik
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart