Thread: [MASSMAIL]IPC::Run::time[r|out] vs our TAP tests

[MASSMAIL]IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

04 April 2024, 21:24:05

TIL that IPC::Run::timer is not the same as IPC::Run::timeout.
With a timer object you have to check $timer->is_expired to see
if the timeout has elapsed, but with a timeout object you don't
because it will throw a Perl exception upon timing out, probably
killing your test program.

It appears that a good chunk of our TAP codebase has not read this
memo, because I see plenty of places that are checking is_expired
in the naive belief that they'll still have control after a timeout
has fired.

The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1].  Apparently it's not seeing a match to what it expects
so it eventually times out, but all we get in the log is

[03:03:42.595](0.002s) ok 82 - complete an interpolated psql variable name
[03:03:42.597](0.002s) ok 83 - \\r works
IPC::Run: timeout on timer #1 at /usr/share/perl5/IPC/Run.pm line 2944.
# Postmaster PID for node "main" is 17308
### Stopping node "main" using mode immediate

We would have learned something useful if control had returned to
pump_until, or even better 010_tab_completion.pl's check_completion,
but it doesn't.

A minimum fix that seems to make this work better is as attached,
but I feel like somebody ought to examine all the IPC::Run::timer
and IPC::Run::timeout calls and see which ones are mistaken.
It's a little scary to convert a timeout to a timer because of
the hazard that someplace that would need to be checking for
is_expired isn't.

Also, the debug printout code at the bottom of check_completion
is quite useless, because control can never reach it since
BackgroundPsql::query_until will "die" on failure.  I think that
that code worked when written, and I'm suspicious that 664d75753
broke it, but I've not dug deeply into the history.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2024-04-04%2016%3A56%3A14

diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 4091c311b8..72bb21e160 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -96,7 +96,7 @@ sub new
       "Forbidden caller of constructor: package: $package, file: $file:$line"
       unless $package->isa('PostgreSQL::Test::Cluster');

-    $psql->{timeout} = IPC::Run::timeout(
+    $psql->{timeout} = IPC::Run::timer(
         defined($timeout)
         ? $timeout
         : $PostgreSQL::Test::Utils::timeout_default);
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 42d5a50dc8..8fd0426d30 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -429,6 +429,7 @@ Pump until string is matched on the specified stream, or timeout occurs.
 sub pump_until
 {
     my ($proc, $timeout, $stream, $until) = @_;
+    die "timeout should be a timer" if defined($timeout->exception);
     $proc->pump_nb();
     while (1)
     {

Re: IPC::Run::time[r|out] vs our TAP tests

From

Daniel Gustafsson

Date:

04 April 2024, 21:46:22

> On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> A minimum fix that seems to make this work better is as attached,
> but I feel like somebody ought to examine all the IPC::Run::timer
> and IPC::Run::timeout calls and see which ones are mistaken.
> It's a little scary to convert a timeout to a timer because of
> the hazard that someplace that would need to be checking for
> is_expired isn't.

Skimming this and a few callsites it seems reasonable to use a timer instead of
a timeout, but careful study is needed to make sure we're not introducing
anything subtly wrong in the other direction.

> Also, the debug printout code at the bottom of check_completion
> is quite useless, because control can never reach it since
> BackgroundPsql::query_until will "die" on failure.  I think that
> that code worked when written, and I'm suspicious that 664d75753
> broke it, but I've not dug deeply into the history.

AFAICT, in the previous coding the interactive_psql object would use a timer or
timeout based on what the caller provided, and check_completion used a timer so
the debug logging probably worked as written.

--
Daniel Gustafsson

Re: IPC::Run::time[r|out] vs our TAP tests

From

Michael Paquier

Date:

04 April 2024, 23:05:41

On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote:
> The particular thing that started me down this road was wondering
> why we are getting no useful failure details from buildfarm member
> tanager's struggles with the tab-completion test case added by commit
> 927332b95 [1].

Also please note that tanager has been offline from around the 19th of
March to the 3rd of April, explaining the delay in reporting the
failure in this new psql test.  I've switched it back online two days
ago.

Tom, would you like me to test your patch directly on the host?  That
should be pretty quick, even if I've not yet checked if the failure is
reproducible with a manual build, outside the buildfarm scripts.
--
Michael

Attachment

signature.asc

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

04 April 2024, 23:09:53

Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote:
>> The particular thing that started me down this road was wondering
>> why we are getting no useful failure details from buildfarm member
>> tanager's struggles with the tab-completion test case added by commit
>> 927332b95 [1].

> Tom, would you like me to test your patch directly on the host?  That
> should be pretty quick, even if I've not yet checked if the failure is
> reproducible with a manual build, outside the buildfarm scripts.

If you have time, that'd be great.  What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
But we need to see what it's actually emitting before there's
much hope of adjusting the expected-output regex.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Michael Paquier

Date:

05 April 2024, 02:12:18

On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote:
> If you have time, that'd be great.  What I suspect is that that
> machine's readline isn't regurgitating the string verbatim but is
> doing something fancy with backspaces or other control characters.
> But we need to see what it's actually emitting before there's
> much hope of adjusting the expected-output regex.

I have been able to reproduce the failure manually and your patch is
providing more information, indeed, as of:
[10:21:44.017](0.002s) ok 83 - \r works
[10:24:45.462](181.445s) # pump_until: timeout expired when searching
for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=#
\echo :\{\?VERBOSITY\} "
psql query timed out at
/home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 281.

This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}

Attaching the log file, for reference.  Now I can see that this uses
libedit at 3.1-20181209, which is far from recent.  I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right.  Thoughts?
--
Michael

Attachment

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

05 April 2024, 02:31:24

Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote:
>> If you have time, that'd be great.  What I suspect is that that
>> machine's readline isn't regurgitating the string verbatim but is
>> doing something fancy with backspaces or other control characters.

> I have been able to reproduce the failure manually and your patch is
> providing more information, indeed, as of:
> [10:21:44.017](0.002s) ok 83 - \r works
> [10:24:45.462](181.445s) # pump_until: timeout expired when searching
> for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=#
> \echo :\{\?VERBOSITY\} "
> psql query timed out at
> /home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> line 281.

> This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
> completes to something incorrect, as of:
> postgres=# \echo :\{\?VERBOSITY\}

Just to be clear: you see the extra backslashes if you try this
tab-completion manually?

> Attaching the log file, for reference.  Now I can see that this uses
> libedit at 3.1-20181209, which is far from recent.  I'd be OK to just
> remove libedit from the build to remove this noise, still I am
> wondering if 927332b95e77 got what it was trying to achieve actually
> right.  Thoughts?

It kind of looks like a libedit bug, but maybe we should dig more
deeply.  I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior.  But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Michael Paquier

Date:

05 April 2024, 02:37:56

On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
>> completes to something incorrect, as of:
>> postgres=# \echo :\{\?VERBOSITY\}
>
> Just to be clear: you see the extra backslashes if you try this
> tab-completion manually?

Yeah, I do, after completing "\echo :{?VERB" with this version of
libedit.  I see that this completes with backslashes added before '{',
'}' and '?'.  The test is telling the same.

>> Attaching the log file, for reference.  Now I can see that this uses
>> libedit at 3.1-20181209, which is far from recent.  I'd be OK to just
>> remove libedit from the build to remove this noise, still I am
>> wondering if 927332b95e77 got what it was trying to achieve actually
>> right.  Thoughts?
>
> It kind of looks like a libedit bug, but maybe we should dig more
> deeply.  I felt itchy about 927332b95e77 removing '{' from the
> WORD_BREAKS set, and wondered exactly how that would change readline's
> behavior.  But even if that somehow accounts for the extra backslash
> before '{', it's not clear how it could lead to '?' and '}' also
> getting backslashed.

I don't have a clear idea, either.  I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.
--
Michael

Attachment

signature.asc

Re: IPC::Run::time[r|out] vs our TAP tests

From

Erik Wienhold

Date:

05 April 2024, 02:56:40

On 2024-04-05 04:37 +0200, Michael Paquier wrote:
> On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> >> This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
> >> completes to something incorrect, as of:
> >> postgres=# \echo :\{\?VERBOSITY\}
> > 
> > Just to be clear: you see the extra backslashes if you try this
> > tab-completion manually?
> 
> Yeah, I do, after completing "\echo :{?VERB" with this version of
> libedit.  I see that this completes with backslashes added before '{',
> '}' and '?'.  The test is telling the same.
> 
> >> Attaching the log file, for reference.  Now I can see that this uses
> >> libedit at 3.1-20181209, which is far from recent.  I'd be OK to just
> >> remove libedit from the build to remove this noise, still I am
> >> wondering if 927332b95e77 got what it was trying to achieve actually
> >> right.  Thoughts?
> > 
> > It kind of looks like a libedit bug, but maybe we should dig more
> > deeply.  I felt itchy about 927332b95e77 removing '{' from the
> > WORD_BREAKS set, and wondered exactly how that would change readline's
> > behavior.  But even if that somehow accounts for the extra backslash
> > before '{', it's not clear how it could lead to '?' and '}' also
> > getting backslashed.
> 
> I don't have a clear idea, either.  I also feel uneasy about
> 927332b95e77 and its change of WORD_BREAKS, but this has the smell
> of a bug from an outdated libedit version.

It works with the latest libedit 20230828-3.1.  Have to check the NetBSD
source to find out what changed since 20181209-3.1.

https://github.com/NetBSD/src/tree/trunk/lib/libedit

-- 
Erik

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

05 April 2024, 03:10:50

Erik Wienhold <ewie@ewie.name> writes:
> It works with the latest libedit 20230828-3.1.  Have to check the NetBSD
> source to find out what changed since 20181209-3.1.

Yeah, the test is passing on mamba which is running the (just
officially released) NetBSD 10.0.  I'm not sure whether 10.0
has the "latest" libedit or something a little further back.
sidewinder, with NetBSD 9.3, is happy as well.  But 20181209
presumably belongs to NetBSD 8.x, which is theoretically still
in support, so maybe it's worth poking into.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Erik Wienhold

Date:

05 April 2024, 03:34:15

On 2024-04-05 05:10 +0200, Tom Lane wrote:
> Erik Wienhold <ewie@ewie.name> writes:
> > It works with the latest libedit 20230828-3.1.  Have to check the NetBSD
> > source to find out what changed since 20181209-3.1.
> 
> Yeah, the test is passing on mamba which is running the (just
> officially released) NetBSD 10.0.  I'm not sure whether 10.0
> has the "latest" libedit or something a little further back.
> sidewinder, with NetBSD 9.3, is happy as well.  But 20181209
> presumably belongs to NetBSD 8.x, which is theoretically still
> in support, so maybe it's worth poking into.

Having a look right now.  Change [1] looks like a good candidate which
is likely in 20221030-3.1.

I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so.  This didn't
work:

    LDFLAGS='-L/usr/local/lib' ./configure --with-libedit-preferred

(My ld fu is not so great.)

[1] https://github.com/NetBSD/src/commit/12863d4d7917df8a7ef5ad9dab6bb719018a22d1

-- 
Erik

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

05 April 2024, 03:37:48

Erik Wienhold <ewie@ewie.name> writes:
> I'm trying to build Postgres with that older libedit version but can't
> figure out what options to pass to ./configure so that it picks
> /usr/local/lib/libedit.so instead of /usr/lib/libedit.so.  This didn't
> work:

You probably want configure --with-libs=/usr/local/lib,
and likely also --with-includes=/usr/local/include.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Erik Wienhold

Date:

05 April 2024, 04:10:14

On 2024-04-05 05:37 +0200, Tom Lane wrote:
> Erik Wienhold <ewie@ewie.name> writes:
> > I'm trying to build Postgres with that older libedit version but can't
> > figure out what options to pass to ./configure so that it picks
> > /usr/local/lib/libedit.so instead of /usr/lib/libedit.so.  This didn't
> > work:
> 
> You probably want configure --with-libs=/usr/local/lib,
> and likely also --with-includes=/usr/local/include.

Thanks Tom.  But I also have to run psql with:

    LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib src/bin/psql/psql

Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected.  The previous release 20190324-3.1 still produces the escaped
output that Michael found.  That narrows down the changes to everything
between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
[2] (both inclusive).

[1] https://github.com/NetBSD/src/commit/e09538bda2f805200d0f7ae09fb9b7f2f5ed75f2
[2] https://github.com/NetBSD/src/commit/de11d876419df3570c2418468613aebcebafe6ae

-- 
Erik

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

05 April 2024, 04:15:43

Erik Wienhold <ewie@ewie.name> writes:
> Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
> expected.  The previous release 20190324-3.1 still produces the escaped
> output that Michael found.  That narrows down the changes to everything
> between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
> [2] (both inclusive).

Hm.  I just installed NetBSD 8.2 in a VM, and it passes this test:

# +++ tap install-check in src/bin/psql +++
t/001_basic.pl ........... ok     
t/010_tab_completion.pl .. ok    
t/020_cancel.pl .......... ok   
All tests successful.

So it seems like the bug does not exist in any currently-supported
NetBSD release.  Debian has been known to ship obsolete libedit
versions, though.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Dagfinn Ilmari Mannsåker

Date:

05 April 2024, 12:57:04

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Erik Wienhold <ewie@ewie.name> writes:
>> Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
>> expected.  The previous release 20190324-3.1 still produces the escaped
>> output that Michael found.  That narrows down the changes to everything
>> between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
>> [2] (both inclusive).
>
> Hm.  I just installed NetBSD 8.2 in a VM, and it passes this test:
>
> # +++ tap install-check in src/bin/psql +++
> t/001_basic.pl ........... ok     
> t/010_tab_completion.pl .. ok    
> t/020_cancel.pl .......... ok   
> All tests successful.
>
> So it seems like the bug does not exist in any currently-supported
> NetBSD release.  Debian has been known to ship obsolete libedit
> versions, though.

Both the current (bokworm/12) and previous (bullseye/11) versions of
Debian have new enough libedits to not be affected by this bug:

libedit    | 3.1-20181209-1 | oldoldstable   | source
libedit    | 3.1-20191231-2 | oldstable      | source
libedit    | 3.1-20221030-2 | stable         | source
libedit    | 3.1-20230828-1 | testing        | source
libedit    | 3.1-20230828-1 | unstable       | source
libedit    | 3.1-20230828-1 | unstable-debug | source

But in bullseye they decided that OpenSSL is a system library as far as
the GPL is concerned, so are linking directly to readline.

And even before then their psql wrapper would LD_PRELOAD readline
instead if installed, so approximately nobody actually ever used psql
with libedit on Debian.

>             regards, tom lane

- ilmari

Re: IPC::Run::time[r|out] vs our TAP tests

From

Andrew Dunstan

Date:

05 April 2024, 14:14:06

On 2024-04-04 Th 17:24, Tom Lane wrote:

TIL that IPC::Run::timer is not the same as IPC::Run::timeout.
With a timer object you have to check $timer->is_expired to see
if the timeout has elapsed, but with a timeout object you don't
because it will throw a Perl exception upon timing out, probably
killing your test program.

It appears that a good chunk of our TAP codebase has not read this
memo, because I see plenty of places that are checking is_expired
in the naive belief that they'll still have control after a timeout
has fired.

I started having a look at these.

Here are the cases I found:

./src/bin/psql/t/010_tab_completion.pl:    my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm:    until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm:    die "psql startup timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm:    die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm:    die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm:            # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm:            unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm:timeout is the IPC::Run::Timeout object whose is_expired method can be tested
./src/test/perl/PostgreSQL/Test/Cluster.pm:            # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm:            unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Utils.pm:        if ($timeout->is_expired)
./src/test/recovery/t/021_row_visibility.pl:        if ($psql_timeout->is_expired)
./src/test/recovery/t/032_relfilenode_reuse.pl:        if ($psql_timeout->is_expired)

Those in Cluster.pm look correct - they are doing the run() in an eval block and testing for the is_expired setting in an exception block. The other cases look more suspect. I'll take a closer look.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

05 April 2024, 21:18:51

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> So it seems like the bug does not exist in any currently-supported
>> NetBSD release.  Debian has been known to ship obsolete libedit
>> versions, though.

> Both the current (bokworm/12) and previous (bullseye/11) versions of
> Debian have new enough libedits to not be affected by this bug:
> ...
> But in bullseye they decided that OpenSSL is a system library as far as
> the GPL is concerned, so are linking directly to readline.

> And even before then their psql wrapper would LD_PRELOAD readline
> instead if installed, so approximately nobody actually ever used psql
> with libedit on Debian.

Based on this info, I'm disinclined to put work into trying to
make the case behave correctly with that old libedit version, or
even to lobotomize the test case enough so it would pass.

What I suggest Michael do with tanager is install the
OS-version-appropriate version of GNU readline, so that the animal
will test what ilmari describes as the actually common use-case.

(I see that what he did for the moment is add --without-readline.
Perhaps that's a decent long-term choice too, because I think we
have rather little coverage of that option except on Windows.)

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Michael Paquier

Date:

06 April 2024, 00:16:25

On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote:
> What I suggest Michael do with tanager is install the
> OS-version-appropriate version of GNU readline, so that the animal
> will test what ilmari describes as the actually common use-case.

Thanks for the investigations!  It's clear that this version of
libedit is borked.

> (I see that what he did for the moment is add --without-readline.
> Perhaps that's a decent long-term choice too, because I think we
> have rather little coverage of that option except on Windows.)

Yeah, my motivation did not go beyond that so you can blame me for
being lazy.  ;)

Note also the --without-icu on this host that has little coverage, as
well.
--
Michael

Attachment

signature.asc

Re: IPC::Run::time[r|out] vs our TAP tests

From

Michael Paquier

Date:

09 April 2024, 06:13:21

On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote:
> Based on this info, I'm disinclined to put work into trying to
> make the case behave correctly with that old libedit version, or
> even to lobotomize the test case enough so it would pass.

By the way, are you planning to do something like [1]?  I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.

[1]: https://www.postgresql.org/message-id/1100715.1712265845@sss.pgh.pa.us
--
Michael

Attachment

signature.asc

Re: IPC::Run::time[r|out] vs our TAP tests

From

Tom Lane

Date:

09 April 2024, 13:46:10

Michael Paquier <michael@paquier.xyz> writes:
> By the way, are you planning to do something like [1]?  I've not
> looked in details at the callers of IPC::Run::timeout, still the extra
> debug output would be nice.

It needs more review I think - I didn't check every call site to see
if anything would be broken.  I believe Andrew has undertaken a
survey of all the timeout/timer calls, but if he doesn't produce
anything I might have a go at it after awhile.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Andrew Dunstan

Date:

09 April 2024, 14:27:10

On 2024-04-09 Tu 09:46, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> By the way, are you planning to do something like [1]?  I've not
>> looked in details at the callers of IPC::Run::timeout, still the extra
>> debug output would be nice.
> It needs more review I think - I didn't check every call site to see
> if anything would be broken.  I believe Andrew has undertaken a
> survey of all the timeout/timer calls, but if he doesn't produce
> anything I might have a go at it after awhile.
>
>             

What I looked at so far was the use of is_expired, but when you look 
into that you see that you need to delve further, to where timeout/timer 
objects are created and passed around. I'll take a closer look when I 
have done some incremental json housekeeping.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

[MASSMAIL]removal of '{' from WORD_BREAKS

From

Robert Haas

Date:

09 April 2024, 15:17:55

On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier <michael@paquier.xyz> wrote:
> > It kind of looks like a libedit bug, but maybe we should dig more
> > deeply.  I felt itchy about 927332b95e77 removing '{' from the
> > WORD_BREAKS set, and wondered exactly how that would change readline's
> > behavior.  But even if that somehow accounts for the extra backslash
> > before '{', it's not clear how it could lead to '?' and '}' also
> > getting backslashed.
>
> I don't have a clear idea, either.  I also feel uneasy about
> 927332b95e77 and its change of WORD_BREAKS, but this has the smell
> of a bug from an outdated libedit version.

I too felt uneasy about that commit, for the same reason. However,
there is a justification for the change in the commit message which is
not obviously wrong, namely that ":{?name} is the only psql syntax
using the '{' sign". And in fact, SQL basically doesn't use '{' for
anything, either. We do see { showing up inside of quoted strings, for
arrays or JSON, but I would guess that the word-break characters
aren't going to affect behavior within a quoted string. So it still
seems like it should be OK? Another thing that makes me think that my
unease may be unfounded is that the matching character '}' isn't in
WORD_BREAKS either, and I would have thought that if we needed one
we'd need both.

But does anyone else have a more specific reason for thinking that
this might be a problem?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: IPC::Run::time[r|out] vs our TAP tests

From

Daniel Gustafsson

Date:

09 April 2024, 17:10:07

> On 4 Apr 2024, at 23:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> A minimum fix that seems to make this work better is as attached,
>> but I feel like somebody ought to examine all the IPC::Run::timer
>> and IPC::Run::timeout calls and see which ones are mistaken.
>> It's a little scary to convert a timeout to a timer because of
>> the hazard that someplace that would need to be checking for
>> is_expired isn't.
>
> Skimming this and a few callsites it seems reasonable to use a timer instead of
> a timeout, but careful study is needed to make sure we're not introducing
> anything subtly wrong in the other direction.

Sharing a few preliminary results from looking at this, the attached passes
check-world but need more study/testing.

It seems wrong to me that we die() in query and query_until rather than giving
the caller the power to decide how to proceed.  We have even documented that we
do just this:

    "Dies on failure to invoke psql, or if psql fails to connect.  Errors
    occurring later are the caller's problem"

Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it).  Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.

A few other comments on related things that stood out while reviewing:

The tab completion test can use the API call for automatically restart the
timer to reduce the complexity of check_completion a hair.  Done in 0001 (but
really not necessary).

Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:

    $back_q->query_until(
        qr/logical_slot_get_changes/, q(
       \echo logical_slot_get_changes
       SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
    ));

    ... <other tests> ...

    # Since there are no slots in standby_slot_names, the function
    # pg_logical_slot_get_changes should now return, and the session can be
    # stopped.
    $back_q->quit;

There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point.  This might still work as intended, but the comment is
slightly misleading IMO.

recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating.  Calling ->quit is
mandated by the API, in turn required by IPC::Run.  Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl.  0003 fixes
this.

--
Daniel Gustafsson

Attachment

Re: removal of '{' from WORD_BREAKS

From

Alexander Korotkov

Date:

09 April 2024, 18:32:16

On Tue, Apr 9, 2024 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier <michael@paquier.xyz> wrote:
> > > It kind of looks like a libedit bug, but maybe we should dig more
> > > deeply.  I felt itchy about 927332b95e77 removing '{' from the
> > > WORD_BREAKS set, and wondered exactly how that would change readline's
> > > behavior.  But even if that somehow accounts for the extra backslash
> > > before '{', it's not clear how it could lead to '?' and '}' also
> > > getting backslashed.
> >
> > I don't have a clear idea, either.  I also feel uneasy about
> > 927332b95e77 and its change of WORD_BREAKS, but this has the smell
> > of a bug from an outdated libedit version.
>
> I too felt uneasy about that commit, for the same reason. However,
> there is a justification for the change in the commit message which is
> not obviously wrong, namely that ":{?name} is the only psql syntax
> using the '{' sign". And in fact, SQL basically doesn't use '{' for
> anything, either. We do see { showing up inside of quoted strings, for
> arrays or JSON, but I would guess that the word-break characters
> aren't going to affect behavior within a quoted string. So it still
> seems like it should be OK? Another thing that makes me think that my
> unease may be unfounded is that the matching character '}' isn't in
> WORD_BREAKS either, and I would have thought that if we needed one
> we'd need both.

FWIW, the default value of rl_basic_word_break_characters [1] has '{'
but doesn't have '}'.  The documentation says that this should "break
words for completion in Bash".  But I failed to find an explanation
why this should be so for Bash.  As you correctly get, my idea was
that our SQL isn't not heavily using '{' unlike Bash.

> But does anyone else have a more specific reason for thinking that
> this might be a problem?

I don't particularly object against reverting this commit, but I think
we should get to the bottom of this first.  Otherwise there is no
warranty to not run into the same problem again.

Links.
1. https://tiswww.case.edu/php/chet/readline/readline.html#index-rl_005fbasic_005fword_005fbreak_005fcharacters

------
Regards,
Alexander Korotkov

Re: removal of '{' from WORD_BREAKS

From

Tom Lane

Date:

09 April 2024, 21:03:01

Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Tue, Apr 9, 2024 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> I too felt uneasy about that commit, for the same reason. However,
>> there is a justification for the change in the commit message which is
>> not obviously wrong, namely that ":{?name} is the only psql syntax
>> using the '{' sign". And in fact, SQL basically doesn't use '{' for
>> anything, either.

True.

> FWIW, the default value of rl_basic_word_break_characters [1] has '{'
> but doesn't have '}'.  The documentation says that this should "break
> words for completion in Bash".  But I failed to find an explanation
> why this should be so for Bash.  As you correctly get, my idea was
> that our SQL isn't not heavily using '{' unlike Bash.

Yeah, there's no doubt that what we are currently using for
WORD_BREAKS has mostly been cargo-culted from Bash rather than having
any solid foundation in SQL syntax.  It works all right for us today
because we don't really try to complete anything in general SQL
expression contexts, so as long as we have whitespace and parens in
there we're more or less fine.

I wonder a bit why comma isn't in there, though.  As an example,
    vacuum (full,fre<TAB>
fails to complete "freeze", though it works fine with a space
after the comma.  I've not experimented, but it seems certain
that it'd behave better with comma in WORD_BREAKS.  Whether
that'd pessimize other behaviors, I dunno.

The longer-range concern that I have is that if we ever want to
complete stuff within expressions, I think we're going to need
all the valid operator characters to be in WORD_BREAKS.  And that
will be a problem for this patch, not because of '{' but because
of '?'.  So I'd be happier if the parsing were done in a way that
did not rely on '{' and '?' being treated as word characters.
But I've not looked into how hard that'd be.  In any case, it's
likely that expanding WORD_BREAKS like that would cause some other
problems that'd have to be fixed, so it's not very reasonable to
expect this patch to avoid a hypothetical future problem.

            regards, tom lane

Re: IPC::Run::time[r|out] vs our TAP tests

From

Heikki Linnakangas

Date:

28 October 2024, 13:56:03

On 09/04/2024 20:10, Daniel Gustafsson wrote:
> Turning the timeout into a timer and returning undef along with logging a test
> failure in case of expiration seems a bit saner (maybe Andrew can suggest an
> API which has a better Perl feel to it).  Most callsites don't need any changes
> to accommodate for this, the attached 0002 implements this timer change and
> modify the few sites that need it, converting one to plain query() where the
> added complexity of query_until isn't required.

+1. The patch looks good to me. I didn't comb through the tests to check 
for bugs of omission, i.e. cases where the caller would need adjustments 
because of this, I trust that you found them all.

> =item $session->quit
> 
> Close the session and clean up resources. Each test run must be closed with
> C<quit>.  Returns TRUE when the session was cleanly terminated, otherwise
> FALSE.  A testfailure will be issued in case the session failed to finish.

What does "session failed to finish" mean? Does it mean the same as "not 
cleanly terminated", i.e. a test failure is issued whenever this returns 
FALSE?

typo: testfailure -> test failure


> diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
> index d87efa823fd..62936f52d20 100644
> --- a/src/test/recovery/t/031_recovery_conflict.pl
> +++ b/src/test/recovery/t/031_recovery_conflict.pl
> @@ -253,9 +253,7 @@ $res = $psql_standby->query_until(
>      -- wait for lock held by prepared transaction
>         SELECT * FROM $table2;
>      ]);
> -ok(1,
> -       "$sect: cursor holding conflicting pin, also waiting for lock, established"
> -);
> +isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, established");
>  
>  # just to make sure we're waiting for lock already
>  ok( $node_standby->poll_query_until(
> diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl
> index 6d1c7117964..c8c20077f85 100644
> --- a/src/test/recovery/t/037_invalid_database.pl
> +++ b/src/test/recovery/t/037_invalid_database.pl
> @@ -94,6 +94,7 @@ is($node->psql('postgres', 'DROP DATABASE regression_invalid'),
>  my $cancel = $node->background_psql('postgres', on_error_stop => 1);
>  my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
>  my $pid = $bgpsql->query('SELECT pg_backend_pid()');
> +isnt($pid, undef, 'Get backend PID');
>  
>  # create the database, prevent drop database via lock held by a 2PC transaction
>  ok( $bgpsql->query_safe(

I'm not sure I understand these changes. These can only fail if the 
query() or query_until() function times out, right? In that case, the 
query() or query_until() would also report a test failure, so these 
additional checks after the call seem redundant. They don't do any harm 
either, but I wonder why have them in these particular call sites and 
not in other query() or query_until() calls.

> The tab completion test can use the API call for automatically restart the
> timer to reduce the complexity of check_completion a hair.  Done in 0001 (but
> really not necessary).

+1

> Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
> the recovery tests:
> 
>     $back_q->query_until(
>         qr/logical_slot_get_changes/, q(
>        \echo logical_slot_get_changes
>        SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
>     ));
> 
>     ... <other tests> ...
> 
>     # Since there are no slots in standby_slot_names, the function
>     # pg_logical_slot_get_changes should now return, and the session can be
>     # stopped.
>     $back_q->quit;
> 
> There is no guarantee that pg_logical_slot_get_changes has returned when
> reaching this point.  This might still work as intended, but the comment is
> slightly misleading IMO.

Agreed, it would be good to actually check that it returns.

> recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
> in a background session which it then skips terminating.  Calling ->quit is
> mandated by the API, in turn required by IPC::Run.  Calling ->quit on the
> process makes the test fail from the process having already exited, but we can
> call ->finish directly like we do in test_misc/t/005_timeouts.pl.  0003 fixes
> this.

Alexander included this fix in commit 3c5db1d6b016 already.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: IPC::Run::time[r|out] vs our TAP tests

From

Heikki Linnakangas

Date:

01 November 2024, 01:18:00

On 31/10/2024 14:27, Daniel Gustafsson wrote:
>> On 28 Oct 2024, at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 09/04/2024 20:10, Daniel Gustafsson wrote:
>>> =item $session->quit
>>> Close the session and clean up resources. Each test run must be closed with
>>> C<quit>.  Returns TRUE when the session was cleanly terminated, otherwise
>>> FALSE.  A testfailure will be issued in case the session failed to finish.
>>
>> What does "session failed to finish" mean? Does it mean the same as "not cleanly terminated", i.e. a test failure is
issuedwhenever this returns FALSE?

> 
> It was very literally referring to the finish() method.  I've reworded the
> comment to indicated that it throws a failure in case the process returns a
> non-zero exit status to finish().

I see.

> @@ -148,7 +148,9 @@ sub _wait_connect
>  =item $session->quit
>  
>  Close the session and clean up resources. Each test run must be closed with
> -C<quit>.
> +C<quit>.  Returns TRUE when the session was cleanly terminated, otherwise
> +FALSE.  A test failure will be issued in case the session exited with a non-
> +zero exit status (the finish() method returns TRUE for 0 exit status).

I still find that confusing. What finish() method? Yes, there's a 
finish() method in IPC::Run, but that's BackgroundPsql's internal 
affair, not exposed to the callers in any other way. And why do I care 
what that finish() returns for 0 exit status? That's not visible to the 
quit method's caller.

Perhaps sommething like this:

"Close the psql session and clean up resources. Each psql session must 
be closed with C<quit> before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code), 
otherwise returns FALSE and reports a test failure. "

Would that be accurate?

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: IPC::Run::time[r|out] vs our TAP tests

From

Andrew Dunstan

Date:

20 February, 01:08:03

On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:
> On 31/10/2024 14:27, Daniel Gustafsson wrote:
>>> On 28 Oct 2024, at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>>
>>> On 09/04/2024 20:10, Daniel Gustafsson wrote:
>>>> =item $session->quit
>>>> Close the session and clean up resources. Each test run must be 
>>>> closed with
>>>> C<quit>.  Returns TRUE when the session was cleanly terminated, 
>>>> otherwise
>>>> FALSE.  A testfailure will be issued in case the session failed to 
>>>> finish.
>>>
>>> What does "session failed to finish" mean? Does it mean the same as 
>>> "not cleanly terminated", i.e. a test failure is issued whenever 
>>> this returns FALSE?
>>
>> It was very literally referring to the finish() method.  I've 
>> reworded the
>> comment to indicated that it throws a failure in case the process 
>> returns a
>> non-zero exit status to finish().
>
> I see.
>
>> @@ -148,7 +148,9 @@ sub _wait_connect
>>  =item $session->quit
>>
>>  Close the session and clean up resources. Each test run must be 
>> closed with
>> -C<quit>.
>> +C<quit>.  Returns TRUE when the session was cleanly terminated, 
>> otherwise
>> +FALSE.  A test failure will be issued in case the session exited 
>> with a non-
>> +zero exit status (the finish() method returns TRUE for 0 exit status).
>
> I still find that confusing. What finish() method? Yes, there's a 
> finish() method in IPC::Run, but that's BackgroundPsql's internal 
> affair, not exposed to the callers in any other way. And why do I care 
> what that finish() returns for 0 exit status? That's not visible to 
> the quit method's caller.
>
> Perhaps sommething like this:
>
> "Close the psql session and clean up resources. Each psql session must 
> be closed with C<quit> before the end of the test.
> Returns TRUE if psql exited successfully (i.e. with zero exit code), 
> otherwise returns FALSE and reports a test failure. "
>
> Would that be accurate?
>

I would be OK with Heikki's version.


The patches have bitrotted slightly.


Also this is wrong, I think:


     isnt($self->{timeout}->is_expired, 'psql query_until timed out');


I think it should be


     ok(! $self->{timeout}->is_expired, 'psql query_until did not time 
out');


isnt() normally takes 3 arguments, and the message you have seems 
backwards.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: IPC::Run::time[r|out] vs our TAP tests

From

Daniel Gustafsson

Date:

20 February, 02:56:56

> On 19 Feb 2025, at 23:08, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:

Thanks for review!

>> Perhaps sommething like this:
>>
>> "Close the psql session and clean up resources. Each psql session must be closed with C<quit> before the end of the
test.
>> Returns TRUE if psql exited successfully (i.e. with zero exit code), otherwise returns FALSE and reports a test
failure."
>>
>> Would that be accurate?
>
> I would be OK with Heikki's version.

Fixed.

> The patches have bitrotted slightly.

The attached rebases over current HEAD and passes check-world locally for me.

> Also this is wrong, I think:
>
>     isnt($self->{timeout}->is_expired, 'psql query_until timed out');
>
> I think it should be
>
>     ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');

Fixed.

--
Daniel Gustafsson

Attachment

Re: IPC::Run::time[r|out] vs our TAP tests

From

Andrew Dunstan

Date:

20 February, 16:06:12

On 2025-02-19 We 6:56 PM, Daniel Gustafsson wrote:

On 19 Feb 2025, at 23:08, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:

Thanks for review!

Perhaps sommething like this:

"Close the psql session and clean up resources. Each psql session must be closed with C<quit> before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code), otherwise returns FALSE and reports a test failure."

Would that be accurate?

I would be OK with Heikki's version.

Fixed.

The patches have bitrotted slightly.

The attached rebases over current HEAD and passes check-world locally for me.

Also this is wrong, I think:
    isnt($self->{timeout}->is_expired, 'psql query_until timed out');

I think it should be
    ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');

Fixed.

Actually, since ok() and friends return true iff the test succeeds, instead of

+ ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');
+ return undef if $self->{timeout}->is_expired;

you can avoid doing the same test twice and say:

ok(! $self->{timeout}->is_expired, 'psql query_until did not time out') || return undef;

although for slightly technical reasons perlcritic disapproves of "return undef" and prefers that you just write a bare "return" so we should also fix that.

Sorry for taking a second bite at the cherry.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: IPC::Run::time[r|out] vs our TAP tests

From

Daniel Gustafsson

Date:

20 February, 17:09:58

> On 20 Feb 2025, at 14:06, Andrew Dunstan <andrew@dunslane.net> wrote:

> Actually, since ok() and friends return true iff the test succeeds, instead of
> +    ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');
> +    return undef if $self->{timeout}->is_expired;
> you can avoid doing the same test twice and say:
>      ok(! $self->{timeout}->is_expired, 'psql query_until did not time out') || return undef;
> although for slightly technical reasons perlcritic disapproves of "return undef" and prefers that you just write a
bare"return" so we should also fix that. 
> Sorry for taking a second bite at the cherry.

Not at all, I agree that this is an improvement so fixed in the attached along
with a fresh pgperltidy.

--
Daniel Gustafsson