Thread: [MASSMAIL]IPC::Run::time[r|out] vs our TAP tests
TIL that IPC::Run::timer is not the same as IPC::Run::timeout. With a timer object you have to check $timer->is_expired to see if the timeout has elapsed, but with a timeout object you don't because it will throw a Perl exception upon timing out, probably killing your test program. It appears that a good chunk of our TAP codebase has not read this memo, because I see plenty of places that are checking is_expired in the naive belief that they'll still have control after a timeout has fired. The particular thing that started me down this road was wondering why we are getting no useful failure details from buildfarm member tanager's struggles with the tab-completion test case added by commit 927332b95 [1]. Apparently it's not seeing a match to what it expects so it eventually times out, but all we get in the log is [03:03:42.595](0.002s) ok 82 - complete an interpolated psql variable name [03:03:42.597](0.002s) ok 83 - \\r works IPC::Run: timeout on timer #1 at /usr/share/perl5/IPC/Run.pm line 2944. # Postmaster PID for node "main" is 17308 ### Stopping node "main" using mode immediate We would have learned something useful if control had returned to pump_until, or even better 010_tab_completion.pl's check_completion, but it doesn't. A minimum fix that seems to make this work better is as attached, but I feel like somebody ought to examine all the IPC::Run::timer and IPC::Run::timeout calls and see which ones are mistaken. It's a little scary to convert a timeout to a timer because of the hazard that someplace that would need to be checking for is_expired isn't. Also, the debug printout code at the bottom of check_completion is quite useless, because control can never reach it since BackgroundPsql::query_until will "die" on failure. I think that that code worked when written, and I'm suspicious that 664d75753 broke it, but I've not dug deeply into the history. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2024-04-04%2016%3A56%3A14 diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm index 4091c311b8..72bb21e160 100644 --- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm +++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm @@ -96,7 +96,7 @@ sub new "Forbidden caller of constructor: package: $package, file: $file:$line" unless $package->isa('PostgreSQL::Test::Cluster'); - $psql->{timeout} = IPC::Run::timeout( + $psql->{timeout} = IPC::Run::timer( defined($timeout) ? $timeout : $PostgreSQL::Test::Utils::timeout_default); diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm index 42d5a50dc8..8fd0426d30 100644 --- a/src/test/perl/PostgreSQL/Test/Utils.pm +++ b/src/test/perl/PostgreSQL/Test/Utils.pm @@ -429,6 +429,7 @@ Pump until string is matched on the specified stream, or timeout occurs. sub pump_until { my ($proc, $timeout, $stream, $until) = @_; + die "timeout should be a timer" if defined($timeout->exception); $proc->pump_nb(); while (1) {
> On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote: > A minimum fix that seems to make this work better is as attached, > but I feel like somebody ought to examine all the IPC::Run::timer > and IPC::Run::timeout calls and see which ones are mistaken. > It's a little scary to convert a timeout to a timer because of > the hazard that someplace that would need to be checking for > is_expired isn't. Skimming this and a few callsites it seems reasonable to use a timer instead of a timeout, but careful study is needed to make sure we're not introducing anything subtly wrong in the other direction. > Also, the debug printout code at the bottom of check_completion > is quite useless, because control can never reach it since > BackgroundPsql::query_until will "die" on failure. I think that > that code worked when written, and I'm suspicious that 664d75753 > broke it, but I've not dug deeply into the history. AFAICT, in the previous coding the interactive_psql object would use a timer or timeout based on what the caller provided, and check_completion used a timer so the debug logging probably worked as written. -- Daniel Gustafsson
On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote: > The particular thing that started me down this road was wondering > why we are getting no useful failure details from buildfarm member > tanager's struggles with the tab-completion test case added by commit > 927332b95 [1]. Also please note that tanager has been offline from around the 19th of March to the 3rd of April, explaining the delay in reporting the failure in this new psql test. I've switched it back online two days ago. Tom, would you like me to test your patch directly on the host? That should be pretty quick, even if I've not yet checked if the failure is reproducible with a manual build, outside the buildfarm scripts. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote: >> The particular thing that started me down this road was wondering >> why we are getting no useful failure details from buildfarm member >> tanager's struggles with the tab-completion test case added by commit >> 927332b95 [1]. > Tom, would you like me to test your patch directly on the host? That > should be pretty quick, even if I've not yet checked if the failure is > reproducible with a manual build, outside the buildfarm scripts. If you have time, that'd be great. What I suspect is that that machine's readline isn't regurgitating the string verbatim but is doing something fancy with backspaces or other control characters. But we need to see what it's actually emitting before there's much hope of adjusting the expected-output regex. regards, tom lane
On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote: > If you have time, that'd be great. What I suspect is that that > machine's readline isn't regurgitating the string verbatim but is > doing something fancy with backspaces or other control characters. > But we need to see what it's actually emitting before there's > much hope of adjusting the expected-output regex. I have been able to reproduce the failure manually and your patch is providing more information, indeed, as of: [10:21:44.017](0.002s) ok 83 - \r works [10:24:45.462](181.445s) # pump_until: timeout expired when searching for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=# \echo :\{\?VERBOSITY\} " psql query timed out at /home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm line 281. This stuff is actually kind of funny on this host, "\echo :{?VERB\t" completes to something incorrect, as of: postgres=# \echo :\{\?VERBOSITY\} Attaching the log file, for reference. Now I can see that this uses libedit at 3.1-20181209, which is far from recent. I'd be OK to just remove libedit from the build to remove this noise, still I am wondering if 927332b95e77 got what it was trying to achieve actually right. Thoughts? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote: >> If you have time, that'd be great. What I suspect is that that >> machine's readline isn't regurgitating the string verbatim but is >> doing something fancy with backspaces or other control characters. > I have been able to reproduce the failure manually and your patch is > providing more information, indeed, as of: > [10:21:44.017](0.002s) ok 83 - \r works > [10:24:45.462](181.445s) # pump_until: timeout expired when searching > for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=# > \echo :\{\?VERBOSITY\} " > psql query timed out at > /home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm > line 281. > This stuff is actually kind of funny on this host, "\echo :{?VERB\t" > completes to something incorrect, as of: > postgres=# \echo :\{\?VERBOSITY\} Just to be clear: you see the extra backslashes if you try this tab-completion manually? > Attaching the log file, for reference. Now I can see that this uses > libedit at 3.1-20181209, which is far from recent. I'd be OK to just > remove libedit from the build to remove this noise, still I am > wondering if 927332b95e77 got what it was trying to achieve actually > right. Thoughts? It kind of looks like a libedit bug, but maybe we should dig more deeply. I felt itchy about 927332b95e77 removing '{' from the WORD_BREAKS set, and wondered exactly how that would change readline's behavior. But even if that somehow accounts for the extra backslash before '{', it's not clear how it could lead to '?' and '}' also getting backslashed. regards, tom lane
On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: >> This stuff is actually kind of funny on this host, "\echo :{?VERB\t" >> completes to something incorrect, as of: >> postgres=# \echo :\{\?VERBOSITY\} > > Just to be clear: you see the extra backslashes if you try this > tab-completion manually? Yeah, I do, after completing "\echo :{?VERB" with this version of libedit. I see that this completes with backslashes added before '{', '}' and '?'. The test is telling the same. >> Attaching the log file, for reference. Now I can see that this uses >> libedit at 3.1-20181209, which is far from recent. I'd be OK to just >> remove libedit from the build to remove this noise, still I am >> wondering if 927332b95e77 got what it was trying to achieve actually >> right. Thoughts? > > It kind of looks like a libedit bug, but maybe we should dig more > deeply. I felt itchy about 927332b95e77 removing '{' from the > WORD_BREAKS set, and wondered exactly how that would change readline's > behavior. But even if that somehow accounts for the extra backslash > before '{', it's not clear how it could lead to '?' and '}' also > getting backslashed. I don't have a clear idea, either. I also feel uneasy about 927332b95e77 and its change of WORD_BREAKS, but this has the smell of a bug from an outdated libedit version. -- Michael
Attachment
On 2024-04-05 04:37 +0200, Michael Paquier wrote: > On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote: > > Michael Paquier <michael@paquier.xyz> writes: > >> This stuff is actually kind of funny on this host, "\echo :{?VERB\t" > >> completes to something incorrect, as of: > >> postgres=# \echo :\{\?VERBOSITY\} > > > > Just to be clear: you see the extra backslashes if you try this > > tab-completion manually? > > Yeah, I do, after completing "\echo :{?VERB" with this version of > libedit. I see that this completes with backslashes added before '{', > '}' and '?'. The test is telling the same. > > >> Attaching the log file, for reference. Now I can see that this uses > >> libedit at 3.1-20181209, which is far from recent. I'd be OK to just > >> remove libedit from the build to remove this noise, still I am > >> wondering if 927332b95e77 got what it was trying to achieve actually > >> right. Thoughts? > > > > It kind of looks like a libedit bug, but maybe we should dig more > > deeply. I felt itchy about 927332b95e77 removing '{' from the > > WORD_BREAKS set, and wondered exactly how that would change readline's > > behavior. But even if that somehow accounts for the extra backslash > > before '{', it's not clear how it could lead to '?' and '}' also > > getting backslashed. > > I don't have a clear idea, either. I also feel uneasy about > 927332b95e77 and its change of WORD_BREAKS, but this has the smell > of a bug from an outdated libedit version. It works with the latest libedit 20230828-3.1. Have to check the NetBSD source to find out what changed since 20181209-3.1. https://github.com/NetBSD/src/tree/trunk/lib/libedit -- Erik
Erik Wienhold <ewie@ewie.name> writes: > It works with the latest libedit 20230828-3.1. Have to check the NetBSD > source to find out what changed since 20181209-3.1. Yeah, the test is passing on mamba which is running the (just officially released) NetBSD 10.0. I'm not sure whether 10.0 has the "latest" libedit or something a little further back. sidewinder, with NetBSD 9.3, is happy as well. But 20181209 presumably belongs to NetBSD 8.x, which is theoretically still in support, so maybe it's worth poking into. regards, tom lane
On 2024-04-05 05:10 +0200, Tom Lane wrote: > Erik Wienhold <ewie@ewie.name> writes: > > It works with the latest libedit 20230828-3.1. Have to check the NetBSD > > source to find out what changed since 20181209-3.1. > > Yeah, the test is passing on mamba which is running the (just > officially released) NetBSD 10.0. I'm not sure whether 10.0 > has the "latest" libedit or something a little further back. > sidewinder, with NetBSD 9.3, is happy as well. But 20181209 > presumably belongs to NetBSD 8.x, which is theoretically still > in support, so maybe it's worth poking into. Having a look right now. Change [1] looks like a good candidate which is likely in 20221030-3.1. I'm trying to build Postgres with that older libedit version but can't figure out what options to pass to ./configure so that it picks /usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't work: LDFLAGS='-L/usr/local/lib' ./configure --with-libedit-preferred (My ld fu is not so great.) [1] https://github.com/NetBSD/src/commit/12863d4d7917df8a7ef5ad9dab6bb719018a22d1 -- Erik
Erik Wienhold <ewie@ewie.name> writes: > I'm trying to build Postgres with that older libedit version but can't > figure out what options to pass to ./configure so that it picks > /usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't > work: You probably want configure --with-libs=/usr/local/lib, and likely also --with-includes=/usr/local/include. regards, tom lane
On 2024-04-05 05:37 +0200, Tom Lane wrote: > Erik Wienhold <ewie@ewie.name> writes: > > I'm trying to build Postgres with that older libedit version but can't > > figure out what options to pass to ./configure so that it picks > > /usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't > > work: > > You probably want configure --with-libs=/usr/local/lib, > and likely also --with-includes=/usr/local/include. Thanks Tom. But I also have to run psql with: LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib src/bin/psql/psql Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as expected. The previous release 20190324-3.1 still produces the escaped output that Michael found. That narrows down the changes to everything between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and [2] (both inclusive). [1] https://github.com/NetBSD/src/commit/e09538bda2f805200d0f7ae09fb9b7f2f5ed75f2 [2] https://github.com/NetBSD/src/commit/de11d876419df3570c2418468613aebcebafe6ae -- Erik
Erik Wienhold <ewie@ewie.name> writes: > Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as > expected. The previous release 20190324-3.1 still produces the escaped > output that Michael found. That narrows down the changes to everything > between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and > [2] (both inclusive). Hm. I just installed NetBSD 8.2 in a VM, and it passes this test: # +++ tap install-check in src/bin/psql +++ t/001_basic.pl ........... ok t/010_tab_completion.pl .. ok t/020_cancel.pl .......... ok All tests successful. So it seems like the bug does not exist in any currently-supported NetBSD release. Debian has been known to ship obsolete libedit versions, though. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Erik Wienhold <ewie@ewie.name> writes: >> Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as >> expected. The previous release 20190324-3.1 still produces the escaped >> output that Michael found. That narrows down the changes to everything >> between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and >> [2] (both inclusive). > > Hm. I just installed NetBSD 8.2 in a VM, and it passes this test: > > # +++ tap install-check in src/bin/psql +++ > t/001_basic.pl ........... ok > t/010_tab_completion.pl .. ok > t/020_cancel.pl .......... ok > All tests successful. > > So it seems like the bug does not exist in any currently-supported > NetBSD release. Debian has been known to ship obsolete libedit > versions, though. Both the current (bokworm/12) and previous (bullseye/11) versions of Debian have new enough libedits to not be affected by this bug: libedit | 3.1-20181209-1 | oldoldstable | source libedit | 3.1-20191231-2 | oldstable | source libedit | 3.1-20221030-2 | stable | source libedit | 3.1-20230828-1 | testing | source libedit | 3.1-20230828-1 | unstable | source libedit | 3.1-20230828-1 | unstable-debug | source But in bullseye they decided that OpenSSL is a system library as far as the GPL is concerned, so are linking directly to readline. And even before then their psql wrapper would LD_PRELOAD readline instead if installed, so approximately nobody actually ever used psql with libedit on Debian. > regards, tom lane - ilmari
TIL that IPC::Run::timer is not the same as IPC::Run::timeout. With a timer object you have to check $timer->is_expired to see if the timeout has elapsed, but with a timeout object you don't because it will throw a Perl exception upon timing out, probably killing your test program. It appears that a good chunk of our TAP codebase has not read this memo, because I see plenty of places that are checking is_expired in the naive belief that they'll still have control after a timeout has fired.
I started having a look at these.
Here are the cases I found:
./src/bin/psql/t/010_tab_completion.pl: my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql startup timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm:timeout is the IPC::Run::Timeout object whose is_expired method can be tested
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless $timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Utils.pm: if ($timeout->is_expired)
./src/test/recovery/t/021_row_visibility.pl: if ($psql_timeout->is_expired)
./src/test/recovery/t/032_relfilenode_reuse.pl: if ($psql_timeout->is_expired)
Those in Cluster.pm look correct - they are doing the run() in an eval block and testing for the is_expired setting in an exception block. The other cases look more suspect. I'll take a closer look.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> So it seems like the bug does not exist in any currently-supported >> NetBSD release. Debian has been known to ship obsolete libedit >> versions, though. > Both the current (bokworm/12) and previous (bullseye/11) versions of > Debian have new enough libedits to not be affected by this bug: > ... > But in bullseye they decided that OpenSSL is a system library as far as > the GPL is concerned, so are linking directly to readline. > And even before then their psql wrapper would LD_PRELOAD readline > instead if installed, so approximately nobody actually ever used psql > with libedit on Debian. Based on this info, I'm disinclined to put work into trying to make the case behave correctly with that old libedit version, or even to lobotomize the test case enough so it would pass. What I suggest Michael do with tanager is install the OS-version-appropriate version of GNU readline, so that the animal will test what ilmari describes as the actually common use-case. (I see that what he did for the moment is add --without-readline. Perhaps that's a decent long-term choice too, because I think we have rather little coverage of that option except on Windows.) regards, tom lane
On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote: > What I suggest Michael do with tanager is install the > OS-version-appropriate version of GNU readline, so that the animal > will test what ilmari describes as the actually common use-case. Thanks for the investigations! It's clear that this version of libedit is borked. > (I see that what he did for the moment is add --without-readline. > Perhaps that's a decent long-term choice too, because I think we > have rather little coverage of that option except on Windows.) Yeah, my motivation did not go beyond that so you can blame me for being lazy. ;) Note also the --without-icu on this host that has little coverage, as well. -- Michael
Attachment
On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote: > Based on this info, I'm disinclined to put work into trying to > make the case behave correctly with that old libedit version, or > even to lobotomize the test case enough so it would pass. By the way, are you planning to do something like [1]? I've not looked in details at the callers of IPC::Run::timeout, still the extra debug output would be nice. [1]: https://www.postgresql.org/message-id/1100715.1712265845@sss.pgh.pa.us -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > By the way, are you planning to do something like [1]? I've not > looked in details at the callers of IPC::Run::timeout, still the extra > debug output would be nice. It needs more review I think - I didn't check every call site to see if anything would be broken. I believe Andrew has undertaken a survey of all the timeout/timer calls, but if he doesn't produce anything I might have a go at it after awhile. regards, tom lane
On 2024-04-09 Tu 09:46, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: >> By the way, are you planning to do something like [1]? I've not >> looked in details at the callers of IPC::Run::timeout, still the extra >> debug output would be nice. > It needs more review I think - I didn't check every call site to see > if anything would be broken. I believe Andrew has undertaken a > survey of all the timeout/timer calls, but if he doesn't produce > anything I might have a go at it after awhile. > > What I looked at so far was the use of is_expired, but when you look into that you see that you need to delve further, to where timeout/timer objects are created and passed around. I'll take a closer look when I have done some incremental json housekeeping. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier <michael@paquier.xyz> wrote: > > It kind of looks like a libedit bug, but maybe we should dig more > > deeply. I felt itchy about 927332b95e77 removing '{' from the > > WORD_BREAKS set, and wondered exactly how that would change readline's > > behavior. But even if that somehow accounts for the extra backslash > > before '{', it's not clear how it could lead to '?' and '}' also > > getting backslashed. > > I don't have a clear idea, either. I also feel uneasy about > 927332b95e77 and its change of WORD_BREAKS, but this has the smell > of a bug from an outdated libedit version. I too felt uneasy about that commit, for the same reason. However, there is a justification for the change in the commit message which is not obviously wrong, namely that ":{?name} is the only psql syntax using the '{' sign". And in fact, SQL basically doesn't use '{' for anything, either. We do see { showing up inside of quoted strings, for arrays or JSON, but I would guess that the word-break characters aren't going to affect behavior within a quoted string. So it still seems like it should be OK? Another thing that makes me think that my unease may be unfounded is that the matching character '}' isn't in WORD_BREAKS either, and I would have thought that if we needed one we'd need both. But does anyone else have a more specific reason for thinking that this might be a problem? -- Robert Haas EDB: http://www.enterprisedb.com
> On 4 Apr 2024, at 23:46, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> A minimum fix that seems to make this work better is as attached, >> but I feel like somebody ought to examine all the IPC::Run::timer >> and IPC::Run::timeout calls and see which ones are mistaken. >> It's a little scary to convert a timeout to a timer because of >> the hazard that someplace that would need to be checking for >> is_expired isn't. > > Skimming this and a few callsites it seems reasonable to use a timer instead of > a timeout, but careful study is needed to make sure we're not introducing > anything subtly wrong in the other direction. Sharing a few preliminary results from looking at this, the attached passes check-world but need more study/testing. It seems wrong to me that we die() in query and query_until rather than giving the caller the power to decide how to proceed. We have even documented that we do just this: "Dies on failure to invoke psql, or if psql fails to connect. Errors occurring later are the caller's problem" Turning the timeout into a timer and returning undef along with logging a test failure in case of expiration seems a bit saner (maybe Andrew can suggest an API which has a better Perl feel to it). Most callsites don't need any changes to accommodate for this, the attached 0002 implements this timer change and modify the few sites that need it, converting one to plain query() where the added complexity of query_until isn't required. A few other comments on related things that stood out while reviewing: The tab completion test can use the API call for automatically restart the timer to reduce the complexity of check_completion a hair. Done in 0001 (but really not necessary). Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in the recovery tests: $back_q->query_until( qr/logical_slot_get_changes/, q( \echo logical_slot_get_changes SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL); )); ... <other tests> ... # Since there are no slots in standby_slot_names, the function # pg_logical_slot_get_changes should now return, and the session can be # stopped. $back_q->quit; There is no guarantee that pg_logical_slot_get_changes has returned when reaching this point. This might still work as intended, but the comment is slightly misleading IMO. recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e in a background session which it then skips terminating. Calling ->quit is mandated by the API, in turn required by IPC::Run. Calling ->quit on the process makes the test fail from the process having already exited, but we can call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes this. -- Daniel Gustafsson
Attachment
On Tue, Apr 9, 2024 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier <michael@paquier.xyz> wrote: > > > It kind of looks like a libedit bug, but maybe we should dig more > > > deeply. I felt itchy about 927332b95e77 removing '{' from the > > > WORD_BREAKS set, and wondered exactly how that would change readline's > > > behavior. But even if that somehow accounts for the extra backslash > > > before '{', it's not clear how it could lead to '?' and '}' also > > > getting backslashed. > > > > I don't have a clear idea, either. I also feel uneasy about > > 927332b95e77 and its change of WORD_BREAKS, but this has the smell > > of a bug from an outdated libedit version. > > I too felt uneasy about that commit, for the same reason. However, > there is a justification for the change in the commit message which is > not obviously wrong, namely that ":{?name} is the only psql syntax > using the '{' sign". And in fact, SQL basically doesn't use '{' for > anything, either. We do see { showing up inside of quoted strings, for > arrays or JSON, but I would guess that the word-break characters > aren't going to affect behavior within a quoted string. So it still > seems like it should be OK? Another thing that makes me think that my > unease may be unfounded is that the matching character '}' isn't in > WORD_BREAKS either, and I would have thought that if we needed one > we'd need both. FWIW, the default value of rl_basic_word_break_characters [1] has '{' but doesn't have '}'. The documentation says that this should "break words for completion in Bash". But I failed to find an explanation why this should be so for Bash. As you correctly get, my idea was that our SQL isn't not heavily using '{' unlike Bash. > But does anyone else have a more specific reason for thinking that > this might be a problem? I don't particularly object against reverting this commit, but I think we should get to the bottom of this first. Otherwise there is no warranty to not run into the same problem again. Links. 1. https://tiswww.case.edu/php/chet/readline/readline.html#index-rl_005fbasic_005fword_005fbreak_005fcharacters ------ Regards, Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes: > On Tue, Apr 9, 2024 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote: >> I too felt uneasy about that commit, for the same reason. However, >> there is a justification for the change in the commit message which is >> not obviously wrong, namely that ":{?name} is the only psql syntax >> using the '{' sign". And in fact, SQL basically doesn't use '{' for >> anything, either. True. > FWIW, the default value of rl_basic_word_break_characters [1] has '{' > but doesn't have '}'. The documentation says that this should "break > words for completion in Bash". But I failed to find an explanation > why this should be so for Bash. As you correctly get, my idea was > that our SQL isn't not heavily using '{' unlike Bash. Yeah, there's no doubt that what we are currently using for WORD_BREAKS has mostly been cargo-culted from Bash rather than having any solid foundation in SQL syntax. It works all right for us today because we don't really try to complete anything in general SQL expression contexts, so as long as we have whitespace and parens in there we're more or less fine. I wonder a bit why comma isn't in there, though. As an example, vacuum (full,fre<TAB> fails to complete "freeze", though it works fine with a space after the comma. I've not experimented, but it seems certain that it'd behave better with comma in WORD_BREAKS. Whether that'd pessimize other behaviors, I dunno. The longer-range concern that I have is that if we ever want to complete stuff within expressions, I think we're going to need all the valid operator characters to be in WORD_BREAKS. And that will be a problem for this patch, not because of '{' but because of '?'. So I'd be happier if the parsing were done in a way that did not rely on '{' and '?' being treated as word characters. But I've not looked into how hard that'd be. In any case, it's likely that expanding WORD_BREAKS like that would cause some other problems that'd have to be fixed, so it's not very reasonable to expect this patch to avoid a hypothetical future problem. regards, tom lane
On 09/04/2024 20:10, Daniel Gustafsson wrote: > Turning the timeout into a timer and returning undef along with logging a test > failure in case of expiration seems a bit saner (maybe Andrew can suggest an > API which has a better Perl feel to it). Most callsites don't need any changes > to accommodate for this, the attached 0002 implements this timer change and > modify the few sites that need it, converting one to plain query() where the > added complexity of query_until isn't required. +1. The patch looks good to me. I didn't comb through the tests to check for bugs of omission, i.e. cases where the caller would need adjustments because of this, I trust that you found them all. > =item $session->quit > > Close the session and clean up resources. Each test run must be closed with > C<quit>. Returns TRUE when the session was cleanly terminated, otherwise > FALSE. A testfailure will be issued in case the session failed to finish. What does "session failed to finish" mean? Does it mean the same as "not cleanly terminated", i.e. a test failure is issued whenever this returns FALSE? typo: testfailure -> test failure > diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl > index d87efa823fd..62936f52d20 100644 > --- a/src/test/recovery/t/031_recovery_conflict.pl > +++ b/src/test/recovery/t/031_recovery_conflict.pl > @@ -253,9 +253,7 @@ $res = $psql_standby->query_until( > -- wait for lock held by prepared transaction > SELECT * FROM $table2; > ]); > -ok(1, > - "$sect: cursor holding conflicting pin, also waiting for lock, established" > -); > +isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, established"); > > # just to make sure we're waiting for lock already > ok( $node_standby->poll_query_until( > diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl > index 6d1c7117964..c8c20077f85 100644 > --- a/src/test/recovery/t/037_invalid_database.pl > +++ b/src/test/recovery/t/037_invalid_database.pl > @@ -94,6 +94,7 @@ is($node->psql('postgres', 'DROP DATABASE regression_invalid'), > my $cancel = $node->background_psql('postgres', on_error_stop => 1); > my $bgpsql = $node->background_psql('postgres', on_error_stop => 0); > my $pid = $bgpsql->query('SELECT pg_backend_pid()'); > +isnt($pid, undef, 'Get backend PID'); > > # create the database, prevent drop database via lock held by a 2PC transaction > ok( $bgpsql->query_safe( I'm not sure I understand these changes. These can only fail if the query() or query_until() function times out, right? In that case, the query() or query_until() would also report a test failure, so these additional checks after the call seem redundant. They don't do any harm either, but I wonder why have them in these particular call sites and not in other query() or query_until() calls. > The tab completion test can use the API call for automatically restart the > timer to reduce the complexity of check_completion a hair. Done in 0001 (but > really not necessary). +1 > Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in > the recovery tests: > > $back_q->query_until( > qr/logical_slot_get_changes/, q( > \echo logical_slot_get_changes > SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL); > )); > > ... <other tests> ... > > # Since there are no slots in standby_slot_names, the function > # pg_logical_slot_get_changes should now return, and the session can be > # stopped. > $back_q->quit; > > There is no guarantee that pg_logical_slot_get_changes has returned when > reaching this point. This might still work as intended, but the comment is > slightly misleading IMO. Agreed, it would be good to actually check that it returns. > recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e > in a background session which it then skips terminating. Calling ->quit is > mandated by the API, in turn required by IPC::Run. Calling ->quit on the > process makes the test fail from the process having already exited, but we can > call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes > this. Alexander included this fix in commit 3c5db1d6b016 already. -- Heikki Linnakangas Neon (https://neon.tech)
On 31/10/2024 14:27, Daniel Gustafsson wrote: >> On 28 Oct 2024, at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> >> On 09/04/2024 20:10, Daniel Gustafsson wrote: >>> =item $session->quit >>> Close the session and clean up resources. Each test run must be closed with >>> C<quit>. Returns TRUE when the session was cleanly terminated, otherwise >>> FALSE. A testfailure will be issued in case the session failed to finish. >> >> What does "session failed to finish" mean? Does it mean the same as "not cleanly terminated", i.e. a test failure is issuedwhenever this returns FALSE? > > It was very literally referring to the finish() method. I've reworded the > comment to indicated that it throws a failure in case the process returns a > non-zero exit status to finish(). I see. > @@ -148,7 +148,9 @@ sub _wait_connect > =item $session->quit > > Close the session and clean up resources. Each test run must be closed with > -C<quit>. > +C<quit>. Returns TRUE when the session was cleanly terminated, otherwise > +FALSE. A test failure will be issued in case the session exited with a non- > +zero exit status (the finish() method returns TRUE for 0 exit status). I still find that confusing. What finish() method? Yes, there's a finish() method in IPC::Run, but that's BackgroundPsql's internal affair, not exposed to the callers in any other way. And why do I care what that finish() returns for 0 exit status? That's not visible to the quit method's caller. Perhaps sommething like this: "Close the psql session and clean up resources. Each psql session must be closed with C<quit> before the end of the test. Returns TRUE if psql exited successfully (i.e. with zero exit code), otherwise returns FALSE and reports a test failure. " Would that be accurate? -- Heikki Linnakangas Neon (https://neon.tech)