Thread: Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
Re: Tom Lane 2020-01-02 <E1in6ft-0004zR-6l@gemulon.postgresql.org> > Add basic TAP tests for psql's tab-completion logic. The \DRD test fails on Debian/unstable: # check case-sensitive keyword replacement # XXX the output here might vary across readline versions check_completion( "\\DRD\t", "\\DRD\b\b\bdrds ", "complete \\DRD<tab> to \\drds"); 00:58:02 rm -rf '/<<PKGBUILDDIR>>/build/src/bin/psql'/tmp_check 00:58:02 /bin/mkdir -p '/<<PKGBUILDDIR>>/build/src/bin/psql'/tmp_check 00:58:02 cd /<<PKGBUILDDIR>>/build/../src/bin/psql && TESTDIR='/<<PKGBUILDDIR>>/build/src/bin/psql' PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/postgresql/13/bin:$PATH" LD_LIBRARY_PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/x86_64-linux-gnu" PGPORT='65432' PG_REGRESS='/<<PKGBUILDDIR>>/build/src/bin/psql/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/<<PKGBUILDDIR>>/build/src/test/regress/regress.so'/usr/bin/prove -I /<<PKGBUILDDIR>>/build/../src/test/perl/-I /<<PKGBUILDDIR>>/build/../src/bin/psql t/*.pl 00:58:09 00:58:09 # Failed test 'complete \DRD<tab> to \drds' 00:58:09 # at t/010_tab_completion.pl line 64. 00:58:09 # Looks like you failed 1 test of 12. 00:58:09 t/010_tab_completion.pl .. 00:58:09 Dubious, test returned 1 (wstat 256, 0x100) 00:58:09 Failed 1/12 subtests 00:58:09 00:58:09 Test Summary Report 00:58:09 ------------------- 00:58:09 t/010_tab_completion.pl (Wstat: 256 Tests: 12 Failed: 1) 00:58:09 Failed test: 11 00:58:09 Non-zero exit status: 1 00:58:09 Files=1, Tests=12, 7 wallclock secs ( 0.01 usr 0.01 sys + 0.77 cusr 0.23 csys = 1.02 CPU) 00:58:09 Result: FAIL 00:58:09 make[2]: *** [Makefile:87: check] Error 1 https://pgdgbuild.dus.dg-i.net/job/postgresql-13-binaries/architecture=amd64,distribution=sid/444/console Shouldn't this print some "expected foo, got bar" diagnostics instead of just dying? Christoph
Christoph Berg <myon@debian.org> writes: > Re: Tom Lane 2020-01-02 <E1in6ft-0004zR-6l@gemulon.postgresql.org> >> Add basic TAP tests for psql's tab-completion logic. > The \DRD test fails on Debian/unstable: Indeed. It appears that recent libedit breaks tab-completion for words involving a backslash, which is the fault of this upstream commit: http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/filecomplete.c.diff?r1=1.52&r2=1.53 Basically what that's doing is applying de-backslashing to EVERY word that completion is attempted on, whether it might be a filename or not. So what psql_complete sees in this test case is just "DRD" which of course it does not recognize as a possible psql backslash command. I found out while investigating this that the libedit version shipping with buster (3.1-20181209) is differently broken for the same case: instead of inapproriate forced de-escaping of the input of the application-specific completion function, it applies inapproriate forced escaping to the output of said function, so that when we see "\DRD" and return "\drds", what comes out to the user is "\\drds". libedit apparently needs a regression test suite even worse than we do. I was kind of despairing of fixing this last night, but in the light of morning it occurs to me that there's a possible workaround for the de-escape bug: we could make psql_completion ignore the passed "text" string and look at the original input buffer, as get_previous_words() is already doing. I don't see any way to dodge buster's bug, though. regards, tom lane
Christoph Berg <myon@debian.org> writes: > Shouldn't this print some "expected foo, got bar" diagnostics instead > of just dying? BTW, as far as that goes, we do: see for instance the tail end of https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-01-02%2020%3A04%3A03 ok 8 - offer multiple table choices ok 9 - finish completion of one of multiple table choices ok 10 - \r works not ok 11 - complete \DRD<tab> to \drds # Failed test 'complete \DRD<tab> to \drds' # at t/010_tab_completion.pl line 64. # Actual output was "\DRD" ok 12 - \r works Not sure why you are not seeing the "Actual output" bit in your log. I used a "note" command to print it, maybe that's not best practice? Also, while I'm asking for Perl advice: I can see in my editor that there's a control-G bell character in that string, but this is far from obvious on the web page. I'd kind of like to get the report to escapify control characters so that what comes out is more like # Actual output was "\DRD^G" or # Actual output was "\\DRD\007" or some such. Anybody know an easy way to do that in Perl? regards, tom lane
Re: Tom Lane 2020-01-03 <13708.1578059577@sss.pgh.pa.us> > I found out while investigating this that the libedit version shipping > with buster (3.1-20181209) is differently broken for the same case: (Fwiw this wasn't spotted before because we have this LD_PRELOAD hack that replaces libedit with readline at psql runtime. I guess that means that the hack is pretty stable... Still, looking forward to the day that OpenSSL is finally relicensing so we can properly link to readline.) Re: Tom Lane 2020-01-03 <14261.1578060227@sss.pgh.pa.us> > > Shouldn't this print some "expected foo, got bar" diagnostics instead > > of just dying? > > BTW, as far as that goes, we do: see for instance the tail end of > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-01-02%2020%3A04%3A03 > > ok 8 - offer multiple table choices > ok 9 - finish completion of one of multiple table choices > ok 10 - \r works > not ok 11 - complete \DRD<tab> to \drds > > # Failed test 'complete \DRD<tab> to \drds' > # at t/010_tab_completion.pl line 64. > # Actual output was "\DRD" > ok 12 - \r works > > Not sure why you are not seeing the "Actual output" bit in your log. > I used a "note" command to print it, maybe that's not best practice? I think best practice is to use something like like($out, qr/$pattern/, $annotation) instead of plain "ok()" which doesn't know about the actual values compared. The "&& !$timer->is_expired" condition can be dropped from the test because all we care about is if the output matches. I never really grasped in which contexts TAP is supposed to print the full test output ("ok 10 -..."). Apparently the way the testsuite is invoked at package build time only prints the terse failure summary in which "note"s aren't included. Is there a switch to configure that? > Also, while I'm asking for Perl advice: I can see in my editor that > there's a control-G bell character in that string, but this is far > from obvious on the web page. I'd kind of like to get the report > to escapify control characters so that what comes out is more like > > # Actual output was "\DRD^G" > or > # Actual output was "\\DRD\007" > > or some such. Anybody know an easy way to do that in Perl? I don't know for note(), but maybe like() would do that automatically. Christoph
Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
From
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Also, while I'm asking for Perl advice: I can see in my editor that > there's a control-G bell character in that string, but this is far > from obvious on the web page. I'd kind of like to get the report > to escapify control characters so that what comes out is more like > > # Actual output was "\DRD^G" > or > # Actual output was "\\DRD\007" > > or some such. Anybody know an easy way to do that in Perl? I was going to suggest using Test::More's like() function to do the regex check, but sadly that only escapes things that would break the TAP stream syntax, not non-printables in general. The next obvious thing is Data::Dumper with the 'Useqq' option enabled, which makes it use double-quoted-string escapes (e.g. "\a" for ^G). The attaced patch does that, and also bumps $Test::Builder::Level so the diagnostic references the calling line, and uses diag() instad of note(), so it shows even in non-verbose mode. - ilmari -- "A disappointingly low fraction of the human race is, at any given time, on fire." - Stig Sandbeck Mathisen From c4541abc826c40e88f729c5a71e5d06a11295aa2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Fri, 3 Jan 2020 17:07:10 +0000 Subject: [PATCH] Escape non-printable characters in psql tab-completion TAP tests By using Data::Dumper in Useqq mode. In passing, bump $Test::Builder::Level so the diagnostic references the calling line, and use diag() instad of note(), so it shows even in non-verbose mode. --- src/bin/psql/t/010_tab_completion.pl | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index a02cbd8e47..553288bda7 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -5,6 +5,7 @@ use PostgresNode; use TestLib; use Test::More; use IPC::Run qw(pump finish timer); +use Data::Dumper; if (!defined($ENV{with_readline}) || $ENV{with_readline} ne 'yes') { @@ -52,6 +53,9 @@ sub check_completion { my ($send, $pattern, $annotation) = @_; + # report test failures from caller location + local $Test::Builder::Level = $Test::Builder::Level + 1; + # reset output collector $out = ""; # restart per-command timer @@ -63,7 +67,9 @@ sub check_completion my $okay = ($out =~ m/$pattern/ && !$timer->is_expired); ok($okay, $annotation); # for debugging, log actual output if it didn't match - note 'Actual output was "' . $out . "\"\n" if !$okay; + local $Data::Dumper::Terse = 1; + local $Data::Dumper::Useqq = 1; + diag 'Actual output was ' . Dumper($out) . "\n" if !$okay; return; } -- 2.22.0
Christoph Berg <myon@debian.org> writes: > Re: Tom Lane 2020-01-03 <13708.1578059577@sss.pgh.pa.us> >> I found out while investigating this that the libedit version shipping >> with buster (3.1-20181209) is differently broken for the same case: > (Fwiw this wasn't spotted before because we have this LD_PRELOAD hack > that replaces libedit with readline at psql runtime. You do? I went looking in the Debian package source repo just the other day for some evidence that that was true, and couldn't find any, so I concluded that it was only an urban legend. Where is that done exactly? Perhaps more importantly, *why* is it done? It seems to me that it takes a pretty fevered imagination to suppose that using libreadline that way meets the terms of its license but just building against the library normally would not. Certainly when I worked for Red Hat, their lawyers did not think there was any problem with building Postgres using both openssl and readline. The reason I'm concerned about this is that there's a patch on the table [1] that will probably not behave nicely at all if it's compiled against libedit headers and then executed with libreadline, because it will draw the wrong conclusions about whether the filename quoting hooks are available. So that hack is going to fail on you soon, especially after I add regression testing around the filename completion stuff ;-) >> I used a "note" command to print it, maybe that's not best practice? > I think best practice is to use something like > like($out, qr/$pattern/, $annotation) I'll check into that, thanks! regards, tom lane [1] https://www.postgresql.org/message-id/flat/16059-8836946734c02b84@postgresql.org
Re: Tom Lane 2020-01-03 <26339.1578072930@sss.pgh.pa.us> > Christoph Berg <myon@debian.org> writes: > > Re: Tom Lane 2020-01-03 <13708.1578059577@sss.pgh.pa.us> > >> I found out while investigating this that the libedit version shipping > >> with buster (3.1-20181209) is differently broken for the same case: > > > (Fwiw this wasn't spotted before because we have this LD_PRELOAD hack > > that replaces libedit with readline at psql runtime. > > You do? I went looking in the Debian package source repo just the > other day for some evidence that that was true, and couldn't find > any, so I concluded that it was only an urban legend. Where is that > done exactly? /usr/share/postgresql-common/pg_wrapper https://salsa.debian.org/postgresql/postgresql-common/blob/master/pg_wrapper#L129-157 > Perhaps more importantly, *why* is it done? It seems to me that it > takes a pretty fevered imagination to suppose that using libreadline Tom, claiming that things are "fevered" just because you didn't like them is not appropriate. It's not fun working with PostgreSQL when the tone is like that. > that way meets the terms of its license but just building against > the library normally would not. Certainly when I worked for Red Hat, > their lawyers did not think there was any problem with building > Postgres using both openssl and readline. I'm not starting that debate here, but Debian thinks otherwise: https://lwn.net/Articles/428111/ > The reason I'm concerned about this is that there's a patch on the > table [1] that will probably not behave nicely at all if it's > compiled against libedit headers and then executed with libreadline, > because it will draw the wrong conclusions about whether the > filename quoting hooks are available. So that hack is going to > fail on you soon, especially after I add regression testing around > the filename completion stuff ;-) Well, so far, it worked well. (The biggest problem used to be that libedit didn't have the history append function so it wasn't used even with readline, but that got implemented ~2 years ago.) Christoph
ilmari@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Anybody know an easy way to do that in Perl? > I was going to suggest using Test::More's like() function to do the > regex check, but sadly that only escapes things that would break the TAP > stream syntax, not non-printables in general. The next obvious thing is > Data::Dumper with the 'Useqq' option enabled, which makes it use > double-quoted-string escapes (e.g. "\a" for ^G). > The attaced patch does that, and also bumps $Test::Builder::Level so the > diagnostic references the calling line, and uses diag() instad of > note(), so it shows even in non-verbose mode. LGTM, pushed (along with a fix to deal with what hopefully is the only remaining obstacle for Andres' critters). regards, tom lane
On Fri, Jan 3, 2020 at 12:48 PM Christoph Berg <myon@debian.org> wrote: > > Perhaps more importantly, *why* is it done? It seems to me that it > > takes a pretty fevered imagination to suppose that using libreadline > > Tom, claiming that things are "fevered" just because you didn't like > them is not appropriate. It's not fun working with PostgreSQL when the > tone is like that. +1. > > that way meets the terms of its license but just building against > > the library normally would not. Certainly when I worked for Red Hat, > > their lawyers did not think there was any problem with building > > Postgres using both openssl and readline. > > I'm not starting that debate here, but Debian thinks otherwise: > > https://lwn.net/Articles/428111/ I take no position on whether Debian is correct in its assessment of such things, but I reiterate my previous opposition to breaking it just because we don't agree with it, or because Tom specifically doesn't. It's too mainstream a platform to arbitrarily break. And it will probably just have the effect of increasing the number of patches they're carrying against our sources, which will not make things better for anybody. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Christoph Berg <myon@debian.org> writes: > Re: Tom Lane 2020-01-03 <26339.1578072930@sss.pgh.pa.us> >> You do? I went looking in the Debian package source repo just the >> other day for some evidence that that was true, and couldn't find >> any, so I concluded that it was only an urban legend. Where is that >> done exactly? > /usr/share/postgresql-common/pg_wrapper > https://salsa.debian.org/postgresql/postgresql-common/blob/master/pg_wrapper#L129-157 Oh, so not in the Postgres package per se. What that means is that our regression tests will pass (as it's just a regular libedit install while they're running) but then filename completion will not work well for actual users. And there's not really anything I can do about that from this end. (On the other hand, filename completion is already kind of buggy, which is why that patch exists in the first place. So maybe it won't get any worse. Hard to say.) regards, tom lane
On Fri, Jan 3, 2020 at 9:59 AM Robert Haas <robertmhaas@gmail.com> wrote: > I take no position on whether Debian is correct in its assessment of > such things, but I reiterate my previous opposition to breaking it > just because we don't agree with it, or because Tom specifically > doesn't. It's too mainstream a platform to arbitrarily break. And it > will probably just have the effect of increasing the number of patches > they're carrying against our sources, which will not make things > better for anybody. Even with commit 56a3921a, "make check-world" is broken on my Ubuntu 18.04 workstation. This is now adversely impacting my work, so I hope it can be resolved soon. Not sure if the specifics matter, but FWIW "make check-world" ended with the following failure just now: make[2]: Entering directory '/code/postgresql/patch/build/src/bin/psql' rm -rf '/code/postgresql/patch/build/src/bin/psql'/tmp_check /bin/mkdir -p '/code/postgresql/patch/build/src/bin/psql'/tmp_check cd /code/postgresql/patch/build/../source/src/bin/psql && TESTDIR='/code/postgresql/patch/build/src/bin/psql' PATH="/code/postgresql/patch/build/tmp_install/code/postgresql/patch/install/bin:$PATH" LD_LIBRARY_PATH="/code/postgresql/patch/build/tmp_install/code/postgresql/patch/install/lib" PGPORT='65432' PG_REGRESS='/code/postgresql/patch/build/src/bin/psql/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/code/postgresql/patch/build/src/test/regress/regress.so' /usr/bin/prove -I /code/postgresql/patch/build/../source/src/test/perl/ -I /code/postgresql/patch/build/../source/src/bin/psql t/*.pl t/010_tab_completion.pl .. 8/? # Failed test 'offer multiple table choices' # at t/010_tab_completion.pl line 105. # Actual output was "\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from mytab\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from mytab" # # Looks like you failed 1 test of 12. t/010_tab_completion.pl .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/12 subtests Test Summary Report ------------------- t/010_tab_completion.pl (Wstat: 256 Tests: 12 Failed: 1) Failed test: 8 Non-zero exit status: 1 Files=1, Tests=12, 7 wallclock secs ( 0.02 usr 0.00 sys + 0.80 cusr 0.09 csys = 0.91 CPU) Result: FAIL Makefile:87: recipe for target 'check' failed make[2]: *** [check] Error 1 make[2]: Leaving directory '/code/postgresql/patch/build/src/bin/psql' Makefile:41: recipe for target 'check-psql-recurse' failed make[1]: *** [check-psql-recurse] Error 2 make[1]: Leaving directory '/code/postgresql/patch/build/src/bin' GNUmakefile:70: recipe for target 'check-world-src/bin-recurse' failed make: *** [check-world-src/bin-recurse] Error 2 -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Not sure if the specifics matter, but FWIW "make check-world" ended > with the following failure just now: > # Failed test 'offer multiple table choices' > # at t/010_tab_completion.pl line 105. > # Actual output was "\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K > \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from > mytab\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K > \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from > mytab" Huh. What readline or libedit version are you using, on what platform? I'm curious also what is your prevailing setting of TERM? (I've been wondering if the test doesn't need to force that to something standard. The buildfarm hasn't shown any signs of needing that, but manual invocations might be a different story.) regards, tom lane
On Fri, Jan 3, 2020 at 6:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Huh. What readline or libedit version are you using, on what > platform? Ubuntu 18.04. I used ldd to verify that psql links to the system libreadline, which is libreadline7:amd64 -- that's what Debian packages as "7.0-3". > I'm curious also what is your prevailing setting > of TERM? I use zsh, with a fairly customized setup. $TERM is "xterm-256color" in the affected shell. (I have a feeling that this has something to do with my amazing technicolor terminal.) -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: >> I'm curious also what is your prevailing setting >> of TERM? > I use zsh, with a fairly customized setup. $TERM is "xterm-256color" > in the affected shell. (I have a feeling that this has something to do > with my amazing technicolor terminal.) Hmm. If you set it to plain "xterm", does the test pass? regards, tom lane
On Fri, Jan 3, 2020 at 6:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm. If you set it to plain "xterm", does the test pass? No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also didn't help. (This was based on possibly-relevant vars that "env" showed were set). -- Peter Geoghegan
On Fri, Jan 3, 2020 at 7:06 PM Peter Geoghegan <pg@bowt.ie> wrote: > No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also > didn't help. (This was based on possibly-relevant vars that "env" > showed were set). Removing the single check_completion() test from 010_tab_completion.pl that actually fails on my system ("offer multiple table choices") fixes the problem for me -- everything else passes. I suppose that this means that the problem is in "offer multiple table choices" specifically. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Jan 3, 2020 at 7:06 PM Peter Geoghegan <pg@bowt.ie> wrote: >> No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also >> didn't help. (This was based on possibly-relevant vars that "env" >> showed were set). Yeah, that's not terribly surprising, because if I'm reading those escape sequences correctly they're not about color. They seem to be just cursor movement and line clearing, according to [1]. What I'm mystified by is why your copy of libreadline is choosing to do that, rather than just space over to where the word should be printed which is what every other copy seems to be doing. I have a fresh new Debian installation at hand, with $ dpkg -l | grep readline ii libreadline-dev:amd64 7.0-5 amd64 GNU readline and history libraries,development files ii libreadline5:amd64 5.2+dfsg-3+b13 amd64 GNU readline and history libraries,run-time libraries ii libreadline7:amd64 7.0-5 amd64 GNU readline and history libraries,run-time libraries ii readline-common 7.0-5 all GNU readline and history libraries,common files and I'm not seeing the failure on it, either with TERM=xterm or with TERM=xterm-256color. So what's the missing ingredient? > Removing the single check_completion() test from 010_tab_completion.pl > that actually fails on my system ("offer multiple table choices") > fixes the problem for me -- everything else passes. > I suppose that this means that the problem is in "offer multiple table > choices" specifically. I'd hate to conclude that we can't test any completion behavior that involves offering a list. If we can't coerce libreadline into being less avant-garde in its screen management, I suppose we could write a regex to recognize xterm escape sequences and ignore those. But I'd be happier about this if I could reproduce the behavior. I don't like the feeling that there's something going on here that I don't understand. BTW, it seems somewhat likely that this is less about libreadline than about its dependency libtinfo. On my machine that's from ii libtinfo6:amd64 6.1+20181013-2+deb10u2 amd64 shared low-level terminfo libraryfor terminal handling what about yours? regards, tom lane [1] https://www.xfree86.org/current/ctlseqs.html
On Fri, Jan 3, 2020 at 9:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, it seems somewhat likely that this is less about libreadline > than about its dependency libtinfo. On my machine that's from > > ii libtinfo6:amd64 6.1+20181013-2+deb10u2 amd64 shared low-level terminfo libraryfor terminal handling This seems promising. By following the same ldd + dpkg -S workflow as before, I can see that my libtinfo is "libtinfo5:amd64". This libtinfo appears to be a Ubuntu-specific package: $ dpkg -l libtinfo5:amd64 Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description +++-============================-===================-===================-============================================================== ii libtinfo5:amd64 6.1-1ubuntu1.18.04 amd64 shared low-level terminfo library for terminal handling -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Jan 3, 2020 at 9:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, it seems somewhat likely that this is less about libreadline >> than about its dependency libtinfo. On my machine that's from >> ii libtinfo6:amd64 6.1+20181013-2+deb10u2 amd64 shared low-level terminfolibrary for terminal handling > This seems promising. By following the same ldd + dpkg -S workflow as > before, I can see that my libtinfo is "libtinfo5:amd64". Hmm. Usually this sort of software gets more weird in newer versions, not less so ;-). Still, it's a starting point. Attached is a blind attempt to fix this by allowing escape sequence(s) instead of spaces between the words. Does this work for you? regards, tom lane diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index 96221f8..7f1797c 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -38,6 +38,12 @@ $node->safe_psql('postgres', my $historyfile = "${TestLib::log_path}/010_psql_history.txt"; $ENV{PSQL_HISTORY} = $historyfile; +# Ensure that readline/libedit puts out xterm escapes, not something else. +$ENV{TERM} = 'xterm'; + +# regexp to match one xterm escape sequence (CSI style only, for now) +my $escseq = "(\e\\[[0-9;]*[A-Za-z])"; + # fire up an interactive psql session my $in = ''; my $out = ''; @@ -101,8 +107,12 @@ check_completion( "select \\* from my\a?tab", "complete my<tab> to mytab when there are multiple choices"); -# some versions of readline/libedit require two tabs here, some only need one -check_completion("\t\t", "mytab123 +mytab246", +# some versions of readline/libedit require two tabs here, some only need one. +# also, some might issue escape sequences to reposition the cursor, instead +# of just printing some spaces. +check_completion( + "\t\t", + "mytab$escseq*123( +|$escseq+)mytab$escseq*246", "offer multiple table choices"); check_completion("2\t", "246 ",
On Fri, Jan 3, 2020 at 10:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Attached is a blind attempt to fix this by allowing escape > sequence(s) instead of spaces between the words. Does this > work for you? I'm afraid not; no apparent change. No change in the "Actual output was" line, either. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Jan 3, 2020 at 10:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Attached is a blind attempt to fix this by allowing escape >> sequence(s) instead of spaces between the words. Does this >> work for you? > I'm afraid not; no apparent change. No change in the "Actual output > was" line, either. Meh. I must be too tired to get the regexp syntax right. Will try tomorrow. regards, tom lane
On Fri, Jan 3, 2020 at 10:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm. Usually this sort of software gets more weird in newer > versions, not less so ;-). Still, it's a starting point. In case I was unclear: I meant to suggest that this may have something to do with Ubuntu having patched the Debian package for who-knows-what reason. This is indicated by the fact that the Version string is "6.1-1ubuntu1.18.04", as opposed to a Debian style Version without the "ubuntu" (I believe that that's the convention they follow). -- Peter Geoghegan
I wrote: > Meh. I must be too tired to get the regexp syntax right. Looking closer, I see that your actual output included *both* spaces and escape sequences between the table names, so it needs to be more like the attached. Also, I apparently misread the control sequences. What they look like in the light of morning is \e[0m Character Attributes = Normal (no bold, color, etc) \e[K Erase in Line to Right So now I'm thinking again that there must be something about your colorized setup that triggers use of at least the first one. But why didn't clearing the relevant environment variables change anything? regards, tom lane diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index 96221f8..ed9e9e1 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -38,6 +38,12 @@ $node->safe_psql('postgres', my $historyfile = "${TestLib::log_path}/010_psql_history.txt"; $ENV{PSQL_HISTORY} = $historyfile; +# Ensure that readline/libedit puts out xterm escapes, not something else. +$ENV{TERM} = 'xterm'; + +# regexp to match one xterm escape sequence (CSI style only, for now) +my $escseq = "(\e\\[[0-9;]*[A-Za-z])"; + # fire up an interactive psql session my $in = ''; my $out = ''; @@ -101,8 +107,12 @@ check_completion( "select \\* from my\a?tab", "complete my<tab> to mytab when there are multiple choices"); -# some versions of readline/libedit require two tabs here, some only need one -check_completion("\t\t", "mytab123 +mytab246", +# some versions of readline/libedit require two tabs here, some only need one. +# also, some might issue escape sequences to reposition the cursor, clear the +# line, etc, instead of just printing some spaces. +check_completion( + "\t\t", + "mytab$escseq*123( |$escseq)+mytab$escseq*246", "offer multiple table choices"); check_completion("2\t", "246 ",
On Sat, Jan 4, 2020 at 10:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Looking closer, I see that your actual output included *both* > spaces and escape sequences between the table names, so it > needs to be more like the attached. This patch makes the tests pass. Thanks! > So now I'm thinking again that there must be something about > your colorized setup that triggers use of at least the first one. > But why didn't clearing the relevant environment variables > change anything? I don't know. It also failed with bash, which doesn't have any of that stuff. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Sat, Jan 4, 2020 at 10:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Looking closer, I see that your actual output included *both* >> spaces and escape sequences between the table names, so it >> needs to be more like the attached. > This patch makes the tests pass. Thanks! Ah, good. Pushed. >> So now I'm thinking again that there must be something about >> your colorized setup that triggers use of at least the first one. >> But why didn't clearing the relevant environment variables >> change anything? > I don't know. It also failed with bash, which doesn't have any of that stuff. I got another data point just now. I was experimenting with a proposed fix for the current libedit problem [1], so I needed to build libedit from source, and the path of least resistance was to do so on my RHEL6 workstation. I find that that conglomeration fails the current 010_tab_completion.pl tests, even with this patch, with symptoms like not ok 2 - complete SEL<tab> to SELECT # Failed test 'complete SEL<tab> to SELECT' # at t/010_tab_completion.pl line 91. # Actual output was "postgres=# SEL\a\r\e[15GECT " and similarly in a couple other tests. I know where the bell (\a) is coming from: there's a different logic bug in libedit that causes it to do el_beep() even when there's a unique completion. But why the unnecessary cursor repositioning? Looking closer, I realize that libedit on this software stack is depending on libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fa39a5e2000) which is from $ rpm -qf /lib64/libtinfo.so.5 ncurses-libs-5.7-4.20090207.el6.x86_64 Seeing that you're also having issues with a stack involving libtinfo.so.5, here's my theory: libtinfo version 5 is brain-dead about whether it needs to issue cursor repositioning commands, and tends to do so even when the cursor is in the right place already. Version 6 fixed that, which is why we're not seeing these escape sequences on any of the libedit-using buildfarm critters. So we're going to have to make some decisions about what range of libedit builds we want to cater for in the tab-completion tests. Given that the plan is to considerably increase the coverage of those tests, I'm hesitant to promise that we'll keep passing with anything that isn't explicitly tested in the buildfarm, or by active developers such as yourself. Maybe that's fine given that end users probably don't run the TAP tests. It would likely help if we could constrain the behavior to avoid variations such as colorization, which is why I'm concerned that we don't seem to have figured out how to turn that off in your setup. I think that bears more investigation, even though we've managed to work around it for the moment. regards, tom lane [1] http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54510
I wrote: > Seeing that you're also having issues with a stack involving > libtinfo.so.5, here's my theory: libtinfo version 5 is brain-dead > about whether it needs to issue cursor repositioning commands, and > tends to do so even when the cursor is in the right place already. > Version 6 fixed that, which is why we're not seeing these escape > sequences on any of the libedit-using buildfarm critters. Nope, the buildfarm just blew up that theory: Andres' critters are failing in the wake of fac1c04fe, with symptoms exactly like those of my franken-libedit build. So newer libtinfo doesn't fix it. What has to have broken those machines was the change to explicitly force TERM to "xterm". Now I'm wondering what their prevailing setting was before that. Maybe it was undef, or some absolutely vanilla thing that prevents libtinfo from thinking it can use any escape sequences at all. I'm going to go find out, because if we can use that behavior globally, it'd be a heck of a lot safer solution than the path of dealing with escape sequences explicitly. regards, tom lane
On Sun, Jan 5, 2020 at 6:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: > > Seeing that you're also having issues with a stack involving > > libtinfo.so.5, here's my theory: libtinfo version 5 is brain-dead > > about whether it needs to issue cursor repositioning commands, and > > tends to do so even when the cursor is in the right place already. > > Version 6 fixed that, which is why we're not seeing these escape > > sequences on any of the libedit-using buildfarm critters. > > Nope, the buildfarm just blew up that theory: Andres' critters are > failing in the wake of fac1c04fe, with symptoms exactly like those > of my franken-libedit build. So newer libtinfo doesn't fix it. > > What has to have broken those machines was the change to explicitly > force TERM to "xterm". Now I'm wondering what their prevailing > setting was before that. Maybe it was undef, or some absolutely > vanilla thing that prevents libtinfo from thinking it can use any > escape sequences at all. I'm going to go find out, because if we > can use that behavior globally, it'd be a heck of a lot safer > solution than the path of dealing with escape sequences explicitly. > You can see what settings it started with, although only certain values are whitelisted. See orig_env in the config. e.g. crake, which is now failing, has no TERM setting. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 4, 2020 at 11:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So we're going to have to make some decisions about what range of > libedit builds we want to cater for in the tab-completion tests. > Given that the plan is to considerably increase the coverage of > those tests, I'm hesitant to promise that we'll keep passing with > anything that isn't explicitly tested in the buildfarm, or by active > developers such as yourself. Maybe that's fine given that end users > probably don't run the TAP tests. This sounds similar to "EXTRA_TESTS=collate.linux.utf8". I think that it's fine to go that way, provided it isn't hard to work around. FWIW, I find it very surprising that it was possible for the test to fail on my workstation/server, without it failing on any buildfarm animals. -- Peter Geoghegan
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On Sun, Jan 5, 2020 at 6:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What has to have broken those machines was the change to explicitly >> force TERM to "xterm". Now I'm wondering what their prevailing >> setting was before that. Maybe it was undef, or some absolutely >> vanilla thing that prevents libtinfo from thinking it can use any >> escape sequences at all. I'm going to go find out, because if we >> can use that behavior globally, it'd be a heck of a lot safer >> solution than the path of dealing with escape sequences explicitly. > You can see what settings it started with, although only certain > values are whitelisted. See orig_env in the config. e.g. crake, which > is now failing, has no TERM setting. Ah, right. The one-off patch I just pushed also confirms that the TAP tests are seeing TERM-not-set on Andres' machines. I'm currently investigating the possibility of just unsetting it in the tab-completion test and then being able to revert all the escape-sequence hocus-pocus. It's looking promising in local testing, although Peter said off-list that it didn't seem to work in his setup. regards, tom lane
Peter Geoghegan <pg@bowt.ie> writes: > FWIW, I find it very surprising that it was possible for the test to > fail on my workstation/server, without it failing on any buildfarm > animals. Yeah, there is still something unexplained about that. We've so far failed to pin the blame on either readline version or environment settings ... but what else could be causing you to get different results? For the record, I'm currently running around and trying the attached (on top of latest HEAD, 60ab7c80b) on the various configurations I have here. Could you confirm that it works, or doesn't, in your environment --- and if it doesn't, what's the output? regards, tom lane diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index 1dc87b5..ebf603f 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -38,14 +38,12 @@ $node->safe_psql('postgres', my $historyfile = "${TestLib::log_path}/010_psql_history.txt"; $ENV{PSQL_HISTORY} = $historyfile; -# Debug investigation -note "TERM is set to '" . ($ENV{TERM} || "<undef>") . "'"; - -# Ensure that readline/libedit puts out xterm escapes, not something else. -$ENV{TERM} = 'xterm'; - -# regexp to match one xterm escape sequence (CSI style only, for now) -my $escseq = "(\e\\[[0-9;]*[A-Za-z])"; +# Unset $TERM so that readline/libedit won't use any terminal-dependent +# escape sequences; that leads to way too many cross-version variations +# in the output. +delete $ENV{TERM}; +# Some versions of readline inspect LS_COLORS, so for luck unset that too. +delete $ENV{LS_COLORS}; # fire up an interactive psql session my $in = ''; @@ -110,12 +108,8 @@ check_completion( "select \\* from my\a?tab", "complete my<tab> to mytab when there are multiple choices"); -# some versions of readline/libedit require two tabs here, some only need one. -# also, some might issue escape sequences to reposition the cursor, clear the -# line, etc, instead of just printing some spaces. -check_completion( - "\t\t", - "mytab$escseq*123( |$escseq)+mytab$escseq*246", +# some versions of readline/libedit require two tabs here, some only need one +check_completion("\t\t", "mytab123 +mytab246", "offer multiple table choices"); check_completion("2\t", "246 ",
On Sat, Jan 4, 2020 at 12:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, there is still something unexplained about that. We've so far > failed to pin the blame on either readline version or environment > settings ... but what else could be causing you to get different > results? Beats me. Could have something to do with the fact that my libtinfo5 is in fact a patched version -- "6.1-1ubuntu1.18.04". (Whatever that means.) > For the record, I'm currently running around and trying the attached > (on top of latest HEAD, 60ab7c80b) on the various configurations > I have here. Could you confirm that it works, or doesn't, in your > environment --- and if it doesn't, what's the output? This patch shows the same old failure as before: cd /code/postgresql/patch/build/../source/src/bin/psql && TESTDIR='/code/postgresql/patch/build/src/bin/psql' PATH="/code/postgresql/patch/build/tmp_install/code/postgresql/patch/install/bin:$PATH" LD_LIBRARY_PATH="/code/postgresql/patch/build/tmp_install/code/postgresql/patch/install/lib" PGPORT='65432' PG_REGRESS='/code/postgresql/patch/build/src/bin/psql/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/code/postgresql/patch/build/src/test/regress/regress.so' /usr/bin/prove -I /code/postgresql/patch/build/../source/src/test/perl/ -I /code/postgresql/patch/build/../source/src/bin/psql t/*.pl t/010_tab_completion.pl .. 8/? # Failed test 'offer multiple table choices' # at t/010_tab_completion.pl line 112. # Actual output was "\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from mytab\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from mytab" # # Looks like you failed 1 test of 12. t/010_tab_completion.pl .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/12 subtests Test Summary Report ------------------- t/010_tab_completion.pl (Wstat: 256 Tests: 12 Failed: 1) Failed test: 8 Non-zero exit status: 1 Files=1, Tests=12, 7 wallclock secs ( 0.01 usr 0.00 sys + 0.37 cusr 0.09 csys = 0.47 CPU) Result: FAIL Makefile:87: recipe for target 'check' failed make: *** [check] Error 1 -- Peter Geoghegan
On Sat, Jan 4, 2020 at 12:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, there is still something unexplained about that. We've so far > failed to pin the blame on either readline version or environment > settings ... but what else could be causing you to get different > results? Tom just advised me via private e-mail that my termcap database may be the issue here. Looks like he was right. I did a "sudo aptitude reinstall ncurses-base", and now the tests pass against HEAD. (I probably should have tried to preserve things before going ahead with that, but that didn't happen -- I had no reason to think that the system was affected by any kind of corruption.) I am very sorry for having wasted your time on this wild goose chase, Tom. The only explanation I can think of is that maybe it relates to my upgrading the OS in-place. That is, maybe my system was affected by a subtle, low probability bug due to a major version change in ncurses-base. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Tom just advised me via private e-mail that my termcap database may be > the issue here. Looks like he was right. I did a "sudo aptitude > reinstall ncurses-base", and now the tests pass against HEAD. (I > probably should have tried to preserve things before going ahead with > that, but that didn't happen -- I had no reason to think that the > system was affected by any kind of corruption.) Hm, well, HEAD still has the hackery with explicit accounting for escape sequences. Could you try it with the patch I showed to unset TERM and remove that stuff? (It won't apply exactly to HEAD, but the diffs are simple enough, or you could revert to 60ab7c80b first.) > I am very sorry for having wasted your time on this wild goose chase, Don't worry about it --- even if it was some upgrade glitch, there was no way to know that in advance. regards, tom lane
On Sat, Jan 4, 2020 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm, well, HEAD still has the hackery with explicit accounting for > escape sequences. Could you try it with the patch I showed to unset > TERM and remove that stuff? (It won't apply exactly to HEAD, but > the diffs are simple enough, or you could revert to 60ab7c80b first.) With the attached patch against HEAD (which is based on your earlier unset-TERM-in-tab-completion-test.patch), I find that the tests fail as follows: cd /code/postgresql/patch/build/../source/src/bin/psql && TESTDIR='/code/postgresql/patch/build/src/bin/psql' PATH="/code/postgresql/patch/build/tmp_install/code/postgresql/patch/install/bin:$PATH" LD_LIBRARY_PATH="/code/postgresql/patch/build/tmp_install/code/postgresql/patch/install/lib" PGPORT='65432' PG_REGRESS='/code/postgresql/patch/build/src/bin/psql/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/code/postgresql/patch/build/src/test/regress/regress.so' /usr/bin/prove -I /code/postgresql/patch/build/../source/src/test/perl/ -I /code/postgresql/patch/build/../source/src/bin/psql t/*.pl t/010_tab_completion.pl .. 8/? # Failed test 'offer multiple table choices' # at t/010_tab_completion.pl line 112. # Actual output was "\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from mytab\r\n\e[01;35mmytab\e[0m\e[K123\e[0m\e[K \e[01;35mmytab\e[0m\e[K246\e[0m\e[K \r\npostgres=# select * from mytab" # # Looks like you failed 1 test of 12. t/010_tab_completion.pl .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/12 subtests Test Summary Report ------------------- t/010_tab_completion.pl (Wstat: 256 Tests: 12 Failed: 1) Failed test: 8 Non-zero exit status: 1 Files=1, Tests=12, 7 wallclock secs ( 0.02 usr 0.00 sys + 0.36 cusr 0.12 csys = 0.50 CPU) Result: FAIL Makefile:87: recipe for target 'check' failed make: *** [check] Error 1 -- Peter Geoghegan
Attachment
Peter Geoghegan <pg@bowt.ie> writes: > With the attached patch against HEAD (which is based on your earlier > unset-TERM-in-tab-completion-test.patch), I find that the tests fail > as follows: Um, well, that's the same behavior you were seeing before. So the terminfo reinstall didn't really do anything. I'm still curious about which terminfo file your psql actually reads if TERM is unset, and whether that file is visibly different from the xterm-related files. regards, tom lane
On Sat, Jan 4, 2020 at 4:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Um, well, that's the same behavior you were seeing before. > So the terminfo reinstall didn't really do anything. Sigh. > I'm still curious about which terminfo file your psql actually > reads if TERM is unset, and whether that file is visibly > different from the xterm-related files. I've found the actual problem -- it's my ~/.inputrc. Which is read in by libreadline at some point (determined this using ltrace). Once I comment out the following two lines from ~/.inputrc, everything works fine on HEAD + HEAD-unset-TERM-in-tab-completion-test.patch: set colored-completion-prefix on set colored-stats on -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > I've found the actual problem -- it's my ~/.inputrc. Which is read in by > libreadline at some point (determined this using ltrace). Ah-hah! So what we really want here, I guess, is for the test script to suppress reading of ~/.inputrc, on the same principle that it suppresses reading of ~/.psqlrc. A quick look at the readline docs suggests that the best way to do that would be to set envar INPUTRC to /dev/null --- could you confirm that that works for you? > Once I comment out the following two lines from ~/.inputrc, everything > works fine on > HEAD + HEAD-unset-TERM-in-tab-completion-test.patch: > set colored-completion-prefix on > set colored-stats on Hm. I wonder how it is that that leads to ignoring the TERM environment? Still, it's just an academic point if we can suppress reading the file. regards, tom lane
I wrote: > Peter Geoghegan <pg@bowt.ie> writes: >> Once I comment out the following two lines from ~/.inputrc, everything >> works fine on >> HEAD + HEAD-unset-TERM-in-tab-completion-test.patch: >> set colored-completion-prefix on >> set colored-stats on > Hm. I wonder how it is that that leads to ignoring the TERM environment? A bit of digging says that readline's color support is just hard-wired to use xterm-style escapes -- it doesn't look like there's any connection to terminfo at all. See the _rl_color_indicator[] data structure. The \e[...m and \e[K escape sequences can both be found there. regards, tom lane
On Sat, Jan 4, 2020 at 5:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So what we really want here, I guess, is for the test script to suppress > reading of ~/.inputrc, on the same principle that it suppresses reading > of ~/.psqlrc. That was what I thought of, too. > A quick look at the readline docs suggests that the > best way to do that would be to set envar INPUTRC to /dev/null --- could > you confirm that that works for you? Yes -- "export INPUTRC=/dev/null" also makes it work for me (on HEAD + HEAD-unset-TERM-in-tab-completion-test.patch, but with my original/problematic ~/.inputrc). -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Sat, Jan 4, 2020 at 5:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A quick look at the readline docs suggests that the >> best way to do that would be to set envar INPUTRC to /dev/null --- could >> you confirm that that works for you? > Yes -- "export INPUTRC=/dev/null" also makes it work for me (on HEAD + > HEAD-unset-TERM-in-tab-completion-test.patch, but with my > original/problematic ~/.inputrc). Cool, I'll go commit a fix along those lines. Thanks for tracing this down! regards, tom lane
On Sat, Jan 4, 2020 at 6:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Cool, I'll go commit a fix along those lines. Thanks for tracing > this down! Glad to help! -- Peter Geoghegan
Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
From
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Cool, I'll go commit a fix along those lines. Thanks for tracing > this down! Here's one final style cleanup for the TAP test. - use like() for the banner test - pass the regexes around as qr// objects, so they can be syntax-highlighted properly, and don't need regex metacharacter-escaping backslashes doubled. - include the regex that didn't match in the diagnostic - ilmari -- "The surreality of the universe tends towards a maximum" -- Skud's Law "Never formulate a law or axiom that you're not prepared to live with the consequences of." -- Skud's Meta-Law
Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
From
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> Cool, I'll go commit a fix along those lines. Thanks for tracing >> this down! > > Here's one final style cleanup for the TAP test. > > - use like() for the banner test > - pass the regexes around as qr// objects, so they can be > syntax-highlighted properly, and don't need regex > metacharacter-escaping backslashes doubled. > - include the regex that didn't match in the diagnostic This time with the actual attachment... - ilmari -- - Twitter seems more influential [than blogs] in the 'gets reported in the mainstream press' sense at least. - Matt McLeod - That'd be because the content of a tweet is easier to condense down to a mainstream media article. - Calle Dybedahl From 0307bdea0f95e47e9ed7cf9678c12d568006d772 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Sun, 5 Jan 2020 13:20:10 +0000 Subject: [PATCH] Use qr// for passed-in regexes in tab-completion TAP test This lets editors syntax-highlight them as regexes, not just plain strings, and avoids having to double backslashes when escaping regex metacharacters like *. Also include the pattern that didn't match in the failure diagnostic, and use like() for the startup banner test. --- src/bin/psql/t/010_tab_completion.pl | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index 9cfd7ec79c..bff954de24 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -58,7 +58,7 @@ my $timer = timer(5); my $h = $node->interactive_psql('postgres', \$in, \$out, $timer); -ok($out =~ /psql/, "print startup banner"); +like($out, qr/psql/, "print startup banner"); # Simple test case: type something and see if psql responds as expected sub check_completion @@ -75,13 +75,14 @@ sub check_completion # send the data to be sent $in .= $send; # wait ... - pump $h until ($out =~ m/$pattern/ || $timer->is_expired); - my $okay = ($out =~ m/$pattern/ && !$timer->is_expired); + pump $h until ($out =~ $pattern || $timer->is_expired); + my $okay = ($out =~ $pattern && !$timer->is_expired); ok($okay, $annotation); # for debugging, log actual output if it didn't match local $Data::Dumper::Terse = 1; local $Data::Dumper::Useqq = 1; - diag 'Actual output was ' . Dumper($out) . "\n" if !$okay; + diag 'Actual output was ' . Dumper($out) . + "Did not match $pattern\n" if !$okay; return; } @@ -89,20 +90,20 @@ sub check_completion # (won't work if we are inside a string literal!) sub clear_query { - check_completion("\\r\n", "postgres=# ", "\\r works"); + check_completion("\\r\n", qr/postgres=# /, "\\r works"); return; } # check basic command completion: SEL<tab> produces SELECT<space> -check_completion("SEL\t", "SELECT ", "complete SEL<tab> to SELECT"); +check_completion("SEL\t", qr/SELECT /, "complete SEL<tab> to SELECT"); clear_query(); # check case variation is honored -check_completion("sel\t", "select ", "complete sel<tab> to select"); +check_completion("sel\t", qr/select /, "complete sel<tab> to select"); # check basic table name completion -check_completion("* from t\t", "\\* from tab1 ", "complete t<tab> to tab1"); +check_completion("* from t\t", qr/\* from tab1 /, "complete t<tab> to tab1"); clear_query(); @@ -110,14 +111,14 @@ clear_query(); # note: readline might print a bell before the completion check_completion( "select * from my\t", - "select \\* from my\a?tab", + qr/select \* from my\a?tab/, "complete my<tab> to mytab when there are multiple choices"); # some versions of readline/libedit require two tabs here, some only need one -check_completion("\t\t", "mytab123 +mytab246", +check_completion("\t\t", qr/mytab123 +mytab246/, "offer multiple table choices"); -check_completion("2\t", "246 ", +check_completion("2\t", qr/246 /, "finish completion of one of multiple table choices"); clear_query(); @@ -125,7 +126,7 @@ clear_query(); # check case-sensitive keyword replacement # note: various versions of readline/libedit handle backspacing # differently, so just check that the replacement comes out correctly -check_completion("\\DRD\t", "drds ", "complete \\DRD<tab> to \\drds"); +check_completion("\\DRD\t", qr/\\drds /, "complete \\DRD<tab> to \\drds"); clear_query(); -- 2.22.0
ilmari@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes: > Here's one final style cleanup for the TAP test. LGTM, pushed. One minor note: you wanted to change the \DRD test to +check_completion("\\DRD\t", qr/\\drds /, "complete \\DRD<tab> to \\drds"); but that doesn't work everywhere, unfortunately. On my machine what comes out is # Actual output was "\\DRD\b\b\bdrds " # Did not match "(?-xism:\\drds )" regards, tom lane
On 5 January 2020 16:38:36 GMT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >ilmari@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes: >> Here's one final style cleanup for the TAP test. > >LGTM, pushed. Thanks! >One minor note: you wanted to change the \DRD test to > >+check_completion("\\DRD\t", qr/\\drds /, "complete \\DRD<tab> to >\\drds"); > >but that doesn't work everywhere, unfortunately. On my machine >what comes out is > ># Actual output was "\\DRD\b\b\bdrds " ># Did not match "(?-xism:\\drds )" Sorry, that was something I left in for testing the diagnostic and forgot to remove before committing. > regards, tom lane - ilmari
I wrote: > Indeed. It appears that recent libedit breaks tab-completion for > words involving a backslash, which is the fault of this upstream > commit: > http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/filecomplete.c.diff?r1=1.52&r2=1.53 > Basically what that's doing is applying de-backslashing to EVERY > word that completion is attempted on, whether it might be a filename > or not. So what psql_complete sees in this test case is just "DRD" > which of course it does not recognize as a possible psql backslash > command. The current state of play on this is that I committed a hacky workaround [1], but there is now a fix for it in libedit upstream [2][3]. I gather from looking at Debian's package page that the fix could be expected to propagate to Debian unstable within a few weeks, at which point I'd like to revert the hack. The libedit bug's only been there a few months (it was evidently introduced on 2019-03-31) so we can hope that it hasn't propagated into any long-term-support distros. > I found out while investigating this that the libedit version shipping > with buster (3.1-20181209) is differently broken for the same case: > instead of inapproriate forced de-escaping of the input of the > application-specific completion function, it applies inapproriate > forced escaping to the output of said function, so that when we see > "\DRD" and return "\drds", what comes out to the user is "\\drds". There's little we can do about that one, but it doesn't affect the regression test as currently constituted. Another libedit bug that the regression test *is* running into is that some ancient versions fail to emit a trailing space after a successful completion. snapper is hitting that [4], and I believe locust would be if it were running the TAP tests. While we could work around that by removing the trailing spaces from the match regexps, I really don't wish to do so, because that destroys the test's ability to distinguish correct outputs from incorrect-but-longer ones. (That's not a killer problem for any of the current test cases, perhaps, but I think it will be in future.) So I'd like to define this problem as being out of scope. This bug was fixed eleven years ago upstream (see change in _rl_completion_append_character_function in [5]), so it seems reasonable to insist that people get a newer libedit or not run this test. Another issue visible in [4] is that ancient libedit fails to sort the offered completion strings as one would expect. I don't see much point in working around that either. prairiedog's host has that bug but not the space bug (see [6], from before I suppressed running the test on that machine), so it affects a larger range of libedit versions, but they're probably all way too old for anyone to care. If anyone can point to a new-enough-to-matter libedit that still behaves that way, we can reconsider. regards, tom lane [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ddd87d564508bb1c80aac0a4439cfe74a3c203a9 [2] http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54510 [3] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/filecomplete.c.diff?r1=1.63&r2=1.64 [4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2020-01-05%2013%3A01%3A46 [5] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/readline.c.diff?r1=1.75&r2=1.76 [6] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2020-01-02%2022%3A52%3A36
Re: Tom Lane 2020-01-05 <25771.1578249042@sss.pgh.pa.us> > The current state of play on this is that I committed a hacky workaround > [1], but there is now a fix for it in libedit upstream [2][3]. I gather > from looking at Debian's package page that the fix could be expected to > propagate to Debian unstable within a few weeks, at which point I'd like > to revert the hack. The libedit bug's only been there a few months > (it was evidently introduced on 2019-03-31) so we can hope that it hasn't > propagated into any long-term-support distros. [...] I lost track of what bug is supposed to be where, so here's a summary of the state at apt.postgresql.org: PG13 head work on Debian unstable, buster, stretch. Does not work on Ubuntu bionic, xenial. (Others not tested.) Ubuntu xenial: 07:24:42 # Failed test 'complete SEL<tab> to SELECT' 07:24:42 # at t/010_tab_completion.pl line 98. 07:24:42 # Actual output was "SEL\tpostgres=# SEL\a" 07:24:42 # Did not match "(?^:SELECT )" 07:24:48 07:24:48 # Failed test 'complete sel<tab> to select' 07:24:48 # at t/010_tab_completion.pl line 103. 07:24:48 # Actual output was "sel\b\b\bSELECT " 07:24:48 # Did not match "(?^:select )" 07:24:54 07:24:54 # Failed test 'complete t<tab> to tab1' 07:24:54 # at t/010_tab_completion.pl line 106. 07:24:54 # Actual output was "* from t " 07:24:54 # Did not match "(?^:\* from tab1 )" 07:25:00 07:25:00 # Failed test 'complete my<tab> to mytab when there are multiple choices' 07:25:00 # at t/010_tab_completion.pl line 112. 07:25:00 # Actual output was "select * from my " 07:25:00 # Did not match "(?^:select \* from my\a?tab)" 07:25:06 07:25:06 # Failed test 'offer multiple table choices' 07:25:06 # at t/010_tab_completion.pl line 118. 07:25:06 # Actual output was "\r\n\r\n\r\r\npostgres=# select * from my \r\n\r\n\r\r\npostgres=# select * from my " 07:25:06 # Did not match "(?^:mytab123 +mytab246)" 07:25:12 07:25:12 # Failed test 'finish completion of one of multiple table choices' 07:25:12 # at t/010_tab_completion.pl line 123. 07:25:12 # Actual output was "2 " 07:25:12 # Did not match "(?^:246 )" 07:25:18 07:25:18 # Failed test 'complete \DRD<tab> to \drds' 07:25:18 # at t/010_tab_completion.pl line 131. 07:25:18 # Actual output was "\\DRD\b\b\b\bselect " 07:25:18 # Did not match "(?^:drds )" 07:25:18 # Looks like you failed 7 tests of 12. 07:25:18 t/010_tab_completion.pl .. 07:25:18 Dubious, test returned 7 (wstat 1792, 0x700) 07:25:18 Failed 7/12 subtests Ubuntu bionic fails elsewhere: 07:19:51 t/001_stream_rep.pl .................. ok 07:19:53 t/002_archiving.pl ................... ok 07:19:59 t/003_recovery_targets.pl ............ ok 07:20:01 t/004_timeline_switch.pl ............. ok 07:20:08 t/005_replay_delay.pl ................ ok 07:20:10 Bailout called. Further testing stopped: system pg_ctl failed 07:20:10 FAILED--Further testing stopped: system pg_ctl failed 07:20:10 2020-01-06 06:19:41.285 UTC [26415] LOG: received fast shutdown request 07:20:10 2020-01-06 06:19:41.285 UTC [26415] LOG: aborting any active transactions 07:20:10 2020-01-06 06:19:41.287 UTC [26415] LOG: background worker "logical replication launcher" (PID 26424) exited withexit code 1 07:20:10 2020-01-06 06:19:41.287 UTC [26419] LOG: shutting down 07:20:10 2020-01-06 06:19:41.287 UTC [26419] LOG: checkpoint starting: shutdown immediate (It didn't get to the 010_tab_completion.pl test.) Libedit versions are: Debian: libedit2 | 3.1-20140620-2 | oldoldstable | amd64, armel, armhf, i386 (jessie) libedit2 | 3.1-20160903-3 | oldstable | amd64, arm64, armel, armhf, i386, mips, mips64el, m (stretch) libedit2 | 3.1-20181209-1 | stable | amd64, arm64, armel, armhf, i386, mips, mips64el, m (buster) libedit2 | 3.1-20191211-1 | testing | amd64, arm64, armel, armhf, i386, mips64el, mipsel, (bullseye) libedit2 | 3.1-20191231-1 | unstable | amd64, arm64, armel, armhf, i386, mips64el, mipsel, Ubuntu: libedit2 | 2.11-20080614-3ubuntu2 | precise | amd64, armel, armhf, i386, powerpc libedit2 | 3.1-20130712-2 | trusty | amd64, arm64, armhf, i386, powerpc, ppc64e libedit2 | 3.1-20150325-1ubuntu2 | xenial | amd64, arm64, armhf, i386, powerpc, ppc64e libedit2 | 3.1-20170329-1 | bionic | amd64, arm64, armhf, i386, ppc64el, s390x libedit2 | 3.1-20181209-1 | disco | amd64, arm64, armhf, i386, ppc64el, s390x libedit2 | 3.1-20190324-1 | eoan | amd64, arm64, armhf, i386, ppc64el, s390x libedit2 | 3.1-20191211-1 | focal | amd64, arm64, armhf, i386, ppc64el, s390x libedit2 | 3.1-20191231-1 | focal-proposed | amd64, arm64, armhf, i386, ppc64el, s390x Christoph
Christoph Berg <myon@debian.org> writes: > I lost track of what bug is supposed to be where, so here's a summary > of the state at apt.postgresql.org: > PG13 head work on Debian unstable, buster, stretch. Cool. > Does not work on Ubuntu bionic, xenial. (Others not tested.) Hmm ... do we care? The test output seems to show that xenial's 3.1-20150325-1ubuntu2 libedit is completely broken. Maybe there's a way to work around that, but it's not clear to me that that'd be a useful expenditure of time. You're not really going to be building PG13 for that release are you? > Ubuntu bionic fails elsewhere [ apparently in 006_logical_decoding.pl ] Hmm. Not related to this thread then. But if that's reproducible, somebody should have a look. Maybe related to https://www.postgresql.org/message-id/CAA4eK1LMDx6vK8Kdw8WUeW1MjToN2xVffL2kvtHvZg17%3DY6QQg%40mail.gmail.com ??? (cc'ing Amit for that) Meanwhile, as to the point I was really concerned about, your table of current versions looks promising -- libedit's premature-dequote bug is evidently only in unstable and the last stable branch, and I presume the last-stables are still getting updated, since those have libedit versions that are less than a month old. I looked at Fedora's git repo and the situation seems similar over there. regards, tom lane
Re: Tom Lane 2020-01-06 <3764.1578323719@sss.pgh.pa.us> > > Does not work on Ubuntu bionic, xenial. (Others not tested.) > > Hmm ... do we care? The test output seems to show that xenial's > 3.1-20150325-1ubuntu2 libedit is completely broken. Maybe there's > a way to work around that, but it's not clear to me that that'd > be a useful expenditure of time. You're not really going to be > building PG13 for that release are you? xenial (16.04) is a LTS release with support until 2021-04, and the current plan was to support it. I now realize that's semi-close to the 13 release date, but so far we have tried to really support all PG-Distro combinations. I could probably arrange for that test to be disabled when building for xenial, but it'd be nice if there were a configure switch or environment variable for it so we don't have to invent it. > > Ubuntu bionic fails elsewhere [ apparently in 006_logical_decoding.pl ] > > Hmm. Not related to this thread then. But if that's reproducible, It has been failing with the same output since Jan 2, 2020, noon. > somebody should have a look. Maybe related to > > https://www.postgresql.org/message-id/CAA4eK1LMDx6vK8Kdw8WUeW1MjToN2xVffL2kvtHvZg17%3DY6QQg%40mail.gmail.com The only git change in that build was d207038053837 "Fix running out of file descriptors for spill files." Christoph
Christoph Berg <myon@debian.org> writes: > Re: Tom Lane 2020-01-06 <3764.1578323719@sss.pgh.pa.us> >>> Does not work on Ubuntu bionic, xenial. (Others not tested.) >> Hmm ... do we care? The test output seems to show that xenial's >> 3.1-20150325-1ubuntu2 libedit is completely broken. Maybe there's >> a way to work around that, but it's not clear to me that that'd >> be a useful expenditure of time. You're not really going to be >> building PG13 for that release are you? > xenial (16.04) is a LTS release with support until 2021-04, and the > current plan was to support it. I now realize that's semi-close to the > 13 release date, but so far we have tried to really support all > PG-Distro combinations. I installed libedit_3.1-20150325.orig.tar.gz from source here, and it passes our current regression test and seems to behave just fine in light manual testing. (I did not apply any of the Debian-specific patches at [1], but they don't look like they'd explain much.) So I'm a bit at a loss as to what's going wrong for you. Is the test environment for Xenial the same as for the other branches? regards, tom lane [1] https://launchpad.net/ubuntu/+source/libedit/3.1-20150325-1ubuntu2
On Tue, Jan 7, 2020 at 12:43 AM Christoph Berg <myon@debian.org> wrote: > > Re: Tom Lane 2020-01-06 <3764.1578323719@sss.pgh.pa.us> > > > Does not work on Ubuntu bionic, xenial. (Others not tested.) > > > > Hmm ... do we care? The test output seems to show that xenial's > > 3.1-20150325-1ubuntu2 libedit is completely broken. Maybe there's > > a way to work around that, but it's not clear to me that that'd > > be a useful expenditure of time. You're not really going to be > > building PG13 for that release are you? > > xenial (16.04) is a LTS release with support until 2021-04, and the > current plan was to support it. I now realize that's semi-close to the > 13 release date, but so far we have tried to really support all > PG-Distro combinations. > > I could probably arrange for that test to be disabled when building > for xenial, but it'd be nice if there were a configure switch or > environment variable for it so we don't have to invent it. > > > > Ubuntu bionic fails elsewhere [ apparently in 006_logical_decoding.pl ] > > > > Hmm. Not related to this thread then. But if that's reproducible, > > It has been failing with the same output since Jan 2, 2020, noon. > > > somebody should have a look. Maybe related to > > > > https://www.postgresql.org/message-id/CAA4eK1LMDx6vK8Kdw8WUeW1MjToN2xVffL2kvtHvZg17%3DY6QQg%40mail.gmail.com > > The only git change in that build was d207038053837 "Fix running out > of file descriptors for spill files." > Thanks, for reporting. Is it possible to get a call stack as we are not able to reproduce this failure? Also, if you don't mind, let's discuss this on the thread provided by Tom. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
I wrote: > I installed libedit_3.1-20150325.orig.tar.gz from source here, and it > passes our current regression test and seems to behave just fine in > light manual testing. (I did not apply any of the Debian-specific > patches at [1], but they don't look like they'd explain much.) > So I'm a bit at a loss as to what's going wrong for you. Is the test > environment for Xenial the same as for the other branches? To dig deeper, I set up an actual installation of xenial, and on that I can replicate the tab-completion misbehavior you reported. The cause appears to be that libedit's rl_line_buffer does not contain the current line as expected, but the previous line (or, initially, an empty string). Thus, the hack I put in to make things pass on current libedit actually makes things worse on this version --- although it doesn't fully pass even if I revert ddd87d564, since there are other places where we depend on rl_line_buffer to be valid. So that raises the question: why does xenial's version of libedit not match either its documentation or the distributed source code? Because it doesn't. regards, tom lane
On Mon, Jan 6, 2020 at 4:26 PM Christoph Berg <myon@debian.org> wrote: > > Re: Tom Lane 2020-01-05 <25771.1578249042@sss.pgh.pa.us> > > Ubuntu bionic fails elsewhere: > > 07:19:51 t/001_stream_rep.pl .................. ok > 07:19:53 t/002_archiving.pl ................... ok > 07:19:59 t/003_recovery_targets.pl ............ ok > 07:20:01 t/004_timeline_switch.pl ............. ok > 07:20:08 t/005_replay_delay.pl ................ ok > 07:20:10 Bailout called. Further testing stopped: system pg_ctl failed > 07:20:10 FAILED--Further testing stopped: system pg_ctl failed > > 07:20:10 2020-01-06 06:19:41.285 UTC [26415] LOG: received fast shutdown request > 07:20:10 2020-01-06 06:19:41.285 UTC [26415] LOG: aborting any active transactions > 07:20:10 2020-01-06 06:19:41.287 UTC [26415] LOG: background worker "logical replication launcher" (PID 26424) exitedwith exit code 1 > 07:20:10 2020-01-06 06:19:41.287 UTC [26419] LOG: shutting down > 07:20:10 2020-01-06 06:19:41.287 UTC [26419] LOG: checkpoint starting: shutdown immediate > It looks like this failure is more of what we are getting on "sidewinder" where it failed because of "insufficient file descriptors available to start server process". Can you check in the log (probably in 006_logical_decoding_master.log) if this is the same you are getting or something else. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Amit Kapila 2020-01-08 <CAA4eK1KtQDU=RTAPi6id6Nvv1HkKrpDp_ZZ_uHL1XQ0cjT2_eg@mail.gmail.com> > It looks like this failure is more of what we are getting on > "sidewinder" where it failed because of "insufficient file descriptors > available to start server process". Can you check in the log > (probably in 006_logical_decoding_master.log) if this is the same you > are getting or something else. I can't reproduce it locally now, but it's still failing on the buildd. Is there some setting to make it print the relevant .log files? (Fwiw, I can't see your error message in https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-01-07%2022%3A45%3A24) Christoph
Christoph Berg <myon@debian.org> writes: > (Fwiw, I can't see your error message in > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-01-07%2022%3A45%3A24) sidewinder is currently broken due to an unrelated problem. The case Amit is worried about is only manifesting on the back branches, eg here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-01-02%2018%3A45%3A25 where we're getting postmaster start failures like this one: 2020-01-02 19:51:05.685 CET [24138:1] LOG: starting PostgreSQL 12.1 on x86_64-unknown-netbsd7.0, compiled by gcc (nb2 20150115)4.8.4, 64-bit 2020-01-02 19:51:05.686 CET [24138:2] LOG: listening on Unix socket "/tmp/sxAcn7SAzt/.s.PGSQL.56110" 2020-01-02 19:51:05.687 CET [24138:3] FATAL: insufficient file descriptors available to start server process 2020-01-02 19:51:05.687 CET [24138:4] DETAIL: System allows 19, we need at least 20. 2020-01-02 19:51:05.687 CET [24138:5] LOG: database system is shut down This would happen if anything is causing the postmaster to have a few more open files than the test added by commit d207038053837ae9365df2776371632387f6f655 is allowing for. It's a test bug and nothing more. Why sidewinder is not showing this in HEAD too is an interesting question, but it isn't. However, it could be that on another platform (ie bionic) the problem does manifest in HEAD. regards, tom lane
I wrote: > This would happen if anything is causing the postmaster to have > a few more open files than the test added by commit > d207038053837ae9365df2776371632387f6f655 is allowing for. It's > a test bug and nothing more. > Why sidewinder is not showing this in HEAD too is an interesting > question, but it isn't. However, it could be that on another > platform (ie bionic) the problem does manifest in HEAD. I set up a NetBSD 7 installation locally, and while I have not directly reproduced the failure, I believe I understand all the components of it now. (1) d20703805's test will clearly fall over if there are more than six FDs open in the postmaster when set_max_safe_fds is called, because it sets max_files_per_process = 26 while set_max_safe_fds requires at least 20 usable FDs to be available. (2) The postmaster's stdin/stdout/stderr will surely eat up three of those. (3) In HEAD, that's actually all the FDs there are normally, but in the back branches there is one more (under the conditions of this test), because in the back branches we open the postmaster's listen sockets before we run set_max_safe_fds. (9a86f03b4 changed this.) (4) NetBSD 7.0's cron leaves three extra open FDs in processes that it spawns. I have not looked into why, but I have experimentally observed this. For example, lsof on a "sleep" launched from cron shows COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME sleep 7824 tgl cwd VDIR 0,0 512 795201 /home/tgl sleep 7824 tgl txt VREG 0,0 10431 1613152 /bin/sleep sleep 7824 tgl txt VREG 0,0 1616564 22726 /lib/libc.so.12.193.1 sleep 7824 tgl txt VREG 0,0 55295 22747 /lib/libgcc_s.so.1.0 sleep 7824 tgl txt VREG 0,0 187183 22762 /lib/libm.so.0.11 sleep 7824 tgl txt VREG 0,0 92195 1499524 /libexec/ld.elf_so sleep 7824 tgl 0r PIPE 0xfffffe803131eb58 16384 sleep 7824 tgl 1w PIPE 0xfffffe8007ec4a30 0 ->0xfffffe800cc0d2c0 sleep 7824 tgl 2w PIPE 0xfffffe8007ec4a30 0 ->0xfffffe800cc0d2c0 sleep 7824 tgl 7u unknown file system type: 0 sleep 7824 tgl 8u unknown file system type: 0 sleep 7824 tgl 9w PIPE 0xfffffe80036c4dc0 0 while of course "sleep" launched by hand has only 0/1/2 open. We may conclude that when the regression tests are launched from cron, as would be typical for a buildfarm animal, HEAD has exactly zero FDs to spare in this test, while the back branches are one FD underwater and fail. This matches the observed results from sidewinder. It's not clear whether any of this info applies to Christoph's trouble with bionic. If the extra FDs are an old cron bug, it could be that bionic shares that bug --- but to explain failure on HEAD, you'd have to posit four excess FDs not three. I'm not convinced that what Christoph is seeing matches this anyway; he hasn't showed the telltale "insufficient file descriptors" message, at least. Still, maybe launched-by-cron vs launched-by-hand is a relevant point there. regards, tom lane
Hi, On 2020-01-08 17:31:06 -0500, Tom Lane wrote: > (1) d20703805's test will clearly fall over if there are more than six > FDs open in the postmaster when set_max_safe_fds is called, because it > sets max_files_per_process = 26 while set_max_safe_fds requires at > least 20 usable FDs to be available. > (4) NetBSD 7.0's cron leaves three extra open FDs in processes that > it spawns. I have not looked into why, but I have experimentally > observed this. For example, lsof on a "sleep" launched from cron > shows > > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > sleep 7824 tgl cwd VDIR 0,0 512 795201 /home/tgl > sleep 7824 tgl txt VREG 0,0 10431 1613152 /bin/sleep > sleep 7824 tgl txt VREG 0,0 1616564 22726 /lib/libc.so.12.193.1 > sleep 7824 tgl txt VREG 0,0 55295 22747 /lib/libgcc_s.so.1.0 > sleep 7824 tgl txt VREG 0,0 187183 22762 /lib/libm.so.0.11 > sleep 7824 tgl txt VREG 0,0 92195 1499524 /libexec/ld.elf_so > sleep 7824 tgl 0r PIPE 0xfffffe803131eb58 16384 > sleep 7824 tgl 1w PIPE 0xfffffe8007ec4a30 0 ->0xfffffe800cc0d2c0 > sleep 7824 tgl 2w PIPE 0xfffffe8007ec4a30 0 ->0xfffffe800cc0d2c0 > sleep 7824 tgl 7u unknown file system type: 0 > sleep 7824 tgl 8u unknown file system type: 0 > sleep 7824 tgl 9w PIPE 0xfffffe80036c4dc0 0 > > while of course "sleep" launched by hand has only 0/1/2 open. Is it worth having the test close superflous FDs? It'd not be hard to do so via brute force (or even going through /proc/self/fd). Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Is it worth having the test close superflous FDs? It'd not be hard to do > so via brute force (or even going through /proc/self/fd). No, it isn't, because d20703805's test is broken by design. There are any number of reasons why there might be more than three-or-so FDs open during postmaster start. Here are a few: * It seems pretty likely that at least one of those FDs is intentionally being left open by cron so it can detect death of all child processes (like our postmaster death pipe). Forcibly closing them will not necessarily have nice results. Other execution environments might do similar tricks. * On platforms where semaphores eat a FD apiece, we intentionally open those before counting free FDs. * We run process_shared_preload_libraries before counting free FDs, too. If a loaded library intentionally leaves a FD open in the postmaster, counting that against the limit also seems like a good idea. My opinion is still that we should just get rid of that test case. The odds of it ever finding anything interesting seem too low to justify the work that will be involved in (a) getting it to work reliably today and then (b) keeping it working. Running on the hairy edge of not having enough FDs doesn't seem like a use-case that we want to spend a lot of time on, but we will be if that test stays as it is. Example: if max_safe_fds is only 10, as this test is trying to make it be, then maxAllocatedDescs won't be allowed to exceed 5. Do you want to bet that no code paths will reasonably exceed that limit? [1] What will we do about it when we find one that does? I also note that the test can only catch cases where we used OpenTransientFile() in an inappropriate way. I think it's at least as likely that somebody would carelessly use open() directly, and then we have no chance of catching it till the kernel complains about EMFILE. Thinking about that some more, maybe the appropriate thing to do is not to mess with max_files_per_process as such, but to test with some artificial limit on maxAllocatedDescs. We still won't catch direct use of open(), but that could test for misuse of OpenTransientFile() with a lot less environment dependence. regards, tom lane [1] Although I notice that the code coverage report shows we never reach the enlargement step in reserveAllocatedDesc(), which means that the set of tests we run today don't exercise any such path. I'm somewhat surprised to see that we *do* seem to exercise overrunning max_safe_fds, though, since ReleaseLruFile() is reached. Maybe this test case is responsible for that?
On Thu, Jan 9, 2020 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > > Is it worth having the test close superflous FDs? It'd not be hard to do > > so via brute force (or even going through /proc/self/fd). > > No, it isn't, because d20703805's test is broken by design. There > are any number of reasons why there might be more than three-or-so > FDs open during postmaster start. Here are a few: > > * It seems pretty likely that at least one of those FDs is > intentionally being left open by cron so it can detect death of > all child processes (like our postmaster death pipe). Forcibly > closing them will not necessarily have nice results. Other > execution environments might do similar tricks. > > * On platforms where semaphores eat a FD apiece, we intentionally > open those before counting free FDs. > > * We run process_shared_preload_libraries before counting free FDs, > too. If a loaded library intentionally leaves a FD open in the > postmaster, counting that against the limit also seems like a good > idea. > > My opinion is still that we should just get rid of that test case. > The point is that we know what is going wrong on sidewinder on back branches. However, we still don't know what is going wrong with tern and mandrill on v10 [1][2] where the log is: 2020-01-08 06:38:10.842 UTC [54001846:9] t/006_logical_decoding.pl STATEMENT: SELECT data from pg_logical_slot_get_changes('test_slot', NULL, NULL) WHERE data LIKE '%INSERT%' ORDER BY lsn LIMIT 1; 2020-01-08 06:38:15.993 UTC [63898020:3] LOG: server process (PID 54001846) was terminated by signal 11 2020-01-08 06:38:15.993 UTC [63898020:4] DETAIL: Failed process was running: SELECT data from pg_logical_slot_get_changes('test_slot', NULL, NULL) WHERE data LIKE '%INSERT%' ORDER BY lsn LIMIT 1; 2020-01-08 06:38:15.993 UTC [63898020:5] LOG: terminating any other active server processes Noah has tried to reproduce it [3] on that buildfarm machine by running that test in a loop, but he couldn't reproduce it till now. He is running the test now for a longer duration. Another point is that the logic in v11 code is the same, but the same test is passing on those machines, so I have a slight suspicion that there might be some other problem in v10 which is uncovered by this test, but I am not sure on this point. Now, if we remove that test as per your suggestion, then we might not be able to find out what is going wrong on those machines in v10? [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2020-01-08%2004%3A36%3A27 [2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-01-08%2004%3A36%3A27 [3] - https://www.postgresql.org/message-id/20200104185148.GA2270238%40rfd.leadboat.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 7, 2020 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So that raises the question: why does xenial's version of libedit > not match either its documentation or the distributed source code? > Because it doesn't. The level of effort you've put into this is extremely impressive, but I can't shake the feeling that you're going to keep finding issues in the test setup, the operating system, or the upstream libraries, rather than bugs in PostgreSQL. Maybe this is all just one-time stabilization effort, but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Robert Haas 2020-01-09 <CA+TgmoZC_6z3d5+6n1GVfy898uoBHMPPJU4-nAQLG5ZkaEyqdQ@mail.gmail.com> > > So that raises the question: why does xenial's version of libedit > > not match either its documentation or the distributed source code? > > Because it doesn't. > > The level of effort you've put into this is extremely impressive, but > I can't shake the feeling that you're going to keep finding issues in > the test setup, the operating system, or the upstream libraries, > rather than bugs in PostgreSQL. Maybe this is all just one-time > stabilization effort, but... Fwiw if libedit in xenial is Just Broken and fixing the tests would just codify the brokenness without adding any value, I'll just disable that test in that particular build. It looks like setting with_readline=no on the make command line should work. Christoph
Robert Haas <robertmhaas@gmail.com> writes: > The level of effort you've put into this is extremely impressive, but > I can't shake the feeling that you're going to keep finding issues in > the test setup, the operating system, or the upstream libraries, > rather than bugs in PostgreSQL. Maybe this is all just one-time > stabilization effort, but... Yeah, this is pretty nearly what I feared would happen, given libedit's apparently-well-earned evil reputation. I suspect that the filename completion tests I posted over in that thread may expose another round of issues. If we can get past that, I think that expanding our test coverage for the rest of tab-complete.c should be relatively unproblematic, because there aren't very many other readline behaviors that tab-complete.c is depending on. One alternative is to give up on the idea of ever having any test coverage for tab-complete.c, which doesn't seem very desirable. Or we could disable the tests when using libedit, or provide some switch to make it easier for the user/packager to do so. In Debian's case, I suspect that they don't care about the behavior when using libedit, so skipping the test would be just fine for them. What they really ought to test is what happens after they sub in libreadline at runtime ... but I imagine that their dubious legal theories about all this would prevent them from actually doing that during the package build. regards, tom lane PS: Stepping back from the immediate problem, it's obviously better for all concerned if libedit is usable with Postgres. So if they're willing to patch problems, which we've found out they are, then coping with this stuff is a win in the long run.
Christoph Berg <myon@debian.org> writes: > Fwiw if libedit in xenial is Just Broken and fixing the tests would > just codify the brokenness without adding any value, I'll just disable > that test in that particular build. It looks like setting > with_readline=no on the make command line should work. That would disable psql's tab completion, command editing, and history altogether, which I doubt is what you want for production builds. If we conclude we can't work around the testing issues for ancient libedit, probably the right answer is to provide a switch to disable just the test. I've been trying to dance around that conclusion, but maybe we should just do it and move on. regards, tom lane
On 2020-01-09 16:50, Tom Lane wrote: > Christoph Berg <myon@debian.org> writes: >> Fwiw if libedit in xenial is Just Broken and fixing the tests would >> just codify the brokenness without adding any value, I'll just disable >> that test in that particular build. It looks like setting >> with_readline=no on the make command line should work. > > That would disable psql's tab completion, command editing, and history > altogether, which I doubt is what you want for production builds. > If we conclude we can't work around the testing issues for ancient > libedit, probably the right answer is to provide a switch to > disable just the test. I've been trying to dance around that > conclusion, but maybe we should just do it and move on. I think he means something like make check with_readline=no not for the actual build. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2020-01-09 16:50, Tom Lane wrote: >> That would disable psql's tab completion, command editing, and history >> altogether, which I doubt is what you want for production builds. >> If we conclude we can't work around the testing issues for ancient >> libedit, probably the right answer is to provide a switch to >> disable just the test. I've been trying to dance around that >> conclusion, but maybe we should just do it and move on. > I think he means something like > make check with_readline=no > not for the actual build. Oh, I see. I'd rather not codify that though, because it risks problems if that symbol ever gets used any other way. I was thinking of making the test script check for some independent environment variable, say SKIP_READLINE_TESTS. regards, tom lane
I wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> I think he means something like >> make check with_readline=no >> not for the actual build. > Oh, I see. I'd rather not codify that though, because it risks > problems if that symbol ever gets used any other way. I was > thinking of making the test script check for some independent > environment variable, say SKIP_READLINE_TESTS. I thought of another problem with the with_readline=no method, which is that it requires the user to be issuing "make check" directly; it wouldn't be convenient for a buildfarm owner, say. (*Perhaps* it'd work to set with_readline=no throughout a buildfarm run, but I think that's asking for trouble with the build part.) I pushed a patch using SKIP_READLINE_TESTS. Christoph should be able to set that for the Ubuntu branches where the test is failing. regards, tom lane
Re: Tom Lane 2020-01-09 <16328.1578606690@sss.pgh.pa.us> > build part.) I pushed a patch using SKIP_READLINE_TESTS. > Christoph should be able to set that for the Ubuntu branches > where the test is failing. That "fixed" the problem on xenial, thanks. Christoph
Re: Amit Kapila 2020-01-09 <CAA4eK1Kq13BS2z1xOfTDcuSwzga+0NPzSWXgw3ygxX9tRk1rEg@mail.gmail.com> > The point is that we know what is going wrong on sidewinder on back > branches. However, we still don't know what is going wrong with tern > and mandrill on v10 [1][2] where the log is: Fwiw, the problem on bionic disappeared yesterday with the build triggered by "Revert test added by commit d207038053". Christoph
On Thu, Jan 16, 2020 at 2:10 AM Christoph Berg <myon@debian.org> wrote: > > Re: Amit Kapila 2020-01-09 <CAA4eK1Kq13BS2z1xOfTDcuSwzga+0NPzSWXgw3ygxX9tRk1rEg@mail.gmail.com> > > The point is that we know what is going wrong on sidewinder on back > > branches. However, we still don't know what is going wrong with tern > > and mandrill on v10 [1][2] where the log is: > > Fwiw, the problem on bionic disappeared yesterday with the build > triggered by "Revert test added by commit d207038053". > Thanks for the update. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com