Thread: pgsql: Refactor all TAP test suites doing connection checks
Refactor all TAP test suites doing connection checks This commit refactors more TAP tests to adapt with the recent introduction of connect_ok() and connect_fails() in PostgresNode, introduced by 0d1a3343. This changes the following test suites to use the same code paths for connection checks: - Kerberos - LDAP - SSL - Authentication Those routines are extended to be able to handle optional parameters that are set depending on each suite's needs, as of: - custom SQL query. - expected stderr matching pattern. - expected stdout matching pattern. The new design is extensible with more parameters, and there are some plans for those routines in the future with checks based on the contents of the backend logs. Author: Jacob Champion, Michael Paquier Discussion: https://postgr.es/m/d17b919e27474abfa55d97786cb9cfadfe2b59e9.camel@vmware.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/c50624cdd248c13b4ba199f95e24c88d2cc8a097 Modified Files -------------- src/test/authentication/t/001_password.pl | 19 ++-- src/test/authentication/t/002_saslprep.pl | 18 ++-- src/test/kerberos/t/001_auth.pl | 51 +++++------ src/test/ldap/t/001_auth.pl | 17 ++-- src/test/perl/PostgresNode.pm | 71 ++++++++++++--- src/test/ssl/t/001_ssltests.pl | 143 +++++++++++++++++------------- src/test/ssl/t/002_scram.pl | 22 +++-- 7 files changed, 211 insertions(+), 130 deletions(-)
Michael Paquier <michael@paquier.xyz> writes: > Refactor all TAP test suites doing connection checks Some of the buildfarm seems not to like this. gaur failed here: ==~_~===-=-===~_~== pgsql.build/src/test/authentication/tmp_check/log/regress_log_001_password ==~_~===-=-===~_~== ... ok 8 - authentication success for method md5, role md5_role ### Reloading node "primary" # Running: pg_ctl -D /home/bfarm/bf-data/HEAD/pgsql.build/src/test/authentication/tmp_check/t_001_password_primary_data/pgdatareload server signaled ack Broken pipe: write( 11, 'SELECT $$connected with user=scram_role$$' ) at /opt/perl5.8.9/lib/site_perl/5.8.9/IPC/Run/IO.pmline 558 and I think hoverfly's failure is the same thing. I didn't look at the code, but I bet this is similar to the race conditions we've seen before where a server process may exit before the perl test script finishes stuffing a command down the pipe. regards, tom lane
On Mon, Apr 05, 2021 at 10:57:22AM -0400, Tom Lane wrote: > Some of the buildfarm seems not to like this. gaur failed here: Thanks. I did not notice these. > I didn't look at the code, but I bet this is similar to the race > conditions we've seen before where a server process may exit before > the perl test script finishes stuffing a command down the pipe. Hmm. Those failures refer to the 5th and 8th tests of 001_password.pl where connect_ok() is used. The only logical differences compared to the previous code are: - The addition of on_error_stop in connect_ok() to ignore any errors - The fact that this now defines a SQL query (was undef previously). - $Test::Builder::Level So, the problem is with on_error_stop. That's not necessary anyway, but I am a bit surprised that this would be an issue. I'll apply that to see if it fixes the issue. If it does not, a revert would be better at this stage. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Apr 05, 2021 at 10:57:22AM -0400, Tom Lane wrote: >> I didn't look at the code, but I bet this is similar to the race >> conditions we've seen before where a server process may exit before >> the perl test script finishes stuffing a command down the pipe. > Hmm. Those failures refer to the 5th and 8th tests of > 001_password.pl where connect_ok() is used. The only logical > differences compared to the previous code are: > - The addition of on_error_stop in connect_ok() to ignore any errors > - The fact that this now defines a SQL query (was undef previously). > - $Test::Builder::Level > So, the problem is with on_error_stop. No, I bet it is with "now defines a SQL query". That means there's something to be stuffed down the pipe. I find I can reproduce this pretty reliably on slower machines here, if you want me to dig into it more carefully. regards, tom lane
On Mon, Apr 05, 2021 at 06:57:47PM -0400, Tom Lane wrote: > No, I bet it is with "now defines a SQL query". That means there's > something to be stuffed down the pipe. > > I find I can reproduce this pretty reliably on slower machines > here, if you want me to dig into it more carefully. Now that I think about it, another thing that has changed in this code is the fact that it calls PostgresNode::psql with a result array rather than just the error result, still I don't see how this would have an impact? If you are right, it would be easy to bypass this error with a change in the connect routines to be able to pass down undefined queries rather than filling in a default one. But that does not feel like the root of it. -- Michael
Attachment
I wrote: > Michael Paquier <michael@paquier.xyz> writes: >> So, the problem is with on_error_stop. > No, I bet it is with "now defines a SQL query". That means there's > something to be stuffed down the pipe. I can confirm that removing the on_error_stop parameter as you propose does *not* fix it. regards, tom lane
What does seem to fix it reliably is the attached. regards, tom lane diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index bbde34c..07a9167 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -1938,7 +1938,7 @@ sub connect_fails # have set up things properly, and this should not block. my ($ret, $stdout, $stderr) = $self->psql( 'postgres', - "SELECT \$\$connected with $connstr\$\$", + undef, # don't try to send a query extra_params => ['-w'], connstr => "$connstr");
On Mon, Apr 05, 2021 at 07:23:43PM -0400, Tom Lane wrote: > What does seem to fix it reliably is the attached. Thanks for confirming. I am still surprised that this has not popped up as an issue until now. Could that be because those animals have never run the LDAP, SSL or kerberos test suites over the years? The attached would address this issue, and I am going to apply it for the buildfarm. LDAP and kerberos still use some queries, while the SSL suite loses that, but as long as there is log_connections in the configuration we'd still know about the connection attempts, so I guess that's fine. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Apr 05, 2021 at 07:23:43PM -0400, Tom Lane wrote: >> What does seem to fix it reliably is the attached. > Thanks for confirming. I am still surprised that this has not popped > up as an issue until now. Could that be because those animals have > never run the LDAP, SSL or kerberos test suites over the years? Dunno. Certainly most animals that enable TAP at all should be running the "authentication" test, but those others all need to be manually enabled, I believe. My own animals run the SSL test but not LDAP or kerberos. I'm fairly sure that we've previously fixed this exact same issue in some tests that expected a connection failure ... regards, tom lane
On Mon, Apr 05, 2021 at 08:15:31PM -0400, Tom Lane wrote: > Dunno. Certainly most animals that enable TAP at all should be > running the "authentication" test, but those others all need to > be manually enabled, I believe. My own animals run the SSL test > but not LDAP or kerberos. The SSL tests have always sent down a query even for scenarios where the connection would fail, but they have used command_fails_like() rather than PostgresNode::psql. The puzzle does not seem complete. > I'm fairly sure that we've previously fixed this exact same issue > in some tests that expected a connection failure ... Yeah. Looking at the git history, that would be c757a3da. test_access() in the kerberos tests sends a SELECT true even for expected failures, so it seems to me that it would fail with the same symptoms on those machines. That looks worth fixing and backpatching. And I need more caffeine this morning.. I missed your point that a connection failure should pass down an undefined query string per that. Well, the fix is simple then now that there is a single code path for all those connection attempts. -- Michael
Attachment
On Tue, Apr 6, 2021 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Mon, Apr 05, 2021 at 07:23:43PM -0400, Tom Lane wrote: > >> What does seem to fix it reliably is the attached. > > > Thanks for confirming. I am still surprised that this has not popped > > up as an issue until now. Could that be because those animals have > > never run the LDAP, SSL or kerberos test suites over the years? > > Dunno. Certainly most animals that enable TAP at all should be > running the "authentication" test, but those others all need to > be manually enabled, I believe. My own animals run the SSL test > but not LDAP or kerberos. > > I'm fairly sure that we've previously fixed this exact same issue > in some tests that expected a connection failure ... Commit f44b9b62 springs to mind.
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Apr 05, 2021 at 08:15:31PM -0400, Tom Lane wrote: >> Dunno. Certainly most animals that enable TAP at all should be >> running the "authentication" test, but those others all need to >> be manually enabled, I believe. My own animals run the SSL test >> but not LDAP or kerberos. > The SSL tests have always sent down a query even for scenarios where > the connection would fail, but they have used command_fails_like() > rather than PostgresNode::psql. The puzzle does not seem complete. Well, since it's a race condition it's obviously going to be sensitive to machine speed. I think that Perl version, or more likely version of IPC::Run or a related module, may enter into it too. prairiedog has gotten through the tests several times since yesterday without failing, yet it's significantly slower than the other PPC Mac I was able to consistently reproduce the problem on. I don't think those two have the same vintage of Perl installation, though, so Perl version seems like a likely explanation. regards, tom lane
On Tue, Apr 06, 2021 at 12:30:48PM +1200, Thomas Munro wrote: > Commit f44b9b62 springs to mind. Right. The kerberos suite does not do that though, so wouldn't it be better to use the same trick with -c? For example, the attached for ~13 does so. -- Michael
Attachment
I wrote: > Well, since it's a race condition it's obviously going to be sensitive to > machine speed. I think that Perl version, or more likely version of > IPC::Run or a related module, may enter into it too. prairiedog has > gotten through the tests several times since yesterday without failing, > yet it's significantly slower than the other PPC Mac I was able to > consistently reproduce the problem on. I don't think those two have > the same vintage of Perl installation, though, so Perl version seems > like a likely explanation. Just for the archives' sake: prairiedog (passed 3 out of 3 tries): 450MHz PPC32 Perl 5.8.3 $ perl -MIPC::Run -e 'print $IPC::Run::VERSION ."\n";' 0.79 (file dates on the IPC::Run files are 2004) gaur (failed 1 out of 1 tries): 360MHz HPPA Perl 5.8.9 IPC::Run 0.94 (file dates 2014) my other PPC Mac (failed consistently, 10 out of 10): 1.33GHz PPC32 Perl 5.8.8 IPC::Run 20180523.0 Not sure what to make of that, though a regression in IPC::Run seems possible. It's also possible that buildfarm script (the first two) versus manual invocation (the last) is related somehow. regards, tom lane
On Tue, Apr 06, 2021 at 10:37:53AM +0900, Michael Paquier wrote: > Right. The kerberos suite does not do that though, so wouldn't it be > better to use the same trick with -c? For example, the attached for > ~13 does so. This was the only problem after one extra lookup, so fixed in 11~13. -- Michael