Thread: pgsql: Refactor all TAP test suites doing connection checks

pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
Refactor all TAP test suites doing connection checks

This commit refactors more TAP tests to adapt with the recent
introduction of connect_ok() and connect_fails() in PostgresNode,
introduced by 0d1a3343.  This changes the following test suites to use
the same code paths for connection checks:
- Kerberos
- LDAP
- SSL
- Authentication

Those routines are extended to be able to handle optional parameters
that are set depending on each suite's needs, as of:
- custom SQL query.
- expected stderr matching pattern.
- expected stdout matching pattern.
The new design is extensible with more parameters, and there are some
plans for those routines in the future with checks based on the contents
of the backend logs.

Author: Jacob Champion, Michael Paquier
Discussion: https://postgr.es/m/d17b919e27474abfa55d97786cb9cfadfe2b59e9.camel@vmware.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/c50624cdd248c13b4ba199f95e24c88d2cc8a097

Modified Files
--------------
src/test/authentication/t/001_password.pl |  19 ++--
src/test/authentication/t/002_saslprep.pl |  18 ++--
src/test/kerberos/t/001_auth.pl           |  51 +++++------
src/test/ldap/t/001_auth.pl               |  17 ++--
src/test/perl/PostgresNode.pm             |  71 ++++++++++++---
src/test/ssl/t/001_ssltests.pl            | 143 +++++++++++++++++-------------
src/test/ssl/t/002_scram.pl               |  22 +++--
7 files changed, 211 insertions(+), 130 deletions(-)


Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Refactor all TAP test suites doing connection checks

Some of the buildfarm seems not to like this.  gaur failed here:

==~_~===-=-===~_~== pgsql.build/src/test/authentication/tmp_check/log/regress_log_001_password ==~_~===-=-===~_~==
...
ok 8 - authentication success for method md5, role md5_role
### Reloading node "primary"
# Running: pg_ctl -D
/home/bfarm/bf-data/HEAD/pgsql.build/src/test/authentication/tmp_check/t_001_password_primary_data/pgdatareload 
server signaled
ack Broken pipe: write( 11, 'SELECT $$connected with user=scram_role$$' ) at
/opt/perl5.8.9/lib/site_perl/5.8.9/IPC/Run/IO.pmline 558 

and I think hoverfly's failure is the same thing.

I didn't look at the code, but I bet this is similar to the race
conditions we've seen before where a server process may exit before
the perl test script finishes stuffing a command down the pipe.

            regards, tom lane



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
On Mon, Apr 05, 2021 at 10:57:22AM -0400, Tom Lane wrote:
> Some of the buildfarm seems not to like this.  gaur failed here:

Thanks.  I did not notice these.

> I didn't look at the code, but I bet this is similar to the race
> conditions we've seen before where a server process may exit before
> the perl test script finishes stuffing a command down the pipe.

Hmm.  Those failures refer to the 5th and 8th tests of
001_password.pl where connect_ok() is used.  The only logical
differences compared to the previous code are:
- The addition of on_error_stop in connect_ok() to ignore any errors
- The fact that this now defines a SQL query (was undef previously).
- $Test::Builder::Level
So, the problem is with on_error_stop.  That's not necessary anyway,
but I am a bit surprised that this would be an issue.  I'll apply that
to see if it fixes the issue.  If it does not, a revert would be
better at this stage.
--
Michael

Attachment

Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Apr 05, 2021 at 10:57:22AM -0400, Tom Lane wrote:
>> I didn't look at the code, but I bet this is similar to the race
>> conditions we've seen before where a server process may exit before
>> the perl test script finishes stuffing a command down the pipe.

> Hmm.  Those failures refer to the 5th and 8th tests of
> 001_password.pl where connect_ok() is used.  The only logical
> differences compared to the previous code are:
> - The addition of on_error_stop in connect_ok() to ignore any errors
> - The fact that this now defines a SQL query (was undef previously).
> - $Test::Builder::Level
> So, the problem is with on_error_stop.

No, I bet it is with "now defines a SQL query".  That means there's
something to be stuffed down the pipe.

I find I can reproduce this pretty reliably on slower machines
here, if you want me to dig into it more carefully.

            regards, tom lane



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
On Mon, Apr 05, 2021 at 06:57:47PM -0400, Tom Lane wrote:
> No, I bet it is with "now defines a SQL query".  That means there's
> something to be stuffed down the pipe.
>
> I find I can reproduce this pretty reliably on slower machines
> here, if you want me to dig into it more carefully.

Now that I think about it, another thing that has changed in this code
is the fact that it calls PostgresNode::psql with a result array
rather than just the error result, still I don't see how this would
have an impact?  If you are right, it would be easy to bypass this
error with a change in the connect routines to be able to pass down
undefined queries rather than filling in a default one.  But that does
not feel like the root of it.
--
Michael

Attachment

Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
I wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> So, the problem is with on_error_stop.

> No, I bet it is with "now defines a SQL query".  That means there's
> something to be stuffed down the pipe.

I can confirm that removing the on_error_stop parameter as you
propose does *not* fix it.

            regards, tom lane



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
What does seem to fix it reliably is the attached.

            regards, tom lane

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index bbde34c..07a9167 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1938,7 +1938,7 @@ sub connect_fails
     # have set up things properly, and this should not block.
     my ($ret, $stdout, $stderr) = $self->psql(
         'postgres',
-        "SELECT \$\$connected with $connstr\$\$",
+        undef, # don't try to send a query
         extra_params => ['-w'],
         connstr      => "$connstr");


Re: pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
On Mon, Apr 05, 2021 at 07:23:43PM -0400, Tom Lane wrote:
> What does seem to fix it reliably is the attached.

Thanks for confirming.  I am still surprised that this has not popped
up as an issue until now.  Could that be because those animals have
never run the LDAP, SSL or kerberos test suites over the years?  The
attached would address this issue, and I am going to apply it for the
buildfarm.  LDAP and kerberos still use some queries, while the SSL
suite loses that, but as long as there is log_connections in the
configuration we'd still know about the connection attempts, so I
guess that's fine.
--
Michael

Attachment

Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Apr 05, 2021 at 07:23:43PM -0400, Tom Lane wrote:
>> What does seem to fix it reliably is the attached.

> Thanks for confirming.  I am still surprised that this has not popped
> up as an issue until now.  Could that be because those animals have
> never run the LDAP, SSL or kerberos test suites over the years?

Dunno.  Certainly most animals that enable TAP at all should be
running the "authentication" test, but those others all need to
be manually enabled, I believe.  My own animals run the SSL test
but not LDAP or kerberos.

I'm fairly sure that we've previously fixed this exact same issue
in some tests that expected a connection failure ...

            regards, tom lane



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
On Mon, Apr 05, 2021 at 08:15:31PM -0400, Tom Lane wrote:
> Dunno.  Certainly most animals that enable TAP at all should be
> running the "authentication" test, but those others all need to
> be manually enabled, I believe.  My own animals run the SSL test
> but not LDAP or kerberos.

The SSL tests have always sent down a query even for scenarios where
the connection would fail, but they have used command_fails_like()
rather than PostgresNode::psql.  The puzzle does not seem complete.

> I'm fairly sure that we've previously fixed this exact same issue
> in some tests that expected a connection failure ...

Yeah.  Looking at the git history, that would be c757a3da.
test_access() in the kerberos tests sends a SELECT true even for
expected failures, so it seems to me that it would fail with the same
symptoms on those machines.  That looks worth fixing and
backpatching.

And I need more caffeine this morning..  I missed your point that a
connection failure should pass down an undefined query string per
that.  Well, the fix is simple then now that there is a single code
path for all those connection attempts.
--
Michael

Attachment

Re: pgsql: Refactor all TAP test suites doing connection checks

From
Thomas Munro
Date:
On Tue, Apr 6, 2021 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Mon, Apr 05, 2021 at 07:23:43PM -0400, Tom Lane wrote:
> >> What does seem to fix it reliably is the attached.
>
> > Thanks for confirming.  I am still surprised that this has not popped
> > up as an issue until now.  Could that be because those animals have
> > never run the LDAP, SSL or kerberos test suites over the years?
>
> Dunno.  Certainly most animals that enable TAP at all should be
> running the "authentication" test, but those others all need to
> be manually enabled, I believe.  My own animals run the SSL test
> but not LDAP or kerberos.
>
> I'm fairly sure that we've previously fixed this exact same issue
> in some tests that expected a connection failure ...

Commit f44b9b62 springs to mind.



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Apr 05, 2021 at 08:15:31PM -0400, Tom Lane wrote:
>> Dunno.  Certainly most animals that enable TAP at all should be
>> running the "authentication" test, but those others all need to
>> be manually enabled, I believe.  My own animals run the SSL test
>> but not LDAP or kerberos.

> The SSL tests have always sent down a query even for scenarios where
> the connection would fail, but they have used command_fails_like()
> rather than PostgresNode::psql.  The puzzle does not seem complete.

Well, since it's a race condition it's obviously going to be sensitive to
machine speed.  I think that Perl version, or more likely version of
IPC::Run or a related module, may enter into it too.  prairiedog has
gotten through the tests several times since yesterday without failing,
yet it's significantly slower than the other PPC Mac I was able to
consistently reproduce the problem on.  I don't think those two have
the same vintage of Perl installation, though, so Perl version seems
like a likely explanation.

            regards, tom lane



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
On Tue, Apr 06, 2021 at 12:30:48PM +1200, Thomas Munro wrote:
> Commit f44b9b62 springs to mind.

Right.  The kerberos suite does not do that though, so wouldn't it be
better to use the same trick with -c?  For example, the attached for
~13 does so.
--
Michael

Attachment

Re: pgsql: Refactor all TAP test suites doing connection checks

From
Tom Lane
Date:
I wrote:
> Well, since it's a race condition it's obviously going to be sensitive to
> machine speed.  I think that Perl version, or more likely version of
> IPC::Run or a related module, may enter into it too.  prairiedog has
> gotten through the tests several times since yesterday without failing,
> yet it's significantly slower than the other PPC Mac I was able to
> consistently reproduce the problem on.  I don't think those two have
> the same vintage of Perl installation, though, so Perl version seems
> like a likely explanation.

Just for the archives' sake:

prairiedog (passed 3 out of 3 tries):
450MHz PPC32
Perl 5.8.3
$ perl -MIPC::Run -e 'print $IPC::Run::VERSION ."\n";' 
0.79
(file dates on the IPC::Run files are 2004)

gaur (failed 1 out of 1 tries):
360MHz HPPA
Perl 5.8.9
IPC::Run 0.94 (file dates 2014)

my other PPC Mac (failed consistently, 10 out of 10):
1.33GHz PPC32
Perl 5.8.8
IPC::Run 20180523.0

Not sure what to make of that, though a regression in IPC::Run
seems possible.  It's also possible that buildfarm script (the
first two) versus manual invocation (the last) is related somehow.

            regards, tom lane



Re: pgsql: Refactor all TAP test suites doing connection checks

From
Michael Paquier
Date:
On Tue, Apr 06, 2021 at 10:37:53AM +0900, Michael Paquier wrote:
> Right.  The kerberos suite does not do that though, so wouldn't it be
> better to use the same trick with -c?  For example, the attached for
> ~13 does so.

This was the only problem after one extra lookup, so fixed in 11~13.
--
Michael

Attachment