Thread: More test/kerberos tweaks

More test/kerberos tweaks

From
Jacob Champion
Date:
While working on [1] I ended up running into a couple issues with the
Kerberos test suite. Attached are two patches with possible
improvements:

Some tests check for expected log messages. They currently search
through the entire log file, from the beginning, for every match. So if
two tests share the same expected log content (which is common), it's
possible for the second test to get a false positive by matching
against the first test's output. (You can see this by modifying one of
the expected-failure tests to expect the same output as a previous
happy-path test.)

The first patch stores the offset of the previous match, and searches
forward from there during the next match, resetting the offset every
time the log file changes. This isn't perfect -- it could still result
in false positives if one test spits out two or more matching log lines
and only matches the first one by itself -- but searching forward
should be an improvement over what's there now.

The second patch is more of a quality-of-life improvement for devs. On
a failed log match, the test will spin for three minutes before giving
up on the match. I think this is excessive, especially since
interrupting the test with Ctrl-C leaves behind a running KDC daemon.
The patch reduces the timeout to three seconds. I guess the only
question I have is whether there are any underpowered machines out
there running this test, relying on the higher timeout to function.

--Jacob

[1] 
https://www.postgresql.org/message-id/a9ee5e4e8e844d06c2bcf70c6ed3306ccb4897f1.camel%40vmware.com



Attachment

Re: More test/kerberos tweaks

From
Tom Lane
Date:
Jacob Champion <pchampion@vmware.com> writes:
> The second patch is more of a quality-of-life improvement for devs. On
> a failed log match, the test will spin for three minutes before giving
> up on the match. I think this is excessive, especially since
> interrupting the test with Ctrl-C leaves behind a running KDC daemon.
> The patch reduces the timeout to three seconds. I guess the only
> question I have is whether there are any underpowered machines out
> there running this test, relying on the higher timeout to function.

We have, almost invariably, regretted it when we tried to use short
timeouts in test cases.

I checked the buildfarm logs for the past month to see which machines
are running the kerberos test and what their reported stage runtimes
were.  There are just three:

 system    min time     max time

 crake   | 00:00:09   | 00:01:16
 eelpout | 00:00:00   | 00:00:01
 elver   | 00:00:04   | 00:00:09

I'm not sure what's happening on crake to give it such a wide range
of runtimes on this test, but I can't help thinking it would probably
have failed a few of those runs with a three-second timeout.

More generally, sometimes people want to do things like run a test
under valgrind.  So it's not just "underpowered machines" that may
need a generous timeout.  Even if we did reduce the default, I'd
want a way (probably via an environment variable, cf PGCTLTIMEOUT)
to kick it back up.

On the whole, I think the right thing to be doing here is not so
much messing with the timeout as fixing the test script to be
more robust against control-C.  If it's failing to shut down the
KDC, I'd say that's a test bug.

            regards, tom lane



Re: More test/kerberos tweaks

From
Jacob Champion
Date:
On Fri, 2021-02-05 at 15:55 -0500, Tom Lane wrote:
> We have, almost invariably, regretted it when we tried to use short
> timeouts in test cases.

That's what I was afraid of. I can work around it easily enough on my
local machine, so it's not really a blocker in any sense.

That just leaves the first patch, then.

> On the whole, I think the right thing to be doing here is not so
> much messing with the timeout as fixing the test script to be
> more robust against control-C.  If it's failing to shut down the
> KDC, I'd say that's a test bug.

Agreed. I'm trying to limit the amount of test churn I introduce, since
I don't speak Perl very well. :D

Reworking the log checks so that they didn't need timeouts (e.g. by
stopping the server or otherwise flushing the logs, a la the
ssl_passphrase_callback tests) would be another approach. I'll jot it
down to look into later.

--Jacob