Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" testpending solution of its timing is (fwd) - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" testpending solution of its timing is (fwd)
Date
Msg-id alpine.DEB.2.20.1801131618070.27289@lancre
Whole thread Raw
In response to Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" testpending solution of its timing is (fwd)
List pgsql-hackers
Hello Tom,

Thanks for having a look at this attempt.

>> Attached is an attempt at improving the situation:
>
> I took a quick look at this and can't really convince myself that it
> improves our situation.  The fact that it prints nothing if the runtime
> is outside of a tightly constrained range seems likely to hide the sort
> of bug you might hope such a test would detect.  A simple example is
> that, if there's an off-by-one bug causing the test to run for 3 seconds
> not 2, this test script won't complain about it.  Worse, if pgbench dumps
> core instantly on startup when given -p, this test script won't complain
> about it.  And the test is of no value at all on very slow buildfarm
> critters such as valgrind or clobber-cache-always animals.

Hmmm.

Note that an instant core does not fall under "too slow" and is caught 
anyway because pgbench return status is checked, and I expect it would not 
be zero when core dumped.

Alas, testing time related features with zero assumption about time is 
quite difficult:-)

The compromise I'm proposing is to skip time-related stuff if too slow. 
The value I see is that it provides coverage for all time-related 
features. I can also add a check that the run is *at least* 2 seconds when 
two seconds are asked for, because otherwise something was certainly 
amiss.

However it cannot do the impossible, i.e. check that something happens 
every second if we cannot assume that some cycles are spared for the 
process within a second. There is no miracle to expect.

>> (2) the test now only expects "progress: \d" from the output, so it is enough
>> that one progress is shown, whenever it is shown.
>
> Hm.  Could we get somewhere by making the test look for that, and
> adjusting the loop logic inside pgbench so that (maybe only with the
> tested switch values) it's guaranteed to print at least one progress
> output regardless of timing, because it won't check for exit until after
> it's printed a log message?

I'll look into ensuring structuraly that at least one progress is shown.

>> For the random-ness related test (--sample-rate), we could add a feature to
>> pgbench which allows to control the random seed, so that the number of samples
>> could be known in advance for testing purposes.
>
> Don't see how locking down the seed gets us anywhere.  The behavior of
> random() can't be assumed identical across different machines, even
> with the same seed.

Indeed, I surmised in between that expecting some portability from 
random() is naïve. Controlling the seed could still be useful for testing 
though, so I have submitted a patch for that.

-- 
Fabien.

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [HACKERS] Replication status in logical replication
Next
From: Fabien COELHO
Date:
Subject: Re: Minor fix for pgbench documentation