Thread: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timingis

Remove pgbench "progress" test pending solution of its timing issues.

Buildfarm member skink shows that this is even more flaky than
I thought.  There are probably some actual pgbench bugs here
as well as a timing dependency.  But we can't have stuff this
unstable in the buildfarm, it obscures other issues.

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ad51c6fb5708342e603d12a730bbc4e663bd637e

Modified Files
--------------
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ----------------------
1 file changed, 22 deletions(-)


--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Hello Tom,

> Remove pgbench "progress" test pending solution of its timing issues.

It would have been nicer to simply comment them out...

> Buildfarm member skink shows that this is even more flaky than I 
> thought.  There are probably some actual pgbench bugs here as well as a 
> timing dependency.

According to the trace, the host was so loaded that it could not do 
anything for over two seconds, so that the first output is "progress: 
2.6", i.e. the 0 thread did not get any significant time for the first 2.6 
seconds... of a 1 second test. Hmmm, not very kind.

Somehow this cannot be helped: if the system does not give any execution 
time to some pgbench thread, the expected output cannot be there.

The checked condition could be relaxed to "progress: \d", which still 
assumes that the system was kind enough to give enough cpu time to the 
process in the two second run so that it could output something.

The issue is probably the same on the second test on the generated log. My 
guess is that some transactions (we are talking about "-S"...) took 
several seconds, thus were completed and aggregate in rows over the 
expected limit. Could be considered bug, that is the aggregation should 
discard over the limit results rather than counting them in some late 
entry. I'll investigate that.

Globally these test failures reveals pathological testing conditions: the 
inability to execute a simple select over several seconds. It could have 
been OOM or whatever else.

If the test must run even when postgres doesn't, it is a little bit hard 
as a starting assumption for a benchmarking tool:-(

-- 
Fabien.


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Hi,

On 2017-09-23 19:57:40 +0200, Fabien COELHO wrote:
> According to the trace, the host was so loaded that it could not do anything
> for over two seconds, so that the first output is "progress: 2.6", i.e. the
> 0 thread did not get any significant time for the first 2.6 seconds... of a
> 1 second test. Hmmm, not very kind.
> 
> Somehow this cannot be helped: if the system does not give any execution
> time to some pgbench thread, the expected output cannot be there.

skink runs postgres under valgrind, so it's going to be slower.


> If the test must run even when postgres doesn't, it is a little bit hard as
> a starting assumption for a benchmarking tool:-(

I'm unsure what the point of this is.  It's not like we discussing
removing pgbench, we're talking about an unreliable test.

Greetings,

Andres Freund


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Globally these test failures reveals pathological testing conditions: the 
> inability to execute a simple select over several seconds. It could have 
> been OOM or whatever else.

Well, I think it's mostly about valgrind making everything really slow.
Since we have seen some passes from skink recently, perhaps there was also
a component of more-load-on-the-machine-than-usual.  But in the end this
is just evidence for my point that regression tests have to be very much
not timing-sensitive.  We run them under all kinds of out-of-the-ordinary
stress.
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

>> If the test must run even when postgres doesn't, it is a little bit 
>> hard as a starting assumption for a benchmarking tool:-(
>
> I'm unsure what the point of this is.  It's not like we discussing
> removing pgbench, we're talking about an unreliable test.

My sentence was probably not very clear.

I just meant that devising a coverage test which does not fail when the 
benchmarking tool is not able to run because of load/valgrind/... is kind 
of hard.

Anything even loosely related to time, here having a few simple SELECT 
transactions over 2 seconds and a few printf into a file or stdout every 
second, can always fail if conditions are bad enough.

So this means somehow giving up on coverage, because of one host which can 
fail under very unusual testing conditions once in a while.

I would prefer to keep the test and have a warning instead, something
like "ok, although a test which is allowed to rarely fail failed", but at 
least the feature are tested most of the time.

Sigh.

-- 
Fabien.


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Fabien COELHO <coelho@cri.ensmp.fr> writes:
> So this means somehow giving up on coverage, because of one host which can 
> fail under very unusual testing conditions once in a while.

You are attacking a straw man.  There is no reason whatever to believe
that the problem is confined to one host.  We have quite a number of slow
buildfarm hosts: some are using CLOBBER_CACHE_ALWAYS, some are overloaded,
some are just plain slow.  Also, regression test failures outside the
context of our buildfarm are just as bad if not worse.  I still remember
the pain I had with mysql's regression tests, which routinely fell over
in Red Hat's package buildfarm even though they always succeeded on a
personal workstation with nothing else going on.  I do not want our own
packagers to have that kind of experience with Postgres.

> I would prefer to keep the test and have a warning instead, something
> like "ok, although a test which is allowed to rarely fail failed", but at 
> least the feature are tested most of the time.

If the failures are ignored, how much of a test have you really got?
Nobody inquires into nominally-passed buildfarm runs to see what
actually happened.

In the TAP-test context, you do have quite a bit of freedom to decide
what is a pass and what is a fail.  So maybe the path forward is to be
more flexible about the pass conditions for this test.  But I can't
help thinking that these results have exposed some outright bugs too.
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

> You are attacking a straw man.

I thought it was tilting at windmills:-)

> In the TAP-test context, you do have quite a bit of freedom to decide
> what is a pass and what is a fail.  So maybe the path forward is to be
> more flexible about the pass conditions for this test.

Hmmm. I'm not sure that there is a solution under "any" testing conditions 
if -T & -P are to be tested and some sensible output expected out of them.
I'll think about it.

> But I can't help thinking that these results have exposed some outright 
> bugs too.

Yep, I agree that there is probably one bug/race/unexpected condition wrt 
what is output and time out handling.

-- 
Fabien.


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Hello Tom,

> Well, I think it's mostly about valgrind making everything really slow. 
> Since we have seen some passes from skink recently, perhaps there was 
> also a component of more-load-on-the-machine-than-usual.  But in the end 
> this is just evidence for my point that regression tests have to be very 
> much not timing-sensitive.  We run them under all kinds of 
> out-of-the-ordinary stress.

Attached is an attempt at improving the situation:

(1) there are added comments to explain the whys, which is to provide
coverage for pgbench time-related features... while still not being
time-sensitive, which is a challenge.

(2) the test now only expects "progress: \d" from the output, so it is 
enough that one progress is shown, whenever it is shown.

(3) if the test is detected to have gone AWOL, detailed log checks are
coldly skipped.

This would have passed on "skink" under the special conditions it 
encountered.

I cannot guaranty that it would pass under any circumstance, though.

If it still encounters a failure, ISTM that it should only be a missing 
"progress:" in the output, which has not been encountered so far.

If it occurs, a few options would remain, none of them very convincing:

  - give the test some more time, eg 3 seconds (hmmm... could still fail
    after any time...)

  - drop the "progress:" expectation (hmmm... but then it does not check
    anything).

  - make the "progress:" output check conditional to the running time
    (hmmm... it would require changing the command_checks_all function,
     and there is still a chance that the bench was stuck for 2 seconds
     then given time to realize that it has to stop right now...)

  - use an even simpler transaction, eg "SELECT;" which is less likely to
    get stuck (but still could get stuck...).

For the random-ness related test (--sample-rate), we could add a feature 
to pgbench which allows to control the random seed, so that the number of 
samples could be known in advance for testing purposes.

-- 
Fabien.
-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Attachment
Here's a rebased version of this patch.  This is not an endorsement of
it -- I just used it to locally increase coverage for the doCustom
refactoring patch I'm about to push.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment