Re: [HACKERS] pgbench regression test failure - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] pgbench regression test failure
Msg-id alpine.DEB.2.20.1709122102320.4555@lancre
In response to Re: [HACKERS] pgbench regression test failure  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] pgbench regression test failure
List pgsql-hackers
> I have a serious, serious dislike for tests that seem to work until
> they're run on a heavily loaded machine.

I'm not so sure the error message was caused by that. ISTM that it was
rather a case of finding 3 seconds in two because the run started at just
the right time, or maybe because of slowness induced by load and the order
in which the different checks are performed.

> So unless there is some reason why pgbench is *guaranteed* to run at 
> least one transaction per thread, I'd rather the test not assume that.

Well, pgbench is for testing performance... so if the checks allow zero
performance that's quite annoying as well:-) The tests are designed to
require very low performance (e.g. there are a lot of -t 1 when only one
transaction is enough to check a point), but maybe some tests assume a
minimal requirement, maybe 10 tps with 2 threads...

> I would not necessarily object to doing something in the code that
> would guarantee that, though.

Hmmm. Interesting point.

There could be a client-side synchronization barrier, e.g. something like
"\sync :nclients/nthreads", which would be easy enough to implement with
pthreads and quite error-prone to use, but that would probably be okay for
validation purposes. Or maybe we could expose something at the SQL level,
e.g. "SELECT synchro('synchroname', howmanyclientstowait);", which would be
harder to implement server-side but possibly doable as well.

A simpler option may be to introduce a synchronization barrier at thread 
start, so that all threads start together and that would set the "zero" 
time. Not sure that would solve the potential issue you raise, although 
that would help.
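
A rough sketch of the thread-start barrier idea, assuming POSIX barriers
are available (they are an optional part of POSIX). This is not pgbench
code: NTHREADS, start_barrier, start_time, seen_zero, thread_main and
run_threads are all made-up names for illustration. Every thread waits at
the barrier, one of them records the "zero" time, and a second wait makes
sure all threads see it before doing any work:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <time.h>

#define NTHREADS 4              /* hypothetical, think "pgbench -j 4" */

static pthread_barrier_t start_barrier;
static struct timespec start_time;          /* shared "zero" time */
static struct timespec seen_zero[NTHREADS]; /* what each thread observed */

static void *
thread_main(void *arg)
{
    int id = *(int *) arg;

    /* First wait: nobody proceeds until all threads have been started. */
    int rc = pthread_barrier_wait(&start_barrier);

    /* Exactly one waiter gets the "serial" return value; it sets zero. */
    if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
        clock_gettime(CLOCK_MONOTONIC, &start_time);

    /* Second wait: zero time is now set and visible to every thread. */
    pthread_barrier_wait(&start_barrier);

    seen_zero[id] = start_time;
    /* ... the thread would start running its clients here ... */
    return NULL;
}

/* Starts the threads; returns how many of them saw the shared zero time. */
static int
run_threads(void)
{
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];
    int same = 0;
    int rc;

    rc = pthread_barrier_init(&start_barrier, NULL, NTHREADS);
    assert(rc == 0);

    for (int i = 0; i < NTHREADS; i++)
    {
        ids[i] = i;
        rc = pthread_create(&threads[i], NULL, thread_main, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    pthread_barrier_destroy(&start_barrier);
    (void) rc;

    for (int i = 0; i < NTHREADS; i++)
        if (seen_zero[i].tv_sec == start_time.tv_sec &&
            seen_zero[i].tv_nsec == start_time.tv_nsec)
            same++;
    return same;
}
```

Calling run_threads() from a main() and linking with -pthread should be
enough to try it. Note that this only aligns the starting point; it does
nothing to guarantee that each thread completes a transaction afterwards.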

Currently the statistics collection and outputs are performed by thread 0
in addition to the clients it runs, so that pgbench would work even if
there are no threads, but it also means that under a heavy load some
things may not be done at the target time but a little bit later, if some
thread is stuck somewhere. Although the async protocol tries to avoid that.

-- 
Fabien.

