Re: pgbench small bug fix - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench small bug fix
Date
Msg-id alpine.DEB.2.10.1602070910140.22821@sto
Whole thread Raw
In response to Re: pgbench small bug fix  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pgbench small bug fix
List pgsql-hackers
<Ooops, resent, wrong "From" again... sorry :-( >

Hello Robert,

> > While testing for something else I encountered two small bugs under very low
> > rate (--rate=0.1). The attached patches fixes these.
> > 
> >  - when a duration (-T) is specified, ensure that pgbench ends at that
> >    time (i.e. do not wait for a transaction beyond the end of the run).
> 
> Why does this use INSTR_TIME_GET_DOUBLE() and not INSTR_TIME_GET_MICROSEC()?

I choose that because duration is in seconds, but MICROSEC would work fine as
well. See attached version.

> Also, why do we really need this change?  Won't the timer expiration
> stop the thread at the right time anyway?

Not always. When several threads are used, the time expiration in non-zero
threads is currently triggered after the last schedule transaction, even if this
transaction is long after the expiration, which means it runs overtime:
 sh> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2 starting vacuum...end. progress: 1.0 s, 0.0 tps, lat -nan ms stddev -nan,
lag-nan ms progress: 2.0 s, 1.0 tps, lat 12.129 ms stddev 0.000, lag 1.335 ms progress: 3.0 s, 0.0 tps, lat -nan ms
stddev-nan, lag -nan ms progress: 4.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms progress: 5.0 s, 0.0 tps, lat
-nanms stddev -nan, lag -nan ms < 6 seconds wait for a schedule transaction in thread 1   no report because thread 0
expired...>transaction type: TPC-B (sort of) scaling factor: 1 query mode: simple number of clients: 2 number of
threads:2 duration: 5 s number of transactions actually processed: 2 latency average: 14.648 ms latency stddev: 2.518
msrate limit schedule lag: avg 5.598 (max 9.861) ms tps = 0.180517 (including connections establishing) tps = 0.180566
(excludingconnections establishing)
 
 real    0m11.158s           ^^ 11 seconds, not 5. user    0m0.041s sys     0m0.013s

> I mean, sure, in theory it's wasteful for the thread to sit around doing
> nothing waiting for the timer to expire, but it's not evident to me that hurts
> anything, really.

I compute stats on the output, including the progress report, to check for
responsiveness (eg it is not stuck somewhere because of a checkpoint which
triggered some IO storm or whatever), for that purpose it is better for the
trace to be there as expected.

> >  - when there is a progress (-P) report, ensure that all progress
> >    reports are shown even if no more transactions are schedule.
> 
> That's pretty ugly - it would be easy for the test at the top of the
> loop to be left out of sync with the similar test inside the loop by
> some future patch.

Hmmm. Cannot help it.

A cleaner fix would be to have the main thread do only the progress and periodic
stats aggregation, while other threads would do actual transactions, however
that would break pgbench working without threads, so I think this is a no go.

> And again, I wonder why this is really a bug.

Well, if you are fine to set "-T 5 -P 1" and the run not to last 5 seconds nor
print any report every second, then it is not a bug:
 sh> time ./pgbench -T 5 -P 1 -R 0.1 -c 2 -j 2 starting vacuum...end. < UNLUCKY, NO PROGRESS REPORT AT ALL...>
transactiontype: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 2 number of
threads:2 duration: 5 s number of transactions actually processed: 1 latency average = 16.198 ms latency stddev = 0.000
msrate limit schedule lag: avg 4.308 (max 4.308) ms tps = 0.236361 (including connections establishing) tps = 0.236771
(excludingconnections establishing)
 
 real    0m4.256s

-- 
Fabien.

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: "using previous checkpoint record at" maybe not the greatest idea?
Next
From: Vik Fearing
Date:
Subject: Re: proposal: make NOTIFY list de-duplication optional