Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers

From Greg Smith
Subject Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Date
Msg-id 51E2E61F.5020807@2ndQuadrant.com
Whole thread Raw
In response to Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
List pgsql-hackers
On 7/13/13 12:13 PM, Fabien COELHO wrote:
> My 0.02€: if it means adding complexity to the pgbench code, I think
> that it is not worth it. The point of pgbench is to look at a steady
> state, not to end in the most graceful possible way as far as measures
> are concerned.

That's how some people use pgbench.  I'm just as likely to use it to
characterize system latency.  If there's a source of latency that's
specific to the pgbench code, I want that out of there even if it's hard.

But we don't have to argue about that because it isn't.  The attached
new patch seems to fix the latency spikes at the end, with -2 lines of
new code!  With that resolved I did a final pass across the rate limit
code too, attached as a v14 and ready for a committer.  I don't really
care what order these two changes are committed, there's no hard
dependency, but I would like to see them both go in eventually.

No functional code was changed from your v13 except for tweaking the
output.  The main thing I did was expand/edit comments and rename a few
variables to try and make this easier to read.  If you have any
objections to my cosmetic changes feel free to post an update.  I've put
a good bit of time into trying to simplify this further, thinking it
can't really be this hard.  But this seems to be the minimum complexity
that still works given the mess of the pgbench state machine.  Every
change I try now breaks something.

To wrap up the test results motivating my little pgbench-delay-finish
patch, the throttled cases that were always giving >100ms of latency
clustered at the end here now look like this:

average rate limit lag: 0.181 ms (max 53.108 ms)
tps = 10088.727398 (including connections establishing)
tps = 10105.775864 (excluding connections establishing)

There are still some of these cases where latency spikes, but they're
not as big and they're randomly distributed throughout the run.  The
problem I had with the ones at the end is how they tended to happen a
few times in a row.  I kept seeing multiple of these ~50ms lulls adding
up to a huge one, because the source of the lag kept triggering at every
connection close.

pgbench was already cleaning up all of its connections at the end, after
all the transactions were finished.  It looks safe to me to just rely on
that for calling PQfinish in the normal case.  And calls to client_done
already label themselves ok or abort, the code just didn't do anything
with that state before.  I tried adding some more complicated state
tracking, but that adds complexity while doing the exact same thing as
the simple implementation I did.

The only part of your code change I reverted was altering the latency
log transaction timestamps to read like "1373821907.65702" instead of
"1373821907 65702".  Both formats were considered when I added that
feature, and I completely understand why you'd like to change it.  One
problem is that doing so introduces a new class of float parsing and
rounding issues to consumers of that data.  I'd rather not revisit that
without a better reason to break the output format.  Parsing tools like
my pgbench-tools already struggle trying to support multiple versions of
pgbench, and I don't think there's enough benefit to the float format to
bother breaking them today.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pgsql: Optimize pglz compressor for small inputs.
Next
From: Greg Smith
Date:
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses