Thread: New features for pgbench

New features for pgbench

From
Greg Smith
Date:
The attached adds two new command line switches to pgbench:

-x:  Generate extended detail in the latency log, including a timestamp
for each transaction

-X:  Do extra cleanup after the run (vacuum on all tables, checkpoint)
before stopping the clock.  This gives substantially more consistency in
results between runs.  Most pgbench results I see people present are so
short that they're skewed considerably by whether there was a checkpoint
in the middle of the run.  This also allows testing situations with
various autovacuum settings fairly.
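
A rough sketch of what that cleanup amounts to is below (simplified; the
table names are the standard pgbench ones, but this is not the exact patch
code and error handling is omitted):

    #include <sys/time.h>
    #include "libpq-fe.h"

    /* Sketch of the -X idea: vacuum the pgbench tables and force a
     * checkpoint, and only then record the end time, so the cleanup
     * work counts as part of the run. */
    static void
    finish_run(PGconn *con, struct timeval *end_time)
    {
        PQclear(PQexec(con, "vacuum accounts"));
        PQclear(PQexec(con, "vacuum branches"));
        PQclear(PQexec(con, "vacuum tellers"));
        PQclear(PQexec(con, "vacuum history"));
        PQclear(PQexec(con, "checkpoint"));
        gettimeofday(end_time, NULL);   /* clock stops only after cleanup */
    }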

There's an update to the README describing the features, as well as
correcting/extending some of the existing documentation.

I generated the patch from the 8.2.3 release.  Since pgbench runs the same
way unless you pass it one of the new flags, I was hoping this would be
considered for the next 8.2 update.  I have a series of additional scripts
I'll be releasing shortly that do interesting analysis of this extended
latency data from pgbench (graphs of TPS and latency, that sort of thing),
and I'd hate for that to only be available on 8.3.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Attachment

Re: New features for pgbench

From
Neil Conway
Date:
On Sun, 2007-02-11 at 20:32 -0500, Greg Smith wrote:
> The attached adds two new command line switches to pgbench:

FYI, context diffs are preferred.

> -x:  Generate extended detail in the latency log, including a timestamp
> for each transaction

I wonder if it's worth just making this the default.

> I generated the patch from the 8.2.3 release.  Since pgbench runs the same
> way unless you pass it one of the new flags, I was hoping this would be
> considered for the next 8.2 update.

Feature additions are usually severely frowned upon in stable release
branches, but the standard for contrib/ changes is lower, and as you
say, there is no change in behavior if the options aren't used. I'm okay
with backporting it: if no one else objects, I'll apply this in a few
days.

-Neil



Re: New features for pgbench

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Sun, 2007-02-11 at 20:32 -0500, Greg Smith wrote:
>> -x:  Generate extended detail in the latency log, including a timestamp
>> for each transaction

> I wonder if it's worth just making this the default.

Does this have any impact on the reported results (by slowing pgbench
itself)?  If not, then doing it always would be OK, but I'm not
convinced about that ...

> Feature additions are usually severely frowned upon in stable release
> branches, but the standard for contrib/ changes is lower,

No, it isn't.  This is *not* a candidate for back-porting.

            regards, tom lane

Re: New features for pgbench

From
Neil Conway
Date:
On Sun, 2007-02-11 at 23:12 -0500, Tom Lane wrote:
> No, it isn't.  This is *not* a candidate for back-porting.

Why is that? It seems to me that the potential downside is essentially
zero. This is a developer-oriented benchmark tool, after all.

-Neil



Re: New features for pgbench

From
"Joshua D. Drake"
Date:
Neil Conway wrote:
> On Sun, 2007-02-11 at 23:12 -0500, Tom Lane wrote:
>> No, it isn't.  This is *not* a candidate for back-porting.
>
> Why is that? It seems to me that the potential downside is essentially
> zero. This is a developer-oriented benchmark tool, after all.

To me, it is a clear line. Once we accept it for one, we may accept it for
another.

Sincerely,

Joshua D. Drake


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: New features for pgbench

From
Greg Smith
Date:
On Sun, 11 Feb 2007, Tom Lane wrote:

> Does this have any impact on the reported results (by slowing pgbench
> itself)?

I put as little code as possible into the transaction path, to avoid any
slowdown.  I didn't convert the timestamp to a human-readable format or do
anything else intensive that might impact the pgbench results.
It's just dumping some data that was already sitting there.

There is an extra if statement for each transaction, and a slightly longer
fprintf when running with the extra latency output in place.  That's it.
The file gets "%d %d %.0f %d %ld %ld\n" instead of "%d %d %.0f\n".
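
To make that concrete, the logging path ends up looking something like the
sketch below (the variable names here are illustrative, not the exact ones
from the pgbench source):

    #include <stdio.h>
    #include <sys/time.h>

    /* Sketch of the per-transaction log write.  "latency" is the elapsed
     * time in microseconds and "now" the transaction end time from
     * gettimeofday(); the -x path only adds fields to the fprintf. */
    static void
    log_transaction(FILE *log_file, int extended_log, int client_id,
                    int xact_count, double latency, int script_no,
                    struct timeval now)
    {
        if (extended_log)   /* -x: add script number and a timestamp */
            fprintf(log_file, "%d %d %.0f %d %ld %ld\n",
                    client_id, xact_count, latency,
                    script_no, (long) now.tv_sec, (long) now.tv_usec);
        else                /* default: the existing three-field line */
            fprintf(log_file, "%d %d %.0f\n",
                    client_id, xact_count, latency);
    }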

The main drawback to logging more as the default is about twice as much
disk I/O for writing the latency log out.  That's small change compared
with the WAL/database writes that must be going on to generate that
transaction, and I sure haven't been able to measure any change in
results.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New features for pgbench

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Neil Conway wrote:
>> On Sun, 2007-02-11 at 23:12 -0500, Tom Lane wrote:
>>> No, it isn't.  This is *not* a candidate for back-porting.
>>
>> Why is that? It seems to me that the potential downside is essentially
>> zero. This is a developer-oriented benchmark tool, after all.

> To me, it is a clear line. Once we accept for one, we may accept for
> another.

It's a new feature and we don't do new features in back branches.
I don't see that that rule is any weaker for contrib than elsewhere.
(Since the buildfarm fired up, there is no longer any meaningful
difference between contrib and core as far as quality requirements
go: you break contrib, you've broken the build.)

Possibly there would be exceptional cases where we'd allow it, but
certainly not for a patch that came in over the transom and hasn't
garnered any community consensus that it is an essential thing to have.

            regards, tom lane

Re: New features for pgbench

From
NikhilS
Date:
Hi,

On 2/12/07, Greg Smith <gsmith@gregsmith.com> wrote:
> The attached adds two new command line switches to pgbench:
>
> -x:  Generate extended detail in the latency log, including a timestamp
> for each transaction

From your patch I see that it augments the -l flag. IMHO it does not make sense to add another flag. We can save the "if" check and log the extended contents as part of -l itself.

> -X:  Do extra cleanup after the run (vacuum on all tables, checkpoint)
> before stopping the clock.  This gives substantially more consistency in
> results between runs.

I am sorry, but I do not understand the above. If I read it correctly, are you suggesting that the same database with a prior pgbench run be used for further pgbench runs? How is it useful? How can one guarantee consistency of observed tps values with this in place?

Regards,
Nikhils


--
EnterpriseDB               http://www.enterprisedb.com

Re: New features for pgbench

From
Greg Smith
Date:
On Mon, 12 Feb 2007, NikhilS wrote:

> From your patch I see that it augments the -l flag. IMHO it does not
> make sense to add another flag. We can save the "if" check and log the
> extended contents as part of -l itself.

I wanted something people could apply to 8.2 without breaking existing
scripts (regardless of whether it was accepted into 8.2).  And I expected
some concern over whether this change affects results.  By putting in a
switch, it's possible to test both ways, with only the if being added to
the default case.

> If I read it correctly, are you suggesting that the same database with a
> prior pgbench run be used for further pgbench runs? How is it useful?
> How can one guarantee consistency of observed tps values with this in
> place?

Right now when you run pgbench, the results vary considerably from run to
run even if you completely rebuild the database every time.  I've found
that a lot of that variation comes from two things:

1) If your run is so small that it usually doesn't generate a checkpoint,
the runs that do encounter one will be slower--possibly a lot slower if
you have a large buffer cache.  Similarly, runs that are just long enough
to normally encounter one checkpoint will take longer if they happen to
run into two, and so on.  There are games you can play to improve
pgbench performance, using more checkpoint_segments and a bigger
shared_buffers cache, that look like they dramatically improve results.
But what you're mainly doing is just making checkpoints less frequent,
reducing the odds that you'll run into one during the pgbench test itself.

2) The standard pgbench test does 3 UPDATEs per transaction.  That leaves
behind a lot of dead rows that need to be vacuumed.  The amount of
autovacuum that occurs during the run will vary.  This means that some
runs finish with more dead space left behind than others.  It really isn't
fair that a pgbench run that involves cleaning up more of its own mess
during the test will get a lower TPS result than one that just generates a
bunch of dead tuples and leaves the mess hanging around.  Right now,
optimal pgbench results are generated with the autovacuum turned
completely off; that's just not realistic.

In order to get a completely fair comparison, I've adopted a policy that
says the run isn't over until the database has been cleaned up such that
it's in a similar state to how it started:  all tables are vacuumed, and
all updates have been written to disk.  The new -X behavior forces this
cleanup to be considered part of the test.  Whether or not you choose to
use it for your regular tests, I suggest trying it out.  You may be as
surprised as I was at exactly how much vacuuming work is left over after a
long pgbench run, and how dramatically it lowers TPS results if you
consider that cleanup essential to the test.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New features for pgbench

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> Right now when you run pgbench, the results vary considerably from run to
> run even if you completely rebuild the database every time.  I've found
> that a lot of that variation comes from two things:

This is a real issue, but I think your proposed patch does not fix it.
A pgbench run will still be penalized according to the number of
checkpoints or autovacuums that happen while it occurs.  Guaranteeing
that there's at least one is maybe a bit more fair than allowing the
possibility of having none, but it's hardly a complete fix.  Also,
this approach means that short test runs will have artificially lower
TPS results than longer ones, because the fixed part of the maintenance
overhead is amortized over fewer transactions.

I believe it's a feature, not a bug, that Postgres shoves a lot of
maintenance out of the main transaction pathways and into background
tasks.  That allows us to deal with higher peak transaction rates than
we otherwise could do.  Maybe the right way to think about approaching
this issue is to try to estimate a "peak TPS" (what we can achieve
when no maintenance processing is happening) and a "long-term average TPS"
(net throughput allowing for maintenance processing).  I don't have a
specific suggestion about how to modify pgbench to account for this,
but I do think we need something more than a single TPS number if we
want to describe the system behavior well.
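
Roughly, the distinction could be expressed as two numbers instead of one,
along these lines (just a sketch of the idea, not something pgbench
computes today; the variable names are made up):

    /* Peak throughput: transactions over the steady-state run time only. */
    double peak_tps(long n_xacts, double run_secs)
    {
        return n_xacts / run_secs;
    }

    /* Long-term average: the same transactions, but also charged for the
     * checkpoint/vacuum time attributable to the run. */
    double net_tps(long n_xacts, double run_secs, double maintenance_secs)
    {
        return n_xacts / (run_secs + maintenance_secs);
    }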

            regards, tom lane

Re: New features for pgbench

From
"Takayuki Tsunakawa"
Date:
Hello, Greg-san

> The attached adds two new command line switches to pgbench:

If you are OK, could you make the measurement of time on Windows more
precise?  The Win32 API that pgbench uses for gettimeofday() (in
src/port/gettimeofday.c) is much lower in resolution than on Linux.  The
Win32 API provides only about 15 millisecond resolution, so when I run
pgbench -c1 -t16000, the response times of more than 10,000 transactions
are reported as 0.  In contrast, Linux provides microsecond resolution.
Please excuse me for making an unrelated request, but this seems to be
a good chance...




Re: New features for pgbench

From
Greg Smith
Date:
On Mon, 12 Feb 2007, Tom Lane wrote:

> This is a real issue, but I think your proposed patch does not fix it.

I certainly wouldn't claim that my patch _fixes_ the problem in the
general case; it provides one way to measure it.  Currently it's not
obvious to new pgbench users that the problem even exists at all.  I feel
it's important to draw attention to the fact that it's something you
should be aware of, even if an automatic resolution to the problem isn't
obvious yet.

In the context I run pgbench in, it is also a workable fix.  I don't even
pay attention to pgbench results unless I'm popping 10,000 (desktop) to
100,000 (server) transactions through it.  In that context, I believe it
fairly penalizes the transactions for the data they leave behind for
maintenance.  I completely agree that people doing short runs shouldn't
use this switch.

Anyway, I like your idea of describing the lower TPS number as including
maintenance; that matches the terminology used within the documentation
better.  I will reformat the output to use that term.

Here's what I'm gonna do.  The patch I submitted was prepared with the
goal of possibly being implemented in 8.2.  I thought a change to contrib/
that added a feature turned off by default might have a shot at a
backport, and I wanted something people could use on the current release
to be available.  Now that I know it's never going into an official 8.2, I
will prepare a slightly different patch aimed at 8.3--incorporating all
the feedback I've gotten here as either code changes or additional
documentation--and resubmit in another week or so.

Thanks for the feedback.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New features for pgbench

From
Greg Smith
Date:
On Tue, 13 Feb 2007, Takayuki Tsunakawa wrote:

> The Win32 API that pgbench uses for gettimeofday() (in
> src/port/gettimeofday.c) is much lower in resolution than on Linux.

I wasn't aware of this issue, and it certainly makes the whole latency
side of pgbench pretty useless on Win32.  There is code in
src/include/executor/instrument.h that uses a higher resolution Windows
timer API than gettimeofday() does (as you point out, that one is only
resolves to one Windows tick, about 15ms).  If I can get a Windows build
environment setup, I'll see if I can borrow that solution for pgbench.
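
For anyone curious, I'd guess the Windows side looks roughly like the
sketch below, using the QueryPerformanceCounter family (an untested guess
on my part; the instrument.h code wraps whatever it uses in macros):

    #include <windows.h>

    /* Sketch: microsecond-level elapsed time on Win32.  Capture start
     * and stop with QueryPerformanceCounter(), then convert tick counts
     * to microseconds using the counter frequency. */
    static double
    elapsed_usec(LARGE_INTEGER start, LARGE_INTEGER stop)
    {
        LARGE_INTEGER ticks_per_sec;

        QueryPerformanceFrequency(&ticks_per_sec);
        return (double) (stop.QuadPart - start.QuadPart) * 1000000.0 /
               (double) ticks_per_sec.QuadPart;
    }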

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New features for pgbench

From
NikhilS
Date:
Hi,

> Right now when you run pgbench, the results vary considerably from run to
> run even if you completely rebuild the database every time.  I've found
> that a lot of that variation comes from two things:

The main purpose of pgbench runs is an "apples to apples" comparison of two source bases: one pristine PostgreSQL source base, and another that is the same source patched with the proposed enhancements.

As long as we use the same postgresql.conf, the same hardware environment, and exactly the same pgbench parameters, the difference in the TPS values observed between the two sources should be a good enough indicator of the viability of the new code, don't you think?

E.g. autovacuum will trigger on certain tables only if the threshold is exceeded, so that is tied to the update rate.  Similarly, "shared_buffers" will become a bottleneck only if the code and the run are I/O intensive enough, etc.

IMHO, as long as the same environment holds true for the runs on both source bases, we should not see unexplained variations in the observed TPS values for the reasons you have mentioned.

Regards,
Nikhils

--
EnterpriseDB               http://www.enterprisedb.com

Re: New features for pgbench

From
Tom Lane
Date:
NikhilS <nikkhils@gmail.com> writes:
> As long as we use the same postgresql.conf, same hardware environment and
> exactly same parameter pgbench runs, the difference in the TPS values
> observed between the 2 sources should be a good enough indicator as to the
> viability of the new code, dont you think?

pgbench has a long-standing, thoroughly earned reputation for producing
unrepeatable results.  While I agree that we shouldn't whack it around
without good cause, there's definitely some problems there, and I think
Greg is on to at least one of 'em.  The question is what's the best way
to fix it ...

            regards, tom lane

Re: New features for pgbench

From
Magnus Hagander
Date:
On Tue, Feb 13, 2007 at 01:08:04AM -0500, Greg Smith wrote:
> On Tue, 13 Feb 2007, Takayuki Tsunakawa wrote:
>
> >The Win32 API that pgbench uses for gettimeofday() (in
> >src/port/gettimeofday.c) is much lower in resolution than on Linux.
>
> I wasn't aware of this issue, and it certainly makes the whole latency
> side of pgbench pretty useless on Win32.  There is code in
> src/include/executor/instrument.h that uses a higher resolution Windows
> timer API than gettimeofday() does (as you point out, that one only
> resolves to one Windows tick, about 15ms).  If I can get a Windows build
> environment setup, I'll see if I can borrow that solution for pgbench.

As long as you only need to measure time *difference*, those are pretty
easy to use. Different from Unix, but easy. If you need to keep a
counter that contains actual time it can still be done, but it's a bit
more tricky (not really hard, though).

//Magnus

Re: New features for pgbench

From
Jan Wieck
Date:
On 2/12/2007 11:43 AM, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> Right now when you run pgbench, the results vary considerably from run to
>> run even if you completely rebuild the database every time.  I've found
>> that a lot of that variation comes from two things:
>
> This is a real issue, but I think your proposed patch does not fix it.
> A pgbench run will still be penalized according to the number of
> checkpoints or autovacuums that happen while it occurs.  Guaranteeing
> that there's at least one is maybe a bit more fair than allowing the
> possibility of having none, but it's hardly a complete fix.  Also,
> this approach means that short test runs will have artificially lower
> TPS results than longer ones, because the fixed part of the maintenance
> overhead is amortized over fewer transactions.

Anything that doesn't run exclusively on the server, unless it is given a
large enough data set and enough time to populate the buffer cache
similarly for each run, WILL report more or less random TPS results.  Real
benchmarks on hardware of considerable size have ramp-up times measured in
hours if not days, with the sole purpose of populating the cache and thus
smoothing out the transaction response profile.  I think this change is an
entirely misleading approach to tackling the problem at hand.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #