Thread: New features for pgbench
The attached adds two new command line switches to pgbench:

-x: Generate extended detail in the latency log, including a timestamp
for each transaction

-X: Do extra cleanup after the run (vacuum on all tables, checkpoint)
before stopping the clock. This gives substantially more consistency in
results between runs.

Most pgbench results I see people present are so short that they're
skewed considerably by whether there was a checkpoint in the middle of
the run. This also allows testing situations with various autovacuum
settings fairly.

There's an update to the README describing the features, as well as
correcting/extending some of the existing documentation.

I generated the patch from the 8.2.3 release. Since pgbench runs the
same way unless you pass it one of the new flags, I was hoping this
would be considered for the next 8.2 update. I have a series of
additional scripts I'll be releasing shortly that do interesting
analysis of this extended latency data from pgbench (graphs of TPS and
latency, that sort of thing), and I'd hate for that to only be available
on 8.3.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
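For readers trying the patch out, a hypothetical invocation might look
like the following. It assumes -x augments the existing -l latency log
(as comes up later in the thread) and that -X is an independent switch;
the database name "bench" is just an example:

    $ pgbench -i bench                        # build the standard tables
    $ pgbench -c 8 -t 10000 -l -x -X bench    # per-transaction log with
                                              # timestamps; vacuum and
                                              # checkpoint before the
                                              # clock stops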
On Sun, 2007-02-11 at 20:32 -0500, Greg Smith wrote:
> The attached adds two new command line switches to pgbench:

FYI, context diffs are preferred.

> -x: Generate extended detail in the latency log, including a timestamp
> for each transaction

I wonder if it's worth just making this the default.

> I generated the patch from the 8.2.3 release. Since pgbench runs the
> same way unless you pass it one of the new flags, I was hoping this
> would be considered for the next 8.2 update.

Feature additions are usually severely frowned upon in stable release
branches, but the standard for contrib/ changes is lower, and as you
say, there is no change in behavior if the options aren't used. I'm okay
with backporting it: if no one else objects, I'll apply this in a few
days.

-Neil
Neil Conway <neilc@samurai.com> writes:
> On Sun, 2007-02-11 at 20:32 -0500, Greg Smith wrote:
>> -x: Generate extended detail in the latency log, including a timestamp
>> for each transaction
>
> I wonder if it's worth just making this the default.

Does this have any impact on the reported results (by slowing pgbench
itself)? If not, then doing it always would be OK, but I'm not convinced
about that ...

> Feature additions are usually severely frowned upon in stable release
> branches, but the standard for contrib/ changes is lower,

No, it isn't. This is *not* a candidate for back-porting.

regards, tom lane
On Sun, 2007-02-11 at 23:12 -0500, Tom Lane wrote:
> No, it isn't. This is *not* a candidate for back-porting.

Why is that? It seems to me that the potential downside is essentially
zero. This is a developer-oriented benchmark tool, after all.

-Neil
Neil Conway wrote:
> On Sun, 2007-02-11 at 23:12 -0500, Tom Lane wrote:
>> No, it isn't. This is *not* a candidate for back-porting.
>
> Why is that? It seems to me that the potential downside is essentially
> zero. This is a developer-oriented benchmark tool, after all.

To me, it is a clear line. Once we accept for one, we may accept for
another.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
On Sun, 11 Feb 2007, Tom Lane wrote:
> Does this have any impact on the reported results (by slowing pgbench
> itself)?

I didn't put more code than I had to in the transaction path, to avoid
any slowdown. I didn't convert the timestamp to human-readable format or
do anything intensive like that that could impact the pgbench results;
it's just dumping some data that was already sitting there. There is an
extra if statement for each transaction, and a slightly longer fprintf
when running with the extra latency output in place. That's it. The file
gets "%d %d %.0f %d %ld %ld\n" instead of "%d %d %.0f\n".

The main drawback to logging more as the default is about twice as much
disk I/O for writing the latency log out. That's small change compared
with the WAL/database writes that must be going on to generate that
transaction, and I sure haven't been able to measure any change in
results.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
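What follows is a minimal sketch of the logging branch Greg describes,
reconstructed from the format strings quoted above rather than taken
from the patch itself. The identifiers (ClientState, log_transaction,
extended_log) are illustrative, and reading the three extra fields as
script number plus a seconds/microseconds timestamp is an assumption
based on the -x description:

    #include <stdio.h>
    #include <sys/time.h>

    /* Hypothetical stand-in for pgbench's per-client state. */
    typedef struct
    {
        int id;         /* client number */
        int cnt;        /* transactions completed so far */
        int use_file;   /* which transaction script was run */
    } ClientState;

    static void
    log_transaction(FILE *logfile, ClientState *st, double latency,
                    struct timeval now, int extended_log)
    {
        if (extended_log)
            /* -x: the usual three fields, plus script number and the
             * completion timestamp split into seconds/microseconds */
            fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
                    st->id, st->cnt, latency, st->use_file,
                    (long) now.tv_sec, (long) now.tv_usec);
        else
            /* default -l output: client id, transaction count,
             * latency in microseconds */
            fprintf(logfile, "%d %d %.0f\n", st->id, st->cnt, latency);
    }

As Greg notes, the only per-transaction cost beyond the default -l path
is the extra branch and the longer fprintf.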
"Joshua D. Drake" <jd@commandprompt.com> writes: > Neil Conway wrote: >> On Sun, 2007-02-11 at 23:12 -0500, Tom Lane wrote: >>> No, it isn't. This is *not* a candidate for back-porting. >> >> Why is that? It seems to me that the potential downside is essentially >> zero. This is a developer-oriented benchmark tool, after all. > To me, it is a clear line. Once we accept for one, we may accept for > another. It's a new feature and we don't do new features in back branches. I don't see that that rule is any weaker for contrib than elsewhere. (Since the buildfarm fired up, there is no longer any meaningful difference between contrib and core as far as quality requirements go: you break contrib, you've broken the build.) Possibly there would be exceptional cases where we'd allow it, but certainly not for a patch that came in over the transom and hasn't garnered any community consensus that it is an essential thing to have. regards, tom lane
Hi,

On 2/12/07, Greg Smith <gsmith@gregsmith.com> wrote:
> The attached adds two new command line switches to pgbench:
>
> -x: Generate extended detail in the latency log, including a timestamp
> for each transaction

From your patch I see that it augments the -l flag. IMHO it does not
make sense to add another flag. We can save the "if" check and log the
extended contents as part of -l itself.

> -X: Do extra cleanup after the run (vacuum on all tables, checkpoint)
> before stopping the clock. This gives substantially more consistency in
> results between runs.

I am sorry, but I do not understand the above. If I read it correctly,
are you suggesting that the same database with a prior pgbench run be
used for further pgbench runs? How is it useful? How can one guarantee
consistency of observed tps values with this in place?

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
On Mon, 12 Feb 2007, NikhilS wrote:
> From your patch I see that it augments the -l flag. IMHO it does not
> make sense to add another flag. We can save the "if" check and log the
> extended contents as part of -l itself.

I wanted something people could apply to 8.2 without breaking existing
scripts (regardless of whether it was accepted into 8.2). And I expected
some concern over whether this change affects results. By putting in a
switch, it's possible to test both ways, with only the if being added to
the default case.

> If I read it correctly, are you suggesting that the same database with
> a prior pgbench run be used for further pgbench runs? How is it useful?
> How can one guarantee consistency of observed tps values with this in
> place?

Right now when you run pgbench, the results vary considerably from run
to run even if you completely rebuild the database every time. I've
found that a lot of that variation comes from two things:

1) If your run is so small that it usually doesn't generate a
checkpoint, the runs that do encounter one will be slower--possibly a
lot slower if you have a large buffer cache. Similarly, runs that are
just long enough to normally encounter one checkpoint will take longer
if they happen to run into two, and so on. There are games you can play
with improving pgbench performance by using more checkpoint_segments and
a bigger shared_buffers cache that look like they dramatically improve
results. But what you're mainly doing is just making checkpoints less
frequent, reducing the odds that you'll run into one during the pgbench
test itself.

2) The standard pgbench test does 3 UPDATEs per transaction. That leaves
behind a lot of dead rows that need to be vacuumed. The amount of
autovacuum that occurs during the run will vary. This means that some
runs finish with more dead space left behind than others. It really
isn't fair that a pgbench run that involves cleaning up more of its own
mess during the test will get a lower TPS result than one that just
generates a bunch of dead tuples and leaves the mess hanging around.
Right now, optimal pgbench results are generated with autovacuum turned
completely off; that's just not realistic.

In order to get a completely fair comparison, I've adopted a policy that
says the run isn't over until the database has been cleaned up such that
it's in a similar state to how it started: all tables are vacuumed, and
all updates have been written to disk. The new -X behavior forces this
cleanup to be considered part of the test.

Whether or not you choose to use it for your regular tests, I suggest
trying it out. You may be as surprised as I was at exactly how much
vacuuming work is left over after a long pgbench run, and how
dramatically it lowers TPS results if you consider that cleanup
essential to the test.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
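To make the -X behavior concrete: the cleanup described here amounts to
vacuuming the standard pgbench tables and issuing a checkpoint before
the timer stops. The following is a minimal libpq sketch of that step,
reconstructed from the description above rather than taken from the
patch; it assumes the 8.2-era table names and omits error handling:

    #include <libpq-fe.h>

    /*
     * Sketch of an -X style "extra cleanup" pass: vacuum the standard
     * pgbench tables, then force a checkpoint, all inside the timed
     * part of the run so the maintenance cost shows up in the TPS.
     */
    static void
    extra_cleanup(PGconn *con)
    {
        static const char *cmds[] = {
            "VACUUM branches",
            "VACUUM tellers",
            "VACUUM accounts",
            "VACUUM history",
            "CHECKPOINT",
            NULL
        };
        int i;

        for (i = 0; cmds[i] != NULL; i++)
            PQclear(PQexec(con, cmds[i]));
    }

Running this inside the timed region means the maintenance cost lands in
the reported TPS instead of being deferred to whoever touches the
database next, which is the fairness argument Greg is making.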
Greg Smith <gsmith@gregsmith.com> writes:
> Right now when you run pgbench, the results vary considerably from run
> to run even if you completely rebuild the database every time. I've
> found that a lot of that variation comes from two things:

This is a real issue, but I think your proposed patch does not fix it. A
pgbench run will still be penalized according to the number of
checkpoints or autovacuums that happen while it occurs. Guaranteeing
that there's at least one is maybe a bit more fair than allowing the
possibility of having none, but it's hardly a complete fix. Also, this
approach means that short test runs will have artificially lower TPS
results than longer ones, because the fixed part of the maintenance
overhead is amortized over fewer transactions.

I believe it's a feature, not a bug, that Postgres shoves a lot of
maintenance out of the main transaction pathways and into background
tasks. That allows us to deal with higher peak transaction rates than we
otherwise could do. Maybe the right way to think about approaching this
issue is to try to estimate a "peak TPS" (what we can achieve when no
maintenance processing is happening) and a "long-term average TPS" (net
throughput allowing for maintenance processing). I don't have a specific
suggestion about how to modify pgbench to account for this, but I do
think we need something more than a single TPS number if we want to
describe the system behavior well.

regards, tom lane
Hello, Greg-san

> The attached adds two new command line switches to pgbench:

If you are OK, could you make the measurement of time on Windows more
precise? The Win32 API that pgbench is using for gettimeofday() (in
src/port/gettimeofday.c) is much lower in resolution than on Linux: it
provides only about 15 millisecond resolution, so when I run pgbench -c1
-t16000, the response times of more than 10,000 of the transactions are
reported as 0. In contrast, Linux provides microsecond resolution.

Please excuse me for making an unrelated request, but this seems to be a
good chance...
On Mon, 12 Feb 2007, Tom Lane wrote:
> This is a real issue, but I think your proposed patch does not fix it.

I certainly wouldn't claim that my patch _fixes_ the problem in the
general case; it provides one way to measure it. Currently it's not
obvious to new pgbench users that the problem even exists at all. I feel
it's important to draw attention to the fact that it's something you
should be aware of, even if an automatic resolution to the problem isn't
obvious yet.

In the context I run pgbench in, it is also a workable fix. I don't even
pay attention to pgbench results unless I'm popping 10,000 (desktop) to
100,000 (server) transactions through it. In that context, I believe it
fairly penalizes the transactions for the data they leave behind for
maintenance. I completely agree that people doing short runs shouldn't
use this switch.

Anyway, I like your idea of describing the lower TPS number as including
maintenance; that matches the terminology used within the documentation
better. I will reformat the output to use that term.

Here's what I'm gonna do. The patch I submitted was prepared with the
goal of possibly being implemented in 8.2. I thought a change to
contrib/ that added a feature turned off by default might have a shot at
a backport, and I wanted something people could use on the current
release to be available. Now that I know it's never going into an
official 8.2, I will prepare a slightly different patch aimed at
8.3--incorporating all the feedback I've gotten here as either code
changes or additional documentation--and resubmit in another week or so.
Thanks for the feedback.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tue, 13 Feb 2007, Takayuki Tsunakawa wrote:
> The Win32 API that pgbench is using for gettimeofday() (in
> src/port/gettimeofday.c) is much lower in resolution than on Linux.

I wasn't aware of this issue, and it certainly makes the whole latency
side of pgbench pretty useless on Win32. There is code in
src/include/executor/instrument.h that uses a higher resolution Windows
timer API than gettimeofday() does (as you point out, that one only
resolves to one Windows tick, about 15ms). If I can get a Windows build
environment set up, I'll see if I can borrow that solution for pgbench.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
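For reference, the higher-resolution Windows approach being pointed to
here is typically built on QueryPerformanceCounter. Below is a minimal,
self-contained sketch of measuring an interval that way; it illustrates
the standard Win32 API only, and is not code from instrument.h or from
any eventual pgbench change:

    #ifdef WIN32
    #include <windows.h>

    /*
     * Interval measurement with QueryPerformanceCounter, which ticks
     * far finer than the ~15 ms Windows tick behind the gettimeofday()
     * port. Illustrative only.
     */
    static double
    elapsed_usec(const LARGE_INTEGER *start, const LARGE_INTEGER *stop)
    {
        LARGE_INTEGER freq;

        QueryPerformanceFrequency(&freq);   /* counter ticks per second */
        return (double) (stop->QuadPart - start->QuadPart)
               * 1000000.0 / (double) freq.QuadPart;
    }

    /* Usage:
     *   LARGE_INTEGER t0, t1;
     *   QueryPerformanceCounter(&t0);
     *   ... run the transaction ...
     *   QueryPerformanceCounter(&t1);
     *   printf("%.0f usec\n", elapsed_usec(&t0, &t1));
     */
    #endif

As Magnus notes below, this is straightforward as long as you only need
time differences; it takes more care if you need absolute wall-clock
time.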
Hi,

> Right now when you run pgbench, the results vary considerably from run
> to run even if you completely rebuild the database every time. I've
> found that a lot of that variation comes from two things:

The main purpose of pgbench runs is an "apples to apples" comparison of
2 source bases: one pristine PostgreSQL source base, and another base
being the same source patched with supposed enhancements.

As long as we use the same postgresql.conf, same hardware environment,
and exactly the same parameters for the pgbench runs, the difference in
the TPS values observed between the 2 sources should be a good enough
indicator as to the viability of the new code, don't you think?

E.g. autovacuum will trigger on certain tables only if the threshold is
over the limit, so that gets tied to the update rate. shared_buffers
will become a bottleneck only if the code and the run are I/O intensive
enough, etc.

IMHO, as long as the same environment holds true for both source base
runs, we should not see unexplained variations in the observed TPS
values for the reasons you have mentioned.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
NikhilS <nikkhils@gmail.com> writes:
> As long as we use the same postgresql.conf, same hardware environment,
> and exactly the same parameters for the pgbench runs, the difference in
> the TPS values observed between the 2 sources should be a good enough
> indicator as to the viability of the new code, don't you think?

pgbench has a long-standing, thoroughly earned reputation for producing
unrepeatable results. While I agree that we shouldn't whack it around
without good cause, there's definitely some problems there, and I think
Greg is on to at least one of 'em. The question is what's the best way
to fix it ...

regards, tom lane
On Tue, Feb 13, 2007 at 01:08:04AM -0500, Greg Smith wrote:
> On Tue, 13 Feb 2007, Takayuki Tsunakawa wrote:
>
>> The Win32 API that pgbench is using for gettimeofday() (in
>> src/port/gettimeofday.c) is much lower in resolution than on Linux.
>
> I wasn't aware of this issue, and it certainly makes the whole latency
> side of pgbench pretty useless on Win32. There is code in
> src/include/executor/instrument.h that uses a higher resolution Windows
> timer API than gettimeofday() does (as you point out, that one only
> resolves to one Windows tick, about 15ms). If I can get a Windows build
> environment set up, I'll see if I can borrow that solution for pgbench.

As long as you only need to measure time *difference*, those are pretty
easy to use. Different from Unix, but easy. If you need to keep a
counter that contains actual time it can still be done, but it's a bit
more tricky (not really hard, though).

//Magnus
On 2/12/2007 11:43 AM, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> Right now when you run pgbench, the results vary considerably from run
>> to run even if you completely rebuild the database every time. I've
>> found that a lot of that variation comes from two things:
>
> This is a real issue, but I think your proposed patch does not fix it.
> A pgbench run will still be penalized according to the number of
> checkpoints or autovacuums that happen while it occurs. Guaranteeing
> that there's at least one is maybe a bit more fair than allowing the
> possibility of having none, but it's hardly a complete fix. Also, this
> approach means that short test runs will have artificially lower TPS
> results than longer ones, because the fixed part of the maintenance
> overhead is amortized over fewer transactions.

Anything that doesn't run exclusively on the server, and isn't given
enough data in size and enough time to similarly populate the buffer
cache for each run, WILL report more or less random TPS results. Real
benchmarks on considerably sized hardware have ramp-up times that are
measured in hours if not days, with the sole purpose of populating the
cache and thus smoothing out the transaction response profile.

I think this change is an entirely misleading approach to tackle the
problem at hand.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #