Thread: [PATCH] add --throttle option to pgbench
Hello,

Please find attached a small patch to add a throttling capability to
pgbench, that is pgbench aims at a given client transaction rate instead
of maximizing the load. The throttling relies on Poisson-distributed
delays inserted after each transaction.

I wanted that to test the impact of various load levels, and for
functional tests on my laptop which should not drain the battery.

  sh> ./pgbench -T 10 -c 2 --throttle 10tps test
  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 1
  query mode: simple
  number of clients: 2
  number of threads: 1
  duration: 10 s
  number of transactions actually processed: 214
  tps = 21.054216 (including connections establishing)
  tps = 21.071253 (excluding connections establishing)
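For illustration, the delay drawn after each transaction is essentially
the exponential inter-arrival time of a Poisson process at the target
rate. A simplified sketch, not the exact patch code (the function name
throttle_delay_us is made up here):

  #include <math.h>
  #include <stdlib.h>

  /* Simplified sketch: delay, in microseconds, to insert after a
   * transaction so that each client's transaction starts form a
   * Poisson process at the requested per-client rate (in tps). */
  static double
  throttle_delay_us(double rate_tps)
  {
      /* uniform in (0, 1], so log() never sees zero */
      double u = (rand() + 1.0) / ((double) RAND_MAX + 1.0);

      /* exponential inter-arrival time, mean 1/rate seconds */
      return -log(u) * 1000000.0 / rate_tps;
  }

With --throttle 10tps and 2 clients, each client averages one start per
100 ms, consistent with the ~21 tps measured above.

-- Fabien.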
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Please find attached a small patch to add a throttling capability to
> pgbench, that is pgbench aims at a given client transaction rate
> instead of maximizing the load. The throttling relies on
> Poisson-distributed delays inserted after each transaction.

I'm having a hard time understanding the use-case for this feature.
Surely, if pgbench is throttling its transaction rate, you're going to
just end up measuring the throttle rate.

> I wanted that to test the impact of various load levels, and for
> functional tests on my laptop which should not drain the battery.

How does causing a test to take longer result in reduced battery drain?
You still need the same number of transactions if you want an honest
test, so it seems to me the machine would have to be on longer and thus
you'd eat *more* battery to get an equivalently trustworthy result.

regards, tom lane
On Mon, Apr 29, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Please find attached a small patch to add a throttling capability to
>> pgbench, that is pgbench aims at a given client transaction rate
>> instead of maximizing the load. The throttling relies on
>> Poisson-distributed delays inserted after each transaction.
>
> I'm having a hard time understanding the use-case for this feature.
> Surely, if pgbench is throttling its transaction rate, you're going
> to just end up measuring the throttle rate.
While I don't understand the part about his laptop battery, I think
that there is a good use case for this. If you are looking at latency
distributions or spikes, you probably want to see what they are like
under a load like the one you expect to have, not the highest load
possible.

Although for this use case you would almost surely be using custom
transaction files, not default ones, so I think you could just use
\sleep. However, I don't know if there is an easy way to dynamically
adjust the sleep value by subtracting off the overhead time and
randomizing it a bit, like is done here.
It does seem to me that we should Poissonize the throttle time, then subtract the average overhead, rather than Poissonizing the difference.
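Something along these lines, reusing the exponential draw sketched
upthread (hypothetical code, not from the patch; throttle_sleep_us and
avg_overhead_us are made-up names):

  /* Rough sketch: draw the full inter-transaction interval from the
   * exponential distribution, then subtract the measured average
   * per-transaction overhead, clamping at zero. */
  static double
  throttle_sleep_us(double rate_tps, double avg_overhead_us)
  {
      double u = (rand() + 1.0) / ((double) RAND_MAX + 1.0);
      double interval_us = -log(u) * 1000000.0 / rate_tps;
      double sleep_us = interval_us - avg_overhead_us;

      return sleep_us > 0.0 ? sleep_us : 0.0;
  }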
Cheers,
Jeff
Hello Tom,

> I'm having a hard time understanding the use-case for this feature.
> Surely, if pgbench is throttling its transaction rate, you're going to
> just end up measuring the throttle rate.

Indeed, I do not want to measure the tps if I throttle it. The point is
to generate a continuous but not necessarily maximal load, and to test
other things under such load, such as possibly cascading replication,
failover, various dump strategies, whatever.

>> I wanted that to test the impact of various load levels, and for
>> functional tests on my laptop which should not drain the battery.
>
> How does causing a test to take longer result in reduced battery
> drain?

If I test a replication setup on my laptop at maximum load, I can see
the battery draining in a few seconds by looking at the effect on the
time-left widget. This remark is mostly for functional tests, not for
performance tests. If I want to test the maximum load of a setup,
obviously I will not do that on my laptop, and I will not use
--throttle...

-- Fabien.
Hello Jeff,

> While I don't understand the part about his laptop battery, I think
> that there is a good use case for this. If you are looking at latency
> distributions or spikes, you probably want to see what they are like
> under a load like the one you expect to have, not the highest load
> possible. Although for this use case you would almost surely be using
> custom transaction files, not default ones, so I think you could just
> use \sleep. However, I don't know if there is an easy way to
> dynamically adjust the sleep value by subtracting off the overhead
> time and randomizing it a bit, like is done here.

Indeed, my thoughts:-) Having regular (\sleep n) or uniformly
distributed (\sleep :random_value) delays is not very realistic, and I
would have to do some measures to find the right value for a target
load.

> It does seem to me that we should Poissonize the throttle time, then
> subtract the average overhead, rather than Poissonizing the
> difference.

I actually thought about doing it the way you suggested, because it was
"right". However I did not do it, because if the Poisson gives,
possibly quite frequently, a time below the transaction time, one ends
up with an artificial sequence of stuck transactions, as a client
cannot start the second transaction while the previous one is not
finished, and this does not seem realistic (see the rough figures
below).

To really do that more cleanly, it would require distributing the
events between clients, so having some kind of coordination between
clients, which would really be another test application. Having an
approximation of that seemed good enough for my purpose.
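To put rough, purely illustrative figures on the stuck-transaction
concern: at a 10 tps per-client target the exponential inter-arrival
mean is 100 ms, and the probability of drawing an interval shorter than
a transaction time t is 1 - exp(-t/100ms). If a transaction takes
50 ms, that is 1 - exp(-0.5), about 39% of the draws, so under the
subtraction scheme the client would start late that often.

-- Fabien.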
> I'm having a hard time understanding the use-case for this feature.

Here is an example functional use case I had in mind. Let us say I'm
teaching a practice session about administering replication. Students
have a desktop computer on which they can install several instances of
PostgreSQL, or possibly use virtual machines. I'd like them to set up
one server, put it under a continuous load, then create a first slave,
then a second, and things like that.

The thing I do not want is the poor desktop and its hard drive to be at
maximum speed for the whole afternoon while doing the session, making
it hard to do anything else on the host. So I want something both
realistic (the database is under a load, the WAL is advancing, let us
dump it, base backup it, replicate it, monitor it, update it,
whatever...), but gentle all the same. Using pgbench with --throttle
basically provides the adjustable continuous load I need (example
invocation below). I understand that this is not at all the intent for
which it was developed.

Note that I will probably propose another patch to provide a heart beat
while things are going on, but I thought that one patch at a time was
enough.
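Concretely, the invocation for such a session could look like this
(illustrative figures, database name made up):

  sh> pgbench -c 4 -T 14400 --throttle 5tps practicedb

i.e. four clients throttled to 5 tps each, a gentle ~20 tps aggregate
for a four-hour afternoon.

-- Fabien.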
On 4/29/13 1:08 PM, Fabien COELHO wrote:
>> While I don't understand the part about his laptop battery, I think
>> that there is a good use case for this. If you are looking at latency
>> distributions or spikes, you probably want to see what they are like
>> under a load like the one you expect to have, not the highest load
>> possible. Although for this use case you would almost surely be using
>> custom transaction files, not default ones, so I think you could just
>> use \sleep. However, I don't know if there is an easy way to
>> dynamically adjust the sleep value by subtracting off the overhead
>> time and randomizing it a bit, like is done here.
>
> Indeed, my thoughts:-) Having regular (\sleep n) or uniformly
> distributed (\sleep :random_value) delays is not very realistic, and I
> would have to do some measures to find the right value for a target
> load.

+1 to being able to throttle to make latency measurements.

I'm also wondering if it would be useful to be able to set a latency
target and have something adjust concurrency to see how well you can
hit it. Certainly feature creep for the proposed patch; I only bring it
up because there may be enough similarity to consider that use case at
this time, even if we don't implement it yet.

-- 
Jim C. Nasby, Data Architect    jim@nasby.net
512.569.9461 (cell)             http://jim.nasby.net
> It does seem to me that we should Poissonize the throttle time, then
> subtract the average overhead, rather than Poissonizing the
> difference.

After thinking again about Jeff's point and failing to sleep, I think
that doing exactly that is better because:

- it is "right";
- the code is simpler and shorter;
- my stuck-transaction-sequence issue is not that big an issue anyway.

Here is a patch to schedule transactions along Poisson-distributed
events. This patch replaces my previous proposal.

Note that there is no reference to the current time after the
stochastic process is initiated. This is necessary, and means that if
transactions lag behind the throttle at some point they will try to
catch up later. Neither a good nor a bad thing, mostly a feature.
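In sketch form, the scheduling works like this (simplified, not the
literal patch code; throttle_trigger, advance_schedule, and now_us are
names made up for the sketch): each client keeps an absolute trigger
time that only ever advances by exponential increments, so a lagging
client catches up instead of resetting its schedule.

  #include <math.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Absolute time (in microseconds) at which the next transaction may
   * start; advanced by exponential increments and never re-read from
   * the clock, so a lagging client catches up rather than being
   * rescheduled. */
  static int64_t throttle_trigger;

  static void
  advance_schedule(double rate_tps)
  {
      /* uniform in (0, 1], so log() never sees zero */
      double u = (rand() + 1.0) / ((double) RAND_MAX + 1.0);

      throttle_trigger += (int64_t) (-log(u) * 1000000.0 / rate_tps);
  }

  /* Before each transaction (now_us() is a placeholder for reading
   * the current time in microseconds):
   *
   *     int64_t sleep_us = throttle_trigger - now_us();
   *     if (sleep_us > 0)
   *         sleep for sleep_us;      -- on schedule
   *     else
   *         start immediately;       -- behind, catch up
   */

-- Fabien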