Thread: pgbench --latency-limit option
While playing with 9.5's pgbench, I ran into some strange behavior.

$ pgbench -p 11002 --rate 2 --latency-limit 1 -c 10 -T 10 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 16
number of transactions skipped: 0 (0.000 %)
number of transactions above the 1.0 ms latency limit: 16 (100.000 %)
latency average: 5.518 ms
latency stddev: 1.834 ms
rate limit schedule lag: avg 0.694 (max 1.823) ms
tps = 1.599917 (including connections establishing)
tps = 1.600319 (excluding connections establishing)

From the pgbench manual:

<term><option>--latency-limit=</option><replaceable>limit</></term>
<listitem>
 <para>
  Transaction which last more than <replaceable>limit</> milliseconds
  are counted and reported separately, as <firstterm>late</>.
 </para>
 <para>
  When throttling is used (<option>--rate=...</>), transactions that
  lag behind schedule by more than <replaceable>limit</> ms, and thus
  have no hope of meeting the latency limit, are not sent to the server
  at all. They are counted and reported separately as
  <firstterm>skipped</>.

In my understanding, this says: any transaction that takes longer than the
duration specified by --latency-limit (in this case 1.0 ms) will not be
sent to the server.

In the case above, all 16 transactions were above the 1.0 ms latency limit:

number of transactions above the 1.0 ms latency limit: 16 (100.000 %)

So in this case I think the number of skipped transactions should be
16 (100%), rather than:

number of transactions skipped: 0 (0.000 %)

Am I missing something?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Tue, Dec 22, 2015 at 9:28 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> While playing with 9.5's pgbench, I ran into some strange behavior.
>
> $ pgbench -p 11002 --rate 2 --latency-limit 1 -c 10 -T 10 test
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> query mode: simple
> number of clients: 10
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 16
> number of transactions skipped: 0 (0.000 %)
> number of transactions above the 1.0 ms latency limit: 16 (100.000 %)
> latency average: 5.518 ms
> latency stddev: 1.834 ms
> rate limit schedule lag: avg 0.694 (max 1.823) ms
> tps = 1.599917 (including connections establishing)
> tps = 1.600319 (excluding connections establishing)
>
> From the pgbench manual:
>
> <term><option>--latency-limit=</option><replaceable>limit</></term>
> <listitem>
>  <para>
>   Transaction which last more than <replaceable>limit</> milliseconds
>   are counted and reported separately, as <firstterm>late</>.
>  </para>
>  <para>
>   When throttling is used (<option>--rate=...</>), transactions that
>   lag behind schedule by more than <replaceable>limit</> ms, and thus
>   have no hope of meeting the latency limit, are not sent to the server
>   at all. They are counted and reported separately as
>   <firstterm>skipped</>.
>
> In my understanding, this says: any transaction that takes longer than the
> duration specified by --latency-limit (in this case 1.0 ms) will not be
> sent to the server.

I don't think that's what it says.  There seem to be three points here:

1. If the transaction is sent to the server, we'll check whether it runs
for longer than the amount of time specified by the limit; if so, it will
be reported separately.  This is true with or without --rate.

2. If --rate is used, we'll calculate the latency for each statement
based on the time it was due to be sent, rather than the time it actually
got sent.  (This is further clarified in the documentation for --rate.)

3. If --rate is used and the server is so far behind that --latency-limit
cannot possibly be met, then we'll skip sending the query at all.

In your example, you've got 10 connections and are trying to run at 2
tps, so to avoid having to start skipping things you need transaction
response times to be <~ 5 ms.  The actual response time is ~5.5ms, so if
you ran the test for longer I think you would see some skips.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
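[Editor's note: to make the three points above concrete, here is a simplified sketch of the per-transaction decision under --rate and --latency-limit. It is not the actual pgbench source; the identifiers are illustrative, and the sample values in main() are rounded from the run shown earlier in the thread.]

#include <stdint.h>
#include <stdio.h>

/*
 * Simplified sketch of how a throttled pgbench transaction ends up in the
 * "skipped" or "late" bucket.  NOT the actual pgbench code; identifiers
 * are illustrative.  All times are in microseconds.
 */
typedef enum { TXN_SKIPPED, TXN_LATE, TXN_ON_TIME } txn_outcome;

static txn_outcome
classify_transaction(int64_t scheduled_us, int64_t start_us,
                     int64_t finish_us, int64_t latency_limit_us)
{
    /*
     * Point 3: if we are already more than the limit behind schedule
     * before sending anything, the limit cannot possibly be met, so the
     * transaction is never sent and is counted as "skipped".
     */
    if (start_us - scheduled_us > latency_limit_us)
        return TXN_SKIPPED;

    /*
     * Points 1 and 2: the transaction is sent; its latency is measured
     * from the *scheduled* start, and it is counted as "late" if that
     * exceeds the limit, even though it was fully processed.
     */
    if (finish_us - scheduled_us > latency_limit_us)
        return TXN_LATE;

    return TXN_ON_TIME;
}

int
main(void)
{
    /*
     * Rough numbers from the run in this thread: 1 ms limit, ~0.7 ms
     * schedule lag when the transaction starts, ~5.5 ms total latency
     * measured from the scheduled start.
     */
    txn_outcome o = classify_transaction(0, 700, 5518, 1000);

    printf("%s\n", o == TXN_SKIPPED ? "skipped"
                 : o == TXN_LATE    ? "late (but processed)"
                                    : "on time");
    return 0;
}

With those numbers every transaction comes out "late" while none comes out "skipped", which matches the counts in the report above.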
Hello Robert & Tatsuo,

Some paraphrasing and additional comments.

>> $ pgbench -p 11002 --rate 2 --latency-limit 1 -c 10 -T 10 test

You are targeting 2 tps over 10 connections, so that is about one
transaction every 5 seconds per connection, i.e. a target of about 20
transactions over the 10-second run. You want transaction latency below
1 *ms*.

>> number of transactions actually processed: 16

The stochastic process scheduled 16 transactions during the 10 seconds,
i.e. 1.6 tps. In the long run it should be close to 2 tps, if the
stochastic process does its job as expected.

>> number of transactions skipped: 0 (0.000 %)

All transactions were started (i.e. no transaction was skipped).

>> number of transactions above the 1.0 ms latency limit: 16 (100.000 %)

But none responded within 1 ms; they were all late.

>> latency average: 5.518 ms
>> latency stddev: 1.834 ms

Indeed, unlikely to be under 1 ms.

>> In my understanding, this says: any transaction that takes longer than the
>> duration specified by --latency-limit (in this case 1.0 ms) will not be
>> sent to the server.

We cannot know that a transaction will take longer than the limit before
running it.

> I don't think that's what it says.  There seem to be three points here:
>
> 1. If the transaction is sent to the server, we'll check whether it runs
> for longer than the amount of time specified by the limit; if so, it will
> be reported separately.  This is true with or without --rate.

Yes.

> 2. If --rate is used, we'll calculate the latency for each statement
> based on the time it was due to be sent, rather than the time it actually
> got sent.  (This is further clarified in the documentation for --rate.)

Yes. The idea is that the client (say a web server) wanted to send a
transaction at time t, but due to the load or whatever it may not have
been able to send it at that time, so it sends it later.

> 3. If --rate is used and the server is so far behind that --latency-limit
> cannot possibly be met, then we'll skip sending the query at all.

Yes. By the time the client finally gets to send the transaction, the
current time is already beyond schedule + limit, so there is no way to get
an answer in time; this simulates a client timeout, where the client gives
up waiting for an answer.

> In your example, you've got 10 connections and are trying to run at 2
> tps, so to avoid having to start skipping things you need transaction
> response times to be <~ 5 ms.  The actual response time is ~5.5ms, so if
> you ran the test for longer I think you would see some skips.

Probably no skips though, because the response time needed is below 5
*seconds*, not ms: 2 tps on 10 connections, 1 transaction every 5 seconds
for each connection.

--
Fabien.
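[Editor's note: as a back-of-the-envelope check on the arithmetic above, here is a tiny sketch deriving the per-client interval and the expected transaction count from the options used in the thread. The variable names are illustrative; this is not pgbench code.]

#include <stdio.h>

int
main(void)
{
    double rate_tps   = 2.0;    /* --rate */
    int    nclients   = 10;     /* -c     */
    double duration_s = 10.0;   /* -T     */

    /* Average interval between transactions, per client: 10 / 2 = 5 s. */
    double per_client_interval_s = nclients / rate_tps;

    /* Expected number of transactions over the whole run: 2 * 10 = 20. */
    double expected_txns = rate_tps * duration_s;

    /*
     * For a skip to occur, a client would have to be more than the 1 ms
     * latency limit behind its next scheduled start.  With ~5.5 ms
     * transactions spaced ~5 s apart per client, that is very unlikely,
     * hence "0 skipped" even though every transaction was "late".
     */
    printf("per-client interval: %.1f s\n", per_client_interval_s);
    printf("expected transactions in %.0f s: %.0f\n",
           duration_s, expected_txns);
    return 0;
}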
On Wed, Dec 23, 2015 at 9:52 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> In your example, you've got 10 connections and are trying to run at 2
>> tps, so to avoid having to start skipping things you need transaction
>> response times to be <~ 5 ms.  The actual response time is ~5.5ms, so if
>> you ran the test for longer I think you would see some skips.
>
> Probably no skips though, because the response time needed is below 5
> *seconds*, not ms: 2 tps on 10 connections, 1 transaction every 5 seconds
> for each connection.

Oops.  Right.  But why did this test only run 16 transactions in total
instead of 20?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>> Probably no skips though, because the response time needed is below 5
>> *seconds*, not ms: 2 tps on 10 connections, 1 transaction every 5 seconds
>> for each connection.
>
> Oops.  Right.  But why did this test only run 16 transactions in total
> instead of 20?

Because the schedule is based on a stochastic process, transactions are
not sent regularly (that would induce patterns and is not representative
of real-life load) but randomly.

The long-term average is expected to converge to 2 tps, but on a short run
it may differ significantly.

--
Fabien.
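[Editor's note: to illustrate how much a single short run can deviate, here is a rough simulation of a Poisson-distributed schedule, built the textbook way from exponential inter-arrival times with mean 1/rate. It is not pgbench's scheduler; the names, seed, and use of rand() are purely illustrative.]

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Draw one exponentially distributed inter-arrival time, in seconds. */
static double
exp_interval(double rate_tps)
{
    /* Uniform in (0, 1], so log() never sees zero. */
    double u = (rand() + 1.0) / ((double) RAND_MAX + 1.0);
    return -log(u) / rate_tps;
}

int
main(void)
{
    const double rate_tps = 2.0;   /* --rate 2 */
    const double duration = 10.0;  /* -T 10    */
    const int    runs = 10;

    srand(12345);                  /* fixed seed: repeatable illustration */

    for (int r = 0; r < runs; r++)
    {
        double t = 0.0;
        int    count = 0;

        /* Count how many scheduled transactions fall inside the window. */
        while ((t += exp_interval(rate_tps)) <= duration)
            count++;

        printf("run %2d: %d transactions scheduled in %.0f s\n",
               r + 1, count, duration);
    }
    return 0;
}

The expected count is rate * duration = 20, but the standard deviation of a Poisson count is sqrt(20), roughly 4.5, so observing 16 transactions in a single 10-second run is unremarkable.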
On Wed, Dec 23, 2015 at 11:23 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> Probably no skips though, because the response time needed is below 5
>>> *seconds*, not ms: 2 tps on 10 connections, 1 transaction every 5
>>> seconds for each connection.
>>
>> Oops.  Right.  But why did this test only run 16 transactions in total
>> instead of 20?
>
> Because the schedule is based on a stochastic process, transactions are
> not sent regularly (that would induce patterns and is not representative
> of real-life load) but randomly.
>
> The long-term average is expected to converge to 2 tps, but on a short
> run it may differ significantly.

Hmm.  Is that documented somewhere?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>> [...]
>> Because the schedule is based on a stochastic process, transactions are
>> not sent regularly (that would induce patterns and is not representative
>> of real-life load) but randomly.
>>
>> The long-term average is expected to converge to 2 tps, but on a short
>> run it may differ significantly.
>
> Hmm.  Is that documented somewhere?

Sure, see the --rate option in the pgbench documentation, which states:

"The rate is targeted by starting transactions along a Poisson-distributed
schedule time line."

The impact on the observed tps over a short run is only implied, though.

--
Fabien.