Thread: [PATCH] add --progress option to pgbench

[PATCH] add --progress option to pgbench

From
Fabien COELHO
Date:
Please find attached a small patch submission, for reference to the next 
commit fest.

Each thread reports its progress roughly every N seconds, where N is the 
value given to the option. This may be particularly useful for long-running 
pgbench invocations, which is how pgbench should always be run.
 shell> ./pgbench -T 16 --progress 5 -c 4 -j 2 test
 starting vacuum...end.
 thread 0 running at 53.194457 tps after 5.0 s
 thread 1 running at 59.792203 tps after 5.0 s
 [ bzzzz... ]
 thread 0 running at 56.050592 tps after 10.0 s
 thread 1 running at 54.075444 tps after 10.1 s
 [ bzzzz... ]
 thread 0 running at 49.746026 tps after 15.0 s
 thread 1 running at 48.560258 tps after 15.1 s
 [ bzzzz... ]
 transaction type: TPC-B (sort of)
 scaling factor: 1
 query mode: simple
 number of clients: 4
 number of threads: 2
 duration: 16 s
 number of transactions actually processed: 1725
 tps = 107.034958 (including connections establishing)
 tps = 107.094691 (excluding connections establishing)
 


-- 
Fabien.

Re: [PATCH] add --progress option to pgbench (submission 2)

From
Fabien COELHO
Date:
New submission which puts the option help in alphabetical position, as
per Peter Eisentraut's commit f0ed3a8a99b052d2d5e0b6153a8907b90c486636.

This is for reference to the next commitfest.

-- 
Fabien.

Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
> New submission which puts the option help in alphabetical position, as
> per Peter Eisentraut's commit f0ed3a8a99b052d2d5e0b6153a8907b90c486636.
>
> This is for reference to the next commitfest.

Patch updated after a conflict induced by pg-indentation, for the next 
commitfest.

-- 
Fabien.

Re: [PATCH] add --progress option to pgbench (submission 3)

From
KONDO Mitsumasa
Date:
Hi Fabien,

I am sending you my review results and a refactoring patch. I think your patch
provides a good feature that many people will surely want to use! I hope my
review comments will be useful for your patch.


* 1. Align wording and variable names with the existing source code and SGML
documentation.
It is more readable for users and developers when a new patch follows the
wording and variable names already used in the existing source code. For
example, SECONDS should be NUM, etc.

* 2. Make the output format more readable.
I think the output format should be simple and intuitive. Your patch's output
format is not very easy to read, so I changed it to a simpler format and added
the average latency. My proposed format is easy to read and easy to process
programmatically.

> [mitsu-ko@localhost postgresql]$ bin/pgbench -T10 -P5 -c2 -j2
> starting vacuum...end.
> 5.0 s     [thread 1]: tps = 1015.576032, AverageLatency(ms) = 0.000984663
> 5.0 s     [thread 0]: tps = 1032.580794, AverageLatency(ms) = 0.000968447
> 10.0 s [thread 0]: tps = 1129.591189, AverageLatency(ms) = 0.000885276
> 10.0 s [thread 1]: tps = 1126.267776, AverageLatency(ms) = 0.000887888

However, taste in output format (design) differs from person to person :-).
If you prefer another format, please change it.


* 3. The thread name in the output is not necessary.
I do not understand why the thread name is displayed in each progress line. I do
not think it is needed. I hope the output can stay simple even with many
threads. Here is what I have in mind:

> [mitsu-ko@localhost postgresql]$ bin/pgbench -T10 -P5 -c2 -j2
> starting vacuum...end.
> 5.0 s     : tps = 2030.576032, AverageLatency(ms) = 0.000984663
> 10.0 s : tps = 2250.591189, AverageLatency(ms) = 0.000885276

This output format is simpler and more intuitive. If you need results for each
thread, please tell us the reason.


* 4. The progress time slips.
When I ran this patch for a long time, I found that the progress time slips. Here
is an illustration of the problem:

 > [mitsu-ko@localhost postgresql]$ bin/pgbench -T10 -P5 -c2
 > starting vacuum...end.
 > 5.1 s     : tps = 2030.576032, AverageLatency(ms) = 0.000984663
 > 10.2 s : tps = 2250.591189, AverageLatency(ms) = 0.000885276

There is a problem in how the progress time is calculated. Either this needs to
be fixed, or the time should be displayed in a format like '13:00:00'. If you
choose the latter format, it will fit with the PostgreSQL log and with other
contrib modules such as pg_stat_statements.


Best regards,
--
Mitsumasa KONDO


Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
Hello Mitsumasa,

Thanks for the review.

> * 2. Make the output format more readable.
>> 5.0 s     [thread 1]: tps = 1015.576032, AverageLatency(ms) = 0.000984663
>> 5.0 s     [thread 0]: tps = 1032.580794, AverageLatency(ms) = 0.000968447
>> 10.0 s [thread 0]: tps = 1129.591189, AverageLatency(ms) = 0.000885276
>> 10.0 s [thread 1]: tps = 1126.267776, AverageLatency(ms) = 0.000887888
>
> However, taste in output format (design) differs from person to person :-). 
> If you prefer another format, please change it.

I think that your suggestion is too verbose, and as far as automation is 
concerned I like "cut -f 2" unix filtering and other gnuplot processing... 
but I see your point and it is a matter of taste. I'll try to propose 
something in between, if I can.

> * 3. The thread name in the output is not necessary.
> I do not understand why the thread name is displayed in each progress line. I 
> do not think it is needed. I hope the output can stay simple even with many 
> threads. Here is what I have in mind:
>
>> 5.0 s     : tps = 2030.576032, AverageLatency(ms) = 0.000984663
>> 10.0 s : tps = 2250.591189, AverageLatency(ms) = 0.000885276
>
> This output format is simpler and more intuitive. If you need results for each 
> thread, please tell us the reason.

I agree that it would be better, but each thread has access only to its own 
data if it must work with the "fork" pthread emulation, so each thread has to 
do its own report... If the "fork" emulation is removed and only real threads 
are used, it would be much better, and one thread would be able to report 
for everyone. The alternative is to do a feature which does not work with
fork emulation.

> * 4. The progress time slips.
> When I ran this patch for a long time, I found that the progress time slips. 
> Here is an illustration of the problem:

Yep. I must change the test to align on the overall start time.
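
The difference between the two scheduling approaches can be illustrated with a small simulation. This is only a sketch and not the patch's code: the function names are hypothetical, and it assumes that every report fires a fixed `overhead` seconds after its deadline. When each deadline is measured from the previous report, the delay accumulates (5.1 s, 10.2 s, ... as in the review); when deadlines are fixed multiples of the interval from the overall start time, it does not.

```python
def report_times_relative(interval, n_reports, overhead):
    """Each deadline is measured from the moment the previous report fired."""
    times = []
    now = 0.0
    for _ in range(n_reports):
        # The report fires 'overhead' late, and the next deadline starts from there.
        now = now + interval + overhead
        times.append(round(now, 3))
    return times


def report_times_aligned(interval, n_reports, overhead):
    """Deadlines are fixed multiples of the interval from the overall start."""
    times = []
    for k in range(1, n_reports + 1):
        deadline = k * interval  # independent of when earlier reports fired
        times.append(round(deadline + overhead, 3))
    return times


if __name__ == "__main__":
    print(report_times_relative(5.0, 3, 0.1))
    print(report_times_aligned(5.0, 3, 0.1))
```

With the aligned schedule each report is late by at most the overhead of that single report, so the displayed time stays close to a multiple of the interval however long the run is.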

I'll submit a new patch later.

-- 
Fabien.



Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
Dear Mitsumasa,

Here is a v4 that takes into account most of your points: The report is 
performed for all threads by thread 0, however --progress is not supported 
under thread fork emulation if there is more than one thread. The report 
time does not slip anymore.
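
A single global report of this kind amounts to summing the per-thread transaction counters before converting to tps. The sketch below is an illustration only: the function names and the list-based counters are hypothetical, and it reuses the tps formula quoted later in this thread (`1000000.0 * (count - last_count) / run`) under the assumption that `run` is the interval length in microseconds.

```python
def interval_tps(count, last_count, run_us):
    """tps for one counter over an interval of run_us microseconds (assumed unit)."""
    return 1000000.0 * (count - last_count) / run_us


def global_tps(counts, last_counts, run_us):
    """Sum the per-thread deltas first, then convert to tps once."""
    total = sum(c - l for c, l in zip(counts, last_counts))
    return 1000000.0 * total / run_us


if __name__ == "__main__":
    # Two hypothetical threads over a 5-second interval.
    print(interval_tps(5000, 0, 5_000_000))
    print(global_tps([5000, 5100], [0, 0], 5_000_000))
```

This only works when one thread can read the other threads' counters, which is why the report by thread 0 depends on real threads with shared memory.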

However I've kept the format sparse. It is a style thing :-) and it is more 
consistent with the kind of format used in the log. I have not added the 
"latency" measure because it is redundant with the tps, and the latency 
that people are expecting is the actual latency of each transaction, not 
the apparent latency of transactions running in parallel, which is really 
a throughput.

-- 
Fabien.

Re: [PATCH] add --progress option to pgbench (submission 3)

From
KONDO Mitsumasa
Date:
Hello Fabien,

Thank you for your fast work and reply. I will try to test your new patch by 
next week.

(2013/06/26 20:16), Fabien COELHO wrote:
> Here is a v4 that takes into account most of your points: The report is performed
> for all threads by thread 0, however --progress is not supported under thread
> fork emulation if there is more than one thread. The report time does not slip
> anymore.
Good! I think you should discuss the implementation of the progress output with 
a committer once the patch is ready for committer. It is good for a patch to 
receive advice from many people.

> However I've kept the format sparse. It is a style thing :-) and it is more
> consistent with the kind of format used in the log. I have not added the
> "latency" measure because it is redundant with the tps, and the latency that
> people are expecting is the actual latency of each transaction, not the apparent
> latency of transactions running in parallel, which is really a throughput.
As far as I know, YCSB, a well-known NoSQL benchmark program, displays a 
latency measure. I think that TPS indicates system performance for the system 
administrator, and latency indicates service performance for the end user, in 
custom benchmarks. It might be redundant, but it would be needed by engineers 
who cannot decide between PostgreSQL and other databases such as NoSQL ones. It 
would also be good to discuss this with a committer and other people. An 
objective opinion is important!

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center








Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
Dear Mitsumasa,

> As far as I know, YCSB, a well-known NoSQL benchmark program, displays a 
> latency measure. I think that TPS indicates system performance for the system 
> administrator, and latency indicates service performance for the end user, in 
> custom benchmarks.

Sure. I agree that both pieces of information are very useful.

If I show a latency at full load, that would be "nclients/tps", not 
"1/tps". However, I'm hoping to get the throttling patch for pgbench 
accepted, in which case the latency to show is a little bit different because the 
"nclients/tps" would include sleep time and does not correspond to the 
latency for the end user. Also, under throttling it would also be useful 
to show the "time lag" behind scheduled transactions.

So I would like to know whether the throttling patch is committed and then 
update the progress patch to take that into account.
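
The "nclients/tps" relation can be checked against the progress output reported later in this thread. The sketch below is illustrative only (the function name is hypothetical) and assumes all clients are busy for the whole interval, that is, full load without throttling.

```python
def latency_ms(nclients, tps):
    """Average per-transaction latency in milliseconds at full load.

    With nclients transactions in flight at any time, each transaction takes
    nclients / tps seconds on average; 1 / tps would only be correct for a
    single client.
    """
    return 1000.0 * nclients / tps


if __name__ == "__main__":
    # 10 clients at 2415.0 tps, as in the threaded run reported in the thread.
    print(round(latency_ms(10, 2415.0), 3))
    # 2 clients per process (10 clients over 5 processes) at 493.8 tps.
    print(round(latency_ms(2, 493.8), 3))
```

Both values agree with the "ms lat" figures shown in the version 5 test output (4.141 and 4.050), which suggests that the reported latency follows this relation. Under throttling the same formula would include sleep time, which is the concern raised above.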

-- 
Fabien.



Re: [PATCH] add --progress option to pgbench (submission 3)

From
KONDO Mitsumasa
Date:
Dear Fabien,

(2013/06/27 14:39), Fabien COELHO wrote:
> If I show a latency at full load, that would be "nclients/tps", not "1/tps".
> However, I'm hoping to get the throttling patch for pgbench accepted, in which
> case the latency to show is a little bit different because the "nclients/tps"
> would include sleep time and does not correspond to the latency for the end
> user. Also, under throttling it would also be useful to show the "time lag"
> behind scheduled transactions.
All right. Of course, I hope the features you want are realized with the best 
implementation.

> So I would like to know whether the throttling patch is committed and then update
> the progress patch to take that into account.
OK! I will watch for it and use it.

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: [PATCH] add --progress option to pgbench (submission 3)

From
Robert Haas
Date:
On Wed, Jun 26, 2013 at 7:16 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a v4 that takes into account most of your points: The report is
> performed for all threads by thread 0, however --progress is not supported
> under thread fork emulation if there is more than one thread. The report
> time does not slip anymore.

I don't believe that to be an acceptable restriction.  We generally
require features to work on all platforms we support.  We have made
occasional compromises there, but generally only when the restriction
is fundamental to the platform rather than for developer convenience.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
Dear Robert,

>> Here is a v4 that takes into account most of your points: The report is
>> performed for all threads by thread 0, however --progress is not supported
>> under thread fork emulation if there is more than one thread. The report
>> time does not slip anymore.
>
> I don't believe that to be an acceptable restriction.

The "pthread fork emulation" is just an ugly hack to run pgbench on a host 
that does not have pthreads (POSIX threads). I'm not sure that it 
applies on any significant system, but I can assure you that it imposes 
severe limitations on how to do things properly in pgbench: As there are 
no threads, there is no shared memory, no locking mechanism, nothing 
really. So it is hard to generate a shared report in such conditions.

My first proposal is to remove the fork emulation altogether, which would 
remove many artificial limitations to pgbench and simplify the code 
significantly. That would be an improvement.

Otherwise, the simplest possible adaptation, if it is required to have the 
progress feature under fork emulation to pass it, is that under "fork 
emulation" each process reports its current progress instead of having a 
collective summing.

Note that it is possible to implement the feature with interprocess 
communications, but really generating many pipes will add a lot of 
complexity to the code, and I do not think that either the code or this 
simple feature deserves that.

Another option is to have each thread report its progress 
independently with all implementations, which is what I did in the first 
instance. It is much less interesting, but it would be homogeneous, 
although poor, for every version.

> We generally require features to work on all platforms we support.  We 
> have made occasional compromises there, but generally only when the 
> restriction is fundamental to the platform rather than for developer 
> convenience.

I agree with this kind of "generally", but please consider that "pthread 
fork emulation" really means "processes", so that simple things with 
threads become significantly more complex to implement.

-- 
Fabien.



Re: [PATCH] add --progress option to pgbench (submission 3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>> Here is a v4 that takes into account most of your points: The report is
>>> performed for all threads by thread 0, however --progress is not supported
>>> under thread fork emulation if there is more than one thread. The report
>>> time does not slip anymore.

>> I don't believe that to be an acceptable restriction.

> My first proposal is to remove the fork emulation altogether, which would 
> remove many artificial limitations to pgbench and simplify the code 
> significantly. That would be an improvement.

I would object strongly to that, as it would represent a significant
movement of the goalposts on what is required to build Postgres at all,
ie platforms on which --enable-thread-safety is unavailable or expensive
would be out in the cold.  Perhaps that set is approaching empty, but a
project that's still standardized on C89 has little business making such
a choice IMO.

> Otherwise, the simplest possible adaptation, if it is required to have the 
> progress feature under fork emulation to pass it, is that under "fork 
> emulation" each process reports its current progress instead of having a 
> collective summing.

Perhaps that's worth doing.  I agree with Fabien that full support of
this feature in the process model is more trouble than it's worth,
though, and I wouldn't scream loudly if we just didn't support it.
--disable-thread-safety doesn't have to be entirely penalty-free.
        regards, tom lane



Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
>> Otherwise, the simplest possible adaptation, if it is required to have the
>> progress feature under fork emulation to pass it, is that under "fork
>> emulation" each process reports its current progress instead of having a
>> collective summing.
>
> Perhaps that's worth doing.  I agree with Fabien that full support of
> this feature in the process model is more trouble than it's worth,
> though, and I wouldn't scream loudly if we just didn't support it.
> --disable-thread-safety doesn't have to be entirely penalty-free.

Attached is patch version 5.

It includes this solution for fork emulation, one report per thread 
instead of a global report. Some code duplication for that.

It also solves conflicts introduced by the long options patch.

Finally, I've added a latency measure as defended by Mitsumasa. However 
the formula must be updated for the throttling patch.

Maybe I should have submitted a bunch of changes to pgbench in one patch. 
I thought that separating orthogonal things made reviewing simpler so the 
patches were more likely to pass, but I'm not so sure that the other 
strategy would have been that bad.

-- 
Fabien.

Re: [PATCH] add --progress option to pgbench (submission 3)

From
KONDO Mitsumasa
Date:
(2013/06/28 3:17), Fabien COELHO wrote:
> Attached is patch version 5.
>
> It includes this solution for fork emulation, one report per thread instead of a
> global report. Some code duplication for that.
It's good coding. I tested builds configured with --disable-thread-safety and 
without it. My test results matched your proposal. It also fixes the 
compatibility problem and the slipping progress time. Here are the test results.

* with --disable-thread-safety
[mitsu-ko@localhost postgresql]$ bin/pgbench -T 600 -c10 -j5 -P 5
starting vacuum...end.
progress 1: 5.0 s, 493.8 tps, 4.050 ms lat
progress 2: 5.0 s, 493.2 tps, 4.055 ms lat
progress 3: 5.0 s, 474.6 tps, 4.214 ms lat
progress 4: 5.0 s, 479.1 tps, 4.174 ms lat
progress 0: 5.0 s, 469.5 tps, 4.260 ms lat

* without --disable-thread-safety (normal)
[mitsu-ko@localhost postgresql]$ bin/pgbench -T 600 -c10 -j5 -P 5
starting vacuum...end.
progress: 5.0 s, 2415.0 tps, 4.141 ms lat
progress: 10.0 s, 2445.5 tps, 4.089 ms lat
progress: 15.0 s, 2442.2 tps, 4.095 ms lat
progress: 20.0 s, 2414.3 tps, 4.142 ms lat
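
For scripted post-processing of the kind mentioned earlier in the thread ("cut -f 2", gnuplot), progress lines in this format could be parsed as sketched below. This is only an illustration based on the sample lines shown here: the regular expression and the `parse_progress` helper are hypothetical and not part of pgbench, and the format may differ in other patch versions.

```python
import re

# Matches both "progress: 5.0 s, ..." and the per-process "progress 1: 5.0 s, ...".
PROGRESS_RE = re.compile(
    r"^progress(?: (?P<thread>\d+))?: "
    r"(?P<elapsed>\d+(?:\.\d+)?) s, "
    r"(?P<tps>\d+(?:\.\d+)?) tps, "
    r"(?P<lat>\d+(?:\.\d+)?) ms lat$"
)


def parse_progress(line):
    """Return a dict for a progress line, or None for any other line."""
    m = PROGRESS_RE.match(line.strip())
    if m is None:
        return None
    thread = m.group("thread")
    return {
        "thread": int(thread) if thread is not None else None,
        "elapsed_s": float(m.group("elapsed")),
        "tps": float(m.group("tps")),
        "latency_ms": float(m.group("lat")),
    }


if __name__ == "__main__":
    for line in [
        "starting vacuum...end.",
        "progress: 5.0 s, 2415.0 tps, 4.141 ms lat",
        "progress 1: 5.0 s, 493.8 tps, 4.050 ms lat",
    ]:
        print(parse_progress(line))
```

Lines that are not progress reports, such as "starting vacuum...end.", yield None and can simply be skipped.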

> Finally, I've added a latency measure as defended by Mitsumasa. However the
> formula must be updated for the throttling patch.
Thanks! In a benchmark test, it is not good to throttle too much. It is difficult 
to set appropriate options such as the number of clients or the number of 
threads. These results will help to set appropriate throttle options. With this 
information we can easily search for settings that give tps as high and latency 
as low as possible.

I have some small comments. I think that 'lat' is not a common abbreviation of 
'latency', but I don't know a good abbreviation. If you have a better one, 
please send us a revised version. Also, please fix the following code. It may 
have regressed in one of your earlier patch versions.

-           "  -P, --progress SEC       show thread progress report every SEC seconds\n"
+           "  -P, --progress NUM       show thread progress report every NUM seconds\n"

-                tps = 1000000.0 * (count-last_count) / run;
+                tps = 1000000.0 * (count - last_count) / run;

That is all of my comments. If you send the latest patch, I will set it to ready 
for committer.


I also tested your throttle patch. My impression of this patch is good, but it 
does not need to be run together with the progress option. In the first place, 
the throttle patch controls the transaction rate of pgbench, so there is no need 
to display progress that would only show what the user already expects. A user 
of the throttle patch will assume that it controls the transaction rate exactly, 
and it is not a debugging option. So I think it would be better to increase the 
accuracy of the throttle patch, and the two patches do not need to work 
together. If you think that they cannot work together, I suggest forbidding use 
of the progress option and the throttle option at the same time.

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center



Re: [PATCH] add --progress option to pgbench (submission 3)

From
Fabien COELHO
Date:
Dear Mitsumasa,

> I have some small comments. I think that 'lat' is not a common abbreviation 
> of 'latency', but I don't know a good abbreviation. If you have a better 
> one, please send us a revised version.

I needed something short, because I may add a "lag" time as well under 
throttling. No better idea.

> Also, please fix the following code. It may have regressed in one of your 
> earlier patch versions.

Done. I've also put the long option definition at its right place in the 
alphabetical order.

> That is all of my comments. If you send the latest patch, I will set it to 
> ready for committer.

Please find attached version 6.

> I also tested your throttle patch. My impression of this patch is good, but it 
> does not need to be run together with the progress option. [...]

I agree that it is not necessary. However for my use case it would be 
useful to have both throttling & progress at the same time, in particular 
to check the effect of other concurrent operations (eg. pg_dump, 
pg_basebackup) while a bench is running.

-- 
Fabien.

Re: [PATCH] add --progress option to pgbench (submission 3)

From
KONDO Mitsumasa
Date:
Hi, Fabien

Thanks for your fast response and fix! I have now set your patch to ready for 
committer.

(2013/07/01 19:49), Fabien COELHO wrote:
>> I have some small comments. I think that 'lat' is not a common abbreviation
>> of 'latency', but I don't know a good abbreviation. If you have a better
>> one, please send us a revised version.
>
> I needed something short, because I may add a "lag" time as well under
> throttling. No better idea.
OK. Neither of us has a better idea :-)

>> Also, please fix the following code. It may have regressed in one of your
>> earlier patch versions.
>
> Done. I've also put the long option definition at its right place in the
> alphabetical order.
Oh, I missed that in my review. Thanks.

>> I also tested your throttle patch. My impression of this patch is good, but it
>> does not need to be run together with the progress option. [...]
>
> I agree that it is not necessary. However for my use case it would be useful to
> have both throttling & progress at the same time, in particular to check the
> effect of other concurrent operations (eg. pg_dump, pg_basebackup) while a bench
> is running.
That is a very careful check! I think it is important for critical systems, too. 
If I have time to review the throttle patch in more detail, I will send you 
comments. I hope both patches are committed.

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center