Thread: too much pgbench init output

too much pgbench init output

From

Peter Eisentraut

Date:

01 September 2012, 04:00:45

When initializing a large database, pgbench writes tons of "%d tuples
done" lines.  I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

Attachment

pg-pgbench-report.patch

Re: too much pgbench init output

From

Robert Haas

Date:

01 September 2012, 10:30:13

On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> When initializing a large database, pgbench writes tons of "%d tuples
> done" lines.  I propose to change this to a sort of progress counter
> that stays on the same line, as in the attached patch.

I'm not sure I like this - what if the output is being saved off to a file?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: too much pgbench init output

From

"Tomas Vondra"

Date:

01 September 2012, 20:10:09

On 1 Září 2012, 12:30, Robert Haas wrote:
> On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> When initializing a large database, pgbench writes tons of "%d tuples
>> done" lines.  I propose to change this to a sort of progress counter
>> that stays on the same line, as in the attached patch.
>
> I'm not sure I like this - what if the output is being saved off to a
> file?

What about using istty(stdout) to handle this situation? Although I find
it usually confusing, because it prints one thing when executed directly
and something else when the output is redirected to a file.

I see two other options:

(1) removing this output altogether (I can't imagine a situation when this
really matters) and replace it with a simple "inserted 23% of rows,
estimated remaining time 14:23 (863 sec)", updated each 1%

(2) adding a switch (--verbose) that enables these lines, don't print them
by default

Another option might be updating the process title so that the "top" shows
current progress more precisely than (1).

Tomas

Re: too much pgbench init output

From

Peter Eisentraut

Date:

04 September 2012, 22:00:52

On 9/1/12 6:30 AM, Robert Haas wrote:
> On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> When initializing a large database, pgbench writes tons of "%d tuples
>> done" lines.  I propose to change this to a sort of progress counter
>> that stays on the same line, as in the attached patch.
> 
> I'm not sure I like this - what if the output is being saved off to a file?

I suppose we could print \n instead of \r then.

Re: too much pgbench init output

From

Tom Lane

Date:

05 September 2012, 00:35:49

Peter Eisentraut <peter_e@gmx.net> writes:
> On 9/1/12 6:30 AM, Robert Haas wrote:
>> On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> When initializing a large database, pgbench writes tons of "%d tuples
>>> done" lines.  I propose to change this to a sort of progress counter
>>> that stays on the same line, as in the attached patch.

>> I'm not sure I like this - what if the output is being saved off to a file?

> I suppose we could print \n instead of \r then.

Possibly off-the-wall idea: we could fix the "too much output" problem
once and for all by going to a log scale.
10 tuples done100 tuples done1000 tuples done10000 tuples done100000 tuples done...
        regards, tom lane

Re: too much pgbench init output

From

Robert Haas

Date:

05 September 2012, 03:14:30

On Tue, Sep 4, 2012 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On 9/1/12 6:30 AM, Robert Haas wrote:
>>> On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>>> When initializing a large database, pgbench writes tons of "%d tuples
>>>> done" lines.  I propose to change this to a sort of progress counter
>>>> that stays on the same line, as in the attached patch.
>
>>> I'm not sure I like this - what if the output is being saved off to a file?
>
>> I suppose we could print \n instead of \r then.
>
> Possibly off-the-wall idea: we could fix the "too much output" problem
> once and for all by going to a log scale.
>
>         10 tuples done
>         100 tuples done
>         1000 tuples done
>         10000 tuples done
>         100000 tuples done
>         ...

I don't like that, because one of the things you can see by following
the current output is where the checkpoint stalls are happening during
the load.  You'd lose the ability to notice any kind of slowdown after
the first few tuples with this kind of format.

Actually, this whole things seems like a solution in search of a
problem to me.  We just reduced the verbosity of pgbench -i tenfold in
the very recent past - I would have thought that enough to address
this problem.  But maybe not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: too much pgbench init output

From

Peter Eisentraut

Date:

05 September 2012, 03:31:10

On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:
> Actually, this whole things seems like a solution in search of a
> problem to me.  We just reduced the verbosity of pgbench -i tenfold in
> the very recent past - I would have thought that enough to address
> this problem.  But maybe not.

The problem is that

a) It blasts out too much output and everything scrolls off the screen,
and

b) There is no indication of where the end is.

These are independent problems, and I'd be happy to address them
separately if there are such specific concerns attached to this.

Speaking of tenfold, we could reduce the output frequency tenfold to
once every 1000000, which would alleviate this problem for a while
longer.

Re: too much pgbench init output

From

Tom Lane

Date:

05 September 2012, 03:44:48

Peter Eisentraut <peter_e@gmx.net> writes:
> On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:
>> Actually, this whole things seems like a solution in search of a
>> problem to me.  We just reduced the verbosity of pgbench -i tenfold in
>> the very recent past - I would have thought that enough to address
>> this problem.  But maybe not.

> The problem is that

> a) It blasts out too much output and everything scrolls off the screen,
> and

Robert evidently thinks that the verbosity of the output is a feature
not a bug.  I'm not convinced that eyeballing pgbench output is a
particularly useful way to measure checkpoint stalls, but ...

> b) There is no indication of where the end is.

Well, surely *that* can be fixed in a noncontroversial way: just
print "M/N tuples done", where N is the target.
        regards, tom lane

Re: too much pgbench init output

From

Robert Haas

Date:

05 September 2012, 04:17:09

On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:
>> Actually, this whole things seems like a solution in search of a
>> problem to me.  We just reduced the verbosity of pgbench -i tenfold in
>> the very recent past - I would have thought that enough to address
>> this problem.  But maybe not.
>
> The problem is that
>
> a) It blasts out too much output and everything scrolls off the screen,
> and
>
> b) There is no indication of where the end is.
>
> These are independent problems, and I'd be happy to address them
> separately if there are such specific concerns attached to this.
>
> Speaking of tenfold, we could reduce the output frequency tenfold to
> once every 1000000, which would alleviate this problem for a while
> longer.

Well, I wouldn't object to displaying a percentage on each output
line.  But I don't really like the idea of having them less frequent
than they already are, because if you run into a situation that makes
pgbench -i run slowly, as I occasionally do, it's marginal to tell the
difference between "slow" and "completely hung" even with the current
level of verbosity.

However, we could add a -q flag to run more quietly, or something like
that.  Actually, I'd even be fine with making the default quieter,
though we can't use -v for verbose since that's already taken.  But
I'd like to preserve the option of getting the current amount of
output because sometimes I need that to troubleshoot problems.
Actually it'd be nice to even get a bit more output: say, a timestamp
on each line, and a completion percentage... but now I'm getting
greedy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: too much pgbench init output

From

Pavan Deolasee

Date:

05 September 2012, 04:39:12

On Wed, Sep 5, 2012 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Actually it'd be nice to even get a bit more output: say, a timestamp
on each line, and a completion percentage... but now I'm getting
greedy.

May be we need a verbosity level and print a lot less or a lot more information than what we do today. That will satisfy everyone. Hopefully.

Thanks,

Pavan

Re: too much pgbench init output

From

Peter Eisentraut

Date:

06 September 2012, 03:14:21

On Tue, 2012-09-04 at 23:44 -0400, Tom Lane wrote:
> > b) There is no indication of where the end is.
> 
> Well, surely *that* can be fixed in a noncontroversial way: just
> print "M/N tuples done", where N is the target.

I have made this change.  I won't pursue using \r if others find it
useful as is.

Re: too much pgbench init output

From

Tomas Vondra

Date:

16 September 2012, 22:26:15

On 5.9.2012 06:17, Robert Haas wrote:
> On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:
>>> Actually, this whole things seems like a solution in search of a
>>> problem to me.  We just reduced the verbosity of pgbench -i tenfold in
>>> the very recent past - I would have thought that enough to address
>>> this problem.  But maybe not.
>>
>> The problem is that
>>
>> a) It blasts out too much output and everything scrolls off the screen,
>> and
>>
>> b) There is no indication of where the end is.
>>
>> These are independent problems, and I'd be happy to address them
>> separately if there are such specific concerns attached to this.
>>
>> Speaking of tenfold, we could reduce the output frequency tenfold to
>> once every 1000000, which would alleviate this problem for a while
>> longer.
>
> Well, I wouldn't object to displaying a percentage on each output
> line.  But I don't really like the idea of having them less frequent
> than they already are, because if you run into a situation that makes
> pgbench -i run slowly, as I occasionally do, it's marginal to tell the
> difference between "slow" and "completely hung" even with the current
> level of verbosity.
>
> However, we could add a -q flag to run more quietly, or something like
> that.  Actually, I'd even be fine with making the default quieter,
> though we can't use -v for verbose since that's already taken.  But
> I'd like to preserve the option of getting the current amount of
> output because sometimes I need that to troubleshoot problems.
> Actually it'd be nice to even get a bit more output: say, a timestamp
> on each line, and a completion percentage... but now I'm getting
> greedy.

Hi,

I've been thinking about this a bit more, and do propose to use an
option that determines "logging step" i.e. number of items (either
directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percents. For example if you want to print a line
each 1000 lines, you can to this

  $ pgbench -i -s 1000 --logging-step 1000 testdb

and if you want to print a line each 5%, you can do this

  $ pgbench -i -s 1000 --logging-step 5% testdb

and that's it.

Moreover the patch adds a record of elapsed an estimate of remaining
time. So for example with 21% you may get this:

creating tables...
21000 of 100000 tuples (21%) done (elapsed 1.56 s, remaining 5.85 s).
42000 of 100000 tuples (42%) done (elapsed 3.15 s, remaining 4.35 s).
63000 of 100000 tuples (63%) done (elapsed 4.73 s, remaining 2.78 s).
84000 of 100000 tuples (84%) done (elapsed 6.30 s, remaining 1.20 s).
100000 of 100000 tuples (100%) done (elapsed 8.17 s, remaining 0.00 s).
vacuum...
set primary keys...

Now, I've had a hard time with the patch - no matter what I do, I do get
"invalid option" error whenever I try to run that from command line for
some reason. But when I run it from gdb, it works just fine.

kind regards
Tomas

Attachment

pgbench-logging-v1.diff

Re: too much pgbench init output

From

Alvaro Herrera

Date:

23 October 2012, 16:02:24

Tomas Vondra wrote:

> I've been thinking about this a bit more, and do propose to use an
> option that determines "logging step" i.e. number of items (either
> directly or as a percentage) between log lines.
>
> The attached patch defines a new option "--logging-step" that accepts
> either integers or percents. For example if you want to print a line
> each 1000 lines, you can to this
>
>   $ pgbench -i -s 1000 --logging-step 1000 testdb

I find it hard to get excited about having to specify a command line
argument to tweak this.  Would it work to have it emit messages
depending on elapsed time and log scale of tuples emitted?  So for
example emit the first message after 5 seconds or 100k tuples, then back
off until (say) 15 seconds have lapsed and 1M tuples, etc?  The idea is
to make it verbose enough to keep a human satisfied with what he sees,
but not flood the terminal with pointless updates.  (I think printing
the ETA might be nice as well, not sure).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: too much pgbench init output

From

Robert Haas

Date:

23 October 2012, 16:21:36

On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Tomas Vondra wrote:
>
>> I've been thinking about this a bit more, and do propose to use an
>> option that determines "logging step" i.e. number of items (either
>> directly or as a percentage) between log lines.
>>
>> The attached patch defines a new option "--logging-step" that accepts
>> either integers or percents. For example if you want to print a line
>> each 1000 lines, you can to this
>>
>>   $ pgbench -i -s 1000 --logging-step 1000 testdb
>
> I find it hard to get excited about having to specify a command line
> argument to tweak this.  Would it work to have it emit messages
> depending on elapsed time and log scale of tuples emitted?  So for
> example emit the first message after 5 seconds or 100k tuples, then back
> off until (say) 15 seconds have lapsed and 1M tuples, etc?  The idea is
> to make it verbose enough to keep a human satisfied with what he sees,
> but not flood the terminal with pointless updates.  (I think printing
> the ETA might be nice as well, not sure).

I like this idea.  One of the times when the more verbose output is
really useful is when you expect it to run fast but then it turns out
that for some reason it runs really slow.  If you make the output too
terse, then you end up not really knowing what's going on.  Having it
give an update at least every 5 seconds would be a nice way to give
the user a heads-up if things aren't going as planned, without
cluttering the normal case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: too much pgbench init output

From

Tomas Vondra

Date:

11 November 2012, 17:32:08

On 23.10.2012 18:21, Robert Haas wrote:
> On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Tomas Vondra wrote:
>>
>>> I've been thinking about this a bit more, and do propose to use an
>>> option that determines "logging step" i.e. number of items (either
>>> directly or as a percentage) between log lines.
>>>
>>> The attached patch defines a new option "--logging-step" that accepts
>>> either integers or percents. For example if you want to print a line
>>> each 1000 lines, you can to this
>>>
>>>   $ pgbench -i -s 1000 --logging-step 1000 testdb
>>
>> I find it hard to get excited about having to specify a command line
>> argument to tweak this.  Would it work to have it emit messages
>> depending on elapsed time and log scale of tuples emitted?  So for
>> example emit the first message after 5 seconds or 100k tuples, then back
>> off until (say) 15 seconds have lapsed and 1M tuples, etc?  The idea is
>> to make it verbose enough to keep a human satisfied with what he sees,
>> but not flood the terminal with pointless updates.  (I think printing
>> the ETA might be nice as well, not sure).
>
> I like this idea.  One of the times when the more verbose output is
> really useful is when you expect it to run fast but then it turns out
> that for some reason it runs really slow.  If you make the output too
> terse, then you end up not really knowing what's going on.  Having it
> give an update at least every 5 seconds would be a nice way to give
> the user a heads-up if things aren't going as planned, without
> cluttering the normal case.

I've prepared a patch along these lines. The attached version used only
elapsed time to print the log messages each 5 seconds, so now it prints
a meessage each 5 seconds no matter what, along with an estimate of
remaining time.

I've removed the config option, although it might be useful to specify
the interval?

I'm not entirely sure how the 'log scale of tuples' should work - for
example when the time 15 seconds limit is reached, should it be reset
back to the previous step (5 seconds) to give a more detailed info, or
should it be kept at 15 seconds?

Tomas

Attachment

pgbench-logging-v2.patch

Re: too much pgbench init output

From

Jeevan Chalke

Date:

19 November 2012, 11:00:17

Hi,<br /><br />I gone through the discussion for this patch and here is my review:<br /><br />The main aim of this
patchis to reduce the number of log lines. It is also suggested to use an options to provide the interval but few of us
arenot much agree on it. So final discussion ended at keeping 5 sec interval between each log line.<br /><br />However,
Isee, there are two types of users here:<br />1. Who likes these log lines, so that they can troubleshoot some slowness
andall<br />2. Who do not like these log lines.<br /><br />So keeping these in mind, I rather go for an option which
willcontrol this. People falling in category one can set this option to very low where as users falling under second
categorycan keep it high.<br /><br />However, assuming we settled on 5 sec delay, here are few comments on that patch
attached:<br/><br />Comments:<br />=========<br /><br />Patch gets applied cleanly with some whitespace errors. make
andmake install too went smooth.<br /> make check was smooth. Rather it should be smooth since there are NO changes in
otherpart of the code rather than just pgbench.c and we do not have any test-case as well.<br /><br />However, here are
fewcomments on changes in pgbench.c<br /><br />1.<br />Since the final discussion ended at keeping a 5 seconds interval
willbe good enough, Author used a global int variable for that.<br />Given that it's just a constant, #define would be
abetter choice.<br /><br /> 2.<br /><span style="font-family:courier new,monospace">+        /* let's not call the
timingfor each row, but only each 100 rows */</span><br />Why only 100 rows ? Have you done any testing to come up with
number100 ? To me it seems very low. It will be good to test with 1K or even 10K.<br /> On my machine (2.4 GHz Intel
core2 duo Macbook PRO, running Ubuntu in VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
checkingevery 100 rows looks overkill.<br /><br />3.<br />Please indent following block as per the indentation just
abovethat<br /><br /><span style="font-family:courier new,monospace">    /* used to track elapsed time and estimate of
theremaining time */<br />    instr_time    start, diff;<br />    double elapsed_sec, remaining_sec;<br />    int
log_interval= 1;<br /></span><br />4.<br /><span style="font-family:courier new,monospace">+            /* have ve
reachedthe next interval? */</span><br />Do you mean "have WE reached..."<br /><br />5.<br />While applying a patch, I
gotfew white-space errors. But I think every patch goes through pgindent which might take care of this.<br /><br
/>Thanks<br/><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On Sun, Nov 11, 2012 at 11:02 PM,
TomasVondra <span dir="ltr"><<a href="mailto:tv@fuzzy.cz" target="_blank">tv@fuzzy.cz</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
class="HOEnZb"><divclass="h5">On 23.10.2012 18:21, Robert Haas wrote:<br /> > On Tue, Oct 23, 2012 at 12:02 PM,
AlvaroHerrera<br /> > <<a href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>> wrote:<br />
>>Tomas Vondra wrote:<br /> >><br /> >>> I've been thinking about this a bit more, and do propose
touse an<br /> >>> option that determines "logging step" i.e. number of items (either<br /> >>>
directlyor as a percentage) between log lines.<br /> >>><br /> >>> The attached patch defines a new
option"--logging-step" that accepts<br /> >>> either integers or percents. For example if you want to print a
line<br/> >>> each 1000 lines, you can to this<br /> >>><br /> >>>   $ pgbench -i -s 1000
--logging-step1000 testdb<br /> >><br /> >> I find it hard to get excited about having to specify a command
line<br/> >> argument to tweak this.  Would it work to have it emit messages<br /> >> depending on elapsed
timeand log scale of tuples emitted?  So for<br /> >> example emit the first message after 5 seconds or 100k
tuples,then back<br /> >> off until (say) 15 seconds have lapsed and 1M tuples, etc?  The idea is<br /> >>
tomake it verbose enough to keep a human satisfied with what he sees,<br /> >> but not flood the terminal with
pointlessupdates.  (I think printing<br /> >> the ETA might be nice as well, not sure).<br /> ><br /> > I
likethis idea.  One of the times when the more verbose output is<br /> > really useful is when you expect it to run
fastbut then it turns out<br /> > that for some reason it runs really slow.  If you make the output too<br /> >
terse,then you end up not really knowing what's going on.  Having it<br /> > give an update at least every 5 seconds
wouldbe a nice way to give<br /> > the user a heads-up if things aren't going as planned, without<br /> >
clutteringthe normal case.<br /><br /></div></div>I've prepared a patch along these lines. The attached version used
only<br/> elapsed time to print the log messages each 5 seconds, so now it prints<br /> a meessage each 5 seconds no
matterwhat, along with an estimate of<br /> remaining time.<br /><br /> I've removed the config option, although it
mightbe useful to specify<br /> the interval?<br /><br /> I'm not entirely sure how the 'log scale of tuples' should
work- for<br /> example when the time 15 seconds limit is reached, should it be reset<br /> back to the previous step
(5seconds) to give a more detailed info, or<br /> should it be kept at 15 seconds?<br /><span class="HOEnZb"><font
color="#888888"><br/> Tomas<br /><br /></font></span><br /><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div><br /><br clear="all"
/><br/>-- <br />Jeevan B Chalke<br />Senior Software Engineer, R&D<br />EnterpriseDB Corporation<br />The
EnterprisePostgreSQL Company<br /><br />Phone: +91 20 30589500<br /><br />Website: <a
href="http://www.enterprisedb.com"target="_blank">www.enterprisedb.com</a><br /> EnterpriseDB Blog: <a
href="http://blogs.enterprisedb.com/"target="_blank">http://blogs.enterprisedb.com/</a><br />Follow us on Twitter: <a
href="http://www.twitter.com/enterprisedb"target="_blank">http://www.twitter.com/enterprisedb</a><br /><br />This
e-mailmessage (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This
messagecontains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from
disclosureunder applicable law. If you are not the intended recipient or authorized to receive this for the intended
recipient,any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly
prohibited.If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete
thismessage.<br /></div>

Re: too much pgbench init output

From

Tomas Vondra

Date:

19 November 2012, 18:38:22

On 19.11.2012 11:59, Jeevan Chalke wrote:
> Hi,
> 
> I gone through the discussion for this patch and here is my review:
> 
> The main aim of this patch is to reduce the number of log lines. It is
> also suggested to use an options to provide the interval but few of us
> are not much agree on it. So final discussion ended at keeping 5 sec
> interval between each log line.
> 
> However, I see, there are two types of users here:
> 1. Who likes these log lines, so that they can troubleshoot some
> slowness and all
> 2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

> So keeping these in mind, I rather go for an option which will control
> this. People falling in category one can set this option to very low
> where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

> However, assuming we settled on 5 sec delay, here are few comments on
> that patch attached:
> 
> Comments:
> =========
> 
> Patch gets applied cleanly with some whitespace errors. make and make
> install too went smooth.
> make check was smooth. Rather it should be smooth since there are NO
> changes in other part of the code rather than just pgbench.c and we do
> not have any test-case as well.
> 
> However, here are few comments on changes in pgbench.c
> 
> 1.
> Since the final discussion ended at keeping a 5 seconds interval will be
> good enough, Author used a global int variable for that.
> Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

> 2.
> +        /* let's not call the timing for each row, but only each 100
> rows */
> Why only 100 rows ? Have you done any testing to come up with number 100
> ? To me it seems very low. It will be good to test with 1K or even 10K.
> On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
> VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
> checking every 100 rows looks overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...

> 3.
> Please indent following block as per the indentation just above that
> 
>     /* used to track elapsed time and estimate of the remaining time */
>     instr_time    start, diff;
>     double elapsed_sec, remaining_sec;
>     int log_interval = 1;

OK

> 4.
> +            /* have ve reached the next interval? */
> Do you mean "have WE reached..."

OK

> 
> 5.
> While applying a patch, I got few white-space errors. But I think every
> patch goes through pgindent which might take care of this.

OK

> 
> Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Tomas

Re: too much pgbench init output

From

Jeevan Chalke

Date:

20 November 2012, 07:23:10

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:
> Hi,
>
> I gone through the discussion for this patch and here is my review:
>
> The main aim of this patch is to reduce the number of log lines. It is
> also suggested to use an options to provide the interval but few of us
> are not much agree on it. So final discussion ended at keeping 5 sec
> interval between each log line.
>
> However, I see, there are two types of users here:
> 1. Who likes these log lines, so that they can troubleshoot some
> slowness and all
> 2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed (or lot more) output.

> So keeping these in mind, I rather go for an option which will control
> this. People falling in category one can set this option to very low
> where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

Interval length.

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1 second to 10 seconds.

BTW, what if, we put one log message every 10% (or 5%) with time taken (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness happens, one can easily find a block by looking at the timings and troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it constant will eliminate the requirement of an option.

> However, assuming we settled on 5 sec delay, here are few comments on
> that patch attached:
>
> Comments:
> =========
>
> Patch gets applied cleanly with some whitespace errors. make and make
> install too went smooth.
> make check was smooth. Rather it should be smooth since there are NO
> changes in other part of the code rather than just pgbench.c and we do
> not have any test-case as well.
>
> However, here are few comments on changes in pgbench.c
>
> 1.
> Since the final discussion ended at keeping a 5 seconds interval will be
> good enough, Author used a global int variable for that.
> Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

Exactly, if we ended up with an option then it looks good. But in your current patch it was constant, so #define should be preferred.

> 2.
> + /* let's not call the timing for each row, but only each 100
> rows */
> Why only 100 rows ? Have you done any testing to come up with number 100
> ? To me it seems very low. It will be good to test with 1K or even 10K.
> On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
> VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
> checking every 100 rows looks overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...

Hmm... inserting just 100 rows / seconds is really slow.
Since you already seen such behaviour, I have no objection keeping it 100.

> 3.
> Please indent following block as per the indentation just above that
>
> /* used to track elapsed time and estimate of the remaining time */
> instr_time start, diff;
> double elapsed_sec, remaining_sec;
> int log_interval = 1;

OK

> 4.
> + /* have ve reached the next interval? */
> Do you mean "have WE reached..."

OK

>
> 5.
> While applying a patch, I got few white-space errors. But I think every
> patch goes through pgindent which might take care of this.

OK

>
> Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Sure

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thanks
--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.

Re: too much pgbench init output

From

Tomas Vondra

Date:

09 December 2012, 02:41:03

On 20.11.2012 08:22, Jeevan Chalke wrote:
> Hi,
>
>
> On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
>     On 19.11.2012 11:59, Jeevan Chalke wrote:
>     > Hi,
>     >
>     > I gone through the discussion for this patch and here is my review:
>     >
>     > The main aim of this patch is to reduce the number of log lines. It is
>     > also suggested to use an options to provide the interval but few of us
>     > are not much agree on it. So final discussion ended at keeping 5 sec
>     > interval between each log line.
>     >
>     > However, I see, there are two types of users here:
>     > 1. Who likes these log lines, so that they can troubleshoot some
>     > slowness and all
>     > 2. Who do not like these log lines.
>
>     Who likes these lines / needs them for something useful?
>
>
> No idea. I fall in second category.
>
> But from the discussion, I believe some people may need detailed (or lot
> more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

  (1) it's rather pointless to print a message every 100k rows, as it
      usually fills the console with garbabe

  (2) it's handy to have regular updates of the progress

I don't think there're people (in the thread) that require to keep the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

>     > So keeping these in mind, I rather go for an option which will control
>     > this. People falling in category one can set this option to very low
>     > where as users falling under second category can keep it high.
>
>     So what option(s) would you expect? Something that tunes the interval
>     length or something else?
>
>
> Interval length.

Well, I can surely imagine something like "--interval N".

>     A switch that'd choose between the old and new behavior might be a good
>     idea, but I'd strongly vote against "automagic" heuristics. It makes the
>     behavior very difficult to predict and I really don't want to force the
>     users to wonder whether the long delay is due to general slowness of the
>     machine or some "clever" logic that causes long delays between log
>     messages.
>
>     That's why I choose a very simple approach with constant time interval.
>     It does what I was aiming for (less logs) and it's easy to predict.
>     Sure, we could choose different interval (or make it an option).
>
>
> I am preferring an option for choosing an interval, say from 1 second to
> 10 seconds.

Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?

> BTW, what if, we put one log message every 10% (or 5%) with time taken
> (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
> This will have only 10 (or 20) lines per pgbench initialisation.
> And since we are showing time taken for each block, if any slowness
> happens, one can easily find a block by looking at the timings and
> troubleshoot it.
> Though 10% or 5% is again a debatable number, but keeping it constant
> will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not relly excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

Tomas

Attachment

pgbench-logging-v3.patch

Re: too much pgbench init output

From

Jeevan Chalke

Date:

11 December 2012, 09:24:02

On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 20.11.2012 08:22, Jeevan Chalke wrote:
> Hi,
>
>
> On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
> On 19.11.2012 11:59, Jeevan Chalke wrote:
> > Hi,
> >
> > I gone through the discussion for this patch and here is my review:
> >
> > The main aim of this patch is to reduce the number of log lines. It is
> > also suggested to use an options to provide the interval but few of us
> > are not much agree on it. So final discussion ended at keeping 5 sec
> > interval between each log line.
> >
> > However, I see, there are two types of users here:
> > 1. Who likes these log lines, so that they can troubleshoot some
> > slowness and all
> > 2. Who do not like these log lines.
>
> Who likes these lines / needs them for something useful?
>
>
> No idea. I fall in second category.
>
> But from the discussion, I believe some people may need detailed (or lot
> more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

(1) it's rather pointless to print a message every 100k rows, as it
usually fills the console with garbabe

(2) it's handy to have regular updates of the progress

I don't think there're people (in the thread) that require to keep the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

Both the options are fine with me, but the default should be the current behaviour.

> > So keeping these in mind, I rather go for an option which will control
> > this. People falling in category one can set this option to very low
> > where as users falling under second category can keep it high.
>
> So what option(s) would you expect? Something that tunes the interval
> length or something else?
>
>
> Interval length.

Well, I can surely imagine something like "--interval N".

> A switch that'd choose between the old and new behavior might be a good
> idea, but I'd strongly vote against "automagic" heuristics. It makes the
> behavior very difficult to predict and I really don't want to force the
> users to wonder whether the long delay is due to general slowness of the
> machine or some "clever" logic that causes long delays between log
> messages.
>
> That's why I choose a very simple approach with constant time interval.
> It does what I was aiming for (less logs) and it's easy to predict.
> Sure, we could choose different interval (or make it an option).
>
>
> I am preferring an option for choosing an interval, say from 1 second to
> 10 seconds.

Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?

Hmm.. actually, I have no issues with any number there. Just put 1..10 as we hard-coded it 5. No particular reason as such.

> BTW, what if, we put one log message every 10% (or 5%) with time taken
> (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
> This will have only 10 (or 20) lines per pgbench initialisation.
> And since we are showing time taken for each block, if any slowness
> happens, one can easily find a block by looking at the timings and
> troubleshoot it.
> Though 10% or 5% is again a debatable number, but keeping it constant
> will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not relly excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

OK. Looks good now.

Any other views / suggestions are welcome.

Thanks

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.

Re: too much pgbench init output

From

Tomas Vondra

Date:

17 December 2012, 00:07:46

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
    not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
    interval

I'm still not convinced there should be yet another know for tuning the
log interval - opinions?


Tomas

On 11.12.2012 10:23, Jeevan Chalke wrote:
>
>
>
> On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
>     On 20.11.2012 08:22, Jeevan Chalke wrote:
>     > Hi,
>     >
>     >
>     > On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
>     <mailto:tv@fuzzy.cz>
>     > <mailto:tv@fuzzy.cz <mailto:tv@fuzzy.cz>>> wrote:
>     >
>     >     On 19.11.2012 11:59, Jeevan Chalke wrote:
>     >     > Hi,
>     >     >
>     >     > I gone through the discussion for this patch and here is my
>     review:
>     >     >
>     >     > The main aim of this patch is to reduce the number of log
>     lines. It is
>     >     > also suggested to use an options to provide the interval but
>     few of us
>     >     > are not much agree on it. So final discussion ended at
>     keeping 5 sec
>     >     > interval between each log line.
>     >     >
>     >     > However, I see, there are two types of users here:
>     >     > 1. Who likes these log lines, so that they can troubleshoot some
>     >     > slowness and all
>     >     > 2. Who do not like these log lines.
>     >
>     >     Who likes these lines / needs them for something useful?
>     >
>     >
>     > No idea. I fall in second category.
>     >
>     > But from the discussion, I believe some people may need detailed
>     (or lot
>     > more) output.
>
>     I've read the thread again and my impression is that no one really needs
>     or likes those lines, but
>
>       (1) it's rather pointless to print a message every 100k rows, as it
>           usually fills the console with garbabe
>
>       (2) it's handy to have regular updates of the progress
>
>     I don't think there're people (in the thread) that require to keep the
>     current amount of log messages.
>
>     But there might be users who actually use the current logs to do
>     something (although I can't imagine what). If we want to do this in a
>     backwards compatible way, we should probably use a new option (e.g.
>     "-q") to enable the new (less verbose) logging.
>
>     Do we want to allow both types of logging, or shall we keep only the new
>     one? If both, which one should be the default one?
>
>
> Both the options are fine with me, but the default should be the current
> behaviour.
>
>
>     >     > So keeping these in mind, I rather go for an option which
>     will control
>     >     > this. People falling in category one can set this option to
>     very low
>     >     > where as users falling under second category can keep it high.
>     >
>     >     So what option(s) would you expect? Something that tunes the
>     interval
>     >     length or something else?
>     >
>     >
>     > Interval length.
>
>     Well, I can surely imagine something like "--interval N".
>
>
> +1
>
>
>
>     >     A switch that'd choose between the old and new behavior might
>     be a good
>     >     idea, but I'd strongly vote against "automagic" heuristics. It
>     makes the
>     >     behavior very difficult to predict and I really don't want to
>     force the
>     >     users to wonder whether the long delay is due to general
>     slowness of the
>     >     machine or some "clever" logic that causes long delays between log
>     >     messages.
>     >
>     >     That's why I choose a very simple approach with constant time
>     interval.
>     >     It does what I was aiming for (less logs) and it's easy to
>     predict.
>     >     Sure, we could choose different interval (or make it an option).
>     >
>     >
>     > I am preferring an option for choosing an interval, say from 1
>     second to
>     > 10 seconds.
>
>     Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?
>
>
> Hmm.. actually, I have no issues with any number there. Just put 1..10
> as we hard-coded it 5. No particular reason as such.
>
>
>
>     > BTW, what if, we put one log message every 10% (or 5%) with time taken
>     > (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
>     > This will have only 10 (or 20) lines per pgbench initialisation.
>     > And since we are showing time taken for each block, if any slowness
>     > happens, one can easily find a block by looking at the timings and
>     > troubleshoot it.
>     > Though 10% or 5% is again a debatable number, but keeping it constant
>     > will eliminate the requirement of an option.
>
>     That's what I originally proposed in September (see the messages from
>     17/9), and Alvaro was not relly excited about this.
>
>     Attached is a patch with fixed whitespace / indentation errors etc.
>     Otherwise it's the same as the previous version.
>
>
> OK. Looks good now.
>
> Any other views / suggestions are welcome.
>
> Thanks
>
>
>     Tomas
>
>
>     --
>     Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
>     <mailto:pgsql-hackers@postgresql.org>)
>     To make changes to your subscription:
>     http://www.postgresql.org/mailpref/pgsql-hackers
>
>
>
>
> --
> Jeevan B Chalke
> Senior Software Engineer, R&D
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Phone: +91 20 30589500
>
> Website: www.enterprisedb.com <http://www.enterprisedb.com>
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
>
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are
> not the intended recipient or authorized to receive this for the
> intended recipient, any use, dissemination, distribution, retention,
> archiving, or copying of this communication is strictly prohibited. If
> you have received this e-mail in error, please notify the sender
> immediately by reply e-mail and delete this message.

Attachment

pgbench-logging-v4.patch

Re: too much pgbench init output

From

Jeevan Chalke

Date:

19 December 2012, 05:30:42

On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another know for tuning the
log interval - opinions?

It seems that you have generated a patch over your earlier version and due to that it is not cleanly applying on fresh sources.

Please generate patch on fresh sources.

However, I absolutely no issues with the design. Also code review is already done and looks good to me.

I think to move forward on this we need someone from core-team. So I am planning to change its status to "ready-for-committor".

Before that please provide updated patch for final code review.

Thanks

Tomas

On 11.12.2012 10:23, Jeevan Chalke wrote:
>
>
>
> On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
> On 20.11.2012 08:22, Jeevan Chalke wrote:
> > Hi,
> >
> >
> > On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>
> > <mailto:tv@fuzzy.cz <mailto:tv@fuzzy.cz>>> wrote:
> >
> > On 19.11.2012 11:59, Jeevan Chalke wrote:
> > > Hi,
> > >
> > > I gone through the discussion for this patch and here is my
> review:
> > >
> > > The main aim of this patch is to reduce the number of log
> lines. It is
> > > also suggested to use an options to provide the interval but
> few of us
> > > are not much agree on it. So final discussion ended at
> keeping 5 sec
> > > interval between each log line.
> > >
> > > However, I see, there are two types of users here:
> > > 1. Who likes these log lines, so that they can troubleshoot some
> > > slowness and all
> > > 2. Who do not like these log lines.
> >
> > Who likes these lines / needs them for something useful?
> >
> >
> > No idea. I fall in second category.
> >
> > But from the discussion, I believe some people may need detailed
> (or lot
> > more) output.
>
> I've read the thread again and my impression is that no one really needs
> or likes those lines, but
>
> (1) it's rather pointless to print a message every 100k rows, as it
> usually fills the console with garbabe
>
> (2) it's handy to have regular updates of the progress
>
> I don't think there're people (in the thread) that require to keep the
> current amount of log messages.
>
> But there might be users who actually use the current logs to do
> something (although I can't imagine what). If we want to do this in a
> backwards compatible way, we should probably use a new option (e.g.
> "-q") to enable the new (less verbose) logging.
>
> Do we want to allow both types of logging, or shall we keep only the new
> one? If both, which one should be the default one?
>
>
> Both the options are fine with me, but the default should be the current
> behaviour.
>
>
> > > So keeping these in mind, I rather go for an option which
> will control
> > > this. People falling in category one can set this option to
> very low
> > > where as users falling under second category can keep it high.
> >
> > So what option(s) would you expect? Something that tunes the
> interval
> > length or something else?
> >
> >
> > Interval length.
>
> Well, I can surely imagine something like "--interval N".
>
>
> +1
>
>
>
> > A switch that'd choose between the old and new behavior might
> be a good
> > idea, but I'd strongly vote against "automagic" heuristics. It
> makes the
> > behavior very difficult to predict and I really don't want to
> force the
> > users to wonder whether the long delay is due to general
> slowness of the
> > machine or some "clever" logic that causes long delays between log
> > messages.
> >
> > That's why I choose a very simple approach with constant time
> interval.
> > It does what I was aiming for (less logs) and it's easy to
> predict.
> > Sure, we could choose different interval (or make it an option).
> >
> >
> > I am preferring an option for choosing an interval, say from 1
> second to
> > 10 seconds.
>
> Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?
>
>
> Hmm.. actually, I have no issues with any number there. Just put 1..10
> as we hard-coded it 5. No particular reason as such.
>
>
>
> > BTW, what if, we put one log message every 10% (or 5%) with time taken
> > (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
> > This will have only 10 (or 20) lines per pgbench initialisation.
> > And since we are showing time taken for each block, if any slowness
> > happens, one can easily find a block by looking at the timings and
> > troubleshoot it.
> > Though 10% or 5% is again a debatable number, but keeping it constant
> > will eliminate the requirement of an option.
>
> That's what I originally proposed in September (see the messages from
> 17/9), and Alvaro was not relly excited about this.
>
> Attached is a patch with fixed whitespace / indentation errors etc.
> Otherwise it's the same as the previous version.
>
>
> OK. Looks good now.
>
> Any other views / suggestions are welcome.
>
> Thanks
>
>
> Tomas
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
> <mailto:pgsql-hackers@postgresql.org>)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
>
>
> --
> Jeevan B Chalke
> Senior Software Engineer, R&D
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Phone: +91 20 30589500
>
> Website: www.enterprisedb.com <http://www.enterprisedb.com>
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
>
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are
> not the intended recipient or authorized to receive this for the
> intended recipient, any use, dissemination, distribution, retention,
> archiving, or copying of this communication is strictly prohibited. If
> you have received this e-mail in error, please notify the sender
> immediately by reply e-mail and delete this message.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: too much pgbench init output

From

Tomas Vondra

Date:

19 December 2012, 21:01:33

On 19.12.2012 06:30, Jeevan Chalke wrote:
>
>
>
> On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
>     Hi,
>
>     attached is a new version of the patch that
>
>     (a) converts the 'log_step_seconds' variable to a constant (and does
>         not allow changing it using a command-line option etc.)
>
>     (b) keeps the current logging as a default
>
>     (c) adds a "-q" switch that enables the new logging with a 5-second
>         interval
>
>     I'm still not convinced there should be yet another know for tuning the
>     log interval - opinions?
>
>
> It seems that you have generated a patch over your earlier version and
> due to that it is not cleanly applying on fresh sources.
> Please generate patch on fresh sources.

Seems you're right - I've attached the proper patch against current master.

> However, I absolutely no issues with the design. Also code review is
> already done and looks good to me.
> I think to move forward on this we need someone from core-team. So I am
> planning to change its status to "ready-for-committor".
>
> Before that please provide updated patch for final code review.

thanks
Tomas

Attachment

pgbench-logging-v5.patch

Re: too much pgbench init output

From

Jeevan Chalke

Date:

26 December 2012, 14:55:23

Looks good to me.

Will mark "Ready for Committer"

Thanks

On Thu, Dec 20, 2012 at 2:30 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 19.12.2012 06:30, Jeevan Chalke wrote:
>
>
>
> On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz
> <mailto:tv@fuzzy.cz>> wrote:
>
> Hi,
>
> attached is a new version of the patch that
>
> (a) converts the 'log_step_seconds' variable to a constant (and does
> not allow changing it using a command-line option etc.)
>
> (b) keeps the current logging as a default
>
> (c) adds a "-q" switch that enables the new logging with a 5-second
> interval
>
> I'm still not convinced there should be yet another know for tuning the
> log interval - opinions?
>
>
> It seems that you have generated a patch over your earlier version and
> due to that it is not cleanly applying on fresh sources.
> Please generate patch on fresh sources.

Seems you're right - I've attached the proper patch against current master.

> However, I absolutely no issues with the design. Also code review is
> already done and looks good to me.
> I think to move forward on this we need someone from core-team. So I am
> planning to change its status to "ready-for-committor".
>
> Before that please provide updated patch for final code review.

thanks
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: too much pgbench init output

From

Tatsuo Ishii

Date:

06 January 2013, 02:03:42

> On 19.12.2012 06:30, Jeevan Chalke wrote:
>> 
>> 
>> 
>> On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz
>> <mailto:tv@fuzzy.cz>> wrote:
>> 
>>     Hi,
>> 
>>     attached is a new version of the patch that
>> 
>>     (a) converts the 'log_step_seconds' variable to a constant (and does
>>         not allow changing it using a command-line option etc.)
>> 
>>     (b) keeps the current logging as a default
>> 
>>     (c) adds a "-q" switch that enables the new logging with a 5-second
>>         interval
>> 
>>     I'm still not convinced there should be yet another know for tuning the
>>     log interval - opinions?
>> 
>> 
>> It seems that you have generated a patch over your earlier version and
>> due to that it is not cleanly applying on fresh sources.
>> Please generate patch on fresh sources.
> 
> Seems you're right - I've attached the proper patch against current master.
> 
>> However, I absolutely no issues with the design. Also code review is
>> already done and looks good to me.
>> I think to move forward on this we need someone from core-team. So I am
>> planning to change its status to "ready-for-committor".
>> 
>> Before that please provide updated patch for final code review.

As a committer, I have looked into the patch. I noticed two things:

1) In the help you put '-q' option into "Common options" section. I
think this should be moved to "Initialization options" section because
the option is only applied while initializing.

2) Shouldn't a long option for '-q' be provided? Something like
'--quiet-progress-logging'?

3) No patches for docs found (doc/src/sgml/pgbench.sgml)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: too much pgbench init output

From

Tomas Vondra

Date:

06 January 2013, 03:13:17

On 6.1.2013 03:03, Tatsuo Ishii wrote:
> As a committer, I have looked into the patch. I noticed two things:
>
> 1) In the help you put '-q' option into "Common options" section. I
> think this should be moved to "Initialization options" section because
> the option is only applied while initializing.

Good point, moved.

> 2) Shouldn't a long option for '-q' be provided? Something like
> '--quiet-progress-logging'?

I don't think so. Currently pgbench has either short or long option, not
both (for the same thing). I believe we should stick to this and either
choose "-q" or "--quiet-logging" but not both.

> 3) No patches for docs found (doc/src/sgml/pgbench.sgml)

I've added a brief description of the "-q" option into the docs. IMHO
it's enough but feel free to add some more details.

There's one more thing I've just noticed - the original version of the
patch simply removed the old logging, but this one keeps both old and
quiet logging. But the old logging still uses this:

    fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....

while the new logging does this

    fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s,
remaining %.2f s).\n",

i.e. it prints additional info about elapsed/estimated time. Do we want
to keep it this way (i.e. not to mess with the old logging) or do we
want to add these new fields to the old logging too?

I suggest to add it to the old logging, to keep the log messages the
same, the only difference being the logging frequency.

Tomas

Attachment

pgbench-logging-v6.patch

Re: too much pgbench init output

From

Tatsuo Ishii

Date:

06 January 2013, 04:08:01

> On 6.1.2013 03:03, Tatsuo Ishii wrote:
>> As a committer, I have looked into the patch. I noticed two things:
>> 
>> 1) In the help you put '-q' option into "Common options" section. I
>> think this should be moved to "Initialization options" section because
>> the option is only applied while initializing.
> 
> Good point, moved.

In addition to this, I'd suggest to add checking -q is only possible
with -i option since without -i, -q is meaningless.

>> 2) Shouldn't a long option for '-q' be provided? Something like
>> '--quiet-progress-logging'?
> 
> I don't think so. Currently pgbench has either short or long option, not
> both (for the same thing). I believe we should stick to this and either
> choose "-q" or "--quiet-logging" but not both.

Ok.

>> 3) No patches for docs found (doc/src/sgml/pgbench.sgml)
> 
> I've added a brief description of the "-q" option into the docs. IMHO
> it's enough but feel free to add some more details.

Good.

> There's one more thing I've just noticed - the original version of the
> patch simply removed the old logging, but this one keeps both old and
> quiet logging. But the old logging still uses this:
> 
>     fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....
> 
> while the new logging does this
> 
>     fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s,
> remaining %.2f s).\n",
> 
> i.e. it prints additional info about elapsed/estimated time. Do we want
> to keep it this way (i.e. not to mess with the old logging) or do we
> want to add these new fields to the old logging too?
> 
> I suggest to add it to the old logging, to keep the log messages the
> same, the only difference being the logging frequency.

If we do so, probably '-q' is not appropeate option name any more,
since the only difference between old logging and new one is, the
former is printed every 10000 lines while the latter is every 5
seconds, which is not really "quiet". What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: too much pgbench init output

From

Tomas Vondra

Date:

06 January 2013, 05:21:27

On 6.1.2013 05:07, Tatsuo Ishii wrote:
>> On 6.1.2013 03:03, Tatsuo Ishii wrote:
>>> As a committer, I have looked into the patch. I noticed two things:
>>>
>>> 1) In the help you put '-q' option into "Common options" section. I
>>> think this should be moved to "Initialization options" section because
>>> the option is only applied while initializing.
>>
>> Good point, moved.
>
> In addition to this, I'd suggest to add checking -q is only possible
> with -i option since without -i, -q is meaningless.

Done.

>> There's one more thing I've just noticed - the original version of the
>> patch simply removed the old logging, but this one keeps both old and
>> quiet logging. But the old logging still uses this:
>>
>>     fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....
>>
>> while the new logging does this
>>
>>     fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s,
>> remaining %.2f s).\n",
>>
>> i.e. it prints additional info about elapsed/estimated time. Do we want
>> to keep it this way (i.e. not to mess with the old logging) or do we
>> want to add these new fields to the old logging too?
>>
>> I suggest to add it to the old logging, to keep the log messages the
>> same, the only difference being the logging frequency.
>
> If we do so, probably '-q' is not appropeate option name any more,
> since the only difference between old logging and new one is, the
> former is printed every 10000 lines while the latter is every 5
> seconds, which is not really "quiet". What do you think?

AFAIK the "5 second" logging is much quieter in most cases (and a bit
more verbose when the initialization gets very slower), so I think the
"quiet" logging is not a bad match although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original loggin)
would be consistent with this name, because considering a single line,
the "-q" is more verbose right now.

So I'd stick with the "-q" option and added the fields to the original
logging. But I'm not opposing a different name, I just can't think of a
better one.

Tomas

Attachment

pgbench-logging-v7.patch

Re: too much pgbench init output

From

Tatsuo Ishii

Date:

06 January 2013, 09:35:57

>> If we do so, probably '-q' is not appropeate option name any more,
>> since the only difference between old logging and new one is, the
>> former is printed every 10000 lines while the latter is every 5
>> seconds, which is not really "quiet". What do you think?
> 
> AFAIK the "5 second" logging is much quieter in most cases (and a bit
> more verbose when the initialization gets very slower), so I think the
> "quiet" logging is not a bad match although maybe there's a better name.
> 
> This change (adding the elapsed/remaining fields to the original loggin)
> would be consistent with this name, because considering a single line,
> the "-q" is more verbose right now.
> 
> So I'd stick with the "-q" option and added the fields to the original
> logging. But I'm not opposing a different name, I just can't think of a
> better one.

Ok, I'm with you ("-q" and along with adding the elapsed/remaining
fields to the original logging).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: too much pgbench init output

From

Tomas Vondra

Date:

06 January 2013, 12:18:36

On 6.1.2013 10:35, Tatsuo Ishii wrote:
>>> If we do so, probably '-q' is not appropeate option name any more,
>>> since the only difference between old logging and new one is, the
>>> former is printed every 10000 lines while the latter is every 5
>>> seconds, which is not really "quiet". What do you think?
>>
>> AFAIK the "5 second" logging is much quieter in most cases (and a bit
>> more verbose when the initialization gets very slower), so I think the
>> "quiet" logging is not a bad match although maybe there's a better name.
>>
>> This change (adding the elapsed/remaining fields to the original loggin)
>> would be consistent with this name, because considering a single line,
>> the "-q" is more verbose right now.
>>
>> So I'd stick with the "-q" option and added the fields to the original
>> logging. But I'm not opposing a different name, I just can't think of a
>> better one.
>
> Ok, I'm with you ("-q" and along with adding the elapsed/remaining
> fields to the original logging).

Great, attached is a patch that does that.

Tomas

Attachment

pgbench-logging-v8.patch

Re: too much pgbench init output

From

Tatsuo Ishii

Date:

07 January 2013, 02:29:52

>>> AFAIK the "5 second" logging is much quieter in most cases (and a bit
>>> more verbose when the initialization gets very slower), so I think the
>>> "quiet" logging is not a bad match although maybe there's a better name.
>>>
>>> This change (adding the elapsed/remaining fields to the original loggin)
>>> would be consistent with this name, because considering a single line,
>>> the "-q" is more verbose right now.
>>>
>>> So I'd stick with the "-q" option and added the fields to the original
>>> logging. But I'm not opposing a different name, I just can't think of a
>>> better one.
>> 
>> Ok, I'm with you ("-q" and along with adding the elapsed/remaining
>> fields to the original logging).
> 
> Great, attached is a patch that does that.

Committed. Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp