Re: too much pgbench init output - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: too much pgbench init output
Date
Msg-id CAM2+6=VFYPdM3EB4v0_yV-hBOKw4wrrEUXFK=xTE7mdHpxNJmw@mail.gmail.com
Whole thread Raw
In response to Re: too much pgbench init output  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: too much pgbench init output  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
Hi,


On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 19.11.2012 11:59, Jeevan Chalke wrote:
> Hi,
>
> I gone through the discussion for this patch and here is my review:
>
> The main aim of this patch is to reduce the number of log lines. It is
> also suggested to use an options to provide the interval but few of us
> are not much agree on it. So final discussion ended at keeping 5 sec
> interval between each log line.
>
> However, I see, there are two types of users here:
> 1. Who likes these log lines, so that they can troubleshoot some
> slowness and all
> 2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed (or lot more) output.
 

> So keeping these in mind, I rather go for an option which will control
> this. People falling in category one can set this option to very low
> where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

Interval length.
 

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1 second to 10 seconds.

BTW, what if, we put one log message every 10% (or 5%) with time taken (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness happens, one can easily find a block by looking at the timings and troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it constant will eliminate the requirement of an option.

 

> However, assuming we settled on 5 sec delay, here are few comments on
> that patch attached:
>
> Comments:
> =========
>
> Patch gets applied cleanly with some whitespace errors. make and make
> install too went smooth.
> make check was smooth. Rather it should be smooth since there are NO
> changes in other part of the code rather than just pgbench.c and we do
> not have any test-case as well.
>
> However, here are few comments on changes in pgbench.c
>
> 1.
> Since the final discussion ended at keeping a 5 seconds interval will be
> good enough, Author used a global int variable for that.
> Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

Exactly, if we ended up with an option then it looks good. But in your current patch it was constant, so #define should be preferred.
 

> 2.
> +        /* let's not call the timing for each row, but only each 100
> rows */
> Why only 100 rows ? Have you done any testing to come up with number 100
> ? To me it seems very low. It will be good to test with 1K or even 10K.
> On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
> VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
> checking every 100 rows looks overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...

Hmm... inserting just 100 rows / seconds is really slow.
Since you already seen such behaviour, I have no objection keeping it 100.
 

> 3.
> Please indent following block as per the indentation just above that
>
>     /* used to track elapsed time and estimate of the remaining time */
>     instr_time    start, diff;
>     double elapsed_sec, remaining_sec;
>     int log_interval = 1;

OK

> 4.
> +            /* have ve reached the next interval? */
> Do you mean "have WE reached..."

OK

>
> 5.
> While applying a patch, I got few white-space errors. But I think every
> patch goes through pgindent which might take care of this.

OK

>
> Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Sure
 

Tomas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Thanks
--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: removing some dead "keep compiler quiet" code
Next
From: Pavel Stehule
Date:
Subject: cannot disable hashSetOp