Thread: pgbench small bug fix

pgbench small bug fix

From
Fabien COELHO
Date:
While testing for something else I encountered two small bugs under very 
low rate (--rate=0.1). The attached patches fixes these.
 - when a duration (-T) is specified, ensure that pgbench ends at that   time (i.e. do not wait for a transaction
beyondthe end of the run).
 
 - when there is a progress (-P) report, ensure that all progress   reports are shown even if no more transactions are
schedule.

-- 
Fabien.

Re: pgbench small bug fix

From
Robert Haas
Date:
On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> While testing for something else I encountered two small bugs under very low
> rate (--rate=0.1). The attached patches fixes these.
>
>  - when a duration (-T) is specified, ensure that pgbench ends at that
>    time (i.e. do not wait for a transaction beyond the end of the run).

Why does this use INSTR_TIME_GET_DOUBLE() and not INSTR_TIME_GET_MICROSEC()?

Also, why do we really need this change?  Won't the timer expiration
stop the thread at the right time anyway?  I mean, sure, in theory
it's wasteful for the thread to sit around doing nothing waiting for
the timer to expire, but it's not evident to me that hurts anything,
really.

>  - when there is a progress (-P) report, ensure that all progress
>    reports are shown even if no more transactions are schedule.

That's pretty ugly - it would be easy for the test at the top of the
loop to be left out of sync with the similar test inside the loop by
some future patch.  And again, I wonder why this is really a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench small bug fix

From
Fabien COELHO
Date:
<Ooops, resent, wrong "From" again... sorry :-( >

Hello Robert,

> > While testing for something else I encountered two small bugs under very low
> > rate (--rate=0.1). The attached patches fixes these.
> > 
> >  - when a duration (-T) is specified, ensure that pgbench ends at that
> >    time (i.e. do not wait for a transaction beyond the end of the run).
> 
> Why does this use INSTR_TIME_GET_DOUBLE() and not INSTR_TIME_GET_MICROSEC()?

I choose that because duration is in seconds, but MICROSEC would work fine as
well. See attached version.

> Also, why do we really need this change?  Won't the timer expiration
> stop the thread at the right time anyway?

Not always. When several threads are used, the time expiration in non-zero
threads is currently triggered after the last schedule transaction, even if this
transaction is long after the expiration, which means it runs overtime:
 sh> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2 starting vacuum...end. progress: 1.0 s, 0.0 tps, lat -nan ms stddev -nan,
lag-nan ms progress: 2.0 s, 1.0 tps, lat 12.129 ms stddev 0.000, lag 1.335 ms progress: 3.0 s, 0.0 tps, lat -nan ms
stddev-nan, lag -nan ms progress: 4.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms progress: 5.0 s, 0.0 tps, lat
-nanms stddev -nan, lag -nan ms < 6 seconds wait for a schedule transaction in thread 1   no report because thread 0
expired...>transaction type: TPC-B (sort of) scaling factor: 1 query mode: simple number of clients: 2 number of
threads:2 duration: 5 s number of transactions actually processed: 2 latency average: 14.648 ms latency stddev: 2.518
msrate limit schedule lag: avg 5.598 (max 9.861) ms tps = 0.180517 (including connections establishing) tps = 0.180566
(excludingconnections establishing)
 
 real    0m11.158s           ^^ 11 seconds, not 5. user    0m0.041s sys     0m0.013s

> I mean, sure, in theory it's wasteful for the thread to sit around doing
> nothing waiting for the timer to expire, but it's not evident to me that hurts
> anything, really.

I compute stats on the output, including the progress report, to check for
responsiveness (eg it is not stuck somewhere because of a checkpoint which
triggered some IO storm or whatever), for that purpose it is better for the
trace to be there as expected.

> >  - when there is a progress (-P) report, ensure that all progress
> >    reports are shown even if no more transactions are schedule.
> 
> That's pretty ugly - it would be easy for the test at the top of the
> loop to be left out of sync with the similar test inside the loop by
> some future patch.

Hmmm. Cannot help it.

A cleaner fix would be to have the main thread do only the progress and periodic
stats aggregation, while other threads would do actual transactions, however
that would break pgbench working without threads, so I think this is a no go.

> And again, I wonder why this is really a bug.

Well, if you are fine to set "-T 5 -P 1" and the run not to last 5 seconds nor
print any report every second, then it is not a bug:
 sh> time ./pgbench -T 5 -P 1 -R 0.1 -c 2 -j 2 starting vacuum...end. < UNLUCKY, NO PROGRESS REPORT AT ALL...>
transactiontype: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 2 number of
threads:2 duration: 5 s number of transactions actually processed: 1 latency average = 16.198 ms latency stddev = 0.000
msrate limit schedule lag: avg 4.308 (max 4.308) ms tps = 0.236361 (including connections establishing) tps = 0.236771
(excludingconnections establishing)
 
 real    0m4.256s

-- 
Fabien.

Re: pgbench small bug fix

From
Aleksander Alekseev
Date:
> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2

On my laptop this command executes 25 seconds instead of 5. I'm pretty
sure it IS a bug. Probably a minor one though.

I tested this patch on Ubuntu 14.04 LTS with GCC 4.8. It applies
cleanly on master branch (c7111d11) and solves a described problem.
No compilation warnings. Everything else works as before.

Still I wonder if code could be patched more cleanly. Instead of:

if(someint)
if(somebool)

... you should probably write:

if(someint > 0)
if(somebool == TRUE)

Also I suggest to introduce a few new boolean variables with meaningful
names instead of writing all these long expressions right inside of
if( ... ). 

As a side note I noticed that pgbench.c is not pgindent'ed. Since you
are modifying this file anyway probably you cold solve this issue too?
As a separate patch perhaps.



Re: pgbench small bug fix

From
Robert Haas
Date:
On Thu, Mar 3, 2016 at 7:23 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
>> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2
>
> On my laptop this command executes 25 seconds instead of 5. I'm pretty
> sure it IS a bug. Probably a minor one though.
>
> I tested this patch on Ubuntu 14.04 LTS with GCC 4.8. It applies
> cleanly on master branch (c7111d11) and solves a described problem.
> No compilation warnings. Everything else works as before.
>
> Still I wonder if code could be patched more cleanly. Instead of:
>
> if(someint)
> if(somebool)
>
> ... you should probably write:
>
> if(someint > 0)
> if(somebool == TRUE)

I think our usual style is to test Booleans directly; that is if
(somebool).  But for other types, we typically include an explicit
comparison, like if (someint != 0) or if (someint > 0).

> As a side note I noticed that pgbench.c is not pgindent'ed. Since you
> are modifying this file anyway probably you cold solve this issue too?
> As a separate patch perhaps.

That's not really this patch's job.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench small bug fix

From
Fabien COELHO
Date:
Hello Aleksander,

Thanks for the look at the patch.

>> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2
>
> On my laptop this command executes 25 seconds instead of 5.
> I'm pretty sure it IS a bug. Probably a minor one though.

Sure.

> [...] you should probably write:
>
> if(someint > 0)

Ok.

> if(somebool == TRUE)

I like "if (somebool)", the "== TRUE" looks like a tautology, and the 
short version is also the current practice in the project.

> Also I suggest to introduce a few new boolean variables with meaningful
> names instead of writing all these long expressions right inside of
> if( ... ).

I agree about the lisibility, but there are semantics issues to consider:
    if (short-A && pretty-long-B)

If short-A is false then pretty-long-B is not evaluated, which is a win 
because it also costs, I try to order conditions... If I move 
pretty-long-B before then the cost is always paid. Now I could write:
    if (short-A) {      bool b = pretty-long-B;      if (b) {        ...

But this looks contrived and people would raise other questions about such
a strange construct for implementing && in 3 lines, 2 if and 1 variable...

> As a side note I noticed that pgbench.c is not pgindent'ed. Since you
> are modifying this file anyway probably you cold solve this issue too?
> As a separate patch perhaps.

As Robert said, not the purpose of this patch.

Attached is a v3 which test integers more logically. I'm a lazy programmer 
who tends to minimize the number of key strokes.

-- 
Fabien.

Re: pgbench small bug fix

From
Aleksander Alekseev
Date:
> Attached is a v3 which test integers more logically. I'm a lazy
> programmer who tends to minimize the number of key strokes.

Well. From what I can tell this patch is Ready for Committer. 



Re: pgbench small bug fix

From
Alvaro Herrera
Date:
Aleksander Alekseev wrote:
> > Attached is a v3 which test integers more logically. I'm a lazy
> > programmer who tends to minimize the number of key strokes.
> 
> Well. From what I can tell this patch is Ready for Committer. 

I'm not a fan of this approach either.  Would it be too complicated if
we had a global variable that indicates which thread is the progress
reporter?  We start that with thread 0, but if the reporter thread
finishes its transactions then it elects some other thread which hasn't
yet finished.  For this to work, each thread would have to maintain in a
global variable whether it has finished or not.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench small bug fix

From
Fabien COELHO
Date:
Hello Alvaro,

>>> Attached is a v3 which test integers more logically. I'm a lazy
>>> programmer who tends to minimize the number of key strokes.
>>
>> Well. From what I can tell this patch is Ready for Committer.
>
> I'm not a fan of this approach either.  Would it be too complicated if
> we had a global variable that indicates which thread is the progress
> reporter?  We start that with thread 0, but if the reporter thread
> finishes its transactions then it elects some other thread which hasn't
> yet finished.  For this to work, each thread would have to maintain in a
> global variable whether it has finished or not.

Hmmm.

Probably it is possible, but it will sure need more that one little 
condition to be achieved... I do not think that introducing a non trivial 
distributed election algorithm involving locks and so would be a good 
decision for this very little matter.

My advice is "keep it simple".

If this is a blocker, I can sure write such an algorithm, when I have some 
spare time, but I'm not sure that the purpose is worth it.

-- 
Fabien.



Re: pgbench small bug fix

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> Probably it is possible, but it will sure need more that one little
> condition to be achieved... I do not think that introducing a non trivial
> distributed election algorithm involving locks and so would be a good
> decision for this very little matter.
> 
> My advice is "keep it simple".
> 
> If this is a blocker, I can sure write such an algorithm, when I have some
> spare time, but I'm not sure that the purpose is worth it.

You're probably right, but TBH I'm pretty unsure about this whole thing.
I will leave it alone for the time being.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench small bug fix

From
Fabien COELHO
Date:
>> Probably it is possible, but it will sure need more that one little
>> condition to be achieved... I do not think that introducing a non trivial
>> distributed election algorithm involving locks and so would be a good
>> decision for this very little matter.
>>
>> My advice is "keep it simple".
>>
>> If this is a blocker, I can sure write such an algorithm, when I have some
>> spare time, but I'm not sure that the purpose is worth it.
>
> You're probably right, but TBH I'm pretty unsure about this whole thing.

If the question is "is there a bug", then answer is yes. The progress 
report may disappear if thread 0 happens to stop, even of all other 
threads go on. Obviously it only concerns slow queries, but there is no 
reason why pgbench should not work with slow queries. I can imagin good 
reason to do that, say to check the impact of such queries on an OLTP 
load.

The bug can be kept instead, and it can be called a feature.

> I will leave it alone for the time being.

Maybe you could consider pushing the first part of the patch, which stops 
if a transaction is scheduled after the end of the run? Or is this part 
bothering you as well?

-- 
Fabien.



Re: pgbench small bug fix

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> >>Probably it is possible, but it will sure need more that one little
> >>condition to be achieved... I do not think that introducing a non trivial
> >>distributed election algorithm involving locks and so would be a good
> >>decision for this very little matter.
> >>
> >>My advice is "keep it simple".
> >>
> >>If this is a blocker, I can sure write such an algorithm, when I have some
> >>spare time, but I'm not sure that the purpose is worth it.
> >
> >You're probably right, but TBH I'm pretty unsure about this whole thing.
> 
> If the question is "is there a bug", then answer is yes. The progress report
> may disappear if thread 0 happens to stop, even of all other threads go on.
> Obviously it only concerns slow queries, but there is no reason why pgbench
> should not work with slow queries. I can imagin good reason to do that, say
> to check the impact of such queries on an OLTP load.
> 
> The bug can be kept instead, and it can be called a feature.

No, I agree that this looks like a bug and that we should fix it; for
example, if all connections from thread 0 terminate for some reason,
there will be no more reports, even if the other threads continue.
That's bad too.

What I'm unsure about is the proposed fix.

> >I will leave it alone for the time being.
> 
> Maybe you could consider pushing the first part of the patch, which stops if
> a transaction is scheduled after the end of the run? Or is this part
> bothering you as well?

So there are *two* bugs here?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench small bug fix

From
Fabien COELHO
Date:
>>> You're probably right, but TBH I'm pretty unsure about this whole thing.
>>
>> If the question is "is there a bug", then answer is yes. The progress report
>> may disappear if thread 0 happens to stop, even of all other threads go on.
>> Obviously it only concerns slow queries, but there is no reason why pgbench
>> should not work with slow queries. I can imagin good reason to do that, say
>> to check the impact of such queries on an OLTP load.
>>
>> The bug can be kept instead, and it can be called a feature.
>
> No, I agree that this looks like a bug and that we should fix it; for
> example, if all connections from thread 0 terminate for some reason,
> there will be no more reports, even if the other threads continue.
> That's bad too.
>
> What I'm unsure about is the proposed fix.
>
>>> I will leave it alone for the time being.
>>
>> Maybe you could consider pushing the first part of the patch, which stops if
>> a transaction is scheduled after the end of the run? Or is this part
>> bothering you as well?
>
> So there are *two* bugs here?

Hmmm... AFAICR, maybe fixing the first creates the second issue, i.e. 
maybe the second issue is currently hidden by the thread going on after 
the end of the run, so the second is just a latent bug that cannot be 
encountered.

I'm not sure whether I'm very clear:-)

-- 
Fabien.



Re: pgbench small bug fix

From
Robert Haas
Date:
On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>  - when a duration (-T) is specified, ensure that pgbench ends at that
>    time (i.e. do not wait for a transaction beyond the end of the run).

Every other place where doCustom() returns false is implemented as
return clientDone(...).  I think this should probably do the same.

I also think that we should probably store the end time someplace
instead of recomputing it in multiple places (this patch adds two such
places).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench small bug fix

From
Fabien COELHO
Date:
> On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>  - when a duration (-T) is specified, ensure that pgbench ends at that
>>    time (i.e. do not wait for a transaction beyond the end of the run).
>
> Every other place where doCustom() returns false is implemented as
> return clientDone(...).  I think this should probably do the same.

Why not. clientDone() second arguments is totally ignored, I put true 
because it looks better.

> I also think that we should probably store the end time someplace
> instead of recomputing it in multiple places (this patch adds two such
> places).

Why not.

Attached is a v4.

-- 
Fabien.

Re: pgbench small bug fix

From
Robert Haas
Date:
On Wed, Mar 9, 2016 at 4:12 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr>
>> wrote:
>>>
>>>  - when a duration (-T) is specified, ensure that pgbench ends at that
>>>    time (i.e. do not wait for a transaction beyond the end of the run).
>>
>>
>> Every other place where doCustom() returns false is implemented as
>> return clientDone(...).  I think this should probably do the same.
>
>
> Why not. clientDone() second arguments is totally ignored, I put true
> because it looks better.
>
>> I also think that we should probably store the end time someplace
>> instead of recomputing it in multiple places (this patch adds two such
>> places).
>
>
> Why not.
>
> Attached is a v4.

OK, I've committed the fix for the -T part.  It didn't back-patch
cleanly, and it is a minor bug, so I'm not inclined to worry about it
further.

I didn't commit the fix for the -P part, because Alvaro objected to
the proposed method of fixing it as ugly, and I think he's right.
Unless you can come up with a nicer-looking fix, I think that part is
going to stay unfixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench small bug fix

From
Fabien COELHO
Date:
> OK, I've committed the fix for the -T part.  It didn't back-patch
> cleanly, and it is a minor bug, so I'm not inclined to worry about it
> further.

I agree that it is a very minor bug and not necessary worth back-patching.

> I didn't commit the fix for the -P part, because Alvaro objected to
> the proposed method of fixing it as ugly, and I think he's right.
> Unless you can come up with a nicer-looking fix, I think that part is
> going to stay unfixed.

A bug kept on esthetical ground, that's a first!

-- 
Fabien.