Thread: pgbench stats per script & other stuff

pgbench stats per script & other stuff

From
Fabien
Date:
This patch adds per-script statistics & other improvements to pgbench

Rationale: Josh asked for the per-script stats:-)

Some restructuring is done so that all stats (-l --aggregate-interval 
--progress --per-script-stats, latency & lag...) share the same structures 
and functions to accumulate data. This greatly limits how much pgbench 
grows with this patch (+17 lines).
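A minimal sketch of what "one structure and one set of functions to accumulate data" can look like. This is not the patch's code: the `Stats` type and the `stats_*` names are hypothetical. Keeping a count, a sum and a sum of squares is enough to report an average and a standard deviation, and two accumulators can be merged by adding their fields, which is what lets the same code serve the whole run, each script, and each reporting interval.

```c
#include <assert.h>

/* Hypothetical accumulator shared by every kind of stats (whole run,
 * per script, per -l aggregation interval, per --progress report). */
typedef struct
{
    long   count;  /* number of values added */
    double sum;    /* sum of values, e.g. latencies or lags in ms */
    double sum2;   /* sum of squared values, for the variance */
    double min;    /* only meaningful when count > 0 */
    double max;    /* only meaningful when count > 0 */
} Stats;

static void stats_init(Stats *s)
{
    s->count = 0;
    s->sum = s->sum2 = 0.0;
    s->min = s->max = 0.0;
}

static void stats_add(Stats *s, double value)
{
    if (s->count == 0 || value < s->min)
        s->min = value;
    if (s->count == 0 || value > s->max)
        s->max = value;
    s->count++;
    s->sum += value;
    s->sum2 += value * value;
}

/* Fold b into a, e.g. to combine per-thread results. */
static void stats_merge(Stats *a, const Stats *b)
{
    if (b->count == 0)
        return;
    if (a->count == 0 || b->min < a->min)
        a->min = b->min;
    if (a->count == 0 || b->max > a->max)
        a->max = b->max;
    a->count += b->count;
    a->sum += b->sum;
    a->sum2 += b->sum2;
}

static double stats_mean(const Stats *s)
{
    assert(s->count > 0);
    return s->sum / s->count;
}

/* Population variance E[x^2] - E[x]^2; the reported stddev is the
 * square root of this value. */
static double stats_variance(const Stats *s)
{
    double mean = stats_mean(s);
    double var = s->sum2 / s->count - mean * mean;

    return var > 0.0 ? var : 0.0;  /* rounding can push it slightly below 0 */
}
```

Since lag and latency are both just values fed to `stats_add`, counting lag at the end of the transaction (as v2 does) needs no separate code path.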

In passing, remove the distinction between internal and external scripts.
Pgbench just executes scripts, some of which may be internal...

As a side effect, all scripts can be combined: "pgbench -B -N -S -f ..." 
would execute 4 scripts, 3 of which are internal (tpc-b, simple-update, 
select-only), plus an externally supplied one.

Also add a weight option to change the probability of choosing some scripts
when several are available.
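Weighted drawing can be done with a cumulative sum over the weights. The sketch below is hypothetical (`choose_script`, `total_weight` and `choose_script_random` are illustrative names, and `rand()` only stands in for whatever random generator pgbench actually uses): a script with weight w is picked with probability w divided by the sum of all weights.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of all script weights; each weight must be >= 0. */
static long total_weight(const int *weight, int nscripts)
{
    long total = 0;

    for (int i = 0; i < nscripts; i++)
    {
        assert(weight[i] >= 0);
        total += weight[i];
    }
    return total;
}

/* Map r, uniform in [0, total_weight), to a script index: script i owns
 * the sub-range [w0 + ... + w(i-1), w0 + ... + wi). */
static int choose_script(const int *weight, int nscripts, long r)
{
    long upper = 0;

    for (int i = 0; i < nscripts; i++)
    {
        upper += weight[i];
        if (r < upper)
            return i;
    }
    return -1;  /* r was out of range */
}

/* Convenience wrapper. rand() is only a placeholder: rand() % total is
 * slightly biased, and RAND_MAX may be as small as 32767. */
static int choose_script_random(const int *weight, int nscripts)
{
    long total = total_weight(weight, nscripts);

    assert(total > 0);
    return choose_script(weight, nscripts, rand() % total);
}
```

With weights 1, 2 and 7 (as in the -B -N -w 2 -S -w 7 run), r values 0, 1-2 and 3-9 map to scripts 0, 1 and 2, i.e. about 10%, 20% and 70% of the draws; a script with weight 0 is never chosen.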

Hmmm... Not sure that the --per-script-stats option is really useful. The 
stats could always be shown when several scripts are executed?

  sh> ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats
  starting vacuum...end.
  transaction type: multiple scripts
  scaling factor: 1
  query mode: simple
  number of clients: 1
  number of threads: 1
  duration: 3 s
  number of transactions actually processed: 3192
  latency average: 0.940 ms
  tps = 1063.756045 (including connections establishing)
  tps = 1065.412737 (excluding connections establishing)
  SQL script 0: <builtin: TPC-B (sort of)>
   - weight is 1
   - 297 transactions (tps = 98.977301)
   - latency average = 3.001 ms
   - latency stddev = 1.320 ms
  SQL script 1: <builtin: simple update>
   - weight is 2
   - 621 transactions (tps = 206.952539)
   - latency average = 2.506 ms
   - latency stddev = 1.194 ms
  SQL script 2: <builtin: select only>
   - weight is 7
   - 2274 transactions (tps = 757.826205)
   - latency average = 0.236 ms
   - latency stddev = 0.083 ms
 

-- 
Fabien

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Oops, as usual I forgot something...

This v2 removes old stats code that had been left commented out and 
simplifies the logic for counting lag times, as they are now taken into 
account at the end of the transaction instead of at the beginning.

> This patch adds per-script statistics & other improvements to pgbench
>
> Rationale: Josh asked for the per-script stats:-)
>
> Some restructuring is done so that all stats (-l --aggregate-interval 
> --progress --per-script-stats, latency & lag...) share the same structures 
> and functions to accumulate data. This greatly limits how much pgbench 
> grows with this patch (+17 lines).
>
> In passing, remove the distinction between internal and external scripts.
> Pgbench just executes scripts, some of which may be internal...
>
> As a side effect, all scripts can be combined: "pgbench -B -N -S -f ..." 
> would execute 4 scripts, 3 of which are internal (tpc-b, simple-update, 
> select-only), plus an externally supplied one.
>
> Also add a weight option to change the probability of choosing some scripts
> when several are available.
>
> Hmmm... Not sure that the --per-script-stats option is really useful. The 
> stats could always be shown when several scripts are executed?
>
>  sh> ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats
>  starting vacuum...end.
>  transaction type: multiple scripts
>  scaling factor: 1
>  query mode: simple
>  number of clients: 1
>  number of threads: 1
>  duration: 3 s
>  number of transactions actually processed: 3192
>  latency average: 0.940 ms
>  tps = 1063.756045 (including connections establishing)
>  tps = 1065.412737 (excluding connections establishing)
>  SQL script 0: <builtin: TPC-B (sort of)>
>   - weight is 1
>   - 297 transactions (tps = 98.977301)
>   - latency average = 3.001 ms
>   - latency stddev = 1.320 ms
>  SQL script 1: <builtin: simple update>
>   - weight is 2
>   - 621 transactions (tps = 206.952539)
>   - latency average = 2.506 ms
>   - latency stddev = 1.194 ms
>  SQL script 2: <builtin: select only>
>   - weight is 7
>   - 2274 transactions (tps = 757.826205)
>   - latency average = 0.236 ms
>   - latency stddev = 0.083 ms
>
>

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Fri, Jul 17, 2015 at 9:50 AM, Fabien <coelho@cri.ensmp.fr> wrote:
>   sh> ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats

That is a truly horrifying abuse of command-line arguments.  -1 from
me, or minus more than one if I've got that many chits to burn.

I have been thinking that the way to do this is to push more into the
script file itself, e.g. allow:

\if random() < 0.1
stuff
\else
other stuff
\endif

Maybe that's overkill and there's some way of specifying multiple
scripts on the command line, but IMO what you've got here is not it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Robert,

> On Fri, Jul 17, 2015 at 9:50 AM, Fabien <coelho@cri.ensmp.fr> wrote:
>>   sh> ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats
>
> That is a truly horrifying abuse of command-line arguments.  -1 from
> me, or minus more than one if I've got that many chits to burn.

Are you against the -w, or against saying that pgbench executes scripts, 
whether internal or from files?

The former is obviously a matter of taste, and I can remove "-w" if nobody 
wants it. That would be too bad, because the feature seems useful to me from 
a testing point of view: this is a choice between aesthetics and features. 
Note that you do not have to use it if you do not like it.

The latter really homogenizes the code internally and allows factoring 
things out and having orthogonal features (internal scripts are treated the 
same way as external files, which requires fewer lines of code because it is 
simpler), and it does not harm anyone IMO, so it would be sad to let it go.

> I have been thinking that the way to do this is to push more into the
> script file itself, e.g. allow:
>
> \if random() < 0.1
> stuff
> \else
> other stuff
> \endif
>
> Maybe that's overkill and there's some way of specifying multiple
> scripts on the command line, but IMO what you've got here is not it.

I think that is overkill, and moreover it is not useful: the point is to 
collect statistics *per script*; with a "random if" you would not know 
which part was executed, so you would lose the whole point of having 
per-script stats.

Do you have another suggestion for providing weights that relies neither 
on ifs nor on options? Maybe a special comment in the script (yuk from my 
point of view, because the script would carry its weight whereas I think 
this should be orthogonal to the script contents, but it would be better 
than nothing...).

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Tue, Jul 21, 2015 at 10:42 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>>   sh> ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats
>>
>> That is a truly horrifying abuse of command-line arguments.  -1 from
>> me, or minus more than one if I've got that many chits to burn.
>
> Are you against the -w, or against saying that pgbench executes scripts,
> whether internal or from files?

I'm against the idea that we accept multiple arguments for scripts,
and that a subsequent -w modifies the meaning of the
script-specifying argument already read.  That strikes me as a very
unintuitive interface.  I'm not sure exactly what would be better at
the moment, but I think we need something better.

-- 
Robert Haas



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>>> That is a truly horrifying abuse of command-line arguments.  -1 from
>>> me, or minus more than one if I've got that many chits to burn.
>>
>> Are you against the -w, or against saying that pgbench executes scripts,
>> whether internal or from files?
>
> I'm against the idea that we accept multiple arguments for scripts,

Pgbench *currently* already accepts multiple "-f ..." options, and this is 
a good thing for testing realistic loads which may intermix several kinds of 
transactions, say a lot of read-only ones, some updates or inserts, and very 
rare deletes...

Now if you do not need it you do not use it, and all is fine. Once you 
have several scripts, being able to "weight" them becomes useful for 
realism.

> and that a subsequent -w modifies the meaning of the script-specifying 
> argument already read. That strikes me as a very unintuitive interface.

Ok, I understand this "afterward modification" objection.

What if the -w would be required *before*, and supply a weight for (the 
first/maybe all) script(s) specified *afterwards*, so it does not modify 
something already provided? I think it would be more intuitive, or at 
least less surprising.

> I'm not sure exactly what would be better at the moment, but I think we 
> need something better.

Maybe -f file.sql:weight (yuk from my point of view, but it can be 
done easily).

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>> [...] and that a subsequent -w modifies the meaning of the 
>> script-specifying argument already read. That strikes me as a very 
>> unintuitive interface.
>
> Ok, I understand this "afterward modification" objection.
>
> What if the -w would be required *before*, and supply a weight for (the 
> first/maybe all) script(s) specified *afterwards*, so it does not modify 
> something already provided? I think it would be more intuitive, or at least 
> less surprising.

Here is a v3 which does that. If there is a better idea, do not hesitate!

 sh> ./pgbench -w 9 -f one.sql -f now.sql -T 2 -P 1 --per-script-stats
 starting vacuum...end.
 progress: 1.0 s, 24536.0 tps, lat 0.039 ms stddev 0.024
 progress: 2.0 s, 25963.8 tps, lat 0.038 ms stddev 0.015
 transaction type: multiple scripts
 scaling factor: 1
 query mode: simple
 number of clients: 1
 number of threads: 1
 duration: 2 s
 number of transactions actually processed: 50501
 latency average = 0.039 ms
 latency stddev = 0.020 ms
 tps = 25249.464772 (including connections establishing)
 tps = 25339.454154 (excluding connections establishing)
 SQL script 0, weight 9: one.sql
  - 45366 transactions (89.8% of total, tps = 22682.070035)
  - latency average = 0.038 ms
  - latency stddev = 0.016 ms
 SQL script 1, weight 1: now.sql
  - 5135 transactions (10.2% of total, tps = 2567.394737)
  - latency average = 0.044 ms
  - latency stddev = 0.041 ms
 

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Tue, Jul 21, 2015 at 12:29 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Pgbench *currently* already accepts multiple "-f ..." options, and this is a
> good thing for testing realistic loads which may intermix several kinds of
> transactions, say a lot of read-only ones, some updates or inserts, and very
> rare deletes...

Hmm, I didn't realize that.  The code looks a bit inconsistent right
now - e.g. we do support multiple files, but pgbench's options-parsing
loop sets ttype to a value that depends only on the last of -f, -N,
and -S encountered.

-- 
Robert Haas



Re: pgbench stats per script & other stuff

From
Josh Berkus
Date:
On 07/21/2015 09:29 AM, Fabien COELHO wrote:
> Maybe -f file.sql:weight (yuk from my point of view, but it can be done
> easily).

Maybe it's past time for pgbench to have a config file?

Given that we want to define some per-workload options, the config file
would probably need to be YAML or JSON, e.g.:

pgbench --config=workload1.pgb

workload1.pgb
-------------

database: bench
port: 5432
host: localhost
user: josh
clients : 16
threads : 4
response-times : on
stats-level: script
script1:
    file: script1.bench
    weight: 3
script2:
    file: script2.bench
    weight: 1

the above would execute a pgbench with 16 clients, 4 threads, "script1"
three times as often as script2, and report stats at the script (rather
than SQL statement) level.




-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Robert,

>> Pgbench *currently* already accepts multiple "-f ..." options, and this is a
>> good thing for testing realistic loads which may intermix several kinds of
>> transactions, say a lot of read-only ones, some updates or inserts, and very
>> rare deletes...
>
> Hmm, I didn't realize that.  The code looks a bit inconsistent right
> now - e.g. we do support multiple files, but pgbench's options-parsing
> loop sets ttype to a value that depends only on the last of -f, -N,
> and -S encountered.

Indeed. However, since in the current pgbench <nothing>/-N/-S and -f are 
mutually exclusive, it is OK for ttype to be set as it is.

With the patch, pgbench just executes scripts and the options are not 
mutually exclusive: some scripts are internal and others are not, but they 
are treated the same beyond initialization, which helps remove some code, 
including the "ttype" variable you mention. The name of the script is kept 
in an SQLScript struct along with its commands, weight and stats.
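To illustrate, here is a hypothetical sketch of scripts being appended in command-line order instead of being selected through a single `ttype` variable. `SQLScript`, `sql_script` and `MAX_SCRIPTS` appear in the patch excerpt quoted later in the thread, but the fields, the limit value, and the `add_script`/`script_done` helpers are assumptions; the per-script stats are reduced to a count and a latency sum.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_SCRIPTS 128   /* assumed limit */

/* Hypothetical per-script record: builtin and -f scripts look the same. */
typedef struct
{
    const char *name;         /* e.g. "<builtin: select only>" or "one.sql" */
    const char *commands;     /* script text; parsing into commands omitted */
    int         weight;       /* relative drawing weight */
    long        count;        /* per-script stats: transactions done ... */
    double      latency_sum;  /* ... and their total latency in ms */
} SQLScript;

static SQLScript sql_script[MAX_SCRIPTS];
static int       num_scripts = 0;

/* Append a script; each -b/-N/-S/-f option would call this in turn, so
 * they can be combined. Returns the script index, or -1 if full. */
static int add_script(const char *name, const char *commands, int weight)
{
    if (num_scripts >= MAX_SCRIPTS)
    {
        fprintf(stderr, "at most %d scripts may be specified\n", MAX_SCRIPTS);
        return -1;
    }
    assert(weight >= 0);
    sql_script[num_scripts] = (SQLScript) {name, commands, weight, 0, 0.0};
    return num_scripts++;
}

/* Record one finished transaction of script i. */
static void script_done(int i, double latency_ms)
{
    assert(i >= 0 && i < num_scripts);
    sql_script[i].count++;
    sql_script[i].latency_sum += latency_ms;
}
```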

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Josh,

>> Maybe -f file.sql:weight (yuk from my point of view, but it can be done
>> easily).
>
> Maybe it's past time for pgbench to have a config file?

That is an idea.  For "simple" usage, for backward compatibility and for 
people like me who like them, ISTM that options are fine too:-)

Also this may mean adding a dependency on some YAML library, configure 
issues (I'm not sure whether pg currently uses YAML, and JSON is quite 
verbose), maybe conditionals around the feature to compile without the 
dependency, more documentation...

I'm not sure all that is desirable just for weighting scripts.

> Given that we want to define some per-workload options, the config file
> would probably need to be YAML or JSON, e.g.:
>
> [...]
>
> script1:
>     file: script1.bench
>     weight: 3
> script2:
>     file: script2.bench
>     weight: 1
>
> the above would execute a pgbench with 16 clients, 4 threads, "script1"
> three times as often as script2, and report stats at the script (rather
> than SQL statement) level.

Yep. Numbering within field names should probably be avoided, so a list of 
records could look like:

scripts:
  - file: foo1.sql
    weight: 9
  - file: foo2.sql
  - internal: tpc-b
    weight: 2

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Josh Berkus
Date:
On 07/21/2015 10:25 PM, Fabien COELHO wrote:
> 
> Hello Josh,
> 
>>> Maybe -f file.sql:weight (yuk from my point of view, but it can be done
>>> easily).
>>
>> Maybe it's past time for pgbench to have a config file?
> 
> That is an idea.  For "simple" usage, for backward compatibility and for
> people like me who like them, ISTM that options are fine too:-)
> 
> Also this may mean adding a dependency on some YAML library, configure
> issues (I'm not sure whether pg currently uses YAML, and JSON is quite
> verbose), maybe conditionals around the feature to compile without the
> dependency, more documentation...
> 
> I'm not sure all that is desirable just for weighting scripts.

Maybe not.

If so, I would vote for:

-f script1.bench:3 -f script2.bench:1

over:

-f script1.bench -w 3 -f script2.bench -w 1

Making command-line options order-dependent breaks a lot of system call
libraries in various languages, as well as being easy to mess up.

>> Given that we want to define some per-workload options, the config file
>> would probably need to be YAML or JSON, e.g.:
>>
>> [...]
>>
>> script1:
>>     file: script1.bench
>>     weight: 3
>> script2:
>>     file: script2.bench
>>     weight: 1
>>
>> the above would execute a pgbench with 16 clients, 4 threads, "script1"
>> three times as often as script2, and report stats at the script (rather
>> than SQL statement) level.
> 
> Yep. Probably numbering within field names should be avoided, so a list
> of records that could look like:

Oh, you misunderstand.  "script1" and "script2" are meant to be
user-supplied names which then get reported in things like response time
output.  They're labels. Better example:

deposit:
    file: deposit.bench
    weight: 3
withdrawal:
    file: withdrawal.bench
    weight: 3
reporting:
    file: summary_report.bench
    weight: 1

-- 
Josh Berkus



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> If so, I would vote for:
>   -f script1.bench:3 -f script2.bench:1
> over:
>   -f script1.bench -w 3 -f script2.bench -w 1

Ok, I'll take that into consideration. Any other opinion out there? The 
current v3 version is:
  -w 3 -f script1.bench -w 1 -f script2.bench

With provision to generate errors if a -w is set but not used,
in two cases:
 - in the middle: ... -w 4 <no script option...> -w 1 ...
 - at the end: ... -w 1 <no script option...>

I can provide -f x:weight easily, but this means that there will be no way 
to associate a weight with internal scripts. Not orthogonal, not very 
elegant, but no big deal.

> Oh, you misunderstand.  "script1" and "script2" are meant to be 
> user-supplied names which then get reported in things like response time 
> output.  They're labels.

Ok, that is much better. This means that labels must avoid names which 
may clash with other commands, so maybe a list would have been nice as 
well. Anyway, I do not think it is the way to go just for this feature.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Andres Freund
Date:
On 2015-07-22 10:54:14 -0700, Josh Berkus wrote:
> Making command-line options order-dependent breaks a lot of system call
> libraries in various languages, as well as being easy to mess up.

What?



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Wed, Jul 22, 2015 at 1:54 PM, Josh Berkus <josh@agliodbs.com> wrote:
> If so, I would vote for:
>
> -f script1.bench:3 -f script2.bench:1
>
> over:
>
> -f script1.bench -w 3 -f script2.bench -w 1
>
> Making command-line options order-dependent breaks a lot of system call
> libraries in various languages, as well as being easy to mess up.

Yes, I think that's a good idea.  I don't know whether : is the right
separator; I kind of like @.  But that's bikeshedding.

As Fabien mentions further downthread, it would be nice to set weights
for the built-ins.  I'd actually like to introduce a new pgbench
option that selects a builtin script by name, so that we can have more
than three of them without running out of option names (or going
insane).  So suppose we introduce pgbench -b BUILTIN_NAME, where
BUILTIN_NAME is initially one of these:

classic
classic-simple-update
classic-select-only

Then you can do pgbench -b classic@1 -b classic-select-only@9 or
similar to get 10% write, 90% read.
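Parsing the proposed `name@weight` syntax is straightforward. The sketch below is hypothetical (`parse_script_weight` is an illustrative name); it defaults the weight to 1, as the patch documentation later states, and rejects an empty, non-numeric or negative weight. It splits at the last `@`, so a name may itself contain `@` as long as an explicit weight follows.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split "name@weight" in place. On success the '@' is overwritten with
 * '\0' so arg holds just the name, *weight receives the weight (1 when
 * there is no '@'), and 0 is returned; -1 means the weight is invalid. */
static int parse_script_weight(char *arg, int *weight)
{
    char *sep;
    char *end;
    long  w;

    assert(arg != NULL && weight != NULL);
    *weight = 1;
    sep = strrchr(arg, '@');   /* last '@', so "a@b.sql@2" names "a@b.sql" */
    if (sep == NULL)
        return 0;

    errno = 0;
    w = strtol(sep + 1, &end, 10);
    if (end == sep + 1 || *end != '\0' || errno == ERANGE || w < 0 || w > INT_MAX)
        return -1;

    *sep = '\0';
    *weight = (int) w;
    return 0;
}
```

For example, `classic@9` yields the name `classic` with weight 9, while `one.sql` alone keeps weight 1. With this rule a file named `a@b.sql` would have to be written `a@b.sql@1`, which is one argument for a separator unlikely to appear in file names.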

-- 
Robert Haas



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> Yes, I think that's a good idea.  I don't know whether : is the right
> separator; I kind of like @.  But that's bikeshedding.

Possible ASCII contenders should avoid interactions with the shell and 
filenames, which excludes * ? ! & / < > [ ] . - $ and so on; those that 
seem to remain are @ , = : % # +. I like "%" because this is about sharing, 
although this is not a percentage.

> I'd actually like to introduce a new pgbench option that selects a 
> builtin script by name, so that we can have more than three of them 
> without running out of option names (or going insane).  So suppose we 
> introduce pgbench -b BUILTIN_NAME, where BUILTIN_NAME is initially one 
> of these:
> classic, classic-simple-update, classic-select-only
>
> Then you can do pgbench -b classic@1 -b classic-select-only@9 or
> similar to get 10% write, 90% read.

I like this idea, as -b/-f would be symmetric. Prepending "classic" to the 
names does not look necessary. I would suggest "tpcb-like", 
"simple-update" & "select-only", or maybe even accepting any prefix. If the 
bench scripts could be read from some pg directory instead of being inlined 
in the code, even more code could be dropped from pgbench.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Thu, Jul 23, 2015 at 12:15 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Yes, I think that's a good idea.  I don't know whether : is the right
>> separator; I kind of like @.  But that's bikeshedding.
>
> Possible ASCII contenders should avoid shell and filename interaction, which
> excludes * ? ! & / < > [ ] . - $ and so on: those that seem to remain are @ ,
> = : % # +. I like "%" because this is about sharing, although this is not a
> percentage.

I liked @ because it makes sense to read it as the word "at".

>> I'd actually like to introduce a new pgbench option that selects a builtin
>> script by name, so that we can have more than three of them without running
>> out of option names (or going insane).  So suppose we introduce pgbench -b
>> BUILTIN_NAME, where BUILTIN_NAME is initially one of these:
>> classic, classic-simple-update, classic-select-only
>>
>> Then you can do pgbench -b classic@1 -b classic-select-only@9 or
>> similar to get 10% write, 90% read.
>
> I like this idea, as -b/-f would be symmetric. Prepending classic to the
> names does not look necessary. I would suggest "tpcb-like", "simple-update"
> & "select-only", or even maybe any prefix. If the bench scripts could be
> read from some pg directory instead of being actually inlined, even more
> code could be dropped from pgbench.

I think including classic would be a very good idea.  We might want to
add a TPC-C like workload in the future, or any number of other
things.  Naming things in a good way from the outset can only make
that easier.

-- 
Robert Haas



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> >>[...] and that a subsequent -w modifies the meaning of the
> >>script-specifying argument already read. That strikes me as a very
> >>unintuitive interface.
> >
> >Ok, I understand this "afterward modification" objection.
> >
> >What if the -w would be required *before*, and supply a weight for (the
> >first/maybe all) script(s) specified *afterwards*, so it does not modify
> >something already provided? I think it would be more intuitive, or at
> >least less surprising.
> 
> Here is a v3 which does that. If there is a better idea, do not hesitate!

This seems a moderately reasonable interface to me.  There are other
programs that behave in that way, and once you get used to the idea, it
makes sense.

I think for complete consistency we would have to require that -w is
specified for all scripts or none of them.  I am not sure if this means
that it's okay to have later scripts use a weight specified for a
previous one (i.e. it's only an error to fail to specify a weight for
options before the first -w), or whether each -f must always have its own
explicit -w.  In other words, in

  pgbench -w2 -f script1.sql -f script2.sql

either script2 has weight 2, or it's an error, depending on what we
decide; but

  pgbench -f script1.sql -w 2 -f script2.sql

is always an error.
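Here is a hypothetical sketch of the stricter of the two readings above: each -w applies only to the next -f, a -w not followed by a script is an error, and either every script has its own -w or none does. Options are assumed to be already split into (letter, argument) pairs; `Opt` and `assign_weights` are illustrative names, not pgbench code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* One already-split option, e.g. {'w', "2"} or {'f', "script1.sql"}. */
typedef struct
{
    char        opt;
    const char *arg;
} Opt;

/* Fill weights[] (capacity max) for each -f, in order. Returns the number
 * of scripts, or -1 on error. Other option letters are ignored. */
static int assign_weights(const Opt *opts, int nopts, int *weights, int max)
{
    int  nscripts = 0;
    int  explicit_weights = 0;
    int  has_pending = 0;     /* a -w was read and not yet consumed */
    long pending = 1;

    assert(opts != NULL || nopts == 0);
    for (int i = 0; i < nopts; i++)
    {
        if (opts[i].opt == 'w')
        {
            char *end;

            if (has_pending)
            {
                fprintf(stderr, "-w %ld is not followed by a script\n", pending);
                return -1;
            }
            errno = 0;
            pending = strtol(opts[i].arg, &end, 10);
            if (end == opts[i].arg || *end != '\0' || errno == ERANGE ||
                pending < 0 || pending > INT_MAX)
            {
                fprintf(stderr, "invalid weight: \"%s\"\n", opts[i].arg);
                return -1;
            }
            has_pending = 1;
        }
        else if (opts[i].opt == 'f')
        {
            if (nscripts >= max)
                return -1;
            weights[nscripts++] = has_pending ? (int) pending : 1;
            explicit_weights += has_pending;
            has_pending = 0;
        }
    }
    if (has_pending)
    {
        fprintf(stderr, "-w %ld is not followed by a script\n", pending);
        return -1;
    }
    if (explicit_weights != 0 && explicit_weights != nscripts)
    {
        fprintf(stderr, "-w must be given for all scripts or for none\n");
        return -1;
    }
    return nscripts;
}
```

Under this reading, the v3 example `-w 9 -f one.sql -f now.sql` would also be rejected, since now.sql has no -w of its own.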

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> I liked @ because it makes sense to read it as the word "at".

Yep, why not.

>> Prepending classic to the names does not look necessary. I would 
>> suggest "tpcb-like", "simple-update" & "select-only", or even maybe any 
>> prefix. If the bench scripts could be read from some pg directory 
>> instead of being actually inlined, even more code could be dropped from 
>> pgbench.
>
> I think including classic would be a very good idea.

Hmm. This is the command line; you have to type them! With a prefix-based
approach, this suggests that the builtin names must start differently so as 
to be easily selected.
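Why prefix selection pushes towards names that start differently can be seen in a small sketch. This is hypothetical code (`find_builtin`, `list_builtins` and `NUM_BUILTINS` are illustrative names) using the three names proposed for v4; an exact match is accepted even if it is also a prefix of another name, and an ambiguous prefix is reported rather than resolved arbitrarily.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The three builtin names proposed for v4; descriptions and script bodies
 * are left out of this sketch. */
static const char *const builtin_name[] = {
    "tpcb-like", "simple-update", "select-only",
};
#define NUM_BUILTINS ((int) (sizeof(builtin_name) / sizeof(builtin_name[0])))

/* Index of the builtin selected by prefix: -1 if nothing matches,
 * -2 if the prefix matches several names. An exact match always wins. */
static int find_builtin(const char *prefix)
{
    size_t len;
    int    found = -1;

    assert(prefix != NULL);
    len = strlen(prefix);
    if (len == 0)
        return -1;
    for (int i = 0; i < NUM_BUILTINS; i++)
    {
        if (strcmp(builtin_name[i], prefix) == 0)
            return i;
        if (strncmp(builtin_name[i], prefix, len) == 0)
            found = (found == -1) ? i : -2;
    }
    return found;
}

/* Print the available names, one per line, e.g. for a listing request. */
static void list_builtins(FILE *out)
{
    for (int i = 0; i < NUM_BUILTINS; i++)
        fprintf(out, "%s\n", builtin_name[i]);
}
```

Here "simp" selects simple-update and "sel" selects select-only, but "s" is ambiguous because two of the names share that first letter.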

> We might want to add a TPC-C like workload in the future, or any number 
> of other things.  Naming things in a good way from the outset can only 
> make that easier.

Here is a v4 which:
 - removes the -w stuff
 - enhances -f with @weight
 - adds -b/--builtin name@weight, based on prefix;
   builtin names are: tpcb-like, simple-update & select-only,
   which match their more or less historical names
   (although I wasn't sure about "tpcb-sort-of", so I put "tpcb-like")
 - removes -B (now accessible with -b tpcb-like)

Pgbench builtin scripts are still inlined in the code, not in a separate 
directory, which might be an option to simplify the code and allow easy 
extensions.

I still think that the "--per-script-stats" option is useless and per 
script stats should always be on as soon as several scripts are running.

Even more, I think that stats (maybe no per-command stat though) should
always be collected. The point of pgbench is to collect data, and the
basic end-of-run tps summary is very terse and does not reflect much
of what happened during the run.

Also, maybe per-command detailed stats should use the same common struct 
to hold data as all other stats. I did not change it because it is 
maintained in a different part of the code.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> Also, maybe per-command detailed stats should use the same common struct 
> to hold data as all other stats. I did not change it because it is 
> maintained in a different part of the code.

I played just once with the --report-latencies option and was astonished 
that meta commands showed negative latencies...

This v5 also fixes this bug (on meta commands there is a goto loop in 
doCustom, but as "now" was not reset, stmt_begin ended up being after 
"now", hence accumulating increasingly negative times) and in passing uses 
the same stats structure as the rest, which results in removing some more 
code. The "report-latencies" option is made to imply per-script stats, which 
simplifies the final output code; and if you want per-command per-script 
stats, providing the per-script stats (i.e. the sum of the commands) 
probably makes sense too.
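The bug can be reproduced in a few lines with a fake clock. This is a simulation, not pgbench's `doCustom`: `fake_clock`, `read_clock` and `total_latency` are hypothetical, and each clock read simply advances time by 10 units. When `now` is read once before the loop and never refreshed, each command's start time is later than the stale `now`, so every latency is negative and the error grows with each command; re-reading `now` after each command fixes it.

```c
#include <assert.h>

/* Fake monotonic clock: every read advances time by 10 units. */
static long fake_clock = 0;

static long read_clock(void)
{
    fake_clock += 10;
    return fake_clock;
}

/* Sum the latencies of n meta commands executed in a loop. */
static long total_latency(int n, int refresh_now)
{
    long now = read_clock();   /* read once before the loop */
    long total = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        long stmt_begin = read_clock();   /* fresh time at command start */

        /* ... execute a meta command such as \set here ... */

        if (refresh_now)
            now = read_clock();           /* the fix: re-read the clock */
        total += now - stmt_begin;        /* negative if "now" is stale */
    }
    return total;
}
```

With three commands, the stale version sums -10 - 20 - 30 = -60 units, while the fixed version sums 3 × 10 = 30.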

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
v6 is just a rebase after a bug fix by Andres Freund.

Also a small question: The patch currently displays pgbench scripts 
starting numbering at 0. Probably a little too geek... should start at 1?

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> v6 is just a rebase after a bug fix by Andres Freund.
>
> Also a small question: The patch currently displays pgbench scripts 
> starting numbering at 0. Probably a little too geek... should start at 
> 1?

v7 is a rebase after another small bug fix in pgbench.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> v7 is a rebase after another small bug fix in pgbench.

v8 is a rebase after yet another small bug fix in pgbench.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Andres Freund
Date:
On 2015-07-30 18:03:56 +0200, Fabien COELHO wrote:
> 
> >v6 is just a rebase after a bug fix by Andres Freund.
> >
> >Also a small question: The patch currently displays pgbench scripts
> >starting numbering at 0. Probably a little too geek... should start at 1?
> 
> v7 is a rebase after another small bug fix in pgbench.
> 
> -- 
> Fabien.

> diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
> index 2517a3a..99670d4 100644
> --- a/doc/src/sgml/ref/pgbench.sgml
> +++ b/doc/src/sgml/ref/pgbench.sgml
> @@ -261,6 +261,23 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
>      benchmarking arguments:
>  
>      <variablelist>
> +     <varlistentry>
> +      <term><option>-b</> <replaceable>scriptname[@weight]</></term>
> +      <term><option>--builtin</> <replaceable>scriptname[@weight]</></term>
> +      <listitem>
> +       <para>
> +        Add the specified builtin script to the list of executed scripts.
> +        An optional integer weight after <literal>@</> allows to adjust the
> +        probability of drawing the test.
> +        Available builtin scripts are: <literal>tpcb-like</>,
> +        <literal>simple-update</> and <literal>select-only</>.
> +        The provided <replaceable>scriptname</> needs only to be a prefix
> +        of the builtin name, hence <literal>simp</> would be enough to select
> +        <literal>simple-update</>.
> +       </para>
> +      </listitem>
> +     </varlistentry>

Maybe add --builtin list to show them?

> @@ -404,10 +422,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
>        <term><option>--skip-some-updates</option></term>
>        <listitem>
>         <para>
> -        Do not update <structname>pgbench_tellers</> and
> -        <structname>pgbench_branches</>.
> -        This will avoid update contention on these tables, but
> -        it makes the test case even less like TPC-B.
> +        Shorthand for <option>-b simple-update@1</>.
>         </para>
>        </listitem>
>       </varlistentry>

> @@ -511,7 +526,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
>        <term><option>--select-only</option></term>
>        <listitem>
>         <para>
> -        Perform select-only transactions instead of TPC-B-like test.
> +        Shorthand for <option>-b select-only@1</>.
>         </para>
>        </listitem>
>       </varlistentry>

I'm a bit inclined to remove these options.

>    <para>
> -   The default transaction script issues seven commands per transaction:
> +   Pgbench executes test scripts chosen randomly from a specified list.
> +   They include built-in scripts with <option>-b</> and
> +   user-provided custom scripts with <option>-f</>.
> +   Each script may be given a relative weight specified after a
> +   <literal>@</> so as to change its drawing probability.
> +   The default weight is <literal>1</>.
> + </para>

I'm wondering if percentages instead of weights would be a better
idea. That'd mean you'd be forced to be more careful when adding another
script (having to adjust the percentages of other scripts) but arguably
that's a good thing?

> +static SQLScript sql_script[MAX_SCRIPTS];
> +static struct {
> +    char *name;   /* very short name for -b ...*/
> +    char *desc;   /* short description */
> +    char *script; /* actual pgbench script */
> +} builtin_script[]

Can't we put these in the same array?

> +    printf("transaction type: %s\n",
> +           num_scripts == 1? sql_script[0].name: "multiple scripts");

Seems like it'd be more useful to simply always list the scripts +
weights here.

Greetings,

Andres Freund



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Andres,

> Maybe add --builtin list to show them?

Yep, easy enough.

>> [...]
>> +        Shorthand for <option>-b simple-update@1</>.
>> +        Shorthand for <option>-b select-only@1</>.
>
> I'm a bit inclined to remove these options.

Hm...

This is really about backward compatibility, and people may find references 
to these in blogs or elsewhere, so I think that it would make sense to
remain upward compatible.

I would certainly be against adding any other of these options, though.

>> +   Each script may be given a relative weight specified after a
>> +   <literal>@</> so as to change its drawing probability.
>> +   The default weight is <literal>1</>.
>
> I'm wondering if percentages instead of weights would be a better
> idea. That'd mean you'd be forced to be more careful when adding another
> script (having to adjust the percentages of other scripts) but arguably
> that's a good thing?

If you use only percentages, then you have to check that the total is 100, 
probably use floats, and do something when the total is not 100; checking 
would complicate the code and test people's mental arithmetic abilities. 
Not sure this is a good idea:-)

In the use case you outline, when adding a script, maybe you know that it 
runs "as much as" this other script, so you can pick up the same weight 
without bothering.

Also, when testing, there is an issue when you want to remove one script 
for a quick test, and that would mean changing all percentages on the 
command line...

So I would advise not to put such a constraint.
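With relative weights, each script's drawing probability comes from normalizing the weights. Nothing forces them to sum to 100. A worked illustration, using arbitrary example weights:

```latex
P(\text{script } i) = \frac{w_i}{\sum_{j=1}^{n} w_j}
\qquad
w = (1, 2, 7):\; P = \left(\tfrac{1}{10}, \tfrac{2}{10}, \tfrac{7}{10}\right)
\qquad
w = (1, 2, 7, 10):\; P = \left(\tfrac{1}{20}, \tfrac{2}{20}, \tfrac{7}{20}, \tfrac{10}{20}\right)
```

Adding a fourth script with weight 10 halves every existing probability but keeps their 1:2:7 ratios. Removing a script for a quick test likewise needs no edit to the remaining weights.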

>> +static SQLScript sql_script[MAX_SCRIPTS];
>>
>> +static struct {
>> +    char *name;   /* very short name for -b ...*/
>> +    char *desc;   /* short description */
>> +    char *script; /* actual pgbench script */
>> +} builtin_script[]
>
> Can't we put these in the same array?

I do not understand.

>> +    printf("transaction type: %s\n",
>> +           num_scripts == 1? sql_script[0].name: "multiple scripts");
>
> Seems like it'd be more useful to simply always list the scripts +
> weights here.

The detailed list is shown later, with the summary performance figure for 
each script, so ISTM that it would be redundant? Maybe the transaction 
type could be moved downwards just before said list?

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Wed, Sep 2, 2015 at 2:20 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> I'm wondering if percentages instead of weights would be a better
>> idea. That'd mean you'd be forced to be more careful when adding another
>> script (having to adjust the percentages of other scripts) but arguably
>> that's a good thing?
>
> If you use only percent, then you have to check that the total is 100,
> probably you have to use floats, to do something when the total is not 100,
> checking would complicate the code and test people mental calculus
> abilities. Not sure this is a good idea:-)

I agree.  I don't see a reason to enforce that the total of the
weights must be 100.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench stats per script & other stuff

From
Andres Freund
Date:
On 2015-09-02 14:36:51 -0400, Robert Haas wrote:
> On Wed, Sep 2, 2015 at 2:20 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> >> I'm wondering if percentages instead of weights would be a better
> >> idea. That'd mean you'd be forced to be more careful when adding another
> >> script (having to adjust the percentages of other scripts) but arguably
> >> that's a good thing?
> >
> > If you use only percent, then you have to check that the total is 100,
> > probably you have to use floats, to do something when the total is not 100,
> > checking would complicate the code and test people mental calculus
> > abilities. Not sure this is a good idea:-)
> 
> I agree.  I don't see a reason to enforce that the total of the
> weights must be 100.

I'm slightly worried that using weights will be a bit confusing because
adding another script will obviously reduce the frequency of already
defined scripts. But it's probably not worth worrying.

Andres



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Wed, Sep 2, 2015 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-02 14:36:51 -0400, Robert Haas wrote:
>> On Wed, Sep 2, 2015 at 2:20 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> >> I'm wondering if percentages instead of weights would be a better
>> >> idea. That'd mean you'd be forced to be more careful when adding another
>> >> script (having to adjust the percentages of other scripts) but arguably
>> >> that's a good thing?
>> >
>> > If you use only percent, then you have to check that the total is 100,
>> > probably you have to use floats, to do something when the total is not 100,
>> > checking would complicate the code and test people mental calculus
>> > abilities. Not sure this is a good idea:-)
>>
>> I agree.  I don't see a reason to enforce that the total of the
>> weights must be 100.
>
> I'm slightly worried that using weights will be a bit confusing because
> adding another script will obviously reduce the frequency of already
> defined scripts. But it's probably not worth worrying.

That sounds like a feature to me, not a bug.  I wouldn't worry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Andres,

This v9:
 - add "-b list" to show the list of builtins
 - remove the explicit --per-scripts-stats option, which is instead
   automatically set when several scripts are run or with per-command
   latencies (-r)
 - count scripts from 1 instead of 0 in the output

I've left out:
 - removing -N/-S upward compatibility shorthands
   but I will not cry if they are removed
 - requiring percents instead of integer weights, because
   it is too constrained
 - your "array" remark as I did not understand it
Thanks to the restructuring and sharing of stats code, the patch does not 
change the loc count, although a few features are added.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Andres Freund
Date:
On 2015-09-02 20:20:45 +0200, Fabien COELHO wrote:
> >>+static SQLScript sql_script[MAX_SCRIPTS];
> >>
> >>+static struct {
> >>+    char *name;   /* very short name for -b ...*/
> >>+    char *desc;   /* short description */
> >>+    char *script; /* actual pgbench script */
> >>+} builtin_script[]
> >
> >Can't we put these in the same array?
> 
> I do not understand.

Right now builtins and user defined scripts are stored in different data
structures. I'd rather see them in the same.

Greetings,

Andres Freund



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:

> Right now builtins and user defined scripts are stored in different data
> structures. I'd rather see them in the same.

They already are in the same array (sql_script) when pre-processed and 
executed, there is no distinction beyond initialization.

The builtin_script array contains the equivalent of the external custom 
file (name, lines of code), so that they can be processed by 
process_builtin and addScript to build the SQLScript ready for execution, 
while for external files it relies on process_file and addScript.
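Here is a rough sketch of the two initialization paths converging on one array. It assumes a heavily simplified SQLScript that keeps only a name and the raw text; real pgbench stores parsed commands. The names process_builtin, process_file and addScript come from the message above, but their signatures here are assumptions, and the builtin script bodies are elided.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SCRIPTS 128			/* assumption: capacity for the sketch */

/* Simplified stand-in for SQLScript: name and raw text only. */
typedef struct
{
	const char *name;
	const char *text;
} SQLScript;

/* Builtins: in-memory equivalents of external script files. */
static const struct
{
	const char *name;			/* very short name for -b */
	const char *desc;			/* short description */
	const char *script;			/* actual pgbench script (elided here) */
} builtin_script[] =
{
	{"tpcb-like", "<builtin: TPC-B (sort of)>", "/* script text elided */"},
	{"simple-update", "<builtin: simple update>", "/* script text elided */"},
	{"select-only", "<builtin: select only>", "/* script text elided */"},
};

static SQLScript sql_script[MAX_SCRIPTS];
static int	num_scripts = 0;

/* Common path: whatever its origin, a script ends up in sql_script[]. */
static void
addScript(const char *name, const char *text)
{
	assert(num_scripts < MAX_SCRIPTS);
	sql_script[num_scripts].name = name;
	sql_script[num_scripts].text = text;
	num_scripts++;
}

/* Builtin path: the text is already in memory. */
static void
process_builtin(int idx)
{
	addScript(builtin_script[idx].desc, builtin_script[idx].script);
}

/* File path: read the file, then go through the same addScript(). */
static int
process_file(const char *filename)
{
	FILE	   *fd = fopen(filename, "r");
	char	   *buf;
	size_t		len;

	if (fd == NULL)
		return -1;
	buf = malloc(65536);		/* assumption: size cap keeps the sketch short */
	if (buf == NULL)
	{
		fclose(fd);
		return -1;
	}
	len = fread(buf, 1, 65535, fd);
	buf[len] = '\0';
	fclose(fd);
	addScript(filename, buf);	/* sql_script[] now owns buf */
	return 0;
}
```

Past this point the executor only ever sees sql_script[]. Where a script came from matters only during initialization.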

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Thu, Sep 3, 2015 at 3:26 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I've left out:
>  - removing -N/-S upward compatibility shorthands
>    but I will not cry if they are removed

I see no particular merit to breaking backward compatibility here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>> I've left out:
>>  - removing -N/-S upward compatibility shorthands
>>    but I will not cry if they are removed
>
> I see no particular merit to breaking backward compatibility here.

I agree, but I would not fight for this. I think there is a good argument 
*NOT* to add more if new builtin scripts are added later.

Currently the builtin scripts can be selected with "-b t" (t for tpcb-like), 
"-b s" (s for simple-update) and "-b se" (se for select-only).

I've reused their current names for the option selector, and it takes the 
first matching prefix.
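"First matching prefix" can be sketched as a linear scan in declaration order. The helper name findBuiltin is hypothetical, and the list assumes the builtin order tpcb-like, simple-update, select-only. That order is why "-b s" resolves to simple-update and select-only needs "-b se".

```c
#include <assert.h>
#include <string.h>

/* Builtin names in declaration order; the order decides prefix ties. */
static const char *const builtin_names[] = {
	"tpcb-like", "simple-update", "select-only"
};

#define NUM_BUILTINS ((int) (sizeof(builtin_names) / sizeof(builtin_names[0])))

/*
 * Hypothetical helper: return the index of the first builtin whose name
 * starts with "prefix", or -1 if the prefix is empty or matches nothing.
 */
static int
findBuiltin(const char *prefix)
{
	size_t		len;

	assert(prefix != NULL);
	len = strlen(prefix);
	if (len == 0)
		return -1;

	for (int i = 0; i < NUM_BUILTINS; i++)
	{
		if (strncmp(builtin_names[i], prefix, len) == 0)
			return i;
	}
	return -1;
}
```

A prefix longer than a name (e.g. "select-only-x") does not match, because strncmp stops at the shorter name's terminator.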

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Here is a v10, which is a rebase because of the "--progress-timestamp" 
option addition.

It also includes the fix for the computation of tps excluding connection
establishment, and some minor code simplification, so it is redundant with
this bug fix patch:
    https://commitfest.postgresql.org/7/378/

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Robert Haas
Date:
On Sat, Sep 26, 2015 at 3:27 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a v10, which is a rebase because of the "--progress-timestamp"
> option addition.

I do not see it attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>> Here is a v10, which is a rebase because of the "--progress-timestamp"
>> option addition.
>
> I do not see it attached.

Indeed. Here it is.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> Here is a v10, which is a rebase because of the "--progress-timestamp" option 
> addition.

Here is a v11, which is a rebase after some recent changes committed to 
pgbench.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Sat, Oct 3, 2015 at 3:11 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> Here is a v10, which is a rebase because of the "--progress-timestamp"
>> option addition.
>
>
> Here is a v11, which is a rebase after some recent changes committed to
> pgbench.

+        The provided <repleacable>scriptname</> needs only to be a prefix
s/repleacable/replaceable, in short I think that documentation
compilation would fail.

-        Do not update <structname>pgbench_tellers</> and
-        <structname>pgbench_branches</>.
-        This will avoid update contention on these tables, but
-        it makes the test case even less like TPC-B.
+        Shorthand for <option>-b simple-update@1</>.
I don't think it is a good idea to remove entirely the description of
what the default scenarios can do. The description would be better at
the bottom in some <para> with a list of each default test and what to
expect from them.

+/* data structure to hold various statistics.
+ * it is used for interval statistics as well as file statistics. */
Nitpick: this is not a comment formatted the Postgres-way.

This is surprisingly broken:
$ pgbench -i
some of the specified options cannot be used in initialization (-i) mode

Any file name or path including "@" will fail strangely:
$ pgbench -f "test@1.sql"
could not open file "test": No such file or directory
empty commands for test
Perhaps instead of failing we should warn the user and enforce the
weight to be set at 1?
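One possible way to avoid the "@" problem is to split at the last "@" only. The suffix counts as a weight only if it is entirely digits; otherwise the whole argument is kept as the file name with weight 1. This is a sketch under those assumptions, and parseScriptWeight is a hypothetical name. A caller could emit the suggested warning wherever the function falls back to weight 1.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split a -f argument of the form FILE[@W].
 * Only the text after the LAST '@' can be a weight, and only if it is a
 * non-empty run of decimal digits that fits in an int.  Anything else is
 * kept as part of the file name, with the default weight of 1.
 *
 * "name" must have room for strlen(arg) + 1 bytes.
 */
static void
parseScriptWeight(const char *arg, char *name, int *weight)
{
	const char *sep;
	const char *p;
	long		w;

	assert(arg != NULL && name != NULL && weight != NULL);
	strcpy(name, arg);
	*weight = 1;

	sep = strrchr(arg, '@');
	if (sep == NULL || sep[1] == '\0')
		return;					/* no '@', or nothing after it */

	for (p = sep + 1; *p != '\0'; p++)
		if (!isdigit((unsigned char) *p))
			return;				/* e.g. "test@1.sql": '@' is part of the name */

	errno = 0;
	w = strtol(sep + 1, NULL, 10);
	if (errno != 0 || w > INT_MAX)
		return;					/* out of range: a caller could warn here */

	name[sep - arg] = '\0';		/* cut the "@W" suffix off the copy */
	*weight = (int) w;
}
```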

$ pgbench -b foo
no builtin found for "foo"
This is not really helpful for the user, I think that the list of
potential options should be listed as an error hint.

-                  "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
+                  "  -N, --skip-some-updates  same as \"-b simple-update@1\"\n"
                   "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
                   "  -r, --report-latencies   report average latency per command\n"
                   "  -R, --rate=NUM           target rate in transactions per second\n"
                   "  -s, --scale=NUM          report this scale factor in output\n"
-                  "  -S, --select-only        perform SELECT-only transactions\n"
+                  "  -S, --select-only        same as \"-b select-only@1\"\n"
It is good to mention that there is an equivalent, but I think that
the description should be kept.

+                       /* although a mutex would make sense, the likelyhood of an issue
+                        * is small and these are only stats which may be slightly false
+                        */
+                       doSimpleStats(& commands[st->state]->stats,
+                                                 INSTR_TIME_GET_DOUBLE(now) -
+                                                 INSTR_TIME_GET_DOUBLE(st->stmt_begin));
Why would the likelyhood of an issue be small here?

+       /* print NaN if no transactions where executed */
+       double latency = ss->sum / ss->count;
This does not look like a good idea, ss->count can be 0.

It seems also that it would be a good idea to split the patch into two parts:
1) Refactor the code so as the existing test scripts are put under the
same umbrella with addScript, adding at the same time the new option
-b.
2) Add the weight facility and its related statistics.

The patch having some issues, I am marking it as returned with
feedback. It would be nice to see a new version for next CF.
Regards,
-- 
Michael



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> -        Do not update <structname>pgbench_tellers</> and
> -        <structname>pgbench_branches</>.
> -        This will avoid update contention on these tables, but
> -        it makes the test case even less like TPC-B.
> +        Shorthand for <option>-b simple-update@1</>.

> I don't think it is a good idea to remove entirely the description of
> what the default scenarios can do. The description would be better at
> the bottom in some <para> with a list of each default test and what to
> expect from them.

I'm trying to avoid having the same explanation twice, otherwise someone 
is bound to complain.

> +/* data structure to hold various statistics.
> + * it is used for interval statistics as well as file statistics.
>  */
> Nitpick: this is not a comment formatted the Postgres-way.

Indeed.

> This is surprisingly broken:
> $ pgbench -i
> some of the specified options cannot be used in initialization (-i) mode

Hmmm.

> Any file name or path including "@" will fail strangely:
> $ pgbench -f "test@1.sql"
> could not open file "test": No such file or directory
> empty commands for test
> Perhaps instead of failing we should warn the user and enforce the
> weight to be set at 1?

Yep, I can have a look at that.

> $ pgbench -b foo
> no builtin found for "foo"
> This is not really helpful for the user, I think that the list of
> potential options should be listed as an error hint.

Yep.

> -                  "  -S, --select-only        perform SELECT-only
> transactions\n"
> +                  "  -S, --select-only        same as \"-b select-only@1\"\n"
> It is good to mention that there is an equivalent, but I think that
> the description should be kept.

The reason I replaced it is to keep the help message short column-wise.

> +                       /* although a mutex would make sense, the likelyhood of an issue
> +                        * is small and these are only stats which may be slightly false
> +                        */
> +                       doSimpleStats(& commands[st->state]->stats,
> +                                                 INSTR_TIME_GET_DOUBLE(now) -

> Why would the likelyhood of an issue be small here?

Compare the time to update one stat (<< 100 cycles?) with the time to do a
transaction with the database (typically Y ms): the likelihood of two
threads updating the very same stat at the same time is probably under
1/10,000,000. Even if it occurs, one stat is slightly off, no big deal.
So I think the potential slowdown induced by a mutex is not worth it,
hence a comment instead.

> +       /* print NaN if no transactions where executed */
> +       double latency = ss->sum / ss->count;
> This does not look like a good idea, ss->count can be 0.

"sum" is a double so count is converted to 0.0, 0.0/0.0 == NaN, hence the 
comment.

> It seems also that it would be a good idea to split the patch into two parts:
> 1) Refactor the code so as the existing test scripts are put under the
> same umbrella with addScript, adding at the same time the new option
> -b.
> 2) Add the weight facility and its related statistics.

Sigh. The patch & documentation are probably not independent, so that 
would make two dependent patches, probably.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Tue, Dec 15, 2015 at 5:53 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I wrote:
>> Why would the likelyhood of an issue be small here?
>
> The time to update one stat (<< 100 cycles ?) to the time to do a
> transaction with the database (typically Y ms), so the likelyhood of two
> thread to update the very same stat at the same time is probably under
> 1/10,000,000. Even if it occurs, then one stat is slightly false, no big
> deal. So I think the potential slowdown induced by a mutex is not worth it,
> so I a comment instead.

OK, got it and agreed.

>> +       /* print NaN if no transactions where executed */
>> +       double latency = ss->sum / ss->count;
>> This does not look like a good idea, ss->count can be 0.
>
>
> "sum" is a double so count is converted to 0.0, 0.0/0.0 == NaN, hence the
> comment.

PG code usually avoids that, and I recall static analysis tools such as
Coverity complaining that this may lead to undefined behavior. While I
agree that this would lead to NaN...

>> It seems also that it would be a good idea to split the patch into two
>> parts:
>> 1) Refactor the code so as the existing test scripts are put under the
>> same umbrella with addScript, adding at the same time the new option
>> -b.
>> 2) Add the weight facility and its related statistics.
>
>
> Sigh. The patch & documentation are probably not independent, so that would
> make two dependent patches, probably.

I am not really saying so, it seems just that doing the refactoring
(with its related docs), and then add the extension for the weight
(with its docs) is more natural than doing both things at the same
time.
-- 
Michael



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>> "sum" is a double so count is converted to 0.0, 0.0/0.0 == NaN, hence the
>> comment.
>
> PG code usually avoids that, and I recall static analyze tools type
> coverity complaining that this may lead to undefined behavior. While I
> agree that this would lead to NaN...

Hmmm. In this case that is what is actually wanted. If there is no 
transaction, the tps or average latency or whatever is "NaN", I cannot 
help it, and IEEE 754 allows that. So in this case the tool is wrong if it 
complains, or at least we are right to ignore the warning. Maybe there is 
some special comment to say "ignore this warning on the next line", if 
this turns out to be an issue.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Tue, Dec 15, 2015 at 8:41 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> PG code usually avoids that, and I recall static analyze tools type
>> coverity complaining that this may lead to undefined behavior. While I
>> agree that this would lead to NaN...
>
>
> Hmmm. In this case that is what is actually wanted. If there is no
> transaction, the tps or average latency or whatever is "NaN", I cannot help
> it, and IEEE 754 allow that. So in this case the tool is wrong if it
> complains, or at least we are right to ignore the warning. Maybe there is
> some special comment to say "ignore this warning on the next line" if it
> occurs, if this is an issue.

Yeah, that's actually fine. I just had a look at Windows stuff, and
things seem to be correct on this side for double:
https://msdn.microsoft.com/en-us/library/aa691373%28v=vs.71%29.aspx
And then I also had a look at src/port/snprintf.c, where things get
actually weird when no transactions are run for a script (emulated
with 2 scripts, one with @10000 and the second with @1):
 - 0 transactions (0.0% of total, tps = 0.000000)
 - latency average = -1.#IO ms
 - latency stddev = -1.#IO ms
And it seems that this is a bug in fmtfloat() because it does not
handle nan values correctly. Others, any thoughts about that?
It is possible to address things within your patch by using isnan()
and print another message but I think that we had better patch
snprintf.c if my analysis is right.
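The isnan() route mentioned here could look roughly like the sketch below. It uses isfinite(), which also catches the infinities that print oddly on Windows. formatLatency and the word "undefined" are assumptions for illustration, not part of either patch.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one latency line of the final report.
 * Non-finite values (NaN from 0.0 / 0.0, or infinities) are never handed
 * to printf, whose spelling of them is platform-dependent ("nan",
 * "-1.#IO", ...); a fixed word is printed instead.
 */
static void
formatLatency(char *buf, size_t size, const char *label, double value_ms)
{
	char		value[32];

	assert(buf != NULL && size > 0 && label != NULL);
	if (!isfinite(value_ms))
		strcpy(value, "undefined");
	else
		snprintf(value, sizeof(value), "%.3f ms", value_ms);
	snprintf(buf, size, " - %s = %s", label, value);
}
```

The trade-off is that the raw NaN disappears from the output. That matters if the report is post-processed by scripts expecting a numeric field.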

Oh, and actually when trying to compile the patch on Windows things
are failing because int64_t is undefined :) After switching to int64
things worked better.
-- 
Michael



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>>> It seems also that it would be a good idea to split the patch into two
>>> parts:
>>> 1) Refactor the code so as the existing test scripts are put under the
>>> same umbrella with addScript, adding at the same time the new option
>>> -b.
>>> 2) Add the weight facility and its related statistics.
>>
>> Sigh. The patch & documentation are probably not independent, so that would
>> make two dependent patches, probably.
>
> I am not really saying so, it seems just that doing the refactoring
> (with its related docs), and then add the extension for the weight
> (with its docs) is more natural than doing both things at the same
> time.

Ok. I can separate the refactoring (scripts & stats) and the weight stuff 
on top of the refactoring.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Here is a two part v12, which:

part a (refactoring of scripts and their stats):
 - fix option checks (-i alone)
 - s/repleacable/replaceable/ in doc
 - keep small description in doc and help for -S & -N
 - fix 2 comments for pg style
 - show builtin list if not found

part b (weight)
 - check that the weight is an int

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Wed, Dec 16, 2015 at 4:09 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Yeah, that's actually fine. I just had a look at Windows stuff, and
> things seem to be correct on this side for double:
> https://msdn.microsoft.com/en-us/library/aa691373%28v=vs.71%29.aspx
> And then I also had a look at src/port/snprintf.c, where things get
> actually weird when no transactions are run for a script (emulated
> with 2 scripts, one with @10000 and the second with @1):
>  - 0 transactions (0.0% of total, tps = 0.000000)
>  - latency average = -1.#IO ms
>  - latency stddev = -1.#IO ms
> And it seems that this is a bug in fmtfloat() because it does not
> handle nan values correctly. Others, any thoughts about that?
> It is possible to address things within your patch by using isnan()
> and print another message but I think that we had better patch
> snprintf.c if my analysis is right.

FWIW, I just had a closer look at this portion and I arrived at the
conclusion that sprintf implementation on Windows is just broken as it
is not able to handle appropriately inf or nan as exceptions.
fmtfloat@src/port/snprintf.c relies on the system's implementation of
sprintf to handle those exceptions, however even directly calling
sprintf results in the same weird output, inf showing up as "1.#IO"
and nan as "-1.#IO". Anyone, feel free to disagree if I am missing
something.

Still, it would be cool to have a better message when there is no
value to show to the user, like "no latency average" or "undefined
latency average". That would be more elegant, and the patches proposed
still lack that.
-- 
Michael



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Michaël,

>> And then I also had a look at src/port/snprintf.c, where things get
>> actually weird when no transactions are run for a script (emulated
>> with 2 scripts, one with @10000 and the second with @1):
>>  - 0 transactions (0.0% of total, tps = 0.000000)
>>  - latency average = -1.#IO ms
>>  - latency stddev = -1.#IO ms
>> And it seems that this is a bug in fmtfloat() because it does not
>> handle nan values correctly. Others, any thoughts about that?
>> It is possible to address things within your patch by using isnan()
>> and print another message but I think that we had better patch
>> snprintf.c if my analysis is right.
>
> FWIW, I just had a closer look at this portion and I arrived at the
> conclusion that sprintf implementation on Windows is just broken as it
> is not able to handle appropriately inf or nan as exceptions.
> fmtfloat@src/port/snprintf.c relies on the system's implementation of
> sprintf to handle those exceptions, however even directly calling
> sprintf results in the same weird output, inf showing up as "1.#IO"
> and nan as "-1.#IO". Anyone, feel free to disagree if I am missing
> something.

I have no opinion about M$'s implementation of double pretty-printing, 
but I agree that "-1.#IO" looks strange. The web seems to say that "-1.INF" 
and "-1.IND" are the "normal" way for Windows to say infinity or not a 
number. Well, if someone there thought it looked good, I cannot help it.

> Still, it would be cool to have better error message when there is no
> value to show up to the user, like "no latency average" or "undefined
> latency average". That would be more elegant, and the patches proposed
> still lack that.

Hmmm. I do not buy that for several reasons:

For --progress style reporting you want NaN or whatever, because the 
output could be processed further unix-style from a pipe (grep/cut/...). 
This is also true for the final report. I would not want to change the 
output organisations for some special values, I would just like to get the 
value whatever it is, "NaN" or "Infinity" or even "-1.IND", so that the 
pipe commands would work.

Also, for the final report, it seems to me overkill to try to work around 
cases when pgbench does not run any transactions, which basically does not 
happen often, as the point is to run many transactions.

Finally this behavior already exists, the patch does not change anything 
AFAICS, and it is not its purpose.

So I would suggest to keep it that way.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> Here is a two part v12, which:
> 
> part a (refactoring of scripts and their stats):
>  - fix option checks (-i alone)
>  - s/repleacable/replaceable/ in doc
>  - keep small description in doc and help for -S & -N
>  - fix 2 comments for pg style
>  - show builtin list if not found

I'm looking at this part of your patch and I think it's far too big to
be a simple refactoring.  Would you split it up please?  I think the
StatsData / SimpleStat addition should be one patch; then there's the -b
changes.  Then there may (or may not) be a bunch of other minor
cleanups, not sure.

I'm willing to commit these patches if I can easily review what they do,
which I cannot with the current state.

Please pgindent; make sure to add /*--- here to avoid pgindent mangling
the comment:
    if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0)
    {
        /*--------
         * parsing:


> part b (weight)
>  - check that the weight is an int

This part looks okay to me.  Minor nitpick,

+       int i = 0, w = 0, wc = (int) getrand(thread, 0, total_weight - 1);

should be three lines, not one.  Also the @W part in the --help output
should be in brackets, as FILE[@W], right?
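For reference, here is one way the weighted draw could read with the three declarations split out. Script i is picked with probability weight_i / total_weight: draw wc uniformly in [0, total_weight - 1], then walk the cumulative weights until they exceed wc. It also folds in a single-script shortcut. The sketch assumes hypothetical globals script_weight[] and total_weight and a rand()-based getrand_sketch() in place of pgbench's per-thread getrand(); the real chooseScript takes a thread argument.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SCRIPTS 128			/* assumption: capacity for the sketch */

static int	script_weight[MAX_SCRIPTS];	/* hypothetical per-script weights */
static int	num_scripts = 0;
static int	total_weight = 0;

/* Hypothetical: register one script with weight w (w >= 0). */
static void
addWeightedScript(int w)
{
	assert(num_scripts < MAX_SCRIPTS && w >= 0);
	script_weight[num_scripts++] = w;
	total_weight += w;
}

/* Stand-in for pgbench's getrand(): uniform int in [min, max] via rand(). */
static int
getrand_sketch(int min, int max)
{
	return min + (int) ((double) rand() / ((double) RAND_MAX + 1.0) *
						(max - min + 1));
}

/* Return the index of the script to run next. */
static int
chooseScript(void)
{
	int			i = 0;
	int			w = 0;
	int			wc;

	assert(num_scripts > 0 && total_weight > 0);
	if (num_scripts == 1)
		return 0;				/* nothing to draw */

	wc = getrand_sketch(0, total_weight - 1);
	do
	{
		w += script_weight[i++];
	} while (w <= wc);

	return i - 1;
}
```

A weight-0 script never adds to the running sum, so it is never chosen while some other script has a positive weight.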

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

>> Here is a two part v12, which:
>>
>> part a (refactoring of scripts and their stats):
>>  - fix option checks (-i alone)
>>  - s/repleacable/replaceable/ in doc
>>  - keep small description in doc and help for -S & -N
>>  - fix 2 comments for pg style
>>  - show builtin list if not found
>
> I'm looking at this part of your patch and I think it's far too big to
> be a simple refactoring.  Would you split it up please?
> I think the StatsData / SimpleStat addition should be one patch;
> then there's the -b changes.  Then there may (or may not) be a bunch of 
> other minor cleanups, not sure.
>
> I'm willing to commit these patches if I can easily review what they do,
> which I cannot with the current state.

Hmmm. ISTM that other people already reviewed it.

I can try to separate (again) some stuff, but there will be no miracle.

The overdue refactoring is because pgbench collects statistics at various 
levels, and each time this is done in a different way. Cleaning this up 
requires touching the code in many places, which means a "big" patch, 
although ISTM a straightforward one; but this is already the case with 
this one.

> Please pgindent; make sure to add /*--- here to avoid pgindent mangling
> the comment:

Ok.

>> part b (weight)
>>  - check that the weight is an int
>
> This part looks okay to me.  Minor nitpick,
>
> +       int i = 0, w = 0, wc = (int) getrand(thread, 0, total_weight - 1);
>
> should be three lines, not one.

Ok.

> Also the @W part in the --help output should be in brackets, as 
> FILE[@W], right?

Why not.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

> I'm looking at this part of your patch and I think it's far too big to
> be a simple refactoring. Would you split it up please?

You know how delighted I am to split patches...

Here is a 5 part ordered patch series:

a) add -b option for cumulating builtins and rework internal script
   management so that builtin and external scripts are managed the
   same way.

b) refactor statistics collections (per thread, per command, per whatever)
   so as to use the same structure everywhere, reducing the CLOC by 115.
   this enables the next small patch which can reuse the new functions.

c) add per-script statistics... because Josh asked:-)

d) add optional weight to control the relative frequency of scripts.

e) minor code cleanup:
   - use bool instead of int where appropriate
   - put together struct fields when they belong together
   - move 2 options at their right position in the list

This patch series conflicts slightly with the "add functions to pgbench" 
patch which is marked as ready in the CF. Whichever goes in first will mean 
some conflict resolution for the other. I would prefer this series to go 
first, if I had any say...

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> You know how delighted I am to split patches...

Yes, of course, it's the most interesting task in the world.  I'm fully
aware of that.

FWIW I'm going to apply a preliminary commit to pgindent-clean the file
before your patches, then apply each patch as pgindent-clean.  Otherwise
your whitespace style was getting too much on my nerves.

> b) refactor statistics collections (per thread, per command, per whatever)
>    so as to use the same structure everywhere, reducing the CLOC by 115.
>    this enables the next small patch which can reuse the new functions.

I'm not really sure about the fact that we operate on those Stats
structs without locking.  I see upthread you convinced Michael that it
was okay, but is it really?  How severe is the damage if two threads
happen to collide?


Why is this function defined like this?

/*
 * Initialize a StatsData struct to all zeroes, but the given
 * start_time, except that if it's exactly zero don't change it.
 */
static void
initStats(StatsData *sd, double start_time)
{
	sd->cnt = 0;
	sd->skipped = 0;
	initSimpleStats(&sd->latency);
	initSimpleStats(&sd->lag);

	/* not necessarily overriden? */
	if (start_time)
		sd->start_time = start_time;
}

It seems a bit funny to have the start_time not be reset when 0.0 is
passed, which is almost all the callers.  Using a float as a boolean
looks pretty odd; is that kosher?  Maybe it'd be a good idea to have a
separate boolean flag instead?  Something like this

/*
 * Initialize a StatsData struct to all zeroes.  Use the given
 * start_time only if reset_start_time, otherwise keep the original
 * value.
 */
static void
initStats(StatsData *sd, double start_time, bool reset_start_time)
{
	sd->cnt = 0;
	sd->skipped = 0;
	initSimpleStats(&sd->latency);
	initSimpleStats(&sd->lag);

	/* not necessarily overriden? */
	if (reset_start_time)
		sd->start_time = start_time;
}


I renamed a couple of your functionettes, for instance doSimpleStats to
addToSimpleStats and appendSimpleStats to mergeSimpleStats.
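To make the renamed helpers concrete, here is a minimal sketch of what a shared SimpleStats could look like, written with PostgreSQL-style block comments. The count and sum fields follow the ss->sum / ss->count usage quoted earlier in the thread. The min, max and sum2 fields and varianceSimpleStats are assumptions for illustration. The sketch does no locking and is meant for single-threaded use.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Running statistics over a series of values, e.g. latencies in ms.
 * "min", "max" and "sum2" are assumptions made for this sketch.
 */
typedef struct SimpleStats
{
	int64_t		count;			/* number of values added */
	double		min;			/* smallest value seen */
	double		max;			/* largest value seen */
	double		sum;			/* sum of values */
	double		sum2;			/* sum of squared values */
} SimpleStats;

static void
initSimpleStats(SimpleStats *ss)
{
	ss->count = 0;
	ss->min = ss->max = ss->sum = ss->sum2 = 0.0;
}

/*
 * Accumulate one value (doSimpleStats in the original patch).
 */
static void
addToSimpleStats(SimpleStats *ss, double val)
{
	if (ss->count == 0 || val < ss->min)
		ss->min = val;
	if (ss->count == 0 || val > ss->max)
		ss->max = val;
	ss->count++;
	ss->sum += val;
	ss->sum2 += val * val;
}

/*
 * Fold ss2 into acc (appendSimpleStats in the original patch), e.g. to
 * combine per-thread figures into an overall total.
 */
static void
mergeSimpleStats(SimpleStats *acc, const SimpleStats *ss2)
{
	if (ss2->count == 0)
		return;
	if (acc->count == 0 || ss2->min < acc->min)
		acc->min = ss2->min;
	if (acc->count == 0 || ss2->max > acc->max)
		acc->max = ss2->max;
	acc->count += ss2->count;
	acc->sum += ss2->sum;
	acc->sum2 += ss2->sum2;
}

/*
 * Hypothetical: population variance from the running sums; the standard
 * deviation is its square root.
 */
static double
varianceSimpleStats(const SimpleStats *ss)
{
	double		mean;

	assert(ss->count > 0);
	mean = ss->sum / ss->count;
	return ss->sum2 / ss->count - mean * mean;
}
```

Keeping only sums makes merging per-thread results a matter of field-wise addition. That is what lets the same structure serve per-thread, per-script and per-interval reporting.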

Haven't looked at patches c or d yet.  I'm tempted to throw patch e in
with the initial pgindent run.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> a) add -b option for cumulating builtins and rework internal script
>    management so that builtin and external scripts are managed the
>    same way.

I tweaked this a bit.  I found a bug in threadRun: it was reading the
commands first, and setting st->use_file later.  This led to the wrong
commands being read.
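A simplified sketch of the ordering issue, using hypothetical stand-ins (Script, ClientState, startTransaction) rather than pgbench's real structures, with rand() standing in for pgbench's own random source. The script index has to be chosen before its commands are looked up; this chooseScript also returns 0 directly when only one script exists, so callers need no special case.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down script table entry. */
typedef struct
{
    const char *name;
    const char **commands;      /* NULL-terminated list of command strings */
} Script;

/* Hypothetical, cut-down client state. */
typedef struct
{
    int         id;
    int         use_file;       /* index of the script currently in use */
    const char **commands;      /* commands of that script */
} ClientState;

static int
chooseScript(int num_scripts)
{
    assert(num_scripts > 0);
    if (num_scripts == 1)
        return 0;               /* nothing to choose */
    return rand() % num_scripts;    /* rand() stands in for pgbench's RNG */
}

static void
startTransaction(ClientState *st, const Script *scripts, int num_scripts)
{
    st->use_file = chooseScript(num_scripts);       /* first pick the script */
    st->commands = scripts[st->use_file].commands;  /* then fetch its commands */
}
```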

Some other less interesting changes:

* made chooseScript have the logic to react to single existing script;
no need to inject ternary operators in each caller to check for that
condition.

* Added a debug line every time a script is chosen,
+       if (debug)
+           fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
+                   sql_script[st->use_file].name);
  (I'd have liked to have chooseScript itself do it, but it doesn't
  have the script name handy.  Maybe this indicates that the data
  structures are slightly wrong.)

* Added a separate routine to list available scripts; originally that
was duplicated in "-b list" and when -b got an invalid script name.

* In usage(), I split out the options to select a script instead of
mixing them within "Benchmarking options"; also changed wording of
parenthetical comment, no longer carrying the full list of scripts (a
choice which also omitted "-b list" itself):
+          "\nOptions to select what to run:\n"
+          "  -b, --builtin=NAME       add builtin script (use \"-b list\" to display\n"
+          "                           available scripts)\n"
+          "  -f, --file=FILENAME      add transaction script from FILENAME\n"
+          "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
+          "                           (same as \"-b simple-update\")\n"
+          "  -S, --select-only        perform SELECT-only transactions\n"
+          "                           (same as \"-b select-only\")\n"

I couldn't find a better heading to use there, so that'll have to do
unless someone has a better idea.

Some other trivial changes.  Patch attached.  I plan to push this as
soon as I'm able.

--
Álvaro Herrera


Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> a) add -b option for cumulating builtins and rework internal script
>    management so that builtin and external scripts are managed the
>    same way.

I'm uncomfortable with the prefix-matching aspect of -b.  It makes
"-b s" ambiguous -- whether it stands for select-only or simple-update
is merely a matter of what comes earlier in the table, which doesn't
seem reasonable to me.  I'd rather have a real way to reject ambiguous
cases, or simply accept only complete spellings.  This is the guilty
party:

> +static char *
> +find_builtin(const char *name, char **desc)
> +{
> +    int        len = strlen(name), i;
> +
> +    for (i = 0; i < N_BUILTIN; i++)
> +    {
> +        if (strncmp(builtin_script[i].name, name, len) == 0)
> +        {
> +            *desc = builtin_script[i].desc;
> +            return builtin_script[i].script;
> +        }
> +    }

I'm going to change this to use strlen(builtin_script[i].name) instead
of "len" here.

If you want to implement real non-ambiguous-prefix code (i.e. have "se"
for "select-only", but reject "s" as ambiguous) be my guest.
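One way to implement the unambiguous-prefix rule described above, sketched with a hypothetical three-entry table (the names match those printed by "-b list"; the descriptions are illustrative): count every builtin whose name starts with the supplied string, and accept the result only when exactly one matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical builtin table. */
typedef struct
{
    const char *name;
    const char *desc;
} BuiltinScript;

static const BuiltinScript builtin_script[] = {
    {"tpcb-like", "<builtin: TPC-B (sort of)>"},
    {"simple-update", "<builtin: simple update>"},
    {"select-only", "<builtin: select only>"},
};

#define N_BUILTIN (sizeof(builtin_script) / sizeof(builtin_script[0]))

/*
 * Return the builtin whose name starts with "name", or NULL if no builtin
 * matches or if more than one does (e.g. "s").
 */
static const BuiltinScript *
findBuiltin(const char *name)
{
    size_t      len;
    const BuiltinScript *found = NULL;
    int         nfound = 0;

    assert(name != NULL);
    len = strlen(name);
    for (size_t i = 0; i < N_BUILTIN; i++)
    {
        /* compare only up to the length of the supplied string */
        if (strncmp(builtin_script[i].name, name, len) == 0)
        {
            found = &builtin_script[i];
            nfound++;
        }
    }
    return nfound == 1 ? found : NULL;
}
```

With these names, "s" is rejected because it matches both simple-update and select-only, while "se", "si" and "t" each select a single script.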

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> I'm uncomfortable with the prefix-matching aspect of -b.  It makes
> "-b s" ambiguous -- whether it stands for select-only or simple-update
> is merely a matter of what comes earlier in the table, which doesn't
> seem reasonable to me.  [...]

> I'm going to change this to use strlen(builtin_script[i].name) instead
> of "len" here.

I pushed like that, but of course that means you can use "-b
simple-update-foo" and it works.  I could have used just strcmp().
(Part e is pushed too along with an initial pgindent).

Here's part b rebased, pgindented and with some minor additional tweaks
(mostly function comments and the function renames I mentioned).  Still
concerned about the unlocked stat accums.

I haven't tried to rebase the other ones yet, they need manual conflict
fixes.

--
Álvaro Herrera


Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

> I'm not really sure about the fact that we operate on those Stats
> structs without locking.  I see upthread you convinced Michael that it
> was okay, but is it really?  How severe is the damage if two threads
> happen to collide?

For stats shared among threads, when a collision occurs, the data about
one transaction is not counted.

On the risk side: the collision probability is pretty low because the time
to update a value is a "few" cycles, and the time to execute a transaction
is typically in ms: I think fewer than 1 in 10,000,000 data points could
be lost.

On the advantageous side: locking costs significant time and thus would
impact performance. I think that the measurement error caused by
occasionally not counting a transaction is lower than the performance
loss caused by systematic locking.

So for me this is really a low risk trade-off.
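For comparison, a minimal sketch of the locked alternative being weighed here, assuming POSIX threads and a hypothetical SharedStats struct that keeps only a count and a latency sum. Without the mutex, two threads can both read cnt, both add one and both store the same value, so one transaction is lost; that is the collision described above. The mutex removes it, at the cost of a lock/unlock pair on every transaction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-script counters shared by all threads. */
typedef struct
{
    pthread_mutex_t lock;
    int64_t     cnt;
    double      latency_sum;    /* milliseconds */
} SharedStats;

#define SHARED_STATS_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0.0 }

/* Record one transaction; the lock makes the read-modify-write atomic. */
static void
recordTransaction(SharedStats *ss, double latency_ms)
{
    assert(latency_ms >= 0.0);
    pthread_mutex_lock(&ss->lock);
    ss->cnt++;
    ss->latency_sum += latency_ms;
    pthread_mutex_unlock(&ss->lock);
}
```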

> [...]
> It seems a bit funny to have the start_time not be reset when 0.0 is
> passed, which is almost all the callers.  Using a float as a boolean
> looks pretty odd; is that kosher?  Maybe it'd be a good idea to have a
> separate boolean flag instead?  Something like this
>
> /*
> * Initialize a StatsData struct to all zeroes.  Use the given
> * start_time only if reset_start_time, otherwise keep the original
> * value.
> */
> static void
> initStats(StatsData *sd, double start_time, bool reset_start_time)
> {
>     sd->cnt = 0;
>     sd->skipped = 0;
>     initSimpleStats(&sd->latency);
>     initSimpleStats(&sd->lag);
>
>     /* not necessarily overriden? */
>     if (reset_start_time)
>         sd->start_time = start_time;
> }

Obviously this would work. I did not think the special case was worth the 
extra argument. This one has some oddity too, because the second argument 
is ignored depending on the third. Do as you feel.

> I renamed a couple of your functionettes, for instance doSimpleStats to
> addToSimpleStats and appendSimpleStats to mergeSimpleStats.

Fine with me.

-- 
Fabien.



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello again,

> If you want to implement real non-ambiguous-prefix code (i.e. have "se"
> for "select-only", but reject "s" as ambiguous) be my guest.

I'm fine with filtering out ambiguous cases (i.e. just the "s" case). 
Attached a small patch for that.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello again,

> Here's part b rebased, pgindented and with some minor additional tweaks
> (mostly function comments and the function renames I mentioned).

Patch looks ok to me, various tests were ok as well.

> Still concerned about the unlocked stat accums.

See my arguments in the other mail. I can add a lock if this is a blocker,
but I think that it is actually better without, because of the observer
effect: the measuring process should avoid affecting the measured data,
and locking is not cheap.

> I haven't tried to rebase the other ones yet, they need manual conflict
> fixes.

Find attached 14-c/d/e rebased patches.

About e, for some obscure reason I failed in my initial attempt at 
inserting the misplaced options in their rightful position in the option
list. Sorry for the noise.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> >It seems a bit funny to have the start_time not be reset when 0.0 is
> >passed, which is almost all the callers.  Using a float as a boolean
> >looks pretty odd; is that kosher?  Maybe it'd be a good idea to have a
> >separate boolean flag instead?

> Obviously this would work. I did not think the special case was worth the
> extra argument. This one has some oddity too, because the second argument is
> ignored depending on the third. Do as you feel.

Actually my question was whether keeping the original start_time was the
intended design.  I think some places are okay with keeping the original
value, but the ones in addScript, the per-thread loop in main(), and the
global one also in main() should all be getting a 0.0 instead of leaving
the value uninitialized.

(I did turn the arguments around so that the bool is second and the
float is third.  Thanks for the suggestion.)

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello again,

>> Obviously this would work. I did not think the special case was worth the
>> extra argument. This one has some oddity too, because the second argument is
>> ignored depending on the third. Do as you feel.
>
> Actually my question was whether keeping the original start_time was the
> intended design.

Sorry I misunderstood the question.

The answer is essentially yes, the field is needed for the "aggregated" 
mode where this specific behavior is used.

However, after some look at the code I think that it is possible to do 
without.

I also spotted a small issue under low tps where the last aggregation was
not shown.

With the attached version these problems have been removed, no conditional 
initialization. There is also a small diff with the version you sent.
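The message does not show the fix itself. As a hypothetical illustration of how a final interval can go missing in this kind of aggregation: if intervals are emitted only when a later transaction crosses their end, then under low tps the last interval is never crossed and has to be flushed explicitly when the run ends. The Agg struct and both functions are invented for the example and assume a fixed interval in seconds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical aggregate for one reporting interval. */
typedef struct
{
    double      start_time;     /* start of the current interval, seconds */
    int64_t     cnt;            /* transactions counted in it so far */
} Agg;

/*
 * Account a transaction finishing at "now".  Every interval that has fully
 * elapsed is emitted first (several, possibly empty, when tps is low), then
 * the transaction is counted.  Returns the number of intervals emitted.
 */
static int
aggAdd(Agg *agg, double now, double interval, FILE *out)
{
    int         emitted = 0;

    assert(interval > 0.0);
    while (now >= agg->start_time + interval)
    {
        fprintf(out, "%.0f %lld\n", agg->start_time, (long long) agg->cnt);
        agg->start_time += interval;
        agg->cnt = 0;
        emitted++;
    }
    agg->cnt++;
    return emitted;
}

/* At the end of the run, emit the last (partial) interval if it holds data. */
static int
aggFinish(const Agg *agg, FILE *out)
{
    if (agg->cnt == 0)
        return 0;
    fprintf(out, "%.0f %lld\n", agg->start_time, (long long) agg->cnt);
    return 1;
}
```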

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> The answer is essentially yes, the field is needed for the "aggregated" mode
> where this specific behavior is used.

OK, thanks, that looks better to me.

Can you now appreciate why I asked for split patches?  If I had to go
over the big patch I probably wouldn't have been able to read through
each to make sense of it.

I pushed this, along with a few more tweaks, mostly adding comments and
moving functions so that related things are together.  I hope I didn't
break anything.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

Thanks for the progress!

> I pushed this, along with a few more tweaks, mostly adding comments and
> moving functions so that related things are together.  I hope I didn't
> break anything.

Looks ok.

Here is a rebase of the 3 remaining parts:
 - 15-c: per script stats
 - 15-d: weighted scripts
 - 15-e: prefix selection for -b

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Fri, Jan 29, 2016 at 11:28 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a rebase of the 3 remaining parts:
>  - 15-c: per script stats
>  - 15-d: weighted scripts
>  - 15-e: prefix selection for -b

Regarding patch d.
+       /* compute total_weight */
+       for (i = 0; i < num_scripts; i++)
+               total_weight += sql_script[i].weight;
total_weight can overflow :) I don't think that's worth worrying about, I am
just noticing it.

+        The provided <replaceable>scriptname</> needs only be an unambiguous
+        prefix of the builtin name, hence <literal>si</> would be enough to
+        select <literal>simple-update</>.
[...]
-               if (strncmp(builtin_script[i].name, name,
-                                       strlen(builtin_script[i].name)) == 0)
+               if (strncmp(builtin_script[i].name, name, len) == 0)
I agree with Alvaro here: this should remain unchanged. It seems to be
a rebase mistake.
-- 
Michael



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Michaël,

Two rebase attached.

>>  - 15-e: prefix selection for -b

> -               if (strncmp(builtin_script[i].name, name,
> -                                       strlen(builtin_script[i].name)) == 0)
> +               if (strncmp(builtin_script[i].name, name, len) == 0)
>
> I agree with Alvaro here: this should remain unchanged. It seems to be
> a rebase mistake.

I do not understand. I tested it and it works as expected. If I put the 
above strlen instead, the prefix detection does not work:

 fabien@sto:bin/pgbench> git diff
 diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
 index 783cbf9..0c33012 100644
 --- a/src/bin/pgbench/pgbench.c
 +++ b/src/bin/pgbench/pgbench.c
 @@ -2684,7 +2684,7 @@ findBuiltin(const char *name, char **desc)

          for (i = 0; i < N_BUILTIN; i++)
          {
 -               if (strncmp(builtin_script[i].name, name, len) == 0)
 +               if (strncmp(builtin_script[i].name, name, strlen(builtin_script[i].name)) == 0)
                 {
                         *desc = builtin_script[i].desc;
                         commands = builtin_script[i].commands;

 ./pgbench -b t
 no builtin script found for name "t"
 Available builtin scripts:
         tpcb-like
         simple-update
         select-only

Indeed, then it can only match if the provided "name" is at least as long
as the builtin name. The point of the prefix selection is to compare only
up to the length of the (short) supplied string.
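The difference between the two strncmp bounds can be checked directly with a hypothetical builtinMatches helper offering three modes. Bounding the comparison by the supplied string gives prefix selection ("si" matches simple-update). Bounding it by the builtin name only checks that the input starts with the full name, so "simple-update-foo" also matches, as noted earlier in the thread. strcmp accepts only the exact spelling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum
{
    MATCH_USER_PREFIX,          /* bound = strlen(name) */
    MATCH_BUILTIN_PREFIX,       /* bound = strlen(builtin) */
    MATCH_EXACT                 /* strcmp */
} MatchMode;

static bool
builtinMatches(const char *builtin, const char *name, MatchMode mode)
{
    assert(builtin != NULL && name != NULL);
    switch (mode)
    {
        case MATCH_USER_PREFIX:
            /* "name" must be a prefix of "builtin" */
            return strncmp(builtin, name, strlen(name)) == 0;
        case MATCH_BUILTIN_PREFIX:
            /* "builtin" must be a prefix of "name" */
            return strncmp(builtin, name, strlen(builtin)) == 0;
        case MATCH_EXACT:
            return strcmp(builtin, name) == 0;
    }
    return false;
}
```

Bounding by the supplied string also makes "" and "s" match several names, which is why ambiguous prefixes have to be rejected separately.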

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Something is wrong with patch d.  I noticed two things,

1. the total_weight stuff can overflow,
2. the chooseScript stuff is broken, or something.

See the output below and notice how the percentages don't add up to 100%
(this exact case is an absurd one, of course, but I noticed totals of
99.5% and others while playing with reasonable numbers, so this is
something that really needs to be fixed.)

Another thing is that the "transaction type" output really deserves some
more work.  I think "multiple scripts" really doesn't cut it; we should
have something YAML-like, as in the latency reports, which lists all scripts
in use and their weights.

Also, while I have your attention regarding accumulated "technical
debt", please have a look at the "desc" argument used in addScript etc.
It's pretty ridiculous currently.  Maybe findBuiltin / process_builtin /
process_file should return a struct containing Command ** and the
"desc" string, rather than passing desc as a separate argument.

I changed the getWeight stuff completely (and renamed it, and added
comments); I wasn't comfortable with the idea of messing with the
optarg, and neither with the idea of continuing to use it without
copying after processing the arguments.  I think some platforms don't
like any of those things.

Attached is my version of the patch.  While you're messing with it, it'd
be nice if you added comments on top of your recently added functions
such as findBuiltin, process_builtin, chooseScript.


$ ./pgbench -r -j4 -c4 -t1000 -b tpcb-like@10 -f uno.sql@214748364
starting vacuum...end.
transaction type: multiple scripts
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
latency average: 0.000 ms
tps = 23422.357812 (including connections establishing)
tps = 24172.981858 (excluding connections establishing)
SQL script 1, weight 10: <builtin: TPC-B (sort of)>
 - 0 transactions (0.0% of total, tps = 0.000000)
 - latency average = -nan ms
 - latency stddev = -nan ms
 - statement latencies in milliseconds:
          -nan  \set nbranches 1 * :scale
          -nan  \set ntellers 10 * :scale
          -nan  \set naccounts 100000 * :scale
          -nan  \setrandom aid 1 :naccounts
          -nan  \setrandom bid 1 :nbranches
          -nan  \setrandom tid 1 :ntellers
          -nan  \setrandom delta -5000 5000
          -nan  BEGIN;
          -nan  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
          -nan  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
          -nan  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
          -nan  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
          -nan  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta,
CURRENT_TIMESTAMP);
          -nan  END;
SQL script 2, weight 214748364: uno.sql
 - 2649 transactions (66.2% of total, tps = 15511.456461)
 - latency average = 0.163 ms
 - latency stddev = 0.337 ms
 - statement latencies in milliseconds:
         0.158  select 1;
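The -nan figures above are what printf prints for a NaN on many platforms, and 0.0 / 0.0 yields NaN, so they most likely come from averaging over a script that ran zero transactions (the TPC-B script here, given its tiny relative weight). A hypothetical guard would skip the latency lines when the count is zero; printLatencyAverage and its output format are assumptions, not pgbench code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print the latency average only when the script actually ran
 * transactions, instead of printing 0.0 / 0.0 (NaN).
 * Returns 1 if a latency line was printed, 0 otherwise.
 */
static int
printLatencyAverage(FILE *out, int64_t cnt, double latency_sum_ms)
{
    assert(cnt >= 0);
    if (cnt == 0)
        return 0;               /* nothing measured: no meaningful average */
    fprintf(out, " - latency average = %.3f ms\n",
            latency_sum_ms / (double) cnt);
    return 1;
}
```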



--
Álvaro Herrera


Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

> Something is wrong with patch d.  I noticed two things,
>
> 1. the total_weight stuff can overflow,

It can generate an error on overflow by checking the total_weight while 
it is being computed. I've switched total_weight to int64 so it is now 
really impossible to overflow with the 32 bit int_max limit on weight.

> 2. the chooseScript stuff is broken, or something.

Sorry, probably a <=/< error. I think I've fixed it and I've simplified 
the code a little bit.

> Another thing is that the "transaction type" output really deserves some
> more work.  I think "multiple scripts" really doesn't cut it; we should
> have some YAML-like as in the latency reports, which lists all scripts
> in use and their weights.

For me the current output is clear for the reader, which does not 
mean that it cannot be improved, but how is rather a matter of taste.

I've tried to improve it further, see attached patch.

If you want something else, it would help to provide a sample of what you 
expect.

> Also, while I have your attention regarding accumulated "technical
> debt", please have a look at the "desc" argument used in addScript etc.
> It's pretty ridiculous currently.  Maybe findBuiltin / process_builtin /
> process_file should return a struct containing Command ** and the
> "desc" string, rather than passing desc as a separate argument.

Ok, it can return a pointer to the builtin script.

> Attached is my version of the patch.  While you're messing with it, it'd
> be nice if you added comments on top of your recently added functions
> such as findBuiltin, process_builtin, chooseScript.

Why not.

Find attached an 18-d which addresses these concerns, and an updated 18-e
for the prefix.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Fri, Feb 5, 2016 at 12:53 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Something is wrong with patch d.  I noticed two things,
>> 1. the total_weight stuff can overflow,
>
> It can generate an error on overflow by checking the total_weight while it
> is being computed. I've switched total_weight to int64 so it is now really
> impossible to overflow with the 32 bit int_max limit on weight.

+       /* compute total_weight */
+       for (i = 0; i < num_scripts; i++)
+       {
+               total_weight += sql_script[i].weight;
+
+               /* detect overflow... */
+               if (total_weight < 0)
+               {
+                       fprintf(stderr, "script weight overflow at script %d\n", i+1);
+                       exit(1);
+               }
+       }
If left as int64, you may want to remove this overflow check, or keep
it as int32.
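The arithmetic behind dropping the check: assuming each weight is an int in [0, INT_MAX], a single weight is at most 2^31 - 1, so an int64 total can only overflow after roughly 2^32 (about four billion) maximal weights. A minimal sketch with a hypothetical totalWeight helper:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Sum per-script weights in a 64-bit accumulator; no overflow check needed. */
static int64_t
totalWeight(const int *weights, int num_scripts)
{
    int64_t     total = 0;

    for (int i = 0; i < num_scripts; i++)
    {
        assert(weights[i] >= 0);
        total += weights[i];    /* each term is widened to int64_t */
    }
    return total;
}
```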

>> 2. the chooseScript stuff is broken, or something.
>
> Sorry, probably a <=/< error. I think I've fixed it and I've simplified the
> code a little bit.

+       w = getrand(thread, 0, total_weight - 1);
+       do
+       {
+               w -= sql_script[i++].weight;
+       } while (w >= 0);
+
+       return i - 1;
This portion looks fine to me.
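A self-contained version of that loop, for checking it by hand. The random draw is passed in rather than generated, since the quoted code obtains it from getrand(thread, 0, total_weight - 1); chooseScriptFromDraw is a hypothetical name. Enumerating every draw from 0 to total_weight - 1 shows that script i is picked exactly weight[i] times, so its expected share is weight[i] / total_weight.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a draw w in [0, total_weight - 1] to a script index: subtract the
 * weights in order and stop at the first script that makes w negative.
 */
static int
chooseScriptFromDraw(const int *weights, int num_scripts, int64_t w)
{
    int         i = 0;

    assert(w >= 0);
    do
    {
        assert(i < num_scripts);    /* holds while w < total_weight */
        w -= weights[i++];
    } while (w >= 0);

    return i - 1;
}
```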

>> Another thing is that the "transaction type" output really deserves some
>> more work.  I think "multiple scripts" really doesn't cut it; we should
>> have some YAML-like as in the latency reports, which lists all scripts
>> in use and their weights.
>
> For me the current output is clear for the reader, which does not mean that
> it cannot be improved, but how is rather a matter of taste.
>
> I've tried to improve it further, see attached patch.
>
> If you want something else, it would help to provide a sample of what you
> expect.

You could do that with an additional option here as well:
--output-format=normal|yaml|json. Each user's tastes are different.

>> Attached is my version of the patch.  While you're messing with it, it'd
>> be nice if you added comments on top of your recently added functions
>> such as findBuiltin, process_builtin, chooseScript.
> Why not.

       const char *name;
+       int                     weight;
       Command   **commands;
-       StatsData stats;
+       StatsData       stats;
Noise here?

> Find attached an 18-d which addresses these concerns, and an updated 18-e
> for the prefix.

(I could live without that personally)

-/* return builtin script "name", or fail if not found */
+/* return commands for selected builtin script, if unambiguous */
static script_t *
findBuiltin(const char *name)
This comment needs a refresh. This does not return a set of commands,
but the script itself.
-- 
Michael



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
I closed this one as "committed", since we pushed a bunch of parts.
Please submit the two remaining ones to the next commitfest.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Michaël,

> +       /* compute total_weight */
> +       for (i = 0; i < num_scripts; i++)
> +       {
> +               total_weight += sql_script[i].weight;
> +
> +               /* detect overflow... */
> If left as int64, you may want to remove this overflow check, or keep
> it as int32.

I'd rather keep int64, and remove the check.

>> [JSON/YAML]
>> If you want something else, it would help to provide a sample of what you
>> expect.
>
> You could do that with an additional option here as well:
> --output-format=normal|yaml|json. Each user's tastes are different.

I think that json/yaml-ifying pgbench output is beyond the scope of this
patch, so it should be kept out?

>        const char *name;
> +       int                     weight;
>        Command   **commands;
> -       StatsData stats;
> +       StatsData       stats;
> Noise here?

Indeed.

>> Find attached an 18-d which addresses these concerns, and an updated 18-e
>> for the prefix.
>
> (I could live without that personally)

Hmmm, I type them and I'm not so good with a keyboard, so "se" is better 
than:

"selct-only<back-back-back-back-back-back-back-back>ect-only".

> -/* return builtin script "name", or fail if not found */
> +/* return commands for selected builtin script, if unambiguous */
> static script_t *
> findBuiltin(const char *name)
> This comment needs a refresh. This does not return a set of commands,
> but the script itself.

Indeed.

Attached 19-d and 19-e.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Michael Paquier
Date:
On Tue, Feb 9, 2016 at 4:22 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> +       /* compute total_weight */
>> +       for (i = 0; i < num_scripts; i++)
>> +       {
>> +               total_weight += sql_script[i].weight;
>> +
>> +               /* detect overflow... */
>> If left as int64, you may want to remove this overflow check, or keep
>> it as int32.
>
>
> I'd rather keep int64, and remove the check.

OK, and you did so. Thanks.

>>> [JSON/YAML]
>>> If you want something else, it would help to provide a sample of what you
>>> expect.
>>
>> You could do that with an additional option here as well:
>> --output-format=normal|yaml|json. Each user's tastes are different.
>
> I think that json/yaml-ifying pgbench output is beyond the scope of this
> patch, so it should be kept out?

Yeah, that's just a free idea that this set of patches does not need
to address. If someone thinks that's worth it, feel free to submit a
patch, perhaps we could add a TODO item on the wiki. Regarding the
output generated by your patch, I think that's fine. Perhaps Alvaro
has other thoughts on the matter. I don't know this part.

>>> Find attached an 18-d which addresses these concerns, and an updated
>>> 18-e
>>> for the prefix.
>>
>>
>> (I could live without that personally)
>
> Hmmm, I type them and I'm not so good with a keyboard, so "se" is better
> than:
>
> "selct-only<back-back-back-back-back-back-back-back>ect-only".

I can understand that feeling.

>> -/* return builtin script "name", or fail if not found */
>> +/* return commands for selected builtin script, if unambiguous */
>> static script_t *
>> findBuiltin(const char *name)
>> This comment needs a refresh. This does not return a set of commands,
>> but the script itself.
>
> Indeed.
>
> Attached 19-d and 19-e.

+/* return builtin script "name", or fail if not found */
builtin does not sound like correct English to me, but built-in is.
-- 
Michael



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hi Michaël,

>> Attached 19-d and 19-e.
>
> +/* return builtin script "name", or fail if not found */
> builtin does not sound like correct English to me, but built-in is.

I notice that "man bash" uses "builtin" extensively, so I think it is okay 
like that, but I would be fine as well with "built-in".

I suggest leaving it as is unless some native speaker really requires
"built-in", in which case there would be many places to update, so that 
would be another orthographic-oriented patch:-)

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
I looked at 19.d and I think the design has gotten pretty convoluted.  I
think we could simplify with the following changes:

struct script_t gets a new member, of type Command **, which is
initially null.

function process_builtin receives the complete script_t (not individual
members of it), constructs the Command ** array and puts it in
script_t's new member; return value is the same script_t struct it got
(except it's now augmented with the Command **array).

function process_file constructs a new script_t from the string list,
creates its Command **array just like process_builtin and returns the
constructed struct.

function addScript receives script_t instead of individual members of
it, and does the appropriate thing.


Alternatively, we could have a different struct that's defined to carry
only the Command ** array (not the command string array) and is returned 
by both process_builtin and process_file.  Perhaps we could also put the
script weight in there.  With this arrangement we don't need to typedef
script_t at all and we can just keep it as an anonymous struct as today.
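A sketch of the first arrangement described above, under stated assumptions: Command is reduced to a single source line, scripts arrive as a NULL-terminated array of lines, and process_script is a hypothetical stand-in for process_builtin/process_file. The struct is passed in and returned by value, augmented with its Command ** array, and addScript takes the whole struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Command
{
    const char *line;           /* source text of one command (simplified) */
} Command;

typedef struct script_t
{
    const char *name;
    const char *desc;
    const char **lines;         /* NULL-terminated source lines */
    Command   **commands;       /* NULL-terminated; NULL until processed */
} script_t;

static void *
xmalloc(size_t size)
{
    void       *p = malloc(size);

    if (p == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return p;
}

/* Build the Command ** array and return the augmented copy of the script. */
static script_t
process_script(script_t script)
{
    size_t      n = 0;

    assert(script.lines != NULL && script.commands == NULL);
    while (script.lines[n] != NULL)
        n++;
    script.commands = xmalloc((n + 1) * sizeof(Command *));
    for (size_t i = 0; i < n; i++)
    {
        script.commands[i] = xmalloc(sizeof(Command));
        script.commands[i]->line = script.lines[i];
    }
    script.commands[n] = NULL;
    return script;
}

#define MAX_SCRIPTS 128
static script_t sql_script[MAX_SCRIPTS];
static int  num_scripts = 0;

/* Register an already-processed script. */
static void
addScript(script_t script)
{
    if (num_scripts >= MAX_SCRIPTS)
    {
        fprintf(stderr, "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
        exit(1);
    }
    assert(script.commands != NULL);
    sql_script[num_scripts++] = script;
}
```

Because the struct travels by value, the caller's copy is left untouched and nothing needs to point at a stack variable; only the Command ** array lives on the heap.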

This is what I tried to describe earlier, but obviously I wasn't clear
enough.

Thanks,

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Tue, Feb 9, 2016 at 4:22 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> > Hmmm, I type them and I'm not so good with a keyboard, so "se" is better
> > than:
> >
> > "selct-only<back-back-back-back-back-back-back-back>ect-only".
> 
> I can understand that feeling.

Pushed 19-e, thanks.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

> I looked at 19.d and I think the design has gotten pretty convoluted.  I
> think we could simplify with the following changes:
>
> struct script_t gets a new member, of type Command **, which is
> initially null.
>
> function process_builtin receives the complete script_t (not individual
> members of it), constructs the Command ** array and puts it in
> script_t's new member; return value is the same script_t struct it got
> (except it's now augmented with the Command **array).
>
> function process_file constructs a new script_t from the string list,
> creates its Command **array just like process_builtin and returns the
> constructed struct.
>
> function addScript receives script_t instead of individual members of
> it, and does the appropriate thing.

Why not. Here are two versions:
  *-20.patch is the initial rebased version
  *-21.patch does what you suggested above, some hidden awkwardness
    but much less than the previous one.

-- 
Fabien

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

Hi,

>   *-21.patch does what you suggested above, some hidden awkwardness
>      but much less than the previous one.

Yeah, I think this is much nicer, don't you agree?

However, this is still a bit broken -- you cannot return a stack
variable from process_file, because the stack goes away once the
function returns.  You need to malloc it.

Also, you forgot to update the comments in process_file,
process_builtin, etc.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>>   *-21.patch does what you suggested above, some hidden awkwardness
>>      but much less than the previous one.
>
> Yeah, I think this is much nicer, don't you agree?

Yep, I said "less awkwardness than previous", a pretty contrived way to say
better:-)

> However, this is still a bit broken -- you cannot return a stack
> variable from process_file, because the stack goes away once the
> function returns.  You need to malloc it.

That is why the "fs" variable in process_file is declared "static", and 
why I wrote "some hidden awkwardness".

I did want to avoid a malloc because then who would free the struct?
addScript cannot do it systematically because builtins are static. Or it
would have to create a purpose-built struct, but I think that would be more
awkwardness, and malloc/free to pass arguments between functions is neither
efficient nor very elegant.

So the "static" option looked like the simplest & most elegant version.

> Also, you forgot to update the comments in process_file,
> process_builtin, etc.

Indeed. v22 attached with better comments.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> >However, this is still a bit broken -- you cannot return a stack
> >variable from process_file, because the stack goes away once the
> >function returns.  You need to malloc it.
> 
> That is why the "fs" variable in process_file is declared "static", and why
> I wrote "some hidden awkwardness".
> 
> I did want to avoid a malloc because then who would free the struct?
> addScript cannot do it systematically because builtins are static. Or it
> would have to create a purpose-built struct, but I think that would be more
> awkwardness, and malloc/free to pass arguments between functions is neither
> efficient nor very elegant.
> 
> So the "static" option looked like the simplest & most elegant version.

Surely that trick breaks if you have more than one -f switch, no?  Oh, I
see what you're doing: you only use the command list, which is
allocated, so it doesn't matter that the rest of the struct changes
later.  That seems rather nasty to me -- I'd avoid that.

I'm not concerned about freeing the struct; what's the problem with it
surviving until the program terminates?  If somebody specifies thousands
of -f switches, they will waste a few bytes with each, but I'm hardly
concerned about a few dozen kilobytes there ...

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>> That is why the "fs" variable in process_file is declared "static", and why
>> I wrote "some hidden awkwardness".
>>
>> I did want to avoid a malloc because then who would free the struct?
>> addScript cannot do it systematically because builtins are static. Or it
>> would have to create a dedicated struct, but then that would be more
>> awkwardness, and using malloc/free to pass arguments between functions is
>> neither efficient nor very elegant.
>>
>> So the "static" option looked like the simplest & most elegant version.
>
> Surely that trick breaks if you have more than one -f switch, no?  Oh, I
> see what you're doing: you only use the command list, which is
> allocated, so it doesn't matter that the rest of the struct changes
> later.

The two fields that matter (desc and commands) are really copied into 
sql_scripts, so whatever stays in the static struct is overwritten if it is 
used another time.

> I'm not concerned about freeing the struct; what's the problem with it
> surviving until the program terminates?

It is not referenced anywhere so it is a memory leak.

> If somebody specifies thousands of -f switches, they will waste a few 
> bytes with each, but I'm hardly concerned about a few dozen kilobytes 
> there ...

Ok, so you prefer a memory leak. I hate it on principle.

Here is a v23 with a memory leak anyway.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
David Steele
Date:
On 3/4/16 1:53 PM, Fabien COELHO wrote:

>>> That is why the "fs" variable in process_file is declared "static",
>>> and why
>>> I wrote "some hidden awkwardness".
>>>
>>> I did want to avoid a malloc because then who would free the struct?
>>> addScript cannot do it systematically because builtins are static. Or it
>>> would have to create a dedicated struct, but then that would be more
>>> awkwardness, and using malloc/free to pass arguments between functions is
>>> neither efficient nor very elegant.
>>>
>>> So the "static" option looked like the simplest & most elegant version.
>>
>> Surely that trick breaks if you have more than one -f switch, no?  Oh, I
>> see what you're doing: you only use the command list, which is
>> allocated, so it doesn't matter that the rest of the struct changes
>> later.
> 
> The two fields that matter (desc and commands) are really copied into
> sql_scripts, so whatever stays in the static struct is overwritten if it is
> used another time.
> 
>> I'm not concerned about freeing the struct; what's the problem with it
>> surviving until the program terminates?
> 
> It is not referenced anywhere so it is a memory leak.
> 
>> If somebody specifies thousands of -f switches, they will waste a few
>> bytes with each, but I'm hardly concerned about a few dozen kilobytes
>> there ...
> 
> Ok, so you prefer a memory leak. I hate it on principle.
> 
> Here is a v23 with a memory leak anyway.

Álvaro, it looks like you've been both reviewer and committer on this
work for some time.

The latest patch seems to address your final concern.  Can I mark it
"ready for committer"?

-- 
-David
david@pgmasters.net



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> >If somebody specifies thousands of -f switches, they will waste a few
> >bytes with each, but I'm hardly concerned about a few dozen kilobytes
> >there ...
> 
> Ok, so you prefer a memory leak. I hate it on principle.

I don't "prefer" memory leaks -- I prefer interfaces that make sense.
Speaking of which, I don't think the arrangement in your patch really
does.  I know I suggested it, but now that I look again, it turns out I
chose badly and you implemented a bad idea, so can we go back and fix
it, please?

What I now think should really happen is that the current sql_scripts
array, currently under an anonymous struct, should be a typedef, say
ParsedScript, and get a new member for the weight; process_file and
process_builtin return a ParsedScript.  The weight and Command ** should
not be part of script_t at all.  In fact, with ParsedScript I don't
think we need to give a name to the anon struct used for builtin
scripts.  Rename the current sql_scripts.name to "desc", to mirror what
is actually put in there from the builtin array struct.  Make addScript
receive a ParsedScript and weight, fill in the weight into the struct,
and put it to the array after sanity-checking.  (I'm OK with keeping
"name" instead of renaming to "desc", if that change becomes too
invasive.)
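A minimal sketch of the interface proposed above, assuming a fixed-size array and a stripped-down Command type (MAX_SCRIPTS, the error messages and Command's contents are made up for illustration). The parsed script is passed by value, so addScript can set the weight on its own copy without touching the caller's struct.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for pgbench's Command type (hypothetical contents). */
typedef struct Command
{
	const char *line;
} Command;

/* The proposed typedef, sketched; only the field roles come from the
 * description above. */
typedef struct ParsedScript
{
	const char *desc;			/* script description */
	int			weight;			/* selection weight, set by addScript */
	Command   **commands;		/* NULL-terminated command list */
} ParsedScript;

#define MAX_SCRIPTS 128			/* assumed limit, for illustration */

static ParsedScript sql_scripts[MAX_SCRIPTS];
static int	num_scripts = 0;

/* Take a parsed script by value plus its weight, sanity-check it,
 * fill in the weight, and append it to the array. */
static void
addScript(ParsedScript script, int weight)
{
	assert(script.desc != NULL);

	if (script.commands == NULL || script.commands[0] == NULL)
	{
		fprintf(stderr, "empty command list for script \"%s\"\n", script.desc);
		exit(1);
	}
	if (num_scripts >= MAX_SCRIPTS)
	{
		fprintf(stderr, "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
		exit(1);
	}

	script.weight = weight;
	sql_scripts[num_scripts++] = script;
}
```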

No need for N_BUILTIN; we can use lengthof(builtin_script) instead.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Alvaro,

>>> If somebody specifies thousands of -f switches, they will waste a few
>>> bytes with each, but I'm hardly concerned about a few dozen kilobytes
>>> there ...
>>
>> Ok, so you prefer a memory leak. I hate it on principle.
>
> I don't "prefer" memory leaks -- I prefer interfaces that make sense.

C is not designed to return two things, and if that is what is needed, it 
looks awkward whatever is done. The static variable trick is dirty, but it 
is the minimal-fuss solution, IMO. So we are only trading awkward code 
for awkward code.

> Speaking of which, I don't think the arrangement in your patch really
> does.  I know I suggested it,

Yep:-)

> but now that I look again, it turns out I chose badly and you 
> implemented a bad idea, so can we go back and fix it, please?

Yep.

I have very little time available, so I'm trying to minimize the effort. 
I've tried "argue my point with committers", but it has proven very 
ineffective. I've switched to "do whatever is asked if it still works", 
but it is not very effective either.

> What I now think should really happen is that the current sql_scripts
> array, currently under an anonymous struct, should be a typedef, say
> ParsedScript,

Why not.

> and get a new member for the weight;

Hm... it already contains "weight".

> process_file and process_builtin return a ParsedScript.  The weight and 
> Command ** should not be part of script_t at all.

Sure.

> In fact, with ParsedScript I don't think we need to give a name to the 
> anon struct used for builtin scripts.

It is useful that it has a name so that find_builtin can return it.

> Rename the current sql_scripts.name to "desc", to mirror what
> is actually put in there from the builtin array struct.  Make addScript
> receive a ParsedScript and weight, fill in the weight into the struct,
> and put it to the array after sanity-checking.  (I'm OK with keeping
> "name" instead of renaming to "desc", if that change becomes too
> invasive.)

See attached a v24 & v25.

The awkwardness in v24 is that functions allocate a struct which is freed 
afterwards, really just to return two pieces of data. Whether that is better 
or worse than a static is really a matter of taste.

Version v25 returns a script which is then passed as an argument, so it 
avoids the dynamic allocation & later free. Maybe it is better. I had to 
cut short the error handling if a file does not exist, though, and it 
passes a struct by value.

Feel free to pick whichever you like most.

> No need for N_BUILTIN; we can use lengthof(builtin_script) instead.

Indeed. "lengthof" does not seem to be standard... ok, it is a macro in 
some header file. I really wanted to avoid an ugly sizeof divide hack, but 
as it is hidden elsewhere this is fine.
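For reference, PostgreSQL's c.h defines lengthof as the usual sizeof division, so the hack is indeed just hidden in a header. The sketch below uses it to look up a builtin by name. The BuiltinScript field names and the exact-match lookup are assumptions, and the script bodies are elided; it only illustrates why the struct needs a name for find_builtin to return it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same idea as PostgreSQL's c.h: number of elements in an array. */
#define lengthof(array) (sizeof (array) / sizeof ((array)[0]))

typedef struct BuiltinScript
{
	const char *name;			/* name given to -b */
	const char *desc;			/* description shown in reports */
	const char *script;			/* script text (elided in this sketch) */
} BuiltinScript;

static const BuiltinScript builtin_script[] =
{
	{"tpcb-like", "<builtin: TPC-B (sort of)>", "..."},
	{"simple-update", "<builtin: simple update>", "..."},
	{"select-only", "<builtin: select only>", "..."}
};

/* Return the builtin with exactly this name, or NULL if none matches. */
static const BuiltinScript *
find_builtin(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < lengthof(builtin_script); i++)
		if (strcmp(builtin_script[i].name, name) == 0)
			return &builtin_script[i];
	return NULL;
}
```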

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
I pushed your 25, with some additional minor tweaks.  I hope I didn't
break anything; please test.

Fabien COELHO wrote:

> >I don't "prefer" memory leaks -- I prefer interfaces that make sense.
> 
> C is not designed to return two things, and if that is what is needed, it
> looks awkward whatever is done. The static variable trick is dirty, but it
> is the minimal-fuss solution, IMO. So we are only trading awkward code for
> awkward code.

That's true.

> I have very little time available, so I'm trying to minimize the effort.
> I've tried "argue my point with committers", but it has proven very
> ineffective. I've switched to "do whatever is asked if it still works", but
> it is not very effective either.

I understand.  Sometimes arguing is better, if you can convince the
other person, but sometimes the other person disagrees with you or they
are just not listening.  I don't have any useful advice on what to do,
but resigning yourself to doing a stupid thing just because somebody
suggested it frequently leads to bad decisions.

> >In fact, with ParsedScript I don't think we need to give a name to the
> >anon struct used for builtin scripts.
> 
> It is useful that it has a name so that find_builtin can return it.

So it is.  I have kept it, but I used the name BuiltinScript rather than
script_t.

> Version v25 returns a script which is then passed as an argument, so it
> avoids the dynamic allocation & later free. Maybe it is better. I had to cut
> short the error handling if a file does not exist, though, and it passes a
> struct by value.

Passing structs by value should work fine, and I don't care much about
the case that a file doesn't exist.

> >No need for N_BUILTIN; we can use lengthof(builtin_script) instead.
> 
> Indeed. "lengthof" does not seem to be standard... ok, it is a macro in some
> header file. I really wanted to avoid an ugly sizeof divide hack, but as it
> is hidden elsewhere this is fine.

Right.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Álvaro,

> I pushed your 25, with some additional minor tweaks.  I hope I didn't
> break anything; please test.

I've made a few tests and all looks well. I guess the build farm will say 
if it does not like it.

Thanks,

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Jeff Janes
Date:
On Sat, Mar 19, 2016 at 8:41 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I pushed your 25, with some additional minor tweaks.  I hope I didn't
> break anything; please test.

I'm now getting compiler warnings:

gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)


pgbench.c: In function 'process_builtin':
pgbench.c:2765: warning: 'ps.stats.lag.sum2' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.lag.sum' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.lag.max' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.lag.min' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.lag.count' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.latency.sum2' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.latency.sum' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.latency.max' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.latency.min' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.latency.count' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.skipped' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.cnt' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.stats.start_time' is used uninitialized in this function
pgbench.c:2765: warning: 'ps.weight' is used uninitialized in this function



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Jeff Janes wrote:
> On Sat, Mar 19, 2016 at 8:41 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I pushed your 25, with some additional minor tweaks.  I hope I didn't
> > break anything; please test.
>
> I'm now getting compiler warnings:
>
> gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
>
>
> pgbench.c: In function 'process_builtin':
> pgbench.c:2765: warning: 'ps.stats.lag.sum2' is used uninitialized in this function

Fair complaints.  I suppose the following should fix them?


--
Álvaro Herrera


Re: pgbench stats per script & other stuff

From
Jeff Janes
Date:
On Sat, Mar 19, 2016 at 11:34 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>> On Sat, Mar 19, 2016 at 8:41 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>> > I pushed your 25, with some additional minor tweaks.  I hope I didn't
>> > break anything; please test.
>>
>> I'm now getting compiler warnings:
>>
>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
>>
>>
>> pgbench.c: In function 'process_builtin':
>> pgbench.c:2765: warning: 'ps.stats.lag.sum2' is used uninitialized in this function
>
> Fair complaints.  I suppose the following should fix them?


Yes, that fixes them.

Thanks,

Jeff



Re: pgbench stats per script & other stuff

From
Jeff Janes
Date:
On Fri, Jul 17, 2015 at 6:50 AM, Fabien <coelho@cri.ensmp.fr> wrote:
>
> This patch adds per-script statistics & other improvements to pgbench
>
> Rationale: Josh asked for the per-script stats:-)
>
> Some restructuring is done so that all stats (-l --aggregate-interval
> --progress --per-script-stats, latency & lag...) share the same structures
> and functions to accumulate data. This greatly limits the growth of pgbench
> due to this patch (+17 lines).
>
> In passing, remove the distinction between internal and external scripts.
> Pgbench just executes scripts, some of which may be internal...
>
> As a side effect, all scripts can be combined: "pgbench -B -N -S -f ..."
> would execute 4 scripts, 3 of which are internal (tpc-b, simple-update,
> select-only) plus an externally supplied one.
>
> Also add a weight option to change the probability of choosing some scripts
> when several are available.

I was eager to use this to do some performance testing on a series of
workloads gradually transitioning from write-heavy to read-only.

So I wanted to do something like:

for f in `seq 0 5 100`; do
  pgbench -T 180 -c8 -j8 -b tpcb-like@$f -b select-only@100
done;

But I'm not allowed to specify a weight of zero.  That means I have
to special-case the first iteration of the "for" loop, where $f is
zero.  I think it would be more convenient if I were allowed to specify
a zero weight, and pgbench would just ignore that script.  All I
had to do to make this work was remove the check that prevents
setting the weight to zero.  But then I needed to add a check
that the sum of all weights is not zero, which I have done here.
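Why a zero weight can be harmless in the selection step: with a cumulative-weight walk, a script whose weight is zero never makes the remaining value negative, so it is never picked, wherever it sits in the array, while an all-zero total has to be rejected up front. A sketch of such a walk, not necessarily pgbench's exact code; the function names are hypothetical and drawing w uniformly from [0, total - 1] is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a script index given w in [0, total_weight - 1].  Subtract each
 * weight in turn; the script that makes w negative is chosen.  A zero
 * weight leaves w unchanged, so that script can never be selected.
 */
static int
choose_script(const int *weights, int nscripts, int64_t w)
{
	int			i = 0;

	assert(w >= 0);
	do
	{
		assert(i < nscripts);	/* fails if total weight was 0 */
		w -= weights[i++];
	} while (w >= 0);

	return i - 1;
}

/* Sum of all weights; the caller must reject a total of zero. */
static int64_t
total_weight(const int *weights, int nscripts)
{
	int64_t		total = 0;

	for (int i = 0; i < nscripts; i++)
		total += weights[i];
	return total;
}
```

With weights {0, 100}, as in the tpcb-like@0 / select-only@100 case, every w in [0, 99] selects index 1.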

We could get more complex by not adding a zero-weight script to the
array of scripts at all, rather than adding it in a way where it can
never be selected.  But that would complicate parsing the per-script
stats report, since one of the scripts would no longer be reported.
I like this way better.

Would this be a welcome change?

Cheers,

Jeff


Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello Jeff,

> So I wanted to do something like:
>
> for f in `seq 0 5 100`; do
>  pgbench -T 180 -c8 -j8 -b tpcb-like@$f -b select-only@100
> done;
>
> But, I'm not allowed to specify a weight of zero.

Indeed. I did not envision such a use case, but it is quite legitimate and 
interesting! I would hope that the behavior would be a linear combination 
of the raw performance of each script, but it is not certain that this 
is the case.

> Would this be a welcome change?

Speaking for myself, I would be fine with such a change, provided:
 - that it does work:-) I'm not sure what happens in the script selection
   process; it should be checked carefully because it was not designed
   to allow a zero weight, and it may depend on the scripts' positions.
   It may already work, but it really needs checking.

 - I would suggest that a warning be shown when a weight is zero,
   something like "warning, script #%d weight is zero, will be ignored".

 - the documentation should be updated:-)

-- 
Fabien



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
> - that it does work:-) I'm not sure what happens in the script selection
>   process; it should be checked carefully because it was not designed
>   to allow a zero weight, and it may depend on the scripts' positions.
>   It may already work, but it really needs checking.

Hmmm, it seems ok.

Attached is an updated patch, which:

> - I would suggest that a warning be shown when a weight is zero,
>   something like "warning, script #%d weight is zero, will be ignored".

includes such a warning.

> - the documentation should be updated:-)

adds a line about 0 weight in the documentation.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> >- that it does work:-) I'm not sure what happens in the script selection
> >  process; it should be checked carefully because it was not designed
> >  to allow a zero weight, and it may depend on the scripts' positions.
> >  It may already work, but it really needs checking.
> 
> Hmmm, it seems ok.

It's not -- if you used -i, it died saying weight is zero.

> >- I would suggest that a warning be shown when a weight is zero,
> >  something like "warning, script #%d weight is zero, will be ignored".
> 
> includes such a warning.

I didn't include this part.

Pushed.


In doing this, I noticed that the latency output is wrong if you use -T
instead of -t; it always says the latency is zero because "duration" is
zero.  I suppose it should be like in the attached instead.  At the same
time, it says "latency average: XYZ" instead of "latency average = XYZ"
as in printSimpleStats, which doesn't look terribly important.  But the
line appears in the SGML docs.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> In doing this, I noticed that the latency output is wrong if you use -T
> instead of -t; it always says the latency is zero because "duration" is
> zero.  I suppose it should be like in the attached instead.  At the same
> time, it says "latency average: XYZ" instead of "latency average = XYZ"
> as in printSimpleStats, which doesn't look terribly important.  But the
> line appears in the SGML docs.

Patch actually attached here.

--
Álvaro Herrera


Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
Hello,

>> In doing this, I noticed that the latency output is wrong if you use -T
>> instead of -t; it always says the latency is zero because "duration" is
>> zero.  I suppose it should be like in the attached instead.

Indeed, I clearly overlooked option -t (transactions) which I never use.

> Patch actually attached here.

Tested. There is a small issue because the \n is missing.

Here is another version which just replaces duration with time_include,
as they should be pretty close, and fixes the style so that it is the same 
whether or not the detailed stats are collected, as you pointed out.
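The fallback average is simple arithmetic: with nclients clients each busy for the whole run, the mean time per transaction is run time * nclients / transactions. A sketch, assuming time_include is the measured run time in seconds (the function name is hypothetical). Feeding it a duration of 0, as happens when no -T value is given, yields 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average latency in ms derived from the run time when no per-transaction
 * latencies were measured: nclients clients ran for time_s seconds in
 * total and completed ntx transactions between them.
 */
static double
average_latency_ms(double time_s, int nclients, int64_t ntx)
{
	assert(ntx > 0);
	return 1000.0 * time_s * nclients / ntx;
}
```

With the figures from the first message in this thread (3 s, 1 client, 3192 transactions) this gives about 0.940 ms, matching the latency average reported there.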

>> At the same time, it says "latency average: XYZ" instead of "latency 
>> average = XYZ" as in printSimpleStats, which doesn't look terribly 
>> important.  But the line appears in the SGML docs.

Indeed. The documentation is edited manually when submitting changes so as 
to minimize diffs, but then it no longer corresponds to any actual 
output, so it is easy to get it wrong.

-- 
Fabien.

Re: pgbench stats per script & other stuff

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> Hello,
> 
> >>In doing this, I noticed that the latency output is wrong if you use -T
> >>instead of -t; it always says the latency is zero because "duration" is
> >>zero.  I suppose it should be like in the attached instead.
> 
> Indeed, I clearly overlooked option -t (transactions) which I never use.

Makes sense.

> >Patch actually attached here.
> 
> Tested. There is a small issue because the \n is missing.
> 
> Here is another version which just replaces duration with time_include,
> as they should be pretty close, and fixes the style so that it is the same
> whether or not the detailed stats are collected, as you pointed out.

Thanks, that makes sense.

> >>At the same time, it says "latency average: XYZ" instead of "latency
> >>average = XYZ" as in printSimpleStats, which doesn't look terribly
> >>important.  But the line appears in the SGML docs.
> 
> Indeed. The documentation is edited manually when submitting changes so as
> to minimize diffs, but then it no longer corresponds to any actual
> output, so it is easy to get it wrong.

Well, you fixed the "latency stddev" line in the sample output too, but
in my trial run that line was not displayed, only the latency average.
What are the command line args that supposedly produced this output?
Maybe we should add it as a SGML comment, or even display it to the
doc's reader.

-- 
Álvaro Herrera



Re: pgbench stats per script & other stuff

From
Fabien COELHO
Date:
>> Indeed. The documentation is edited manually when submitting changes so as
>> to minimize diffs, but then it no longer corresponds to any actual
>> output, so it is easy to get it wrong.
>
> Well, you fixed the "latency stddev" line in the sample output too, but
> in my trial run that line was not displayed, only the latency average.
> What are the command line args that supposedly produced this output?
> Maybe we should add it as a SGML comment, or even display it to the
> doc's reader.

Good point.

The test above shows these stats only if -P, -L or --rate was given, 
because under these conditions the necessary data is collected, so they 
can be computed. Thus the output in the documentation assumes that one of 
these was used. I nearly always use "-P 1".
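The reason the stddev line depends on those options: a standard deviation needs per-transaction samples, accumulated as running sums. A sketch using the count/min/max/sum/sum2 fields visible in the compiler warnings earlier in this thread; the function names are hypothetical, and the reported stddev would be the square root of variance().

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
	int64_t		count;
	double		min, max, sum, sum2;
} SimpleStats;

/* Record one observation, e.g. one transaction's latency in ms. */
static void
add_observation(SimpleStats *ss, double val)
{
	if (ss->count == 0 || val < ss->min)
		ss->min = val;
	if (ss->count == 0 || val > ss->max)
		ss->max = val;
	ss->count++;
	ss->sum += val;
	ss->sum2 += val * val;
}

/*
 * Population variance from the running sums.  Without per-transaction
 * samples there is no sum2, hence no variance and no stddev.
 */
static double
variance(const SimpleStats *ss)
{
	double		mean;

	assert(ss->count > 0);
	mean = ss->sum / ss->count;
	return ss->sum2 / ss->count - mean * mean;
}
```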

Note that the documentation is not really precise ("will look similar to"), 
so there is no commitment.

If you feel like removing the stddev line from the doc because it is not 
there with usual options, fine with me.

-- 
Fabien.