Thread: pgbench --startup option

pgbench --startup option

From
Jeff Janes
Date:
While looking at some proposed patches and pondering some questions on performance, I realized I desperately needed ways to run benchmarks with different settings without needing to edit postgresql.conf and restart/reload the server each time.

Most commonly, I want to run with synchronous_commit on or off at whim.  I thought of writing a patch specifically for that, but decided a more general approach would be better.  The attached patch allows any arbitrary command to be executed upon the start up of each connection intended for benchmarking (as opposed to utility connections, intended for either -i or for counting the rows in in pgbench_branches and for vacuuming), and so covers the need for changing any session-changeable GUCs, plus doing other things as well.

I created doBenchMarkConnect() to segregate bench-marking connections from utility connections.  At first I thought of adding the startup code to only the normal path and leaving support for -C in the wind, but decided that was just lazy.

Will add to commitfest-next.

Cheers,

Jeff
Attachment

Re: pgbench --startup option

From
Peter Geoghegan
Date:
On 10 February 2013 23:27, Jeff Janes <jeff.janes@gmail.com> wrote:
> While looking at some proposed patches and pondering some questions on
> performance, I realized I desperately needed ways to run benchmarks with
> different settings without needing to edit postgresql.conf and
> restart/reload the server each time.

I'd thought about hacking this capability into pgbench-tools, so that
there was a new outer-most loop that iterates through different
postgresql.conf files. Come to think of it, the need to vary settings
per test set (that is, per series of benchmarks, each of which is
plotted as a different color) generally only exists for one or two
settings, so it is probably better to pursue the approach that you
propose here.

I guess what I've outlined could still be useful for PGC_POSTMASTER
gucs, like shared_buffers.

-- 
Regards,
Peter Geoghegan



Re: pgbench --startup option

From
Jeff Janes
Date:
On Sun, Feb 10, 2013 at 3:27 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

...
 
Will add to commitfest-next.

Cheers,

Jeff

I realized the original patch was broken for multithreaded use due to an errant "static".

Also, I've cleaned up some carelessness in where I did the copy-and-paste for parameter parsing.
Attachment

Re: pgbench --startup option

From
Fabien COELHO
Date:
I've just done a quick review of the source, as I've been hacking in 
pgbench myself.

I think that the feature makes sense.

About the details of the patch:

(1) Some changes in the patch are unrelated to the purpose of the patch 
(e.g. spacing changes, error message...), and should be removed?

(2) Instead adding a new function, I would suggest to modify the existing 
one with an added argument, which would be ignored when NULL is passed.

(3) I'm not sure of the behavior of the feature. What if two statements 
are required, should it not be able to handle multiple --startup 
specifications, say with an array instead of a scalar?

(4) C style: there is no need to put a ";" after "}".

(5) In the documentation, other options do not put a "=" sign between the 
option and its argument, although it is also accepted.

Have a nice day,

-- 
Fabien.



Re: pgbench --startup option

From
Craig Ringer
Date:
<div class="moz-cite-prefix">On 02/11/2013 07:27 AM, Jeff Janes wrote:<br /></div><blockquote
cite="mid:CAMkU=1xV3tYKoHD8U2mQzfC5Kbn_bdcVf8br-EnUvy-6Z=B47w@mail.gmail.com"type="cite"><pre wrap="">I created
doBenchMarkConnect()to segregate bench-marking connections from
 
utility connections.  At first I thought of adding the startup code to only
the normal path and leaving support for -C in the wind, but decided that
was just lazy.
</pre></blockquote><br /> That sounds very useful and would've eased some recent pgbench work I've been doing too.<br
/><br/> I've put some patches together to make pgbench capable of talking to multiple servers. I needed it for
benchmarkingwork on bidirectional replication, but it's also useful if you want to benchmark a group of hot standbys in
read-onlymode, and it may be useful with various 3rd pty replication solutions. As written it uses one or more threads
perserver, with all clients managed by a given thread using the same server. Multiple servers are specified by using
connstringstyle syntax, eg:<br /><br />      pgbench -T 600 -j 4 -c 64 "host=server1 user=postgres" "host=server2
user=postgresport=5433"<br /><br /> It isn't ready for a commitfest yet as I need to tidy up a few things and I still
haven'tadded an extra set of timings to measure how long the DB takes to return to a steady state after the pgbench
run,but once that's done I'll send it in. The after-run timings are intended for things like measuring how much lag an
asynchronousreplica has built up and how long it takes to catch up after the write flood stops, or how long a
CHECKPOINTtakes after the pgbench run.<br /><br /> I also have a patch that adds a flag to force a CHECKPOINT after
vacuumand before running its tests. This makes pgbench results _vastly_ more stable over short runs.<br /><br /> The
workis currently lurking in the 'multibench' branch of git://github.com/ringerc/postgres.git ; see <a
href="https://github.com/ringerc/postgres/tree/multibench">https://github.com/ringerc/postgres/tree/multibench</a>.Only
pgbench.cis actually changed. Comments appreciated.<br /><pre class="moz-signature" cols="72">-- Craig Ringer
       <a class="moz-txt-link-freetext" href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a>PostgreSQL
Development,24x7 Support, Training & Services</pre> 

Re: pgbench --startup option

From
Jeff Janes
Date:
On Thu, May 2, 2013 at 11:25 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I've just done a quick review of the source, as I've been hacking in pgbench myself.

I think that the feature makes sense.

About the details of the patch:

(1) Some changes in the patch are unrelated to the purpose of the patch (e.g. spacing changes, error message...), and should be removed?


I did fix the indenting of the -c and -t options in the usage message, which was inconsistent before and was bugging me.  (The bad indenting was just in the source, did not makes its way to the usage output.)  Perhaps I shouldn't have done that in this patch, at least not without mentioning it.  

I made the error message more verbose, in case someone passes invalid syntax to the startup command, the original message was rather confusing I thought.  I'm not sure if the error handling I do is suitable or not, it is rather minimal, simply piggy backing on what was already there.  Do you have any thoughts on that?

 

(2) Instead adding a new function, I would suggest to modify the existing one with an added argument, which would be ignored when NULL is passed.

I considered passing a boolean to doConnect, but I often find that hard to follow because:

doConnect(false);

doesn't tell you what is false without having to go hunt down the comments for function definition (maybe using a IDE rather than vi would help me out there).  There are plenty of places in the existing code which do exactly that, but those places usually have the action depending on the boolean take place in the middle of the function.  Here the dependent action takes place at the end, so it is easy to have two functions without code duplication because one function just calls the other and then takes one more action.  So, I don't know; mostly a matter of taste.  

But your suggestion would be to pass in either char *startup or NULL to doConnect?  That is at least self-explanatory in the case where what you pass in is startup rather than NULL.  It is a bit weird to pass a file-scoped global variable into a function which is in the same file, but I don't see a clean way of reducing the scope without introducing a lot of noise.
 

(3) I'm not sure of the behavior of the feature. What if two statements are required, should it not be able to handle multiple --startup specifications, say with an array instead of a scalar?

I've just been putting a semicolon in the string to separate the statements (see example at end of post).  I could use a linked list of strings, but that is pretty elaborate for something so simple.  Are the core LL routines easily available to contrib?
 

(4) C style: there is no need to put a ";" after "}".

Thanks, old habits die hard.
 
(5) In the documentation, other options do not put a "=" sign between the option and its argument, although it is also accepted.

The convention seems to be to use a space for short options and an equal sign for long options in the documentation.  

Thanks for doing the review.  I'm not sure what things to go change without further feedback/discussion, except point 4.  I'll wait a day to see if I get more feedback on the other issues and submit a new patch.


By the way, I have two more examples of using this beyond the syncronous_commit one:

To probe the overhead of beginning a transaction.

pgbench -S -T 300 -c4 -j4 --startup='BEGIN'
pgbench -S -T 300 -c4 -j4 

At a scale of 10 (all in shaerd_buffers), tps excluding connections are respectively (median of 10 alternating runs):
38623.26
36090.69

So doing it all within one transaction gives a 7% improvement.

One the other hand, preemptively locking the table gives no detectable further improvement, which doesn't surprise me because every select runs in a different resource owner, so it still needs to obtain the lock for itself despite a higher resource owner already owning it (median of many hundreds alternating runs):

pgbench -S -T 30 -c4 -j4 --startup='BEGIN;LOCK TABLE pgbench_accounts in access share mode' 
pgbench -S -T 30 -c4 -j4 --startup='BEGIN;'

38645.25
38681.52


 Cheers,

Jeff

Re: pgbench --startup option

From
Jeff Janes
Date:
On Sun, Jun 16, 2013 at 9:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Thu, May 2, 2013 at 11:25 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:


 
Thanks for doing the review.  I'm not sure what things to go change without further feedback/discussion, except point 4.  I'll wait a day to see if I get more feedback on the other issues and submit a new patch.


I've fixed a conflict, and I've removed extraneous semicolons from the C.

I've left in the fixing of some existing bad indenting in the existing code, which is not strictly related to my change.

I hope my defenses of the other points were persuasive.

Cheers,

Jeff

Attachment

Re: pgbench --startup option

From
Fabien COELHO
Date:
> I've fixed a conflict, and I've removed extraneous semicolons from the C.
>
> I've left in the fixing of some existing bad indenting in the existing
> code, which is not strictly related to my change.

There are still unrelated changes : spacing on -c and -t options' help. 
The "pgindent" command is passed on the sources from time to time, so 
there should be no reason to change this in this commit.

The updated string for PQerrorMessage does not bring much, and the message 
does not seem an improvement. "Command failed with ERROR", indeed.
  Command failed with ERROR:  syntax error at or near ";"  LINE 1: set synchronous_commit=on;set synchronous_;

The preceding result seems bother simpler and fine:
  ERROR:  syntax error at or near ";"  LINE 1: set synchronous_commit=on;set synchronous_;

Otherwise I've tested the patch with one "set", two "set"s and a syntax 
error, and it worked as expected.

-- 
Fabien.



Re: pgbench --startup option

From
Robert Haas
Date:
On Thu, Jun 20, 2013 at 1:46 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've fixed a conflict, and I've removed extraneous semicolons from the C.
>
> I've left in the fixing of some existing bad indenting in the existing code,
> which is not strictly related to my change.

OK, I like this idea a lot, but I have a question.  Right now, to use
this, you have to supply the startup SQL on the command line.  And
that could definitely be useful.  But ISTM that you might also want to
take the startup SQL from a file, and indeed you might well want to
include metacommands in there.  Maybe that's getting greedy, but the
rate at which people are adding features to pgbench suggests to me
that it won't be long before this isn't enough.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench --startup option

From
Fabien COELHO
Date:
> OK, I like this idea a lot, but I have a question.  Right now, to use
> this, you have to supply the startup SQL on the command line.  And
> that could definitely be useful.  But ISTM that you might also want to
> take the startup SQL from a file, and indeed you might well want to
> include metacommands in there.  Maybe that's getting greedy, but the
> rate at which people are adding features to pgbench suggests to me
> that it won't be long before this isn't enough.
>
> Thoughts?

My 0.02€:

I thought about this point while reviewing the patch, but I did not add it 
to the review because ISTM that it would require a lot of changes, and 
pgbench is already kind of a mess, IMHO. Also, possibly you would like to 
use a script under different assumptions to test them, that would require 
the startup string to be out of the script so as to change it easily. So 
it would not be that useful.

-- 
Fabien.

Re: pgbench --startup option

From
Jeff Janes
Date:
On Tuesday, June 25, 2013, Robert Haas wrote:
On Thu, Jun 20, 2013 at 1:46 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've fixed a conflict, and I've removed extraneous semicolons from the C.
>
> I've left in the fixing of some existing bad indenting in the existing code,
> which is not strictly related to my change.

OK, I like this idea a lot, but I have a question.  Right now, to use
this, you have to supply the startup SQL on the command line.  And
that could definitely be useful.  But ISTM that you might also want to
take the startup SQL from a file, and indeed you might well want to
include metacommands in there.

I had not previously considered making the argument to --startup be the name of a file rather than the command itself.  

I had at first wanted to make a new metacommand named \startup which could be added into regular "-f" files, but realized that that would make it harder to work with the default supplied transactions, and would be substantially harder to implement.  (And it is somewhat unclear what to do when there are multiple -f given--take the UNION ALL of them in command-line order, I guess)

 
 Maybe that's getting greedy, but the
rate at which people are adding features to pgbench suggests to me
that it won't be long before this isn't enough.

Thoughts?

On a related note, I have also occasionally wished for a variant of -f which would read the argument as the contents rather than as the name of a file supplying the contents.  That way a shell script to run a custom transaction could be self-contained (both -c and -j etc as well the contents of -f being in a single file, and so nothing to get out of sync or misplaced).

So thinking about this as a more general problem now, I see that I can use a bash trick to get what I want, if given what you want (example written in terms of -f not --startup, as -f already exists in the needed form)

pgbench  -f <(echo -e 'select count(*) from pgbench_accounts')

It is a little less clean, but workable.

So I now think --startup should take a file, as that is more consistent with what other pgbench options take, and still allows me to do what I want fairly easily.


I'll look into re-writing it to work in this way.  But probably not during this commitfest.

Thanks,

Jeff