Re: pgbench --startup option - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: pgbench --startup option
Date
Msg-id CAMkU=1zrsQUuB=2nGc1N_k7TW1ROscxzVmS4m1v7Nr1VqPaxnQ@mail.gmail.com
Whole thread Raw
In response to Re: pgbench --startup option  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench --startup option
List pgsql-hackers
On Thu, May 2, 2013 at 11:25 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I've just done a quick review of the source, as I've been hacking in pgbench myself.

I think that the feature makes sense.

About the details of the patch:

(1) Some changes in the patch are unrelated to the purpose of the patch (e.g. spacing changes, error message...), and should be removed?


I did fix the indenting of the -c and -t options in the usage message, which was inconsistent before and was bugging me.  (The bad indenting was just in the source, did not makes its way to the usage output.)  Perhaps I shouldn't have done that in this patch, at least not without mentioning it.  

I made the error message more verbose, in case someone passes invalid syntax to the startup command, the original message was rather confusing I thought.  I'm not sure if the error handling I do is suitable or not, it is rather minimal, simply piggy backing on what was already there.  Do you have any thoughts on that?

 

(2) Instead adding a new function, I would suggest to modify the existing one with an added argument, which would be ignored when NULL is passed.

I considered passing a boolean to doConnect, but I often find that hard to follow because:

doConnect(false);

doesn't tell you what is false without having to go hunt down the comments for function definition (maybe using a IDE rather than vi would help me out there).  There are plenty of places in the existing code which do exactly that, but those places usually have the action depending on the boolean take place in the middle of the function.  Here the dependent action takes place at the end, so it is easy to have two functions without code duplication because one function just calls the other and then takes one more action.  So, I don't know; mostly a matter of taste.  

But your suggestion would be to pass in either char *startup or NULL to doConnect?  That is at least self-explanatory in the case where what you pass in is startup rather than NULL.  It is a bit weird to pass a file-scoped global variable into a function which is in the same file, but I don't see a clean way of reducing the scope without introducing a lot of noise.
 

(3) I'm not sure of the behavior of the feature. What if two statements are required, should it not be able to handle multiple --startup specifications, say with an array instead of a scalar?

I've just been putting a semicolon in the string to separate the statements (see example at end of post).  I could use a linked list of strings, but that is pretty elaborate for something so simple.  Are the core LL routines easily available to contrib?
 

(4) C style: there is no need to put a ";" after "}".

Thanks, old habits die hard.
 
(5) In the documentation, other options do not put a "=" sign between the option and its argument, although it is also accepted.

The convention seems to be to use a space for short options and an equal sign for long options in the documentation.  

Thanks for doing the review.  I'm not sure what things to go change without further feedback/discussion, except point 4.  I'll wait a day to see if I get more feedback on the other issues and submit a new patch.


By the way, I have two more examples of using this beyond the syncronous_commit one:

To probe the overhead of beginning a transaction.

pgbench -S -T 300 -c4 -j4 --startup='BEGIN'
pgbench -S -T 300 -c4 -j4 

At a scale of 10 (all in shaerd_buffers), tps excluding connections are respectively (median of 10 alternating runs):
38623.26
36090.69

So doing it all within one transaction gives a 7% improvement.

One the other hand, preemptively locking the table gives no detectable further improvement, which doesn't surprise me because every select runs in a different resource owner, so it still needs to obtain the lock for itself despite a higher resource owner already owning it (median of many hundreds alternating runs):

pgbench -S -T 30 -c4 -j4 --startup='BEGIN;LOCK TABLE pgbench_accounts in access share mode' 
pgbench -S -T 30 -c4 -j4 --startup='BEGIN;'

38645.25
38681.52


 Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Robins Tharakan
Date:
Subject: Re: Patch to add support of "IF NOT EXISTS" to others "CREATE" statements
Next
From: Jon Nelson
Date:
Subject: Re: fallocate / posix_fallocate for new WAL file creation (etc...)