Re: pgbench \for or similar loop - Mailing list pgsql-hackers

From Greg Smith
Subject Re: pgbench \for or similar loop
Msg-id 4DB11EC4.6010408@2ndquadrant.com
In response to Re: pgbench \for or similar loop  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Alvaro Herrera wrote:
> Why do we have pgbench at all in the first place?  Surely we could
> rewrite it in plpgsql with proper stored procedures.

pgbench gives you a driver program with the following useful properties:

1) Multiple processes are spawned and each gets its own connection
2) A time/transaction limit is enforced across all of the connections at 
once
3) Timing information is written to a client-side log file
4) The work of running the clients can happen on a remote system, so 
that it's possible to just test the server-side performance
5) The program is similar enough to any other regular client, using the 
standard libpq interface, that connection-related overhead should be 
similar to a real workload.

All of those have some challenges before you could duplicate them in a 
stored procedure context.

My opinion of this feature is similar to the one Aiden already 
expressed:  there are already so many ways to do this sort of thing 
using shell-oriented approaches (as well as generate_series) that it's 
hard to get too excited about implementing it directly in pgbench.  Part 
of the reason for adding the \shell and \setshell commands was to make 
tricky things like this possible without having to touch the pgbench 
code further.  I, for example, would solve the problem you're facing 
like this:

1) Write a shell script that generates the file I need
2) Call it from pgbench using \shell, passing the size it needs.  Have 
that write a delimited file with the data required.
3) Import the whole thing with COPY.
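A minimal sketch of step 1 under stated assumptions -- the column layout, row count, and output path here are all hypothetical, and you'd adjust them to the schema under test:

```shell
# Generate a delimited sample-data file that COPY can load later.
# Using awk so the generator stays a one-liner; any scripting
# language works here.
ROWS=1000
awk -v n="$ROWS" 'BEGIN {
    srand(42)                          # fixed seed => repeatable data
    for (i = 1; i <= n; i++)
        printf "%d,%d,filler-%d\n", i, int(rand() * 100000), i % 10
}' > /tmp/sample_data.csv
```

For step 2, a pgbench script file would invoke something like this via `\shell`, passing the scale as an argument; step 3 then issues a server-side `COPY ... FROM '/tmp/sample_data.csv'`, so the file path has to be visible to the server, not just the client.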

And next thing you know you've even got the CREATE/COPY optimization as 
a possibility to avoid WAL, as well as the ability to avoid creating the 
data file more than once if the script is smart enough.
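That optimization refers to PostgreSQL skipping WAL for a COPY into a table created (or truncated) in the same transaction, which applies when wal_level is minimal; a sketch, with hypothetical table and path names:

```sql
BEGIN;
-- Creating the table in the same transaction as the COPY lets the
-- server skip WAL for the loaded rows (wal_level = minimal only).
CREATE TABLE sample_data (id int, val int, note text);
COPY sample_data FROM '/tmp/sample_data.csv' WITH (FORMAT csv);
COMMIT;
```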

Sample data file generation can be difficult; most of the time I'd 
rather solve it in a general programming language.  It's true that 
simple generation cases could be handled with the mechanism you 
propose.  However, this only really helps cases that are too complicated 
to express with generate_series, yet not so complicated that you really 
want a full programming language to generate the data.  I don't think 
there's much middle ground in that use case.
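For the simple end of that spectrum, the generate_series route needs no external tooling at all; a sketch, again with hypothetical table and column names:

```sql
-- Populate a test table entirely server-side, no data file needed.
INSERT INTO sample_data
SELECT i,
       (random() * 100000)::int,
       'filler-' || (i % 10)
FROM generate_series(1, 1000) AS i;
```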

But if this is what you think makes your life easier, I'm not going to 
tell you you're wrong.  And I don't feel that your desire for this 
feature means you must tackle a more complicated thing instead--even 
though what I personally would much prefer is something making this sort 
of thing easier to do in regression tests, too.  That's a harder 
problem, though, and you're only volunteering to solve an easier one 
than that.

Stepping aside from the debate over usefulness, my main code concern is 
that each time I look at the pgbench code for yet another tacked-on bit, 
it looks increasingly creaky and hard to maintain.  It's never going 
to be a good benchmark driver program capable of really complicated 
tasks.  And making it try keeps piling on the risk of breaking it for 
its intended purpose of doing simple tests.  If you can figure out how 
to keep the code contortions to implement the feature under control, 
there's some benefit there.  I can't think of a unique reason for it; 
again, lots of ways to solve this already.  But I'd probably use it if 
it were there.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
