Re: pgbench \for or similar loop - Mailing list pgsql-hackers
From | Greg Smith
Subject | Re: pgbench \for or similar loop
Date |
Msg-id | 4DB11EC4.6010408@2ndquadrant.com
In response to | Re: pgbench \for or similar loop (Alvaro Herrera <alvherre@commandprompt.com>)
List | pgsql-hackers
Alvaro Herrera wrote:
> Why do we have pgbench at all in the first place? Surely we could
> rewrite it in plpgsql with proper stored procedures.

pgbench gives you a driver program with the following useful properties:

1) Multiple processes are spawned, and each gets its own connection.
2) A time/transaction limit is enforced across all of the connections at once.
3) Timing information is written to a client-side log file.
4) The work of running the clients can happen on a remote system, so that it's possible to test just the server-side performance.
5) The program is similar enough to any other regular client, using the standard libpq interface, that connection-related overhead should be similar to a real workload.

All of those have some challenges before you could duplicate them in a stored procedure context.

My opinion of this feature is similar to the one Aiden already expressed: there are already so many ways to do this sort of thing using shell-oriented approaches (as well as generate_series) that it's hard to get too excited about implementing it directly in pgbench. Part of the reason for adding the \shell and \setshell commands was to make tricky things like this possible without having to touch the pgbench code further. I, for example, would solve the problem you're facing like this:

1) Write a shell script that generates the file I need.
2) Call it from pgbench using \shell, passing the size it needs. Have that write a delimited file with the data required.
3) Import the whole thing with COPY.

And next thing you know you've even got the CREATE/COPY optimization as a possibility to avoid WAL, as well as the ability to avoid creating the data file more than once if the script is smart enough. Sample data file generation can be difficult; most of the time I'd rather solve it in a general programming language.

It's true that simple generation cases could be done with the mechanism you propose. However, this only really helps cases that are too complicated to express with generate_series, yet not so complicated that you really want a full programming language to generate the data. I don't think there's that much middle ground in that use case. But if this is what you think makes your life easier, I'm not going to tell you you're wrong. And I don't feel that your desire for this feature means you must tackle a more complicated thing instead--even though what I personally would much prefer is something making this sort of thing easier to do in regression tests, too. That's a harder problem, though, and you're only volunteering to solve an easier one than that.

Stepping aside from the debate over usefulness, my main code concern is that each time I look at the pgbench code for yet another tacked-on bit, it's getting creakier and harder to maintain. It's never going to be a good benchmark driver program capable of really complicated tasks, and making it try keeps piling on the risk of breaking it for its intended purpose of doing simple tests. If you can figure out how to keep the code contortions needed to implement the feature under control, there's some benefit there. I can't think of a unique reason for it; again, there are lots of ways to solve this already. But I'd probably use it if it were there.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
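To make the three-step \shell-plus-COPY approach above a bit more concrete, here is a rough sketch. The gen_data.sh and load_data.sql names, the test_data table, the /tmp path, and the mydb database are just placeholders, and since COPY reads a server-side file here, it assumes the path is visible to the server and the role is allowed to load from it:

    $ cat gen_data.sh
    #!/bin/sh
    # Write N rows of "id,payload" CSV data to the named output file
    rows="$1"
    outfile="$2"
    seq 1 "$rows" | awk '{ printf "%d,row-%d\n", $1, $1 }' > "$outfile"

    $ cat load_data.sql
    \shell ./gen_data.sh 100000 /tmp/test_data.csv
    DROP TABLE IF EXISTS test_data;
    BEGIN;
    CREATE TABLE test_data (id integer, payload text);
    COPY test_data FROM '/tmp/test_data.csv' WITH (FORMAT csv);
    COMMIT;

    $ pgbench -n -f load_data.sql -t 1 mydb

In load_data.sql the \shell line regenerates the data file, then each following line is run as an ordinary SQL command; -n skips the vacuum of the standard pgbench tables and -t 1 runs the script exactly once. Because the CREATE TABLE and COPY run in the same transaction, the COPY can skip WAL when archiving and replication aren't in use, which is the CREATE/COPY optimization mentioned above.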