Re: pgbench - implement strict TPC-B benchmark - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: pgbench - implement strict TPC-B benchmark
Date:
Msg-id: alpine.DEB.2.21.1908052208280.26206@lancre
In response to: Re: pgbench - implement strict TPC-B benchmark (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hello Andres,

>> Which is a (somehow disappointing) * 3.3 speedup. The impact on the 3
>> complex expressions tests is not measurable, though.
>
> I don't know why that could be disappointing. We put in much more work
> for much smaller gains in other places.

Probably, but I thought I would get a better deal by eliminating most string handling from variables.

>> Questions:
>> - how likely is such a patch to pass? (IMHO not likely)
>
> I don't see why? I didn't review the patch in any detail, but it didn't
> look crazy in a quick skim? Increasing how much load can be simulated
> using pgbench is something I personally find much more interesting than
> adding capabilities that very few people will ever use.

Yep, but my point is that the bottleneck is mostly libpq/system, as I tried to demonstrate with the few experiments I reported.

> FWIW, the areas I find current pgbench "most lacking" during development
> work are:
>
> 1) Data load speed. The data creation is bottlenecked on fprintf in a
>    single process.

snprintf actually; it could be replaced.

I submitted a patch to add more control over initialization, including a server-side loading feature, i.e. the client does not send data, the server generates its own, see 'G':

  https://commitfest.postgresql.org/24/2086/

However, on my laptop it is slower than client-side loading on a local socket. The client version does around 70 MB/s; the client load is 20-30% and the postgres load is 85%, but I'm not sure I can hope for much more on my SSD. On my laptop the bottleneck is postgres/disk, not fprintf.

> The index builds are done serially. The vacuum could be replaced by COPY
> FREEZE.

Well, it could be added?

> For a lot of meaningful tests one needs 10-1000s of GB of testdata -
> creating that is pretty painful.

Yep.

> 2) Lack of proper initialization integration for custom
>    scripts.

Hmmm… You can always write a psql script for the schema and possibly for simplistic data initialization?
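For reference, initialization steps can already be selected explicitly with -I; a hypothetical invocation using the proposed server-side 'G' step (which only exists with the patch above applied — stock pgbench only has the client-side 'g') could look like:

```shell
# Assumes a pgbench build carrying the commitfest patch above, which adds
# the server-side 'G' generation step; database name "bench" is made up.
#   d = drop tables, t = create tables, G = server-side data generation,
#   v = vacuum, p = create primary keys
pgbench -i -I dtGvp -s 100 bench
```

With a stock build, the equivalent client-side loading is -I dtgvp (or just -i).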
However, generating meaningful pseudo-random data for an arbitrary schema is a pain. I wrote an external tool for that a few years ago:

  http://www.coelho.net/datafiller.html

but it is still a pain.

> I.e. have steps that are in the custom script that allow -i, vacuum, etc
> to be part of the script, rather than separately executable steps.
> --init-steps doesn't do anything for that.

Sure. It just gives some control.

> 3) pgbench overhead, although that's to a significant degree libpq's fault

I'm afraid that is currently the case.

> 4) Ability to cancel pgbench and get approximate results. That currently
>    works if the server kicks out the clients, but not when interrupting
>    pgbench - which is just plain weird. Obviously that doesn't matter
>    for "proper" benchmark runs, but often during development, it's
>    enough to run pgbench past some events (say the next checkpoint).

Do you mean getting a report anyway on Ctrl-C? I usually run with -P 1 to see the progress, but making Ctrl-C work should be reasonably easy.

>> - what is its impact on overall performance when actual queries
>> are performed? (IMHO very small)
>
> Obviously not huge - I'd also not expect it to be unobservably small
> either.

Hmmm… Indeed, the 20-\set script runs at 2.6 M/s, that is 0.019 µs per \set, while any exchange over the connection costs at least 15 µs (for one client on a local socket).

-- Fabien.