Re: pgbench - implement strict TPC-B benchmark - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - implement strict TPC-B benchmark
Msg-id alpine.DEB.2.21.1908052208280.26206@lancre
In response to Re: pgbench - implement strict TPC-B benchmark  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

>> Which is a (somehow disappointing) * 3.3 speedup. The impact on the 3
>> complex expressions tests is not measurable, though.
>
> I don't know why that could be disappointing. We put in much more work
> for much smaller gains in other places.

Probably, but I expected a better payoff from eliminating most of the 
string handling from variables.

>> Questions:
>>  - how likely is such a patch to pass? (IMHO not likely)
>
> I don't see why? I didn't review the patch in any detail, but it didn't
> look crazy in quick skim? Increasing how much load can be simulated
> using pgbench, is something I personally find much more interesting than
> adding capabilities that very few people will ever use.

Yep, but my point is that the bottleneck is mostly libpq/system, as I 
tried to demonstrate with the few experiments I reported.

> FWIW, the areas I find current pgbench "most lacking" during development
> work are:
>
> 1) Data load speed. The data creation is bottlenecked on fprintf in a
>   single process.

It is snprintf, actually; it could be replaced.

I submitted a patch to add more control over initialization, including a 
server-side data generation feature ('G'): instead of the client sending 
data, the server generates its own:

     https://commitfest.postgresql.org/24/2086/

However, on my laptop it is slower than client-side loading over a local 
socket. The client version does around 70 MB/s, with the client at 20-30% 
CPU and postgres at 85%, and I'm not sure I can hope for much more from my 
SSD. On my laptop the bottleneck is postgres/disk, not snprintf.
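For reference, with that patch the generation side is chosen through the 
init-step letters; a sketch (the 'G' step letter is as proposed in the 
patch, not yet in a release):

```shell
# Server-side data generation: d=drop, t=create tables,
# G=server-side generate, v=vacuum, p=primary keys.
pgbench -i -I dtGvp -s 100 bench

# Client-side equivalent, where the client streams data over COPY
# (lowercase 'g'):
pgbench -i -I dtgvp -s 100 bench
```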

> The index builds are done serially. The vacuum could be replaced by COPY 
> FREEZE.

Well, it could be added?
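One constraint worth noting: COPY FREEZE only avoids the later freezing 
work when the table was created or truncated in the same transaction. A 
sketch of what the loader would have to issue (the data file path is 
illustrative):

```shell
psql bench <<'SQL'
BEGIN;
-- FREEZE is only accepted if the table was created or truncated
-- within this same transaction.
TRUNCATE pgbench_accounts;
COPY pgbench_accounts FROM '/tmp/accounts.data' (FREEZE);
COMMIT;
SQL
```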

> For a lot of meaningful tests one needs 10-1000s of GB of testdata - 
> creating that is pretty painful.

Yep.

> 2) Lack of proper initialization integration for custom
>   scripts.

Hmmm…

You can always write a psql script for schema and possibly simplistic data 
initialization?
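Something along these lines, assuming hypothetical schema.sql and 
populate.sql files (psql accepts several -f options since 9.6):

```shell
# Custom initialization outside pgbench, then the benchmark run.
psql -f schema.sql -f populate.sql bench

# -n skips the implicit vacuum of the standard pgbench_* tables,
# which would fail against a custom schema.
pgbench -n -f custom.sql -c 8 -T 60 bench
```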

However, generating meaningful pseudo-random data for an arbitrary schema 
is a pain. I wrote an external tool for that a few years ago:

     http://www.coelho.net/datafiller.html

but it is still a pain.

> I.e. have steps that are in the custom script that allow -i, vacuum, etc 
> to be part of the script, rather than separately executable steps. 
> --init-steps doesn't do anything for that.

Sure. It only gives some control over the built-in steps.

> 3) pgbench overhead, although that's to a significant degree libpq's fault

I'm afraid that is currently the case.

> 4) Ability to cancel pgbench and get approximate results. That currently
>   works if the server kicks out the clients, but not when interrupting
>   pgbench - which is just plain weird.  Obviously that doesn't matter
>   for "proper" benchmark runs, but often during development, it's
>   enough to run pgbench past some events (say the next checkpoint).

Do you mean producing a report anyway on Ctrl-C?

I usually run with -P 1 to see progress, but making Ctrl-C report should 
be reasonably easy.

>>  - what is its impact to overall performance when actual queries
>>    are performed (IMHO very small).
>
> Obviously not huge - I'd also not expect it to be unobservably small
> either.

Hmmm… Indeed, the 20-\set script runs at 2.6 M transactions/s, i.e. about 
0.019 µs per \set, whereas any round trip over the connection costs at 
least 15 µs (for one client on a local socket).
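The kind of script behind those numbers looks like this (a sketch; the 
file name and the exact expressions are illustrative, not the actual test 
script):

```shell
# Generate a script of 20 \set commands and drive it with pgbench;
# no SQL is sent, so only the expression evaluator is measured.
for i in $(seq 1 20); do
    printf '\\set x%d random(1, 1000000)\n' "$i"
done > only_sets.sql

pgbench -n -f only_sets.sql -t 1000000 bench
```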

-- 
Fabien.
