Thread: few ideas for pgbench
Hi
I am doing some test and I have few ideas:
1. print server version to the output of pgbench. Now only client side version is printed
2. can ve generate some output in structured format - XML, JSON ?
Regards
Pavel
Hello Pavel, > 1. print server version to the output of pgbench. Now only client side > version is printed It is easy enough and makes sense. Maybe only if it differs from the client side version ? > 2. can ve generate some output in structured format - XML, JSON ? It is obviously possible, but that would mean some code. ISTM that the various outputs are easy enough to parse and convert to anything without needing a special format? Is there some particular part you have in mind? -- Fabien.
út 4. 5. 2021 v 20:59 odesílatel Fabien COELHO <coelho@cri.ensmp.fr> napsal:
Hello Pavel,
> 1. print server version to the output of pgbench. Now only client side
> version is printed
It is easy enough and makes sense. Maybe only if it differs from the
client side version ?
This is not a simple question. Personally I prefer to show this info every time, although it can be redundant. Just for check and for more simple automatic processing.
When I run pgbench, I usually work with more releases together, so the server version is important info.
> 2. can ve generate some output in structured format - XML, JSON ?
It is obviously possible, but that would mean some code. ISTM that the
various outputs are easy enough to parse and convert to anything without
needing a special format? Is there some particular part you have in mind?
I thought about something what I can simply import to Postgres or to R. But maybe XML or JSON is a bad idea.
What about CSV? Any run can produce one row.
--
Fabien.
Hello Pavel, > This is not a simple question. Personally I prefer to show this info every > time, although it can be redundant. Just for check and for more simple > automatic processing. > > When I run pgbench, I usually work with more releases together, so the > server version is important info. Ok. Yes. >>> 2. can ve generate some output in structured format - XML, JSON ? >> >> It is obviously possible, but that would mean some code. ISTM that the >> various outputs are easy enough to parse and convert to anything without >> needing a special format? Is there some particular part you have in mind? >> > > I thought about something what I can simply import to Postgres or to R. > But maybe XML or JSON is a bad idea. > > What about CSV? Any run can produce one row. Yep, CSV is simple and nice. It depends on what information you would like. For instance for progress report (-P 1) or logs/sampling (-l) would be relevant candidates for CSV. Not so much for the final report, though. -- Fabien.
st 5. 5. 2021 v 11:55 odesílatel Fabien COELHO <coelho@cri.ensmp.fr> napsal:
Hello Pavel,
> This is not a simple question. Personally I prefer to show this info every
> time, although it can be redundant. Just for check and for more simple
> automatic processing.
>
> When I run pgbench, I usually work with more releases together, so the
> server version is important info.
Ok. Yes.
>>> 2. can ve generate some output in structured format - XML, JSON ?
>>
>> It is obviously possible, but that would mean some code. ISTM that the
>> various outputs are easy enough to parse and convert to anything without
>> needing a special format? Is there some particular part you have in mind?
>>
>
> I thought about something what I can simply import to Postgres or to R.
> But maybe XML or JSON is a bad idea.
>
> What about CSV? Any run can produce one row.
Yep, CSV is simple and nice. It depends on what information you would
like. For instance for progress report (-P 1) or logs/sampling (-l) would
be relevant candidates for CSV. Not so much for the final report, though.
I think so there can be almost all information. We have to ensure consistency of columns.
The basic usage can be
for ....
do
pg_bench ... >> logfile
done
and log file can looks like
start time, rowno, serverver, clientver, connections, scale, readonly, jobs, tps, latency, ...
The header row can be optional
--
Fabien.
pá 7. 5. 2021 v 9:46 odesílatel Fabien COELHO <fabien.coelho@mines-paristech.fr> napsal:
Hello,
>>> When I run pgbench, I usually work with more releases together, so the
>>> server version is important info.
>>
>> Ok. Yes.
Here is a putative patch for this simple part.
+1
>>> What about CSV? Any run can produce one row.
>>
>> Yep, CSV is simple and nice. It depends on what information you would
>> like. For instance for progress report (-P 1) or logs/sampling (-l) would
>> be relevant candidates for CSV. Not so much for the final report, though.
>
> I think so there can be almost all information. We have to ensure
> consistency of columns.
>
> The basic usage can be
>
> for ....
> do
> pg_bench ... >> logfile
> done
>
> and log file can looks like
>
> start time, rowno, serverver, clientver, connections, scale, readonly,
> jobs, tps, latency, ...
>
> The header row can be optional
Hmmm. It is less clear how to do that with minimal code impact on the
code, as some options which change the report structure, eg when using
multiple scripts (-b/-f) or having detailed per-op informations (-r), as
show below:
sh> pgbench -P 1 -T 10 -M prepared -c 2 -b se@9 -b si -r
starting vacuum...end.
progress: 1.0 s, 10666.9 tps, lat 0.186 ms stddev 0.454
progress: 2.0 s, 9928.0 tps, lat 0.201 ms stddev 0.466
progress: 3.0 s, 10314.8 tps, lat 0.193 ms stddev 0.469
progress: 4.0 s, 10042.7 tps, lat 0.198 ms stddev 0.466
progress: 5.0 s, 11084.3 tps, lat 0.180 ms stddev 0.408
progress: 6.0 s, 9804.1 tps, lat 0.203 ms stddev 0.474
progress: 7.0 s, 10271.5 tps, lat 0.194 ms stddev 0.463
progress: 8.0 s, 10511.5 tps, lat 0.190 ms stddev 0.424
progress: 9.0 s, 10005.7 tps, lat 0.199 ms stddev 0.501
progress: 10.0 s, 10512.4 tps, lat 0.190 ms stddev 0.428
pgbench (PostgreSQL) 14.0
server version: 13.2
transaction type: multiple scripts
scaling factor: 1
query mode: prepared
number of clients: 2
number of threads: 1
duration: 10 s
number of transactions actually processed: 103144
latency average = 0.193 ms
latency stddev = 0.455 ms
initial connection time = 5.043 ms
tps = 10319.361549 (without initial connection time)
SQL script 1: <builtin: select only>
- weight: 9 (targets 90.0% of total)
- 92654 transactions (89.8% of total, tps = 9269.856947)
- latency average = 0.052 ms
- latency stddev = 0.018 ms
- statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
0.052 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
SQL script 2: <builtin: simple update>
- weight: 1 (targets 10.0% of total)
- 10490 transactions (10.2% of total, tps = 1049.504602)
- latency average = 1.436 ms
- latency stddev = 0.562 ms
- statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.027 BEGIN;
0.065 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.045 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.048 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.249 END
The nature of columns would change depending on options, eg "initial
connection time" does not make sense under -C, so that under a loop around
pgbench scenario rows would not necessarily be consistent…
Also, I'm not sure whether such a report can/should include all inputs
options.
Finally it is unclear how to add such a feature with minimal impact on the
source code.
It is a question if this is possible without more changes or without compatibility break :( Probably not. All output should be centralized.
What I usually do is to put each pgbench run output in a separate file and
write a small shell/perl/python script to process these, possibly
generating CSV on the way.
The goal of my proposal was a reduction of necessity to write auxiliary scripts. The produced document should not be "nice", but should be very easy to import it to some analytical tools.
There is an analogy with Postgres's CSV logs. It is the same. We can see the result of pgbench like some log.
Pavel
--
Fabien.
>> Finally it is unclear how to add such a feature with minimal impact on the >> source code. > > > It is a question if this is possible without more changes or without > compatibility break :( Probably not. All output should be centralized. Yes and no. For some things we could have "void report_sometype(file, name, data)" which append "data," under csv but "name = data\n" under text, but this does not work for nested data (eg -r -b/-f), which would rather require some json/yaml/whatever format which can embed a hierarchy. >> What I usually do is to put each pgbench run output in a separate file and >> write a small shell/perl/python script to process these, possibly >> generating CSV on the way. > > The goal of my proposal was a reduction of necessity to write auxiliary > scripts. The produced document should not be "nice", but should be very > easy to import it to some analytical tools. Yes, I understood that. I tend to use CSV for that, import results in pg or sqlite and analyse with SQL. > There is an analogy with Postgres's CSV logs. It is the same. We can see > the result of pgbench like some log. Sure, but this works for simple flat data, not changing structures. -- Fabien.
pá 7. 5. 2021 v 11:28 odesílatel Fabien COELHO <coelho@cri.ensmp.fr> napsal:
>> Finally it is unclear how to add such a feature with minimal impact on the
>> source code.
>
>
> It is a question if this is possible without more changes or without
> compatibility break :( Probably not. All output should be centralized.
Yes and no.
For some things we could have "void report_sometype(file, name, data)"
which append "data," under csv but "name = data\n" under text, but this
does not work for nested data (eg -r -b/-f), which would rather require
some json/yaml/whatever format which can embed a hierarchy.
It can work with nested data too, but the result should be denormalized.
>> What I usually do is to put each pgbench run output in a separate file and
>> write a small shell/perl/python script to process these, possibly
>> generating CSV on the way.
>
> The goal of my proposal was a reduction of necessity to write auxiliary
> scripts. The produced document should not be "nice", but should be very
> easy to import it to some analytical tools.
Yes, I understood that. I tend to use CSV for that, import results in pg
or sqlite and analyse with SQL.
> There is an analogy with Postgres's CSV logs. It is the same. We can see
> the result of pgbench like some log.
Sure, but this works for simple flat data, not changing structures.
Denormalized tables are common. Although it can be ugly, it should work.
--
Fabien.