Re: [PATCH] pgbench log file headers - Mailing list pgsql-hackers

From Andrew Atkinson
Subject Re: [PATCH] pgbench log file headers
Msg-id CAG6XLEmAc4b5ZNU3P44RTd_4TMBH=8txF0=dU0eRRJ0Cg0YkRQ@mail.gmail.com
In response to [PATCH] pgbench log file headers  (Adam Hendel <adam@tembo.io>)
List pgsql-hackers
Hi Adam. Column headers in pgbench log files seem helpful, not only for programs but also for humans trying to understand the column data. I was able to apply your patch and verify that the headers are added to the log file:

andy@MacBook-Air-4 ~/P/postgres (master)> rm pgbench_log.*
andy@MacBook-Air-4 ~/P/postgres (master)> src/bin/pgbench/pgbench postgres://andy:@localhost:5432/postgres --log --log-header
pgbench (17devel)
....

andy@MacBook-Air-4 ~/P/postgres (master)> cat pgbench_log.*
client_id transaction_no time script_no time_epoch time_us
0 1 8435 0 1699902315 902700
0 2 1130 0 1699902315 903973
...



The generated pgbench_log.62387 log file showed headers "client_id transaction_no time script_no time_epoch time_us". Hope that helps with your patch acceptance journey.


Good luck!


Andrew Atkinson


On Mon, Nov 13, 2023 at 11:55 AM Adam Hendel <adam@tembo.io> wrote:

Hello Hackers!

Currently, pgbench logs individual transactions to a log file when the `--log` flag is provided. The log file, however, does not include column headers. Users now commonly expect flat files to include column headers; without them, new users must go to the docs and piece the column names together themselves. Meanwhile, most widely used data tools can read column headers out of the box, for example pandas' read_csv() in Python.
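To illustrate what a header line buys a consumer, here is a minimal sketch (not part of the patch) that parses a headered pgbench log into dicts keyed by column name. With pandas, `pandas.read_csv(path, sep=" ")` would likewise take the names from the first line; this sketch sticks to the standard library's `csv` module so it runs without third-party packages. The sample rows are the ones from the example output in this message; `read_pgbench_log` is a hypothetical helper name.

```python
import csv
import io

# Header line as written by --log-header, followed by the two sample rows
# from the example output in this message.
SAMPLE_LOG = """client_id transaction_no time script_no time_epoch time_us
0 1 1863 0 1699555588 791102
0 2 706 0 1699555588 791812
"""


def read_pgbench_log(text):
    """Hypothetical helper: parse a space-separated pgbench log with a header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=" ")
    # All columns in this sample are integers. Logs written with
    # --latency-limit may contain "skipped" in the time column, which this
    # sketch does not handle.
    return [{name: int(value) for name, value in row.items()} for row in reader]


rows = read_pgbench_log(SAMPLE_LOG)
for row in rows:
    # Columns are addressed by name instead of by position.
    print(row["client_id"], row["transaction_no"], row["time"])
```

Without the header line, the same code would need the column names hard-coded from the documentation.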

We can improve the experience for users by adding column headers to pgbench log files behind an optional command-line flag, `--log-header`. Making users opt in to the column headers keeps the interface backwards compatible. It also follows pgbench's existing pattern of options that only apply together with another option: if this work is accepted, `--log` would have two such companions, `--log-prefix` and `--log-header`.

The implementation writes the column headers only when the `--log-header` flag is present. The column names are taken directly from the “Per-Transaction Logging” section of https://www.postgresql.org/docs/current/pgbench.html, and they account for the conditional columns `schedule_lag` and `retries`.
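The conditional-column logic can be sketched as follows. This is an illustration in Python, not the patch's C code, and `log_header` is a hypothetical name. Per the pgbench documentation, the line format is `client_id transaction_no time script_no time_epoch time_us [ schedule_lag ] [ retries ]`, where `schedule_lag` is present only when `--rate` is used and `retries` only when `--max-tries` is not 1 (its default).

```python
def log_header(rate_given: bool, max_tries: int = 1) -> str:
    """Hypothetical helper: build the header line for a per-transaction log."""
    # Columns that are always present, in documented order.
    columns = ["client_id", "transaction_no", "time", "script_no",
               "time_epoch", "time_us"]
    # schedule_lag is only logged when throttling is enabled with --rate.
    if rate_given:
        columns.append("schedule_lag")
    # retries is only logged when --max-tries is not 1 (0 means unlimited).
    if max_tries != 1:
        columns.append("retries")
    return " ".join(columns)


print(log_header(rate_given=False))
print(log_header(rate_given=True, max_tries=3))
```

Keeping the header and the row writer driven by the same two conditions is what keeps the header's column count in step with each data row.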


Below is an example of what that logfile will look like:


pgbench postgres://postgres:postgres@localhost:5432/postgres --log --log-header

client_id transaction_no time script_no time_epoch time_us
0 1 1863 0 1699555588 791102
0 2 706 0 1699555588 791812


If the interface and overall approach make sense, I will work on adding documentation and tests for this too.

Respectfully,

Adam Hendel

