From: Fabien COELHO
Subject: Re: PATCH: pgbench - merging transaction logs
Msg-id: alpine.DEB.2.10.1503201327170.12124@sto
In response to: Re: PATCH: pgbench - merging transaction logs (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hello Robert,

>> The fprintf we are talking about occurs at most once per pgbench
>> transaction, possibly much less when aggregation is activated, and this
>> transaction involves networks exchanges and possibly disk writes on the
>> server.
>
> random() was occurring four times per transaction rather than once,
> but OTOH I think fprintf() is probably a much heavier-weight
> operation.

Yes, sure.

My point is that if there are many threads and tremendous TPS, the 
*detailed* per-transaction log (aka simple log) is probably a bad choice 
anyway, and the aggregated version is the way to go.

Note that even without a mutex, fprintf may be considered a "heavy" 
function that could slow down the transaction rate significantly. That 
could be tested as well.

It is possible to reduce the time spent under the lock by preparing the 
string beforehand (which would mean introducing buffers) and doing just a 
"fputs" under the mutex. That would not reduce the total formatting time, 
though, and it may add malloc/free operations.

> The way to know if there's a real problem here is to test it, but I'd be 
> pretty surprised if there isn't.

Indeed, I think I can contrive a simple example where it is, basically a 
more or less empty or read-only transaction (e.g. SELECT 1).

My opinion is that there is a trade-off between code simplicity and later 
maintenance on one side and feature benefit on the other.

If threads are assumed and fprintf is used, the feature is much simpler to 
implement and the maintenance is lighter. The alternative implementation 
means reparsing the generated files over and over in order to merge their 
contents.

Also, I do not think that the detailed log provides much benefit with very 
fast transactions, where the aggregate is probably a much better choice 
anyway. If the user persists, she may generate per-thread logs and merge 
them later, in which case a merge script is needed, but I do not think 
that would be a bad thing.

Obviously, all that is only my opinion and is quite debatable.

-- 
Fabien.


