Hello,
>> The counters are updated when the transaction is finished anyway?
>
> Yes, but the thread does not know it's time to write the results until
> it completes the first transaction after the interval ends ...
>
> Let's say the very first query in thread #1 takes a minute for some
> reason, while the other threads process 100 transactions per second. So
> before the thread #1 can report 0 transactions for the first second, the
> other threads already have results for the 60 intervals.
>
> I think there's no way to make this work except for somehow tracking
> timestamp of the last submitted results for each thread, and only
> flushing results older than the minimum of the timestamps. But that's
> not trivial - it certainly is more complicated than just writing into a
> shared file descriptor.
I agree that such an approach would be horrible, and for very limited
value. However, I was suggesting that a transaction is counted only when it
is finished, so the aggregated data is to be understood as referring to
"finished transactions in the interval", and what is still in progress would
be counted in the next interval anyway.
> Merging results for each transaction would not have this issue, but it
> would also use the lock much more frequently, and that seems like a
> pretty bad idea (especially for the workloads with short transactions
> that you suggested are bad match for detailed log - this would make the
> aggregated log bad too).
>
> Also notice that all the threads will try to merge the data (and
> thus acquire the lock) at almost the same time - this is especially true
> for very short transactions. I would be surprised if this did not cause
> issues on read-only workloads with large numbers of threads.
ISTM that the aggregated version should fare better than the detailed log,
whatever is done: the key performance issue is that fprintf is slow, and
with the aggregated log these calls are infrequent, so only arithmetic
remains in the critical section.
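As an illustration of "only arithmetic in the critical section", here is a
sketch of merging a thread-local aggregate into a shared one under a pthread
mutex. The struct and function names (AggStats, SharedAgg, agg_merge,
agg_snapshot_reset) are hypothetical, not pgbench's; the point is that the lock
protects a handful of additions and comparisons, while any fprintf happens
elsewhere on a private copy.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregate statistics for one interval. */
typedef struct
{
    int64_t cnt;
    double  sum_latency;
    double  sum2_latency; /* sum of squared latencies, for stddev */
    double  min_latency;
    double  max_latency;
} AggStats;

/* Hypothetical shared aggregate, protected by a mutex. */
typedef struct
{
    pthread_mutex_t lock;
    AggStats        stats;
} SharedAgg;

static void
agg_stats_init(AggStats *s)
{
    s->cnt = 0;
    s->sum_latency = s->sum2_latency = 0.0;
    s->min_latency = s->max_latency = 0.0;
}

/* Merge a thread-local aggregate: no I/O while the lock is held. */
static void
agg_merge(SharedAgg *shared, const AggStats *local)
{
    if (local->cnt == 0)
        return;

    pthread_mutex_lock(&shared->lock);
    AggStats *s = &shared->stats;

    if (s->cnt == 0 || local->min_latency < s->min_latency)
        s->min_latency = local->min_latency;
    if (s->cnt == 0 || local->max_latency > s->max_latency)
        s->max_latency = local->max_latency;
    s->cnt += local->cnt;
    s->sum_latency += local->sum_latency;
    s->sum2_latency += local->sum2_latency;
    pthread_mutex_unlock(&shared->lock);
}

/* Copy out and reset the shared aggregate; the caller prints the copy
 * (e.g. with fprintf) after the lock has been released. */
static void
agg_snapshot_reset(SharedAgg *shared, AggStats *out)
{
    pthread_mutex_lock(&shared->lock);
    *out = shared->stats;
    agg_stats_init(&shared->stats);
    pthread_mutex_unlock(&shared->lock);
}
```

Whoever writes an interval would call agg_snapshot_reset once per interval and
format the snapshot outside the lock, so the slow fprintf never runs while
other threads wait.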
>>>> (2) The feature would not be available for the thread-emulation with
>>>> this approach, but I do not see this as a particular issue as I
>>>> think that it is pretty much only dead code and a maintenance burden.
>>>
>>> I'm not willing to investigate that, nor am I willing to implement
>>> another feature that works only sometimes (I've done that in the past,
>>> and I find it a bad practice).
>
> [...]
After the small discussion I triggered, I've submitted a patch to drop
thread fork-emulation from pgbench.
> [...]
> Also, if the lock for the shared buffer is cheaper than the lock
> required for fprintf, it may still be an improvement.
Yep. "fprintf" does a lot of processing, so it is the main issue.
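For the shared-buffer variant, a possible sketch (hypothetical names,
SharedLogBuf and log_append, with an arbitrary buffer size) is to do the
expensive formatting with snprintf into a thread-local line, and hold the lock
only for the memcpy into the shared buffer. Flushing the buffer to the file is
left to the caller and is not shown.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define SHARED_BUF_SIZE 8192 /* arbitrary size, for illustration only */

/* Hypothetical shared log buffer, protected by a mutex. */
typedef struct
{
    pthread_mutex_t lock;
    size_t          len;                   /* bytes currently used */
    char            data[SHARED_BUF_SIZE]; /* not NUL-terminated */
} SharedLogBuf;

/*
 * Format one line without holding the lock, then append it to the shared
 * buffer: only a bounds check and a memcpy happen under the lock.
 * Returns 0 on success, -1 if the line is too long or the buffer is full
 * (the caller would then flush the buffer and retry).
 */
static int
log_append(SharedLogBuf *buf, long long ts, long long cnt, double latency)
{
    char line[128];
    int  n = snprintf(line, sizeof(line), "%lld %lld %.3f\n", ts, cnt, latency);
    int  rc = -1;

    if (n < 0 || (size_t) n >= sizeof(line))
        return -1;

    pthread_mutex_lock(&buf->lock);
    if (buf->len + (size_t) n <= sizeof(buf->data))
    {
        memcpy(buf->data + buf->len, line, (size_t) n);
        buf->len += (size_t) n;
        rc = 0;
    }
    pthread_mutex_unlock(&buf->lock);
    return rc;
}
```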
--
Fabien.