Re: PATCH: pgbench - merging transaction logs - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: PATCH: pgbench - merging transaction logs
Date
Msg-id alpine.DEB.2.10.1505020916130.8161@sto
In response to Re: PATCH: pgbench - merging transaction logs  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: pgbench - merging transaction logs
List pgsql-hackers
Hello,

>> The counters are updated when the transaction is finished anyway?
>
> Yes, but the thread does not know it's time to write the results until
> it completes the first transaction after the interval ends ...
>
> Let's say the very first query in thread #1 takes a minute for some
> reason, while the other threads process 100 transactions per second. So
> before the thread #1 can report 0 transactions for the first second, the
> other threads already have results for the 60 intervals.
>
> I think there's no way to make this work except for somehow tracking
> timestamp of the last submitted results for each thread, and only
> flushing results older than the minimum of the timestamps. But that's
> not trivial - it certainly is more complicated than just writing into a
> shared file descriptor.

I agree that such an approach would be horrible for very limited value.
However, I was suggesting that a transaction is counted only when it is
finished, so the aggregated data is to be understood as referring to
"finished transactions in the interval", and whatever is in progress would
be counted in the next interval anyway.
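
To illustrate the counting rule described above, here is a minimal sketch in
C. It assumes a per-thread aggregate and microsecond timestamps; the `Agg`
struct, the `agg_record` function and the output format are hypothetical and
are not taken from pgbench. A transaction is attributed to the interval in
which it finishes, and elapsed intervals are closed only when a finished
transaction is recorded.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-interval aggregate (not pgbench's actual struct). */
typedef struct
{
    int64_t start_us;    /* start of the interval being accumulated */
    int64_t count;       /* transactions finished in that interval */
    double  sum_lat_us;  /* sum of their latencies, in microseconds */
} Agg;

/*
 * Record one finished transaction. Every interval that ended before
 * finish_us is written to "out" (if not NULL) and closed first, so the
 * transaction lands in the interval that contains its completion time.
 * Returns the number of intervals closed by this call.
 */
static int
agg_record(Agg *agg, int64_t interval_us, int64_t finish_us,
           double latency_us, FILE *out)
{
    int closed = 0;

    assert(interval_us > 0);

    while (finish_us >= agg->start_us + interval_us)
    {
        if (out != NULL)
            fprintf(out, "%" PRId64 " %" PRId64 " %.0f\n",
                    agg->start_us, agg->count, agg->sum_lat_us);
        agg->start_us += interval_us;
        agg->count = 0;
        agg->sum_lat_us = 0.0;
        closed++;
    }

    agg->count++;
    agg->sum_lat_us += latency_us;
    return closed;
}
```

With this rule a long-running transaction simply does not appear in any
interval until it completes, so no thread has to report on work that is
still in progress.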

> Merging results for each transaction would not have this issue, but it
> would also use the lock much more frequently, and that seems like a
> pretty bad idea (especially for the workloads with short transactions
> that you suggested are bad match for detailed log - this would make the
> aggregated log bad too).
>
> Also notice that all the threads will try to merge the data (and
> thus acquire the lock) at almost the same time - this is especially true
> for very short transactions. I would be surprised if this did not cause
> issues on read-only workloads with large numbers of threads.

ISTM that the aggregated version should fare better than the detailed log,
whatever is done: the key performance issue is that fprintf is slow. With
the aggregated log, fprintf calls are infrequent, and only arithmetic
remains in the critical section.
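
The point about the critical section can be sketched as follows, assuming
POSIX threads and a single aggregate shared by all threads. The `SharedAgg`
struct and the `shared_agg_init` and `shared_agg_add` functions are
hypothetical names for illustration, not pgbench code. Under the lock there
are only a comparison and a few additions; the fprintf call happens at most
once per interval and outside the lock.

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical aggregate shared by all threads (not pgbench's struct). */
typedef struct
{
    pthread_mutex_t lock;
    int64_t start_us;     /* start of the interval being accumulated */
    int64_t interval_us;  /* interval length */
    int64_t count;        /* transactions finished in the interval */
    double  sum_lat_us;   /* sum of their latencies */
} SharedAgg;

static void
shared_agg_init(SharedAgg *s, int64_t start_us, int64_t interval_us)
{
    pthread_mutex_init(&s->lock, NULL);
    s->start_us = start_us;
    s->interval_us = interval_us;
    s->count = 0;
    s->sum_lat_us = 0.0;
}

/*
 * Merge one finished transaction into the shared aggregate. Returns 1 if
 * this call closed an interval, 0 otherwise. Simplifications assumed for
 * this sketch: empty intervals are skipped rather than reported, and lines
 * for consecutive intervals could be written out of order because the
 * fprintf is done after the lock is released.
 */
static int
shared_agg_add(SharedAgg *s, int64_t finish_us, double latency_us, FILE *out)
{
    int     flushed = 0;
    int64_t f_start = 0;
    int64_t f_count = 0;
    double  f_sum = 0.0;

    pthread_mutex_lock(&s->lock);
    if (finish_us >= s->start_us + s->interval_us)
    {
        /* Snapshot the finished interval; it is printed after unlocking. */
        f_start = s->start_us;
        f_count = s->count;
        f_sum = s->sum_lat_us;
        flushed = 1;

        /* Jump to the interval that contains finish_us. */
        s->start_us += ((finish_us - s->start_us) / s->interval_us)
                       * s->interval_us;
        s->count = 0;
        s->sum_lat_us = 0.0;
    }
    s->count++;
    s->sum_lat_us += latency_us;
    pthread_mutex_unlock(&s->lock);

    /* The slow formatting runs at most once per interval, lock released. */
    if (flushed && out != NULL)
        fprintf(out, "%" PRId64 " %" PRId64 " %.0f\n",
                f_start, f_count, f_sum);
    return flushed;
}
```

Whether the remaining lock contention matters for very short transactions
with many threads is exactly the open question in the quoted text; the
sketch only shows how little work needs to stay under the lock.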

>>>> (2) The feature would not be available for the thread-emulation with
>>>> this approach, but I do not see this as a particular issue as I
>>>> think that it is pretty much only dead code and a maintenance burden.
>>>
>>> I'm not willing to investigate that, nor am I willing to implement
>>> another feature that works only sometimes (I've done that in the past,
>>> and I find it a bad practice).
>
> [...]

After the small discussion I triggered, I've submitted a patch to drop 
thread fork-emulation from pgbench.

> [...]
> Also, if the lock for the shared buffer is cheaper than the lock
> required for fprintf, it may still be an improvement.

Yep. "fprintf" does a lot of processing, so it is the main issue.

-- 
Fabien.


