Re: pgbench - exclude pthread_create() from connection start timing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - exclude pthread_create() from connection start timing
Date
Msg-id alpine.DEB.2.02.1309260852540.29589@sto
In response to Re: pgbench - exclude pthread_create() from connection start timing  (Noah Misch <noah@leadboat.com>)
Responses Re: pgbench - exclude pthread_create() from connection start timing
List pgsql-hackers
>> pgbench changes, when adding the throttling stuff. Having the start time
>> taken when the thread really starts is just sanity, and I needed that
>> just to rule out that it was the source of the "strange" measures.
>
> I don't get it; why is taking the time just after pthread_create() more sane
> than taking it just before pthread_create()?

Thread creation seems to be expensive as well, maybe up to 0.1 seconds
under some conditions (?). Under --rate, this creation delay means that
throttling lags behind schedule by about that much time, so all the first
transactions are trying to catch up.
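
To put a number on that catch-up effect, here is a minimal sketch. It
assumes evenly spaced target times (the real --rate schedule is
Poisson-distributed, so this is only an approximation), and the helper
name is hypothetical, not something in pgbench.c. It counts how many
whole inter-transaction intervals have already elapsed when the thread
finally starts running:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not pgbench code: given a target rate and the delay
 * between taking the reference start time and the thread actually running,
 * return how many whole scheduling intervals the thread is already behind.
 * Assumes evenly spaced transactions, which simplifies pgbench's schedule.
 */
static int64_t
schedule_deficit(int64_t rate_tps, int64_t startup_delay_us)
{
    int64_t interval_us;

    assert(rate_tps > 0 && rate_tps <= 1000000);
    assert(startup_delay_us >= 0);

    /* target spacing between two transaction starts, in microseconds */
    interval_us = 1000000 / rate_tps;

    return startup_delay_us / interval_us;
}
```

With a target of 100 tps and a 0.1 second creation delay,
schedule_deficit(100, 100000) gives 10: the thread starts about ten
transactions late and runs them back to back until it is on schedule
again.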

> typically far more expensive than pthread_create().  The patch for threaded
> pgbench made the decision to account for pthread_create() as though it were
> part of establishing the connection.  You're proposing to not account for it
> at all.  Both of those designs are reasonable to me, but I do not comprehend the
> benefit you anticipate from switching from one to the other.
>
>> -j 800 vs -j 100 : ISTM that if you create more threads, the time delay
>> incurred is cumulative, so the strangeness of the result should worsen.
>
> Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
> calling pthread_create().  However, thread 0 is a special case; we set its
> start time first and actually start it last.  Your observation of cumulative
> delay fits those facts.

Yep, that must be thread 0, which has a very large delay. I think it is
simpler for each thread to record its start time once it has started,
without exception.

>  Initializing the thread-0 start time later, just before calling its 
> threadRun(), should clear this anomaly without changing other aspects of 
> the measurement.

Always taking the thread start time when the thread is started does solve 
the issue as well, and it is homogeneous for all cases, so the solution I 
suggest seems reasonable and simple.

> While pondering this area of the code, it occurs to me -- shouldn't we 
> initialize the throttle rate trigger later, after establishing 
> connections and sending startup queries?  As it stands, we build up a 
> schedule deficit during those tasks.  Was that deliberate?

In principle, I agree with you.

The connection creation time is another matter, but it depends on the
options set. Under some options the connection is opened and closed for
every transaction, so there is no point in excluding it from the measure or
from the scheduling, and I want to avoid having to distinguish those cases.
Moreover, ISTM that one of the threads reuses the existing connection while
the others recreate it. So I left it "as is".

-- 
Fabien.


