Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers

From Greg Smith
Subject Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Date
Msg-id 51CF6999.4060103@2ndQuadrant.com
In response to Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On 6/22/13 12:54 PM, Fabien COELHO wrote:
> After some poking around, and pursuing various red herrings, I resorted
> to measure the delay for calling "PQfinish()", which is really the only
> special thing going around at the end of pgbench run...

This wasn't what I was seeing, but it's related.  I've proved to myself 
the throttle change isn't responsible for the weird stuff I'm seeing now. 
I'd like to rearrange when PQfinish happens now, based on what I'm 
seeing, but that's not related to this review.

I duplicated the PQfinish problem you found too.  On my Linux system, 
calls to PQfinish are normally about 36 us long.  They will sometimes 
get lost for >15ms before they return.  That's a different problem 
though, because the ones I'm seeing on my Mac are sometimes >150ms. 
PQfinish never takes quite that long.

PQfinish doesn't pause for a long time on this platform.  But it does 
*something* that causes socket select() polling to stutter.  I have 
instrumented everything interesting in this part of the pgbench code, 
and here is the problem event.

1372531862.062236 select with no timeout sleeping=0
1372531862.109111 select returned 6 sockets latency 46875 us

Here select() is called with 0 sleeping processes, 11 that are done, and 
14 that are running.  The running ones have all sent SELECT statements 
to the server, and they are waiting for a response.  Some of them 
received some data from the server, but they haven't gotten the entire 
response back.  (The PQfinish calls could be involved in how that happened)
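The "select with no timeout sleeping=0" line reflects a common event-loop rule. The timeout passed to select() comes from the earliest wake-up time among throttled (sleeping) clients. With none sleeping there is no deadline, so select() blocks until some socket becomes readable. Below is a minimal sketch of that rule, assuming a hypothetical `ClientState` that stores each client's wake-up time in microseconds. It illustrates the idea and does not reproduce pgbench's actual loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical per-client state, loosely modeled on the description above. */
typedef struct
{
    int         sleeping;   /* throttled, waiting for its scheduled start */
    int64_t     wake_us;    /* scheduled wake-up time, in microseconds */
} ClientState;

/*
 * Fill *tv with the time remaining until the earliest sleeping client must
 * wake up and return tv.  Return NULL when no client is sleeping, meaning
 * select() should block until some socket becomes readable.
 */
static struct timeval *
select_timeout(const ClientState *clients, int nclients, int64_t now_us,
               struct timeval *tv)
{
    int         found = 0;
    int64_t     min_wake = 0;
    int         i;

    assert(nclients >= 0);
    for (i = 0; i < nclients; i++)
    {
        if (clients[i].sleeping && (!found || clients[i].wake_us < min_wake))
        {
            min_wake = clients[i].wake_us;
            found = 1;
        }
    }

    if (!found)
        return NULL;            /* sleeping=0: "select with no timeout" */

    if (min_wake < now_us)
        min_wake = now_us;      /* already overdue: poll without waiting */

    tv->tv_sec = (time_t) ((min_wake - now_us) / 1000000);
    tv->tv_usec = (suseconds_t) ((min_wake - now_us) % 1000000);
    return tv;
}
```

With sleeping=0, as in the event above, this returns NULL. Any stall below pgbench, in libpq or the OS, then shows up directly as select() latency.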

With that setup, select runs for 47 *ms* before it gets the next byte to 
a client.  During that time 6 clients get responses back, but select 
stays stuck in there for a long time anyway.  Why?  I don't know exactly
why, but I am sure that pgbench isn't doing anything weird.  It's either 
libpq acting funny, or the OS.  When pgbench is waiting on a set of 
sockets, and none of them are returning anything, that's interesting. 
But there's nothing pgbench can do about it.
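The latency figures in the trace come from timestamping both sides of the select() call. Below is a minimal sketch of that kind of instrumentation, assuming a POSIX system. `elapsed_us` and `timed_select` are hypothetical helpers for illustration, not the instrumentation actually used here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Microseconds between two gettimeofday() readings.  This is wall-clock
 * time, so it can misbehave if the system clock is stepped mid-measurement.
 */
static long long
elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long long) (end->tv_sec - start->tv_sec) * 1000000LL
        + (end->tv_usec - start->tv_usec);
}

/*
 * Hypothetical wrapper: log a timestamp, call select() on the read set,
 * then log how many sockets became ready and how long the call took.
 * A NULL timeout makes select() block until some socket is readable.
 */
static int
timed_select(int maxfd, fd_set *readfds, struct timeval *timeout,
             int sleeping)
{
    struct timeval before, after;
    int         nready;

    assert(readfds != NULL);
    gettimeofday(&before, NULL);
    printf("%ld.%06ld select with %s sleeping=%d\n",
           (long) before.tv_sec, (long) before.tv_usec,
           timeout ? "timeout" : "no timeout", sleeping);

    nready = select(maxfd + 1, readfds, NULL, NULL, timeout);

    gettimeofday(&after, NULL);
    printf("%ld.%06ld select returned %d sockets latency %lld us\n",
           (long) after.tv_sec, (long) after.tv_usec,
           nready, elapsed_us(&before, &after));
    return nready;
}
```

Subtracting the two timestamps in the trace (1372531862.109111 and 1372531862.062236) gives 46875 us, the roughly 47 ms stall described here.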

The cause/effect here is that the randomness in the throttling code 
spreads out the times at which the connections end a bit.  There are more 
times during which you might have 20 connections finished while 5 still run.
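One way to see why the end of the run spreads out rests on two assumptions. First, the throttle schedules each client's transactions as a Poisson process with rate $\lambda$ per client, which gives exponentially distributed gaps. Second, each client runs a fixed number $n$ of transactions, as with `-t`. Under those assumptions, the scheduled gaps and a client's total schedule length $T_n$ behave like this:

$$
D_i = -\frac{\ln U_i}{\lambda}, \quad U_i \sim \mathrm{Uniform}(0, 1], \qquad
\mathbb{E}[D_i] = \frac{1}{\lambda}, \quad \operatorname{Var}(D_i) = \frac{1}{\lambda^2}
$$

$$
T_n = \sum_{i=1}^{n} D_i, \qquad
\mathbb{E}[T_n] = \frac{n}{\lambda}, \qquad
\sigma(T_n) = \frac{\sqrt{n}}{\lambda}
$$

So two clients doing the same amount of work can finish on the order of $\sqrt{n}/\lambda$ apart. That widens the window in which some connections are done while others are still running.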

I need to catch up with revisions done to this feature since I started 
instrumenting my copy more heavily.  I hope I can get this ready for 
commit by Monday.  I've certainly beaten on the feature for long enough now.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


