Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) |
Date | |
Msg-id | 51CF6999.4060103@2ndQuadrant.com |
In response to | Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) |
List | pgsql-hackers |
On 6/22/13 12:54 PM, Fabien COELHO wrote:
> After some poking around, and pursuing various red herrings, I resorted
> to measure the delay for calling "PQfinish()", which is really the only
> special thing going around at the end of pgbench run...

This wasn't what I was seeing, but it's related. I've proved to myself the throttle change isn't responsible for the weird stuff I'm seeing now. I'd like to rearrange when PQfinish happens now based on what I'm seeing, but that's not related to this review.

I duplicated the PQfinish problem you found too. On my Linux system, calls to PQfinish are normally about 36 us long. They will sometimes get lost for >15ms before they return. That's a different problem though, because the stalls I'm seeing on my Mac are sometimes >150ms, and PQfinish never takes quite that long.

PQfinish doesn't pause for a long time on this platform. But it does *something* that causes socket select() polling to stutter. I have instrumented everything interesting in this part of the pgbench code, and here is the problem event.

1372531862.062236 select with no timeout sleeping=0
1372531862.109111 select returned 6 sockets latency 46875 us

Here select() is called with 0 sleeping processes, 11 that are done, and 14 that are running. The running ones have all sent SELECT statements to the server, and they are waiting for a response. Some of them received some data from the server, but they haven't gotten the entire response back. (The PQfinish calls could be involved in how that happened.)

With that setup, select runs for 47 *ms* before it gets the next byte to a client. During that time 6 clients get responses back, but it stays stuck in there for a long time anyway. Why? I don't know exactly why, but I am sure that pgbench isn't doing anything weird. It's either libpq acting funny, or the OS. When pgbench is waiting on a set of sockets, and none of them are returning anything, that's interesting. But there's nothing pgbench can do about it.
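As an illustration of the kind of measurement described above, here is a minimal sketch in Python rather than pgbench's C. It is not the instrumentation actually added to pgbench. The helper names `timed_select` and `latency_from_log` are hypothetical, and the log parser assumes timestamps of the form seconds.microseconds with exactly six fractional digits, as in the two lines quoted above.

```python
import select
import socket
import time


def timed_select(read_socks, timeout=None):
    """Wait on read_socks with select() and report how long the call took.

    Returns (ready_sockets, latency_us). Hypothetical helper, not pgbench code.
    """
    start = time.monotonic()
    ready, _, _ = select.select(read_socks, [], [], timeout)
    latency_us = int((time.monotonic() - start) * 1_000_000)
    return ready, latency_us


def latency_from_log(before, after):
    """Microseconds between two 'seconds.microseconds' timestamp strings.

    Integer arithmetic avoids float rounding on large epoch values.
    Assumes exactly six digits after the decimal point.
    """
    def to_us(stamp):
        secs, micros = stamp.split(".")
        return int(secs) * 1_000_000 + int(micros)

    return to_us(after) - to_us(before)
```

Applied to the two logged timestamps, `latency_from_log("1372531862.062236", "1372531862.109111")` gives 46875 us, which matches the latency printed in the second log line and the roughly 47 ms quoted above.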
The cause/effect here is that the randomness in the throttling code spreads out a bit the times at which the connections end. There are more times during which you might have 20 connections finished while 5 still run.

I need to catch up with revisions done to this feature since I started instrumenting my copy more heavily. I hope I can get this ready for commit by Monday. I've certainly beaten on the feature for long enough now.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com