Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers

From Greg Smith
Subject Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Msg-id 51BB56EE.4030405@2ndQuadrant.com
In response to Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
I don't have this resolved yet, but I think I've identified the cause. 
I'm updating here mainly so Fabien doesn't duplicate my work trying to 
track this down.  Now that I've gotten this far, I'm going to keep 
banging at this until it's resolved.

Here's a slow transaction:

1371226017.568515 client 1 executing \set naccounts 100000 * :scale
1371226017.568537 client 1 throttling 6191 us
1371226017.747858 client 1 executing \setrandom aid 1 :naccounts
1371226017.747872 client 1 sending SELECT abalance FROM pgbench_accounts WHERE aid = 268721;
1371226017.789816 client 1 receiving

That confirms it is getting stuck at the "throttling" step: the client 
asked to throttle for 6191 us (about 6 ms), but its next step wasn't 
logged until about 179 ms later.  It looks like the code stalls there 
because it overloads the "sleeping" state that already existed in 
pgbench but handles it specially inside doCustom(), and that doesn't 
always work.
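The size of the stall can be read straight off the timestamps.  Here is 
a small C helper that subtracts the requested delay from the time that 
actually elapsed before the client's next logged step.  The name 
throttle_overshoot_us is made up for illustration; it is not anything 
in pgbench.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of pgbench): given the epoch timestamp, in
 * seconds, of a "client N throttling X us" debug line, the requested delay
 * X in microseconds, and the timestamp of that client's next logged step,
 * return how many microseconds longer than requested the client waited.
 */
static double throttle_overshoot_us(double throttle_ts, long requested_us,
                                    double next_ts)
{
    assert(next_ts >= throttle_ts);                   /* log lines must be in order */
    double actual_us = (next_ts - throttle_ts) * 1e6; /* elapsed wall time in us */
    return actual_us - (double) requested_us;
}
```

Fed the values from the log above (1371226017.568537, 6191, 
1371226017.747858), it gives roughly 173130 us, so the client overslept 
by about 173 ms on a 6 ms throttle.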

The problem is that pgbench doesn't always stay inside doCustom() while 
a client sleeps.  It returns from there to poll for incoming results, 
via select() on the clients' sockets.  It's not safe to assume 
doCustom() will be called regularly; that's only true while clients 
keep receiving messages.

So as long as other clients keep generating traffic on their sockets, 
doCustom() is called regularly and everything works as expected.  But 
at the end of the test run that happens less often, and that's when the 
problem shows up.
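To make the failure mode concrete, here is a toy model, not pgbench 
code (resume_time_event_driven is a hypothetical name): a throttled 
client's deadline is only rechecked when socket activity wakes the 
thread, so the client resumes at the first wakeup at or after its 
deadline.  With steady traffic that lands close to the deadline; when 
traffic thins out at the end of a run, the lag grows to the gap between 
wakeups.

```c
#include <assert.h>

/*
 * Hypothetical model of the failure, not pgbench code.  A throttled
 * client's deadline is only rechecked when doCustom() runs, and that only
 * happens when select() reports activity on some client socket.  Given the
 * times (microseconds, sorted ascending) at which socket activity wakes the
 * thread, return when the throttled client actually resumes.
 */
static long long resume_time_event_driven(long long deadline_us,
                                          const long long *wakeups,
                                          int nwakeups,
                                          long long end_of_run_us)
{
    assert(nwakeups >= 0);
    for (int i = 0; i < nwakeups; i++)
    {
        /* first wakeup at or after the deadline is the earliest moment
         * the expired throttle delay can be noticed */
        if (wakeups[i] >= deadline_us)
            return wakeups[i];
    }
    /* no further socket activity: nothing wakes the client before the
     * run winds down */
    return end_of_run_us;
}
```

With illustrative wakeups at 6000 and 6500 us and a 6191 us deadline, 
the client resumes at 6500 us; if the only wakeups are at 1000 and 
179321 us, it resumes at 179321 us, which is the shape of the slow 
transaction in the log above.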

pgbench already has a "\sleep" command, and that delay is handled 
inside threadRun() instead.  The rate limit throttle's pause needs to be 
handled in the same place.  I still have to redo a few things to confirm 
this actually fixes the issue, and to look at Fabien's later updates, 
which came in while I wandered off debugging.  I'm sure the problem is 
in the area of code I'm poking at now, though.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


