Re: measuring lwlock-related latency spikes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: measuring lwlock-related latency spikes
Date
Msg-id CA+TgmoaY-EPMzgtrxGUa4jua5-MX7cgdQDXO-775=uWYZPCmwg@mail.gmail.com
Whole thread Raw
In response to Re: measuring lwlock-related latency spikes  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: measuring lwlock-related latency spikes  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: measuring lwlock-related latency spikes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Apr 2, 2012 at 7:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Do you consider this proof that it can only be I/O? Or do we still
> need to find out?

I stuck a bunch more debugging instrumentation into the SLRU code.  It
was fairly clear from the previous round of instrumentation that the
problem was that there was a lot of time passing between when
SlruInternalWritePage acquires shared->buffer_locks[slotno] and when
it releases that lock; I added some additional instrumentation to (a)
confirm this and (b) further break down where the time is getting
spent.  Long story short, when a CLOG-related stall happens,
essentially all the time is being spent in this here section of code:
   /*    * If not part of Flush, need to fsync now.  We assume this happens    * infrequently enough that it's not a
performanceissue.    */   if (!fdata)   {       if (ctl->do_fsync && pg_fsync(fd))       {           slru_errcause =
SLRU_FSYNC_FAILED;          slru_errno = errno;           close(fd);           return false;       }
 
       if (close(fd))       {           slru_errcause = SLRU_CLOSE_FAILED;           slru_errno = errno;
returnfalse;       }   }
 

Here's what the debug output looks like:

2012-04-02 15:51:27 UTC [62397] LOG:  SlruPhysicalWritePage(11)
intervals: 0.005000 0.001000 0.013000 0.000000 0.073000 13162.557000
2012-04-02 15:51:27 UTC [62397] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -3060 WHERE aid = 6501332;
2012-04-02 15:51:27 UTC [62430] LOG:  lock 104: waited 13162.676 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62430] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -3692 WHERE aid = 2516692;
2012-04-02 15:51:27 UTC [62428] LOG:  lock 104: waited 13161.409 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62428] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 3281 WHERE aid = 24527957;
2012-04-02 15:51:27 UTC [62443] LOG:  lock 104: waited 13161.146 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62443] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -360 WHERE aid = 6714054;
2012-04-02 15:51:27 UTC [62436] LOG:  lock 104: waited 12094.996 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62436] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -49 WHERE aid = 4080528;
2012-04-02 15:51:27 UTC [62389] LOG:  lock 104: waited 13160.966 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62389] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -563 WHERE aid = 21896604;
2012-04-02 15:51:27 UTC [62407] LOG:  lock 104: waited 13161.034 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62407] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 1437 WHERE aid = 17185681;
2012-04-02 15:51:27 UTC [62432] LOG:  lock 104: waited 13160.983 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62432] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 4330 WHERE aid = 6289956;
2012-04-02 15:51:27 UTC [62403] LOG:  lock 104: waited 11953.875 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62403] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -4717 WHERE aid = 18829978;
2012-04-02 15:51:27 UTC [62438] LOG:  lock 104: waited 11953.987 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62438] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 1361 WHERE aid = 26274208;
2012-04-02 15:51:27 UTC [62400] LOG:  lock 104: waited 10471.223 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62400] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -2002 WHERE aid = 19209246;
2012-04-02 15:51:27 UTC [62427] LOG:  lock 104: waited 10248.041 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62427] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -874 WHERE aid = 4042895;
2012-04-02 15:51:27 UTC [62419] LOG:  lock 104: waited 13161.085 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62419] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -2874 WHERE aid = 11997038;
2012-04-02 15:51:27 UTC [62394] LOG:  lock 104: waited 10171.179 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62394] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -3855 WHERE aid = 12744804;
2012-04-02 15:51:27 UTC [62410] LOG:  lock 104: waited 10247.882 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62410] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 3643 WHERE aid = 16152613;
2012-04-02 15:51:27 UTC [62440] LOG:  lock 104: waited 10169.646 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62440] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 215 WHERE aid = 3276253;
2012-04-02 15:51:27 UTC [62423] LOG:  lock 104: waited 10170.909 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62423] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 2846 WHERE aid = 10337813;
2012-04-02 15:51:27 UTC [62431] LOG:  lock 104: waited 10279.255 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62431] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -419 WHERE aid = 1547227;
2012-04-02 15:51:27 UTC [62397] LOG:  SlruInternalWritePage(11)
intervals: 0.007000 13162.736000 0.000000 0.001000 0.257000
2012-04-02 15:51:27 UTC [62397] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -3060 WHERE aid = 6501332;
2012-04-02 15:51:27 UTC [62424] LOG:  lock 104: waited 10169.181 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62424] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 4771 WHERE aid = 6768873;
2012-04-02 15:51:27 UTC [62426] LOG:  lock 104: waited 10169.890 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62426] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 3644 WHERE aid = 13548862;
2012-04-02 15:51:27 UTC [62441] LOG:  lock 104: waited 10078.669 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62441] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -2318 WHERE aid = 3387689;
2012-04-02 15:51:27 UTC [62434] LOG:  lock 104: waited 10472.164 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62434] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 1884 WHERE aid = 11738348;
2012-04-02 15:51:27 UTC [62442] LOG:  lock 104: waited 10472.116 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62442] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 1314 WHERE aid = 1871577;
2012-04-02 15:51:27 UTC [62429] LOG:  lock 104: waited 12094.703 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62429] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 1188 WHERE aid = 21727162;
2012-04-02 15:51:27 UTC [62425] LOG:  lock 104: waited 10471.408 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62425] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -35 WHERE aid = 16735415;
2012-04-02 15:51:27 UTC [62435] LOG:  lock 104: waited 10247.717 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62435] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + 457 WHERE aid = 20513750;
2012-04-02 15:51:27 UTC [62421] LOG:  lock 104: waited 12094.718 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62421] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -1989 WHERE aid = 26993577;
2012-04-02 15:51:27 UTC [62433] LOG:  lock 104: waited 10471.231 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62433] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -4369 WHERE aid = 17652000;
2012-04-02 15:51:27 UTC [62439] LOG:  lock 104: waited 10170.579 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62439] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -2453 WHERE aid = 27871868;
2012-04-02 15:51:27 UTC [62415] LOG:  lock 104: waited 10170.867 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62415] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -3110 WHERE aid = 28608274;
2012-04-02 15:51:27 UTC [62422] LOG:  lock 104: waited 10082.338 ms at
slru.c:311 blocked by slru.c:529 spin 2
2012-04-02 15:51:27 UTC [62422] STATEMENT:  UPDATE pgbench_accounts
SET abalance = abalance + -3942 WHERE aid = 12669792;
2012-04-02 15:51:27 UTC [62437] LOG:  lock 104: waited 11955.005 ms at
slru.c:311 blocked by slru.c:529 spin 2

This particular example shows the above chunk of code taking >13s to
execute.  Within 3s, every other backend piles up behind that, leading
to the database getting no work at all done for a good ten seconds.

My guess is that what's happening here is that one backend needs to
read a page into CLOG, so it calls SlruSelectLRUPage to evict the
oldest SLRU page, which is dirty.  For some reason, that I/O takes a
long time.  Then, one by one, other backends comes along and also need
to read various SLRU pages, but the oldest SLRU page hasn't changed,
so SlruSelectLRUPage keeps returning the exact same page that it
returned before, and everybody queues up waiting for that I/O, even
though there might be other buffers available that aren't even dirty.

I am thinking that SlruSelectLRUPage() should probably do
SlruRecentlyUsed() on the selected buffer before calling
SlruInternalWritePage, so that the next backend that comes along
looking for a buffer doesn't pick the same one.  Possibly we should go
further and try to avoid replacing dirty buffers in the first place,
but sometimes there may be no choice, so doing SlruRecentlyUsed() is
still a good idea.

I'll do some testing to try to confirm whether this theory is correct
and whether the above fix helps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: log chunking broken with large queries under load
Next
From: Tom Lane
Date:
Subject: Re: log chunking broken with large queries under load