On Sun, Apr 1, 2012 at 4:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> If I filter for waits greater than 8s, a somewhat different picture emerges:
>
> 2 waited at indexam.c:521 blocked by bufmgr.c:2475
> 212 waited at slru.c:310 blocked by slru.c:526
>
> In other words, some of the waits for SLRU pages to be written are...
> really long. There were 126 that exceeded 10 seconds and 56 that
> exceeded 12 seconds. "Painful" is putting it mildly.
Interesting. The total wait contribution from those two factors
exceeds the WALInsertLock wait.
> I suppose one interesting question is to figure out if there's a way I
> can optimize the disk configuration in this machine, or the Linux I/O
> scheduler, or something, so as to reduce the amount of time it spends
> waiting for the disk. But the other thing is why we're waiting for
> SLRU page writes to begin with.
First, we need to determine whether it is the clog where this is
happening; slru.c is shared code, and the clog is only one of the
caches built on it.
Also, you're assuming this is an I/O issue. I think it's more likely
that this is a lock starvation issue. Shared locks continually
queue-jump over the exclusive lock, blocking access for long periods.
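To make the starvation concrete, here is a deliberately simplified
sketch of a reader-preferring lock. This is not the actual lwlock.c
code; the type and function names are invented for illustration, and
the spinlock protecting the state is omitted:

#include <stdbool.h>

typedef struct SimpleLWLock
{
    int     shared_holders;     /* number of shared holders */
    bool    exclusive_held;     /* an exclusive holder is active */
    int     exclusive_waiters;  /* exclusive lockers queued */
} SimpleLWLock;

/*
 * Shared acquire: granted whenever no exclusive holder is active,
 * even if exclusive waiters are already queued. That grant is the
 * "queue jump".
 */
static bool
shared_acquire(SimpleLWLock *lock)
{
    if (lock->exclusive_held)
        return false;           /* must wait */
    lock->shared_holders++;     /* granted, ahead of any x-waiter */
    return true;
}

/*
 * Exclusive acquire: must wait until every shared holder is gone.
 * Under a continuous stream of shared acquirers, shared_holders may
 * never reach zero, so this caller can wait for many seconds.
 */
static bool
exclusive_acquire(SimpleLWLock *lock)
{
    if (lock->exclusive_held || lock->shared_holders > 0)
    {
        lock->exclusive_waiters++;  /* join the queue and wait */
        return false;
    }
    lock->exclusive_held = true;
    return true;
}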
I would guess that is also the case with the index wait: a near-root
block needs an exclusive lock, but is held up by continual index tree
descents.
My (fairly old) observation is that the shared lock semantics only
work well when exclusive locks are fairly common. When they are rare,
the semantics work against us.
We should either 1) implement non-queue-jump semantics for certain
cases, or 2) put a limit on the number of queue jumps that can occur
before we let the next exclusive lock proceed instead. (2) sounds
better, but keeping track might well cause greater overhead.
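For (2), continuing the same toy sketch (again invented names, not a
proposal for lwlock.c as-is), the cap might look something like this:

#define MAX_QUEUE_JUMPS 16      /* arbitrary cap, illustration only */

typedef struct CappedLWLock
{
    SimpleLWLock base;
    int          queue_jumps;   /* shared grants past an x-waiter */
} CappedLWLock;

/*
 * Shared acquire with a cap: once MAX_QUEUE_JUMPS shared acquirers
 * have been granted past a waiting exclusive locker, further shared
 * acquirers must queue, letting the exclusive locker proceed.
 * exclusive_acquire would reset queue_jumps to zero when it finally
 * gets the lock, so the cap applies per exclusive wait.
 */
static bool
shared_acquire_capped(CappedLWLock *lock)
{
    if (lock->base.exclusive_held)
        return false;
    if (lock->base.exclusive_waiters > 0)
    {
        if (lock->queue_jumps >= MAX_QUEUE_JUMPS)
            return false;       /* stop jumping; wait in the queue */
        lock->queue_jumps++;
    }
    lock->base.shared_holders++;
    return true;
}

The counter update is the bookkeeping overhead I mention above: one
more shared-memory write on the shared-acquire path.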
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services