Re: measuring lwlock-related latency spikes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: measuring lwlock-related latency spikes
Date
Msg-id CA+TgmoZAffdS7jS7y8zPTCGTvZx-bB6q3GbLpA66citC=HftYQ@mail.gmail.com
In response to Re: measuring lwlock-related latency spikes  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: measuring lwlock-related latency spikes  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, Apr 1, 2012 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> First, we need to determine that it is the clog where this is happening.

I can confirm that based on the LWLockIds.  There were 32 of them
beginning at lock id 81, and a gdb session confirms that
ClogCtlData->shared->buffer_locks[0..31] point to exactly that set of
LWLockIds.
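(The gdb check is nothing fancy, by the way.  It amounts to roughly the
following, remembering that ClogCtlData is the SlruCtlData struct itself
rather than a pointer; the printed values are just illustrating what I
saw, not verbatim output:)

    (gdb) p ClogCtlData.shared->buffer_locks[0]
    $1 = 81
    (gdb) p ClogCtlData.shared->buffer_locks[31]
    $2 = 112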

> Also, you're assuming this is an I/O issue. I think it's more likely
> that this is a lock starvation issue. Shared locks queue jump
> continually over the exclusive lock, blocking access for long periods.

That is a possible issue in general, but I can't see how it could be
happening here, because the shared lock is only a mechanism for
waiting for an I/O to complete.  The backend doing the I/O grabs the
control lock, sets a flag saying there's an I/O in progress, takes the
buffer lock in exclusive mode, and releases the control lock.  The
shared locks are taken when someone notices that the flag is set on a
buffer they want to access.  So there aren't any shared lockers until
the buffer is already locked in exclusive mode.  Or at least I don't
think there are; please correct me if I'm wrong.
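In code terms, the dance I'm describing is roughly the following
(paraphrasing slru.c from memory, so the details may be slightly off):

    /* writer side, approximately SlruInternalWritePage() */
    /* caller already holds shared->ControlLock in exclusive mode */
    shared->page_status[slotno] = SLRU_PAGE_WRITE_IN_PROGRESS;
    LWLockAcquire(shared->buffer_locks[slotno], LW_EXCLUSIVE);
    LWLockRelease(shared->ControlLock);
    /* ... physical write of the page happens here ... */
    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
    shared->page_status[slotno] = SLRU_PAGE_VALID;
    LWLockRelease(shared->buffer_locks[slotno]);

    /* waiter side, approximately SimpleLruWaitIO() */
    LWLockRelease(shared->ControlLock);
    LWLockAcquire(shared->buffer_locks[slotno], LW_SHARED);
    LWLockRelease(shared->buffer_locks[slotno]);
    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

The shared acquisition exists purely to sleep until the exclusive holder
finishes the I/O; nothing useful is done while holding it.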

Now... I do think it's possible that this could happen: backend #1
wants to write the buffer, so grabs the lock and writes the buffer.
Meanwhile some waiters pile up.  When the guy doing the I/O finishes,
he releases the lock, releasing all the waiters.  They then have to
wake up and grab the lock, but maybe before they (or some of them) can
do it somebody else starts another I/O on the buffer and they all have
to go back to sleep.  That could allow the wait time to be many times
the I/O time.  If that's the case, we could just make this use
LWLockAcquireOrWait(); the calling code is just going to pick a new
victim buffer anyway, so it's silly to go through additional spinlock
cycles to acquire a lock we don't actually want.
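Concretely, something along these lines in SimpleLruWaitIO(); this is
untested and just meant to illustrate the shape of the change:

    /* waiter side, sketch: wait without necessarily taking the lock */
    LWLockRelease(shared->ControlLock);
    if (LWLockAcquireOrWait(shared->buffer_locks[slotno], LW_SHARED))
    {
        /* lock was free, so we actually got it; just let go of it */
        LWLockRelease(shared->buffer_locks[slotno]);
    }
    else
    {
        /*
         * The lock was held, and LWLockAcquireOrWait() slept until the
         * holder released it without ever transferring the lock to us,
         * so there is nothing to release.  Either way, re-take the
         * control lock and re-examine the buffer state.
         */
    }
    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);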

I bet I can add some more instrumentation to get clearer data on what
is happening here.  What I've added so far doesn't seem to be
affecting performance very much.
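(To give a flavor of what I mean by instrumentation: something like
timing the sleep inside LWLockAcquire()'s wait loop with the instr_time
macros and logging anything over a threshold.  This is a hand-written
sketch rather than my actual patch, and the 100 ms cutoff is arbitrary:)

    /* needs #include "portability/instr_time.h" */
    instr_time  wait_start, wait_end;

    INSTR_TIME_SET_CURRENT(wait_start);
    for (;;)
    {
        /* "false" means cannot accept cancel/die interrupt */
        PGSemaphoreLock(&proc->sem, false);
        if (!proc->lwWaiting)
            break;
        extraWaits++;
    }
    INSTR_TIME_SET_CURRENT(wait_end);
    INSTR_TIME_SUBTRACT(wait_end, wait_start);
    if (INSTR_TIME_GET_MICROSEC(wait_end) > 100000)    /* 100 ms */
        elog(LOG, "waited %lu us for LWLock %d",
             (unsigned long) INSTR_TIME_GET_MICROSEC(wait_end),
             (int) lockid);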

> I would guess that is also the case with the index wait, where I would
> guess a near-root block needs an exclusive lock, but is held up by
> continual index tree descents.
>
> My (fairly old) observation is that the shared lock semantics only
> work well when exclusive locks are fairly common. When they are rare,
> the semantics work against us.
>
> We should either 1) implement non-queue-jump semantics for certain
> cases, or 2) put a limit on the number of queue jumps that can occur
> before we let the next x lock proceed instead. (2) sounds better, but
> keeping track might well cause greater overhead.

Maybe, but your point that we should characterize the behavior before
engineering solutions is well-taken, so let's do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

