Re: measuring lwlock-related latency spikes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: measuring lwlock-related latency spikes
Date
Msg-id CA+Tgmob3PbqQ8x6ib49SVX-uYiapTjnZ+rQ4az-akMqXmE1V0A@mail.gmail.com
Whole thread Raw
In response to Re: measuring lwlock-related latency spikes  (Greg Stark <stark@mit.edu>)
Responses Re: measuring lwlock-related latency spikes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Mar 31, 2012 at 6:01 PM, Greg Stark <stark@mit.edu> wrote:
> On Sat, Mar 31, 2012 at 10:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Isn't that lock held while doing visibility checks?
>>
>> Nope.  heap_update() and friends do a very complicated little dance to
>> avoid that.
> ...
>>> What about I/O
>>> waiting for a clog page to be read?
>>
>> I'm pretty sure that can happen
>
> I'm confused because i thought these two sentences were part of
> describing the same case.

Oh, I thought you were talking about the visibility *map*.  Sorry.

>> because TransactionIdIsCommitted()
>> can get called from HeapTupleSatisfies*() which pretty much only gets
>> called while holding the page lock.  I don't know whether it's the
>> cause of these particular stalls, but it's plausible if the CLOG cache
>> is getting thrashed hard enough.
>
> I wonder if it would make sense to, if we come across an xid that
> isn't in the slru release the lock while we read in the clog page.
> When we reobtain it we can check if the LSN has changed and if it has
> restart the visibility checks. If it hasn't pick up where we left off.

I've discovered a bug in my code that was causing it to print at most
2 histogram buckets per lwlock, which obviously means that my previous
results were totally inaccurate.  Ah, the joys of benchmarking.  I
found the problem when I added code to log a message any time an
lwlock wait exceeded a certain time threshold, and it fired far more
often than the previous results would have indicated.  In particular,
it turns out that long waits for WALInsertLock are extremely common
and not, as the previous results appeared to indicated, unheard-of.
I'm rerunning my tests now and will post the updated,
hopefully-accurate results when that's done.

/me attempts to remove egg from face.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: new group commit behavior not helping?
Next
From: Robert Haas
Date:
Subject: Re: Command Triggers patch v18