Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept) - Mailing list pgsql-hackers

From Ilya Kosmodemiansky
Subject Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
Date
Msg-id CAG95seWDra-MY3edXUqh6cRqKijUNhqvxxt2W52OeruLvYzU=g@mail.gmail.com
In response to Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Oct 7, 2014 at 4:12 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I think the easiest way to measure lwlock contention would be to put
>> some counters in the lwlock itself.  My guess, based on a lot of
>> fiddling with LWLOCK_STATS over the years, is that there's no way to
>> count lock acquisitions and releases without harming performance
>> significantly - no matter where we put the counters, it's just going
>> to be too expensive.  However, I believe that incrementing a counter -
>> even in the lwlock itself - might not be too expensive if we only do
>> it when (1) a process goes to sleep or (2) spindelays occur.
>
> Increasing the size will be painful on its own :(.

I am afraid that in this case we should think about minimizing the
overhead rather than avoiding it altogether: a DBA-friendly feature
like this is worth it.
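One way to keep the overhead small, following the suggestion quoted above, is to touch a counter only on the slow path, when an acquirer would actually have to sleep. Below is a minimal single-threaded Python model of that idea, under stated assumptions: `ToyLWLock`, `try_acquire` and `sleep_count` are hypothetical names, this is not PostgreSQL's LWLock code, and the spindelay case is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyLWLock:
    """Hypothetical model of an exclusive lock; not PostgreSQL's LWLock."""
    holder: Optional[str] = None
    sleep_count: int = 0  # incremented only when an acquirer would sleep

    def try_acquire(self, proc: str) -> bool:
        # Fast path: the lock is free, so take it without touching counters.
        if self.holder is None:
            self.holder = proc
            return True
        # Slow path: the lock is busy and a real backend would go to sleep.
        # Only this branch pays for the counter increment.
        self.sleep_count += 1
        return False

    def release(self) -> None:
        self.holder = None


lock = ToyLWLock()
lock.try_acquire("backend_a")  # uncontended: counter untouched
lock.try_acquire("backend_b")  # contended: counted as a sleep
lock.release()
lock.try_acquire("backend_b")  # free again: counter untouched
print(lock.sleep_count)  # 1
```

Uncontended acquisitions, the common case, cost nothing extra; the price is that the counter tells how often processes waited, not how often the lock was taken.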

Let me step back a bit, since the discussion has gone into details
while the overall design idea remains unclear.

What do we actually need: the fact that an lwlock was acquired? The
lock count? The time spent in the lock? The overall lock duration?

The usual way to explain how such performance tools work is the
traffic example (any DBA familiar with the Oracle/DB2 wait interface
knows it):

You have a route from home to the office that takes you an hour. You
try to optimize it and find that, although you take a highway with no
speed limit, you usually get stuck in traffic when turning from the
highway towards your office and spend about 10-30 minutes there. The
alternative is another route with 2 speed-limit zones and one traffic
light: in total you lose 2 and 5 minutes in the speed-limit zones and
2 minutes at the red light, which is better overall than 30 minutes in
a jam and even better than 10 minutes in a jam. That is what it is all
about: to find the bottleneck, we need to know that a process held a
certain lock, and whether it was held for a long time or there were
many short-lived lock acquisitions.
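The arithmetic behind the example, as a minimal Python sketch: counting waits alone would rank the alternative route as worse (3 waits against 1), while the total waiting time shows it is better (9 minutes against at least 10). The `summarize` helper is hypothetical and only mirrors the count / total / longest-wait distinction discussed here.

```python
from typing import List, Tuple


def summarize(waits: List[int]) -> Tuple[int, int, int]:
    """Return (number of waits, total minutes waited, longest single wait)."""
    return len(waits), sum(waits), max(waits)


jam_best = summarize([10])          # highway, best case in the jam
jam_worst = summarize([30])         # highway, worst case in the jam
alternative = summarize([2, 5, 2])  # two speed-limit zones and one red light

print(jam_best, jam_worst, alternative)
# (1, 10, 10) (1, 30, 30) (3, 9, 5)
```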

I think that sampling even 1-2 times per second and building a sort of
histogram is good enough for now, because it shows (though not very
precisely) that a process held a certain lock, and whether it was held
for a long time or there were many short-lived locks.
After that, it will be possible to implement something more precise.
(As far as I know, Greg Smith is working on some sort of wait events,
but it seems to me there is a lot of work to do to implement an exact
analog of OWI, the Oracle Wait Interface.)
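A minimal Python sketch of the sampling approach, assuming a sampler that can see, at each tick, which lock (if any) every backend is waiting on. The snapshot format, the lock names and the helper functions are hypothetical, and the snapshots are made-up input, not measurements. Each hit is taken to stand for roughly 1/rate seconds of waiting.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# One snapshot maps a (hypothetical) backend PID to the lock it is waiting
# on at that instant, or None if it is not waiting.
Snapshot = Dict[int, Optional[str]]


def build_histogram(snapshots: Iterable[Snapshot]) -> Counter:
    """Count how many samples caught a backend waiting on each lock."""
    hist: Counter = Counter()
    for snap in snapshots:
        for lock in snap.values():
            if lock is not None:
                hist[lock] += 1
    return hist


def estimated_wait_seconds(hist: Counter, samples_per_second: float) -> Dict[str, float]:
    """Coarse estimate: each hit represents about 1/rate seconds of waiting."""
    return {lock: n / samples_per_second for lock, n in hist.items()}


# Four made-up samples taken at 2 Hz, i.e. two seconds of observation.
snapshots = [
    {101: "lock_a", 102: None},
    {101: "lock_a", 102: "lock_b"},
    {101: None, 102: None},
    {101: "lock_a", 102: None},
]
hist = build_histogram(snapshots)
print(hist.most_common())  # [('lock_a', 3), ('lock_b', 1)]
print(estimated_wait_seconds(hist, samples_per_second=2))  # {'lock_a': 1.5, 'lock_b': 0.5}
```

A wait that starts and ends between two samples is never seen, which is the imprecision mentioned above; sampling more often, or for longer, narrows it.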

-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
