Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers

From Satoshi Nagayasu
Subject Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Date
Msg-id 508179CD.5000509@uptime.jp
Whole thread Raw
In response to Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Re: pg_stat_lwlocks view - lwlocks statistics, round 2
List pgsql-hackers
2012/10/19 23:48, Fujii Masao wrote:
> On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> 2012/10/16 2:40, Jeff Janes wrote:
>>>
>>> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>
>>>> Satoshi Nagayasu <snaga@uptime.jp> writes:
>>>>>
>>>>> (2012/10/14 13:26), Fujii Masao wrote:
>>>>>>
>>>>>> The tracing lwlock usage seems to still cause a small performance
>>>>>> overhead even if reporting is disabled. I believe some users would
>>>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>>>>>> It should be up to a user to decide whether to trace lwlock usage,
>>>>>> e.g.,
>>>>>> by using trace_lwlock parameter, I think.
>>>>
>>>>
>>>>> Frankly speaking, I do not agree with disabling performance
>>>>> instrument to improve performance. DBA must *always* monitor
>>>>> the performance metrix when having such heavy workload.
>>>>
>>>>
>>>> This brings up a question that I don't think has been honestly
>>>> considered, which is exactly whom a feature like this is targeted at.
>>>> TBH I think it's of about zero use to DBAs (making the above argument
>>>> bogus).  It is potentially of use to developers, but a DBA is unlikely
>>>> to be able to do anything about lwlock-level contention even if he has
>>>> the knowledge to interpret the data.
>>>
>>>
>>> Waiting on BufFreelistLock suggests increasing shared_buffers.
>>>
>>> Waiting on ProcArrayLock perhaps suggests use of a connection pooler
>>> (or does it?)
>>>
>>> WALWriteLock suggests doing something about IO, either moving logs to
>>> different disks, or getting BBU, or something.
>>>
>>> WALInsertLock suggests trying to adapt your data loading process so it
>>> can take advantage of the bulk, or maybe increasing wal_buffers.
>>>
>>> And a lot of waiting on any of the locks gives a piece of information
>>> the DBA can use when asking the mailing lists for help, even if it
>>> doesn't allow him to take unilateral action.
>>>
>>>> So I feel it isn't something that should be turned on in production
>>>> builds.  I'd vote for enabling it by a non-default configure option,
>>>> and making sure that it doesn't introduce any overhead when the option
>>>> is off.
>>>
>>>
>>> I think hackers would benefit from getting reports from DBAs in the
>>> field with concrete data on bottlenecks.
>>>
>>> If the only way to get this is to do some non-standard compile and
>>> deploy it to production, or to create a "benchmarking" copy of the
>>> production database system including a realistic work-load driver and
>>> run the non-standard compile there; either of those is going to
>>> dramatically cut down on the participation.
>>
>>
>> Agreed.
>>
>> The hardest thing to investigate performance issue is
>> reproducing a situation in the different environment
>> from the production environment.
>>
>> I often see people struggling to reproduce a situation
>> with different hardware and (similar but) different
>> workload. It is very time consuming, and also it often
>> fails.
>>
>> So, we need to collect any piece of information, which
>> would help us to understand what's going on within
>> the production PostgreSQL, without any changes of
>> binaries and configurations in the production environment.
>>
>> That's the reason why I stick to a "built-in" instrument,
>> and I disagree to disable such instrument even if it has
>> minor performance overhead.
>>
>> A flight-recorder must not be disabled. Collecting
>> performance data must be top priority for DBA.
>
> pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects
> only narrow performance data concerning lwlock. What we should
> have as 'flight-recorder' is something like Oracle wait event, I think.
> Not only lwlocks but also all of wait events should be collected for
> DBA to investigate the performance bottleneck.

That's the reason why I said "I accept that it's not enough
for DBA", and I'm going to work on another lock stats.

> This idea was
> proposed by Itagaki-san before. Though he implemented the
> sampling-profiler patch, it failed to be committed. I'm not sure why
> not.

Yeah, I know the previous patch posted by Itagaki-san.
So, I'm questioning why (again) for this time.
I think this is very important question because it would
be critical in order to involve new DBAs to PostgreSQL.

> Anyway, I think that this would be more right approach to
> provide the 'flight-recorder' to DBA.

Ok, I guess we have reached the consensus to have
"some flight-recorder". Right?

Actually, it seems a great progress from my point of view. :)

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp



pgsql-hackers by date:

Previous
From: Satoshi Nagayasu
Date:
Subject: Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Next
From: Robert Haas
Date:
Subject: assertion failure w/extended query protocol