Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Date
Msg-id CAHGQGwH+JOPg-8XAUqoqtOPwfK79OPdFYg252chGaxrr6UqCdw@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Satoshi Nagayasu <snaga@uptime.jp>)
Responses Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Satoshi Nagayasu <snaga@uptime.jp>)
List pgsql-hackers
On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> 2012/10/16 2:40, Jeff Janes wrote:
>>
>> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Satoshi Nagayasu <snaga@uptime.jp> writes:
>>>>
>>>> (2012/10/14 13:26), Fujii Masao wrote:
>>>>>
>>>>> The tracing lwlock usage seems to still cause a small performance
>>>>> overhead even if reporting is disabled. I believe some users would
>>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>>>>> It should be up to a user to decide whether to trace lwlock usage,
>>>>> e.g.,
>>>>> by using trace_lwlock parameter, I think.
>>>
>>>
>>>> Frankly speaking, I do not agree with disabling performance
>>>> instrument to improve performance. DBA must *always* monitor
>>>> the performance metrix when having such heavy workload.
>>>
>>>
>>> This brings up a question that I don't think has been honestly
>>> considered, which is exactly whom a feature like this is targeted at.
>>> TBH I think it's of about zero use to DBAs (making the above argument
>>> bogus).  It is potentially of use to developers, but a DBA is unlikely
>>> to be able to do anything about lwlock-level contention even if he has
>>> the knowledge to interpret the data.
>>
>>
>> Waiting on BufFreelistLock suggests increasing shared_buffers.
>>
>> Waiting on ProcArrayLock perhaps suggests use of a connection pooler
>> (or does it?)
>>
>> WALWriteLock suggests doing something about IO, either moving logs to
>> different disks, or getting BBU, or something.
>>
>> WALInsertLock suggests trying to adapt your data loading process so it
>> can take advantage of the bulk, or maybe increasing wal_buffers.
>>
>> And a lot of waiting on any of the locks gives a piece of information
>> the DBA can use when asking the mailing lists for help, even if it
>> doesn't allow him to take unilateral action.
>>
>>> So I feel it isn't something that should be turned on in production
>>> builds.  I'd vote for enabling it by a non-default configure option,
>>> and making sure that it doesn't introduce any overhead when the option
>>> is off.
>>
>>
>> I think hackers would benefit from getting reports from DBAs in the
>> field with concrete data on bottlenecks.
>>
>> If the only way to get this is to do some non-standard compile and
>> deploy it to production, or to create a "benchmarking" copy of the
>> production database system including a realistic work-load driver and
>> run the non-standard compile there; either of those is going to
>> dramatically cut down on the participation.
>
>
> Agreed.
>
> The hardest thing to investigate performance issue is
> reproducing a situation in the different environment
> from the production environment.
>
> I often see people struggling to reproduce a situation
> with different hardware and (similar but) different
> workload. It is very time consuming, and also it often
> fails.
>
> So, we need to collect any piece of information, which
> would help us to understand what's going on within
> the production PostgreSQL, without any changes of
> binaries and configurations in the production environment.
>
> That's the reason why I stick to a "built-in" instrument,
> and I disagree to disable such instrument even if it has
> minor performance overhead.
>
> A flight-recorder must not be disabled. Collecting
> performance data must be top priority for DBA.

pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects
only narrow performance data concerning lwlock. What we should
have as 'flight-recorder' is something like Oracle wait event, I think.
Not only lwlocks but also all of wait events should be collected for
DBA to investigate the performance bottleneck. This idea was
proposed by Itagaki-san before. Though he implemented the
sampling-profiler patch, it failed to be committed. I'm not sure why
not. Anyway, I think that this would be more right approach to
provide the 'flight-recorder' to DBA.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Philip Taylor
Date:
Subject: Add SHA-3 (Keccak) support to pgcrypto
Next
From: Andrew Dunstan
Date:
Subject: Re: Deprecating RULES