Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers

From Satoshi Nagayasu
Subject Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Date
Msg-id 507D7DB9.5020808@uptime.jp
In response to Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Robert Haas <robertmhaas@gmail.com>)
Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
2012/10/16 2:40, Jeff Janes wrote:
> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Satoshi Nagayasu <snaga@uptime.jp> writes:
>>> (2012/10/14 13:26), Fujii Masao wrote:
>>>> The tracing lwlock usage seems to still cause a small performance
>>>> overhead even if reporting is disabled. I believe some users would
>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>>>> It should be up to a user to decide whether to trace lwlock usage, e.g.,
>>>> by using trace_lwlock parameter, I think.
>>
>>> Frankly speaking, I do not agree with disabling performance
>>> instrumentation to improve performance. A DBA must *always* monitor
>>> the performance metrics when having such a heavy workload.
>>
>> This brings up a question that I don't think has been honestly
>> considered, which is exactly whom a feature like this is targeted at.
>> TBH I think it's of about zero use to DBAs (making the above argument
>> bogus).  It is potentially of use to developers, but a DBA is unlikely
>> to be able to do anything about lwlock-level contention even if he has
>> the knowledge to interpret the data.
>
> Waiting on BufFreelistLock suggests increasing shared_buffers.
>
> Waiting on ProcArrayLock perhaps suggests use of a connection pooler
> (or does it?)
>
> WALWriteLock suggests doing something about IO, either moving logs to
> different disks, or getting BBU, or something.
>
> WALInsertLock suggests trying to adapt your data loading process so it
> can take advantage of the bulk, or maybe increasing wal_buffers.
>
> And a lot of waiting on any of the locks gives a piece of information
> the DBA can use when asking the mailing lists for help, even if it
> doesn't allow him to take unilateral action.
>
>> So I feel it isn't something that should be turned on in production
>> builds.  I'd vote for enabling it by a non-default configure option,
>> and making sure that it doesn't introduce any overhead when the option
>> is off.
>
> I think hackers would benefit from getting reports from DBAs in the
> field with concrete data on bottlenecks.
>
> If the only way to get this is to do some non-standard compile and
> deploy it to production, or to create a "benchmarking" copy of the
> production database system including a realistic work-load driver and
> run the non-standard compile there; either of those is going to
> dramatically cut down on the participation.

Agreed.

The hardest part of investigating a performance issue is
reproducing the situation in an environment other than
the production environment.

I often see people struggling to reproduce a situation
on different hardware with a similar, but not identical,
workload. It is very time-consuming, and it often
fails.

So we need to collect every piece of information that
would help us understand what is going on inside
the production PostgreSQL, without changing any
binaries or configuration in the production environment.

That is why I insist on "built-in" instrumentation,
and why I disagree with disabling it even if it has
a minor performance overhead.

A flight recorder must not be disabled. Collecting
performance data must be a top priority for a DBA.
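
To make the point about DBA-usable data concrete, here is a
minimal sketch of how the lock-by-lock suggestions in Jeff's
list quoted above could be applied to collected wait counts.
It is only an illustration: the function name `triage`, the
idea that counts arrive keyed by lock name, and the sample
numbers are all assumptions, not part of the patch or of any
PostgreSQL API. The hints are paraphrased from Jeff's list.

```python
# Hints paraphrased from Jeff's list in this thread.
# Everything else here (names, input shape) is hypothetical.
LOCK_HINTS = {
    "BufFreelistLock": "consider increasing shared_buffers",
    "ProcArrayLock": "perhaps use a connection pooler (Jeff is unsure)",
    "WALWriteLock": "look at WAL I/O: move logs to other disks or get a BBU",
    "WALInsertLock": "use bulk loading, or try increasing wal_buffers",
}

# Jeff's last point: even without a specific hint, the data is
# useful when asking the mailing lists for help.
FALLBACK = "no specific hint; include the numbers when asking the lists"


def triage(wait_counts, top=3):
    """Return (lock, waits, hint) for the most-waited-on locks.

    wait_counts is an assumed mapping of lock name -> number of waits.
    Locks with zero waits are skipped.
    """
    ranked = sorted(wait_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, waits, LOCK_HINTS.get(name, FALLBACK))
        for name, waits in ranked[:top]
        if waits > 0
    ]


if __name__ == "__main__":
    # Made-up sample numbers, for illustration only.
    sample = {"WALWriteLock": 120, "BufFreelistLock": 5, "ProcArrayLock": 0}
    for name, waits, hint in triage(sample):
        print(f"{name}: {waits} waits -> {hint}")
```

The table is deliberately small and static. Its only purpose
is to show that a DBA can act on such counters, or at least
report them, without a special build.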

Regards,

>
> Cheers,
>
> Jeff
>


-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


