Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 |
Date | |
Msg-id | CAHGQGwH+JOPg-8XAUqoqtOPwfK79OPdFYg252chGaxrr6UqCdw@mail.gmail.com Whole thread Raw |
In response to | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 (Satoshi Nagayasu <snaga@uptime.jp>) |
Responses |
Re: pg_stat_lwlocks view - lwlocks statistics, round 2
|
List | pgsql-hackers |
On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > 2012/10/16 2:40, Jeff Janes wrote: >> >> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> >>> Satoshi Nagayasu <snaga@uptime.jp> writes: >>>> >>>> (2012/10/14 13:26), Fujii Masao wrote: >>>>> >>>>> The tracing lwlock usage seems to still cause a small performance >>>>> overhead even if reporting is disabled. I believe some users would >>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available. >>>>> It should be up to a user to decide whether to trace lwlock usage, >>>>> e.g., >>>>> by using trace_lwlock parameter, I think. >>> >>> >>>> Frankly speaking, I do not agree with disabling performance >>>> instrument to improve performance. DBA must *always* monitor >>>> the performance metrix when having such heavy workload. >>> >>> >>> This brings up a question that I don't think has been honestly >>> considered, which is exactly whom a feature like this is targeted at. >>> TBH I think it's of about zero use to DBAs (making the above argument >>> bogus). It is potentially of use to developers, but a DBA is unlikely >>> to be able to do anything about lwlock-level contention even if he has >>> the knowledge to interpret the data. >> >> >> Waiting on BufFreelistLock suggests increasing shared_buffers. >> >> Waiting on ProcArrayLock perhaps suggests use of a connection pooler >> (or does it?) >> >> WALWriteLock suggests doing something about IO, either moving logs to >> different disks, or getting BBU, or something. >> >> WALInsertLock suggests trying to adapt your data loading process so it >> can take advantage of the bulk, or maybe increasing wal_buffers. >> >> And a lot of waiting on any of the locks gives a piece of information >> the DBA can use when asking the mailing lists for help, even if it >> doesn't allow him to take unilateral action. >> >>> So I feel it isn't something that should be turned on in production >>> builds. I'd vote for enabling it by a non-default configure option, >>> and making sure that it doesn't introduce any overhead when the option >>> is off. >> >> >> I think hackers would benefit from getting reports from DBAs in the >> field with concrete data on bottlenecks. >> >> If the only way to get this is to do some non-standard compile and >> deploy it to production, or to create a "benchmarking" copy of the >> production database system including a realistic work-load driver and >> run the non-standard compile there; either of those is going to >> dramatically cut down on the participation. > > > Agreed. > > The hardest thing to investigate performance issue is > reproducing a situation in the different environment > from the production environment. > > I often see people struggling to reproduce a situation > with different hardware and (similar but) different > workload. It is very time consuming, and also it often > fails. > > So, we need to collect any piece of information, which > would help us to understand what's going on within > the production PostgreSQL, without any changes of > binaries and configurations in the production environment. > > That's the reason why I stick to a "built-in" instrument, > and I disagree to disable such instrument even if it has > minor performance overhead. > > A flight-recorder must not be disabled. Collecting > performance data must be top priority for DBA. pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects only narrow performance data concerning lwlock. What we should have as 'flight-recorder' is something like Oracle wait event, I think. Not only lwlocks but also all of wait events should be collected for DBA to investigate the performance bottleneck. This idea was proposed by Itagaki-san before. Though he implemented the sampling-profiler patch, it failed to be committed. I'm not sure why not. Anyway, I think that this would be more right approach to provide the 'flight-recorder' to DBA. Regards, -- Fujii Masao
pgsql-hackers by date: