Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers
From | Satoshi Nagayasu |
---|---|
Subject | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 |
Date | |
Msg-id | 508179CD.5000509@uptime.jp Whole thread Raw |
In response to | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Re: pg_stat_lwlocks view - lwlocks statistics, round 2 |
List | pgsql-hackers |
2012/10/19 23:48, Fujii Masao wrote: > On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> 2012/10/16 2:40, Jeff Janes wrote: >>> >>> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> >>>> Satoshi Nagayasu <snaga@uptime.jp> writes: >>>>> >>>>> (2012/10/14 13:26), Fujii Masao wrote: >>>>>> >>>>>> The tracing lwlock usage seems to still cause a small performance >>>>>> overhead even if reporting is disabled. I believe some users would >>>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available. >>>>>> It should be up to a user to decide whether to trace lwlock usage, >>>>>> e.g., >>>>>> by using trace_lwlock parameter, I think. >>>> >>>> >>>>> Frankly speaking, I do not agree with disabling performance >>>>> instrument to improve performance. DBA must *always* monitor >>>>> the performance metrix when having such heavy workload. >>>> >>>> >>>> This brings up a question that I don't think has been honestly >>>> considered, which is exactly whom a feature like this is targeted at. >>>> TBH I think it's of about zero use to DBAs (making the above argument >>>> bogus). It is potentially of use to developers, but a DBA is unlikely >>>> to be able to do anything about lwlock-level contention even if he has >>>> the knowledge to interpret the data. >>> >>> >>> Waiting on BufFreelistLock suggests increasing shared_buffers. >>> >>> Waiting on ProcArrayLock perhaps suggests use of a connection pooler >>> (or does it?) >>> >>> WALWriteLock suggests doing something about IO, either moving logs to >>> different disks, or getting BBU, or something. >>> >>> WALInsertLock suggests trying to adapt your data loading process so it >>> can take advantage of the bulk, or maybe increasing wal_buffers. >>> >>> And a lot of waiting on any of the locks gives a piece of information >>> the DBA can use when asking the mailing lists for help, even if it >>> doesn't allow him to take unilateral action. >>> >>>> So I feel it isn't something that should be turned on in production >>>> builds. I'd vote for enabling it by a non-default configure option, >>>> and making sure that it doesn't introduce any overhead when the option >>>> is off. >>> >>> >>> I think hackers would benefit from getting reports from DBAs in the >>> field with concrete data on bottlenecks. >>> >>> If the only way to get this is to do some non-standard compile and >>> deploy it to production, or to create a "benchmarking" copy of the >>> production database system including a realistic work-load driver and >>> run the non-standard compile there; either of those is going to >>> dramatically cut down on the participation. >> >> >> Agreed. >> >> The hardest thing to investigate performance issue is >> reproducing a situation in the different environment >> from the production environment. >> >> I often see people struggling to reproduce a situation >> with different hardware and (similar but) different >> workload. It is very time consuming, and also it often >> fails. >> >> So, we need to collect any piece of information, which >> would help us to understand what's going on within >> the production PostgreSQL, without any changes of >> binaries and configurations in the production environment. >> >> That's the reason why I stick to a "built-in" instrument, >> and I disagree to disable such instrument even if it has >> minor performance overhead. >> >> A flight-recorder must not be disabled. Collecting >> performance data must be top priority for DBA. > > pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects > only narrow performance data concerning lwlock. What we should > have as 'flight-recorder' is something like Oracle wait event, I think. > Not only lwlocks but also all of wait events should be collected for > DBA to investigate the performance bottleneck. That's the reason why I said "I accept that it's not enough for DBA", and I'm going to work on another lock stats. > This idea was > proposed by Itagaki-san before. Though he implemented the > sampling-profiler patch, it failed to be committed. I'm not sure why > not. Yeah, I know the previous patch posted by Itagaki-san. So, I'm questioning why (again) for this time. I think this is very important question because it would be critical in order to involve new DBAs to PostgreSQL. > Anyway, I think that this would be more right approach to > provide the 'flight-recorder' to DBA. Ok, I guess we have reached the consensus to have "some flight-recorder". Right? Actually, it seems a great progress from my point of view. :) Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
pgsql-hackers by date: