Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 |
Date | |
Msg-id | CAHGQGwF5cQgn14JQK77oQGNPH3NSvt7_1q2=nHF3+7cys_a5Jw@mail.gmail.com |
In response to | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 (Satoshi Nagayasu <snaga@uptime.jp>) |
List | pgsql-hackers |
On Sat, Oct 20, 2012 at 1:03 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> 2012/10/19 23:48, Fujii Masao wrote:
>>
>> On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>>
>>> 2012/10/16 2:40, Jeff Janes wrote:
>>>>
>>>> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>
>>>>> Satoshi Nagayasu <snaga@uptime.jp> writes:
>>>>>>
>>>>>> (2012/10/14 13:26), Fujii Masao wrote:
>>>>>>>
>>>>>>> The tracing lwlock usage seems to still cause a small performance
>>>>>>> overhead even if reporting is disabled. I believe some users would
>>>>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not
>>>>>>> available.
>>>>>>> It should be up to a user to decide whether to trace lwlock usage,
>>>>>>> e.g., by using trace_lwlock parameter, I think.
>>>>>
>>>>>> Frankly speaking, I do not agree with disabling performance
>>>>>> instrument to improve performance. DBA must *always* monitor
>>>>>> the performance metrix when having such heavy workload.
>>>>>
>>>>> This brings up a question that I don't think has been honestly
>>>>> considered, which is exactly whom a feature like this is targeted at.
>>>>> TBH I think it's of about zero use to DBAs (making the above argument
>>>>> bogus). It is potentially of use to developers, but a DBA is unlikely
>>>>> to be able to do anything about lwlock-level contention even if he has
>>>>> the knowledge to interpret the data.
>>>>
>>>> Waiting on BufFreelistLock suggests increasing shared_buffers.
>>>>
>>>> Waiting on ProcArrayLock perhaps suggests use of a connection pooler
>>>> (or does it?)
>>>>
>>>> WALWriteLock suggests doing something about IO, either moving logs to
>>>> different disks, or getting BBU, or something.
>>>>
>>>> WALInsertLock suggests trying to adapt your data loading process so it
>>>> can take advantage of the bulk, or maybe increasing wal_buffers.
>>>>
>>>> And a lot of waiting on any of the locks gives a piece of information
>>>> the DBA can use when asking the mailing lists for help, even if it
>>>> doesn't allow him to take unilateral action.
>>>>
>>>>> So I feel it isn't something that should be turned on in production
>>>>> builds. I'd vote for enabling it by a non-default configure option,
>>>>> and making sure that it doesn't introduce any overhead when the option
>>>>> is off.
>>>>
>>>> I think hackers would benefit from getting reports from DBAs in the
>>>> field with concrete data on bottlenecks.
>>>>
>>>> If the only way to get this is to do some non-standard compile and
>>>> deploy it to production, or to create a "benchmarking" copy of the
>>>> production database system including a realistic work-load driver and
>>>> run the non-standard compile there; either of those is going to
>>>> dramatically cut down on the participation.
>>>
>>> Agreed.
>>>
>>> The hardest thing to investigate performance issue is
>>> reproducing a situation in the different environment
>>> from the production environment.
>>>
>>> I often see people struggling to reproduce a situation
>>> with different hardware and (similar but) different
>>> workload. It is very time consuming, and also it often
>>> fails.
>>>
>>> So, we need to collect any piece of information, which
>>> would help us to understand what's going on within
>>> the production PostgreSQL, without any changes of
>>> binaries and configurations in the production environment.
>>>
>>> That's the reason why I stick to a "built-in" instrument,
>>> and I disagree to disable such instrument even if it has
>>> minor performance overhead.
>>>
>>> A flight-recorder must not be disabled. Collecting
>>> performance data must be top priority for DBA.
>>
>> pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects
>> only narrow performance data concerning lwlock. What we should
>> have as 'flight-recorder' is something like Oracle wait event, I think.
>> Not only lwlocks but also all of wait events should be collected for
>> DBA to investigate the performance bottleneck.
>
> That's the reason why I said "I accept that it's not enough
> for DBA", and I'm going to work on another lock stats.
>
>> This idea was
>> proposed by Itagaki-san before. Though he implemented the
>> sampling-profiler patch, it failed to be committed. I'm not sure why
>> not.
>
> Yeah, I know the previous patch posted by Itagaki-san.
> So, I'm questioning why (again) for this time.
> I think this is very important question because it would
> be critical in order to involve new DBAs to PostgreSQL.
>
>> Anyway, I think that this would be more right approach to
>> provide the 'flight-recorder' to DBA.
>
> Ok, I guess we have reached the consensus to have
> "some flight-recorder". Right?

Yes, at least I agree to add that. But whatever is added, if it may
have any impact on the performance, on-off switch must be required.
Also the default value of the switch should be off until we'll have
implemented the low-overhead instrument and agreed to enable it by
default.

Regards,

--
Fujii Masao
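
The requirement in the reply above (an on-off switch that defaults to off) can be illustrated with a small language-neutral sketch. This is not PostgreSQL code and not the pg_stat_lwlocks patch: the class `LockStats`, its `enabled` flag, and `record_wait` are hypothetical names chosen only for illustration, and the lock names are borrowed from the quoted discussion.

```python
from collections import Counter


class LockStats:
    """Toy wait-statistics collector (hypothetical; not PostgreSQL code)."""

    def __init__(self, enabled=False):
        # The switch defaults to off, as the reply asks for.
        self.enabled = enabled
        self.waits = Counter()

    def record_wait(self, lock_name):
        # When the switch is off, return before touching any counter.
        if not self.enabled:
            return
        self.waits[lock_name] += 1


stats = LockStats()
stats.record_wait("WALWriteLock")   # ignored: switch is off
stats.enabled = True                # user opts in at run time
stats.record_wait("WALWriteLock")   # counted
stats.record_wait("ProcArrayLock")  # counted
```

In this sketch the cost of the disabled path is a single flag check per call. Whether a comparable check is cheap enough inside a real lock-acquisition path is exactly the point under debate in the thread, and the sketch does not settle it.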