Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers
From | Satoshi Nagayasu |
---|---|
Subject | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 |
Date | |
Msg-id | 507C3799.9010803@uptime.jp |
In response to | Re: pg_stat_lwlocks view - lwlocks statistics, round 2 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
2012/10/15 1:43, Tom Lane wrote:
> Satoshi Nagayasu <snaga@uptime.jp> writes:
>> (2012/10/14 13:26), Fujii Masao wrote:
>>> The tracing lwlock usage seems to still cause a small performance
>>> overhead even if reporting is disabled. I believe some users would
>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>>> It should be up to a user to decide whether to trace lwlock usage, e.g.,
>>> by using trace_lwlock parameter, I think.
>
>> Frankly speaking, I do not agree with disabling performance
>> instrument to improve performance. DBA must *always* monitor
>> the performance metrics when having such heavy workload.
>
> This brings up a question that I don't think has been honestly
> considered, which is exactly whom a feature like this is targeted at.
> TBH I think it's of about zero use to DBAs (making the above argument
> bogus). It is potentially of use to developers, but a DBA is unlikely
> to be able to do anything about lwlock-level contention even if he has
> the knowledge to interpret the data.

Actually, I'm not a developer. I'm just a DBA, and I needed such an
instrument when I was asked to investigate strange WAL behavior that
produced unexpected/random commit delays under heavy workload.

Another patch I have submitted (the WAL dirty flush statistics patch)
comes from the same need.

https://commitfest.postgresql.org/action/patch_view?id=893

Unfortunately, since I didn't have such an instrument at that time,
I used SystemTap to investigate WAL behavior, including call counts
and wait times, but using SystemTap was really awful, and I thought
PostgreSQL needs to have some "built-in" instrument for DBAs.

I needed to determine the bottleneck around WAL, such as lock
contention and/or write performance of the device, but I couldn't
find anything without an instrument.

I accept that I'm focusing only on WAL-related lwlocks, and that this
is not enough for ordinary DBAs, but I still need it to understand
PostgreSQL behavior without having deep knowledge of and experience
with WAL design and implementation.

> So I feel it isn't something that should be turned on in production
> builds. I'd vote for enabling it by a non-default configure option,
> and making sure that it doesn't introduce any overhead when the option
> is off.

There is another option to eliminate performance overhead for this
purpose.

As I tried in the first patch, instead of reporting through the pgstat
collector process, each backend could directly increment lwlock
counters allocated in shared memory.

http://archives.postgresql.org/message-id/4FE9A6F5.2080405@uptime.jp

Here are further benchmark results, including my first patch.

[HEAD]
number of transactions actually processed: 3439971
tps = 57331.891602 (including connections establishing)
tps = 57340.932324 (excluding connections establishing)

[My first patch]
number of transactions actually processed: 3453745
tps = 57562.196971 (including connections establishing)
tps = 57569.197838 (excluding connections establishing)

Actually, I'm not sure why my patch makes PostgreSQL faster, :D
but the performance seems better than with my second patch.

I think it still needs some trick to keep the counters in "pgstat.stat"
across restarts, but it would be more acceptable in terms of
performance overhead.

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
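
For illustration only, the shared-memory counter idea mentioned in this message (each backend incrementing counters directly rather than reporting through a collector process) might look roughly like the sketch below. This is not code from either patch and does not use PostgreSQL's shared memory or lwlock APIs. The names `LockCounter`, `counters_create`, `count_acquire`, `simulate_backends`, and `NUM_TRACKED_LOCKS` are all hypothetical, and an anonymous shared `mmap` region plus `fork` stands in for PostgreSQL shared memory and backend processes.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_TRACKED_LOCKS 4      /* hypothetical number of tracked locks */

/* Hypothetical per-lock counter living in memory shared by all processes. */
typedef struct {
    atomic_ulong acquire_count;
} LockCounter;

/* Allocate the counter array in an anonymous shared mapping, so that
 * processes forked afterwards all see and update the same counters. */
static LockCounter *counters_create(void)
{
    LockCounter *c = mmap(NULL, sizeof(LockCounter) * NUM_TRACKED_LOCKS,
                          PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED)
        return NULL;
    for (int i = 0; i < NUM_TRACKED_LOCKS; i++)
        atomic_init(&c[i].acquire_count, 0);
    return c;
}

/* What a "backend" would do on each lock acquisition: one atomic add,
 * with no message sent to any collector process. */
static void count_acquire(LockCounter *c, int lock_id)
{
    assert(lock_id >= 0 && lock_id < NUM_TRACKED_LOCKS);
    atomic_fetch_add_explicit(&c[lock_id].acquire_count, 1,
                              memory_order_relaxed);
}

/* Fork nbackends child processes, let each record acquires_each
 * acquisitions, and return the total seen in shared memory
 * (or -1 if setup or fork failed). */
static long simulate_backends(int nbackends, int acquires_each)
{
    LockCounter *c = counters_create();
    if (c == NULL)
        return -1;

    int started = 0;
    for (int b = 0; b < nbackends; b++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {          /* child: play the role of one backend */
            for (int i = 0; i < acquires_each; i++)
                count_acquire(c, i % NUM_TRACKED_LOCKS);
            _exit(0);
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        wait(NULL);              /* reap every child that was started */

    long total = 0;
    for (int i = 0; i < NUM_TRACKED_LOCKS; i++)
        total += (long) atomic_load(&c[i].acquire_count);

    munmap(c, sizeof(LockCounter) * NUM_TRACKED_LOCKS);
    return started == nbackends ? total : -1;
}
```

Calling `simulate_backends(4, 1000)` from a `main` function should report every increment made by the child processes. The sketch says nothing about the actual overhead in PostgreSQL, and it does not address persisting the counters across restarts, which the message notes as an open issue.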