Re: pg_stat_lwlocks view - lwlocks statistics, round 2 - Mailing list pgsql-hackers

From Satoshi Nagayasu
Subject Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Date
Msg-id 507C3799.9010803@uptime.jp
In response to Re: pg_stat_lwlocks view - lwlocks statistics, round 2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
2012/10/15 1:43, Tom Lane wrote:
> Satoshi Nagayasu <snaga@uptime.jp> writes:
>> (2012/10/14 13:26), Fujii Masao wrote:
>>> The tracing lwlock usage seems to still cause a small performance
>>> overhead even if reporting is disabled. I believe some users would
>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>>> It should be up to a user to decide whether to trace lwlock usage, e.g.,
>>> by using trace_lwlock parameter, I think.
> 
>> Frankly speaking, I do not agree with disabling performance
>> instrumentation to improve performance. A DBA must *always* monitor
>> the performance metrics when running such a heavy workload.
> 
> This brings up a question that I don't think has been honestly
> considered, which is exactly whom a feature like this is targeted at.
> TBH I think it's of about zero use to DBAs (making the above argument
> bogus).  It is potentially of use to developers, but a DBA is unlikely
> to be able to do anything about lwlock-level contention even if he has
> the knowledge to interpret the data.

Actually, I'm not a developer. I'm just a DBA, and I needed such an
instrument when I was asked to investigate strange WAL behavior
that produced unexpected/random commit delays under heavy workload.

Another patch I have submitted (the WAL dirty flush statistics patch)
comes from the same motivation.

https://commitfest.postgresql.org/action/patch_view?id=893

Unfortunately, since I didn't have such an instrument at that time,
I used SystemTap to investigate WAL behavior, including calls and
wait times, but using SystemTap was really awful, and I thought
PostgreSQL needed some "built-in" instrument for DBAs.

I needed to determine the bottleneck around WAL, such as lock contention
and/or write performance of the device, but I couldn't find anything
without an instrument.

I accept that I'm focusing only on WAL-related lwlocks, and that this is
not enough for ordinary DBAs, but I still need it to understand PostgreSQL
behavior without having deep knowledge of and experience with WAL design
and implementation.

> So I feel it isn't something that should be turned on in production
> builds.  I'd vote for enabling it by a non-default configure option,
> and making sure that it doesn't introduce any overhead when the option
> is off.

There is another option for eliminating this performance overhead.

As I tried in the first patch, instead of reporting through the pgstat
collector process, each backend could directly increment lwlock
counters allocated in shared memory.

http://archives.postgresql.org/message-id/4FE9A6F5.2080405@uptime.jp
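
As a rough illustration of the direct-increment idea, here is a toy
Python model. It is not code from either patch and not PostgreSQL code:
threads stand in for backends, a process-wide array stands in for the
shared memory segment, and every name in it (SharedLWLockStats,
count_acquire, NUM_LWLOCKS and so on) is hypothetical. The only point it
tries to show is that each "backend" updates the counters in place and no
message goes through a collector.

```python
import threading
from array import array

NUM_LWLOCKS = 4  # hypothetical; chosen only to keep the demo small


class SharedLWLockStats:
    """Stand-in for a counter array allocated in shared memory."""

    def __init__(self, num_locks):
        self.acquired = array("Q", [0] * num_locks)  # lock acquisitions
        self.waited = array("Q", [0] * num_locks)    # acquisitions that blocked
        # One mutex per lock id, so updates for different locks do not contend.
        self._mutex = [threading.Lock() for _ in range(num_locks)]

    def count_acquire(self, lock_id, blocked):
        """Update the counters in place; nothing is sent to a collector."""
        with self._mutex[lock_id]:
            self.acquired[lock_id] += 1
            if blocked:
                self.waited[lock_id] += 1

    def snapshot(self):
        """Rows a statistics view could return: (lock_id, acquired, waited)."""
        return [(i, self.acquired[i], self.waited[i])
                for i in range(len(self.acquired))]


def backend(stats, lock_id, iterations):
    """Simulated backend: every 10th acquisition is treated as blocked."""
    for n in range(iterations):
        stats.count_acquire(lock_id, blocked=(n % 10 == 0))


def run_demo(num_backends=4, iterations=1000):
    """Run several simulated backends against lock 0 and return the counters."""
    stats = SharedLWLockStats(NUM_LWLOCKS)
    threads = [threading.Thread(target=backend, args=(stats, 0, iterations))
               for _ in range(num_backends)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats.snapshot()


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The per-counter mutex is only there to keep the toy model correct under
threads; how the increments would be protected in a real backend is a
separate design question that this sketch does not try to answer.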

Here are some more benchmark results, including my first patch.

[HEAD]
number of transactions actually processed: 3439971
tps = 57331.891602 (including connections establishing)
tps = 57340.932324 (excluding connections establishing)

[My first patch]
number of transactions actually processed: 3453745
tps = 57562.196971 (including connections establishing)
tps = 57569.197838 (excluding connections establishing)

Actually, I'm not sure why my first patch makes PostgreSQL faster :D
but the performance seems better than with my second patch.
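
To put the two runs side by side, the relative difference can be worked
out from the tps figures quoted above (excluding connection
establishment). The helper name relative_gain is made up for this sketch.
A gap of this size may well be within run-to-run variation, which a
single pair of runs cannot show.

```python
def relative_gain(baseline_tps, patched_tps):
    """Throughput change of the patched run relative to the baseline, in percent."""
    return (patched_tps - baseline_tps) / baseline_tps * 100.0


# Figures copied from the two pgbench runs quoted above.
HEAD_TPS = 57340.932324
FIRST_PATCH_TPS = 57569.197838

gain = relative_gain(HEAD_TPS, FIRST_PATCH_TPS)
print(f"first patch vs HEAD: {gain:+.2f}%")  # prints "first patch vs HEAD: +0.40%"
```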

I think it still needs some trick to keep the counters in "pgstat.stat"
across restarts, but it would be more acceptable in terms of
performance overhead.
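
One possible shape for such a trick, sketched in Python under loud
assumptions: the counters are dumped to a file at shutdown and read back
at startup. The function names and the file name are hypothetical, and
JSON is used only to keep the sketch short; it is not the format of
"pgstat.stat".

```python
import json
import os
import tempfile


def save_counters(path, counters):
    """Write to a temporary file, then rename it over the target, so an
    interrupted write never leaves a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(list(counters), f)
    os.replace(tmp_path, path)


def load_counters(path, num_locks):
    """Return the saved counters, or zeros if the file is missing,
    unreadable, or does not match the expected number of locks."""
    try:
        with open(path) as f:
            counters = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return [0] * num_locks
    if not isinstance(counters, list) or len(counters) != num_locks:
        return [0] * num_locks
    return counters
```

At startup the loaded values would be copied into the shared counters
before any backend begins incrementing them, and at shutdown the current
values would be saved again.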

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


