Thread: pg_stat_lwlock wait time view

pg_stat_lwlock wait time view

From
Haribabu Kommi
Date:
There was some discussion earlier about adding a pg_stat_lwlock view in [1].
The main objection I observed to that patch was that it exposed LWLock
information to users who don't understand what these locks are used for.

Currently, LWLock information is already available to users as part of the
wait_event information in pg_stat_activity, and the LWLocks used in
PostgreSQL are listed in the documentation along with their purpose.

So I feel it may be worth adding this view to report the wait times of the
LWLocks. This information can be useful for finding bottlenecks around
LWLocks in production environments. However, adding the timing calculations
may cause a performance problem. Is there any need to write this stats
information to a file, given that it only provides wait time information?

Based on the performance impact of the additional timing calculations, we
can decide the view's default behavior. Are there any objections to the
concept?


[1] - https://www.postgresql.org/message-id/4FE9A6F5.2080405@uptime.jp

Regards,
Hari Babu
Fujitsu Australia



Re: pg_stat_lwlock wait time view

From
Robert Haas
Date:
On Wed, Aug 24, 2016 at 4:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> There was some discussion earlier about adding a pg_stat_lwlock view in [1].
> The main objection I observed to that patch was that it exposed LWLock
> information to users who don't understand what these locks are used for.
>
> Currently, LWLock information is already available to users as part of the
> wait_event information in pg_stat_activity, and the LWLocks used in
> PostgreSQL are listed in the documentation along with their purpose.
>
> So I feel it may be worth adding this view to report the wait times of the
> LWLocks. This information can be useful for finding bottlenecks around
> LWLocks in production environments. However, adding the timing calculations
> may cause a performance problem. Is there any need to write this stats
> information to a file, given that it only provides wait time information?
>
> Based on the performance impact of the additional timing calculations, we
> can decide the view's default behavior. Are there any objections to the
> concept?

There have been some other recent threads on extending the wait event
stuff.  If you haven't already read those, you should, because the
issues are closely related.  I think that timing LWLock waits will be
quite expensive.  I believe that what the Postgres Pro folks want to
do is add up the wait times or maybe keep a history of waits (though
maybe I'm wrong about that), but showing them in pg_stat_activity is
another idea.  That's likely to add some synchronization overhead
which might be even greater in this case than for a feature that just
publishes accumulated times, but maybe it's all a drop in the bucket
compared to the cost of calling gettimeofday() in the first place.

Personally, my preferred solution is still to have a background worker
that samples the published wait events and rolls up statistics, but
I'm not sure I've convinced anyone else.  It could report the number
of seconds since it detected a wait event other than the current one,
which is not precisely the same thing as tracking the length of the
current wait but it's pretty close.  I don't know for sure what's best
here - I think some experimentation and dialog is needed.
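
For illustration, a minimal sketch of such a sampling loop, assuming a
hypothetical read_backend_wait_event() accessor (not an existing PostgreSQL
API) that returns the wait event a given backend slot currently publishes,
or 0 when it is not waiting:

#include <stdint.h>
#include <unistd.h>

#define MAX_BACKENDS 1024                   /* illustrative bound */

extern uint32_t read_backend_wait_event(int slot);  /* hypothetical */

static uint64_t sample_counts[65536];       /* tallies per wait event */

void
sampling_loop(void)
{
    for (;;)
    {
        for (int slot = 0; slot < MAX_BACKENDS; slot++)
        {
            uint32_t ev = read_backend_wait_event(slot);

            if (ev != 0)
                sample_counts[ev & 0xFFFF]++;
        }

        /*
         * With a 10ms interval, time spent waiting on an event is
         * approximately sample count * 10ms. The backends pay nothing
         * beyond publishing the wait event they already publish;
         * accuracy is bounded only by the sampling interval.
         */
        usleep(10000);
    }
}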

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_stat_lwlock wait time view

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Haribabu Kommi
> However, adding the timing calculations may cause a performance problem. Is
> there any need to write this stats information to a file, given that it
> only provides wait time information?

Yes, it would be nice to save workload diagnosis information, like Oracle
AWR. I mean not the current accumulated total, but a series of snapshots
taken periodically. It would enable:

* Daily monitoring across database restarts. e.g. The response times of
applications degraded after applying a patch. What's the difference between
before and after the patch application?

* Hints for troubleshooting a crash failure. e.g. Excessive memory use by
PostgreSQL crashed the OS. What was the workload like just before the crash?

The point of discussion may be whether PostgreSQL itself should provide the
feature to accumulate performance diagnosis information on persistent
storage. pg_statsinfo could take this on, but it wouldn't be convenient or
efficient.

Regards
Takayuki Tsunakawa



Re: pg_stat_lwlock wait time view

From
Haribabu Kommi
Date:
On Thu, Aug 25, 2016 at 6:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 24, 2016 at 4:23 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>>
>> Based on the performance impact of the additional timing calculations, we
>> can decide the view's default behavior. Are there any objections to the
>> concept?
>
> There have been some other recent threads on extending the wait event
> stuff.  If you haven't already read those, you should, because the
> issues are closely related.  I think that timing LWLock waits will be
> quite expensive.  I believe that what the Postgres Pro folks want to
> do is add up the wait times or maybe keep a history of waits (though
> maybe I'm wrong about that), but showing them in pg_stat_activity is
> another idea.  That's likely to add some synchronization overhead
> which might be even greater in this case than for a feature that just
> publishes accumulated times, but maybe it's all a drop in the bucket
> compared to the cost of calling gettimeofday() in the first place.

Yes, I agree this is an issue in cases where the wait time is smaller than
the cost of the logic added to measure it. Even if we use clock_gettime()
with CLOCK_REALTIME_COARSE there will be some overhead, although that clock
source is roughly 8 times faster than gettimeofday() at the cost of coarser
resolution. Maybe we can use clock_gettime() instead of gettimeofday() here,
as we may not need a fine-grained value.
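
A rough sketch of what the timing instrumentation could look like, assuming
a hypothetical per-lock accumulator accum_wait_usec (not an existing
PostgreSQL symbol); note that CLOCK_REALTIME_COARSE is Linux-specific:

#include <stdint.h>
#include <time.h>

extern uint64_t accum_wait_usec;    /* hypothetical accumulator */

static inline uint64_t
coarse_now_usec(void)
{
    struct timespec ts;

    /*
     * CLOCK_REALTIME_COARSE (Linux-specific) trades resolution,
     * typically one scheduler tick, for a much cheaper read than
     * gettimeofday().
     */
    clock_gettime(CLOCK_REALTIME_COARSE, &ts);
    return (uint64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

void
timed_wait_example(void)
{
    uint64_t start = coarse_now_usec();

    /* ... block on the lock here ... */

    accum_wait_usec += coarse_now_usec() - start;
}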

> Personally, my preferred solution is still to have a background worker
> that samples the published wait events and rolls up statistics, but
> I'm not sure I've convinced anyone else.  It could report the number
> of seconds since it detected a wait event other than the current one,
> which is not precisely the same thing as tracking the length of the
> current wait but it's pretty close.  I don't know for sure what's best
> here - I think some experimentation and dialog is needed.

Yes, using a background worker can reduce the load of doing all the wait
time calculations in the main backend. I can give it a try with both a
direct calculation approach and a background worker (maybe
pg_stat_collector) that computes the wait time from the stat messages sent
by the main backend at wait start and wait end.

I am not sure how accurate the data gathered by a background worker can be
without it receiving any signal or message from the main backend.

Regards,
Hari Babu
Fujitsu Australia



Re: pg_stat_lwlock wait time view

From
Satoshi Nagayasu
Date:
2016-08-25 13:46 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:
> On Thu, Aug 25, 2016 at 6:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Aug 24, 2016 at 4:23 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>>
>>> Based on the performance impact of the additional timing calculations, we
>>> can decide the view's default behavior. Are there any objections to the
>>> concept?
>>
>> There have been some other recent threads on extending the wait event
>> stuff.  If you haven't already read those, you should, because the
>> issues are closely related.  I think that timing LWLock waits will be
>> quite expensive.  I believe that what the Postgres Pro folks want to
>> do is add up the wait times or maybe keep a history of waits (though
>> maybe I'm wrong about that), but showing them in pg_stat_activity is
>> another idea.  That's likely to add some synchronization overhead
>> which might be even greater in this case than for a feature that just
>> publishes accumulated times, but maybe it's all a drop in the bucket
>> compared to the cost of calling gettimeofday() in the first place.
>
> Yes, I agree this is an issue in cases where the wait time is smaller than
> the cost of the logic added to measure it. Even if we use clock_gettime()
> with CLOCK_REALTIME_COARSE there will be some overhead, although that clock
> source is roughly 8 times faster than gettimeofday() at the cost of coarser
> resolution. Maybe we can use clock_gettime() instead of gettimeofday() here,
> as we may not need a fine-grained value.

Is there any option other than gettimeofday() for measuring elapsed time
with lower overhead?

I've heard about the RDTSC feature (a hardware cycle counter) supported by
recent processors, and have found a few articles [1] [2] on its lower
overhead compared to gettimeofday().

[1] http://stackoverflow.com/questions/15623343/using-cpu-counters-versus-gettimeofday
[2] http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday

I'm not sure how much we could benefit from it, because I'm not familiar
with this facility and of course I don't have any numbers. In addition, I
guess it would bring some portability issues. But I'm still curious to
learn more about it. Does anyone have experience with it?
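
For reference, a minimal sketch of reading the counter on x86-64 with GCC
or Clang. Raw TSC values are CPU cycles, converting them to time requires
knowing the (invariant) TSC frequency, and this is not portable beyond x86:

#include <stdint.h>
#include <x86intrin.h>      /* __rdtsc() on GCC/Clang */

uint64_t
elapsed_cycles_example(void)
{
    uint64_t start = __rdtsc();

    /* ... the section being measured ... */

    /*
     * Cycles elapsed. Beware also that the TSC may not be synchronized
     * across sockets on older hardware.
     */
    return __rdtsc() - start;
}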

>> Personally, my preferred solution is still to have a background worker
>> that samples the published wait events and rolls up statistics, but
>> I'm not sure I've convinced anyone else.  It could report the number
>> of seconds since it detected a wait event other than the current one,
>> which is not precisely the same thing as tracking the length of the
>> current wait but it's pretty close.  I don't know for sure what's best
>> here - I think some experimentation and dialog is needed.
>
> Yes, using a background worker can reduce the load of doing all the wait
> time calculations in the main backend. I can give it a try with both a
> direct calculation approach and a background worker (maybe
> pg_stat_collector) that computes the wait time from the stat messages sent
> by the main backend at wait start and wait end.
>
> I am not sure how accurate the data gathered by a background worker can be
> without it receiving any signal or message from the main backend.

It looks like an accuracy-performance trade-off. So, I think the use cases
matter here for arriving at a better design.

I guess that's why Ilya is looking for feature requests from DBAs in
another thread [3].

[3] https://www.postgresql.org/message-id/CAG95seUAQVj09KzLwU%2Bz1B-GqdMqerzEkPFR3hn0q88XzMq-PA@mail.gmail.com

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>



Re: [HACKERS] pg_stat_lwlock wait time view

From
Haribabu Kommi
Date:


[Resending to Hackers as the earlier mail failed to deliver to the Hackers mailing list]

On Mon, Jan 9, 2017 at 4:13 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Thu, Aug 25, 2016 at 2:46 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Thu, Aug 25, 2016 at 6:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Personally, my preferred solution is still to have a background worker
> that samples the published wait events and rolls up statistics, but
> I'm not sure I've convinced anyone else.  It could report the number
> of seconds since it detected a wait event other than the current one,
> which is not precisely the same thing as tracking the length of the
> current wait but it's pretty close.  I don't know for sure what's best
> here - I think some experimentation and dialog is needed.

Yes, using a background worker can reduce the load of doing all the wait
time calculations in the main backend. I can give it a try with both a
direct calculation approach and a background worker (maybe
pg_stat_collector) that computes the wait time from the stat messages sent
by the main backend at wait start and wait end.

I am not sure how accurate the data gathered by a background worker can be
without it receiving any signal or message from the main backend.

Apologies for coming back to an old thread.

As a proof of concept, I tried using the "stats collector" process as the
background worker that calculates the wait times for LWLocks, instead of
adding another background worker.

I created two hash tables: one to store the LWLock stats and another to
store the backends' information, with PID as the key.

Whenever a backend starts waiting for an LWLock, it sends a message to the
stats collector with its PID and the wait_event_info of the lock. When the
stats collector receives the message, it records the start time and adds an
entry for that backend to the hash table. When the backend finishes waiting
for the lock, it signals the stats collector, which looks up the backend's
entry in the hash table, computes the wait time, and adds it to the
corresponding LWLock entry in the other hash table.
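
For clarity, a rough sketch of the message and the two hash table entries
involved (all names here are illustrative, not the actual structures from
the patch):

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

typedef struct LWLockWaitMsg        /* backend -> stats collector */
{
    pid_t       pid;                /* backend reporting the wait */
    uint32_t    wait_event_info;    /* which LWLock is involved */
    bool        wait_start;         /* true at wait start, false at end */
} LWLockWaitMsg;

typedef struct BackendWaitEntry     /* hash table keyed by pid */
{
    pid_t       pid;
    uint32_t    wait_event_info;
    uint64_t    start_usec;         /* set when the start message arrives */
} BackendWaitEntry;

typedef struct LWLockStatsEntry     /* hash table keyed by wait_event_info */
{
    uint32_t    wait_event_info;
    uint64_t    total_wait_usec;    /* accumulated from end - start */
    uint64_t    wait_count;
} LWLockStatsEntry;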

The LWLock wait stats are stored in the stats file for persistence.

There is no stats reset logic yet.

This patch would also help in creating a view to display the wait times of
all wait events, as discussed in [1].

Comments?




Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] pg_stat_lwlock wait time view

From
Robert Haas
Date:
On Mon, Jan 9, 2017 at 12:13 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Whenever a backend starts waiting for an LWLock, it sends a message to the
> stats collector with its PID and the wait_event_info of the lock. When the
> stats collector receives the message, it records the start time and adds an
> entry for that backend to the hash table. When the backend finishes waiting
> for the lock, it signals the stats collector, which looks up the backend's
> entry in the hash table, computes the wait time, and adds it to the
> corresponding LWLock entry in the other hash table.

I will be extremely surprised if this doesn't have a severe negative
impact on performance when LWLock contention is high (e.g. a pgbench
read-only test using a scale factor that fits in the OS cache but not
shared_buffers).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company