Thread: CPU time for pg_stat_statement
Hello, hackers. Today I was doing some aggregates over pg_stat_statements in order to find types of queries consuming most of the CPU. Aggregates were made on two pg_state_statement snapshots within 30 sec delay. The sum(total_time) had the biggest value for a very frequent query with about 10ms execution. I was thinking it is the biggest CPU consumer. But after reducing the frequency of queries a lot I was unable to see any significant difference in server CPU usage... So, looks like clock_gettime is not so accurate to measure real CPU usage for some OLTP workloads. I suppose it is caused by the wall time vs CPU time difference (IO, thread switch, etc). But what do you think about adding cpu_time (by calling getrusage) to pg_stat_statements? Seems it could be very useful for CPU profiling. I am probably able to prepare the patch, but it is always better to get some feedback on the idea first :) Best regards, Michail.
Michail Nikolaev <michail.nikolaev@gmail.com> writes: > So, looks like clock_gettime is not so accurate to measure real CPU > usage for some OLTP workloads. I suppose it is caused by the wall time > vs CPU time difference (IO, thread switch, etc). This is a pretty broad claim to make on the basis of one undocumented test case on one unmentioned platform. > But what do you think about adding cpu_time (by calling getrusage) to > pg_stat_statements? Seems it could be very useful for CPU profiling. On what grounds do you claim getrusage will be better? One thing we can be pretty certain of is that it will be slower, since it has to return many more pieces of information. And the API for it only allows time info to be specified to microseconds, versus nanoseconds for clock_gettime, so it's also going to be taking a precision hit. regards, tom lane
On Sat, May 21, 2022 at 6:50 AM Michail Nikolaev <michail.nikolaev@gmail.com> wrote: > But what do you think about adding cpu_time (by calling getrusage) to > pg_stat_statements? Seems it could be very useful for CPU profiling. > > I am probably able to prepare the patch, but it is always better to > get some feedback on the idea first :) This might be interesting: https://github.com/powa-team/pg_stat_kcache
Hello, Thomas. > This might be interesting: > https://github.com/powa-team/pg_stat_kcache Oh, nice, looks like it could help me to reduce CPU and test my assumption (using exec_user_time and exec_system_time). BWT, do you know why extension is not in standard contrib (looks mature)? Best regards, Michail.
Hello, Tom. > This is a pretty broad claim to make on the basis of one undocumented > test case on one unmentioned platform. I'll try to use pg_stat_kcache to check the difference between Wall and CPU for my case. > On what grounds do you claim getrusage will be better? One thing we > can be pretty certain of is that it will be slower, since it has to > return many more pieces of information. And the API for it only allows > time info to be specified to microseconds, versus nanoseconds for > clock_gettime, so it's also going to be taking a precision hit. My idea was to not replace wall-clock (clock_gettime) by cpu-clock (getrusage). I think about adding getrusage as an additional column (with flag to enable actual measuring). Looks like I need to be more precise in words :) It is just two different clocks - and sometimes you need physical time, sometimes CPU time (and sometimes, for example, amount of WAL written). Best regards, Michail.
Hi, On Sat, May 21, 2022 at 12:21:49AM +0300, Michail Nikolaev wrote: > > > This might be interesting: > > https://github.com/powa-team/pg_stat_kcache > > Oh, nice, looks like it could help me to reduce CPU and test my > assumption (using exec_user_time and exec_system_time). > > BWT, do you know why extension is not in standard contrib (looks mature)? Because contrib isn't meant to eventually contain all possible extensions. There is an official postgres extension network, and also community deb/rpm repositories that are intended to handle postgres extensibility, and this extension is available on all of that, same as a lot of other extensions, which are at least as mature.
Hello, Tom. >> This is a pretty broad claim to make on the basis of one undocumented >> test case on one unmentioned platform. > I'll try to use pg_stat_kcache to check the difference between Wall > and CPU for my case. In my case I see pretty high correlation of pg_stat_kcache and pg_stat_statement (clock_gettime vs getrusage). Looks like CPU usage is hidden somewhere else (planning probably, not measured on postgres 11, but I see really high *clauselist_selectivity* in perf). Thanks, Michail.