Thread: stored procedure stats in collector
Howdy, I'm working on a patch to extend the stats collector to handle stored procedure statistics (call counts, duration etc). The goal is to make this information visible via pg_stat functions/views. The collection would be controllable via "stats_function_level" GUC and will have minimal overhead when turned off. At our company we depend heavily on stored procedures and such a tool would be of great help. Perhaps others could also find it somewhat useful. So far I have a "working" patch against 8.2.4 (attached) that places some instrumentation in the executor (ExecMakeFunctionResult, ExecMakeFunctionResultNoSets and ExecMakeTableFunctionResult) and send the stats to the collector (handled similarly to table stats). The results are visible through pg_stat_user_functions: martinp=# set stats_function_level = on; SET Time: 0.172 ms martinp=# select count(f3()) from generate_series(1, 1000); count ------- 1000 (1 row) Time: 40059.713 ms martinp=# select * from pg_stat_user_functions ; procid | schemaname | procname | nargs | calls | total_time | total_cpu | self_time | self_cpu --------+------------+----------+-------+-------+------------+-----------+-----------+---------- 16388 | public | f1 | 0 | 4000 | 14978 | 8352 | 14978 | 8352 16389 | public | f2 | 0 | 2000 | 40044 | 8364 | 25066 | 12 16390 | public | f3 | 0 | 1000 | 40054 | 8364 | 9 | 0 (3 rows) Time units are in milliseconds. Only functions with oid >= FirstNormalObjectId are accounted. This is of course still very experimental - some work is required to get this into production shape. I was hoping to get some feedback before I continue though. Maybe there are some obvious flaws that I'm not seeing, maybe something needs to be changed to make this more generally useful. Feedback is greatly appreciated. Regards, Martin
Attachment
On Fri, 2007-07-13 at 14:11 +0300, Martin Pihlak wrote: > I'm working on a patch to extend the stats collector to handle stored procedure > statistics (call counts, duration etc). The goal is to make this information > visible via pg_stat functions/views. The collection would be controllable via > "stats_function_level" GUC and will have minimal overhead when turned off. At > our company we depend heavily on stored procedures and such a tool would be of > great help. Perhaps others could also find it somewhat useful. Very cool, certainly sounds like a useful feature to me. > martinp=# select * from pg_stat_user_functions ; > procid | schemaname | procname | nargs | calls | total_time | total_cpu | self_time | self_cpu > --------+------------+----------+-------+-------+------------+-----------+-----------+---------- > 16388 | public | f1 | 0 | 4000 | 14978 | 8352 | 14978 | 8352 > 16389 | public | f2 | 0 | 2000 | 40044 | 8364 | 25066 | 12 > 16390 | public | f3 | 0 | 1000 | 40054 | 8364 | 9 | 0 > (3 rows) (schemaname, procname, nargs) is still ambiguous in the face of function overloading. Although the presence of procid uniquely identifies each function anyway, if you're going to include the name and argument information, it might be worth including the argument types as well (as an array of regtype, perhaps). > Only functions with oid >= FirstNormalObjectId are accounted. Sounds reasonable to me; adding profiling to every DirectFunctionCall invocation is likely to be too expensive anyway. >From looking quickly at the patch, I don't think the current coding handles set-returning functions (ExecMakeTableFunctionResult). -Neil
Martin Pihlak <martin.pihlak@gmail.com> writes: > I'm working on a patch to extend the stats collector to handle stored procedure > statistics (call counts, duration etc). > ... > Only functions with oid >= FirstNormalObjectId are accounted. I really dislike that approach to deciding which functions to count. The main problem with it is that it will try to count C-language functions that are added after initdb, such as contrib stuff and third-party add-ons like postgis. The percentage overhead for a typical short C function will be large, and I'm not sure anything much is to be gained by counting these. I think a more reasonable approach would be to count PL-language functions; which in turn suggests that the instrumentation hooks should be in the PL call handlers, not in ExecMakeFunctionResult and friends where they will drag down performance of all functions. BTW, I dunno if you've thought about the implications of inlining of SQL functions ... it's hard to see how to count those reasonably. regards, tom lane
Neil Conway wrote: > > (schemaname, procname, nargs) is still ambiguous in the face of function > overloading. Although the presence of procid uniquely identifies each > function anyway, if you're going to include the name and argument > information, it might be worth including the argument types as well (as > an array of regtype, perhaps). > This is true. I was being a bit selfish here - in our environment we don't use overloading that much - schema, name and nargs are usually sufficient. It is also convinient to be able to do "select * from pg..." and have the output not to wrap around. >>From looking quickly at the patch, I don't think the current coding > handles set-returning functions (ExecMakeTableFunctionResult). > Hmm, should be handled. The number of tuples returned is not counted though, perhaps this is another thing to add ... Regards, Martin
Tom Lane wrote: > I really dislike that approach to deciding which functions to count. > The main problem with it is that it will try to count C-language > functions that are added after initdb, such as contrib stuff and > third-party add-ons like postgis. The percentage overhead for a > typical short C function will be large, and I'm not sure anything > much is to be gained by counting these. > Agreed, it is a bit ugly. In the initial version there was no such exclusion. This came only after I "accidentally" activated the collection on a busy server. Checked cpu usage some half an hour later and immediately disabled it. System mode consumption was unacceptably high, but the results indicated that something funny was going on inside application as well - some internal functions such as texteq() and now() had unreasonably high call counts. I'm still investigating those issues, but maybe it might be useful to have stats for internal and C language functions as well? Perhaps it is possible to make the collection configurable per language. So that for instance, default is to only collect pl languages, with the option to add C or internal. Not sure how to approach this though. > I think a more reasonable approach would be to count PL-language > functions; which in turn suggests that the instrumentation hooks > should be in the PL call handlers, not in ExecMakeFunctionResult > and friends where they will drag down performance of all functions. > It would be great if all the procedural languages would go through a single entry point. Right now the more exotic PL-s would need separate patching. But indeed, this approach would be less intrusive. > BTW, I dunno if you've thought about the implications of inlining > of SQL functions ... it's hard to see how to count those reasonably. > Yes, this creates some inconsistencies in what is collected and what not. Unless, of course, only PL functions are counted :) Regards, Martin
Howdy, Here's an updated version of the function stats patch. The biggest change is that by default only procedural language functions are counted. The GUC variable stats_function_level now takes 3 values: on, off and all. The latter also counts SQL and C language functions, "on" means only to count procedural language functions. The decision is now made in fmgr_info_cxt_security() Trigger functions are now also counted. Sample output: select procname, calls, total_time, total_cpu, self_time, self_cpu from pg_stat_user_functions order by self_cpu desc limit 5; procname | calls | total_time | total_cpu | self_time | self_cpu --------------------+-------+------------+-----------+-----------+---------- next_batch | 32765 | 27139 | 8574 | 27139 | 8574 fetch_batch_events | 3636 | 9252 | 5622 | 3771 | 2717 batch_event_sql | 3636 | 5454 | 2888 | 3910 | 1962 finish_batch | 3420 | 3215 | 1475 | 3215 | 1475 batch_event_tables | 3636 | 1448 | 865 | 1434 | 858 (5 rows) This is still 8.2 only (tested on 8.2.4 and 8.2.5), has seen some production usage here at Skype (about a month on reasonably busy boxes). So far so good. Couple of issues: - sometimes self_cpu > self_time - currently blaming it on Linux version of gettimeofday(). - dropped functions are not purged from stats, might bloat the stats file for some usage patterns. PS. Would something like this be a canditate for 8.4 inclusion (if polished up)? Regards, Martin
Attachment
On Thu, 2007-09-20 at 16:08 +0300, Martin Pihlak wrote: > The GUC variable stats_function_level now takes 3 values: on, off and all. That seems a confusing set of values. Perhaps "off", "pl", and "all" would be clearer? I'm curious if you've measured the performance overhead of enabling this functionality. > PS. Would something like this be a canditate for 8.4 inclusion (if polished up)? It sounds like a useful feature to me. -Neil
Neil Conway wrote: > On Thu, 2007-09-20 at 16:08 +0300, Martin Pihlak wrote: >> The GUC variable stats_function_level now takes 3 values: on, off and all. > > That seems a confusing set of values. Perhaps "off", "pl", and "all" > would be clearer? Makes sense. It appears that the stats_ prefixed GUC names are deprecated now. Will rename to "track_functions" and change values to "off", "pl" and "all". Or should I use "none" instead of "off"? > I'm curious if you've measured the performance overhead of enabling this > functionality. > Just finished a simple benchmark comparing unpatched, "off" and "all": unpatched usr sys total dummy 21.36 0.68 22.04 rfact 51.52 0.07 51.59 ifact 20.40 0.06 20.45 "off" usr sys total diff x dummy 21.59 0.39 21.97 1.00 rfact 52.40 0.06 52.46 1.02 ifact 20.78 0.02 20.8 1.02 "all" usr sys total diff x dummy 21.69 35.18 56.87 2.58 rfact 47.83 38.99 86.83 1.68 ifact 20.65 0.36 21.01 1.03 dummy is blank pl/pgsql function (ran 10000000 times). rfact is recursive factorial, ifact is iterative (both ran 1000000 times). Comparing median of 3 runs. The usr and sys values are from log_executor_stats. It seems that the overhead is unnoticeable if disabled, very visible for lightweight functions and heavy callers. Almost unnoticeable for more compute intensive functions. >> PS. Would something like this be a canditate for 8.4 inclusion (if polished up)? > > It sounds like a useful feature to me. > Good. I'll bring the patch up to date with HEAD. Regards, Martin
Martin Pihlak <martin.pihlak@gmail.com> writes: > Neil Conway wrote: >> That seems a confusing set of values. Perhaps "off", "pl", and "all" >> would be clearer? > Makes sense. It appears that the stats_ prefixed GUC names are deprecated now. > Will rename to "track_functions" and change values to "off", "pl" and > "all". Or should I use "none" instead of "off"? "None" seems good, by analogy with log_statement's values. > It seems that the overhead is unnoticeable if disabled, very visible > for lightweight functions and heavy callers. Almost unnoticeable for > more compute intensive functions. The very high system times seem odd. Maybe you have a machine with slow gettimeofday()? regards, tom lane
>> It seems that the overhead is unnoticeable if disabled, very visible >> for lightweight functions and heavy callers. Almost unnoticeable for >> more compute intensive functions. > > The very high system times seem odd. Maybe you have a machine with > slow gettimeofday()? > So it seems (on Ubuntu 7.04). Reran the benchmark on another system and got more reasonable results: "none" usr sys total dummy 15.9 0.68 16.58 rfact 39.37 0.07 39.44 ifact 12.89 0.01 12.9 "all" usr sys total diff x dummy 20.82 4.44 25.26 1.52 rfact 45.17 4.17 49.34 1.25 ifact 12.92 0.03 12.96 1.00 Regards, Martin
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Martin Pihlak wrote: > Howdy, > > Here's an updated version of the function stats patch. The biggest change is > that by default only procedural language functions are counted. The GUC variable > stats_function_level now takes 3 values: on, off and all. The latter also > counts SQL and C language functions, "on" means only to count procedural > language functions. The decision is now made in fmgr_info_cxt_security() > Trigger functions are now also counted. > > Sample output: > > select procname, calls, total_time, total_cpu, self_time, self_cpu > from pg_stat_user_functions order by self_cpu desc limit 5; > > procname | calls | total_time | total_cpu | self_time | self_cpu > --------------------+-------+------------+-----------+-----------+---------- > next_batch | 32765 | 27139 | 8574 | 27139 | 8574 > fetch_batch_events | 3636 | 9252 | 5622 | 3771 | 2717 > batch_event_sql | 3636 | 5454 | 2888 | 3910 | 1962 > finish_batch | 3420 | 3215 | 1475 | 3215 | 1475 > batch_event_tables | 3636 | 1448 | 865 | 1434 | 858 > (5 rows) > > This is still 8.2 only (tested on 8.2.4 and 8.2.5), has seen some production > usage here at Skype (about a month on reasonably busy boxes). So far so good. > > Couple of issues: > - sometimes self_cpu > self_time - currently blaming it on Linux version of > gettimeofday(). > - dropped functions are not purged from stats, might bloat the stats file for > some usage patterns. > > PS. Would something like this be a canditate for 8.4 inclusion (if polished up)? > > Regards, > Martin > [ application/x-gzip is not supported, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, Sep 20, 2007 at 05:34:32PM -0700, Neil Conway wrote: > That seems a confusing set of values. Perhaps "off", "pl", and "all" > would be clearer? > I'm curious if you've measured the performance overhead of enabling this > functionality. i'm quite worried about "all" setting. all operators are functions as well, so tracking literally all function calls might be too heavy. on the other hand i would like to be able to track some non-pl functions like RI%. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
hubert depesz lubaczewski wrote: > On Thu, Sep 20, 2007 at 05:34:32PM -0700, Neil Conway wrote: >> That seems a confusing set of values. Perhaps "off", "pl", and "all" >> would be clearer? >> I'm curious if you've measured the performance overhead of enabling this >> functionality. > > i'm quite worried about "all" setting. > all operators are functions as well, so tracking literally all function > calls might be too heavy. > "all" would additionally count only C and SQL functions, builtins (language "internal") are skipped. Maybe there should be option to count these as well, perhaps with a really scare name (track_functions = all_plus_internal). > on the other hand i would like to be able to track some non-pl functions > like RI%. > These are builtin. Regards, Martin
This has been saved for the next commit-fest: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Martin Pihlak wrote: > Howdy, > > Here's an updated version of the function stats patch. The biggest change is > that by default only procedural language functions are counted. The GUC variable > stats_function_level now takes 3 values: on, off and all. The latter also > counts SQL and C language functions, "on" means only to count procedural > language functions. The decision is now made in fmgr_info_cxt_security() > Trigger functions are now also counted. > > Sample output: > > select procname, calls, total_time, total_cpu, self_time, self_cpu > from pg_stat_user_functions order by self_cpu desc limit 5; > > procname | calls | total_time | total_cpu | self_time | self_cpu > --------------------+-------+------------+-----------+-----------+---------- > next_batch | 32765 | 27139 | 8574 | 27139 | 8574 > fetch_batch_events | 3636 | 9252 | 5622 | 3771 | 2717 > batch_event_sql | 3636 | 5454 | 2888 | 3910 | 1962 > finish_batch | 3420 | 3215 | 1475 | 3215 | 1475 > batch_event_tables | 3636 | 1448 | 865 | 1434 | 858 > (5 rows) > > This is still 8.2 only (tested on 8.2.4 and 8.2.5), has seen some production > usage here at Skype (about a month on reasonably busy boxes). So far so good. > > Couple of issues: > - sometimes self_cpu > self_time - currently blaming it on Linux version of > gettimeofday(). > - dropped functions are not purged from stats, might bloat the stats file for > some usage patterns. > > PS. Would something like this be a canditate for 8.4 inclusion (if polished up)? > > Regards, > Martin > [ application/x-gzip is not supported, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +