Thread: Obtaining resource usage statistics from execution? (v 9.1)

Obtaining resource usage statistics from execution? (v 9.1)

From
Karl Denninger
Date:
Hi folks;

I am trying to continue profiling which in turn feeds query and index tuning changes for the AKCS-WWW forum software, and appear to have no good way to do what I need to do -- or I've missed something obvious.

The application uses the libpq interface from "C" to talk to Postgres which contains all the back end data.  Since this is a forum application it is very read-heavy (other than accounting and of course user posting functionality), and is template-driven.  All of the table lookup functions that come from the display templates are compartmentalized in one function in the base code.

What I want to be able to do is to determine the resource usage by Postgres for each of these calls.

I can do this by adding a call into the function just before the "real" call to PQexec() that prepends "explain analyze" to the call, makes a preamble call to PQexec() then grabs the last tuple returned which is the total execution time (with some text), parse that and there is the total time anyway.  But I see no way to get stats on I/O (e.g. Postgres buffer hits and misses, calls to the I/O operating system level APIs, etc.)
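In rough outline the preamble looks something like this (a simplified sketch, not the actual code -- the function name and error handling are illustrative only):

  /*
   * Prepend "explain analyze" to the query, run it, and parse the
   * "Total runtime: N ms" line out of the last tuple returned
   * (the 9.1 EXPLAIN ANALYZE output format).
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <libpq-fe.h>

  static double
  explain_total_ms(PGconn *conn, const char *query)
  {
      char       *buf;
      PGresult   *res;
      double      ms = -1.0;

      buf = malloc(strlen(query) + sizeof("explain analyze "));
      if (buf == NULL)
          return -1.0;
      sprintf(buf, "explain analyze %s", query);

      res = PQexec(conn, buf);
      free(buf);

      if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
      {
          /* the last row of the EXPLAIN ANALYZE output holds the total */
          const char *line = PQgetvalue(res, PQntuples(res) - 1, 0);

          sscanf(line, "Total runtime: %lf", &ms);
      }
      PQclear(res);
      return ms;                /* milliseconds, or -1.0 on failure */
  }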

But while I can get the numbers this way it comes at the expense of doubling the Postgres processing.  There does not appear, however, to be any exposition of the processing time requirements for actual (as opposed to "modeled" via explain analyze) execution of queries -- at least not via the libpq interface.

Am I missing something here -- is there a way to get resource consumption from actual queries as they're run?  What I'm doing right now is the above, with a configuration switch that has a minimum reportable execution time and then logging the returns that exceed that time, logging the queries that have the above-threshold runtimes for analysis and attempted optimization.  This works but obviously is something one only does for profiling as it doubles database load and is undesirable in ordinary operation.  What I'd like to be able to do is have the code track performance all the time and raise alerts when it sees "outliers" giving me a continually-improving set of targets for reduction of resource consumption (up until I reach the point where I don't seem to be able to make it any faster of course :-))

Thanks in advance!

--
-- Karl Denninger
The Market Ticker ®
Cuda Systems LLC

Re: Obtaining resource usage statistics from execution? (v 9.1)

From
Tom Lane
Date:
Karl Denninger <karl@denninger.net> writes:
> I am trying to continue profiling which in turn feeds query and index
> tuning changes for the AKCS-WWW forum software, and appear to have no
> good way to do what I need to do -- or I've missed something obvious.

Do you really want the application involved in this?  Usually, when
people try to profile a workload, they want everything to just get
logged on the server side.  There are some contrib modules to help
with that approach --- see auto_explain and pg_stat_statements.

            regards, tom lane

Re: Obtaining resource usage statistics from execution? (v 9.1)

From
"Tomas Vondra"
Date:
Hi,

first of all, which PostgreSQL version are you using, and what platform is
it running on? What level of control do you have over the database (are you
just a user, or can you modify the postgresql.conf file)?

On 16 March 2012, 15:31, Karl Denninger wrote:
> Hi folks;
>
> I am trying to continue profiling which in turn feeds query and index
> tuning changes for the AKCS-WWW forum software, and appear to have no
> good way to do what I need to do -- or I've missed something obvious.

Why do you need to do that? Have you checked the log_duration /
log_min_duration_statement configuration options? What about auto_explain
and maybe pg_stat_statements?

The aggregated data (e.g. as provided by pg_stat_statements or pgFouine) are
IMHO much more useful than having to deal with data collected for each
query separately.
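For example, a rough sketch of pulling the aggregated data over libpq
(assuming pg_stat_statements is listed in shared_preload_libraries and the
extension has been created in the database; column list as of 9.1):

  /*
   * Rough sketch only: print the ten most expensive statements together
   * with their buffer hit/read counts from pg_stat_statements.
   */
  #include <stdio.h>
  #include <libpq-fe.h>

  static void
  print_top_statements(PGconn *conn)
  {
      PGresult   *res;
      int         i;

      res = PQexec(conn,
                   "SELECT query, calls, total_time, "
                   "       shared_blks_hit, shared_blks_read "
                   "FROM pg_stat_statements "
                   "ORDER BY total_time DESC LIMIT 10");

      if (PQresultStatus(res) == PGRES_TUPLES_OK)
      {
          for (i = 0; i < PQntuples(res); i++)
              printf("%s | calls=%s time=%s hit=%s read=%s\n",
                     PQgetvalue(res, i, 0), PQgetvalue(res, i, 1),
                     PQgetvalue(res, i, 2), PQgetvalue(res, i, 3),
                     PQgetvalue(res, i, 4));
      }
      PQclear(res);
  }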

> The application uses the libpq interface from "C" to talk to Postgres
> which contains all the back end data.  Since this is a forum application
> it is very read-heavy (other than accounting and of course user posting
> functionality), and is template-driven.  All of the table lookup
> functions that come from the display templates are compartmentalized in
> one function in the base code.
>
> What I want to be able to do is to determine the resource usage by
> Postgres for each of these calls.
>
> I can do this by adding a call into the function just before the "real"
> call to PQexec() that prepends "explain analyze" to the call, makes a
> preamble call to PQexec() then grabs the last tuple returned which is
> the total execution time (with some text), parse that and there is the
> total time anyway.  But I see no way to get stats on I/O (e.g. Postgres
> buffer hits and misses, calls to the I/O operating system level APIs,
> etc.)
>
> But while I can get the numbers this way it comes at the expense of
> doubling the Postgres processing.  There does not appear, however, to be
> any exposition of the processing time requirements for actual (as
> opposed to "modeled" via explain analyze) execution of queries -- at
> least not via the libpq interface.

Yup, that's the problem with EXPLAIN ANALYZE, and IMHO it's a 'no go' in
this case. Not only do you have to run the query twice, but the
instrumentation may also significantly influence the actual runtime due to
gettimeofday overhead etc.

You can use auto_explain to eliminate the need to run the query twice, but
the instrumentation overhead may still be a significant drag, so the numbers
may not reflect the actual performance (and thus may not be useful for
reasonable profiling).

> Am I missing something here -- is there a way to get resource
> consumption from actual queries as they're run?  What I'm doing right
> now is the above, with a configuration switch that has a minimum
> reportable execution time and then logging the returns that exceed that
> time, logging the queries that have the above-threshold runtimes for
> analysis and attempted optimization.  This works but obviously is
> something one only does for profiling as it doubles database load and is
> undesirable in ordinary operation.  What I'd like to be able to do is
> have the code track performance all the time and raise alerts when it
> sees "outliers" giving me a continually-improving set of targets for
> reduction of resource consumption (up until I reach the point where I
> don't seem to be able to make it any faster of course :-))

If all you want is outliers, then set log_min_duration_statement and use
pgFouine to process the logs. That's a very simple and very effective way
to deal with them.

If you really need the resource consumption stats, you may write a simple
SRF that calls getrusage and returns the data as a row so that you'll be
able to do something like

  select * from pg_rusage()

This seems like a neat idea, and writing such an extension should be
fairly simple. Still, it will be Linux-only (because getrusage is), and
I'm not quite sure the collected data are all that useful.
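A rough sketch of what such a C function might look like (the column layout
is arbitrary, the matching CREATE FUNCTION is shown as a comment, and you'd
call it before and after a query and diff the numbers, since the getrusage
counters are cumulative for the backend):

  /*
   * Sketch of a C function returning getrusage() data for the current
   * backend as a single row.  Matching SQL, for illustration:
   *
   *   CREATE FUNCTION pg_rusage(OUT user_ms bigint, OUT sys_ms bigint,
   *                             OUT blk_in bigint, OUT blk_out bigint)
   *   RETURNS record
   *   AS 'MODULE_PATHNAME', 'pg_rusage' LANGUAGE C;
   */
  #include "postgres.h"
  #include "fmgr.h"
  #include "funcapi.h"
  #include <sys/resource.h>

  PG_MODULE_MAGIC;

  PG_FUNCTION_INFO_V1(pg_rusage);

  Datum
  pg_rusage(PG_FUNCTION_ARGS)
  {
      TupleDesc       tupdesc;
      struct rusage   ru;
      Datum           values[4];
      bool            nulls[4] = {false, false, false, false};
      HeapTuple       tuple;

      if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
          elog(ERROR, "return type must be a row type");

      getrusage(RUSAGE_SELF, &ru);

      /* user CPU ms, system CPU ms, blocks read, blocks written */
      values[0] = Int64GetDatum((int64) ru.ru_utime.tv_sec * 1000 +
                                ru.ru_utime.tv_usec / 1000);
      values[1] = Int64GetDatum((int64) ru.ru_stime.tv_sec * 1000 +
                                ru.ru_stime.tv_usec / 1000);
      values[2] = Int64GetDatum((int64) ru.ru_inblock);
      values[3] = Int64GetDatum((int64) ru.ru_oublock);

      tuple = heap_form_tuple(BlessTupleDesc(tupdesc), values, nulls);
      PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
  }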

Tomas

Re: Obtaining resource usage statistics from execution? (v 9.1)

From
Karl Denninger
Date:
On 3/16/2012 9:53 AM, Tom Lane wrote:
> Karl Denninger <karl@denninger.net> writes:
>> I am trying to continue profiling which in turn feeds query and index
>> tuning changes for the AKCS-WWW forum software, and appear to have no
>> good way to do what I need to do -- or I've missed something obvious.
> Do you really want the application involved in this?  Usually, when
> people try to profile a workload, they want everything to just get
> logged on the server side.  There are some contrib modules to help
> with that approach --- see auto_explain and pg_stat_statements.
>
>             regards, tom lane
Well, I do in some cases, yes.  Specifically, I'd like to flag instances "on the fly" when the total execution time of the calls exceeds some amount.  I can (of course) track the time in the application, but what I'd like to be able to do is track the time that a particular template spends in database calls, at the millisecond level.

pg_stat_statements is generalized across the server, which is useful for profiling on an aggregate basis but not so much for individual call sites.  auto_explain doesn't tell me what upstream call generated the log entry; I have to go back and figure that out by hand, and it ends up in the server logs, which isn't really where I want it.

One of the things that doing this in the application would allow is that the queries in question are all generated by a template file that is basically HTML with carefully-crafted commentish-appearing things that the parser uses; for instance:

<!--&number-->  (push previously-obtained table value for "number" in the current row into the selector)
<!--*post--> (look up the post using the selector that was just pushed)
<!--@message-->  (replace the comment text with the field "message" from the returned lookup)
<!--^--> (loop back / recursion level closure)
<!--$--> (end of loop / recursion level)

That's a simple example; in the real template for each of those messages there are other things pushed off and other tables referenced (the user's statistics, their avatar, etc.), and recursion levels can be tagged in closing statements.  I'd like to be able to, in the code that executes that lookup (when "post" is parsed), grab the time required to execute that SQL statement and stash it, possibly into another table, with the tag ("post") so I know what query class generated it, and then at the end of the template itself pick up the aggregate for the entire template's database processing.  If I can pull the time for each query generated by the template in the code, adding it all up for the template as a whole is trivial.
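In other words, something along these lines in the lookup code (a sketch only -- the wrapper, table and column names here are made up for illustration, and this measures client-observed round-trip time, which is what matters to the page anyway):

  /*
   * Sketch: wrap the real PQexec() with gettimeofday() and stash the
   * elapsed time, keyed by the template tag (e.g. "post"), into a stats
   * table for later aggregation.  Names are illustrative only.
   */
  #include <stdio.h>
  #include <sys/time.h>
  #include <libpq-fe.h>

  static PGresult *
  timed_query(PGconn *conn, const char *tag, const char *sql)
  {
      struct timeval  start, stop;
      double          elapsed_ms;
      char            ms_buf[32];
      const char     *params[2];
      PGresult       *res;
      PGresult       *ins;

      gettimeofday(&start, NULL);
      res = PQexec(conn, sql);
      gettimeofday(&stop, NULL);

      elapsed_ms = (stop.tv_sec - start.tv_sec) * 1000.0 +
                   (stop.tv_usec - start.tv_usec) / 1000.0;

      /* stash the per-call timing, keyed by template tag */
      snprintf(ms_buf, sizeof(ms_buf), "%.3f", elapsed_ms);
      params[0] = tag;
      params[1] = ms_buf;
      ins = PQexecParams(conn,
                         "INSERT INTO query_stats (tag, elapsed_ms) "
                         "VALUES ($1, $2::numeric)",
                         2, NULL, params, NULL, NULL, 0);
      PQclear(ins);

      return res;
  }

The INSERT itself adds a little load, of course; buffering the numbers in memory and writing them out once per template would cut that down.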

This would allow me to profile the total execution time for each of the templates in the system at both the template level and the individual function call level out of the template.  If I then stash that data somewhere (e.g. into a different table) I can then look for the worst-performing queries and templates and focus my attention on the ones that get run the most so my "bang for the buck in optimization time" is improved.

Most of these queries are very standard, but not all.  One of the deviant examples is the "search" function exposed to the user, which can craft a rather complex query in response to what the user fills in on the submitted form, targeting both simple field lookups and GiST full-text indices.

Doing this the hard way has returned huge dividends: I found a very frequently executed query that was previously running in the ~2,000 ms range and responded extraordinarily well to a compound index addition on one of the target tables.  That in turn dropped its execution time to double-digit milliseconds (!!) in most cases, and the user experience improvement from that change was massive.  I don't expect to find many more "wins" of that caliber, but I'll take as many as I can get :-)

The argument for this in the application comes down to the granularity and specificity that I gain from knowing exactly what function call in the template generated the timing being reported.

(If it matters, Postgres 9.1.3 on FreeBSD 8.x)

--
-- Karl Denninger
The Market Ticker ®
Cuda Systems LLC

Re: Obtaining resource usage statistics from execution? (v 9.1)

From
Ants Aasma
Date:
On Fri, Mar 16, 2012 at 4:31 PM, Karl Denninger <karl@denninger.net> wrote:
> What I'd
> like to be able to do is have the code track performance all the time and
> raise alerts when it sees "outliers" giving me a continually-improving set
> of targets for reduction of resource consumption (up until I reach the point
> where I don't seem to be able to make it any faster of course :-))

Sounds almost exactly like what the auto_explain contrib module is
designed to do:
http://www.postgresql.org/docs/9.1/static/auto-explain.html
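If you only want it for your application's sessions, you can also load it per-connection instead of server-wide (a sketch; requires superuser, and the threshold value is just an example):

  #include <libpq-fe.h>

  /* Enable auto_explain for this session only; errors ignored for brevity. */
  static void
  enable_auto_explain(PGconn *conn)
  {
      PQclear(PQexec(conn, "LOAD 'auto_explain'"));
      PQclear(PQexec(conn, "SET auto_explain.log_min_duration = 20"));  /* ms */
      PQclear(PQexec(conn, "SET auto_explain.log_analyze = true"));
  }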

It can run with reasonably low overhead if your system has fast
timing. You can check the timing performance of your system with the
tool attached here:
http://archives.postgresql.org/message-id/4F15B930.50108%402ndQuadrant.com

Anything under 200ns should be ok.

Cheers,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de