Re: Obtaining resource usage statistics from execution? (v 9.1) - Mailing list pgsql-performance

From Karl Denninger
Subject Re: Obtaining resource usage statistics from execution? (v 9.1)
Msg-id 4F635E78.8080001@denninger.net
In response to Re: Obtaining resource usage statistics from execution? (v 9.1)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 3/16/2012 9:53 AM, Tom Lane wrote:
> Karl Denninger <karl@denninger.net> writes:
>> I am trying to continue profiling which in turn feeds query and index
>> tuning changes for the AKCS-WWW forum software, and appear to have no
>> good way to do what I need to do -- or I've missed something obvious.
> Do you really want the application involved in this?  Usually, when
> people try to profile a workload, they want everything to just get
> logged on the server side.  There are some contrib modules to help
> with that approach --- see auto_explain and pg_stat_statements.
>
> 		regards, tom lane
Well, I do in some cases, yes.  Specifically, I'd like to flag instances "on the fly" when the total execution time of the calls exceeds some amount.  I can (of course) track elapsed time in the application, but what I'd really like is to track the time a particular template spends in database calls, at the millisecond level.

pg_stat_statements is generalized across the server, which is useful for profiling on an aggregate basis but not so much for individual uses.  auto_explain doesn't tell me what upstream call generated the log entry; I have to go back and figure that out by hand, and the output lands in the server logs, which isn't really where I want it.
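(For completeness, the server-side route Tom is pointing at is, as I understand it, roughly the following -- values are illustrative, not a recommendation, and I believe 9.1 still wants custom_variable_classes before the module settings can go into postgresql.conf:

    # postgresql.conf -- both modules must be preloaded at server start
    shared_preload_libraries = 'pg_stat_statements,auto_explain'
    custom_variable_classes  = 'pg_stat_statements,auto_explain'
    pg_stat_statements.track = all
    auto_explain.log_min_duration = '250ms'   # log plans for anything slower
    auto_explain.log_analyze = off            # 'on' adds per-node timing overhead

followed by CREATE EXTENSION pg_stat_statements; in the database, after which the aggregate numbers show up in the pg_stat_statements view.)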

Part of the reason for doing this in the application is that the queries in question are all generated by a template file: basically HTML with carefully-crafted comment-like directives that the parser acts on; for instance:

<!--&number-->  (push previously-obtained table value for "number" in the current row into the selector)
<!--*post--> (look up the post using the selector that was just pushed)
<!--@message-->  (replace the comment text with the field "message" from the returned lookup)
<!--^--> (loop back / recursion level closure)
<!--$--> (end of loop / recursion level)

That's a simple example; in the real template, for each of those messages there are other things pushed and other tables referenced (the user's statistics, their avatar, and so on), and recursion levels can be tagged in the closing statements.  What I'd like to do, in the code that executes that lookup (when "post" is parsed), is grab the time required to execute that SQL statement and stash it, possibly into another table, along with the tag ("post") so I know which query class generated it, and then at the end of the template pick up the aggregate for the entire template's database processing.  If I can pull the time for each query the template generates, adding it all up for the template as a whole is trivial.
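To make the shape of that concrete, something like the following is what I have in mind (Python/psycopg2 purely for illustration; the query_times table, its columns, and the connection string are hypothetical names of mine):

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=forum")   # hypothetical DSN
    cur = conn.cursor()

    def timed_query(cur, template, tag, sql, params=None, threshold_ms=500.0):
        """Run one template-generated lookup, record its wall-clock time keyed
        by the template name and the tag that produced it (e.g. 'post'), and
        flag calls that exceed a threshold.  'query_times' is a hypothetical
        stash table: (template text, tag text, elapsed_ms float8, slow bool)."""
        start = time.perf_counter()
        cur.execute(sql, params)
        rows = cur.fetchall()
        elapsed_ms = (time.perf_counter() - start) * 1000.0

        # Stash the timing with its tag so it can be rolled up per query
        # class and per template later.
        cur.execute(
            "INSERT INTO query_times (template, tag, elapsed_ms, slow)"
            " VALUES (%s, %s, %s, %s)",
            (template, tag, elapsed_ms, elapsed_ms > threshold_ms),
        )
        return rows, elapsed_ms

    # e.g. when the parser reaches <!--*post--> in a template:
    # rows, ms = timed_query(cur, 'forum.tpl', 'post',
    #                        "SELECT ... FROM post WHERE ...", (number,))

In practice I'd probably accumulate the timings in memory and write them once per template rather than paying an extra INSERT per call, but the idea is the same.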

This would allow me to profile total execution time for each template in the system, at both the template level and the individual call level within the template.  If I then stash that data somewhere (e.g. into a different table), I can look for the worst-performing queries and templates and focus my attention on the ones that run the most, so the "bang for the buck" of my optimization time improves.
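Finding the hot spots afterwards would then just be a rollup over that (again hypothetical) stash table, along these lines:

    def worst_offenders(cur, limit=10):
        """Roll up the stashed timings to find the templates and query tags
        that cost the most in aggregate (same hypothetical query_times table)."""
        cur.execute(
            """SELECT template, tag,
                      count(*)        AS calls,
                      sum(elapsed_ms) AS total_ms,
                      avg(elapsed_ms) AS avg_ms
                 FROM query_times
             GROUP BY template, tag
             ORDER BY total_ms DESC
                LIMIT %s""",
            (limit,),
        )
        return cur.fetchall()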

Most of these queries are quite standard, but not all.  One of the deviant examples is the "search" function exposed to the user, which crafts a rather complex query from whatever the user fills in on the submitted form, targeting both simple field lookups and GiST full-text indices.
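A rough illustration of the kind of composition the search does, again with hypothetical column names and assuming the GiST index sits on a tsvector column:

    def build_search_query(form):
        """Compose the search SQL from whatever the user filled in.
        'author', 'subject' and 'body_tsv' are hypothetical columns;
        body_tsv is assumed to be the tsvector behind the GiST index."""
        clauses, params = [], []
        if form.get('author'):
            clauses.append("author = %s")
            params.append(form['author'])
        if form.get('subject'):
            clauses.append("subject ILIKE %s")
            params.append('%' + form['subject'] + '%')
        if form.get('words'):
            # the full-text piece -- this is what hits the GiST index
            clauses.append("body_tsv @@ plainto_tsquery(%s)")
            params.append(form['words'])
        where = " AND ".join(clauses) if clauses else "TRUE"
        return "SELECT * FROM post WHERE " + where, params

    # sql, params = build_search_query({'words': 'compound index'})
    # rows, ms = timed_query(cur, 'search.tpl', 'search', sql, params)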

Doing this the hard way has already returned huge dividends: I found a very frequently executed query that was previously running in the ~2,000 ms range and responded extraordinarily well to a compound index added on one of the target tables.  That dropped its execution time to double-digit milliseconds (!!) in most cases, and the user-experience improvement from that change was massive.  I don't expect to find many more "wins" of that caliber, but I'll take as many as I can get :-)

The argument for this in the application comes down to the granularity and specificity that I gain from knowing exactly what function call in the template generated the timing being reported.

(If it matters, Postgres 9.1.3 on FreeBSD 8.x)

--
-- Karl Denninger
The Market Ticker ®
Cuda Systems LLC
