Thread: Review of: pg_stat_statements with query tree normalization

Review of: pg_stat_statements with query tree normalization

From
Daniel Farina
Date:
I've *finally* gotten around to reviewing this patch.

My first step was to de-bitrot it very slightly.  More on that in a moment.

After that, I tried using it.  Installation worked nicely -- I did
CREATE EXTENSION and then tried reading from pg_stat_statements.  I
was then given an error message to add pg_stat_statements into
shared_preload_libraries, and so I did.  Given its use of shared
memory, there is not much other choice, but it's neat that I got it
running in just a few minutes without reading any documentation.  I
ran a non-trivial but not-enormous application against the
pg_stat_statements-enabled database, and the results were immediate
and helpful, being correctly normalized and statistics apparently
accrued.

'make check' also worked fine.  I ran with assertions and debugging enabled.

This query I suspect will quickly become a new crowd-pleaser:
 SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;

The output looks like this (fixed width looks best):

userid              | 16385
dbid                | 16386
query               | SELECT * FROM lock_head($1, $2);
calls               | 12843
total_time          | 19.795586
rows                | 8825
shared_blks_hit     | 194583
shared_blks_read    | 19
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0

A couple of nit-picks: a query execution is probably not often
referred to as a "call", and total_time doesn't signify its units in
the name, but I eyeballed it to be seconds.

I cannot see superuser query text when I'm in a non-superuser role,
however, I can see the execution statistics:

userid              | 10
dbid                | 16386
query               | <insufficient privilege>
calls               | 1448
total_time          | 0.696188
rows                | 205616
shared_blks_hit     | 13018
shared_blks_read    | 14
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0

Prepared statements are less informative, unless one knows their
naming convention:

query               | execute foo;

I don't know what JDBC users or ActiveRecord 3 users are going to
think about that.  However, just about everyone else will be happy.

The patch very nicely handles its intended task: folding together
queries with the same literal.  This is the key feature that makes the
output really useful, and is quite unique: the next best thing I know
of is pgfouine, which uses some heuristics and needs to be run over
specially formatted logs, usually offline.  This is much better, much
faster, more interactive, and much more sound when it comes to
normalizing queries.

Onto the mechanism: the patch is both a contrib and changes to
Postgres.  The changes to postgres are mechanical in nature, but
voluminous.  The key change is to not only remember the position of
Const nodes in the query tree, but also their length, and this change
is really extensive although repetitive.  It was this swathe of change
that bitrotted the source, when new references to some flex constructs
were added.  Every such reference has needs to explicitly refer to
'.begins', which is the beginning position of the const -- this used
to be the only information tracked.  Because .length was never
required by postgres in the past, it fastidiously bookkeeps that
information without ever referring to it internally: only
pg_stat_statements does.

There is a performance test program (written in Python) conveniently
included in the contrib to allay fears that such const-length
bookkeeping may be too expensive.  I have not run it yet myself.

There is also a regression test -- also a python program -- presumably
because it would be more difficult to test this the "normal" way via
pg_regress, at least to get a full integration test.

I'm a little surprised that pg_stat_statements.c uses direct recursion
instead of the mutually recursive walker interface, which makes it
quite a bit different than most passes in the optimizer I've seen.

I'm still looking at the mechanism in more detail, but I am having a
fairly easy time following it -- there's just a lot of it to cover the
litany of Nodes.  I'll submit my editorializations of whitespace and
anti-bitrot to Peter Geoghegan via git (actually, that information is
below, for any curious onlookers).  If anyone wants to play with the
this anti-bitrotted copy against a very recent master, please see:

$ git fetch https://github.com/fdr/postgres.git
pg_stat_statements_norm:pg_stat_statements_norm
$ git checkout pg_stat_statements_norm

All in all, it looks and works well, inside and out.

-- 
fdr


Re: Review of: pg_stat_statements with query tree normalization

From
Peter Geoghegan
Date:
On 15 January 2012 11:41, Daniel Farina <daniel@heroku.com> wrote:
> I've *finally* gotten around to reviewing this patch.
>
> My first step was to de-bitrot it very slightly.  More on that in a moment.

Thanks.

> Prepared statements are less informative, unless one knows their
> naming convention:
>
> query               | execute foo;
>
> I don't know what JDBC users or ActiveRecord 3 users are going to
> think about that.  However, just about everyone else will be happy.

I should point out, though I think you might be aware of this already,
that the patch actually behaves in the same way as the existing
implementation here. It will only normalise prepared queries that are
prepared with PQprepare() or its underlying wire-protocol facility. If
you run the example in the pg_stat_statements docs, where pgbench is
passed "-M prepared", that will work just as well as before. Perhaps
it's something that the patch should be able to do. That said, it
seems pretty likely that client libraries won't be dynamically
generating SQL Prepare/Execute statements under the hood.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Review of: pg_stat_statements with query tree normalization

From
Alvaro Herrera
Date:
Excerpts from Daniel Farina's message of dom ene 15 08:41:55 -0300 2012:

> Onto the mechanism: the patch is both a contrib and changes to
> Postgres.  The changes to postgres are mechanical in nature, but
> voluminous.  The key change is to not only remember the position of
> Const nodes in the query tree, but also their length, and this change
> is really extensive although repetitive.  It was this swathe of change
> that bitrotted the source, when new references to some flex constructs
> were added.  Every such reference has needs to explicitly refer to
> '.begins', which is the beginning position of the const -- this used
> to be the only information tracked.  Because .length was never
> required by postgres in the past, it fastidiously bookkeeps that
> information without ever referring to it internally: only
> pg_stat_statements does.

I wonder if it would make sense to split out those changes from the
patch, including a one-member struct definition to the lexer source,
which could presumably be applied in advance of the rest of the patch.
That way, if other parts of the main patch are contentious, the tree
doesn't drift under you.  (Or rather, it still drifts, but you no longer
care because your bits are already in.)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Review of: pg_stat_statements with query tree normalization

From
Greg Smith
Date:
On 01/16/2012 06:19 PM, Alvaro Herrera wrote:
> I wonder if it would make sense to split out those changes from the
> patch, including a one-member struct definition to the lexer source,
> which could presumably be applied in advance of the rest of the patch.
> That way, if other parts of the main patch are contentious, the tree
> doesn't drift under you.  (Or rather, it still drifts, but you no longer
> care because your bits are already in.)

The way this was packaged up was for easier reviewer consumption, just 
pull down the whole thing and run with it.  I was already thinking that 
if we've cleared the basics with a positive review and are moving more 
toward commit, it would be better to have it split into three pieces:

-Core parsing changes
-pg_stat_statements changes
-Test programs

And then work through those in that order.  Whether or not the test 
programs even go into core as contrib code is a useful open question.

While Peter had a version of this that worked completely within the 
boundaries of an extension, no one was really happy with that.  At a 
minimum the .length changes really need to land in 9.2 to enable this 
feature to work well.  As Daniel noted, it's a lot of code changes, but 
not a lot of code complexity.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: Review of: pg_stat_statements with query tree normalization

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Daniel Farina's message of dom ene 15 08:41:55 -0300 2012:
>> Onto the mechanism: the patch is both a contrib and changes to
>> Postgres.  The changes to postgres are mechanical in nature, but
>> voluminous.  The key change is to not only remember the position of
>> Const nodes in the query tree, but also their length, and this change
>> is really extensive although repetitive.

> I wonder if it would make sense to split out those changes from the
> patch, including a one-member struct definition to the lexer source,
> which could presumably be applied in advance of the rest of the patch.
> That way, if other parts of the main patch are contentious, the tree
> doesn't drift under you.  (Or rather, it still drifts, but you no longer
> care because your bits are already in.)

Well, short of seeing an acceptable patch for the larger thing, I don't
want to accept a patch to add that field to Const, because I think it's
a kluge.  I'm still feeling that there must be a better way ...
        regards, tom lane


Re: Review of: pg_stat_statements with query tree normalization

From
Peter Geoghegan
Date:
On 16 January 2012 23:43, Greg Smith <greg@2ndquadrant.com> wrote:
> While Peter had a version of this that worked completely within the
> boundaries of an extension, no one was really happy with that.  At a minimum
> the .length changes really need to land in 9.2 to enable this feature to
> work well.  As Daniel noted, it's a lot of code changes, but not a lot of
> code complexity.

Right. As I've said in earlier threads, we're mostly just making the
YYLTYPE representation closer to that of the default, which has the
following fields:  first_line, first_column, last_line and
last_column. I just want two fields. I think we've already paid most
of the price that this imposes, by using the @n feature in the first
place. Certainly, I couldn't isolate any additional overhead.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Review of: pg_stat_statements with query tree normalization

From
Peter Geoghegan
Date:
On 16 January 2012 23:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, short of seeing an acceptable patch for the larger thing, I don't
> want to accept a patch to add that field to Const, because I think it's
> a kluge.  I'm still feeling that there must be a better way ...

What does an acceptable patch look like? Does your objection largely
hinge on the fact that the serialisation is performed after the
re-writing stage rather on the raw parse tree, or is it something
else?

Despite my full plate this commitfest, I am determined that this
feature be available in 9.2, as I believe that it is very important.
Instrumentation of queries is something that it just isn't possible to
do well right now, with each of the available third party solutions or
pg_stat_statements. That really needs to change.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Review of: pg_stat_statements with query tree normalization

From
Daniel Farina
Date:
On Mon, Jan 16, 2012 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, short of seeing an acceptable patch for the larger thing, I don't
> want to accept a patch to add that field to Const, because I think it's
> a kluge.  I'm still feeling that there must be a better way ...

Hm. Maybe it is tractable to to find the position of the lexme
immediately after the Const?  Outside of the possible loss of
whitespace (is that a big deal? I'm not sure) that could do the
trick...after all, the entire lexeme stream is annotated with the
beginning position, if memory serves, and that can be related in some
way to the end position of the previous lexeme, sort-of.

--
fdr