Thread: Review of: pg_stat_statements with query tree normalization
I've *finally* gotten around to reviewing this patch. My first step was to de-bitrot it very slightly. More on that in a moment. After that, I tried using it. Installation worked nicely -- I did CREATE EXTENSION and then tried reading from pg_stat_statements. I was then given an error message to add pg_stat_statements into shared_preload_libraries, and so I did. Given its use of shared memory, there is not much other choice, but it's neat that I got it running in just a few minutes without reading any documentation. I ran a non-trivial but not-enormous application against the pg_stat_statements-enabled database, and the results were immediate and helpful, being correctly normalized and statistics apparently accrued. 'make check' also worked fine. I ran with assertions and debugging enabled. This query I suspect will quickly become a new crowd-pleaser: SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20; The output looks like this (fixed width looks best): userid | 16385 dbid | 16386 query | SELECT * FROM lock_head($1, $2); calls | 12843 total_time | 19.795586 rows | 8825 shared_blks_hit | 194583 shared_blks_read | 19 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 A couple of nit-picks: a query execution is probably not often referred to as a "call", and total_time doesn't signify its units in the name, but I eyeballed it to be seconds. I cannot see superuser query text when I'm in a non-superuser role, however, I can see the execution statistics: userid | 10 dbid | 16386 query | <insufficient privilege> calls | 1448 total_time | 0.696188 rows | 205616 shared_blks_hit | 13018 shared_blks_read | 14 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 Prepared statements are less informative, unless one knows their naming convention: query | execute foo; I don't know what JDBC users or ActiveRecord 3 users are going to think about that. However, just about everyone else will be happy. The patch very nicely handles its intended task: folding together queries with the same literal. This is the key feature that makes the output really useful, and is quite unique: the next best thing I know of is pgfouine, which uses some heuristics and needs to be run over specially formatted logs, usually offline. This is much better, much faster, more interactive, and much more sound when it comes to normalizing queries. Onto the mechanism: the patch is both a contrib and changes to Postgres. The changes to postgres are mechanical in nature, but voluminous. The key change is to not only remember the position of Const nodes in the query tree, but also their length, and this change is really extensive although repetitive. It was this swathe of change that bitrotted the source, when new references to some flex constructs were added. Every such reference has needs to explicitly refer to '.begins', which is the beginning position of the const -- this used to be the only information tracked. Because .length was never required by postgres in the past, it fastidiously bookkeeps that information without ever referring to it internally: only pg_stat_statements does. There is a performance test program (written in Python) conveniently included in the contrib to allay fears that such const-length bookkeeping may be too expensive. I have not run it yet myself. There is also a regression test -- also a python program -- presumably because it would be more difficult to test this the "normal" way via pg_regress, at least to get a full integration test. I'm a little surprised that pg_stat_statements.c uses direct recursion instead of the mutually recursive walker interface, which makes it quite a bit different than most passes in the optimizer I've seen. I'm still looking at the mechanism in more detail, but I am having a fairly easy time following it -- there's just a lot of it to cover the litany of Nodes. I'll submit my editorializations of whitespace and anti-bitrot to Peter Geoghegan via git (actually, that information is below, for any curious onlookers). If anyone wants to play with the this anti-bitrotted copy against a very recent master, please see: $ git fetch https://github.com/fdr/postgres.git pg_stat_statements_norm:pg_stat_statements_norm $ git checkout pg_stat_statements_norm All in all, it looks and works well, inside and out. -- fdr
On 15 January 2012 11:41, Daniel Farina <daniel@heroku.com> wrote: > I've *finally* gotten around to reviewing this patch. > > My first step was to de-bitrot it very slightly. More on that in a moment. Thanks. > Prepared statements are less informative, unless one knows their > naming convention: > > query | execute foo; > > I don't know what JDBC users or ActiveRecord 3 users are going to > think about that. However, just about everyone else will be happy. I should point out, though I think you might be aware of this already, that the patch actually behaves in the same way as the existing implementation here. It will only normalise prepared queries that are prepared with PQprepare() or its underlying wire-protocol facility. If you run the example in the pg_stat_statements docs, where pgbench is passed "-M prepared", that will work just as well as before. Perhaps it's something that the patch should be able to do. That said, it seems pretty likely that client libraries won't be dynamically generating SQL Prepare/Execute statements under the hood. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Excerpts from Daniel Farina's message of dom ene 15 08:41:55 -0300 2012: > Onto the mechanism: the patch is both a contrib and changes to > Postgres. The changes to postgres are mechanical in nature, but > voluminous. The key change is to not only remember the position of > Const nodes in the query tree, but also their length, and this change > is really extensive although repetitive. It was this swathe of change > that bitrotted the source, when new references to some flex constructs > were added. Every such reference has needs to explicitly refer to > '.begins', which is the beginning position of the const -- this used > to be the only information tracked. Because .length was never > required by postgres in the past, it fastidiously bookkeeps that > information without ever referring to it internally: only > pg_stat_statements does. I wonder if it would make sense to split out those changes from the patch, including a one-member struct definition to the lexer source, which could presumably be applied in advance of the rest of the patch. That way, if other parts of the main patch are contentious, the tree doesn't drift under you. (Or rather, it still drifts, but you no longer care because your bits are already in.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 01/16/2012 06:19 PM, Alvaro Herrera wrote: > I wonder if it would make sense to split out those changes from the > patch, including a one-member struct definition to the lexer source, > which could presumably be applied in advance of the rest of the patch. > That way, if other parts of the main patch are contentious, the tree > doesn't drift under you. (Or rather, it still drifts, but you no longer > care because your bits are already in.) The way this was packaged up was for easier reviewer consumption, just pull down the whole thing and run with it. I was already thinking that if we've cleared the basics with a positive review and are moving more toward commit, it would be better to have it split into three pieces: -Core parsing changes -pg_stat_statements changes -Test programs And then work through those in that order. Whether or not the test programs even go into core as contrib code is a useful open question. While Peter had a version of this that worked completely within the boundaries of an extension, no one was really happy with that. At a minimum the .length changes really need to land in 9.2 to enable this feature to work well. As Daniel noted, it's a lot of code changes, but not a lot of code complexity. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Daniel Farina's message of dom ene 15 08:41:55 -0300 2012: >> Onto the mechanism: the patch is both a contrib and changes to >> Postgres. The changes to postgres are mechanical in nature, but >> voluminous. The key change is to not only remember the position of >> Const nodes in the query tree, but also their length, and this change >> is really extensive although repetitive. > I wonder if it would make sense to split out those changes from the > patch, including a one-member struct definition to the lexer source, > which could presumably be applied in advance of the rest of the patch. > That way, if other parts of the main patch are contentious, the tree > doesn't drift under you. (Or rather, it still drifts, but you no longer > care because your bits are already in.) Well, short of seeing an acceptable patch for the larger thing, I don't want to accept a patch to add that field to Const, because I think it's a kluge. I'm still feeling that there must be a better way ... regards, tom lane
On 16 January 2012 23:43, Greg Smith <greg@2ndquadrant.com> wrote: > While Peter had a version of this that worked completely within the > boundaries of an extension, no one was really happy with that. At a minimum > the .length changes really need to land in 9.2 to enable this feature to > work well. As Daniel noted, it's a lot of code changes, but not a lot of > code complexity. Right. As I've said in earlier threads, we're mostly just making the YYLTYPE representation closer to that of the default, which has the following fields: first_line, first_column, last_line and last_column. I just want two fields. I think we've already paid most of the price that this imposes, by using the @n feature in the first place. Certainly, I couldn't isolate any additional overhead. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 16 January 2012 23:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, short of seeing an acceptable patch for the larger thing, I don't > want to accept a patch to add that field to Const, because I think it's > a kluge. I'm still feeling that there must be a better way ... What does an acceptable patch look like? Does your objection largely hinge on the fact that the serialisation is performed after the re-writing stage rather on the raw parse tree, or is it something else? Despite my full plate this commitfest, I am determined that this feature be available in 9.2, as I believe that it is very important. Instrumentation of queries is something that it just isn't possible to do well right now, with each of the available third party solutions or pg_stat_statements. That really needs to change. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jan 16, 2012 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, short of seeing an acceptable patch for the larger thing, I don't > want to accept a patch to add that field to Const, because I think it's > a kluge. I'm still feeling that there must be a better way ... Hm. Maybe it is tractable to to find the position of the lexme immediately after the Const? Outside of the possible loss of whitespace (is that a big deal? I'm not sure) that could do the trick...after all, the entire lexeme stream is annotated with the beginning position, if memory serves, and that can be related in some way to the end position of the previous lexeme, sort-of. -- fdr