Thread: EXPLAIN (plan off, rewrite off) for benchmarking
Hi, For benchmarking the parser I added the above options (dim suggested this on irc) which proved to be rather useful for me. I added the additional rewrite option because the overhead of copying the tree around makes the profile significantly less expressive. I would also like an option which would only do the actual parsing instead of parse + parse analyse but that seemed a tad more complicated... Is anybody else interested in such EXPLAIN options? Andres
Andres Freund <andres@anarazel.de> writes: > Hi, > For benchmarking the parser I added the above options (dim suggested this on > irc) which proved to be rather useful for me. What exactly is EXPLAIN printing, if you've not done planning? Also, I believe the planner depends on the assumption that the rewriter has done its work, so these seem to amount to EXPLAIN (break_it). If you just want to benchmark parsing, perhaps CREATE RULE would be a useful environment. regards, tom lane
On Saturday, November 19, 2011 12:16:18 AM Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Hi, > > For benchmarking the parser I added the above options (dim suggested this > > on irc) which proved to be rather useful for me. > > What exactly is EXPLAIN printing, if you've not done planning? Nothing very interesting: postgres=# EXPLAIN (rewrite off) SELECT 1, 'test', "pg_class"."oid",relname, relkind FROM pg_class WHERE oid = 1000; QUERY PLAN ------------------------------------------not rewriting query because auf !rewrite (1 row) Explain is just a vehicle here, I admit that. But on what else should I bolt it? The only thing I could think of otherwise would be to do the parsing via from a C func. But to simmulate a real scenario there would require too much bootstrapping for my taste. > Also, I > believe the planner depends on the assumption that the rewriter has done > its work, so these seem to amount to EXPLAIN (break_it). "rewrite off" currently simply aborts before doing the rewriting and copyObject(). copyObject is the expensive part there for simple queries. rewriting happened to be the functional part where I wanted to stop - because of the overhead of copyObject - instead of a goal in itself. Btw, optimizing copyObject() memory usage wise would be another big performance gain. But even murkier than the stuff over in the lists thread... > If you just want to benchmark parsing, perhaps CREATE RULE would be a > useful environment. I don't really see how one can use that to benchmark parsing. CREATE OR REPLACE VIEW is - not unexpectedly - far slower than EXPLAIN (rewrite off) or EXPLAIN (plan off). for the statement: SELECT * FROM pg_class WHERE oid = 1000; rewrite off: tps = 16086.694353 plan off, no copyObject() tps = 15663.280093 plan off: tps = 13471.272551 explain: tps = 6162.161946 explain analyze: tps = 5744.172839 normal execution: tps = 6991.398740 CREATE OR REPLACE VIEW (after changing the log level): tps = 2550.246625 Greetings, Andres
Andres Freund <andres@anarazel.de> writes: > Explain is just a vehicle here, I admit that. But on what else should I bolt > it? If you don't like CREATE RULE, try having your test program send just Parse messages, and not Bind/Execute. I still dislike the idea of exposing a fundamentally-broken-and-useless variant of EXPLAIN in order to have a test harness for a variant of performance testing that hardly anyone cares about. (There is no real-world case where the performance of the parser matters in isolation.) If we do that, it will be a feature that we have to support forever, and possibly fix bugs in --- what if the system crashes because the rewriter wasn't invoked, for example? regards, tom lane
On Saturday, November 19, 2011 04:52:10 PM Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Explain is just a vehicle here, I admit that. But on what else should I > > bolt it? > > If you don't like CREATE RULE, try having your test program send just > Parse messages, and not Bind/Execute. That sounds like a plan. Except that I would prefer to use pgbench. To avoid the planning overhead... I see it correctly that I would need to I tpgbench is a more appropriate place to add such an option... > If we do that, it will be a > feature that we have to support forever, and possibly fix bugs in --- > what if the system crashes because the rewriter wasn't invoked, for > example? rewrite=off aborts before planning, so that specific problem I don't see. The name is rather bad I admit. Its mostly there to avoid the copyObject() which skews results considerably: * Because the rewriter and planner tend to scribble on the input, we make * a preliminary copy of the source querytree. This prevents problems in * the case that the EXPLAIN is in a portalor plpgsql function and is * executed repeatedly. (See also the same hack in DECLARE CURSOR and * PREPARE.) XXX FIXMEsomeday. */ rewritten = QueryRewrite((Query *) copyObject(stmt->query)); > I still dislike the idea of > exposing a fundamentally-broken-and-useless variant of EXPLAIN in order > to have a test harness for a variant of performance testing that hardly > anyone cares about. (There is no real-world case where the performance > of the parser matters in isolation.) I absolutely cannot agree on the fact that the speed parse-analyze is irrelevant though. In several scenarios using prepared statements is not a viable/simple option. Think transaction-level pooling for example. Or the queries generated by all those ORMs out there. When executing many small statments without prepared statments a rather large portion of time is spent parsing. Consider: statement: EXPLAIN SELECT * FROM pg_class WHERE oid = 1000; pgbench -M simple -f ~/.tmp/simple1.sql -T 3 tps = 16067.780407 pgbench -M prepared -f ~/.tmp/simple1.sql -T 3 tps = 24348.244630 In both variants the queries are fully planned as far as I can see. The only difference there is parsing. I do find the difference in speed rather notable. That does represent measurements from realworld profiles I gathered were functions related to parsing + parse analysis contribute up to 1/3 of the runtime. Obviously nobody needs to benchmark the parser alone in a production scenario. But if you want to improve the overall performance of some workload analysing bits and pieces alone to get a useful, more detailed profile is a pretty sane approach. Why should that be a variant of performance testing that nobody cares about? Andres
Andres Freund <andres@anarazel.de> writes: > On Saturday, November 19, 2011 04:52:10 PM Tom Lane wrote: >> If you don't like CREATE RULE, try having your test program send just >> Parse messages, and not Bind/Execute. > That sounds like a plan. Except that I would prefer to use pgbench. Well, how about plan C: write a small C function that consists of a loop around calling just the part of the parser you want to test? I've done that in the past when I wanted fine-grained profiles, and it works a whole lot better than anything involving per-query messages from the client side. regards, tom lane
On Sat, Nov 19, 2011 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote: > I absolutely cannot agree on the fact that the speed parse-analyze is > irrelevant though. Tom may be right that the speed of the parser *in isolation* is irrelevant, in the narrow sense that if we made the parser twice as slow but somehow by that change made up the time in the executor, nobody would care; in fact, it would be a net win for people using prepared statements. But I completely agree that parsing speed is something we need to worry about. Unfortunately, I don't have a lot of good ideas for improving it. A while back I tried ripping out most of the parser to see whether that would speed up performance parsing very simple statements, but the improvement was pretty small. Maybe a more thorough job than what I did is possible, but it didn't seem promising. Maybe we could find a way to reduce the size of the parse tree (i.e. fewer nodes), or the number of times that it has to be walked/copied. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > ... Maybe we could find a way to reduce the size of the parse > tree (i.e. fewer nodes), or the number of times that it has to be > walked/copied. We could eliminate some annoying tree-copy steps if we could institute the policy that parse analysis doesn't scribble on the raw parse tree, rewriter doesn't modify parse analysis output, and planner doesn't modify rewriter output. However, it would be a lot of work, and I'm not entirely sure that we'd end up with a significant speed benefit. In a lot of places, the only way to not scribble on the input is to copy it anyway ... regards, tom lane
On Mon, Nov 21, 2011 at 8:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... Maybe we could find a way to reduce the size of the parse >> tree (i.e. fewer nodes), or the number of times that it has to be >> walked/copied. > > We could eliminate some annoying tree-copy steps if we could institute > the policy that parse analysis doesn't scribble on the raw parse tree, > rewriter doesn't modify parse analysis output, and planner doesn't > modify rewriter output. However, it would be a lot of work, and I'm not > entirely sure that we'd end up with a significant speed benefit. In a > lot of places, the only way to not scribble on the input is to copy it > anyway ... This is probably a stupid question, but why does it matter if parse analysis scribbles on the raw parse tree, or the rewriter on the parse analysis output? I understand that we may sometimes need to replan the output of the rewriter, so we'd better not modify it destructively, but I would have thought that parse and parse analysis would always be done together, in which case it doesn't obviously matter. I'm probably missing something here... Another thing we might want to consider doing is introducing some kind of container data structure other than List. I think that trying to change the semantics of the existing List datatype in any meaningful way is probably doomed, but if we introduce a new abstraction and gradually convert things over, we have a lot more flexibility. What I'm imagining is something that is optimized for holding a small number of pointers (say, 4) that are normally added in order, but which can grow if needed, at some cost in performance. For example: struct Thingy { unsigned short nused; unsigned short nalloc; struct Thingy *next; /* if we run out of elements in this Thingy, we can link to another Thingy with more space */ void *item[FLEXIBLE_ARRAY_MEMBER]; /* really nalloc */ }; This would mean fewer palloc() calls and less overall memory usage than a List, and might also improve memory access locality. Right now, walking a three element list could involve pulling in one cache line for the list, three more for the list cells, and then (presumably) three more for the elements themselves. There's not much help for the fact that the elements might end up in different cache lines, but at least if we did something like this you wouldn't need to bounce around to find the list cells. This is particularly obnoxious for things like RangeVars, where we build up the list representation mostly as an intermediate step and then flatten it again. Now, one obvious problem with this line of attack is that each individual patch in this area isn't likely to save anything noticeable on real workloads. But I don't think that should stop us from trying.With every performance patch that goes in, the numberof workloads that are dominated by the cost of backend-local memory allocation is growing. The cost is extremely distributed and it's very hard to pin down where it's all coming from, but we've gotta start somewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Nov 21, 2011 at 8:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We could eliminate some annoying tree-copy steps if we could institute >> the policy that parse analysis doesn't scribble on the raw parse tree, >> rewriter doesn't modify parse analysis output, and planner doesn't >> modify rewriter output. > This is probably a stupid question, but why does it matter if parse > analysis scribbles on the raw parse tree, or the rewriter on the parse > analysis output? Because we frequently need to save the original tree for possible re-analysis later. This doesn't matter in the simple-query protocol, but it does matter in any code path that involves plancache. > I understand that we may sometimes need to replan > the output of the rewriter, so we'd better not modify it > destructively, but I would have thought that parse and parse analysis > would always be done together, in which case it doesn't obviously > matter. No, actually it's the raw grammar output tree that gets saved for re-analysis in case we are told of a DDL change. (I've considered having the code save only the original query string, but then you'd have to repeat the flex/bison work, and those things show up high enough on any profile to make it seem unlikely that this is cheaper than copying the parse tree.) It's possible that we don't need a read-only guarantee for the rewriter versus parse analysis output, but I doubt that helps much. regards, tom lane