Optimizing prepared statements - Mailing list pgsql-hackers
From | Jeroen T. Vermeulen |
---|---|
Subject | Optimizing prepared statements |
Date | |
Msg-id | 21267.125.24.241.41.1157277849.squirrel@webmail.xs4all.nl Whole thread Raw |
Responses |
Re: Optimizing prepared statements
(Gregory Stark <stark@enterprisedb.com>)
|
List | pgsql-hackers |
I've rigged up a simple simulator for the scheme I described for detecting pseudo-constant parameters to prepared statements. It withstands simple tests, and it neatly picks up cases where some parameters are pseudo-constants and others aren't--even if some of them are more "pseudo" while others are more "constant." What I need now is some realistic test data. Does anyone have realistic SQL logs that use prepared statements? For now, I'll summarize some results I got from randomized input data. I used very simple traces, with 11 prepared statements, each taking a different number of parameters (0 through 10, inclusive). All calls were uniformly randomized. I used LRU replacement of cached plans, with up to 4 retained plans per statement. Confidence counters ran from 0 to 3 inclusive, with the confidence threshold lying between 1 and 2. Per simulation run, 20,000 statement invocations were processed. Runs of 20,000 took about 3.5 seconds of wall-clock time each, or 0.175 milliseconds per statement, on a lightly-loaded 1.8 GHz 32-bit Athlon XP. That's for simulation in Python 2.4, with no effort to optimize and no precompilation, and several lines of information composed and printed to /dev/null for every invocation. So I think the overhead of the matching and comparing that the algorithm does is not a performance worry. In my first test, parameters were uniformly-distributed integers in the range [0, 999]. Over this test, 104 plans were generated for the 11 plans, for an average 192 calls per generated plan. Only 133 calls out of 20,000 used optimized plans, in this case optimizing out only one pseudo-constant each. When parameters were made to follow the normal distribution with mean 500 and standard deviation 100 (rounded to integers), the number of generated plans went up to 305 as more patterns were recognized, and of course the number of calls per generated plan dropped to 65. Of the 20,000 invocations here, 770 used plans with one parameter optimized away, and 2 used plans with two. These results don't look very good, but bear in mind this is for randomized data. Can't expect to exploit many patterns in random inputs! Real-life use is probably going to be much more favourable. If we want to guard against fluke patterns in highly-variable parameters, we can always increase the range of the confidence counters. That would make the predictor more skeptical when it comes to accepting reptitions as patterns. Just how we tune the counters would be a tradeoff between the cost of additional planning and the benefits of optimizing out more parameters. So once again, does anyone know of any realistic logs that I can use for more useful simulations? Jeroen
pgsql-hackers by date: