Optimizing prepared statements - Mailing list pgsql-hackers

From Jeroen T. Vermeulen
Subject Optimizing prepared statements
Date
Msg-id 21267.125.24.241.41.1157277849.squirrel@webmail.xs4all.nl
Whole thread Raw
Responses Re: Optimizing prepared statements  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
I've rigged up a simple simulator for the scheme I described for detecting
pseudo-constant parameters to prepared statements.  It withstands simple
tests, and it neatly picks up cases where some parameters are
pseudo-constants and others aren't--even if some of them are more "pseudo"
while others are more "constant."

What I need now is some realistic test data.  Does anyone have realistic
SQL logs that use prepared statements?

For now, I'll summarize some results I got from randomized input data.  I
used very simple traces, with 11 prepared statements, each taking a
different number of parameters (0 through 10, inclusive).  All calls were
uniformly randomized.  I used LRU replacement of cached plans, with up to
4 retained plans per statement.  Confidence counters ran from 0 to 3
inclusive, with the confidence threshold lying between 1 and 2.

Per simulation run, 20,000 statement invocations were processed.  Runs of
20,000 took about 3.5 seconds of wall-clock time each, or 0.175
milliseconds per statement, on a lightly-loaded 1.8 GHz 32-bit Athlon XP. 
That's for simulation in Python 2.4, with no effort to optimize and no
precompilation, and several lines of information composed and printed to
/dev/null for every invocation.  So I think the overhead of the matching
and comparing that the algorithm does is not a performance worry.

In my first test, parameters were uniformly-distributed integers in the
range [0, 999].  Over this test, 104 plans were generated for the 11
plans, for an average 192 calls per generated plan.  Only 133 calls out of
20,000 used optimized plans, in this case optimizing out only one
pseudo-constant each.

When parameters were made to follow the normal distribution with mean 500
and standard deviation 100 (rounded to integers), the number of generated
plans went up to 305 as more patterns were recognized, and of course the
number of calls per generated plan dropped to 65.  Of the 20,000
invocations here, 770 used plans with one parameter optimized away, and 2
used plans with two.

These results don't look very good, but bear in mind this is for
randomized data.  Can't expect to exploit many patterns in random inputs! 
Real-life use is probably going to be much more favourable.  If we want to
guard against fluke patterns in highly-variable parameters, we can always
increase the range of the confidence counters.  That would make the
predictor more skeptical when it comes to accepting reptitions as
patterns.  Just how we tune the counters would be a tradeoff between the
cost of additional planning and the benefits of optimizing out more
parameters.

So once again, does anyone know of any realistic logs that I can use for
more useful simulations?


Jeroen




pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Postgres tracking - the pgtrack project
Next
From: Michael Glaesemann
Date:
Subject: Re: [PATCHES] Interval month, week -> day