Re: [PERFORM] Query much slower when run from postgres function - Mailing list pgsql-jdbc

From Guillaume Cottenceau
Subject Re: [PERFORM] Query much slower when run from postgres function
Date
Msg-id 87wsaxk94w.fsf@meuh.mnc.lan
Whole thread Raw
In response to Re: [PERFORM] Query much slower when run from postgres function  (Oliver Jowett <oliver@opencloud.com>)
Responses Re: [PERFORM] Query much slower when run from postgres function
List pgsql-jdbc
Oliver Jowett <oliver 'at' opencloud.com> writes:

> Guillaume Cottenceau wrote:
>> Oliver Jowett <oliver 'at' opencloud.com> writes:
>>
>>> The idea behind the threshold is that if a PreparedStatement object is
>>> reused, that's a fairly good indication that the application wants to
>>> run the same query many times with different parameters (since it's
>>> going to the trouble of preserving the statement object for reuse). But
>>
>> Or it may just need the safeness of driver/database parameter
>> "interpolation", to get a "free" efficient safeguard against SQL
>> injection.
>
> In which case, the application usually throws the PreparedStatement
> object away after executing it once, and the threshold is never reached.
> As I said, the application has to do extra work to preserve exactly the
> same PreparedStatement object for reuse before the threshold applies, at
> which point it's reasonable to assume that it could be a
> performance-sensitive query that would benefit from preserving the query
> plan and avoiding parse/plan costs on every execution.

Thanks for the clarification!

That may just be me, but I see two issues here: first, parsing
and planning are tied together, but parsing should be always done
first time only as I see no point in reparsing in subsequent uses
of the PreparedStatement?; second, it's still questionable that a
"performance-sensitive" query should mean benefiting from
preserving the query plan: I have seen dramatic use cases where
the preserved query plan opted for a seqscan and then the query
was orders of magnitude slower than it should because the actual
then used values would have qualified for an indexscan.

> It's just a heuristic because there *is* a tradeoff and many/most
> applications are not going to be customized specifically to know about
> that tradeoff. And it's configurable because the tradeoff is not the
> same in every case.

Yes, and it's well documented, actually. I obviously didn't read
it carefully enough last time :/ I guess my approach of using the
protocol version 2 should be replaced by unsetting the prepared
threshold.. I think I came up with that workaround after that
post from Kris:

http://archives.postgresql.org/pgsql-jdbc/2008-03/msg00070.php

because strangely, you and I intervened in that thread, but the
prepared threshold issue was not raised, so I followed the
protocolVersion=2 path. Did I miss something - e.g. is the topic
today different from the topic back then, for some reason? Am I
wrong in assuming that your "please replan this statement every
time you get new parameters" suggestion is nearly-achievable with
unsetting the prepared threshold ("nearly" being the difference
between replanning always, and replanning only when parameters
are new)?

Anyway, documentation-wise, I've tried to think of how the
documentation could be a little more aggressive with the warning:

http://zarb.org/~gc/t/jdbc-more-cautious-preparedstatements.diff

That said, there's something more: when the documentation says:

     There are a number of ways to enable server side prepared
     statements depending on your application's needs. The general
     method is to set a threshold for a PreparedStatement.

I assume that by default server side prepared statements are
*not* enabled, although it seems to be the case, with a threshold
of 5 as a simple test shows when using driver 8.3-604.jdbc3 (on
PG 8.3.6).

I think that either they should not be enabled by default
(really, it could be better with, but it could be so much worse
that is it really a good idea to make a "dropin" use of the
driver use it?), or the documentation should clearly state they
are, and add even more warnings about potential drawbacks. WDYT?

http://zarb.org/~gc/t/jdbc-more-cautious-preparedstatements2.diff

Btw, how can the doc be built? "ant doc" failed on missing
docbook.stylesheet but I was unable to find how to set that
value.

> Do you have a suggestion for a better way to decide when to use a named
> statement?

Oh, I feel I don't have the qualifications to answer that
question, sorry! The only thing I could think of, was what I
talked about in a previous mail, e.g. save all plans of the first
xx queries before reaching the threshold, and then when the
threshold is reached, compare the global cost estimates of the
saved plans, and do not activate server side prepare if they are
too different, as caching the plan for that query would probably
yield too slow results sometimes. Ideally, I guess a new
PG-specific method should be added to activate that feature (and
set the value for "are the plans too different?"). But bear in
mind that it may be a stupid idea :)

--
Guillaume Cottenceau

pgsql-jdbc by date:

Previous
From: Oliver Jowett
Date:
Subject: Re: [PERFORM] Query much slower when run from postgres function
Next
From: "Dickson S. Guedes"
Date:
Subject: Re: Renaming sequence auto generated by SERIAL type don't update pg_attrdef