Re: v3proto Parse/Bind and the query planner - Mailing list pgsql-jdbc

From Oliver Jowett
Subject Re: v3proto Parse/Bind and the query planner
Date
Msg-id 40AAA131.3010602@opencloud.com
Whole thread Raw
In response to Re: v3proto Parse/Bind and the query planner  (Kris Jurka <books@ejurka.com>)
Responses Re: v3proto Parse/Bind and the query planner
List pgsql-jdbc
Kris Jurka wrote:

> I seem to recall Tom Lane speculating about delaying the planning of a
> prepared statement until it's first execution so it would have the bound
> values and then using that plan with subsequent parameters.  In the common
> case multiple executions of a prepared query would use values of similar
> statistical likelihood, but this wouldn't solve the example you gave if
> you executed the plan twice for values that aren't similar.  I think this
> would be a reasonable compromise position.

I don't see why the first execution is special in the general case.. you
will suddenly get queries where the performance depends on the parameter
values of *previous* queries (well beyond cache effects) which is not a
very nice property to have.

Planning on the first Bind would be particularly bad for the case I
presented earlier (highly uneven index distribution) -- if the first
execution happens to use a highly selective value and chooses an index
scan, subsequent executions with non-selective values will perform very
badly (considerably worse than the seqscan currently chosen for a
parameterized query). The cure seems worse than the disease in this case.

Instead, how about something like:

  - For named statements, plan at Parse time always.
  - For unnamed statements, plan at Bind time always.

The assumption here is that if the client is using the unnamed
statement, it's unlikely that it will be repeatedly reusing that
statement with different parameter values so there is little benefit to
preserving the query plan at the cost of being unable to plan for
specific parameter values. If the client is using named statements,
there's no change in behaviour from the current approach, so presumably
the client knows what it's doing! :)

A client that's aware of this behaviour can get some control over when
the planning is done without needing a protocol change. e.g. in the JDBC
driver we already have setUseServerPrepare(); that could be used to
control whether named statements are used when executing a
PreparedStatement or not. The application can enable named statements
where it knows the cost of re-planning the query repeatedly is greater
than the benefit of having actual parameter values to plan from -- e.g.
if the indexes have an even distribution of values.

(is this better discussed on -hackers?)

>>The existing strategy of doing parameter replacement on the driver side
>>works, but we lose the benefits of passing parameters via Bind.
>
>
> We also have the possibility of doing selective replacement and binding
> other values.  This would allow using binary bytea transfers (which are
> unlikely to be used in selectivity estimates) while doing parameter
> replacement for other values.

This is a possibility but it's going to lead to pretty ugly code to
track the JDBC parameter index vs. the backend's idea of the parameter
index. We also go back to playing games with the parser to make it
interpret our parameters as the type we specified at the JDBC level. It
seems a bit counterproductive to jump through several layers of parsing
and re-substitution just to specify parameter types that JDBC already
knew, when there's already a protocol mechanism to specify type
information directly to the backend.

We'd also end up with two code paths again if we wanted to benefit from
named statements for queries that aren't affected by this case and would
benefit from all of their parameters being real parameters (this is one
of the reasons I'm making these changes in the first place, so..)

-O

pgsql-jdbc by date:

Previous
From: Dave Cramer
Date:
Subject: Re: patch for getUDT #2
Next
From: Kris Jurka
Date:
Subject: Re: v3proto Parse/Bind and the query planner