Thread: Bad plan for queries with IN clause

Bad plan for queries with IN clause

From

Csaba Nagy

Date:

29 July 2005, 06:57:11

Hi all,

Sorry for the cross-post, I need the bigger exposure, and I'm not sure
if it's JDBC issue or not.

This is a follow-up from an earlier post, with a new twist.

We have a Java application which is designed initially to work with
Oracle, and then gradually adapted to also work with postgres, sometimes
even connecting at the same time to both oracle and postgres data bases.

Until now we didn't have any big postgres data-base, which has just
changed by moving a big one from oracle to postgres... so now we start
to notice bad query plans.

We have quite a lot of data import/exporting, and we use the following
type of query to chunk these:

... WHERE primarykey IN (?, ?, ...., ?) ...

We use JDBC prepared statements, and fill the parameters with chunks of
the looked up data. The last chunk is smaller of course, and the
remaining parameters are filled up with nulls.

On oracle this works just fine.

Before my last post (reporting bad plan on this type of query when nulls
are in the IN list) we were using the 7.4 JDBC driver, which AFAICT is
not actually using server side prepared statements, but it sends each
query with the parameters inlined. I guess that's why the nulls were
taken into account in the query plan.

Now we upgraded the JDBC driver (to 8.0), and I think this one is
actually using server side prepared statements for java
PreparedStatements. This eliminates the null problem, and indeed most of
our queries take now index scans.

However, some of the queries still take the sequential scan route. The
most puzzling in all this is that I've tried to "prepare" the same query
in psql, and then "explain execute" the prepared query, and it gave me
an index scan plan... so now I'm clueless, and have no idea why would
the same query prepared by java yield a different plan than prepared
manually... I thought that the query plan is created when you prepare
the statement, and not on each execution, right ? And I would expect
that the same query prepared multiple times would give the same plan,
provided that the tables didn't change significantly...

The sequential scan on those queries is even more surprising as the
execution time is so radically different, taking milliseconds for an
index scan, and even hours for the sequential scan (the involved tables
are from a few million rows to a few hundred million rows size).

BTW, I've tried to set "enable_seqscan = false", but those queries were
still going the sequential scan way. Could it be that the JDBC driver is
preparing with wrong parameter types ? I thought 8.0 is more forgiving
in this respect anyway.

I also tried to tweak the memory settings per
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
, but it didn't help either.

I have enabled plan logging at one point, I can provide an actual plan
for one of those queries if that would help diagnose what's happening...
for me the only relevant thing was:
"DETAIL: {SEQSCAN :startup_cost 0.00 :total_cost 1832574.05 :plan_rows 1
:plan_width 8 ... "
the rest is too low level and I don't understand it.

I would appreciate any help, as I am mostly clueless what the problem
could be.

Thanks,
Csaba.

Re: [JDBC] Bad plan for queries with IN clause

From

Oliver Jowett

Date:

29 July 2005, 07:29:11

Csaba Nagy wrote:

> However, some of the queries still take the sequential scan route. The
> most puzzling in all this is that I've tried to "prepare" the same query
> in psql, and then "explain execute" the prepared query, and it gave me
> an index scan plan... so now I'm clueless, and have no idea why would
> the same query prepared by java yield a different plan than prepared
> manually... I thought that the query plan is created when you prepare
> the statement, and not on each execution, right ? And I would expect
> that the same query prepared multiple times would give the same plan,
> provided that the tables didn't change significantly...

One gotcha is that by default the JDBC driver will use an unnamed
statement for the first few executions of a particular
PreparedStatement, then switch to using a (reused) named statement
thereafter. The unnamed statement path can result in different plans to
what you'd get with PREPARE or a named statement, as it delays planning
until the first execution, then takes the actual parameter values into
account when doing selectivity estimates. In contrast PREPARE and named
statements plan immediately using placeholder estimates.

You can tweak the threshold for this on a per-connection or
per-statement basis via PGConnection.setPrepareThreshold() and
PGStatement.setPrepareThreshold() (statements inherit the connection's
value on creation by default). The connection default is also settable
via the prepareThreshold URL parameter. If you set it to 1, *every*
PreparedStatement execution uses a named statement. If you set it to 0,
named statements are never used.

> Could it be that the JDBC driver is
> preparing with wrong parameter types ? I thought 8.0 is more forgiving
> in this respect anyway.

One thing that may be useful for debugging this: if you set logLevel=2
as a URL parameter the JDBC driver will log the protocol messages it
sends and receives to the JDBC log writer (stderr by default) -- that
includes the type OIDs and whether it's using an unnamed or a named
statement.

-O

Re: [JDBC] Bad plan for queries with IN clause

From

Csaba Nagy

Date:

29 July 2005, 11:01:29

OK, found the source of the problem: our code is setting nulls via:
PreparedStatement.setNull(int parameterIndex, int sqlType)
using Types.NUMERIC as the type, but the data base type is actually
BIGINT. Using Types.BIGINT in this method makes the problem go away.
Hopefully there are no more bad surprises for us coming from the
oracle/postgres incompatibility twilight zone... (crossing fingers).

Cheers,
Csaba.


On Fri, 2005-07-29 at 12:29, Oliver Jowett wrote:
> Csaba Nagy wrote:
>
> > However, some of the queries still take the sequential scan route. The
> > most puzzling in all this is that I've tried to "prepare" the same query
> > in psql, and then "explain execute" the prepared query, and it gave me
> > an index scan plan... so now I'm clueless, and have no idea why would
> > the same query prepared by java yield a different plan than prepared
> > manually... I thought that the query plan is created when you prepare
> > the statement, and not on each execution, right ? And I would expect
> > that the same query prepared multiple times would give the same plan,
> > provided that the tables didn't change significantly...
>
> One gotcha is that by default the JDBC driver will use an unnamed
> statement for the first few executions of a particular
> PreparedStatement, then switch to using a (reused) named statement
> thereafter. The unnamed statement path can result in different plans to
> what you'd get with PREPARE or a named statement, as it delays planning
> until the first execution, then takes the actual parameter values into
> account when doing selectivity estimates. In contrast PREPARE and named
> statements plan immediately using placeholder estimates.
>
> You can tweak the threshold for this on a per-connection or
> per-statement basis via PGConnection.setPrepareThreshold() and
> PGStatement.setPrepareThreshold() (statements inherit the connection's
> value on creation by default). The connection default is also settable
> via the prepareThreshold URL parameter. If you set it to 1, *every*
> PreparedStatement execution uses a named statement. If you set it to 0,
> named statements are never used.
>
> > Could it be that the JDBC driver is
> > preparing with wrong parameter types ? I thought 8.0 is more forgiving
> > in this respect anyway.
>
> One thing that may be useful for debugging this: if you set logLevel=2
> as a URL parameter the JDBC driver will log the protocol messages it
> sends and receives to the JDBC log writer (stderr by default) -- that
> includes the type OIDs and whether it's using an unnamed or a named
> statement.
>
> -O