Thread: Bad plan for queries with IN clause
Hi all, Sorry for the cross-post, I need the bigger exposure, and I'm not sure if it's JDBC issue or not. This is a follow-up from an earlier post, with a new twist. We have a Java application which is designed initially to work with Oracle, and then gradually adapted to also work with postgres, sometimes even connecting at the same time to both oracle and postgres data bases. Until now we didn't have any big postgres data-base, which has just changed by moving a big one from oracle to postgres... so now we start to notice bad query plans. We have quite a lot of data import/exporting, and we use the following type of query to chunk these: ... WHERE primarykey IN (?, ?, ...., ?) ... We use JDBC prepared statements, and fill the parameters with chunks of the looked up data. The last chunk is smaller of course, and the remaining parameters are filled up with nulls. On oracle this works just fine. Before my last post (reporting bad plan on this type of query when nulls are in the IN list) we were using the 7.4 JDBC driver, which AFAICT is not actually using server side prepared statements, but it sends each query with the parameters inlined. I guess that's why the nulls were taken into account in the query plan. Now we upgraded the JDBC driver (to 8.0), and I think this one is actually using server side prepared statements for java PreparedStatements. This eliminates the null problem, and indeed most of our queries take now index scans. However, some of the queries still take the sequential scan route. The most puzzling in all this is that I've tried to "prepare" the same query in psql, and then "explain execute" the prepared query, and it gave me an index scan plan... so now I'm clueless, and have no idea why would the same query prepared by java yield a different plan than prepared manually... I thought that the query plan is created when you prepare the statement, and not on each execution, right ? And I would expect that the same query prepared multiple times would give the same plan, provided that the tables didn't change significantly... The sequential scan on those queries is even more surprising as the execution time is so radically different, taking milliseconds for an index scan, and even hours for the sequential scan (the involved tables are from a few million rows to a few hundred million rows size). BTW, I've tried to set "enable_seqscan = false", but those queries were still going the sequential scan way. Could it be that the JDBC driver is preparing with wrong parameter types ? I thought 8.0 is more forgiving in this respect anyway. I also tried to tweak the memory settings per http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html , but it didn't help either. I have enabled plan logging at one point, I can provide an actual plan for one of those queries if that would help diagnose what's happening... for me the only relevant thing was: "DETAIL: {SEQSCAN :startup_cost 0.00 :total_cost 1832574.05 :plan_rows 1 :plan_width 8 ... " the rest is too low level and I don't understand it. I would appreciate any help, as I am mostly clueless what the problem could be. Thanks, Csaba.
Csaba Nagy wrote: > However, some of the queries still take the sequential scan route. The > most puzzling in all this is that I've tried to "prepare" the same query > in psql, and then "explain execute" the prepared query, and it gave me > an index scan plan... so now I'm clueless, and have no idea why would > the same query prepared by java yield a different plan than prepared > manually... I thought that the query plan is created when you prepare > the statement, and not on each execution, right ? And I would expect > that the same query prepared multiple times would give the same plan, > provided that the tables didn't change significantly... One gotcha is that by default the JDBC driver will use an unnamed statement for the first few executions of a particular PreparedStatement, then switch to using a (reused) named statement thereafter. The unnamed statement path can result in different plans to what you'd get with PREPARE or a named statement, as it delays planning until the first execution, then takes the actual parameter values into account when doing selectivity estimates. In contrast PREPARE and named statements plan immediately using placeholder estimates. You can tweak the threshold for this on a per-connection or per-statement basis via PGConnection.setPrepareThreshold() and PGStatement.setPrepareThreshold() (statements inherit the connection's value on creation by default). The connection default is also settable via the prepareThreshold URL parameter. If you set it to 1, *every* PreparedStatement execution uses a named statement. If you set it to 0, named statements are never used. > Could it be that the JDBC driver is > preparing with wrong parameter types ? I thought 8.0 is more forgiving > in this respect anyway. One thing that may be useful for debugging this: if you set logLevel=2 as a URL parameter the JDBC driver will log the protocol messages it sends and receives to the JDBC log writer (stderr by default) -- that includes the type OIDs and whether it's using an unnamed or a named statement. -O
OK, found the source of the problem: our code is setting nulls via: PreparedStatement.setNull(int parameterIndex, int sqlType) using Types.NUMERIC as the type, but the data base type is actually BIGINT. Using Types.BIGINT in this method makes the problem go away. Hopefully there are no more bad surprises for us coming from the oracle/postgres incompatibility twilight zone... (crossing fingers). Cheers, Csaba. On Fri, 2005-07-29 at 12:29, Oliver Jowett wrote: > Csaba Nagy wrote: > > > However, some of the queries still take the sequential scan route. The > > most puzzling in all this is that I've tried to "prepare" the same query > > in psql, and then "explain execute" the prepared query, and it gave me > > an index scan plan... so now I'm clueless, and have no idea why would > > the same query prepared by java yield a different plan than prepared > > manually... I thought that the query plan is created when you prepare > > the statement, and not on each execution, right ? And I would expect > > that the same query prepared multiple times would give the same plan, > > provided that the tables didn't change significantly... > > One gotcha is that by default the JDBC driver will use an unnamed > statement for the first few executions of a particular > PreparedStatement, then switch to using a (reused) named statement > thereafter. The unnamed statement path can result in different plans to > what you'd get with PREPARE or a named statement, as it delays planning > until the first execution, then takes the actual parameter values into > account when doing selectivity estimates. In contrast PREPARE and named > statements plan immediately using placeholder estimates. > > You can tweak the threshold for this on a per-connection or > per-statement basis via PGConnection.setPrepareThreshold() and > PGStatement.setPrepareThreshold() (statements inherit the connection's > value on creation by default). The connection default is also settable > via the prepareThreshold URL parameter. If you set it to 1, *every* > PreparedStatement execution uses a named statement. If you set it to 0, > named statements are never used. > > > Could it be that the JDBC driver is > > preparing with wrong parameter types ? I thought 8.0 is more forgiving > > in this respect anyway. > > One thing that may be useful for debugging this: if you set logLevel=2 > as a URL parameter the JDBC driver will log the protocol messages it > sends and receives to the JDBC log writer (stderr by default) -- that > includes the type OIDs and whether it's using an unnamed or a named > statement. > > -O