Re: Prepared Statements and large where-id-in constant blocks? - Mailing list pgsql-jdbc
From | James Robinson |
---|---|
Subject | Re: Prepared Statements and large where-id-in constant blocks? |
Date | |
Msg-id | 6035D816-92E0-11D8-A4C2-000A9566A412@socialserve.com Whole thread Raw |
In response to | Re: Prepared Statements and large where-id-in constant blocks? (Oliver Jowett <oliver@opencloud.com>) |
List | pgsql-jdbc |
On Apr 19, 2004, at 10:57 PM, Oliver Jowett wrote: > Unfortunately a bit of experimentation indicates that the planner > doesn't do anything clever with ANY + constant array values (at least > in 7.4.1 which is what I have to hand): Not only that, but it seems to get planned only as an index scan. Preparing the statement using "SELECT ... WHERE id = ANY (?)" plus a call to a really-hacked up version of setObject() would solve the issue of getting better use out of fewer cached prepared statements, but the only-sequential scan planning would be a downer. And while "OR (id=N)" plans exactly like "id IN (N)", there seems to be nothing really worth doing. I examined plans comparing a 6-way join used in our production code with 4 ids in the tail "OR (id=N)" nodes, then with the full 723 ids, and the plans were markedly different, preferring sequential scans for many of the intermediate table joins in the 723-id case.The runtimes for the sequential scans were faster than forcing index scans, so the planner's voodoo definitely benefits from full knowledge of how many rows should be expected (which appears crystal clear in hindsight). So, I doubt that any single server-side preparation using a single parameter representing an entire collection of ids could perform as well as it does currently with full information. Oliver, I tested your proposal of providing more-id-params-than-necessary, passing in a dummy value (-1) which will never be found as a pk in that table, and the planner handled it efficiently. The repeated instances of "or u.id=-1::int8" were, when not pre-planned using PREPARE, were nicely crunched down to a single index condition clause of " OR (id= -1::BIGINT)". But, when transforming this to PREPARE and EXECUTE pairs, the planner cannot crunch the plan down, since it has no idea that, say, 500 of the 700 params will all be the same, so it cannot factor them out at planning time (hence, I guess the warning about constant values in the notes section of the manual page for PREPARE). All roads seem to lead to don't attempt to change a thing -- there is no easy or medium difficulty way to better solve this. In writing this, I went so far as to think about shunting the list of ids into a temporary table to join off of. Doing this at the SQL-level would be far too costly in round-trip times, but could the "WHERE id in (A, B, C, ...)" form somehow be transformed into a hashjoin operation on the backend when the size of the static set is 'high'? Could this not perform (theoretically) better than what appears to be an O(N) index condition evaluation? I am asking only for personal edification -- I have no sample live query where the index condition solution performs too slowly. In Java-land, if we suppose that the size of the set could potentially be 'large', we quickly defer to containing the values in a HashSet if we're going to test for membership as opposed to performing selection searches on a list. Probably a dead-horse beaten elsewhere. Many Thanks. ---- James Robinson Socialserve.com
pgsql-jdbc by date: