Stephen Harris <lists@spuddy.org> writes:
> Postgres version 8.0.9 on Solaris 2.8. I know it's old but...
> I have a table with a million rows.
> I need to select data from this table based on an indexed column; I need
> to select 600 possible values from the column, returning around 24,000
> rows of data.
> In perl I have a hash which has 600 key values in it.
> So I did:
> "select stuff from table where index_key in (" .
> join(",",keys %hash) . ") AND non_index_row in ('xyz','abc','def')
> And in the perl
> while(fetch())
> {
> do_stuff
> }
> This resulted in a query string which executed in 12 minutes. If I
> did an "explain" on the query string then I can see it was being expanded
> to 600 OR statements
> where (index_key = 1) OR (index_key = 2) OR .....
In what, a seq scan?
That plan will require executing 600 integer comparisons at each of a
million rows, with only some trivial fraction avoided because of early
success. So it works out that your machine is able to do something over
800K such comparisons per second, which seems a bit slow for any modern
machine ... but I note 8.0 didn't have any of the "virtual slot"
optimizations added in later releases, and is doing a fresh
heap_getattr() for each access to the variable. If it's having to
grovel over a lot of variable-width fields to get to that field each
time, I can see where the time might get eaten up. Where is the
index_key column in the tuples, exactly?
regards, tom lane