Thread: BUG #4974: Equivalent of "= ANY" and "BETWEEN" not observed by planner.
The following bug has been logged online: Bug reference: 4974 Logged by: Ian Turner Email address: ian.turner@deshaw.com PostgreSQL version: 8.3 Operating system: Ubuntu 8.10 Description: Equivalent of "= ANY" and "BETWEEN" not observed by planner. Details: Consider the following table with a few thousand rows: CREATE TABLE example (pk INTEGER PRIMARY KEY); The following queries are equivalent, because there are no integers between 5 and 6 and because the BETWEEN operator contemplates a closed range. SELECT * FROM example WHERE pk IN (5,6); SELECT * FROM example WHERE pk BETWEEN 5 AND 6; Yet the two queries generate very different plans: sysdb=# explain select * from example where pk between 5 and 6; QUERY PLAN ---------------------------------------------------------------------------- ----- Index Scan using example_pkey on example (cost=0.00..8.27 rows=1 width=71) Index Cond: ((uid >= 5) AND (uid <= 6)) (2 rows) ysdb=# explain select * from example where pk IN (5, 6); QUERY PLAN ---------------------------------------------------------------------------- - Bitmap Heap Scan on example (cost=8.52..14.88 rows=2 width=71) Recheck Cond: (pk = ANY ('{5,6}'::integer[])) -> Bitmap Index Scan on example_pkey (cost=0.00..8.52 rows=2 width=0) Index Cond: (pk = ANY ('{5,6}'::integer[])) (4 rows) The bug is that the planner should be able to consider the use of a vanilla index scan for = ANY operators when the values are consecutive for the value type in question. Probably the easiest way is to detect this case and rewrite it as using <= / >= operators. More generally, it might be desirable to use the index scan even when values are not consecutive (but are very close). This last idea is a lot more complex, however, as it depends on the distribution of values in the table.
"Ian Turner" <ian.turner@deshaw.com> writes: > The following queries are equivalent, because there are no integers between > 5 and 6 and because the BETWEEN operator contemplates a closed range. > SELECT * FROM example WHERE pk IN (5,6); > SELECT * FROM example WHERE pk BETWEEN 5 AND 6; The planner intentionally does not do very many inferences that are as datatype-dependent as this one would be. It doesn't fit into the system design. For the most part the possible gain is not large anyway. regards, tom lane
On Wed, Aug 12, 2009 at 3:15 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > "Ian Turner" <ian.turner@deshaw.com> writes: >> The following queries are equivalent, because there are no integers betw= een >> 5 and 6 and because the BETWEEN operator contemplates a closed range. >> SELECT * FROM example WHERE pk IN (5,6); >> SELECT * FROM example WHERE pk BETWEEN 5 AND 6; > > The planner intentionally does not do very many inferences that are as > datatype-dependent as this one would be. =A0It doesn't fit into the system > design. =A0For the most part the possible gain is not large anyway. Hm, we could do it in a data-type independent way which would work even for non-integral values by imposing a recheck condition. That would work for any data type with a btree ordering. I think you're right that the potential gain isn't very big. A series of equality values versus a range scan which uses the same index pages is going to be about the same i/o. The only saving is the repeated descent of the tree to find the leaf pages. Which should hopefully all be in cache anyways. I suppose it might help if you were then doing a merge join or order by on the same key since the range scan will be in order but the bitmap index scan would have to be sorted. But if there were enough tuples for the sort to actually matter surely we would want the bitmap index scan anyways. --=20 greg http://mit.edu/~gsstark/resume.pdf