Re: IN operator causes sequential scan (vs. multiple OR expressions) - Mailing list pgsql-performance
From | Ryan Holmes |
---|---|
Subject | Re: IN operator causes sequential scan (vs. multiple OR expressions) |
Date | |
Msg-id | 4F003DBB-34E7-4E9D-A8B3-27AA0CBD66C3@hyperstep.com Whole thread Raw |
In response to | Re: IN operator causes sequential scan (vs. multiple OR expressions) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: IN operator causes sequential scan (vs. multiple OR expressions)
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-performance |
On Jan 27, 2007, at 3:53 PM, Tom Lane wrote: > Ryan Holmes <ryan@hyperstep.com> writes: >> I have a relatively simple query where the planner chooses a >> sequential scan when using the IN operator but chooses an index scan >> when using logically equivalent multiple OR expressions. > > EXPLAIN ANALYZE for both, please? > > If you set enable_seqscan = off, does that force an indexscan, and > if so > what does EXPLAIN ANALYZE show in that case? > > regards, tom lane Wow, I didn't expect such a quick response -- thank you! Note: I rebuilt the state column index and ran a VACUUM ANALYZE since my original post, so the planner's "rows" estimate is now different than the 6830 I mentioned. The planner estimate is actually *less* accurate now, but still in the ballpark relatively speaking. Here is the EXPLAIN ANALYZE for both queries with enable_seqscan = on : EXPLAIN ANALYZE SELECT * FROM pool_sample ps WHERE ps.state = 'PENDING_REPOOL_REVIEW' OR ps.state = 'READY_FOR_REPOOL'; Bitmap Heap Scan on pool_sample ps (cost=985.51..61397.50 rows=50022 width=539) (actual time=13.560..39.377 rows=518 loops=1) Recheck Cond: (((state)::text = 'PENDING_REPOOL_REVIEW'::text) OR ((state)::text = 'READY_FOR_REPOOL'::text)) -> BitmapOr (cost=985.51..985.51 rows=50084 width=0) (actual time=9.628..9.628 rows=0 loops=1) -> Bitmap Index Scan on "idx_poolSample_state" (cost=0.00..480.25 rows=25042 width=0) (actual time=0.062..0.062 rows=4 loops=1) Index Cond: ((state)::text = 'PENDING_REPOOL_REVIEW'::text) -> Bitmap Index Scan on "idx_poolSample_state" (cost=0.00..480.25 rows=25042 width=0) (actual time=9.563..9.563 rows=514 loops=1) Index Cond: ((state)::text = 'READY_FOR_REPOOL'::text) Total runtime: 39.722 ms EXPLAIN ANALYZE SELECT * FROM pool_sample ps WHERE ps.state IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL'); Seq Scan on pool_sample ps (cost=0.00..331435.92 rows=9667461 width=539) (actual time=1060.472..47584.542 rows=518 loops=1) Filter: ((state)::text = ANY (('{PENDING_REPOOL_REVIEW,READY_FOR_REPOOL}'::character varying [])::text[])) Total runtime: 47584.698 ms And now with enable_seqscan = off: EXPLAIN ANALYZE SELECT * FROM pool_sample ps WHERE ps.state = 'PENDING_REPOOL_REVIEW' OR ps.state = 'READY_FOR_REPOOL'; Bitmap Heap Scan on pool_sample ps (cost=985.51..61397.50 rows=50022 width=539) (actual time=0.324..0.601 rows=518 loops=1) Recheck Cond: (((state)::text = 'PENDING_REPOOL_REVIEW'::text) OR ((state)::text = 'READY_FOR_REPOOL'::text)) -> BitmapOr (cost=985.51..985.51 rows=50084 width=0) (actual time=0.287..0.287 rows=0 loops=1) -> Bitmap Index Scan on "idx_poolSample_state" (cost=0.00..480.25 rows=25042 width=0) (actual time=0.109..0.109 rows=4 loops=1) Index Cond: ((state)::text = 'PENDING_REPOOL_REVIEW'::text) -> Bitmap Index Scan on "idx_poolSample_state" (cost=0.00..480.25 rows=25042 width=0) (actual time=0.176..0.176 rows=514 loops=1) Index Cond: ((state)::text = 'READY_FOR_REPOOL'::text) Total runtime: 0.779 ms EXPLAIN ANALYZE SELECT * FROM pool_sample ps WHERE ps.state IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL'); Bitmap Heap Scan on pool_sample ps (cost=150808.51..467822.04 rows=9667461 width=539) (actual time=0.159..0.296 rows=518 loops=1) Recheck Cond: ((state)::text = ANY (('{PENDING_REPOOL_REVIEW,READY_FOR_REPOOL}'::character varying [])::text[])) -> Bitmap Index Scan on "idx_poolSample_state" (cost=0.00..148391.65 rows=9667461 width=0) (actual time=0.148..0.148 rows=518 loops=1) Index Cond: ((state)::text = ANY (('{PENDING_REPOOL_REVIEW,READY_FOR_REPOOL}'::character varying [])::text[])) Total runtime: 0.445 ms So, yes, disabling seqscan does force an index scan for the IN version. My question now is, how do I get PostgreSQL to make the "right" decision without disabling seqscan? Here are the non-default resource usage and query tuning settings from postgresql.conf: shared_buffers = 512MB work_mem = 6MB maintenance_work_mem = 256MB random_page_cost = 3.0 effective_cache_size = 1536MB from_collapse_limit = 12 join_collapse_limit = 12 The server has 4GB RAM, 2 X 2.4GHz Opteron dual core procs, 5 x 15k RPM disks in a RAID 5 array and runs Windows Server 2003 x64. Thanks, -Ryan
pgsql-performance by date: